27 September 2009

Burning Man 2009: Coverage

On Monday (day 3), we made our first coverage estimates, sending SMS to ourselves or to short codes that returned status information. We did this a few more times over the course of the week. We had hoped to use RRLP to automate the coverage estimation process, but so few phones were returning useful RRLP results that we had to do it by hand, with a bicycle, a test phone and a notepad. This is a rough estimate of our -90 dBm coverage within BRC:

"A", "B" and "C" are our three sectors, arranged to maximize coverage over the city and ensure coverage at the front gate, the airport and the greeters' station. The "x" at 4:30 marks our camp and tower. The "0" at 3:00 marks Fusion Valley, the first link in our backhaul chain. These coverage estimates match those predicted by the Hata suburban model, the same propagation characteristic observed in BRC in our 2008 test.

Despite the failure of RRLP to produce data, we did get some automatic coverage information from measurements of the timing errors of RACH bursts arriving at each BTS. In a GSM system, the timing error of an arriving RACH burst relative to the signal framing yields a rough estimate of the round-trip propagation time from the BTS to the handset and back. We logged every one of the 2,018,321 validated RACH bursts we received and finally got around to sorting through some of that data late last week.
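
For anyone who wants to reproduce the arithmetic, here is a minimal sketch of the timing-to-distance conversion. It is illustrative only and not taken from the project's code; it assumes the timing error is expressed in GSM symbol periods (48/13 microseconds each), and the function name `rach_distance_m` is made up for this example.

```python
# Illustrative sketch: convert a RACH timing error into a rough distance.
# Assumption: the timing error is given in GSM symbol periods.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
GSM_SYMBOL_PERIOD_S = 48.0 / 13.0 * 1e-6  # about 3.69 microseconds


def rach_distance_m(timing_error_symbols: float) -> float:
    """Estimate the one-way BTS-to-handset distance in meters.

    The timing error covers the round trip (BTS to handset and back),
    so the one-way distance is half of the distance light travels
    in that time.
    """
    if timing_error_symbols < 0:
        raise ValueError("timing error must be non-negative")
    round_trip_s = timing_error_symbols * GSM_SYMBOL_PERIOD_S
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


if __name__ == "__main__":
    for symbols in (1, 2, 5, 20, 27):
        print(f"{symbols:2d} symbols -> {rach_distance_m(symbols) / 1000.0:.2f} km")
```

Each symbol of timing error works out to a little over 550 m of one-way distance, which is why these estimates are rough: a handset at 1 km and one at 1.3 km can land in the same bin.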

Analysis of RACH burst timing indicated that most of the arriving RACH bursts came from a distance of about 1 km (mode of 1 km and mean of 1.3 km) and nearly all of them (>99%) came from a distance of 3 km or less. That makes sense given our coverage, our tower location, the geometry of the city and the fact that BRC accounts for >90% of the area's population during the festival.

Outside BRC, though, we predicted that coverage would follow the Hata open rural model, meaning that once a handset got out of the shadow of all of the RVs, shipping containers and metal-framed domes, our range could be 15 km or more. There is some evidence of that in our logs. A small fraction of RACH bursts (~0.5%) arrived on sector A from clustered distances of 11 km and 15 km. Gerlach, NV was 15 km away in the direction of sector A, has a population of about 500 and could account for the cluster of access attempts at that distance. We still don't have a good explanation for the cluster of access attempts at 11 km.
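
To get a feel for the gap between the suburban and open-area predictions, here is a minimal sketch of the standard Okumura-Hata formulas with the small/medium-city mobile antenna correction. The 900 MHz carrier and 1.5 m handset height are assumptions chosen for illustration (the posts do not state the band), and the 21 m tower is below the 30 m base height the model was nominally fitted for, so treat the output as a rough guide only.

```python
import math

# Illustrative Okumura-Hata path loss sketch (frequencies in MHz,
# heights in meters, distance in km). All default inputs below are
# assumptions for the example, not measured values.


def hata_urban_db(f_mhz: float, h_base_m: float, h_mobile_m: float, d_km: float) -> float:
    """Median urban path loss in dB (small/medium city correction)."""
    log_f = math.log10(f_mhz)
    # Mobile antenna height correction for small and medium cities.
    a_hm = (1.1 * log_f - 0.7) * h_mobile_m - (1.56 * log_f - 0.8)
    return (69.55 + 26.16 * log_f - 13.82 * math.log10(h_base_m) - a_hm
            + (44.9 - 6.55 * math.log10(h_base_m)) * math.log10(d_km))


def hata_suburban_db(f_mhz: float, h_base_m: float, h_mobile_m: float, d_km: float) -> float:
    """Suburban path loss: urban loss minus the suburban correction."""
    urban = hata_urban_db(f_mhz, h_base_m, h_mobile_m, d_km)
    return urban - 2.0 * math.log10(f_mhz / 28.0) ** 2 - 5.4


def hata_open_db(f_mhz: float, h_base_m: float, h_mobile_m: float, d_km: float) -> float:
    """Open (rural) path loss: urban loss minus the open-area correction."""
    urban = hata_urban_db(f_mhz, h_base_m, h_mobile_m, d_km)
    log_f = math.log10(f_mhz)
    return urban - 4.78 * log_f ** 2 + 18.33 * log_f - 40.94


if __name__ == "__main__":
    f_mhz, h_base_m, h_mobile_m = 900.0, 21.0, 1.5  # assumed values
    for d_km in (1.0, 3.0, 11.0, 15.0):
        sub = hata_suburban_db(f_mhz, h_base_m, h_mobile_m, d_km)
        opn = hata_open_db(f_mhz, h_base_m, h_mobile_m, d_km)
        print(f"{d_km:5.1f} km  suburban {sub:6.1f} dB  open {opn:6.1f} dB")
```

Under these assumed inputs the open-area curve sits about 18.6 dB below the suburban one at any given distance, which is the kind of margin that turns a range of a few kilometers inside the city into a much longer one across open playa.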

25 September 2009

Burning Man 2009: Days 4-6 (part 2)

Sorry for the posting delay. Back to the blog. For now.

Apart from excessive registration activity, we had a few other technical challenges, most related to backhaul and power.

Our backhaul was a point-to-point WiFi link, running from a Nanostation 5 on our tower to a 30' tower in a camp called Fusion Valley, over at the 3:00 plaza. Fusion Valley, in turn, linked us to Center Camp where we got onto a 10 MB/s microwave relay back to an ISP in the "real world". The people running this network had been very supportive of our project and their system worked most of the time, but keeping anything working reliably on the Playa is hard and they were not granted any magical protection against that. Our static IP number changed and sometimes just quit working. A few times we lost backhaul completely, like the time someone at Fusion Valley plugged some power tools into the same circuit as the network gear, tripped the breaker and then just walked away for a while. Well, it's Burning Man. Stuff happens.

We were running a mostly-local service. Occasional loss of backhaul should not have been a serious problem, but it was, because Asterisk kept trying to run DNS and reverse-DNS queries. Asterisk would just lock up when it couldn't reach the internet. We tried replacing every hostname we could find with a numeric address, we tried filling out /etc/hosts by hand, we tried setting up a local DNS cache, we really tried our best to understand and trim down our Asterisk configuration, but we just could not stop Asterisk from freezing when the backhaul was down. Since we were using the Asterisk SIP registry as our HLR for location updating and SMS address resolution, a frozen Asterisk server also shut down everything else we were doing.

Another technical problem was power. Here, we did something dumb: we assumed that new batteries would be topped off, and so we didn't bother checking the acid levels. We had enough battery capacity to run the full system for at least 10 hours but the first time we tried to leave the system on overnight we woke up to a low voltage alarm at 4:00. Since we didn't want to run a generator right next to the tent at 4:00, we shut everything down and went back to bed more than a little concerned about the batteries. The next morning we started the generator but the batteries were not responding well, not taking a charge. By the afternoon, though, we thought to check the acid levels and found they were very low. We topped off the batteries with water and had no more problems that week, except for a little boil-over from over-filling. BTW: Playa dust is excellent for neutralizing battery acid spills.

There were also a few loose ends for John to tie up in the SMS server, like saving and reloading the message queue and "bouncing" undeliverable messages back to their senders. He also added some test features, like a "411" short code that would return system status information, that were really handy for coverage testing. These weren't really problems, though, just straightforward development.

By day 6, Thursday, we had fixed the registration loads, had decent power and backhaul, and a good feature set on the SMS server. We were (finally) starting to have a stable system.

18 September 2009

Burning Man 2009: Days 4-6 (part 1)

OK, this is where the blog gets technical.

Day 4 (Tuesday) started with fried eggs, cornbread pancakes and Turkish coffee, followed by the departure of the tower crew. We were all disappointed that they could not stay for the full week. Hopefully, they will next year. After we said our goodbyes and Martin's SUV disappeared into the dust, we turned our attention back to the BTS units.

When we first turned on the system at full power, we got flooded with location updating (GSM registration) requests. We had expected this. We had seen it the year before. We had heard about it from network engineers from big carriers. I had seen it in IMSI catchers. We expected it would die down after a couple of hours, too, but there we were wrong. Now what? We had to determine the cause of the excessive load, we had to slow it down, and we had to make whatever changes we could to allow the system to serve provisioned test users even in the face of this load.

Over the next two days of experiments and analysis, we found multiple causes for the excessive registration load.

  • We were using the "IMSI attach" protocol, meaning that any time a phone slipped out of service and came back it would attempt to register again. That's great for keeping up-to-date presence information in small networks, but it was a disaster in a large network with spotty coverage along the edges.
  • We were trying to do RRLP queries during each registration. RRLP is the protocol used to communicate with the GPS receivers inside most US handsets. We had hoped to make maps of coverage in real time and Alon had put a lot of work into that. But without GPS assistance data the useful RRLP response rate was less than 1% and a lot of the requests were timing out, causing phones to spend more time on the radio channel and exacerbating the loading problem.
  • We had the cell reselect hysteresis set too low, causing phones to change cells (and thus request registration) too frequently.
  • We had the registration timer (T3212) set too low. Again, a small T3212 was good for presence tracking in small networks, but a poor choice in this environment.
  • We were using a random exponential back-off for access requests but were resetting the back-off timer value (T3122) any time there was a successful channel grant. This was too permissive.
  • There was another GSM network on the playa, Commnet Wireless, operating from Frog Pond, a hot spring on private property a few miles away. Any time a phone passed into Commnet Wireless' service and back out again, it would request another registration.
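
The back-off issue in the list above is easy to see in a few lines of code. This is an illustrative sketch only: the class name, the method names and the numeric limits are hypothetical, and the "decay" policy is just one possible less permissive alternative, not necessarily the change that was made on the playa.

```python
# Illustrative sketch of a network-side back-off value (in the spirit of
# T3122) that grows under congestion. All names and limits are assumptions.


class AccessBackoff:
    def __init__(self, min_wait_s: float = 1.0, max_wait_s: float = 64.0) -> None:
        self.min_wait_s = min_wait_s
        self.max_wait_s = max_wait_s
        self.wait_s = min_wait_s

    def on_congestion(self) -> float:
        """Double the wait each time a request has to be rejected."""
        self.wait_s = min(self.wait_s * 2.0, self.max_wait_s)
        return self.wait_s

    def on_grant_reset(self) -> float:
        """Permissive policy: one successful grant drops straight to the minimum."""
        self.wait_s = self.min_wait_s
        return self.wait_s

    def on_grant_decay(self, factor: float = 0.9) -> float:
        """Stricter policy (assumed): a grant only eases the wait gradually."""
        self.wait_s = max(self.wait_s * factor, self.min_wait_s)
        return self.wait_s


if __name__ == "__main__":
    backoff = AccessBackoff()
    for _ in range(5):
        backoff.on_congestion()
    print("after 5 rejections:", backoff.wait_s)
    print("after one grant, decay policy:", backoff.on_grant_decay())
    print("after one grant, reset policy:", backoff.on_grant_reset())
```

With the reset policy, a single lucky grant in the middle of a registration storm throws away everything the back-off has learned about the load. A gradual decay keeps the pressure on until grants become routine again.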

Most of these problems could be fixed with configuration changes and software modifications, however:

  • We could not set T3212 above one hour because Asterisk would not allow us to set the SIP registry timeout to more than an hour.
  • Commnet was just something we would have to live with.
(Aerial photo of the Commnet site at Frog Pond.)

Fixing those problems greatly reduced the registration problem. We made two additional changes, though, to ensure smooth operation:

  • We reserved a few control channels for non-registration activities. This ensured that provisioned users would have a chance to get service even during surges of registration activity.
  • We added a control loop that would automatically adjust downlink power to limit congestion. This was a huge improvement in terms of overall system behavior, since it would automatically ramp up power at just the right rate whenever we had to restart a BTS unit.
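
The power control idea can be sketched as a simple step controller. This is a hedged illustration, not the loop that ran on the BTS units: the congestion metric (fraction of control channels in use), the target load, the step size and the attenuation range are all assumptions, and `next_attenuation_db` is a made-up name.

```python
# Illustrative sketch of a congestion-driven downlink power loop.
# Assumption: congestion is measured as the fraction of control
# channels in use (0.0 to 1.0) and power is set via an attenuator.


def next_attenuation_db(current_db: float, channel_load: float,
                        target_load: float = 0.5, step_db: float = 1.0,
                        max_db: float = 60.0) -> float:
    """Return the attenuation to apply for the next control interval.

    More attenuation (less downlink power) shrinks the cell and sheds
    load; less attenuation grows the cell again when there is headroom.
    """
    if not 0.0 <= channel_load <= 1.0:
        raise ValueError("channel_load must be between 0.0 and 1.0")
    if channel_load > target_load:
        new_db = current_db + step_db
    else:
        new_db = current_db - step_db
    # Clamp to the physical range of the (assumed) attenuator.
    return min(max(new_db, 0.0), max_db)


if __name__ == "__main__":
    # Simulated restart: begin fully attenuated and ramp up while load is low.
    attenuation = 60.0
    for load in (0.1, 0.2, 0.4, 0.7, 0.8, 0.6, 0.4, 0.3):
        attenuation = next_attenuation_db(attenuation, load)
        print(f"load {load:.1f} -> attenuation {attenuation:.0f} dB")
```

Starting a restarted BTS at maximum attenuation and letting a loop like this walk the power up means the cell only grows as fast as the registration backlog clears.
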
Of course, congestion management was not the only problem we had to solve. More on that in part 2.

Burning Man 2009: Day 3

On the morning of day 3 (Monday) the tower was up but not fully wired. Martin, the sponsor who had actually provided the tower, climbed up to have a look around. Our neighborhood was getting built out quickly on this first day of public access.

The first order of the morning was for Arturo to mount the third cell sector antenna, run the RF cabling, and set up our wifi link. Once that was complete, Arturo took a break to use Bill's spiffy field shower and Star, a visitor to the camp and friend of the project, volunteered to install the light-up call sign that Bill had brought.

By the late morning, we had the BTS units laid out on a trailer and wired-up under a shade tarp. We were ready to start testing.

First, we started up the BTS units without the power amplifiers and made a few test calls with handsets in the tent. Things looked normal, so we turned on the power amps. We got an immediate flood of registration attempts, which was expected. The flood continued, though, for well over two hours, which was not expected. By the third hour we turned off the power amplifiers, took a look at the logs, and started to think about why the registration load was so high and what we might do to manage it. (More on that in the next post.) Meanwhile, the tower guys (Arturo, Victor and Martin) sat out on the back of the truck playing guitar and singing Mexican tunes, which was a nice touch on a windy afternoon in the high desert.

That evening, we made beef stew with the leftover meat and bones from Sunday's steaks. The tower crew got a much-deserved night out on the town in Black Rock City, starting with a tour of the city from Ranger Velveeta (Annie) and ending with a visit to Vamp Camp in the 7:30 plaza. We met back together at 3am and talked about plans for the next morning.

17 September 2009

Burning Man 2009: Days 1 & 2

So I'm finally getting around to writing up some of our experiences at Burning Man.

Let's start with the first two days. We came in with early arrival passes on Saturday 29 August and most of us arrived in the afternoon. The first order of business was to set up the shelter before dark. John Gilmore dropped by to lend a hand and say hello. Then we set up the kitchen. The tower crew arrived that evening and we all had a hot dinner together in the cool desert night.

The plan for the next morning was to start early and have the tower erected by noon, but no plan survives contact with the Playa. The first complication was that the ground was much harder than the tower crew had expected. They were expecting sand and loose rock, typical of their other desert installations in northern Mexico and the southwestern US. The Playa was different: 20-30 cm of brittle gypsum lying over dense, damp clay. They had made a set of screw-in anchors, but the screw pitch was too steep and the anchors were impossible to drive in this hard ground. They considered borrowing some welding equipment from DPW to re-engineer the screws, but in the end we just stripped off the threads and drove the bare stakes (each about 1.7 m long) directly into the playa with a sledgehammer. Dealing with the anchors blew the whole morning and by noon it was starting to get windy.

We spent the rest of the day erecting the tower, which meant a lot of sitting around waiting for short breaks in the wind, especially as we got closer to the final height of 21 m. There was also a brief distraction when a gust of wind nearly took our shelter, ripping most of the grommets out of our tarps and breaking several of the PVC ribs. Mr. Gilmore literally saved our camp with a box of spare nylon webbing and clips left over from his own shade structure.

Meanwhile, Bill was busy rigging up our shower and greywater evaporation pond so that we could clean up properly when all this backbreaking work was complete.

By late afternoon, our main tower technician, Arturo, was hoisting the final tower section, with two antennas attached. He wrestled them into place in a 40-kt wind and a small crowd on the ground applauded. Somewhere in all that drama Cameragirl and Chaos came by to "inspect our erection" and critique our guy lines. We contemplated putting a wind turbine on the top of the tower, but decided against it. Given the wind conditions, that wind turbine could have supplied most of our power most of the time, but we were concerned that the installation would have been just too dangerous, so we would be relying on the gasoline generator all week.

That evening there were 12 for dinner in the camp. We had steak and mashed potatoes and everyone toasted Arturo for his tower work. He was definitely the man of the evening. Even though the tower itself was erected by sundown, the electronics were still not fully installed, meaning that we were at least half a day behind schedule already. But we had set up a 21 m tower in a windstorm without any injuries, so we saw the day as a success. Such is life in BRC: you frequently adjust your expectations to match reality.

09 September 2009

We're Back

We cleaned out and returned the truck today, marking the end of our 11-day site test in Black Rock City, Nevada, site of the Burning Man festival. Like every other project in BRC, we had our ups and downs, but the trip was a success in the sense that we learned an awful lot about operating a full-range cell site under continuous load from thousands of handsets.

There were a lot of successes. We were happy with the hardware performance and packaging. We rigged a 70' tower in high winds with no injuries. Operating range matched our predictions. We discovered and fixed several bugs under conditions that would have been hard to simulate in a lab. Backhaul integration worked, both for +1 NANP and +883 iNum.

There were also problems. We had problems with Asterisk configuration and performance. We lost IP connectivity a few times every day. The presence of the Commnet Wireless GSM network complicated our operations in unexpected ways. Test users did not follow even the simplest of instructions. We will better understand what really happened as we sort through the hundreds of megabytes of logs in the flash drives of the BTS units and the CDRs and logs of Asterisk and our SMS server.

There were delights, too. We had visits from a lot of well-wishers and supporters, many of whom volunteered their efforts and brought us gifts. We had good neighbors (the "popcorn guys") and shared dinner with them on a few nights. And we had a lot of fun, both in the hacking sessions and out on the playa. (Alon even chipped a tooth fighting in Thunderdome at the Goth camp, part of the complete Burning Man experience.)

We'll write this all up over the next few days as we sort through the data, but I'll close with some photos of our BRC neighborhood (4:30 & H) under the full moon, me and Alon in our network "ops center" (three laptops and a couple of PAP2 phones in a tent) and us standing around looking concerned while Pavel climbs the tower to strip it: