18 September 2009

Burning Man 2009: Days 4-6 (part 1)


OK, this is where the blog gets technical.

Day 4 (Tuesday) started with fried eggs, cornbread pancakes and Turkish coffee, followed by the departure of the tower crew. We were all disappointed that they could not stay for the full week. Hopefully, they will next year. After we said our goodbyes and Martin's SUV disappeared into the dust, we turned our attention back to the BTS units.

When we first turned on the system at full power, we got flooded with location updating (GSM registration) requests. We had expected this. We had seen it the year before. We had heard about it from network engineers from big carriers. I had seen it in IMSI catchers. We expected it would die down after a couple of hours, too, but there we were wrong. Now what? We had to determine the cause of the excessive load, we had to slow it down, and we had to make whatever changes we could to allow the system to serve provisioned test users even in the face of this load.

Over the next two days of experiments and analysis, we found multiple causes for the excessive registration load.

  • We were using the "IMSI attach" protocol, meaning that any time a phone slipped out of service and came back it would attempt to register again. That's great for keeping up-to-date presence information in small networks, but it was a disaster in a large network with spotty coverage along the edges.
  • We were trying to do RRLP queries during each registration. RRLP is the protocol used to communicate with the GPS receivers inside most US handsets. We had hoped to make maps of coverage in real time and Alon had put a lot of work into that. But without GPS assistance data the useful RRLP response rate was less than 1% and a lot of the requests were timing out, causing phones to spend more time on the radio channel and exacerbating the loading problem.
  • We had the cell reselect hysteresis set too low, causing phones to change cells (and thus request registration) too frequently.
  • We had the registration timer (T3212) set too low. Again, a small T3212 was good for presence tracking in small networks, but a poor choice in this environment.
  • We were using a random exponential back-off for access requests but were resetting the back-off timer value (T3122) any time there was a successful channel grant. This was too permissive.
  • There was another GSM network on the playa, Commnet Wireless, operating from Frog Pond, a hot spring on private property a few miles away. Any time a phone passed into Commnet Wireless' service and back out again, it would request another registration.
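
The back-off problem is easier to see in code. What follows is a minimal sketch, not the OpenBTS implementation: the class name, the numeric limits and the "halve on grant" rule are all assumptions made for illustration, and the random component of the back-off is left out to keep the example deterministic. It contrasts the permissive policy described above (reset T3122 on any successful grant) with one possible gentler alternative.

```python
class T3122Backoff:
    """Tracks the wait value (T3122, seconds) handed to rejected phones.

    Hypothetical sketch: names and limits are illustrative assumptions.
    """

    def __init__(self, min_s=2, max_s=64, reset_on_grant=False):
        self.min_s = min_s
        self.max_s = max_s
        self.reset_on_grant = reset_on_grant
        self.current_s = min_s

    def on_reject(self):
        """No channel was free: double the wait, up to the ceiling."""
        self.current_s = min(self.current_s * 2, self.max_s)
        return self.current_s

    def on_grant(self):
        """A channel was granted: relax the wait."""
        if self.reset_on_grant:
            # The permissive policy: one grant forgets all congestion.
            self.current_s = self.min_s
        else:
            # Assumed alternative: back down one step at a time.
            self.current_s = max(self.current_s // 2, self.min_s)
        return self.current_s
```

With `reset_on_grant=True`, a single lucky grant in the middle of a registration storm drops the wait straight back to the minimum, which invites the next surge. The gradual version keeps some memory of recent congestion.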

Most of these problems could be fixed with configuration changes and software modifications, however:

  • We could not set T3212 above one hour because Asterisk would not allow us to set the SIP registry timeout to more than an hour.
  • Commnet was just something we would have to live with.
(Aerial photo of the Commnet site at Frog Pond.)
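
The T3212 limit comes down to simple arithmetic. GSM broadcasts T3212 in units of six minutes (decihours), so a one-hour SIP registry timeout caps it at 10. The helper below is a hypothetical sketch of that calculation, assuming each periodic GSM registration is what refreshes the handset's SIP registration; the function name is mine, not something from OpenBTS or Asterisk.

```python
DECIHOUR_S = 360  # T3212 is broadcast in 6-minute units


def max_t3212_units(sip_expiry_s):
    """Largest T3212 value (in 6-minute units) whose period still fits
    inside the SIP registry timeout, clamped to the 1..255 range.
    (0 is excluded here because it disables periodic registration.)
    """
    units = sip_expiry_s // DECIHOUR_S
    return max(1, min(units, 255))


# A one-hour SIP registry timeout allows at most 10 units (60 minutes).
one_hour_limit = max_t3212_units(3600)
```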

Fixing those problems greatly reduced the registration problem. We made two additional changes, though, to ensure smooth operation:

  • We reserved a few control channels for non-registration activities. This ensured that provisioned users would have a chance to get service even during surges of registration activity.
  • We added a control loop that would automatically adjust downlink power to limit congestion. This was a huge improvement in terms of overall system behavior, since it would automatically ramp up power at just the right rate whenever we had to restart a BTS unit.
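
Both changes can be sketched in a few lines. This is an illustration under stated assumptions rather than the code we ran: the function names, the load metric (fraction of control channels busy), the target and the step size are all hypothetical.

```python
def may_allocate(free_channels, reserved, is_registration):
    """Channel reservation: registration requests may not dip into the
    pool held back for other activity (calls, SMS)."""
    if is_registration:
        return free_channels > reserved
    return free_channels > 0


def next_attenuation_db(current_db, channel_load,
                        target_load=0.5, step_db=1, max_db=40):
    """One iteration of a simple downlink power control loop.

    More attenuation means less downlink power and a smaller cell,
    so fewer phones can hear the BTS and try to register.
    """
    if channel_load > target_load:
        current_db += step_db   # congested: back the power off
    elif channel_load < target_load:
        current_db -= step_db   # headroom: bring the power up
    return max(0, min(current_db, max_db))
```

After a restart, a unit would begin at `max_db` and call `next_attenuation_db` once per measurement interval. Power then rises one step at a time for as long as the load stays under the target, and holds or backs off as soon as registrations start to pile up.
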
Of course, congestion management was not the only problem we had to solve. More on that in part 2.


7 comments:

  1. Thank you for your report, but we're still waiting for the next technical part !
    ;-)

  2. Yes we are :)

  3. Hi i was wondering if the openbts supports call back and call through solutions if hooked to a callback or call through gateway? i was also trying to get my head around the need for inum if openbts is a network on its own (so to speak) isn't it capable of generating its own direct dial in numbers (DDI or DID)? just curious... and a big congrats at Niue wishing even greater success at future tests.

  4. Mryaxx -

    We can assign arbitrary extensions to handsets in Asterisk, but to support inbound calls from the PSTN you need a real-world phone number, a DID. We can assign DIDs to handsets in an OpenBTS network and route inbound calls from the real world, but we cannot just generate our own DIDs. We have to get them from a real-world telephone carrier and they cost money. As part of their support for this test, Voxbone loaned us a block of 10,000 iNum DIDs. We also leased some California DIDs from Link2Voip to use as contact numbers for the FCC, Commnet, etc.

    -- David

  5. Hello.. Firstly I would like to send greetings to all readers. After this, I recognize the content so interesting about this article. For me personally I liked all the information. I would like to know of cases like this more often. In my personal experience I might mention a book called Green Parks Costa Rica in this book that I mentioned have very interesting topics, and also you have much to do with the main theme of this article.

  6. thank you for your detailed description.
