OK, this is where the blog gets technical.
Day 4 (Tuesday) started with fried eggs, cornbread pancakes and Turkish coffee, followed by the departure of the tower crew. We were all disappointed that they could not stay for the full week. Hopefully, they will next year. After we said our goodbyes and Martin's SUV disappeared into the dust, we turned our attention back to the BTS units.
When we first turned on the system at full power, we got flooded with location updating (GSM registration) requests. We had expected this. We had seen it the year before. We had heard about it from network engineers from big carriers. I had seen it in IMSI catchers. We expected it would die down after a couple of hours, too, but there we were wrong. Now what? We had to determine the cause of the excessive load, we had to slow it down, and we had to make whatever changes we could to allow the system to serve provisioned test users even in the face of this load.
Over the next two days of experiments and analysis, we found multiple causes for the excessive registration load.
- We were using the "IMSI attach" protocol, meaning that any time a phone slipped out of service and came back, it would attempt to register again. That's great for keeping up-to-date presence information in small networks, but it was a disaster in a large network with spotty coverage along the edges.
- We were trying to do RRLP queries during each registration. RRLP is the protocol used to communicate with the GPS receivers inside most US handsets. We had hoped to make maps of coverage in real time and Alon had put a lot of work into that. But without GPS assistance data, the useful RRLP response rate was less than 1% and a lot of the requests were timing out, causing phones to spend more time on the radio channel and exacerbating the loading problem.
- We had the cell reselect hysteresis set too low, causing phones to change cells (and thus request registration) too frequently.
- We had the registration timer (T3212) set too low. Again, a small T3212 was good for presence tracking in small networks, but a poor choice in this environment.
- We were using a random exponential back-off for access requests but were resetting the back-off timer value (T3122) any time there was a successful channel grant. This was too permissive.
- There was another GSM network on the playa, Commnet Wireless, operating from Frog Pond, a hot spring on private property a few miles away. Any time a phone passed into Commnet Wireless' service and back out again, it would request another registration.
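To make the back-off item in the list above concrete, here is a minimal sketch of the difference between resetting the back-off value on every successful channel grant and only easing it down gradually. The class name, the millisecond values and the decay policy are illustrative assumptions, not the code or settings from the actual system, and the random jitter is left out to keep the behavior easy to follow.

```python
from dataclasses import dataclass


@dataclass
class BackoffTimer:
    """Hypothetical model of a T3122-style back-off value, in milliseconds."""
    minimum_ms: int = 2000      # assumed floor
    maximum_ms: int = 255000    # assumed ceiling
    value_ms: int = 2000        # current back-off value

    def on_congestion(self) -> int:
        # Double the back-off each time an access request has to be rejected.
        self.value_ms = min(self.value_ms * 2, self.maximum_ms)
        return self.value_ms

    def on_grant_reset(self) -> int:
        # Permissive policy: a single successful grant drops straight to the floor.
        self.value_ms = self.minimum_ms
        return self.value_ms

    def on_grant_decay(self, step_ms: int = 1000) -> int:
        # Stricter alternative: each successful grant lowers the value only a little.
        self.value_ms = max(self.value_ms - step_ms, self.minimum_ms)
        return self.value_ms
```

With the reset policy, one lucky grant in the middle of a surge throws away everything the back-off has learned about the load. With the decay policy, the value only comes down as fast as grants keep succeeding.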
Most of these problems could be fixed with configuration changes and software modifications, with two exceptions:
- We could not set T3212 above one hour because Asterisk would not allow us to set the SIP registry timeout to more than an hour.
- Commnet was just something we would have to live with.
(Aerial photo of the Commnet site at Frog Pond.)
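A bit of arithmetic shows why the T3212 setting matters so much. If every phone re-registers once per T3212 period, the average periodic registration load is just the number of phones divided by the timer. The sketch below assumes that simple model and ignores the extra registrations caused by IMSI attach and cell changes; the function name and the phone counts are made up for illustration.

```python
def periodic_registrations_per_minute(phones: int, t3212_minutes: float) -> float:
    """Average periodic location updates per minute.

    Assumes each phone re-registers exactly once per T3212 period and
    that the phones are spread evenly over that period.
    """
    if t3212_minutes <= 0:
        raise ValueError("t3212_minutes must be positive")
    return phones / t3212_minutes


if __name__ == "__main__":
    # Hypothetical population of 6000 phones within range.
    for minutes in (6, 30, 60):
        rate = periodic_registrations_per_minute(6000, minutes)
        print(f"T3212 = {minutes:2d} min -> {rate:.0f} registrations/min")
```

Under this model, going from a six-minute timer to the one-hour ceiling cuts the periodic load by a factor of ten, which is why being stuck at one hour was a real limit.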
Fixing those problems greatly reduced the registration problem. We made two additional changes, though, to ensure smooth operation:
- We reserved a few control channels for non-registration activities. This ensured that provisioned users would have a chance to get service even during surges of registration activity.
- We added a control loop that would automatically adjust downlink power to limit congestion. This was a huge improvement in terms of overall system behavior, since it would automatically ramp up power at just the right rate whenever we had to restart a BTS unit.
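The idea behind the power control loop can be sketched in a few lines. This is a simplified illustration, not the code that ran on the playa: it assumes downlink power is expressed in dB relative to full power (0 dB is full power), that congestion is measured as the fraction of control channel capacity in use, and that the loop moves one fixed step per iteration. All names, thresholds and step sizes are hypothetical.

```python
from typing import Iterable, List


def next_downlink_power_db(
    current_db: float,
    load: float,
    target_load: float = 0.7,   # assumed congestion threshold
    step_db: float = 1.0,       # assumed adjustment per iteration
    min_db: float = -30.0,      # assumed lowest allowed power
    max_db: float = 0.0,        # full power
) -> float:
    """One loop iteration: back off when congested, creep up otherwise."""
    if load > target_load:
        proposed = current_db - step_db   # shrink the cell to shed load
    else:
        proposed = current_db + step_db   # grow the cell to admit more phones
    # Keep the result inside the allowed power range.
    return max(min_db, min(max_db, proposed))


def simulate_restart(loads: Iterable[float], start_db: float = -30.0) -> List[float]:
    """Run the loop from minimum power over a sequence of measured loads."""
    power = start_db
    history = []
    for load in loads:
        power = next_downlink_power_db(power, load)
        history.append(power)
    return history


if __name__ == "__main__":
    # Light load, light load, a registration surge, then light load again.
    print(simulate_restart([0.2, 0.2, 0.9, 0.2]))
```

Starting a restarted unit at minimum power means only nearby phones see it at first. As they finish registering and the load drops, the loop raises the power and the next ring of phones joins, so the registration surge is spread out instead of arriving all at once.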