Turning off WiFi doesn't set a route after the modem connects data
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Canonical System Image | | High | Canonical Phone Foundations | |
| network-manager (Ubuntu) | | High | Tony Espy | |
| network-manager (Ubuntu RTM) | | High | Tony Espy | |
| ofono (Ubuntu) | | High | Tony Espy | |
| ofono (Ubuntu RTM) | | High | Tony Espy | |
Bug Description
I just switched off WiFi in order to test if the device switches successfully to a mobile data connection. After disabling WiFi the indicator showed "H" in no time. So it would seem all is fine. I opened the browser and no data went through.
Here's the output of "ip route" and "list-contexts":
phablet@
[sudo] password for phablet:
phablet@
[ /ril_1 ]
[ /ril_0 ]
[ /ril_0/context1 ]
Name = E-Plus Web GPRS
Settings = { Netmask=
Username = eplus
Protocol = ip
Active = 1
Password = internet
Type = internet
[ /ril_0/context2 ]
Name = E-Plus MMS
Username = mms
Settings = { }
Protocol = ip
Active = 0
Password = eplus
Type = mms
[ /ril_0/context3 ]
Name = ___ubuntu_
Settings = { }
Username =
Protocol = ip
Active = 0
Password =
Type = internet
phablet@
current build number: 21
device name: krillin
channel: ubuntu-
last update: 2015-03-13 17:11:07
version version: 21
version ubuntu: 20150312
version device: 20150310-3201c0a
version custom: 20150216-561-29-186
| description: | updated |
| description: | updated |
| Changed in network-manager (Ubuntu): | |
| assignee: | nobody → Tony Espy (awe) |
| Changed in network-manager (Ubuntu RTM): | |
| assignee: | nobody → Tony Espy (awe) |
| Changed in network-manager (Ubuntu): | |
| importance: | Undecided → High |
| Changed in network-manager (Ubuntu RTM): | |
| importance: | Undecided → High |
| tags: | added: connectivity |
| Changed in network-manager (Ubuntu RTM): | |
| status: | New → Confirmed |
| Tony Espy (awe) wrote : | #1 |
| Changed in network-manager (Ubuntu RTM): | |
| importance: | High → Critical |
| Changed in network-manager (Ubuntu): | |
| importance: | High → Critical |
| Tony Espy (awe) wrote : | #2 |
This doesn't seem to be easily reproducible, which strengthens the theory that this is a race condition between rild and Network Manager.
When the mobile data connection is active, and WiFi connected, the main routing table looks like this:
# ip route show
default via <address> dev wlanX proto static
<address> dev ccmni0 proto static scope link
<address> dev wlanX proto kernel scope link src <address> metric 9
When WiFi is disabled, it looks like this:
# ip route show
default via <address> dev ccmni0 proto static
<address> dev ccmni0 proto static scope link
Note, rild and Network Manager use a different proto value when adding routes ( static vs. kernel ).
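The two routing-table states above can be told apart mechanically. The following is a minimal sketch, not anything taken from rild or NetworkManager: it parses the text printed by `ip route show` and reports which device carries the default route. The function names and the addresses in the usage note are made up for illustration.

```python
# Keys in `ip route show` output that are followed by a value.
# This list is an assumption covering only the fields seen in this bug.
ROUTE_KEYS = {"via", "dev", "proto", "scope", "src", "metric"}


def parse_routes(ip_route_output):
    """Turn `ip route show` text into a list of dicts, one per route."""
    routes = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        # Skip blank lines and pasted shell prompts such as "# ip route show".
        if not tokens or tokens[0] == "#":
            continue
        route = {"dest": tokens[0]}
        # Every known key is followed by its value in the next token.
        for i, token in enumerate(tokens[1:-1], start=1):
            if token in ROUTE_KEYS:
                route[token] = tokens[i + 1]
        routes.append(route)
    return routes


def default_route_device(routes):
    """Return the device of the first default route, or None."""
    for route in routes:
        if route["dest"] == "default":
            return route.get("dev")
    return None


def describe(ip_route_output):
    """Classify a routing table as 'empty', 'no-default', or a device name."""
    routes = parse_routes(ip_route_output)
    if not routes:
        return "empty"
    return default_route_device(routes) or "no-default"
```

With WiFi connected, `describe()` would be expected to return the WiFi device name, with WiFi disabled `ccmni0`, and in the failure case reported here `empty`.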
| Tony Espy (awe) wrote : | #3 |
Also, due to the use of hybris to disable/enable WiFi on krillin, the WiFi device's numeric component is incremented every time WiFi is power cycled ( eg. wlan0, wlan1, wlan2, ... ).
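Because the interface name changes on every power cycle, anything that inspects the WiFi device has to look the name up rather than hard-code `wlan0`. A rough sketch of one way to do that follows. It assumes only one `wlanN` device exists at a time and, as a heuristic, picks the highest N if several are listed. The helper names are hypothetical.

```python
import os
import re

# Matches wlan0, wlan1, wlan12, ... but not e.g. "wlan" or "wlan0mon".
WLAN_RE = re.compile(r"^wlan(\d+)$")


def current_wlan(interface_names):
    """Return the wlanN name with the highest N, or None if there is none."""
    best = None
    for name in interface_names:
        match = WLAN_RE.match(name)
        if match:
            index = int(match.group(1))
            if best is None or index > best[0]:
                best = (index, name)
    return best[1] if best else None


def list_interfaces(sysfs_dir="/sys/class/net"):
    """List network interface names from sysfs (empty list if unavailable)."""
    return os.listdir(sysfs_dir) if os.path.isdir(sysfs_dir) else []
```

On the device, `current_wlan(list_interfaces())` would give the name to look for in the routing table after each toggle.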
| description: | updated |
| Tony Espy (awe) wrote : | #4 |
I lowered the priority to High, as this problem is hard to reproduce.
Also, the one time I was able to reproduce it, the connection was re-established after some time.
| Changed in network-manager (Ubuntu): | |
| importance: | Critical → High |
| Changed in network-manager (Ubuntu RTM): | |
| importance: | Critical → High |
| Tony Espy (awe) wrote : | #5 |
@Michael
When this happened, did you wait and see whether or not the connection came back on its own?
If not, the next time you see it happen, can you check periodically for a few minutes to see if the connection is restored? There's a 5m internal timeout in NetworkManager that may be involved.
If the connection comes back on its own, then the issue is less serious than if it's stuck in this state.
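The periodic check could be automated along these lines. This is only a sketch: the 15 second interval and the 6 minute default (chosen to sit just past the 5m timeout mentioned above) are arbitrary assumptions, and the function names are hypothetical.

```python
import subprocess
import time


def ip_route_show():
    """Return the main routing table as text, using the `ip` tool."""
    result = subprocess.run(
        ["ip", "route", "show"], capture_output=True, text=True, check=False
    )
    return result.stdout


def wait_for_default_route(read_routes=ip_route_show, timeout_s=360,
                           interval_s=15, sleep=time.sleep,
                           clock=time.monotonic):
    """Poll until a default route appears.

    Returns the number of seconds waited, or None if no default route
    showed up before timeout_s elapsed.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        lines = read_routes().splitlines()
        if any(line.startswith("default ") for line in lines):
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(interval_s)
```

A return value of `None` would suggest the connection is stuck rather than slow to recover.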
| Michael Zanetti (mzanetti) wrote : | #6 |
@Tony
Today I left home twice; once it worked fine, the other time I ran into this again. This kinda matches my experience of running into this every other time I leave home. Maybe a bit less. I would estimate it at about 40% of the times I leave my home. Reproducing it manually by turning WiFi off doesn't trigger it as often as walking out of WiFi range does.
For me it definitely doesn't recover on its own, at least not with just waiting a few minutes. However, toggling flight mode on the phone usually makes the connection recover for me.
IMO this is quite critical, and even if it would recover on its own after a while, there's nothing more annoying than being on the go and first having to toggle flight mode, or waiting 10 minutes before being able to use the map, or quickly googling something.
Today for example we went for a walk, and after walking for some 20 minutes I wanted to look up the map. Pulled out the phone, the indicator says "H", I open the map and it doesn't load. Knowing that mobile data can be flaky at times, I waited for about 5 minutes for the map, keeping on tapping the screen in order to not get the map app suspended. Then I decided to toggle flight mode which then finally got me some map data. By then obviously we already walked in some wrong direction and were on our way back already. So this issue has the characteristic to always strike when you quickly need some mobile data.
| Tony Espy (awe) wrote : | #7 |
@Michael
Thanks for the feedback.
I'd like to keep this bug specifically for the problem that occurs when WiFi is toggled off ( as per the bug description and summary ). The problem when going out of range of the access point may be something completely different, and is being addressed in bug #1410113.
Also, problems with the location service, while related, should be considered separate too.
Regarding the toggle WiFi problem, I created a stress test to try and reproduce the problem, which I will have reviewed on Monday to ensure that I didn't get anything wrong. The WiFi toggle switch, contrary to my original understanding, doesn't toggle the urfkill switch directly; instead, it toggles the value of the global NetworkManager property 'WirelessEnabled'. NM in turn will enable/disable WiFi via urfkill, which in turn on krillin uses hybris to load/unload the WiFi driver.
I believe the problem we hit when WiFi is disabled, and the routing table is empty, is caused by a race between rild and NetworkManager, however until I can reproduce, it's just that... a theory.
As mentioned, I was able to reproduce this once two days ago, but haven't managed to reproduce it since. I've run 500 iterations of enable/disable WiFi using my stress test, and haven't yet hit the issue. This is why I reduced the Importance of *this* scenario to High. The out-of-range problem will now be my priority.
Finally, one other bug that may compound this problem is a long-standing issue with the network indicator, which sometimes shows extreme latency in displaying the correct network connection to the user. So even though the indicator may show that you have a mobile data connection, it may not actually be showing the true state of things. See bug #1339792 for more details. There's a concerted effort to fix issues with the indicator; ubuntu silo-6 contains a new version which hopefully will improve this situation.
| Tony Espy (awe) wrote : | #8 |
The attached script is a simple stress test that's run on the phone. It toggles WiFi on, sleeps for 10 seconds, then toggles WiFi off and checks for an empty routing table, and then sleeps for another 10 seconds. It currently has a hard-coded loop count.
The script enables/disables WiFi by toggling NM's 'WirelessEnabled' property.
To work properly, it needs to be run with a previously connected WiFi access point available. It also assumes that a valid SIM card is inserted in slot 1 of a krillin, as it checks the routing table for a 'ccmni0' device when WiFi gets disabled.
Finally, I also usually ensure that the phone will not lock the screen when this test is run by setting the system settings privacy setting such that the phone is never locked.
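For readers without access to the attachment, the loop described above might look roughly like the sketch below. This is not the attached script. It assumes the 'WirelessEnabled' property is set with the standard `dbus-send` tool through `org.freedesktop.DBus.Properties.Set`, and the function names are hypothetical. The toggle, route reader, and sleep are passed in as parameters so the loop can be exercised off-device.

```python
import subprocess
import time

NM_NAME = "org.freedesktop.NetworkManager"
NM_PATH = "/org/freedesktop/NetworkManager"


def set_wireless_enabled(enabled):
    """Set NM's global 'WirelessEnabled' property over the system bus."""
    subprocess.run(
        [
            "dbus-send", "--system", "--print-reply",
            "--dest=" + NM_NAME, NM_PATH,
            "org.freedesktop.DBus.Properties.Set",
            "string:" + NM_NAME,
            "string:WirelessEnabled",
            "variant:boolean:" + ("true" if enabled else "false"),
        ],
        check=True,
    )


def read_routes():
    """Return the main routing table as text."""
    result = subprocess.run(
        ["ip", "route", "show"], capture_output=True, text=True, check=True
    )
    return result.stdout


def stress(iterations, toggle=set_wireless_enabled, routes=read_routes,
           sleep=time.sleep, settle_s=10, modem_dev="ccmni0"):
    """Toggle WiFi on and off; return the iterations that looked broken."""
    failures = []
    for i in range(iterations):
        toggle(True)
        sleep(settle_s)      # give WiFi time to reconnect
        toggle(False)
        table = routes()     # checked right after WiFi is disabled
        # An empty table, or one without the modem device, counts as a hit.
        if not table.strip() or modem_dev not in table:
            failures.append(i)
        sleep(settle_s)
    return failures
```

On the phone this would be run as root, for example `stress(500)`; an empty list back means the problem was not hit.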
| Sebastien Bacher (seb128) wrote : | #9 |
I'm hitting what seems a similar issue on bq/rtm for some days. Today I turned off wifi to test something from a different ip and I got the 3G icon, nmcli shows ril_1 having an active connection but "ip route" is empty, several hours later my device is still not getting any data through
| Sebastien Bacher (seb128) wrote : | #10 |
the wifi was turned off around 12:22 in that log iirc
| Changed in canonical-devices-system-image: | |
| status: | New → Confirmed |
| Launchpad Janitor (janitor) wrote : | #11 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in network-manager (Ubuntu): | |
| status: | New → Confirmed |
| Tony Espy (awe) wrote : | #12 |
@Sebastien
We have a proposed ofono fix that definitely improves the situation; however, nobody else at Canonical besides yourself and Michael has been able to reproduce on RTM. See bug #1435328 for an update on the analysis so far.
Could you install the version ( 1.12.bzr6894+
https:/
...and also make sure you have network-manager version 0.9.10.0-4ubuntu14 installed. It's currently in silo 009, but should be landing in the archive shortly.
I'm trying to determine if we need a fix to network-manager in RTM as well, per Alfonso's comments in the other bug.
| Changed in ofono (Ubuntu): | |
| status: | New → In Progress |
| Changed in ofono (Ubuntu RTM): | |
| status: | New → In Progress |
| Changed in ofono (Ubuntu): | |
| importance: | Undecided → High |
| Changed in ofono (Ubuntu RTM): | |
| importance: | Undecided → High |
| assignee: | nobody → Tony Espy (awe) |
| Changed in ofono (Ubuntu): | |
| assignee: | nobody → Tony Espy (awe) |
| Changed in canonical-devices-system-image: | |
| importance: | Undecided → High |
| milestone: | none → ww19-ota |
| status: | Confirmed → Fix Committed |
| Changed in network-manager (Ubuntu): | |
| status: | Confirmed → Incomplete |
| Changed in network-manager (Ubuntu RTM): | |
| status: | Confirmed → Incomplete |
| Changed in ofono (Ubuntu): | |
| status: | In Progress → Fix Released |
| Changed in ofono (Ubuntu RTM): | |
| status: | In Progress → Fix Released |
| Changed in canonical-devices-system-image: | |
| assignee: | nobody → Canonical Phone Foundations (canonical-phonedations-team) |
| Changed in canonical-devices-system-image: | |
| status: | Fix Committed → Fix Released |
| Tony Espy (awe) wrote : | #13 |
Note, Alfonso just hit the empty routing table bug again today while testing my flight-mode fixes for arale.
After some discussion, we both think that the lxc-android-config NM dispatcher script 02default_
That said, in Alfonso's latest case, the routing table was empty when switching mobile data from one SIM to the other. Perhaps it's not the script that's wiping the table, but NM's core routing logic itself. One modem is coming down and one is going up, so it could be that the adding of routes for the new SIM and the removal of routes for the first SIM are colliding.

This sounds similar to bug #1410113, which is a more generic bug. I'm not going to mark this a duplicate however as it has a reproducible scenario.
Also confirmed this bug for RTM, as I reproduced it on the third try. I verified that there was a mobile data connection and that it worked ( ie. I could ping ubuntu.com ), activated a WiFi connection, verified the routing table, and then disabled WiFi. On the third cycle, I ended up with an empty routing table:
phablet@ubuntu-phablet:~$ netstat -run
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
...that said, it looks like the system did recover at some later point, as it appears to have magically healed itself, and I now have a valid routing table and a usable mobile data connection.
I will continue to work on reproducing, and will add logs, and other data if/when I can do so again.
phablet@ubuntu-phablet:~$ system-image-cli -i touch/ubuntu-rtm/14.09
current build number: 20
device name: krillin
channel: ubuntu-
last update: 2015-03-26 15:44:56
version version: 20
version ubuntu: 20150312
version device: 20150310-3201c0a
version custom: 20150216-561-29-186