After connection drops, mobile-data takes ~5m to re-connect

Bug #1418077 reported by Jonas G. Drange on 2015-02-04
78
This bug affects 13 people
Affects Status Importance Assigned to Milestone
Canonical System Image
High
Canonical Phone Foundations
network-manager (Ubuntu)
Critical
Tony Espy

Bug Description

If the mobile data connection drops due to signal loss or modem reset ( eg. changing radio technology, setting 3g slot, ... ), there's a delay of 5m before NetworkManager attempts to re-connect.

The original bug pertained to changing the radio technology ( which causes the modem to unregister and then re-register using the new technology ), however the problem is more generic. The steps to reproduce focus on this case, as this is the easiest way to get the modem to drop the connection for long enough for the problem to occur.

Forcing the modem to disconnect via wrapping the device in tin-foil should also trigger the problem.

Steps to reproduce:
1. Confirm cellular data (disable wifi)
2. Change TechnologyPreference on online SIM
3. Confirm no cellular data
4. Wait ~10 minutes
5. Confirm cellular data

What happens:
Changing something on the modem causes cellular data to disappear for 10 minutes

What should happen:
It should behave as before, i.e. cellular data should come back immediately

current build number: 97
device name: krillin
channel: ubuntu-touch/devel-proposed
alias: ubuntu-touch/vivid-proposed
last update: 2015-02-02 12:48:25
version version: 97
version ubuntu: 20150202
version device: 20150128-5379bdb
version custom: 20150202

Related branches

summary: - Deactivating all contexts, disconnecting the device, then re-activating
- a context yields no Internet connection
+ NM fails to create a cellular data connection for 10 minutes
description: updated
summary: - NM fails to create a cellular data connection for 10 minutes
+ NM fails to create a cellular data connection for a unacceptably long
+ time

interestingly... on mako i'm only seeing this when switching from umts to gsm, not switching from gsm to umts.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Ken VanDine (ken-vandine) wrote :

I've also reproduced this when disabling wifi, it took nearly 6 minutes before network-manager tried to activate the ofono context.

I've seen some reports of these; but without logs this is the kind of thing that is impossible to debug.

Could someone seeing this enable debug logs in NetworkManager (editing /etc/init/network-manager.conf and adding --log-level=debug --log-domains=core,mb to the Exec line) as well as enabling ofono debug logs (same idea, add -d to the Exec line), and try to follow the states reported from /usr/share/ofono/scripts/list-modems as the time passes, since this seems to be roughly limited to about a 10 minutes window?

Logs would end up in /var/log/syslog; the full syslog is useful to debug this behavior.

Changed in network-manager (Ubuntu):
status: Confirmed → Incomplete
importance: Undecided → High
Jonas G. Drange (jonas-drange) wrote :
Jonas G. Drange (jonas-drange) wrote :
Changed in network-manager (Ubuntu):
status: Incomplete → Confirmed
Jonas G. Drange (jonas-drange) wrote :

Those logs are from the 2 sec before changing the tech pref on /ril_1, to when data started working again.

Bill Filler (bfiller) on 2015-02-06
Changed in network-manager (Ubuntu):
importance: High → Critical
Pat McGowan (pat-mcgowan) wrote :

reportedly affects rtm as well

Changed in canonical-devices-system-image:
importance: Undecided → High
milestone: none → ww09-2015
status: New → Confirmed
Changed in network-manager (Ubuntu):
assignee: nobody → Mathieu Trudel-Lapierre (mathieu-tl)
Changed in canonical-devices-system-image:
assignee: nobody → Michael Frey (mfrey)
Pat McGowan (pat-mcgowan) wrote :

from what I see this only happens when changing the technology in settings, whereas when turning off wifi or when coming up at boot cell data is ready right away

seems less critical given that is not a frequent occurrence although obviously a lousy user experience

Martin Pitt (pitti) wrote :

I see a similar effect on mako vivid-proposed. Sometimes after doing nothing for a few hours I pick up the phone and find it having no data connection at all, without any action from me. This could potentially be because of short 3G network outages. I can confirm that switching from 3G to 2G in the indicator triggers that.

BTW, I tried to enable/disable flight mode, disable/enable mobile data, and further toggle between 2G and 3G, and nothing of that brings it back. Although I didn't wait for a very long time, I'll give it 30 mins now.

Martin Pitt (pitti) wrote :

Normally it looks like this:

$ ip route
default via 10.51.99.33 dev rmnet_usb0 proto static metric 1024
10.51.99.33 dev rmnet_usb0 proto static scope link metric 1024
100.148.199.0/24 dev rmnet_usb0 proto kernel scope link src 10.51.99.34

But after this happens, "ip route" is completely blank and "ip a |grep UP" only shows lo.

Martin Pitt (pitti) wrote :

"sudo service network-manager restart" helps, this recovers the 3G connection within a few secs.

Bill Filler (bfiller) wrote :

The other big problem here is that when I'm connected to wifi, I expect to have a 3G connection as well so things like MMS will work. Currently when I'm connected to wifi MMS does not work at all - I cannot recv MMS attachements because those require a data connection and there doesn't seem to be one.

Tony Espy (awe) wrote :

So to summarize, I see the following problems being reported here:

1. vivid/krillin: when mobile data is active, and radio technology preference is changed, mobile data connection is not always restored.

2. vivid/mako: disable wifi, mobile data connection takes >= 6m to be restored. Note, when this happens, be sure to check 'nmcli d', as there's been an outstanding indicator-network bug around *forever* about too much latency between panel icon changes. See bug #1339792.

3. Hearsay ( from pmcgowan ) that these problems occur on RTM too?

4. Reports from pitti that mobile data occasionally drops and isn't restored. @pitti, what device are you using? This sounds similar to a bug #1425987 reported by Jean Baptiste.

5. No mobile data connection when WiFi connection active; this affects MMS.

So... let's keep this bug for discussions related to the original problem, ie. when the radio data technology is changed, NM takes ~5-6m to restore mobile data connection.

If problem with disabling WiFi can be reproduced, then Ken, please file a new bug for it.

Let's use Jean-Baptiste's bug to track cellular connection dropping and not being restored.

Bill, please file a new bug regarding your MMS issue. We actually do in fact keep the mobile data connection active when WiFi is activated, NM just changes the default route. Note, you may be seeing issues due to issues with US operators.

Tony Espy (awe) wrote :

Also, running mako/vivid-devel/#122, I was able to re-create Jonas' original bug.

I see a delay of 5-6m between changing radio technology and mobile data being re-established. I see it going from 2g -> 3g and visa versa.

Looking at the log, when the technology changes, the NM SCPlugin-Ofono runs as it watches the ofono settings files via inotify. I wonder if this is part of the puzzle. Also, I checked and ofono's ConnectionManager re-attaches pretty quick with the new technology, so the delay is definitely on NM's side.

Tony Espy [2015-03-05 1:36 -0000]:
> 4. Reports from pitti that mobile data occasionally drops and isn't
> restored. @pitti, what device are you using?

vivid-proposed on mako.

> This sounds similar to a bug #1425987 reported by Jean Baptiste.

Right. I got that more often last weekend when I was travelling.
Roaming between cells and going through areas without 3G (or even 2G)
coverage triggers that a lot more often. The main difference for me is
that data often does not come back on its own at all, even after hours
(even at home). I have to restart NM several times for it to come
back.

But indeed the original bug report here gives a nice reproducer
(changing data connection type). Let's fix that and then see how much
it fixes the other use cases.

Tony Espy (awe) on 2015-03-09
summary: - NM fails to create a cellular data connection for a unacceptably long
- time
+ After radio technology is changed, mobile-data takes ~5m to re-connect
Changed in network-manager (Ubuntu):
assignee: Mathieu Trudel-Lapierre (mathieu-tl) → Tony Espy (awe)
status: Confirmed → In Progress
Changed in canonical-devices-system-image:
status: Confirmed → Won't Fix
status: Won't Fix → In Progress
milestone: ww09-2015 → ww13-2015

So what seems to be happening, is that the radio technology switch causes the modem to unregister from the current carrier, and attempt to re-register using the new tech preference ( eg. umts, lte, gsm, ... ). This will cause the ofono ConnectionManager 'Attached' property to return FALSE until the phone re-registers and re-attaches to GPRS.

NM detects the device change state from 'Activated' to 'Failed' ( more on this later ) and in turn schedules an auto-activation attempt of the connection ( context1 ). As the modem's not attached, the activation fails and the retry_count gets decremented, and it tries again repeatedly ( with no delay ), until the retry_count ( initialized to four ) drops to zero.

At this point, NM disables the connection and adds a timeout of 300s in the future which will call a policy function that resets the connection's retry_count, and triggers a new autoconnect sequence. 300s / 60s/m == 5m, our mysterious delay.

What we need is some mechanism for autoconnect to be blocked when the modem is not registered. It looks like nm-device-modem.c implements something like this in it's device_state_changed(). It checks a bunch of reason codes, but NM_DEVICE_STATE_REASON_MODEM_NO_CARRIER ( which is the error returned by stage1 if the modem isn't registered ) isn't checked.

The code also would need a mechanism which would re-enable the autoconnect when the device becomes enabled again ( ie. when 'Attached' becomes TRUE again ).

Also with regards to the device 'Activated' to 'Failed' transition, it's a bit odd that 'Disconnected' isn't used, as 'Failed' causes a retry attempt to be wasted ( see nm-policy.c ) because device_state_changed() treats a new state of FAILED as a connect failure and decrements the retry_count.

Tony Espy (awe) wrote :

So confirmed that adding a case for NM_DEVICE_STATE_REASON_MODEM_NO_CARRIER in device_state_changed ( nm-device-modem.c ), causes the connection to be blocked from auto-connecting.

I still see the initial retries decrement log message in device_state_changed ( nm-policy.c ), I mentioned this is due to the transition of the device from 'Activated' to 'Failed', it's not a new activation attempt. When this is logged( w/retries=4 ), I don't see any subsequent attempts to retry, ever.

nm-policy controls the clearing of the connection's 'autoconnect_blocked_reason' ( which is what gets used in this scenario to block autoconnect ), and at the moment, nothing done by the NM ofono code is triggering the right state changes to clear the reason when 'Attached' becomes TRUE again. Nor is there a timer set, as is done when an autoconnect sequence exceeds it's retry count ( ie. the ~5m timer mentioned above ).

tags: added: lt-category-visible
Tony Espy (awe) on 2015-03-12
summary: - After radio technology is changed, mobile-data takes ~5m to re-connect
+ After connection drops, mobile-data takes ~5m to re-connect
Tony Espy (awe) on 2015-03-12
description: updated
Changed in canonical-devices-system-image:
assignee: Michael Frey (mfrey) → Canonical Phone Foundations (canonical-phonedations-team)
Leopoldo Pena (leopenausa) wrote :

Probably related bug here:
https://bugs.launchpad.net/canonical-devices-system-image/+bug/1435328

In this case it doesn't reconnect at all.

tags: added: connectivity
Tony Espy (awe) wrote :

@Leopoldo

Actually it's a different problem, as this is a specific bug with the version of network-manager in vivid, whereas your phone is running Ubuntu RTM which has an older version of network-manager.

Tony Espy (awe) wrote :

So I actually have a fix that seems to work now.

There was an additional problem I discovered where the disconnect_done logic in nm-modem-ofono.c was setting the modem state to CONNECTED, which was incorrect. I've pushed a new branch and associated with this bug. It needs a bit more cleanup before it's ready to land, but looks promising at this point.

Changed in canonical-devices-system-image:
milestone: ww13-2015 → ww17-2015
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package network-manager - 0.9.10.0-4ubuntu14

---------------
network-manager (0.9.10.0-4ubuntu14) vivid; urgency=medium

  [ Tony Espy ]
  * Fix ofono mobile-data 5m reconnect bug (LP: #1418077)
  * Cleanup of the NM ofono code, getting rid of unused and/or
    code that was commented out.
 -- Ricardo Salveti de Araujo <email address hidden> Mon, 13 Apr 2015 01:35:11 -0300

Changed in network-manager (Ubuntu):
status: In Progress → Fix Released
Changed in canonical-devices-system-image:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers