modem-manager fails to recognize 3G modem after suspend/resume

Bug #614274 reported by Torsten Spindler
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: network-manager

A 3G modem is no longer recognized by modem-manager when the system resumes. The USB device under /sys/devices/pci0000:00/0000:00:1d.1/usb6/6-1 is gone, and dmesg shows the following:

[ 4251.624038] usb 6-1: new full speed USB device using uhci_hcd and address 4
[ 4251.744422] usb 6-1: device descriptor read/64, error -71
[ 4251.968059] usb 6-1: device descriptor read/64, error -71
[ 4252.184088] usb 6-1: new full speed USB device using uhci_hcd and address 5
[ 4252.304068] usb 6-1: device descriptor read/64, error -71
[ 4252.528099] usb 6-1: device descriptor read/64, error -71
[ 4252.744095] usb 6-1: new full speed USB device using uhci_hcd and address 6
[ 4253.160115] usb 6-1: device not accepting address 6, error -71
[ 4253.272083] usb 6-1: new full speed USB device using uhci_hcd and address 7
[ 4253.688067] usb 6-1: device not accepting address 7, error -71
[ 4253.688104] hub 6-0:1.0: unable to enumerate USB device on port 1
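A minimal sketch for spotting this failure in the kernel log after a resume, assuming a POSIX shell. The function names are hypothetical. Port 1 and device 6-1 come from the log above and will differ on other machines. Error -71 is EPROTO, a USB protocol error.

```shell
#!/bin/sh
# Hypothetical helpers for spotting the enumeration failure in a kernel log.
# Usage (after a resume):
#   dmesg | usb_enum_failed 1 && echo "modem on port 1 failed to enumerate"
#   dmesg | count_eproto 6-1

# usb_enum_failed PORT: succeed if the hub gave up on the device on PORT
# ("unable to enumerate USB device on port PORT" at the end of a line).
usb_enum_failed() {
    grep -q "unable to enumerate USB device on port $1\$"
}

# count_eproto DEV: print how many "error -71" (EPROTO) lines mention usb DEV.
count_eproto() {
    grep -c "usb $1: .*error -71"
}
```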

Revision history for this message
Torsten Spindler (tspindler) wrote :
Revision history for this message
Torsten Spindler (tspindler) wrote :
Revision history for this message
Torsten Spindler (tspindler) wrote :
Revision history for this message
Torsten Spindler (tspindler) wrote :
Revision history for this message
Boris Devouge (bdevouge) wrote :

Torsten,

 Possibly related to:

https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/259028
 ( upstream https://bugzilla.gnome.org/show_bug.cgi?id=594085 )

It seems we might need the usb_modeswitch udev helper to run on resume from suspend.

Hope this helps,

Revision history for this message
Torsten Spindler (tspindler) wrote :

The problem is not always present. While I could reproduce it easily last week, the problem failed to manifest in a test run of 10 suspend/resume cycles today. I do not believe usb_modeswitch is the solution, as the sierra module takes care of this, at least according to upstream bug 594085.

Revision history for this message
Torsten Spindler (tspindler) wrote :

Removing the driver with modprobe -r sierra and then inserting it again does not cure the problem; the modem is still gone.

description: updated
affects: network-manager (Ubuntu) → linux (Ubuntu)
Revision history for this message
Torsten Spindler (tspindler) wrote :

After an additional suspend/resume cycle the device was back and working.

tags: added: kernel-suspend
tags: added: kj-triage
Revision history for this message
Torsten Spindler (tspindler) wrote :

A test of suspend/resume cycles gave the following results:
Laptop 1:
250 cycles, no problem
250 cycles, no problem
Laptop 2:
151 cycles, modem gone, after next suspend/resume modem was back
36 cycles, same as above
43 cycles, same as above
28 cycles, same as above
250 cycles, no problem

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

@Torsten-

I cannot reproduce the problem on the similar T61 that was sent to me. I've tried the pae-lvm1 kernel from your PPA, as well as the standard Lucid amd64 kernel. With each, I ran 750 cycles of the sus-res.sh test script without seeing the problem. Let's compare 'sudo lshw' output on my T61 with that on your two machines and see if we can identify any interesting hardware differences (mine attached).

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :
Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

@Torsten-

This thread references "device descriptor read/64, error -71":
http://ubuntuforums.org/archive/index.php/t-797789.html

From that thread: these settings have no apparent effect on my T61 (sus-res continues to work reliably), but you might try them on yours:
  echo -1 | sudo tee /sys/module/usbcore/parameters/autosuspend
and/or:
  echo Y | sudo tee /sys/module/usbcore/parameters/old_scheme_first
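Before trying these, it can help to record the current values so they can be put back after testing. Values written through sysfs this way last only until the next reboot. The following is a minimal sketch; `show_usbcore_params` is a hypothetical helper that takes the parameter directory as an argument (defaulting to the sysfs path above), so it can also be pointed at a copy for testing.

```shell
#!/bin/sh
# Hypothetical helper: print the current usbcore values touched above, so
# they can be restored after testing. The directory is an argument (default:
# the live sysfs path) so the function can also be pointed at a copy.
show_usbcore_params() {
    dir="${1:-/sys/module/usbcore/parameters}"
    for p in autosuspend old_scheme_first; do
        if [ -r "$dir/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$dir/$p")"
        else
            printf '%s=unavailable\n' "$p"
        fi
    done
}

# Example: show_usbcore_params > /tmp/usbcore-before.txt
```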

Revision history for this message
Torsten Spindler (tspindler) wrote :

lshw from the laptop where the problem occurred infrequently.

Revision history for this message
Torsten Spindler (tspindler) wrote :

lsusb -v output for the laptop where the problem occurred infrequently.

Revision history for this message
Torsten Spindler (tspindler) wrote :

A quick glance at the lshw output reveals that the motherboard product id differs:
Kamals: 6463Y3W
Torstens: 6463WFX

Revision history for this message
Torsten Spindler (tspindler) wrote :

The laptop with Uli that passed 500 suspend/resumes (comment #9) is also a 6463Y3W.

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

Attached is my lsusb -v output (lsusb-kamal-cannotreproduce).

The only interesting differences from yours are that (1) I have my Bluetooth enabled but you don't, and (2) you have a SanDisk "Cruzer Contour" USB stick plugged in. Torsten, can you verify that you can still reproduce the problem without that USB stick?

Revision history for this message
DSHR (s-heuer) wrote :

On a Thinkpad X61 I saw the following behaviour:

After returning from suspend, lsusb does not show the modem, so no modem is available in network-manager.

After switching wireless off with the slide switch on the front, the modem was shown in lsusb. After enabling networking, it is available in nm-applet.

Maybe the modem should be powered down more carefully when going to sleep? Its LED is the last one blinking when the machine goes into suspend mode ...

Some snippets from dmesg:

[34362.032119] PM: resume of drv:usb dev:usb6 complete after 247.943 msecs
[34362.034202] sd 2:0:0:0: [sda] Starting disk
[34362.268152] usb 6-1: reset full speed USB device using uhci_hcd and address 26
???
[34362.468225] sierra 6-1:1.0: no reset_resume for driver sierra?
[34362.468371] sierra ttyUSB0: Sierra USB modem converter now disconnected from ttyUSB0
[34362.468438] sierra ttyUSB1: Sierra USB modem converter now disconnected from ttyUSB1
[34362.468501] sierra ttyUSB2: Sierra USB modem converter now disconnected from ttyUSB2
[34362.468517] sierra 6-1:1.0: device disconnected
[34362.468522] PM: resume of drv:usb dev:6-1 complete after 364.866 msecs
[34362.468545] PM: resume of devices complete after 2028.007 msecs
[34362.468566] sierra 6-1:1.0: Sierra USB modem converter detected
[34362.469307] usb 6-1: Sierra USB modem converter now attached to ttyUSB0

modem not recognized:
[34106.740417] Registered led device: iwl-phy0::TX
[34106.753084] ADDRCONF(NETDEV_UP): wlan0: link is not ready
[34106.884031] usb 6-1: new full speed USB device using uhci_hcd and address 21
[34107.008397] usb 6-1: device descriptor read/64, error -71
[34107.232054] usb 6-1: device descriptor read/64, error -71
[34107.448094] usb 6-1: new full speed USB device using uhci_hcd and address 22
[34107.568083] usb 6-1: device descriptor read/64, error -71
[34107.792105] usb 6-1: device descriptor read/64, error -71
[34108.008072] usb 6-1: new full speed USB device using uhci_hcd and address 23
[34108.420268] usb 6-1: device not accepting address 23, error -71
[34108.532174] usb 6-1: new full speed USB device using uhci_hcd and address 24
[34108.948098] usb 6-1: device not accepting address 24, error -71
[34108.948136] hub 6-0:1.0: unable to enumerate USB device on port 1

Revision history for this message
Torsten Spindler (tspindler) wrote : Re: [Bug 614274] Re: modem-manager fails to recognize 3G modem after suspend/resume

@ Kamal: I can reproduce the problem without USB stick.

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

@DSHR - Thanks for that information.

For what it's worth, I cannot reproduce the problem on my T61, regardless of the position of the wireless slide switch.

Revision history for this message
Torsten Spindler (tspindler) wrote :

I did a test series with the usb option changes as in comment #12 by Kamal. Here are the results:

=========================================================================
'normal'
42 cycles, modem gone
=========================================================================
echo -1 | sudo tee /sys/module/usbcore/parameters/autosuspend
unknown number of cycles; suspend/resume stopped and the desktop was shown.
31 cycles, modem gone
=========================================================================
echo Y | sudo tee /sys/module/usbcore/parameters/old_scheme_first
250 cycles, modem around
=========================================================================

I will run another 250 cycles with the Y set for old_scheme_first and only report back if the modem goes away again. Otherwise I think we have found a solution for the problem.

Revision history for this message
Torsten Spindler (tspindler) wrote :

Unfortunately, the next test run with old_scheme_first set to Y stopped after 15 suspend/resumes.

Revision history for this message
Torsten Spindler (tspindler) wrote :

With both options set it failed to detect the modem after 33 cycles.

$ echo Y | sudo tee /sys/module/usbcore/parameters/old_scheme_first
$ echo -1 | sudo tee /sys/module/usbcore/parameters/autosuspend

Revision history for this message
Torsten Spindler (tspindler) wrote :

Following comment #18, I tested what happens when using rfkill to kill the non-existent modem. After the following sequence, the modem appeared again in network-manager:
$ sudo rfkill block 2
$ sudo rfkill unblock 2
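The index 2 is specific to this machine, and rfkill indices may change when a switch is re-registered. A sketch of looking up the ThinkPad WWAN switch by name instead, the same idea the 20_wwan script later in this thread implements with grep and cut; `wwan_rfkill_index` is a hypothetical helper that parses `rfkill list` output on stdin.

```shell
#!/bin/sh
# Hypothetical helper: read `rfkill list` output on stdin and print the index
# of the ThinkPad WWAN switch (tpacpi_wwan_sw), or nothing if absent.
# Usage:
#   idx=$(rfkill list | wwan_rfkill_index)
#   [ -n "$idx" ] && sudo rfkill block "$idx" && sudo rfkill unblock "$idx"
wwan_rfkill_index() {
    # Lines look like "2: tpacpi_wwan_sw: Wireless WAN"; split on ":".
    awk -F: '$2 ~ /^ *tpacpi_wwan_sw$/ { print $1; exit }'
}
```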

I will check now if killing the modem upon resume will always lead to a working modem.

Revision history for this message
Torsten Spindler (tspindler) wrote :

Unfortunately the results are mixed when killing the modem upon suspend/resume:

150 suspend/resumes, modem there
200 suspend/resumes, modem gone
86 suspend/resumes, modem gone

Here's the modified script /etc/pm/power.d/20_wwan:

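#!/bin/sh

# Re-enable the WWAN radio if thinkpad_acpi reports it as disabled.
if [ -r /proc/acpi/ibm/wan ]; then
 WWANSTATE=$(awk '/status:/ {print $2}' /proc/acpi/ibm/wan)
 if [ "$WWANSTATE" = "disabled" ]; then
  echo enable > /proc/acpi/ibm/wan
 fi
fi

# Cycle the WWAN rfkill switch (looked up by name) so the modem reappears.
WWANINDEX=$(rfkill list | grep tpacpi_wwan_sw | cut -f1 -d:)
if [ -n "$WWANINDEX" ]; then
 rfkill block "$WWANINDEX"
 sleep 1
 rfkill unblock "$WWANINDEX"
fi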

Revision history for this message
Torsten Spindler (tspindler) wrote :

With the rfkill script running on suspend/resume, once the modem is gone I also need to restart network-manager to bring it back.

For example, when the modem disappeared:
1) sudo /etc/pm/power.d/20_wwan
2) sudo restart network-manager
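The two manual steps above can be wrapped so they only run when the modem is actually missing. A minimal sketch, assuming a Sierra Wireless modem (USB vendor ID 1199), the 20_wwan script at /etc/pm/power.d/20_wwan as in the previous comment, and Lucid's Upstart `restart` command. `sierra_present`, `recover_modem` and the LSUSB/RUN variables are hypothetical. Run it as root; set RUN=echo for a dry run.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual recovery steps: only acts when no
# Sierra Wireless device (USB vendor ID 1199, an assumption) is on the bus.
# Run as root. RUN=echo prints the commands instead of running them.
LSUSB="${LSUSB:-lsusb}"
RUN="${RUN:-}"

sierra_present() {
    "$LSUSB" | grep -qi ' ID 1199:'
}

recover_modem() {
    if sierra_present; then
        return 0                       # modem still there, nothing to do
    fi
    $RUN /etc/pm/power.d/20_wwan       # step 1: cycle the WWAN rfkill switch
    $RUN restart network-manager       # step 2: restart the Upstart job (Lucid)
}
```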

Revision history for this message
Torsten Spindler (tspindler) wrote :

When I reproduced the problem again with the above 20_wwan script, I could get the modem back by restarting network manager. I'm testing a modified 20_wwan script now, where the rfkill is only sent on suspend.

Revision history for this message
Torsten Spindler (tspindler) wrote :

After moving the rfkill block into a 'case "$1" in resume)' branch, the suspend/resume script failed after 20, 25 and 27 cycles.

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

@Torsten & DSHR-

I've constructed a kernel for you to try. This is the latest Lucid kernel, but with the Sierra driver from upstream 2.6.36-rc3. Let's see if this has any effect on the suspend/resume behavior. (I realize that I forgot to build the -pae kernel, but I don't think that will matter for this test):
http://kernel.ubuntu.com/~kamal/lp614274-sierra/

Revision history for this message
Torsten Spindler (tspindler) wrote :

The attached 20_wwan script placed in /etc/pm/sleep.d/ does seem to make the error less likely. I had a run of 102 suspend/resume cycles (the timer became unwritable), a run of 74 cycles (gdm was shown instead of the desktop) and a run of 150 cycles (the modem disappeared and could not be brought back with rfkill). From my point of view, disabling the device with rfkill before suspend and unblocking it during resume seems to improve the situation, although it does not solve it completely.

I will now test the new kernel provided by Kamal.
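A sketch of the split described above (block before suspend, unblock on resume), written as a pm-utils sleep hook. pm-utils calls executable hooks in /etc/pm/sleep.d/ with suspend or hibernate before sleeping and with resume or thaw afterwards. This is not the attached 20_wwan script. The RFKILL and RESUME_DELAY variables are hypothetical knobs so the hook can be dry-run, and the one-second delay before unblocking is an assumption.

```shell
#!/bin/sh
# Hypothetical /etc/pm/sleep.d/20_wwan-style hook (not the attached script).
# Blocks the ThinkPad WWAN rfkill switch before sleeping and unblocks it,
# after a short delay, on the way back up.
RFKILL="${RFKILL:-rfkill}"           # overridable for dry runs
RESUME_DELAY="${RESUME_DELAY:-1}"    # assumed settle time, in seconds

# Print the rfkill index of tpacpi_wwan_sw, or nothing if it is absent.
wwan_index() {
    "$RFKILL" list | awk -F: '$2 ~ /^ *tpacpi_wwan_sw$/ { print $1; exit }'
}

wwan_hook() {
    idx=$(wwan_index)
    [ -n "$idx" ] || return 0        # no WWAN switch: nothing to do
    case "$1" in
        suspend|hibernate)
            "$RFKILL" block "$idx"
            ;;
        resume|thaw)
            sleep "$RESUME_DELAY"
            "$RFKILL" unblock "$idx"
            ;;
    esac
}

# pm-utils passes the transition (suspend, hibernate, resume, thaw) as $1.
if [ $# -gt 0 ]; then
    wwan_hook "$1"
fi
```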

Revision history for this message
Torsten Spindler (tspindler) wrote :

Good news: with your kernel the sierra 3g modem was present during 250
suspend/resume cycles. I will now run another 250 to validate this
finding.

Revision history for this message
Torsten Spindler (tspindler) wrote :

A second run of 250 suspend/resume cycles with the test kernel was also successful!

Revision history for this message
Torsten Spindler (tspindler) wrote :

A third run of the 64bit kernel was also a success. I believe the problem is fixed with this updated sierra module. I'm waiting for comments from LVM on the robustness of the 32bit kernel.

Revision history for this message
tserries (t-serries) wrote :

Bad news from LVM: the device was lost after 196 cycles.

But I don't know how reliable my tests are.
1) Using the 32-bit kernel from Kamal, the modem was not switched on (the LED in the panel was off) if a LAN cable was connected.
2) After the suspend/resume cycles I was no longer able to use the LAN (neither with the kernel from Kamal nor with the default kernel). In the logs I found a message "ADDRCONF (NETDEV_UP) eth0: ..."; I don't know if this is related to my issue. PXE boot still works, so I can exclude a hardware defect.

I will reinstall my system and rerun the tests using the i386 and amd64 kernels. As it's Friday, don't expect results before Monday.

Revision history for this message
Torsten Spindler (tspindler) wrote :

I briefly tested the 32bit kernel and I had a 100% failure rate after 4 suspends - the 3G device was gone after each single suspend. The device could be brought back with the (sudo rfkill block 2; sudo rfkill unblock 2) commands.

@ Kamal: did you install a 32bit or 64bit system on the T61? It seems to make quite a difference when it comes to suspend/resume and the 3G modem.

Changed in linux (Ubuntu):
assignee: nobody → Kamal Mostafa (kamalmostafa)
status: New → In Progress
Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

@Torsten - I have been testing with a 32-bit system installed on the T61 -- primarily the 32-bit pae-lvm1 kernel from your PPA -- but now switched to my 32-bit test kernel from comment #29, non-pae. FWIW, the problem has still not occurred on my machine using any kernel in many repeated sus-res.sh 250 cycle runs.

@tserries - Thanks, we'll look forward to your results with the 64-bit. I haven't tried the wired LAN at all yet, but I will have a look.

The wireless LAN appears to work fine over any number of sus-res cycles on my unit ... how does the wireless LAN behave on yours?

Revision history for this message
Torsten Spindler (tspindler) wrote :

The wireless LAN driver is blacklisted on the production machines; there is no WLAN access on the T61.

Revision history for this message
Torsten Spindler (tspindler) wrote :

Today I tested the 32-bit kernel from Kamal together with the 20_wwan script. Strangely, all automatic suspend/resume runs were stopped by an unwritable timer. However, the modem never went away, and I had these numbers of successful suspend/resume cycles: 69, 78, 69, 96.

Revision history for this message
Torsten Spindler (tspindler) wrote :

Today's testing was also only interrupted by unwritable timer problems. Overall I did series of
123, 77, 196, 33
suspend/resume cycles with Kamal's new kernel and the 20_wwan script. I believe the combination of kernel and script, or the script alone, is worth testing more broadly now.

Revision history for this message
tserries (t-serries) wrote :

All in all, there seems to be an improvement. I tested the following configurations:

i386 kernel from Kamal (unmodified 20_wwan script): 252 suspend/resume cycles, then the device was gone. For the first 230 cycles the device ID increased by one per cycle (from 2 up to 231); during the subsequent cycles the device ID did not change any more.

amd64 kernel from Kamal (unmodified 20_wwan script): 497 suspend/resume cycles with an unchanged device ID. I accidentally interrupted the run before completing 500 cycles; the device was still there after 497 cycles.

Default Ubuntu kernel (Linux c383019 2.6.32-24-generic-pae #42-Ubuntu SMP Fri Aug 20 15:37:22 UTC 2010 i686 GNU/Linux) with the modified 20_wwan script: completed 500 suspend/resume cycles; the device ID increased by 1 per cycle. The device is still present.

I will need to rerun the test with our modified kernel to ensure the result also shows up in production.
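To track the device ID observation above across cycles, the USB device number can be pulled from lsusb output after each resume. A minimal sketch; `sierra_devnum` is a hypothetical helper and assumes the modem reports USB vendor ID 1199 (Sierra Wireless).

```shell
#!/bin/sh
# Hypothetical helper: read `lsusb` output on stdin and print the USB device
# number of the first Sierra Wireless device (vendor ID 1199, an assumption),
# without leading zeros. lsusb lines look like:
#   Bus 006 Device 027: ID 1199:xxxx Sierra Wireless, Inc.
# Usage after each resume:  lsusb | sierra_devnum >> /tmp/sierra-devnum.log
sierra_devnum() {
    awk '$6 ~ /^1199:/ { sub(":", "", $4); print $4 + 0; exit }'
}
```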

Revision history for this message
tserries (t-serries) wrote :

I just completed 500 successful suspend/resume cycles with our production kernel (Linux c383019 2.6.32-24-generic-pae #39+lvm1-Ubuntu SMP Wed Jul 14 02:37:43 UTC 2010 i686 GNU/Linux) and the modified 20_wwan script without losing the device.

To verify the positive effect of the modified 20_wwan script I'll run a regression test with our prod. kernel without the modified 20_wwan script.

Revision history for this message
tserries (t-serries) wrote :

I don't know if it's good or bad news: after 405 cycles with our prod. kernel (without 20_wwan script) the session crashed:

Sep 9 13:16:02 c383019 modem-manager: (ttyUSB1): probe requested by plugin 'Sierra'
Sep 9 13:16:03 c383019 kernel: [14325.621126] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
Sep 9 13:16:03 c383019 kernel: [14325.621145] render error detected, EIR: 0x00000000
Sep 9 13:16:03 c383019 kernel: [14325.621424] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -5 (awaiting 264788 at 264787)
Sep 9 13:16:03 c383019 kernel: [14325.853036] usb 3-1: new full speed USB device using uhci_hcd and address 29
[...]
Sep 9 13:16:04 c383019 pulseaudio[28668]: main.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-F59LPsOm7D: Connection refused
Sep 9 13:16:04 c383019 kernel: [14326.999474] gdm-session-wor[1433]: segfault at c063ec3c ip b732e8bb sp bfefe9ec error 5 in libc-2.11.1.so[b72c3000+153000]

Revision history for this message
Torsten Spindler (tspindler) wrote :

I had intermittent crashes as well, but I'm not overly concerned by
them. They are very infrequent, maybe once per thousand resumes, or even
less.

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

A few ideas...

1. Let's compare BIOS versions, and let's all upgrade to the latest.

"sudo dmidecode -s bios-version" on my T61 showed BIOS version "2.10".

I've just updated mine to BIOS version "2.27" a.k.a. "7LETC7WW (2.27-1.08)" using the .iso image here:
 http://www-307.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-67989

The complete list of changes does include references to all sorts of fixes that might be relevant here, both to the intermittent modem problem and to the crashes you're experiencing: "memory may get garbled during ... resume", "Unexpected interrupts from the USB controller", etc. (FWIW, I'm not experiencing these crashes.) Complete BIOS change list:
 http://www-307.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-68451

2. What happens if you insert a "sleep 1" right above the "rfkill unblock" line in the modified 20_wwan script from comment #30?

3. I noticed this BIOS option: Config->USB-->"Always on USB" which leaves USB ports powered during low power states (it is disabled on my machine). It might be interesting to try enabling it on one of the machines that exhibits the problem (and revert to the kernel and setup where the problem was happening frequently).

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

Side note: I can't make the 'e1000e' wired LAN connection work at all on my T61 -- even on first boot (no suspend/resume involved) dmesg shows "eth1: link is not ready" and ifconfig shows UP but no packets. None of the following makes the LAN work: disabling the rfkill block all, rmmod+modprobe e1000e. I don't think the LAN problem is related to this bug (the USB modem issue).

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

Strike the words "disabling the" from my comment #45. I mean simply that "rfkill block all" didn't help the LAN.

Revision history for this message
Hans-Gerd van Schelve (van-schelve) wrote :

Gentlemen,

when you are talking about differences in BIOS versions, let me add one thought:

All of our systems that currently have these problems (and, of course, many other modem-based ones) ran a different Linux OS in the *past*. We did not see such a problem with modem suspend/resume there. So is there really a chance that it depends on the BIOS version?

Revision history for this message
Hans-Gerd van Schelve (van-schelve) wrote :

One additional question:

what are the suspend/resume scripts currently used for the full test case? I think that some of you have locally modified versions.

Revision history for this message
Torsten Spindler (tspindler) wrote :

Here's the sus-res.sh script used for the suspend/resume cycles.
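The attached sus-res.sh is not reproduced here. For readers without it, the following is a hypothetical stand-in, not the script used in these tests: it suspends to RAM with util-linux rtcwake, waits for USB to settle, and stops at the first cycle where lsusb lists no Sierra Wireless device (vendor ID 1199). The 30 s wake timer, the 10 s settle time and the overridable SUSPEND/LSUSB/SETTLE variables are assumptions. It needs root.

```shell
#!/bin/sh
# Hypothetical suspend/resume stress loop (a stand-in, not the attached
# sus-res.sh). Run as root. SUSPEND, LSUSB and SETTLE are overridable.
SUSPEND="${SUSPEND:-rtcwake -m mem -s 30}"   # suspend to RAM, wake after 30 s
LSUSB="${LSUSB:-lsusb}"

# cycle_until_gone MAX: print "gone N" at the first cycle N after which no
# Sierra Wireless device (vendor ID 1199) is listed, or "ok MAX" otherwise.
cycle_until_gone() {
    max="$1"
    i=1
    while [ "$i" -le "$max" ]; do
        $SUSPEND || return 1              # word-split on purpose: command + args
        sleep "${SETTLE:-10}"             # give USB time to re-enumerate
        if ! "$LSUSB" | grep -qi ' ID 1199:'; then
            echo "gone $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "ok $max"
}

# Example: cycle_until_gone 250
```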

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

@Hans-Gerd, I do think that it's possible that the BIOS version could affect the problem here... The intermittent nature of the problem, the theory that inserting a delay in the resume sequence helps, and the fact that the newer T61s don't seem to ever exhibit the problem make me think that the root cause is likely a race condition that Lenovo may have since corrected. It could be that the problem didn't manifest on your previous Linux systems just by luck -- that the kernel's suspend/resume timing just didn't happen to hit the race condition. Anyway, it's one more variable that we can easily test, so I think it's worth testing.

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

I've been using the same sus-res.sh -- the script Torsten just posted in comment #49.

Revision history for this message
Kamal Mostafa (kamalmostafa) wrote :

I have constructed a new test kernel PPA:
https://launchpad.net/~kamalmostafa/+archive/linux-kamal-sierra

This is the latest Lucid kernel with the addition of:
  - the Sierra driver backported from upstream 2.6.36-rc3
  - the patch from Torsten's "lvm1" PPA kernel: [Chris Wilson] "drm/i915: Unset cursor if out-of-bounds upon mode change (v2)"

The intent is to provide a kernel which matches the LVM production kernel, plus the new Sierra driver.

Revision history for this message
Removed by request (stoni.ch-deactivatedaccount) wrote :

Hello all, please note that the problem still exists in natty / modemmanager 0.5-0ubuntu1, although since the latest update it looks as if the modem is recognized automatically some minutes after resume. So a workaround is still required: have a script or resume rule kill modem-manager; it will recover by itself, recognize the modem, ask for the PIN and make the modem available in network manager.
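The workaround described in the comment above could be written as a pm-utils sleep hook. A minimal sketch, assuming the process is named modem-manager (as in Lucid/Natty) and relying, as the comment reports, on it coming back by itself. The hook and the PKILL variable (added for dry runs) are hypothetical.

```shell
#!/bin/sh
# Hypothetical /etc/pm/sleep.d hook for the modem-manager workaround:
# after resume, stop modem-manager so that it restarts and re-probes the
# modem. PKILL is overridable for dry runs.
PKILL="${PKILL:-pkill}"

restart_modem_manager() {
    case "$1" in
        resume|thaw)
            # -x: match the process name exactly
            "$PKILL" -x modem-manager || true
            ;;
    esac
}

# pm-utils passes the transition (suspend, hibernate, resume, thaw) as $1.
if [ $# -gt 0 ]; then
    restart_modem_manager "$1"
fi
```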

Changed in linux (Ubuntu):
status: In Progress → Fix Released
assignee: Kamal Mostafa (kamalmostafa) → nobody