After resume, NM tries to connect to previous, now non-available Wi-Fi network -- happens with wl and not with ath9k

Bug #452571 reported by Tony Espy
This bug affects 6 people
Affects          Status     Importance  Assigned to  Milestone
NetworkManager   Unknown    Medium
linux (Ubuntu)   Won't Fix  Undecided   Unassigned

Bug Description

Binary package hint: network-manager

On Ubuntu Karmic Beta + the latest updates as of this morning (EDT), I run into the following issue, which is very similar to bug #379201.

If I'm connected to my home AP, then suspend my machine and wake it up somewhere else, NM tries to re-connect to my home network, even though it's not available in scans. It eventually times out, and I get prompted for a key / passphrase again.

Note, this happens on my new Macbook, which happens to use the Broadcom STA ( 'wl' ) driver. I've tried reproducing on my Macbook Pro which has an ath9k card and haven't been able to reproduce it yet.

ProblemType: Bug
Architecture: i386
CRDA: Error: [Errno 2] No such file or directory
Date: Thu Oct 15 17:46:04 2009
DistroRelease: Ubuntu 9.10
IfupdownConfig:
 auto lo
 iface lo inet loopback
IpRoute:
 192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.106 metric 1
 169.254.0.0/16 dev eth0 scope link metric 1000
 default via 192.168.1.1 dev eth0 proto static
NonfreeKernelModules: nvidia wl
Package: network-manager 0.8~a~git.20091013t193206.679d548-0ubuntu1
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-14.47-generic
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: network-manager
Uname: Linux 2.6.31-14-generic i686

Tony Espy (awe) wrote :
Tony Espy (awe) wrote :

Also, I tried killing NM and the applet, and reproducing it using 'iwlist scan'.

In my loft, I was able to run 'iwlist scan eth1' and see my home AP. I suspended, and went down to the lobby in my building, where I resumed, and ran 'iwlist scan eth1' again... this time I didn't see my home AP.

The only thing I can think of is that I may not be quick enough to see the scan results on resume due to the fact that I have 'check pw' enabled on my screensaver. More work is required to pin this down to the 'wl' driver.

Changed in network-manager (Ubuntu):
status: New → Confirmed
Tony Espy (awe) wrote :

So I tried my experiment again, this time with a simple shell script running in root shell across a suspend/resume:

# Print a timestamp, then check whether the home AP ("hyperion") shows up in a scan.
while true; do
  date
  iwlist eth1 scan | grep hyperion
done

You can see where the system goes to suspend, and after resume, the original AP is no longer found.

More digging needs to be done to pin this one down...
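
A slightly more structured version of the polling loop above, as a sketch only. It assumes that iwlist prints each network name as ESSID:"name"; the function names ap_in_scan and watch_ap and the 5-second default interval are illustrative choices, not part of the original test.

```shell
# Succeed if the given ESSID appears in `iwlist <iface> scan` output read from stdin.
ap_in_scan() {
    # -F matches the ESSID:"name" string literally, -q suppresses output.
    grep -qF "ESSID:\"$1\""
}

# Scan repeatedly and print a timestamped present/absent line for one ESSID.
# Arguments: interface, ESSID, optional interval in seconds (default 5).
watch_ap() {
    iface="$1"
    essid="$2"
    interval="${3:-5}"
    while true; do
        if iwlist "$iface" scan 2>/dev/null | ap_in_scan "$essid"; then
            state=present
        else
            state=absent
        fi
        printf '%s %s %s\n' "$(date '+%H:%M:%S')" "$essid" "$state"
        sleep "$interval"
    done
}
```

Run as root, `watch_ap eth1 hyperion` left going across a suspend/resume should show the point at which the AP stops appearing in scan results.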

Tony Espy (awe) wrote :

OK, I think the issue is that the wl driver doesn't explicitly disconnect on suspend / resume, whereas the ath9k driver does.

I say this, because on my Macbook Pro, I see an explicit disconnected notification bubble, whereas on the Macbook, the applet icon just starts spinning.

So, NM just goes ahead and uses its cached scan results...

Alexander Sack (asac) wrote :

Isn't this still the driver bug that we had everywhere in the past, e.g. something related to the driver using jiffies rather than real time to calculate whether a scan result is old?

Tony Espy (awe) wrote :

No, check out the attachment in comment #3. It clearly shows that the driver doesn't report the AP in the scan results after resume.

Tony Espy (awe)
summary: - After Resume, NM tries to connect to previous, non-available Wi-Fi
- network
+ [Broadcom wl]: After Resume, NM tries to connect to previous, non-
+ available Wi-Fi network
Tony Espy (awe) wrote : Re: [Broadcom wl]: After Resume, NM tries to connect to previous, non-available Wi-Fi network

OK, so I ran another side-by-side comparison this morning on two machines: a MacBook running the BCM wl driver and a MacBook Pro running ath9k. Both machines are on Karmic RC + latest updates.

I bumped the verbosity of wpa_supplicant logging ( added -d to the args in the .service file ) and wiped daemon.log on both. I then did the following on both:

1. Associated with my home AP ( WEP )
2. Suspended both machines around 9:30 AM
3. Resumed both machines at around 10:15 AM at a local coffee shop

As expected, the ath9k machine displayed a Disconnected notification dialog, whereas the wl machine started spinning the applet icon, and eventually popped up a key / password dialog for my home AP.

I went through the logs side by side and they look almost identical, except for the fact that the 'wl' driver never seems to generate a disconnect wireless event on sleep. This is what I expected.

As NM does have logic to deactivate and clear the currently associated AP on sleep, I think the 'wl' driver behavior is exposing a bug in NM.

I'll attach sanitized versions of both logs.

Tony Espy (awe) wrote :
Tony Espy (awe) wrote :
Nilbus (nilbus) wrote :

This is similar to bug #264683, and the same workaround applies.

create the file /etc/pm/config.d/suspend_wl with the contents:
SUSPEND_MODULES=wl

This works around the problem, though suspending or removing and then re-inserting the wl module causes my kernel to panic when I try to connect to my campus' unsecured wireless network. Works fine on any secure networks. I will report this bug separately.
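
The workaround file described above could be created with something like the following sketch. SUSPEND_MODULES is the pm-utils setting for modules to unload before suspend and reload on resume; the helper name write_suspend_wl and its optional directory argument are assumptions added here so the script can be tried without touching /etc.

```shell
# Write the pm-utils override that lists wl in SUSPEND_MODULES.
# Argument: target directory (defaults to /etc/pm/config.d, which needs root).
write_suspend_wl() {
    dir="${1:-/etc/pm/config.d}"
    mkdir -p "$dir" || return 1
    printf 'SUSPEND_MODULES=wl\n' > "$dir/suspend_wl"
}
```

Calling `write_suspend_wl` as root writes the real file; passing a scratch directory first is a cheap way to check the result.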

Is anyone else experiencing kernel panics after suspend/resume when reloading the wl module? This can be tested by:
1. connect to a secured network
2. suspend, and resume in a different location with an unsecured network
3. sudo rmmod wl && sudo modprobe wl - note that the old access points are no longer visible
4. connect to an unsecure network. My kernel panics within 30 seconds.

Runar Ingebrigtsen (ringe) wrote :

I have the same issue with my BCM4312 card. After applying the workaround I got the disconnected message, but the first attempt to use the network failed as I couldn't resolve any address. I picked the relevant wireless network in nm-applet and everything was fine, as it was after doing the same without the workaround. I have yet to test going from a secure to an unsecure network but will get back with an update on that tomorrow. If I can reproduce the non-resolving issue again I will attach some logs here.

Tony Espy (awe) wrote :

I'm at UDS right now, so I can't really give too much attention to this bug until I'm back home next week, but I will try and work with upstream to get the network-manager portion(s) of the bug fixed.

Paul V. Stodghill (paul-stodghill) wrote :

Please note that because of bug #452571 and bug #461990, _neither_ the wired nor the wireless network adapter works reliably on an HP Mini 110 under Ubuntu 9.10.

Nilbus (nilbus) wrote :

After installing the newest wl driver released by Broadcom on 2/9/2010, everything works flawlessly.

Steps:
Download the driver from http://www.broadcom.com/support/802.11/linux_sta.php
Install it using the instructions in README.txt.
Create /etc/pm/config.d/suspend_wl with the contents: SUSPEND_MODULES=wl

To verify that the new kernel driver was in use, I compared the contents of /sys/module/wl/srcversion before I started and after rebooting, and made sure they differed.
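
A before/after comparison of srcversion could be scripted roughly as below. This is a sketch: the helper name srcversion_changed and the snapshot location are assumptions, and the snapshot should live somewhere that survives a reboot (not /tmp).

```shell
# Succeed (exit 0) if the module's current srcversion differs from a saved snapshot.
# Arguments: current srcversion file, snapshot file taken before the upgrade.
srcversion_changed() {
    current_file="${1:-/sys/module/wl/srcversion}"
    saved_file="$2"
    # Refuse to guess if either file is missing or unreadable.
    [ -r "$current_file" ] && [ -r "$saved_file" ] || return 2
    # cmp -s exits 0 when the files are identical, so invert it for "changed".
    ! cmp -s "$current_file" "$saved_file"
}
```

Before installing the new driver, save a snapshot with `cat /sys/module/wl/srcversion > "$HOME/wl.srcversion.before"`. After rebooting, `srcversion_changed /sys/module/wl/srcversion "$HOME/wl.srcversion.before" && echo "new wl module loaded"` reports whether the loaded module changed.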

To resolve this issue:
- The module must be reloaded on suspend/hibernate. Right now it is not, which is what causes the old APs to remain. Given the lack of other modules that use this method, there's probably a better way. Anyone know what that would be?
- The wl driver that ships with ubuntu should be upgraded to the latest version.

Tony Espy (awe) wrote :

Re-loading the module is a hack to work around the bug, which is caused by the fact that the STA driver doesn't generate an explicit disconnect iw event ( see comment #7 ). Network Manager caches the APs, not the driver. Since NM doesn't see a disconnect, it never clears its cache.

Jeff Fortin Tam (kiddo) wrote :

Take a look at https://bugzilla.gnome.org/show_bug.cgi?id=607457 for a thorough explanation of the problem (and that "broadcom must fix their drivers"). So network-manager says it's not a bug on their side. Broadcom's contact form basically autoreplies that they don't care about end-users, and that we should bug our distributors and manufacturers of wireless devices... so if anything, Canonical could bug Dell about this too.

Paul V. Stodghill (paul-stodghill) wrote :

I compiled 2.6.33.2 using these instructions,

http://vanilja.org/kernel/

I then blacklisted the "wl" driver and configured the "b43" driver using the instructions from here,

http://linuxwireless.org/en/users/Drivers/b43#fw-b43-lp

In addition to giving me a working ethernet interface (finally!), my wireless interface now seems to reconnect correctly after resuming. I need to do some more tests using a different network at work, but I am hopeful that using the "b43" driver will work around the problem.
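
For reference, blacklisting a module is normally done with a one-line file under /etc/modprobe.d. The sketch below assumes the file name blacklist-wl.conf (modprobe reads any file there ending in .conf) and uses a hypothetical helper, blacklist_module. It does not cover the b43 firmware setup from the linked instructions.

```shell
# Write a modprobe.d entry that stops a module from being auto-loaded.
# Arguments: module name, target directory (defaults to /etc/modprobe.d, needs root).
blacklist_module() {
    module="$1"
    dir="${2:-/etc/modprobe.d}"
    [ -n "$module" ] || return 1
    mkdir -p "$dir" || return 1
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}
```

Running `blacklist_module wl` as root and rebooting (or unloading wl by hand) should leave the device free for b43 to claim.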

Paul V. Stodghill (paul-stodghill) wrote :

I forgot to mention: the reason for compiling a custom kernel is that you need to enable CONFIG_B43_FORCE_PIO in order for the "b43" driver to load correctly under 2.6.33.2 on the HP 110-1030NR.

Nilbus (nilbus) wrote :

Paul, is your device (run lspci) similar to this one, namely a/b/g/n (has support for n)?

0c:00.0 Network controller: Broadcom Corporation BCM4328 802.11a/b/g/n (rev 03)

Paul V. Stodghill (paul-stodghill) wrote :

No. Just b/g.

snapdragon$ lspci | fgrep Broadcom
01:00.0 Network controller: Broadcom Corporation BCM4312 802.11b/g (rev 01)
snapdragon$ lspci -n | fgrep 01:00.0
01:00.0 0280: 14e4:4315 (rev 01)
snapdragon$

Nilbus (nilbus) wrote : Re: [Bug 452571] Re: [Broadcom wl]: After Resume, NM tries to connect to previous, non-available Wi-Fi network

okay, that's good to know. The N series has never worked with b43 as far as
I've read, though I've never tried your kernel tweaks.

Paul V. Stodghill (paul-stodghill) wrote : Re: [Broadcom wl]: After Resume, NM tries to connect to previous, non-available Wi-Fi network

When I resumed my netbook at work this morning, not only did it correctly recognize that it could no longer connect to my home network (with security), but the NetworkManager was able to successfully connect to my work network (without security).

So, until Broadcom fixes its driver, the "b43" driver with a newer (2.6.33+) kernel appears to be a working alternative.

Tony Espy (awe)
description: updated
Changed in network-manager:
importance: Unknown → Medium
Thomas Hood (jdthood)
summary: - [Broadcom wl]: After Resume, NM tries to connect to previous, non-
- available Wi-Fi network
+ After resume, NM tries to connect to previous, now non-available Wi-Fi
+ network -- happens with wl and not with ath9k
Thomas Hood (jdthood) wrote :

Has this been fixed in Ubuntu 12.04?

affects: network-manager (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Po-Hsu Lin (cypressyew) wrote :

Closing this bug with Won't fix as this kernel / release is no longer supported.
Please feel free to open a new bug report if you're still experiencing this on a newer release (Bionic 18.04.3 / Disco 19.04)
Thanks!

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix