background scanning causes drivers to disassociate - WiFi roaming causes NetworkManager to lose routing

Bug #201202 reported by Benson Margulies on 2008-03-11
This bug affects 1 person
Affects: bcm4400-source (Ubuntu); Importance: Undecided; Assigned to: Unassigned
Affects: network-manager (Ubuntu); Importance: Undecided; Assigned to: Unassigned

Bug Description

Binary package hint: network-manager

Gutsy, current updates as of this date.

network-manager 0.6.5-0ubuntu16.7.10.0 0

I'm in a hotel in Sweden with Telia HomeRun wireless.

If I use a session with the NetworkManager, I lose connectivity each time that something decides that I have roamed from one bssid to another.

Syslog shows:

Mar 11 16:04:48 bim-1330 NetworkManager: <debug> [1205265888.124081] nm_device_802_11_wireless_update_bssid(): Roamed from BSSID 00:19:A9:B5:9B:40 to 00:19:A9:B5:5D:B0 on wireless network 'homerun'
Mar 11 16:04:48 bim-1330 NetworkManager: <debug> [1205265888.125820] nm_dbus_signal_filter(): NetworkManagerInfo triggered update of wireless network 'homerun'
Mar 11 16:04:48 bim-1330 kernel: [ 653.632000] SoftMAC: Open Authentication completed with 00:19:a9:b5:9c:80
Mar 11 16:04:49 bim-1330 kernel: [ 654.496000] SoftMAC: Open Authentication completed with 00:19:a9:b5:9b:40
Mar 11 16:04:50 bim-1330 NetworkManager: <debug> [1205265890.124297] nm_device_802_11_wireless_update_bssid(): Roamed from BSSID 00:19:A9:B5:5D:B0 to 00:19:A9:B5:9C:80 on wireless network 'homerun'
Mar 11 16:04:50 bim-1330 NetworkManager: <debug> [1205265890.126017] nm_dbus_signal_filter(): NetworkManagerInfo triggered update of wireless network 'homerun'

and from then on, I cannot ping anything unless I force the interface to go back down and up again. Sometimes, I can't connect at all.

As an experiment, I logged out, logged in again on a KDE session. I've been working for an hour. Previously, I never got past a few minutes.

Alexander Sack (asac) wrote :

Can you please test the rc2 of 0.6.6, which was in the hardy alpha 6 release, or try again once the beta is out? (You can test either with the livecd or by upgrading your system; you decide.)

Please also attach your complete /var/log/syslog (capturing some roaming incidents), tell me at about what time I can see the roaming issue in it, and provide information on the chipset/driver you are using for wireless.

Thanks,
  - Alexander

Changed in network-manager:
status: New → Incomplete
Benson Margulies (bimargulies) wrote :

1) I did include a roaming incident above. I'll attach some more syslog.

2) Sadly, I am no longer in Stockholm, and so I can no longer produce this problem on command. I strongly suspect that it's related to the inability to connect to the public wireless at Dulles International Airport, where I'll be again in a few weeks.

Benson Margulies (bimargulies) wrote :
Jae Stutzman (jaebird) wrote :

This is an interesting question. I have been to many hotels; they use either some kind of WDS or mesh mode, all using the same SSID. I've heard that XP and Vista seamlessly roam between them, picking the AP with the strongest signal. How does NetworkManager work? It doesn't appear to do this very well.

Dana Goyette (danagoyette) wrote :

At my college (Cal Poly, SLO), all the wireless access points use the same SSID, but each is on a different subnet. When I roam from one access point to another, my connection mysteriously breaks without notice... and thinks it's still up. It seems NetworkManager can't handle different access points with same SSID and different subnets. Another interesting note: these access points do not use NAT -- instead, it gives true internet IPs.

Here's one possible fix: on roaming to a different SSID, NetworkManager should try to ping the old default route / gateway..... and if it's not reachable, it should re-run dhclient. (Of course, you'd have to make allowances for gateways set not to respond to pings.)
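The check proposed above could be sketched roughly as follows. This is only an illustration of the idea, not anything NetworkManager does: the function names `gateway_reachable` and `handle_roam` are hypothetical, the injectable `run` parameter exists only so the logic can be exercised without touching the network, and it assumes the Linux `ping` (`-c`, `-W`) and `dhclient` (`-r`) command-line tools. A gateway that ignores pings would trigger a needless lease renewal.

```python
import subprocess


def gateway_reachable(gateway_ip, run=subprocess.run):
    """Return True if a single ping to gateway_ip gets a reply within 2 s."""
    result = run(
        ["ping", "-c", "1", "-W", "2", gateway_ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def handle_roam(interface, old_gateway_ip, run=subprocess.run):
    """After a roam, keep the lease if the old gateway answers, else renew it.

    Hypothetical helper: returns "kept" or "renewed" so callers can log it.
    """
    if gateway_reachable(old_gateway_ip, run=run):
        return "kept"
    # Old gateway is gone (or silent): release the lease and request a new one.
    run(["dhclient", "-r", interface], check=False)
    run(["dhclient", interface], check=False)
    return "renewed"
```

Calling `handle_roam("eth1", "192.168.1.1")` would need root privileges for the `dhclient` steps; the interface name and address here are placeholders.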

Dana Goyette (danagoyette) wrote :

Two corrections:
...instead, they give true internet IPs.
... upon roaming to a different access point with the _same_ SSID, ...

driver having issues with multiple bssid's for one ssid.

Dana Goyette (danagoyette) wrote :

This is not just bcm-anything; it happens on my Intel 3945ABG (iwl3945) card.

It's very frustrating to do lots of work to isolate and describe a problem,
only to have some anonymous expert throw out the work with no explanation.
There was no reason to think that this was ever specific to a chipset, and
now Dana delivers the evidence.

I also have the same problem with my iwl3945 (Intel Wireless 3945). I'm using nm-applet to manage my wireless, and it just won't connect. It works fine if I switch over to Windows. Here's my syslog.
Note: I'm trying to connect to an open wireless network in a hotel.

Jun 17 10:56:19 blumlaptop dhcdbd: message_handler: message handler not found under /com/redhat/dhcp/eth1 for sub-path eth1.dbus.get.reason
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Will activate connection 'eth1/ChanningsHotel'.
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Device eth1 activation scheduled...
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1) started...
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1) Stage 1 of 5 (Device Prepare) scheduled...
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Old device 'eth1' activating, won't change.
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1) Stage 1 of 5 (Device Prepare) started...
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1) Stage 2 of 5 (Device Configure) scheduled...
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1) Stage 1 of 5 (Device Prepare) complete.
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1) Stage 2 of 5 (Device Configure) starting...
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Activation (eth1/wireless): access point 'ChanningsHotel' is unencrypted, no key needed.
Jun 17 10:56:19 blumlaptop NetworkManager: <info> Old device 'eth1' activating, won't change.
Jun 17 10:56:19 blumlaptop last message repeated 3 times
...
Jun 17 10:56:21 blumlaptop NetworkManager: <info> Activation (eth1) Stage 2 of 5 (Device Configure) complete.
Jun 17 10:56:21 blumlaptop kernel: [ 75.370224] eth1: Initial auth_alg=0
Jun 17 10:56:21 blumlaptop kernel: [ 75.370236] eth1: authenticate with AP 00:1e:2a:10:e4:c2
Jun 17 10:56:21 blumlaptop kernel: [ 75.549628] eth1: authenticate with AP 00:1e:2a:10:e4:c2
Jun 17 10:56:22 blumlaptop kernel: [ 75.736553] eth1: authenticate with AP 00:1e:2a:10:e4:c2
Jun 17 10:56:22 blumlaptop kernel: [ 75.880368] eth1: authentication with AP 00:1e:2a:10:e4:c2 timed out

Daniel T Chen (crimsun) wrote :

Reproducible using ndiswrapper and bcm4315 (8.04.1) as well as ipw2200 and ipw2195abg (7.10).

Changed in network-manager:
status: New → Confirmed

Same problem here. Using a Belkin Wireless extender in Repeater mode. Can't stay connected to the stronger Repeater MAC. Works perfectly in Windows.

pickett.aaron (pickett-aaron) wrote :

Same SSID, Router and Repeater

Tim Wright (timw) wrote :

I don't know if I have the exact same problem, but on Hardy with the latest updates, NM is still unusable on my laptop. I have two APs in the house, both with the same SSID. When I sit in the family room, the signal strength is similar from both and the system wants to flip-flop. EVERY time it does so, it breaks all my connections. This works flawlessly under Windows on the same machine, sigh.

The APs have no security on at the moment (just a MAC address filter - there's nothing important on the network). Here's some sample output from /var/log/daemon.log:

Oct 6 16:09:32 timw-laptop NetworkManager: <debug> [1223334572.791447] nm_device_802_11_wireless_update_bssid(): Roamed from BSSID 00:13:10:69:D1:29 to 00:0F:66:92:6A:9C on wireless network 'splhi'
Oct 6 16:13:42 timw-laptop NetworkManager: <info> Supplicant state changed: 0
Oct 6 16:13:42 timw-laptop NetworkManager: <info> Supplicant state changed: 1
Oct 6 16:13:42 timw-laptop NetworkManager: <debug> [1223334822.839332] nm_device_802_11_wireless_update_bssid(): Roamed from BSSID 00:0F:66:92:6A:9C to 00:13:10:69:D1:29 on wireless network 'splhi'

As I say, EVERY time this happens, it drops my active connections (e.g. scp gets killed). This is on a Thinkpad T42 with ipw2200 wireless.

Ross Reedstrom (reedstrm) wrote :

Seeing the same thing, here. W/ both an HP tablet (ipw2100) and a newer thinkpad (don't have the model to hand). Same setup, multiple APs, one SSID, different subnets. Roam, and bingo, lost network. I'm tempted to set up a log monitor to poke dhclient when it sees a roaming event, even.
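A log monitor of the kind mentioned above would first need to recognise roaming events. A minimal sketch, assuming the NetworkManager 0.6-style debug line quoted earlier in this report ("Roamed from BSSID ... to ... on wireless network '...'"); the names `ROAM_RE`, `parse_roam_event` and `roam_events` are made up for illustration, and other log formats (such as the 0.7 `periodic_update()` lines or a roam to "(none)") are deliberately not matched:

```python
import re

# Matches the roaming debug message format seen in the syslog excerpts above.
MAC = r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}"
ROAM_RE = re.compile(
    r"Roamed from BSSID (?P<old>" + MAC + r") "
    r"to (?P<new>" + MAC + r") "
    r"on wireless network '(?P<ssid>[^']*)'"
)


def parse_roam_event(line):
    """Return (old_bssid, new_bssid, ssid) for a roaming log line, else None."""
    match = ROAM_RE.search(line)
    if match is None:
        return None
    return match.group("old"), match.group("new"), match.group("ssid")


def roam_events(lines):
    """Yield a parsed event for every roaming line in an iterable of log lines."""
    for line in lines:
        event = parse_roam_event(line)
        if event is not None:
            yield event
```

Feeding it the lines of /var/log/syslog (or daemon.log) yields one tuple per roam; what to do on each event, for example renewing the DHCP lease, is left to the caller.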

Tim Wright (timw) wrote :

Ah, I neglected to mention, my two APs are on the *same* subnet - their IP addresses differ by one, so there is absolutely no reason to change anything at the client end of things. In the meantime, I have had to select manual configuration. Allowing the ipw2200 driver to roam on its own works just fine and dandy.

Wira Mulia (wheerdam) wrote :

I also have this problem with my ralink based wifi adapter (rt73usb driver is used). Whenever NetworkManager (or whatever that is) decides to switch to a different bssid that belongs to the same essid, I lose connection. The subnet is the same for all base stations. Reconnecting to the wireless network and logging in to the wifi proxy thing again works, until it decides to roam to another base station and the same thing happens.

frooky (5-elements) wrote :

I am seeing this issue as well:

Lenovo T500 with the intel 5100 (AGN) chipset

We have a wireless network with 32 AP's in our office using a wireless controller. They are all about 30% power level and I can talk with 3 at any time... so in other words this is happening about every 5 min when it thinks that it should be talking to a different one

If you need any info, logs or testing please let me know.

From syslog, about 30 sec after I connected, when it dropped this last time:

Nov 4 10:08:08 clotho NetworkManager: <info> (wlan0): supplicant connection state change: 5 -> 6
Nov 4 10:08:08 clotho NetworkManager: <info> (wlan0): supplicant connection state change: 6 -> 7
Nov 4 10:08:13 clotho NetworkManager: <debug> [1225814893.514782] periodic_update(): Roamed from BSSID 00:1B:2B:<SNIP> (<SNIP>) to 00:18:74:<SNIP> (<SNIP>)
Nov 4 10:08:16 clotho kernel: [ 617.188608] CE: hpet increasing min_delta_ns to 15000 nsec
Nov 4 10:08:17 clotho kernel: [ 618.308590] CE: hpet increasing min_delta_ns to 22500 nsec

frooky (5-elements) wrote :

Sorry forgot to mention: ibex with the updates from Alex

ii network-manager 0.7~~svn20081018t105859-0ubuntu2~nm4 network management framework daemon
ii network-manager-gnome 0.7~~svn20081020t000444-0ubuntu1 network management framework (GNOME frontend

Alexander Sack (asac) wrote :

frooky, a few basic questions up front:

Can you confirm that you have good wireless in an environment without so many APs? Do the APs in the office share the same ESSID?

Could you please attach a more complete syslog? It should at least capture everything starting where NM starts.

frooky (5-elements) wrote :

Sorry for the slow response; it's been a hectic week at work. Yes, we have a good wireless infrastructure set up with that many APs. They all share the same SSIDs and VLANs. It's designed for us to be able to roam around the building without problems. I wasn't able to get a syslog capture today since it was so busy with new announcements (I work for Wayport) and other things going on. As soon as I get into the office tomorrow or Monday and see the problem, I will sanitize the logs and get them on here.

I am getting exactly the same problem, and the more frustrating part is that it totally ruins my wifi access at work. We have many APs with the same SSID, and at my desk it ALWAYS connects to a 00:0F:xxx AP, then roams to a 00:0D:xxx AP immediately after a "dhcp bound to" followed by a "device state change: 7 -> 8", and before a "Policy set 'ESSID' (ath0) as default for routing and DNS.". ALWAYS. And then, of course, routing, etc, seems ok, but not even skype, nothing works, no pings, tracepath, etc.
Is there any way to blacklist a specific AP, if there is no way to disable this idiotic roaming?

Bob McElrath (bob+ubuntu) wrote :

Confirmed with 8.10 and ath9k. The wireless often roams to (none), which totally drops the connection. I can cause it to reassociate with an AP by using 'iwconfig ath0 ap <MAC>'. And about 2 minutes later it will roam back to (none). Very frustrating.

It seems to me like there is some code in NetworkManager which selects the best AP, and it's broken. It seems to me that NM should *never* roam to (none). When is that ever a good idea? It should only roam if it finds a new AP with better signal strength. Attached is a patch which should do that. Unfortunately, as I'm not travelling and in airports/hotels right now, I can't easily test it. Can someone give it a whirl?
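The policy described in this comment (never move to "no AP", and only move to an AP with a stronger signal) can be written down as a small decision function. This is not the attached patch and makes no claim about which component actually makes the roaming decision; `should_switch` and its `margin` parameter are hypothetical, and signal values are assumed to be comparable numbers such as dBm readings. The margin is an added assumption to damp flip-flopping between APs of similar strength.

```python
def should_switch(current_bssid, current_signal,
                  candidate_bssid, candidate_signal, margin=0):
    """Decide whether to move from the current AP to a candidate AP.

    BSSIDs are strings, or None for "no AP"; signals are numbers where
    larger means stronger (for example dBm values such as -60).
    """
    # Never "roam" to nothing: losing the AP is not an improvement.
    if candidate_bssid is None:
        return False
    # Not associated at all: any real AP is better than none.
    if current_bssid is None:
        return True
    # Already on this AP.
    if candidate_bssid == current_bssid:
        return False
    # Only switch when the candidate is stronger by more than the margin.
    return candidate_signal > current_signal + margin
```

With `margin=5`, a candidate at -58 dBm would not displace a current AP at -60 dBm, while one at -50 dBm would.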

nahtgesicht (nahtgesicht) wrote :

I am sitting in my university, where there are also different APs with a single essid. I also had problems with roaming to (none), and even if the connection was alive, everything was crawling and ping did not work anymore.
I am using Intrepid with a Thinkpad X31 with an ipw2100 card.

I just applied your patch to the current intrepid sources (network-manager-0.7~~svn20081018t105859) and recompiled the necessary packages. I restarted the network manager, and I did not have a single connection loss in at least 4 hours. (Before, it happened every few minutes.) Browsing also seems a lot snappier. For me it works now. Thanks a lot.

Andres Mujica (andres.mujica) wrote :

As described in the comments, this is not really hardware dependent.

Changed in bcm4400-source (Ubuntu):
status: Incomplete → Invalid
Mark Stockton (mark-linuxworx) wrote :

I'm having the exact same wifi network roaming problem using Kubuntu 9.04 on my Toshiba L300 Satellite Pro. Is it possible that this problem still exists in 9.04?

Mark Stockton (mark-linuxworx) wrote :

Should maybe add that the chip set is Realtek RTL8187B

Alexander Sack (asac) wrote :

Bob. Thanks for the patch. However, NetworkManager does not do the
roaming on its own; it is a wpasupplicant/driver decision to switch/disassoc
from the AP.

The output seen (and wrongly patched out by the patch) is NM reactively
updating its internal state _after_ wpasupplicant/driver did the roaming to (none).

The reason why we see this "roaming to (none)" symptom is that some
drivers (like atheros) are buggy when background scanning is
used. They often disassociate temporarily, and that's when we see the
roaming to (none) in the log. We discussed this at UDS with
linville and friends, and there is work in progress to make drivers
behave better [1].

We also considered, and spec'ed, removing background scanning for karmic
[2]; however, with wifi consumers for geolocation (like in firefox),
it's getting even more important that background scanning is done
regularly.

[1] - http://bugzilla.kernel.org/show_bug.cgi?id=12635#c7
[2] - https://wiki.ubuntu.com/DesktopTeam/Specs/Karmic/NetworkUI

Also see bug 291760 which is about background scanning issues.

summary: - WiFi roaming causes NetworkManager to lose routing
+ background scanning causes drivers to disassociate - WiFi roaming causes
+ NetworkManager to lose routing