Wireless connection does not re-connect

Bug #1181964 reported by Bernd Edlinger on 2013-05-20
This bug affects 1 person
Affects: network-manager (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

If the access point is rebooted, Linux won't re-connect.
That behaviour is reproducible with the -43 kernel version.
The only way to re-connect is to disable and re-enable the wireless connection, or to reboot.

Previous versions, -41 and -39, did automatically re-connect after a few seconds.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-43-generic-pae 3.2.0-43.68
ProcVersionSignature: Ubuntu 3.2.0-43.68-generic-pae 3.2.42
Uname: Linux 3.2.0-43-generic-pae i686
NonfreeKernelModules: fcclassic
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
AplayDevices: aplay: device_list:252: no soundcards found...
ApportVersion: 2.0.1-0ubuntu17.2
Architecture: i386
ArecordDevices: arecord: device_list:252: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/midiC0D0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Mon May 20 09:53:54 2013
HibernationDevice: RESUME=UUID=20273ec5-368b-49e4-b7b4-b9983dc66732
InstallationMedia: Ubuntu 12.04.1 LTS "Precise Pangolin" - Release i386 (20120817.1)
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Giga-Byte Technology CO., LTD i440BX-W977
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-43-generic-pae root=UUID=cb9535b0-5aba-42d3-9186-2a5255f2d167 ro nomodeset quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-43-generic-pae N/A
 linux-backports-modules-3.2.0-43-generic-pae N/A
 linux-firmware 1.79.4
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/18/00
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: 4.51 PG
dmi.board.name: i440BX-W977
dmi.board.vendor: Corporation Name
dmi.board.version: 1.0
dmi.chassis.type: 2
dmi.chassis.vendor: Corporation Name
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvr4.51PG:bd12/18/00:svnGiga-ByteTechnologyCO.,LTD:pni440BX-W977:pvr1.0:rvnCorporationName:rni440BX-W977:rvr1.0:cvnCorporationName:ct2:cvr1.0:
dmi.product.name: i440BX-W977
dmi.product.version: 1.0
dmi.sys.vendor: Giga-Byte Technology CO., LTD

Bernd Edlinger (bernd-edlinger) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream stable kernel? Please test the latest v3.2 stable kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.45-precise/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: regression-update
Bernd Edlinger (bernd-edlinger) wrote :

Ok, I installed the following kernel: linux-image-3.2.45-030245-generic-pae_3.2.45-030245.201305140735_i386.deb

This bug is definitely fixed with this image:
A short disconnect (reboot AP) => network re-connects immediately.
A long disconnect (1 minute power-off AP) => network re-connects after 5 minutes.

If you like I can upload syslog from upstream kernel for analysis.
Thanks!

tags: added: kernel-fixed-upstream
Bernd Edlinger (bernd-edlinger) wrote :

Oops, sorry...

A few hours later the bug showed up again -- in the upstream kernel, without me trying to reproduce it.

The traces for the event at time index 43224.360095 look very similar to those from the -43 kernel.
[43223.763471] wlan0: authenticate with 00:13:49:e3:9d:8e (try 1)
[43223.960105] wlan0: authenticate with 00:13:49:e3:9d:8e (try 2)
[43224.160122] wlan0: authenticate with 00:13:49:e3:9d:8e (try 3)
[43224.360095] wlan0: authentication with 00:13:49:e3:9d:8e timed out

Therefore: linux 3.2.45 does not fix this issue completely!

Attached you'll find the kernel messages from the complete session,
first are two successful reconnects and later the failed re-connect.

tags: added: kernel-bug-exists-upstream
removed: kernel-fixed-upstream
Bernd Edlinger (bernd-edlinger) wrote :

Hello,

Unfortunately, I must admit that I am no longer able to reproduce this bug here.

What I did was compile and install a locally built test kernel over 3.2.0-43, and later
re-install the 3.2.0-43 kernel from the .deb file in /var/cache/apt/archives.

And guess what: now the network connects nicely again.

Therefore I no longer think that the change between 3.2.0-41 and 3.2.0-43 is connected to
the connectivity problem. It is just one line of code in this function, which is apparently
never executed on my machine (I added a printk there, and it was never printed):

--- linux-3.2.0/kernel/events/core.c
+++ linux-3.2.0/kernel/events/core.c
@@ -5164,7 +5164,7 @@

 static int perf_swevent_init(struct perf_event *event)
 {
-       int event_id = event->attr.config;
+       u64 event_id = event->attr.config;

        if (event->attr.type != PERF_TYPE_SOFTWARE)
                return -ENOENT;
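
For illustration only, here is a minimal sketch of what widening event_id from int to u64 can change. It assumes the value is later compared against an upper bound. The bound MODEL_SW_EVENT_MAX and both helper functions are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical upper bound for valid event ids; not the kernel's constant. */
#define MODEL_SW_EVENT_MAX 9

/* Old variant: the 64-bit config value is narrowed to int first.
 * The narrowing of an out-of-range value is implementation-defined;
 * on common compilers 0xFFFFFFFF becomes -1, which passes the check. */
static int model_accepts_with_int(uint64_t config)
{
    int event_id = (int)config;
    return event_id < MODEL_SW_EVENT_MAX;
}

/* New variant: the value keeps its full unsigned width,
 * so large values are rejected by the same comparison. */
static int model_accepts_with_u64(uint64_t config)
{
    uint64_t event_id = config;
    return event_id < MODEL_SW_EVENT_MAX;
}
```

With a small value such as 3, both variants accept the id. With 0xFFFFFFFF, the int variant typically accepts it (as -1) while the u64 variant rejects it.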

Therefore the reason must be somewhere else. In case the
network problem comes back, I added the following lines to
/etc/NetworkManager/NetworkManager.conf:

[logging]
level=DEBUG
domains=HW,RFKILL,ETHER,WIFI,BT,MB,DHCP4,DHCP6,PPP,WIFI_SCAN,IP4,IP6,AUTOIP4,DNS,VPN,SHARING,SUPPLICANT,AGENTS,SETTINGS,SUSPEND,CORE,DEVICE,OLPC,WIMAX

Maybe those traces will give us some better insight.

Thanks, and Good Bye.

Bernd Edlinger (bernd-edlinger) wrote :

Now it did happen again. The Network Manager traces show that the access point had a short interruption
and was removed from the network list. Only one other AP is left in the list, but the data is wrong:
ap_list_dump() does not show the same data as "iwlist wlan0 scanning".

=> Therefore it seems to be a Network Manager bug, but one that is much harder to reproduce than I initially assumed.

Bernd Edlinger (bernd-edlinger) wrote :

OK, looking at nm-device-wifi.c, I think I see what is wrong:

When the AP sends its beacons almost all the time but reboots quickly at some point, it can be
deleted by cull_scan_list() if that function happens to run at exactly that moment: the AP is then not the active AP,
its list entry has not changed for 3*SCAN_INTERVAL_MAX, and the property WPAS_REMOVED_TAG is set.
The underlying supplicant process only sends a bss_removed or new_bss message when the live list changes
from its perspective.

However, when the AP is missing from the live list in a single scan, the connection does not necessarily
break. The flag WPAS_REMOVED_TAG is set anyway, and it is not reset when new_bss is received later and
the AP data are updated, although it should be cleared at that point. So once that flag is set, cull_scan_list()
can remove the AP at any time, even if the connection would be restored very quickly.
This should be fixed by adding the following line to merge_scanned_ap(), where the AP data is updated:

g_object_set_data (G_OBJECT (ap), WPAS_REMOVED_TAG, NULL);

The function supplicant_iface_bss_removed_cb() should probably also set the last-seen property to now.
The following statements should be added there to prevent the AP from being removed immediately
if the signal strength did not change recently:

g_get_current_time (&now);
nm_ap_set_last_seen (ap, now.tv_sec);

I am pretty sure that this is a Network Manager bug, and that this is how to solve it.

affects: linux (Ubuntu) → network-manager (Ubuntu)

I compiled the network-manager component as follows:

cd network-manager-0.9.4.0

./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --enable-more-warnings=no

make

Installed (as root): src/.libs/NetworkManager to /usr/sbin/NetworkManager (the original NetworkManager binary was renamed to allow a possible undo)

The patch has been running on my machine for 24 hours now, without any permanent connectivity loss so far.

The attachment "this is a proposed fix for this bug." seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch

Yes, it is a patch.

Steps to reproduce:

- configure a permanent connection to a WEP-encrypted AP.
- the AP has very good signal quality, and is 99.9% of the time available.
- only sporadic short 1-2 seconds interruptions of the beacon.
- one or two other APs have very poor signal quality, and enter/leave the live list often.

After some days/weeks/months the wifi connection breaks, and Network Manager
no longer sees the AP, although "iwlist wlan0 scanning" sees it 99.9% of the time.

That state lasts until either Network Manager is restarted or the AP is shut down for 10 minutes
and then started again; at that point the AP is visible again.

I am 100% sure that this has an impact on other users too.

Changed in network-manager (Ubuntu):
status: Incomplete → Confirmed

PING...
This patch has now been running non-stop for 6 weeks...

Status update

Finally I was able to fix this issue upstream, see: https://bugzilla.gnome.org/show_bug.cgi?id=733105

But it is only completely fixed in network-manager 1.0.

If you want to fix something for Ubuntu 12.04 or Ubuntu 14.04,
you can use my latest local patches.

Note that in Ubuntu 14.04 there is also a bug in wpa_supplicant
which can lock up the radio work queue, which makes any further
WiFi connections impossible.

Fortunately that must have been fixed in the meantime, because
I found a fix in the upstream wpa_supplicant repository.

Here is the latest network-manager patch for 14.04.

This is a wpa_supplicant fix that was found upstream.
It is only necessary for Ubuntu 14.04.

The wpa_supplicant from Ubuntu 12.04 did not try to
do an internal scan for the AP, and does not need any fix.
