The infamous "deauthenticating by local choice (reason=3)"

Bug #872578 reported by Jonathan Allard
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
wpasupplicant (Ubuntu)
Won't Fix
Undecided
Unassigned

Bug Description

Symptom
----------------
Wireless will disconnect spontaneously when using a WPA2-Entreprise AP, at a frequency ranging from 30 seconds to hours.

Parameters
-------------------
Kernel: 2.6.38-11
Distribution: Ubuntu 11.04 (Natty Narwhal)
Computer: ThinkPad W520
Wireless adapter: Intel Centrino Advanced-N 6205
Wireless driver: iwlagn

What has been tried
-------------------------------
- Disabling power management: no change
- Using 'options iwlagn lln_disable=1 [...]': invalid options

Similar bugs: #548992, #837755

=================

Have a look at my super-verbose log, and knock yourself out. The problematic times can be found by searching for 'deauthenticating'.

Revision history for this message
Jonathan Allard (joallard) wrote :
Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Well, what appears to be failing here is really roaming that is failing -- the "deauth" message is just a side-effect of it, but not an indication of the issue by itself.

Please try again with the suggested fix of disabling N: the option you want to use is "11n_disable=1", instead of "lln_disable=1".

Also, please use 'apport-collect 872578' to add extra debugging information to this bug report. Thanks!

Revision history for this message
Jonathan Allard (joallard) wrote :

I've upgraded to Oneiric, and it's gone from reason 3 to reason 2... Strange.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Reason 2 is "Previous authentication not valid", likely caused by auth keys that changes (is authentication to that network done via one-time passwords?)
#define WLAN_REASON_PREV_AUTH_NOT_VALID 2

Please follow the steps in comment #2 to attach debugging information to this bug report.

Changed in wpasupplicant (Ubuntu):
status: New → Incomplete
Revision history for this message
Jonathan Allard (joallard) wrote :

Well, I don't want to use `apport-collect` because it collects some personal information which I'm trying to limit. Can you be more specific as to your log request?

Revision history for this message
Jonathan Allard (joallard) wrote :

Although the same behavior is present when upgrading to kernel 3.0.0-12, not the same messages appear: the problem seems to be different. Should we close this one?

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Is there another bug I should be looking at?

The most important thing is to attach /var/log/syslog.

Revision history for this message
Jonathan Allard (joallard) wrote :

Sorry, I wasn't clear enough. The log I attached was the syslog. And no, I didn't file another bug yet. I'm trying to reproduce this one as much as possible to see if the same error message reoccurs.

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :
Download full text (3.4 KiB)

I had gotten that, but the issue is that this particular syslog excerpt has so many things happening in it it's hard to figure out what's wrong from what's okay: for instance, at some point wpasupplicant gets killed with signal 15, and there's at least one instance of either a suspend/resume cycle or a case where networking/wifi was disabled then re-enabled. It muddies the waters a little.

I think I may have gotten to some good info; but it's nothing that clearly points towards a wpasupplicant issue. There's two cases here:

1) You'll pretty often roam from a BSS to another due to differring signal levels. I think we *may* want to consider whether roaming for this little difference is really worth it (didn't really check the code to see what kind of threshold is set).

Oct 11 17:51:44 panhandle wpa_supplicant[772]: Considering within-ESS reassociation
Oct 11 17:51:44 panhandle wpa_supplicant[772]: Current BSS: 00:0b:86:d5:1e:91 level=175
Oct 11 17:51:44 panhandle wpa_supplicant[772]: Selected BSS: 00:0b:86:d5:16:51 level=177
Oct 11 17:51:44 panhandle wpa_supplicant[772]: Trying to associate with 00:0b:86:d5:16:51 (SSID='wpa.mcgill.ca' freq=5745 MHz)

That's part of why you shortly after that see a "deauth by local choice (reason=3)": reason 3 is WLAN_REASON_DEAUTH_LEAVING; due to roaming or any such cases.

2) Not all wpa.mcgill.ca APs seem to properly respond to the auth requests; seemingly due to overloading. Consider the following instance:

Oct 11 17:47:58 panhandle kernel: [55574.563224] wlan0: authenticate with 00:0b:86:d5:16:41 (try 1)
Oct 11 17:47:58 panhandle kernel: [55574.566014] wlan0: 00:0b:86:d5:16:41 denied authentication (status 17)
Oct 11 17:48:02 panhandle NetworkManager[568]: <warn> Activation (wlan0/wireless): association took too long.
Oct 11 17:48:02 panhandle NetworkManager[568]: <info> (wlan0): device state change: 5 -> 6 (reason 0)
Oct 11 17:48:02 panhandle wpa_supplicant[772]: dbus: Unregister network object '/fi/w1/wpa_supplicant1/Interfaces/1/Networks/0'
Oct 11 17:48:02 panhandle wpa_supplicant[772]: No keys have been configured - skip key clearing
Oct 11 17:48:02 panhandle wpa_supplicant[772]: State: ASSOCIATING -> DISCONNECTED

Status 17 is "WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA"; basically the AP responds that it can't deal with the new association; more or less meaning it's already to capacity with clients and can't handle more. (See http://lists.shmoo.com/pipermail/hostap/2004-May/006769.html)

FWIW, I'm using the ieee_802_11_defs.h file from the wpasupplicant code to match status and reason codes, if you're interesting in knowing what the numbers mean ;) (http://hostap.epitest.fi/gitweb/gitweb.cgi?p=hostap.git;a=blob;f=src/common/ieee802_11_defs.h). You can similarly get the NM reason/status codes here: http://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/include/NetworkManager.h.

The best I can suggest at this point is to contact McGill's network administrators; discuss the matter with them (pointing out this bug report), and see if there's more information that can be retrieved from comparing your system's logs with logs from the APs and auth system. I'll keep this bug report open in case ther...

Read more...

Revision history for this message
Jonathan Allard (joallard) wrote :

Great findings! Thanks for taking the time to look through the logs. I'll try to get more precise logs for us to look at. Also, like I said, I've upgraded to Oneiric, so I'll try to see if the same problem persists or if it's another one.

But isn't the behavior of switching APs over such small differences in signal problematic? For example given 2 near possible APs with variable signal strength and a computer that's not moving, the code would perpetually switch from one to another back and forth. Isn't that wrong? Is this what's happening here?

Revision history for this message
Jonathan Allard (joallard) wrote :

There is some underlying problem in my case, but I don't think switching back and forth is desirable behavior. Maybe the thresholds should be looked at

Revision history for this message
Mathieu Trudel-Lapierre (cyphermox) wrote :

Jonathan,

In this case, please file a report specifically about updating thresholds with Debian, or directly to the wpasupplicant upstream maintainers (at http://w1.fi/bugz/).

I'll mark this Won't Fix for Ubuntu because I really strongly believe the threshold are probably just fine; and tend to work properly even in high-contention environments where I've been (such as UDS and other conferences). It doesn't mean I don't agree that there could be tweaks made, but I would rather see those made by the developers themselves or coming from Debian than to artificially keep a delta for this just for Ubuntu.

In this particular case the problem seems to be related directly to an interaction between slow authentication and over-busy APs, both problems being complex enough and important enough that they really should be fixed regardless ;)

Changed in wpasupplicant (Ubuntu):
status: Incomplete → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.