The infamous "deauthenticating by local choice (reason=3)"
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| wpasupplicant (Ubuntu) | | Undecided | Unassigned | |
Bug Description
Symptom
----------------
Wireless disconnects spontaneously when using a WPA2-Enterprise AP, at intervals ranging from 30 seconds to several hours.
Parameters
-------------------
Kernel: 2.6.38-11
Distribution: Ubuntu 11.04 (Natty Narwhal)
Computer: ThinkPad W520
Wireless adapter: Intel Centrino Advanced-N 6205
Wireless driver: iwlagn
What has been tried
-------
- Disabling power management: no change
- Using 'options iwlagn lln_disable=1 [...]': invalid options
Similar bugs: #548992, #837755
=================
Have a look at my super-verbose log, and knock yourself out. The problematic times can be found by searching for 'deauthenticating'.
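For anyone going through a log like this one, the search described above can be sketched in a few lines of Python. This is only an illustration: it assumes the lines contain the wording quoted in the title ("deauthenticating ... (reason=N)"), and the function names `deauth_events` and `reason_histogram` are made up for this example.

```python
import re
from collections import Counter

# Assumed wording, based on the message quoted in the bug title:
# "deauthenticating ... by local choice (reason=3)".
DEAUTH_RE = re.compile(r"deauthenticating.*?\(reason=(\d+)\)", re.IGNORECASE)


def deauth_events(lines):
    """Yield (line_number, reason_code, text) for each deauthentication line."""
    for number, line in enumerate(lines, start=1):
        match = DEAUTH_RE.search(line)
        if match:
            yield number, int(match.group(1)), line.rstrip("\n")


def reason_histogram(lines):
    """Count how often each reason code appears in the given lines."""
    return Counter(reason for _, reason, _ in deauth_events(lines))
```

To use it on a real log, open the file (for instance `open("/var/log/syslog", errors="replace")`) and pass the file object to either function; both accept any iterable of lines.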
| Jonathan Allard (joallard) wrote : | #1 |
| Jonathan Allard (joallard) wrote : | #3 |
I've upgraded to Oneiric, and it's gone from reason 3 to reason 2... Strange.
Reason 2 is "Previous authentication not valid", likely caused by auth keys that change (is authentication to that network done via one-time passwords?).
#define WLAN_REASON_
Please follow the steps in comment #2 to attach debugging information to this bug report.
| Changed in wpasupplicant (Ubuntu): | |
| status: | New → Incomplete |
| Jonathan Allard (joallard) wrote : | #5 |
Well, I don't want to use `apport-collect` because it collects some personal information which I'm trying to limit. Can you be more specific as to your log request?
| Jonathan Allard (joallard) wrote : | #6 |
Although the same behavior is present after upgrading to kernel 3.0.0-12, the messages are not the same: the problem seems to be different. Should we close this one?
Is there another bug I should be looking at?
The most important thing is to attach /var/log/syslog.
| Jonathan Allard (joallard) wrote : | #8 |
Sorry, I wasn't clear enough. The log I attached was the syslog. And no, I didn't file another bug yet. I'm trying to reproduce this one as much as possible to see if the same error message reoccurs.
I had gotten that, but the issue is that this particular syslog excerpt has so many things happening in it that it's hard to tell what's wrong from what's okay: for instance, at some point wpasupplicant gets killed with signal 15, and there's at least one instance of either a suspend/resume cycle or a case where networking/wifi was disabled then re-enabled. It muddies the waters a little.
I think I may have gotten to some good info, but it's nothing that clearly points towards a wpasupplicant issue. There are two cases here:
1) You'll pretty often roam from one BSS to another due to differing signal levels. I think we *may* want to consider whether roaming for such a small difference is really worth it (I didn't really check the code to see what kind of threshold is set).
Oct 11 17:51:44 panhandle wpa_supplicant[
Oct 11 17:51:44 panhandle wpa_supplicant[
Oct 11 17:51:44 panhandle wpa_supplicant[
Oct 11 17:51:44 panhandle wpa_supplicant[
That's part of why, shortly after that, you see a "deauth by local choice (reason=3)": reason 3 is WLAN_REASON_
2) Not all wpa.mcgill.ca APs seem to properly respond to the auth requests; seemingly due to overloading. Consider the following instance:
Oct 11 17:47:58 panhandle kernel: [55574.563224] wlan0: authenticate with 00:0b:86:d5:16:41 (try 1)
Oct 11 17:47:58 panhandle kernel: [55574.566014] wlan0: 00:0b:86:d5:16:41 denied authentication (status 17)
Oct 11 17:48:02 panhandle NetworkManager[
Oct 11 17:48:02 panhandle NetworkManager[
Oct 11 17:48:02 panhandle wpa_supplicant[
Oct 11 17:48:02 panhandle wpa_supplicant[
Oct 11 17:48:02 panhandle wpa_supplicant[
Status 17 is "WLAN_STATUS_
FWIW, I'm using the ieee_802_11_defs.h file from the wpasupplicant code to match status and reason codes, if you're interested in knowing what the numbers mean ;) (http://
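As a convenience for readers matching numbers to names, here is a small lookup sketch covering only the three codes that come up in this report. The constant names are given as I recall them from the definitions header in the wpa_supplicant/hostap source tree, so treat them as an assumption and check them against the header itself; `describe` is a hypothetical helper, not part of any library.

```python
# Only the codes discussed in this report are listed. The names are recalled
# from the wpa_supplicant/hostap definitions header and should be verified
# against that file before being relied upon.
REASON_CODES = {
    2: "WLAN_REASON_PREV_AUTH_NOT_VALID",
    3: "WLAN_REASON_DEAUTH_LEAVING",
}

STATUS_CODES = {
    17: "WLAN_STATUS_AP_UNABLE_TO_HANDLE_NEW_STA",
}


def describe(kind, code):
    """Return the symbolic name for a 'reason' or 'status' code, if known."""
    table = {"reason": REASON_CODES, "status": STATUS_CODES}[kind]
    return table.get(code, f"unknown {kind} code {code}")
```

For example, `describe("status", 17)` gives the name for the "denied authentication (status 17)" line in the kernel log above.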
The best I can suggest at this point is to contact McGill's network administrators; discuss the matter with them (pointing out this bug report), and see if there's more information that can be retrieved from comparing your system's logs with logs from the APs and auth system. I'll keep this bug report open in case ther...
| Jonathan Allard (joallard) wrote : | #10 |
Great findings! Thanks for taking the time to look through the logs. I'll try to get more precise logs for us to look at. Also, like I said, I've upgraded to Oneiric, so I'll try to see if the same problem persists or if it's another one.
But isn't the behavior of switching APs over such small differences in signal problematic? For example, given two nearby candidate APs with variable signal strength and a computer that's not moving, the code would perpetually switch back and forth between them. Isn't that wrong? Is this what's happening here?
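To make the back-and-forth concern concrete, here is a toy simulation. It is not wpa_supplicant's roaming logic (nobody in this thread checked the actual thresholds); it only assumes a simple rule of "roam when the best candidate beats the current AP by at least `min_gain_db`". The AP names and signal levels are invented.

```python
def choose_ap(current, signals, min_gain_db):
    """Pick an AP from a {name: signal_dBm} scan result.

    Illustrative rule only: stay on the current AP unless the strongest
    candidate is at least min_gain_db better.
    """
    best = max(signals, key=signals.get)
    if current is None or current not in signals:
        return best
    if signals[best] - signals[current] >= min_gain_db:
        return best
    return current


def count_roams(scans, min_gain_db):
    """Count AP changes over a sequence of scan results."""
    current, roams = None, 0
    for signals in scans:
        chosen = choose_ap(current, signals, min_gain_db)
        if current is not None and chosen != current:
            roams += 1
        current = chosen
    return roams


# Hypothetical scans: a stationary laptop sees two APs whose levels
# swap by 2 dB from one scan to the next.
scans = [
    {"ap_a": -60, "ap_b": -62},
    {"ap_a": -62, "ap_b": -60},
    {"ap_a": -60, "ap_b": -62},
    {"ap_a": -62, "ap_b": -60},
]
```

With a 1 dB margin, `count_roams(scans, 1)` switches on every scan after the first; with a 5 dB margin, `count_roams(scans, 5)` never switches. A larger margin trades some signal quality for stability, which is the trade-off being questioned here.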
| Jonathan Allard (joallard) wrote : | #11 |
There is some underlying problem in my case, but I don't think switching back and forth is desirable behavior. Maybe the thresholds should be looked at.
Jonathan,
In this case, please file a report specifically about updating thresholds with Debian, or directly to the wpasupplicant upstream maintainers (at http://
I'll mark this Won't Fix for Ubuntu because I really strongly believe the thresholds are probably just fine, and tend to work properly even in high-contention environments where I've been (such as UDS and other conferences). It doesn't mean I don't agree that there could be tweaks made, but I would rather see those made by the developers themselves or coming from Debian than artificially keep a delta for this just for Ubuntu.
In this particular case the problem seems to be related directly to an interaction between slow authentication and over-busy APs, both problems being complex enough and important enough that they really should be fixed regardless ;)
| Changed in wpasupplicant (Ubuntu): | |
| status: | Incomplete → Won't Fix |


Well, what appears to be failing here is really roaming -- the "deauth" message is just a side-effect of it, but not an indication of the issue by itself.
Please try again with the suggested fix of disabling N: the option you want to use is "11n_disable=1", instead of "lln_disable=1".
Also, please use 'apport-collect 872578' to add extra debugging information to this bug report. Thanks!