Invalid ECSA IEs in probe response frames causes connection to drop

Bug #1201470 reported by Robbie Williamson on 2013-07-15
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
High
Unassigned
network-manager (Ubuntu)
High
Unassigned
Saucy
High
Unassigned

Bug Description

The connection slowly degrades until it drops. Disabling 11n seems to work around the issue. After the last disconnect, the following messages were in dmesg:

[ 2458.121218] iwlwifi 0000:03:00.0: Microcode SW error detected. Restarting 0x2000000.
[ 2458.121229] iwlwifi 0000:03:00.0: CSR values:
[ 2458.121234] iwlwifi 0000:03:00.0: (2nd byte of CSR_INT_COALESCING is CSR_INT_PERIODIC_REG)
[ 2458.121265] iwlwifi 0000:03:00.0: CSR_HW_IF_CONFIG_REG: 0X00488700
[ 2458.121296] iwlwifi 0000:03:00.0: CSR_INT_COALESCING: 0X0000ff40
[ 2458.121328] iwlwifi 0000:03:00.0: CSR_INT: 0X00000000
[ 2458.121361] iwlwifi 0000:03:00.0: CSR_INT_MASK: 0X00000000
[ 2458.121393] iwlwifi 0000:03:00.0: CSR_FH_INT_STATUS: 0X00000000
[ 2458.121426] iwlwifi 0000:03:00.0: CSR_GPIO_IN: 0X00000030
[ 2458.121455] iwlwifi 0000:03:00.0: CSR_RESET: 0X00000000
[ 2458.121486] iwlwifi 0000:03:00.0: CSR_GP_CNTRL: 0X080403c5
[ 2458.121516] iwlwifi 0000:03:00.0: CSR_HW_REV: 0X000000b0
[ 2458.121547] iwlwifi 0000:03:00.0: CSR_EEPROM_REG: 0X62390ffd
[ 2458.121576] iwlwifi 0000:03:00.0: CSR_EEPROM_GP: 0X90000001
[ 2458.121609] iwlwifi 0000:03:00.0: CSR_OTP_GP_REG: 0X00030001
[ 2458.121641] iwlwifi 0000:03:00.0: CSR_GIO_REG: 0X00080042
[ 2458.121674] iwlwifi 0000:03:00.0: CSR_GP_UCODE_REG: 0X000071af
[ 2458.121706] iwlwifi 0000:03:00.0: CSR_GP_DRIVER_REG: 0X00000000
[ 2458.121735] iwlwifi 0000:03:00.0: CSR_UCODE_DRV_GP1: 0X00000000
[ 2458.121767] iwlwifi 0000:03:00.0: CSR_UCODE_DRV_GP2: 0X00000000
[ 2458.121796] iwlwifi 0000:03:00.0: CSR_LED_REG: 0X00000040
[ 2458.121828] iwlwifi 0000:03:00.0: CSR_DRAM_INT_TBL_REG: 0X88210a3e
[ 2458.121860] iwlwifi 0000:03:00.0: CSR_GIO_CHICKEN_BITS: 0X27800200
[ 2458.121893] iwlwifi 0000:03:00.0: CSR_ANA_PLL_CFG: 0X00000000
[ 2458.121925] iwlwifi 0000:03:00.0: CSR_HW_REV_WA_REG: 0X0001001a
[ 2458.121955] iwlwifi 0000:03:00.0: CSR_DBG_HPET_MEM_REG: 0Xffff0000
[ 2458.121959] iwlwifi 0000:03:00.0: FH register values:
[ 2458.122019] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_STTS_WPTR_REG: 0X210f5200
[ 2458.122077] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_RBDCB_BASE_REG: 0X02115ed0
[ 2458.122138] iwlwifi 0000:03:00.0: FH_RSCSR_CHNL0_WPTR: 0X00000000
[ 2458.122196] iwlwifi 0000:03:00.0: FH_MEM_RCSR_CHNL0_CONFIG_REG: 0X80801114
[ 2458.122257] iwlwifi 0000:03:00.0: FH_MEM_RSSR_SHARED_CTRL_REG: 0X000000fc
[ 2458.122315] iwlwifi 0000:03:00.0: FH_MEM_RSSR_RX_STATUS_REG: 0X07030000
[ 2458.122376] iwlwifi 0000:03:00.0: FH_MEM_RSSR_RX_ENABLE_ERR_IRQ2DRV: 0X00000000
[ 2458.122435] iwlwifi 0000:03:00.0: FH_TSSR_TX_STATUS_REG: 0X07ff0001
[ 2458.122495] iwlwifi 0000:03:00.0: FH_TSSR_TX_ERROR_REG: 0X00000000
[ 2458.122504] iwlwifi 0000:03:00.0: Loaded firmware version: 18.168.6.1
[ 2458.122902] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[ 2458.122907] iwlwifi 0000:03:00.0: Status: 0x0000004C, count: 6
[ 2458.122912] iwlwifi 0000:03:00.0: 0x000019A8 | ADVANCED_SYSASSERT
[ 2458.122917] iwlwifi 0000:03:00.0: 0x000166CC | uPc
[ 2458.122921] iwlwifi 0000:03:00.0: 0x0000962A | branchlink1
[ 2458.122925] iwlwifi 0000:03:00.0: 0x0000962A | branchlink2
[ 2458.122929] iwlwifi 0000:03:00.0: 0x0000D6BE | interruptlink1
[ 2458.122933] iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
[ 2458.122937] iwlwifi 0000:03:00.0: 0x00000001 | data1
[ 2458.122941] iwlwifi 0000:03:00.0: 0x20ED7F5E | data2
[ 2458.122946] iwlwifi 0000:03:00.0: 0x00000138 | line
[ 2458.122950] iwlwifi 0000:03:00.0: 0x00018EAD | beacon time
[ 2458.122954] iwlwifi 0000:03:00.0: 0x00000153 | tsf low
[ 2458.122958] iwlwifi 0000:03:00.0: 0x00000000 | tsf hi
[ 2458.122962] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
[ 2458.122966] iwlwifi 0000:03:00.0: 0x8F09138F | time gp2
[ 2458.122970] iwlwifi 0000:03:00.0: 0x00000000 | time gp3
[ 2458.122974] iwlwifi 0000:03:00.0: 0x754312A8 | uCode version
[ 2458.122979] iwlwifi 0000:03:00.0: 0x000000B0 | hw version
[ 2458.122983] iwlwifi 0000:03:00.0: 0x00488700 | board version
[ 2458.122987] iwlwifi 0000:03:00.0: 0x0900004E | hcmd
[ 2458.122991] iwlwifi 0000:03:00.0: 0xAF863080 | isr0
[ 2458.122995] iwlwifi 0000:03:00.0: 0x1141E000 | isr1
[ 2458.122999] iwlwifi 0000:03:00.0: 0x00000F1A | isr2
[ 2458.123003] iwlwifi 0000:03:00.0: 0x0147FCC3 | isr3
[ 2458.123007] iwlwifi 0000:03:00.0: 0x00000000 | isr4
[ 2458.123011] iwlwifi 0000:03:00.0: 0x01000112 | isr_pref
[ 2458.123015] iwlwifi 0000:03:00.0: 0x00024B96 | wait_event
[ 2458.123019] iwlwifi 0000:03:00.0: 0x00000094 | l2p_control
[ 2458.123023] iwlwifi 0000:03:00.0: 0x00000000 | l2p_duration
[ 2458.123027] iwlwifi 0000:03:00.0: 0x0000000F | l2p_mhvalid
[ 2458.123031] iwlwifi 0000:03:00.0: 0x00004080 | l2p_addr_match
[ 2458.123035] iwlwifi 0000:03:00.0: 0x00000005 | lmpm_pmg_sel
[ 2458.123039] iwlwifi 0000:03:00.0: 0x06061222 | timestamp
[ 2458.123043] iwlwifi 0000:03:00.0: 0x00000010 | flow_handler
[ 2458.123220] iwlwifi 0000:03:00.0: Start IWL Event Log Dump: nothing in log
[ 2458.123237] iwlwifi 0000:03:00.0: FW error in SYNC CMD REPLY_TX_LINK_QUALITY_CMD
[ 2458.123244] iwlwifi 0000:03:00.0: Command REPLY_ADD_STA failed: FW Error
[ 2458.123249] iwlwifi 0000:03:00.0: Adding station ff:ff:ff:ff:ff:ff failed.
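
For anyone comparing dumps like the one above across several disconnects, the "value | name" entries of the IWL Error Log Dump can be pulled out mechanically. This is only a rough sketch: the regular expression is inferred from the line layout shown in this report, and `parse_error_log` is a hypothetical helper name, not part of any iwlwifi tooling.

```python
import re

# Matches one "value | name" entry of the error log dump, e.g.
# "[ 2458.122912] iwlwifi 0000:03:00.0: 0x000019A8 | ADVANCED_SYSASSERT"
# (pattern inferred from the log in this report, not from driver sources).
ENTRY = re.compile(r"iwlwifi \S+: (0x[0-9A-Fa-f]+) \| (.+?)\s*$")


def parse_error_log(dmesg_text):
    """Return {name: integer value} for every 'value | name' line found."""
    fields = {}
    for line in dmesg_text.splitlines():
        match = ENTRY.search(line)
        if match:
            fields[match.group(2)] = int(match.group(1), 16)
    return fields


# Three lines copied from the dump above.
sample = """\
[ 2458.122912] iwlwifi 0000:03:00.0: 0x000019A8 | ADVANCED_SYSASSERT
[ 2458.122917] iwlwifi 0000:03:00.0: 0x000166CC | uPc
[ 2458.122946] iwlwifi 0000:03:00.0: 0x00000138 | line
"""
log = parse_error_log(sample)
print(sorted(log))
```

The CSR and FH register lines use a "NAME: 0X..." layout without the pipe, so this pattern skips them; they would need a separate expression.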

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: linux-firmware 1.112
ProcVersionSignature: Ubuntu 3.10.0-2.11-generic 3.10.0
Uname: Linux 3.10.0-2-generic x86_64
ApportVersion: 2.10.2-0ubuntu4
Architecture: amd64
Date: Mon Jul 15 16:21:58 2013
Dependencies:

InstallationDate: Installed on 2013-01-26 (170 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.5)
MarkForUpload: True
PackageArchitecture: all
SourcePackage: linux-firmware
UpgradeStatus: Upgraded to saucy on 2013-06-07 (37 days ago)

Robbie Williamson (robbiew) wrote :
Changed in linux-firmware (Ubuntu Saucy):
status: New → Confirmed
Robbie Williamson (robbiew) wrote :

attaching /var/log/syslog

Robbie Williamson (robbiew) wrote :

lspci -vnvn output attached

Joseph Salisbury (jsalisbury) wrote :

Did this issue just start happening in Saucy? Did it also happen in Raring or other earlier releases?

Changed in linux-firmware (Ubuntu Saucy):
importance: Undecided → High
tags: added: kernel-key
Joseph Salisbury (jsalisbury) wrote :

If you have a chance, it would also be good to know whether v3.11-rc1 exhibits this bug. The 3.11-rc1 kernel can be downloaded from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.11-rc1-saucy/

Robbie Williamson (robbiew) wrote :

Joseph,

So the problem is also occurring on a Dell laptop with a Broadcom card running Precise. It feels like the issue is related to multiple access points with the same SSID, and possibly to Network Manager.

affects: linux-firmware (Ubuntu Saucy) → network-manager (Ubuntu Saucy)
Changed in network-manager (Ubuntu Saucy):
status: Confirmed → New

On Mon, Jul 15, 2013 at 05:16:04PM -0000, Robbie Williamson wrote:
> So the problem is also occurring with a Precise installed Dell laptop
> with a Broadcom card. Feels like the issue is related to multiple
> access points with the same SSID, and possibly Network Manager related.

I'd be very surprised if the Precise/Broadcom and Saucy/Intel problems
had the same root cause. Especially based on these logs, which seem to
indicate a failure condition returned from the hardware/ucode.

Thanks for the feedback, Robbie. I'm going to keep a Linux package task as well, just in case there is anything kernel related.

Changed in linux (Ubuntu Saucy):
importance: Undecided → High
status: New → Incomplete

I see nothing here that points to an issue with NetworkManager, although there are messages about flushing TX queues and about scan requests that can't be initiated.

Is this something that started happening with a new kernel?

Changed in network-manager (Ubuntu Saucy):
status: New → Incomplete
Seth Forshee (sforshee) wrote :

I suspect a driver or ucode issue based on the logs.

Robbie, can you attach the output of running 'sudo iw wlan0 scan' in the environment where you're getting this problem?

Robbie Williamson (robbiew) wrote :

attaching 'sudo iw wlan0 scan' output

Changed in network-manager (Ubuntu Saucy):
status: Incomplete → Confirmed
status: Confirmed → Incomplete
Changed in linux (Ubuntu Saucy):
status: Incomplete → Confirmed
Robbie Williamson (robbiew) wrote :

I agree that this particular issue seems to be Intel driver specific. The Dell related issue feels like a similar symptom resulting from the same problem, but should be addressed separately.

Joseph Salisbury (jsalisbury) wrote :

@Robbie, can you open a separate bug for the Dell issue?

Seth Forshee (sforshee) wrote :

I have identified one issue that looks like it's contributing to the problem. This isn't a kernel bug, but a problem with the data from the router that causes mac80211 to disconnect from the current AP and roam to another one. The test kernel linked to below contains a workaround that _might_ help, if the AP is also sending us the same information in another way. Please install this kernel and then attach syslog after using it for a little while. Also let me know whether or not it's working any better.

http://people.canonical.com/~sforshee/lp1201470/linux-3.10.0-3.12+lp12014670v201307161347/

Seth Forshee (sforshee) wrote :

I uploaded a new kernel, same as the previous one but with extra debug. Please test this one instead.

http://people.canonical.com/~sforshee/lp1201470/linux-3.10.0-3.12+lp1201470v201307171617/

summary: - Intel wireless microcode driver failure in Saucy causes disconnects
+ 8086:0085 Intel wireless microcode driver failure in Saucy causes
+ disconnects
Changed in linux (Ubuntu Saucy):
status: Confirmed → Incomplete
tags: removed: kernel-key
no longer affects: linux (Ubuntu Saucy)

This isn't a network-manager problem, so marking invalid there.

In this case I think the microcode errors are a side effect of frequent reassociations caused by invalid channel switch announcements from the AP. These errors _are_ a problem, but the issues with this specific network can be fixed by ignoring these invalid IEs. I'll post a fix shortly.
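
To illustrate the idea of ignoring these IEs, here is a rough Python sketch. It is not the mac80211 patch itself, just a model of the policy under stated assumptions: the element IDs 37 (Channel Switch Announcement) and 60 (Extended Channel Switch Announcement) are the IEEE 802.11 values, while `parse_ies`, `channel_switch_request` and the `frame_type` strings are hypothetical names made up for this example.

```python
# IEEE 802.11 element IDs for the two channel switch announcement elements.
EID_CHANNEL_SWITCH = 37       # CSA
EID_EXT_CHANNEL_SWITCH = 60   # ECSA


def parse_ies(data):
    """Split a bytes blob of information elements into (id, payload) pairs."""
    ies = []
    pos = 0
    while pos + 2 <= len(data):
        eid, length = data[pos], data[pos + 1]
        payload = data[pos + 2:pos + 2 + length]
        if len(payload) < length:
            break  # truncated element: stop rather than guess
        ies.append((eid, payload))
        pos += 2 + length
    return ies


def channel_switch_request(frame_type, ie_blob):
    """Return (element id, payload) of a channel switch to act on, or None.

    Hypothetical policy for illustration: (E)CSA elements carried in probe
    responses are ignored; other frame types are still examined.
    """
    if frame_type == "probe_response":
        return None
    for eid, payload in parse_ies(ie_blob):
        if eid in (EID_CHANNEL_SWITCH, EID_EXT_CHANNEL_SWITCH):
            return eid, payload
    return None


# An SSID element followed by an ECSA element (mode, operating class,
# new channel, switch count); the values are invented for the example.
blob = bytes([0, 3]) + b"net" + bytes([60, 4, 1, 81, 6, 3])
print(channel_switch_request("probe_response", blob))
print(channel_switch_request("beacon", blob))
```

With this policy the same element blob triggers no channel switch when it arrives in a probe response, which is the behaviour described above; whether the announcement in a beacon is itself valid is a separate check that this sketch does not attempt.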

Changed in network-manager (Ubuntu Saucy):
status: Incomplete → Invalid
Changed in linux (Ubuntu):
status: Incomplete → In Progress
summary: - 8086:0085 Intel wireless microcode driver failure in Saucy causes
- disconnects
+ Invalid ECSA IEs in probe response frames causes connection to drop
Seth Forshee (sforshee) wrote :

And here's the fix. I've tested so I don't need feedback, but this kernel can be used until we get the fix into the archive.

http://people.canonical.com/~sforshee/lp1201470/linux-3.11.0-3.7+lp1201470v201308230917/

Seth Forshee (sforshee) on 2013-08-23
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.11.0-3.8

---------------
linux (3.11.0-3.8) saucy; urgency=low

  [ Johannes Berg ]

  * SAUCE: mac80211: ignore (E)CSA in probe response frames
    - LP: #1201470
 -- Tim Gardner <email address hidden> Fri, 23 Aug 2013 09:47:36 -0600

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Tim Flubshi (flubshi) wrote :

Thanks for your fix. I have two questions:
1.) Is this fix part of the upstream Linux kernel or is it Ubuntu-specific? If it is not upstream, will it be pushed upstream, and in which kernel version?
2.) Does it fix the related bug for the iwl-1000 wifi interface (part of my Lenovo edge e320), which has similar symptoms with multiple access points with the same SSID (eduroam network)?

On Fri, Dec 13, 2013 at 01:58:12PM -0000, Tim Flubshi wrote:
> Thanks for your fix. I have two questions:
> 1.) Is this fix part of the upstream linux kernel or only ubuntu specific? If not: Will this be pushed to upstream and in which kernel version?

The fix is upstream as of 3.12 and has been incorporated into the
relevant upstream stable releases as well. So as long as you're on a
relatively recent version of any currently supported kernel series this
problem should not be present (kernels prior to 3.10 were not affected
by the bug).

> 2.) Does it fix the related bug for the iwl-1000 wifi interface (part of my Lenovo edge e320), which has similar symptoms on multiple access points with the same ssid (eduroam network)?

I can't say for sure, but based on this short description I don't think
it sounds related. This issue isn't related to multiple APs with the
same SSID but rather to a small number of AP models from Netgear which
transmit some invalid data in probe response frames.
