iwlwifi fails to work on machines with 16 or more logical CPUs

Bug #1805088 reported by AceLan Kao on 2018-11-26
This bug affects 3 people
Affects               Status        Importance  Assigned to  Milestone
HWE Next              Fix Released  Undecided   Unassigned
linux (Ubuntu)        Fix Released  Undecided   AceLan Kao
  Bionic              Fix Released  Undecided   Unassigned
linux-oem (Ubuntu)    Fix Released  Undecided   Unassigned
  Bionic              Fix Released  Undecided   Unassigned

Bug Description

[Impact]
A kernel BUG appears in dmesg when the iwlwifi driver is loaded on machines that have 16 or more logical CPUs:
   kernel: RIP: 0010:iwl_pcie_rxq_alloc_rbs+0x1d0/0x1f0 [iwlwifi]
As a result, wifi cannot scan for any APs, and the system hangs while suspending.

[Fix]
The following commit, included in v4.17, fixes the issue:
   ab1068d6866e iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs

[Regression Potential]
Low.

AceLan Kao (acelankao) wrote:

commit ab1068d6866e28bf6427ceaea681a381e5870a4a
Author: Hao Wei Tee <email address hidden>
Date: Tue May 29 10:25:17 2018 +0300

    iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs

    When there are 16 or more logical CPUs, we request for
    `IWL_MAX_RX_HW_QUEUES` (16) IRQs only as we limit to that number of
    IRQs, but later on we compare the number of IRQs returned to
    nr_online_cpus+2 instead of max_irqs, the latter being what we
    actually asked for. This ends up setting num_rx_queues to 17 which
    causes lots of out-of-bounds array accesses later on.

    Compare to max_irqs instead, and also add an assertion in case
    num_rx_queues > IWM_MAX_RX_HW_QUEUES.

    This fixes https://bugzilla.kernel.org/show_bug.cgi?id=199551

    Fixes: 2e5d4a8f61dc ("iwlwifi: pcie: Add new configuration to enable MSIX")
    Signed-off-by: Hao Wei Tee <email address hidden>
    Tested-by: Sara Sharon <email address hidden>
    Signed-off-by: Luca Coelho <email address hidden>
    Signed-off-by: Kalle Valo <email address hidden>

AceLan Kao (acelankao) on 2018-11-26
tags: added: originate-from-1804367 somerville
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
status: Incomplete → In Progress
David Jordan (dmj726) wrote:

I encountered this iwlwifi failure while testing the Intel 9260 with the AMD Ryzen 7 2700X (8 cores, 16 threads). By bisecting the kernel, I was able to narrow it down to the same commit. I can also confirm that setting maxcpus=15 fixes wifi (at the cost of performance).

This is a really important fix for certain System76 products. We'd really like to see it backported so that Ubuntu works without compromises on our machines. I'm happy to test any test or -proposed kernels.

AceLan Kao (acelankao) wrote:

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
David Jordan (dmj726) wrote:

There isn't yet a new kernel in *bionic* -proposed to solve this issue, so I can't test it.

I have verified that building the 4.15.0-42.45 kernel with the patch (https://patchwork.kernel.org/patch/10382693/) applied fixes wifi for 16+ threaded systems on bionic.

AceLan Kao (acelankao) wrote:

A new kernel, Ubuntu-4.15.0-43.46, was just uploaded,
but it looks like the kernel team missed this patch.
According to the schedule [1], the next kernel in -proposed will be on Jan. 21st, 2019.

BTW, comment #3 above refers to the OEM kernel 4.15.0-1029.

1. http://kernel.ubuntu.com/

Changed in linux-oem (Ubuntu Bionic):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux-oem - 4.15.0-1030.35

---------------
linux-oem (4.15.0-1030.35) bionic; urgency=medium

  * linux-oem: 4.15.0-1030.35 -proposed tracker (LP: #1806663)

  * Add HMS CAN driver for Dell Edge Gateways (LP: #1807339)
    - SAUCE: (no-up) add IXXAT USB-to-CAN driver

  * Add support for 0cf3:535b QCA_ROME device (LP: #1807333)
    - Bluetooth: btusb: Add support for 0cf3:535b QCA_ROME device

  * Add support for Dell DW5821e WWAN/GPS module (LP: #1807342)
    - qmi_wwan: add support for the Dell Wireless 5821e module
    - qmi_wwan: fix interface number for DW5821e production firmware
    - USB: option: add support for DW5821e

  * Fix Terminus USB hub that may breaks connected USB devices after S3
    (LP: #1806850)
    - USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub

  * The line-out on the Dell Dock station can't work (LP: #1806532)
    - ALSA: usb-audio: Allow to override the longname string
    - ALSA: usb-audio: Give proper vendor/product name for Dell WD15 Dock
    - ALSA: usb-audio: Add vendor and product name for Dell WD19 Dock

  * Enable new Realtek card reader (LP: #1806335)
    - USB: usb-storage: Add new IDs to ums-realtek
    - SAUCE: (noup) USB: usb-storage: Make MMC support optional on ums-realtek

  [ Ubuntu: 4.15.0-43.46 ]

  * linux: 4.15.0-43.46 -proposed tracker (LP: #1806659)
  * System randomly hangs during suspend when mei_wdt is loaded (LP: #1803942)
    - SAUCE: base/dd: limit release function changes to vfio driver only
  * Workaround CSS timeout on AMD SNPS 3.0 xHC (LP: #1806838)
    - xhci: Allow more than 32 quirks
    - xhci: workaround CSS timeout on AMD SNPS 3.0 xHC
  * linux-buildinfo: pull out ABI information into its own package
    (LP: #1806380)
    - [Packaging] limit preparation to linux-libc-dev in headers
    - [Packaging] commonise debhelper invocation
    - [Packaging] ABI -- accumulate abi information at the end of the build
    - [Packaging] buildinfo -- add basic build information
    - [Packaging] buildinfo -- add firmware information to the flavour ABI
    - [Packaging] buildinfo -- add compiler information to the flavour ABI
    - [Packaging] buildinfo -- add buildinfo support to getabis
    - [Config] buildinfo -- add retpoline version markers
  * linux packages should own /usr/lib/linux/triggers (LP: #1770256)
    - [Packaging] own /usr/lib/linux/triggers
  * CVE-2018-12896
    - posix-timers: Sanitize overrun handling
  * CVE-2018-16276
    - USB: yurex: fix out-of-bounds uaccess in read handler
  * CVE-2018-10902
    - ALSA: rawmidi: Change resized buffers atomically
  * CVE-2018-18710
    - cdrom: fix improper type cast, which can leat to information leak.
  * CVE-2018-18690
    - xfs: don't fail when converting shortform attr to long form during
      ATTR_REPLACE
  * CVE-2018-14734
    - infiniband: fix a possible use-after-free bug
  * CVE-2018-18445
    - bpf: 32-bit RSH verification must truncate input before the ALU op
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

 -- Chia-Lin Kao (AceLan) <email address hidden> Sat, 08 Dec 2018 11:02:54 +0800

Changed in linux-oem (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux-oem (Ubuntu):
status: New → Fix Released
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
AceLan Kao (acelankao) on 2019-01-17
tags: added: verification-done-bionic
removed: verification-needed-bionic
Changed in hwe-next:
status: New → Fix Released
AceLan Kao (acelankao) on 2019-07-30
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released