QCA6174 stops working on newer kernels after second group rekeying

Bug #1743279 reported by André Brait on 2018-01-15
This bug affects 17 people
Affects                  Importance  Assigned to
linux-firmware (Ubuntu)  Undecided   Unassigned
  Xenial                 Medium      Seth Forshee
  Artful                 Medium      Seth Forshee

Bug Description

After upgrading to the 4.13 kernel on Ubuntu 16.04.3, I noticed my WiFi would stop working every 20 minutes or so. At first the problem seemed related to some DNS service crashing, judging by what happened in browsers and other software that rely on DNS, but then I noticed I couldn't even ping my router or other local devices whose IP addresses I knew. The connection is still shown as connected, but it just doesn't work.

After googling a lot, I came across this question on askubuntu.com

https://askubuntu.com/questions/967355/wifi-unstable-after-17-10-update

Which led me to this bug report on Debian's bug tracker:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=879184

Which led me to this bug in upstream:

http://lists.infradead.org/pipermail/ath10k/2017-September/010088.html

I've tested the proposed fixes myself and I can confirm they work.

What causes the WiFi to stop working is a bug related to the group rekeying routines.

It seems to happen only in >4.12 kernels, which is why I only had problems after 4.13 was pushed as the current rolling HWE kernel for 16.04.3.

kvalo made the fix available in version WLAN.RM.4.4.1-00051-QCARMSWP-1 of the firmware-6.bin file, which is the current one present in upstream.

Updating firmware-6.bin (and, optionally, board-2.bin) to that version or any later one fixes the issue completely.

-------------------------------------------------------------

SRU Justification:
[Impact]
The Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter, found in numerous laptops, including ones that ship with Ubuntu 16.04 pre-installed, silently stops working after the second group rekeying, which usually happens a few minutes after the user has connected to a WiFi network. The connection status remains unchanged but there is no connectivity at all. This effectively disconnects users without notifying them of what has occurred.

Additionally, this happens for the only HWE kernel that's been patched against the recent Meltdown vulnerability, leaving the user without the option of using a recent kernel and a secure kernel at the same time.

[Test Case]
After installing the required firmware, check that connectivity is unaffected after the second group rekeying. Rekeying events can be found with

$ grep 'wpa_.*rekeying' /var/log/syslog
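
The test case depends on knowing when the second rekeying has happened, so a small helper that counts the logged events may be handy. This is only a sketch: the function name count_rekeys is made up for illustration, and it assumes wpa_supplicant logs its rekeying messages to the file given (by default /var/log/syslog, as in the command above).

```shell
#!/bin/sh
# count_rekeys [LOGFILE]
# Print how many lines in LOGFILE (default: /var/log/syslog) match the same
# pattern used in the test case above.
count_rekeys() {
    logfile="${1:-/var/log/syslog}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'wpa_.*rekeying' "$logfile" || true
}

# Usage:
#   count_rekeys                # current syslog
#   count_rekeys /var/log/syslog.1
```

Once the count reaches 2 or more, pinging the router shows whether the connection survived the second rekeying.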

[How to fix it]

Update the firmware-6.bin file to version WLAN.RM.4.4.1-00051-QCARMSWP-1 or later.
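
To make "or later" concrete, here is a rough sketch of how a version string could be compared against the fixed build. The function name fw_is_fixed is hypothetical, and the sketch assumes that the five-digit field after "WLAN.RM.4.4.1-" is a build number that increases with newer releases; it deliberately refuses to judge strings from any other branch.

```shell
#!/bin/sh
# fw_is_fixed VERSION
# Exit status: 0 if VERSION is build 00051 or later of WLAN.RM.4.4.1,
#              1 if it is an earlier build of that branch,
#              2 if the string is from another branch or cannot be parsed.
fw_is_fixed() {
    fixed_build=51                  # from WLAN.RM.4.4.1-00051-QCARMSWP-1
    case "$1" in
        WLAN.RM.4.4.1-*) ;;
        *) return 2 ;;
    esac
    build=${1#WLAN.RM.4.4.1-}       # e.g. "00051-QCARMSWP-1"
    build=${build%%-*}              # e.g. "00051"
    case "$build" in
        ''|*[!0-9]*) return 2 ;;    # not a plain number
    esac
    # Strip leading zeros so the value is compared as a decimal number.
    build=$(printf '%s\n' "$build" | sed 's/^0*//')
    [ "${build:-0}" -ge "$fixed_build" ]
}

# Usage:
#   fw_is_fixed "WLAN.RM.4.4.1-00051-QCARMSWP-1" && echo "contains the fix"
```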

[Regression Potential]
The new firmware overwrites the old one, but since it's been in upstream since October 2017, it should be good.

-------------------------------------------------------------

Description: Ubuntu 16.04.3 LTS
Release: 16.04

linux-firmware:
  Installed: 1.157.14
  Candidate: 1.157.14
  Version table:
 *** 1.157.14 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-security/main i386 Packages
        100 /var/lib/dpkg/status
     1.157 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial/main i386 Packages

André Brait (andrebrait) wrote :

Reading https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/ath10k/QCA6174/hw3.0/firmware-6.bin?id=96a7402d4172f4786ee93dd9f7cb3f76e1a8025e it seems the fix for this particular issue was made available in version WLAN.RM.4.4.1-00051-QCARMSWP-1. Updating board-2.bin and firmware-6.bin to the versions currently available in upstream linux-firmware should fix the issue.

André Brait (andrebrait) wrote :

It's important to note that this bug affects any post-4.12 kernel, so it's present in both 16.04.3 HWE and 17.10.

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu):
status: New → Confirmed
André Brait (andrebrait) wrote :

While this isn't fixed in Ubuntu, users can fix this issue with the following command:

sudo wget https://github.com/kvalo/ath10k-firmware/raw/master/QCA6174/hw3.0/4.4.1/firmware-6.bin_WLAN.RM.4.4.1-00051-QCARMSWP-1 -O /lib/firmware/ath10k/QCA6174/hw3.0/firmware-6.bin
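
The command above overwrites the packaged file in place. A slightly more cautious variant might keep a copy of the original and only replace the file once the download has succeeded. The following is a sketch, not a tested procedure: the helper names backup_once and fetch_firmware and the ".orig" and ".new" suffixes are made up for illustration, while the URL and destination path are the ones from the command above.

```shell
#!/bin/sh
# URL and destination copied from the command above.
FW_URL='https://github.com/kvalo/ath10k-firmware/raw/master/QCA6174/hw3.0/4.4.1/firmware-6.bin_WLAN.RM.4.4.1-00051-QCARMSWP-1'
FW_PATH='/lib/firmware/ath10k/QCA6174/hw3.0/firmware-6.bin'

# backup_once FILE: copy FILE to FILE.orig unless a backup already exists,
# so repeated runs never overwrite the copy of the packaged original.
backup_once() {
    if [ -f "$1" ] && [ ! -e "$1.orig" ]; then
        cp -p "$1" "$1.orig"
    fi
}

# fetch_firmware URL DEST: download to a temporary name first and replace
# DEST only if the download succeeded ("wget -O DEST" on its own truncates
# DEST even when the transfer fails).
fetch_firmware() {
    backup_once "$2" || return 1
    if wget "$1" -O "$2.new"; then
        mv "$2.new" "$2"
    else
        rm -f "$2.new"
        return 1
    fi
}

# Run as root, then reboot so the new firmware is loaded:
#   fetch_firmware "$FW_URL" "$FW_PATH"
```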

description: updated
André Brait (andrebrait) wrote :

Hi there, AceLan,

I've subscribed you to this bug for two reasons: 1) I noticed you reported another bug that had a fix submitted to xenial-proposed a few days ago, and I'm looking for guidance on how to get this bug fixed ASAP (even by me, if that means creating and submitting a new package myself, as per Ubuntu's documentation on fixing bugs), and 2) you're probably affected by it if you own a QCA6174 and use a kernel later than 4.12.

I'd consider this a high-priority bug because it directly affects users (and in a very significant way), it has been fixed upstream since October 2017, and porting the fix to Ubuntu should be trivial, I guess.

André Brait (andrebrait) wrote :

AceLan, you can disregard the last comment. I've found a way to use mIRC in the web browser and I've contacted the Kernel team there. They're already looking into this.

Seth Forshee (sforshee) wrote :

@André: I've updated the firmware in the package below. Can you test this package and confirm it fixes the issue? Thanks!

http://people.canonical.com/~sforshee/lp1743279/linux-firmware_1.157.16~pre201801170951_all.deb

Changed in linux-firmware (Ubuntu):
status: Confirmed → Fix Released
Changed in linux-firmware (Ubuntu Xenial):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → Medium
status: New → In Progress
status: In Progress → Incomplete
André Brait (andrebrait) wrote :

@Seth I just confirmed that it does fix the bug.

I forced the re-installation of version 1.157.14 and rebooted. I checked that the old firmware was loaded and that the files in the /lib/firmware/ath10k/QCA6174/hw3.0/ folder were indeed the ones that came with the old package. Then I let the bug happen, which it did.

Then I downloaded your package and installed it with dpkg. Rebooted, checked that the new firmware was loaded, checked the files and it's been 5 group rekeyings without any issues.
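
For anyone repeating this check, one way to see which firmware was loaded is to look for the ath10k "firmware ver" line in the kernel log. The helper below is only a sketch: the name loaded_fw_version is made up, and it assumes the driver logs a line containing "ath10k" followed by "firmware ver <version>", which may differ between kernel versions.

```shell
#!/bin/sh
# loaded_fw_version: read kernel log text on stdin and print the firmware
# version from the most recent ath10k "firmware ver ..." line, if any.
loaded_fw_version() {
    sed -n 's/.*ath10k.*firmware ver \([^ ]*\).*/\1/p' | tail -n 1
}

# Usage:
#   dmesg | loaded_fw_version
```

An empty result means no matching line was found, for example because the kernel log buffer has already rotated past the boot messages.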

André Brait (andrebrait) wrote :

@Seth All good after a whole night connected, with no interruptions or slowdowns. It's working great.

Seth Forshee (sforshee) wrote :

Thanks! I've uploaded linux-firmware 1.157.16; once that hits xenial-proposed you'll get another request to test the package there.

Changed in linux-firmware (Ubuntu Xenial):
status: Incomplete → Fix Committed
Seth Forshee (sforshee) wrote :

Added artful nomination based on duplicate bug 1744187.

André Brait (andrebrait) wrote :

@Seth You might have missed it (there's lots of text here, so it's more than ok :-) ) but I had already said in one comment here that it affects Artful as well (and Bionic, of course, but that is still in development).

Pretty much it affects any distro using a kernel newer than or equal to 4.12. I've posted a report from Debian here and I've found mentions of this bug in Arch as well.

The original question on askubuntu that led me to report the bug (although I had experienced it in Xenial) was from a user running Artful.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu Artful):
status: New → Confirmed

This was the problem that caused me to roll back to Zesty shortly after Artful was released. I just redid the upgrade because of the EoL, expecting that it surely would have been fixed in the meantime, but alas not.

At least this time I could eventually find a bug and workaround :)

Seth Forshee (sforshee) on 2018-01-24
Changed in linux-firmware (Ubuntu Artful):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → Medium
status: Confirmed → Fix Committed
Łukasz Zemczak (sil2100) wrote :

What's the status of this for artful? I would prefer not pushing a fix for xenial without one in artful as users on this upgrade path would basically get regressions re-introduced. Could someone upload to the artful queue?

Seth Forshee (sforshee) wrote :

I will be uploading for artful shortly.

Hello André, or anyone else affected,

Accepted linux-firmware into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/1.169.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Hello André, or anyone else affected,

Accepted linux-firmware into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/1.157.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

oLoBoLGeR (olobolger) wrote :

Hi, I confirm that linux-firmware 1.157.16 indeed fixes this problem, and it doesn't introduce any regression compared with the previous version. My laptop is a Dell XPS, which includes a Qualcomm Atheros QCA6174 Wireless Network Adapter.

I am on kernel 4.13.0-32 and, using firmware ver WLAN.RM.4.4.1-00051-QCARMSWP-1 api 6, I can confirm that the connection lasts and has performed smoothly for some hours now (before, it was dropping silently after 10 to 15 minutes).

Thank you very much to everyone involved in solving this annoying bug.

André Brait (andrebrait) wrote :

I've been testing version 1.157.16 from xenial-proposed for the last few hours. It successfully fixes the issue, and I can confirm that the correct firmware is loaded.

tags: added: verification-done-xenial
tags: added: verification-needed-artful

Tested 1.169.3 from artful-proposed and can confirm it fixes the issue on 17.10

tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 1.169.3

---------------
linux-firmware (1.169.3) artful; urgency=medium

  * QCA6174 stops working on newer kernels after second group rekeying
    (LP: #1743279)
    - ath10k: QCA6174 hw3.0: update firmware-6.bin to WLAN.RM.4.4.1-00051-QCARMSWP-1

 -- Seth Forshee <email address hidden> Thu, 25 Jan 2018 14:23:42 -0600

Changed in linux-firmware (Ubuntu Artful):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for linux-firmware has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

dann frazier (dannf) wrote :

@Drew It means that those fixes have now been made available in the linux-firmware package in xenial-proposed - you don't have to install the individual files.

In this case, you'll want to remove the files you added under /lib/firmware, install the updated linux-firmware package, and reboot to see if it solves your issue.

Here are instructions on verifying a fix from proposed:
  https://wiki.ubuntu.com/QATeam/PerformingSRUVerification

xenial-proposed is a staging area for fixes that need to be verified before moving into the updates stream.

SoundVM (soundvm) wrote :

Hello, the package linux-firmware - 1.169.3 doesn't fix the bug on 17.10 for me... quite the reverse.
The WiFi connection was still very unstable and degraded until I switched yesterday to the "kvalo Firmware patches": no more errors like "ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02000020 DMADBG_7=0x0000a400" in dmesg, and a high-speed connection.
I had no problems before on 17.04.

ubuntu 17.10 (from 17.04) on Dell XPS8700 with Qualcomm Atheros AR9485 Wireless Network Adapter

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 1.157.16

---------------
linux-firmware (1.157.16) xenial; urgency=medium

  * Connection issue in Bluetooth SPP mode between a Dell Edge Gateway 3000
    and an HC-05 BT module attached to Arduino Uno (LP: #1738773)
    - UBUNTU: linux-firmware: update firmware images for Redpine 9113 chipset

  * QCA6174 stops working on newer kernels after second group rekeying
    (LP: #1743279)
    - ath10k: QCA6174 hw3.0: update firmware-6.bin to WLAN.RM.4.4.1-00051-QCARMSWP-1

 -- Seth Forshee <email address hidden> Thu, 18 Jan 2018 07:29:44 -0600

Changed in linux-firmware (Ubuntu Xenial):
status: Fix Committed → Fix Released
Chai T. Rex (chaitrex) wrote :

@SoundVM, this bug is only about QCA6174, not AR9485, as seen in the bug report title ("QCA6174 stops working on newer kernels after second group rekeying").

You may want to submit a new bug report if you want the issue looked into.
