QCA6174 stops working on newer kernels after second group rekeying

Bug #1743279 reported by Andre Brait
This bug affects 22 people
Affects                  Status        Importance  Assigned to   Milestone
linux-firmware (Ubuntu)  Fix Released  Undecided   Unassigned
  Xenial                 Fix Released  Medium      Seth Forshee
  Artful                 Fix Released  Medium      Seth Forshee

Bug Description

After upgrading to the 4.13 kernel on Ubuntu 16.04.3, I noticed my WiFi would stop working about every 20 minutes. The problem initially seemed related to DNS services crashing, judging by what happened in browsers and other software that rely on DNS, but I then noticed I couldn't ping my router or other local devices whose IP addresses I knew. The connection is still shown as connected, but it just doesn't work.

After googling a lot, I came across this question on askubuntu.com

https://askubuntu.com/questions/967355/wifi-unstable-after-17-10-update

Which led me to this bug report on Debian's bug tracker:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=879184

Which led me to this bug in upstream:

http://lists.infradead.org/pipermail/ath10k/2017-September/010088.html

I've tested the proposed fixes myself and I can confirm they work.

What causes the WiFi to stop working is a bug related to the group rekeying routines.

It seems to happen only on kernels later than 4.12, which is why I only ran into problems after 4.13 was pushed as the current rolling HWE kernel for 16.04.3.

kvalo made the fix available in version WLAN.RM.4.4.1-00051-QCARMSWP-1 of the firmware-6.bin file, which is the current one present in upstream.

Updating firmware-6.bin (and, optionally, board-2.bin) to that version or any later one fixes the issue completely.

-------------------------------------------------------------

SRU Justification:
[Impact]
The Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter, available in numerous laptops, including ones that ship with Ubuntu 16.04 pre-installed, silently stops working after the second group rekeying, which usually happens a few minutes after the user has connected to a WiFi network. The connection status remains unchanged but there's no connectivity at all. This effectively disconnects the user without notifying them of what has occurred.

Additionally, this happens for the only HWE kernel that's been patched against the recent Meltdown vulnerability, leaving the user without the option of using a recent kernel and a secure kernel at the same time.

[Test Case]
After installing the updated firmware, check that connectivity is unaffected after the second group rekeying. Rekeying events can be listed with

$ grep 'wpa_.*rekeying' /var/log/syslog
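
For a quicker check, the same pattern can be wrapped in a small helper that counts the rekeying messages logged so far. This is only a sketch: count_rekeys is a made-up name, and it assumes wpa_supplicant logs to /var/log/syslog with lines that match the pattern used in the command above.

```shell
# count_rekeys is a hypothetical helper, not part of any package.
# It counts log lines matching the rekeying pattern from the test case.
# Assumption: wpa_supplicant messages end up in the given log file
# (default: /var/log/syslog).
count_rekeys() {
    rekey_log="${1:-/var/log/syslog}"
    grep -c 'wpa_.*rekeying' "$rekey_log"
}
```

Once count_rekeys reports 2 or more, pinging the router shows whether the connection survived the second rekeying.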

[How to fix it]

Update the firmware-6.bin file to version WLAN.RM.4.4.1-00051-QCARMSWP-1 or later.
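
To check which version a given firmware-6.bin contains before rebooting, the version string can be pulled out of the file itself. The sketch below assumes the image embeds its version as plain ASCII text in the same WLAN.RM form quoted above and that GNU grep is available. fw_version and fw_build_at_least are hypothetical helper names, and comparing only the five-digit build field is a simplification that is meaningful only within the same WLAN.RM.4.4.1 series.

```shell
# Hypothetical helpers, not part of linux-firmware or ath10k.

# Print the first WLAN.RM version string found in a firmware image.
# Assumption: the version is stored in the file as plain ASCII text.
fw_version() {
    LC_ALL=C grep -a -o 'WLAN\.RM\.[0-9.]*-[0-9]*-[A-Z]*-[0-9]*' "$1" | head -n 1
}

# Succeed if the build field of version string $1 (e.g. 00051) is >= $2.
# Leading zeros are stripped so the comparison is plain decimal.
fw_build_at_least() {
    fw_build=$(printf '%s\n' "$1" |
        sed -n 's/^WLAN\.RM\.[0-9.]*-0*\([0-9][0-9]*\)-.*/\1/p')
    [ -n "$fw_build" ] && [ "$fw_build" -ge "$2" ]
}
```

For example, fw_build_at_least "$(fw_version /lib/firmware/ath10k/QCA6174/hw3.0/firmware-6.bin)" 51 would succeed for the fixed firmware named above.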

[Regression Potential]
The new firmware overwrites the old one, but since it's been in upstream since October 2017, it should be good.

-------------------------------------------------------------

Description: Ubuntu 16.04.3 LTS
Release: 16.04

linux-firmware:
  Installed: 1.157.14
  Candidate: 1.157.14
  Version table:
 *** 1.157.14 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main i386 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial-security/main i386 Packages
        100 /var/lib/dpkg/status
     1.157 500
        500 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages
        500 http://archive.ubuntu.com/ubuntu xenial/main i386 Packages

Andre Brait (andrebrait) wrote :

Reading https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/ath10k/QCA6174/hw3.0/firmware-6.bin?id=96a7402d4172f4786ee93dd9f7cb3f76e1a8025e it seems the fix for this particular issue was made available in version WLAN.RM.4.4.1-00051-QCARMSWP-1. Updating board-2.bin and firmware-6.bin to the versions currently available in upstream linux-firmware should fix the issue.

Andre Brait (andrebrait) wrote :

It's important to note that this bug affects any post-4.12 kernel, so it's present in both 16.04.3 HWE and 17.10.

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu):
status: New → Confirmed
Andre Brait (andrebrait) wrote :

Until this is fixed in Ubuntu, users can work around the issue with the following command:

sudo wget https://github.com/kvalo/ath10k-firmware/raw/master/QCA6174/hw3.0/4.4.1/firmware-6.bin_WLAN.RM.4.4.1-00051-QCARMSWP-1 -O /lib/firmware/ath10k/QCA6174/hw3.0/firmware-6.bin

description: updated
Andre Brait (andrebrait) wrote :

Hi there, AceLan,

I've subscribed you to this bug for two reasons: 1) I noticed you reported another bug that had a fix submitted to xenial-proposed a few days ago, and I'm looking for guidance on how to get this bug fixed ASAP (even by me, if that means creating and submitting a new package myself, as per Ubuntu's documentation on fixing bugs), and 2) you're probably affected by it if you own a QCA6174 and are using a kernel later than 4.12.

I'd consider this a high-priority bug because it directly affects users (and in a very significant way), it has been fixed upstream since October 2017, and porting the fix to Ubuntu should be trivial, I guess.

Andre Brait (andrebrait) wrote :

AceLan, you can disregard my last comment. I've found a way to use mIRC in the web browser and contacted the kernel team there. They're already looking into this.

Seth Forshee (sforshee) wrote :

@André: I've updated the firmware in the package below. Can you test this package and confirm it fixes the issue? Thanks!

http://people.canonical.com/~sforshee/lp1743279/linux-firmware_1.157.16~pre201801170951_all.deb

Changed in linux-firmware (Ubuntu):
status: Confirmed → Fix Released
Changed in linux-firmware (Ubuntu Xenial):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → Medium
status: New → In Progress
status: In Progress → Incomplete
Andre Brait (andrebrait) wrote :

@Seth I just confirmed that it does fix the bug.

I forced the re-installation of version 1.157.14 and rebooted. I checked that the old firmware was loaded and that the files in the /lib/firmware/ath10k/QCA6174/hw3.0/ folder were indeed the ones that came with the old package. Then I let the bug happen, which it did.

Then I downloaded your package and installed it with dpkg. Rebooted, checked that the new firmware was loaded, checked the files and it's been 5 group rekeyings without any issues.
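
For anyone repeating this verification, the loaded firmware version can be read from the kernel log. A rough sketch, assuming the ath10k driver prints a line containing "firmware ver <version> api <n>" (the form quoted in other comments on this bug); loaded_fw_version is a hypothetical helper name:

```shell
# loaded_fw_version is a hypothetical helper: it reads kernel log text on
# stdin and prints the version from the last ath10k "firmware ver" line.
# Assumption: the driver logs "... firmware ver <version> api <n> ...".
loaded_fw_version() {
    sed -n 's/.*ath10k.*firmware ver \([^ ]*\) api .*/\1/p' | tail -n 1
}
```

It is meant to be fed from the kernel log, for example dmesg | loaded_fw_version (reading dmesg may require root on some systems).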

Andre Brait (andrebrait) wrote :

@Seth All good after being connected all night without interruptions or slowdowns. It's working great.

Seth Forshee (sforshee) wrote :

Thanks! I've uploaded linux-firmware 1.157.16. Once that hits xenial-proposed, you'll get another request to test the package there.

Changed in linux-firmware (Ubuntu Xenial):
status: Incomplete → Fix Committed
Seth Forshee (sforshee) wrote :

Added artful nomination based on duplicate bug 1744187.

Andre Brait (andrebrait) wrote :

@Seth You might have missed it (there's lots of text here, so it's more than ok :-) ) but I had already said in one comment here that it affects Artful as well (and Bionic, of course, but that is still in development).

Pretty much it affects any distro using a kernel newer than or equal to 4.12. I've posted a report from Debian here and I've found mentions of this bug in Arch as well.

The original question on askubuntu that led me to report the bug (although I had experienced it in Xenial) was from a user running Artful.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu Artful):
status: New → Confirmed
Daniel Carosone (daniel-carosone) wrote :

This was the problem that caused me to roll back to Zesty shortly after Artful was released. I just redid the upgrade because of the EoL, expecting that it surely would have been fixed in the meantime, but alas not.

At least this time I could eventually find a bug and workaround :)

Seth Forshee (sforshee)
Changed in linux-firmware (Ubuntu Artful):
assignee: nobody → Seth Forshee (sforshee)
importance: Undecided → Medium
status: Confirmed → Fix Committed
Łukasz Zemczak (sil2100) wrote :

What's the status of this for artful? I would prefer not pushing a fix for xenial without one in artful as users on this upgrade path would basically get regressions re-introduced. Could someone upload to the artful queue?

Seth Forshee (sforshee) wrote :

I will be uploading for artful shortly.

Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello André, or anyone else affected,

Accepted linux-firmware into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/1.169.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Hello André, or anyone else affected,

Accepted linux-firmware into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/1.157.16 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

oLoBoLGeR (olobolger) wrote :

Hi, I can confirm that linux-firmware 1.157.16 does fix this problem, and it doesn't introduce any regression compared with the previous version. My laptop is a Dell XPS, which includes a Qualcomm Atheros QCA6174 Wireless Network Adapter.

I am on kernel 4.13.0-32 and, using firmware ver WLAN.RM.4.4.1-00051-QCARMSWP-1 api 6, I can confirm that the connection has lasted and performed smoothly for some hours now (before, it was dropping silently after 10 to 15 minutes).

Thank you very much to everyone involved in solving this annoying bug.

Andre Brait (andrebrait) wrote :

I've been testing version 1.157.16 from xenial-proposed for the last few hours. It successfully fixes the issue, and I can confirm that the correct firmware is loaded.

Andre Brait (andrebrait)
tags: added: verification-done-xenial
tags: added: verification-needed-artful
Bruce (bruce-steedman) wrote :

Tested 1.169.3 from artful-proposed and can confirm it fixes the issue on 17.10

Andre Brait (andrebrait)
tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 1.169.3

---------------
linux-firmware (1.169.3) artful; urgency=medium

  * QCA6174 stops working on newer kernels after second group rekeying
    (LP: #1743279)
    - ath10k: QCA6174 hw3.0: update firmware-6.bin to WLAN.RM.4.4.1-00051-QCARMSWP-1

 -- Seth Forshee <email address hidden> Thu, 25 Jan 2018 14:23:42 -0600

Changed in linux-firmware (Ubuntu Artful):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for linux-firmware has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Drew (dboardman3) wrote :
dann frazier (dannf) wrote :

@Drew It means that those fixes have now been made available in the linux-firmware package in xenial-proposed - you don't have to install the individual files.

In this case, you'll want to remove the files you added under /lib/firmware, install the updated linux-firmware package, and reboot to see if it solves your issue.

Here are instructions on verifying a fix from proposed:
  https://wiki.ubuntu.com/QATeam/PerformingSRUVerification

xenial-proposed is a staging area for fixes that need to be verified before moving into the updates stream.

SoundVM (soundvm) wrote :

Hello, the linux-firmware 1.169.3 package doesn't fix the bug on 17.10 for me... quite the reverse.
The WiFi connection was still very unstable and degraded until I switched yesterday to the "kvalo Firmware patches": no more errors like "ath: phy0: DMA failed to stop in 10 ms AR_CR=0x00000024 AR_DIAG_SW=0x02000020 DMADBG_7=0x0000a400" in dmesg, and a high-speed connection.
I had no problems before on 17.04.

ubuntu 17.10 (from 17.04) on Dell XPS8700 with Qualcomm Atheros AR9485 Wireless Network Adapter

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 1.157.16

---------------
linux-firmware (1.157.16) xenial; urgency=medium

  * Connection issue in Bluetooth SPP mode between a Dell Edge Gateway 3000
    and an HC-05 BT module attached to Arduino Uno (LP: #1738773)
    - UBUNTU: linux-firmware: update firmware images for Redpine 9113 chipset

  * QCA6174 stops working on newer kernels after second group rekeying
    (LP: #1743279)
    - ath10k: QCA6174 hw3.0: update firmware-6.bin to WLAN.RM.4.4.1-00051-QCARMSWP-1

 -- Seth Forshee <email address hidden> Thu, 18 Jan 2018 07:29:44 -0600

Changed in linux-firmware (Ubuntu Xenial):
status: Fix Committed → Fix Released
Chai T. Rex (chaitrex) wrote :

@SoundVM, this bug is only about QCA6174, not AR9485, as seen in the bug report title ("QCA6174 stops working on newer kernels after second group rekeying").

You may want to submit a new bug report if you want the issue looked into.

Tugkan Batu (tugkanbatu) wrote :

I think this bug affects me on Ubuntu 18.04 (it did not when I was on 17.10).

Following instructions for updating firmware at
https://wireless.wiki.kernel.org/en/users/Drivers/ath10k/firmware ,
I have replaced the firmware file /lib/firmware/ath10k/QCA6174/hw3.0/firmware-6.bin
(which is firmware-6.bin_WLAN.RM.4.4.1-00079-QCARMSWPZ-1)
with
firmware-6.bin_WLAN.RM.4.4.1-00110-QCARMSWP-1.

The problem persists. Do you think another update to the firmware file is needed?

Tugkan Batu (tugkanbatu) wrote :

Actually, not an upgrade but a downgrade seems to have solved my problem.
I have replaced the firmware file /lib/firmware/ath10k/QCA6174/hw3.0/firmware-6.bin with
firmware-6.bin_WLAN.RM.4.4.1-00065-QCARMSWP-1
(which is what Ubuntu 17.10 might have had)
and I have not had a single outage after a whole day. Before the change, the outages were very frequent.

So, it seems that the newer firmware files do not play nicely with my WiFi.

Hopefully, this helps to rectify the problem on Ubuntu 18.04 with the new driver versions. Please let me know if I can provide further information to help with the diagnosis.

Andre Brait (andrebrait) wrote :

Unless your dropouts were happening exactly after the second group rekeying, it is a different bug that would need its own bug report.

I haven't had any problems with 18.04 or any distro that uses newer kernels and firmwares (say Arch).

Tugkan Batu (tugkanbatu) wrote :

Thanks Andre. I am not sure how to confirm whether "the dropouts were happening exactly after the second group rekeying." The apparent behaviour fit the descriptions well, so I assumed it might be related.
I saw the following command mentioned (possibly for diagnosis), but I was not clear on what to expect:
cat /var/log/syslog | grep wpa_.*rekeying

How can I confirm whether my problems are related to the second group rekeying?

Apologies if it turns out that my bug is not relevant to this bug entry.
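
One possible way to check this is to print the log line for the second rekeying and compare its timestamp with the moment the connection stopped working. A sketch only, assuming the rekeying messages are in /var/log/syslog and match the pattern from the command above; nth_rekey is a hypothetical helper name. Note that it counts from the start of the log file, not from the last reconnect.

```shell
# nth_rekey is a hypothetical helper: print the Nth log line that matches
# the rekeying pattern, so its timestamp can be compared with a dropout.
# Assumption: wpa_supplicant messages end up in the given log file
# (default: /var/log/syslog).
nth_rekey() {
    nth_n="$1"
    nth_log="${2:-/var/log/syslog}"
    grep 'wpa_.*rekeying' "$nth_log" | sed -n "${nth_n}p"
}
```

For example, nth_rekey 2 prints the second matching line. If the dropouts begin right after that timestamp, the behaviour fits this bug; if they happen at unrelated times, it is probably a different problem.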

Nicholas Godridge (nickyg23) wrote :

Same bug (apparently): Killer 1435 QCA6174. I upgraded board-2.bin and firmware-6.bin to the latest available on the ath10k but still experience disconnections, though with slightly more time between drops (from every 1-5 minutes to once every 5-10).

Eugene Savelov (savelov) wrote :

Regarding latest firmware - I am currently testing updated board-2.bin and firmware-6.bin version 4.4.1.c3 instead of stock 4.4.1 from https://github.com/kvalo/ath10k-firmware/tree/master/QCA6174/hw3.0/4.4.1.c3
