RTL8821AE abruptly halts all network traffic

Bug #1618267 reported by Thor H. Johansen
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Computer: Lenovo IdeaPad 110
WiFi AP: Linksys E2500 v1.0 with firmware v2.0.00

After a few minutes of mixed Internet use, the WiFi hangs and the interface ceases to pass traffic:

64 bytes from 8.8.8.8: icmp_seq=281 ttl=55 time=12.8 ms
64 bytes from 8.8.8.8: icmp_seq=282 ttl=55 time=12.5 ms
64 bytes from 8.8.8.8: icmp_seq=283 ttl=55 time=58.4 ms
64 bytes from 8.8.8.8: icmp_seq=284 ttl=55 time=68.0 ms
64 bytes from 8.8.8.8: icmp_seq=285 ttl=55 time=65.6 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
^C
--- 8.8.8.8 ping statistics ---
314 packets transmitted, 283 received, 9% packet loss, time 366615ms
rtt min/avg/max/mdev = 11.423/177.382/7227.733/806.648 ms, pipe 8
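As a quick sanity check, the loss figure in the summary line follows directly from the two packet counters. This is a minimal sketch only: the function name is made up for illustration, and rounding down to a whole percent is an assumption chosen because it matches the figure ping printed here.

```python
import re

def parse_ping_summary(line):
    """Extract transmitted/received counts and the loss percentage."""
    m = re.search(r"(\d+) packets transmitted, (\d+) received", line)
    if m is None:
        raise ValueError("not a ping summary line")
    sent, received = int(m.group(1)), int(m.group(2))
    # Integer division rounds down, which reproduces the whole-percent
    # figure shown in the report (assumption about ping's formatting).
    loss_pct = (sent - received) * 100 // sent
    return sent, received, loss_pct

summary = "314 packets transmitted, 283 received, 9% packet loss, time 366615ms"
print(parse_ping_summary(summary))
```

Here 31 of 314 packets went unanswered, which is just under 10%.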

Before the driver hangs, speeds are 30/5 Mbps despite a 75 Mbps WiFi transmit rate and a 70/70 Mbps symmetrical fiber connection in the house. The connection and the router are known to perform better than this with other equipment.

Forcing the router into 802.11g mode does, at least at a cursory glance, seem to mitigate the halting problem, but speeds remain abysmal.

If there are any technically involved tests I can run to speed up debugging of this problem, let me know. I have some experience with embedded development in C.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-34-generic 4.4.0-34.53
ProcVersionSignature: Ubuntu 4.4.0-34.53-generic 4.4.15
Uname: Linux 4.4.0-34-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: thor 3179 F.... pulseaudio
 /dev/snd/controlC0: thor 3179 F.... pulseaudio
CurrentDesktop: Unity
Date: Tue Aug 30 01:36:22 2016
HibernationDevice: RESUME=UUID=a482ae0a-dd87-41de-be2c-50541edcfa53
InstallationDate: Installed on 2016-08-29 (0 days ago)
InstallationMedia: Ubuntu 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
MachineType: LENOVO 80TJ
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-34-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-34-generic N/A
 linux-backports-modules-4.4.0-34-generic N/A
 linux-firmware 1.157.3
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/24/2016
dmi.bios.vendor: Lenovo
dmi.bios.version: 1QCN16WW
dmi.board.asset.tag: No Asset Tag
dmi.board.name: Nano 5A8
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40700 WIN
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo ideapad 110-15ACL
dmi.modalias: dmi:bvnLenovo:bvr1QCN16WW:bd03/24/2016:svnLENOVO:pn80TJ:pvrLenovoideapad110-15ACL:rvnLENOVO:rnNano5A8:rvrSDK0J40700WIN:cvnLENOVO:ct10:cvrLenovoideapad110-15ACL:
dmi.product.name: 80TJ
dmi.product.version: Lenovo ideapad 110-15ACL
dmi.sys.vendor: LENOVO

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
penalvch (penalvch) wrote :

Thor H. Johansen, thank you for reporting this and helping make Ubuntu better.

In order to allow additional upstream developers to examine the issue, at your earliest convenience, could you please test the latest upstream kernel available from http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D ? Please keep in mind the following:
1) The one to test is on the very top line of the page (not the daily folder).
2) The release names are irrelevant.
3) The folder time stamps aren't indicative of when the kernel actually was released upstream.
4) Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds .

If testing on your main install would be inconvenient, one may:
1) Install Ubuntu to a different partition and then test this there.
2) Back up or clone the primary install.

If the latest kernel does not allow you to test for the issue (e.g. you cannot boot into the OS), please comment on this in your report and continue with the next most recent kernel version until you can test for the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this issue is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Where X and Y are the first two numbers of the kernel version, and Z is the release candidate number, if it exists.

If the mainline kernel does not fix the issue, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ
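The tag naming scheme above can be sketched in a few lines. This is only an illustration of the X.Y-rcZ convention described in this comment: the function name `upstream_tags` is hypothetical, and the version string passed in is just an example input.

```python
import re

def upstream_tags(version, fixed):
    """Build the two Launchpad tags for a tested mainline kernel version.

    version: e.g. "4.8-rc3" or "v4.8-rc3"; the -rcZ suffix is optional.
    fixed:   True if the bug is gone on that kernel, False if it persists.
    """
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.\d+)?(-rc\d+)?", version)
    if m is None:
        raise ValueError(f"unrecognised kernel version: {version!r}")
    base = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    # Only the first two version numbers and the optional -rcZ are used.
    suffix = f"{m.group(1)}.{m.group(2)}{m.group(3) or ''}"
    return [base, f"{base}-{suffix}"]

print(upstream_tags("v4.8-rc3", fixed=False))
```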

Please note that a failure to install the kernel does not meet the criteria for kernel-bug-exists-upstream.

Also, you don't need to apport-collect further unless specifically requested to do so.

Once testing of the latest upstream kernel is complete, it is most helpful if you mark this report Status Confirmed.

Lastly, to keep this issue relevant to upstream, please continue to test the latest mainline kernel as it becomes available.

Thank you for your help.

tags: added: bios-outdated-1qcn32ww
Changed in linux (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
Thor H. Johansen (thorhajo) wrote :

Got these warnings during mainline kernel install:

W: Possible missing firmware /lib/firmware/radeon/hainan_k_smc.bin for module radeon
W: Possible missing firmware /lib/firmware/radeon/oland_k_smc.bin for module radeon
W: Possible missing firmware /lib/firmware/radeon/verde_k_smc.bin for module radeon
W: Possible missing firmware /lib/firmware/radeon/pitcairn_k_smc.bin for module radeon
W: Possible missing firmware /lib/firmware/radeon/tahiti_k_smc.bin for module radeon
W: Possible missing firmware /lib/firmware/radeon/hawaii_k_smc.bin for module radeon
W: Possible missing firmware /lib/firmware/radeon/bonaire_k_smc.bin for module radeon

Downloaded the missing files from...

https://people.freedesktop.org/~agd5f/radeon_ucode/k/

...and placed them in /lib/firmware/radeon/, just in case. This removed the warnings. About to reboot to test out the new v4.8-rc3 kernel.

Thor H. Johansen (thorhajo) wrote :

Testing on the mainline v4.8-rc3 kernel, the first thing that happens is that the NIC dies before I can even use it. Browser just sits there, so I try to ping:

thor@thor-ideapad:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
^C
--- 8.8.8.8 ping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 28251ms

Then I unload the module to try to recover:

thor@thor-ideapad:~$ sudo modprobe -r rtl8821ae
thor@thor-ideapad:~$ sudo modprobe rtl8821ae

The modprobe -r call sits there for several seconds trying to unload before it returns to the command line. I think something isn't responding as it should, because kernel modules usually unload almost instantly.

So anyway, I now manage to stay connected, so I ping:

thor@thor-ideapad:~$ ping 8.8.8.8
...
...
...
64 bytes from 8.8.8.8: icmp_seq=16 ttl=55 time=8203 ms
64 bytes from 8.8.8.8: icmp_seq=17 ttl=55 time=7154 ms
64 bytes from 8.8.8.8: icmp_seq=18 ttl=55 time=6130 ms
64 bytes from 8.8.8.8: icmp_seq=19 ttl=55 time=5106 ms
64 bytes from 8.8.8.8: icmp_seq=20 ttl=55 time=4084 ms
64 bytes from 8.8.8.8: icmp_seq=21 ttl=55 time=3060 ms
64 bytes from 8.8.8.8: icmp_seq=22 ttl=55 time=2036 ms
64 bytes from 8.8.8.8: icmp_seq=23 ttl=55 time=12.7 ms
64 bytes from 8.8.8.8: icmp_seq=24 ttl=55 time=47.7 ms
64 bytes from 8.8.8.8: icmp_seq=25 ttl=55 time=12.1 ms
64 bytes from 8.8.8.8: icmp_seq=26 ttl=55 time=12.1 ms
64 bytes from 8.8.8.8: icmp_seq=27 ttl=55 time=11.5 ms
64 bytes from 8.8.8.8: icmp_seq=28 ttl=55 time=11.8 ms
64 bytes from 8.8.8.8: icmp_seq=29 ttl=55 time=11.7 ms
64 bytes from 8.8.8.8: icmp_seq=30 ttl=55 time=12.2 ms
64 bytes from 8.8.8.8: icmp_seq=31 ttl=55 time=11.4 ms
64 bytes from 8.8.8.8: icmp_seq=32 ttl=55 time=21.4 ms
64 bytes from 8.8.8.8: icmp_seq=33 ttl=55 time=13.4 ms
64 bytes from 8.8.8.8: icmp_seq=34 ttl=55 time=11.5 ms
64 bytes from 8.8.8.8: icmp_seq=35 ttl=55 time=12.5 ms
64 bytes from 8.8.8.8: icmp_seq=36 ttl=55 time=12.6 ms
64 bytes from 8.8.8.8: icmp_seq=37 ttl=55 time=11.5 ms
64 bytes from 8.8.8.8: icmp_seq=38 ttl=55 time=40.7 ms
64 bytes from 8.8.8.8: icmp_seq=39 ttl=55 time=69.3 ms
64 bytes from 8.8.8.8: icmp_seq=40 ttl=55 time=108 ms
64 bytes from 8.8.8.8: icmp_seq=41 ttl=55 time=194 ms
64 bytes from 8.8.8.8: icmp_seq=42 ttl=55 time=12.3 ms
64 bytes from 8.8.8.8: icmp_seq=43 ttl=55 time=192 ms
64 bytes from 8.8.8.8: icmp_seq=44 ttl=55 time=12.0 ms
64 bytes from 8.8.8.8: icmp_seq=45 ttl=55 time=191 ms
64 bytes from 8.8.8.8: icmp_seq=46 ttl=55 time=12.0 ms
64 bytes from 8.8.8.8: icmp_seq=47 ttl=55 time=137 ms
64 bytes from 8.8.8.8: icmp_seq=48 ttl=55 time=25.6 ms
64 bytes from 8.8.8.8: icmp_seq=49 ttl=55 time=3684 ms
64 bytes from 8.8.8.8: icmp_seq=50 ttl=55 time=2626 ms
64 bytes from 8.8.8.8: icmp_seq=51 ttl=55 time=1606 ms
64 bytes from 8.8.8.8: icmp_seq=52 ttl=55 time=585 ms
64 bytes from 8.8.8.8: icmp_seq=53 ttl=55 time=4381 ms
...
...
...

It kind of does this weird cycle with the ping times. It's what I saw before with the distribution kernel, so there's no difference there.
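One way to read the descending run (8203, 7154, 6130, ... ms) is to estimate when each reply actually arrived. The sketch below adds each packet's send time to its round-trip time. It assumes ping's default one-second send interval, which is not stated in this report, and the function name is made up for illustration.

```python
# (icmp_seq, rtt in ms) copied from the ping output above
samples = [(16, 8203), (17, 7154), (18, 6130), (19, 5106),
           (20, 4084), (21, 3060), (22, 2036)]

INTERVAL_S = 1.0  # assumption: ping's default one-second send interval

def arrival_times(samples, interval_s=INTERVAL_S):
    """Estimate each reply's arrival time in seconds, relative to seq 0."""
    return [seq * interval_s + rtt_ms / 1000.0 for seq, rtt_ms in samples]

times = arrival_times(samples)
spread = max(times) - min(times)
print([round(t, 3) for t in times])
print(f"replies arrived within {spread:.3f} s of each other")
```

Under that assumption, all seven replies land within about 0.17 s of each other. That would be consistent with packets being held somewhere and then released in a single burst, rather than with a path that is uniformly slow.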

I noticed that this bug has been tagged with bios-outdated. Should I look into flas...

Thor H. Johansen (thorhajo) wrote :

As for bad performance and eventual halting of traffic, it's still doing it. Messages from the kernel are, as before, deceptively calm. Just routine messages about association and authentication. Not an error in sight.

Thor H. Johansen (thorhajo) wrote :

I will perform the upgrade, but first, a message to Canonical:

The tone in these canned responses and pages feels somewhat patronizing. You get the distinct feeling they were written by a person whose patience and good manners have been worn extremely thin.

They have successfully delivered the relevant information, but they have also successfully insulted me for having the audacity to file a bug report without having flashed my BIOS first, because apparently, this is considered bad "etiquette", and I should know that my filthy BIOS is "buggy, insecure, and outdated" and how dare I even show my face around here without taking care of such an obvious thing first?

Canonical may want to work on not insulting people who are trying their best to properly report a bug.

Thor H. Johansen (thorhajo) wrote :

For others who come across this report, here are some crucial bits of information about upgrading the BIOS on this machine:

1. Download the Windows BIOS update from the Lenovo support website.
2. Install innoextract (apt-get install innoextract) and use it to extract the MS-DOS flash utility inside.
3. Get a pre-built FreeDOS image from https://www.chtaube.eu/computers/freedos/bootable-usb/ and write it to a USB drive as instructed.
4. Mount the FAT partition on the USB drive (e.g. sudo mount /dev/sdb1 /mnt) and copy the flash image to the drive (e.g. sudo cp app/1QCN32WW.exe /mnt/).
5. Unmount (e.g. umount /mnt), reboot and mash Fn+F2 to get into the BIOS. If the system refuses to reboot, pull out the USB drive (MBR boot records seem to make the BIOS hang if it's not in Legacy Support mode).
6. Change Boot Mode to Legacy Support.
7. Exit and save changes.
8. You should now be able to boot FreeDOS. Stick with the defaults in the boot menus.
9. Type DIR to get a list of files.
10. Run the flasher utility (e.g. type 1QCN32WW and press Return).
11. Let it do its thing.
12. You might get an Access Denied / Invalid Image error at this point. Don't panic! Your computer is not bricked. Hit Ctrl+Alt+Del and mash Fn+F2 to get into the BIOS again, disable the Secure Boot option, then Exit and save changes.
13. Your system will now boot again.

If you're installing Ubuntu on this system, I suggest you keep Secure Boot disabled all the time. It seems to cause nothing but trouble.

Thor H. Johansen (thorhajo) wrote :

Now for the WiFi bug report on the new BIOS:

As expected, the problem didn't go away. They wouldn't ship a machine without working WiFi. This is clearly a Linux-specific problem, which is why I'm reporting it in the first place. If a BIOS flash somehow fixes a bug that only occurs on Linux, the bug should not be considered fixed. If the Windows driver is able to work around a BIOS issue, the Linux driver should be able to as well.

sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date
1QCN32WW
08/18/2016
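The same two values can usually be read without root from sysfs. This is a sketch that assumes the standard /sys/class/dmi/id directory is present on the system; the helper name `read_bios_info` is made up for illustration.

```python
from pathlib import Path

DMI_DIR = Path("/sys/class/dmi/id")  # standard sysfs location on Linux

def read_bios_info(dmi_dir=DMI_DIR):
    """Return (bios_version, bios_date); a missing file yields None."""
    def read(name):
        try:
            return (Path(dmi_dir) / name).read_text().strip()
        except OSError:
            return None
    return read("bios_version"), read("bios_date")

print(read_bios_info())
```

Logging this alongside each test run makes it easy to tell later which BIOS a given result was obtained on.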

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

Thor H. Johansen:
1) Could you please post the results of the following terminal command:
iwconfig
2) Could you please advise what firmware version your Linksys WiFi router is using?

tags: added: latest-bios-1qcn32ww
removed: bios-outdated-1qcn32ww
Changed in linux (Ubuntu):
importance: Low → Medium
status: Confirmed → Incomplete
Thor H. Johansen (thorhajo) wrote :

thor@thor-ideapad:~$ iwconfig

lo no wireless extensions.

wlp1s0 IEEE 802.11 ESSID:"Johansen"
          Mode:Managed Frequency:5.18 GHz Access Point: 58:6D:8F:C6:38:89
          Bit Rate=72.2 Mb/s Tx-Power=30 dBm
          Retry short limit:7 RTS thr=2347 B Fragment thr:off
          Power Management:off
          Link Quality=70/70 Signal level=-38 dBm
          Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
          Tx excessive retries:0 Invalid misc:0 Missed beacon:0

enp2s0 no wireless extensions.
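To compare a working link against a hung one, it may help to pull the same few fields out of each iwconfig dump. The sketch below parses an abridged copy of the wlp1s0 block above. The field names in the result are made up for illustration, and the patterns are only known to match this particular output format.

```python
import re

IWCONFIG_SAMPLE = """\
wlp1s0 IEEE 802.11 ESSID:"Johansen"
          Mode:Managed Frequency:5.18 GHz Access Point: 58:6D:8F:C6:38:89
          Bit Rate=72.2 Mb/s Tx-Power=30 dBm
          Power Management:off
          Link Quality=70/70 Signal level=-38 dBm
          Tx excessive retries:0 Invalid misc:0 Missed beacon:0
"""

def parse_iwconfig(text):
    """Extract a few link fields from iwconfig output for one interface."""
    patterns = {
        "frequency_ghz": (r"Frequency:([\d.]+) GHz", float),
        "bit_rate_mbps": (r"Bit Rate=([\d.]+) Mb/s", float),
        "signal_dbm": (r"Signal level=(-?\d+) dBm", int),
        "tx_excessive_retries": (r"Tx excessive retries:(\d+)", int),
        "missed_beacons": (r"Missed beacon:(\d+)", int),
    }
    result = {}
    for key, (pattern, convert) in patterns.items():
        m = re.search(pattern, text)
        # A field that is absent from the dump is reported as None.
        result[key] = convert(m.group(1)) if m else None
    return result

print(parse_iwconfig(IWCONFIG_SAMPLE))
```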

The E2500's hardware is v1.0 and the firmware is v2.0.00.

Devices that are working flawlessly with this router include:

iPhone 6 Plus
Huawei Nexus 6P
Samsung Galaxy S5
MacBook Pro 13" Mid-2012
RTL8192CUS-based USB WiFi dongle

I have never had any connection issues. My other laptops have been able to remain connected with open SSH sessions to remote systems for days at a time without disconnecting. I should mention that I now make sure to ping my router, not the Internet, to check if the WiFi interface is working properly. Since my Internet is reliable, it doesn't make much of a difference, but this way, I eliminate potential error sources.
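For anyone repeating the router-only ping test, the router's address can be looked up from the kernel routing table instead of being hard-coded. This is a sketch that parses text in the /proc/net/route format. The sample table and its addresses are invented, and the byte-order handling assumes a little-endian machine such as x86.

```python
import socket
import struct

def default_gateway(route_table_text):
    """Return the default gateway as a dotted quad, or None if there is none."""
    for line in route_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # Destination 00000000 is the default route; flag 0x2 is RTF_GATEWAY.
        if destination == "00000000" and flags & 0x2:
            # Assumption: little-endian host, so the hex bytes are reversed.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None

# Invented sample in /proc/net/route format (not taken from this machine).
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "wlp1s0\t00000000\t0101A8C0\t0003\t0\t0\t600\t00000000\t0\t0\t0\n"
    "wlp1s0\t0001A8C0\t00000000\t0001\t0\t0\t600\t00FFFFFF\t0\t0\t0\n"
)
print(default_gateway(SAMPLE))
```

On a real system the text would come from reading /proc/net/route, and the result is the address to ping.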

The USB dongle listed last was purchased today and is plugged into the Lenovo as a workaround for WiFi until it's fixed. It works perfectly fine if I use the rtl8192cu-fixes driver. I unplug it whenever I test the internal WiFi for this bug ticket.

(Is it just me, or are all the Realtek WiFi drivers in the kernel kind of unreliable? This is not the first time Realtek chips have given me trouble.)

Thor H. Johansen (thorhajo) wrote :

The iwconfig output above is from while the interface still works, not when it exhibits the bug. Let me know if you want an iwconfig dump from when it's dead.

penalvch (penalvch) wrote :

Thor H. Johansen, to further narrow this down, if you force the router to broadcast on 802.11G only, is the issue still reproducible?

description: updated
Thor H. Johansen (thorhajo) wrote :

With the router in pure 802.11G mode, the interface just barely hangs in there. When I run SpeedTest.net, ping times go through the roof, the test sometimes gets stuck without completing, and the results are very bad. Oddly, ping reports no dropped packets. Just severely delayed ones.

However, the interface does not halt completely like it does when the router is in Mixed mode, and I am able to continue using it, seemingly indefinitely.

I am running these tests with 6 feet of open air between me and the AP. For comparison, some of the better WiFi clients in the house approach the full 70/70 cabled connection speed, without even being in the same room as the AP.

penalvch (penalvch) wrote :

Thor H. Johansen, the issue you are reporting is an upstream one. Could you please report this problem, following the instructions verbatim at https://wiki.ubuntu.com/Bugs/Upstream/kernel , to the appropriate mailing list (TO Larry Finger and wlanfae, CC linux-wireless)?

Please provide a direct URL to your post to the mailing list when it becomes available so that it may be tracked.

Thank you for your help.

Changed in linux (Ubuntu):
importance: Medium → High
status: Incomplete → Triaged
James Cameron (quozl) wrote :

Summary: my own rtl8821ae problem of frequently lost connections was fixed by upstream patch b8b8b16352cd ("rtlwifi: rtl8821ae: Fix connection lost problem"), merged for 4.14-rc4.

I've recently fixed a problem very much like the ones reported in this bug. During my tests I saw the cycling ping times, the delay unloading the module, and the "No buffer space available" errors, all of which suggested a firmware hang in the wireless device.

My main problem was that the connection was lost shortly after it was made, especially if there was a burst of download data. The workaround was to turn wireless off then on again, or reboot.

What I did was

(a) test several ubuntu kernels to find which ones were affected,

(b) using git bisect, tested several custom kernels, and proved my problem began with a single commit 40b368af4b75 ("rtlwifi: Fix alignment issues"),

(c) worked with upstream to determine a likely cause; the commit wasn't widely tested,

(d) proved through testing that the BIOS can have an effect, because the rtl8821ae driver does not reset the card but inherits it in whatever state the BIOS left it. This can randomise problem reports.

(e) proved that a power down and reboot can have a different effect to a warm reboot; some of the wireless card device registers are unchanged on warm reboot. This can randomise problem reports.
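The bisection in step (b) is essentially a binary search for the first bad commit. The sketch below shows the idea on a simplified linear history. It is not how git itself is implemented (git bisect works on the full commit graph), and every commit name other than 40b368af4b75 is a placeholder.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and the history flips from good to bad exactly once.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]

# Hypothetical linear history; only 40b368af4b75 comes from this report.
history = ["good-base", "c1", "c2", "40b368af4b75", "c4", "c5", "bad-tip"]
bad_from = history.index("40b368af4b75")
tested = []

def is_bad(commit):
    tested.append(commit)  # stands in for: build, boot, run a ping/download test
    return history.index(commit) >= bad_from

print(first_bad(history, is_bad), "after", len(tested), "test builds")
```

Each probe halves the remaining range, which is why a handful of custom kernel builds can be enough to isolate a single commit.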

Hope that helps.

Kai-Heng Feng (kaihengfeng) wrote :

Thank you James.

Can I mark #1622293, #1653012, and #1707185 as duplicates?

Kai-Heng Feng (kaihengfeng) wrote :

Please try this kernel; commit b8b8b16352cd is cherry-picked into it:

http://people.canonical.com/~khfeng/lp1618267/
