pc-kernel snap 4.15.0-152.159 (UC18) regression in network behaviour

Bug #1938269 reported by Jonathan Cave
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Several amd64 devices have shown a marked increase in failures to complete automated testing with this kernel snap, which is currently on the 18/beta channel.

The devices tested are:

dawson-j
====
device-type: NUC7PJYH, NUC7CJYH
processor: Pentium Silver J5005 1.50GHz, Celeron J4005 2.00GHz
ethernet: Realtek 10ec:8168
wireless: Intel 8086:31dc

dawson-i
====
device-type: NUC7i3DNHE, NUC7i5DNHE
processor: i3-7100U 2.40GHz, i5-7300U 2.60GHz
ethernet: Intel 8086:156f
wireless: Intel 8086:24fd

dearest-team
====
device-type: Inspiron 5758
processor: i5-5200U 2.20GHz
ethernet: Realtek 10ec:8136
wireless: Intel 8086:08b3

The symptom shown by these devices is that the controller side that manages the test run loses its connection to the device while carrying out WiFi connection tests. The failure rate does not appear to be 100%; some runs have completed.

Jonathan Cave (jocave) wrote :

dawson-j
====
This device uses the iwlwifi driver. The following information about the driver was collected during testing:

18:05:08 Interface wlan0 using module iwlwifi
18:05:08 Parameters:
18:05:08 lar_disable: N
18:05:08 fw_monitor: N
18:05:08 11n_disable: 0
18:05:08 power_save: N
18:05:08 swcrypto: 0
18:05:08 antenna_coupling: 0
18:05:08 power_level: 0
18:05:08 amsdu_size: 0
18:05:08 uapsd_disable: 3
18:05:08 d0i3_timeout: 1000
18:05:08 fw_restart: Y
18:05:08 d0i3_disable: Y
18:05:08 led_mode: 0
18:05:08 disable_11ac: N
18:05:08 bt_coex_active: Y
18:05:08 nvm_file: (null)
18:05:08
18:05:08 Checking kernel ring buffer for iwlwifi messages:
18:05:08 kern :info : [Tue Jul 27 16:55:07 2021] iwlwifi 0000:00:0c.0: enabling device (0000 -> 0002)
18:05:08 kern :info : [Tue Jul 27 16:55:07 2021] iwlwifi 0000:00:0c.0: loaded firmware version 34.3125811985.0 op_mode iwlmvm
18:05:08 kern :info : [Tue Jul 27 16:55:07 2021] iwlwifi 0000:00:0c.0: Detected Intel(R) Dual Band Wireless AC 9462, REV=0x318
18:05:08 kern :info : [Tue Jul 27 16:55:07 2021] iwlwifi 0000:00:0c.0: base HW address: 68:ec:c5:46:aa:b4
18:05:08 kern :info : [Tue Jul 27 16:55:34 2021] iwlwifi 0000:00:0c.0: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
18:05:08 kern :err : [Tue Jul 27 16:55:34 2021] iwlwifi 0000:00:0c.0: BIOS contains WGDS but no WRDS
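
For anyone comparing these dumps across devices or kernel revisions, the ring buffer lines can be split into fields with a short script. This is only a sketch: the regular expression is an assumption based on the line layout shown above, and the function names `parse_iwlwifi_lines` and `errors_only` are made up for illustration, not part of the test tooling.

```python
import re

# Sample lines copied from the dawson-j output above.
SAMPLE = """\
18:05:08 kern :info : [Tue Jul 27 16:55:07 2021] iwlwifi 0000:00:0c.0: loaded firmware version 34.3125811985.0 op_mode iwlmvm
18:05:08 kern :info : [Tue Jul 27 16:55:34 2021] iwlwifi 0000:00:0c.0: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
18:05:08 kern :err : [Tue Jul 27 16:55:34 2021] iwlwifi 0000:00:0c.0: BIOS contains WGDS but no WRDS
"""

# Assumed layout: <test time> kern :<level> : [<kernel time>] iwlwifi <pci addr>: <message>
LINE_RE = re.compile(
    r"^\S+\s+kern\s*:(?P<level>\w+)\s*:\s*\[(?P<when>[^\]]+)\]\s+"
    r"iwlwifi\s+(?P<pci>\S+):\s+(?P<msg>.*)$"
)


def parse_iwlwifi_lines(text):
    """Return one dict (level, when, pci, msg) per matching log line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def errors_only(entries):
    """Keep only the entries logged at kernel level 'err'."""
    return [entry for entry in entries if entry["level"] == "err"]


if __name__ == "__main__":
    for entry in errors_only(parse_iwlwifi_lines(SAMPLE)):
        print(entry["pci"], entry["msg"])
```

Running the same parser over the output from each device makes it easier to spot messages that appear on one device but not the other.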

description: updated
Jonathan Cave (jocave) wrote :

dawson-i
====
Similar information for dawson-i:
14:56:57 Interface wlan0 using module iwlwifi
14:56:57 Parameters:
14:56:57 lar_disable: N
14:56:57 fw_monitor: N
14:56:57 11n_disable: 0
14:56:57 power_save: N
14:56:57 swcrypto: 0
14:56:57 antenna_coupling: 0
14:56:57 power_level: 0
14:56:57 amsdu_size: 0
14:56:57 uapsd_disable: 3
14:56:57 d0i3_timeout: 1000
14:56:57 fw_restart: Y
14:56:57 d0i3_disable: Y
14:56:57 led_mode: 0
14:56:57 disable_11ac: N
14:56:57 bt_coex_active: Y
14:56:57 nvm_file: (null)
14:56:57
14:56:57 Checking kernel ring buffer for iwlwifi messages:
14:56:57 kern :info : [Fri Jul 23 13:40:10 2021] iwlwifi 0000:01:00.0: enabling device (0000 -> 0002)
14:56:57 kern :info : [Fri Jul 23 13:40:10 2021] iwlwifi 0000:01:00.0: loaded firmware version 34.0.1 op_mode iwlmvm
14:56:57 kern :info : [Fri Jul 23 13:40:10 2021] iwlwifi 0000:01:00.0: Detected Intel(R) Dual Band Wireless AC 8265, REV=0x230
14:56:57 kern :info : [Fri Jul 23 13:40:10 2021] iwlwifi 0000:01:00.0: base HW address: ac:ed:5c:db:45:39
14:56:57 kern :err : [Fri Jul 23 13:40:26 2021] iwlwifi 0000:01:00.0: BIOS contains WGDS but no WRDS

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Paul Larson (pwlars) wrote :

I've been trying to gather some more data on this, and here's what I have so far...
I added a config for eth0 in /etc/systemd/network so that the test (which replaces eth netplan config with wifi config temporarily) doesn't leave us without the ability to reconnect.
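
The exact file was not posted, so the following is only a guess at what such a fallback could look like: a minimal systemd-networkd unit that matches eth0 and requests an address over DHCP. The helper name `render_eth0_network_unit` and the use of DHCP are assumptions; the output would be saved under /etc/systemd/network with a `.network` suffix.

```python
def render_eth0_network_unit(interface="eth0"):
    """Return the text of a minimal systemd-networkd .network unit.

    Assumption: the wired interface should simply get an address via DHCP.
    """
    return (
        "[Match]\n"
        f"Name={interface}\n"
        "\n"
        "[Network]\n"
        "DHCP=yes\n"
    )


if __name__ == "__main__":
    # Print the unit text; redirect it to a file under /etc/systemd/network.
    print(render_eth0_network_unit(), end="")
```

Because this unit lives outside netplan, it should survive the test swapping the netplan config, which is the point of the workaround described above.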

Testing with dawson-uc18-m7-20190122-10 and stock uc18 after a snap refresh to the current pc-kernel gives inconsistent results: sometimes the test works, sometimes the device crashes and I can never reconnect, and sometimes it boots back up and I can ssh in again (and see 0m uptime).
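
The "0m uptime" case can be told apart from a clean run automatically by reading the first field of /proc/uptime once ssh is reachable again. A rough sketch follows; the function names and the 120 second threshold are assumptions chosen for illustration.

```python
def uptime_seconds(proc_uptime_text):
    """Parse the contents of /proc/uptime; the first field is seconds since boot."""
    return float(proc_uptime_text.split()[0])


def looks_freshly_rebooted(proc_uptime_text, threshold_s=120.0):
    """True if the device booted less than threshold_s seconds ago."""
    return uptime_seconds(proc_uptime_text) < threshold_s


if __name__ == "__main__":
    # Example strings in the /proc/uptime format: "<uptime> <idle time>".
    print(looks_freshly_rebooted("35.27 61.40"))       # short uptime
    print(looks_freshly_rebooted("86400.12 170000.5"))  # up for about a day
```

On the device itself the input would come from reading /proc/uptime right after the WiFi test finishes.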

Using the current stable image, I get the same behavior though... it's pretty random, so it's hard to say when this started happening:
pc-kernel 4.15.0-151.157 805 18/stable canonical kernel

Reverting the kernel to the one that was on the image for the dawson-uc18 image:
pc-kernel 4.15.0-44.47 182 18/stable canonical kernel
...This one is very old, I know, but after about 50 attempts, I haven't been able to reproduce it with this kernel.
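
To put "about 50 attempts" in context: if each attempt failed independently with probability p, the chance of 50 clean runs in a row would be (1 - p)^50. Solving (1 - p)^n = 1 - confidence for p gives an upper bound on the failure rate that is still consistent with seeing no failures. The sketch below assumes independent attempts, which may not hold here, and the function name is made up.

```python
def max_failure_rate(clean_runs, confidence=0.95):
    """Largest per-attempt failure probability still consistent, at the given
    confidence level, with observing clean_runs attempts and zero failures.

    Assumes attempts are independent and share the same failure probability.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_runs)


if __name__ == "__main__":
    bound = max_failure_rate(50)
    print(f"95% upper bound on failure rate after 50 clean runs: {bound:.3f}")
```

For 50 clean runs the bound comes out at roughly 0.058, so the old kernel could still fail a few percent of the time without having shown it yet. That matters if the newer kernels fail only occasionally.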

Using the current stable kernel again, and reverting core18 to rev 1081 (which was on the image originally), I was somehow (maybe dumb luck?) still able to reproduce a problem, but it was in a state where I could still ssh to it. I could not run snap, ps, or almost anything else, but I could at least find this oops in dmesg: https://pastebin.canonical.com/p/jsctQmJZGc/

Paul Larson (pwlars) wrote :

I was also able to reproduce this in bionic on a dawson-j. We don't have a serial console on that device, but I was able to get someone with remote access to grab a screenshot of the partial crash for me (attached).
