network broken after March 4 updates

Bug #1963747 reported by Jeffrey Walton
This bug affects 1 person
Affects: Kubuntu PPA
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have two Kubuntu machines based on Ubuntu 20.04, x86_64. Both machines auto-update nightly: a systemd service runs a script that performs an 'apt-get update && apt-get upgrade' around 4:00 AM.

When I woke this morning I could not connect to either machine. The wired network connection cycles from deactivated->activated->deactivated. It is just bouncing. I enabled wifi and tried to bring up the network, but I had other problems. I seem to be getting a 10.0.0.xxx address, so the network is also down.

Dmesg does not show errors related to networking.

Machine 1:

# dmesg | grep -i -E 'eth|enp|error|warn'
[ 0.729439] RAS: Correctable Errors collector initialized.
[ 1.013015] wmi_bus wmi_bus-PNP0C14:02: WQBC data block query control method not found
[ 1.020067] alx 0000:04:00.0 eth0: Qualcomm Atheros AR816x/AR817x Ethernet [d8:9e:f3:92:24:60]
[ 1.035103] alx 0000:04:00.0 enp4s0: renamed from eth0
[ 4.290092] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[ 5.017326] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 5.113378] alx 0000:04:00.0 enp4s0: NIC Up: 1 Gbps Full
[ 5.113647] IPv6: ADDRCONF(NETDEV_CHANGE): enp4s0: link becomes ready

Machine 2:

# dmesg | grep -i -E 'eth|enp|error|warn'
[ 4.886623] ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210331/exoparg2-393)
               Initialized Local Variables for Method [GETP]:
[ 4.886652] Initialized Arguments for Method [GETP]: (2 arguments defined for method invocation)
[ 4.886667] ACPI Error: Aborting method \_TZ.GETP due to previous error (AE_AML_PACKAGE_LIMIT) (20210331/psparse-529)
[ 4.886677] ACPI Error: Aborting method \_TZ.CHGZ._CRT due to previous error (AE_AML_PACKAGE_LIMIT) (20210331/psparse-529)
[ 4.887690] ACPI BIOS Error (bug): AE_AML_PACKAGE_LIMIT, Index (0x000000005) is beyond end of object (length 0x5) (20210331/exoparg2-393)
               Initialized Local Variables for Method [GETP]:
[ 4.887713] Initialized Arguments for Method [GETP]: (2 arguments defined for method invocation)
[ 4.887727] ACPI Error: Aborting method \_TZ.GETP due to previous error (AE_AML_PACKAGE_LIMIT) (20210331/psparse-529)
[ 4.887736] ACPI Error: Aborting method \_TZ.CHGZ._CRT due to previous error (AE_AML_PACKAGE_LIMIT) (20210331/psparse-529)
[ 5.463994] RAS: Correctable Errors collector initialized.
[ 5.856324] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 80:e8:2c:4c:27:27
[ 5.856327] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[ 5.856405] e1000e 0000:00:1f.6 eth0: MAC: 12, PHY: 12, PBA No: FFFFFF-0FF
[ 5.857243] e1000e 0000:00:1f.6 enp0s31f6: renamed from eth0
[ 7.896863] EXT4-fs (nvme0n1p2): re-mounted. Opts: errors=remount-ro. Quota mode: none.
[ 8.223779] hp_wmi: query 0x4 returned error 0x5
[ 8.234838] hp_wmi: query 0xd returned error 0x5
[ 8.285693] hp_wmi: query 0x1b returned error 0x5
[ 8.633150] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 11.858857] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 11.858982] IPv6: ADDRCONF(NETDEV_CHANGE): enp0s31f6: link becomes ready
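
Both dmesg excerpts end with the link coming up at 1 Gbps, which suggests the driver and cabling are fine and the bouncing happens later, during address configuration. NetworkManager logs to the systemd journal rather than the kernel ring buffer, so that is one place to look next. A minimal sketch, assuming NetworkManager runs as the `NetworkManager` systemd unit; `dhcp_lines` and `nm_dhcp_log` are hypothetical helper names, and the grep pattern is only a guess at which words are useful:

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines on stdin that mention DHCP activity.
dhcp_lines() {
    grep -i -E 'dhcp|lease'
}

# Hypothetical wrapper: this boot's NetworkManager messages, DHCP lines only.
# Assumes the service runs as the systemd unit "NetworkManager".
nm_dhcp_log() {
    journalctl -u NetworkManager -b --no-pager | dhcp_lines
}
```

Running `nm_dhcp_log` with sudo should list any DHCP requests, timeouts, or lease messages for the current boot. `nmcli device status` gives the per-device state alongside it.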

The one hint I have is that /var/log/apt/history.log shows a new linux-firmware was installed on both machines. This appears to be the only change on the day before trouble started:

Start-Date: 2022-03-04 04:19:40
Commandline: apt-get upgrade -y
Upgrade: linux-firmware:amd64 (1.187.26, 1.187.27)
End-Date: 2022-03-04 04:20:24
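
To confirm that nothing else changed on a given day, the apt history can be filtered by date. A minimal sketch, assuming the log format shown in the excerpt above; `apt_changes_on` is a hypothetical helper name, not part of apt:

```shell
#!/bin/sh
# Hypothetical helper: print the Start-Date/Commandline/Install/Upgrade/
# Remove/Purge lines of every apt transaction that started on a given day.
# Usage: apt_changes_on YYYY-MM-DD [logfile]
apt_changes_on() {
    day=$1
    log=${2:-/var/log/apt/history.log}
    awk -v day="$day" '
        # A transaction begins; remember whether it started on the wanted day.
        /^Start-Date: / { show = ($2 == day) }
        # Print the interesting lines of a matching transaction.
        show && /^(Start-Date|Commandline|Install|Upgrade|Remove|Purge): / { print }
        # The transaction ends.
        /^End-Date: / { show = 0 }
    ' "$log"
}
```

For example, `apt_changes_on 2022-03-04` would print the three lines for the linux-firmware upgrade above, plus any other transaction that started that day. It reads only the current log file, not rotated copies.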

I've tried reinstalling network-manager with no joy.

Any help is appreciated.

Jeffrey Walton (noloader) wrote :

Both the 5.4.0-100 and 5.13.0-30 kernels have trouble. The machine cannot bring up the network with either kernel.

And the network manager Kubuntu uses sucks. It does not provide error messages. Clicking on the '!' icon within network settings for the connection does not provide any information.

Man, I could scream...

Jeffrey Walton (noloader) wrote :

The problem turned out to be a bad DHCP option in my pfSense DHCP server. I did not save the option after I was poking around. I am kind of surprised it became part of the server's configuration.

The bad option took down all my Linux machines. It took down about 12 of them - Debian, Ubuntu, Kubuntu, Mint and Fedora.

That ISC dhcp client is so damn fragile. It is scary to think you can take down an entire network because ISC's client is so fragile.
