kernel panics/hangs after upgrading to linux-image-5.4.0-122-generic:amd64

Bug #1982014 reported by timeless
This bug affects 10 people
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| linux (Ubuntu) | Fix Released | Undecided | Unassigned | |

Bug Description

We had a series of unattended-upgrades:
```
Start-Date: 2022-07-13 06:19:32
Commandline: /usr/bin/unattended-upgrade
Install: linux-modules-extra-5.4.0-122-generic:amd64 (5.4.0-122.138, automatic), linux-modules-5.4.0-122-generic:amd64 (5.4.0-122.138, automatic), linux-headers-5.4.0-122:amd64 (5.4.0-122.138, automatic), linux-headers-5.4.0-122-generic:amd64 (5.4.0-122.138, automatic), linux-image-5.4.0-122-generic:amd64 (5.4.0-122.138, automatic)
Upgrade: linux-headers-generic:amd64 (5.4.0.121.122, 5.4.0.122.123), linux-image-generic:amd64 (5.4.0.121.122, 5.4.0.122.123), linux-generic:amd64 (5.4.0.121.122, 5.4.0.122.123)
End-Date: 2022-07-13 06:20:56

Start-Date: 2022-07-13 06:21:00
Commandline: /usr/bin/unattended-upgrade
Remove: linux-headers-5.4.0-120-generic:amd64 (5.4.0-120.136), linux-headers-5.4.0-120:amd64 (5.4.0-120.136)
End-Date: 2022-07-13 06:21:05

Start-Date: 2022-07-13 06:21:09
Commandline: /usr/bin/unattended-upgrade
Remove: linux-modules-extra-5.4.0-120-generic:amd64 (5.4.0-120.136), linux-modules-5.4.0-120-generic:amd64 (5.4.0-120.136), linux-image-5.4.0-120-generic:amd64 (5.4.0-120.136)
End-Date: 2022-07-13 06:21:28
```

The same day, I performed a manual upgrade:
```
Start-Date: 2022-07-13 14:20:07
Commandline: apt upgrade
Upgrade: pdns-backend-pgsql:amd64 (4.6.2-1pdns.focal, 4.6.3-1pdns.focal), pdns-server:amd64 (4.6.2-1pdns.focal, 4.6.3-1pdns.focal), git-man:amd64 (1:2.25.1-1ubuntu3.4, 1:2.25.1-1ubuntu3.5), git:amd64 (1:2.25.1-1ubuntu3.4, 1:2.25.1-1ubuntu3.5), linux-firmware:amd64 (1.187.31, 1.187.32), pdns-backend-bind:amd64 (4.6.2-1pdns.focal, 4.6.3-1pdns.focal)
End-Date: 2022-07-13 14:22:01
```

followed by a reboot.

Within a short period, the computer (bare metal, living in a data center) would hang (not respond to console, ping, ssh, ...) or panic.

After a couple of rounds of asking the IT staff to find the box and kick it (sometimes they saw an actual report of a panic, and sometimes it just didn't show anything), I reverted `linux-firmware:amd64` to `1.187.31` and rebooted. This did not fix the problem.

Eventually we realized that the earlier unattended-upgrade run had upgraded the kernel separately. Reverting to the previous kernel (5.4.0-121.137) resulted in a reliably happy box.
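
For anyone retracing this, a minimal sketch of how the kernel change could have been spotted in the apt history. The sample text is shortened from the entries quoted in this report; the function name `kernel_changes` is hypothetical, and the usual log location `/var/log/apt/history.log` is an assumption about the system.

```python
import re

# Sample shortened from the apt history entries quoted in this report.
# On a real system the text would typically be read from
# /var/log/apt/history.log (assumed location).
SAMPLE_LOG = """\
Start-Date: 2022-07-13 06:19:32
Commandline: /usr/bin/unattended-upgrade
Install: linux-image-5.4.0-122-generic:amd64 (5.4.0-122.138, automatic)
Upgrade: linux-image-generic:amd64 (5.4.0.121.122, 5.4.0.122.123)
End-Date: 2022-07-13 06:20:56

Start-Date: 2022-07-13 14:20:07
Commandline: apt upgrade
Upgrade: git:amd64 (1:2.25.1-1ubuntu3.4, 1:2.25.1-1ubuntu3.5), linux-firmware:amd64 (1.187.31, 1.187.32)
End-Date: 2022-07-13 14:22:01
"""

# Matches one "name:arch (versions)" item in an Install/Upgrade/Remove line.
ITEM = re.compile(r"(\S+?):(\S+) \(([^)]*)\)")


def kernel_changes(log_text):
    """Return (start_date, commandline, action, package, versions) tuples
    for every linux-image package touched in the given history text."""
    changes = []
    # Entries are separated by blank lines.
    for entry in re.split(r"\n\s*\n", log_text.strip()):
        fields = {}
        for line in entry.splitlines():
            key, _, value = line.partition(": ")
            fields[key] = value
        for action in ("Install", "Upgrade", "Remove"):
            for name, _arch, versions in ITEM.findall(fields.get(action, "")):
                if name.startswith("linux-image"):
                    changes.append((fields.get("Start-Date"),
                                    fields.get("Commandline"),
                                    action, name, versions))
    return changes


if __name__ == "__main__":
    for change in kernel_changes(SAMPLE_LOG):
        print(change)
```

On the sample, only the 06:19 unattended-upgrade run touches `linux-image` packages; the manual `apt upgrade` later that day does not, which matches what we eventually worked out by hand.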

```
ii binutils-x86-64-linux-gnu 2.34-6ubuntu1.3 amd64 GNU binary utilities, for x86-64-linux-gnu target
ii linux-base 4.5ubuntu3.7 all Linux image base package
ii linux-firmware 1.187.31 all Firmware for Linux kernel drivers
ii linux-headers-5.4.0-121 5.4.0-121.137 all Header files related to Linux kernel version 5.4.0
ii linux-headers-5.4.0-121-generic 5.4.0-121.137 amd64 Linux kernel headers for version 5.4.0 on 64 bit x86 SMP
rc linux-image-5.4.0-100-generic 5.4.0-100.113 amd64 Signed kernel image generic
rc linux-image-5.4.0-104-generic 5.4.0-104.118 amd64 Signed kernel image generic
rc linux-image-5.4.0-105-generic 5.4.0-105.119 amd64 Signed kernel image generic
rc linux-image-5.4.0-107-generic 5.4.0-107.121 amd64 Signed kernel image generic
rc linux-image-5.4.0-109-generic 5.4.0-109.123 amd64 Signed kernel image generic
rc linux-image-5.4.0-110-generic 5.4.0-110.124 amd64 Signed kernel image generic
rc linux-image-5.4.0-113-generic 5.4.0-113.127 amd64 Signed kernel image generic
rc linux-image-5.4.0-117-generic 5.4.0-117.132 amd64 Signed kernel image generic
rc linux-image-5.4.0-120-generic 5.4.0-120.136 amd64 Signed kernel image generic
ii linux-image-5.4.0-121-generic 5.4.0-121.137 amd64 Signed kernel image generic
rc linux-image-5.4.0-122-generic 5.4.0-122.138 amd64 Signed kernel image generic
rc linux-image-5.4.0-94-generic 5.4.0-94.106 amd64 Signed kernel image generic
rc linux-image-5.4.0-96-generic 5.4.0-96.109 amd64 Signed kernel image generic
rc linux-image-5.4.0-97-generic 5.4.0-97.110 amd64 Signed kernel image generic
rc linux-image-5.4.0-99-generic 5.4.0-99.112 amd64 Signed kernel image generic
ii linux-libc-dev:amd64 5.4.0-122.138 amd64 Linux Kernel Headers for development
rc linux-modules-5.4.0-100-generic 5.4.0-100.113 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-104-generic 5.4.0-104.118 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-105-generic 5.4.0-105.119 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-107-generic 5.4.0-107.121 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-109-generic 5.4.0-109.123 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-110-generic 5.4.0-110.124 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-113-generic 5.4.0-113.127 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-117-generic 5.4.0-117.132 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-120-generic 5.4.0-120.136 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
ii linux-modules-5.4.0-121-generic 5.4.0-121.137 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-122-generic 5.4.0-122.138 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-62-generic 5.4.0-62.70 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-94-generic 5.4.0-94.106 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-96-generic 5.4.0-96.109 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-97-generic 5.4.0-97.110 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-5.4.0-99-generic 5.4.0-99.112 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-100-generic 5.4.0-100.113 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-104-generic 5.4.0-104.118 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-105-generic 5.4.0-105.119 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-107-generic 5.4.0-107.121 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-109-generic 5.4.0-109.123 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-110-generic 5.4.0-110.124 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-113-generic 5.4.0-113.127 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-117-generic 5.4.0-117.132 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-120-generic 5.4.0-120.136 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
ii linux-modules-extra-5.4.0-121-generic 5.4.0-121.137 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-122-generic 5.4.0-122.138 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-94-generic 5.4.0-94.106 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-96-generic 5.4.0-96.109 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-97-generic 5.4.0-97.110 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
rc linux-modules-extra-5.4.0-99-generic 5.4.0-99.112 amd64 Linux kernel extra modules for version 5.4.0 on 64 bit x86 SMP
```
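
The first column of the listing above is the `dpkg -l` state: `ii` means installed, `rc` means removed with configuration files remaining. A small sketch, assuming the listing is available as plain text, that groups the kernel images by that state (the function name `kernel_images_by_state` is hypothetical, and the sample is a few lines copied from the listing):

```python
# A few lines copied from the package listing in this report.
SAMPLE_LISTING = """\
rc linux-image-5.4.0-120-generic 5.4.0-120.136 amd64 Signed kernel image generic
ii linux-image-5.4.0-121-generic 5.4.0-121.137 amd64 Signed kernel image generic
rc linux-image-5.4.0-122-generic 5.4.0-122.138 amd64 Signed kernel image generic
ii linux-libc-dev:amd64 5.4.0-122.138 amd64 Linux Kernel Headers for development
"""


def kernel_images_by_state(listing):
    """Group linux-image packages by their two-letter dpkg state."""
    states = {}
    for line in listing.splitlines():
        parts = line.split(None, 4)  # state, name, version, arch, description
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        state, name, version = parts[:3]
        if name.startswith("linux-image-"):
            states.setdefault(state, []).append((name, version))
    return states


if __name__ == "__main__":
    for state, packages in kernel_images_by_state(SAMPLE_LISTING).items():
        print(state, packages)
```

For the sample, the only image in state `ii` is 5.4.0-121.137, and 5.4.0-122.138 shows up under `rc`, consistent with the revert described above.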

```
lsb_release -rd
Description: Ubuntu 20.04.4 LTS
Release: 20.04
```

3) What you expected to happen
1. Servers that have been reliable for years should not spontaneously panic/hang (neither in general, nor after an upgrade, especially not an automated one)
2. Ideally, as on Windows, a minidump should be safely generated to a volume so that I can provide it to support (or quickly debug it without losing many days and sleepless nights).

4) What happened instead
The computer would panic or hang, requiring an actual human to go find the computer and kick it. During that time, I had to adjust routing so that another server could handle this server's job responsibilities. (And I had to have others do the same, as it happened repeatedly.)

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-121-generic 5.4.0-121.137
ProcVersionSignature: Ubuntu 5.4.0-121.137-generic 5.4.189
Uname: Linux 5.4.0-121-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-121-generic.
ApportVersion: 2.20.11-0ubuntu27.24
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D2c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Card0.Amixer.info:
 Card hw:0 'MID'/'HDA Intel MID at 0xfbeec000 irq 33'
   Mixer name : 'Realtek ALC889'
   Components : 'HDA:10ec0889,15d9060d,00100004'
   Controls : 55
   Simple ctrls : 24
CasperMD5CheckResult: skip
Date: Mon Jul 18 13:16:53 2022
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 003: ID 14dd:0002 Raritan Computer, Inc.
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Supermicro C7SIM-Q
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-121-generic root=UUID=25b80abe-c15a-4298-9a39-b83ed0d7a02f ro net.ifnames=0 biosdevname=0 quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-121-generic N/A
 linux-backports-modules-5.4.0-121-generic N/A
 linux-firmware 1.187.31
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-10-21 (635 days ago)
dmi.bios.date: 08/16/10
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.0b
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: C7SIM-Q
dmi.board.vendor: Supermicro
dmi.board.version: 0123456789
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 24
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.0b:bd08/16/10:svnSupermicro:pnC7SIM-Q:pvr0123456789:rvnSupermicro:rnC7SIM-Q:rvr0123456789:cvnSupermicro:ct24:cvr0123456789:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: C7SIM-Q
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Angelo Hongens (ahongens) wrote :

Did you get any update on this?

Some users on the Proxmox forum, including me, also have what seems like the same problem.

https://forum.proxmox.com/threads/ubuntu-20-04-04-machine-freezes.112507/#post-488816

I haven’t been able to reproduce it, since I’m no longer running Ubuntu in production under heavy load. I was running CentOS 7 and wanted to switch to Ubuntu, but my first Ubuntu production machine froze several times over a weekend and I lost quite a few customers over it, so unfortunately I switched back to CentOS here.

Hope Ubuntu will one day be stable enough to run in production for my workload ;)

Walter (wdoekes) wrote :

This particular issue looks like it was introduced in 122 and fixed in 123:

==============

$ git show d50392ab | head
commit d50392abf462b65b4562c077dd34d9d01becdc41
Author: Francesco Ruggeri <email address hidden>
Date: Wed Apr 20 17:50:26 2022 -0700

    tcp: md5: incorrect tcp_header_len for incoming connections

    BugLink: https://bugs.launchpad.net/bugs/1979014

    [ Upstream commit 5b0b9e4c2c895227c8852488b3f09839233bba54 ]

$ git tag --contains d50392ab | tail -n1
Ubuntu-5.4.0-122.138

========================

$ git show 97842ea930e0
commit 97842ea930e0eb94bfdb87beaf87d56224c1e8ad
Author: Eric Dumazet <email address hidden>
Date: Sun Apr 24 13:35:09 2022 -0700

    tcp: make sure treq->af_specific is initialized

    BugLink: https://bugs.launchpad.net/bugs/1979566

    commit ba5a4fdd63ae0c575707030db0b634b160baddd7 upstream.
...
    Fixes: 5b0b9e4c2c89 ("tcp: md5: incorrect tcp_header_len for incoming connections")

$ git tag --contains 97842ea930e0 | tail -n1
Ubuntu-5.4.0-123.139

==============

I think we can close.

Cheers,
Walter
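
The range described in the comment above (introduced in 122, fixed in 123) can be turned into a quick check of a kernel release string. This is only a sketch under that stated assumption: the function names are hypothetical, and it only understands 5.4.0 release strings of the form seen in this report, such as `5.4.0-122-generic` or `5.4.0-122.138`.

```python
import re

# Assumption taken from the comment above: the regression appears in
# Ubuntu-5.4.0-122.138 and the fix appears in Ubuntu-5.4.0-123.139.
INTRODUCED_ABI = 122
FIXED_ABI = 123


def abi_number(release):
    """Extract the ABI number from a string like '5.4.0-122-generic'."""
    match = re.fullmatch(r"5\.4\.0-(\d+)(?:[.-].*)?", release)
    if match is None:
        raise ValueError(f"not a 5.4.0 release string: {release!r}")
    return int(match.group(1))


def is_affected(release):
    """True if the release falls in the range between regression and fix."""
    return INTRODUCED_ABI <= abi_number(release) < FIXED_ABI


if __name__ == "__main__":
    for release in ("5.4.0-121-generic", "5.4.0-122-generic", "5.4.0-123-generic"):
        print(release, is_affected(release))
```

On a running machine, the input could come from `platform.release()` in the Python standard library; a release string outside the 5.4.0 series raises `ValueError` here rather than being guessed at.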

timeless (timeless) wrote :

Yes, we installed a test kernel with the fix and the problem went away.

timeless (timeless)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released