4.15 kernel hard lockup about once a week
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | High | Colin Ian King |
Bionic | Fix Released | High | Unassigned |
Bug Description
== SRU Justification ==
When using zram (as installed and configured with the zram-config package)
systems can lock up after about a week of use. This occurs because of
a hang in a lock in zram.
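Since the hang only affects systems that actually swap to zram, it can help to confirm that a zram device is an active swap area. The following is a minimal sketch, assuming the standard /proc/swaps layout (a header row followed by one row per swap area); the function name and the sample text are hypothetical and not taken from this report.

```python
def zram_swap_devices(swaps_text):
    """Return the device paths of active swap areas backed by zram."""
    devices = []
    for line in swaps_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if fields and fields[0].startswith("/dev/zram"):
            devices.append(fields[0])
    return devices


# Hypothetical sample in the /proc/swaps layout, for illustration only.
SAMPLE = """\
Filename     Type       Size      Used  Priority
/dev/sda2    partition  16291836  0     -2
/dev/zram0   partition  1015804   0     5
"""

print(zram_swap_devices(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/swaps; an empty result would suggest zram swap is not in use.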
== Test Case ==
Run stress-ng --brk 0 --stack 0 in a Bionic amd64 server VM with 1 GB of
memory, 16 CPU threads and zram-config installed. Without the fix the
kernel will hang in a spinlock after 1-2 hours of run time. With the fix,
the hang does not occur. Testing shows that with the fix, 5 x 16 CPU hours
of stress testing with stress-ng works fine without the lockup occurring.
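The test case can be wrapped in a small script so that it is easy to repeat. This is only a sketch: the stress-ng options --brk 0 and --stack 0 come from the test case above, while the --timeout value of 2h is an assumption chosen to cover the 1-2 hour window in which the unfixed kernel hangs, and the function names are hypothetical.

```python
import subprocess


def stress_ng_command(timeout=None):
    """Build the stress-ng invocation from the test case."""
    # A count of 0 asks stress-ng for one stressor per online CPU.
    cmd = ["stress-ng", "--brk", "0", "--stack", "0"]
    if timeout is not None:
        # Assumption: bound the run time; not part of the original test case.
        cmd += ["--timeout", timeout]
    return cmd


def run_test_case(timeout="2h"):
    """Run the stressors and return the stress-ng exit status."""
    return subprocess.run(stress_ng_command(timeout)).returncode
```

Calling run_test_case() on an unfixed kernel is expected to hang the VM rather than return, so it should only be run in a disposable test machine.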
== The fix ==
Backport upstream commit c4d6c4cc7bfd ("zram: correct flag name of ZRAM_ACCESS")
as a prerequisite, followed by a minor context wiggle backport of the fix itself,
commit 3c9959e02547 ("zram: fix lockdep warning of free block handling").
== Regression Potential ==
This touches the zram locking, so the core zram driver is affected. However
the fixes are backports from 5.0, so the fixes have had a fair amount of
testing in later kernels.
My main server has been running into hard lockups about once a week ever since I switched to the 4.15 Ubuntu 18.04 kernel.
When this happens, nothing is printed to the console; it is effectively stuck showing a login prompt. The system is running with panic=1 on the cmdline but isn't rebooting, so the kernel isn't even processing this as a kernel panic.
As this felt like a potential hardware issue, I had my hosting provider give me a completely different system: different motherboard, different CPU, different RAM and different storage. I installed 18.04 on that system and moved my data over. A week later, I hit the issue again.
We've since also had an LXD user report similar symptoms, also on varying hardware:
https:/
My system doesn't have a lot of memory pressure with about 50% of free memory:
root@vorash:~# free -m
              total        used        free      shared  buff/cache   available
Mem:          31819       17574         402         513       13842       13292
Swap:         15909        2687       13222
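To put numbers on the memory pressure, the free -m output above can be turned into percentages. This is a minimal sketch: the figures are copied from the output above, while the parse_free helper is hypothetical and assumes the column layout shown there.

```python
def parse_free(text):
    """Parse `free -m` output into {row label: {column: value in MiB}}."""
    lines = text.strip().splitlines()
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # Rows shorter than the header (Swap) only fill the leading columns.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table


FREE_OUTPUT = """\
total used free shared buff/cache available
Mem: 31819 17574 402 513 13842 13292
Swap: 15909 2687 13222
"""

report = parse_free(FREE_OUTPUT)
mem, swap = report["Mem"], report["Swap"]
available_pct = 100 * mem["available"] / mem["total"]
swap_used_pct = 100 * swap["used"] / swap["total"]
print(f"{available_pct:.1f}% available, {swap_used_pct:.1f}% swap used")
# prints "41.8% available, 16.9% swap used"
```

The "available" column is used rather than "free" because most of the memory not in use by processes is held as buff/cache, which the kernel can reclaim.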
I will now try to increase console logging as much as possible on the system, in the hope that the next time it hangs we can get a better idea of what happened, but I'm not too hopeful given the complete silence on the console when this occurs.
System is currently on:
Linux vorash 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
But I've seen this since the GA kernel on 4.15 so it's not a recent regression.
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Oct 23 16:12 seq
crw-rw---- 1 root audio 116, 33 Oct 23 16:12 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.4
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse:
Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/22822/fd/10: Permission denied
Cannot stat file /proc/22831/fd/10: Permission denied
DistroRelease: Ubuntu 18.04
HibernationDevice:
RESUME=none
CRYPTSETUP=n
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Intel Corporation S1200SP
NonfreeKernelMo
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.173.1
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-38-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: False
dmi.bios.date: 01/25/2018
dmi.bios.vendor: Intel Corporation
dmi.bios.version: S1200SP.
dmi.board.
dmi.board.name: S1200SP
dmi.board.vendor: Intel Corporation
dmi.board.version: H57532-271
dmi.chassis.
dmi.chassis.type: 23
dmi.chassis.vendor: .......
dmi.chassis.
dmi.modalias: dmi:bvnIntelCor
dmi.product.family: Family
dmi.product.name: S1200SP
dmi.product.
dmi.sys.vendor: Intel Corporation
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
Changed in linux (Ubuntu Bionic): | |
importance: | Undecided → High |
tags: | added: bionic kernel-key |
tags: | added: kernel-da-key removed: kernel-key |
tags: | added: cscc |
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
Changed in linux (Ubuntu Bionic): | |
status: | Incomplete → Confirmed |
Changed in linux (Ubuntu): | |
assignee: | nobody → Colin Ian King (colin-king) |
Changed in linux (Ubuntu Bionic): | |
status: | Confirmed → Fix Committed |
no longer affects: | zram-config (Ubuntu) |
no longer affects: | zram-config (Ubuntu Bionic) |
Changed in linux (Ubuntu): | |
status: | Incomplete → Fix Committed |
Changed in linux (Ubuntu): | |
status: | Fix Committed → Fix Released |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1799497
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.