System suddenly reboots on high load with 4.13.0-26-generic
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-hwe (Ubuntu) | Invalid | Undecided | Unassigned | | |
Bug Description
Hi, I administer a system that regularly comes under high load at night. Since kernel 4.13.0-26-generic, it suddenly reboots shortly after the load starts. After switching back to kernel 4.10.0-42-generic, everything works fine. I guess earlier versions of 4.13.0 worked too.
The system is running Ubuntu 16.04.3 LTS on an AMD EPYC 7351.
Regards, Marco
# lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04
# apt-cache policy linux-image-
linux-image-
Installed: 4.13.0-
Candidate: 4.13.0-
Version table:
*** 4.13.0-
500 http://
500 http://
100 /var/lib/
I noticed that after a crash the following entries appear in dmesg:
# dmesg |grep BERT
[ 0.000000] ACPI: BERT 0x00000000DA1DB7C0 000030 (v01 AMD AMD BERT 00000001 AMD 00000001)
[ 2.457831] BERT: [Firmware Bug]: Invalid error record.
So I now think this is a hardware failure rather than a bug. I am setting the status from New to Invalid (hoping that's the correct state).
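For anyone who wants to repeat the dmesg check after each unexpected reboot, here is a minimal sketch. The function name `check_bert` is hypothetical and not part of any package. It only looks for the same `BERT: [Firmware Bug]` line quoted above. Finding that line does not by itself prove a hardware fault.

```shell
# Hypothetical helper: reads kernel log text on stdin and reports whether
# it contains a BERT "[Firmware Bug]" line like the one quoted above.
check_bert() {
    if grep -q 'BERT: \[Firmware Bug\]'; then
        echo "BERT firmware bug reported"
    else
        echo "no BERT firmware bug line found"
    fi
}

# Usage on a live system (may need root if dmesg access is restricted):
# dmesg | check_bert
```

Running it once after booting each kernel (4.10.0-42 and 4.13.0-26) would show whether the message appears regardless of kernel version.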