MCE errors since latest kernel update

Bug #1747944 reported by Achim Behrens
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned

Bug Description

Hi, I am running Ubuntu 16.04 server on a Dell PowerEdge T20. Since the latest Meltdown/Spectre kernel update I get "mce: [Hardware Error]: Machine check events logged" in the logs. This happens when the server is compiling software in jobs scheduled via cron.

The mcelog output shows entries like:
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
TIME 1517991144 Wed Feb 7 09:12:24 2018
MCG status:
MCi status:
Corrected error
Error enabled
MCA: Internal parity error
STATUS 90000040000f0005 MCGSTATUS 0
MCGCAP c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60
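
For readers who want to cross-check the raw STATUS value against the decoded lines above, here is a minimal Python sketch. It assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM; the function name `decode_mci_status` is made up for this illustration and is not part of mcelog. The corrected-error count field is only meaningful on CPUs that report it, which is assumed here.

```python
def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its architectural fields.

    Bit positions are an assumption based on the Intel SDM layout.
    """
    return {
        "valid": bool((status >> 63) & 1),            # VAL: register holds a valid error
        "overflow": bool((status >> 62) & 1),         # OVER: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),      # UC: 0 means the error was corrected
        "enabled": bool((status >> 60) & 1),          # EN: error reporting was enabled
        "misc_valid": bool((status >> 59) & 1),       # MISCV: MCi_MISC holds extra data
        "addr_valid": bool((status >> 58) & 1),       # ADDRV: MCi_ADDR holds an address
        "context_corrupt": bool((status >> 57) & 1),  # PCC: processor context corrupted
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38, where supported
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,                  # 0x0005 is "internal parity error"
    }


if __name__ == "__main__":
    # STATUS value copied from the mcelog entry in this report.
    fields = decode_mci_status(0x90000040000F0005)
    for name, value in fields.items():
        print(f"{name}: {value}")
```

For the value in this report, the sketch yields a valid, enabled, corrected error with MCA code 0x0005, which agrees with the "Corrected error", "Error enabled" and "Internal parity error" lines printed by mcelog.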

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-112-generic 4.4.0-112.135
ProcVersionSignature: Ubuntu 4.4.0-112.135-generic 4.4.98
Uname: Linux 4.4.0-112-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.4.0-112-generic.
AplayDevices: aplay: device_list:268: no soundcards found...
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
ArecordDevices: arecord: device_list:268: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/controlC0', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed Feb 7 16:49:09 2018
HibernationDevice: RESUME=UUID=9b654b0f-96ef-49f0-bfd8-ad7907cb7b99
InstallationDate: Installed on 2017-05-17 (266 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Dell Inc. PowerEdge T20
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-112-generic root=UUID=9edb2ffb-791e-4988-b3a9-7ce73227802c ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-112-generic N/A
 linux-backports-modules-4.4.0-112-generic N/A
 linux-firmware 1.157.15
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/26/2017
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 0VD5HY
dmi.board.vendor: Dell Inc.
dmi.board.version: A07
dmi.chassis.type: 6
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd12/26/2017:svnDellInc.:pnPowerEdgeT20:pvr00:rvnDellInc.:rn0VD5HY:rvrA07:cvnDellInc.:ct6:cvr:
dmi.product.name: PowerEdge T20
dmi.product.version: 00
dmi.sys.vendor: Dell Inc.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Achim Behrens (k1l) wrote :

I just saw that Dell revoked the A15 BIOS update for the PowerEdge T20, so I downgraded to BIOS A14. /proc/cpuinfo now shows microcode 0x22 again; it was 0x23 on the revoked BIOS A15.

It looks like the MCE errors no longer appear under load.
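
The microcode check described in this comment could be scripted roughly as follows. This is only a sketch: it assumes the x86 `microcode : 0x..` field format in /proc/cpuinfo, and the helper name `microcode_revisions` is hypothetical.

```python
def microcode_revisions(cpuinfo_text: str) -> set:
    """Return the set of microcode revisions found in /proc/cpuinfo-style text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical CPU has its own "microcode" line; collect them all.
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


def read_microcode_revisions(path: str = "/proc/cpuinfo") -> set:
    """Read the given file (Linux only) and return its microcode revisions."""
    with open(path) as handle:
        return microcode_revisions(handle.read())
```

On the machine in this report, `read_microcode_revisions()` would be expected to return `{0x22}` after the downgrade to BIOS A14 and `{0x23}` on the revoked A15, going by the values quoted above. More than one element in the set would mean the cores are running different revisions.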

Changed in linux (Ubuntu):
importance: Undecided → High
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired