2021-08-18 18:09:17 |
yannek |
description |
I'm using an Ubuntu 20.04.2 LTS install on a ThinkPad T470. After installing linux-image-generic-hwe-20.04, the boot kernel switched to 5.8.0.55, and a lot of MCE messages started appearing in the kernel log (kern.log), e.g.:
Jun 14 10:57:50 monster kernel: [ 0.627088] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: 8c40004000100151
Jun 14 10:57:50 monster kernel: [ 0.627089] mce: [Hardware Error]: TSC c619c16f1 ADDR 4414b2940 MISC 306485
Jun 14 10:57:50 monster kernel: [ 0.627090] mce: [Hardware Error]: PROCESSOR 0:806e9 TIME 1623661045 SOCKET 0 APIC 2 microcode de
Using rasdaemon and the fixed ras-mc-ctl script from upstream, these errors are decoded as:
$ ras-mc-ctl --errors
--snip--
188 2021-06-14 10:54:21 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc400e8000100151, addr=0x2146b9240, misc=0x00516485, walltime=0x60c7193d, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
189 2021-06-14 10:54:22 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc40020000100151, addr=0x4344eee40, misc=0x02526485, walltime=0x60c7193e, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
190 2021-06-14 10:54:26 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc40064000100151, addr=0x21447e7c0, misc=0x02526485, walltime=0x60c71942, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
--snap--
Is this just better reporting by the 5.8 kernel or is this a mismatch of kernel and hardware?
I have no sudden application crashes or other indications of failing hardware, and a few hours of memtest86+ (not the broken version from the repo but a current one from a boot CD) reported no errors. |
I'm using an Ubuntu 20.04.2 LTS install on a ThinkPad T470. After installing linux-image-generic-hwe-20.04, the boot kernel switched to 5.8.0.55, and a lot of MCE messages started appearing in the kernel log (kern.log), e.g.:
Jun 14 10:57:50 monster kernel: [ 0.627088] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: 8c40004000100151
Jun 14 10:57:50 monster kernel: [ 0.627089] mce: [Hardware Error]: TSC c619c16f1 ADDR 4414b2940 MISC 306485
Jun 14 10:57:50 monster kernel: [ 0.627090] mce: [Hardware Error]: PROCESSOR 0:806e9 TIME 1623661045 SOCKET 0 APIC 2 microcode de
Using rasdaemon and the fixed ras-mc-ctl script from upstream, these errors are decoded as:
$ ras-mc-ctl --errors
--snip--
188 2021-06-14 10:54:21 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc400e8000100151, addr=0x2146b9240, misc=0x00516485, walltime=0x60c7193d, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
189 2021-06-14 10:54:22 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc40020000100151, addr=0x4344eee40, misc=0x02526485, walltime=0x60c7193e, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
190 2021-06-14 10:54:26 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc40064000100151, addr=0x21447e7c0, misc=0x02526485, walltime=0x60c71942, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
--snap--
Is this just better reporting by the 5.8 kernel or is this a mismatch of kernel and hardware?
I have no sudden application crashes or other indications of failing hardware, and a few hours of memtest86+ (not the broken version from the repo but a current one from a boot CD) reported no errors.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC1: yannek 5763 F.... pulseaudio
/dev/snd/controlC0: yannek 5763 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: KDE
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-01-26 (204 days ago)
InstallationMedia: Kubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
MachineType: LENOVO 20HES0FW00
Package: linux (not installed)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.11.0-25-generic root=/dev/mapper/vgkubuntu-root ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.11.0-25.27~20.04.1-generic 5.11.22
RelatedPackageVersions:
linux-restricted-modules-5.11.0-25-generic N/A
linux-backports-modules-5.11.0-25-generic N/A
linux-firmware 1.187.16
RfKill:
0: phy0: Wireless LAN
Soft blocked: no
Hard blocked: no
Tags: focal
Uname: Linux 5.11.0-25-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 10/19/2020
dmi.bios.release: 1.65
dmi.bios.vendor: LENOVO
dmi.bios.version: N1QET90W (1.65 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20HES0FW00
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.35
dmi.modalias: dmi:bvnLENOVO:bvrN1QET90W(1.65):bd10/19/2020:br1.65:efr1.35:svnLENOVO:pn20HES0FW00:pvrThinkPadT470:rvnLENOVO:rn20HES0FW00:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad T470
dmi.product.name: 20HES0FW00
dmi.product.sku: LENOVO_MT_20HE_BU_Think_FM_ThinkPad T470
dmi.product.version: ThinkPad T470
dmi.sys.vendor: LENOVO |
|
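For anyone who wants to cross-check what ras-mc-ctl prints against the raw status= values above, here is a minimal Python sketch that splits an IA32_MCi_STATUS value into its fields. The bit positions and the cache-error encoding are my reading of the Intel SDM (Vol. 3B, machine-check architecture) and should be treated as assumptions; the function name decode_mci_status and the dictionary keys are made up for illustration and are not part of rasdaemon or the kernel. The default mcg_cap is the mcgcap=0x00000c08 value from the report.

```python
# Sketch only: field layout assumed from the Intel SDM, Vol. 3B
# (machine-check architecture). Verify against the manual before relying on it.

THRESHOLD_STATUS = {0: "no tracking", 1: "green", 2: "yellow", 3: "reserved"}
CACHE_REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
                 5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
CACHE_TYPE = {0: "instruction", 1: "data", 2: "generic"}
CACHE_LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status, mcg_cap=0xC08):
    """Split an IA32_MCi_STATUS value into its (assumed) architectural fields."""
    def bit(n):
        return bool((status >> n) & 1)

    fields = {
        "valid": bit(63),            # VAL
        "overflow": bit(62),         # OVER
        "uncorrected": bit(61),      # UC
        "enabled": bit(60),          # EN
        "misc_valid": bit(59),       # MISCV
        "addr_valid": bit(58),       # ADDRV
        "context_corrupt": bit(57),  # PCC
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }
    # Bits 52:38 hold the corrected error count when MCG_CAP bit 10 is set.
    if mcg_cap & (1 << 10):
        fields["corrected_count"] = (status >> 38) & 0x7FFF
    # Bits 54:53 hold the threshold-based status when MCG_CAP bit 11 is set
    # and the error is a corrected one.
    if mcg_cap & (1 << 11) and not fields["uncorrected"]:
        fields["threshold_status"] = THRESHOLD_STATUS[(status >> 53) & 0x3]
    # Cache hierarchy errors are encoded as 000F 0001 RRRR TTLL.
    mca = fields["mca_code"]
    if (mca & 0xEF00) == 0x0100:
        fields["cache_error"] = {
            "request": CACHE_REQUEST.get((mca >> 4) & 0xF, "unknown"),
            "type": CACHE_TYPE.get((mca >> 2) & 0x3, "reserved"),
            "level": CACHE_LEVEL[mca & 0x3],
        }
    return fields


if __name__ == "__main__":
    # One value from kern.log and one from the ras-mc-ctl output in the report.
    for raw in (0x8C40004000100151, 0xCC400E8000100151):
        print(hex(raw), decode_mci_status(raw))
```

Under these assumptions, both values decode as corrected (UC clear) L1 instruction-fetch cache errors with threshold status yellow, which is consistent with the ras-mc-ctl text. The difference is that the rasdaemon entries also have the overflow bit set.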