Lots of mce messages when using 5.8.0 but not with 5.4.0

Bug #1931845 reported by yannek
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I'm running Ubuntu 20.04.2 LTS on a ThinkPad T470. After I installed linux-image-generic-hwe-20.04, the boot kernel switched to 5.8.0.55, and since then a lot of mce messages have appeared in the kernel log (kern.log), e.g.:

Jun 14 10:57:50 monster kernel: [ 0.627088] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 3: 8c40004000100151
Jun 14 10:57:50 monster kernel: [ 0.627089] mce: [Hardware Error]: TSC c619c16f1 ADDR 4414b2940 MISC 306485
Jun 14 10:57:50 monster kernel: [ 0.627090] mce: [Hardware Error]: PROCESSOR 0:806e9 TIME 1623661045 SOCKET 0 APIC 2 microcode de

Using rasdaemon and the fixed ras-mc-ctl script from upstream, these errors decode to:

$ ras-mc-ctl --errors

--snip--
188 2021-06-14 10:54:21 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc400e8000100151, addr=0x2146b9240, misc=0x00516485, walltime=0x60c7193d, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
189 2021-06-14 10:54:22 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc40020000100151, addr=0x4344eee40, misc=0x02526485, walltime=0x60c7193e, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
190 2021-06-14 10:54:26 +0200 error: Instruction CACHE Level-1 Instruction-Fetch Error, mcg mcgstatus=0, mci Error_overflow Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0xcc40064000100151, addr=0x21447e7c0, misc=0x02526485, walltime=0x60c71942, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
--snap--

Is this just better reporting by the 5.8 kernel, or is it a mismatch between kernel and hardware?
I have had no sudden application crashes or other indications of failing hardware, and a few hours of memtest86+ (not the broken version from the repo but a current one from a boot CD) reported no errors.
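
For reference, the status values in the logs above can be decoded by hand. The following is only a minimal sketch, assuming the architectural IA32_MCi_STATUS layout as described in the Intel SDM (VAL/OVER/UC/PCC flags, threshold-based error status in bits 54:53, corrected error count in bits 52:38, cache-hierarchy error code 000F 0001 RRRR TTLL in the low 16 bits). The function name decode_mci_status and the returned field names are made up for illustration and are not part of rasdaemon or the kernel.

```python
# Hypothetical helper: decode an IA32_MCi_STATUS value (assumed Intel SDM layout).
TES = ["no tracking", "green", "yellow", "reserved"]
REQUEST = ["generic error", "generic read", "generic write", "data read",
           "data write", "instruction fetch", "prefetch", "eviction", "snoop"]
TRANSACTION = ["instruction", "data", "generic", "unknown"]
LEVEL = ["L0", "L1", "L2", "generic"]


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit machine-check status value into named fields."""
    mcacod = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER
        "uncorrected": bool((status >> 61) & 1),  # UC
        "pcc": bool((status >> 57) & 1),          # processor context corrupt
        # The next two fields are only meaningful if MCG_CAP advertises
        # TES_P (bit 11) and CMCI_P (bit 10); mcgcap=0x00000c08 has both set.
        "threshold_status": TES[(status >> 53) & 0x3],
        "corrected_count": (status >> 38) & 0x7FFF,
        "mscod": (status >> 16) & 0xFFFF,         # model-specific error code
        "mcacod": mcacod,                         # architectural error code
        "cache": None,
    }
    # Cache hierarchy errors have the form 000F 0001 RRRR TTLL.
    if (mcacod & 0xEF00) == 0x0100:
        rrrr = (mcacod >> 4) & 0xF
        fields["cache"] = {
            "filtered": bool((mcacod >> 12) & 1),
            "request": REQUEST[rrrr] if rrrr < len(REQUEST) else "unknown",
            "transaction": TRANSACTION[(mcacod >> 2) & 0x3],
            "level": LEVEL[mcacod & 0x3],
        }
    return fields


if __name__ == "__main__":
    # Status value taken from the kern.log excerpt above.
    for name, value in decode_mci_status(0x8C40004000100151).items():
        print(f"{name}: {value}")
```

Under these assumptions, the UC bit is clear in every status value quoted in this report, i.e. the hardware flagged them as corrected errors.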
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: yannek 5763 F.... pulseaudio
 /dev/snd/controlC0: yannek 5763 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: KDE
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-01-26 (204 days ago)
InstallationMedia: Kubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
MachineType: LENOVO 20HES0FW00
Package: linux (not installed)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.11.0-25-generic root=/dev/mapper/vgkubuntu-root ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.11.0-25.27~20.04.1-generic 5.11.22
RelatedPackageVersions:
 linux-restricted-modules-5.11.0-25-generic N/A
 linux-backports-modules-5.11.0-25-generic N/A
 linux-firmware 1.187.16
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: focal
Uname: Linux 5.11.0-25-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 10/19/2020
dmi.bios.release: 1.65
dmi.bios.vendor: LENOVO
dmi.bios.version: N1QET90W (1.65 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20HES0FW00
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.35
dmi.modalias: dmi:bvnLENOVO:bvrN1QET90W(1.65):bd10/19/2020:br1.65:efr1.35:svnLENOVO:pn20HES0FW00:pvrThinkPadT470:rvnLENOVO:rn20HES0FW00:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad T470
dmi.product.name: 20HES0FW00
dmi.product.sku: LENOVO_MT_20HE_BU_Think_FM_ThinkPad T470
dmi.product.version: ThinkPad T470
dmi.sys.vendor: LENOVO

yannek (yannek-deactivatedaccount) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1931845

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
yannek (yannek-deactivatedaccount) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected focal
description: updated
yannek (yannek-deactivatedaccount) wrote : CRDA.txt

apport information

yannek (yannek-deactivatedaccount) wrote : CurrentDmesg.txt

apport information

yannek (yannek-deactivatedaccount) wrote : IwConfig.txt

apport information

yannek (yannek-deactivatedaccount) wrote : Lspci.txt

apport information

yannek (yannek-deactivatedaccount) wrote : Lspci-vt.txt

apport information

yannek (yannek-deactivatedaccount) wrote : Lsusb.txt

apport information

yannek (yannek-deactivatedaccount) wrote : Lsusb-t.txt

apport information

yannek (yannek-deactivatedaccount) wrote : Lsusb-v.txt

apport information

yannek (yannek-deactivatedaccount) wrote : ProcCpuinfo.txt

apport information

yannek (yannek-deactivatedaccount) wrote : ProcCpuinfoMinimal.txt

apport information

yannek (yannek-deactivatedaccount) wrote : ProcEnviron.txt

apport information

yannek (yannek-deactivatedaccount) wrote : ProcInterrupts.txt

apport information

yannek (yannek-deactivatedaccount) wrote : ProcModules.txt

apport information

yannek (yannek-deactivatedaccount) wrote : PulseList.txt

apport information

yannek (yannek-deactivatedaccount) wrote : UdevDb.txt

apport information

yannek (yannek-deactivatedaccount) wrote : WifiSyslog.txt

apport information

yannek (yannek-deactivatedaccount) wrote : acpidump.txt

apport information

yannek (yannek-deactivatedaccount) wrote :

The kernel in use is now 5.11.0-25-generic but the symptoms remain the same.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Kai-Heng Feng (kaihengfeng) wrote :

Please test the latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.14-rc7/amd64/
Headers are not needed.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
yannek (yannek-deactivatedaccount) wrote :

It does not happen every day, but still often. And still without discernible problems while working, fingers crossed.

$ uname -a
Linux monster 5.14.0-051400rc7-generic #202108222230 SMP Sun Aug 22 22:33:09 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ ras-mc-ctl --errors | tail
46904 2021-09-10 07:37:33 +0200 error: corrected filtering (some unreported errors in same region) Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c08, status=0x8c20004000101135, addr=0x16ab665c0, misc=0x00142285, tsc=0x184d4b8c3, walltime=0x613af128, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
46905 2021-09-10 07:37:33 +0200 error: corrected filtering (some unreported errors in same region) Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: green, Large number of corrected cache errors. System operating, but might leadto uncorrected errors soon, mcgcap=0x00000c08, status=0x8c20004000101135, addr=0x16ab665c0, misc=0x00142285, tsc=0x184d4c39b, walltime=0x613af128, cpu=0x00000003, cpuid=0x000806e9, apicid=0x00000003, bank=0x00000003
46906 2021-09-10 07:37:33 +0200 error: Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0x8c40004000100135, addr=0x13bbc2f40, misc=0x00142285, tsc=0x185ba6ba4, walltime=0x613af128, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
46907 2021-09-10 07:37:33 +0200 error: Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0x8c40004000100135, addr=0x13bbc2f40, misc=0x00142285, tsc=0x185ba75b2, walltime=0x613af128, cpu=0x00000003, cpuid=0x000806e9, apicid=0x00000003, bank=0x00000003
46908 2021-09-10 07:37:33 +0200 error: Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0x8c40004000100135, addr=0x13bbc2f40, misc=0x00122285, tsc=0x185baa957, walltime=0x613af128, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
46909 2021-09-10 07:37:33 +0200 error: Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0x8c40004000100135, addr=0x15e38fc58, misc=0x00102285, tsc=0x1862c9349, walltime=0x613af128, cpu=0x00000001, cpuid=0x000806e9, apicid=0x00000002, bank=0x00000003
46910 2021-09-10 07:37:33 +0200 error: Data CACHE Level-1 Data-Read Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0x8c40004000100135, addr=0x15e38fc58, misc=0x00102285, tsc=0x1862c99aa, walltime=0x613af128, cpu=0x00000003, cpuid=0x000806e9, apicid=0x00000003, bank=0x00000003
46911 2021-09-10 07:37:33 +0200 error: Generic CACHE Level-1 Eviction Error, mcg mcgstatus=0, mci Corrected_error Threshold based error status: yellow, mcgcap=0x00000c08, status=0x8c40004000100179, addr=0x1507ea6c0, misc=0x00744285, tsc=0x...
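
To see how such records are distributed, the key=value fields at the end of each ras-mc-ctl line can be tallied. This is a rough sketch only: it assumes the line format shown above, skips anything that is not a complete 0x-prefixed hex value (such as the truncated tsc field), and the names parse_fields and tally are hypothetical helpers, not part of rasdaemon.

```python
import re
from collections import Counter

# Matches trailing fields such as "status=0x8c40004000100135" or "cpu=0x00000001".
FIELD_RE = re.compile(r"\b(\w+)=(0x[0-9a-fA-F]+)\b")


def parse_fields(line: str) -> dict:
    """Return the hex key=value fields of one ras-mc-ctl record as integers."""
    return {key: int(value, 16) for key, value in FIELD_RE.findall(line)}


def tally(lines) -> Counter:
    """Count records per (cpu, bank, low 16 bits of status)."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "status" not in fields:
            continue  # not an MCE record, or the status field is missing
        key = (fields.get("cpu"), fields.get("bank"), fields["status"] & 0xFFFF)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Example: ras-mc-ctl --errors | python3 this_script.py
    for (cpu, bank, code), n in tally(sys.stdin).most_common():
        code_text = f"{code:#06x}"
        print(f"cpu={cpu} bank={bank} mcacod={code_text} count={n}")
```

Grouping by CPU and bank makes it easier to tell whether the records come from one core or from the whole package.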


Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired