mce: ras: When inject 1bit ecc error, there is no mce log recorded in the dmesg
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Disco | Fix Released | Undecided | Po-Hsu Lin |
Bug Description
== SRU Justification ==
With the 5.0 Disco kernel, no MCE log is recorded in dmesg when a
1-bit ECC error is injected.
== Fix ==
* 09cbd219 (RAS/CEC: Increment cec_entered under the mutex lock)
* de0e0624 (RAS/CEC: Check count_threshold unconditionally)
Commit de0e0624 is the actual fix for this issue. Commit 09cbd219 fixes
a race condition and is included so that de0e0624 applies as a clean
cherry-pick.
Both commits have already landed in newer kernels.
== Test ==
A test kernel can be found here:
https:/
The bug reporter, fan jinke, verified that the patched kernel logs
the error correctly.
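One way to repeat this check is to scan captured dmesg output for the marker strings that appear in the sample log in the original bug report. The following is a minimal sketch, not part of the SRU verification itself: the function names `find_mce_lines` and `corrected_error_logged` are hypothetical, and the markers are assumed to match what a patched kernel prints on the test machine.

```python
import re

# Marker strings copied from the sample dmesg output in this bug report.
# Assumption: a patched kernel prints the same strings after the injection.
MCE_MARKERS = (
    "Machine check events logged",
    "Corrected error, no action required.",
)

# Matches a leading dmesg timestamp such as "[ 1561.511210] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def find_mce_lines(dmesg_text):
    """Return the lines containing "[Hardware Error]", with timestamps removed."""
    hits = []
    for line in dmesg_text.splitlines():
        body = _TIMESTAMP.sub("", line)
        if "[Hardware Error]" in body:
            hits.append(body)
    return hits


def corrected_error_logged(dmesg_text):
    """Return True if every marker appears in some hardware-error line."""
    lines = find_mce_lines(dmesg_text)
    return all(any(marker in line for line in lines) for marker in MCE_MARKERS)


# A few lines taken from the log quoted in the original bug report.
SAMPLE = """\
[ 1561.511210] mce: [Hardware Error]: Machine check events logged
[ 1561.511221] [Hardware Error]: Corrected error, no action required.
[ 1561.511556] [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
"""

if __name__ == "__main__":
    print(corrected_error_logged(SAMPLE))
```

On a live system the input text could be the saved output of `dmesg` after the injection; reading the kernel log may require root privileges, depending on the system configuration.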
== Regression Potential ==
Low. The changes are limited to the RAS Correctable Errors Collector,
and the fix has been verified to work as expected.
== Original Bug Report ==
Using the Linux kernel, when a 1-bit ECC error is injected, MCE log entries are recorded in dmesg, like:
[ 1561.511210] mce: [Hardware Error]: Machine check events logged
[ 1561.511221] [Hardware Error]: Corrected error, no action required.
[ 1561.511311] [Hardware Error]: CPU:0 (18:0:2) MC16_STATUS[
[ 1561.511388] [Hardware Error]: Error Addr: 0x000000077cd66940
[ 1561.511439] [Hardware Error]: IPID: 0x0000009600150f00, Syndrome: 0x000010ce0a400d01
[ 1561.511499] [Hardware Error]: Unified Memory Controller Extended Error Code: 0
[ 1561.511556] [Hardware Error]: Unified Memory Controller Error: DRAM ECC error.
[ 1561.511646] EDAC MC0: 1 CE on mc#0csrow#
[ 1561.511648] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
*But there is no such log when using "Ubuntu 18.04.3 LTS".*
The related upstream commit is de0e0624d86ff9f.
After merging this commit, the Ubuntu kernel's dmesg records the MCE log as well.
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw----+ 1 root audio 116, 1 Dec 24 17:20 seq
crw-rw----+ 1 root audio 116, 33 Dec 24 17:20 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.10-0ubuntu27
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 19.04
InstallationDate: Installed on 2019-12-24 (0 days ago)
InstallationMedia: Ubuntu-Server 19.04 "Disco Dingo" - Release amd64 (20190416.1)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Sugon HygonH210
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=linux
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.178
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: disco
Uname: Linux 5.0.0-13-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 03/15/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 210ER119
dmi.board.
dmi.board.name: HygonH210
dmi.board.vendor: Sugon
dmi.board.version: Default string
dmi.chassis.
dmi.chassis.type: 17
dmi.chassis.vendor: Sugon
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: Rack
dmi.product.name: HygonH210
dmi.product.sku: Default string
dmi.product.
dmi.sys.vendor: Sugon
description: | updated |
Changed in linux (Ubuntu Disco): | |
status: | New → In Progress |
assignee: | nobody → Po-Hsu Lin (cypressyew) |
Changed in linux (Ubuntu): | |
status: | Incomplete → Fix Released |
Changed in linux (Ubuntu Disco): | |
status: | In Progress → Fix Committed |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1857413
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.