"mcelog --client" produces no output after performing PFA test on Ubuntu 22.04 and SR850v2
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mcelog (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
"mcelog --client" gives no output after running the PFA test. It should produce output once the Intel RAS PFA test has been performed.
I ran the PFA test on Ubuntu 22.04 on a Lenovo SR850v2 system (Cedar Island platform).
However, the "mcelog --client" command produced no output after the test.
I then ran the same test on another OS, RHEL 8.6, using the same "mcelog --client" command with the mcelog package provided by Red Hat. There, "mcelog --client" does show output.
I am not sure whether I missed a step. My steps are below; please advise what I need to do. Thank you.
Test Steps:
1. Prepare the test environment
1-1 Enable the MCE feature in the system UEFI settings.
1-2 Download the Lenovo OneCLI package and set the UEFI environment for the RAS PFA test in Ubuntu 22.04.
https:/
#./onecli config set SystemOobCustom
#./OneCli config set Memory.
#./OneCli config set Memory.
#./OneCli config set Memory.
#./OneCli config set Memory.
#./OneCli config set Memory.
#./OneCli config set Memory.
#./OneCli config set Memory.
#./OneCli config set AdvancedRAS.
#./OneCli config set AdvancedRAS.
#./OneCli config set Memory.PollCEevent enabled --override --log 5
#./OneCli config set SystemOobCustom
#ipmitool raw 0x3A 0xC4 0x03 0x00 0x1A 0x01 0x93 0x2F 0x61 0x63 0x2F 0x69 0x62 0x6D 0x63 0x2F 0x75 0x65 0x66 0x69 0x2F 0x44 0x63 0x69 0x45 0x6E 0x11 0x01
#reboot
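Most of the arguments in the ipmitool raw command above are printable ASCII codes. As a reading aid only, the sketch below decodes that run of hex bytes into text. The helper name `decode_hex_bytes` is hypothetical (it is not part of ipmitool or OneCLI), and the meaning of the remaining non-text bytes is not covered here.

```shell
#!/bin/sh
# decode_hex_bytes: hypothetical helper for illustration only.
# Prints the ASCII character for each 0xNN argument, then a newline.
decode_hex_bytes() {
    for byte in "$@"; do
        # Convert the hex value to a 3-digit octal escape, which POSIX
        # printf understands in its format string (\x escapes are not portable).
        printf "\\$(printf '%03o' "$byte")"
    done
    printf '\n'
}

# The text portion of the raw command's arguments:
decode_hex_bytes 0x2F 0x61 0x63 0x2F 0x69 0x62 0x6D 0x63 0x2F \
                 0x75 0x65 0x66 0x69 0x2F 0x44 0x63 0x69 0x45 0x6E
# prints /ac/ibmc/uefi/DciEn
```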
2. Download the RAS tool from GitHub and compile it
root@test:
mount: /sys/kernel/debug: none already mounted on /run/credential
root@test:
flags for page 20526f: uptodate mmap anon swapbacked
vtop(7ffaa98d1000) = 20526f000
Hit any key to access: ^Z
[1]+ Stopped ./mca-recover
root@test:
0x00000008 Memory Correctable
0x00000010 Memory Uncorrectable non-fatal
0x00000020 Memory Uncorrectable fatal
Injecting Correctable Memory Error
Injecting 10 errors at address 0x20526f000.
System performance will be affected while errors are being injected.
inject times: 1
inject times: 2
inject times: 3
inject times: 4
inject times: 5
inject times: 6
inject times: 7
inject times: 8
inject times: 9
inject times: 10
Injection Complete
3. Check the syslog and the output of "mcelog --client"
root@test:
[ 3.351156] Booting paravirtualized kernel on bare hardware
[ 579.630701] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 162
[ 579.630707] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 579.630709] {1}[Hardware Error]: event severity: corrected
[ 579.630711] {1}[Hardware Error]: Error 0, type: corrected
[ 579.630713] {1}[Hardware Error]: section_type: memory error
[ 579.630714] {1}[Hardware Error]: error_status: 0x0000000000000400
[ 579.630716] {1}[Hardware Error]: physical_address: 0x000000020526f000
[ 579.630718] {1}[Hardware Error]: node: 0 card: 7 module: 0 rank: 0 bank: 13 device: 1 row: 3625 column: 968
[ 579.630719] {1}[Hardware Error]: error_type: 2, single-bit ECC
[ 579.630722] {1}[Hardware Error]: DIMM location: CPU 1 DIMM 5
[ 579.643486] mce: [Hardware Error]: Machine check events logged
root@test:
tail: cannot open '/var/log/mes' for reading: No such file or directory
root@test:
Apr 8 23:37:49 test kernel: [ 579.630709] {1}[Hardware Error]: event severity: corrected
Apr 8 23:37:49 test kernel: [ 579.630711] {1}[Hardware Error]: Error 0, type: corrected
Apr 8 23:37:49 test kernel: [ 579.630713] {1}[Hardware Error]: section_type: memory error
Apr 8 23:37:49 test kernel: [ 579.630714] {1}[Hardware Error]: error_status: 0x0000000000000400
Apr 8 23:37:49 test kernel: [ 579.630716] {1}[Hardware Error]: physical_address: 0x000000020526f000
Apr 8 23:37:49 test kernel: [ 579.630718] {1}[Hardware Error]: node: 0 card: 7 module: 0 rank: 0 bank: 13 device: 1 row: 3625 column: 968
Apr 8 23:37:49 test kernel: [ 579.630719] {1}[Hardware Error]: error_type: 2, single-bit ECC
Apr 8 23:37:49 test kernel: [ 579.630722] {1}[Hardware Error]: DIMM location: CPU 1 DIMM 5
Apr 8 23:37:49 test kernel: [ 579.643486] mce: [Hardware Error]: Machine check events logged
Apr 8 23:37:50 test systemd-
root@test:
root@test:
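To confirm that the logged error corresponds to the injected one, the physical address in the kernel log can be compared with the injection address (0x20526f000 in the run above, which is also what the kernel reported). The following is a minimal sketch. `extract_phys_addrs` is a hypothetical helper that reads kernel log text on standard input, and it assumes the `physical_address: 0x...` line format shown in the dmesg output above.

```shell
#!/bin/sh
# extract_phys_addrs: hypothetical helper for illustration only.
# Reads kernel log text on stdin and prints each reported physical
# address as a decimal number, so leading zeros do not matter.
extract_phys_addrs() {
    sed -n 's/.*physical_address: \(0x[0-9a-fA-F]*\).*/\1/p' |
    while read -r addr; do
        printf '%d\n' "$addr"
    done
}

# Address that was passed to the error injector, as a decimal number.
injected=$(printf '%d' 0x20526f000)

# On the test machine one would pipe in `dmesg`; a sample line is used here.
sample='[ 579.630716] {1}[Hardware Error]: physical_address: 0x000000020526f000'
for addr in $(printf '%s\n' "$sample" | extract_phys_addrs); do
    if [ "$addr" -eq "$injected" ]; then
        echo "match: injected address appears in the kernel log"
    fi
done
```

A match here only shows that the kernel received the corrected-error report; it says nothing about whether the mcelog daemon saw the event.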
information type: | Private Security → Public Security |
information type: | Public Security → Public |
Changed in mcelog (Ubuntu): | |
status: | Incomplete → Invalid |
mcelog version: v181
#git clone https://kernel.googlesource.com/pub/scm/utils/cpu/mce/mcelog.git