mcelog does not work due to lack of kernel support

Bug #588993 reported by bamyasi
This bug affects 10 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

mcelog does not seem to be happy with the current Ubuntu kernel. This is all I get in /var/log/syslog:

# grep mce /var/log/syslog
Jun 2 18:14:15 xxxxx mcelog: failed to prefill DIMM database from DMI data
Jun 2 18:14:15 xxxxx mcelog: Kernel does not support page offline interface

I can disable the memory database prefill option in /etc/mcelog/mcelog.conf, which silences the first error message. However, dmidecode seems to work just fine and produces reasonable-looking output, so I am not sure why the prefill option of mcelog fails. Also, mcelog never outputs anything, so I suspect it is not functional. I have a cluster of identical hardware and I can simulate a constant stream of MCE errors by misconfiguring memory settings in the BIOS. The errors are successfully fetched and reported by mcelog on the nodes with CentOS 5 installed (kernel 2.6.18-164.15.1.el5). Nothing is reported under Ubuntu Server 10.04 LTS on this same hardware.
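One way to narrow down whether the kernel side is present at all is to check for the interface files directly. The sketch below is only a rough aid: /dev/mcelog is the device named elsewhere in this report, while the two sysfs page-offline paths are an assumption about what mcelog probes for and are not confirmed by anything in this thread. The function name and the `root` parameter are hypothetical and exist only to make the check easy to try against a test directory.

```python
import os

# Relative paths to probe. The sysfs entries are assumed locations of the
# page offline interface; they are not confirmed by this bug report.
CHECKS = {
    "mce_device": "dev/mcelog",
    "soft_offline": "sys/devices/system/memory/soft_offline_page",
    "hard_offline": "sys/devices/system/memory/hard_offline_page",
}


def probe_mce_interfaces(root="/"):
    """Return {name: bool} telling whether each path exists under root."""
    return {
        name: os.path.exists(os.path.join(root, rel))
        for name, rel in CHECKS.items()
    }


if __name__ == "__main__":
    for name, present in probe_mce_interfaces().items():
        print(f"{name:13s} {'present' if present else 'missing'}")
```

A missing page-offline entry would be consistent with the second syslog message, but it says nothing about whether errors can still be read from /dev/mcelog.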

# lsb_release -rd
Description: Ubuntu 10.04 LTS
Release: 10.04

# uname -a
Linux xxxxx 2.6.32-22-server #33-Ubuntu SMP Wed Apr 28 14:34:48 UTC 2010 x86_64 GNU/Linux

# apt-cache policy mcelog
mcelog:
  Installed: 1.0~pre3-1
  Candidate: 1.0~pre3-1
  Version table:
 *** 1.0~pre3-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ lucid/universe Packages
        100 /var/lib/dpkg/status

Fabio Marconi (fabiomarconi) wrote :

Hello
Is this problem present with the latest updates ?
Thanks in advance
Fabio

Changed in ubuntu:
status: New → Incomplete
bamyasi (iadzhubey) wrote :

Yes, it looks like nothing has changed in Ubuntu Server 10.10 and the mcelog interface is still non-functional. The only output I can see in the logs is:

"Dec 6 14:12:47 ula mcelog: failed to prefill DIMM database from DMI data"

The EDAC driver seems to work now (it previously failed to load), but I have no idea how to test mcelog further.

Hardware: dual AMD Opteron 2389, nVidia MCP55Pro chipset

System:

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 10.10
Release: 10.10
Codename: maverick

# uname -a
Linux xxxx 2.6.35-22-server #35-Ubuntu SMP Sat Oct 16 22:02:33 UTC 2010 x86_64 GNU/Linux

--bamyasi

Changed in ubuntu:
status: Incomplete → New
affects: ubuntu → linux (Ubuntu)
tags: added: lucid maverick
Jeremy Foshee (jeremyfoshee) wrote :

Hi bamyasi,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 588993

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
RoyK (roysk) wrote :

I just tried to build the latest mcelog from git on Lucid (2.6.32-22-server), and I get the same error message there, so either it's an unsolved bug in mcelog, or perhaps it's a kernel issue. This is on a dual SuperMicro H8DGU / Opteron 6136 system.

Eric Barch (ericb) wrote :

Experiencing this issue on a Quad Core Athlon II system.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Daniel Nyström (speakman) wrote :

"This is a harmless warning message. [..]"
http://mcelog.org/faq.html#9

Lowell Alleman (lowell-alleman) wrote :

I've been attempting to verify that mcelog is working on my 10.04 LTS system. (After coming across this issue, I wanted to double check that MCE events would actually be logged.) The mcelog website points to a tool called "mce-inject" that will simulate a failure to allow for testing like this. However, each time I attempt to run this tool, I get the following error:

Injecting mce on /dev/mcelog: Invalid argument

Nothing was logged by mcelog, and I'm not sure if this is an issue with the test or with some mce kernel component. It certainly seems like there is a problem somewhere in the stack.

I'm using mce-inject-0.1

Tuomas Heino (iheino+ub) wrote :

For the record, below is the first CE (corrected error) ever reported by mcelog on the hardware I have access to. No UCEs (uncorrected errors), nor even repeating CEs, have been seen. I have not tested mce-inject or the apei/einj.ko kernel module.

mcelog: failed to prefill DIMM database from DMI data
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 0
TIME 1422983269 Tue Feb 3 19:07:49 2015
MCG status:
MCi status:
Corrected error
Error enabled
MCA: Internal parity error
STATUS 90000040000f0005 MCGSTATUS 0
MCGCAP c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60
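The STATUS value in the record above can be cross-checked against the decoded lines by splitting it into bit fields. This is a minimal sketch that assumes the Intel architectural MCi_STATUS layout as I recall it from the Intel SDM (VAL in bit 63, OVER 62, UC 61, EN 60, corrected error count in bits 52:38, MCA error code in bits 15:0); AMD processors use some of these bits differently, and `decode_mci_status` is a hypothetical helper, not part of mcelog.

```python
def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into fields (assumed Intel layout)."""
    def bit(n):
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: register holds a valid error
        "overflow": bit(62),         # OVER: an earlier error was overwritten
        "uncorrected": bit(61),      # UC: error was not corrected by hardware
        "enabled": bit(60),          # EN: error reporting was enabled
        "misc_valid": bit(59),       # MISCV: MCi_MISC holds extra data
        "addr_valid": bit(58),       # ADDRV: MCi_ADDR holds an address
        "context_corrupt": bit(57),  # PCC: processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": status & 0xFFFF,
    }


fields = decode_mci_status(0x90000040000F0005)
for name, value in fields.items():
    print(f"{name:20s} {value}")
```

For the value logged here, the sketch yields valid and enabled set, uncorrected clear, a corrected count of 1, and an MCA code of 5, which agrees with the "Corrected error", "Error enabled" and "Internal parity error" lines that mcelog printed.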

Syslog entries for mcelog startup on up-to-date trusty do still mention that page-offlining is unsupported:
Feb 19 08:26:04 gx1 mcelog: failed to prefill DIMM database from DMI data
Feb 19 08:26:04 gx1 mcelog: Kernel does not support page offline interface

tags: added: trusty