[Lenovo Ubuntu 20.04.6&22.04.5 bug] After injecting a memory MCE error, no error logs were obtained from rasdaemon

Bug #2088250 reported by lijian
This bug affects 1 person
Affects: rasdaemon (Ubuntu)
Status: In Progress
Importance: Undecided
Assigned to: Tai Ho
Milestone: (none listed)

Bug Description

[ Impact ]

* This update fixes LP: #2088250, where critical hardware error logs are not correctly recorded in the database.

* This fix has already been implemented and validated in Debian (version 0.8.1-3) and has been successfully running in Debian Bookworm-backports for several months.

* This SRU synchronizes Ubuntu Jammy with the stable Debian backport baseline.

[ Test Plan ]

* Basic verification includes confirming the presence of binaries and the systemd unit files, as well as verifying the daemon's ability to initialize the event database:

for path in "/usr/lib/systemd/system/ras-mc-ctl.service" \
"/usr/lib/systemd/system/rasdaemon.service" \
"/usr/sbin/ras-mc-ctl" \
"/usr/sbin/rasdaemon" \
"/var/lib/rasdaemon"
do
ls -d "$path"
done

# Check event database persistence
ls /var/lib/rasdaemon/ras-mc_event.db

# Test summary reporting tool
/usr/sbin/ras-mc-ctl --summary

# Verify service health and log registration
systemctl status rasdaemon
journalctl -b | grep EDAC

* Note on Error Injection Testing: If you wish to perform active error injection testing, ensure the kernel has been built with EDAC debugfs support (CONFIG_EDAC_DEBUG) and is running on hardware that supports these triggers.
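
Before attempting error injection, the kernel option named in the note above can be checked from the running system. The following is a minimal sketch, assuming the distribution installs the kernel config at /boot/config-$(uname -r) (the usual Ubuntu location). The helper name has_edac_debug is hypothetical and not part of rasdaemon.

```shell
#!/bin/sh
# Sketch only: has_edac_debug is a hypothetical helper, not a rasdaemon tool.
# It succeeds when the given kernel config file sets CONFIG_EDAC_DEBUG=y.
has_edac_debug() {
    # $1: path to a kernel config file
    grep -Eq '^CONFIG_EDAC_DEBUG=y$' "$1"
}

# Assumption: the running kernel's config is installed under /boot.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ] && has_edac_debug "$cfg"; then
    echo "CONFIG_EDAC_DEBUG is enabled"
else
    echo "CONFIG_EDAC_DEBUG is not enabled (or $cfg is not readable)"
fi
```

If the option is not enabled, the basic verification steps in the test plan still apply, but active injection through EDAC debugfs is unlikely to be available.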

[ Where problems could occur ]

* Standalone Nature: rasdaemon is an optional, standalone monitoring application. It is not part of the critical boot path or core kernel functions. If the daemon were to fail or crash, it would not impact system stability, network connectivity, or the ability to boot.

* Large Delta: While the version jump from 0.6.7 to 0.8.1 is significant, this specific version has undergone testing in the Debian ecosystem. As the current Debian Maintainer for rasdaemon, I am overseeing this transition to ensure that the Ubuntu package benefits from the same stability and feature set as the current Debian Backport (https://packages.debian.org/bookworm-backports/rasdaemon).

[ Other Info ]

A traditional debdiff is not provided because of the extensive delta between the legacy Jammy version and the current Debian backport. Instead, the reference build in my PPA can be reviewed: https://launchpad.net/~tai271828/+archive/ubuntu/rasdaemon-deb-dev/

=========== Original Bug Description Below =================

Release: 20.04.6 and 22.04.5

rasdaemon version: 0.6.5 (20.04.6), 0.6.7 (22.04.5)

Describe:
    After injecting a memory MCE error, no error logs were obtained from rasdaemon
---------------------------------------
# root@test:/tmp# ras-mc-ctl --errors
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.
--------------------------------------
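
The symptom above can be turned into a scripted check. This is a minimal sketch that only looks for the "No MCE errors." line exactly as printed in this report. It assumes that line is absent once MCE events have been recorded, and the helper name no_mce_recorded is hypothetical.

```shell
#!/bin/sh
# Sketch only: no_mce_recorded is a hypothetical helper, not part of rasdaemon.
# It reads `ras-mc-ctl --errors` output on stdin and succeeds when the
# "No MCE errors." line is present, i.e. the symptom reported in this bug.
no_mce_recorded() {
    grep -Fxq 'No MCE errors.'
}

# Sample input copied from the output shown in this report.
sample='No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.'

if printf '%s\n' "$sample" | no_mce_recorded; then
    echo "FAIL: no MCE events recorded"
else
    echo "OK: MCE events present"
fi
```

On a test machine the sample can be replaced by the live output after an injection, for example: ras-mc-ctl --errors | no_mce_recorded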

After upgrading rasdaemon to 0.8.1-3 on 22.04.5, MCE errors are recorded by rasdaemon. Due to the complexity of handling dependencies, we have not yet attempted 0.8.1-3 on 20.04.6.
Also, I found a similar bug https://bugs.launchpad.net/ubuntu/+source/rasdaemon/+bug/2058328.

Tags: focal jammy
Revision history for this message
lijian (lijian35) wrote :
lijian (lijian35)
description: updated
Revision history for this message
xiaochun Lee (xavier-lee) wrote :

Hi Canonical, are there any updates on this issue?
For 22.04.5, we would like the latest rasdaemon version, 0.8.1-3 (see https://launchpad.net/ubuntu/+source/rasdaemon), to be made available through your official repository.
For 20.04.6, installing rasdaemon 0.8.1-3 pulls in many other package dependencies (including glibc), so we have not tested it there. We also hope your official repository can provide the latest version together with its dependencies.

As for the issue itself, I did a lot of debugging. It seems that the glibc function poll() is not woken up even when the ftrace file it monitors has data, for example /sys/kernel/debug/tracing/instances/rasdaemon/per_cpu/cpu0/trace_pipe_raw.

This issue may be fixed by rasdaemon commit 6986d81 ("rasdaemon: Fix poll() on per_cpu trace_pipe_raw blocks indefinitely") and kernel commit 3e46d91 ("tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw").
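
When debugging along these lines, a first step is confirming that the rasdaemon tracing instance and its per-CPU raw pipes exist at all. The following is a minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and the script runs as root. The helper name count_raw_pipes is hypothetical. It only counts the files and does not read from them or reproduce the poll() hang.

```shell
#!/bin/sh
# Sketch only: count_raw_pipes is a hypothetical helper, not part of rasdaemon.
# It prints how many per-CPU trace_pipe_raw files exist under a tracing
# instance directory; it never opens or reads the pipes.
count_raw_pipes() {
    # $1: tracing instance directory
    n=0
    for f in "$1"/per_cpu/cpu*/trace_pipe_raw; do
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Path taken from the comment above; prints 0 if the instance is missing
# or the directory is not accessible to the current user.
count_raw_pipes /sys/kernel/debug/tracing/instances/rasdaemon
```

A count of 0 while rasdaemon is running would suggest a setup problem rather than the poll() behaviour described here.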

summary: - After injecting a memory MCE error, no error logs were obtained from
- rasdaemon
+ [Lenovo Ubuntu 20.04.6&22.04.5 bug] After injecting a memory MCE error,
+ no error logs were obtained from rasdaemon
Revision history for this message
Jeff Lane  (bladernr) wrote :

This is a universe package and not something we maintain right now. It is synced from upstream, but the version in each LTS is frozen. I doubt that we would backport 0.8.1-3 all the way back to 22.04 from Plucky, but I am checking internally and have pointed folks to this bug to take a look and see what, if anything, we could do for now.

Revision history for this message
xiaochun Lee (xavier-lee) wrote :

@Jeff
Thanks for your reply. At the moment we care most about updating rasdaemon from 0.6.5 to 0.8.1-3 on 20.04.6, since we have a deadline to meet in our internal project. As you know, updating to 0.8.1-3 means resolving a bunch of dependencies, so would it be possible to backport just the corresponding patches separately?
Or, if you consider 20.04.6 no longer supported, would you please clarify that here so that we can treat this as a permanent limitation for our internal issue? Thanks!

Revision history for this message
xiaochun Lee (xavier-lee) wrote :

@Jeff
Sorry to push you, but are there any updates on this?

Revision history for this message
Jeff Lane  (bladernr) wrote :

There is no further update for now. I am still chasing this internally to try to get some time allocated to it by the team that the maintainer is a part of. As this is not something that we have a business obligation to resolve, finding the time is proving difficult, but I will continue to ask about it.

Revision history for this message
Jeff Lane  (bladernr) wrote :

No further updates at this time.

Revision history for this message
Jeff Lane  (bladernr) wrote :

No updates, I did reach out to the Debian maintainer but have not heard back.

Revision history for this message
xiaochun Lee (xavier-lee) wrote :

@Jeff, are there any updates on this issue at this point?

Revision history for this message
Tai Ho (tai271828) wrote :

I can help with the backport of 0.8.1-3 to Jammy. However, for Focal, I doubt whether the backport is a good idea (or even "feasible"), for the following reasons:

1. Focal is reaching its End of Standard Support in a few days.
2. The rasdaemon 0.8.1-3 dependencies are largely incompatible with Focal (e.g. it requires libc6 (>= 2.34) and libtraceevent1 (>= 5.4)). Revamping core components like libc6 on a release nearing end of support is likely "forbidden."

As an alternative for Focal, I suggest simply compiling rasdaemon from source with the required libc version. However, please note that this approach is at your own risk, as the resulting binary may not be well tested by the community.
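
The libc6 constraint in point 2 can be checked with a short script. This is a minimal sketch: version_ge is a hypothetical helper that defers to dpkg --compare-versions when dpkg is available and otherwise falls back to GNU sort -V, which is only an approximation of Debian version ordering. The versions 2.31 and 2.35 are illustrative inputs, not values queried from a system.

```shell
#!/bin/sh
# Sketch only: version_ge is a hypothetical helper.
# It succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    if command -v dpkg >/dev/null 2>&1; then
        dpkg --compare-versions "$1" ge "$2"
    else
        # Rough fallback: GNU sort -V is not identical to Debian ordering.
        [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    fi
}

required="2.34"   # from the dependency quoted above: libc6 (>= 2.34)
for installed in 2.31 2.35; do   # illustrative versions only
    if version_ge "$installed" "$required"; then
        echo "libc6 $installed satisfies >= $required"
    else
        echo "libc6 $installed is too old for >= $required"
    fi
done
```

On a real system the installed version can be obtained with dpkg-query -W -f='${Version}\n' libc6 and passed as the first argument.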

Tai Ho (tai271828)
description: updated
Tai Ho (tai271828)
Changed in rasdaemon (Ubuntu):
status: New → In Progress
assignee: nobody → Tai Ho (tai271828)
Revision history for this message
John Chittum (jchittum) wrote :

non-SRU member, but I do not believe what is in the PPA would pass an SRU.

https://documentation.ubuntu.com/project/SRU/howto/introduction-to-sru/

Ubuntu does not, by default, synchronize with Debian backports. It follows a stable release process that offers targeted updates to packages addressing specific bugs. If a targeted update is not possible, then an exception can be made.

https://documentation.ubuntu.com/project/SRU/howto/prepare-special/

There's more info on special SRUs

https://documentation.ubuntu.com/project/SRU/reference/special/#reference-special-types-of-sru

For these SRU exceptions, you need to add a specific page dictating all the ins and outs of the exception. The other path is to attempt the Backports pocket instead.

https://documentation.ubuntu.com/project/how-ubuntu-is-made/processes/backports/

The backport process is still in the old wiki because it is not a well-used process

https://wiki.ubuntu.com/UbuntuBackports

In this case, if a targeted patch for the specific bug can be found, then an SRU will be much easier. If not, then using the PPA provided by tai271828 may be an option. The diff is rather large, so at least for me, in this drive-by, finding the specific patch isn't possible.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Unsubscribing sponsors for now, since this is not in a state to be picked up at this point. Please subscribe ubuntu-sponsors again once the previous comment is addressed.
