ras-mc-ctl doesn't provide BDF for PCIe errors

Bug #1888423 reported by dann frazier
This bug affects 3 people
Affects             Status        Importance  Assigned to   Milestone
rasdaemon (Debian)  Fix Released  Unknown
rasdaemon (Ubuntu)  Fix Released  Undecided   dann frazier
  Focal             Fix Released  Undecided   dann frazier

Bug Description

[Impact]
rasdaemon provides ras-mc-ctl, a script for querying the rasdaemon database. When displaying PCIe AER events from the database, it doesn't provide any information to identify the associated PCIe device. Knowing that some hardware is reporting errors, but not knowing what hardware that is, isn't terribly helpful.

This information is already stored in the database (has been since 0.6.5 in focal), so we just need to update ras-mc-ctl to display it as well.

[Test Case]
 - Trigger an AER event (how to do so appears to be pretty platform-specific).
 - Check for the Bus/device/function info in the output of ras-mc-ctl.

[Fix]
https://github.com/mchehab/rasdaemon/commit/059a901e97f4091e31c50ce55027daf707638f8d

[Regression Risk]
The change here adds additional content to the output of ras-mc-ctl. Instead of something like this:

PCIe AER events:
1 2020-04-16 22:09:48 +0000 Corrected error: Receiver Error
2 2020-04-16 22:23:24 +0000 Corrected error: Receiver Error

You'll now see something like this:

PCIe AER events:
1 2020-04-16 22:09:48 +0000 0000:0b:00.0 Corrected error: Receiver Error
2 2020-04-16 22:23:24 +0000 0000:0b:00.0 Corrected error: Receiver Error

As with any such unstructured output, it's possible that a user has some code to parse the output that would be confused by the additional content.
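
For anyone who does scrape this output, a minimal sketch of a parser that accepts both the old and the new line format might look like the following. The function name parse_aer_line is hypothetical and not part of rasdaemon, and the sketch assumes the event lines look exactly like the samples above (id, date, time, UTC offset, optional bus/device/function, message).

```sh
# Hypothetical helper, not part of rasdaemon: split one line of the
# "PCIe AER events:" listing into tab-separated fields:
#   id <TAB> timestamp <TAB> device <TAB> message
# The device field is "-" for output from a ras-mc-ctl that does not
# print the bus/device/function. Returns non-zero for non-event lines.
parse_aer_line() {
    printf '%s\n' "$1" | awk '
        # Event lines start with a numeric id and have at least 5 fields.
        ($1 !~ /^[0-9]+$/) || (NF < 5) { exit 1 }
        {
            dev = "-"; first = 5
            # Field 5 is the PCI address only in the new format.
            if ($5 ~ /^[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+\.[0-7]$/) {
                dev = $5; first = 6
            }
            # Rejoin the remaining fields into the message
            # (runs of whitespace collapse to single spaces).
            msg = $first
            for (i = first + 1; i <= NF; i++) msg = msg " " $i
            printf "%s\t%s %s %s\t%s\t%s\n", $1, $2, $3, $4, dev, msg
        }'
}
```

Calling parse_aer_line on each line of the listing and skipping the lines it rejects keeps a script working across the upgrade.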

dann frazier (dannf)
Changed in rasdaemon (Ubuntu):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
dann frazier (dannf)
description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rasdaemon - 0.6.5-2ubuntu1

---------------
rasdaemon (0.6.5-2ubuntu1) groovy; urgency=medium

  * d/p/ras-mc-ctl-PCIe-AER-display-PCIe-dev-name.patch:
    ras-mc-ctl: Display bus/device/function of the PCIe device
    corresponding to an AER event. LP: #1888423.
  * d/p/rasdaemon-fix-the-wrong-declaring-of-sruct-ras_event.patch:
    Fix FTBFS w/ gcc-10.

 -- dann frazier <email address hidden> Tue, 21 Jul 2020 19:57:27 +0000

Changed in rasdaemon (Ubuntu):
status: In Progress → Fix Released
dann frazier (dannf)
description: updated
Changed in rasdaemon (Ubuntu Focal):
status: New → In Progress
assignee: nobody → dann frazier (dannf)
description: updated
Changed in rasdaemon (Debian):
status: Unknown → Confirmed
Changed in rasdaemon (Debian):
status: Confirmed → Fix Released
Robie Basak (racb) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted rasdaemon into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/rasdaemon/0.6.5-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in rasdaemon (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
dann frazier (dannf) wrote :

Verification:

ubuntu@akis:~$ cat einj.sh
#!/bin/bash

D="/sys/kernel/debug/apei/einj"

# supported errors for injection listed in ${D}/available_error_types
# magic numbers from ACPI 6.3 18.6.3 table 18-409
ERROR_TYPE="0x40"

# PCIe SBDF - ACPI 6.3 18.6.3 table 18-410
# Byte 3 – PCIe Segment Description
# Byte 2 – Bus Number
# Byte 1 – Device Number [Bits 7:3], Function Number Bits [2:0]
# Byte 0 - Reserved (all zero)
PCIDEV="0x00e60000"

modprobe einj

sleep 1

echo ${ERROR_TYPE} > ${D}/error_type
echo ${PCIDEV} > ${D}/param4

echo 1 > ${D}/error_inject

ubuntu@akis:~$ sudo ./einj.sh
ubuntu@akis:~$ sudo ras-mc-ctl --errors
No Memory errors.

PCIe AER events:
1 2020-08-11 19:59:36 +0000 0000:0b:00.0 Corrected error: Receiver Error

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1304.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1305.
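
For reuse on other platforms, the param4 value can be derived from a PCI address instead of being hard-coded. The following is only a sketch based on the byte layout given in the comments of einj.sh above; the function name bdf_to_einj_param4 is hypothetical, and the sketch assumes the address is written as ssss:bb:dd.f in hex.

```sh
# Hypothetical helper, not part of rasdaemon or the kernel: convert a
# PCI address "ssss:bb:dd.f" (hex) into an EINJ param4 value, using the
# layout from the einj.sh comments:
#   byte 3 = segment, byte 2 = bus,
#   byte 1 = device [bits 7:3] | function [bits 2:0], byte 0 = zero.
# The body runs in a subshell (parentheses) so its variables stay local.
bdf_to_einj_param4() (
    h='[0-9a-fA-F]'
    case "$1" in
        ${h}${h}${h}${h}:${h}${h}:${h}${h}.[0-7]) ;;
        *) return 1 ;;
    esac
    seg=${1%%:*}; rest=${1#*:}
    bus=${rest%%:*}; devfn=${rest#*:}
    dev=${devfn%.*}; fn=${devfn#*.}
    # Only one byte is available for the segment, and the device
    # number is 5 bits wide; reject anything that does not fit.
    [ $((0x$seg)) -le 255 ] || return 1
    [ $((0x$dev)) -le 31 ] || return 1
    printf '0x%08x\n' \
        $(( (0x$seg << 24) | (0x$bus << 16) | (0x$dev << 11) | (0x$fn << 8) ))
)
```

With this, bdf_to_einj_param4 0000:e6:00.0 prints 0x00e60000, the PCIDEV value used in the script.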

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
dann frazier (dannf) wrote :

Note that the "DBD::SQLite::db prepare failed:" message in the last comment also occurs pre-upgrade, so it is not a regression.

Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for rasdaemon has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package rasdaemon - 0.6.5-1ubuntu1.1

---------------
rasdaemon (0.6.5-1ubuntu1.1) focal; urgency=medium

  * d/p/ras-mc-ctl-PCIe-AER-display-PCIe-dev-name.patch:
    ras-mc-ctl: Display bus/device/function of the PCIe device
    corresponding to an AER event. LP: #1888423.

 -- dann frazier <email address hidden> Wed, 22 Jul 2020 13:51:43 -0600

Changed in rasdaemon (Ubuntu Focal):
status: Fix Committed → Fix Released
Richard Huddleston (rhuddusa) wrote :

I am seeing

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1304.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1305.

dpkg -l | grep rasda
ii rasdaemon 0.6.5-1ubuntu1.1 amd64 utility to receive RAS error tracings

dann frazier (dannf) wrote :

@rhuddusa : yes, I did too (see comment #3), but that has nothing to do with this bug. Please report a new bug for that.

Satish Patel (satish-txt) wrote :

I am running 0.6.5-1ubuntu1.1 and getting the following error on Ubuntu 20.04.

ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.

Pedro Principeza (pprincipeza) wrote :

@satish-txt Please refer to comment #8 [0]. A new LP Bug must be submitted for that error. AFAIU, the error does not affect the problem we solved with this version of `rasdaemon`.

[0] https://bugs.launchpad.net/ubuntu/+source/rasdaemon/+bug/1888423/comments/8
