Activity log for bug #1769730

Date Who What changed Old value New value Message
2018-05-07 19:39:14 dann frazier bug added bug
2018-05-07 19:39:24 dann frazier linux (Ubuntu): status New In Progress
2018-05-07 19:39:31 dann frazier nominated for series Ubuntu Bionic
2018-05-07 19:39:31 dann frazier bug task added linux (Ubuntu Bionic)
2018-05-07 19:39:41 dann frazier linux (Ubuntu Bionic): status New In Progress
2018-05-07 19:39:44 dann frazier linux (Ubuntu Bionic): assignee dann frazier (dannf)
2018-05-07 19:39:46 dann frazier linux (Ubuntu): assignee dann frazier (dannf)
2018-05-07 22:03:55 dann frazier description
Old value: [Impact] The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user. [Test Case] $ sudo apt install rasdaemon # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error. $ sudo ras-mc-ctl --errors # There should be an entry for the injected error, as shown below: No Memory errors. PCIe AER events: 1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error No Extlog errors. No MCE errors. [Regression Risk]
New value: [Impact] The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user. [Test Case] $ sudo apt install rasdaemon # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error. $ sudo ras-mc-ctl --errors # There should be an entry for the injected error, as shown below: No Memory errors. PCIe AER events: 1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error No Extlog errors. No MCE errors. [Regression Risk] Above test was run on x86 & ARM platforms to mitigate regression risk.
2018-05-07 22:06:48 dann frazier attachment added einj-aer.sh https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769730/+attachment/5135673/+files/einj-aer.sh
2018-05-09 18:24:01 Andrew Cloke bug added subscriber Andrew Cloke
2018-05-22 17:45:15 dann frazier description
Old value: [Impact] The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user. [Test Case] $ sudo apt install rasdaemon # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error. $ sudo ras-mc-ctl --errors # There should be an entry for the injected error, as shown below: No Memory errors. PCIe AER events: 1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error No Extlog errors. No MCE errors. [Regression Risk] Above test was run on x86 & ARM platforms to mitigate regression risk.
New value: [Impact] The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user. [Test Case] $ sudo apt install rasdaemon # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error. $ sudo ras-mc-ctl --errors # There should be an entry for the injected error, as shown below: No Memory errors. PCIe AER events: 1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error No Extlog errors. No MCE errors. [Fix] There is a 2-patch upstream fix that addresses this issue and cleanly cherry-picks into Ubuntu. The solution is to not artificially limit which PCIe errors are reported down to the AER driver to those that are recoverable. [Regression Risk] Above test was run on x86 & ARM platforms to mitigate regression risk.
2018-05-23 09:38:06 Stefan Bader linux (Ubuntu Bionic): status In Progress Fix Committed
2018-05-24 18:06:24 Brad Figg tags verification-needed-bionic
2018-05-25 15:08:23 dann frazier tags verification-needed-bionic verification-done-bionic
2018-06-11 15:08:06 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-1092
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-3639
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-8087
2018-06-14 12:16:29 Launchpad Janitor linux (Ubuntu): status In Progress Fix Released