2018-05-07 19:39:14 |
dann frazier |
bug |
|
|
added bug |
2018-05-07 19:39:24 |
dann frazier |
linux (Ubuntu): status |
New |
In Progress |
|
2018-05-07 19:39:31 |
dann frazier |
nominated for series |
|
Ubuntu Bionic |
|
2018-05-07 19:39:31 |
dann frazier |
bug task added |
|
linux (Ubuntu Bionic) |
|
2018-05-07 19:39:41 |
dann frazier |
linux (Ubuntu Bionic): status |
New |
In Progress |
|
2018-05-07 19:39:44 |
dann frazier |
linux (Ubuntu Bionic): assignee |
|
dann frazier (dannf) |
|
2018-05-07 19:39:46 |
dann frazier |
linux (Ubuntu): assignee |
|
dann frazier (dannf) |
|
2018-05-07 22:03:55 |
dann frazier |
description |
[Impact]
The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user.
[Test Case]
$ sudo apt install rasdaemon
# On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
$ sudo ras-mc-ctl --errors
# There should be an entry for the injected error, as shown below:
No Memory errors.
PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
No Extlog errors.
No MCE errors.
[Regression Risk] |
[Impact]
The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user.
[Test Case]
$ sudo apt install rasdaemon
# On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
$ sudo ras-mc-ctl --errors
# There should be an entry for the injected error, as shown below:
No Memory errors.
PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
No Extlog errors.
No MCE errors.
[Regression Risk]
The above test was run on x86 & ARM platforms to mitigate regression risk. |
|
2018-05-07 22:06:48 |
dann frazier |
attachment added |
|
einj-aer.sh https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769730/+attachment/5135673/+files/einj-aer.sh |
|
2018-05-09 18:24:01 |
Andrew Cloke |
bug |
|
|
added subscriber Andrew Cloke |
2018-05-22 17:45:15 |
dann frazier |
description |
[Impact]
The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user.
[Test Case]
$ sudo apt install rasdaemon
# On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
$ sudo ras-mc-ctl --errors
# There should be an entry for the injected error, as shown below:
No Memory errors.
PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
No Extlog errors.
No MCE errors.
[Regression Risk]
The above test was run on x86 & ARM platforms to mitigate regression risk. |
[Impact]
The APEI (ACPI Platform Error Interface) interface is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we're currently only reporting "recoverable" errors and not errors of other types (e.g. correctable), thus hiding signs of faulty hardware from the user.
[Test Case]
$ sudo apt install rasdaemon
# On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
$ sudo ras-mc-ctl --errors
# There should be an entry for the injected error, as shown below:
No Memory errors.
PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
No Extlog errors.
No MCE errors.
[Fix]
There is a 2-patch upstream fix that addresses this issue and cleanly cherry-picks into Ubuntu. The fix stops artificially restricting the PCIe errors passed down to the AER driver to only those that are recoverable.
[Regression Risk]
The above test was run on x86 & ARM platforms to mitigate regression risk. |
|
2018-05-23 09:38:06 |
Stefan Bader |
linux (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2018-05-24 18:06:24 |
Brad Figg |
tags |
|
verification-needed-bionic |
|
2018-05-25 15:08:23 |
dann frazier |
tags |
verification-needed-bionic |
verification-done-bionic |
|
2018-06-11 15:08:06 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2018-06-11 15:08:06 |
Launchpad Janitor |
cve linked |
|
2018-1092 |
|
2018-06-11 15:08:06 |
Launchpad Janitor |
cve linked |
|
2018-3639 |
|
2018-06-11 15:08:06 |
Launchpad Janitor |
cve linked |
|
2018-8087 |
|
2018-06-14 12:16:29 |
Launchpad Janitor |
linux (Ubuntu): status |
In Progress |
Fix Released |
|
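The final step of the [Test Case] in the description above (confirming that `ras-mc-ctl --errors` lists the injected error) could be scripted roughly as follows. This is only a sketch: `has_aer_event` is a hypothetical helper, not part of rasdaemon, and it assumes the output layout shown in the description, where a "PCIe AER events:" header is followed by one line per event and the section ends at the next "No ... errors." or "...events:" / "...errors:" line.

```shell
#!/bin/sh
# Hypothetical helper (not part of rasdaemon): reads `ras-mc-ctl --errors`
# output on stdin and succeeds (exit 0) if the "PCIe AER events:" section
# contains at least one non-blank entry line. Assumes the output layout
# shown in the bug description.
has_aer_event() {
    awk '
        # Start of the AER section; the header itself is not an entry.
        /^PCIe AER events:/ { in_aer = 1; next }
        # Any other summary or section header line ends the AER section.
        /^No .* errors\./ || /events:$/ || /errors:$/ { in_aer = 0 }
        # A non-blank line inside the section counts as a recorded event.
        in_aer && NF > 0 { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a real system (requires rasdaemon and root), after injecting an error:
#   sudo ras-mc-ctl --errors | has_aer_event && echo "AER event recorded"
```

The helper only checks that some AER entry exists; it does not verify the severity or error type of the entry.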