[HMEM] kmem: clear poison reported via mce or ACPI patrol scrub notifications
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
intel | Invalid | Undecided | Unassigned | |
linux (Ubuntu) | Expired | Undecided | Unassigned | |
Bug Description
Description:
At run time, the kernel can be notified of new or latent errors in two different ways:
1. If an application trips over a latent/unknown error and the system supports machine check recovery, the kernel is notified via the mce handler and the application is killed. In this case, the kernel should take the opportunity to clear the errors for that page and re-online it so it can be used again in the future.
2. If the 'patrol scrubber' discovers an error at a location that has not yet been accessed, it can send an ACPI notification to the nfit driver. In this case, the kernel should clear the error if the page is not in use. If the page is in use, the application that has it mapped may need to be killed, as in case 1 above.
This additional run-time handling of errors will augment (and be complementary to) the init-time handling in userspace, and having both will give us the best possible coverage for media errors.
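The two run-time paths described above can be sketched as a small decision model. This is only an illustrative userspace toy, not kernel code: `Source`, `Page`, `mapped_by`, and `handle_poison` are hypothetical names invented for the sketch, and it assumes the simplest policy for case 2 (an in-use page is always treated like case 1, although the description only says the application "may" need to be killed).

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Source(Enum):
    MCE = auto()           # case 1: an application consumed the poison
    PATROL_SCRUB = auto()  # case 2: ACPI notification delivered to the nfit driver


@dataclass
class Page:
    # Hypothetical stand-in for a page of persistent memory.
    poisoned: bool = True
    online: bool = True
    mapped_by: set = field(default_factory=set)  # PIDs that have the page mapped


def handle_poison(page, source):
    """Return the ordered list of actions the model takes for one notification."""
    # Case 2, page not in use: clearing the error is all that is needed.
    if source is Source.PATROL_SCRUB and not page.mapped_by:
        page.poisoned = False
        return ["clear-error"]

    # Case 1, or case 2 with the page in use (assumed here to follow case 1):
    # kill the users of the page, then clear the error and re-online the page.
    actions = [f"kill {pid}" for pid in sorted(page.mapped_by)]
    page.mapped_by.clear()
    page.poisoned = False
    page.online = True
    actions += ["clear-error", "re-online"]
    return actions
```

For example, `handle_poison(Page(mapped_by={42}), Source.MCE)` yields the kill, clear, and re-online steps in that order, while a patrol-scrub notification for an unmapped page yields only the clear step.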
Target Release: 22.10
Target Kernel: TBD
description: | updated |
tags: | added: intel-kernel-20.04, removed: intel-kernel-19.10 |
description: | updated |
tags: | added: intel-kernel-20.10, removed: intel-kernel-20.04 |
description: | updated |
tags: | added: intel-kernel-21.04, removed: intel-kernel-20.10 |
description: | updated |
tags: | added: intel-kernel-21.10, removed: intel-kernel-21.04 |
description: | updated |
tags: | added: intel-kernel-22.04, removed: intel-kernel-21.10 |
description: | updated |
tags: | added: intel-kernel-22.10, removed: intel-kernel-22.04 |
Changed in intel: | |
status: | New → Invalid |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1835340
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.