linux-aws: Xen / hibernation: xen-netfront panic + resume hangs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-aws (Ubuntu) | Fix Released | High | Andrea Righi |
Eoan | Fix Released | High | Andrea Righi |
Focal | Fix Released | High | Andrea Righi |
Bug Description
[Impact]
During our AWS testing we were able to trigger some hibernation failures in some Xen instance types.
One problem is a kernel panic in the resume callback of the xen-netfront driver. A workaround is to compile the driver as a module and reload it at resume. We were already doing this reload with the bionic kernel, where the driver is built as a module, but for some reason eoan and focal had it statically compiled.
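The reload workaround described above could be scripted roughly as follows. This is a minimal sketch, not the hook actually used in the AWS images: the function names and the `dry_run` switch are hypothetical, and it assumes `modprobe` is on the PATH and that xen-netfront is built as a module.

```python
import subprocess

MODULE = "xen-netfront"  # driver named in the bug description


def reload_commands(module=MODULE):
    """Return the commands that unload and then reload a kernel module."""
    return [
        ["modprobe", "-r", module],  # remove the module
        ["modprobe", module],        # load it again
    ]


def reload_module(module=MODULE, dry_run=True):
    """Run the reload commands, or only return them when dry_run is set."""
    commands = reload_commands(module)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if modprobe fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling `reload_module(dry_run=False)` as root from a post-resume hook would perform the actual reload. Where that hook lives is system-specific and not covered by this sketch.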
Other issues showed up as hangs on resume. These seem to be prevented by the new Xen/hibernation patch set that Anchal posted to the LKML:
https://<email address hidden>/
This new patch set is still being reviewed, but according to our tests it really seems to fix some of these hangs on resume.
In addition, we can further improve hibernation reliability and performance by applying the updated swapoff optimization patch (which has been merged upstream).
[Test case]
Create a Xen instance in AWS, hibernate/resume multiple times.
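One way to automate the repeated hibernate/resume cycles is sketched below. It is an illustration only, not the harness used for this bug: the function name and the `cycles` default are hypothetical, and `ec2` is assumed to behave like a boto3 EC2 client (for example `boto3.client("ec2")`) attached to an instance that was launched with hibernation enabled.

```python
def hibernate_cycles(ec2, instance_id, cycles=10):
    """Hibernate and resume one instance `cycles` times.

    `ec2` is expected to behave like a boto3 EC2 client.
    Returns the number of completed cycles.
    """
    ids = [instance_id]
    completed = 0
    for _ in range(cycles):
        # Hibernate=True requests hibernation instead of a plain stop
        ec2.stop_instances(InstanceIds=ids, Hibernate=True)
        ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
        # Starting a hibernated instance resumes it from the saved image
        ec2.start_instances(InstanceIds=ids)
        ec2.get_waiter("instance_running").wait(InstanceIds=ids)
        completed += 1
    return completed
```

The `instance_running` waiter only confirms the EC2 state. A guest that hangs or panics on resume can still be reported as running, so a real test would also need a check from inside or against the guest (for example an SSH probe or the console log).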
[Fix]
The following set of fixes can be used to improve hibernation performance and reliability:
- new Xen/hibernation patch set from the LKML (see link above)
- config change to compile xen-netfront as a module
- new swapoff optimization patch
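Whether a given kernel carries the xen-netfront config change can be checked from its config file. The sketch below assumes that `CONFIG_XEN_NETDEV_FRONTEND` is the Kconfig symbol behind xen-netfront and that the config text comes from a file such as `/boot/config-<release>`; the helper names are hypothetical.

```python
def config_value(config_text, symbol):
    """Return the value assigned to a Kconfig symbol ('y', 'm', ...), or None."""
    prefix = symbol + "="
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Covers "# CONFIG_FOO is not set" lines and symbols that are absent
    return None


def netfront_is_module(config_text):
    """True when xen-netfront is built as a loadable module."""
    return config_value(config_text, "CONFIG_XEN_NETDEV_FRONTEND") == "m"
```

The same `config_value` helper returns `None` for a symbol that is disabled, so it can also be used to confirm that an option is not set.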
[Regression potential]
The xen-netfront config change and the new swapoff optimization patch are pretty safe (one is a config change that affects only the xen-netfront driver, the other is a clean cherry-pick of an upstream commit).
The new Xen/hibernation update is fairly large and the new patches are still under review. However, according to our tests it fixes some of the hang issues (it definitely makes things better). Moreover, all the changes affect only Xen and are restricted to the hibernation/resume code paths, so the overall regression potential is minimal.
[See also]
NOTE: the fix mentioned in LP: #1879711 (disable CONFIG_DMA_CMA) was also applied during our tests, and it is also required to make hibernation stable on Xen.
Changed in linux-aws (Ubuntu Eoan):
importance: Undecided → High
Changed in linux-aws (Ubuntu Focal):
importance: Undecided → High
assignee: nobody → Andrea Righi (arighi)
Changed in linux-aws (Ubuntu Eoan):
assignee: nobody → Andrea Righi (arighi)
Changed in linux-aws (Ubuntu):
assignee: nobody → Andrea Righi (arighi)
importance: Undecided → High
summary:
- linux-aws: fix Xen / hibernation issues
+ linux-aws: Xen / hibernation: xen-netfront panic + resume hangs
Changed in linux-aws (Ubuntu Eoan):
status: New → Confirmed
Changed in linux-aws (Ubuntu Focal):
status: New → Confirmed
Changed in linux-aws (Ubuntu):
status: New → Confirmed
Changed in linux-aws (Ubuntu Eoan):
status: Confirmed → Fix Committed
Changed in linux-aws (Ubuntu Focal):
status: Confirmed → Fix Committed
This bug was fixed in the package linux-aws - 5.4.0-1018.18
---------------
linux-aws (5.4.0-1018.18) focal; urgency=medium
* focal/linux-aws: 5.4.0-1016.16 -proposed tracker (LP: #1882686)
* ASoC/amd: add audio driver for amd renoir (LP: #1881046)
- [Config] aws: do not enable amd renoir ASoC audio
* Focal update: v5.4.42 upstream stable release (LP: #1879759)
- [Config] updateconfigs for CC_HAS_WARN_MAYBE_UNINITIALIZED
* linux-aws: Xen / hibernation: xen-netfront panic + resume hangs
(LP: #1881869)
- Revert "UBUNTU SAUCE [aws]: xen: Only restore the ACPI SCI interrupt in
xen_restore_pirqs."
- Revert "UBUNTU SAUCE [aws]: xen: restore pirqs on resume from hibernation."
- Revert "UBUNTU SAUCE [aws]: block: xen-blkfront: consider new dom0 features
on restore"
- Revert "UBUNTU SAUCE [aws]: ACPICA: Enable sleep button on ACPI legacy wake"
- Revert "UBUNTU SAUCE [aws]: mm: swap: improve swap readahead heuristic"
- Revert "UBUNTU SAUCE [aws] PM / hibernate: reduce memory pressure during
image writing"
- Revert "UBUNTU: SAUCE [aws] x86/xen: close event channels for PIRQs in
system core suspend callback"
- Revert "UBUNTU: SAUCE [aws] xen/events: add xen_shutdown_pirqs helper
function"
- Revert "UBUNTU: SAUCE [aws] x86/xen: save and restore steal clock"
- Revert "UBUNTU: SAUCE [aws] xen-time-introduce-xen_-save-restore-
_steal_clock"
- Revert "UBUNTU: SAUCE [aws] xen-netfront: add callbacks for PM suspend and
hibernation support"
- Revert "UBUNTU: SAUCE [aws] x86/xen: add system core suspend and resume
callbacks"
- Revert "UBUNTU: SAUCE [aws] x86/xen: Introduce new function to map
HYPERVISOR_shared_info on Resume"
- Revert "UBUNTU: SAUCE: xen-blkfront: Fixed blkfront_restore to remove a call
to negotiate_mq"
- Revert "UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and
hibernation"
- Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
- Revert "UBUNTU: SAUCE: xen/manage: introduce helper function to know the on-
going suspend mode"
- Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
- xen/blkfront: fix ring info addressing
- [Config] aws: compile xen-netfront as module
- SAUCE: mm: swap: properly update readahead statistics in unuse_pte_range()
- UBUNTU SAUCE [aws]: mm: swap: increase default swap readahead size
- [Config] aws: compile xen-netfront as module (update the right config)
* Restore request-based mode to xen-blkfront for AWS kernels (LP: #1801305)
- SAUCE: xen/manage: keep track of the on-going suspend mode
- SAUCE: xenbus: add freeze/thaw/restore callbacks support
- SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on
Resume
- SAUCE: x86/xen: add system core suspend and resume callbacks
- SAUCE: genirq: Shutdown irq chips in suspend/resume during hibernation
- SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
- SAUCE: xen-netfront: add callbacks for PM suspend and hibernation
- SAUCE: xen/time: introduce xen_{save,restore} ...