linux-aws: Xen / hibernation: xen-netfront panic + resume hangs

Bug #1881869 reported by Andrea Righi on 2020-06-03
This bug affects 1 person
Affects               Status   Importance   Assigned to     Milestone
linux-aws (Ubuntu)             High         Andrea Righi
  Eoan                         High         Andrea Righi
  Focal                        High         Andrea Righi

Bug Description

[Impact]

During our AWS testing we triggered hibernation failures on some Xen instance types.

One problem is a kernel panic in the resume callback of the xen-netfront driver. A workaround is to compile the driver as a module and reload it at resume (we were already doing this reload with the bionic kernel, which had this driver compiled as a module, but for some reason eoan and focal had it built into the kernel).
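
The reload workaround can be sketched as a systemd-sleep hook. systemd runs every executable in its system-sleep directory with "pre" or "post" as the first argument and the action (for example "hibernate") as the second. The script below is hypothetical: the use of a Python hook, its name, and the choice to unload xen-netfront before hibernation (so the driver's own resume callback is never invoked) and load it again after resume are assumptions, not the mechanism shipped in Ubuntu AWS images.

```python
#!/usr/bin/env python3
"""Hypothetical systemd-sleep hook: unload xen-netfront before hibernation
and load it again after resume (a sketch, not the Ubuntu implementation)."""

import subprocess
import sys
from typing import List

MODULE = "xen-netfront"  # modprobe treats '-' and '_' in module names alike


def plan_commands(phase: str, action: str) -> List[List[str]]:
    """Return the modprobe commands for one systemd-sleep invocation.

    systemd-sleep passes phase "pre" or "post" and an action such as
    "suspend" or "hibernate". Only hibernation is handled here.
    """
    if action != "hibernate":
        return []
    if phase == "pre":
        # Assumption: unload before the image is written, so the driver's
        # resume callback does not run when the image is restored.
        return [["modprobe", "-r", MODULE]]
    if phase == "post":
        # Load a fresh instance of the driver once the system is back.
        return [["modprobe", MODULE]]
    return []


def main(args: List[str]) -> None:
    """Run the planned commands; args are (phase, action) from systemd."""
    if len(args) < 2:
        return
    for cmd in plan_commands(args[0], args[1]):
        subprocess.run(cmd, check=True)  # raises if modprobe fails


if __name__ == "__main__":
    main(sys.argv[1:])
```

To try it, the file would be installed as an executable under /lib/systemd/system-sleep/ and run as root by systemd; suspend and other actions are deliberately left untouched.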

Other issues showed up as hangs on resume; these appear to be prevented by the new Xen/hibernation patch set posted by Anchal to LKML:
https://<email address hidden>/

This patch set is still under review, but in our tests it does appear to fix some of these hangs on resume.

In addition, applying the updated swapoff optimization patch (already merged upstream) further improves hibernation reliability and performance.

[Test case]

Create a Xen-based instance in AWS and hibernate/resume it multiple times.
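
The test case can be scripted from outside the instance. This sketch assumes the instance was launched with hibernation enabled and that `client` behaves like a boto3 EC2 client (`boto3.client("ec2")`); it relies on `stop_instances(..., Hibernate=True)`, `start_instances` and the `instance_stopped`/`instance_running` waiters. The function name is hypothetical.

```python
"""Sketch: drive repeated hibernate/resume cycles of one EC2 instance.

Assumes hibernation was enabled at launch and that `client` behaves like
a boto3 EC2 client (boto3.client("ec2")).
"""

from typing import Any


def hibernate_resume_cycles(client: Any, instance_id: str, cycles: int) -> int:
    """Hibernate and resume `instance_id` `cycles` times; return cycles done."""
    ids = [instance_id]
    done = 0
    for _ in range(cycles):
        # Hibernate=True asks EC2 to hibernate the guest instead of a plain stop.
        client.stop_instances(InstanceIds=ids, Hibernate=True)
        client.get_waiter("instance_stopped").wait(InstanceIds=ids)
        # Starting a hibernated instance resumes it from the saved image.
        client.start_instances(InstanceIds=ids)
        client.get_waiter("instance_running").wait(InstanceIds=ids)
        done += 1
    return done
```

For example, `hibernate_resume_cycles(boto3.client("ec2"), "i-0123456789abcdef0", 10)`, where the instance ID is a placeholder. The `instance_running` waiter only tracks the EC2 instance state: a guest that hangs on resume can still be reported as running, so a successful resume has to be checked separately (for example through the console output or by reaching the instance over the network).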

[Fix]

The following fixes improve hibernation performance and reliability:
 - new Xen/hibernation patch set from the LKML (see link above)
 - config change to compile xen-netfront as a module
 - new swapoff optimization patch
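
To confirm that a given kernel carries the config change, one can inspect its build configuration. This sketch assumes the Ubuntu convention of installing the config as /boot/config-<release>; CONFIG_XEN_NETDEV_FRONTEND is the Kconfig symbol for the xen-netfront driver, and with the fix applied it is expected to be "m". The helper names are hypothetical.

```python
"""Sketch: report how xen-netfront is built according to a kernel config.

Assumes the Ubuntu convention of shipping the build config as
/boot/config-<release>.
"""

import os
from typing import Optional

SYMBOL = "CONFIG_XEN_NETDEV_FRONTEND"  # Kconfig symbol for xen-netfront


def netfront_build_mode(config_text: str) -> Optional[str]:
    """Return "m" (module), "y" (built in), "n" (disabled) or None if absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {SYMBOL} is not set":
            return "n"
        # The trailing '=' avoids matching longer symbols with the same prefix.
        if line.startswith(SYMBOL + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_mode() -> Optional[str]:
    """Check /boot/config-<release> of the running kernel, if it exists."""
    path = f"/boot/config-{os.uname().release}"
    try:
        with open(path) as f:
            return netfront_build_mode(f.read())
    except FileNotFoundError:
        return None
```

On a kernel with the fix, `running_kernel_mode()` would be expected to return "m"; a "y" result means the driver is still built in and cannot be reloaded at resume.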

[Regression potential]

The xen-netfront config change and the new swapoff optimization patch are low risk: the former is a config change that affects only the xen-netfront driver, and the latter is a clean cherry-pick of an upstream commit.

The new Xen/hibernation update is large and its patches are still under review; however, in our tests it appears to fix some of the hang issues, and it definitely makes things better. Moreover, all the changes are Xen-specific and restricted to the hibernation/resume code paths, so the overall regression potential is minimal.

[See also]

NOTE: the fix from LP: #1879711 (disable CONFIG_DMA_CMA) was also applied during our tests; it is also required to make hibernation stable on Xen.


Andrea Righi (arighi) on 2020-06-03
Changed in linux-aws (Ubuntu Eoan):
importance: Undecided → High
Changed in linux-aws (Ubuntu Focal):
importance: Undecided → High
assignee: nobody → Andrea Righi (arighi)
Changed in linux-aws (Ubuntu Eoan):
assignee: nobody → Andrea Righi (arighi)
Changed in linux-aws (Ubuntu):
assignee: nobody → Andrea Righi (arighi)
importance: Undecided → High
Andrea Righi (arighi) on 2020-06-03
summary: - linux-aws: fix Xen / hibernation issues
+ linux-aws: Xen / hibernation: xen-netfront panic + resume hangs
Andrea Righi (arighi) on 2020-06-03
Changed in linux-aws (Ubuntu Eoan):
status: New → Confirmed
Changed in linux-aws (Ubuntu Focal):
status: New → Confirmed
Changed in linux-aws (Ubuntu):
status: New → Confirmed
Changed in linux-aws (Ubuntu Eoan):
status: Confirmed → Fix Committed
Changed in linux-aws (Ubuntu Focal):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.4.0-1018.18

---------------
linux-aws (5.4.0-1018.18) focal; urgency=medium

  * focal/linux-aws: 5.4.0-1016.16 -proposed tracker (LP: #1882686)

  * ASoC/amd: add audio driver for amd renoir (LP: #1881046)
    - [Config] aws: do not enable amd renoir ASoC audio

  * Focal update: v5.4.42 upstream stable release (LP: #1879759)
    - [Config] updateconfigs for CC_HAS_WARN_MAYBE_UNINITIALIZED

  * linux-aws: Xen / hibernation: xen-netfront panic + resume hangs
    (LP: #1881869)
    - Revert "UBUNTU SAUCE [aws]: xen: Only restore the ACPI SCI interrupt in
      xen_restore_pirqs."
    - Revert "UBUNTU SAUCE [aws]: xen: restore pirqs on resume from hibernation."
    - Revert "UBUNTU SAUCE [aws]: block: xen-blkfront: consider new dom0 features
      on restore"
    - Revert "UBUNTU SAUCE [aws]: ACPICA: Enable sleep button on ACPI legacy wake"
    - Revert "UBUNTU SAUCE [aws]: mm: swap: improve swap readahead heuristic"
    - Revert "UBUNTU SAUCE [aws] PM / hibernate: reduce memory pressure during
      image writing"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: close event channels for PIRQs in
      system core suspend callback"
    - Revert "UBUNTU: SAUCE [aws] xen/events: add xen_shutdown_pirqs helper
      function"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: save and restore steal clock"
    - Revert "UBUNTU: SAUCE [aws] xen-time-introduce-xen_-save-restore-
      _steal_clock"
    - Revert "UBUNTU: SAUCE [aws] xen-netfront: add callbacks for PM suspend and
      hibernation support"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: add system core suspend and resume
      callbacks"
    - Revert "UBUNTU: SAUCE [aws] x86/xen: Introduce new function to map
      HYPERVISOR_shared_info on Resume"
    - Revert "UBUNTU: SAUCE: xen-blkfront: Fixed blkfront_restore to remove a call
      to negotiate_mq"
    - Revert "UBUNTU: SAUCE: xen-blkfront: add callbacks for PM suspend and
      hibernation"
    - Revert "UBUNTU: SAUCE: xenbus: add freeze/thaw/restore callbacks support"
    - Revert "UBUNTU: SAUCE: xen/manage: introduce helper function to know the on-
      going suspend mode"
    - Revert "UBUNTU: SAUCE: xen/manage: keep track of the on-going suspend mode"
    - xen/blkfront: fix ring info addressing
    - [Config] aws: compile xen-netfront as module
    - SAUCE: mm: swap: properly update readahead statistics in unuse_pte_range()
    - UBUNTU SAUCE [aws]: mm: swap: increase default swap readahead size
    - [Config] aws: compile xen-netfront as module (update the right config)

  * Restore request-based mode to xen-blkfront for AWS kernels (LP: #1801305)
    - SAUCE: xen/manage: keep track of the on-going suspend mode
    - SAUCE: xenbus: add freeze/thaw/restore callbacks support
    - SAUCE: x86/xen: Introduce new function to map HYPERVISOR_shared_info on
      Resume
    - SAUCE: x86/xen: add system core suspend and resume callbacks
    - SAUCE: genirq: Shutdown irq chips in suspend/resume during hibernation
    - SAUCE: xen-blkfront: add callbacks for PM suspend and hibernation
    - SAUCE: xen-netfront: add callbacks for PM suspend and hibernation
    - SAUCE: xen/time: introduce xen_{save,restore}...

Changed in linux-aws (Ubuntu Focal):
status: Fix Committed → Fix Released