Jammy / Kinetic: Enable Hibernation for Xen Based Instance Types
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-aws (Ubuntu) | Fix Released | Critical | Matthew Ruffell |
Jammy | Fix Released | Critical | gerald.yang |
Kinetic | Fix Released | Critical | Matthew Ruffell |
Bug Description
[Impact]
Hibernation currently fails for all AWS Xen instance types (c3/c4/
When attempting to hibernate, the system gets stuck in sync_inodes_
Upon review of the jammy/linux-aws git log, it appears that the kernel is missing AWS hibernation enablement patches entirely. These need to be included to get hibernation working.
[Fix]
Hibernation currently works on the Amazon Linux 2 5.15 kernel:
https:/
After careful review of the amazon-
xen: Restore xen-pirqs on resume from hibernation
xen-netfront: call netif_device_attach on resume
xen: Only restore the ACPI SCI interrupt in xen_restore_pirqs.
xen: restore pirqs on resume from hibernation.
block: xen-blkfront: consider new dom0 features on restore
x86: tsc: avoid system instability in hibernation
xen-blkfront: Fixed blkfront_restore to remove a call to negotiate_mq
Revert "xen: dont fiddle with event channel masking in suspend/resume"
PM / hibernate: update the resume offset on SNAPSHOT_
x86/xen: close event channels for PIRQs in system core suspend callback
xen/events: add xen_shutdown_pirqs helper function
x86/xen: save and restore steal clock
xen/time: introduce xen_{save,
xen-netfront: add callbacks for PM suspend and hibernation support
xen-blkfront: add callbacks for PM suspend and hibernation
x86/xen: add system core suspend and resume callbacks
x86/xen: Introduce new function to map HYPERVISOR_
xenbus: add freeze/thaw/restore callbacks support
xen/manage: introduce helper function to know the on-going suspend mode
xen/manage: keep track of the on-going suspend mode
These patches will be carried as SAUCE patches, with their subjects marked "UBUNTU: SAUCE [aws]". Their upstream is the Amazon Hibernation team, and their source is the Amazon Linux 2 kernel repo.
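A reviewer checking a kernel tree for these patches, as was done with the jammy/linux-aws git log above, could use a small script to list the ones that are absent. The sketch below is hypothetical. The function name `missing_patches`, the `require_sauce` flag, and matching on plain substrings are illustrative assumptions. It includes only subjects that appear untruncated in the list above.

```python
"""Hypothetical helper: report which AWS hibernation patches are missing
from the lines of a kernel tree's `git log --oneline` output."""

# A subset of the patch subjects listed above, copied verbatim.
# Subjects that are truncated in this bug report are left out on purpose.
EXPECTED_SUBJECTS = [
    "xen: Restore xen-pirqs on resume from hibernation",
    "xen-netfront: call netif_device_attach on resume",
    "x86/xen: save and restore steal clock",
    "xenbus: add freeze/thaw/restore callbacks support",
    "xen/manage: keep track of the on-going suspend mode",
]


def missing_patches(log_lines, expected=EXPECTED_SUBJECTS, require_sauce=True):
    """Return the expected subjects that no log line contains.

    Matching is a plain substring test, so the exact form of the SAUCE
    prefix does not matter (assumption: the original subject text is kept
    intact after the prefix). With require_sauce=True, only lines that
    contain "SAUCE" are considered.
    """
    if require_sauce:
        candidates = [line for line in log_lines if "SAUCE" in line]
    else:
        candidates = list(log_lines)
    return [s for s in expected if not any(s in line for line in candidates)]
```

Passing it the output of `git log --oneline` from a checked-out linux-aws tree, split into lines, returns an empty list when every listed subject is present as a SAUCE commit.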
[Testcase]
1. Log into Amazon EC2.
2. Select Launch Instance.
3. Under Instance Type, select any from (c3/c4/
4. Select the "Ubuntu 22.04 LTS HVM (SSD type)" AMI in the quicklaunch pane.
5. Select your SSH keypair.
6. Under Storage, set the volume size to 20 GB. In the Advanced tab, set Encrypted: Yes.
7. Under Advanced Settings for the instance, set "Stop - Hibernate" to Enable.
8. Create the instance, then SSH in.
9. Wait 5 minutes for hibinit-agent to create the /swap-hibinit swapfile and configure GRUB.
10. Start a screen session, echo some text, and then detach with Ctrl-a d.
11. Log out from instance.
12. In EC2, select "Instance State" > "Hibernate".
13. Wait 30 seconds to one minute. The state will go from "Stopping" to "Stopped".
14. Start the instance again.
15. SSH in.
16. Attempt to resume screen session with "screen -r".
If you are not able to SSH into the instance, hibernation has failed. If SSH works and the screen session is still running, hibernation was successful.
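Steps 12 to 14 can also be driven from a script instead of the EC2 console. The sketch below assumes a boto3-style EC2 client (`boto3.client("ec2")`), whose `stop_instances` call accepts `Hibernate=True` and which provides the `instance_stopped` and `instance_running` waiters. The function name `hibernate_and_restart` is hypothetical. The client is passed in so the flow can be tried against a stub without AWS credentials. Steps 15 and 16 (SSH and `screen -r`) remain manual.

```python
"""Hypothetical sketch: automate testcase steps 12-14 with a boto3-style
EC2 client that is passed in by the caller."""


def hibernate_and_restart(ec2, instance_id):
    """Hibernate an instance, wait for it to stop, then start it again.

    `ec2` is assumed to behave like boto3.client("ec2"): it provides
    stop_instances(..., Hibernate=True), start_instances(), and
    get_waiter() for the "instance_stopped" / "instance_running" waiters.
    """
    ids = [instance_id]
    # Step 12: equivalent of "Instance State" > "Hibernate" in the console.
    ec2.stop_instances(InstanceIds=ids, Hibernate=True)
    # Step 13: block until the state goes from "Stopping" to "Stopped".
    ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    # Step 14: start the instance again and wait until it is running.
    ec2.start_instances(InstanceIds=ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
```

A real run might look like `hibernate_and_restart(boto3.client("ec2"), "i-0123456789abcdef0")`, where the instance ID is a placeholder for the instance created in step 8.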
Alternatively, the CPC team can run their Hibernation testsuite over Jammy and Kinetic.
We have built test kernels for Jammy and Kinetic with the patches, and they are available in the PPA below:
https:/
With the test kernels, hibernation and resume both succeed.
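Before hibernating on a test kernel, it can help to confirm that step 9 of the testcase has taken effect. Resuming from a swapfile needs the kernel command line to name the resume device and the swapfile's offset. The sketch below assumes that hibinit-agent sets the standard `resume=` and `resume_offset=` kernel parameters through GRUB. The helper names `resume_params` and `read_running_cmdline` are hypothetical.

```python
"""Hypothetical check: extract the resume parameters from a kernel
command line such as the contents of /proc/cmdline."""


def resume_params(cmdline):
    """Return (resume, resume_offset) values, using None for any that is absent."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so "resume=UUID=..." keeps "UUID=..." as the value.
        key, sep, value = token.partition("=")
        if sep and key in ("resume", "resume_offset"):
            params[key] = value
    return params.get("resume"), params.get("resume_offset")


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read()
```

On the instance, `resume_params(read_running_cmdline())` returning `(None, None)` would suggest that GRUB has not yet been configured or that the instance has not been rebooted since.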
[Where problems could occur]
We are adding a significant amount of code to the Xen subsystem, spread across many commits. This code has not been mainlined, and is instead maintained out of tree by the Amazon AWS Hibernation team.
The changes touch hibernation, block, network, and clock code specific to the devices used on AWS Xen instances. Most of these patches have been applied to Xenial, Bionic, Focal and other series for a long time, but some are new for 5.15 onward.
The changes will only target linux-aws to try and limit regression risk to AWS users, and any regressions will be limited to users of Xen based instance types (c3/c4/
If a regression were to occur, the instance would likely fail to hibernate and, at worst, write an incomplete hibernation image to the swapfile. The kernel would detect this on the next boot and, instead of resuming from the hibernation image, start fresh. This is unlikely to cause any filesystem corruption on the rootfs, but any in-progress computations at the time of hibernation could be lost. By comparison, the current broken behaviour breaks networking, and users have to power cycle the instance a few times before they can SSH in again.
summary: jammy/linux-aws hibernation timeout on xen instances → Jammy / Kinetic: Enable Hibernation for Xen Based Instance Types
Changed in linux-aws (Ubuntu Jammy): status: New → In Progress
Changed in linux-aws (Ubuntu Kinetic): status: New → In Progress
Changed in linux-aws (Ubuntu Jammy): importance: Undecided → Critical
Changed in linux-aws (Ubuntu Kinetic): importance: Undecided → Critical
Changed in linux-aws (Ubuntu Jammy): assignee: nobody → gerald.yang (gerald-yang-tw)
Changed in linux-aws (Ubuntu Kinetic): assignee: nobody → Matthew Ruffell (mruffell)
description: updated
tags: added: jammy kinetic sts
description: updated
Changed in linux-aws (Ubuntu Jammy): status: In Progress → Fix Committed
Changed in linux-aws (Ubuntu Kinetic): status: In Progress → Fix Committed
In this screenshot, the system appears to have resumed: the login screen is shown along with the messages from the hibernation memory consumption utility. The first memory message was generated before hibernation (it matches the message from the pre-hibernation image). The second message could have been generated either before hibernation or after resume; there isn't enough data to know for sure.