aws: Support hibernation on Graviton

Bug #2060992 reported by Philip Cox
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
linux-aws (Ubuntu)    In Progress   Undecided   Philip Cox
  Jammy               Fix Released  Undecided   Philip Cox
  Mantic              Fix Released  Undecided   Philip Cox
  Noble               Fix Released  Undecided   Philip Cox

Bug Description

SRU Justification:

[Impact]
This change has two parts:
  - KVM and guest support for the PSCI SYSTEM_OFF2 (hibernate) call
  - Guest kernel support for clean boot on demand

For KVM and guest support for the PSCI SYSTEM_OFF2 (hibernate) call:

PSCI v1.3 adds SYSTEM_OFF2, which is analogous to the ACPI S4 state.

This allows hosting environments to determine that a guest has hibernated rather than simply powered off, so that they can preserve the virtual environment and let the guest resume safely (or bump the hardware_signature in the FACS to trigger a clean reboot instead).
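PSCI calls such as SYSTEM_OFF2 are made through the Arm SMC Calling Convention (SMCCC), where a 32-bit function ID packs the call type, calling convention, owning entity and function number. As a rough sketch of that encoding, the shell arithmetic below composes the SYSTEM_OFF2 IDs. The function number 0x15 and the HIBERNATE_OFF type value of 1 reflect our reading of the PSCI v1.3 specification and are assumptions to verify against it; the helper names are invented for illustration.

```shell
#!/bin/sh
# Illustrative composition of SMCCC function IDs for PSCI SYSTEM_OFF2.
# Assumed values (check against the PSCI v1.3 specification):
#   SYSTEM_OFF2 function number = 0x15, HIBERNATE_OFF type = 0x1
# Requires 64-bit shell arithmetic (true for dash and bash on 64-bit hosts).

FAST_CALL=$(( 1 << 31 ))    # bit 31: fast (atomic) call
SMC64=$(( 1 << 30 ))        # bit 30: SMC64/HVC64 calling convention
STD_SECURE=$(( 4 << 24 ))   # bits 29:24 = 4: Standard Secure Service (owner of PSCI)
SYSTEM_OFF2_FN=$(( 0x15 ))  # assumed PSCI function number for SYSTEM_OFF2
HIBERNATE_OFF=1             # assumed "type" argument requesting hibernate-off

# Hypothetical helpers: build the 32-bit and 64-bit function IDs for a PSCI function number
psci_fn32() { printf '0x%08X\n' $(( FAST_CALL | STD_SECURE | $1 )); }
psci_fn64() { printf '0x%08X\n' $(( FAST_CALL | SMC64 | STD_SECURE | $1 )); }

echo "SYSTEM_OFF2 (SMC32): $(psci_fn32 "$SYSTEM_OFF2_FN")"
echo "SYSTEM_OFF2 (SMC64): $(psci_fn64 "$SYSTEM_OFF2_FN")"
echo "hibernate type argument: $HIBERNATE_OFF"
```

Under these assumptions the script prints 0x84000015 and 0xC4000015. Passing 0 instead yields 0x84000000, the PSCI_VERSION call, which is a quick sanity check of the encoding.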

For Guest kernel support for clean boot on demand:

The FACS (Firmware ACPI Control Structure) table is optional, but it can be used to communicate a hardware_signature value. If this value changes between hibernation and the next boot, a clean reboot should happen instead of a resume from hibernation.
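The resume-or-clean-reboot rule above boils down to a comparison. The following is a hypothetical sketch in shell (the real check is in the kernel's C code, and `resume_decision` is an invented name): it compares the signature saved with the hibernation image against the one read at the next boot. It deliberately ignores any policy about whether the signature is honoured at all, which the jammy changelog shows is handled by separate patches ("Honour ACPI hardware signature by default for virtual guests").

```shell
#!/bin/sh
# Hypothetical illustration of the resume-or-clean-reboot decision.
#   $1 = hardware_signature recorded when the hibernation image was written
#   $2 = hardware_signature reported by the FACS at the next boot
resume_decision() {
  saved="$1"
  current="$2"
  if [ -z "$saved" ] || [ -z "$current" ]; then
    # Assumption: with no FACS/signature there is nothing to compare, so resume
    echo resume
  elif [ "$saved" = "$current" ]; then
    # Same (virtual) hardware as at hibernation time: safe to resume the image
    echo resume
  else
    # Signature changed: the saved image may not match the hardware, boot cleanly
    echo clean-boot
  fi
}

resume_decision 0x00001234 0x00001234   # prints: resume
resume_decision 0x00001234 0x00005678   # prints: clean-boot
```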

On hardware-reduced platforms[0], this table may exist, but it is currently not detected or exposed.
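To see what a guest actually receives, the hardware_signature can be read from the raw table. This is a minimal sketch under stated assumptions: that the kernel exposes the table at /sys/firmware/acpi/tables/FACS (readable only by root, and likely absent on hardware-reduced platforms without the ACPICA change), that the field sits at byte offset 8 as in the ACPI FACS layout, and that the host is little-endian, because `od -t x4` decodes in host byte order. `facs_hw_signature` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the FACS hardware_signature as hex.
# Assumes a little-endian host (od -t x4 uses host byte order).
facs_hw_signature() {
  table="${1:-/sys/firmware/acpi/tables/FACS}"
  if [ ! -r "$table" ]; then
    echo "FACS not readable: $table" >&2
    return 1
  fi
  # Bytes 0..3 must be the ASCII table signature "FACS"
  if [ "$(head -c 4 "$table")" != "FACS" ]; then
    echo "not a FACS table: $table" >&2
    return 1
  fi
  # Skip the 4-byte signature and 4-byte length, then read one 32-bit word
  sig=$(od -An -tx4 -j8 -N4 "$table" | tr -d ' \n')
  echo "0x$sig"
}
```

Comparing the value before hibernation and after resume is one way to observe whether the host bumped it.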

[Fix]

The changes for KVM and guest support for the PSCI SYSTEM_OFF2 (hibernate) call come from:
   https://<email address hidden>

The changes for guest kernel support for clean boot on demand come from:
   https://<email address hidden>

The latest patches have been picked from:
   - noble/mantic: https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate-6.8
   - jammy: https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate-5.15

[Test Plan]
AWS test.

[Where problems could occur]
On hardware-reduced platforms that incorrectly support or advertise a FACS table, hibernation may break if the hardware signature it reports changes, because the kernel will then perform a clean reboot instead of resuming.

[Other info]
SF# 00383181

[0]: See Section 4.1 of the ACPI spec for info on hardware-reduced platforms.
https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/04_ACPI_Hardware_Specification/ACPI_Hardware_Specification.html

Philip Cox (philcox)
description: updated
dwmw2 (dwmw2) wrote :

The ACPICA patch is merged upstream: https://github.com/acpica/acpica/commit/b3496dece6de2709373ad7338698ce91dec5215d

So I've reposted the kernel patches to reference the ACPICA commit ID:
https://<email address hidden>/

As before, the full set of patches is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/psci-hibernate-6.8

Philip Cox (philcox)
Changed in linux-aws (Ubuntu Mantic):
assignee: nobody → Philip Cox (philcox)
status: New → In Progress
Changed in linux-aws (Ubuntu Noble):
status: New → In Progress
Philip Cox (philcox)
Changed in linux-aws (Ubuntu Jammy):
status: New → In Progress
Philip Cox (philcox)
summary: - aws: Guest kernel support for clean boot on demand
+ aws: Support hibernation on Graviton
Philip Cox (philcox)
description: updated
Changed in linux-aws (Ubuntu Jammy):
assignee: nobody → Philip Cox (philcox)
Philip Cox (philcox)
description: updated
Philip Cox (philcox)
Changed in linux-aws (Ubuntu Jammy):
status: In Progress → Fix Committed
Changed in linux-aws (Ubuntu Mantic):
status: In Progress → Fix Committed
Changed in linux-aws (Ubuntu Noble):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/6.5.0-1021.21 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-mantic-linux-aws' to 'verification-done-mantic-linux-aws'. If the problem still exists, change the tag 'verification-needed-mantic-linux-aws' to 'verification-failed-mantic-linux-aws'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-mantic-linux-aws-v2 verification-needed-mantic-linux-aws
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/5.15.0-1063.69 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws' to 'verification-done-jammy-linux-aws'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws' to 'verification-failed-jammy-linux-aws'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-v2 verification-needed-jammy-linux-aws
Seth Carolan (secarola) wrote :

Batch-tested the patched kernel and achieved a 99+% success rate (4,175 of 4,200 test runs) on CLI/console-initiated hibernate/resume cycles across all supported ARM-based AWS EC2 instance families: C6g(d)(n), C7g(d), M6g(d), M7g(d), R6g(d), R7g(d), and T4g.
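The reported rate can be recomputed from the raw counts. This small awk wrapper (`success_rate` is a hypothetical name) only checks that arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: percentage of passing runs plus the failure count.
#   $1 = passed runs, $2 = total runs
success_rate() {
  awk -v p="$1" -v t="$2" 'BEGIN { printf "%.2f%% (%d failures)\n", 100 * p / t, t - p }'
}

success_rate 4175 4200   # prints: 99.40% (25 failures)
```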

High-level testing details:
1.) Spun up an instance with the patched AMI plus this hibinit-agent patch (https://git.launchpad.net/~secarola/ubuntu/+source/ec2-hibinit-agent/commit/?h=applied/ubuntu/jammy-devel&id=034ec3ffdc8cbd9d319aa5815f02d60ec3e27f93).
2.) Started a bash "heartbeat" script on the instance that pushes a timestamp to a DynamoDB table every 30 seconds.
3.) Hibernated the instance through the AWS CLI.
4.) Resumed the instance through the AWS CLI.
5.) Confirmed "heartbeat" updates to the DynamoDB table after resume.
6.) Repeated once more.
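The CLI-driven cycle above can be approximated with standard AWS CLI commands. This is a sketch, not the script used in the test: it assumes a configured AWS CLI, an instance launched with hibernation enabled, and a caller-supplied INSTANCE_ID; the DRY_RUN switch and function names are illustrative additions. The DynamoDB heartbeat check is left as a comment because the table layout was not described.

```shell
#!/bin/sh
# Sketch of steps 3-6: hibernate and resume an instance twice via the AWS CLI.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One hibernate/resume cycle, waiting for each state transition to finish
hibernate_resume_cycle() {
  id="$1"
  run aws ec2 stop-instances --instance-ids "$id" --hibernate
  run aws ec2 wait instance-stopped --instance-ids "$id"
  run aws ec2 start-instances --instance-ids "$id"
  run aws ec2 wait instance-running --instance-ids "$id"
}

# Only act when an instance ID is provided; two cycles mirror "Repeated once more".
if [ -n "${INSTANCE_ID:-}" ]; then
  for cycle in 1 2; do
    echo "cycle $cycle" >&2
    hibernate_resume_cycle "$INSTANCE_ID" || exit 1
    # Step 5: check the DynamoDB heartbeat table here before the next cycle.
  done
fi
```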

Manually tested instance-initiated hibernation successfully.
High-level testing details:
1.) Spun up an instance with the patched AMI plus this hibinit-agent patch (https://git.launchpad.net/~secarola/ubuntu/+source/ec2-hibinit-agent/commit/?h=applied/ubuntu/jammy-devel&id=034ec3ffdc8cbd9d319aa5815f02d60ec3e27f93).
2.) Connected to the instance and started a bash "heartbeat" script that overwrites a text file with a new timestamp every second:
    #!/bin/bash
    while :
    do
        date > text.txt
        sleep 1
    done
3.) Hibernated the instance from the guest OS:
    sudo swapon --priority=32767 /swap-hibinit
    sudo systemctl hibernate
4.) Confirmed that the hosting environment reported the instance as hibernated rather than shut down.
5.) Resumed the instance through the AWS CLI.
6.) Connected to the instance and confirmed that the timestamp was still being written to the text file:
    cat text.txt
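Two small checks can make the guest-initiated test less manual. Both helpers are hypothetical (they are not part of hibinit-agent): `can_hibernate` looks for "disk" in /sys/power/state, which the kernel lists when hibernation is available, and `heartbeat_is_fresh` confirms after resume that the heartbeat file was rewritten within a few seconds. The script assumes GNU coreutils (`stat -c`), as shipped on Ubuntu.

```shell
#!/bin/sh
# Hypothetical helpers around the guest-initiated hibernation test.

# Before hibernating from the guest: succeed if the kernel offers the "disk" state.
can_hibernate() {
  state_file="${1:-/sys/power/state}"
  grep -qw disk "$state_file"
}

# After resuming: succeed if the heartbeat file was modified within max_age seconds.
heartbeat_is_fresh() {
  file="${1:-text.txt}"
  max_age="${2:-5}"
  [ -f "$file" ] || return 1
  age=$(( $(date +%s) - $(stat -c %Y "$file") ))
  [ "$age" -le "$max_age" ]
}
```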

tags: added: verification-done-jammy-linux-aws
removed: verification-needed-jammy-linux-aws
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws/6.8.0-1009.9 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-aws' to 'verification-done-noble-linux-aws'. If the problem still exists, change the tag 'verification-needed-noble-linux-aws' to 'verification-failed-noble-linux-aws'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-aws-v2 verification-needed-noble-linux-aws
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 6.8.0-1009.9

---------------
linux-aws (6.8.0-1009.9) noble; urgency=medium

  * noble/linux-aws: 6.8.0-1009.9 -proposed tracker (LP: #2064325)

  * aws: Support hibernation on Graviton (LP: #2060992)
    - SAUCE: firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA)
    - SAUCE: KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
    - SAUCE: KVM: arm64: Add support for PSCI v1.2 and v1.3
    - SAUCE: KVM: selftests: Add test for PSCI SYSTEM_OFF2
    - SAUCE: KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
    - SAUCE: arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
    - SAUCE: ACPICA: Detect FACS even for hardware reduced platforms
    - SAUCE: arm64: acpi: Honour firmware_signature field of FACS, if it exists
    - [Config]: Enable hibernate on arm64

  [ Ubuntu: 6.8.0-34.34 ]

  * noble/linux: 6.8.0-34.34 -proposed tracker (LP: #2065167)
  * Packaging resync (LP: #1786013)
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2024.04.29)

  [ Ubuntu: 6.8.0-32.32 ]

  * noble/linux: 6.8.0-32.32 -proposed tracker (LP: #2064344)
  * Packaging resync (LP: #1786013)
    - [Packaging] drop getabis data
    - [Packaging] update variants
    - [Packaging] update annotations scripts
    - [Packaging] debian.master/dkms-versions -- update from kernel-versions
      (main/2024.04.29)
  * Enable Nezha board (LP: #1975592)
    - [Config] Enable CONFIG_REGULATOR_FIXED_VOLTAGE on riscv64
  * Enable Nezha board (LP: #1975592) // Enable StarFive VisionFive 2 board
    (LP: #2013232)
    - [Config] Enable CONFIG_SERIAL_8250_DW on riscv64
  * RISC-V kernel config is out of sync with other archs (LP: #1981437)
    - [Config] Sync riscv64 config with other architectures
  * obsolete out-of-tree ivsc dkms in favor of in-tree one (LP: #2061747)
    - ACPI: scan: Defer enumeration of devices with a _DEP pointing to IVSC device
    - Revert "mei: vsc: Call wake_up() in the threaded IRQ handler"
    - mei: vsc: Unregister interrupt handler for system suspend
    - media: ipu-bridge: Add ov01a10 in Dell XPS 9315
    - SAUCE: media: ipu-bridge: Support more sensors
  * Fix after-suspend-mediacard/sdhc-insert test failed (LP: #2042500)
    - PCI/ASPM: Move pci_configure_ltr() to aspm.c
    - PCI/ASPM: Always build aspm.c
    - PCI/ASPM: Move pci_save_ltr_state() to aspm.c
    - PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
    - PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
    - PCI/ASPM: Disable L1 before configuring L1 Substates
    - PCI/ASPM: Update save_state when configuration changes
  * RTL8852BE fw security fail then lost WIFI function during suspend/resume
    cycle (LP: #2063096)
    - wifi: rtw89: download firmware with five times retry
  * intel_rapl_common: Add support for ARL and LNL (LP: #2061953)
    - powercap: intel_rapl: Add support for Lunar Lake-M paltform
    - powercap: intel_rapl: Add support for Arrow Lake
  * Kernel panic during checkbox stress_ng_test on Grace running noble 6.8
    (arm64+largemem) kernel (LP: #2058557)
    - aio: Fix null ptr deref in aio_complete() wakeup
  * Av...

Changed in linux-aws (Ubuntu Noble):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 5.15.0-1063.69

---------------
linux-aws (5.15.0-1063.69) jammy; urgency=medium

  * jammy/linux-aws: 5.15.0-1063.69 -proposed tracker (LP: #2063712)

  * aws: Support hibernation on Graviton (LP: #2060992)
    - SAUCE: PM: hibernate: Allow ACPI hardware signature to be honoured
    - SAUCE: PM: hibernate: Honour ACPI hardware signature by default for virtual
      guests
    - SAUCE: ACPICA: Detect FACS even for hardware reduced platforms
    - SAUCE: arm64: acpi: Honour firmware_signature field of FACS, if it exists
    - SAUCE: firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA)
    - SAUCE: arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
    - [Config]: Enable hibernate on arm64
    - [Config]: Enable hibernate on arm64

  [ Ubuntu: 5.15.0-111.121 ]

  * jammy/linux: 5.15.0-111.121 -proposed tracker (LP: #2063763)
  * RTL8852BE fw security fail then lost WIFI function during suspend/resume
    cycle (LP: #2063096)
    - wifi: rtw89: download firmware with five times retry
  * Mount CIFS fails with Permission denied (LP: #2061986)
    - cifs: fix ntlmssp auth when there is no key exchange
  * USB stick can't be detected (LP: #2040948)
    - usb: Disable USB3 LPM at shutdown
  * Jammy update: v5.15.153 upstream stable release (LP: #2063290)
    - io_uring/unix: drop usage of io_uring socket
    - io_uring: drop any code related to SCM_RIGHTS
    - selftests: tls: use exact comparison in recv_partial
    - ASoC: rt5645: Make LattePanda board DMI match more precise
    - x86/xen: Add some null pointer checking to smp.c
    - MIPS: Clear Cause.BD in instruction_pointer_set
    - HID: multitouch: Add required quirk for Synaptics 0xcddc device
    - gen_compile_commands: fix invalid escape sequence warning
    - RDMA/mlx5: Fix fortify source warning while accessing Eth segment
    - RDMA/mlx5: Relax DEVX access upon modify commands
    - riscv: dts: sifive: add missing #interrupt-cells to pmic
    - x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h
    - x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault()
    - net/iucv: fix the allocation size of iucv_path_table array
    - parisc/ftrace: add missing CONFIG_DYNAMIC_FTRACE check
    - block: sed-opal: handle empty atoms when parsing response
    - dm-verity, dm-crypt: align "struct bvec_iter" correctly
    - scsi: mpt3sas: Prevent sending diag_reset when the controller is ready
    - ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port
    - drm/amdgpu: Enable gpu reset for S3 abort cases on Raven series
    - Bluetooth: rfcomm: Fix null-ptr-deref in rfcomm_check_security
    - firewire: core: use long bus reset on gap count error
    - ASoC: Intel: bytcr_rt5640: Add an extra entry for the Chuwi Vi8 tablet
    - Input: gpio_keys_polled - suppress deferred probe error for gpio
    - ASoC: wm8962: Enable oscillator if selecting WM8962_FLL_OSC
    - ASoC: wm8962: Enable both SPKOUTR_ENA and SPKOUTL_ENA in mono mode
    - ASoC: wm8962: Fix up incorrect error message in wm8962_set_fll
    - do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak
    - s390/dasd: put block allocation in separat...

Changed in linux-aws (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-aws - 6.5.0-1021.21

---------------
linux-aws (6.5.0-1021.21) mantic; urgency=medium

  * mantic/linux-aws: 6.5.0-1021.21 -proposed tracker (LP: #2063691)

  * aws: Support hibernation on Graviton (LP: #2060992)
    - SAUCE: firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA)
    - SAUCE: KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernation
    - SAUCE: KVM: arm64: Add support for PSCI v1.2 and v1.3
    - SAUCE: KVM: selftests: Add test for PSCI SYSTEM_OFF2
    - SAUCE: KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
    - SAUCE: arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
    - SAUCE: ACPICA: Detect FACS even for hardware reduced platforms
    - SAUCE: arm64: acpi: Honour firmware_signature field of FACS, if it exists
    - [Config]: Enable hibernate on arm64
    - [Config]: Enable hibernate on arm64

  [ Ubuntu: 6.5.0-40.40 ]

  * mantic/linux: 6.5.0-40.40 -proposed tracker (LP: #2063709)
  * [Mantic] Compile broken on armhf (cc1 out of memory) (LP: #2060446)
    - Revert "minmax: relax check to allow comparison between unsigned arguments
      and signed constants"
    - Revert "minmax: allow comparisons of 'int' against 'unsigned char/short'"
    - Revert "minmax: allow min()/max()/clamp() if the arguments have the same
      signedness."
    - Revert "minmax: add umin(a, b) and umax(a, b)"
  * Drop fips-checks script from trees (LP: #2055083)
    - [Packaging] Remove fips-checks script
  * alsa/realtek: adjust max output valume for headphone on 2 LG machines
    (LP: #2058573)
    - ALSA: hda/realtek: fix the hp playback volume issue for LG machines
  * Mantic update: upstream stable patchset 2024-03-27 (LP: #2059284)
    - asm-generic: make sparse happy with odd-sized put_unaligned_*()
    - powerpc/mm: Fix null-pointer dereference in pgtable_cache_add
    - arm64: irq: set the correct node for VMAP stack
    - drivers/perf: pmuv3: don't expose SW_INCR event in sysfs
    - powerpc: Fix build error due to is_valid_bugaddr()
    - powerpc/mm: Fix build failures due to arch_reserved_kernel_pages()
    - powerpc/64s: Fix CONFIG_NUMA=n build due to create_section_mapping()
    - x86/boot: Ignore NMIs during very early boot
    - powerpc: pmd_move_must_withdraw() is only needed for
      CONFIG_TRANSPARENT_HUGEPAGE
    - powerpc/lib: Validate size for vector operations
    - x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
    - perf/core: Fix narrow startup race when creating the perf nr_addr_filters
      sysfs file
    - debugobjects: Stop accessing objects after releasing hash bucket lock
    - regulator: core: Only increment use_count when enable_count changes
    - audit: Send netlink ACK before setting connection in auditd_set
    - ACPI: video: Add quirk for the Colorful X15 AT 23 Laptop
    - PNP: ACPI: fix fortify warning
    - ACPI: extlog: fix NULL pointer dereference check
    - ACPI: NUMA: Fix the logic of getting the fake_pxm value
    - PM / devfreq: Synchronize devfreq_monitor_[start/stop]
    - ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous
      events
    - FS:JFS:UBSAN:array-ind...

Changed in linux-aws (Ubuntu Mantic):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-6.5/6.5.0-1021.21~22.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-aws-6.5' to 'verification-done-jammy-linux-aws-6.5'. If the problem still exists, change the tag 'verification-needed-jammy-linux-aws-6.5' to 'verification-failed-jammy-linux-aws-6.5'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-aws-6.5-v2 verification-needed-jammy-linux-aws-6.5
Seth Carolan (secarola) wrote :

Batch-tested the proposed kernel for Jammy 6.5 and achieved a 99+% success rate on CLI/console-initiated hibernate/resume cycles. Because complete batch tests across all supported ARM-based instance types were run previously, this test was limited to the M6g family as confirmation.

High-level testing details:
1.) Spun up an instance with the patched AMI plus this hibinit-agent PPA (https://launchpad.net/~mitchdz/+archive/ubuntu/ec2-hibinit-agent-arm).
2.) Started a bash "heartbeat" script on the instance that pushes a timestamp to a DynamoDB table every 30 seconds.
3.) Hibernated the instance through the AWS CLI.
4.) Resumed the instance through the AWS CLI.
5.) Confirmed "heartbeat" updates to the DynamoDB table after resume.
6.) Repeated once more.

Manually tested instance-initiated hibernation successfully.
High-level testing details:
1.) Spun up an instance with the patched AMI plus this hibinit-agent patch (https://git.launchpad.net/~secarola/ubuntu/+source/ec2-hibinit-agent/commit/?h=applied/ubuntu/jammy-devel&id=034ec3ffdc8cbd9d319aa5815f02d60ec3e27f93).
2.) Connected to the instance and started a bash "heartbeat" script that overwrites a text file with a new timestamp every second:
    #!/bin/bash
    while :
    do
        date > text.txt
        sleep 1
    done
3.) Hibernated the instance from the guest OS:
    sudo swapon --priority=32767 /swap-hibinit
    sudo systemctl hibernate
4.) Confirmed that the hosting environment reported the instance as hibernated rather than shut down.
5.) Resumed the instance through the AWS CLI.
6.) Connected to the instance and confirmed that the timestamp was still being written to the text file:
    cat text.txt

tags: added: verification-done-jammy-linux-aws-6.5
removed: verification-needed-jammy-linux-aws-6.5