Google Confidential Compute fails to boot with shim version 1.47

Bug #1931254 reported by Joshua Powers
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | Undecided | Khaled El Mously | |
| linux (Ubuntu Hirsute) | Won't Fix | Undecided | Khaled El Mously | |
| linux-gcp (Ubuntu) | Fix Released | Undecided | Unassigned | |
| linux-gcp (Ubuntu Hirsute) | Fix Released | Undecided | Unassigned | |

Bug Description

# Overview

Hirsute and Impish daily builds are currently not booting on Google Confidential Compute. Confidential Compute is Google's platform for running VMs with the Secure Encrypted Virtualization (SEV) extension on AMD EPYC CPUs. Booting an image with shim version 1.45 works, but once shim is upgraded to 1.47, the VM no longer boots and the kernel panics instead.

Launching the image with secure boot but without confidential compute works as expected.

# Expected result

The system is able to reboot after the upgrade.

# Actual result

Kernel panic: https://paste.ubuntu.com/p/mHrvVc6qBc/

# Steps to reproduce

Launch a VM in GCE with confidential compute enabled, using an image with serial v20210511a or later, and check the serial log for the kernel panic. Example CLI command to launch a VM:

$ gcloud beta compute instances create $USER-confidential-testing --zone=us-west1-b --machine-type=n2d-standard-2 --image=daily-ubuntu-2104-hirsute-v20210511a --image-project=ubuntu-os-cloud-devel --confidential-compute --maintenance-policy=TERMINATE

The last known good image is daily-ubuntu-2104-hirsute-v20210510. The failure appears when shim-signed is updated from 1.46+15.4-0ubuntu1 to 1.47+15.4-0ubuntu2.

# Logs & notes

* 20210510 manifest (good): https://paste.ubuntu.com/p/QjnMPcJj7G/
* 20210511a manifest (bad): https://paste.ubuntu.com/p/PvJQwRXHcG/
* diff between manifests: https://paste.ubuntu.com/p/4nJtGxqGn7/
* serial logs of failed boot: https://paste.ubuntu.com/p/mHrvVc6qBc/

# Cause

shim changed the memory type for pages reserved for EFI runtime services, from EfiRuntimeServicesData to EfiBootServicesData.

Memory reserved for EFI runtime/boot services must be remapped as encrypted in the kernel (during boot) if SEV (Secure Encrypted Virtualization) is enabled. The original kernel implementation of ioremap correctly mapped the region as encrypted only for EfiRuntimeServicesData regions, so when shim changed the type to EfiBootServicesData, the kernel bug was exposed.

Note that this affects all 5.11 kernels, not just GCP. GCP may simply be the only cloud that currently uses SEV (for "Confidential Computing").

# Fix

Both EfiRuntimeServicesData and EfiBootServicesData must be mapped as encrypted if SEV is active, as per:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d651ee9c71bb12fc0c8eb2786b66cbe5aa3e43b
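
The difference between the old and new behaviour can be illustrated with a small standalone sketch. This is not the kernel code from the commit above: the function names are hypothetical, the model only covers the two memory types named in this report, and the real ioremap path may apply further conditions. The numeric values of the memory types follow the UEFI specification.

```c
#include <stdbool.h>

/* EFI memory types, numbered as in the UEFI specification.
 * Only the types relevant to this report are listed. */
enum efi_mem_type {
    EFI_BOOT_SERVICES_DATA    = 4,
    EFI_RUNTIME_SERVICES_DATA = 6,
    EFI_CONVENTIONAL_MEMORY   = 7,
};

/* Hypothetical model of the behaviour described under "Cause":
 * only EfiRuntimeServicesData regions are mapped encrypted. */
bool map_encrypted_before_fix(bool sev_active, enum efi_mem_type type)
{
    return sev_active && type == EFI_RUNTIME_SERVICES_DATA;
}

/* Hypothetical model of the behaviour described under "Fix":
 * EfiBootServicesData regions are mapped encrypted as well. */
bool map_encrypted_after_fix(bool sev_active, enum efi_mem_type type)
{
    if (!sev_active)
        return false;
    return type == EFI_RUNTIME_SERVICES_DATA ||
           type == EFI_BOOT_SERVICES_DATA;
}
```

With SEV active, a region that shim 1.47 now marks as EfiBootServicesData gets `false` from the first function and `true` from the second, which is the mismatch that led to the panic.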

# Test

Without the fix applied, I confirmed that I could reproduce the issue described here (complete failure to boot, kernel panic).
With the fix applied, I confirmed that the VM boots without issues.

# Regression potential

The fix could potentially cause boot failures if a memory region is marked encrypted when it shouldn't be. I assume that would cause a panic similar to the one seen in this bug:

general protection fault, probably for non-canonical address 0x314836c31124d346: 0000 [#1] SMP NOPTI
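
For readers unfamiliar with the "non-canonical address" wording in that message, the following sketch shows the check. It assumes x86-64 with 4-level paging (48-bit virtual addresses), where bits 63 to 47 must all be equal; the function name is hypothetical. Data read through a mapping with the wrong encryption setting is effectively garbage, and a garbage value used as a pointer usually fails this check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is canonical for 48-bit virtual addressing,
 * i.e. bits 63..47 are either all zero or all one.
 * Assumption: 4-level paging; 5-level paging uses 57 bits instead. */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* the 17 uppermost bits */
    return top == 0 || top == 0x1FFFF;
}
```

Calling `is_canonical_48(0x314836c31124d346ULL)` with the address from the panic message returns `false`.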

summary: - Google Confidnetial Compute fails to boot with 1.47
+ Google Confidential Compute fails to boot with 1.47
Peter Gonda (pgonda) wrote : Re: Google Confidential Compute fails to boot with 1.47

I think that this kernel needs a new fix that has been posted to KVM mailing list. The stack trace seems to match the report and the fix.

https://www.spinics.net/lists/kernel/msg3968888.html

Steve Langasek (vorlon) wrote :

Per the last comment, if the kernel is panicking, we should look at this from the kernel side.

Khaled El Mously (kmously) wrote :

@Peter Gonda -- You seem to be 100% correct.

Thank you for the helpful pointer!

I have confirmed the fix and shim-side of the changes that exposed the issue.

I will SRU the fix asap.

no longer affects: shim-signed (Ubuntu)
description: updated
description: updated
description: updated
no longer affects: linux-gcp (Ubuntu)
no longer affects: linux-gcp (Ubuntu Hirsute)
Changed in linux (Ubuntu):
assignee: nobody → Khaled El Mously (kmously)
Changed in linux (Ubuntu Hirsute):
assignee: nobody → Khaled El Mously (kmously)
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
description: updated
description: updated
description: updated
summary: - Google Confidential Compute fails to boot with 1.47
+ Google Confidential Compute fails to boot with shim version 1.47
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Joshua Powers (powersj) wrote :

Verification steps:

1. Enabled Confidential Compute
2. Change image to daily-ubuntu-2104-hirsute-v20210510 (last working image)
3. Launched image
4. Enabled proposed
5. Updated system https://paste.ubuntu.com/p/mNrZNkVYtB/
6. Rebooted successfully! https://paste.ubuntu.com/p/G9CmvhW9tx/

Marking verification-done

tags: added: verification-done-hirsute
removed: verification-needed-hirsute
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Kleber Sacilotto de Souza (kleber-souza) wrote :

This fix has already been released with hirsute/linux-gcp 5.11.0-1015.17 which is currently in hirsute-updates. For the generic hirsute/linux kernel the fix will be applied for the next SRU cycle.

no longer affects: linux-gcp (Ubuntu)
no longer affects: linux (Ubuntu Hirsute)
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
assignee: nobody → Khaled El Mously (kmously)
Changed in linux-gcp (Ubuntu Hirsute):
status: New → Fix Released
Kleber Sacilotto de Souza (kleber-souza) wrote :

This fix has already been released with impish/linux 5.13.0-11.1. Marking the task for the development kernel as fix released.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Changed in linux-gcp (Ubuntu):
status: New → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-gcp - 5.11.0-1016.18+21.10.2

---------------
linux-gcp (5.11.0-1016.18+21.10.2) impish; urgency=medium

  * impish/linux-gcp: 5.11.0-1016.18+21.10.2 -proposed tracker (LP: #1936486)

  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * Disable Bluetooth in cloud kernels (LP: #1840488)
    - [Config] gcp: Disable CONFIG_BT

  [ Ubuntu: 5.11.0-1016.18 ]

  * hirsute/linux-gcp: 5.11.0-1016.18 -proposed tracker (LP: #1938651)
  * Some cloud kernels have Android related config options disabled
    (LP: #1928686)
    - [config] gcp: Enable Android options for anbox
  * Disable Bluetooth in cloud kernels (LP: #1840488)
    - [config] gcp: Disable CONFIG_BT
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * large_dir in ext4 broken (LP: #1933074)
    - SAUCE: ext4: fix directory index node split corruption
  * Add l2tp.sh in net from ubuntu_kernel_selftests back (LP: #1934293)
    - Revert "UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test"
  * icmp_redirect.sh in net from ubuntu_kernel_selftests failed on F-OEM-5.6 /
    F-OEM-5.10 / F-OEM-5.13 / F / G / H (LP: #1880645)
    - selftests: icmp_redirect: support expected failures
  * Mute/mic LEDs no function on some HP platfroms (LP: #1934878)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 450 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 630 G8
  * [SRU][OEM-5.10/H] Fix HDMI output issue on Intel TGL GPU (LP: #1934864)
    - drm/i915: Fix HAS_LSPCON macro for platforms between GEN9 and GEN10
  * mute/micmute LEDs no function on HP EliteBook 830 G8 Notebook PC
    (LP: #1934239)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 830 G8 Notebook PC
  * ubuntu-host driver lacks lseek ops (LP: #1934110)
    - ubuntu-host: add generic lseek op
  * ubuntu_kernel_selftests ftrace fails on arm64 F / aws-5.8 / amd64 F
    azure-5.8 (LP: #1927749)
    - selftests/ftrace: fix event-no-pid on 1-core machine
  * Hirsute update: upstream stable patchset 2021-06-29 (LP: #1934012)
    - proc: Track /proc/$pid/attr/ opener mm_struct
    - ASoC: max98088: fix ni clock divider calculation
    - ASoC: amd: fix for pcm_read() error
    - spi: Fix spi device unregister flow
    - spi: spi-zynq-qspi: Fix stack violation bug
    - bpf: Forbid trampoline attach for functions with variable arguments
    - net/nfc/rawsock.c: fix a permission check bug
    - usb: cdns3: Fix runtime PM imbalance on error
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Glavey TM800A550L tablet
    - ASoC: Intel: bytcr_rt5640: Add quirk for the Lenovo Miix 3-830 tablet
    - vfio-ccw: Reset FSM state to IDLE inside FSM
    - vfio-ccw: Serialize FSM IDLE state with I/O completion
    - ASoC: sti-sas: add missing MODULE_DEVICE_TABLE
    - spi: sprd: Add missing MODULE_DEVICE_TABLE
    - usb: chipidea: udc: assign interrupt number to USB gadget structure
    - isdn: mISDN: netjet: Fix crash in nj_probe:
    - bonding: init notify_work earlier to avoid uninitialized use
    - netlink: disable IRQs for netlink_lock_table()
    - net: mdiobus: get...

Changed in linux-gcp (Ubuntu):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Won't Fix