[SRU] unable to boot guest with large memory when SEV is enabled on host

Bug #1989446 reported by gerald.yang
This bug affects 1 person
Affects                   Status         Importance  Assigned to   Milestone
grub2-unsigned (Ubuntu)   Fix Released   High        gerald.yang
  Focal                   Fix Committed  High        gerald.yang
  Jammy                   Fix Committed  High        gerald.yang
  Kinetic                 Fix Released   High        gerald.yang

Bug Description

[ Impact ]

When a large-memory guest (Focal or Jammy) running the 5.15 kernel is booted on an SEV-enabled host, it fails to boot and dmesg shows the following error:
software IO TLB: Cannot allocate buffer

However, a Fedora 36 guest boots fine on an SEV-enabled host.

With this kernel commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e998879d4fb7991856916972168cf27c0d86ed12

SWIOTLB sizes its bounce buffer at between 64MB and 1G of contiguous memory, depending on how much memory the system has (6% of total memory, clamped to that range).

In sev_setup_arch():

size = total_mem * 6 / 100;
size = clamp_val(size, IO_TLB_DEFAULT_SIZE, SZ_1G);
swiotlb_adjust_size(size);
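As a rough illustration, the user-space sketch below reproduces this sizing rule. It assumes IO_TLB_DEFAULT_SIZE is 64MB (the lower bound mentioned above) and replaces the kernel's clamp_val() with plain comparisons. sev_swiotlb_size() is a hypothetical name, not a kernel function. For the 18G guest used in the test plan, 6% is roughly 1.08G, so the result is clamped to 1G.

```c
#include <stdint.h>

/* Assumed values: IO_TLB_DEFAULT_SIZE is 64MB, SZ_1G is 1GiB. */
#define IO_TLB_DEFAULT_SIZE (64ULL << 20)
#define SZ_1G               (1ULL << 30)

/* Hypothetical user-space mirror of the sizing in sev_setup_arch():
 * 6% of total memory, clamped to [64MB, 1G]. */
uint64_t sev_swiotlb_size(uint64_t total_mem)
{
    uint64_t size = total_mem * 6 / 100;

    if (size < IO_TLB_DEFAULT_SIZE)
        size = IO_TLB_DEFAULT_SIZE;   /* small guests get the 64MB floor */
    if (size > SZ_1G)
        size = SZ_1G;                 /* large guests are capped at 1G */
    return size;
}
```

For example, sev_swiotlb_size(18ULL << 30) returns 1G, while a 512MB guest gets the 64MB minimum.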

Looking at the memory block layout when booting through Fedora's grub, the available memory blocks are:
[ 0.005879] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x0
[ 0.005881] memory[0x1] [0x0000000000100000-0x000000007e9ecfff], 0x000000007e8ed000 bytes flags: 0x0
[ 0.005883] memory[0x2] [0x000000007eb1b000-0x000000007fb9afff], 0x0000000001080000 bytes flags: 0x0
[ 0.005885] memory[0x3] [0x000000007fbff000-0x000000007ffdffff], 0x00000000003e1000 bytes flags: 0x0
[ 0.005886] memory[0x4] [0x0000000100000000-0x00000004ffffffff], 0x0000000400000000 bytes flags: 0x0

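The biggest one below 4G is the following. memory[0x4] starts at 4G and does not qualify, because SWIOTLB allocates its buffer from low memory: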
[ 0.005881] memory[0x1] [0x0000000000100000-0x000000007e9ecfff], 0x000000007e8ed000 bytes flags: 0x0

Its size is close to 2G, which is enough for SWIOTLB to allocate 1G of contiguous memory.
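Assuming the SWIOTLB buffer must come from below 4G, the sketch below picks the block with the most usable bytes under that limit from a memblock-style list. struct mem_block and largest_low_block() are hypothetical names. Ranges use the inclusive end addresses printed in dmesg, so last - base + 1 reproduces the byte counts shown (0x7e8ed000 for memory[0x1]).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the SWIOTLB buffer must be allocated below 4G. */
#define LOW_LIMIT 0x100000000ULL

/* A memory block as printed by memblock: [base, last], inclusive. */
struct mem_block {
    uint64_t base;
    uint64_t last;
};

/* Return the index of the block with the most bytes below LOW_LIMIT,
 * or -1 if no block has any memory there. Hypothetical helper. */
int largest_low_block(const struct mem_block *blocks, size_t n)
{
    int best = -1;
    uint64_t best_size = 0;

    for (size_t i = 0; i < n; i++) {
        if (blocks[i].base >= LOW_LIMIT)
            continue;                         /* entirely above 4G */
        /* Clip blocks that straddle 4G to the low part only. */
        uint64_t last = blocks[i].last < LOW_LIMIT ? blocks[i].last
                                                   : LOW_LIMIT - 1;
        uint64_t size = last - blocks[i].base + 1;
        if (size > best_size) {
            best_size = size;
            best = (int)i;
        }
    }
    return best;
}
```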

Next, we need to exclude the reserved memory blocks that overlap this region:
[ 0.005892] reserved[0x2] [0x00000000574a7000-0x0000000059313fff], 0x0000000001e6d000 bytes flags: 0x0
[ 0.005894] reserved[0x3] [0x000000007e133018-0x000000007e17e057], 0x000000000004b040 bytes flags: 0x0
[ 0.005896] reserved[0x4] [0x000000007e845018-0x000000007e845857], 0x0000000000000840 bytes flags: 0x0
[ 0.005897] reserved[0x5] [0x000000007ee95698-0x000000007ee95af7], 0x0000000000000460 bytes flags: 0x0

Now the biggest available range is
[0x0000000000100000-0x00000000574a7000]
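The subtraction can be checked mechanically. The sketch below uses half-open [start, end) ranges, so an inclusive dmesg end such as 0x59313fff becomes 0x59314000. It also assumes a reserved list that is sorted by address and has no overlaps, as in the dumps here. struct range and largest_free_gap() are hypothetical names used for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Find the largest free gap inside `region` after removing `rsv`.
 * Assumes `rsv` is sorted by start and its entries do not overlap.
 * Hypothetical helper for illustration. */
struct range largest_free_gap(struct range region,
                              const struct range *rsv, size_t n)
{
    struct range best = { 0, 0 };
    uint64_t cursor = region.start;   /* start of the current free gap */

    for (size_t i = 0; i <= n; i++) {
        /* The gap ends at the next reservation, or at the region end
         * once no reservation inside the region is left. */
        uint64_t gap_end = (i < n && rsv[i].start < region.end)
                               ? rsv[i].start : region.end;
        if (gap_end > cursor && gap_end - cursor > best.end - best.start) {
            best.start = cursor;
            best.end = gap_end;
        }
        if (i == n || rsv[i].start >= region.end)
            break;
        if (rsv[i].end > cursor)
            cursor = rsv[i].end;      /* skip over the reserved range */
    }
    return best;
}
```

Called with memory[0x1] as [0x100000, 0x7e9ed000) and the four reserved ranges above, it returns [0x100000, 0x574a7000). With the Ubuntu reserved list and the EFI reservation described below, it returns [0x3c7ce000, 0x7bfbe000).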

Before SWIOTLB allocates its buffer, EFI also reserves some memory. The reservation that falls inside memory[0x1] is:
[ 0.005942] memblock_reserve: [0x000000007bfbe000-0x000000007bfddfff] efi_reserve_boot_services+0x8a/0xdb

It lies outside [0x0000000000100000-0x00000000574a7000], so SWIOTLB can still allocate 1G of contiguous memory from that range:
[ 1.089832] software IO TLB: mapped [mem 0x00000000174a7000-0x00000000574a7000] (1024MB)
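The buffer lands at the top of the free range: 0x574a7000 - 0x40000000 = 0x174a7000, which matches the dmesg line. The sketch below models only that final placement step. It assumes top-down allocation within a single free range and ignores alignment. alloc_top_down() is a hypothetical helper, not memblock's API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Try to carve `size` bytes from the top of the free range [lo, hi).
 * On success, store the start address in *out and return true.
 * Hypothetical helper: alignment and multiple ranges are ignored. */
bool alloc_top_down(uint64_t lo, uint64_t hi, uint64_t size, uint64_t *out)
{
    if (hi < lo || hi - lo < size)
        return false;   /* range too small for a contiguous buffer */
    *out = hi - size;   /* highest placement that still fits */
    return true;
}
```

In the Ubuntu layout discussed below, the largest range [0x3c7ce000, 0x7bfbe000) is smaller than 1G, so the same call fails.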

With Ubuntu's grub, however, the available memory blocks are the same:
[ 0.005833] memory[0x0] [0x0000000000001000-0x000000000009ffff], 0x000000000009f000 bytes flags: 0x0
[ 0.005835] memory[0x1] [0x0000000000100000-0x000000007e9ecfff], 0x000000007e8ed000 bytes flags: 0x0
[ 0.005837] memory[0x2] [0x000000007eb1b000-0x000000007fb9afff], 0x0000000001080000 bytes flags: 0x0
[ 0.005838] memory[0x3] [0x000000007fbff000-0x000000007ffdffff], 0x00000000003e1000 bytes flags: 0x0
[ 0.005840] memory[0x4] [0x0000000100000000-0x00000004ffffffff], 0x0000000400000000 bytes flags: 0x0

The biggest one below 4G is again:
[ 0.005835] memory[0x1] [0x0000000000100000-0x000000007e9ecfff], 0x000000007e8ed000 bytes flags: 0x0

Then exclude the reserved memory blocks. This time reserved[0x2] sits much lower, at 0x3a9ba000 instead of 0x574a7000:
[ 0.005846] reserved[0x2] [0x000000003a9ba000-0x000000003c7cdfff], 0x0000000001e14000 bytes flags: 0x0
[ 0.005848] reserved[0x3] [0x000000007e133018-0x000000007e17e057], 0x000000000004b040 bytes flags: 0x0
[ 0.005849] reserved[0x4] [0x000000007e847018-0x000000007e847887], 0x0000000000000870 bytes flags: 0x0
[ 0.005851] reserved[0x5] [0x000000007ee95698-0x000000007ee95af7], 0x0000000000000460 bytes flags: 0x0

Now the biggest available range is:
[0x000000003c7ce000-0x000000007e133018]

Then exclude the EFI-reserved memory block, which this time overlaps the range:
[ 0.005896] memblock_reserve: [0x000000007bfbe000-0x000000007bfddfff] efi_reserve_boot_services+0x8a/0xdb

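So the biggest contiguous range becomes:
[0x000000003c7ce000-0x000000007bfbe000]

That is 0x3f7f0000 bytes (about 1016MB), which is less than 1G. The range below the reserved block, [0x0000000000100000-0x000000003a9ba000], is also smaller than 1G (about 937MB). This is why SWIOTLB cannot allocate 1G of contiguous memory.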

This commit from rhboot/grub2 fixes this issue:
https://github.com/rhboot/grub2/commit/9e6c1d803ade111b8719502ff25e86d8b4564de8

It changes where grub places the kernel and initrd, so that SWIOTLB and other drivers that need 1G or more of contiguous memory can be satisfied.

[ Test Plan ]

Enable SEV on an AMD machine; see https://docs.ovh.com/us/en/dedicated/enable-and-use-amd-sme-sev/#references-and-additional-resources_1

Create an Ubuntu VM with SEV enabled (--launchSecurity sev) and 18G of memory:
virt-install --name <guest-name> --memory 18874368 --memtune hard_limit=36507216 \
  --boot uefi \
  --disk /var/lib/libvirt/images/<guest-name.img>,device=disk,bus=scsi \
  --disk /var/lib/libvirt/images/<guest-name>-config.iso,device=cdrom \
  --os-type linux --os-variant <variant> --import \
  --controller type=scsi,model=virtio-scsi,driver.iommu=on \
  --controller type=virtio-serial,driver.iommu=on \
  --network network=default,model=virtio,driver.iommu=on \
  --memballoon driver.iommu=on --graphics none \
  --launchSecurity sev --noautoconsole

Make sure the VM is running a 5.15 kernel.

Then check that it boots successfully with the above patch. dmesg should show that SWIOTLB is mapped to 1G of memory:
[ 0.005713] software IO TLB: SWIOTLB bounce buffer size adjusted to 1024MB
[ 0.821746] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 0.822210] software IO TLB: mapped [mem 0x0000000014fb4000-0x0000000054fb4000] (1024MB)
[ 0.933346] software IO TLB: Memory encryption is active and system is using DMA bounce buffers
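To check the mapped line programmatically, a small parser can compute the buffer size from its two addresses. It assumes the exact line format shown above and treats the end address as exclusive (0x54fb4000 - 0x14fb4000 is exactly 1G). swiotlb_mapped_bytes() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse a "software IO TLB: mapped [mem 0xSTART-0xEND] (NMB)" dmesg line
 * and return END - START in bytes, or 0 if the line does not match.
 * Hypothetical helper; assumes the format shown in the log above. */
uint64_t swiotlb_mapped_bytes(const char *line)
{
    const char *p = strstr(line, "mapped [mem ");
    uint64_t start, end;

    if (p == NULL)
        return 0;
    /* %x accepts the "0x" prefix, as strtoul() does with base 16. */
    if (sscanf(p, "mapped [mem %" SCNx64 "-%" SCNx64 "]", &start, &end) != 2)
        return 0;
    return end > start ? end - start : 0;
}
```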

[ Where problems could occur ]

Originally, grub allocates memory for the initrd, params, cmdline and kernel below 0x3fffffff. If it cannot find a large enough contiguous block there, the allocation fails.

This patch only changes that allocation so that it:
1. first tries to get memory below 0x7fffffff (GRUB_EFI_MAX_ALLOCATION_ADDRESS);
2. if step 1 fails, tries to get memory below 0xffffffffffffffff (GRUB_EFI_MAX_USABLE_ADDRESS).
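The two-step policy amounts to a fallback between two address ceilings. The sketch below is not grub's code. The ceiling values are the ones quoted above. toy_alloc_below() is a hypothetical single-range allocator that stands in for the firmware's page allocator, and its top-down placement is an arbitrary choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ceilings as quoted in the bug description. */
#define MAX_ALLOCATION_ADDRESS 0x7fffffffULL          /* GRUB_EFI_MAX_ALLOCATION_ADDRESS */
#define MAX_USABLE_ADDRESS     0xffffffffffffffffULL  /* GRUB_EFI_MAX_USABLE_ADDRESS */

/* Hypothetical allocator: find `size` bytes whose last byte is <= max_addr. */
typedef bool (*alloc_below_fn)(uint64_t max_addr, uint64_t size, uint64_t *out);

/* Toy allocator for illustration: one free range [toy_lo, toy_hi). */
uint64_t toy_lo, toy_hi;

bool toy_alloc_below(uint64_t max_addr, uint64_t size, uint64_t *out)
{
    uint64_t top = toy_hi;                 /* exclusive end of usable space */
    if (max_addr != UINT64_MAX && max_addr + 1 < top)
        top = max_addr + 1;                /* respect the ceiling */
    if (top < toy_lo || top - toy_lo < size)
        return false;
    *out = top - size;                     /* arbitrary top-down placement */
    return true;
}

/* Step 1: try below 2G. Step 2: if that fails, accept any usable address. */
bool alloc_with_fallback(alloc_below_fn alloc, uint64_t size, uint64_t *out)
{
    if (alloc(MAX_ALLOCATION_ADDRESS, size, out))
        return true;
    return alloc(MAX_USABLE_ADDRESS, size, out);
}
```

With free memory below 2G, the first ceiling succeeds. With free memory only above 4G, the first call fails and the fallback succeeds.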

With this patch, the initrd, params, cmdline and kernel are placed at higher addresses, e.g. between 0x3fffffff and 0x7fffffff. This leaves more room for drivers like SWIOTLB to allocate larger regions and should not affect other functionality.

The only remaining issue is an initrd so big that it needs to be allocated above 4G (GRUB_EFI_MAX_USABLE_ADDRESS). That issue exists even without this patch, because the original policy only allocates memory below 1G (0x3fffffff). It is being handled in lp:1842320.

[ Other Info ]

Related bugs:
lp:1983625
lp:1842320


affects: linux (Ubuntu) → grub2-unsigned (Ubuntu)
Changed in grub2-unsigned (Ubuntu):
assignee: nobody → gerald.yang (gerald-yang-tw)
importance: Undecided → High
status: New → In Progress
Changed in grub2-unsigned (Ubuntu Jammy):
status: New → In Progress
Changed in grub2-unsigned (Ubuntu Focal):
status: New → In Progress
assignee: nobody → gerald.yang (gerald-yang-tw)
Changed in grub2-unsigned (Ubuntu Jammy):
assignee: nobody → gerald.yang (gerald-yang-tw)
importance: Undecided → High
Changed in grub2-unsigned (Ubuntu Focal):
importance: Undecided → High
description: updated
description: updated
description: updated
gerald.yang (gerald-yang-tw) wrote :

patch for focal

gerald.yang (gerald-yang-tw) wrote :

patch for jammy

gerald.yang (gerald-yang-tw) wrote :

patch for kinetic

gerald.yang (gerald-yang-tw) wrote (last edit ):

test PPA with the patch for focal, jammy and kinetic:
https://launchpad.net/~gerald-yang-tw/+archive/ubuntu/grub-test

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "focal.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
tags: added: sts
Julian Andres Klode (juliank) wrote :

This is a subset of LP: #1842320. I guess we should mark this as a duplicate, and add the test case to the other bug.

gerald.yang (gerald-yang-tw) wrote :

@Julian,

Thanks, I will add the test case to LP: #1842320 and leave a comment there.

gerald.yang (gerald-yang-tw) wrote :

@Julian,

I understand this patch is one of the patch set in LP: #1842320, but our customer is hitting this issue and their product is blocked by it.

I'd like to check with you whether it's possible to SRU this one first, so that at least SWIOTLB works correctly on their SEV-enabled host.

Or are the other patches in LP: #1842320 acceptable for SRU?

Thanks,
Gerald

description: updated
description: updated
description: updated
Julian Andres Klode (juliank) wrote :

Let's keep this one for the subset; there were regressions reported for the other.

tags: added: foundations-todo
Changed in grub2-unsigned (Ubuntu Jammy):
status: In Progress → Triaged
Changed in grub2-unsigned (Ubuntu Focal):
status: In Progress → Triaged
Julian Andres Klode (juliank) wrote :

Uploaded to kinetic; jammy will get a binary copy a week after the current update in -proposed is published again (optimally next Monday).

gerald.yang (gerald-yang-tw) wrote :

@Julian

I checked grub2-unsigned, branch applied/ubuntu/kinetic-proposed, but I couldn't find this patch in the git log and the code is still the old one.
Could you confirm whether this is the correct branch to check? Thanks

Julian Andres Klode (juliank) wrote :

No, the right branch is

https://code.launchpad.net/~ubuntu-core-dev/grub/+git/ubuntu/+ref/ubuntu

and that was uploaded to kinetic and sitting there in the unapproved queue to be subsumed by the latest CVE upload.

That was not really supposed to happen, but nobody bothered accepting SRUs during the sprint, I suppose.

gerald.yang (gerald-yang-tw) wrote :

Thanks for the clarification Julian

Steve Langasek (vorlon) wrote : Please test proposed package

Hello gerald.yang, or anyone else affected,

Accepted grub2-unsigned into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-unsigned/2.06-2ubuntu13 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in grub2-unsigned (Ubuntu Kinetic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-kinetic
Steve Langasek (vorlon) wrote :

Hello gerald.yang, or anyone else affected,

Accepted grub2-signed into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/grub2-signed/1.186 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Fabio Augusto Miranda Martins (fabio.martins) wrote (last edit ):

Hi Steve,

I've reproduced the problem with a kinetic VM, and then installed the grub2 packages from -proposed and I can confirm it fixes this issue:

root@ubuntu:~# free -k
               total used free shared buff/cache available
Mem: 17428828 215288 16937344 1040 276196 16927464
Swap: 0 0 0

root@ubuntu:~# apt-cache policy grub-efi-amd64-bin
grub-efi-amd64-bin:
  Installed: 2.06-2ubuntu13
  Candidate: 2.06-2ubuntu13
  Version table:
 *** 2.06-2ubuntu13 500
        500 http://archive.ubuntu.com/ubuntu kinetic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     2.06-2ubuntu12 500
        500 http://archive.ubuntu.com/ubuntu kinetic/main amd64 Packages

root@ubuntu:~# apt-cache policy grub-efi-amd64-signed
grub-efi-amd64-signed:
  Installed: 1.186+2.06-2ubuntu13
  Candidate: 1.186+2.06-2ubuntu13
  Version table:
 *** 1.186+2.06-2ubuntu13 500
        500 http://archive.ubuntu.com/ubuntu kinetic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1.185+2.06-2ubuntu12 500
        500 http://archive.ubuntu.com/ubuntu kinetic/main amd64 Packages

root@ubuntu:~# dmesg | grep -i sev
[ 0.354582] Memory Encryption Features active: AMD SEV
[ 4.813446] SVM: KVM is unsupported when running as an SEV guest
[ 4.842996] SVM: KVM is unsupported when running as an SEV guest

Adding verification-done / verification-done-kinetic tags.

tags: added: verification-done verification-done-kinetic
removed: verification-needed verification-needed-kinetic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub2-unsigned - 2.06-2ubuntu13

---------------
grub2-unsigned (2.06-2ubuntu13) kinetic; urgency=medium

  * Try to pick better locations for kernel and initrd (LP: #1989446)
  * x86-efi: Use bounce buffers for reading to addresses > 4GB (enhances
    firmware compatibility of previous change)
  * Source package generated from src:grub2 using make -f ./debian/rules
    generate-grub2-unsigned

 -- Julian Andres Klode <email address hidden> Thu, 20 Oct 2022 21:18:25 +0200

Changed in grub2-unsigned (Ubuntu):
status: In Progress → Fix Released
gerald.yang (gerald-yang-tw) wrote :

Hi @Julian

Since this also affects focal with the HWE kernel (5.15), will this patch also be merged into focal?

Julian Andres Klode (juliank) wrote (last edit ):

I have no plans for individual backports, as we plan to copy back 2.06 after the security update is out.

The security update is currently blocked by getting the 2022v1 signing key installed in a PPA and having the previous security update copied out of updates to security.

Changed in grub2-unsigned (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for grub2-unsigned has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

gerald.yang (gerald-yang-tw) wrote :

Hi @Julian

Just checking whether there is any news on the security update that is blocking the grub 2.06 copy back to focal and jammy.
Our customer's product is still blocked by this issue, since they also need the fix on focal and jammy.

Thanks,
Gerald

Julian Andres Klode (juliank) wrote :

You can see for yourself that the security update is in -proposed now, so once it is verified and released we can push this one.

Changed in grub2-unsigned (Ubuntu Jammy):
status: Triaged → Fix Committed
Changed in grub2-unsigned (Ubuntu Focal):
status: Triaged → Fix Committed
status: Fix Committed → Triaged
Changed in grub2-unsigned (Ubuntu Jammy):
status: Fix Committed → Triaged
Julian Andres Klode (juliank) wrote :

I mixed that up with the larger-scale one: this is part of the update currently in -proposed. It was fixed in 2.06-2ubuntu13, and -proposed for all series has 2.06-2ubuntu14.

For technical reasons due to binary copying the grub2-unsigned from kinetic, this bug is not part of the verification and doesn't get updates.

Changed in grub2-unsigned (Ubuntu Jammy):
status: Triaged → Fix Committed
Changed in grub2-unsigned (Ubuntu Focal):
status: Triaged → Fix Committed
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Thank you Julian and Gerald.

I've verified that upgrading grub-efi-amd64-bin to 2.06-2ubuntu14 (from -proposed) fixes the issue both in Focal and Jammy.
