Grub fails to load kernel from squashfs if mem < 1500mb

Bug #1878541 reported by Michael Vogt on 2020-05-14
This bug affects 1 person
Affects          Importance  Assigned to
snapd            High        Unassigned
grub2 (Ubuntu)   (status tracked in Groovy)
  Focal          Undecided   Unassigned
  Groovy         High        Unassigned

Bug Description

[Impact]

 * loopback command uses too much ram, resulting in OOM on small machines

[Test Case]

 * Download & copy kernel.snap from the amd64 pc image onto the ESP partition

 * Boot VM with secureboot, uefi and tpm and drop into grub recovery shell

 * observe ram usage of the machine (for example by using virt-manager graphs)

 * execute "loopback loop0 /path/to/kernel.snap"

 * observe ram usage of the machine again.

 * The RAM usage should stay almost constant with the patched grub, just like it did in bionic. If it grows by the size of the kernel.snap (~500MB+), the machine is booting with the buggy grub shipped in focal GA.
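
The pass/fail criterion in the last step can be written down as a small check. This is only a sketch: the function name, the 10% tolerance and the sample readings are assumptions made for illustration, not values taken from an actual test run.

```python
def loopback_ram_verdict(before_mb, after_mb, snap_mb, tolerance=0.10):
    """Classify a loopback test run from two RAM readings (all values in MB).

    tolerance is a hypothetical threshold: growth below tolerance * snap_mb
    is treated as "almost constant".
    """
    growth = after_mb - before_mb
    if growth < tolerance * snap_mb:
        return "patched"       # RAM stayed almost constant
    if growth >= (1 - tolerance) * snap_mb:
        return "buggy"         # RAM grew by roughly the size of the snap
    return "inconclusive"      # somewhere in between; repeat the test


# Made-up readings for a 284 MB kernel snap.
print(loopback_ram_verdict(120, 125, 284))   # small growth
print(loopback_ram_verdict(120, 410, 284))   # growth close to the snap size
```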

[Regression Potential]

 * This patch changes UEFI secureboot verifier behaviour for the loopback command. The whole loopback file is no longer read & stored into memory.

This changes the PCR values. However Ubuntu has not yet been using or sealing against that PCR value. Also normally, on every kernel/grub update, the same PCR value is changed. Thus normal resealing procedure after a grub update would accommodate for this change of the PCR value.

The loopback devices as a whole are no longer measured into the TPM and cannot be attested. To resurrect such behavior, there is an upstream design plan to allow storing hashes of all blocks and validating them with a reduced memory requirement. Currently this is deemed out of scope, and of low interest/priority.
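
To illustrate why per-block hashes would need far less memory than buffering the whole image, here is a rough sketch of the general technique. It is not the upstream design: the 4096-byte block size, the choice of SHA-256 and both function names are assumptions.

```python
import hashlib
import io

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def build_block_hashes(stream, block_size=BLOCK_SIZE):
    """Hash a stream one block at a time; only one block is held in memory."""
    digests = []
    while True:
        block = stream.read(block_size)
        if not block:
            break
        digests.append(hashlib.sha256(block).digest())
    return digests


def read_verified_block(stream, index, digests, block_size=BLOCK_SIZE):
    """Read block `index` and check it against its stored digest."""
    stream.seek(index * block_size)
    block = stream.read(block_size)
    if hashlib.sha256(block).digest() != digests[index]:
        raise ValueError(f"block {index} failed verification")
    return block


image = io.BytesIO(bytes(range(256)) * 64)   # 16 KiB stand-in for a loopback image
hashes = build_block_hashes(image)
data = read_verified_block(image, 2, hashes)
print(len(hashes), len(data))
```

Under these assumptions the digest table costs 32 bytes per 4096-byte block, so a 284 MB image would need a little over 2 MB of hashes rather than the full image in RAM.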

[Other Info]

[Original bug report]

Booting a uc20 system fails early currently. The image used was:
http://cdimage.ubuntu.com/ubuntu-core/20/beta/20200513.2/

Attached is a screenshot of the debug output.

This appears to be some sort of regression with grub in 20.04 or with UEFI grub - this used to work in uc18.

Note that the machine has less than 1500mb of memory.


Michael Vogt (mvo) wrote :
Michael Vogt (mvo) on 2020-05-14
summary: - uc20 image fails with 512mb ram
+ Grub fails to load kernel from squashfs if mem < 1500mb
description: updated
Michael Vogt (mvo) on 2020-05-14
tags: added: rls-gg-incoming
description: updated
Michael Vogt (mvo) wrote :

Dimitri suggested to sort the squashfs with the "-sort" option. I created the attached test for this but it has no effect for me.

tags: added: uc20
Changed in snapd:
importance: Undecided → High
Michael Vogt (mvo) on 2020-05-18
Changed in grub2 (Ubuntu):
importance: Undecided → High
Dimitri John Ledkov (xnox) wrote :

Note, the Testscript specifies 512MB which is quite small.
Previously, we wanted to ensure that amd64 reference target is "a typical NUC with TPMv2.0 and secureboot", at the time typical NUC models had 2GB of ram.

What is the target minimum ram usage we must achieve?

Dimitri John Ledkov (xnox) wrote :

I started uc20 in a virsh domain, controlling for peak memory usage, with the command line modified to boot with "rdinit=/bin/sh", i.e. boot to the unpacked initrd and start a busybox shell without doing anything else.

The rss memory needed to get to that point was 744684, out of 2033104 available (I'm not sure which units virsh is using here, but it is ~740MB out of 2048MB).

Note that on any other platform or mode, we do not loop mount an xz compressed snap. We have also stopped using lzma/xz for kernel image or module compression throughout Ubuntu.

The next step is to try booting with kernel.snap without compression, or unpacked.

Dimitri John Ledkov (xnox) wrote :

Using sorting didn't change peak rss much, it's at 742524

Dimitri John Ledkov (xnox) wrote :

lzo compression ended up using more: 797568

Also, it feels like we try to read the _whole_ of the snap prior to loading it.

As if, measurement of the whole squashfs / partition is taken.

Dimitri John Ledkov (xnox) wrote :

With unpacked kernel.efi, booting to rdinit=/bin/sh, rss usage is 456756

so it feels as if (loop) device is not freed by grub / shim / firmware.

Next up is to try to play with things interactively in grub shell, to try to figure out which commands cause memory to balloon.

Or see if it can be freed after loading things from squashfs.

UC18 loads kernels from squashfs in under 512MB => compare if grub in uc18 is better.
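
The rss figures quoted so far can be compared against the unpacked boot to see how much extra memory each snap variant costs. This is a back-of-the-envelope sketch that assumes the virsh numbers are in KiB (the comments above were unsure of the unit) and that all runs are otherwise comparable.

```python
# Peak rss readings quoted in the comments above; units assumed to be KiB.
rss_kib = {
    "xz snap": 744684,
    "xz snap, sorted": 742524,
    "lzo snap": 797568,
    "unpacked kernel.efi": 456756,
}

baseline = rss_kib["unpacked kernel.efi"]
extra_mib = {name: (value - baseline) / 1024 for name, value in rss_kib.items()}

for name, extra in extra_mib.items():
    print(f"{name}: +{extra:.0f} MiB over the unpacked boot")
```

If the KiB assumption holds, the xz snap costs roughly 281 MiB more than the unpacked boot, which is close to the 284M uc20 kernel snap size reported later in this thread and fits the suspicion that the whole snap stays in memory.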

Dimitri John Ledkov (xnox) wrote :

UC18 size:
8100 kernel.img
3808 initrd.img

~12MB, loaded from .snap, on ext4

UC20 size:
48196 kernel.efi

~50MB, loaded from .snap, on fat

More than 4x larger

Dimitri John Ledkov (xnox) wrote :

the kernel snap sizes, are roughly similar.

204M for uc18
284M for uc20

1.4x larger
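
The two size comparisons above can be double-checked with a few lines of arithmetic. The sketch assumes the per-file sizes are in KB, as the "~12MB" and "~50MB" totals suggest; the variable names are made up.

```python
# Sizes as quoted in the comments above (boot files in KB, snaps in MB).
uc18_boot_files_kb = 8100 + 3808   # kernel.img + initrd.img
uc20_boot_files_kb = 48196         # kernel.efi
uc18_snap_mb, uc20_snap_mb = 204, 284

boot_ratio = uc20_boot_files_kb / uc18_boot_files_kb
snap_ratio = uc20_snap_mb / uc18_snap_mb
print(f"boot files: {boot_ratio:.2f}x, kernel snap: {snap_ratio:.2f}x")
# prints "boot files: 4.05x, kernel snap: 1.39x"
```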

Dimitri John Ledkov (xnox) wrote :

The minimum reproducer i have is this:

1) Fetch UC20 image from http://cdimage.ubuntu.com/ubuntu-core/20/edge/pending/
2) boot to grub cmdline prompt
3) execute

loopback loop (hd0,gpt3)/pc-kernel_502.snap

(or use tab completion for the right kernel snap)

The equivalent command on a UC18 image (with bionic's grub) results in no additional memory used, with the same kernel snap.

On UC20 image, executing that command uses up 400MB of RAM which does not appear to be reclaimed.

The underlying fs type appears to be irrelevant (UC18 had the kernel snap on ext4, UC20 has it on ESP/fat).

Changed in grub2 (Ubuntu):
status: New → Confirmed
Dimitri John Ledkov (xnox) wrote :

Seems to work fine under BIOS: loopback loop does not appear to use up any additional memory.

It feels like a bug in EFI memory page allocation, with pages that never get released, and/or in the max_agglomerate implementation under EFI.

Chris Coulson (chrisccoulson) wrote :

I did a bit of digging on this, and it seems to happen because the grub verifier module reads the entire contents of a file into memory whenever it is opened via grub_file_open, unless the file is opened with the GRUB_FILE_TYPE_SKIP_SIGNATURE flag or has a type of GRUB_FILE_TYPE_SIGNATURE or GRUB_FILE_TYPE_VERIFY_SIGNATURE. It does this so that it can provide the file contents to the registered verifier modules and then serve the verified contents to the grub file API from memory, without having to load them from disk again (which would obviously be vulnerable to TOCTOU type bugs).

Configuring a loopback device via the loopback command opens the underlying disk image, which results in grub's verifier code reading the entire image into memory. In the case of booting a UC20 recovery system, the loopback image is the kernel snap squashfs. This doesn't happen with the UC18 version of grub because it doesn't ship the verifier module, which is pulled in on UC20 because of the TPM verifier module. (The TPM verifier just calculates a hash of the file contents and measures it to PCR9.)

I'm not sure that passing loopback image files through the verifier module is a sensible default. The loopback device is just another disk backend, and grub doesn't pass entire physical disk images through the verifier. It seems weird that loopback images would be treated differently, particularly because files opened from the filesystem within the loopback image will be passed through the verifier.

I tested a local build of grub with the attached patch, and was able to boot a UC20 recovery kernel via a loop mounted kernel snap squashfs in a VM with 512MB of RAM. I'm not sure if it's the correct fix for this though.
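
The buffering rule described in this comment can be modelled in a few lines. This is a toy model, not grub's code and not the attached patch: the enum, the function and the skip_loopback switch are hypothetical names used only to show where a loopback image falls under the rule.

```python
from enum import Enum, auto


class FileType(Enum):
    """Toy stand-ins for a few grub file types (names are illustrative)."""
    LOOPBACK = auto()
    LINUX_KERNEL = auto()
    SIGNATURE = auto()
    VERIFY_SIGNATURE = auto()


def verifier_buffers_file(file_type, skip_signature=False, skip_loopback=False):
    """Return True if the modelled verifier would read the whole file into RAM."""
    if skip_signature:
        return False   # opened with the skip-signature flag
    if file_type in (FileType.SIGNATURE, FileType.VERIFY_SIGNATURE):
        return False   # signature files are not passed through verifiers
    if skip_loopback and file_type is FileType.LOOPBACK:
        return False   # hypothetical switch: treat loopback images like disks
    return True


# Without the switch a loopback image is buffered in full, like any other file.
print(verifier_buffers_file(FileType.LOOPBACK))
# With it, only files opened from inside the loopback image are buffered.
print(verifier_buffers_file(FileType.LOOPBACK, skip_loopback=True))
print(verifier_buffers_file(FileType.LINUX_KERNEL, skip_loopback=True))
```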

Chris Coulson (chrisccoulson) wrote :

Hi Colin, I wouldn't mind hearing your thoughts on the previous comment.

tags: added: patch
tags: added: id-5ec540751c801c607c3d8c33
tags: removed: rls-gg-incoming
Julian Andres Klode (juliank) wrote :

The patch looks right to me.

Changed in grub2 (Ubuntu Groovy):
status: Confirmed → Triaged
Claudio Matsuoka (cmatsuoka) wrote :

Chris Coulson's patch should also solve the problem that breaks install on the Thinkcentre m920s with TPM enabled. The last printed message when booting with grub debug enabled is the type of the loopback file, and nothing happens after that. It finishes installing if you rmmod tpm.

Changed in grub2 (Ubuntu Groovy):
status: Triaged → In Progress
Julian Andres Klode (juliank) wrote :

An easy minimal test case would be appreciated. I guess I could just put grub into a directory and then tftp boot that inside qemu, and add a large file in there or something? (or use -vfat on a dir)

Changed in grub2 (Ubuntu Groovy):
status: In Progress → Fix Committed
description: updated
Changed in snapd:
status: New → In Progress
Julian Andres Klode (juliank) wrote :
Changed in grub2 (Ubuntu Focal):
status: New → In Progress