[scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

Bug #1931728 reported by Iain Lane
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Triaged  Undecided   Unassigned
qemu (Ubuntu)    Invalid  Undecided   Unassigned

Bug Description

This is likely something to do with the configuration or software versions running here: we have identical hardware in an adjacent cloud region, running different versions of the cloud/virt stack, where the same images boot fine.

When we boot bionic arm64 Ubuntu cloud images in "bos01" and they land on the eMAG systems, they always fail like this:

...
[ 1.585611] Key type dns_resolver registered
[ 1.587408] registered taskstats version 1
[ 1.588913] Loading compiled-in X.509 certificates
[ 1.592668] Loaded X.509 cert 'Build time autogenerated kernel key: 4a4a555bc5fd0178c9ab722f3ae7b392f7714ac4'
[ 1.598866] Loaded X.509 cert 'Canonical Ltd. Live Patch Signing: 14df34d1a87cf37625abec039ef2bf521249b969'
[ 1.605389] Loaded X.509 cert 'Canonical Ltd. Kernel Module Signing: 88f752e560a1e0737e31163a466ad7b70a850c19'
[ 1.610861] Couldn't get size: 0x800000000000000e
[ 1.613413] MODSIGN: Couldn't get UEFI db list
[ 1.615920] Couldn't get size: 0x800000000000000e
[ 1.618256] MODSIGN: Couldn't get UEFI MokListRT
[ 1.620315] Couldn't get size: 0x800000000000000e
[ 1.622317] MODSIGN: Couldn't get UEFI dbx list
[ 1.624446] zswap: loaded using pool lzo/zbud
[ 1.628185] Key type big_key registered
[ 1.629937] Key type trusted registered
[ 1.632012] Key type encrypted registered
[ 1.633668] AppArmor: AppArmor sha1 policy hashing enabled
[ 1.635625] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[ 1.638009] ima: Allocated hash algorithm: sha1
[ 1.639272] evm: HMAC attrs: 0x1
[ 1.640875] rtc-efi rtc-efi: setting system clock to 2021-05-11 09:28:43 UTC (1620725323)
[ 1.646247] Freeing unused kernel memory: 5824K
[ 1.657870] Checked W+X mappings: passed, no W+X pages found
[ 1.660250] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005
[ 1.660250]
[ 1.663294] CPU: 1 PID: 1 Comm: init Not tainted 4.15.0-142-generic #146-Ubuntu
[ 1.665939] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 1.668204] Call trace:
[ 1.668981] dump_backtrace+0x0/0x198
[ 1.670212] show_stack+0x24/0x30
[ 1.671282] dump_stack+0x98/0xc8
[ 1.672433] panic+0x128/0x2b0
[ 1.673502] do_exit+0x75c/0xa80
[ 1.674749] do_group_exit+0x40/0xb0
[ 1.676191] get_signal+0x114/0x6e8
[ 1.677867] do_signal+0x18c/0x240
[ 1.679498] do_notify_resume+0xd0/0x328
[ 1.681302] work_pending+0x8/0x10
[ 1.682771] SMP: stopping secondary CPUs
[ 1.684624] Kernel Offset: disabled
[ 1.686119] CPU features: 0x04802008
[ 1.687353] Memory Limit: none
[ 1.688453] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000005
[ 1.688453]

Full example log: https://autopkgtest.ubuntu.com/results/autopkgtest-bionic/bionic/arm64/c/chromium-browser/20210511_093739_ea3f2@/log.gz

We tried to get IS to roll things back to match the working bos02, but they said it's too different and not possible.

The working cloud is running Mitaka from the cloud archive on Xenial (qemu 2.5)
The broken cloud is running Queens from the cloud archive on Xenial (qemu 2.11)

The other thing someone suggested is that MDS mitigation is enabled in the broken cloud, so we could try disabling it. No idea if that makes sense, to be honest.

That's about all we have right now.

Revision history for this message
dann frazier (dannf) wrote :

This is very strange. It looks like the guest has managed to start /init in the initramfs. /init is a simple shell script that just creates a few directories and mounts sysfs/proc before printing "Loading, please wait...." - which we do not see. I've tried to simulate this by removing /init from the initramfs, removing /bin/sh, and removing the loader (/lib/ld-linux-aarch64.so.1), but none of those cause the boot to fail in *this* way. I've asked for some more info on the scalingstack setup - the libvirt XML and qemu command line - in the hope of reproducing it in a debug environment.
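
For reference, the kind of initramfs surgery involved looks roughly like this (a sketch; the exact initrd path and filenames are illustrative):

  # Unpack the guest initramfs, drop /init (or /bin/sh, or the loader),
  # and repack it for a test boot.
  mkdir /tmp/initrd && cd /tmp/initrd
  zcat /boot/initrd.img-4.15.0-142-generic | cpio -idmv
  rm init                          # or: rm bin/sh lib/ld-linux-aarch64.so.1
  find . | cpio -o -H newc | gzip -9 > /tmp/initrd.img.modified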

Revision history for this message
dann frazier (dannf) wrote :

I don't have access to an eMAG system at the moment, but I tried and failed to reproduce this on a ThunderX2 box running xenial + hwe kernel + cloud-archive:queens.
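
Roughly how that host was set up (a sketch; the exact package names with the queens cloud archive enabled are an assumption):

  # Enable the Queens Ubuntu Cloud Archive and the HWE kernel on xenial.
  sudo add-apt-repository cloud-archive:queens
  sudo apt update
  sudo apt install linux-generic-hwe-16.04 qemu-kvm libvirt-bin
  sudo reboot    # boot into the HWE kernel before creating guests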

Revision history for this message
dann frazier (dannf) wrote :

IS gave me a copy of the libvirt XML for the guest, and I used it to build a VM as similar as possible (minus some details about storage/NIC). However, I was still unable to reproduce on a ThunderX2 host. I asked a contact over at Ampere Computing whether this symptom was something they'd seen before, but it didn't ring any bells. We are using host-passthrough CPUs, so one difference would be the underlying CPU. I noticed that the failing guest log shows KPTI as a CPU feature while my (working) one doesn't. I tried forcing KPTI on (kpti=1) in the guest to compensate, but that also did not trigger the failure. I couldn't really spot any other interesting differences in the log.
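
Forcing KPTI on in a guest can be done like this (a sketch, assuming a GRUB-based Ubuntu cloud image):

  # Check whether KPTI was enabled at boot, then force it on via the
  # kernel command line and reboot.
  dmesg | grep -i 'page table isolation'
  sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&kpti=1 /' /etc/default/grub
  sudo update-grub
  sudo reboot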

btw, this bug says bionic guests fail to boot - do we know if other Ubuntu guest versions are OK?

Revision history for this message
Iain Lane (laney) wrote : Re: [Bug 1931728] Re: [scalingstack bos01] bionic (arm64) instances always fail to boot on eMAGs in this cloud

On Mon, Jun 14, 2021 at 11:23:06PM -0000, dann frazier wrote:
> btw, this bug says bionic guests fail to boot - do we know if other
> Ubuntu guest versions are OK?

Thanks for your efforts looking into this so far.

Yeah, I've only seen it in bionic. I tried spawning instances for all
current supported releases, and only bionic failed:

ubuntu@juju-806ee7-stg-proposed-migration-43:~$ for ip in $(openstack server list | awk -F'[= |]+' '/bos01-arm64/ { print $6 }'); do timeout 60s ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no ubuntu@${ip} 'lsb_release -a; uptime'; done
Warning: Permanently added '10.43.128.5' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.7 LTS
Release: 16.04
Codename: xenial
No LSB modules are available.
 08:21:32 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.22' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu Impish Indri (development branch)
Release: 21.10
Codename: impish
No LSB modules are available.
 08:21:34 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.8' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 21.04
Release: 21.04
Codename: hirsute
No LSB modules are available.
 08:21:36 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.4' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 20.10
Release: 20.10
Codename: groovy
No LSB modules are available.
 08:21:38 up 3 days, 21:33, 0 users, load average: 0.00, 0.00, 0.00
Warning: Permanently added '10.43.128.15' (ECDSA) to the list of known hosts.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
No LSB modules are available.
 08:21:40 up 3 days, 21:33, 0 users, load average: 0.03, 0.01, 0.00
ssh: connect to host 10.43.128.20 port 22: No route to host # that's the bionic instance

and I double checked they all ended up on an eMAG.
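
For the record, placement can be confirmed with admin credentials like
this (the instance name is a placeholder):

  # Show the compute host an instance was scheduled onto.
  openstack server show <instance> -f value -c OS-EXT-SRV-ATTR:hypervisor_hostname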

(I was trying all releases in a reboot loop to try to reproduce the
other issue Julian mentioned on ubuntu-devel, but couldn't make it
happen ...)

What do you think about the "MDS mitigation" BIOS setting as a
difference between the working and broken installations - is it worth
trying to get IS to flip that? It seems sane to have it on, but flipping
it might give us some useful information.

Cheers,

--
Iain Lane [ <email address hidden> ]
Debian Developer [ <email address hidden> ]
Ubuntu Developer [ <email address hidden> ]

Revision history for this message
dann frazier (dannf) wrote :

I've got access to an eMAG system now, so I'm going to try to reproduce there. I see the MDS mitigation setting, so I can try flipping that as well. I don't have any theory as to why that would help/hurt, but no reason not to try it.
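
For comparing mitigation state before and after flipping the setting, the kernel's view can be dumped from sysfs (which entries a 4.15 arm64 kernel exposes there is an assumption):

  # Run on the host (and optionally inside guests) with the BIOS setting
  # in each position, then diff the output.
  grep . /sys/devices/system/cpu/vulnerabilities/*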

Revision history for this message
dann frazier (dannf) wrote :

I've confirmed that enabling MDS mitigation causes the problem to occur - it does not occur with it disabled.

Firmware version 11.05.116

Revision history for this message
dann frazier (dannf) wrote :

I suppose the next step is to see if we can figure out why this only seems to impact bionic guests. I'll see if I can get access to the system again and try to identify the relevant difference.

Revision history for this message
dann frazier (dannf) wrote :

I found that booting an upstream 4.18 kernel in the guest trips this problem, while an upstream 5.4 does not. I bisected and found that this commit seems to be the relevant change:

commit 3b7142752e4bee153df6db4a76ca104ef0d7c0b4 (refs/bisect/bad)
Author: Mark Rutland <email address hidden>
Date: Wed Jul 11 14:56:45 2018 +0100

    arm64: convert native/compat syscall entry to C

    Now that the syscall invocation logic is in C, we can migrate the rest
    of the syscall entry logic over, so that the entry assembly needn't look
    at the register values at all.

    The SVE reset across syscall logic now unconditionally clears TIF_SVE,
    but sve_user_disable() will only write back to CPACR_EL1 when SVE is
    actually enabled.

    Signed-off-by: Mark Rutland <email address hidden>
    Reviewed-by: Catalin Marinas <email address hidden>
    Reviewed-by: Dave Martin <email address hidden>
    Cc: Will Deacon <email address hidden>
    Signed-off-by: Will Deacon <email address hidden>
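
A sketch of how such a bisection can be run, given that 4.18 fails and 5.4 works (custom terms are used because we are hunting the commit that makes the failure go away; the build/boot/test steps in the guest are omitted):

  git bisect start --term-old=broken --term-new=fixed
  git bisect broken v4.18
  git bisect fixed v5.4
  # ...build each candidate kernel, boot it in the guest, then mark it:
  git bisect broken     # if the guest panics
  git bisect fixed      # if the guest boots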

dann frazier (dannf)
Changed in qemu (Ubuntu):
status: New → Invalid
Revision history for this message
dann frazier (dannf) wrote :

The commit referenced in Comment #8 was part of a 21-patch series, shown here:
  https://patches.linaro.org/cover/141739/

I was able to backport these changes to our 4.15 kernel and get it booting w/ the MDS workaround enabled. One option is to SRU those (and a number of follow-up Fixes: changes for them). Presumably that's of some security benefit, though I'm not sure how significant. It also isn't clear exactly why those changes are needed for compatibility with the firmware MDS workaround.
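
The backport itself is conceptually just cherry-picking the series onto the Ubuntu bionic (4.15) tree, roughly as below (a sketch; the series boundaries are placeholders, conflicts need resolving by hand, and the follow-up Fixes: commits go on top):

  # In a checkout of the Ubuntu bionic kernel tree, with Linus's tree
  # added as the "upstream" remote:
  git fetch upstream
  git cherry-pick -x <first-commit-of-series>^..<last-commit-of-series>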

dann frazier (dannf)
Changed in linux (Ubuntu):
status: New → Triaged
Revision history for this message
dann frazier (dannf) wrote :
Download full text (42.1 KiB)

Note that bare metal boots also fail the same way w/ the MDS workaround enabled (failing VM boots all had a newer kernel running on the host):

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x503f0002]
[ 0.000000] Linux version 4.15.0-147-generic (buildd@bos02-arm64-076) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)) #151-Ubuntu SMP Fri Jun 18 19:18:37 UTC 2021 (Ubuntu 4.15.0-147.151-generic 4.15.18)
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: EFI v2.70 by American Megatrends
[ 0.000000] efi: ACPI 2.0=0xbff5960000 SMBIOS 3.0=0xbff686fd98 ESRT=0xbff1a3a018
[ 0.000000] esrt: Reserving ESRT space from 0x000000bff1a3a018 to 0x000000bff1a3a078.
[ 0.000000] ACPI: Early table checksum verification disabled
[ 0.000000] ACPI: RSDP 0x000000BFF5960000 000024 (v02 ALASKA)
[ 0.000000] ACPI: XSDT 0x000000BFF5960028 000094 (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: FACP 0x000000BFF59600C0 000114 (v06 Ampere eMAG 00000003 INTL 20190509)
[ 0.000000] ACPI: DSDT 0x000000BFF59601D8 0077CD (v05 ALASKA A M I 00000001 INTL 20190509)
[ 0.000000] ACPI: FIDT 0x000000BFF59679A8 00009C (v01 ALASKA A M I 01072009 AMI 00010013)
[ 0.000000] ACPI: DBG2 0x000000BFF5967A48 000061 (v00 Ampere eMAG 00000000 INTL 20190509)
[ 0.000000] ACPI: GTDT 0x000000BFF5967AB0 000108 (v02 Ampere eMAG 00000001 INTL 20190509)
[ 0.000000] ACPI: IORT 0x000000BFF5967BB8 000BCC (v00 Ampere eMAG 00000000 INTL 20190509)
[ 0.000000] ACPI: MCFG 0x000000BFF5968788 0000AC (v01 Ampere eMAG 00000001 INTL 20190509)
[ 0.000000] ACPI: SSDT 0x000000BFF5968838 00002D (v02 Ampere eMAG 00000001 INTL 20190509)
[ 0.000000] ACPI: SPMI 0x000000BFF5968868 000041 (v05 ALASKA A M I 00000000 AMI. 00000000)
[ 0.000000] ACPI: APIC 0x000000BFF59688B0 000A68 (v04 Ampere eMAG 00000000 AMP. 01000013)
[ 0.000000] ACPI: PCCT 0x000000BFF5969318 0005D0 (v01 Ampere eMAG 00000003 01000013)
[ 0.000000] ACPI: BERT 0x000000BFF59698E8 000030 (v01 Ampere eMAG 00000003 INTL 20190509)
[ 0.000000] ACPI: HEST 0x000000BFF5969918 000328 (v01 Ampere eMAG 00000003 INTL 20190509)
[ 0.000000] ACPI: SPCR 0x000000BFF5969C40 000050 (v02 A M I APTIO V 01072009 AMI. 0005000D)
[ 0.000000] ACPI: PPTT 0x000000BFF5969C90 000CB8 (v01 Ampere eMAG 00000003 01000013)
[ 0.000000] ACPI: SPCR: console: pl011,mmio32,0x12600000,115200
[ 0.000000] ACPI: NUMA: Failed to initialise from firmware
[ 0.000000] NUMA: Faking a node at [mem 0x0000000090000000-0x000000bfffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xbffffe7d00-0xbffffeafff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000090000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x000000bfffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000090000000-0x0000000091f...

Revision history for this message
dann frazier (dannf) wrote :

Upstream has been working with me to try to determine what is going on here. The conclusion is that we believe the firmware is piggy-backing on the ARM_SMCCC_ARCH_WORKAROUND calls and clobbering some of the registers in the x4-x17 range. The patch series mentioned in Comment #9 happens to avoid consuming those registers, hiding the issue. Apparently clobbering those registers was previously OK, but the SMCCC v1.1 update mandates that x4-x17 be preserved, which the firmware authors may have missed.
