qemu-aarch64-static "Illegal instruction" with debootstrap

Bug #1922617 reported by Nathan Chancellor
Affects: QEMU
Status: Fix Released
Importance: Undecided
Assigned to: Richard Henderson

Bug Description

This is reproducible against QEMU master. I apologize for the long reproduction steps; I tried to distill them down as much as possible.

System info:

# qemu-aarch64-static --version
qemu-aarch64 version 5.2.91 (v6.0.0-rc1-68-gee82c086ba)
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

# head -n 26 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
stepping : 7
microcode : 0x5002f01
cpu MHz : 1000.716
cache size : 22528 KB
physical id : 0
siblings : 32
core id : 0
cpu cores : 16
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit
bogomips : 4600.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

My reproduction steps:

# apt-get install --no-install-recommends -y \
    build-essential \
    ca-certificates \
    debootstrap \
    git \
    libglib2.0-dev \
    libpixman-1-dev \
    ninja-build \
    pkg-config \
    python3 \
    zstd

# git clone https://github.com/qemu/qemu

# mkdir qemu/build

# cd qemu/build

# ../configure \
    --enable-debug \
    --enable-linux-user \
    --disable-bsd-user \
    --disable-werror \
    --disable-system \
    --disable-tools \
    --disable-docs \
    --disable-gtk \
    --disable-gnutls \
    --disable-nettle \
    --disable-gcrypt \
    --disable-glusterfs \
    --disable-libnfs \
    --disable-libiscsi \
    --disable-vnc \
    --disable-kvm \
    --disable-libssh \
    --disable-libxml2 \
    --disable-vde \
    --disable-sdl \
    --disable-opengl \
    --disable-xen \
    --disable-fdt \
    --disable-vhost-net \
    --disable-vhost-crypto \
    --disable-vhost-user \
    --disable-vhost-vsock \
    --disable-vhost-scsi \
    --disable-tpm \
    --disable-qom-cast-debug \
    --disable-capstone \
    --disable-zstd \
    --disable-linux-io-uring \
    --static \
    --target-list-exclude=hexagon-linux-user

# ninja qemu-aarch64

# install -Dm755 qemu-aarch64 /usr/local/bin/qemu-aarch64-static

# cat <<'EOF' >/proc/sys/fs/binfmt_misc/register
:qemu-aarch64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\xb7:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/usr/local/bin/qemu-aarch64-static:CF
EOF

# debootstrap --arch arm64 --foreign buster debian-rootfs

# chroot debian-rootfs /debootstrap/debootstrap --second-stage
Illegal instruction
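
For readers unfamiliar with the binfmt_misc registration in the steps above: as I understand the kernel's binfmt_misc documentation, an `M` rule matches when the leading bytes of the file equal the magic at every bit the mask selects. The following is a minimal C sketch of that comparison, using the 19 magic and mask bytes from the rule above. The names `binfmt_magic_matches` and `elf_would_match` are hypothetical helpers for illustration only; they are not kernel or QEMU code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RULE_LEN 19

/* Magic bytes copied from the registration line in the report. */
const uint8_t rule_magic[RULE_LEN] = {
    0x7f, 'E',  'L',  'F',  0x02, 0x01, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x02, 0x00, 0xb7
};

/* Mask bytes copied from the same line. */
const uint8_t rule_mask[RULE_LEN] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00,
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
    0xfe, 0xff, 0xff
};

/* True when every mask-selected bit of the header equals the magic. */
bool binfmt_magic_matches(const uint8_t *header, size_t len)
{
    if (len < RULE_LEN) {
        return false;
    }
    for (size_t i = 0; i < RULE_LEN; i++) {
        if ((header[i] ^ rule_magic[i]) & rule_mask[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical helper: build the first 19 bytes of a little-endian
 * ELF header from a few fields and test them against the rule.
 */
bool elf_would_match(uint8_t ei_class, uint8_t ei_osabi,
                     uint16_t e_type, uint16_t e_machine)
{
    uint8_t h[RULE_LEN] = {
        0x7f, 'E', 'L', 'F',
        ei_class,
        0x01,       /* EI_DATA: little-endian */
        0x01,       /* EI_VERSION: current */
        ei_osabi    /* EI_OSABI: ignored by the 0x00 mask byte */
    };
    h[16] = (uint8_t)(e_type & 0xff);     /* e_type, low byte */
    h[17] = (uint8_t)(e_type >> 8);       /* e_type, high byte */
    h[18] = (uint8_t)(e_machine & 0xff);  /* e_machine, low byte */
    return binfmt_magic_matches(h, sizeof(h));
}
```

With this rule, `elf_would_match(2, 0, 3, 183)` (a 64-bit ET_DYN binary for EM_AARCH64) is accepted, while the same header with e_machine 62 (x86-64) is not. The 0x00 mask byte ignores EI_OSABI, and the 0xfe mask byte accepts both ET_EXEC and ET_DYN.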

This prevents me from building an arm64 Debian image on x86_64. If I am doing something wrong, please let me know. The binary has been uploaded for your convenience.

Tags: linux-user
Peter Maydell (pmaydell) wrote :

This won't be the cause of the crash, but: don't run ninja directly. The build instructions (documented in README.rst) haven't changed: run configure, and then run make. The makefile still does some things and is not a pure does-absolutely-nothing wrapper around ninja in all cases.

tags: added: linux-user
Laurent Vivier (laurent-vivier) wrote :

I'm able to reproduce a coredump ("Illegal Instruction", but of host type) during a debootstrap process. The coredump is for a grep process; I'm trying to bisect.

Laurent Vivier (laurent-vivier) wrote :

Bisected to

commit 26bab757d41b853ea84cb52a10fafc9c10069658
Author: Richard Henderson <email address hidden>
Date: Fri Feb 12 10:48:33 2021 -0800

    linux-user: Introduce PAGE_ANON

    Record whether the backing page is anonymous, or if it has file
    backing. This will allow us to get close to the Linux AArch64
    ABI for MTE, which allows tag memory only on ram-backed VMAs.

    The real ABI allows tag memory on files, when those files are
    on ram-backed filesystems, such as tmpfs. We will not be able
    to implement that in QEMU linux-user.

    Thankfully, anonymous memory for malloc arenas is the primary
    consumer of this feature, so this restricted version should
    still be of use.

    Reviewed-by: Peter Maydell <email address hidden>
    Signed-off-by: Richard Henderson <email address hidden>
    Message-id: <email address hidden>
    Signed-off-by: Peter Maydell <email address hidden>

Philippe Mathieu-Daudé (philmd) wrote :

Possible fix:
https://<email address hidden>/msg796781.html

Richard Henderson (rth) wrote :

The fix that Phil links would only pertain if debootstrap were
actively using MTE. I think we can safely disregard that.

I don't believe that the bisect has worked. There is nothing in
that 3 line patch that would affect *anything*, as the PAGE_ANON
value is not used at this point.

Laurent Vivier (laurent-vivier) wrote :

Yes, applying the patch pointed out by Philippe doesn't fix the problem.

But I think bisect has worked fine.

If I revert this patch (26bab757d41), it works fine again.

To do so, I revert both:

"target/arm: Add allocation tag storage for user mode"
"linux-user: Introduce PAGE_ANON"

Reverting only the first patch doesn't fix the problem.

Laurent Vivier (laurent-vivier) wrote :

Perhaps the reason is:

include/exec/cpu-all.h

#define PAGE_ANON 0x0080
...
#define PAGE_TARGET_1 0x0080
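
To make the suspected collision concrete, here is a minimal C sketch under stated assumptions. The two 0x0080 values are the ones quoted above, and `PAGE_BTI` is the AArch64 linux-user alias for `PAGE_TARGET_1`. Everything prefixed `DEMO_` or `demo_` is hypothetical and only stands in for QEMU's page-flag bookkeeping; it is not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted from include/exec/cpu-all.h in this comment. */
#define PAGE_ANON      0x0080
#define PAGE_TARGET_1  0x0080
/* AArch64 linux-user alias for the target-specific bit. */
#define PAGE_BTI       PAGE_TARGET_1

/* Illustrative permission bits for the demo only. */
#define DEMO_PAGE_READ 0x0001
#define DEMO_PAGE_EXEC 0x0004

/*
 * Hypothetical stand-in for the flags recorded for a mapping:
 * PAGE_ANON when the mapping is anonymous, PAGE_BTI only when
 * the guest asked for BTI protection.
 */
uint32_t demo_flags_for_mmap(bool anonymous, bool prot_bti)
{
    uint32_t flags = DEMO_PAGE_READ | DEMO_PAGE_EXEC;
    if (anonymous) {
        flags |= PAGE_ANON;
    }
    if (prot_bti) {
        flags |= PAGE_BTI;
    }
    return flags;
}

/* Stand-in for a "is this page BTI-guarded?" test on page flags. */
bool demo_is_guarded_page(uint32_t flags)
{
    return (flags & PAGE_BTI) != 0;
}
```

Because both macros expand to the same bit, `demo_is_guarded_page(demo_flags_for_mmap(true, false))` is true: an anonymous mapping that never requested BTI still reads as guarded. If that holds for the real page flags, guest code running from anonymous memory could fault on branch targets that lack a BTI landing pad, which would be consistent with the illegal-instruction failure.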

Laurent Vivier (laurent-vivier) wrote :

commit be5d6f4884021208ae0e73379c83e51500ad3a8d
Author: Richard Henderson <email address hidden>
Date: Wed Oct 21 10:37:39 2020 -0700

    linux-user: Set PAGE_TARGET_1 for TARGET_PROT_BTI

    Transform the prot bit to a qemu internal page bit, and save
    it in the page tables.

    Reviewed-by: Peter Maydell <email address hidden>
    Signed-off-by: Richard Henderson <email address hidden>
    Message-id: <email address hidden>
    Signed-off-by: Peter Maydell <email address hidden>
...

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 49cd5cabcf2a..c18a91676656 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3445,6 +3445,11 @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x)
 #define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0)
 #define arm_tlb_mte_tagged(x) (typecheck_memtxattrs(x)->target_tlb_bit1)

+/*
+ * AArch64 usage of the PAGE_TARGET_* bits for linux-user.
+ */
+#define PAGE_BTI PAGE_TARGET_1
+
 /*
  * Naming convention for isar_feature functions:
  * Functions which test 32-bit ID registers should have _aa32_ in
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 71888083417d..072754fa24d4 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14507,10 +14507,10 @@ static void disas_data_proc_simd_fp(DisasContext *s, uint32_t insn)
  */
 static bool is_guarded_page(CPUARMState *env, DisasContext *s)
 {
+    uint64_t addr = s->base.pc_first;
 #ifdef CONFIG_USER_ONLY
-    return false; /* FIXME */
+    return page_get_flags(addr) & PAGE_BTI;
 #else
-    uint64_t addr = s->base.pc_first;
     int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx);
     unsigned int index = tlb_index(env, mmu_idx, addr);
     CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);

Richard Henderson (rth) wrote :

Ouch, yes indeed. Will fix.

Changed in qemu:
status: New → In Progress
assignee: nobody → Richard Henderson (rth)
Richard Henderson (rth) wrote :

Fix commit: 52c01ada8661 ("exec: Fix overlap of PAGE_ANON and PAGE_TARGET_1")
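
The exact values chosen by the fix commit are not reproduced in this report, so the following is only a sketch of the kind of compile-time guard that would catch such an overlap. The `DEMO_` names and the 0x0200 value are assumptions made for illustration, not the contents of the fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, non-overlapping flag values for the demo. */
#define DEMO_PAGE_ANON     0x0080u
#define DEMO_PAGE_TARGET_1 0x0200u  /* assumption, not taken from the fix */

/* Fails the build if the two flags ever share a bit again. */
_Static_assert((DEMO_PAGE_ANON & DEMO_PAGE_TARGET_1) == 0,
               "PAGE_ANON must not share bits with PAGE_TARGET_1");

/* Runtime counterpart of the same check, for arbitrary flag pairs. */
bool demo_flags_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}
```

Feeding the pre-fix values into the runtime helper, `demo_flags_overlap(0x0080u, 0x0080u)`, reports the collision that the static assertion would have rejected at build time.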

Changed in qemu:
status: In Progress → Fix Committed
Thomas Huth (th-huth)
Changed in qemu:
status: Fix Committed → Fix Released