qemu-riscv64-static crashed with SIGSEGV

Bug #1992653 reported by Björn Töpel
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
qemu (Ubuntu) | Confirmed | Undecided | Unassigned |

Bug Description

qemu-riscv64-static crashes on chroot transition (Kinetic)

reproduce:
sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid ./rv "deb http://deb.debian.org/debian-ports/ sid main" "deb http://deb.debian.org/debian-ports/ unreleased
sudo chroot ./rv

ProblemType: Crash
DistroRelease: Ubuntu 22.10
Package: qemu-user-static 1:7.0+dfsg-7ubuntu2
ProcVersionSignature: Ubuntu 5.19.0-15.15-generic 5.19.0
Uname: Linux 5.19.0-15-generic x86_64
ApportVersion: 2.23.1-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
CrashCounter: 1
CurrentDesktop: ubuntu:GNOME
Date: Wed Oct 12 15:36:44 2022
ExecutablePath: /usr/bin/qemu-riscv64-static
InstallationDate: Installed on 2022-09-30 (12 days ago)
InstallationMedia: Ubuntu 22.10 "Kinetic Kudu" - Beta amd64 (20220927.1)
JournalErrors: -- No entries --
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
LocalLibraries: /home/bjorn/rvrootfs/sid/usr/lib/riscv64-linux-gnu/ld-linux-riscv64-lp64d.so.1 /home/bjorn/rvrootfs/sid/usr/lib/riscv64-linux-gnu/libc.so.6 /home/bjorn/rvrootfs/sid/usr/lib/riscv64-linux-gnu/libtinfo.so.6.3
MachineType: LENOVO 21AHCTO1WW
ProcCmdline: /usr/libexec/qemu-binfmt/riscv64-binfmt-P /bin/bash /bin/bash -i
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.19.0-15-generic root=/dev/mapper/vgubuntu-root ro quiet splash
Signal: 11
SourcePackage: qemu
StacktraceTop:
 ?? ()
 ?? ()
 ?? ()
 ?? ()
 ?? ()
Title: qemu-riscv64-static crashed with SIGSEGV
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
dmi.bios.date: 09/13/2022
dmi.bios.release: 1.5
dmi.bios.vendor: LENOVO
dmi.bios.version: N3MET08W (1.05 )
dmi.board.asset.tag: Not Available
dmi.board.name: 21AHCTO1WW
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.4
dmi.modalias: dmi:bvnLENOVO:bvrN3MET08W(1.05):bd09/13/2022:br1.5:efr1.4:svnLENOVO:pn21AHCTO1WW:pvrThinkPadT14Gen3:rvnLENOVO:rn21AHCTO1WW:rvrNotDefined:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_21AH_BU_Think_FM_ThinkPadT14Gen3:
dmi.product.family: ThinkPad T14 Gen 3
dmi.product.name: 21AHCTO1WW
dmi.product.sku: LENOVO_MT_21AH_BU_Think_FM_ThinkPad T14 Gen 3
dmi.product.version: ThinkPad T14 Gen 3
dmi.sys.vendor: LENOVO
separator:

Revision history for this message
Björn Töpel (bjorn-topel) wrote :
Apport retracing service (apport) wrote :

StacktraceTop:
 sigsuspend ()
 dump_core_and_abort (target_sig=target_sig@entry=11) at ../../linux-user/signal.c:775
 handle_pending_signal (cpu_env=0x15a4030, sig=11, k=0x15a6cc8) at ../../linux-user/signal.c:1096
 process_pending_signals (cpu_env=0x15a4030) at ../../linux-user/signal.c:1181
 cpu_loop (env=0x15a4030) at ../../linux-user/riscv/cpu_loop.c:94

tags: removed: need-amd64-retrace
Paride Legovini (paride)
information type: Private → Public
Paride Legovini (paride) wrote :

Thanks Björn for this bug report. I am not able to reproduce the issue. What I did is:

1. sudo apt install debian-ports-archive-keyring
2. sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid ./rv "deb http://deb.debian.org/debian-ports/ sid main" "deb http://deb.debian.org/debian-ports/ unreleased main" # there's a missing bit in the reproducer command in the bug description, but I think I guessed it right
3. sudo chroot ./rv
4. Run `arch` to check it returns riscv64

This is on an up-to-date Kinetic amd64 system.

I think you did hit a bug here, but without a reproducer it's difficult for us to start working on it. Do you have any suggestion on how we can trigger the crash? Any relevant context? Thanks.

Changed in qemu (Ubuntu):
status: New → Incomplete
Thomas Ward (teward) wrote :

I repro'd this on my Jammy amd64 baremetal.

Repro'd in Kinetic VM on the Jammy host:

1) Get dependencies:
  sudo apt install binfmt-support qemu-user-static qemu-user debian-ports-archive-keyring

2) Run the debstrap.
  sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid ./rv "deb http://deb.debian.org/debian-ports/ sid main" "deb http://deb.debian.org/debian-ports/ unreleased main"

3) Try to enter chroot
  sudo chroot ./rv

4) SEGFAULT

I don't have a trace, but I have a .crash file, which likely contains one. I could run apport-collect here if we want my data from the VM attached to the bug.

Paride Legovini (paride) wrote :

Interesting. I tried again on my machine (Kinetic amd64) and again I can't reproduce. I also tried in a Kinetic VM running on a Jammy host (again amd64), and I still can't reproduce. The VM is a LXD VM running the generic kernel. I am not sure of what is different in my environment.

Björn Töpel (bjorn-topel) wrote :

It seems to be bash that causes the crash. Attaching bash from the sid rootfs:

023edb836c62b36a91755e8e9b5be652f31d2e4e1dccfc0e069b5d3978f193dc ./bash

# dpkg -l bash
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-============-============-=================================
ii bash 5.2-1 riscv64 GNU Bourne Again SHell
#
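
Anyone comparing their rootfs against the attached binary can check the digest first. A minimal sketch, assuming sha256sum from coreutils is available on the host; check_sha256 is a hypothetical helper name, not something from this thread:

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 against an expected digest.
# Prints "match" and returns 0 on success; prints the actual digest and
# returns 1 otherwise.
check_sha256() {
    file=$1
    expected=$2
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "match"
    else
        echo "mismatch: got $actual"
        return 1
    fi
}
```

For example (the path inside the chroot is an assumption): check_sha256 ./rv/usr/bin/bash 023edb836c62b36a91755e8e9b5be652f31d2e4e1dccfc0e069b5d3978f193dc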

Christian Ehrhardt  (paelzer) wrote :

Hmm, I also retried this in Kinetic, sadly for me it behaves as for Paride in comment #3 - working just fine. Qemu is at 1:7.0+dfsg-7ubuntu2 for me.

But the fact that we have Björn and Thomas hitting it makes me fully believe it is a real problem.
So why not on the system of anyone who tries to debug it ... ?

On the (unlikely but possible) chance that this is CPU-dependent: I recently had a bug that failed only on AMD chips, and I have Intel here. @Paride/@Björn/@Thomas, could we compare environments?

P.S. I also subscribed Heinrich, who I'd assume runs the most real and emulated riscv64; he might have some idea on this as well.
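
To make the environment comparison concrete, each of us could post the output of something like the following. It is only a sketch: cpu_vendor and env_summary are hypothetical helper names, and it assumes an x86 host where /proc/cpuinfo has a vendor_id line and dpkg-query is available.

```shell
#!/bin/sh
# Print the first vendor_id value from a cpuinfo-style file
# (defaults to /proc/cpuinfo).
cpu_vendor() {
    awk -F': *' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Print the pieces of host state that have come up in this thread so far.
env_summary() {
    echo "cpu vendor: $(cpu_vendor)"
    echo "kernel: $(uname -r)"
    # Falls back to "unknown" if the package is not installed.
    echo "qemu-user-static: $(dpkg-query -W -f='${Version}' qemu-user-static 2>/dev/null || echo unknown)"
}
```

Running env_summary on a failing and a working system and diffing the two outputs would at least rule these three variables in or out.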

Heinrich Schuchardt (xypron) wrote :

sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid ./rv "deb http://deb.debian.org/debian-ports/ sid main" "deb http://deb.debian.org/debian-ports/ unreleased

misses a closing ".

sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid ./rv "deb http://deb.debian.org/debian-ports/ sid main" "deb http://deb.debian.org/debian-ports/ unreleased"
I: automatically chosen mode: root
I: riscv64 cannot be executed natively, but transparently using qemu-user binfmt emulation
I: automatically chosen format: directory
I: running apt-get update...
done
E: Malformed entry 3 in list file /tmp/rv/etc/apt/sources.list (Component)
E: The list of sources could not be read.
E: apt-get update --error-on=any -oAPT::Status-Fd=<$fd> -oDpkg::Use-Pty=false failed
W: listening on child socket failed:
E: mmdebstrap failed to run

Could you, please, provide a reproducible example.

Christian Ehrhardt  (paelzer) wrote :

So far we have these (and I added a few more on my side to complete the picture):
Qemu-wise all my Jammy tests were on 1:6.2+dfsg-2ubuntu6.4 and Kinetic on 1:7.0+dfsg-7ubuntu2

Format:
NR - HW - BareMetalRelease -> VM type and release -> result

Thomas
(1) - ?? - Jammy -> Kinetic ?? VM -> fails
(2) - ?? - Jammy -> fails
Björn
(3) - ?? - Kinetic -> fails
Paride
(4) - ?? - Kinetic -> works
(5) - ?? - Jammy -> Kinetic LXD VM -> works
Christian:
(6) - Intel - Jammy -> Kinetic LXD container -> works
(7) - Intel - Jammy -> Jammy LXD container -> works
(8) - Intel - Jammy -> fails
(9) - Intel - Jammy -> Kinetic libvirt VM (cpu=passthrough) -> works
(10) - Intel - Jammy -> Kinetic libvirt VM (cpu=qemu64) -> works

For me it is already puzzling to see the difference between (7) and (8).

I could say that the older Jammy qemu simply wasn't far enough along to work; after all, we know many RISC-V fixes landed in 7.0. But then why does the very same qemu work in a Jammy container but not on a Jammy bare-metal machine (the same machine, BTW)?

Sadly (8) is my actual home system, so it is the one I'd least want to flood with stuff for debugging. But all the others work, so I can't debug the failure on them.

Maybe we can find out more about what makes the difference...
Would the rest of you mind helping to fill out the ?? for your entries?
From there we will have to try more until we find the difference that matters to trigger this.
(When you do, just copy and paste the data we have so far, modify it accordingly and post it again).

Christian Ehrhardt  (paelzer) wrote : Re: [Bug 1992653] Re: qemu-riscv64-static crashed with SIGSEGV

On Mon, Oct 17, 2022 at 9:05 AM Heinrich Schuchardt
<email address hidden> wrote:
>
...
> misses a closing ".

Later updates have already said so.
If you take, e.g., the steps from Thomas' comment, it will work (well, or crash).

To summarize again:
  sudo apt install binfmt-support qemu-user-static qemu-user debian-ports-archive-keyring mmdebstrap
  sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid ./rv "deb http://deb.debian.org/debian-ports/ sid main" "deb http://deb.debian.org/debian-ports/ unreleased main"
  sudo chroot ./rv
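
The same steps can be wrapped in a script that reports the result without needing an interactive session. This is a sketch, not a confirmed reproducer: run and repro are hypothetical helper names, DRY_RUN is a made-up switch that only prints the commands, and whether "bash -i -c true" trips the same crash as an interactive chroot is an assumption.

```shell
#!/bin/sh
# Echo the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reproducer steps from this thread; the target directory defaults to ./rv.
repro() {
    target=${1:-./rv}
    ports=http://deb.debian.org/debian-ports/
    run sudo apt install binfmt-support qemu-user-static qemu-user \
        debian-ports-archive-keyring mmdebstrap || return 1
    run sudo mmdebstrap --architectures=riscv64 \
        --include=debian-ports-archive-keyring sid "$target" \
        "deb $ports sid main" "deb $ports unreleased main" || return 1
    # Non-interactive stand-in for "sudo chroot ./rv": start bash and exit.
    run sudo chroot "$target" /bin/bash -i -c true
    status=$?
    # A process killed by signal 11 is reported by the shell as 128 + 11.
    if [ "$status" -eq 139 ]; then
        echo "crashed with SIGSEGV"
    fi
    return "$status"
}
```

Calling "DRY_RUN=1; repro ./rv" prints the three commands for review; unsetting DRY_RUN runs them for real.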

Christian Ehrhardt  (paelzer) wrote :

We could also think about binfmt having a different behavior in container/bare-metal but it should behave the same in VM/bare-metal and that is not what I see.
Well actually I do not know yet, let me add a Jammy VM run ...
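
To compare the binfmt side directly, one could dump the registered handler on each system. A sketch, assuming the entry is registered as qemu-riscv64 under /proc/sys/fs/binfmt_misc (that name is an assumption, not confirmed in this thread); binfmt_field is a hypothetical helper:

```shell
#!/bin/sh
# Print one field (e.g. "interpreter" or "flags") from a binfmt_misc
# entry file. The default entry name is an assumption.
binfmt_field() {
    field=$1
    entry=${2:-/proc/sys/fs/binfmt_misc/qemu-riscv64}
    # Keys appear as "interpreter <path>" and "flags: <letters>", so strip
    # an optional trailing colon from the key before comparing.
    awk -v f="$field" '{ k = $1; sub(/:$/, "", k) } k == f { print $2; exit }' "$entry"
}
```

Comparing "binfmt_field interpreter" and "binfmt_field flags" between a container, a VM and bare metal would show whether they actually use the same handler.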

And on my normal system I didn't run in the home dir, so another testcase ...

Thomas
(1) - ?? - Jammy -> Kinetic ?? VM -> fails
(2) - ?? - Jammy -> fails
Björn
(3) - ?? - Kinetic -> fails
Paride
(4) - ?? - Kinetic -> works
(5) - ?? - Jammy -> Kinetic LXD VM -> works
Christian:
(6) - Intel - Jammy -> Kinetic LXD container -> works
(7) - Intel - Jammy -> Jammy LXD container -> works
(8) - Intel - Jammy -> fails
(9) - Intel - Jammy -> Kinetic libvirt VM (cpu=passthrough) -> works
(10) - Intel - Jammy -> Kinetic libvirt VM (cpu=qemu64) -> works
(11) - Intel - Jammy -> Jammy libvirt VM /tmp via sudo -> works
(12) - Intel - Jammy -> Jammy libvirt VM /root as root -> works

Hmm, I had some hope but (8) vs (11) should really be the same - out of ideas for now.
The only difference is that (11) is new and clean, and who knows what I've done to my laptop (8) in the past ... :-/

Changed in qemu (Ubuntu):
status: Incomplete → Confirmed
Christian Ehrhardt  (paelzer) wrote :

I guess we can consider it confirmed by now, although I feel we still fail to see the actual "this makes it good/bad" here.

I've had a look at a more detailed stack trace and it looks similar (or the same) but with more info:

--- stack trace ---
#0 0x000000000056218a in sigsuspend ()
No symbol table info available.
#1 0x00000000004d1bd7 in dump_core_and_abort (target_sig=target_sig@entry=11) at ../../linux-user/signal.c:772
        cpu = <optimized out>
        env = <optimized out>
        ts = 0x1f368a0
        host_sig = 11
        core_dumped = <optimized out>
        act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {18446744067267099647, 4294967296, 1, 549755813888, 7956000, 24576, 24576, 24592, 1537, 140051329744643, 0, 2, 32945512, 274904076288, 4271466, 3}}, sa_flags = 0, sa_restorer = 0x0}
#2 0x00000000004d4964 in handle_pending_signal (cpu_env=0x1f35170, sig=11, k=0x1f36eb8) at ../../linux-user/signal.c:1099
        cpu = 0x1f2ce70
        handler = <optimized out>
        set = {__val = {4522484598571010, 70368776932576, 0, 140051329744832, 32723312, 0, 140722945443588, 274904640376, 32689776, 4991986, 0, 0, 16896, 0, 0, 35369322530945280}}
        target_old_set = {sig = {69888}}
        sa = <optimized out>
        ts = 0x1f368a0
#3 0x000000000041aa4f in process_pending_signals (cpu_env=<optimized out>) at ../../linux-user/signal.c:1185
        sig = 11
        blocked_set = <optimized out>
        cpu = <optimized out>
        ts = 0x1f368a0
        set = {__val = {18446744067267100671, 0 <repeats 15 times>}}
        cpu = <optimized out>
        sig = <optimized out>
        ts = <optimized out>
        set = <optimized out>
        blocked_set = <optimized out>
        restart_scan = <optimized out>
#4 cpu_loop (env=<optimized out>) at ../../linux-user/riscv/cpu_loop.c:115
        cs = <optimized out>
        trapnr = <optimized out>
        signum = <optimized out>
        sigcode = <optimized out>
        sigaddr = <optimized out>
        ret = <optimized out>
        gdbstep = <optimized out>
#5 0x0000000000404168 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../../linux-user/main.c:908
        regs1 = {sepc = 274904155514, ra = 0, sp = 274904081840, gp = 0, tp = 0, t0 = 0, t1 = 0, t2 = 0, s0 = 0, s1 = 0, a0 = 0, a1 = 0, a2 = 0, a3 = 0, a4 = 0, a5 = 0, a6 = 0, a7 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0, s7 = 0, s8 = 0, s9 = 0, s10 = 0, s11 = 0, t3 = 0, t4 = 0, t5 = 0, t6 = 0}
        regs = 0x7ffc9d2cca50
        info1 = {load_bias = 274904084480, load_addr = 274877906944, start_code = 274877906944, end_code = 274878815092, start_data = 274878817208, end_data = 274878868464, start_brk = 0, brk = 274878913512, reserve_brk = 16777216, start_mmap = 2147483648, start_stack = 274904081840, stack_limit = 274895695872, entry = 274904155514, code_offset = 274877906944, data_offset = 274877906944, saved_auxv = 274904082080, auxv_len = 272, arg_start = 274904081848, arg_end = 274904081864, arg_strings = 274904082368, env_strings = 274904082381, file_string = 274904084462, elf_flags = 5, personality = 0, alignment = 4096, loadmap_addr = 0, nsegs = 2, lo...
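
The values in regs1 and info1 are printed in decimal, which hides the guest address layout. Converting them to hexadecimal makes them easier to read; to_hex below is a hypothetical helper for that, assuming a shell whose printf handles 64-bit integers:

```shell
#!/bin/sh
# Convert a decimal value from the trace into hexadecimal.
to_hex() {
    printf '0x%x\n' "$1"
}
```

For example, "to_hex 274877906944" (the load_addr above) prints 0x4000000000, i.e. 2^38.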
