QEmu with TCG acceleration (without KVM) causes kernel panics with guest kernels >=6.3

Bug #2051965 reported by Matthieu Baerts
This bug affects 1 person
Affects          Status        Importance  Assigned to             Milestone
qemu (Ubuntu)    Fix Released  Undecided   Unassigned
  Jammy          Triaged       Undecided   Sergio Durigan Junior
  Mantic         Fix Released  Undecided   Sergio Durigan Junior

Bug Description

[ Impact ]

QEMU users on Jammy/Mantic who choose to use TCG-based acceleration instead of KVM-based might encounter kernel panics when the guest Linux kernel version is greater than or equal to 6.3.
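As a quick illustration of the affected range: a guest is only exposed when its kernel is 6.3 or newer. The 'is_affected' helper below is a hypothetical sketch (not part of any tooling mentioned in this report) that classifies a 'uname -r'-style version string:

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a guest kernel version string falls in
# the affected range (>= 6.3). Assumes a "major.minor[.extra]" format.
is_affected() {
    major=${1%%.*}          # text before the first dot
    rest=${1#*.}            # text after the first dot
    minor=${rest%%.*}       # text before the next dot, if any
    [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 3 ]; }
}

is_affected 6.7.0 && echo "6.7.0: affected"
is_affected 6.2.0 || echo "6.2.0: not affected"
```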

[ Test Plan ]

It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.

First, you need a machine running Jammy or Mantic. Then:

$ sudo apt-get update && \
  sudo DEBIAN_FRONTEND=noninteractive \
  apt-get install -y --no-install-recommends \
  build-essential libncurses5-dev gcc libssl-dev bc bison \
  libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
  python3 python3-pkg-resources busybox \
  iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
  ca-certificates gnupg2 net-tools kmod \
  libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
  tcpdump \
  pkg-config libmnl-dev \
  clang lld llvm llvm-dev libcap-dev \
  gdb crash dwarves strace \
  iptables ebtables nftables vim psmisc bash-completion less jq \
  gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
  libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
  libtap-formatter-junit-perl \
  zstd \
  wget xz-utils lftp cpio u-boot-tools \
  cscope \
  bpftrace

Download the virtme project, which is simply a wrapper around QEMU:

$ git clone https://github.com/matttbe/virtme.git

Modify it so that it does not use KVM:

$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py
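To sanity-check that the substitution worked as intended, the same sed expression can be exercised on a throwaway file. This is only a sketch (the sample line merely resembles what run.py contains); the real command above edits ./virtme/virtme/commands/run.py in place:

```shell
#!/bin/sh
# Sketch: demonstrate the accel substitution on a temporary copy, so it runs
# without the virtme checkout. The illustrative line below only approximates
# the real content of ./virtme/virtme/commands/run.py.
tmp=$(mktemp)
echo "qemuargs.extend(['-machine', 'accel=kvm:tcg'])" > "$tmp"
sed -i 's/accel=kvm:tcg/accel=tcg/' "$tmp"
# After the edit, only TCG should be requested and no 'kvm' should remain.
grep -q 'accel=tcg' "$tmp" && ! grep -q 'kvm' "$tmp" && echo "patched: TCG only"
rm -f "$tmp"
```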

Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:

$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh

Remove one line so that the script does not write to /etc/hosts:

$ sed -i '/prepare_hosts_file$/d' entrypoint.sh

Point the script to virtme's location:

$ sed -i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh

Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:

$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:

$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':

$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal
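Once updated packages are available, a rough way to tell whether a given upstream QEMU version already carries the fix (merged in 8.1.0) is a 'sort -V' comparison. The 'has_fix' helper below is a hypothetical sketch; it deliberately ignores distro backports, which is exactly what the SRU'd Jammy/Mantic packages add on top of their older base versions:

```shell
#!/bin/sh
# Hypothetical check: does an upstream QEMU version already contain commit
# deba78709a ("accel/tcg: Always lock pages before translation"), which
# landed in 8.1.0? sort -V ensures e.g. "8.10.0" sorts after "8.2.0".
has_fix() {
    lowest=$(printf '%s\n' "8.1.0" "$1" | sort -V | head -n1)
    [ "$lowest" = "8.1.0" ]
}

has_fix 8.2.0 && echo "8.2.0: fix present upstream"
has_fix 6.2.0 || echo "6.2.0: needs the backport"
```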

[ Where problems could occur ]

The patches being backported have been part of upstream for a while now (since version 8.1.0). Two of the patches are relatively trivial and deal only with code movement/reorganization; the actual fix (https://gitlab.com/qemu-project/qemu/commit/deba78709a) is a bit more involved. Care has been taken to make sure that a subsequent fix is also part of the backport, but there is always a small possibility of regressions with more complicated patches. Nevertheless, it's important to mention that this fix only touches TCG-related code, which means that hardware-backed virtualization (via KVM) will not be affected.

[ Original Description ]

I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I had always been using QEmu with KVM acceleration, until recently, when it became unavailable.

When TCG is used instead of KVM, I experienced regular crashes in slow environments (e.g. GitHub Actions). For example:

------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
   0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
   7: 00
   8: 0f 1f 40 00 nopl 0x0(%rax)
   c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
  11: 55 push %rbp
  12: 48 89 fd mov %rdi,%rbp
  15: 48 83 ec 20 sub $0x20,%rsp
  19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
  20: 00 00
  22: 48 89 44 24 18 mov %rax,0x18(%rsp)
  27: 31 c0 xor %eax,%eax
  29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
  2e: 66 90 xchg %ax,%ax
  30: 66 90 xchg %ax,%ax
  32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
  37: 48 89 ef mov %rbp,%rdi
  3a: 65 gs
  3b: 8b .byte 0x8b
  3c: 35 .byte 0x35
  3d: 17 (bad)
  3e: 9d popf
  3f: 11 .byte 0x11

Code starting with the faulting instruction
===========================================
   0: c9 leave
   1: 00 00 add %al,(%rax)
   3: 00 66 90 add %ah,-0x70(%rsi)
   6: 66 90 xchg %ax,%ax
   8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
   d: 48 89 ef mov %rbp,%rdi
  10: 65 gs
  11: 8b .byte 0x8b
  12: 35 .byte 0x35
  13: 17 (bad)
  14: 9d popf
  15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
   0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
   4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
   b: eb b8 jmp 0xffffffffffffffc5
   d: 0f 1f 00 nopl (%rax)
  10: f3 0f 1e fa endbr64
  14: 41 89 ca mov %ecx,%r10d
  17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
  1e: 00
  1f: 85 c0 test %eax,%eax
  21: 75 15 jne 0x38
  23: b8 2c 00 00 00 mov $0x2c,%eax
  28: 0f 05 syscall
  2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
  30: 77 7e ja 0xb0
  32: c3 ret
  33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
  38: 41 54 push %r12
  3a: 48 83 ec 30 sub $0x30,%rsp
  3e: 44 rex.R
  3f: 89 .byte 0x89

Code starting with the faulting instruction
===========================================
   0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
   6: 77 7e ja 0x86
   8: c3 ret
   9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
   e: 41 54 push %r12
  10: 48 83 ec 30 sub $0x30,%rsp
  14: 44 rex.R
  15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
   0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
   7: 00
   8: 0f 1f 40 00 nopl 0x0(%rax)
   c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
  11: 55 push %rbp
  12: 48 89 fd mov %rdi,%rbp
  15: 48 83 ec 20 sub $0x20,%rsp
  19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
  20: 00 00
  22: 48 89 44 24 18 mov %rax,0x18(%rsp)
  27: 31 c0 xor %eax,%eax
  29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
  2e: 66 90 xchg %ax,%ax
  30: 66 90 xchg %ax,%ax
  32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
  37: 48 89 ef mov %rbp,%rdi
  3a: 65 gs
  3b: 8b .byte 0x8b
  3c: 35 .byte 0x35
  3d: 17 (bad)
  3e: 9d popf
  3f: 11 .byte 0x11

Code starting with the faulting instruction
===========================================
   0: c9 leave
   1: 00 00 add %al,(%rax)
   3: 00 66 90 add %ah,-0x70(%rsi)
   6: 66 90 xchg %ax,%ax
   8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
   d: 48 89 ef mov %rbp,%rdi
  10: 65 gs
  11: 8b .byte 0x8b
  12: 35 .byte 0x35
  13: 17 (bad)
  14: 9d popf
  15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------

For more debug info:

  https://github.com/multipath-tcp/mptcp_net-next/issues/471

The crashes happen in 'jump label' code.

I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:

  https://<email address hidden>/T/

Steven Rostedt said:

> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.

And Masami Hiramatsu said:

> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.

> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.

I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0, and even though it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix was made in newer versions, but not backported to v6.x.

Related branches

Revision history for this message
Matthieu Baerts (matttbe) wrote :

I just managed to reproduce this issue when using QEmu from Ubuntu 23.10: 8.0.4+dfsg-1ubuntu3.23.10.2

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Right now might be the best time to further close in on this.
You said 8.2 is good.

Could you test if that is true for the two versions in noble right now:
 qemu | 1:8.1.3+ds-1ubuntu2 | noble | source
 qemu | 1:8.2.0+ds-4ubuntu1 | noble-proposed | source

For the second - at least as of now - you need to enable proposed [1].

Also to be extra-sure, this is the guest kernel that is crashing right?

"Slow environments" are always hard to debug; are your crashes reliable, at least in your setup?
It would help if you could sort out the 8.1 and 8.2 runs and confirm that 8.2 is really reliably good.

Maybe we can try to slowly (as it takes a while and will not be quick) roll out PPA builds for you to bisect.
Or - if you can find a testcase that reproduces on just a local binary - just build qemu from git and bisect (if you are comfortable with that)?

[1]: https://wiki.ubuntu.com/Testing/EnableProposed

Changed in qemu (Ubuntu):
status: New → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hi Christian,

Thank you for your reply!

> Could you test if that is true for the two versions in noble right now:

Yes, I will try to check that next week.

> For the second - at least as of now - you need to enable proposed [1].

Thanks!

> Also to be extra-sure, this is the guest kernel that is crashing right?

Sorry, I forgot to mention that. Yes, it is the guest kernel.

> "slow environments" are always hard to debug, are your crashes at least in your setup reliable?

I have one kernel selftest that causes the crash, but I need to run it ~200 times to be able to say with high confidence that the crash does not occur in a given environment. Still, it's quite reliable.
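As a back-of-the-envelope check on why a couple of hundred runs are needed (illustrative numbers only, not measured from this bug): if one iteration crashes independently with probability p, the chance of seeing at least one crash in n runs is 1 - (1-p)^n.

```shell
#!/bin/sh
# Illustrative only: probability of observing at least one crash in n runs,
# assuming independent iterations with per-run crash probability p.
p_at_least_one() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", 1 - (1 - p)^n }'
}
p_at_least_one 0.02 250   # e.g. a hypothetical 2% per-run chance over 250 runs
```

With a hypothetical 2% per-run crash rate, 250 iterations give better than a 99% chance of hitting the panic at least once, which matches the intuition that a few hundred clean runs are enough to call a build good.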

From what I understood, we have this issue because some instructions are not serialised, so the issue does not occur every time; it depends on the sequence.

> Maybe we can try to slowly (as it takes a while and will not be quick) roll you PPA builds to bisect.
> Or - if you can find a testcase that reproduces on just a local binary - just build qemu from git and bisect (if you are comfortable with that)?

I already built the 8.2 version. I can look at building more versions (but not the full releases with .deb packages).

Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hi Christian,

It took me a bit of time to get everything set up, but I managed to do a "git bisect" to find the fix (I had to switch from GCC-13 to GCC-11):

  deba78709a ("accel/tcg: Always lock pages before translation")

https://gitlab.com/qemu-project/qemu/-/commit/deba78709a

This fix was introduced in v8.1.0 and apparently not backported to earlier versions (I don't know if that is normal or not). So it looks like it affects all Ubuntu versions from at least Jammy 22.04 (I didn't try with an older version) to Mantic 23.10 included. I guess it has not been seen before because the bug is only visible with the TCG backend (without KVM) and with Linux kernel >= 6.3.

If the plan is to backport the fix in Ubuntu, it looks like it depends on this commit:

  cb62bd15e1 ("accel/tcg: Split out cpu_exec_longjmp_cleanup")

https://gitlab.com/qemu-project/qemu/-/commit/cb62bd15e1

And there is a fix as well:

  ad17868eb1 ("accel/tcg: Clear tcg_ctx->gen_tb on buffer overflow")

https://gitlab.com/qemu-project/qemu/-/commit/ad17868eb1

There are some conflicts when backporting them to v8.0.4, but it is not blocking. I resolved the conflicts and pushed these 3 commits in this branch:

https://gitlab.com/matttbe/qemu/-/commits/lp-2051965/

Please tell me what else I need to do.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hi Matthieu,

Thanks for the time spent investigating this bug. The information you've provided so far is really helpful.

The next steps here would be to come up with a simple reproducer for the bug, so that I can write the SRU text. Then, I will prepare uploads for Jammy and Mantic, and you can help us by testing the packages once they are uploaded to the archive.

Since you were bisecting the problem, it seems to me that you have the reproducer pretty much nailed already, right? Would you be able to tell me so that I can try reproducing the bug locally as well?

Meanwhile, I'll see about checking the commits you mentioned and start backporting them.

tags: added: server-todo
Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hi Sergio,

Thank you for your reply!

> The next steps here would be to come up with a simple reproducer for the bug, so that I can write the SRU text. Then, I will prepare uploads for Jammy and Mantic, and you can help us by testing the packages once they are uploaded to the archive.

Thank you! Sure, I can test new packages.

> Since you were bisecting the problem, it seems to me that you have the reproducer pretty much nailed already, right? Would you be able to tell me so that I can try reproducing the bug locally as well?

I have a reproducer. But that's not a "simple" one. Here is what can be done:

  # Download the Linux kernel source from kernel.org or git, at least Linux 6.3, ideally a recent one, e.g. v6.7.4
  cd [linux kernel source code]

  # modify a test to stop after what triggers the kernel panic (ping)
  sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

  # to run the ping test max 250 in the next step
  echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

  # use a Docker image based on Ubuntu 23.10 including QEmu 8.0.4 with the bug + tools
  # this will build the kernel and dependencies, then run the 'mptcp_connect.sh' test 250 times
  # docker is used without "--privileged", so KVM will not be used (on purpose)
  docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm -it -e INPUT_BUILD_SKIP_PERF=1 \
    --pull always mptcp/mptcp-upstream-virtme-docker:latest \
    auto-normal

When I tested different versions of QEmu, I used the above command with 'cmd bash' instead of 'auto-normal', and ran commands manually to compile QEmu and execute the tests from the VM (./.virtme/scripts/virtme.expect).

I don't have a simple C program to reproduce this concurrency bug. Is it an issue?

It is not clear how other people managed to reproduce the bug. According to the original cover letter [1], I think the bug was visible by just booting (?) Fedora rawhide with kernel-core-6.5.0-0.rc0.20230703gita901a3568fd2.8.fc39.x86_64.rpm. But from [2], it looks like the bug was seen by the LKFT team when running kernel selftests on various kernel versions, but with no more details. If needed, I guess we can contact these people.

[1] https://lore<email address hidden>/
[2] https://lore.kernel.org<email address hidden>/

> Meanwhile, I'll see about checking the commits you mentioned and start backporting them.

Thanks! Do not hesitate to look at commits from https://gitlab.com/matttbe/qemu/-/commits/lp-2051965/
But that's the first time I'm looking at QEmu code, I hope I fixed the conflicts properly.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 2051965] Re: QEmu with TCG acceleration (without KVM) causes kernel panics with kernels >=6.3

On Wednesday, February 07 2024, Matthieu Baerts wrote:

> Hi Sergio,
>
> Thank you for your reply!

Thank you for providing the requested details, Matthieu!

>> Since you were bisecting the problem, it seems to me that you have the
> reproducer pretty much nailed already, right? Would you be able to tell
> me so that I can try reproducing the bug locally as well?
>
> I have a reproducer. But that's not a "simple" one. Here is what can be
> done:
>
> # Download the Linux kernel source from kernel.org or git, at least Linux 6.3, ideally a recent one, e.g. v6.7.4
> cd [linux kernel source code]
>
> # modify a test to stop after what triggers the kernel panic (ping)
> sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh
>
> # to run the ping test max 250 in the next step
> echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
>
> # use a Docker image based on Ubuntu 23.10 including QEmu 8.0.4 with the bug + tools
> # this will build the kernel and dependences, then run 'mptcp_connect.sh' test 250 times
> # docker is used without "--privileged", so KVM will not be used (on purpose)
> docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --rm -it -e INPUT_BUILD_SKIP_PERF=1 \
> --pull always mptcp/mptcp-upstream-virtme-docker:latest \
> auto-normal

That's pretty good :-). It doesn't need to be really simple; it just
needs to be described clearly, which you did perfectly.

Unfortunately I haven't had the time to test the steps you outlined
today, but I will find some time to do it tomorrow and let you know how
it goes.

> I don't have a simple C program to reproduce this concurrency bug. Is it
> an issue?

Nope, it's not an issue at all. If I succeed in reproducing the bug by
following the steps above, then we're fine and I can start writing the
SRU text right away.

>> Meanwhile, I'll see about checking the commits you mentioned and start
> backporting them.
>
> Thanks! Do not hesitate to look at commits from https://gitlab.com/matttbe/qemu/-/commits/lp-2051965/
> But that's the first time I'm looking at QEmu code, I hope I fixed the conflicts properly.

I'll certainly look at your commits. Thanks for the initial backporting
work, btw. It will be helpful when comparing my results.

Again, thanks a lot for the excellent bug report. I'll look into it
more carefully tomorrow, but I wanted to give you a "sign of life" just
in case.

Cheers,

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Revision history for this message
Matthieu Baerts (matttbe) wrote : Re: QEmu with TCG acceleration (without KVM) causes kernel panics with kernels >=6.3

Hi Sergio,

Thank you for the explanations, and for checking that!

Cheers,
Matt

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hey Matthieu,

OK, I was able to reproduce the bug here following your instructions (thanks again!). While doing that, I realized that using a third-party container image to trigger the issue may be problematic in the SRU process, because we have no control over the third-party packages installed in it, which may affect QEMU. So what I'm doing now is trying to "simplify" the testcase by performing what the container image does, but on the host. Let's see how hard it is...

I'll keep you posted.

Revision history for this message
Matthieu Baerts (matttbe) wrote (last edit ):

Hi Sergio,

I guess simply extracting what you need from the docker container would be enough:

Dependencies (probably more than what is strictly needed):

  sudo apt-get update && \
  sudo DEBIAN_FRONTEND=noninteractive \
  apt-get install -y --no-install-recommends \
  build-essential libncurses5-dev gcc libssl-dev bc bison \
  libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
  python3 python3-pkg-resources busybox \
  iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
  ca-certificates gnupg2 net-tools kmod \
  libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
  tcpdump \
  pkg-config libmnl-dev \
  clang lld llvm llvm-dev libcap-dev \
  gdb crash dwarves strace \
  iptables ebtables nftables vim psmisc bash-completion less jq \
  gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
  libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
  libtap-formatter-junit-perl \
  zstd \
  wget xz-utils lftp cpio u-boot-tools \
  cscope \
  bpftrace

Virtme is needed in "/opt":

  cd /opt && sudo git clone https://github.com/matttbe/virtme.git

Modify it not to use KVM:

  sudo sed -i 's/accel=kvm:tcg/accel=tcg/' /opt/virtme/virtme/commands/run.py

Download the script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh

Remove one line not to write stuff in /etc/hosts:

  sed -i '/prepare_hosts_file$/d' /PATH/TO/entrypoint.sh

Finally, run it with 'sudo' and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1':

  sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 /PATH/TO/entrypoint.sh auto-normal

Is it OK like that, or do we need to extract code from this script as well?

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Hey Matthieu,

Yeah, that's exactly the first approach I'm taking here :-). I'm glad we're on the same page. And thanks for providing detailed instructions about virtme, those were helpful to unblock me.

So, in the ideal world we would have a self-contained test that doesn't require third-party software that's not packaged in Ubuntu, but honestly, I'm fairly happy to just remove Docker from the equation here. virtme is just a wrapper on top of qemu, and the entrypoint.sh script doesn't install any extra package from an unknown repository.

Right now, I'm trying to reproduce the problem with qemu from Jammy (I managed to reproduce it using Mantic). I'm a little bit puzzled to see that the crash happened more quickly on Mantic (I've been running the test on Jammy for more than 2 hours now, and it still hasn't crashed). Anyway, I'll leave it running overnight.

Another thing that's concerning is the difference in the codebase from qemu 6.2 (the version from Jammy) and 8.x. I started backporting the upstream patches into Jammy and noticed that there are non-trivial conflicts to be solved.

I apologize for the slow progress here. I'm spread thin across many tasks so I have to timebox my work on this bug.

Hopefully I'll have good news tomorrow.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Huh, the test has just finished on Jammy and the crash didn't happen.

Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hi Sergio,

Thank you for your reply! I'm glad the explanations helped!

> Right now, I'm trying to reproduce the problem with qemu from Jammy (I managed to reproduce it using Mantic). I'm a little bit puzzled to see that the qemu crashed quicker on Mantic (I've been running the test on Jammy for more than 2 hours now, and it still hasn't crashed). Anyway, I'll leave it running overnight.

I also noticed that it was easier to get QEmu to crash on Mantic than on Jammy. On the other hand, when I first tried to reproduce the issue on Jammy, it usually took around 50 iterations to hit the kernel panic. Sometimes it took < 10 iterations, but sometimes around ~150/200.

> Huh, the test has just finished on Jammy and the crash didn't happen.

Arf, sorry for that :-/
I guess 'virtme' was correctly patched not to use KVM, right?

Maybe try with more than 250 iterations by modifying the ".virtme-exec-run" file, e.g. 500? Or with no limit, by using 'run_loop' instead of 'run_loop_n 250', and leave it running overnight? It is hard to predict when a concurrency bug will happen.
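Concretely, switching the reproducer to an unbounded loop only means rewriting the one-line config file from the test plan:

```shell
#!/bin/sh
# Rewrite the virtme run config so the selftest loops with no iteration
# limit ('run_loop' instead of 'run_loop_n 250'), as suggested above.
echo 'run_loop run_selftest_one mptcp_connect.sh' > .virtme-exec-run
cat .virtme-exec-run
```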

> Another thing that's concerning is the difference in the codebase from qemu 6.2 (the version from Jammy) and 8.x. I started backporting the upstream patches into Jammy and noticed that there are non-trivial conflicts to be solved.

Thank you for having looked at that! Maybe the QEmu community can help with that? I don't know how it usually works.

> I apologize for the slow progress here. I'm spread thin across many tasks so I have to timebox my work on this bug.

That's OK, take your time. On my side, I have a workaround in the form of a kernel patch. I will be happy to remove it, but I mainly reported the issue to help other people also using QEmu < 8.1 without KVM and with a guest kernel >= 6.3, because it might not be obvious that the issue is due to QEmu and not the guest kernel.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 2051965] Re: QEmu with TCG acceleration (without KVM) causes kernel panics with kernels >=6.3

Hey Matthieu,

On Tuesday, February 13 2024, Matthieu Baerts wrote:

>> Right now, I'm trying to reproduce the problem with qemu from Jammy (I
> managed to reproduce it using Mantic). I'm a little bit puzzled to see
> that the qemu crashed quicker on Mantic (I've been running the test on
> Jammy for more than 2 hours now, and it still hasn't crashed). Anyway,
> I'll leave it running overnight.
>
> I also noticed that it was easier to get QEMU to crash on Mantic than on
> Jammy. When I first tried to reproduce the issue on Jammy, it usually
> took around 50 iterations to hit the kernel panic. Sometimes it took
> fewer than 10 iterations, but sometimes around 150-200.
>
>> Huh, the test has just finished on Jammy and the crash didn't happen.
>
> Arf, sorry for that :-/
> I guess 'virtme' was correctly patched not to use KVM, right?
>
> Maybe try with more than 250 iterations by modifying the ".virtme-exec-
> run" file? e.g. 500? Or no limit by using 'run_loop' instead of
> 'run_loop_n 250' and leave it running overnight? It is hard to predict
> when a concurrency bug will happen.

Ah, that did it. Now I can reproduce on Jammy and Mantic, without using
Docker. That's great.

>> Another thing that's concerning is the difference in the codebase from
>> qemu 6.2 (the version from Jammy) and 8.x. I started backporting the
>> upstream patches into Jammy and noticed that there are non-trivial
>> conflicts to be solved.
>
> Thank you for looking into that! Maybe the QEMU community can help with
> that? I don't know how it usually works.

No problem, I'll figure it out.

>> I apologize for the slow progress here. I'm spread thin across many
>> tasks so I have to timebox my work on this bug.
>
> That's OK, take your time. On my side, I have a workaround in the form
> of a kernel patch. I will be happy to remove it, but I mainly reported
> the issue to help other people using QEMU < 8.1 without KVM and with a
> guest kernel >= 6.3, because it might not be obvious that the issue is
> in QEMU and not the guest kernel.

Thanks for the patience :-).

Cheers,

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Changed in qemu (Ubuntu):
status: Incomplete → Fix Released
Changed in qemu (Ubuntu Jammy):
status: New → Triaged
Changed in qemu (Ubuntu Mantic):
status: New → Triaged
Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in qemu (Ubuntu Mantic):
assignee: nobody → Sergio Durigan Junior (sergiodj)
description: updated
description: updated
summary: - QEmu with TCG acceleration (without KVM) causes kernel panics with
+ QEmu with TCG acceleration (without KVM) causes kernel panics with guest
kernels >=6.3
description: updated
Changed in qemu (Ubuntu Mantic):
status: Triaged → In Progress
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Matthieu, or anyone else affected,

Accepted qemu into mantic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/1:8.0.4+dfsg-1ubuntu3.23.10.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-mantic to verification-done-mantic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-mantic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
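For reference, enabling the -proposed pocket can be sketched roughly as follows. The full, recommended procedure (including pinning) is on the EnableProposed wiki page linked above; the file name and component list below are assumptions, and the commented steps need root on a real system.

```shell
# Write the pocket entry to a local file for illustration; on a real system
# this would go to /etc/apt/sources.list.d/mantic-proposed.list (as root).
SOURCES=./mantic-proposed.list
echo 'deb http://archive.ubuntu.com/ubuntu mantic-proposed restricted main multiverse universe' > "$SOURCES"
cat "$SOURCES"

# Then, on the real system:
#   sudo apt-get update
#   sudo apt-get install -t mantic-proposed qemu-system-x86
```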

Changed in qemu (Ubuntu Mantic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-mantic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (qemu/1:8.0.4+dfsg-1ubuntu3.23.10.3)

All autopkgtests for the newly accepted qemu (1:8.0.4+dfsg-1ubuntu3.23.10.3) for mantic have finished running.
The following regressions have been reported in tests triggered by the package:

osk-sdl/0.67.1-3 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/mantic/update_excuses.html#qemu

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Performing the verification for Mantic.

Verifying that we're using the QEMU package from mantic-proposed:

$ apt policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Version table:
 *** 1:8.0.4+dfsg-1ubuntu3.23.10.3 100
        100 http://archive.ubuntu.com/ubuntu mantic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.0.4+dfsg-1ubuntu3.23.10.2 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3 500
        500 http://nova.clouds.archive.ubuntu.com/ubuntu mantic/main amd64 Packages

Running the tests in a loop and verifying that the guest Linux kernel doesn't panic:

$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/repr/entrypoint.sh auto-normal

...
== Summary ==

fatal: No names found, cannot describe anything.
Ref:
Mode: normal
Extra kconfig: /

All tests:
ok 1 test: selftest_mptcp_connect

KVM Validation: Success! ✅

This concludes the verification for Mantic.

tags: added: verification-done verification-done-mantic
removed: verification-needed verification-needed-mantic
Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hi Sergio, Andreas,

Thank you for the new version!

I also confirm that with the new version from mantic-proposed...

# apt-cache policy qemu-system-x86
qemu-system-x86:
  Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.3
  Version table:
 *** 1:8.0.4+dfsg-1ubuntu3.23.10.3 100
        100 http://archive.ubuntu.com/ubuntu mantic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1:8.0.4+dfsg-1ubuntu3.23.10.2 500
        500 http://archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
     1:8.0.4+dfsg-1ubuntu3 500
        500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages

...I didn't have any panic after 30 minutes of tests with the new version, while it took me 30 seconds to get the panic with the previous one.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote : Re: [Bug 2051965] Re: QEmu with TCG acceleration (without KVM) causes kernel panics with guest kernels >=6.3

On Tuesday, February 27 2024, Matthieu Baerts wrote:

> Hi Sergio, Andreas,

Hey Matthieu,

> Thank you for the new version!
>
> I also confirm that with the new version from mantic-proposed...
>
> # apt-cache policy qemu-system-x86
> qemu-system-x86:
> Installed: 1:8.0.4+dfsg-1ubuntu3.23.10.3
> Candidate: 1:8.0.4+dfsg-1ubuntu3.23.10.3
> Version table:
> *** 1:8.0.4+dfsg-1ubuntu3.23.10.3 100
> 100 http://archive.ubuntu.com/ubuntu mantic-proposed/main amd64 Packages
> 100 /var/lib/dpkg/status
> 1:8.0.4+dfsg-1ubuntu3.23.10.2 500
> 500 http://archive.ubuntu.com/ubuntu mantic-updates/main amd64 Packages
> 500 http://security.ubuntu.com/ubuntu mantic-security/main amd64 Packages
> 1:8.0.4+dfsg-1ubuntu3 500
> 500 http://archive.ubuntu.com/ubuntu mantic/main amd64 Packages
>
>
> ...I didn't have any panic after 30 minutes of tests with the new version, while it took me 30 seconds to get the panic with the previous one.

That's great, thanks for confirming that the fix indeed works.

The package should be released to mantic-release in the next few days.

Cheers,

--
Sergio
GPG key ID: E92F D0B3 6B14 F1F4 D8E0 EB2F 106D A1C8 C3CB BF14

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:8.0.4+dfsg-1ubuntu3.23.10.3

---------------
qemu (1:8.0.4+dfsg-1ubuntu3.23.10.3) mantic; urgency=medium

  * d/p/u/lp-2051965-*.patch: Fix QEMU crash when using TCG acceleration
    with guest Linux kernel >= 6.3. (LP: #2051965)

 -- Sergio Durigan Junior <email address hidden> Tue, 13 Feb 2024 18:26:17 -0500

Changed in qemu (Ubuntu Mantic):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for qemu has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Bryce Harrington (bryce)
tags: removed: server-todo