QEmu with TCG acceleration (without KVM) causes kernel panics with guest kernels >=6.3
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
qemu (Ubuntu) | Fix Released | Undecided | Unassigned | |
Jammy | Triaged | Undecided | Sergio Durigan Junior | |
Mantic | Fix Released | Undecided | Sergio Durigan Junior | |
Bug Description
[ Impact ]
QEMU users on Jammy/Mantic who choose to use TCG-based acceleration instead of KVM-based might encounter kernel panics when the guest Linux kernel version is greater than or equal to 6.3.
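As a rough aid for deciding whether a given guest falls in the affected range, the version check can be sketched in shell. `version_ge` and `guest_kernel_affected` are hypothetical helper names, the sketch assumes GNU `sort -V`, and the 6.3 threshold is simply the one stated above:

```shell
#!/bin/sh
# Hypothetical helpers, not part of QEMU or virtme.

# Succeed (exit 0) when version $1 is greater than or equal to version $2.
# Assumes GNU sort with -V (version sort), as shipped on Ubuntu.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Succeed when the given kernel release is >= 6.3, the threshold from
# the impact statement. Any suffix after the first '-' (for example
# "-g244ee3389ffe" or "-91-generic") is stripped before comparing.
guest_kernel_affected() {
    version_ge "${1%%-*}" 6.3
}
```

For example, `guest_kernel_affected "$(uname -r)"` run inside the guest would succeed for a 6.7.0 kernel and fail for a 5.15 one. It only checks the version range; it says nothing about whether TCG is actually in use.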
[ Test Plan ]
It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.
First, you need a machine running Jammy or Mantic. Then:
$ sudo apt-get update && \
DEBIAN_
sudo apt-get install -y --no-install-
build-essential libncurses5-dev gcc libssl-dev bc bison \
libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
python3 python3-
iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
ca-certificates gnupg2 net-tools kmod \
libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
tcpdump \
pkg-config libmnl-dev \
clang lld llvm llvm-dev libcap-dev \
gdb crash dwarves strace \
iptables ebtables nftables vim psmisc bash-completion less jq \
gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
libtap-
zstd \
wget xz-utils lftp cpio u-boot-tools \
cscope \
bpftrace
Download the virtme project, which is simply a wrapper around QEMU:
$ git clone https:/
Modify it so that it does not use KVM:
$ sed -i 's/accel=
$ sed -i 's@\(if is_native and os\.access.
Download the test script from https:/
$ wget https:/
$ chmod +x entrypoint.sh
Remove one line so that the script does not write to /etc/hosts:
$ sed -i '/prepare_
Point the script to virtme's location:
$ sed -i "s@/opt/
Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:
$ git clone --depth=1 https:/
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/
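The `a` command in that sed expression appends a new line after every line matching the pattern. A small demonstration on a throwaway file, assuming GNU sed; the two-line sample script is invented for illustration and is not the real selftest:

```shell
#!/bin/sh
# Throwaway file standing in for a test script (contents are invented).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
echo "INFO: ping tests"
echo "INFO: next step"
EOF

# Same expression as in the step above: after each line matching
# 'ping tests"', append a line containing the literal text: exit $ret
sed -i '/ping tests"/a exit $ret' "$tmp"

# Keep the edited content, clean up, then show the result.
result=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```

The single quotes around the sed expression keep the shell from expanding `$ret`, so the appended line is evaluated only when the modified test script runs.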
And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:
$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
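To illustrate the idea of repeating a test until it fails, here is a minimal sketch of such a loop. `loop_n` is a hypothetical stand-in written for this example; it is not copied from entrypoint.sh, and the real `run_loop_n` may behave differently (for instance in what it prints or when it stops):

```shell
#!/bin/sh
# Hypothetical stand-in for a "run N times" helper; NOT taken from entrypoint.sh.
# Usage: loop_n COUNT COMMAND [ARGS...]
# Runs COMMAND up to COUNT times and stops at the first failure,
# returning that failure's exit status.
loop_n() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== Attempt: $i/$count ==="
        "$@" || return $?
        i=$((i + 1))
    done
    return 0
}
```

Stopping at the first failure matters for an intermittent crash like this one: the interesting output is the one from the failing iteration, and continuing would only bury it.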
Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_
$ sudo INPUT_BUILD_
[ Where problems could occur ]
The patches being backported have been part of upstream for a while now (since version 8.1.0). Two of the patches are relatively trivial and deal only with code movement/
[ Original Description ]
I'm using QEmu 1:6.2+dfsg-
When TCG is used instead of KVM, I experienced regular crashes in slow environments (such as GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
=======
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c0
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/
[ 45.505547] ? exc_int3 (arch/x86/
[ 45.505547] ? asm_exc_int3 (arch/x86/
[ 45.505547] ? netif_rx_internal (arch/x86/
[ 45.505547] ? netif_rx_internal (arch/x86/
[ 45.505547] __netif_rx (net/core/
[ 45.505547] veth_xmit (drivers/
[ 45.505547] dev_hard_start_xmit (include/
[ 45.505547] __dev_queue_xmit (include/
[ 45.505547] ? selinux_
[ 45.505547] ? eth_header (net/ethernet/
[ 45.505547] ip6_finish_output2 (include/
[ 45.505547] ? ip6_output (include/
[ 45.505547] ? ip6_mtu (net/ipv6/
[ 45.505547] ip6_send_skb (net/ipv6/
[ 45.505547] icmpv6_echo_reply (net/ipv6/
[ 45.505547] ? icmpv6_rcv (net/ipv6/
[ 45.505547] icmpv6_rcv (net/ipv6/
[ 45.505547] ip6_protocol_
[ 45.505547] ip6_input_finish (include/
[ 45.505547] __netif_
[ 45.505547] process_backlog (include/
[ 45.505547] __napi_poll (net/core/
[ 45.505547] net_rx_action (net/core/
[ 45.505547] __do_softirq (arch/x86/
[ 45.505547] do_softirq (kernel/
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_
[ 45.505547] __dev_queue_xmit (net/core/
[ 45.505547] ip6_finish_output2 (include/
[ 45.505547] ? ip6_output (include/
[ 45.505547] ? ip6_mtu (net/ipv6/
[ 45.505547] ip6_send_skb (net/ipv6/
[ 45.505547] rawv6_sendmsg (net/ipv6/
[ 45.505547] ? netfs_clear_
[ 45.505547] ? netfs_alloc_request (fs/netfs/
[ 45.505547] ? folio_add_
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_
[ 45.505547] ? __do_softirq (arch/x86/
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/
[ 45.505547] entry_SYSCALL_
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffff
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xffffffffffff
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
=======
0: 48 3d 00 f0 ff ff cmp $0xffffffffffff
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
=======
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c0
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000
------------- 8< -------------
For more debug info:
https:/
The crashes happen in 'jump label' code.
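One way to read the log above: in the `Code:` line, the byte wrapped in `<...>` marks where the saved RIP points. The sketch below locates that byte; `marked_byte` is a hypothetical helper name, and the input is the `Code:` line copied from the kernel-mode part of the oops:

```shell
#!/bin/sh
# Code: line copied from the oops above (kernel-mode RIP).
code='0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11'

# Hypothetical helper: print the offset (in bytes from the start of the
# dump) and the value of the byte wrapped in <...>.
marked_byte() {
    before=${1%%<*}     # everything left of the marker
    byte=${1#*<}        # drop text up to and including '<'
    byte=${byte%%>*}    # drop '>' and everything after it
    set -- $before      # intentional word splitting: one word per byte
    printf 'offset=0x%x byte=%s\n' "$#" "$byte"
}

marked_byte "$code"
```

The marked byte sits one byte past the start of the `jmp` that the decoded listing flags at offset 0x29. Since `int3` is a one-byte instruction (0xcc) and the saved RIP points just after it, this is consistent with the first byte of that `jmp` having briefly held an `int3` while the jump label was being patched, and having been rewritten by the time the dump was taken. That reading is an interpretation of the log, not something the log states directly.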
I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:
https://<email address hidden>/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0: although the crash is not easy to reproduce locally, I did not hit any issue with that version. I guess a fix has been made in newer versions but not backported to v6.x.
Related branches
- git-ubuntu bot: Approve
- Christian Ehrhardt (community): Approve
- Canonical Server Reporter: Pending requested
Diff: 789 lines (+755/-0), 5 files modified
debian/changelog (+7/-0)
debian/patches/series (+3/-0)
debian/patches/ubuntu/lp-2051965-accel-tcg-Always-lock-pages-before-translation.patch (+621/-0)
debian/patches/ubuntu/lp-2051965-accel-tcg-Clear-tcg_ctx-gen_tb-on-buffer-overflow.patch (+34/-0)
debian/patches/ubuntu/lp-2051965-accel-tcg-Split-out-cpu_exec_longjmp_cleanup.patch (+90/-0)
tags: added: server-todo
Changed in qemu (Ubuntu):
status: Incomplete → Fix Released
Changed in qemu (Ubuntu Jammy):
status: New → Triaged
Changed in qemu (Ubuntu Mantic):
status: New → Triaged
Changed in qemu (Ubuntu Jammy):
assignee: nobody → Sergio Durigan Junior (sergiodj)
Changed in qemu (Ubuntu Mantic):
assignee: nobody → Sergio Durigan Junior (sergiodj)
description: updated
description: updated
summary:
- QEmu with TCG acceleration (without KVM) causes kernel panics with
+ QEmu with TCG acceleration (without KVM) causes kernel panics with guest kernels >=6.3
description: updated
Changed in qemu (Ubuntu Mantic):
status: Triaged → In Progress
tags: removed: server-todo
I just managed to reproduce this issue when using QEmu from Ubuntu 23.10: 8.0.4+dfsg-1ubuntu3.23.10.2