2024-02-01 17:32:48 |
Matthieu Baerts |
bug |
|
|
added bug |
2024-02-02 06:31:06 |
Christian Ehrhardt |
qemu (Ubuntu): status |
New |
Incomplete |
|
2024-02-07 03:23:27 |
Sergio Durigan Junior |
bug |
|
|
added subscriber Ubuntu Server |
2024-02-07 03:23:33 |
Sergio Durigan Junior |
tags |
|
server-todo |
|
2024-02-14 01:22:25 |
Sergio Durigan Junior |
nominated for series |
|
Ubuntu Jammy |
|
2024-02-14 01:22:25 |
Sergio Durigan Junior |
bug task added |
|
qemu (Ubuntu Jammy) |
|
2024-02-14 01:22:25 |
Sergio Durigan Junior |
nominated for series |
|
Ubuntu Mantic |
|
2024-02-14 01:22:25 |
Sergio Durigan Junior |
bug task added |
|
qemu (Ubuntu Mantic) |
|
2024-02-14 01:22:30 |
Sergio Durigan Junior |
qemu (Ubuntu): status |
Incomplete |
Fix Released |
|
2024-02-14 01:22:35 |
Sergio Durigan Junior |
qemu (Ubuntu Jammy): status |
New |
Triaged |
|
2024-02-14 01:22:37 |
Sergio Durigan Junior |
qemu (Ubuntu Mantic): status |
New |
Triaged |
|
2024-02-14 01:22:40 |
Sergio Durigan Junior |
qemu (Ubuntu Jammy): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2024-02-14 01:22:43 |
Sergio Durigan Junior |
qemu (Ubuntu Mantic): assignee |
|
Sergio Durigan Junior (sergiodj) |
|
2024-02-14 01:35:26 |
Sergio Durigan Junior |
description |
I'm using QEMU 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I always used QEMU with KVM acceleration until recently, when KVM stopped being available.
When TCG is used instead of KVM, I see regular crashes in slow environments (e.g. GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------
For more debug info:
https://github.com/multipath-tcp/mptcp_net-next/issues/471
The crashes happen in 'jump label' code.
I reported the issue to the kernel developers (x86 mailing list), and it looks like a QEMU issue:
https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEMU community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0: although the crash is not easy to reproduce locally, I did not hit any issue with it. I guess a fix landed in newer versions but was not backported to v6.x. |
[ Impact ]
TBD.
[ Test Plan ]
It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.
First, you need a machine running Jammy or Mantic. Then:
$ sudo apt-get update && \
sudo DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
build-essential libncurses5-dev gcc libssl-dev bc bison \
libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
python3 python3-pkg-resources busybox \
iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
ca-certificates gnupg2 net-tools kmod \
libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
tcpdump \
pkg-config libmnl-dev \
clang lld llvm llvm-dev libcap-dev \
gdb crash dwarves strace \
iptables ebtables nftables vim psmisc bash-completion less jq \
gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
libtap-formatter-junit-perl \
zstd \
wget xz-utils lftp cpio u-boot-tools \
cscope \
bpftrace
Download the virtme project, which is simply a wrapper around QEMU:
$ git clone https://github.com/matttbe/virtme.git
Modify it not to use KVM:
$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py
Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:
$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh
Remove the line that would otherwise write to /etc/hosts:
$ sed -i '/prepare_hosts_file$/d' entrypoint.sh
Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:
$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh
And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:
$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':
$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal
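To decide whether a run reproduced the bug, one option is to search the saved guest console output for the first line of the oops quoted in this report. The following is only a minimal sketch: the helper name 'has_int3_oops' and the idea of saving the console output to a file are assumptions, not part of virtme or entrypoint.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of virtme or entrypoint.sh): succeed if a
# saved console log contains the int3 oops header seen in this report.
# Assumption: the caller has saved the guest console output to a file.
has_int3_oops() {
    log_file=$1
    # The oops in this report starts with a line like:
    #   int3: 0000 [#1] PREEMPT SMP NOPTI
    grep -Eq 'int3: [0-9a-f]+ \[#[0-9]+\]' "$log_file"
}
```

For example, with the console output saved to a file of your choosing, 'has_int3_oops console.log && echo "crash reproduced"' prints the message only when the oops header is present.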
[ Where problems could occur ]
TBD.
[ Original Description ]
I'm using QEMU 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I always used QEMU with KVM acceleration until recently, when KVM stopped being available.
When TCG is used instead of KVM, I see regular crashes in slow environments (e.g. GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------
For more debug info:
https://github.com/multipath-tcp/mptcp_net-next/issues/471
The crashes happen in 'jump label' code.
I reported the issue to the kernel developers (x86 mailing list), and it looks like a QEMU issue:
https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEMU community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0: although the crash is not easy to reproduce locally, I did not hit any issue with it. I guess a fix landed in newer versions but was not backported to v6.x. |
|
2024-02-14 01:36:50 |
Sergio Durigan Junior |
description |
[ Impact ]
TBD.
[ Test Plan ]
It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.
First, you need a machine running Jammy or Mantic. Then:
$ sudo apt-get update && \
sudo DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
build-essential libncurses5-dev gcc libssl-dev bc bison \
libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
python3 python3-pkg-resources busybox \
iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
ca-certificates gnupg2 net-tools kmod \
libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
tcpdump \
pkg-config libmnl-dev \
clang lld llvm llvm-dev libcap-dev \
gdb crash dwarves strace \
iptables ebtables nftables vim psmisc bash-completion less jq \
gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
libtap-formatter-junit-perl \
zstd \
wget xz-utils lftp cpio u-boot-tools \
cscope \
bpftrace
Download the virtme project, which is simply a wrapper around QEMU:
$ git clone https://github.com/matttbe/virtme.git
Modify it not to use KVM:
$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py
Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:
$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh
Remove the line that would otherwise write to /etc/hosts:
$ sed -i '/prepare_hosts_file$/d' entrypoint.sh
Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:
$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh
And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:
$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':
$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal
[ Where problems could occur ]
TBD.
[ Original Description ]
I'm using QEMU 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I always used QEMU with KVM acceleration until recently, when KVM stopped being available.
When TCG is used instead of KVM, I see regular crashes in slow environments (e.g. GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------
For more debug info:
https://github.com/multipath-tcp/mptcp_net-next/issues/471
The crashes happen in 'jump label' code.
I reported the issue to the kernel developers (x86 mailing list), and it appears to be a QEMU issue:
https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEMU community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0 and, although the crash is not easy to reproduce locally, I did not hit any issue with it. I guess a fix has landed in newer versions but has not been backported to v6.x. |
[ Impact ]
TBD.
[ Test Plan ]
It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.
First, you need a machine running Jammy or Mantic. Then, from your home directory (later steps refer to ${HOME}/virtme and $HOME/entrypoint.sh):
$ sudo apt-get update && \
sudo DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
build-essential libncurses5-dev gcc libssl-dev bc bison \
libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
python3 python3-pkg-resources busybox \
iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
ca-certificates gnupg2 net-tools kmod \
libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
tcpdump \
pkg-config libmnl-dev \
clang lld llvm llvm-dev libcap-dev \
gdb crash dwarves strace \
iptables ebtables nftables vim psmisc bash-completion less jq \
gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
libtap-formatter-junit-perl \
zstd \
wget xz-utils lftp cpio u-boot-tools \
cscope \
bpftrace
Download the virtme project, which is simply a wrapper around QEMU:
$ git clone https://github.com/matttbe/virtme.git
Modify it not to use KVM:
$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py
Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:
$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh
Remove one line so that the script does not write to /etc/hosts:
$ sed -i '/prepare_hosts_file$/d' entrypoint.sh
Point the script to virtme's location:
$ sed -i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh
Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:
$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh
And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:
$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':
$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal
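To tell whether a run hit the bug, the console output can be scanned for the two signatures visible in the trace from the original description: the 'int3: 0000' oops header and the final 'Kernel panic - not syncing' line. The following is a minimal sketch, assuming the output of the command above was saved to a file (for example by appending '2>&1 | tee run.log'). The file name run.log and the helper name has_tcg_panic are illustrative and are not part of virtme or entrypoint.sh.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given console log
# contains both crash signatures, fail otherwise.
has_tcg_panic() {
    log_file="$1"
    # Oops header printed when the guest traps on an unexpected int3.
    grep -q 'int3: 0000' "$log_file" || return 1
    # Final line printed when the guest kernel gives up.
    grep -q 'Kernel panic - not syncing' "$log_file" || return 1
    return 0
}

# Example use (assumes run.log was captured as described above):
# if has_tcg_panic run.log; then echo "crash reproduced"; fi
```

A run that completes all 250 iterations without either signature in the log suggests, but does not prove, that the crash is gone, since the problem is timing dependent.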
[ Where problems could occur ]
TBD.
[ Original Description ]
I'm using QEMU 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I had always used QEMU with KVM acceleration until recently, when it became unavailable.
When TCG is used instead of KVM, I experience regular crashes in slow environments (e.g. GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------
For more debug info:
https://github.com/multipath-tcp/mptcp_net-next/issues/471
The crashes happen in 'jump label' code.
I reported the issue to the kernel developers (x86 mailing list), and it appears to be a QEMU issue:
https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEMU community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0 and, although the crash is not easy to reproduce locally, I did not hit any issue with it. I guess a fix has landed in newer versions but has not been backported to v6.x. |
|
2024-02-14 03:51:23 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/460476 |
|
2024-02-14 10:29:49 |
Matthieu Baerts |
summary |
QEmu with TCG acceleration (without KVM) causes kernel panics with kernels >=6.3 |
QEmu with TCG acceleration (without KVM) causes kernel panics with guest kernels >=6.3 |
|
2024-02-20 21:01:44 |
Sergio Durigan Junior |
description |
[ Impact ]
TBD.
[ Test Plan ]
It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.
First, you need a machine running Jammy or Mantic. Then, from your home directory (later steps refer to ${HOME}/virtme and $HOME/entrypoint.sh):
$ sudo apt-get update && \
sudo DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
build-essential libncurses5-dev gcc libssl-dev bc bison \
libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
python3 python3-pkg-resources busybox \
iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
ca-certificates gnupg2 net-tools kmod \
libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
tcpdump \
pkg-config libmnl-dev \
clang lld llvm llvm-dev libcap-dev \
gdb crash dwarves strace \
iptables ebtables nftables vim psmisc bash-completion less jq \
gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
libtap-formatter-junit-perl \
zstd \
wget xz-utils lftp cpio u-boot-tools \
cscope \
bpftrace
Download the virtme project, which is simply a wrapper around QEMU:
$ git clone https://github.com/matttbe/virtme.git
Modify it not to use KVM:
$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py
Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:
$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh
Remove one line so that the script does not write to /etc/hosts:
$ sed -i '/prepare_hosts_file$/d' entrypoint.sh
Point the script to virtme's location:
$ sed -i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh
Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:
$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh
And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:
$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':
$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal
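The 'sed' edit of mptcp_connect.sh above relies on the 'a' (append) command of GNU sed, which adds a new line after every line matching the address pattern. A minimal sketch on a throwaway file shows the effect. The two sample lines are invented for illustration and are not the real contents of mptcp_connect.sh.

```shell
#!/bin/sh
# Build a throwaway file with made-up content (NOT the real selftest script).
sample="$(mktemp)"
printf '%s\n' 'echo "INFO: ping tests"' 'echo "INFO: other tests"' > "$sample"

# Same expression as in the test plan: append the literal text 'exit $ret'
# after each line that contains: ping tests"
sed -i '/ping tests"/a exit $ret' "$sample"

# Show the result: the appended line follows the matching line only.
cat "$sample"
```

Because the expression is single-quoted, '$ret' is written into the file literally and is only expanded later, when the modified test script itself runs.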
[ Where problems could occur ]
TBD.
[ Original Description ]
I'm using QEMU 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I had always used QEMU with KVM acceleration until recently, when it became unavailable.
When TCG is used instead of KVM, I experience regular crashes in slow environments (e.g. GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------
For more debug info:
https://github.com/multipath-tcp/mptcp_net-next/issues/471
The crashes happen in 'jump label' code.
I reported the issue to the kernel developers (x86 mailing list), and it looks like a QEMU issue:
https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEMU community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0: although the crash is not easy to reproduce locally, I did not hit any issues with it. I guess a fix was made in newer versions but not backported to v6.x. |
[ Impact ]
QEMU users on Jammy/Mantic who run guests with TCG instead of KVM may encounter guest kernel panics when the guest Linux kernel is version 6.3 or later.
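To tell whether a given guest falls in the affected range, the kernel release can be compared against 6.3 from inside the guest. The following is only a minimal sketch: the helper name `kernel_at_least` is hypothetical, it relies on GNU `sort -V`, and it ignores suffixes such as `-rc1` or `-generic`.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when kernel release $2 is at least version $1.
# Relies on GNU `sort -V` (version sort), which Ubuntu's coreutils provides.
kernel_at_least() {
    min="$1"
    # Drop everything after the first '-' (e.g. "-generic", "-g244ee3389ffe").
    rel="${2%%-*}"
    # The smaller of the two versions sorts first.
    lowest=$(printf '%s\n%s\n' "$min" "$rel" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

# Run inside the guest: report whether its kernel is 6.3 or later.
if kernel_at_least 6.3 "$(uname -r)"; then
    echo "guest kernel is in the affected range (>= 6.3)"
else
    echo "guest kernel predates 6.3"
fi
```

A guest in the affected range only hits the problem when QEMU runs it under TCG; KVM guests are not concerned.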
[ Test Plan ]
It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.
First, you need a machine running Jammy or Mantic. Then:
$ sudo apt-get update && \
sudo DEBIAN_FRONTEND=noninteractive \
apt-get install -y --no-install-recommends \
build-essential libncurses5-dev gcc libssl-dev bc bison \
libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
python3 python3-pkg-resources busybox \
iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
ca-certificates gnupg2 net-tools kmod \
libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
tcpdump \
pkg-config libmnl-dev \
clang lld llvm llvm-dev libcap-dev \
gdb crash dwarves strace \
iptables ebtables nftables vim psmisc bash-completion less jq \
gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
libtap-formatter-junit-perl \
zstd \
wget xz-utils lftp cpio u-boot-tools \
cscope \
bpftrace
Download the virtme project, which is simply a wrapper around QEMU:
$ git clone https://github.com/matttbe/virtme.git
Modify it not to use KVM:
$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py
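Because a `sed` substitution silently does nothing when its pattern no longer matches, it may be worth confirming that the first edit took effect before continuing. A minimal sketch, assuming the checkout sits in the current directory; the helper name `uses_tcg_only` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the file at $1 mentions accel=tcg
# and no longer mentions accel=kvm.
uses_tcg_only() {
    grep -q 'accel=tcg' "$1" && ! grep -q 'accel=kvm' "$1"
}

# Path taken from the sed command in the test plan.
run_py=./virtme/virtme/commands/run.py
if [ -f "$run_py" ]; then
    if uses_tcg_only "$run_py"; then
        echo "virtme will request TCG only"
    else
        echo "KVM is still referenced in $run_py" >&2
    fi
fi
```

This only checks the accelerator string in run.py; the second `sed` command, which comments out lines in architectures.py, is not covered by this sketch.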
Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:
$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh
Remove the line that would otherwise write to /etc/hosts:
$ sed -i '/prepare_hosts_file$/d' entrypoint.sh
Point the script to virtme's location:
$ sed -i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh
Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:
$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh
And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:
$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run
Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':
$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal
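With 250 iterations the output is long, so it can help to search it for the crash signature afterwards. The sketch below assumes the console output was saved to a file, for example by appending `2>&1 | tee console.log` to the command above; both the file name `console.log` and the helper `log_shows_crash` are hypothetical. The patterns are taken from the oops quoted in the original description.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when the log at $1 contains the int3 oops
# header or the final panic line seen in the original report.
log_shows_crash() {
    grep -qE 'int3: 0000 \[#[0-9]+\]|Kernel panic - not syncing' "$1"
}

# console.log is an assumed file name; pass another path as the first argument.
log="${1:-console.log}"
if [ -f "$log" ]; then
    if log_shows_crash "$log"; then
        echo "crash reproduced: see $log"
    else
        echo "no int3 oops or kernel panic found in $log"
    fi
fi
```

A clean log after all 250 iterations suggests, but does not prove, that the problem is gone, since the crash is timing-dependent.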
[ Where problems could occur ]
The patches being backported have been part of upstream for a while now (since version 8.1.0). Two of the patches are relatively trivial and deal only with code movement/reorganization; the actual fix (https://gitlab.com/qemu-project/qemu/commit/deba78709a) is a bit more involved. Care has been taken to make sure that a subsequent fix is also part of the backport, but more complicated patches always carry a small risk of regressions. Nevertheless, it's important to mention that this fix only touches TCG-related code, which means that hardware-backed virtualization (KVM) will not be affected.
[ Original Description ]
I'm using QEMU 1:6.2+dfsg-2ubuntu6.16 from a Docker container to run kernel tests. I had always used QEMU with KVM acceleration, but KVM recently became unavailable in my environment.
When TCG is used instead of KVM, I experience regular crashes in slow environments (e.g. GitHub Actions), for example:
------------- 8< -------------
[ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI
[ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1
[ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Call Trace:
[ 45.505547] <IRQ>
[ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421)
[ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762)
[ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __netif_rx (net/core/dev.c:5084)
[ 45.505547] veth_xmit (drivers/net/veth.c:321)
[ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989)
[ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367)
[ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783)
[ 45.505547] ? eth_header (net/ethernet/eth.c:85)
[ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812)
[ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939)
[ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440)
[ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779)
[ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537)
[ 45.505547] process_backlog (include/linux/rcupdate.h:779)
[ 45.505547] __napi_poll (net/core/dev.c:6576)
[ 45.505547] net_rx_action (net/core/dev.c:6647)
[ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] do_softirq (kernel/softirq.c:454)
[ 45.505547] </IRQ>
[ 45.505547] <TASK>
[ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381)
[ 45.505547] __dev_queue_xmit (net/core/dev.c:4379)
[ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171)
[ 45.505547] ? ip6_output (include/linux/netfilter.h:301)
[ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208)
[ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953)
[ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584)
[ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373)
[ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42)
[ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206)
[ 45.505547] ? set_pte_range (mm/memory.c:4529)
[ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699)
[ 45.505547] ? __sock_sendmsg (net/socket.c:733)
[ 45.505547] __sock_sendmsg (net/socket.c:733)
[ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253)
[ 45.505547] __sys_sendto (net/socket.c:2191)
[ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566)
[ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] __x64_sys_sendto (net/socket.c:2203)
[ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52)
[ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 45.505547] RIP: 0033:0x7fa1d099ca0a
[ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
All code
========
0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
b: eb b8 jmp 0xffffffffffffffc5
d: 0f 1f 00 nopl (%rax)
10: f3 0f 1e fa endbr64
14: 41 89 ca mov %ecx,%r10d
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 15 jne 0x38
23: b8 2c 00 00 00 mov $0x2c,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 7e ja 0xb0
32: c3 ret
33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
38: 41 54 push %r12
3a: 48 83 ec 30 sub $0x30,%rsp
3e: 44 rex.R
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 7e ja 0x86
8: c3 ret
9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
e: 41 54 push %r12
10: 48 83 ec 30 sub $0x30,%rsp
14: 44 rex.R
15: 89 .byte 0x89
[ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a
[ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003
[ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c
[ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20
[ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090
[ 45.505547] </TASK>
[ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit
[ 45.505547] ---[ end trace 0000000000000000 ]---
[ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27)
[ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11
All code
========
0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
7: 00
8: 0f 1f 40 00 nopl 0x0(%rax)
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 55 push %rbp
12: 48 89 fd mov %rdi,%rbp
15: 48 83 ec 20 sub $0x20,%rsp
19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax
20: 00 00
22: 48 89 44 24 18 mov %rax,0x18(%rsp)
27: 31 c0 xor %eax,%eax
29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction
2e: 66 90 xchg %ax,%ax
30: 66 90 xchg %ax,%ax
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
37: 48 89 ef mov %rbp,%rdi
3a: 65 gs
3b: 8b .byte 0x8b
3c: 35 .byte 0x35
3d: 17 (bad)
3e: 9d popf
3f: 11 .byte 0x11
Code starting with the faulting instruction
===========================================
0: c9 leave
1: 00 00 add %al,(%rax)
3: 00 66 90 add %ah,-0x70(%rsi)
6: 66 90 xchg %ax,%ax
8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx
d: 48 89 ef mov %rbp,%rdi
10: 65 gs
11: 8b .byte 0x8b
12: 35 .byte 0x35
13: 17 (bad)
14: 9d popf
15: 11 .byte 0x11
[ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246
[ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000
[ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400
[ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068
[ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400
[ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000
[ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000
[ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0
[ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt
[ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------
For more debug info:
https://github.com/multipath-tcp/mptcp_net-next/issues/471
The crashes happen in 'jump label' code.
I reported the issue to the kernel developers (x86 mailing list), and it looks like a QEMU issue:
https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/
Steven Rostedt said:
> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.
And Masami Hiramatsu said:
> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.
I wanted to report this to the QEMU community, but it looks like the v6.x versions are no longer supported. I also tested with v8.2.0: although the crash is not easy to reproduce locally, I did not hit any issues with it. I guess a fix was made in newer versions but not backported to v6.x. |
|
2024-02-20 21:03:57 |
Sergio Durigan Junior |
qemu (Ubuntu Mantic): status |
Triaged |
In Progress |
|
2024-02-20 21:43:05 |
Ubuntu Archive Robot |
bug |
|
|
added subscriber Sergio Durigan Junior |
2024-02-22 21:14:00 |
Andreas Hasenack |
qemu (Ubuntu Mantic): status |
In Progress |
Fix Committed |
|
2024-02-22 21:14:01 |
Andreas Hasenack |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2024-02-22 21:14:03 |
Andreas Hasenack |
bug |
|
|
added subscriber SRU Verification |
2024-02-22 21:14:07 |
Andreas Hasenack |
tags |
server-todo |
server-todo verification-needed verification-needed-mantic |
|
2024-02-27 03:17:42 |
Sergio Durigan Junior |
tags |
server-todo verification-needed verification-needed-mantic |
server-todo verification-done verification-done-mantic |
|
2024-03-07 16:51:58 |
Launchpad Janitor |
qemu (Ubuntu Mantic): status |
Fix Committed |
Fix Released |
|
2024-03-07 16:52:03 |
Andreas Hasenack |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2024-03-20 15:18:18 |
Bryce Harrington |
tags |
server-todo verification-done verification-done-mantic |
verification-done verification-done-mantic |
|