Activity log for bug #2051965

Date Who What changed Old value New value Message
2024-02-01 17:32:48 Matthieu Baerts bug added bug
2024-02-02 06:31:06 Christian Ehrhardt qemu (Ubuntu): status New Incomplete
2024-02-07 03:23:27 Sergio Durigan Junior bug added subscriber Ubuntu Server
2024-02-07 03:23:33 Sergio Durigan Junior tags server-todo
2024-02-14 01:22:25 Sergio Durigan Junior nominated for series Ubuntu Jammy
2024-02-14 01:22:25 Sergio Durigan Junior bug task added qemu (Ubuntu Jammy)
2024-02-14 01:22:25 Sergio Durigan Junior nominated for series Ubuntu Mantic
2024-02-14 01:22:25 Sergio Durigan Junior bug task added qemu (Ubuntu Mantic)
2024-02-14 01:22:30 Sergio Durigan Junior qemu (Ubuntu): status Incomplete Fix Released
2024-02-14 01:22:35 Sergio Durigan Junior qemu (Ubuntu Jammy): status New Triaged
2024-02-14 01:22:37 Sergio Durigan Junior qemu (Ubuntu Mantic): status New Triaged
2024-02-14 01:22:40 Sergio Durigan Junior qemu (Ubuntu Jammy): assignee Sergio Durigan Junior (sergiodj)
2024-02-14 01:22:43 Sergio Durigan Junior qemu (Ubuntu Mantic): assignee Sergio Durigan Junior (sergiodj)
2024-02-14 01:35:26 Sergio Durigan Junior description I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a docker container to run kernel tests. I was always using QEmu with KVM acceleration, until recently where it is not available. When TCG is used instead of KVM, I experienced regular crashes when using it in slow environments (e.g. GitHub Actions), e.g. ------------- 8< ------------- [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1 [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ======== 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 7: 00 8: 0f 1f 40 00 nopl 0x0(%rax) c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 11: 55 push %rbp 12: 48 89 fd mov %rdi,%rbp 15: 48 83 ec 20 sub $0x20,%rsp 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax 20: 00 00 22: 48 89 44 24 18 mov %rax,0x18(%rsp) 27: 31 c0 xor %eax,%eax 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction 2e: 66 90 xchg %ax,%ax 30: 66 90 xchg %ax,%ax 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx 37: 48 89 ef mov %rbp,%rdi 3a: 65 gs 3b: 8b .byte 0x8b 3c: 35 .byte 0x35 3d: 17 (bad) 3e: 9d popf 3f: 11 .byte 0x11 Code starting with the faulting instruction =========================================== 0: c9 leave 1: 00 00 add %al,(%rax) 3: 00 66 90 add %ah,-0x70(%rsi) 6: 66 90 xchg %ax,%ax 8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx d: 48 89 ef mov %rbp,%rdi 10: 65 gs 11: 8b .byte 0x8b 12: 35 .byte 0x35 13: 17 (bad) 14: 9d popf 15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: 
ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Call Trace: [ 45.505547] <IRQ> [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421) [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762) [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __netif_rx (net/core/dev.c:5084) [ 45.505547] veth_xmit (drivers/net/veth.c:321) [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989) [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367) [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783) [ 45.505547] ? eth_header (net/ethernet/eth.c:85) [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812) [ 45.505547] ? 
icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440) [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779) [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537) [ 45.505547] process_backlog (include/linux/rcupdate.h:779) [ 45.505547] __napi_poll (net/core/dev.c:6576) [ 45.505547] net_rx_action (net/core/dev.c:6647) [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] do_softirq (kernel/softirq.c:454) [ 45.505547] </IRQ> [ 45.505547] <TASK> [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381) [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379) [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584) [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373) [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42) [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) [ 45.505547] ? set_pte_range (mm/memory.c:4529) [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699) [ 45.505547] ? __sock_sendmsg (net/socket.c:733) [ 45.505547] __sock_sendmsg (net/socket.c:733) [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253) [ 45.505547] __sys_sendto (net/socket.c:2191) [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566) [ 45.505547] ? 
__do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __x64_sys_sendto (net/socket.c:2203) [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52) [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) [ 45.505547] RIP: 0033:0x7fa1d099ca0a [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 All code ======== 0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4) 4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax b: eb b8 jmp 0xffffffffffffffc5 d: 0f 1f 00 nopl (%rax) 10: f3 0f 1e fa endbr64 14: 41 89 ca mov %ecx,%r10d 17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax 1e: 00 1f: 85 c0 test %eax,%eax 21: 75 15 jne 0x38 23: b8 2c 00 00 00 mov $0x2c,%eax 28: 0f 05 syscall 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction 30: 77 7e ja 0xb0 32: c3 ret 33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 38: 41 54 push %r12 3a: 48 83 ec 30 sub $0x30,%rsp 3e: 44 rex.R 3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax 6: 77 7e ja 0x86 8: c3 ret 9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) e: 41 54 push %r12 10: 48 83 ec 30 sub $0x30,%rsp 14: 44 rex.R 15: 89 .byte 0x89 [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003 [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20 [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090 [ 45.505547] </TASK> [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit [ 45.505547] ---[ end trace 
0000000000000000 ]--- [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ======== 0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1) 7: 00 8: 0f 1f 40 00 nopl 0x0(%rax) c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 11: 55 push %rbp 12: 48 89 fd mov %rdi,%rbp 15: 48 83 ec 20 sub $0x20,%rsp 19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax 20: 00 00 22: 48 89 44 24 18 mov %rax,0x18(%rsp) 27: 31 c0 xor %eax,%eax 29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction 2e: 66 90 xchg %ax,%ax 30: 66 90 xchg %ax,%ax 32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx 37: 48 89 ef mov %rbp,%rdi 3a: 65 gs 3b: 8b .byte 0x8b 3c: 35 .byte 0x35 3d: 17 (bad) 3e: 9d popf 3f: 11 .byte 0x11 Code starting with the faulting instruction =========================================== 0: c9 leave 1: 00 00 add %al,(%rax) 3: 00 66 90 add %ah,-0x70(%rsi) 6: 66 90 xchg %ax,%ax 8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx d: 48 89 ef mov %rbp,%rdi 10: 65 gs 11: 8b .byte 0x8b 12: 35 .byte 0x35 13: 17 (bad) 14: 9d popf 15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt [ 45.505547] 
Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
------------- 8< -------------

For more debug info: https://github.com/multipath-tcp/mptcp_net-next/issues/471

The crashes happen in 'jump label' code. I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue: https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/

Steven Rostedt said:

> The real problem is that qemu does not seem to be honoring the memory
> barriers of an interrupt. The reason the code does the ipi's is to
> force a full memory barrier across all CPUs so that they all see the
> same memory before going forward to the next step.
>
> My guess is that qemu does not treat the IPI being sent as a memory
> barrier, and then the CPUs do not see a consistent memory view after
> the IPIs are sent. That's a bug in qemu!
>
> More specifically, I bet qemu may be doing a dcache barrier, but not an
> icache barrier in the interrupt. If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.

And Masami Hiramatsu said:

> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.

I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported.
I also tested with the v8.2.0, and even if it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix has been done in newer versions, but not backported to v6.x.

[ Impact ]

TBD.

[ Test Plan ]

It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.

First, you need a machine running Jammy or Mantic. Then:

$ sudo apt-get update && \
    DEBIAN_FRONTEND=noninteractive \
    sudo apt-get install -y --no-install-recommends \
    build-essential libncurses5-dev gcc libssl-dev bc bison \
    libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
    python3 python3-pkg-resources busybox \
    iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
    ca-certificates gnupg2 net-tools kmod \
    libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
    tcpdump \
    pkg-config libmnl-dev \
    clang lld llvm llvm-dev libcap-dev \
    gdb crash dwarves strace \
    iptables ebtables nftables vim psmisc bash-completion less jq \
    gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
    libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
    libtap-formatter-junit-perl \
    zstd \
    wget xz-utils lftp cpio u-boot-tools \
    cscope \
    bpftrace

Download the virtme project, which is simply a wrapper around QEMU:

$ git clone https://github.com/matttbe/virtme.git

Modify it not to use KVM:

$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py

Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:

$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh

Remove one line not to write stuff in /etc/hosts:

$ sed -i '/prepare_hosts_file$/d' entrypoint.sh

Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:

$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:

$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':

$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal

[ Where problems could occur ]

TBD.

[ Original Description ]

I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a docker container to run kernel tests. I was always using QEmu with KVM acceleration, until recently where it is not available. When TCG is used instead of KVM, I experienced regular crashes when using it in slow environments (e.g. GitHub Actions), e.g.
------------- 8< ------------- [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1 [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 
45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Call Trace: [ 45.505547] <IRQ> [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421) [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762) [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __netif_rx (net/core/dev.c:5084) [ 45.505547] veth_xmit (drivers/net/veth.c:321) [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989) [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367) [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783) [ 45.505547] ? eth_header (net/ethernet/eth.c:85) [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812) [ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440) [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779) [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537) [ 45.505547] process_backlog (include/linux/rcupdate.h:779) [ 45.505547] __napi_poll (net/core/dev.c:6576) [ 45.505547] net_rx_action (net/core/dev.c:6647) [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] do_softirq (kernel/softirq.c:454) [ 45.505547] </IRQ> [ 45.505547] <TASK> [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381) [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379) [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171) [ 45.505547] ? 
ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584) [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373) [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42) [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) [ 45.505547] ? set_pte_range (mm/memory.c:4529) [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699) [ 45.505547] ? __sock_sendmsg (net/socket.c:733) [ 45.505547] __sock_sendmsg (net/socket.c:733) [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253) [ 45.505547] __sys_sendto (net/socket.c:2191) [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566) [ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __x64_sys_sendto (net/socket.c:2203) [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52) [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) [ 45.505547] RIP: 0033:0x7fa1d099ca0a [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 All code ========    0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)    4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax    b: eb b8 jmp 0xffffffffffffffc5    d: 0f 1f 00 nopl (%rax)   10: f3 0f 1e fa endbr64   14: 41 89 ca mov %ecx,%r10d   17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax   1e: 00   1f: 85 c0 test %eax,%eax   21: 75 15 jne 0x38   23: b8 2c 00 00 00 mov $0x2c,%eax   28: 0f 05 syscall   2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction   30: 77 7e ja 0xb0   32: c3 ret   33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   38: 41 54 push %r12   3a: 48 83 ec 30 sub $0x30,%rsp   3e: 44 rex.R   3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 
   0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax    6: 77 7e ja 0x86    8: c3 ret    9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)    e: 41 54 push %r12   10: 48 83 ec 30 sub $0x30,%rsp   14: 44 rex.R   15: 89 .byte 0x89 [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003 [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20 [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090 [ 45.505547] </TASK> [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit [ 45.505547] ---[ end trace 0000000000000000 ]--- [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 
10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt [ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ------------- 8< ------------- For more debug info:   https://github.com/multipath-tcp/mptcp_net-next/issues/471 The crashes happen in 'jump label' code. I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:   https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/ Steven Rostedt said: > The real problem is that qemu does not seem to be honoring the memory > barriers of an interrupt. The reason the code does the ipi's is to > force a full memory barrier across all CPUs so that they all see the > same memory before going forward to the next step. > > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! > > More specifically, I bet qemu may be doing a dcache barrier, but not an > icache barrier in the interrupt. 
If the code is already in qemu's
> pipeline, it may not be flushing it like real hardware would do.
>
> This should be reported to the qemu community and should be fixed
> there. In the mean time, feel free to use Masami's patch in your local
> repo until qemu is fixed, but it should not be added to Linux mainline.

And Masami Hiramatsu said:

> If KVM works well, I agree that this is a qemu
> TCG's bug. I guess TCG implementation forgets to serialize CPU when the
> IPI comes.
> if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling
> Self- and Cross-Modifying Code" said that what the other CPU needs to
> do is "Execute serializing instruction; (* For example, CPUID
> instruction *)" for cross-modifying code. that has been done in
> do_sync_core(). Thus this bug should not happen.

I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with the v8.2.0, and even if it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix has been done in newer versions, but not backported to v6.x.
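The decoded listing in the trace above shows `jmp 0xf7` for the bytes `e9 c9 00 00 00` at offset 0x29, with the byte at the reported RIP marked as `<c9>`. The following is a minimal sketch of that arithmetic, not part of the original report: the helper names `parse_code_line` and `jmp_rel32_target` are hypothetical, and only the byte string is taken from the `Code:` line of the trace.

```python
# Illustrative only: helper names are hypothetical, the bytes come from the
# "Code:" line of the kernel trace quoted in the description.
CODE_LINE = (
    "0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd "
    "48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 "
    "<c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11"
)


def parse_code_line(text):
    """Return (code bytes, index of the byte wrapped in <...>)."""
    tokens = text.split()
    marked = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    data = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return data, marked


def jmp_rel32_target(code, offset):
    """Return the target offset of an x86 `jmp rel32` (opcode 0xe9) at `offset`."""
    if code[offset] != 0xE9:
        raise ValueError("no jmp rel32 (0xe9) at this offset")
    # The 32-bit displacement is little-endian, signed, and relative to the
    # end of the 5-byte instruction.
    rel32 = int.from_bytes(code[offset + 1:offset + 5], "little", signed=True)
    return offset + 5 + rel32


code, rip_index = parse_code_line(CODE_LINE)
jmp_offset = rip_index - 1  # the marked byte sits right after the 0xe9 opcode
target = jmp_rel32_target(code, jmp_offset)
print(f"marked byte at {rip_index:#x}, jmp at {jmp_offset:#x}, target {target:#x}")
```

This prints `marked byte at 0x2a, jmp at 0x29, target 0xf7`, which agrees with the listing. The marked byte being one past the `e9` opcode would be consistent with a one-byte `int3` having been executed at 0x29 while the jump label was being patched, although the sketch itself only checks the offsets.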
2024-02-14 01:36:50 Sergio Durigan Junior description

[ Impact ]

TBD.

[ Test Plan ]

It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions.

First, you need a machine running Jammy or Mantic. Then:

$ sudo apt-get update && \
    DEBIAN_FRONTEND=noninteractive \
    sudo apt-get install -y --no-install-recommends \
    build-essential libncurses5-dev gcc libssl-dev bc bison \
    libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
    python3 python3-pkg-resources busybox \
    iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
    ca-certificates gnupg2 net-tools kmod \
    libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
    tcpdump \
    pkg-config libmnl-dev \
    clang lld llvm llvm-dev libcap-dev \
    gdb crash dwarves strace \
    iptables ebtables nftables vim psmisc bash-completion less jq \
    gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
    libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
    libtap-formatter-junit-perl \
    zstd \
    wget xz-utils lftp cpio u-boot-tools \
    cscope \
    bpftrace

Download the virtme project, which is simply a wrapper around QEMU:

$ git clone https://github.com/matttbe/virtme.git

Modify it not to use KVM:

$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py

Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:

$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh

Remove one line not to write stuff in /etc/hosts:

$ sed -i '/prepare_hosts_file$/d' entrypoint.sh

Now, we have to download the Linux kernel source and modify one of its tests to stop
when the crash happens: $ git clone --depth=1 https://github.com/torvalds/linux $ cd linux $ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop: $ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1': $ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal [ Where problems could occur ] TBD. [ Original Description ] I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a docker container to run kernel tests. I was always using QEmu with KVM acceleration, until recently where it is not available. When TCG is used instead of KVM, I experienced regular crashes when using it in slow environments (e.g. GitHub Actions), e.g. ------------- 8< ------------- [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1 [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   
32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Call Trace: [ 45.505547] <IRQ> [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421) [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762) [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __netif_rx (net/core/dev.c:5084) [ 45.505547] veth_xmit (drivers/net/veth.c:321) [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989) [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367) [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783) [ 45.505547] ? eth_header (net/ethernet/eth.c:85) [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542) [ 45.505547] ? 
ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812) [ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440) [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779) [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537) [ 45.505547] process_backlog (include/linux/rcupdate.h:779) [ 45.505547] __napi_poll (net/core/dev.c:6576) [ 45.505547] net_rx_action (net/core/dev.c:6647) [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] do_softirq (kernel/softirq.c:454) [ 45.505547] </IRQ> [ 45.505547] <TASK> [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381) [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379) [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584) [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373) [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42) [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) [ 45.505547] ? set_pte_range (mm/memory.c:4529) [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699) [ 45.505547] ? __sock_sendmsg (net/socket.c:733) [ 45.505547] __sock_sendmsg (net/socket.c:733) [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253) [ 45.505547] __sys_sendto (net/socket.c:2191) [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566) [ 45.505547] ? 
__do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __x64_sys_sendto (net/socket.c:2203) [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52) [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) [ 45.505547] RIP: 0033:0x7fa1d099ca0a [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 All code ========    0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)    4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax    b: eb b8 jmp 0xffffffffffffffc5    d: 0f 1f 00 nopl (%rax)   10: f3 0f 1e fa endbr64   14: 41 89 ca mov %ecx,%r10d   17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax   1e: 00   1f: 85 c0 test %eax,%eax   21: 75 15 jne 0x38   23: b8 2c 00 00 00 mov $0x2c,%eax   28: 0f 05 syscall   2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction   30: 77 7e ja 0xb0   32: c3 ret   33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   38: 41 54 push %r12   3a: 48 83 ec 30 sub $0x30,%rsp   3e: 44 rex.R   3f: 89 .byte 0x89 Code starting with the faulting instruction ===========================================    0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax    6: 77 7e ja 0x86    8: c3 ret    9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)    e: 41 54 push %r12   10: 48 83 ec 30 sub $0x30,%rsp   14: 44 rex.R   15: 89 .byte 0x89 [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003 [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20 [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090 [ 45.505547] </TASK> [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test 
mptcp_crypto_test kunit [ 45.505547] ---[ end trace 0000000000000000 ]--- [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 
CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt [ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ------------- 8< ------------- For more debug info:   https://github.com/multipath-tcp/mptcp_net-next/issues/471 The crashes happen in 'jump label' code. I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:   https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/ Steven Rostedt said: > The real problem is that qemu does not seem to be honoring the memory > barriers of an interrupt. The reason the code does the ipi's is to > force a full memory barrier across all CPUs so that they all see the > same memory before going forward to the next step. > > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! > > More specifically, I bet qemu may be doing a dcache barrier, but not an > icache barrier in the interrupt. If the code is already in qemu's > pipeline, it may not be flushing it like real hardware would do. > > This should be reported to the qemu community and should be fixed > there. In the mean time, feel free to use Masami's patch in your local > repo until qemu is fixed, but it should not be added to Linux mainline. And Masami Hiramatsu said: > If KVM works well, I agree that this is a qemu > TCG's bug. I guess TCG implementation forgets to serialize CPU when the > IPI comes. > if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling > Self- and Cross-Modifying Code" said that what the other CPU needs to > do is "Execute serializing instruction; (* For example, CPUID > instruction *)" for cross-modifying code. that has been done in > do_sync_core(). Thus this bug should not happen. 
I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with the v8.2.0, and even if it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix has been done in newer versions, but not backported to v6.x. [ Impact ] TBD. [ Test Plan ] It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions. First, you need a machine running Jammy or Mantic. Then: $ sudo apt-get update && \  DEBIAN_FRONTEND=noninteractive \  sudo apt-get install -y --no-install-recommends \   build-essential libncurses5-dev gcc libssl-dev bc bison \   libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \   python3 python3-pkg-resources busybox \   iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \   ca-certificates gnupg2 net-tools kmod \   libdbus-1-dev libnl-genl-3-dev libibverbs-dev \   tcpdump \   pkg-config libmnl-dev \   clang lld llvm llvm-dev libcap-dev \   gdb crash dwarves strace \   iptables ebtables nftables vim psmisc bash-completion less jq \   gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \   libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \   libtap-formatter-junit-perl \   zstd \   wget xz-utils lftp cpio u-boot-tools \   cscope \   bpftrace Download the virtme project, which is simply a wrapper around QEMU: $ git clone https://github.com/matttbe/virtme.git Modify it not to use KVM: $ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py $ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh: $ wget 
https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh

Remove one line not to write stuff in /etc/hosts:

$ sed -i '/prepare_hosts_file$/d' entrypoint.sh

Point the script to virtme's location:

$ sed -i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh

Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:

$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:

$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':

$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal

[ Where problems could occur ]

TBD.

[ Original Description ]

I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a docker container to run kernel tests. I was always using QEmu with KVM acceleration, until recently where it is not available. When TCG is used instead of KVM, I experienced regular crashes when using it in slow environments (e.g. GitHub Actions), e.g.
------------- 8< ------------- [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1 [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 
45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Call Trace: [ 45.505547] <IRQ> [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421) [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762) [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __netif_rx (net/core/dev.c:5084) [ 45.505547] veth_xmit (drivers/net/veth.c:321) [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989) [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367) [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783) [ 45.505547] ? eth_header (net/ethernet/eth.c:85) [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812) [ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440) [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779) [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537) [ 45.505547] process_backlog (include/linux/rcupdate.h:779) [ 45.505547] __napi_poll (net/core/dev.c:6576) [ 45.505547] net_rx_action (net/core/dev.c:6647) [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] do_softirq (kernel/softirq.c:454) [ 45.505547] </IRQ> [ 45.505547] <TASK> [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381) [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379) [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171) [ 45.505547] ? 
ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584) [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373) [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42) [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) [ 45.505547] ? set_pte_range (mm/memory.c:4529) [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699) [ 45.505547] ? __sock_sendmsg (net/socket.c:733) [ 45.505547] __sock_sendmsg (net/socket.c:733) [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253) [ 45.505547] __sys_sendto (net/socket.c:2191) [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566) [ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __x64_sys_sendto (net/socket.c:2203) [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52) [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) [ 45.505547] RIP: 0033:0x7fa1d099ca0a [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 All code ========    0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)    4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax    b: eb b8 jmp 0xffffffffffffffc5    d: 0f 1f 00 nopl (%rax)   10: f3 0f 1e fa endbr64   14: 41 89 ca mov %ecx,%r10d   17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax   1e: 00   1f: 85 c0 test %eax,%eax   21: 75 15 jne 0x38   23: b8 2c 00 00 00 mov $0x2c,%eax   28: 0f 05 syscall   2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction   30: 77 7e ja 0xb0   32: c3 ret   33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   38: 41 54 push %r12   3a: 48 83 ec 30 sub $0x30,%rsp   3e: 44 rex.R   3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 
   0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax    6: 77 7e ja 0x86    8: c3 ret    9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)    e: 41 54 push %r12   10: 48 83 ec 30 sub $0x30,%rsp   14: 44 rex.R   15: 89 .byte 0x89 [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003 [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20 [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090 [ 45.505547] </TASK> [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit [ 45.505547] ---[ end trace 0000000000000000 ]--- [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 
10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt [ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ------------- 8< ------------- For more debug info:   https://github.com/multipath-tcp/mptcp_net-next/issues/471 The crashes happen in 'jump label' code. I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:   https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/ Steven Rostedt said: > The real problem is that qemu does not seem to be honoring the memory > barriers of an interrupt. The reason the code does the ipi's is to > force a full memory barrier across all CPUs so that they all see the > same memory before going forward to the next step. > > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! > > More specifically, I bet qemu may be doing a dcache barrier, but not an > icache barrier in the interrupt. 
If the code is already in qemu's > pipeline, it may not be flushing it like real hardware would do. > > This should be reported to the qemu community and should be fixed > there. In the mean time, feel free to use Masami's patch in your local > repo until qemu is fixed, but it should not be added to Linux mainline. And Masami Hiramatsu said: > If KVM works well, I agree that this is a qemu > TCG's bug. I guess TCG implementation forgets to serialize CPU when the > IPI comes. > if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling > Self- and Cross-Modifying Code" said that what the other CPU needs to > do is "Execute serializing instruction; (* For example, CPUID > instruction *)" for cross-modifying code. that has been done in > do_sync_core(). Thus this bug should not happen. I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with the v8.2.0, and even if it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix has been done in newer versions, but not backported to v6.x.
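A minimal sketch, not part of the original report, of how a host can be checked for usable KVM before falling back to TCG, the configuration in which the crashes above were seen. The function name `pick_accel` and its optional path argument are hypothetical, and the check assumes KVM is exposed as a character device at /dev/kvm:

```shell
#!/bin/sh
# Hypothetical helper: print the accelerator QEMU could use on this host.
# Usage: pick_accel [kvm-device-path]   (the path defaults to /dev/kvm)
pick_accel() {
    kvm_dev="${1:-/dev/kvm}"
    # KVM needs a character device that the current user can read and write.
    if [ -c "$kvm_dev" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        echo kvm
    else
        # Without it, QEMU can only emulate the guest CPU with TCG.
        echo tcg
    fi
}

pick_accel
```

On a host where this prints `tcg`, for example inside a container without access to /dev/kvm, the affected QEMU 6.2 packages would run the guest under TCG.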
2024-02-14 03:51:23 Launchpad Janitor merge proposal linked https://code.launchpad.net/~sergiodj/ubuntu/+source/qemu/+git/qemu/+merge/460476
2024-02-14 10:29:49 Matthieu Baerts summary QEmu with TCG acceleration (without KVM) causes kernel panics with kernels >=6.3 QEmu with TCG acceleration (without KVM) causes kernel panics with guest kernels >=6.3
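The updated summary limits the problem to guest kernels >=6.3. As a rough illustration only, a guest could test its own kernel release against that threshold as sketched below. The helper name `version_ge` is hypothetical, and the comparison relies on `sort -V` from GNU coreutils:

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is greater than or equal to $2.
version_ge() {
    # sort -V orders version strings; if the smaller of the two is $2,
    # then $1 is at least $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Inside the guest: report whether the running kernel is in the affected range.
if version_ge "$(uname -r)" 6.3; then
    echo "guest kernel is >= 6.3"
else
    echo "guest kernel is older than 6.3"
fi
```

This only classifies the kernel version; it does not detect whether the QEMU build in use contains a fix.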
2024-02-20 21:01:44 Sergio Durigan Junior description [ Impact ] TBD. [ Test Plan ] It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions. First, you need a machine running Jammy or Mantic. Then: $ sudo apt-get update && \  DEBIAN_FRONTEND=noninteractive \  sudo apt-get install -y --no-install-recommends \   build-essential libncurses5-dev gcc libssl-dev bc bison \   libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \   python3 python3-pkg-resources busybox \   iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \   ca-certificates gnupg2 net-tools kmod \   libdbus-1-dev libnl-genl-3-dev libibverbs-dev \   tcpdump \   pkg-config libmnl-dev \   clang lld llvm llvm-dev libcap-dev \   gdb crash dwarves strace \   iptables ebtables nftables vim psmisc bash-completion less jq \   gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \   libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \   libtap-formatter-junit-perl \   zstd \   wget xz-utils lftp cpio u-boot-tools \   cscope \   bpftrace Download the virtme project, which is simply a wrapper around QEMU: $ git clone https://github.com/matttbe/virtme.git Modify it not to use KVM: $ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py $ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh: $ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh $ chmod +x entrypoint.sh Remove one line not to write stuff in /etc/hosts: $ sed -i '/prepare_hosts_file$/d' entrypoint.sh Point the script to virtme's location: $ sed 
-i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh

Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:

$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:

$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':

$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal

[ Where problems could occur ]

TBD.

[ Original Description ]

I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a docker container to run kernel tests. I was always using QEmu with KVM acceleration, until recently where it is not available. When TCG is used instead of KVM, I experienced regular crashes when using it in slow environments (e.g. GitHub Actions), e.g.
------------- 8< ------------- [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1 [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 
45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Call Trace: [ 45.505547] <IRQ> [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421) [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762) [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __netif_rx (net/core/dev.c:5084) [ 45.505547] veth_xmit (drivers/net/veth.c:321) [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989) [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367) [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783) [ 45.505547] ? eth_header (net/ethernet/eth.c:85) [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812) [ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440) [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779) [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537) [ 45.505547] process_backlog (include/linux/rcupdate.h:779) [ 45.505547] __napi_poll (net/core/dev.c:6576) [ 45.505547] net_rx_action (net/core/dev.c:6647) [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] do_softirq (kernel/softirq.c:454) [ 45.505547] </IRQ> [ 45.505547] <TASK> [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381) [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379) [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171) [ 45.505547] ? 
ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584) [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373) [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42) [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) [ 45.505547] ? set_pte_range (mm/memory.c:4529) [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699) [ 45.505547] ? __sock_sendmsg (net/socket.c:733) [ 45.505547] __sock_sendmsg (net/socket.c:733) [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253) [ 45.505547] __sys_sendto (net/socket.c:2191) [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566) [ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __x64_sys_sendto (net/socket.c:2203) [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52) [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) [ 45.505547] RIP: 0033:0x7fa1d099ca0a [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 All code ========    0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)    4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax    b: eb b8 jmp 0xffffffffffffffc5    d: 0f 1f 00 nopl (%rax)   10: f3 0f 1e fa endbr64   14: 41 89 ca mov %ecx,%r10d   17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax   1e: 00   1f: 85 c0 test %eax,%eax   21: 75 15 jne 0x38   23: b8 2c 00 00 00 mov $0x2c,%eax   28: 0f 05 syscall   2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction   30: 77 7e ja 0xb0   32: c3 ret   33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   38: 41 54 push %r12   3a: 48 83 ec 30 sub $0x30,%rsp   3e: 44 rex.R   3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 
   0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax    6: 77 7e ja 0x86    8: c3 ret    9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)    e: 41 54 push %r12   10: 48 83 ec 30 sub $0x30,%rsp   14: 44 rex.R   15: 89 .byte 0x89 [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003 [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20 [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090 [ 45.505547] </TASK> [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit [ 45.505547] ---[ end trace 0000000000000000 ]--- [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 
10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt [ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ------------- 8< ------------- For more debug info:   https://github.com/multipath-tcp/mptcp_net-next/issues/471 The crashes happen in 'jump label' code. I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:   https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/ Steven Rostedt said: > The real problem is that qemu does not seem to be honoring the memory > barriers of an interrupt. The reason the code does the ipi's is to > force a full memory barrier across all CPUs so that they all see the > same memory before going forward to the next step. > > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! > > More specifically, I bet qemu may be doing a dcache barrier, but not an > icache barrier in the interrupt. 
If the code is already in qemu's > pipeline, it may not be flushing it like real hardware would do. > > This should be reported to the qemu community and should be fixed > there. In the mean time, feel free to use Masami's patch in your local > repo until qemu is fixed, but it should not be added to Linux mainline. And Masami Hiramatsu said: > If KVM works well, I agree that this is a qemu > TCG's bug. I guess TCG implementation forgets to serialize CPU when the > IPI comes. > if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling > Self- and Cross-Modifying Code" said that what the other CPU needs to > do is "Execute serializing instruction; (* For example, CPUID > instruction *)" for cross-modifying code. that has been done in > do_sync_core(). Thus this bug should not happen. I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with the v8.2.0, and even if it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix has been done in newer versions, but not backported to v6.x. [ Impact ] QEMU users on Jammy/Mantic who choose to use TCG-based acceleration instead of KVM-based might encounter kernel panics when the guest Linux kernel version is greater than or equal to 6.3. [ Test Plan ] It can be somewhat tricky to trigger the problem, but Matthieu provided great instructions. First, you need a machine running Jammy or Mantic. 
Then:

$ sudo apt-get update && \
 DEBIAN_FRONTEND=noninteractive \
 sudo apt-get install -y --no-install-recommends \
  build-essential libncurses5-dev gcc libssl-dev bc bison \
  libelf-dev flex git curl tar hashalot qemu-kvm sudo expect \
  python3 python3-pkg-resources busybox \
  iputils-ping ethtool klibc-utils kbd rsync ccache netcat-openbsd \
  ca-certificates gnupg2 net-tools kmod \
  libdbus-1-dev libnl-genl-3-dev libibverbs-dev \
  tcpdump \
  pkg-config libmnl-dev \
  clang lld llvm llvm-dev libcap-dev \
  gdb crash dwarves strace \
  iptables ebtables nftables vim psmisc bash-completion less jq \
  gettext-base libevent-dev libtraceevent-dev libnewt0.52 libslang2 libutempter0 python3-newt tmux \
  libdwarf-dev libbfd-dev libnuma-dev libzstd-dev libunwind-dev libdw-dev libslang2-dev python3-dev python3-setuptools binutils-dev libiberty-dev libbabeltrace-dev systemtap-sdt-dev libperl-dev python3-docutils \
  libtap-formatter-junit-perl \
  zstd \
  wget xz-utils lftp cpio u-boot-tools \
  cscope \
  bpftrace

Download the virtme project, which is simply a wrapper around QEMU:

$ git clone https://github.com/matttbe/virtme.git

Modify it not to use KVM:

$ sed -i 's/accel=kvm:tcg/accel=tcg/' ./virtme/virtme/commands/run.py
$ sed -i 's@\(if is_native and os\.access.*\|ret\.extend...-cpu.*We can.t migrate regar.*\)@#\1@' ./virtme/virtme/architectures.py

Download the test script from https://github.com/multipath-tcp/mptcp-upstream-virtme-docker/blob/latest/entrypoint.sh:

$ wget https://raw.githubusercontent.com/multipath-tcp/mptcp-upstream-virtme-docker/latest/entrypoint.sh
$ chmod +x entrypoint.sh

Remove one line not to write stuff in /etc/hosts:

$ sed -i '/prepare_hosts_file$/d' entrypoint.sh

Point the script to virtme's location:

$ sed -i "s@/opt/virtme@${HOME}/virtme@" entrypoint.sh

Now, we have to download the Linux kernel source and modify one of its tests to stop when the crash happens:

$ git clone --depth=1 https://github.com/torvalds/linux
$ cd linux
$ sed -i '/ping tests"/a exit $ret' tools/testing/selftests/net/mptcp/mptcp_connect.sh

And now we create a configuration file instructing virtme to execute that specific test 250 times in a loop:

$ echo 'run_loop_n 250 run_selftest_one mptcp_connect.sh' > .virtme-exec-run

Finally, run it with 'sudo' (inside the Linux kernel source tree) and with 'INPUT_BUILD_SKIP_PACKETDRILL=1' + 'INPUT_BUILD_SKIP_PERF=1' + 'INPUT_NO_BLOCK=1':

$ sudo INPUT_BUILD_SKIP_PACKETDRILL=1 INPUT_BUILD_SKIP_PERF=1 INPUT_NO_BLOCK=1 $HOME/entrypoint.sh auto-normal

[ Where problems could occur ]

The patches being backported have been part of upstream for a while now (since version 8.1.0). Two of the patches are relatively trivial and deal only with code movement/reorganization; the actual fix (https://gitlab.com/qemu-project/qemu/commit/deba78709a) is a bit more involved. Care has been taken to make sure that a subsequent fix is also part of the backport, but there is always a small possibility to see regressions with more complicated patches. Nevertheless, it's important to mention that this fix only touches TCG-related code, which means that hardware-backed virtualization (by KVM) will not be affected.

[ Original Description ]

I'm using QEmu 1:6.2+dfsg-2ubuntu6.16 from a docker container to run kernel tests. I was always using QEmu with KVM acceleration, until recently where it is not available. When TCG is used instead of KVM, I experienced regular crashes when using it in slow environments (e.g. GitHub Actions), e.g.
------------- 8< ------------- [ 45.505495] int3: 0000 [#1] PREEMPT SMP NOPTI [ 45.505547] CPU: 1 PID: 1070 Comm: ping Tainted: G N 6.7.0-g244ee3389ffe #1 [ 45.505547] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 
45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Call Trace: [ 45.505547] <IRQ> [ 45.505547] ? die (arch/x86/kernel/dumpstack.c:421) [ 45.505547] ? exc_int3 (arch/x86/kernel/traps.c:762) [ 45.505547] ? asm_exc_int3 (arch/x86/include/asm/idtentry.h:569) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] ? netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __netif_rx (net/core/dev.c:5084) [ 45.505547] veth_xmit (drivers/net/veth.c:321) [ 45.505547] dev_hard_start_xmit (include/linux/netdevice.h:4989) [ 45.505547] __dev_queue_xmit (include/linux/netdevice.h:3367) [ 45.505547] ? selinux_ip_postroute_compat (security/selinux/hooks.c:5783) [ 45.505547] ? eth_header (net/ethernet/eth.c:85) [ 45.505547] ip6_finish_output2 (include/net/neighbour.h:542) [ 45.505547] ? ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] icmpv6_echo_reply (net/ipv6/icmp.c:812) [ 45.505547] ? icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] icmpv6_rcv (net/ipv6/icmp.c:939) [ 45.505547] ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:440) [ 45.505547] ip6_input_finish (include/linux/rcupdate.h:779) [ 45.505547] __netif_receive_skb_one_core (net/core/dev.c:5537) [ 45.505547] process_backlog (include/linux/rcupdate.h:779) [ 45.505547] __napi_poll (net/core/dev.c:6576) [ 45.505547] net_rx_action (net/core/dev.c:6647) [ 45.505547] __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] do_softirq (kernel/softirq.c:454) [ 45.505547] </IRQ> [ 45.505547] <TASK> [ 45.505547] __local_bh_enable_ip (kernel/softirq.c:381) [ 45.505547] __dev_queue_xmit (net/core/dev.c:4379) [ 45.505547] ip6_finish_output2 (include/linux/netdevice.h:3171) [ 45.505547] ? 
ip6_output (include/linux/netfilter.h:301) [ 45.505547] ? ip6_mtu (net/ipv6/route.c:3208) [ 45.505547] ip6_send_skb (net/ipv6/ip6_output.c:1953) [ 45.505547] rawv6_sendmsg (net/ipv6/raw.c:584) [ 45.505547] ? netfs_clear_subrequests (include/linux/list.h:373) [ 45.505547] ? netfs_alloc_request (fs/netfs/objects.c:42) [ 45.505547] ? folio_add_file_rmap_ptes (arch/x86/include/asm/bitops.h:206) [ 45.505547] ? set_pte_range (mm/memory.c:4529) [ 45.505547] ? next_uptodate_folio (include/linux/xarray.h:1699) [ 45.505547] ? __sock_sendmsg (net/socket.c:733) [ 45.505547] __sock_sendmsg (net/socket.c:733) [ 45.505547] ? move_addr_to_kernel.part.0 (net/socket.c:253) [ 45.505547] __sys_sendto (net/socket.c:2191) [ 45.505547] ? __hrtimer_run_queues (include/linux/seqlock.h:566) [ 45.505547] ? __do_softirq (arch/x86/include/asm/jump_label.h:27) [ 45.505547] __x64_sys_sendto (net/socket.c:2203) [ 45.505547] do_syscall_64 (arch/x86/entry/common.c:52) [ 45.505547] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) [ 45.505547] RIP: 0033:0x7fa1d099ca0a [ 45.505547] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 All code ========    0: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)    4: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax    b: eb b8 jmp 0xffffffffffffffc5    d: 0f 1f 00 nopl (%rax)   10: f3 0f 1e fa endbr64   14: 41 89 ca mov %ecx,%r10d   17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax   1e: 00   1f: 85 c0 test %eax,%eax   21: 75 15 jne 0x38   23: b8 2c 00 00 00 mov $0x2c,%eax   28: 0f 05 syscall   2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction   30: 77 7e ja 0xb0   32: c3 ret   33: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   38: 41 54 push %r12   3a: 48 83 ec 30 sub $0x30,%rsp   3e: 44 rex.R   3f: 89 .byte 0x89 Code starting with the faulting instruction =========================================== 
   0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax    6: 77 7e ja 0x86    8: c3 ret    9: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)    e: 41 54 push %r12   10: 48 83 ec 30 sub $0x30,%rsp   14: 44 rex.R   15: 89 .byte 0x89 [ 45.505547] RSP: 002b:00007ffe47710958 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 45.505547] RAX: ffffffffffffffda RBX: 00007ffe47712090 RCX: 00007fa1d099ca0a [ 45.505547] RDX: 0000000000000040 RSI: 0000559b91bbd300 RDI: 0000000000000003 [ 45.505547] RBP: 0000559b91bbd300 R08: 00007ffe477142a4 R09: 000000000000001c [ 45.505547] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe47711c20 [ 45.505547] R13: 0000000000000040 R14: 0000559b91bbf4f4 R15: 00007ffe47712090 [ 45.505547] </TASK> [ 45.505547] Modules linked in: mptcp_diag inet_diag mptcp_token_test mptcp_crypto_test kunit [ 45.505547] ---[ end trace 0000000000000000 ]--- [ 45.505547] RIP: 0010:netif_rx_internal (arch/x86/include/asm/jump_label.h:27) [ 45.505547] Code: 0f 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00 55 48 89 fd 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 e9 <c9> 00 00 00 66 90 66 90 48 8d 54 24 10 48 89 ef 65 8b 35 17 9d 11 All code ========    0: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)    7: 00    8: 0f 1f 40 00 nopl 0x0(%rax)    c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)   11: 55 push %rbp   12: 48 89 fd mov %rdi,%rbp   15: 48 83 ec 20 sub $0x20,%rsp   19: 65 48 8b 04 25 28 00 mov %gs:0x28,%rax   20: 00 00   22: 48 89 44 24 18 mov %rax,0x18(%rsp)   27: 31 c0 xor %eax,%eax   29:* e9 c9 00 00 00 jmp 0xf7 <-- trapping instruction   2e: 66 90 xchg %ax,%ax   30: 66 90 xchg %ax,%ax   32: 48 8d 54 24 10 lea 0x10(%rsp),%rdx   37: 48 89 ef mov %rbp,%rdi   3a: 65 gs   3b: 8b .byte 0x8b   3c: 35 .byte 0x35   3d: 17 (bad)   3e: 9d popf   3f: 11 .byte 0x11 Code starting with the faulting instruction ===========================================    0: c9 leave    1: 00 00 add %al,(%rax)    3: 00 66 90 add %ah,-0x70(%rsi)    6: 66 90 xchg %ax,%ax    8: 48 8d 54 24 
10 lea 0x10(%rsp),%rdx    d: 48 89 ef mov %rbp,%rdi   10: 65 gs   11: 8b .byte 0x8b   12: 35 .byte 0x35   13: 17 (bad)   14: 9d popf   15: 11 .byte 0x11 [ 45.505547] RSP: 0018:ffffb106c00f0af8 EFLAGS: 00000246 [ 45.505547] RAX: 0000000000000000 RBX: ffff99918827b000 RCX: 0000000000000000 [ 45.505547] RDX: 000000000000000a RSI: ffff99918827d000 RDI: ffff9991819e6400 [ 45.505547] RBP: ffff9991819e6400 R08: 0000000000000000 R09: 0000000000000068 [ 45.505547] R10: ffff999181c104c0 R11: 736f6d6570736575 R12: ffff9991819e6400 [ 45.505547] R13: 0000000000000076 R14: 0000000000000000 R15: ffff99918827c000 [ 45.505547] FS: 00007fa1d06ca1c0(0000) GS:ffff9991fdc80000(0000) knlGS:0000000000000000 [ 45.505547] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 45.505547] CR2: 0000559b91aac240 CR3: 0000000004986000 CR4: 00000000000006f0 [ 45.505547] Kernel panic - not syncing: Fatal exception in interrupt [ 45.505547] Kernel Offset: 0x37600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ------------- 8< ------------- For more debug info:   https://github.com/multipath-tcp/mptcp_net-next/issues/471 The crashes happen in 'jump label' code. I reported the issue to kernel devs (x86 ML), and it looks like it is a QEmu issue:   https://lore.kernel.org/all/06cb540e-34ff-4dcd-b936-19d4d14378c9@kernel.org/T/ Steven Rostedt said: > The real problem is that qemu does not seem to be honoring the memory > barriers of an interrupt. The reason the code does the ipi's is to > force a full memory barrier across all CPUs so that they all see the > same memory before going forward to the next step. > > My guess is that qemu does not treat the IPI being sent as a memory > barrier, and then the CPUs do not see a consistent memory view after > the IPIs are sent. That's a bug in qemu! > > More specifically, I bet qemu may be doing a dcache barrier, but not an > icache barrier in the interrupt. 
If the code is already in qemu's > pipeline, it may not be flushing it like real hardware would do. > > This should be reported to the qemu community and should be fixed > there. In the mean time, feel free to use Masami's patch in your local > repo until qemu is fixed, but it should not be added to Linux mainline. And Masami Hiramatsu said: > If KVM works well, I agree that this is a qemu > TCG's bug. I guess TCG implementation forgets to serialize CPU when the > IPI comes. > if you need a reference, "Intel SDM Vol3A, 9.1.3 Handling > Self- and Cross-Modifying Code" said that what the other CPU needs to > do is "Execute serializing instruction; (* For example, CPUID > instruction *)" for cross-modifying code. that has been done in > do_sync_core(). Thus this bug should not happen. I wanted to report this to the QEmu community, but it looks like the v6.x versions are no longer supported. I also tested with the v8.2.0, and even if it is not easy to reproduce the crash locally, I didn't have any issues with it. I guess a fix has been done in newer versions, but not backported to v6.x.
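Since the reproducer in the test plan only exercises the bug when QEMU really runs under TCG, it may be worth confirming that the 'accel=kvm:tcg' substitution in virtme's run.py took effect before starting the 250-iteration loop. The following is only a sketch: `check_tcg_only` is a hypothetical helper that is not part of virtme or of the test script, and it assumes the accelerator is still spelled 'accel=...' in that file.

```shell
#!/bin/sh
# Hypothetical helper (not part of virtme): report whether a virtme run.py
# still requests KVM after the sed edit described in the test plan.
# Return codes: 0 = TCG only, 1 = KVM still requested,
#               2 = file missing, 3 = no accel= setting found.
check_tcg_only() {
    run_py="$1"
    if [ ! -f "$run_py" ]; then
        echo "missing: $run_py"
        return 2
    fi
    if grep -q 'accel=kvm' "$run_py"; then
        echo "KVM still requested in $run_py"
        return 1
    fi
    if grep -q 'accel=tcg' "$run_py"; then
        echo "TCG only in $run_py"
        return 0
    fi
    echo "no accel= setting found in $run_py"
    return 3
}
```

With virtme cloned as in the test plan, it could be called as `check_tcg_only ./virtme/virtme/commands/run.py`; a non-zero return code suggests the run would not exercise the TCG code path.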
2024-02-20 21:03:57 Sergio Durigan Junior qemu (Ubuntu Mantic): status Triaged In Progress
2024-02-20 21:43:05 Ubuntu Archive Robot bug added subscriber Sergio Durigan Junior
2024-02-22 21:14:00 Andreas Hasenack qemu (Ubuntu Mantic): status In Progress Fix Committed
2024-02-22 21:14:01 Andreas Hasenack bug added subscriber Ubuntu Stable Release Updates Team
2024-02-22 21:14:03 Andreas Hasenack bug added subscriber SRU Verification
2024-02-22 21:14:07 Andreas Hasenack tags server-todo server-todo verification-needed verification-needed-mantic
2024-02-27 03:17:42 Sergio Durigan Junior tags server-todo verification-needed verification-needed-mantic server-todo verification-done verification-done-mantic
2024-03-07 16:51:58 Launchpad Janitor qemu (Ubuntu Mantic): status Fix Committed Fix Released
2024-03-07 16:52:03 Andreas Hasenack removed subscriber Ubuntu Stable Release Updates Team
2024-03-20 15:18:18 Bryce Harrington tags server-todo verification-done verification-done-mantic verification-done verification-done-mantic