CPU soft lockup in a spin lock using tproxy and nfqueue
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Confirmed | Undecided | Unassigned | |
Bug Description
I've been experimenting with netfilter's queue target (nfqueue) and transparent proxying (tproxy) of IPv4/TCP connections on Ubuntu Jammy.
By opening a large number of connections from a single source IP address to a single destination IP/port, I could trigger
a soft lockup on a server host running the latest linux-generic 5.15.0.69.67.
With "kernel.
[ 520.222992] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [sample-
[ 520.223719] Modules linked in: nfnetlink_queue nft_socket nf_socket_ipv4 nf_socket_ipv6
nft_tproxy nf_tproxy_ipv6 nf_tproxy_ipv4 nft_queue nft_ct nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 nf_tables nfnetlink binfmt_misc intel_rapl_msr intel_rapl_common rapl
kvm_intel kvm nls_iso8859_1 input_leds serio_raw qemu_fw_cfg sch_fq_codel dm_multipath
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ramoops reed_solomon ipmi_devintf ipmi_msghandler
pstore_blk msr pstore_zone efi_pstore ip_tables x_tables autofs4 raid10 raid456 libcrc32c
async_
multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel
crypto_simd cryptd psmouse ahci i2c_i801 i2c_smbus libahci lpc_ich virtio_blk xhci_pci
xhci_pci_renesas virtio_net net_failover failover
[ 520.223772] CPU: 0 PID: 949 Comm: sample-queue-ha Kdump: loaded Not tainted 6.2.0-rc6 #1
[ 520.223774] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 520.223775] RIP: 0010:native_
[ 520.223781] Code: 0f 92 c2 41 8b 04 24 0f b6 d2 c1 e2 08 30 e4 09 d0 a9 00 01 ff ff 0f 85
ec 01 00 00 85 c0 74 12 41 8b 04 24 84 c0 74 0a f3 90 <41> 8b 04 24 84 c0 75 f6 b8 01 00 00
00 66 41 89 04 24 5b 41 5c 41
[ 520.223782] RSP: 0018:ffffa3e600
[ 520.223784] RAX: 0000000000000101 RBX: ffff952b02af9c00 RCX: 0000000000003fff
[ 520.223785] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff952b02a91910
[ 520.223786] RBP: ffffa3e600003960 R08: 0000000000039c00 R09: 00000000948c28ba
[ 520.223787] R10: 00000000000124f8 R11: 0000000000000000 R12: ffff952b02a91910
[ 520.223788] R13: ffff952b057a6110 R14: ffff952b02a91910 R15: ffff952b0389c600
[ 520.223789] FS: 00007efc3ba2eb8
[ 520.223790] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 520.223791] CR2: 00007f0b96ffcef8 CR3: 000000000551e006 CR4: 0000000000370ef0
[ 520.223795] Call Trace:
[ 520.223796] <IRQ>
[ 520.223805] _raw_spin_
[ 520.223807] inet_twsk_
[ 520.223810] tcp_time_
[ 520.223811] tcp_rcv_
[ 520.223813] ? sk_filter_
[ 520.223815] ? tcp_inbound_
[ 520.223817] tcp_v4_
[ 520.223820] tcp_v4_
[ 520.223821] ? raw_local_
[ 520.223823] ip_protocol_
[ 520.223825] ip_local_
[ 520.223826] ip_local_
[ 520.223827] ? ip_protocol_
[ 520.223829] ip_sublist_
[ 520.223830] ip_sublist_
[ 520.223832] ? ip_rcv_
[ 520.223833] ip_list_
[ 520.223835] __netif_
[ 520.223837] netif_receive_
[ 520.223839] napi_complete_
[ 520.223842] virtnet_
[ 520.223851] __napi_
[ 520.223852] net_rx_
[ 520.223854] ? skb_recv_
[ 520.223858] __do_softirq+
[ 520.223860] __irq_exit_
[ 520.223865] irq_exit_
[ 520.223866] common_
[ 520.223870] </IRQ>
[ 520.223871] <TASK>
[ 520.223872] asm_common_
[ 520.223873] RIP: 0010:inet_
[ 520.223875] Code: 74 04 48 89 4a 08 49 89 4e 08 49 83 c6 08 4c 89 70 38 5b 41 5c 41 5d 41
5e 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <66> 0f 1f 00 0f 1f 44 00 00 48 8b 46
40 48 85 c0 74 01 c3 48 8b 46
[ 520.223876] RSP: 0018:ffffa3e600
[ 520.223877] RAX: ffff952b057a0ab8 RBX: ffff952b057a6aa0 RCX: 00000000a54739c0
[ 520.223878] RDX: ffff952b057a6898 RSI: ffff952b0a20c700 RDI: ffff952b011dfd00
[ 520.223879] RBP: ffffa3e60096f398 R08: ffff952b0544cec0 R09: 000000000000c6ce
[ 520.223880] R10: 0000000000009cad R11: 0000000000000000 R12: ffffffffbb72a5a0
[ 520.223881] R13: ffff952b05561ac0 R14: ffff952b02a91910 R15: ffffffffbb72a5a0
[ 520.223883] ? inet_twsk_
[ 520.223885] inet_twsk_
[ 520.223886] inet_twsk_
[ 520.223889] nf_tproxy_
[ 520.223892] nft_tproxy_
[ 520.223897] nft_do_
[ 520.223920] ? nft_do_
[ 520.223927] ? kmem_cache_
[ 520.223930] ? skb_free_
[ 520.223932] ? kfree_skbmem+
[ 520.223934] ? consume_
[ 520.223936] ? tcp_v4_
[ 520.223938] ? tcp_v4_
[ 520.223939] ? raw_local_
[ 520.223941] ? ip_protocol_
[ 520.223942] ? ip_local_
[ 520.223944] ? __nf_conntrack_
[ 520.223952] ? ip_local_
[ 520.223953] nft_do_
[ 520.223961] ? nf_conntrack_
[ 520.223968] nf_reinject+
[ 520.223971] nfqnl_reinject+
[ 520.223974] nfqnl_recv_
[ 520.223978] nfnetlink_
[ 520.223983] ? save_fpregs_
[ 520.223985] ? nfnetlink_
[ 520.223988] netlink_
[ 520.223990] nfnetlink_
[ 520.223993] ? __netlink_
[ 520.223995] netlink_
[ 520.223996] netlink_
[ 520.223998] sock_sendmsg+
[ 520.224000] ____sys_
[ 520.224002] ___sys_
[ 520.224004] ? __sys_recvfrom+
[ 520.224007] ? __rseq_
[ 520.224009] __sys_sendmsg+
[ 520.224011] ? do_syscall_
[ 520.224013] __x64_sys_
[ 520.224015] do_syscall_
[ 520.224016] ? do_syscall_
[ 520.224018] ? exit_to_
[ 520.224020] ? syscall_
[ 520.224022] ? __x64_sys_
[ 520.224024] ? do_syscall_
[ 520.224026] ? syscall_
[ 520.224027] ? __x64_sys_
[ 520.224029] ? do_syscall_
[ 520.224030] ? do_syscall_
[ 520.224032] entry_SYSCALL_
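For anyone trying to reproduce this, the traffic pattern from the description (many connection requests from one source IP address to one IP/port) could be generated with something like the Python sketch below. This is not the test program used for this report. The function name `open_many_connections` and its parameters are made up for illustration, and the nftables tproxy/queue rules and the userspace queue handler on the server are not shown. Running it does not by itself guarantee a lockup.

```python
import socket


def open_many_connections(host, port, count, source_ip=None, timeout=1.0):
    """Open and immediately close `count` TCP connections to host:port.

    Hypothetical helper for illustration only. Returns the number of
    connections that were successfully established.
    """
    established = 0
    for _ in range(count):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            if source_ip is not None:
                # Pin the source address; port 0 lets the kernel pick
                # an ephemeral source port for each connection.
                sock.bind((source_ip, 0))
            sock.connect((host, port))
            established += 1
        except OSError:
            # Refused, timed out, or out of ephemeral ports: keep going.
            pass
        finally:
            sock.close()
    return established
```

A call such as `open_many_connections("192.0.2.10", 80, 100000)` (the address and port are placeholders) would then be run from a single client host against the server under test.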
A bug fix has already been created in the upstream stable tree:
Status changed to 'Confirmed' because the bug affects multiple users.