genetlink: fix single op policy dump when do is present

Bug #2059961 reported by William Tu
Affects                   Status         Importance  Assigned to        Milestone
linux-bluefield (Ubuntu)  New            Undecided   Jose Ogando Justo
Jammy                     Fix Committed  Undecided   Unassigned

Bug Description

intro
-----

Our internal test triggers the kernel crash shown in the log below:
[ 888.690348] Sun Mar 24 23:51:59 2024: DriVerTest - Start Test
 [ 888.691834] ----------------------------------------------------------------------------------------------------
 [ 888.983912] mlx5_core 0000:08:00.1 eth3: Link up
 [ 888.987644] IPv6: ADDRCONF(NETDEV_CHANGE): eth3: link becomes ready
 [ 889.336577] mlx5_core 0000:08:00.0 eth2: Link up
 [ 894.635836] Sun Mar 24 11:52:04 PM IST 2024 - DriVerTest Debug Heartbeat
 [ 940.431644] general protection fault, probably for non-canonical address 0x8002001400000000: 0000 [#1] SMP NOPTI
 [ 940.432866] CPU: 7 PID: 94305 Comm: ethtool Tainted: G OE 5.15.0-1039.17.g0d63875-bluefield #1
 [ 940.433970] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 940.435220] RIP: 0010:netlink_policy_dump_add_policy+0x95/0x160
 [ 940.435893] Code: 48 c1 e0 04 4c 8b 34 01 4d 85 f6 74 5b 31 db eb 10 4c 89 e8 83 c3 01 48 c1 e0 04 39 5c 01 08 72 3f 89 d8 48 c1 e0 04 4c 01 f0 <0f> b6 10 83 ea 08 83 fa 01 77 dc 0f b7 50 02 48 8b 70 08 48 8d 7c
 [ 940.437921] RSP: 0018:ffa0000002d37a08 EFLAGS: 00010286
 [ 940.438551] RAX: 8002001400000000 RBX: 0000000000000000 RCX: ff1100027d000000
 [ 940.439351] RDX: 00000000fffffff8 RSI: 0000000000000018 RDI: ffa0000002d37a10
 [ 940.440131] RBP: 0000000000000003 R08: 0000000000400000 R09: ff1100027d2d0f10
 [ 940.440900] R10: 0000000000000318 R11: 0000000000000000 R12: ff1100011fa59bc0
 [ 940.441683] R13: 0000000000000004 R14: 8002001400000000 R15: ffffffff83fa6540
 [ 940.442459] FS: 00007f4a17993740(0000) GS:ff1100085f9c0000(0000) knlGS:0000000000000000
 [ 940.443394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [ 940.444044] CR2: 0000000000429f50 CR3: 000000012fc2e002 CR4: 0000000000771ee0
 [ 940.444847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 [ 940.445639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 [ 940.446431] PKRU: 55555554
 [ 940.446795] Call Trace:
 [ 940.447144] <TASK>
 [ 940.447444] ? __die_body+0x1b/0x60
 [ 940.447880] ? die_addr+0x39/0x60
 [ 940.448315] ? exc_general_protection+0x1bc/0x3c0
 [ 940.448867] ? asm_exc_general_protection+0x22/0x30
 [ 940.449445] ? netlink_policy_dump_add_policy+0x95/0x160
 [ 940.450058] ? netlink_policy_dump_add_policy+0xb2/0x160
 [ 940.450714] ? ethtool_get_phc_vclocks+0x70/0x70
 [ 940.451272] ctrl_dumppolicy_start+0xc4/0x2a0
 [ 940.451788] ? ethnl_reply_init+0xd0/0xd0
 [ 940.452284] ? __nla_parse+0x22/0x30
 [ 940.452734] ? __cond_resched+0x15/0x30
 [ 940.453211] ? kmem_cache_alloc_trace+0x44/0x390
 [ 940.453750] genl_start+0xc3/0x150
 [ 940.454179] __netlink_dump_start+0x175/0x250
 [ 940.454706] genl_family_rcv_msg_dumpit.isra.0+0x9a/0x100
 [ 940.455334] ? genl_family_rcv_msg_attrs_parse.isra.0+0xe0/0xe0
 [ 940.455998] ? genl_unlock+0x20/0x20
 [ 940.456453] ? genl_parallel_done+0x40/0x40
 [ 940.456957] genl_rcv_msg+0x11f/0x2b0
 [ 940.457421] ? genl_get_cmd+0x170/0x170
 [ 940.457890] ? ctrl_dumppolicy_put_op.isra.0+0x1e0/0x1e0
 [ 940.458515] ? genl_lock_done+0x60/0x60
 [ 940.458987] ? genl_family_rcv_msg_doit.isra.0+0x110/0x110
 [ 940.459634] netlink_rcv_skb+0x54/0x100
 [ 940.460107] genl_rcv+0x24/0x40
 [ 940.460504] netlink_unicast+0x18d/0x230
 [ 940.460983] netlink_sendmsg+0x240/0x4a0
 [ 940.461472] __sock_sendmsg+0x2f/0x40
 [ 940.461922] __sys_sendto+0xee/0x160
 [ 940.462384] ? __sys_recvmsg+0x56/0xa0
 [ 940.462854] ? exit_to_user_mode_prepare+0x35/0x170
 [ 940.463439] __x64_sys_sendto+0x25/0x30
 [ 940.463906] do_syscall_64+0x35/0x80
 [ 940.464368] entry_SYSCALL_64_after_hwframe+0x61/0xcb
 [ 940.464955] RIP: 0033:0x7f4a17aa940a
 [ 940.465415] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
 [ 940.467418] RSP: 002b:00007ffc3612cac8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
 [ 940.468284] RAX: ffffffffffffffda RBX: 0000000000c3b3b0 RCX: 00007f4a17aa940a
 [ 940.469057] RDX: 0000000000000024 RSI: 0000000000c3b3b0 RDI: 0000000000000003
 [ 940.469852] RBP: 0000000000c3b2a0 R08: 00007f4a17ba4200 R09: 000000000000000c
 [ 940.470674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c3b340
 [ 940.471470] R13: 0000000000c3b350 R14: 00007ffc3612caec R15: 0000000000c3b3b0
 [ 940.472257] </TASK>
 [ 940.472570] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) iptable_raw(E) openvswitch(E) nsh(E) nf_conncount(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) auxiliary(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) mlx_compat(OE) memtrack(OE) psample(E) ptp(E) pps_core(E) nfsv3(E) nfs_acl(E) rpcsec_gss_krb5(E) xt_conntrack(E) auth_rpcgss(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) nfnetlink(E) xt_addrtype(E) iptable_filter(E) iptable_nat(E) nf_nat(E) br_netfilter(E) bridge(E) stp(E) llc(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) fscache(E) netfs(E) overlay(E) rfkill(E) sunrpc(E) kvm_intel(E) iTCO_wdt(E) iTCO_vendor_support(E) kvm(E) irqbypass(E) virtio_net(E) i2c_i801(E) pcspkr(E) i2c_smbus(E) lpc_ich(E) net_failover(E) mfd_core(E) failover(E) sch_fq_codel(E) drm(E) i2c_core(E) ip_tables(E) crc32_pclmul(E) crc32c_intel(E) ghash_clmulni_intel(E) sha256_ssse3(E) sha1_ssse3(E) serio_raw(E) fuse(E)
 [ 940.472612] [last unloaded: ib_core]
 [ 940.481959] ---[ end trace 09663efb82dc1774 ]---
 [ 940.482523] RIP: 0010:netlink_policy_dump_add_policy+0x95/0x160

fix
---

The following upstream patch needs to be cherry-picked:

commit c1b05105573b2cd5845921eb0d2caa26e2144a34
Author: Jakub Kicinski <email address hidden>
Date: Wed Nov 9 10:32:54 2022 -0800

    genetlink: fix single op policy dump when do is present

    Jonathan reports crashes when running net-next in Meta's fleet.
    Stats collection uses ethtool -I which does a per-op policy dump
    to check if stats are supported. We don't initialize the dumpit
    information if doit succeeds due to evaluation short-circuiting.

    The crash may look like this:

       BUG: kernel NULL pointer dereference, address: 0000000000000cc0
       RIP: 0010:netlink_policy_dump_add_policy+0x174/0x2a0
         ctrl_dumppolicy_start+0x19f/0x2f0
         genl_start+0xe7/0x140

    Or we may trigger a warning:

       WARNING: CPU: 1 PID: 785 at net/netlink/policy.c:87 netlink_policy_dump_get_policy_idx+0x79/0x80
       RIP: 0010:netlink_policy_dump_get_policy_idx+0x79/0x80
         ctrl_dumppolicy_put_op+0x214/0x360

    depending on what garbage we pick up from the stack.

    Reported-by: Jonathan Lemon <email address hidden>
    Fixes: 26588edbef60 ("genetlink: support split policies in ctrl_dumppolicy_put_op()")
    Reviewed-by: Jacob Keller <email address hidden>
    Tested-by: Leon Romanovsky <email address hidden>
    Link: https://<email address hidden>
    Signed-off-by: Jakub Kicinski <email address hidden>
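
The short-circuiting problem described in the commit message can be illustrated with a small user-space C sketch. All names below (`struct op_info`, `lookup_op`, `lookup_both_buggy`, `lookup_both_fixed`, `dump_valid_after`) are hypothetical and are not the kernel's identifiers. The sketch assumes a lookup that returns 0 on success, and it pre-fills the structs with a 0xAA pattern to stand in for stack garbage instead of reading genuinely uninitialized memory.

```c
#include <string.h>

/* Hypothetical stand-in for a per-op descriptor; not the kernel's struct. */
struct op_info {
    unsigned int valid;   /* set to 1 once lookup_op() has filled the struct */
};

/*
 * Hypothetical lookup. On success it fills *out and returns 0;
 * on failure it zeroes *out and returns -1.
 */
static int lookup_op(int supported, struct op_info *out)
{
    memset(out, 0, sizeof(*out));
    if (!supported)
        return -1;
    out->valid = 1;
    return 0;
}

/*
 * Buggy pattern: when the first lookup succeeds (returns 0), && short-circuits
 * and the second lookup never runs, so *dump keeps whatever it held before.
 */
static int lookup_both_buggy(int has_do, int has_dump,
                             struct op_info *doit, struct op_info *dump)
{
    if (lookup_op(has_do, doit) && lookup_op(has_dump, dump))
        return -1;   /* neither kind of op exists */
    return 0;
}

/*
 * One possible way to avoid the problem: run both lookups unconditionally,
 * then combine the results.
 */
static int lookup_both_fixed(int has_do, int has_dump,
                             struct op_info *doit, struct op_info *dump)
{
    int err_do = lookup_op(has_do, doit);
    int err_dump = lookup_op(has_dump, dump);

    return (err_do && err_dump) ? -1 : 0;
}

/*
 * Returns dump.valid as left behind by the chosen variant when both ops exist.
 * The structs are poisoned with 0xAA first to simulate stack garbage.
 */
static unsigned int dump_valid_after(int use_fixed)
{
    struct op_info doit, dump;

    memset(&doit, 0xAA, sizeof(doit));
    memset(&dump, 0xAA, sizeof(dump));
    if (use_fixed)
        lookup_both_fixed(1, 1, &doit, &dump);
    else
        lookup_both_buggy(1, 1, &doit, &dump);
    return dump.valid;
}
```

Under these assumptions, `dump_valid_after(0)` returns the untouched poison value, while `dump_valid_after(1)` returns 1. The upstream commit may structure its fix differently; the sketch only shows why `&&` can skip the second initialization.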

Changed in linux-bluefield (Ubuntu):
assignee: nobody → Jose Ogando Justo (joseogando)
Changed in linux-bluefield (Ubuntu Jammy):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1040.42 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy-linux-bluefield
Tony Duan (yifeid)
tags: added: verification-done-jammy-linux-bluefield
removed: verification-needed-jammy-linux-bluefield