genetlink: fix single op policy dump when do is present
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux-bluefield (Ubuntu) | New | Undecided | Jose Ogando Justo | |
Jammy | Fix Committed | Undecided | Unassigned | |
Bug Description
intro
-----
Our internal test triggers the kernel crash shown below:
[ 888.690348] Sun Mar 24 23:51:59 2024: DriVerTest - Start Test
[ 888.691834] -------
[ 888.983912] mlx5_core 0000:08:00.1 eth3: Link up
[ 888.987644] IPv6: ADDRCONF(
[ 889.336577] mlx5_core 0000:08:00.0 eth2: Link up
[ 894.635836] Sun Mar 24 11:52:04 PM IST 2024 - DriVerTest Debug Heartbeat
[ 940.431644] general protection fault, probably for non-canonical address 0x8002001400000000: 0000 [#1] SMP NOPTI
[ 940.432866] CPU: 7 PID: 94305 Comm: ethtool Tainted: G OE 5.15.0-
[ 940.433970] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.
[ 940.435220] RIP: 0010:netlink_
[ 940.435893] Code: 48 c1 e0 04 4c 8b 34 01 4d 85 f6 74 5b 31 db eb 10 4c 89 e8 83 c3 01 48 c1 e0 04 39 5c 01 08 72 3f 89 d8 48 c1 e0 04 4c 01 f0 <0f> b6 10 83 ea 08 83 fa 01 77 dc 0f b7 50 02 48 8b 70 08 48 8d 7c
[ 940.437921] RSP: 0018:ffa0000002
[ 940.438551] RAX: 8002001400000000 RBX: 0000000000000000 RCX: ff1100027d000000
[ 940.439351] RDX: 00000000fffffff8 RSI: 0000000000000018 RDI: ffa0000002d37a10
[ 940.440131] RBP: 0000000000000003 R08: 0000000000400000 R09: ff1100027d2d0f10
[ 940.440900] R10: 0000000000000318 R11: 0000000000000000 R12: ff1100011fa59bc0
[ 940.441683] R13: 0000000000000004 R14: 8002001400000000 R15: ffffffff83fa6540
[ 940.442459] FS: 00007f4a1799374
[ 940.443394] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 940.444044] CR2: 0000000000429f50 CR3: 000000012fc2e002 CR4: 0000000000771ee0
[ 940.444847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 940.445639] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 940.446431] PKRU: 55555554
[ 940.446795] Call Trace:
[ 940.447144] <TASK>
[ 940.447444] ? __die_body+
[ 940.447880] ? die_addr+0x39/0x60
[ 940.448315] ? exc_general_
[ 940.448867] ? asm_exc_
[ 940.449445] ? netlink_
[ 940.450058] ? netlink_
[ 940.450714] ? ethtool_
[ 940.451272] ctrl_dumppolicy
[ 940.451788] ? ethnl_reply_
[ 940.452284] ? __nla_parse+
[ 940.452734] ? __cond_
[ 940.453211] ? kmem_cache_
[ 940.453750] genl_start+
[ 940.454179] __netlink_
[ 940.454706] genl_family_
[ 940.455334] ? genl_family_
[ 940.455998] ? genl_unlock+
[ 940.456453] ? genl_parallel_
[ 940.456957] genl_rcv_
[ 940.457421] ? genl_get_
[ 940.457890] ? ctrl_dumppolicy
[ 940.458515] ? genl_lock_
[ 940.458987] ? genl_family_
[ 940.459634] netlink_
[ 940.460107] genl_rcv+0x24/0x40
[ 940.460504] netlink_
[ 940.460983] netlink_
[ 940.461472] __sock_
[ 940.461922] __sys_sendto+
[ 940.462384] ? __sys_recvmsg+
[ 940.462854] ? exit_to_
[ 940.463439] __x64_sys_
[ 940.463906] do_syscall_
[ 940.464368] entry_SYSCALL_
[ 940.464955] RIP: 0033:0x7f4a17aa940a
[ 940.465415] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89
[ 940.467418] RSP: 002b:00007ffc36
[ 940.468284] RAX: ffffffffffffffda RBX: 0000000000c3b3b0 RCX: 00007f4a17aa940a
[ 940.469057] RDX: 0000000000000024 RSI: 0000000000c3b3b0 RDI: 0000000000000003
[ 940.469852] RBP: 0000000000c3b2a0 R08: 00007f4a17ba4200 R09: 000000000000000c
[ 940.470674] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000c3b340
[ 940.471470] R13: 0000000000c3b350 R14: 00007ffc3612caec R15: 0000000000c3b3b0
[ 940.472257] </TASK>
[ 940.472570] Modules linked in: udp_diag(E) tcp_diag(E) inet_diag(E) iptable_raw(E) openvswitch(E) nsh(E) nf_conncount(E) rdma_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) auxiliary(OE) mlxdevm(OE) ib_uverbs(OE) ib_core(OE) mlx_compat(OE) memtrack(OE) psample(E) ptp(E) pps_core(E) nfsv3(E) nfs_acl(E) rpcsec_gss_krb5(E) xt_conntrack(E) auth_rpcgss(E) xt_MASQUERADE(E) nf_conntrack_
[ 940.472612] [last unloaded: ib_core]
[ 940.481959] ---[ end trace 09663efb82dc1774 ]---
[ 940.482523] RIP: 0010:netlink_
fix
---
The following upstream patch needs to be cherry-picked:
commit c1b05105573b2cd
Author: Jakub Kicinski <email address hidden>
Date: Wed Nov 9 10:32:54 2022 -0800
genetlink: fix single op policy dump when do is present
Jonathan reports crashes when running net-next in Meta's fleet.
Stats collection uses ethtool -I which does a per-op policy dump
to check if stats are supported. We don't initialize the dumpit
information if doit succeeds due to evaluation short-circuiting.
The crash may look like this:
BUG: kernel NULL pointer dereference, address: 0000000000000cc0
RIP: 0010:netlink_
Or we may trigger a warning:
WARNING: CPU: 1 PID: 785 at net/netlink/
RIP: 0010:netlink_
depending on what garbage we pick up from the stack.
Reported-by: Jonathan Lemon <email address hidden>
Fixes: 26588edbef60 ("genetlink: support split policies in ctrl_dumppolicy
Reviewed-by: Jacob Keller <email address hidden>
Tested-by: Leon Romanovsky <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Jakub Kicinski <email address hidden>
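The short-circuit problem described in the commit message can be illustrated with a small standalone C sketch. This is not the kernel code: struct op_info, poison(), is_poisoned(), lookup_op(), get_both_buggy() and get_both_fixed() are hypothetical names invented for this illustration. The only thing taken from the commit text is the shape of the bug: the lookup returns 0 on success, so chaining two lookups with && skips the second one as soon as the first succeeds. The 0xAA fill stands in for whatever garbage happens to be on the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for per-op information; not the kernel's struct. */
struct op_info {
    int valid;
    unsigned int maxattr;
};

/* Fill the struct with a recognizable pattern to simulate stack garbage. */
void poison(struct op_info *p)
{
    memset(p, 0xAA, sizeof(*p));
}

/* True if the struct still holds the poison pattern, i.e. was never written. */
bool is_poisoned(const struct op_info *p)
{
    struct op_info ref;

    poison(&ref);
    return memcmp(p, &ref, sizeof(ref)) == 0;
}

/* Hypothetical lookup: returns 0 and fills *out on success, -1 otherwise.
 * In both cases *out is reset, so it is safe to read once this has run. */
int lookup_op(bool supported, unsigned int maxattr, struct op_info *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    if (!supported)
        return -1;
    out->valid = 1;
    out->maxattr = maxattr;
    return 0;
}

/* Buggy shape: when the "do" lookup succeeds (returns 0), && short-circuits,
 * the "dump" lookup never runs, and *dump keeps its stale contents. */
int get_both_buggy(bool has_do, bool has_dump,
                   struct op_info *doit, struct op_info *dump)
{
    if (lookup_op(has_do, 10, doit) && lookup_op(has_dump, 20, dump))
        return -1; /* neither variant exists */
    return 0;
}

/* Fixed shape: run both lookups unconditionally, then combine the results. */
int get_both_fixed(bool has_do, bool has_dump,
                   struct op_info *doit, struct op_info *dump)
{
    int err_do = lookup_op(has_do, 10, doit);
    int err_dump = lookup_op(has_dump, 20, dump);

    return (err_do && err_dump) ? -1 : 0;
}
```

With both structs poisoned beforehand, get_both_buggy(true, true, &doit, &dump) reports success yet leaves dump untouched, which is the situation where a later reader of dump picks up garbage. get_both_fixed() with the same arguments fills both structs.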
Changed in linux-bluefield (Ubuntu):
assignee: nobody → Jose Ogando Justo (joseogando)
Changed in linux-bluefield (Ubuntu Jammy):
status: New → Fix Committed
tags: added: verification-done-jammy-linux-bluefield; removed: verification-needed-jammy-linux-bluefield
This bug is awaiting verification that the linux-bluefield/5.15.0-1040.42 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!