mlxbf_gige: call request_irq() after NAPI initialized

Bug #2059310 reported by David Thompson
This bug affects 1 person
Affects                    Status         Importance  Assigned to  Milestone
linux-bluefield (Ubuntu)   New            Undecided   Unassigned
  Jammy                    Fix Committed  Undecided   Unassigned

Bug Description

SRU Justification:

[Impact]
The mlxbf_gige driver hits a NULL pointer dereference in
mlxbf_gige_open() when kdump is enabled. The exception happens
because an RX interrupt is already pending before request_irq()
is called for the RX IRQ, so the RX IRQ handler fires immediately
after that request_irq() completes. The RX IRQ handler then
runs "napi_schedule()" before NAPI is fully initialized via
"netif_napi_add()" and "napi_enable()", both of which happen later
in the open() logic.

[Fix]
The logic in mlxbf_gige_open() must fully initialize NAPI before
any calls to request_irq() execute.
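
The ordering problem can be illustrated with a small user-space model.
This is only a sketch, not the driver code: every fake_* name, the
struct layouts, and the two open_*() helpers are hypothetical stand-ins
invented for illustration. The model assumes a latched RX interrupt is
delivered as soon as the handler is registered, and it counts a "fault"
where the real handler would touch uninitialized NAPI state.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel objects; not the real driver structures. */
struct fake_napi {
    bool added;     /* set by the netif_napi_add() stand-in */
    bool enabled;   /* set by the napi_enable() stand-in */
    int scheduled;  /* number of polls the handler scheduled */
};

struct fake_dev {
    struct fake_napi napi;
    bool rx_irq_pending; /* RX interrupt latched before open() runs */
    int faults;          /* handler ran while NAPI was not ready */
};

/* Stand-in for the RX IRQ handler, which calls napi_schedule() in the report. */
static void fake_rx_irq_handler(struct fake_dev *dev)
{
    if (!dev->napi.added || !dev->napi.enabled) {
        dev->faults++; /* models the exception described under [Impact] */
        return;
    }
    dev->napi.scheduled++;
}

/* Stand-in for request_irq(): a pending interrupt fires right away. */
static void fake_request_irq(struct fake_dev *dev)
{
    if (dev->rx_irq_pending) {
        dev->rx_irq_pending = false;
        fake_rx_irq_handler(dev);
    }
}

/* Stand-in for netif_napi_add() followed by napi_enable(). */
static void fake_napi_init(struct fake_dev *dev)
{
    dev->napi.added = true;
    dev->napi.enabled = true;
}

/* Ordering described under [Impact]: IRQ requested before NAPI is ready. */
int open_irq_first(struct fake_dev *dev)
{
    fake_request_irq(dev);
    fake_napi_init(dev);
    return dev->faults;
}

/* Ordering described under [Fix]: NAPI fully initialized, then the IRQ. */
int open_napi_first(struct fake_dev *dev)
{
    fake_napi_init(dev);
    fake_request_irq(dev);
    return dev->faults;
}
```

With rx_irq_pending set before the call, open_irq_first() records one
fault and schedules no poll, while open_napi_first() records no fault
and schedules the poll once.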

[Test Case]
* Boot BF platform and bring up "oob_net0" interface
* Enable kdump completely
* Trigger kdump via "echo c > /proc/sysrq-trigger"
* There should be no exceptions from mlxbf_gige driver

[Regression Potential]
Regression potential is low, as this change brings in upstream content.

[Other]
None

Changed in linux-bluefield (Ubuntu Jammy):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1040.42 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-done-jammy-linux-bluefield'. If the problem still exists, change the tag 'verification-needed-jammy-linux-bluefield' to 'verification-failed-jammy-linux-bluefield'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield-v2 verification-needed-jammy-linux-bluefield
Tien Do (tienmdo)
Changed in linux-bluefield (Ubuntu Jammy):
status: Fix Committed → Confirmed
Tien Do (tienmdo)
Changed in linux-bluefield (Ubuntu Jammy):
status: Confirmed → Fix Committed
tags: added: verification-done-jammy-linux-bluefield
removed: verification-needed-jammy-linux-bluefield
Tien Do (tienmdo) wrote :

Verification

#################################
root@bu-lab62v2-oob:~# bfver
--/dev/mmcblk0boot0
BlueField ATF version: v2.2(release):4.7.0-17-g55e782a
BlueField UEFI version: 4.7.0-30-g91cad60
BlueField BSP version: 4.7.0.13101

OS Release Version: bf-bundle-2.7.0-11_24.04_ubuntu-22.04_dev
root@bu-lab62v2-oob:~#

#########################################
root@bu-lab62v2-oob:~# ifconfig oob_net0
oob_net0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.15.1.66 netmask 255.255.255.0 broadcast 10.15.1.255
        inet6 fe80::a288:c2ff:fe0e:865a prefixlen 64 scopeid 0x20<link>
        ether a0:88:c2:0e:86:5a txqueuelen 1000 (Ethernet)
        RX packets 202 bytes 22026 (22.0 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 65 bytes 5360 (5.3 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

root@bu-lab62v2-oob:~#

########################################
root@bu-lab62v2-oob:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0xdd000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.15.0-1040-bluefield
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.15.0-1040-bluefield
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.15.0-1040-bluefield root=UUID=0790b0b1-2540-4819-aed6-0e26caf975f5 ro console=hvc0 console=ttyAMA0 earlycon=pl011,0x13010000 fixrtc net.ifnames=0 biosdevname=0 iommu.passthrough=1 isolcpus=6,7 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
root@bu-lab62v2-oob:~#

############################################
root@bu-lab62v2-oob:~# echo c > /proc/sysrq-trigger
[ 443.856844] sysrq: Trigger a crash
[ 443.861063] Kernel panic - not syncing: sysrq triggered crash
[ 443.866799] CPU: 0 PID: 10859 Comm: bash Kdump: loaded Tainted: G OE 5.15.0-1040-bluefield #42-Ubuntu
[ 443.877215] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.7.0.13101 Apr 5 2024
[ 443.887109] Call trace:
[ 443.889540] dump_backtrace+0x0/0x200
[ 443.893194] show_stack+0x20/0x2c
[ 443.896494] dump_stack_lvl+0x68/0x84
[ 443.900145] dump_stack+0x18/0x34
[ 443.903444] panic+0x1b0/0x3a0
[ 443.906486] sysrq_reset_seq_param_set+0x0/0x9c
[ 443.911003] __handle_sysrq+0xc4/0x250
[ 443.914737] write_sysrq_trigger+0xbc/0x1a0
[ 443.918905] proc_reg_write+0xb0/0x10c
[ 443.922640] vfs_write+0xf8/0x2c4
[ 443.925941] ksys_write+0x70/0x100
[ 443.929328] __arm64_sys_write+0x24/0x30
[ 443.933235] invoke_syscall+0x78/0x100
[ 443.936971] el0_svc_common.constprop.0+0x54/0x184
[ 443.941747] do_el0_svc+0x30/0xac
[ 443.945046] el0_svc+0x48/0x160
[ 443.948175] el0t_64_sync_handler+0xa4/0x12c
[ 443.952430] el0t_64_sync+0x1a4/0x1a8
[ 443.956082] SMP: stopping secondary CPUs
[ 443.960307] Starting crashdump kernel...
[ 443.964217] Bye!
