NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
Focal | Confirmed | Undecided | Unassigned |
Bug Description
Issue Description:
We encountered a network device timeout on our server, reported by the kernel as a NETDEV WATCHDOG event. The timeout occurred on transmit queue 4 of the network interface eno12399np0, which uses the bnxt_en driver.
Error Log:
Time of Incident: May 31 03:53:35
Error Message:
NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out
WARNING: CPU: 2 PID: 0 at net/sched/
Kernel Version: 5.4.0-182-generic #202-Ubuntu
Hardware: Dell Inc. PowerEdge R650, BIOS 1.13.2 dated 12/19/2023
Modules Linked:
The full list of kernel modules loaded at the time appears in the "Modules linked in:" lines of the kernel log below. It includes networking and system management modules, which may be relevant to diagnosing the issue.
Steps Taken:
We checked the physical connections and rebooted the server, but neither resolved the issue. The network interface appears to fail sporadically, leading to these watchdog timeouts.
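One further check that might help narrow this down is reading the per-queue transmit timeout counters that the kernel exposes in sysfs. The following is a minimal sketch, assuming the `tx_timeout` attribute is present under `/sys/class/net/<iface>/queues/tx-<n>/` on the running kernel. The function name `read_tx_timeout_counters` and its `sysfs_root` parameter are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def read_tx_timeout_counters(iface, sysfs_root="/sys/class/net"):
    """Return {queue_index: timeout_count} for one network interface.

    Reads <sysfs_root>/<iface>/queues/tx-<n>/tx_timeout for every
    transmit queue. `sysfs_root` is a parameter only so the function
    can be pointed at a test directory (an assumption of this sketch).
    """
    counters = {}
    queues_dir = Path(sysfs_root) / iface / "queues"
    for queue_dir in queues_dir.glob("tx-*"):
        counter_file = queue_dir / "tx_timeout"
        if not counter_file.is_file():
            continue  # attribute not available for this queue
        index = int(queue_dir.name.split("-", 1)[1])
        counters[index] = int(counter_file.read_text().strip())
    # Sort by queue index so the output is stable and easy to compare.
    return dict(sorted(counters.items()))


if __name__ == "__main__":
    # Interface name taken from this report; adjust for other hosts.
    for queue, count in read_tx_timeout_counters("eno12399np0").items():
        print(f"tx-{queue}: {count} timeout(s)")
```

Comparing the counters before and after an incident would show whether the timeouts are confined to queue 4 or spread across several queues.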
Questions:
Has anyone experienced similar issues with the bnxt_en driver or similar hardware configurations?
Are there known issues with this driver version on Ubuntu 20.04 LTS that could lead to transmit queue timeouts?
Any recommendations on driver updates, kernel patches, or configuration changes that could help mitigate this problem?
Additional Context:
The server is critical to our operations, handling high network traffic loads.
This is the first occurrence after a recent system update.
Request for Assistance:
Insights on debugging further at the kernel level or specific logs that would be useful to examine.
Suggestions for temporary workarounds or permanent fixes from community members with experience in network management and kernel troubleshooting.
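To see whether the timeout recurs on the same queue, the kernel log can be scanned for watchdog messages. The following is a minimal sketch that assumes the message format shown in this report (`NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out`). The function name `count_tx_timeouts` is hypothetical, and the sample lines are copied from the log in this report.

```python
import re
from collections import Counter

# Matches the watchdog message format seen in this report:
#   NETDEV WATCHDOG: <iface> (<driver>): transmit queue <n> timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def count_tx_timeouts(lines):
    """Count watchdog timeouts per (interface, driver, queue)."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            key = (match["iface"], match["driver"], int(match["queue"]))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 31 03:53:35 onf-hk-comp006 kernel: [16160.756415] "
        "NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out",
        "May 31 03:53:35 onf-hk-comp006 kernel: [16160.756771] "
        "bnxt_en 0000:31:00.0 eno12399np0: TX timeout detected, "
        "starting reset task!",
    ]
    for (iface, driver, queue), n in count_tx_timeouts(sample).items():
        print(f"{iface} ({driver}) queue {queue}: {n} timeout(s)")
```

The same function can be fed the lines of a saved log file, for example `count_tx_timeouts(open(path))`, where `path` points at whichever file holds the kernel messages on the affected host.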
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756411] ------------[ cut here ]------------
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756415] NETDEV WATCHDOG: eno12399np0 (bnxt_en): transmit queue 4 timed out
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756450] WARNING: CPU: 2 PID: 0 at net/sched/
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756452] Modules linked in: nf_conntrack_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756505] br_netfilter bridge ramoops efi_pstore reed_solomon stp llc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid1 raid0 multipath linear dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor mgag200 drm_vram_helper i2c_algo_bit ttm hid_generic drm_kms_helper syscopyarea raid6_pq sysfillrect sysimgblt libcrc32c usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel fb_sys_fops aesni_intel crypto_simd cryptd nvme glue_helper ahci drm nvme_core bnxt_en tg3 i2c_i801 libahci wmi
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756543] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.0-182-generic #202-Ubuntu
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756546] Hardware name: Dell Inc. PowerEdge R650/0FGCWW, BIOS 1.13.2 12/19/2023
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756551] RIP: 0010:dev_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756556] Code: eb 9d 48 8b 5d d0 c6 05 ba 7c 2a 01 01 48 89 df e8 25 ae fa ff 44 89 e1 48 89 de 48 c7 c7 80 a6 20 b4 48 89 c2 e8 be 46 14 00 <0f> 0b e9 77 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756559] RSP: 0018:ffffae5740
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756562] RAX: 0000000000000000 RBX: ffff9ead25d40000 RCX: 0000000000000006
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756564] RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff9ead3f65c8c0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756566] RBP: ffffae574017ce70 R08: 000000000000094a R09: 0000000000000004
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756567] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000004
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756569] R13: ffff9ead25d4dbc0 R14: 000000000000004a R15: ffff9ead25d40480
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756572] FS: 000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756574] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756576] CR2: 00007f311800b3c0 CR3: 0000003f1c522004 CR4: 0000000000762ee0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756578] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756580] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756581] PKRU: 55555554
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756583] Call Trace:
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756586] <IRQ>
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756596] ? show_regs.
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756603] ? __warn+0x98/0xe0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756607] ? dev_watchdog+
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756613] ? report_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756621] ? do_error_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756624] ? do_invalid_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756628] ? dev_watchdog+
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756634] ? invalid_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756638] ? dev_watchdog+
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756641] ? dev_watchdog+
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756645] ? pfifo_fast_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756652] call_timer_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756658] __run_timers.
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756663] ? timerqueue_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756668] ? enqueue_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756671] ? ktime_get+0x3e/0xa0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756676] run_timer_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756682] __do_softirq+
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756687] irq_exit+0xae/0xb0
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756692] smp_apic_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756697] apic_timer_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756699] </IRQ>
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756706] RIP: 0010:cpuidle_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756710] Code: ff e8 cf 06 83 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 f2 1e 89 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756712] RSP: 0018:ffffae5740
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756715] RAX: ffff9ead3f66ff00 RBX: ffffffffb4969be0 RCX: 000000000000001f
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756717] RDX: 0000000000000000 RSI: 000000002dd27b80 RDI: 0000000000000000
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756718] RBP: ffffae5740397e78 R08: 00000eb2b824f134 R09: 000000007fffffff
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756720] R10: ffff9ead3f66ebc0 R11: ffff9ead3f66eba0 R12: ffff9ead33291800
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756722] R13: 0000000000000002 R14: 0000000000000002 R15: ffff9ead33291800
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756728] ? cpuidle_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756733] cpuidle_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756739] call_cpuidle+
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756742] do_idle+0x1dd/0x270
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756747] cpu_startup_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756754] start_secondary
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756760] secondary_
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756764] ---[ end trace 73ce74318a7baae1 ]---
May 31 03:53:35 onf-hk-comp006 kernel: [16160.756771] bnxt_en 0000:31:00.0 eno12399np0: TX timeout detected, starting reset task!
Targeting this bug report to Focal for now, as the reported logs indicate that the running kernel is 5.4.0-182-generic.