tg3 transmit timed out, resetting
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
Hi,
Since earlier this year (sorry, I don't have an exact date) I have been encountering regular network hangs on multiple HPE servers that I manage. The hangs occur under high network load and are followed by a message like "tg3 0000:02:00.0 eno1: transmit timed out, resetting".
The problem already started on the original 20.04 kernel and still occurs on the current HWE kernel. Affected machines are HPE ProLiant ML30 Gen9, DL20 Gen9, and Microserver Gen8. The frequency of the problem seems to increase as time passes.
There is a long-standing upstream ticket at https:/
I'll post log messages in the comments.
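For anyone triaging the same symptom, here is a minimal sketch for counting these reset messages per interface in saved kernel log text. The function name `count_tx_timeouts` and the regular expression are assumptions modelled on the message quoted above; they are not part of the tg3 driver or of any existing tool.

```python
import re
from collections import Counter

# Pattern modelled on the message quoted in the description:
# "tg3 0000:02:00.0 eno1: transmit timed out, resetting"
TX_TIMEOUT = re.compile(
    r"tg3 (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<iface>\S+): transmit timed out, resetting"
)


def count_tx_timeouts(log_text):
    """Count tg3 transmit-timeout resets per (PCI address, interface)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TX_TIMEOUT.search(line)
        if match:
            counts[(match["pci"], match["iface"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample line, shaped like the message in the description.
    sample = "kernel: tg3 0000:02:00.0 eno1: transmit timed out, resetting\n"
    print(count_tx_timeouts(sample))
```

It could be fed the text of a saved kernel log (for example, output of `journalctl -k` written to a file) to see whether the reset frequency really increases over time.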
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Oct 29 13:56 seq
crw-rw---- 1 root audio 116, 33 Oct 29 13:56 timer
AplayDevices: aplay: device_list:276: no soundcards found...
ApportVersion: 2.20.11-
Architecture: amd64
ArecordDevices: arecord: device_list:276: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckR
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2020-04-26 (559 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant MicroServer Gen8
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.187.20
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.11.0-38-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: False
dmi.bios.date: 04/04/2019
dmi.bios.vendor: HP
dmi.bios.version: J06
dmi.chassis.type: 7
dmi.chassis.vendor: HP
dmi.ec.
dmi.modalias: dmi:bvnHP:
dmi.product.family: ProLiant
dmi.product.name: ProLiant MicroServer Gen8
dmi.product.sku: 712317-421
dmi.sys.vendor: HP
Example 1:
Nov 04 17:34:59 <hostname> kernel: ------------[ cut here ]------------
Nov 04 17:34:59 <hostname> kernel: NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
Nov 04 17:34:59 <hostname> kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24f/0x260
Nov 04 17:34:59 <hostname> kernel: Modules linked in: rpcsec_gss_krb5 xt_nat veth xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_use>
Nov 04 17:34:59 <hostname> kernel: raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel m>
Nov 04 17:34:59 <hostname> kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.11.0-38-generic #42~20.04.1-Ubuntu
Nov 04 17:34:59 <hostname> kernel: Hardware name: HP ProLiant ML30 Gen9/ProLiant ML30 Gen9, BIOS U23 04/04/2019
Nov 04 17:34:59 <hostname> kernel: RIP: 0010:dev_watchdog+0x24f/0x260
Nov 04 17:34:59 <hostname> kernel: Code: 07 78 fd ff eb ab 4c 89 ff c6 05 33 22 ee 00 01 e8 26 3c fa ff 44 89 e9 4c 89 fe 48 c7 c7 f0 4>
Nov 04 17:34:59 <hostname> kernel: RSP: 0018:ffffa82b80174e88 EFLAGS: 00010282
Nov 04 17:34:59 <hostname> kernel: RAX: 0000000000000000 RBX: ffff94c7927c8500 RCX: 0000000000000027
Nov 04 17:34:59 <hostname> kernel: RDX: 0000000000000027 RSI: 0000000100012071 RDI: ffff94cbebd98ac8
Nov 04 17:34:59 <hostname> kernel: RBP: ffffa82b80174eb8 R08: ffff94cbebd98ac0 R09: ffffa82b80174c48
Nov 04 17:34:59 <hostname> kernel: R10: 000000000113d6b0 R11: 000000000113d790 R12: 0000000000000005
Nov 04 17:34:59 <hostname> kernel: R13: 0000000000000000 R14: ffff94c7931cc4c0 R15: ffff94c7931cc000
Nov 04 17:34:59 <hostname> kernel: FS: 0000000000000000(0000) GS:ffff94cbebd80000(0000) knlGS:0000000000000000
Nov 04 17:34:59 <hostname> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 17:34:59 <hostname> kernel: CR2: 000056452bee1a38 CR3: 000000052a610003 CR4: 00000000003706e0
Nov 04 17:34:59 <hostname> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 04 17:34:59 <hostname> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 04 17:34:59 <hostname> kernel: Call Trace:
Nov 04 17:34:59 <hostname> kernel: <IRQ>
Nov 04 17:34:59 <hostname> kernel: ? pfifo_fast_enqueue+0x150/0x150
Nov 04 17:34:59 <hostname> kernel: call_timer_fn+0x2e/0x100
Nov 04 17:34:59 <hostname> kernel: __run_timers.part.0+0x1e0/0x250
Nov 04 17:34:59 <hostname> kernel: ? lapic_next_deadline+0x2c/0x40
Nov 04 17:34:59 <hostname> kernel: ? clockevents_program_event+0x8f/0xe0
Nov 04 17:34:59 <hostname> kernel: run_timer_softirq+0x2a/0x50
Nov 04 17:34:59 <hostname> kernel: __do_softirq+0xe0/0x29b
Nov 04 17:34:59 <hostname> kernel: asm_call_irq_on_stack+0x12/0x20
Nov 04 17:34:59 <hostname> kernel: </IRQ>
Nov 04 17:34:59 <hostname> kernel: do_softirq_own_stack+0x3d/0x50
Nov 04 17:34:59 <hostname> kernel: irq_exit_rcu+0xa4/0xb0
Nov 04 17:34:59 <hostname> kernel: sysvec_apic_timer_interrupt+0x3d/0x90
Nov 04 17:34:59 <hostname> kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 04 17:34:59 <hostname> kernel: RIP: 0010:cpuidle_enter_state+0xdf/0x350
Nov 04 17:34:59 <hostname> kernel: Code: ff e8 95 a8 77 ff 80 7d d7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4...
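
When comparing traces like the one above across several machines, it may help to pull the interface, driver, and queue number out of the "NETDEV WATCHDOG" line. The following is only a sketch: `parse_watchdog` is a hypothetical helper, and its pattern is an assumption based on the single log line shown in this report.

```python
import re

# Pattern modelled on the log line in this report:
# "NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out"
WATCHDOG = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def parse_watchdog(line):
    """Return interface, driver and queue from a watchdog line, or None."""
    match = WATCHDOG.search(line)
    if match is None:
        return None
    return {
        "iface": match["iface"],
        "driver": match["driver"],
        "queue": int(match["queue"]),
    }
```

Lines that do not contain a watchdog message return `None`, so the helper can be applied to every line of a log and the non-matches filtered out.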