tg3 transmit timed out, resetting

Bug #1950046 reported by Marian Rainer-Harbach
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Hi,

Since earlier this year (sorry, I don't have an exact date), I have been encountering regular network hangs on multiple HPE servers that I manage. The hangs occur under high network load and are followed by a message like "tg3 0000:02:00.0 eno1: transmit timed out, resetting".

The problem first appeared on the original 20.04 kernel and still occurs on the current HWE kernel. Affected machines are HPE ProLiant ML30 Gen9, DL20 Gen9, and MicroServer Gen8. The frequency of the problem seems to be increasing over time.
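
To put numbers on how often the hangs occur, the watchdog messages could be counted per day. The following is only a minimal sketch: it assumes the kernel log has been exported as text in the same line format as the log excerpts in the comments of this report (timestamp, hostname, "kernel:", message). The names `WATCHDOG_RE` and `count_timeouts_per_day` are made up for this illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   Nov 04 17:34:59 <hostname> kernel: NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
# The pattern is an assumption based on the log excerpts in this report.
WATCHDOG_RE = re.compile(
    r"^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} \S+ kernel: "
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>\w+)\): "
    r"transmit queue \d+ timed out"
)


def count_timeouts_per_day(lines, driver="tg3"):
    """Count watchdog transmit timeouts per (day, interface) for one driver."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.match(line)
        if match and match.group("driver") == driver:
            counts[(match.group("day"), match.group("iface"))] += 1
    return counts


# Small usage example with lines shaped like the ones in this report.
sample = [
    "Nov 04 17:34:59 host kernel: NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out",
    "Nov 04 17:34:59 host kernel: RIP: 0010:dev_watchdog+0x24f/0x260",
    "Nov 06 15:34:38 host kernel: NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out",
]
for (day, iface), n in sorted(count_timeouts_per_day(sample).items()):
    print(day, iface, n)
```

The same function could be fed the lines of a saved kernel log instead of the sample list. Since the lines carry no year, counts from logs that span more than one year would be merged per calendar day.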

There is a long-standing upstream ticket at https://bugzilla.kernel.org/show_bug.cgi?id=12877.

I'll post log messages in the comments.
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Oct 29 13:56 seq
 crw-rw---- 1 root audio 116, 33 Oct 29 13:56 timer
AplayDevices: aplay: device_list:276: no soundcards found...
ApportVersion: 2.20.11-0ubuntu27.21
Architecture: amd64
ArecordDevices: arecord: device_list:276: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CasperMD5CheckResult: pass
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2020-04-26 (559 days ago)
InstallationMedia: Ubuntu-Server 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: HP ProLiant MicroServer Gen8
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.11.0-38-generic root=/dev/mapper/svr2--vg-root ro maybe-ubiquity
ProcVersionSignature: Ubuntu 5.11.0-38.42~20.04.1-generic 5.11.22
RelatedPackageVersions:
 linux-restricted-modules-5.11.0-38-generic N/A
 linux-backports-modules-5.11.0-38-generic N/A
 linux-firmware 1.187.20
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.11.0-38-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: False
dmi.bios.date: 04/04/2019
dmi.bios.vendor: HP
dmi.bios.version: J06
dmi.chassis.type: 7
dmi.chassis.vendor: HP
dmi.ec.firmware.release: 2.78
dmi.modalias: dmi:bvnHP:bvrJ06:bd04/04/2019:efr2.78:svnHP:pnProLiantMicroServerGen8:pvr:sku712317-421:cvnHP:ct7:cvr:
dmi.product.family: ProLiant
dmi.product.name: ProLiant MicroServer Gen8
dmi.product.sku: 712317-421
dmi.sys.vendor: HP

Marian Rainer-Harbach (marianrh) wrote :

Example 1:
Nov 04 17:34:59 <hostname> kernel: ------------[ cut here ]------------
Nov 04 17:34:59 <hostname> kernel: NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
Nov 04 17:34:59 <hostname> kernel: WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24f/0x260
Nov 04 17:34:59 <hostname> kernel: Modules linked in: rpcsec_gss_krb5 xt_nat veth xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_use>
Nov 04 17:34:59 <hostname> kernel: raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel m>
Nov 04 17:34:59 <hostname> kernel: CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.11.0-38-generic #42~20.04.1-Ubuntu
Nov 04 17:34:59 <hostname> kernel: Hardware name: HP ProLiant ML30 Gen9/ProLiant ML30 Gen9, BIOS U23 04/04/2019
Nov 04 17:34:59 <hostname> kernel: RIP: 0010:dev_watchdog+0x24f/0x260
Nov 04 17:34:59 <hostname> kernel: Code: 07 78 fd ff eb ab 4c 89 ff c6 05 33 22 ee 00 01 e8 26 3c fa ff 44 89 e9 4c 89 fe 48 c7 c7 f0 4>
Nov 04 17:34:59 <hostname> kernel: RSP: 0018:ffffa82b80174e88 EFLAGS: 00010282
Nov 04 17:34:59 <hostname> kernel: RAX: 0000000000000000 RBX: ffff94c7927c8500 RCX: 0000000000000027
Nov 04 17:34:59 <hostname> kernel: RDX: 0000000000000027 RSI: 0000000100012071 RDI: ffff94cbebd98ac8
Nov 04 17:34:59 <hostname> kernel: RBP: ffffa82b80174eb8 R08: ffff94cbebd98ac0 R09: ffffa82b80174c48
Nov 04 17:34:59 <hostname> kernel: R10: 000000000113d6b0 R11: 000000000113d790 R12: 0000000000000005
Nov 04 17:34:59 <hostname> kernel: R13: 0000000000000000 R14: ffff94c7931cc4c0 R15: ffff94c7931cc000
Nov 04 17:34:59 <hostname> kernel: FS: 0000000000000000(0000) GS:ffff94cbebd80000(0000) knlGS:0000000000000000
Nov 04 17:34:59 <hostname> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 17:34:59 <hostname> kernel: CR2: 000056452bee1a38 CR3: 000000052a610003 CR4: 00000000003706e0
Nov 04 17:34:59 <hostname> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 04 17:34:59 <hostname> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 04 17:34:59 <hostname> kernel: Call Trace:
Nov 04 17:34:59 <hostname> kernel: <IRQ>
Nov 04 17:34:59 <hostname> kernel: ? pfifo_fast_enqueue+0x150/0x150
Nov 04 17:34:59 <hostname> kernel: call_timer_fn+0x2e/0x100
Nov 04 17:34:59 <hostname> kernel: __run_timers.part.0+0x1e0/0x250
Nov 04 17:34:59 <hostname> kernel: ? lapic_next_deadline+0x2c/0x40
Nov 04 17:34:59 <hostname> kernel: ? clockevents_program_event+0x8f/0xe0
Nov 04 17:34:59 <hostname> kernel: run_timer_softirq+0x2a/0x50
Nov 04 17:34:59 <hostname> kernel: __do_softirq+0xe0/0x29b
Nov 04 17:34:59 <hostname> kernel: asm_call_irq_on_stack+0x12/0x20
Nov 04 17:34:59 <hostname> kernel: </IRQ>
Nov 04 17:34:59 <hostname> kernel: do_softirq_own_stack+0x3d/0x50
Nov 04 17:34:59 <hostname> kernel: irq_exit_rcu+0xa4/0xb0
Nov 04 17:34:59 <hostname> kernel: sysvec_apic_timer_interrupt+0x3d/0x90
Nov 04 17:34:59 <hostname> kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 04 17:34:59 <hostname> kernel: RIP: 0010:cpuidle_enter_state+0xdf/0x350
Nov 04 17:34:59 <hostname> kernel: Code: ff e8 95 a8 77 ff 80 7d d7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4...

affects: linux-meta (Ubuntu) → linux (Ubuntu)

Marian Rainer-Harbach (marianrh) wrote :

Example 2:
Nov 06 15:34:38 <hostname> kernel: ------------[ cut here ]------------
Nov 06 15:34:38 <hostname> kernel: NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
Nov 06 15:34:38 <hostname> kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24f/0x260
Nov 06 15:34:38 <hostname> kernel: Modules linked in: rpcsec_gss_krb5 usblp binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm hpilo rapl intel_cstate acpi_ip>
Nov 06 15:34:38 <hostname> kernel: ghash_clmulni_intel cryptd drm psmouse libahci xhci_pci tg3 lpc_ich xhci_pci_renesas
Nov 06 15:34:38 <hostname> kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.11.0-38-generic #42~20.04.1-Ubuntu
Nov 06 15:34:38 <hostname> kernel: Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 04/04/2019
Nov 06 15:34:38 <hostname> kernel: RIP: 0010:dev_watchdog+0x24f/0x260
Nov 06 15:34:38 <hostname> kernel: Code: 07 78 fd ff eb ab 4c 89 ff c6 05 33 22 ee 00 01 e8 26 3c fa ff 44 89 e9 4c 89 fe 48 c7 c7 f0 42 c9 8f 48 89 c2 e8 cf 5a 16 00 <0f> 0b eb 8c 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
Nov 06 15:34:38 <hostname> kernel: RSP: 0018:ffffad17002c4e88 EFLAGS: 00010282
Nov 06 15:34:38 <hostname> kernel: RAX: 0000000000000000 RBX: ffff93cd93e7bd00 RCX: 0000000000000000
Nov 06 15:34:38 <hostname> kernel: RDX: ffff93d07aa68a20 RSI: ffff93d07aa58ac0 RDI: 0000000000000300
Nov 06 15:34:38 <hostname> kernel: RBP: ffffad17002c4eb8 R08: ffff93d07aa58ac0 R09: ffffad17002c4c48
Nov 06 15:34:38 <hostname> kernel: R10: 0000000000cb9cd0 R11: 0000000000cb9db8 R12: 0000000000000005
Nov 06 15:34:38 <hostname> kernel: R13: 0000000000000000 R14: ffff93cd94af04c0 R15: ffff93cd94af0000
Nov 06 15:34:38 <hostname> kernel: FS: 0000000000000000(0000) GS:ffff93d07aa40000(0000) knlGS:0000000000000000
Nov 06 15:34:38 <hostname> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 06 15:34:38 <hostname> kernel: CR2: 00007f41920eed38 CR3: 00000000be610001 CR4: 00000000001706e0
Nov 06 15:34:38 <hostname> kernel: Call Trace:
Nov 06 15:34:38 <hostname> kernel: <IRQ>
Nov 06 15:34:38 <hostname> kernel: ? pfifo_fast_enqueue+0x150/0x150
Nov 06 15:34:38 <hostname> kernel: call_timer_fn+0x2e/0x100
Nov 06 15:34:38 <hostname> kernel: __run_timers.part.0+0x1e0/0x250
Nov 06 15:34:38 <hostname> kernel: ? lapic_next_deadline+0x2c/0x40
Nov 06 15:34:38 <hostname> kernel: ? clockevents_program_event+0x8f/0xe0
Nov 06 15:34:38 <hostname> kernel: run_timer_softirq+0x2a/0x50
Nov 06 15:34:38 <hostname> kernel: __do_softirq+0xe0/0x29b
Nov 06 15:34:38 <hostname> kernel: asm_call_irq_on_stack+0x12/0x20
Nov 06 15:34:38 <hostname> kernel: </IRQ>
Nov 06 15:34:38 <hostname> kernel: do_softirq_own_stack+0x3d/0x50
Nov 06 15:34:38 <hostname> kernel: irq_exit_rcu+0xa4/0xb0
Nov 06 15:34:38 <hostname> kernel: sysvec_apic_timer_interrupt+0x3d/0x90
Nov 06 15:34:38 <hostname> kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 06 15:34:38 <hostname> kernel: RIP: 0010:cpuidle_enter_state+0xdf/0x350
Nov 06 15:34:38 <hostname> kernel: Code: ff e8 95 a8 77 ff 80 7d d7 00 74 17 9c 58 0f 1f 44 00 0...

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1950046

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

Marian Rainer-Harbach (marianrh) wrote : CRDA.txt

apport information

tags: added: apport-collected uec-images
description: updated

Marian Rainer-Harbach (marianrh) wrote : CurrentDmesg.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : Lspci.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : Lspci-vt.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : Lsusb.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : Lsusb-t.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : Lsusb-v.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : ProcCpuinfoMinimal.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : ProcEnviron.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : ProcInterrupts.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : ProcModules.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : UdevDb.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : WifiSyslog.txt

apport information

Marian Rainer-Harbach (marianrh) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed