14e4:1687 broadcom tg3 network driver disconnects under high load
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | High | Unassigned | |
Bug Description
The tg3 Broadcom network driver that binds to the 5762 chipset goes offline and is unable to recover (even with the tg3 watchdog timeout) when network transmit is under high load. Call trace:
https:/
When this happens, usually only a reboot fixes it. Sometimes, however, bringing the interface down and up again (via ifconfig) recovers networking. I've also tested with the latest tg3 driver (Dec 2014 version) and networking is still problematic. I have also disabled TSO, GSO, etc. with ethtool and the bug still surfaces. This bug may be related to the integrated firmware.
The issue is hard to reproduce under moderate network load, so here is a procedure that triggers it:
1. Boot a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705) using an Ubuntu/Kubuntu Live CD, 14.04 to 15.04.
2. From another machine: start 5 sessions, and in each session repeatedly copy (scp with public key authentication) a 70 MB file back and forth to the tg3 machine. (Not sure if this step is necessary.)
3. Create a 1GB file on the tg3 machine, with something like dd if=/dev/urandom of=/my/test/file bs=1024 count=$
4. From another machine: repeatedly scp that 1GB file from the tg3 machine. This can be done with something like:
while [ 0 ]; do
scp -i /my/scp/private.key <email address hidden>
done;
Networking usually goes offline within about 10-30 minutes.
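The copy loop in step 4 can be wrapped so that it stops at the first failed transfer and reports how many passes succeeded, which makes it easier to compare runs. This is only a sketch: the function name `run_until_failure`, the pass limit, and the host and destination path in the usage comment are assumptions, since the real scp source is elided in the report.

```shell
#!/bin/sh
# Run a command repeatedly until it fails or a pass limit is reached,
# then print the number of successful passes.
# Usage: run_until_failure MAX_PASSES COMMAND [ARGS...]
run_until_failure() {
    max_passes=$1
    shift
    passes=0
    while [ "$passes" -lt "$max_passes" ]; do
        # Stop at the first failed pass; on an affected NIC this would be
        # the scp that times out once the link locks up.
        "$@" >/dev/null 2>&1 || break
        passes=$((passes + 1))
    done
    echo "$passes"
}

# Example (hypothetical host and paths, not taken from the report):
# run_until_failure 1000 scp -i /my/scp/private.key user@tg3-host:/my/test/file /tmp/file
```

A result lower than the pass limit means a transfer failed before the limit was reached; it does not by itself prove the NIC locked up, so dmesg should still be checked.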
WORKAROUND: Disable highdma with ethtool (see comment #9). To make the change permanent, add a udev rule in /etc/udev/
ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}
ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-
ProcVersionSign
Uname: Linux 3.19.0-15-generic x86_64
ApportVersion: 2.17.2-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
/dev/snd/
CasperVersion: 1.360
Date: Thu Apr 23 11:16:24 2015
IwConfig:
eth0 no wireless extensions.
lo no wireless extensions.
LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
ProcEnviron:
LANGUAGE=
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=
PulseList:
Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
RelatedPackageV
linux-
linux-
linux-firmware 1.143
RfKill:
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/22/2014
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: L06 v02.15
dmi.board.
dmi.board.name: 2215
dmi.board.vendor: Hewlett-Packard
dmi.chassis.
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-
dmi.product.name: HP EliteDesk 705 G1 MT
dmi.sys.vendor: Hewlett-Packard
Toan (tpham3783) wrote : | #1 |
- AlsaInfo.txt (42.9 KiB, text/plain; charset="utf-8")
- CRDA.txt (238 bytes, text/plain; charset="utf-8")
- CurrentDmesg.txt (80.8 KiB, text/plain; charset="utf-8")
- Dependencies.txt (2.7 KiB, text/plain; charset="utf-8")
- JournalErrors.txt (36.8 KiB, text/plain; charset="utf-8")
- Lspci.txt (24.1 KiB, text/plain; charset="utf-8")
- Lsusb.txt (690 bytes, text/plain; charset="utf-8")
- ProcCpuinfo.txt (2.2 KiB, text/plain; charset="utf-8")
- ProcInterrupts.txt (2.1 KiB, text/plain; charset="utf-8")
- ProcModules.txt (3.7 KiB, text/plain; charset="utf-8")
- UdevDb.txt (153.7 KiB, text/plain; charset="utf-8")
- WifiSyslog.txt (104.2 KiB, text/plain; charset="utf-8")
Brad Figg (brad-figg) wrote : Status changed to Confirmed | #2 |
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
description: | updated |
description: | updated |
Joseph Salisbury (jsalisbury) wrote : Re: broadcom tg3 network driver disconnects under high load | #3 |
Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?
Would it be possible for you to test the latest upstream kernel? Refer to https:/
If this bug is fixed in the mainline kernel, please add the following tag 'kernel-
If the mainline kernel does not fix this bug, please add the tag: 'kernel-
If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
Thanks in advance.
[0] http://
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
status: | Confirmed → Incomplete |
Toan (tpham3783) wrote : | #4 |
Joseph,
>Did this issue start happening after an update/upgrade?
No, I have always had this issue. I tested with multiple OSes and kernel versions: kernel 2.6.39
and three Ubuntu live CDs, 12.04, 14.04, and 15.04 (which was released today). I will, however, consider testing with kernel 4.x.
>Was there a prior kernel version where you were not having this particular problem?
No
Toan (tpham3783) wrote : | #5 |
Please note, this bug is unrelated to Bug #1331513 because even with TSO, GSO, etc. disabled, I can still reproduce it. The lock-up only occurs under VERY_HIGH_
System Information
Product Name: HP EliteDesk 705 G1 MT
Version:
Serial Number: 2UA5041TG4
UUID: E24D7A80-
Wake-up Type: Power Switch
SKU Number: K5U61UP#ABA
Family: 103C_53307F G=D
Here is the state of the network interface when the tigon3 driver completely locked up. Attached file is the dmesg log.
eth0 Link encap:Ethernet HWaddr 64:51:06:47:82:8a
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:90235313784 errors:30064771065 dropped:7 overruns:0 frame:120259084260
TX packets:90387363107 errors:30064771065 dropped:0 overruns:0 carrier:0
RX bytes:32978848243 (32.9 GB) TX bytes:321345086545 (321.3 GB)
PS: I just compiled linux-stable 4.0 trunk; I will try to run it and report back soon.
tags: | added: latest-bios-2.15 |
tags: | added: trusty |
Toan (tpham3783) wrote : | #6 |
tags: | added: bcm5762 broadcom kernel-bug-exists-upstream linux-4.0 lucid tg3 tigon |
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
penalvch (penalvch) wrote : | #7 |
Toan, the issue you are reporting is an upstream one. Could you please report this problem to the appropriate mailing list (netdev) by following the instructions verbatim at https:/
Please provide a direct URL to your e-mail to the mailing list once you have made it so that it may be tracked via http://
Thank you for your understanding.
tags: |
added: kernel-bug-exists-upstream-4.0 removed: bcm5762 broadcom linux-4.0 tg3 tigon |
Changed in linux (Ubuntu): | |
status: | Confirmed → Triaged |
summary: |
- broadcom tg3 network driver disconnects under high load + 14e4:1687 broadcom tg3 network driver disconnects under high load |
Toan (tpham3783) wrote : | #8 |
Here is the bug report email to netdev mailing list:
Lauri Võsandi (v6sa) wrote : | #9 |
Hi, disabling highdma with ethtool seems to work around the issue. I've added the following udev rule to make the change permanent in /etc/udev/
ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}
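For a one-off test before writing a udev rule, the feature can be toggled and checked by hand. The sketch below assumes the standard ethtool feature name `highdma` and an interface called `eth0`; the helper name `highdma_state` is made up for illustration, and the udev rule itself is not reconstructed here because it is truncated above.

```shell
#!/bin/sh
# Read `ethtool -k IFACE` output on stdin and print the highdma state
# ("on" or "off"), ignoring any trailing "[fixed]" marker.
highdma_state() {
    awk -F': ' '
        { sub(/^[ \t]+/, "") }          # drop indentation on sub-features
        $1 == "highdma" { split($2, w, " "); print w[1] }
    '
}

# Typical use (interface name is an assumption; needs root to change):
#   sudo ethtool -K eth0 highdma off
#   ethtool -k eth0 | highdma_state
```

The setting made with `ethtool -K` does not survive a reboot or a driver reload, which is why a udev rule or similar hook is needed to make it permanent.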
Toan (tpham3783) wrote : | #10 |
Thank you for your valuable finding. I'll test your suggestion in the next few days to confirm that it works.
I've also reported the work-around to Broadcom dev team and suggested a patch to the tg3 driver to disable highdma. I'll keep you updated on the issue... thank you once again.
Toan (tpham3783) wrote : | #11 |
Lauri,
Can you let me know whether you tested the workaround on a 64-bit or a 32-bit OS? AFAIK, the HIGHMEM option only allows DMA support on 64-bit systems (>4GB), so I don't think it would make a difference if the native OS is 32-bit. I am asking because I've reproduced the bug on both 32-bit and 64-bit systems, so I just don't see how disabling highdma on a 32-bit system would resolve the issue. Regardless, I will try the workaround on a 32-bit system soon.
Lauri Võsandi (v6sa) wrote : | #12 |
Hi,
I am running a 64-bit system. The machine didn't hiccup in ~36 hours, so we stopped testing there; before the change I would hit a connection drop within hours, 8 hours tops. For the test I had scp copying data inbound and outbound, and in addition YouTube was playing in several browser tabs. Higher memory usage seemed to trigger the bug faster.
Toan (tpham3783) wrote : | #13 |
Lauri,
I've pumped over 1.5TB of data and have not seen the hiccup yet. I think we've found the smoking gun. Below is a simple patch to the tigon device driver if you prefer not to use the udev rule solution.
I believe the root cause is that the tigon net driver uses virtual memory for DMA transfers. All DMA transfers should be remapped to logical memory using dma_map_page() in order for HIGHDMA feature to work. Broadcom will look into this and hopefully, the bug will be fixed upstream soon... Thanks again...
--- linux-2.
+++ linux-2.
@@ -18992,6 +18992,12 @@
+ /* pham, patch 5762 chip */
+ if (tp->pdev->device == 0x1687 || tg3_asic_rev(tp) == ASIC_REV_5762){
+ printk("tg3: disable HIGHDMA for tigon3 device 5762\r\n");
+ dev->features &= ~NETIF_F_HIGHDMA;
+ }
+
/* 5700 B0 chips do not support checksumming correctly due
* to hardware bugs.
*/
Toan (tpham3783) wrote : | #14 |
It is confirmed: disabling HIGHDMA fixed the NIC problem. This was tested by putting a system under load for 120+ hours and pushing over 12TB of data through the tg3 NIC. Great find Lauri, and thank you again!
description: | updated |
Changed in linux (Ubuntu): | |
importance: | Medium → High |
Lauri Võsandi (v6sa) wrote : | #15 |
Hello, it seems that when running a graphical user interface with highdma off, a similar problem persists:
NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
[...]
irq 18: nobody cared (try booting with the "irqpoll" option)
After that the device goes offline and can't be brought up again with rmmod/modprobe, and mouse movement becomes jerky. The problem appears sooner if you play around with Firefox etc. I tried booting with irqpoll: the connection still drops, but the module can be reloaded and the mouse isn't jerky. I tried this with the 3.18.25 and 4.4.2 kernels; both exhibited similar behaviour.
chriscrutch (chriscrutch) wrote : | #16 |
Any chance there's been any movement on this bug? It's really a pain for me. Disabling HIGHDMA helped a bit, but now it seems to kick in at different times. The bandwidth use doesn't seem to be an issue anymore, but now it disconnects with heavy data transfer to USB. It kicks in when performing large backups to an external hard drive, and when copying large video files to a SD card attached with a USB adapter.
Daniel (dkim-b) wrote : | #17 |
I am having the issue as well on kernel 4.4.0-66 (x64). Disabling HighDMA did not fix anything on my end and I cannot figure out what will trigger the issue. It seems to occur randomly and even if there is no active network traffic.
gadi (gadieid) wrote : | #18 |
It happened to me as well on a ProLiant 360 Gen 9 running Ubuntu 16.04.2 with the 4.4.0-72-generic kernel
ifconfig -a didn't show any eno devices
3 identical servers (HW and SW) had no problem at all
a simple modprobe tg3 command and all eno devices (1-4) appeared
Jorge Joaquim Gomes Silva (jorgej) wrote : | #19 |
Any fix to this bug?
I have the same problem: Ubuntu 16.04 LTS, kernel: 4.8.0-46-generic. Same problem in Debian 9 kernel 4.9.
tags: |
added: bios-outdated-2.28 removed: latest-bios-2.15 |
description: | updated |
tags: |
added: kernel-bug-exists-upstream-4.11 removed: kernel-bug-exists-upstream-4.0 |
tags: | added: xenial |
luc (glarage) wrote : | #20 |
HP EliteDesk 705 G1 SFF with NetXtreme BCM5762 Gigabit Ethernet PCIe
FTTH user here. I lose the Ethernet connection under high load (speedtest or YouTube) and, like other users, I had to reboot. The only workaround I found is to run [sudo ethtool -s eno1 speed 100 duplex full autoneg on] after a reboot; the network then works, but not at my full bandwidth.
Lubuntu 17.04 with 4.12.0-
Kai-Heng Feng (kaihengfeng) wrote : | #21 |
If this happens on mainline kernel, please file an upstream bug at https:/
Jorge Joaquim Gomes Silva (jorgej) wrote : | #22 |
Hi,
I have the same issue with Ubuntu 17.04, kernel 4.10.0.19. Any suggestions for fixing this problem, besides reducing the speed of the interface?
Roger Techima (techima) wrote : | #23 |
Hello,
I am having the same problem on an HP EliteDesk 705 G2 Desktop Mini.
I tried the highdma off solution on 14.04, 16.04 and 17.04, but it didn't solve the bug for me.
I am running on a 100Mbit network. On gigabit it seems to work, but I have not tested for long enough to be sure.
Best regards,
Roger
Kai-Heng Feng (kaihengfeng) wrote : | #24 |
FWIW, I can't reproduce the issue on the same chip. I used iperf instead of scp though.
luc (glarage) wrote : | #25 |
hi guys,
Not a fix, but it did the trick: add iommu=soft to your GRUB kernel command line.
You will have a fully working Ethernet connection.
Mine looks like this = GRUB_CMDLINE_
Afterwards you have to update GRUB, as usual.
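After rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming the running kernel's command line is read from /proc/cmdline; the helper name `has_kernel_param` is hypothetical:

```shell
#!/bin/sh
# Succeed if the kernel command line (second argument) contains the
# exact parameter given as the first argument.
has_kernel_param() {
    param=$1
    cmdline=$2
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Typical use on the running system:
#   has_kernel_param iommu=soft "$(cat /proc/cmdline)" && echo "iommu=soft is active"
```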
Why?
Because of these lines in dmesg after I updated my BIOS (BIOS L06 v02.28 02/07/2017)=
[ 108.769354] psmouse serio1: Wheel Mouse at isa0060/
[ 108.903961] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 108.945302] tg3 0000:03:00.0 eno1: Link is down
[ 109.305448] AMD-Vi: Event logged [
[ 109.305454] IO_PAGE_FAULT device=03:00.0 domain=0x000d address=
[ 109.305459] AMD-Vi: Event logged [
[ 109.305460] IO_PAGE_FAULT device=03:00.0 domain=0x000d address=
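When comparing kernels or workarounds, it helps to count the failure signatures seen so far in this thread rather than reading dmesg by eye. The sketch below looks for three strings quoted in the comments above (the NETDEV WATCHDOG timeout for tg3, `tg3_stop_block timed out`, and `IO_PAGE_FAULT`); the function name `count_tg3_faults` is made up, and the pattern list is an assumption about which lines matter, not an official list from the driver.

```shell
#!/bin/sh
# Read kernel log text on stdin and print how many lines match one of
# the tg3 failure signatures reported in this thread.
count_tg3_faults() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function usable in scripts that run with "set -e".
    grep -c -E 'NETDEV WATCHDOG: .*\(tg3\)|tg3_stop_block timed out|IO_PAGE_FAULT' || true
}

# Typical use:
#   dmesg | count_tg3_faults
```

Note that `IO_PAGE_FAULT` can be logged for devices other than the NIC, so a non-zero count should be checked against the PCI address of the tg3 device.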
Yngvi Hrafn Pétursson (skuti) wrote : | #26 |
Having the same issue on an HP EliteDesk 705 G3 Desktop Mini (W4V44AV)
Broadcom Corporation NetXtreme BCM5762 Gigabit Ethernet PCIe (rev 10) and tg3 module
The error is triggered after the link speed is set or negotiated to 100Mbps,
usually within 15 seconds of pinging over the 100Mbps link,
but it works OK with 1Gbps links.
It can be triggered by plugging into a 100Mbps port, changing the switch port to 100Mbps, or:
# ethtool -s eno1 speed 100 duplex full autoneg off
Netboot works until the tg3 module takes over.
Windows works OK.
Tested:
- multiple cables, computers and switch vendors
- upgrading bios
- ethtool disable eee and hardware offload
- ubuntu 12.04 - 17.04
- new kernel linux-generic-
- disable power management in bios
- disable power management with GRUB switches
- iommu=soft iommu=on iommu=off
- disable highdma
None of the workarounds that i found on Google worked for me.
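Because this variant of the failure follows the negotiated link speed, it is useful to record the speed alongside each test result. A small sketch, assuming the usual `Speed:` line in plain `ethtool IFACE` output; the helper name `link_speed` is hypothetical:

```shell
#!/bin/sh
# Read `ethtool IFACE` output on stdin and print the negotiated speed,
# for example "100Mb/s" or "1000Mb/s".
link_speed() {
    awk -F': ' '
        { sub(/^[ \t]+/, "") }      # ethtool indents its settings lines
        $1 == "Speed" { print $2 }
    '
}

# Typical use (interface name as in the comment above):
#   ethtool eno1 | link_speed
```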
modinfo tg3 | grep -v alias
filename: /lib/modules/
firmware: tigon/tg3_tso5.bin
firmware: tigon/tg3_tso.bin
firmware: tigon/tg3.bin
version: 3.137
license: GPL
description: Broadcom Tigon3 ethernet driver
author: David S. Miller (<email address hidden>) and Jeff Garzik (<email address hidden>)
srcversion: 8C06FB0EBBF221D
depends: ptp
intree: Y
vermagic: 4.4.0-92-generic SMP mod_unload modversions
parm: tg3_debug:Tigon3 bitmapped debugging message enable value (int)
Paulo Abadie Guedes (paulo.guedes) wrote : | #27 |
Hello, I have seen exactly the same issue, with exactly the same hardware you have: the HP EliteDesk 705 G3 Desktop Mini.
I've tested already a ton of options, including recompiling the latest kernel, booting with several parameters, and so on and so forth. Got nothing more than a big headache. I have 100+ machines to install in a month and my team is having a really hard time to deal with this issue.
I have posted my findings on the fog forums. Fog is an open-source cloning tool. Please check it out:
Any ideas on this bug? It seems to be related to 10/100 switches. If both ends are gigabit, it works much more reliably. Problems still arise, but much less frequently. With my old "fast ethernet" switch, the problem always happens.
It's lurking somewhere between the binary blob (the firmware), the kernel driver, the hardware, or any tricky combination of these. Perhaps it is related to the AMD platform.
I can run tests or gather more data, if it helps. The issue always happens here.
Any ideas on how to solve or workaround this issue? Patches or parameters are welcome...
Regards,
Paulo
Tessio Fechine (tessiof) wrote : | #28 |
There was a commit to fix something about the BCM5762 variant, but it seems to be restricted to DELL servers..
https:/
Kai-Heng Feng (kaihengfeng) wrote : Re: [Bug 1447664] Re: 14e4:1687 broadcom tg3 network driver disconnects under high load | #29 |
> On 11 Jan 2018, at 9:23 PM, Tessio Fechine <email address hidden> wrote:
>
> There was a commit to fix something about the BCM5762 variant, but it seems to be restricted to DELL servers..
> https:/
Can you try it without the if block?
If you don’t know how to compile kernel, I can build kernel package.
>
> --
> You received this bug notification because you are subscribed to linux
> in Ubuntu.
> https:/
>
> Title:
> 14e4:1687 broadcom tg3 network driver disconnects under high load
>
> Status in linux package in Ubuntu:
> Triaged
> Status in linux package in Debian:
> New
>
Tessio Fechine (tessiof) wrote : | #30 |
If you point me to the kernel package I can try it..
Kai-Heng Feng (kaihengfeng) wrote : | #31 |
There you go:
http://
Yngvi Hrafn Pétursson (skuti) wrote : | #32 |
I tested this kernel but was unable to mount the hard disk.
Missing modules for HP EliteDesk 705 G3 Desktop Mini?
Kai-Heng Feng (kaihengfeng) wrote : | #33 |
Probably. I built a new one, please give it a try:
http://
Yngvi Hrafn Pétursson (skuti) wrote : | #34 |
This kernel works on the HP box i have.
Tested with Firefox and speedtest.net.
Tested with iperf3 on 1Gpbs, 100Mbps full-duplex and 100Mbps half-duplex.
No timeouts or errors in dmesg :)
Tessio Fechine (tessiof) wrote : | #35 |
tg3 still crashing..
[ 301.753501] tg3 0000:01:00.0 eno1: Link is up at 100 Mbps, full duplex
[ 301.753546] tg3 0000:01:00.0 eno1: Flow control is off for TX and off for RX
[ 301.753551] tg3 0000:01:00.0 eno1: EEE is disabled
[ 312.032110] NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
[ 312.032190] ------------[ cut here ]------------
[ 312.032208] WARNING: CPU: 1 PID: 0 at /home/khfeng/
[ 312.032209] Modules linked in: rfcomm bnep nls_iso8859_1 edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel btusb joydev btrtl btbcm btintel input_leds snd_hda_
[ 312.032305] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-17-generic #20~lp1447664
[ 312.032307] Hardware name: HP HP EliteDesk 705 G2 MINI/805B, BIOS N26 Ver. 02.11 11/01/2016
[ 312.032310] task: ffff88952c81c500 task.stack: ffff9df2c19c4000
[ 312.032314] RIP: 0010:dev_
[ 312.032317] RSP: 0018:ffff88953e
[ 312.032320] RAX: 0000000000000037 RBX: 0000000000000000 RCX: 0000000000000000
[ 312.032322] RDX: 0000000000000000 RSI: ffff88953ec96598 RDI: ffff88953ec96598
[ 312.032323] RBP: ffff88953ec83e80 R08: 0000000000000001 R09: 00000000000003bf
[ 312.032325] R10: ffff88953ec83ee0 R11: 0000000000000000 R12: 0000000000000005
[ 312.032327] R13: 0000000000000001 R14: ffff8895226ea000 R15: ffff889521856d80
[ 312.032330] FS: 000000000000000
[ 312.032333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 312.032334] CR2: 00000000021a0008 CR3: 00000003a6126000 CR4: 00000000001406e0
[ 312.032337] Call Trace:
[ 312.032341] <IRQ>
[ 312.032349] ? qdisc_rcu_
[ 312.032358] call_timer_
[ 312.032361] run_timer_
[ 312.032367] ? ktime_get+0x40/0xa0
[ 312.032371] ? lapic_next_
[ 312.032377] __do_softirq+
[ 312.032382] irq_exit+0xb6/0xc0
[ 312.032385] smp_apic_
[ 312.032388] apic_timer_
[ 312.032390] </IRQ>
[ 312.032397] RIP: 0010:cpuidle_
[ 312.032399] RSP: 0018:ffff9df2c1
[ 312.032402] RAX: ffff88953eca2c40 RBX: 00000048a68f835f RCX: 000000000000001f
[ 312.032403] RDX: 00000048a68f835f RSI: fffffffb76b082a3 RDI: 0000000000000000
[ 312.032405] RBP: ffff9df2c19c7eb0 R08: 0000000000000858 R09: 0000000000000861
[ 312.032407] R10: ffff9df2c19c7e40 R11: 0000000000000643 R12: ffff8895...
Kai-Heng Feng (kaihengfeng) wrote : | #36 |
Taking a deeper look, I don't think [1] will help the situation. It is mainly meant to solve an issue with jumbo frames.
I think it's better to ask HP and Broadcom to fix the issue.
Paulo Abadie Guedes (paulo.guedes) wrote : | #37 |
Hello, I am still having this bug. I'm working with several HP machines, with the same model as Yngvi. Here it is (from dmesg messages):
Hardware name: HP HP EliteDesk 705 G3 Brazil Desktop Mini/8266, BIOS P26 Ver. 02.03 12/22/2016
Interesting to notice that it always happens with a 10/100 switch, but never occurs with a gigabit one.
I've compiled and tested the 4.15.0-rc8 release candidate, which has the commit 4419bb1cedcda02
I also tried to force the patch by keeping the "if body" and removing the condition, just to see what happens (with another printk to prove that it runs). The code runs (limiting MRRS to 2048, I think), but it does not solve the bug.
It complains that TSC is unstable, right after tg3 breaks. Here is a dmesg snippet, maybe it helps.
<...>
[ 155.816404] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 155.816447] clocksource: 'refined-jiffies' wd_now: fffdcbf3 wd_last: fffdc110 mask: ffffffff
[ 155.816490] clocksource: 'tsc' cs_now: 7d3f16e620 cs_last: 7b2987b172 mask: ffffffffffffffff
[ 155.816533] tsc: Marking TSC unstable due to clocksource watchdog
[ 155.939181] tg3 0000:01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 156.103998] tg3 0000:01:00.0 eth0: Link is down
[ 156.322988] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 156.323040] sched_clock: Marking unstable (156322980975, 5436)<-
[ 156.323144] clocksource: Switched to clocksource refined-jiffies
<...>
If you want to take a deeper look, there are a few logs here. Tried also with "tsc=unstable" and other boot parameters, mostly to see if any would help (feeling lucky, perhaps?). Nothing changed, the bug is still in here. They show mostly the same messages, to me.
log_01_acpi_off.txt
https:/
log_02_
https:/
log_03_
https:/
log_04_
https:/
log_05_
https:/
Well, any ideas? I can reproduce the problem 100% of the time. Would you like me to test any other patch?
Kai-Heng Feng, you mention "it's better to ask HP and Broadcom to fix the issue". I agree, but how can we do that?
Thank you,
Paulo
Kai-Heng Feng (kaihengfeng) wrote : | #38 |
First please file an upstream bug at https:/
Product: Drivers
Component: Network
Also, looks like it's a Ubuntu certified hardware, let me ask around.
Paulo Abadie Guedes (paulo.guedes) wrote : | #39 |
Hello, I would like to confirm that it's useful to file a new bug for this
issue. For me, the problem I'm having is the same as we are discussing in
this thread. Would it be just a duplicate?
Maybe I'm missing something, because I don't know the details of the bug
hunting process for Ubuntu.
Can you please confirm I should open it?
In this case, I can add a detailed description and dmesg logs, with debug
on and the timeout error message inside.
Anyway, I want to report advances in this problem. I have tested a few
kernels and patches in the last weeks, and have found one combination that
does solve the issue.
I also checked that this patch is not yet merged into the latest vanilla
stable kernel, version 4.15, released three days ago. But it applies to and
works with 4.15 as well, which is just great (at least for me).
Will send the details later (or tomorrow), as soon as I get back to my
computer.
Paulo
On Jan 29, 2018 12:54 AM, "Kai-Heng Feng" <email address hidden>
wrote:
> First please file an upstream bug at https:/
> Product: Drivers
> Component: Network
>
> Also, looks like it's a Ubuntu certified hardware, let me ask around.
Paulo Abadie Guedes (paulo.guedes) wrote : | #40 |
Hello, this thread has a patch that solved the bug (for me).
https://<email address hidden>
The patch is here:
https://<email address hidden>
I tested this patch on the following kernels and situations.
1) Stable kernels 4.13.3 and 4.15 crash without the patch (plus all other versions tested). Patch is not merged yet in the main linux branch, until (and including) 4.15 (stable).
2) Stable kernels 4.13.3 and 4.15 work great with the patch: no timeouts on tg3. Fast transfers on gigabit links and 10/100 links.
3) I wrote to the patch author on Jan 31 (10 days ago), mentioned my results, and asked when it will be merged. Still waiting; the author is probably quite busy at the moment.
4) A lot of tests performed during weeks. The last session took about one or two weeks, working full time, on an isolated network. Using the fog open source cloning solution. Several hundreds of GB transferred during tests, for cloning 100+ machines inside a few labs. Both single and multicast cloning sessions used. Tested with a gigabit switch and also with 10/100 switches. Checked both single and multicast, sequential tests, in parallel, with/without power failures, with/without several patches, in many configurations, with lots of kernel parameters, you name it.
5) The test scenario shows this bug is completely reproducible, 100% of the time. Without the patch, my kernels always fail. Tested about 20 different versions and none worked. With the patch above, the two versions always work correctly.
6) A minor detail: patch has a slight offset for 4.15 (2 lines, probably new comments or code) but works anyway.
This work would be impossible without all the cooperation from the fog team. Sebastian suggested the patch, and others helped a lot. A big "thank you" for them!
I wonder when this will be merged in the main kernel. Please, can anyone help on this?
Regards,
Paulo
Kai-Heng Feng (kaihengfeng) wrote : | #41 |
Kernel with patch in comment #40. Please try it out.
Changed in linux (Ubuntu): | |
assignee: | nobody → Kai-Heng Feng (kaihengfeng) |
Paulo Abadie Guedes (paulo.guedes) wrote : | #42 |
Thank you, we will try it as soon as possible.
Currently I'm on vacation, and will not be able to test it until about
March 5 (2 weeks from now). But as soon as I test it, I'll let you know
about the results.
It would be great if someone else could try it too.
Thanks,
Paulo
On Feb 12, 2018 3:25 AM, "Kai-Heng Feng" <email address hidden>
wrote:
Kernel with patch in comment #40. Please try it out.
http://
marc (boolioncube) wrote : | #43 |
i recently got one of these EliteDesks. tg3 locks up like once a week; seems to happen when flexget adds a bunch to transmission ... it spikes the TX... and boom. i just installed the patched kernel now. thanks yall.
Ed S (imimimx) wrote : | #44 |
dpkg: dependency problems prevent configuration of linux-headers-
linux-
Package libssl1.1 is not installed
Dependency version problem on Ubuntu 16.04?
ii libssl-dev:amd64 1.0.2g-1ubuntu4.10 amd64 Secure Sockets Layer toolkit - development files
Kai-Heng Feng (kaihengfeng) wrote : | #45 |
The kernel was compiled in Bionic, so it has wrong dependency on Xenial.
I built a new one, please give it a try:
http://
Kai-Heng Feng (kaihengfeng) wrote : | #46 |
Guys, Broadcom has a new patch [1] that needs testing.
Here's the kernel [2] to try.
[1] https:/
[2] https:/
Paulo Abadie Guedes (paulo.guedes) wrote : | #47 |
Ok, I'll check it out. Thank you very much!
By the way, we downloaded and tested one of the Deb packages you created,
and it worked quite well. Will check which one was exactly before
reporting (almost sure it was the one for xenial).
We managed to reproduce the issue easily by booting into pxe and, after the
nic was started (trying to get an ip), we reset the machine and booted into
Ubuntu. There is a huge difference by doing this and doing a cold boot,
directly into Ubuntu.
My hypothesis is that PXE sets up the NIC in a way that is not the default,
by changing one (or more) of the config bits in some register. The same
bit(s) are not touched by the unpatched tg3 driver. This way,
a boot may work sometimes, maybe due to default values not being set by the
tg3 kernel module (and being set by the PXE code, if it executed before Linux
is loaded).
Anyway, the unpatched kernel breaks very quickly, while the patched kernel
you provided worked out very well. This happens after running pxe.
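One way the PXE hypothesis above could be checked is to save the NIC register dump after a cold boot and again after a PXE-then-reset boot, and compare the two. The following is only a sketch: the helper names `save_regs` and `diff_regs` are made up here, the interface name eno1 is taken from the logs in this thread, and it assumes the driver exposes a register dump through `ethtool -d`.

```shell
#!/bin/sh
# Hypothetical helpers for comparing NIC register dumps between two boots.

# Save the register dump of interface $1 to file $2.
# Needs root, and a driver that supports "ethtool -d".
save_regs() {
    ethtool -d "$1" > "$2"
}

# Print only the lines that differ between two saved dumps.
# "<" lines come from the first file, ">" lines from the second.
diff_regs() {
    diff "$1" "$2" | grep '^[<>]'
}

# Example (run as root, once per boot type):
#   save_regs eno1 /root/regs_cold.txt
#   save_regs eno1 /root/regs_pxe.txt
#   diff_regs /root/regs_cold.txt /root/regs_pxe.txt
```

Any register that differs is only a candidate; counters and status registers are expected to change between boots regardless of PXE.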
I will check your links soon and report our results in the next few days,
hopefully this weekend or next week.
Thank you,
Paulo
Kai-Heng Feng (kaihengfeng) wrote : | #48 |
Folks,
tg3 maintainers are waiting for the test result. Hopefully it can fix the issue.
luc (glarage) wrote : | #49 |
Hi Kai-heng,
I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018 x86_64 x86_64 x86_64 GNU/Linux on Lubuntu 17.10.
I have a Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28 02/07/2017, and Lubuntu (my only OS on this device) boots in UEFI mode.
Unfortunately, I have the same problem (tg3 still crashes and a reboot is mandatory):
[ 105.620301] tg3 0000:03:00.0 eno1: 0: Host status block [00000001:
[ 105.620309] tg3 0000:03:00.0 eno1: 0: NAPI info [000000cc:
[ 105.620317] tg3 0000:03:00.0 eno1: 1: Host status block [00000001:
[ 105.620324] tg3 0000:03:00.0 eno1: 1: NAPI info [00000042:
[ 105.620331] tg3 0000:03:00.0 eno1: 2: Host status block [00000001:
[ 105.620370] tg3 0000:03:00.0 eno1: 2: NAPI info [000000d2:
[ 105.755739] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 105.797123] tg3 0000:03:00.0 eno1: Link is down
[ 105.889440] tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=
[ 105.889478] tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=
[ 109.932707] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
[ 109.932710] tg3 0000:03:00.0 eno1: Flow control is off for TX and off for RX
[ 109.932711] tg3 0000:03:00.0 eno1: EEE is enabled
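To make reports like the one above easier to compare, the failure signatures can be counted from the kernel log. This is a minimal sketch: the function name `tg3_failure_summary` is made up, and the three patterns are simply copied from messages quoted in this thread, so other failure modes would not be counted.

```shell
#!/bin/sh
# Count tg3 failure signatures in kernel log text read from stdin.
tg3_failure_summary() {
    awk '
        /NETDEV WATCHDOG: .*\(tg3\): transmit queue [0-9]+ timed out/ { wd++ }
        /tg3_stop_block timed out/                                     { sb++ }
        /tg3 .*IO_PAGE_FAULT/                                          { pf++ }
        END { printf "watchdog=%d stop_block=%d io_page_fault=%d\n", wd, sb, pf }
    '
}

# Example:
#   dmesg | tg3_failure_summary
```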
Paulo Abadie Guedes (paulo.guedes) wrote : | #50 |
We tried this same version yesterday and the bug was still present.
Actually it looked worse, because the machine crashed faster (though that
may have been just an impression). I will collect logs and report this
properly soon, in a few hours.
Paulo
Paulo Abadie Guedes (paulo.guedes) wrote : | #51 |
- log_kernel_4_15_0_9_generic_tg3_working.txt (65.3 KiB, text/plain)
- log_kernel_4_15_0_14_generic.txt (100.0 KiB, text/plain)
Hi Kai-heng,
Here are the test results we got.
Kernel 4.15.0-14-generic failed: the transmit queue timed out. The dmesg
output is attached. The tg3 module crashes within a few seconds of opening
the user session (less than about 10 seconds).
However, kernel 4.15.0-9-generic worked like a charm. It boots and brings
up tg3, the Ethernet link works and the module seems stable. We tested it
by downloading a few GB of data and an Ubuntu image, playing videos for a
few hours, and the like. Not a single crash was observed. The dmesg output
for this working kernel is also attached, because it might help you sort
out what differs between the two kernels.
Would you like us to test another image? Or to gather more information?
Regards,
Paulo
luc (glarage) wrote : | #52 |
- TG3 4.15.9 (214.5 KiB, text/html)
Sorry for the multiple posts, I didn't see the 4.15.0-9 kernel before... :)
tg3 still crashes, but not as early. I played several full-HD videos and ran several speed tests before losing the connection (FTTH here; my download speed is about 290 Mbps).
luc (glarage) wrote : | #53 |
Hi guys,
A short report on the new BIOS (2.30) available for the HP EliteDesk 705 G1 SFF/2215: BIOS L06 v02.30 03/22/2018.
It changes nothing for the tg3 driver: it still crashes (without iommu=soft, in my case) .... :(
[ 80.864034] ------------[ cut here ]------------
[ 80.864039] NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
[ 80.864081] WARNING: CPU: 1 PID: 0 at /home/khfeng/
[ 80.864082] Modules linked in: nls_iso8859_1 edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_
[ 80.864136] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-9-generic #10~lp1447664+
[ 80.864137] Hardware name: Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.30 03/22/2018
[ 80.864141] RIP: 0010:dev_
[ 80.864143] RSP: 0018:ffff9d3cae
[ 80.864146] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 80.864147] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9d3caec96450
[ 80.864149] RBP: ffff9d3caec83e98 R08: 0000000000000001 R09: 00000000000003da
[ 80.864150] R10: 0000000000000000 R11: 00000000000003da R12: 0000000000000005
[ 80.864152] R13: ffff9d3c9b4a4000 R14: ffff9d3c9b4a4478 R15: ffff9d3c9af34d80
[ 80.864154] FS: 000000000000000
[ 80.864156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.864158] CR2: 00002547c8b50c00 CR3: 000000022188c000 CR4: 00000000000406e0
[ 80.864160] Call Trace:
[ 80.864163] <IRQ>
[ 80.864168] ? dev_graft_
[ 80.864174] call_timer_
[ 80.864178] run_timer_
[ 80.864182] ? ktime_get+0x3e/0xa0
[ 80.864186] ? lapic_next_
[ 80.864192] __do_softirq+
[ 80.864196] irq_exit+0xb6/0xc0
[ 80.864200] smp_apic_
[ 80.864204] apic_timer_
[ 80.864205] </IRQ>
[ 80.864210] RIP: 0010:cpuidle_
[ 80.864212] RSP: 0018:ffffbd7700
[ 80.864215] RAX: ffff9d3caeca2840 RBX: 0000000000000002 RCX: 000000000000001f
[ 80.864216] RDX: 0000000000000000 RSI: 0000000024a3c7c4 RDI: 0000000000000000
[ 80.864218] RBP: ffffbd7700d4fe98 R08: ffff9d3caeca1664 R09: 0000000000000018
[ 80.864219] R10: ffffbd7700d4fe30 R11: 000000000000011c R12: 0000000000000002
[ 80.864221] R13: ffff9d3ca5f1b000 R14: ffffffffbf3802f8 R15: 00000012d3b48a8f
[ 80.864226] cpuidle_
[ 80.864230] call_cpuidle+
[ 80.864233] do_idle+0x197/0x200
[ 80.864236] cpu_start...
Kai-Heng Feng (kaihengfeng) wrote : | #54 |
I guess this commit fixes the issue. Can anyone try it?
commit 3a498606bb04af6
Author: Sanjeev Bansal <email address hidden>
Date: Mon Jul 16 11:13:32 2018 +0530
tg3: Add higher cpu clock for 5762.
This patch has fix for TX timeout while running bi-directional
traffic with 100 Mbps using 5762.
Signed-off-by: Sanjeev Bansal <email address hidden>
Signed-off-by: Siva Reddy Kallam <email address hidden>
Reviewed-by: Michael Chan <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
Hi,
the commit 3a498606bb04af6
- On 1 Gbps Ethernet, this commit changed nothing: the crash occurred immediately after some transmission.
- On 100 Mbps Ethernet, the crash is rarely triggered even without the above fix, so I cannot judge yet.
Best regards
luc (glarage) wrote : | #56 |
Currently, with 4.19.6 and HP BIOS v02.31, tg3 still crashes at both 100 Mbps and 1 Gbps.
The logs are still the same.
Kai-Heng Feng (kaihengfeng) wrote : | #57 |
Yeah, I saw the same issue on Gigabit Ethernet. I raised the issue [1] with the tg3 maintainers.
Do you use 5762?
luc (glarage) wrote : | #58 |
Yep, dmesg | grep tg3 | less shows:
tg3.c:v3.137 (May 11, 2014)
tg3 0000:03:00.0 eth0: Tigon3 [partno(BCM95762) rev 5762100] (PCI Express)
tg3 0000:03:00.0 eth0: attached PHY is 5762C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
tg3 0000:03:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:03:00.0 eth0: dma_rwctrl[
tg3 0000:03:00.0 eno1: renamed from eth0
tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
tg3 0000:03:00.0 eno1: Flow control is on for TX and on for RX
tg3 0000:03:00.0 eno1: EEE is enabled
BTW, thanks for your time, Kai-Heng Feng
Bob Lawrence (pilotbob42) wrote : | #59 |
Confirmed that this is still an issue on 18.04.1. I have an HP 705 G1 with the Broadcom 5762. In my case it's a Plex server. Whenever I try to stream something, the interface goes "NO-CARRIER" and the only way to recover is to reboot. I've tried disabling highdma, tso and gso using ethtool, the iommu=soft kernel parameter, and forcing every combination of 1 Gbps/100 Mbps and half/full duplex. Nothing seems to work around the issue.
System: Host: Bobs-HTPC Kernel: 4.15.0-43-generic x86_64 bits: 64 Console: tty 1 Distro: Ubuntu 18.04.1 LTS
Machine: Device: desktop System: Hewlett-Packard product: HP EliteDesk 705 G1 DM serial: N/A
Mobo: Hewlett-Packard model: 225E serial: N/A BIOS: Hewlett-Packard v: L06 v02.31 date: 08/31/2018
Battery hidpp__0: charge: N/A condition: NA/NA Wh
CPU: Quad core AMD A8-7600 Radeon R7 10 Compute Cores 4C+6G (-MCP-) cache: 8192 KB
clock speeds: max: 3100 MHz 1: 3094 MHz 2: 3094 MHz 3: 3094 MHz 4: 3094 MHz
Graphics: Card: Advanced Micro Devices [AMD/ATI] Kaveri [Radeon R7 Graphics]
Display Server: N/A drivers: ati,radeon (unloaded: modesetting,
tty size: 120x53 Advanced Data: N/A out of X
Audio: Card-1 Advanced Micro Devices [AMD] FCH Azalia Controller driver: snd_hda_intel
Card-2 Advanced Micro Devices [AMD/ATI] Kaveri HDMI/DP Audio Controller driver: snd_hda_intel
Sound: Advanced Linux Sound Architecture v: k4.15.0-43-generic
Network: Card-1: Intel Wireless 7260 driver: iwlwifi
IF: wlp2s0 state: up mac: cc:3d:82:a7:bf:ed
Card-2: Broadcom Limited NetXtreme BCM5762 Gigabit Ethernet PCIe driver: tg3
IF: eno1 state: up speed: 100 Mbps duplex: half mac: ec:b1:d7:4c:2d:8e
Drives: HDD Total Size: 9501.7GB (42.8% used)
ID-1: /dev/sda model: ST500LM000 size: 500.1GB
ID-2: USB /dev/sdb model: 5 size: 9001.6GB
Partition: ID-1: / size: 458G used: 23G (6%) fs: ext4 dev: /dev/sda1
RAID: No RAID devices: /proc/mdstat, md_mod kernel module present
Sensors: System Temperatures: cpu: 40.8C mobo: N/A gpu: 42.0
Fan Speeds (in rpm): cpu: N/A
Info: Processes: 227 Uptime: 12:49 Memory: 1608.0/5943.7MB Init: systemd runlevel: 5
Client: Shell (bash) inxi: 2.3.56
Paulo Abadie Guedes (paulo.guedes) wrote : | #60 |
Thank you.
I am still having the problem during our cloning process, although it's not
so frequent. Before I applied the patch, each and every transfer would
ALWAYS trigger the tg3 bug.
Here it seems related to problems with NAPI. AFAIK, this is an approach to
handling interrupt bursts. NICs typically work in bursts: a long time
without packets, then a very large stream of packets, then silence. This is
the common scenario.
Using interrupts to serve sporadic data is OK. But a burst of packets
triggers a burst of interrupts, which is not as efficient as just polling
the NIC during the burst.
What NAPI does (in a very, very simplified way) is this: it waits for the
first interrupt from the network, then switches off interrupts and polls
the NIC until there are no more network packets or the "work quota" is
exhausted, whichever happens first. Then it turns interrupts back on and
the cycle repeats. This quota (sorry, I don't remember the correct term) is
very important to prevent the kernel from getting stuck just serving packets.
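As a rough illustration of the cycle described above, here is a toy model in shell. It is not kernel code: the name `napi_toy`, the packet count and the quota value are made up for illustration, and the model assumes that interrupts stay masked until the whole burst has been drained.

```shell
#!/bin/sh
# Toy model of interrupt-then-poll processing. Not kernel code.
# $1 = packets waiting in the burst, $2 = quota (max packets per poll).
napi_toy() {
    pending=$1
    budget=$2
    interrupts=0
    polls=0
    if [ "$pending" -gt 0 ]; then
        # The first packet raises one interrupt; interrupts are then masked.
        interrupts=1
    fi
    while [ "$pending" -gt 0 ]; do
        polls=$((polls + 1))
        if [ "$pending" -gt "$budget" ]; then
            # Quota exhausted: stop this poll and come back for the rest.
            pending=$((pending - budget))
        else
            # Queue drained: interrupts would be re-enabled here.
            pending=0
        fi
    done
    echo "interrupts=$interrupts polls=$polls"
}
```

For example, `napi_toy 1000 64` prints `interrupts=1 polls=16`: one interrupt for the whole burst instead of one per packet, and no single poll handles more than 64 packets.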
My understanding is that something goes wrong during this process and the
tg3 driver gets stuck.
A colleague told me that it's related to the broadcom driver.
Please try this workaround: remove the two drivers, then reload "broadcom"
and "tg3", in this order. Maybe then your network will restart.
sudo modprobe -r broadcom tg3
sudo modprobe broadcom
sudo modprobe tg3
Please tell us what happens when you try this. It won't solve the problem,
but perhaps it helps.
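For a headless machine, the reload above could be triggered automatically when the link drops. This is only a sketch under assumptions: the helper names `has_carrier` and `reload_tg3` are made up, the interface name eno1 comes from the logs in this thread, and it assumes the hang shows up as a lost carrier (as reported in comment #59), which may not hold for every failure.

```shell
#!/bin/sh
# Succeeds if interface $1 reports carrier. $2 optionally overrides the
# sysfs base directory (default /sys/class/net), which helps with testing.
has_carrier() {
    iface=$1
    base=${2:-/sys/class/net}
    [ "$(cat "$base/$iface/carrier" 2>/dev/null)" = "1" ]
}

# Reload the modules in the order given in the workaround. Needs root.
reload_tg3() {
    modprobe -r broadcom tg3
    modprobe broadcom
    modprobe tg3
}

# Example loop (run as root):
#   while sleep 30; do has_carrier eno1 || reload_tg3; done
```

Note that the check also fails when the interface is administratively down or the cable is unplugged, so the loop would reload the modules in those cases too.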
Regards,
Paulo
marc (boolioncube) wrote : | #61 |
We're using this HP G1 primarily for torrent seeding and gnuMotion, so it
is consistently receiving ~5mb/s and pushing out 300kb/s. The explanation
in #60 might explain why this setup is somewhat stable: it's always active.
I also haven't updated the BIOS. I won't have physical access for a few
months, and since others are saying it makes no difference, I think I won't
bother.
- uptime: 48 days
- RX: 18 TB
- TX: 4 TB
- 2 tg3 hangs, and it restarted the driver on its own
------------------
zosky@mintyElite:~$ uname -a
Linux mintyElite 4.15.0-9-generic #10~lp1447664+
15:51:40 CST 2018 x86_64 x86_64 x86_64 GNU/Linux
zosky@mintyElite:~$ ifconfig
eno1 Link encap:Ethernet HWaddr 50:65:f3:51:fe:7e
inet addr:192.168.1.62 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:13545805910 errors:0 dropped:1445 overruns:0 frame:0
TX packets:9698573442 errors:0 dropped:0 overruns:0 carrier:0
* RX bytes:180462217
TB)*
zosky@mintyElite:~$ dmesg | grep tg3 | head -10
[ 2.828297] tg3.c:v3.137 (May 11, 2014)
[ 2.846250] tg3 0000:01:00.0 eth0: Tigon3 [partno(BCM95762) rev 5762100] (PCI Express) MAC address 50:65:f3:51:fe:7e
[ 2.847035] tg3 0000:01:00.0 eth0: attached PHY is 5762C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 2.847880] tg3 0000:01:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 2.848796] tg3 0000:01:00.0 eth0: dma_rwctrl[
[ 2.850519] tg3 0000:01:00.0 eno1: renamed from eth0
[ 46.205677] tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex
[ 46.205679] tg3 0000:01:00.0 eno1: Flow control is on for TX and on for RX
[ 46.205681] tg3 0000:01:00.0 eno1: EEE is disabled
[2700404.396192] NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
Bob Lawrence (pilotbob42) wrote : | #62 |
@paulo.guedes
Yes, removing and re-adding the modules as you describe does at least recover eno1 without rebooting. Still, that is hardly a solution for what was intended to be a headless Plex server. This happens every time I start an MPEG-2 TV stream through my Plex box, which is only about a 20 Mbps load. Sometimes it happens immediately; sometimes it runs for nearly an hour.
Also, I compiled a custom kernel with the patch described in comment #40. It had no effect on the dropouts for me; they are still occurring.
System:
Host: Bobs-HTPC Kernel: 4.15.18+ x86_64 bits: 64 Console: tty 1 Distro: Ubuntu 18.04.1 LTS
Machine:
Device: desktop System: Hewlett-Packard product: HP EliteDesk 705 G1 DM serial: N/A
Mobo: Hewlett-Packard model: 225E serial: N/A BIOS: Hewlett-Packard v: L06 v02.31 date: 08/31/2018
Network:
Card-1: Intel Wireless 7260 driver: iwlwifi
IF: wlp2s0 state: up mac: cc:3d:82:a7:bf:ed
Card-2: Broadcom Limited NetXtreme BCM5762 Gigabit Ethernet PCIe driver: tg3
IF: eno1 state: up speed: 100 Mbps duplex: full mac: ec:b1:d7:4c:2d:8e
Bob Lawrence (pilotbob42) wrote : | #63 |
Also, on the last crash, I caught it while it was happening: RX/TX errors and collisions all went through the roof right before it went "no-carrier".
James Johnson (triplej) wrote : | #64 |
I have been experiencing this issue on an HP 745 G4 with the same BCM5762 on several kernel versions, from Ubuntu 16.04 up to 4.15.0-45. On my system the network would crash immediately, often before logging in. Occasionally I would be able to ping for several seconds before the device crashed.
I have tried several workarounds from this thread, but none were successful. Setting iommu to soft may have increased the time before the crash from about 10 seconds to about 30; however, I did not test this extensively.
I was able to upgrade to mainline kernel 4.20.7-042007 using ukuu, and I no longer experience any device instability. I'm not sure whether this specific patch was included in that release, but it may be useful for those still experiencing crashes on Ubuntu 18.04.
Shane R. Spencer (whardier) wrote : | #65 |
Same issue with HP EliteDesk 705 G2 MINI
Turned off all power saving options in BIOS.
Currently running 18.04 HWE EDGE (Linux 5.0.0-15-generic) compiled with:
CONFIG_TIGON3=m
CONFIG_
Tempted to turn off HWMON.
[ 1.314002] tg3 0000:01:00.0 eth0: Tigon3 [partno(BCM95762) rev 5762100] (PCI Express) MAC address c8:d3:ff:a2:96:e9
[ 1.314915] tg3 0000:01:00.0 eth0: attached PHY is 5762C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[ 1.315781] tg3 0000:01:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
[ 1.316661] tg3 0000:01:00.0 eth0: dma_rwctrl[
[ 1.324241] tg3 0000:01:00.0 eno1: renamed from eth0
[ 6.950429] tg3 0000:01:00.0 eno1: Link is up at 1000 Mbps, full duplex
[ 6.950471] tg3 0000:01:00.0 eno1: Flow control is on for TX and on for RX
[ 6.950475] tg3 0000:01:00.0 eno1: EEE is disabled
Has anybody found a stable fix for this problem?
Chris Schwarz (cschwarz) wrote : | #66 |
I have not experienced the issue since I started using kernel 4.20.11.
Changed in linux (Ubuntu): | |
assignee: | Kai-Heng Feng (kaihengfeng) → nobody |
Kai-Heng Feng (kaihengfeng) wrote : | #67 |
Latest kernels in Xenial, Bionic, Cosmic and Disco have the following commit:
commit 3a498606bb04af6
Author: Sanjeev Bansal <email address hidden>
Date: Mon Jul 16 11:13:32 2018 +0530
tg3: Add higher cpu clock for 5762.
This patch has fix for TX timeout while running bi-directional
traffic with 100 Mbps using 5762.
Signed-off-by: Sanjeev Bansal <email address hidden>
Signed-off-by: Siva Reddy Kallam <email address hidden>
Reviewed-by: Michael Chan <email address hidden>
Signed-off-by: David S. Miller <email address hidden>
Changed in linux (Ubuntu): | |
status: | Triaged → Fix Released |
Paulo Abadie Guedes (paulo.guedes) wrote : | #68 |
Thank you, Kai-Heng Feng. Really appreciate it.
Currently I'm under a lot of pressure at work, but I will try this in the
next few days to see if it fixes the problem for us. My network still has
the same conditions and my previous kernel versions are still breaking, so
it should be easy to reproduce.
I will write back as soon as I can.
Thank you again,
Paulo
luc (glarage) wrote : | #69 |
Currently I'm on kernel 5.1.15 and, if I'm not mistaken, this commit has been merged since 2018-07-16.
A first speedtest-net run gives me this output:
tg3 0000:03:00.0 eno1: 0: Host status block [00000001:
tg3 0000:03:00.0 eno1: 0: NAPI info [000000d5:
tg3 0000:03:00.0 eno1: 1: Host status block [00000001:
tg3 0000:03:00.0 eno1: 1: NAPI info [000000d2:
tg3 0000:03:00.0 eno1: 2: Host status block [00000001:
tg3 0000:03:00.0 eno1: 2: NAPI info [00000031:
A second speedtest-net run gives me this output (and I lost the connection):
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffed80 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffee40 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffedc0 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffee00 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffee80 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffeec0 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffef40 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffef00 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffef80 flags=0x0000]
tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1ffffefc0 flags=0x0000]
(...)
tg3 0000:03:00.0: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3 0000:03:00.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3 0000:03:00.0 eno1: Link is down
lspci gives me: 03:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5762 Gigabit Ethernet PCIe (rev 10), and I have an HP 705G1.
I'm grateful for the effort you put into solving this bug and the many reminders to the Broadcom people.
tags: | added: cscc |
Chi-Thanh Christopher Nguyen (chithanh) wrote : | #70 |
Still an issue here with Dell Latitude 5495 and Kernel 5.2.7.
I noticed that, much like similar problems I had with Realtek LAN, booting with the iommu=pt kernel parameter helped as a workaround.
An rtl8169 report was here https:/
no longer affects: | linux (Ubuntu) |
Launchpad Janitor (janitor) wrote : | #71 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
penalvch (penalvch) wrote : | #72 |
Chi-Thanh Christopher Nguyen, please note:
1) Kernel 5.2.7 is not supported here on Launchpad. Hence, please redirect your inquiry to the relevant maintainer(s) upstream.
2) If you can reproduce the issue with a supported kernel, then please file a new report to provide debugging logs via a terminal:
ubuntu-bug linux
Please feel free to subscribe me to it.
affects: | linux (Debian) → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
status: | New → Fix Released |
Bob Lawrence (pilotbob42) wrote : | #73 |
Problem still exists with kernel 5.3.0-59-generic. Same machine I reported on previously. Multiple kernel releases since then. The only change with more recent kernels is that the connection recovers on its own after a few minutes (as opposed to requiring a reboot). Still, the only workaround that has any effect is to manually set the connection to 100 Mbps and half duplex. Pretty useless for a media server.
Example dmesg output when the problem occurs:
[219326.666826] tg3 0000:03:00.0 eno1: transmit timed out, resetting
[219329.265075] tg3 0000:03:00.0 eno1: 0x00000000: 0x168714e4, 0x50100546, 0x02000010, 0x00000000
[219329.265116] tg3 0000:03:00.0 eno1: 0x00000010: 0xe082000c, 0x00000000, 0xe081000c, 0x00000000
[219329.265125] tg3 0000:03:00.0 eno1: 0x00000020: 0xe080000c, 0x00000000, 0x00000000, 0x225e103c
[many, many, hex dump lines repeated here]
[219329.267191] tg3 0000:03:00.0 eno1: 0x00007030: 0x000e0000, 0x000038d8, 0x00230030, 0x80000000
[219329.267198] tg3 0000:03:00.0 eno1: 0x00007500: 0x00000000, 0x00000000, 0x00000081, 0x00000000
[219329.267203] tg3 0000:03:00.0 eno1: 0x00007510: 0x00000000, 0x7fffffbf, 0x00000000, 0x00000000
[219329.267214] tg3 0000:03:00.0 eno1: 0: Host status block [00000001:
[219329.267222] tg3 0000:03:00.0 eno1: 0: NAPI info [000000a6:
[219329.267229] tg3 0000:03:00.0 eno1: 1: Host status block [00000001:
[219329.267236] tg3 0000:03:00.0 eno1: 1: NAPI info [0000003c:
[219329.267244] tg3 0000:03:00.0 eno1: 2: Host status block [00000001:
[219329.267256] tg3 0000:03:00.0 eno1: 2: NAPI info [000000b5:
[219329.267267] tg3 0000:03:00.0 eno1: 3: Host status block [00000001:
[219329.267273] tg3 0000:03:00.0 eno1: 3: NAPI info [00000093:
[219329.267279] tg3 0000:03:00.0 eno1: 4: Host status block [00000001:
[219329.267286] tg3 0000:03:00.0 eno1: 4: NAPI info [00000002:
[219329.370520] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=1400 enable_bit=2
[219329.473173] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=c00 enable_bit=2
[219329.575744] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4800 enable_bit=2
[219329.578634] tg3 0000:03:00.0 eno1: Link is down
INXI output:
System: Host: Bobs-HTPC Kernel: 5.3.0-59-generic x86_64 bits: 64 Desktop: MATE 1.20.1
Distro: Ubuntu 18.04.4 LTS
Machine: Device: desktop System: Hewlett-Packard product: HP EliteDesk 705 G1 DM serial: N/A
Mobo: Hewlett-Packard model: 225E serial: N/A
BIOS: Hewlett-Packard v: L06 v02.31 date: 08/31/2018
CPU: Quad core AMD A8-7600 Radeon R7 10 Compute Cores 4C+6G (-MCP-) cache: 8192 KB
clock speeds: max: 3100 MHz 1: 1499 MHz 2: 1524 MHz 3: 1438 MHz 4: 1402 MHz
Graphics: Card: Advanced Micro Devices [AMD/ATI] Kaveri [Radeon R7 Graphics]
...
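For anyone comparing logs across kernels, the failure signatures quoted in this thread (the transmit timeout, the tg3_stop_block timeouts, and the AMD-Vi IO_PAGE_FAULT events from comment #69) can be counted with a small script. This is only a rough sketch: count_tg3_events is a made-up helper name, and the patterns are taken from the log lines pasted here, so they may not match every kernel version.

```shell
#!/bin/sh
# count_tg3_events is a hypothetical helper (not part of any tool).
# It reads kernel log text on stdin and prints how many lines match each
# tg3 failure signature quoted in this bug report.
count_tg3_events() {
    awk '
        /tg3 .*transmit timed out/       { tx++ }
        /tg3 .*tg3_stop_block timed out/ { stop++ }
        /tg3 .*IO_PAGE_FAULT/            { fault++ }
        END {
            # Unset awk counters print as 0 with %d.
            printf "tx_timeout=%d stop_block=%d io_page_fault=%d\n", tx, stop, fault
        }
    '
}

# Typical use (reading the kernel log may need root on some systems):
#   dmesg | count_tg3_events
```

A non-zero tx_timeout count after a streaming session would suggest the same failure as in the dmesg excerpt above.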
Bob Lawrence (pilotbob42) wrote : | #74 |
I'll also add that though I had previously tried iommu=soft with no luck, trying iommu=pt as suggested by chithanh in post 70 does seem to work around the issue successfully. I'm not sure as to why that would be the case as this is neither a VM nor a VM host, but since adding the parameter to my kernel line and rebooting I've been running for several hours with media continuously streaming. Without the parameter it would only stream for a matter of minutes before dropping the connection.
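For others who want to try the same workaround, here is a minimal sketch for checking whether the running kernel was actually booted with iommu=pt. It assumes a GRUB-based Ubuntu install for the hint it prints; has_iommu_pt is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# has_iommu_pt is a hypothetical helper: it succeeds if the given kernel
# command line contains iommu=pt as a whole, space-separated word.
has_iommu_pt() {
    case " $1 " in
        *" iommu=pt "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if has_iommu_pt "$cmdline"; then
    echo "iommu=pt is active on this boot"
else
    # Assumption: a stock Ubuntu setup where GRUB is the boot loader.
    echo "iommu=pt is not set; add it to GRUB_CMDLINE_LINUX_DEFAULT in"
    echo "/etc/default/grub, run 'sudo update-grub', then reboot"
fi
```

Matching on whole words keeps a different option such as amd_iommu=... from being mistaken for iommu=pt.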
Toan (tpham3783) wrote : | #75 |
Has anyone applied the patch to the tg3 driver that was shared in comment #13? That one solved the issue for me. If that was the real fix, I'd like to inform the tg3 maintainers about it so that we can have it patched in the mainline. Thanks.
tp
Bob Lawrence (pilotbob42) wrote : | #76 |
I did not apply the patch in #13, but I did try disabling highdma with ethtool (essentially what the patch makes permanent) and that had no effect for me (at least not on the kernel I was using at the time). I did try the patch in #40 and that had no effect for me either. The only thing I've found that keeps my Broadcom 5762 alive without disconnecting is the kernel parameter "iommu=pt". I'm just finally grateful to have found a workaround so I can keep this server wired and not have to rely on its wireless only.
I can't help but think we are chasing a moving target across so many kernel versions since this issue was first reported.
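For reference, the highdma test mentioned above can be reproduced roughly as follows. This is a sketch under assumptions: the interface name eno1 is taken from the logs in this thread, highdma_state is a made-up helper, and the parsing relies on the usual "feature: state" lines that ethtool -k prints.

```shell
#!/bin/sh
# Interface name is an assumption taken from the dmesg output in this thread.
IFACE="${IFACE:-eno1}"

# highdma_state is a hypothetical helper: it reads `ethtool -k` output on
# stdin and prints the highdma state ("on", "off"), or "unknown" if the
# feature line is not present.
highdma_state() {
    awk '
        { sub(/^[ \t]+/, "") }                 # tolerate indented lines
        $1 == "highdma:" { print $2; found = 1 }
        END { if (!found) print "unknown" }
    '
}

if command -v ethtool >/dev/null 2>&1; then
    echo "highdma on $IFACE: $(ethtool -k "$IFACE" 2>/dev/null | highdma_state)"
fi

# To switch the feature off until the next reboot (needs root):
#   ethtool -K "$IFACE" highdma off
```

Note that this setting does not persist across reboots, and as reported above it made no difference on this machine.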
Janno Sannik (jannoke) wrote : | #77 |
Just letting you know that "iommu=pt" fixed my problem on an HP EliteDesk 705 G2. Benchmarking was out of the question, since I could not even download a 100 MB file over a 300 Mbit/s internet connection. It would lose the connection without any logs. It would, however, recover with ifdown/ifup.
This is not Ubuntu but (up-to-date) proxmox-ve v6.2-15 using kernel 5.4.65-1-pve, which is based on Debian.
Tony Eckel (teckel) wrote : | #78 |
I have an EliteDesk mini 705 G2 with the identical issue, and none of the fixes worked.
Running Ubuntu 20.04.2 LTS.
So it isn't fixed.
What do you need to troubleshoot this?
Kai-Heng Feng (kaihengfeng) wrote : | #79 |
Tony, please file a new bug.
This change was made by a bot.