14e4:1687 broadcom tg3 network driver disconnects under high load

Bug #1447664 reported by Toan on 2015-04-23
This bug affects 13 people
Affects          Status   Importance   Assigned to     Milestone
linux (Debian)   New      Undecided    Unassigned
linux (Ubuntu)            High         Kai-Heng Feng

Bug Description

The tg3 Broadcom network driver that binds to the 5762 chipset goes offline and is unable to recover (even with the tg3 watchdog timeout) when network transmit is under high load. Call trace:
https://launchpadlibrarian.net/204185480/dmesg

When this happens, usually only a reboot fixes it. Sometimes, however, bringing the interface down and up again (via ifconfig) recovers networking. I've also tested with the latest tg3 driver (Dec 2014 version) and networking is still problematic. I have also disabled TSO, GSO, etc. with ethtool and the bug still surfaces. This bug may be related to the integrated firmware.

Here is a procedure to replicate the issue, which is hard to trigger under moderate network load.

1. Boot a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705) using an Ubuntu/Kubuntu Live CD, 14.04-15.04.
2. From another machine: start 5 sessions and, in each session, repeatedly copy (scp with public key authentication) a 70 MB file back and forth to the tg3 machine. (Not sure if this step is necessary.)
3. Create a 1GB file on the tg3 machine, with something like dd if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))
4. From another machine: repeatedly scp that 1GB file from the tg3 machine. This can be done with something like:

while true; do
   scp -i /my/scp/private.key <email address hidden>:/my/test/file /tmp
done

Networking will usually go offline within about 10-30 minutes.

WORKAROUND: Disable highdma with ethtool. To make the change permanent, add the following udev rule in /etc/udev/rules.d/80-tg3-fix.rules :
ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"
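
Before committing to the udev rule, the same setting can be tried at runtime and then checked. The sketch below assumes the interface is named eth0 (it may be eno1 or similar on other releases); `highdma_state` is a hypothetical helper, not part of ethtool, that only parses the output of `ethtool -k`.

```shell
# Hypothetical helper: print the highdma state ("on" or "off") found in
# `ethtool -k <iface>` output supplied on stdin.
highdma_state() {
    awk '$1 == "highdma:" { print $2 }'
}

# Assumed usage (needs root; eth0 is an assumed interface name):
#   ethtool -K eth0 highdma off
#   ethtool -k eth0 | highdma_state
```

A runtime change made this way does not survive a reboot or a driver reload, which is why the udev rule is still needed for a permanent setup.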

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-3.19.0-15-generic 3.19.0-15.15
ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
Uname: Linux 3.19.0-15-generic x86_64
ApportVersion: 2.17.2-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: kubuntu 3748 F.... pulseaudio
 /dev/snd/controlC0: kubuntu 3748 F.... pulseaudio
CasperVersion: 1.360
Date: Thu Apr 23 11:16:24 2015
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
ProcEnviron:
 LANGUAGE=
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash ---
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-15-generic N/A
 linux-backports-modules-3.19.0-15-generic N/A
 linux-firmware 1.143
RfKill:

SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/22/2014
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: L06 v02.15
dmi.board.asset.tag: 2UA5041TG4
dmi.board.name: 2215
dmi.board.vendor: Hewlett-Packard
dmi.chassis.asset.tag: 2UA5041TG4
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvrL06v02.15:bd10/22/2014:svnHewlett-Packard:pnHPEliteDesk705G1MT:pvr:rvnHewlett-Packard:rn2215:rvr:cvnHewlett-Packard:ct6:cvr:
dmi.product.name: HP EliteDesk 705 G1 MT
dmi.sys.vendor: Hewlett-Packard

Toan (tpham3783) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Toan (tpham3783) on 2015-04-23
description: updated
description: updated

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.0 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.0-vivid/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Toan (tpham3783) wrote :

Joseph,

>Did this issue start happening after an update/upgrade?

No, I have always had this issue. I tested with multiple OSes and kernel versions: kernel 2.6.39
and three Ubuntu live CDs, 12.04, 14.04, and 15.04 (which was released today). I will, however, consider testing with kernel 4.x.

>Was there a prior kernel version where you were not having this particular problem?

No

Toan (tpham3783) wrote :

Please note, this bug is unrelated to Bug #1331513 because I can still reproduce it even with TSO, GSO, etc. disabled. The lock-up only occurs under VERY_HIGH_NETWORK_LOAD, so a typical user (web surfing only) would not be able to catch it easily. On a side note, the machine I am testing is an HP EliteDesk 705 (DMI info below), which is officially certified hardware for Ubuntu.

System Information
        Manufacturer: Hewlett-Packard
        Product Name: HP EliteDesk 705 G1 MT
        Version:
        Serial Number: 2UA5041TG4
        UUID: E24D7A80-9AA4-11E4-8822-8A8247065164
        Wake-up Type: Power Switch
        SKU Number: K5U61UP#ABA
        Family: 103C_53307F G=D

Here is the state of the network interface when the tigon3 driver completely locked up. Attached file is the dmesg log.

eth0 Link encap:Ethernet HWaddr 64:51:06:47:82:8a
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:90235313784 errors:30064771065 dropped:7 overruns:0 frame:120259084260
          TX packets:90387363107 errors:30064771065 dropped:0 overruns:0 carrier:0
          collisions:30064771065 txqueuelen:1000
          RX bytes:32978848243 (32.9 GB) TX bytes:321345086545 (321.3 GB)
          Interrupt:18

PS: I just compiled linux-stable 4.0 trunk; I will try to run it and report back soon.

tags: added: latest-bios-2.15
tags: added: trusty
Toan (tpham3783) wrote :

Guys,

I've just confirmed that this bug exists in the upstream kernel version 4.0. Attached is the full kernel 4.0 log (from bootup to the time the Broadcom driver crashes). We may have to report this bug to a Broadcom network driver/firmware developer. Thanks.

Toan (tpham3783) on 2015-04-23
tags: added: bcm5762 broadcom kernel-bug-exists-upstream linux-4.0 lucid tg3 tigon
Po-Hsu Lin (cypressyew) on 2015-04-24
Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Toan, the issue you are reporting is an upstream one. Could you please report this problem to the appropriate mailing list (netdev) by following the instructions verbatim at https://wiki.ubuntu.com/Bugs/Upstream/kernel ?

Please provide a direct URL to your e-mail to the mailing list once you have made it so that it may be tracked via http://vger.kernel.org/vger-lists.html . It can take a day for the new e-mail to show up in the respective archive.

Thank you for your understanding.

tags: added: kernel-bug-exists-upstream-4.0
removed: bcm5762 broadcom linux-4.0 tg3 tigon
Changed in linux (Ubuntu):
status: Confirmed → Triaged
summary: - broadcom tg3 network driver disconnects under high load
+ 14e4:1687 broadcom tg3 network driver disconnects under high load
Toan (tpham3783) wrote :

Here is the bug report email to netdev mailing list:

http://www.spinics.net/lists/netdev/msg326389.html

Lauri Võsandi (v6sa) wrote :

Hi, disabling highdma with ethtool seems to work around the issue. I've added the following udev rule to make the change permanent in /etc/udev/rules.d/80-tg3-fix.rules

ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"

Toan (tpham3783) wrote :

Thank you for your valuable finding. I'll test your suggestion in the next few days to confirm that it works.

I've also reported the work-around to Broadcom dev team and suggested a patch to the tg3 driver to disable highdma. I'll keep you updated on the issue... thank you once again.

Toan (tpham3783) wrote :

Lauri,

Can you let me know whether you tested the workaround on a 64-bit or a 32-bit OS? AFAIK, the HIGHMEM option only allows DMA support on 64-bit systems (>4GB), so I don't think it would make a difference if the native OS is 32-bit. I am asking because I've reproduced the bug on both 32-bit and 64-bit systems, so I just don't see how disabling highdma on a 32-bit system would resolve the issue. Regardless, I will try the workaround on a 32-bit system pretty soon.

Lauri Võsandi (v6sa) wrote :

Hi,

I am running a 64-bit system. The machine didn't hiccup in ~36 hours, so we stopped testing there; before the workaround I would hit a connection drop within hours, 8 hours tops. For the test I had scp copying data inbound and outbound, and in addition YouTube was playing in several browser tabs. Higher memory usage seemed to trigger the bug faster.

Toan (tpham3783) wrote :

Lauri,

I've pumped over 1.5TB of data and have not seen a hiccup yet. I think we've found the smoking gun. Below is a simple patch to the tigon device driver if you prefer not to use the udev rule solution.

I believe the root cause is that the tigon net driver uses virtual memory for DMA transfers. All DMA transfers should be remapped to logical memory using dma_map_page() in order for HIGHDMA feature to work. Broadcom will look into this and hopefully, the bug will be fixed upstream soon... Thanks again...

--- linux-2.6.38.2/drivers/staging/bcm-tg3/tg3.c.vanilla 2016-01-07 14:14:20.000000000 -0500
+++ linux-2.6.38.2/drivers/staging/bcm-tg3/tg3.c 2016-01-06 16:05:37.000000000 -0500
@@ -18992,6 +18992,12 @@

        tg3_init_bufmgr_config(tp);

+	/* pham, patch 5762 chip */
+	if (tp->pdev->device == 0x1687 || tg3_asic_rev(tp) == ASIC_REV_5762) {
+		printk("tg3: disable HIGHDMA for tigon3 device 5762\r\n");
+		dev->features &= ~NETIF_F_HIGHDMA;
+	}
+
        /* 5700 B0 chips do not support checksumming correctly due
         * to hardware bugs.
         */

Toan (tpham3783) wrote :

It is confirmed, disabling HIGHDMA fixed the NIC problem. This was tested by putting a system under load for 120+ hours, and simulated over 12TB of data through the tg3 NIC. Great find Lauri, and thank you again!

description: updated
Changed in linux (Ubuntu):
importance: Medium → High
Lauri Võsandi (v6sa) wrote :

Hello, it seems that with a graphical user interface running and highdma off, a similar problem persists:

NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out
[...]
irq 18: nobody cared (try booting with the "irqpoll" option)

After that the device goes offline and can't be brought up again with rmmod/modprobe, and mouse movement becomes jerky. The problem appears sooner if you play around with Firefox etc. I tried booting with irqpoll: the connection still drops, but the module can be reloaded and the mouse isn't jerky. I tried this with the 3.18.25 and 4.4.2 kernels; both exhibited similar behaviour.
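
For anyone comparing logs, the two messages quoted in this comment can be pulled out of a kernel log with a simple filter. This is only a sketch: `tg3_failures` is a hypothetical helper name, and the patterns are taken from the lines above rather than from any official list of tg3 error messages.

```shell
# Hypothetical helper: keep only kernel log lines that match the two
# failure messages quoted above (tg3 transmit timeout, unhandled IRQ).
tg3_failures() {
    grep -E 'NETDEV WATCHDOG: [^ ]+ \(tg3\): transmit queue [0-9]+ timed out|irq [0-9]+: nobody cared'
}

# Assumed usage: dmesg | tg3_failures
```

The interface name and queue number are matched loosely, so the same filter should also catch the message on machines where the NIC is called eno1.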

chriscrutch (chriscrutch) wrote :

Any chance there's been any movement on this bug? It's really a pain for me. Disabling HIGHDMA helped a bit, but now it seems to kick in at different times. The bandwidth use doesn't seem to be an issue anymore, but now it disconnects with heavy data transfer to USB. It kicks in when performing large backups to an external hard drive, and when copying large video files to a SD card attached with a USB adapter.

Daniel (dkim-b) wrote :

I am having the issue as well on kernel 4.4.0-66 (x64). Disabling HighDMA did not fix anything on my end and I cannot figure out what will trigger the issue. It seems to occur randomly and even if there is no active network traffic.

gadi (gadieid) wrote :

It happened to me as well on a Proliant 360 Gen 9 running Ubuntu 16.04.2 with the 4.4.0-72-generic kernel.
ifconfig -a didn't show any eno devices.
3 identical servers (HW and SW) had no problem at all.
After a simple modprobe tg3 command, all eno devices (1-4) appeared.

Any fix to this bug?

I have the same problem: Ubuntu 16.04 LTS, kernel: 4.8.0-46-generic. Same problem in Debian 9 kernel 4.9.

tags: added: bios-outdated-2.28
removed: latest-bios-2.15
description: updated
tags: added: kernel-bug-exists-upstream-4.11
removed: kernel-bug-exists-upstream-4.0
tags: added: xenial
luc (glarage) wrote :

HP EliteDesk 705 G1 SFF with NetXtreme BCM5762 Gigabit Ethernet PCIe

FTTH user here: no Ethernet connection after high load (speedtest or YouTube); like other users, I had to reboot. The only workaround I found is [sudo ethtool -s eno1 speed 100 duplex full autoneg on] after a reboot; then I can use the network, but not at my full bandwidth.
Lubuntu 17.04 with 4.12.0-041200rc3-generic

Kai-Heng Feng (kaihengfeng) wrote :

If this happens on mainline kernel, please file an upstream bug at https://bugzilla.kernel.org.

Hi,

I have the same issue with Ubuntu 17.04, kernel 4.10.0.19. Any suggestions for fixing this problem, besides reducing the speed of the interface?

Roger Techima (techima) wrote :

Hello,

I am having the same problem on an HP EliteDesk 705 G2 Desktop Mini.

I tried the highdma off solution on 14.04, 16.04 and 17.04, but it didn't solve the bug for me.

I am running on a 100Mbit network. On gigabit it seems to work, but I did not test for long enough to be sure.

Best regards,

Roger

Kai-Heng Feng (kaihengfeng) wrote :

FWIW, I can't reproduce the issue on the same chip. I used iperf instead of scp though.

luc (glarage) wrote :

Hi guys,
Not a fix, but it did the trick: add iommu=soft to your GRUB kernel command line.
You will have a fully working Ethernet connection, without reducing the speed.
Mine looks like this: GRUB_CMDLINE_LINUX_DEFAULT="iommu=soft"
Afterwards you have to update GRUB, as usual.
Why?
Because of these lines in dmesg after I updated my BIOS (BIOS L06 v02.28 02/07/2017):

[ 108.769354] psmouse serio1: Wheel Mouse at isa0060/serio1/input0 lost synchronization, throwing 2 bytes away.
[ 108.903961] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 108.945302] tg3 0000:03:00.0 eno1: Link is down
[ 109.305448] AMD-Vi: Event logged [
[ 109.305454] IO_PAGE_FAULT device=03:00.0 domain=0x000d address=0x00000000ffa06e80 flags=0x0020]
[ 109.305459] AMD-Vi: Event logged [
[ 109.305460] IO_PAGE_FAULT device=03:00.0 domain=0x000d address=0x00000000ffa06ec0 flags=0x0020]
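
After editing /etc/default/grub, running update-grub and rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a standard Ubuntu GRUB setup; `cmdline_has` is a hypothetical helper that only inspects the text it is given.

```shell
# Hypothetical helper: succeed if the kernel command line given on stdin
# contains exactly the parameter passed as $1 (for example iommu=soft).
cmdline_has() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Assumed usage after the reboot:
#   cmdline_has iommu=soft < /proc/cmdline && echo "iommu=soft is active"
```

Matching whole words avoids a false positive from a longer parameter that merely contains the same text.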

Yngvi Hrafn Pétursson (skuti) wrote :

Having the same issue on an HP EliteDesk 705 G3 Desktop Mini (W4V44AV)
Broadcom Corporation NetXtreme BCM5762 Gigabit Ethernet PCIe (rev 10) and tg3 module

The error is triggered after the link speed is set or negotiated to 100Mbps,
usually within 15 seconds of pinging over the 100Mbps link.
It works OK with 1Gbps links.

It can be triggered by plugging into a 100Mbps port, changing the switch port to 100Mbps, or:
# ethtool -s eno1 speed 100 duplex full autoneg off

Netboot works until the tg3 module takes over.
Windows works ok.
Tested:
- multiple cables, computers and switch vendors
- upgrading bios
- ethtool disable eee and hardware offload
- ubuntu 12.04 - 17.04
- new kernel linux-generic-hwe-16.04-edge Version: 4.11.0.14.22
- disable power management in bios
- disable power management with grub switches
- iommu=soft iommu=on iommu=off
- disable highdma

None of the workarounds that i found on Google worked for me.
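
Since the failure here depends on the link speed, it can help to record the negotiated speed alongside each test run. A minimal sketch, assuming the interface is eno1 as in the command above; `link_speed` is a hypothetical helper that only parses the `Speed:` line of plain `ethtool <iface>` output.

```shell
# Hypothetical helper: print the negotiated speed (for example 100Mb/s)
# from `ethtool <iface>` output supplied on stdin.
link_speed() {
    awk '$1 == "Speed:" { print $2 }'
}

# Assumed usage (eno1 is the interface name used in this comment):
#   ethtool eno1 | link_speed
```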

modinfo tg3 | grep -v alias
filename: /lib/modules/4.4.0-92-generic/kernel/drivers/net/ethernet/broadcom/tg3.ko
firmware: tigon/tg3_tso5.bin
firmware: tigon/tg3_tso.bin
firmware: tigon/tg3.bin
version: 3.137
license: GPL
description: Broadcom Tigon3 ethernet driver
author: David S. Miller (<email address hidden>) and Jeff Garzik (<email address hidden>)
srcversion: 8C06FB0EBBF221DF79133B9
depends: ptp
intree: Y
vermagic: 4.4.0-92-generic SMP mod_unload modversions
parm: tg3_debug:Tigon3 bitmapped debugging message enable value (int)

Hello, I have seen the exactly same issue, with the exactly same hardware you have: it's the HP EliteDesk 705 G3 Desktop Mini.

I've tested already a ton of options, including recompiling the latest kernel, booting with several parameters, and so on and so forth. Got nothing more than a big headache. I have 100+ machines to install in a month and my team is having a really hard time to deal with this issue.

I have posted my findings on the fog forums. Fog is an open-source cloning tool. Please check it out:

https://forums.fogproject.org/topic/10731/crash-due-to-timeout-in-tg3-kernel-module-tg3_stop_block-timed-out-ofs-4c00-enable_bit-2

Any ideas on this bug? It seems to be related to 10/100 switches. If both ends are gigabit, it works much more reliably. Problems still arise, but much less frequently. With my old "fast ethernet" switch, the problem always happens.

It's lurking anywhere between the binary blob (the firmware), the kernel driver, the hardware or any tricky combination of these. Perhaps related to the AMD platform

I can run tests or gather more data, if it helps. The issue always happens here.
Any ideas on how to solve or workaround this issue? Patches or parameters are welcome...

Regards,
Paulo

Tessio Fechine (tessiof) wrote :

There was a commit to fix something about the BCM5762 variant, but it seems to be restricted to DELL servers..
https://github.com/torvalds/linux/commit/4419bb1cedcda0272e1dc410345c5a1d1da0e367#diff-ee9b0abeec638cc316efd5b30e0e01e8


> On 11 Jan 2018, at 9:23 PM, Tessio Fechine <email address hidden> wrote:
>
> There was a commit to fix something about the BCM5762 variant, but it seems to be restricted to DELL servers..
> https://github.com/torvalds/linux/commit/4419bb1cedcda0272e1dc410345c5a1d1da0e367#diff-ee9b0abeec638cc316efd5b30e0e01e8

Can you try it without the if block?

If you don’t know how to compile kernel, I can build kernel package.


Tessio Fechine (tessiof) wrote :

If you point me to the kernel package I can try it..

Yngvi Hrafn Pétursson (skuti) wrote :

I tested this kernel but was unable to mount the hard disk.
Missing modules for HP EliteDesk 705 G3 Desktop Mini?

Kai-Heng Feng (kaihengfeng) wrote :

Probably. I built a new one, please give it a try:
http://people.canonical.com/~khfeng/lp1447664~2/

Yngvi Hrafn Pétursson (skuti) wrote :

This kernel works on the HP box I have.
Tested with Firefox and speedtest.net.
Tested with iperf3 at 1Gbps, 100Mbps full-duplex and 100Mbps half-duplex.

No timeouts or errors in dmesg :)

Tessio Fechine (tessiof) wrote :

tg3 still crashing..

[ 301.753501] tg3 0000:01:00.0 eno1: Link is up at 100 Mbps, full duplex
[ 301.753546] tg3 0000:01:00.0 eno1: Flow control is off for TX and off for RX
[ 301.753551] tg3 0000:01:00.0 eno1: EEE is disabled
[ 312.032110] NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
[ 312.032190] ------------[ cut here ]------------
[ 312.032208] WARNING: CPU: 1 PID: 0 at /home/khfeng/Sources/linux-lp1447664/net/sched/sch_generic.c:320 dev_watchdog+0x21e/0x230
[ 312.032209] Modules linked in: rfcomm bnep nls_iso8859_1 edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel btusb joydev btrtl btbcm btintel input_leds snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi aes_x86_64 crypto_simd snd_hda_intel snd_hda_codec bluetooth snd_hda_core snd_hwdep glue_helper ecdh_generic cryptd snd_pcm hp_wmi sparse_keymap snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq wmi_bmof mac_hid shpchp snd_seq_device snd_timer fam15h_power k10temp i2c_piix4 snd tpm_infineon soundcore parport_pc ppdev lp parport autofs4 uas usb_storage hid_generic usbhid hid amdkfd amd_iommu_v2 amdgpu i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops tg3 ahci drm ptp libahci pps_core wmi video
[ 312.032305] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.14.0-17-generic #20~lp1447664
[ 312.032307] Hardware name: HP HP EliteDesk 705 G2 MINI/805B, BIOS N26 Ver. 02.11 11/01/2016
[ 312.032310] task: ffff88952c81c500 task.stack: ffff9df2c19c4000
[ 312.032314] RIP: 0010:dev_watchdog+0x21e/0x230
[ 312.032317] RSP: 0018:ffff88953ec83e50 EFLAGS: 00010282
[ 312.032320] RAX: 0000000000000037 RBX: 0000000000000000 RCX: 0000000000000000
[ 312.032322] RDX: 0000000000000000 RSI: ffff88953ec96598 RDI: ffff88953ec96598
[ 312.032323] RBP: ffff88953ec83e80 R08: 0000000000000001 R09: 00000000000003bf
[ 312.032325] R10: ffff88953ec83ee0 R11: 0000000000000000 R12: 0000000000000005
[ 312.032327] R13: 0000000000000001 R14: ffff8895226ea000 R15: ffff889521856d80
[ 312.032330] FS: 0000000000000000(0000) GS:ffff88953ec80000(0000) knlGS:0000000000000000
[ 312.032333] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 312.032334] CR2: 00000000021a0008 CR3: 00000003a6126000 CR4: 00000000001406e0
[ 312.032337] Call Trace:
[ 312.032341] <IRQ>
[ 312.032349] ? qdisc_rcu_free+0x50/0x50
[ 312.032358] call_timer_fn+0x33/0x130
[ 312.032361] run_timer_softirq+0x3fd/0x460
[ 312.032367] ? ktime_get+0x40/0xa0
[ 312.032371] ? lapic_next_event+0x1d/0x30
[ 312.032377] __do_softirq+0xda/0x2a6
[ 312.032382] irq_exit+0xb6/0xc0
[ 312.032385] smp_apic_timer_interrupt+0x69/0x120
[ 312.032388] apic_timer_interrupt+0x9f/0xb0
[ 312.032390] </IRQ>
[ 312.032397] RIP: 0010:cpuidle_enter_state+0xa2/0x2e0
[ 312.032399] RSP: 0018:ffff9df2c19c7e70 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[ 312.032402] RAX: ffff88953eca2c40 RBX: 00000048a68f835f RCX: 000000000000001f
[ 312.032403] RDX: 00000048a68f835f RSI: fffffffb76b082a3 RDI: 0000000000000000
[ 312.032405] RBP: ffff9df2c19c7eb0 R08: 0000000000000858 R09: 0000000000000861
[ 312.032407] R10: ffff9df2c19c7e40 R11: 0000000000000643 R12: ffff8895...

Kai-Heng Feng (kaihengfeng) wrote :

Taking a deeper look, I don't think [1] will help the situation. It is mainly meant to solve an issue with jumbo frames.

I think it's better to ask HP and Broadcom to fix the issue.

[1] https://github.com/torvalds/linux/commit/4419bb1cedcda0272e1dc410345c5a1d1da0e367#diff-ee9b0abeec638cc316efd5b30e0e01e8

Hello, I am still having this bug. I'm working with several HP machines, with the same model as Yngvi. Here it is (from dmesg messages):
Hardware name: HP HP EliteDesk 705 G3 Brazil Desktop Mini/8266, BIOS P26 Ver. 02.03 12/22/2016

Interesting to notice that it always happens with a 10/100 switch, but never occurs with a gigabit one.

I've compiled and tested the 4.15.0-rc8 release candidate, which has the commit 4419bb1cedcda0272e1dc410345c5a1d1da0e367, but it does not solve the issue. I added a few printk calls and can see that the module is correctly compiled and loaded, but my machine is not a Dell. Hence, the "if" condition fails and the body is not executed.

I also tried to force the patch by keeping the "if" body and removing the condition, just to see what happens (with another printk to prove that it runs). The code runs (limiting MRRS to 2048, I think), but it does not solve the bug.
It complains that TSC is unstable, right after tg3 breaks. Here is a dmesg snippet, maybe it helps.

<...>
[ 155.816404] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 155.816447] clocksource: 'refined-jiffies' wd_now: fffdcbf3 wd_last: fffdc110 mask: ffffffff
[ 155.816490] clocksource: 'tsc' cs_now: 7d3f16e620 cs_last: 7b2987b172 mask: ffffffffffffffff
[ 155.816533] tsc: Marking TSC unstable due to clocksource watchdog
[ 155.939181] tg3 0000:01:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 156.103998] tg3 0000:01:00.0 eth0: Link is down
[ 156.322988] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 156.323040] sched_clock: Marking unstable (156322980975, 5436)<-(156582881282, -259894745)
[ 156.323144] clocksource: Switched to clocksource refined-jiffies
<...>

If you want to take a deeper look, there are a few logs here. Tried also with "tsc=unstable" and other boot parameters, mostly to see if any would help (feeling lucky, perhaps?). Nothing changed, the bug is still there. To me, the logs show mostly the same messages.

log_01_acpi_off.txt
https://pastebin.com/FGQNiLqk

log_02_maxcpus_1.txt
https://pastebin.com/2eEJnA3Z

log_03_nmi_watchdog_off.txt
https://pastebin.com/Su44AqiX

log_04_nmi_watchdog_off.txt
https://pastebin.com/4ja0UZ0c

log_05_noapic_nolapic.txt
https://pastebin.com/fZNJbME5
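
To check whether the forced MRRS limit mentioned above actually took effect, the value can be read back from the PCIe capability dump. This is a sketch under assumptions: it relies on lspci printing a `MaxReadReq N bytes` field in its verbose output, uses the 01:00.0 address from the dmesg snippet above, and `max_read_req` is a hypothetical helper name.

```shell
# Hypothetical helper: print the Max Read Request Size in bytes from
# `lspci -vv` output supplied on stdin.
max_read_req() {
    sed -n 's/.*MaxReadReq \([0-9][0-9]*\) bytes.*/\1/p'
}

# Assumed usage (root is usually needed to see the capability fields):
#   lspci -vv -s 01:00.0 | max_read_req
```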

Well, any ideas? I can reproduce the problem 100% of the time. Would you like me to test any other patch?

Kai-Heng Feng, you mention "it's better to ask HP and Broadcom to fix the issue". I agree, but how can we do that?

Thank you,
Paulo

Kai-Heng Feng (kaihengfeng) wrote :

First please file an upstream bug at https://bugzilla.kernel.org/
Product: Drivers
Component: Network

Also, looks like it's a Ubuntu certified hardware, let me ask around.


Hello, I would like to confirm that it's useful to file a new bug for this
issue. For me, the problem I'm having is the same as we are discussing in
this thread. Would it be just a duplicate?

Maybe I'm missing something, because I don't know the details of the bug
hunting process for Ubuntu.

Can you please confirm I should open it?

In this case, I can add a detailed description and dmesg logs, with debug
on and the timeout error message inside.

Anyway, I want to report advances in this problem. I have tested a few
kernels and patches in the last weeks, and have found one combination that
does solve the issue.

I also checked that this patch is not yet merged into the latest vanilla
stable kernel, version 4.15, released three days ago. But it patches and
works also for 4.15, which is just great (at least for me).

Will send the details later (or tomorrow), as soon as I get back to my
computer.

Paulo


Hello, this thread has a patch that solved the bug (for me).
https://<email address hidden>/msg189347.html

The patch is here:
https://<email address hidden>/msg189923/0001-tg3-Add-clock-override-support-for-5762.patch

I tested this patch on the following kernels and situations.

1) Stable kernels 4.13.3 and 4.15 crash without the patch (plus all other versions tested). Patch is not merged yet in the main linux branch, until (and including) 4.15 (stable).

2) Stable kernels 4.13.3 and 4.15 work great with the patch: no timeouts on tg3. Fast transfers on gigabit links and 10/100 links.

3) On Jan 31 (10 days ago) I wrote to the patch author, mentioned my results, and asked when the patch will be merged. Still waiting; the author is probably quite busy at the moment.

4) A lot of tests performed during weeks. The last session took about one or two weeks, working full time, on an isolated network. Using the fog open source cloning solution. Several hundreds of GB transferred during tests, for cloning 100+ machines inside a few labs. Both single and multicast cloning sessions used. Tested with a gigabit switch and also with 10/100 switches. Checked both single and multicast, sequential tests, in parallel, with/without power failures, with/without several patches, in many configurations, with lots of kernel parameters, you name it.

5) The test scenario shows this bug is completely reproducible, 100% of the time. Without the patch, my kernels always fail. Tested about 20 different versions and none worked. With the patch above, the two versions always work correctly.

6) A minor detail: patch has a slight offset for 4.15 (2 lines, probably new comments or code) but works anyway.

This work would be impossible without all the cooperation from the fog team. Sebastian suggested the patch, and others helped a lot. A big "thank you" for them!

I wonder when this will be merged in the main kernel. Please, can anyone help on this?

Regards,
Paulo

Kai-Heng Feng (kaihengfeng) wrote :

Kernel with patch in comment #40. Please try it out.

http://people.canonical.com/~khfeng/lp1447664-clk/

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)

Thank you, we will try it as soon as possible.

Currently I'm on vacation, and will not be able to test it until about
March 5 (2 weeks from now). But as soon as I test it, I'll let you know
about the results.

It would be great if someone else could try it too.

Thanks,
Paulo

marc (boolioncube) wrote :

i recently got one of these EliteDesks. tg3 locks up like once a week; seems to happen when flexget adds a bunch to transmission ... it spikes the TX... and boom. i just installed the patched kernel now. thanks yall.

Ed S (imimimx) wrote :

dpkg: dependency problems prevent configuration of linux-headers-4.13.0-34-generic:
 linux-headers-4.13.0-34-generic depends on libssl1.1 (>= 1.1.0); however:
  Package libssl1.1 is not installed

Is this a dependency version problem on Ubuntu 16.04?

ii libssl-dev:amd64 1.0.2g-1ubuntu4.10 amd64 Secure Sockets Layer toolkit - development files

Kai-Heng Feng (kaihengfeng) wrote :

The kernel was compiled on Bionic, so it has the wrong dependencies for Xenial.
I built a new one, please give it a try:
http://people.canonical.com/~khfeng/lp1447664-xenial/
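
A mismatch like the libssl1.1 one above can usually be spotted before installing, by looking at the Depends field of the package. The following is only a minimal sketch: dep_names is a made-up helper name, not part of dpkg, and it assumes the Depends value has already been read from the .deb (for example with dpkg-deb -f).

```shell
#!/bin/sh
# dep_names is a hypothetical helper: it prints the package names found in a
# Debian "Depends:" field value, one per line. Version constraints such as
# "(>= 1.1.0)" are dropped, and only the first alternative of "a | b" is kept.
#
# Possible use (the .deb file name is a placeholder):
#   for pkg in $(dep_names "$(dpkg-deb -f linux-headers.deb Depends)"); do
#       dpkg -s "$pkg" >/dev/null 2>&1 || echo "missing: $pkg"
#   done
dep_names() {
    printf '%s\n' "$1" |
        tr ',' '\n' |
        sed -E 's/[|].*$//; s/[(][^)]*[)]//g; s/^[[:space:]]+//; s/[[:space:]]+$//' |
        grep -v '^$'
}
```

The loop in the comment only reports whether a package is installed at all; it does not compare versions, so it would catch "libssl1.1 is not installed" but not a too-old version.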

Kai-Heng Feng (kaihengfeng) wrote :

Guys, Broadcom has a new patch [1] that needs testing.
Here's the kernel [2] to try.

[1] https://lkml.org/lkml/2018/3/20/35
[2] https://people.canonical.com/~khfeng/lp1447664-20180320/


Ok, I'll check it out. Thank you very much!

By the way, we downloaded and tested one of the Deb packages you created,
and it worked quite well. I will check exactly which one it was before
reporting (I am almost sure it was the one for Xenial).

We managed to reproduce the issue easily by booting into PXE and, after the
NIC was started (trying to get an IP), resetting the machine and booting into
Ubuntu. This behaves very differently from a cold boot directly into
Ubuntu.

My hypothesis is that PXE sets up the NIC in a non-default way, by changing
one or more of the config bits in some register. The unpatched tg3 driver
does not touch these bits. This would explain why a boot sometimes works:
the tg3 kernel module does not set those values itself, so the outcome
depends on whether PXE code changed them before Linux
was loaded.
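
One way to probe this hypothesis would be to capture the NIC register dump in both situations and compare them. The sketch below assumes the two dumps were saved beforehand with ethtool -d (typically run as root); the interface name, the file names, and the helper name changed_regs are placeholders. It only lists lines that differ and says nothing about which register actually matters.

```shell
#!/bin/sh
# changed_regs is a hypothetical helper: given two saved register dumps,
# print only the lines that differ. Lines from the first file are prefixed
# with "<", lines from the second file with ">".
#
# Possible use (interface and file names are assumptions):
#   ethtool -d eno1 > cold-boot.txt   # after a cold boot straight into Ubuntu
#   ethtool -d eno1 > after-pxe.txt   # after PXE started the NIC, then reset
#   changed_regs cold-boot.txt after-pxe.txt
changed_regs() {
    diff "$1" "$2" | grep '^[<>]'
}
```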

In any case, after running PXE the unpatched kernel breaks very quickly,
while the patched kernel you provided works very well.

I will check your links soon and report our results in the next few days,
hopefully this weekend or next week.

Thank you,
Paulo

Kai-Heng Feng (kaihengfeng) wrote :

Folks,

The tg3 maintainers are waiting for the test results. Hopefully the patch fixes the issue.

luc (glarage) wrote :

Hi Kai-heng,

I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018 x86_64 x86_64 x86_64 GNU/Linux, on Lubuntu 17.10.
I have a Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28 02/07/2017 and Lubuntu is in UEFI mode (my only OS) on this device.
Unfortunately, I have the same problem (tg3 still crashes, a reboot is mandatory):

[ 105.620301] tg3 0000:03:00.0 eno1: 0: Host status block [00000001:000000cc:(0000:002e:0000):(0000:0006)]
[ 105.620309] tg3 0000:03:00.0 eno1: 0: NAPI info [000000cc:000000cc:(0024:0006:01ff):0000:(00f7:0000:0000:0000)]
[ 105.620317] tg3 0000:03:00.0 eno1: 1: Host status block [00000001:00000042:(0000:0000:0000):(0830:0000)]
[ 105.620324] tg3 0000:03:00.0 eno1: 1: NAPI info [00000042:00000042:(0000:0000:01ff):0830:(0030:0030:0000:0000)]
[ 105.620331] tg3 0000:03:00.0 eno1: 2: Host status block [00000001:000000d2:(0fff:0000:0000):(0000:0000)]
[ 105.620370] tg3 0000:03:00.0 eno1: 2: NAPI info [000000d2:000000d2:(0000:0000:01ff):0fff:(07ff:07ff:0000:0000)]
[ 105.755739] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[ 105.797123] tg3 0000:03:00.0 eno1: Link is down
[ 105.889440] tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0x00000000ffe3d640 flags=0x0020]
[ 105.889478] tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000d address=0x00000000ffe3d680 flags=0x0020]
[ 109.932707] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
[ 109.932710] tg3 0000:03:00.0 eno1: Flow control is off for TX and off for RX
[ 109.932711] tg3 0000:03:00.0 eno1: EEE is enabled
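
For anyone comparing their own logs against this one, a small filter can pull out the same failure signatures. This is only a sketch: the function name tg3_failures is made up, and the patterns are simply messages quoted in this thread (tg3_stop_block timed out, the AMD-Vi IO_PAGE_FAULT events, and the NETDEV WATCHDOG transmit queue timeout), not an exhaustive list of tg3 errors.

```shell
#!/bin/sh
# tg3_failures is a hypothetical helper: it reads kernel log text on stdin and
# prints only the lines that match failure messages quoted in this thread.
#
# Possible use:
#   dmesg | tg3_failures
tg3_failures() {
    grep -E 'tg3_stop_block timed out|IO_PAGE_FAULT|NETDEV WATCHDOG: .*\(tg3\)'
}
```

Routine lines such as "Link is down" or "Link is up" are deliberately left out, since they also appear during normal operation.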


We tried this same version yesterday and the bug was still present.
Actually it looked worse, because the machine crashed faster (maybe it was
just an impression). We will collect logs to report this properly soon, in a
few hours.
Paulo

On Fri, Apr 13, 2018, 13:55 luc <email address hidden> wrote:

> ** Attachment added: "Bug tg3"
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+attachment/5114233/+files/Bug%20tg3


Hi Kai-heng,

Here are the test results we got.
Kernel 4.15.0-14-generic failed: the transmit queue timed out. The dmesg output
is attached. The tg3 module crashes within a few seconds of opening
the user session (in less than about 10 seconds).

However, kernel 4.15.0-9-generic worked like a charm. It boots and brings
up tg3, the Ethernet link works, and the module seems stable. We used
it to download a few GB, an Ubuntu image, play videos for a few hours and
the like. Not a single crash was observed. The dmesg output for this
working kernel is also attached, since it might help you sort
out what differs between the two kernels.

Would you like us to test another image, or to gather more information?

Regards,
Paulo


luc (glarage) wrote :

Sorry for the multiple posts, I didn't see the 4.15.0-9 kernel before... :)
tg3 still crashes, but not as early. I played several full-HD videos and ran several speed tests before losing the connection (FTTH here, my download speed is about 290 Mbps).

luc (glarage) wrote :

Hi guys,
A quick report on the new BIOS (2.30) available for the HP EliteDesk 705 G1 SFF/2215: BIOS L06 v02.30 03/22/2018.
It changes nothing for the tg3 driver: it still crashes (without iommu=soft, in my case)... :(

[ 80.864034] ------------[ cut here ]------------
[ 80.864039] NETDEV WATCHDOG: eno1 (tg3): transmit queue 0 timed out
[ 80.864081] WARNING: CPU: 1 PID: 0 at /home/khfeng/Sources/linux-lp1447664-xenial/net/sched/sch_generic.c:323 dev_watchdog+0x222/0x230
[ 80.864082] Modules linked in: nls_iso8859_1 edac_mce_amd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi aesni_intel aes_x86_64 hp_wmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer shpchp snd crypto_simd glue_helper cryptd fam15h_power input_leds serio_raw sparse_keymap soundcore wmi_bmof k10temp tpm_infineon i2c_piix4 mac_hid ip_tables x_tables autofs4 amdkfd amd_iommu_v2 amdgpu chash radeon i2c_algo_bit ttm tg3 ptp psmouse pps_core drm_kms_helper wmi syscopyarea sysfillrect ahci sysimgblt fb_sys_fops libahci drm video
[ 80.864136] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-9-generic #10~lp1447664+xenial
[ 80.864137] Hardware name: Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.30 03/22/2018
[ 80.864141] RIP: 0010:dev_watchdog+0x222/0x230
[ 80.864143] RSP: 0018:ffff9d3caec83e68 EFLAGS: 00010282
[ 80.864146] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 80.864147] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9d3caec96450
[ 80.864149] RBP: ffff9d3caec83e98 R08: 0000000000000001 R09: 00000000000003da
[ 80.864150] R10: 0000000000000000 R11: 00000000000003da R12: 0000000000000005
[ 80.864152] R13: ffff9d3c9b4a4000 R14: ffff9d3c9b4a4478 R15: ffff9d3c9af34d80
[ 80.864154] FS: 0000000000000000(0000) GS:ffff9d3caec80000(0000) knlGS:0000000000000000
[ 80.864156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 80.864158] CR2: 00002547c8b50c00 CR3: 000000022188c000 CR4: 00000000000406e0
[ 80.864160] Call Trace:
[ 80.864163] <IRQ>
[ 80.864168] ? dev_graft_qdisc+0x70/0x70
[ 80.864174] call_timer_fn+0x32/0x140
[ 80.864178] run_timer_softirq+0x1ed/0x440
[ 80.864182] ? ktime_get+0x3e/0xa0
[ 80.864186] ? lapic_next_event+0x20/0x30
[ 80.864192] __do_softirq+0xf2/0x288
[ 80.864196] irq_exit+0xb6/0xc0
[ 80.864200] smp_apic_timer_interrupt+0x71/0x140
[ 80.864204] apic_timer_interrupt+0x9f/0xb0
[ 80.864205] </IRQ>
[ 80.864210] RIP: 0010:cpuidle_enter_state+0xa7/0x300
[ 80.864212] RSP: 0018:ffffbd7700d4fe60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[ 80.864215] RAX: ffff9d3caeca2840 RBX: 0000000000000002 RCX: 000000000000001f
[ 80.864216] RDX: 0000000000000000 RSI: 0000000024a3c7c4 RDI: 0000000000000000
[ 80.864218] RBP: ffffbd7700d4fe98 R08: ffff9d3caeca1664 R09: 0000000000000018
[ 80.864219] R10: ffffbd7700d4fe30 R11: 000000000000011c R12: 0000000000000002
[ 80.864221] R13: ffff9d3ca5f1b000 R14: ffffffffbf3802f8 R15: 00000012d3b48a8f
[ 80.864226] cpuidle_enter+0x17/0x20
[ 80.864230] call_cpuidle+0x23/0x40
[ 80.864233] do_idle+0x197/0x200
[ 80.864236] cpu_start...
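
Since the note above suggests the outcome may depend on whether iommu=soft was passed at boot, it can help to state that explicitly when reporting results. A minimal sketch, assuming a Linux system where the boot parameters are readable from /proc/cmdline (the function name has_iommu_soft is illustrative only):

```shell
#!/bin/sh
# has_iommu_soft is a hypothetical helper: it succeeds if the given kernel
# command line contains the exact parameter "iommu=soft" as a separate word.
#
# Possible use:
#   has_iommu_soft "$(cat /proc/cmdline)" && echo "iommu=soft is set"
has_iommu_soft() {
    case " $1 " in
        *" iommu=soft "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Padding the string with spaces keeps the match exact, so a different parameter that merely ends in the same text (such as amd_iommu=...) is not mistaken for it.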
