Comment 51 for bug 1447664

Hi Kai-heng,

Here are the test results we got.
Kernel 4.15.0-14-generic failed with a transmit queue timeout. The dmesg
output is attached. The tg3 module crashes less than 10 seconds after the
user session opens.

However, kernel 4.15.0-9-generic worked like a charm. It boots and brings
up tg3, the Ethernet link works, and the module seems stable. We tested it
by downloading a few GB of data and an Ubuntu image, playing videos for a
few hours, and the like. We did not observe a single crash. The dmesg
output for this working kernel is also attached, since it may help you
sort out what differs between the two kernels.
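
In case it helps with that comparison, here is a minimal sketch of one way to
diff the tg3 messages of two saved dmesg logs. The function names tg3_lines
and compare_tg3 and the file names in the usage line are made up for
illustration; they are not the names of the attachments. The sketch keeps
only the lines mentioning tg3, strips the leading timestamp so that timing
alone does not show up as a difference, and diffs the result.

```shell
# Hypothetical helper: print only the tg3 lines of a saved dmesg log,
# with the leading "[ 105.620301] " style timestamp removed.
tg3_lines() {
    grep 'tg3' "$1" | sed 's/^\[ *[0-9][0-9]*\.[0-9][0-9]*] *//'
}

# Hypothetical helper: diff the tg3 lines of two logs.
# Returns 0 if they match, 1 if they differ (same convention as diff).
compare_tg3() {
    good_tmp=$(mktemp) || return 2
    bad_tmp=$(mktemp) || { rm -f "$good_tmp"; return 2; }
    tg3_lines "$1" > "$good_tmp"
    tg3_lines "$2" > "$bad_tmp"
    status=0
    diff "$good_tmp" "$bad_tmp" || status=$?
    rm -f "$good_tmp" "$bad_tmp"
    return "$status"
}

# Usage (placeholder file names):
#   compare_tg3 dmesg-working.txt dmesg-failing.txt
```

Lines prefixed with "<" come from the first log and lines prefixed with ">"
from the second, so driver messages that only appear on the failing kernel
stand out.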

Would you like us to test another image or gather more information?

Regards,
Paulo

On Fri, Apr 13, 2018, 14:03 Paulo Guedes - IFPE - Campus Recife <
<email address hidden>> wrote:

> We tried this same version yesterday and the bug was still present.
> Actually it looked worse, because the machine crashed faster (though that
> may have been just an impression). We will collect logs to report this
> properly soon, in a few hours.
> Paulo
>
> On Fri, Apr 13, 2018, 13:55 luc <email address hidden> wrote:
>
>> Hi Kai-heng,
>>
>> I tried 4.15.0-14-generic #15~lp1447664 SMP Tue Mar 20 14:31:37 CST 2018
>> x86_64 x86_64 x86_64 GNU/Linux, on Lubuntu 17.10.
>> I have a Hewlett-Packard HP EliteDesk 705 G1 SFF/2215, BIOS L06 v02.28
>> 02/07/2017 and Lubuntu is in UEFI mode (my only OS) on this device.
>> Unfortunately, I have the same problem (tg3 still crashes, and a reboot
>> is mandatory).
>>
>> [ 105.620301] tg3 0000:03:00.0 eno1: 0: Host status block
>> [00000001:000000cc:(0000:002e:0000):(0000:0006)]
>> [ 105.620309] tg3 0000:03:00.0 eno1: 0: NAPI info
>> [000000cc:000000cc:(0024:0006:01ff):0000:(00f7:0000:0000:0000)]
>> [ 105.620317] tg3 0000:03:00.0 eno1: 1: Host status block
>> [00000001:00000042:(0000:0000:0000):(0830:0000)]
>> [ 105.620324] tg3 0000:03:00.0 eno1: 1: NAPI info
>> [00000042:00000042:(0000:0000:01ff):0830:(0030:0030:0000:0000)]
>> [ 105.620331] tg3 0000:03:00.0 eno1: 2: Host status block
>> [00000001:000000d2:(0fff:0000:0000):(0000:0000)]
>> [ 105.620370] tg3 0000:03:00.0 eno1: 2: NAPI info
>> [000000d2:000000d2:(0000:0000:01ff):0fff:(07ff:07ff:0000:0000)]
>> [ 105.755739] tg3 0000:03:00.0: tg3_stop_block timed out, ofs=4c00
>> enable_bit=2
>> [ 105.797123] tg3 0000:03:00.0 eno1: Link is down
>> [ 105.889440] tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
>> domain=0x000d address=0x00000000ffe3d640 flags=0x0020]
>> [ 105.889478] tg3 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT
>> domain=0x000d address=0x00000000ffe3d680 flags=0x0020]
>> [ 109.932707] tg3 0000:03:00.0 eno1: Link is up at 1000 Mbps, full duplex
>> [ 109.932710] tg3 0000:03:00.0 eno1: Flow control is off for TX and off
>> for RX
>> [ 109.932711] tg3 0000:03:00.0 eno1: EEE is enabled
>>
>> ** Attachment added: "Bug tg3"
>>
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+attachment/5114233/+files/Bug%20tg3
>>
>> https://bugs.launchpad.net/bugs/1447664
>>
>> Title:
>> 14e4:1687 broadcom tg3 network driver disconnects under high load
>>
>> Status in linux package in Ubuntu:
>> Triaged
>> Status in linux package in Debian:
>> New
>>
>> Bug description:
>> The tg3 Broadcom network driver, which binds to the 5762 chipset, goes
>> offline and is unable to recover (even with the tg3 watchdog timeout)
>> when network transmit is under high load. Call trace:
>> https://launchpadlibrarian.net/204185480/dmesg
>>
>> When this happens, usually only a reboot fixes it. Sometimes,
>> however, bringing the interface down and back up (via ifconfig)
>> recovers networking. I've also tested with the latest tg3 driver
>> (Dec 2014 version) and networking is still problematic. I have also
>> disabled TSO, GSO, etc. with ethtool and the bug still surfaces.
>> This bug may be related to the integrated firmware.
>>
>> Here is the procedure to replicate the issue, which is hard to
>> trigger under moderate network load.
>>
>> 1. Boot a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705)
>> using an Ubuntu/Kubuntu Live CD 14.04-15.04.
>> 2. From another machine: start 5 sessions and, in each session,
>> repeatedly copy (scp with public key authentication) a 70 MB file back
>> and forth to the tg3 machine. (Not sure if this is necessary.)
>> 3. Create a 1GB file on the tg3 machine, with something like dd
>> if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))
>> 4. From another machine: repeatedly scp that 1GB file from the
>> tg3 machine. This can be done with something like:
>>
>> while true; do
>> scp -i /my/scp/private.key <email address hidden>:/my/test/file /tmp
>> done
>>
>> Networking will usually go offline within about 10-30 minutes.
>>
>> WORKAROUND: Turn off highdma with ethtool. To make the change permanent,
>> add a udev rule in /etc/udev/rules.d/80-tg3-fix.rules :
>> ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4",
>> ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"
>>
>> ProblemType: Bug
>> DistroRelease: Ubuntu 15.04
>> Package: linux-image-3.19.0-15-generic 3.19.0-15.15
>> ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
>> Uname: Linux 3.19.0-15-generic x86_64
>> ApportVersion: 2.17.2-0ubuntu1
>> Architecture: amd64
>> AudioDevicesInUse:
>> USER PID ACCESS COMMAND
>> /dev/snd/controlC1: kubuntu 3748 F.... pulseaudio
>> /dev/snd/controlC0: kubuntu 3748 F.... pulseaudio
>> CasperVersion: 1.360
>> Date: Thu Apr 23 11:16:24 2015
>> IwConfig:
>> eth0 no wireless extensions.
>>
>> lo no wireless extensions.
>> LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
>> MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
>> ProcEnviron:
>> LANGUAGE=
>> TERM=xterm
>> PATH=(custom, no user)
>> LANG=en_US.UTF-8
>> SHELL=/bin/bash
>> ProcFB: 0 radeondrmfb
>> ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi
>> file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash
>> ---
>> PulseList:
>> Error: command ['pacmd', 'list'] failed with exit code 1: Home
>> directory not accessible: Permission denied
>> No PulseAudio daemon running, or not running as session daemon.
>> RelatedPackageVersions:
>> linux-restricted-modules-3.19.0-15-generic N/A
>> linux-backports-modules-3.19.0-15-generic N/A
>> linux-firmware 1.143
>> RfKill:
>>
>> SourcePackage: linux
>> UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
>> UpgradeStatus: No upgrade log present (probably fresh install)
>> dmi.bios.date: 10/22/2014
>> dmi.bios.vendor: Hewlett-Packard
>> dmi.bios.version: L06 v02.15
>> dmi.board.asset.tag: 2UA5041TG4
>> dmi.board.name: 2215
>> dmi.board.vendor: Hewlett-Packard
>> dmi.chassis.asset.tag: 2UA5041TG4
>> dmi.chassis.type: 6
>> dmi.chassis.vendor: Hewlett-Packard
>> dmi.modalias:
>> dmi:bvnHewlett-Packard:bvrL06v02.15:bd10/22/2014:svnHewlett-Packard:pnHPEliteDesk705G1MT:pvr:rvnHewlett-Packard:rn2215:rvr:cvnHewlett-Packard:ct6:cvr:
>> dmi.product.name: HP EliteDesk 705 G1 MT
>> dmi.sys.vendor: Hewlett-Packard
>>
>