By the way, we downloaded and tested one of the deb packages you created,
and it worked quite well. I will check exactly which one it was before
reporting (I am almost sure it was the one for xenial).
We managed to reproduce the issue easily by booting into PXE and, after the
NIC was started (while it was trying to get an IP address), resetting the
machine and booting into Ubuntu. There is a huge difference between doing
this and doing a cold boot directly into Ubuntu.
My hypothesis is that PXE sets up the NIC in a non-default way, by changing
one or more config bits in some register, and that the unpatched tg3 driver
never touches those same bits. That would explain why a boot sometimes
works: maybe the tg3 kernel module does not set these bits back to their
default values, so they are only wrong when PXE code has run (and set them)
before Linux is loaded.
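One way to probe this hypothesis would be to save the NIC register dump after a cold boot and again after a PXE-then-reset boot, and then compare the two. The following is only a minimal sketch: it assumes the interface is eth0 and that `ethtool -d` produces a register dump for this NIC, and the helper name and file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: list register-dump lines that differ between two boots.
# Capture each dump as root first (interface and file names are assumptions):
#   ethtool -d eth0 > /tmp/regs-coldboot.txt     # after a cold boot
#   ethtool -d eth0 > /tmp/regs-after-pxe.txt    # after PXE, then reset

# diff_reg_dumps FILE_A FILE_B
# Prints only the changed lines ("<" from FILE_A, ">" from FILE_B).
# Returns 0 if the dumps are identical, 1 if they differ.
diff_reg_dumps() {
    changes=$(diff "$1" "$2" | grep '^[<>]' || true)
    if [ -z "$changes" ]; then
        return 0
    fi
    printf '%s\n' "$changes"
    return 1
}
```

Any registers that show up only in the PXE-then-reset dump would be candidates for the bits the unpatched driver leaves untouched, although a difference alone does not prove it is the cause.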
Anyway, after running PXE, the unpatched kernel breaks very quickly, while
the patched kernel you provided worked very well.
I will check your links soon and come back with our results in the next few
days, hopefully this weekend or next week.
Thank you,
Paulo
On Mar 20, 2018 14:16, "Kai-Heng Feng" <email address hidden> wrote:
Guy, Broadcom has a new patch [1] that needs testing.
Here's the kernel [2] to try.
Title:
14e4:1687 broadcom tg3 network driver disconnects under high load
Status in linux package in Ubuntu:
Triaged
Status in linux package in Debian:
New
Bug description:
The tg3 Broadcom network driver, which binds to the 5762 chipset, goes
offline and is unable to recover (even with the tg3 watchdog timeout) when
network transmit is under high load. Call trace: https://launchpadlibrarian.net/204185480/dmesg
When this happens, usually only a reboot fixes it. Sometimes, however,
bringing the interface down and up again (via ifconfig) restores
networking. I have also tested with the latest tg3 driver (Dec 2014
version) and networking is still problematic. I have also disabled TSO,
GSO, etc. with ethtool and the bug still surfaces.
This bug may be related to the integrated firmware.
Here is the procedure to replicate the issue, since it is hard to
replicate under moderate network load.
1. Boot a machine with a Broadcom 5762 NIC (e.g. HP EliteDesk 705) using
an Ubuntu/Kubuntu Live CD 14.04-15.04.
2. From another machine: start 5 sessions and, in each session, repeatedly
copy (scp with public key authentication) a 70 MB file back and forth to
the tg3 machine. (Not sure if this step is necessary.)
3. Create a 1 GB file on the tg3 machine, with something like:
    dd if=/dev/urandom of=/my/test/file bs=1024 count=$((1024*1000))
4. From another machine: repeatedly scp that 1 GB file from the tg3
machine. This can be done with something like:
    while true; do
        scp -i /my/scp/private.key <email address hidden>:/my/test/file /tmp
    done
Networking will usually go offline in about 10-30 minutes.
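The loop in step 4 can be wrapped so that it stops and reports when a transfer first fails, which makes it easier to compare kernels. This is a minimal sketch only: the function name is made up, and the user, host, key path, and file path in the commented invocation are placeholders rather than real values.

```shell
#!/bin/sh
# run_until_failure CMD [ARGS...]
# Runs the command repeatedly until it exits non-zero, then prints how many
# runs succeeded and how many seconds elapsed since the first run started.
run_until_failure() {
    start=$(date +%s)
    ok=0
    while "$@"; do
        ok=$((ok + 1))
    done
    end=$(date +%s)
    echo "failed after $ok successful runs, $((end - start)) seconds"
}

# Example invocation from the second machine (placeholders, not real values):
# run_until_failure scp -o ConnectTimeout=10 -o ServerAliveInterval=5 \
#     -i /my/scp/private.key user@tg3-host:/my/test/file /tmp
```

The ssh timeout options in the example are there because scp may hang rather than exit when the link drops; whether they are sufficient on a given setup is untested here.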
WORKAROUND: Add a udev rule to make the change (highdma off) permanent, in
/etc/udev/rules.d/80-tg3-fix.rules (the rule must be on a single line):
ACTION=="add", SUBSYSTEM=="net", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x1687", RUN+="/sbin/ethtool -K %k highdma off"
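Before installing the udev rule, the same setting can be tried at runtime and then checked. This is a minimal sketch, assuming the interface is named eth0 and that `ethtool -k` prints a line of the form `highdma: on` or `highdma: off` (possibly followed by `[fixed]`); the helper name is hypothetical.

```shell
#!/bin/sh
# highdma_state: reads `ethtool -k` output on stdin and prints the highdma
# value ("on" or "off"). Returns 1 if no highdma line is found.
highdma_state() {
    awk '$1 == "highdma:" { print $2; found = 1 } END { exit !found }'
}

# Typical use on the affected machine (as root; eth0 is an assumption):
#   ethtool -K eth0 highdma off
#   ethtool -k eth0 | highdma_state
```

A runtime change like this does not survive a reboot, which is why the udev rule is needed to make it permanent.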
Ok, I'll check it out. Thank you very much!
[1] https://lkml.org/lkml/2018/3/20/35
[2] https://people.canonical.com/~khfeng/lp1447664-20180320/
--
You received this bug notification because you are subscribed to the bug
report.
https://bugs.launchpad.net/bugs/1447664
ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: linux-image-3.19.0-15-generic 3.19.0-15.15
ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
Uname: Linux 3.19.0-15-generic x86_64
ApportVersion: 2.17.2-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: kubuntu 3748 F.... pulseaudio
 /dev/snd/controlC0: kubuntu 3748 F.... pulseaudio
CasperVersion: 1.360
Date: Thu Apr 23 11:16:24 2015
IwConfig:
 eth0 no wireless extensions.
 lo no wireless extensions.
LiveMediaBuild: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: Hewlett-Packard HP EliteDesk 705 G1 MT
ProcEnviron:
 LANGUAGE=
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/hostname.seed boot=casper maybe-ubiquity quiet splash ---
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-15-generic N/A
 linux-backports-modules-3.19.0-15-generic N/A
 linux-firmware 1.143
RfKill:
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/22/2014
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: L06 v02.15
dmi.board.asset.tag: 2UA5041TG4
dmi.board.name: 2215
dmi.board.vendor: Hewlett-Packard
dmi.chassis.asset.tag: 2UA5041TG4
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvrL06v02.15:bd10/22/2014:svnHewlett-Packard:pnHPEliteDesk705G1MT:pvr:rvnHewlett-Packard:rn2215:rvr:cvnHewlett-Packard:ct6:cvr:
dmi.product.name: HP EliteDesk 705 G1 MT
dmi.sys.vendor: Hewlett-Packard
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1447664/+subscriptions