14e4:1648 Broadcom NIC (tg3) not working with DMA errors

Bug #961239 reported by Markus Schuster
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

All information should be included in the attached debug information - short summary:
While setting the network interface "up" after boot (ip link set up dev eth0), I get some DMA errors and dumps from tg3. After that the interface is "up", but I'm unable to pass any traffic over that network interface.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-19-generic 3.2.0-19.30
ProcVersionSignature: Ubuntu 3.2.0-19.30-generic 3.2.11
Uname: Linux 3.2.0-19-generic x86_64
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Mar 19 10:43 seq
 crw-rw---T 1 root audio 116, 33 Mar 19 10:43 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 1.94.1-0ubuntu2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Date: Mon Mar 19 10:46:05 2012
HibernationDevice: RESUME=UUID=148f8b55-c516-40b0-98f0-aec3860ab5f3
InstallationMedia: Ubuntu-Server 12.04 LTS "Precise Pangolin" - Alpha amd64 (20120315)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 6189:182d Sitecom USB 2.0 Ethernet
MachineType: IBM IBM x3850-[88631RG]-
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:en
 TERM=linux
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-19-generic root=/dev/mapper/main-root ro
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-19-generic N/A
 linux-backports-modules-3.2.0-19-generic N/A
 linux-firmware 1.71
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 02/15/2008
dmi.bios.vendor: IBM
dmi.bios.version: -[ZUE166AUS-1.12]-
dmi.board.name: Node1 Processor Card
dmi.board.vendor: IBM
dmi.chassis.type: 17
dmi.chassis.vendor: IBM
dmi.modalias: dmi:bvnIBM:bvr-[ZUE166AUS-1.12]-:bd02/15/2008:svnIBM:pnIBMx3850-[88631RG]-:pvr:rvnIBM:rnNode1ProcessorCard:rvr:cvnIBM:ct17:cvr:
dmi.product.name: IBM x3850-[88631RG]-
dmi.sys.vendor: IBM

Markus Schuster (markus-schuster) wrote :
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-19.31)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: New → Confirmed
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-19.31
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Markus Schuster (markus-schuster) wrote : Re: Broadcom NIC (tg3) not working with DMA errors

Still happening with 3.2.0-19.31

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.3 kernel[1] (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-precise/

tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Luis Henriques (henrix) wrote :

After looking at the code, the messages related to the DMA errors and the register dump should be harmless. The tg3 driver dumps these logs while the device is not yet initialised (there are patches, not yet accepted into mainline, that remove these logs when the device is not up yet).

The fact that there is no traffic passing through this interface is more important, and should be a different issue. You seem to have 3 ethernet devices, 2 are using tg3 driver and another usb device. Could you please describe how you're trying to send packets through the interface?
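To check whether the register dumps really appear only before the interface is brought up, it can help to isolate the tg3 messages for a single device from the kernel log. The following is a minimal sketch, not part of the original comment: the function name `tg3_lines` is hypothetical, and the `tg3 <bus-info>: ` prefix it matches is assumed from the kern.log line quoted later in this report.

```shell
# Keep only the tg3 kernel messages for one PCI device.
# Assumption: messages are prefixed with "tg3 <bus-info>: ", as in the
# kern.log line quoted later in this report; other kernels may differ.
tg3_lines() {
    # $1: PCI bus-info of the NIC, e.g. 0000:01:00.0 (shown by "ethtool -i")
    # Reads kernel log text on stdin and prints the matching lines.
    grep -F "tg3 $1: "
}
```

A possible use is `dmesg | tg3_lines 0000:01:00.0`, run once before and once after `ip link set dev eth0 up`, to compare which messages each step produces.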

Markus Schuster (markus-schuster) wrote :

Here's my update:
First of all a new finding: The server has two Broadcom NICs on board - both driven by the tg3 kernel module. eth0 is the one I have trouble with, eth1 works without problems - quite strange...

Second, I've upgraded to 3.2.0-20.32 as I noticed its release - but there's no change in behavior.

Third, I've installed the mainline/vanilla linux package linux-image-3.3.0-030300-generic_3.3.0-030300.201203182135_amd64.deb and it fixes this bug completely - no register dumps, no DMA errors and two working NICs :)

Finally, here are my steps to reproduce the problem of not being able to pass any traffic via the (first) NIC - actually quite straightforward:
1. Set the interface up (ip link set dev eth0 up)
2. Assign an IP address - either manually (ip addr add dev eth0 192.0.0.1/24) or by running dhclient -v eth0 (but dhclient fails to get any DHCP lease)
3. ping a host on the same subnet -> No reply received and no entry in the ARP cache (ip neig)
Oh, and I use the same network cable, connected to the same switch port, for all the tests. So I'm quite sure it's not related to stupid things like a broken network cable or defective switch port.
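The steps above can be sketched as a small script so that the same sequence is run on every kernel under test. This is only an illustration under stated assumptions: the helper names `run` and `repro_tg3` and the `DRY_RUN` switch are hypothetical, and the peer address passed in must be a host that actually exists on the same subnet. The manual-address variant of step 2 is used; `dhclient -v eth0` could be substituted.

```shell
# Sketch of the reproduction steps. Needs root unless DRY_RUN is set.
# run, repro_tg3 and DRY_RUN are illustrative names, not existing tools.

run() {
    # With DRY_RUN set, print the command instead of executing it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro_tg3() {
    # $1: interface, $2: address/prefix to assign, $3: peer on the same subnet
    local iface=$1 addr=$2 peer=$3
    run ip link set dev "$iface" up        # step 1: bring the interface up
    run ip addr add "$addr" dev "$iface"   # step 2: assign an address manually
    run ping -c 3 -I "$iface" "$peer"      # step 3: ping a host on the subnet
    run ip neigh show dev "$iface"         # then inspect the ARP cache
}
```

For example, `DRY_RUN=1 repro_tg3 eth0 192.0.0.1/24 192.0.0.2` prints the four commands for review (192.0.0.2 is a placeholder peer); dropping `DRY_RUN=1` and running as root executes them.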

The USB NIC you saw was just a quick workaround to be able to upload debug information for this bug before I was aware eth1 is actually working.

tags: added: kernel-fixed-upstream
removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-20.32)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-20.32
Luis Henriques (henrix)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Markus Schuster (markus-schuster) wrote : Re: Broadcom NIC (tg3) not working with DMA errors

What's the status of this bug report? Ubuntu 12.04 will be released in less than a month, and it would be a real shame if it didn't work reliably on the market leader's server hardware...

penalvch (penalvch)
tags: added: latest-bios-1.12
removed: kernel-request-3.2.0-19.31 kernel-request-3.2.0-20.32
tags: added: kernel-fixed-upstream-v3.3 needs-reverse-bisect
removed: kernel-fixed-upstream
penalvch (penalvch) wrote :

Markus Schuster, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest LTS development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/ubuntu-server/daily/current/ .

If it remains an issue, could you please just make a comment to this?

summary: - Broadcom NIC (tg3) not working with DMA errors
+ 14e4:1648 Broadcom NIC (tg3) not working with DMA errors
Changed in linux (Ubuntu):
status: Triaged → Incomplete
mkaatman (mkaatman) wrote :

I'm seeing the exact same behavior on a Dell T300 server running Ubuntu 13.10 (3.11.0-15-generic #23-Ubuntu SMP Mon Dec 9 18:17:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux).

eth0 fails, eth1 works flawlessly.

01:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express [14e4:165a]

/var/log/kern.log:Jan 15 20:40:15 kernel: [ 78.704013] tg3 0000:01:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.

ethtool -i eth0
driver: tg3
version: 3.132
firmware-version: 5722-v3.08, ASFIPMI v6.02
bus-info: 0000:01:00.0
supports-statistics: yes

A couple of guys [here](http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=681089) are having the same issue.
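When comparing reports like this one against the original, the PCI ID and the driver and firmware versions are the fields that matter. As a rough sketch, the two helpers below pull them out of output in the formats quoted above; the function names `pci_id` and `ethtool_field` are hypothetical, and the parsing assumes the bracketed `[vendor:device]` form and the `key: value` lines shown in this comment.

```shell
# Extract the [vendor:device] PCI ID from a line in the format quoted above.
pci_id() {
    # Reads the line(s) on stdin, prints e.g. 14e4:165a.
    sed -nE 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p'
}

# Print the value of one "key: value" field from "ethtool -i" output.
ethtool_field() {
    # $1: field name, e.g. driver, version, firmware-version
    awk -v key="$1" 'index($0, key ": ") == 1 {
        print substr($0, length(key) + 3)
        exit
    }'
}
```

For example, `ethtool -i eth0 | ethtool_field version` prints the tg3 driver version, which can then be compared with the version reported as working elsewhere in this thread.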

penalvch (penalvch) wrote :

mkaatman, thank you for your comment. So that your hardware and problem can be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into an Ubuntu repository kernel (not a mainline one):
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Scott Smith (bscott.smith) wrote :

Validated functionality on 19.04 with inbox driver version 3.137
Traffic passes as expected

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Brad Figg (brad-figg)
tags: added: cscc