tg3 transmit timeout

Bug #294092 reported by oliford
This bug affects 12 people
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image-2.6.27-7-generic

I recently installed Xubuntu Intrepid Ibex 8.10 x64 (fresh install). The Ethernet driver periodically stops working and the NETDEV watchdog writes a warning to dmesg. I've been using Xubuntu on this machine through Feisty, Gutsy and Hardy and had not seen the problem before.

This morning (5/11/8) I ran an apt-get upgrade which upgraded linux-image-2.6.27-7-generic but the problem remains. The package is now on version linux-image-2.6.27-7.16.

The relevant dmesg entries:
[ 1086.816047] ------------[ cut here ]------------
[ 1086.816057] WARNING: at /build/buildd/linux-2.6.27/net/sched/sch_generic.c:219 dev_watchdog+0x272/0x280()
[ 1086.816062] NETDEV WATCHDOG: eth0 (tg3): transmit timed out
[ 1086.816065] Modules linked in: af_packet i915 drm rfcomm bridge stp bnep sco l2cap bluetooth ppdev ipv6 acpi_cpufreq cpufreq_powersave cpufreq_stats cpufreq_ondemand freq_table cpufreq_userspace cpufreq_conservative sbs pci_slot sbshc iptable_filter ip_tables x_tables sbp2 lp joydev pcmcia snd_hda_intel snd_pcm_oss snd_mixer_oss arc4 ecb crypto_blkcipher snd_pcm pcspkr evdev psmouse iwl3945 serio_raw snd_seq_dummy rfkill snd_seq_oss mac80211 parport_pc led_class snd_seq_midi parport cfg80211 snd_rawmidi snd_seq_midi_event tpm_infineon tpm snd_seq tpm_bios video container sdhci_pci yenta_socket snd_timer output sdhci rsrc_nonstatic snd_seq_device pcmcia_core tifm_7xx1 mmc_core wmi tifm_core snd ac battery iTCO_wdt soundcore button iTCO_vendor_support intel_agp snd_page_alloc shpchp pci_hotplug ext3 jbd mbcache sr_mod cdrom ata_generic sd_mod crc_t10dif sg ata_piix ahci pata_acpi libata scsi_mod dock tg3 libphy ohci1394 ieee1394 ehci_hcd uhci_hcd usbcore thermal processor fan fbcon tileblit font bitblit softcursor fuse
[ 1086.816233] Pid: 0, comm: swapper Not tainted 2.6.27-7-generic #1
[ 1086.816237]
[ 1086.816238] Call Trace:
[ 1086.816241] <IRQ> [<ffffffff8024e91c>] warn_slowpath+0xbc/0xf0
[ 1086.816256] [<ffffffff8023e823>] ? __enqueue_entity+0x93/0xa0
[ 1086.816262] [<ffffffff8023fd09>] ? enqueue_entity+0xd9/0x260
[ 1086.816268] [<ffffffff80246397>] ? enqueue_task_fair+0x57/0x60
[ 1086.816274] [<ffffffff8023c390>] ? enqueue_task+0x50/0x60
[ 1086.816279] [<ffffffff80245e4d>] ? resched_task+0x2d/0x90
[ 1086.816285] [<ffffffff802473ee>] ? try_to_wake_up+0x11e/0x2e0
[ 1086.816293] [<ffffffff80273ff4>] ? timer_stats_update_stats+0x24/0x370
[ 1086.816300] [<ffffffff80267066>] ? autoremove_wake_function+0x16/0x40
[ 1086.816307] [<ffffffff80253e79>] ? set_normalized_timespec+0x9/0x90
[ 1086.816313] [<ffffffff803a89fa>] ? strlcpy+0x4a/0x60
[ 1086.816318] [<ffffffff8047b942>] dev_watchdog+0x272/0x280
[ 1086.816326] [<ffffffff802631c1>] ? __queue_work+0x41/0x50
[ 1086.816332] [<ffffffff8022c71e>] ? hpet_legacy_next_event+0xe/0x80
[ 1086.816338] [<ffffffff8047b6d0>] ? dev_watchdog+0x0/0x280
[ 1086.816343] [<ffffffff8025a019>] run_timer_softirq+0x179/0x260
[ 1086.816349] [<ffffffff8027273d>] ? tick_handle_oneshot_broadcast+0xed/0x100
[ 1086.816356] [<ffffffff80254d8c>] __do_softirq+0x8c/0x100
[ 1086.816362] [<ffffffff8021417c>] call_softirq+0x1c/0x30
[ 1086.816367] [<ffffffff80215875>] do_softirq+0x65/0xa0
[ 1086.816373] [<ffffffff80254af5>] irq_exit+0x95/0xa0
[ 1086.816377] [<ffffffff80215b1b>] do_IRQ+0x8b/0x100
[ 1086.816382] [<ffffffff80212f0e>] ret_from_intr+0x0/0x29
[ 1086.816386] <EOI> [<ffffffffa003adc4>] ? acpi_idle_enter_simple+0x16b/0x1aa [processor]
[ 1086.816416] [<ffffffffa003adbc>] ? acpi_idle_enter_simple+0x163/0x1aa [processor]
[ 1086.816424] [<ffffffff8044bb59>] ? cpuidle_idle_call+0xb9/0x100
[ 1086.816430] [<ffffffff80210e95>] ? cpu_idle+0x75/0x110
[ 1086.816436] [<ffffffff804f0446>] ? rest_init+0x66/0x70
[ 1086.816441]
[ 1086.816444] ---[ end trace 0b1fb181c0127e1a ]---
[ 1086.816448] tg3: eth0: transmit timed out, resetting
[ 1086.816457] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
[ 1086.816464] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
[ 1086.918338] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[ 1087.019699] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[ 1087.144247] tg3: eth0: Link is down.
[ 1089.202947] tg3: eth0: Link is up at 100 Mbps, full duplex.
[ 1089.202957] tg3: eth0: Flow control is on for TX and on for RX.

My system is an HP Compaq nc6320 laptop (Intel Core 2 Duo, 64-bit) with a BCM5788 Ethernet device (from lspci):
02:0e.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet (rev 03)

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: New → Triaged

breusz (breusz) wrote :

Nov 24 18:50:43 zilla kernel: [16085.000016] ------------[ cut here ]------------
Nov 24 18:50:43 zilla kernel: [16085.000020] WARNING: at /build/buildd/linux-2.6.27/net/sched/sch_generic.c:219 dev_watchdog+0x21a/0x230()
Nov 24 18:50:43 zilla kernel: [16085.000022] NETDEV WATCHDOG: eth1 (sky2): transmit timed out
Nov 24 18:50:43 zilla kernel: [16085.000024] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack af_packet nfsd auth_rpcgss exportfs nfs lockd nfs_acl sunrpc ipv6 iptable_filter ip_tables x_tables coretemp w83627ehf hwmon_vid sbp2 parport_pc lp parport loop button iTCO_wdt iTCO_vendor_support shpchp pci_hotplug pcspkr evdev snd_hda_intel i82975x_edac edac_core snd_pcm snd_timer snd soundcore snd_page_alloc ext3 jbd mbcache sd_mod crc_t10dif ata_piix sg ata_generic pata_jmicron ahci pata_acpi ehci_hcd ohci1394 uhci_hcd ieee1394 libata scsi_mod usbcore dock sky2 dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse
Nov 24 18:50:43 zilla kernel: [16085.000070] Pid: 0, comm: swapper Not tainted 2.6.27-7-server #1
Nov 24 18:50:43 zilla kernel: [16085.000072] [<c01393c5>] warn_slowpath+0x65/0x90
Nov 24 18:50:43 zilla kernel: [16085.000077] [<c012c138>] ? enqueue_entity+0xd8/0x2f0
Nov 24 18:50:43 zilla kernel: [16085.000081] [<c0131d88>] ? enqueue_task_fair+0x48/0x50
Nov 24 18:50:43 zilla kernel: [16085.000084] [<c0128617>] ? enqueue_task+0x57/0x70
Nov 24 18:50:43 zilla kernel: [16085.000086] [<c0132b66>] ? try_to_wake_up+0xd6/0x290
Nov 24 18:50:43 zilla kernel: [16085.000089] [<c01560bb>] ? getnstimeofday+0x4b/0x100
Nov 24 18:50:43 zilla kernel: [16085.000093] [<c013dfe6>] ? set_normalized_timespec+0x16/0x90
Nov 24 18:50:43 zilla kernel: [16085.000096] [<c015bde7>] ? timer_stats_update_stats+0x17/0x250
Nov 24 18:50:43 zilla kernel: [16085.000100] [<c025cd59>] ? strlen+0x9/0x20
Nov 24 18:50:43 zilla kernel: [16085.000103] [<c025adbd>] ? strlcpy+0x1d/0x60
Nov 24 18:50:43 zilla kernel: [16085.000106] [<c02ff2c7>] ? netdev_drivername+0x37/0x40
Nov 24 18:50:43 zilla kernel: [16085.000109] [<c0313f5a>] dev_watchdog+0x21a/0x230
Nov 24 18:50:43 zilla kernel: [16085.000112] [<c0158400>] ? clocksource_watchdog+0x220/0x280
Nov 24 18:50:43 zilla kernel: [16085.000115] [<c0143a68>] run_timer_softirq+0x138/0x210
Nov 24 18:50:43 zilla kernel: [16085.000117] [<c0313d40>] ? dev_watchdog+0x0/0x230
Nov 24 18:50:43 zilla kernel: [16085.000120] [<c0313d40>] ? dev_watchdog+0x0/0x230
Nov 24 18:50:43 zilla kernel: [16085.000122] [<c013ec12>] __do_softirq+0x92/0x120
Nov 24 18:50:43 zilla kernel: [16085.000125] [<c013ecfd>] do_softirq+0x5d/0x60
Nov 24 18:50:43 zilla kernel: [16085.000127] [<c013ee75>] irq_exit+0x55/0x90
Nov 24 18:50:43 zilla kernel: [16085.000129] [<c011991d>] smp_apic_timer_interrupt+0x5d/0x90
Nov 24 18:50:43 zilla kernel: [16085.000133] [<c010aa94>] apic_timer_interrupt+0x28/0x30
Nov 24 18:50:43 zilla kernel: [16085.000136] [<c011076a>] ? mwait_idle+0x4a/0x50
Nov 24 18:50:43 zilla kernel: [16085.000139] [<c010888d>] cpu_idle+0x7d/0x140
Nov 24 18:50:43 zilla kernel: [16085.000142] [<c037c333>] rest_init+0x53/0x60
Nov 24 18:50:43 zilla kern...

Michael Gichoga (mgichoga) wrote :

I can also reproduce this problem consistently when I start an rsync transfer.

Machine: HP Compaq nc6120

#uname -a
2.6.27-10-generic #1 SMP Fri Nov 21 12:00:22 UTC 2008 i686 GNU/Linux

#modinfo tg3
filename: /lib/modules/2.6.27-10-generic/kernel/drivers/net/tg3.ko
version: 3.94
license: GPL
description: Broadcom Tigon3 ethernet driver

#dmesg
[ 1286.000061] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
[ 1286.000071] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
[ 1286.101209] tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
[ 1286.202722] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[ 1286.303920] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[ 1286.427307] tg3: eth0: Link is down.

Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Stefan Friesel (stefan-friesel) wrote : Re: tg3 transmit timeout in Intrepid

I confirm this bug on HP Compaq 6715s

#uname -a
2.6.27-11-generic #1 SMP Thu Jan 29 19:28:32 UTC 2009 x86_64 GNU/Linux

#modinfo tg3
filename: /lib/modules/2.6.27-11-generic/kernel/drivers/net/tg3.ko
version: 3.94

#kern.log
kernel: [108770.805048] tg3: eth0: transmit timed out, resetting
kernel: [108770.805112] tg3: DEBUG: MAC_TX_STATUS[0000000b] MAC_RX_STATUS[00000000]
kernel: [108770.805123] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000000]
VitaminC kernel: [108770.943119] tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
VitaminC kernel: [108771.081605] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
VitaminC kernel: [108771.219667] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
VitaminC kernel: [108771.357606] tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
VitaminC kernel: [108771.367804] tg3: eth0: Link is down.

Leann Ogasawara (leannogasawara) wrote :

Just curious if anyone would be willing to confirm the issue exists with the latest pre-release of Jaunty 9.04, currently Beta - http://www.ubuntu.com/testing/jaunty/beta . It contains a 2.6.28 based kernel. Please let us know if this issue remains if you do happen to test. Thanks.

Stefan Friesel (stefan-friesel) wrote :

This still happens on current jaunty here.

Alexander Egger (alexander-egger-deactivatedaccount) wrote :

Reproduced on HP nc6120 Jaunty 9.04 (as of 2009-04-22).

Happens during an rsync.

Apr 22 17:17:32 upper-eggeral kernel: [10515.000050] ------------[ cut here ]------------
Apr 22 17:17:32 upper-eggeral kernel: [10515.000059] WARNING: at /build/buildd/linux-2.6.28/net/sched/sch_generic.c:226 dev_watchdog+0x219/0x230()
Apr 22 17:17:32 upper-eggeral kernel: [10515.000065] NETDEV WATCHDOG: eth0 (tg3): transmit timed out
Apr 22 17:17:32 upper-eggeral kernel: [10515.000070] Modules linked in: sha1_generic ppp_mppe ppp_async crc_ccitt michael_mic arc4 ecb ieee80211_crypt_tkip aes_i586 aes_generic ieee80211_crypt_ccmp binfmt_misc i915 drm bridge stp bnep input_polldev ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi sbp2 lp pata_pcmcia joydev pcmcia snd_usb_audio snd_usb_lib snd_intel8x0 snd_hwdep snd_ac97_codec ac97_bus snd_pcm_oss snd_seq_dummy snd_mixer_oss snd_seq_oss usblp tifm_7xx1 snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_seq_device psmouse asix usbnet iTCO_wdt iTCO_vendor_support snd_pcm usbhid tifm_core serio_raw pcspkr ipw2200 mii yenta_socket rsrc_nonstatic pcmcia_core sdhci_pci sdhci btusb ppdev uss720 pl2303 intel_agp snd_timer snd_page_alloc snd soundcore ieee80211 ieee80211_crypt agpgart parport_pc parport video output tg3 ohci1394 ieee1394 fbcon tileblit font bitblit softcursor
Apr 22 17:17:32 upper-eggeral kernel: [10515.000225] Pid: 0, comm: swapper Not tainted 2.6.28-11-generic #42-Ubuntu
Apr 22 17:17:32 upper-eggeral kernel: [10515.000230] Call Trace:
Apr 22 17:17:32 upper-eggeral kernel: [10515.000242] [<c0139ab0>] warn_slowpath+0x60/0x80
Apr 22 17:17:32 upper-eggeral kernel: [10515.000252] [<c012c6fc>] ? enqueue_entity+0x13c/0x360
Apr 22 17:17:32 upper-eggeral kernel: [10515.000260] [<c0132131>] ? enqueue_task_fair+0x31/0x70
Apr 22 17:17:32 upper-eggeral kernel: [10515.000267] [<c01287a7>] ? enqueue_task+0x57/0x70
Apr 22 17:17:32 upper-eggeral kernel: [10515.000274] [<c0133b74>] ? try_to_wake_up+0x104/0x290
Apr 22 17:17:32 upper-eggeral kernel: [10515.000283] [<c02cb03d>] ? strlcpy+0x1d/0x60
Apr 22 17:17:32 upper-eggeral kernel: [10515.000292] [<c04312f2>] ? netdev_drivername+0x32/0x40
Apr 22 17:17:32 upper-eggeral kernel: [10515.000300] [<c0445e49>] dev_watchdog+0x219/0x230
Apr 22 17:17:32 upper-eggeral kernel: [10515.000309] [<c014b791>] ? __queue_work+0x31/0x40
Apr 22 17:17:32 upper-eggeral kernel: [10515.000318] [<c0143b00>] run_timer_softirq+0x130/0x200
Apr 22 17:17:32 upper-eggeral kernel: [10515.000325] [<c0445c30>] ? dev_watchdog+0x0/0x230
Apr 22 17:17:32 upper-eggeral kernel: [10515.000332] [<c0445c30>] ? dev_watchdog+0x0/0x230
Apr 22 17:17:32 upper-eggeral kernel: [10515.000341] [<c013f197>] __do_softirq+0x97/0x170
Apr 22 17:17:32 upper-eggeral kernel:...

Ian (ibatterb) wrote :

2.6.28-15-generic #49-Ubuntu (no apt-get updates pending)

I can reproduce this on demand every time I connect to vino-server from another host. Fault occurs approximately 5-10 seconds after the initial connection, during the first screen draw.

Because of the nature of the fault (dying when full-mtu packets are sent), I tried it again with a lower MTU, and /can not get it to hang/ when MTU is set to 1400. I'll experiment and see if I can find the magic number that makes it hang again.

Ian (ibatterb) wrote :

Testing shows that an MTU of 1498 bytes or larger causes this issue. Smaller values do not.

Hopefully this will help point someone in the right direction in the driver.

This is an HP laptop, with the following NIC:

02:0e.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet (rev 03)
        Subsystem: Hewlett-Packard Company Device 30aa
        Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 16
        Memory at e8110000 (32-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data <?>
        Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
        Kernel driver in use: tg3
        Kernel modules: tg3

Ian (ibatterb) wrote :

There's something else going on here. The 1497 MTU that worked for me two hours ago no longer works, and neither does 1496. Further testing won't be possible for several hours, as I took down the Ethernet interface remotely by accident. Oops.

Perhaps the act of changing the MTU is a factor?

Michael Gichoga (mgichoga) wrote :

Ian,

What model of HP do you have? I could reproduce this issue consistently with the HP Compaq nc6120. However, the motherboard had problems with the USB ports (all four models in stock had this problem), and now I have an nc6220 and do not have this problem. I have not looked at the changelogs for the tg3 driver in the new kernel release.

Ian (ibatterb) wrote :

I have an nx6320.

The only hardware issue I'm aware of on this machine (I've had it for 18 months now) is that the USB ports on the docking station are problematic after loss of power (you have to remove all USB devices and plug them in again for them to be recognised). As that happens under Windows as well as Linux, I don't think it's related to this bug.

Brendan_P (brendan-p) wrote :

Hi All,

I use the tg3 driver on a Dell Studio XPS 16;

"Tigon3 [partno(BCM95784M) rev 5784100 PHY(5784)] (PCI Express) 10/100/1000Base-T Ethernet"

I'm hoping this issue is related to a long-standing and completely baffling issue I have. I cannot maintain steady throughput over the LAN device; wireless is fine and other machines are fine, so I suspect this driver/card is the culprit.

A screenshot is attached, lshw to follow.

Here is my post on the matter: http://ubuntuforums.org/showthread.php?t=1173499

If it's not related then I'll open another bug.

Thanks in advance for your time and effort.

Regards
Brendan

Brendan_P (brendan-p) wrote :

lshw attached.

Ian (ibatterb) wrote :

It seems that the problem is specific to situations where the driver has to transmit full size packets, but does not occur when it receives them.

This was evidenced by a problem I just experienced where I was able to scp a file onto the laptop, but was not able to scp it back off again (just got timeouts). I was able to work around the fault by reducing the MTU on the laptop to 1496.
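
The MTU workaround described here can be sketched in shell roughly as follows. This is only an illustration, not a fix: the helper names `current_mtu` and `lower_mtu` and the `SYSFS_NET` override are assumptions added for the example, while `eth0` and the value 1496 come from the comment above. Changing the MTU needs root.

```shell
# Print the current MTU of an interface by reading sysfs.
# SYSFS_NET is a hypothetical override so the function can be pointed at a
# scratch directory for testing; it defaults to the real sysfs path.
current_mtu() {
    cat "${SYSFS_NET:-/sys/class/net}/$1/mtu"
}

# Lower the MTU of interface $1 to $2, but only if it is currently higher.
# Runs "ip link set", so it needs root on a real system.
lower_mtu() {
    if [ "$(current_mtu "$1")" -gt "$2" ]; then
        ip link set dev "$1" mtu "$2"
    fi
}
```

Called as `lower_mtu eth0 1496` from a root shell. A value set this way does not survive a reboot.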

Ian (ibatterb) wrote :

Further, returning the MTU to 1500 causes the problem to return. I then tested turning tso off "ethtool -K eth0 tso off", as suggested in other threads for this problem, and that made no difference.

Brendan_P (brendan-p) wrote :

Seems I may have another issue. Changing the MTU has no effect on my issue. I'll look at opening a new bug later this evening.

Cheers
B

david.barbion (david-barbion) wrote :

Hi all,

I have the same problem here with an HP NC6000 with the following Broadcom network adapter:
02:0e.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5705M_2 Gigabit Ethernet (rev 03)
 Subsystem: Hewlett-Packard Company Device 0890
 Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 11
 Memory at 90000000 (64-bit, non-prefetchable) [size=64K]
 [virtual] Expansion ROM at 6c000000 [disabled] [size=64K]
 Capabilities: <access denied>
 Kernel driver in use: tg3
 Kernel modules: tg3

Here are my observations:
Sometimes the network link stays up (ethtool confirms it) but incoming packets seem to be discarded. In this case, rmmod && modprobe fixes it.
But sometimes the computer freezes completely (even magic SysRq doesn't help).

I also saw that it doesn't depend on packet size but on network bandwidth usage:
Full-size packets on my LAN cause the problem (~100MBPs), whereas full-size packets from the internet work OK (~3MBPS).
MTU is still at 1500 (when I reduce this value, the network is unresponsive).
Finally, I saw that sending and receiving packets causes lots of IO (what I mean is that it takes the remaining idle CPU).
Could it be an IRQ problem? Here is "cat /proc/interrupts":
           CPU0
  0: 404858 XT-PIC-XT timer
  1: 4171 XT-PIC-XT i8042
  2: 0 XT-PIC-XT cascade
  3: 3 XT-PIC-XT
  4: 3 XT-PIC-XT
  5: 3 XT-PIC-XT
  7: 0 XT-PIC-XT parport0
  8: 0 XT-PIC-XT rtc0
  9: 2099 XT-PIC-XT acpi
 10: 39337 XT-PIC-XT ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, yenta, yenta, yenta, radeon@pci:0000:01:00.0
 11: 190764 XT-PIC-XT ath, eth0, Intel 82801DB-ICH4
 12: 122662 XT-PIC-XT i8042
 14: 85470 XT-PIC-XT ata_piix
 15: 5669 XT-PIC-XT ata_piix

where you can see that IRQ 11 is shared. Wireless works OK.
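
One way to check the shared-IRQ suspicion is to filter `/proc/interrupts` for lines where the interface shares its IRQ with other devices. A rough sketch, assuming that devices on one IRQ line are separated by commas (as in the listing above); the function name `shared_irq_lines` is made up for illustration.

```shell
# Print /proc/interrupts lines (read from stdin) on which device $1 shares
# its IRQ with at least one other device. Heuristic: a comma in the device
# list means more than one handler is registered on that IRQ.
shared_irq_lines() {
    grep -E '^ *[0-9]+:' | grep ',' | grep -w -- "$1"
}
```

For example, `shared_irq_lines eth0 < /proc/interrupts` prints nothing when eth0 has an IRQ to itself. A shared IRQ is not proof of a problem, only a lead worth following.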

oliford (oliford) wrote :

This bug is still present in the Karmic initial kernel at 2.6.31-17.

For the year since I reported this bug I've been sticking with the 2.6.24-6 kernel that works. I've now upgraded to Karmic which doesn't work with 2.6.24.

A quick Google search now shows http://bugzilla.kernel.org/show_bug.cgi?id=12971, which appears to me to be the same problem.
They suggest you can work around the problem by turning 'scatter/gather' off with:
sudo ethtool --offload eth0 sg off

Which appears to work but slightly reduces the maximum throughput.
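
To confirm that the workaround took effect, the offload state can be read back with `ethtool -k` (lowercase, which shows the offload settings). A minimal sketch, assuming the output contains a line of the form `scatter-gather: on` or `scatter-gather: off`; the helper name `offload_state` is hypothetical, and the exact feature labels may differ between ethtool versions.

```shell
# Read "ethtool -k <iface>" output on stdin and print the state ("on"/"off")
# of the feature whose label, given in $1, starts a line. Only the first word
# after the colon is printed, so a trailing "[fixed]" marker is dropped.
offload_state() {
    awk -F': ' -v feat="$1" '$1 == feat { split($2, w, " "); print w[1]; exit }'
}
```

With the workaround applied, `ethtool -k eth0 | offload_state scatter-gather` should print `off`. The setting is lost on reboot unless it is reapplied.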

It also suggests the problem is fixed by a mainline kernel commit entitled "tg3: Fix 5906 transmit hangs" which, by the looks of it, is included from 2.6.33-rc1.

There are some packages (I don't know what build) of Ubuntu's kernel for amd64/i386 for 2.6.33-rc2 here:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.33-rc2/
But I can't get them to work with Karmic easily.

I'm going to try a build from the ubuntu kernel git today. If it works, it will mean it /should/ be fixed in 10.04 (Lucid Lynx) anyway.
More later.

oliford (oliford) wrote :

Ok, I got a few things wrong with that last message...

It looks like Lucid will be 2.6.32, so won't include the latest mainline tg3 patches that the bugzilla.kernel.org report claims fixes it.

The 2.6.33-rc1 package from the kernel-ppa link boots under karmic. It does still stop transmitting but now I never see the NETDEV tg3 timeout message (or anything else for that matter) and have to manually stop and start the interface.

I can't find anything above 2.6.32 in the Ubuntu git repos, so I suspect (I'm guessing) that they haven't gone this far yet and that the 2.6.33 packages in the kernel PPA are just builds of the vanilla kernel. Sorry to the kernel team for not totally following how this all works under Ubuntu.

Is there some way the tg3 patches (or at least the supposed fix) in 2.6.33, might get backported to the Lucid 2.6.32 ubuntu kernel, for the sake of anyone with a tg3 device (a lot of HP laptops seem to use it)?

david.barbion (david-barbion) wrote :

Hi,

I applied the ethtool command as you described and it seems to work again. The output transfer bandwidth reaches 8Mb/s whereas input reaches 11Mb/s.
No crash or network problem occurred.

arruah (arruah) wrote :

I have the same problem on my Ubuntu 10.04 amd64 server

#uname -a
#Linux bas 2.6.32-23-server #37-Ubuntu SMP Fri Jun 11 09:11:11 UTC 2010 x86_64 GNU/Linux

#dmesg
[220203.040162] tg3: eth0: transmit timed out, resetting
[220203.052988] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
[220203.066156] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
[220203.180211] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
[220203.293855] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
[220203.407216] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[220203.521116] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[220203.634662] tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
[220203.659299] tg3: eth0: Link is down.
[220205.255508] tg3: eth0: Link is up at 100 Mbps, full duplex.
[220205.255513] tg3: eth0: Flow control is off for TX and off for RX.
[275200.012515] tg3: eth0: transmit timed out, resetting
[275200.024642] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
[275200.036744] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
[275200.150057] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
[275200.263121] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
[275200.376136] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[275200.489747] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[275200.515217] tg3: eth0: Link is down.
[275202.132397] tg3: eth0: Link is up at 100 Mbps, full duplex.
[275202.132402] tg3: eth0: Flow control is off for TX and off for RX.
[332287.010028] tg3: eth0: transmit timed out, resetting
[332287.022216] tg3: DEBUG: MAC_TX_STATUS[00000008] MAC_RX_STATUS[00000000]
[332287.034486] tg3: DEBUG: RDMAC_STATUS[00000000] WDMAC_STATUS[00000008]
[332287.147802] tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
[332287.260802] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
[332287.373686] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
[332287.486861] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
[332287.512189] tg3: eth0: Link is down.
[332289.150353] tg3: eth0: Link is up at 100 Mbps, full duplex.
[332289.150358] tg3: eth0: Flow control is off for TX and off for RX.
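
The log above shows the same reset sequence repeating many hours apart. To see how often the watchdog fires, the kernel timestamps of the reset messages can be pulled out of the log. A small sketch, assuming the bracketed `[seconds.microseconds]` prefix seen in the excerpts in this report; `reset_times` is a hypothetical helper name.

```shell
# Read kernel log lines on stdin and print the timestamp (seconds since boot)
# of every "transmit timed out, resetting" message, one per line.
reset_times() {
    awk '/transmit timed out, resetting/ && match($0, /[[] *[0-9]+[.][0-9]+[]]/) {
        ts = substr($0, RSTART + 1, RLENGTH - 2)   # drop the surrounding brackets
        sub(/^ +/, "", ts)                         # drop padding as in "[ 1086.816448]"
        print ts
    }'
}
```

Typical use would be `dmesg | reset_times`, or the same with a saved kern.log on stdin.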

stop (whoopwhoop) wrote :

I have this on 10.04 64 bit as well.
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5751 Gigabit Ethernet PCI Express (rev 20)

Oct 7 23:19:24 hpserver kernel: [ 6371.040473] tg3: DEBUG: MAC_TX_STATUS[ffffffff] MAC_RX_STATUS[ffffffff]
Oct 7 23:19:24 hpserver kernel: [ 6371.040609] tg3: DEBUG: RDMAC_STATUS[ffffffff] WDMAC_STATUS[ffffffff]
Oct 7 23:19:24 hpserver kernel: [ 6371.141999] tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.243352] tg3: tg3_stop_block timed out, ofs=2000 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.344694] tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.446043] tg3: tg3_stop_block timed out, ofs=2800 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.547385] tg3: tg3_stop_block timed out, ofs=3000 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.648735] tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.750080] tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.851429] tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6371.952773] tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6372.054121] tg3: tg3_stop_block timed out, ofs=1000 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6372.155464] tg3: tg3_stop_block timed out, ofs=1c00 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6372.256854] tg3: tg3_abort_hw timed out for eth0, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Oct 7 23:19:24 hpserver kernel: [ 6372.358269] tg3: tg3_stop_block timed out, ofs=3c00 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6372.459611] tg3: tg3_stop_block timed out, ofs=4c00 enable_bit=2
Oct 7 23:19:24 hpserver kernel: [ 6373.847052] tg3: eth0: No firmware running.
Oct 7 23:19:24 hpserver kernel: [ 6375.061957] tg3: tg3_abort_hw timed out for eth0, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
Oct 7 23:19:24 hpserver kernel: [ 6388.621871] tg3: eth0: Link is down.

summary: - tg3 transmit timeout in Intrepid
+ tg3 transmit timeout

stop (whoopwhoop) wrote :

I am running default 2.6.32-25-server 64 bit. eth0 went down five times. Doesn't come back up by itself...

Please check, possible duplicates:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/545334
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/530357

Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Triaged → Won't Fix