NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out

Bug #472057 reported by laurent on 2009-11-03
This bug affects 26 people
Affects          Status         Importance   Assigned to   Milestone
Debian           Fix Released   Unknown
Fedora           Won't Fix      Critical
openSUSE         Unknown        Medium
linux (Ubuntu)   Won't Fix      High         Unassigned
Nominated for Jaunty by Kev Walke
Nominated for Lucid by Eric Munson
Nominated for Maverick by Eric Munson

Bug Description

ProblemType: KernelOops
Annotation: Your system might become unstable now and might need to be restarted.
AplayDevices:
 **** List of PLAYBACK Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC861 Analog [ALC861 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: ALC861 Analog [ALC861 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: laurent 2099 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xdc440000 irq 22'
   Mixer name : 'Realtek ALC861'
   Components : 'HDA:10ec0861,11799205,00100300 HDA:11c11040,11790001,00100200'
   Controls : 13
   Simple ctrls : 9
Date: Tue Nov 3 03:20:00 2009
DistroRelease: Ubuntu 9.10
Failure: oops
HibernationDevice: RESUME=UUID=086c5112-bf9c-406f-93fc-5bc4d147fbfb
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release i386 (20091028.5)
MachineType: TOSHIBA Satellite A110
Package: linux-image-2.6.31-14-generic 2.6.31-14.48
PccardctlIdent:
 Socket 0:
   no product info available
PccardctlStatus:
 Socket 0:
   no card
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic root=UUID=80af4521-9258-4c03-a24c-0b8fcd54ff83 ro quiet splash
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
RelatedPackageVersions:
 linux-backports-modules-2.6.31-14-generic N/A
 linux-firmware 1.24
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
Tags: kernel-oops
Title: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Uname: Linux 2.6.31-14-generic i686
dmi.bios.date: 07/20/2006
dmi.bios.vendor: TOSHIBA
dmi.bios.version: V1.30
dmi.board.name: HTW20
dmi.board.vendor: TOSHIBA
dmi.board.version: Null
dmi.chassis.asset.tag: *
dmi.chassis.type: 10
dmi.chassis.vendor: TOSHIBA
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnTOSHIBA:bvrV1.30:bd07/20/2006:svnTOSHIBA:pnSatelliteA110:pvrPSAB0E-00500VB4:rvnTOSHIBA:rnHTW20:rvrNull:cvnTOSHIBA:ct10:cvrN/A:
dmi.product.name: Satellite A110
dmi.product.version: PSAB0E-00500VB4
dmi.sys.vendor: TOSHIBA

laurent (m2k-networx) wrote :
Narcissus (narcissus) wrote :

I'm having this problem too. I had a very hard time figuring out what was causing disconnections when traffic was high (BitTorrent, for example). I changed my router, but the problem still appeared. Once I was fairly sure it was an Ubuntu/Linux kernel issue, I found the same error in syslog at the times I was disconnected. I have attached the log. I was never notified that anything was going wrong, and no KernelOops was reported. It seems to be a problem with the r8169 module when large packets are received or under heavy traffic.
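To check whether a disconnection lines up with this message, the syslog can be scanned for it. Below is a minimal Python sketch; the pattern is modelled on the message in this bug's title, while the function name `find_watchdog_timeouts` and the sample lines are illustrative assumptions, not taken from the attached log.

```python
import re

# Pattern modelled on the message in this bug's title:
#   NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return (interface, driver, queue) for every watchdog timeout line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append((match["iface"], match["driver"], int(match["queue"])))
    return hits


# Hypothetical sample lines; on a real system, pass an open log file instead,
# e.g. open("/var/log/syslog").
sample = [
    "kernel: [ 100.0] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out",
    "kernel: [ 101.0] some unrelated message",
]
print(find_watchdog_timeouts(sample))
```

An empty result for the period around a disconnection would suggest a different cause than the one reported here.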

Changed in linux (Ubuntu):
status: New → Confirmed
Narcissus (narcissus) wrote :

And here's my lspci. I'll open a new report when I experience this again.


User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.4) Gecko/20091016 SUSE/3.5.4-1.1.2 Firefox/3.5.4

I have a box with a Realtek 8169 as the built-in gigabit Ethernet.
I'm unable to put it under heavy network traffic: if I do, the box hangs shortly thereafter. This can be reproduced 100% of the time, within 2-3 minutes of heavy network load.

Nov 22 16:34:33 frank kernel: [28634.284677] The following is only an harmless informational message.
Nov 22 16:34:33 frank kernel: [28634.284687] Unless you get a _continuous_flood_ of these messages it means
Nov 22 16:34:33 frank kernel: [28634.284693] everything is working fine. Allocations from irqs cannot be
Nov 22 16:34:33 frank kernel: [28634.284697] perfectly reliable and the kernel is designed to handle that.
Nov 22 16:34:33 frank kernel: [28634.284704] swapper: page allocation failure. order:0, mode:0x20
Nov 22 16:34:33 frank kernel: [28634.284713] Pid: 0, comm: swapper Not tainted 2.6.31.5-0.1-default #1
Nov 22 16:34:33 frank kernel: [28634.284718] Call Trace:
Nov 22 16:34:33 frank kernel: [28634.284748] [<ffffffff81011749>] try_stack_unwind+0x189/0x1b0
Nov 22 16:34:33 frank kernel: [28634.284763] [<ffffffff8101013d>] dump_trace+0x9d/0x330
Nov 22 16:34:33 frank kernel: [28634.284776] [<ffffffff81011254>] show_trace_log_lvl+0x64/0x90
Nov 22 16:34:33 frank kernel: [28634.284787] [<ffffffff810112a3>] show_trace+0x23/0x40
Nov 22 16:34:33 frank kernel: [28634.284801] [<ffffffff81554378>] dump_stack+0x81/0x9e
Nov 22 16:34:33 frank kernel: [28634.284814] [<ffffffff811110c2>] __alloc_pages_slowpath+0x572/0x580
Nov 22 16:34:33 frank kernel: [28634.284826] [<ffffffff81111221>] __alloc_pages_nodemask+0x151/0x160
Nov 22 16:34:33 frank kernel: [28634.284838] [<ffffffff8114beac>] kmem_getpages+0x6c/0x190
Nov 22 16:34:33 frank kernel: [28634.284850] [<ffffffff8114cfca>] fallback_alloc+0x1ca/0x290
Nov 22 16:34:33 frank kernel: [28634.284861] [<ffffffff8114cd10>] ____cache_alloc_node+0xb0/0x1a0
Nov 22 16:34:33 frank kernel: [28634.284874] [<ffffffff8114c1a2>] kmem_cache_alloc_node+0xa2/0x220
Nov 22 16:34:33 frank kernel: [28634.284885] [<ffffffff8114c39f>] __kmalloc_node+0x7f/0x110
Nov 22 16:34:33 frank kernel: [28634.284898] [<ffffffff81461cf4>] __alloc_skb+0x84/0x1a0
Nov 22 16:34:33 frank kernel: [28634.284910] [<ffffffff81462150>] __netdev_alloc_skb+0x40/0x80
Nov 22 16:34:33 frank kernel: [28634.284936] [<ffffffffa011f0a8>] rtl8169_rx_fill+0xc8/0x280 [r8169]
Nov 22 16:34:34 frank kernel: [28634.284966] [<ffffffffa011f63c>] rtl8169_rx_interrupt+0x3dc/0x590 [r8169]
Nov 22 16:34:34 frank kernel: [28634.284990] [<ffffffffa0120bca>] rtl8169_poll+0x4a/0x298 [r8169]
Nov 22 16:34:34 frank kernel: [28634.285011] [<ffffffff81470b49>] net_rx_action+0x149/0x2e0
Nov 22 16:34:34 frank kernel: [28634.285024] [<ffffffff81076713>] __do_softirq+0xd3/0x240
Nov 22 16:34:34 frank kernel: [28634.285035] [<ffffffff8100d80c>] call_softirq+0x1c/0x30
Nov 22 16:34:34 frank kernel: [28634.285046] [<ffffffff8100f8c5>] do_softirq+0xb5/0x110
Nov 22 16:34:34 frank kernel: [28634.285056] [<ffffffff810763a5>] irq_exit+0xb5/0xd0
Nov 22 16:34:34 frank kernel: [28634.285066] [<ffffffff8100ed1c>] do_IRQ+0x...
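The "page allocation failure" line in the log above carries the request order and the allocation mode flags. An order-n request asks for 2^n physically contiguous pages. The following Python sketch pulls those fields out of such a line; the function name `parse_alloc_failure` is hypothetical and the 4 KiB page size is an assumption about the reporter's x86_64 machine.

```python
import re

ALLOC_FAIL_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def parse_alloc_failure(line):
    """Extract task name, order, mode flags and request size from a log line."""
    match = ALLOC_FAIL_RE.search(line)
    if match is None:
        return None
    order = int(match["order"])
    return {
        "comm": match["comm"],
        "order": order,
        "mode": int(match["mode"], 16),
        # An order-n request covers 2**n contiguous pages.
        "bytes": PAGE_SIZE << order,
    }


line = ("Nov 22 16:34:33 frank kernel: [28634.284704] "
        "swapper: page allocation failure. order:0, mode:0x20")
print(parse_alloc_failure(line))
```

Counting how often such lines occur per minute is one way to judge whether the log shows the "continuous flood" that the kernel message itself warns about.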


_erwin_ (jansen332) wrote :

Same problem here. This oops occur

_erwin_ (jansen332) wrote :

I have this problem every day at the first resume. No heavy traffic is involved. It requires a reboot to get the network up and running.

Can you attach the output from hwinfo to the bug? I need details of your hardware to debug further.

It looks like there was a recent fix for the RTL8110SC rev d: 05af2142d09845de2f4ae34181c72addd72d5ef9. Might be related.

hwinfo:

35: None 01.0: 10701 Ethernet
  [Created at net.124]
  Unique ID: L2Ua.ndpeucax6V1
  Parent ID: JNkJ.vJ3ALhVDC+0
  SysFS ID: /class/net/eth1
  SysFS Device Link: /devices/pci0000:00/0000:00:07.0/0000:02:00.0
  Hardware Class: network interface
  Model: "Ethernet network interface"
  Driver: "r8169"
  Driver Modules: "r8169"
  Device File: eth1
  HW Address: 00:30:67:07:65:cd
  Link detected: yes
  Config Status: cfg=new, avail=yes, need=no, active=unknown
  Attached to: #23 (Ethernet controller)

Selected messages from /var/log/messages:

[ 7.369578] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[ 7.369664] r8169 0000:02:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 7.369783] r8169 0000:02:00.0: setting latency timer to 64
[ 7.369820] alloc irq_desc for 26 on node 0
[ 7.369823] alloc kstat_irqs on node 0
[ 7.369838] r8169 0000:02:00.0: irq 26 for MSI/MSI-X
[ 7.370528] eth0: RTL8168c/8111c at 0xffffc90011312000, 00:30:67:07:65:cd, XID 3c4000c0 IRQ 26
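Since the chip variant matters when deciding whether a given patch applies, it can help to extract it from the driver's probe message. This is a minimal Python sketch that parses the last line above; the regular expression is written against that one line only, and the function name `parse_chip_line` is hypothetical.

```python
import re

CHIP_RE = re.compile(
    r"(?P<iface>\w+): (?P<chip>RTL\S+) at 0x[0-9a-f]+, "
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}), "
    r"XID (?P<xid>[0-9a-f]+) IRQ (?P<irq>\d+)",
    re.IGNORECASE,
)


def parse_chip_line(line):
    """Return the interface, chip name, MAC, XID and IRQ from a probe line."""
    match = CHIP_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["irq"] = int(info["irq"])
    return info


line = ("[ 7.370528] eth0: RTL8168c/8111c at 0xffffc90011312000, "
        "00:30:67:07:65:cd, XID 3c4000c0 IRQ 26")
print(parse_chip_line(line))
```

Other driver versions may print this line in a different layout, in which case the function returns None rather than guessing.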

Hmm. Looks like I have an 8111c, but it's possibly related.
I applied the patch to the current sources and built the module successfully.
I'll be testing shortly.

(In reply to comment #2)
> I'll be testing shortly.

How did the test go?

So far the machine seems OK, but I haven't had a chance to put it under any significant load due to some sick kids and such. Hopefully today!

Using vblade and the AoE driver, I'm seeing a sustained 43 to 45MB/s with *no* messages in /var/log/messages.

The box seems perfectly solid.

The patch appears to have helped, but I'll give it another hour or so and comment here.


I may have spoken too soon.

I'm getting these:

Dec 8 10:33:52 frank kernel: [ 3732.187277] The following is only an harmless informational message.
Dec 8 10:33:52 frank kernel: [ 3732.187282] Unless you get a _continuous_flood_ of these messages it means
Dec 8 10:33:52 frank kernel: [ 3732.187286] everything is working fine. Allocations from irqs cannot be
Dec 8 10:33:52 frank kernel: [ 3732.187290] perfectly reliable and the kernel is designed to handle that.
Dec 8 10:33:52 frank kernel: [ 3732.187296] swapper: page allocation failure. order:1, mode:0x20
Dec 8 10:33:52 frank kernel: [ 3732.187303] Pid: 0, comm: swapper Not tainted 2.6.31.5-0.1-default #1
Dec 8 10:33:53 frank kernel: [ 3732.187307] Call Trace:
Dec 8 10:33:53 frank kernel: [ 3732.187323] [<ffffffff81011749>] try_stack_unwind+0x189/0x1b0
Dec 8 10:33:53 frank kernel: [ 3732.187335] [<ffffffff8101013d>] dump_trace+0x9d/0x330
Dec 8 10:33:53 frank kernel: [ 3732.187348] [<ffffffff81011254>] show_trace_log_lvl+0x64/0x90
Dec 8 10:33:53 frank kernel: [ 3732.187359] [<ffffffff810112a3>] show_trace+0x23/0x40
Dec 8 10:33:53 frank kernel: [ 3732.187376] [<ffffffff81554378>] dump_stack+0x81/0x9e
Dec 8 10:33:53 frank kernel: [ 3732.187389] [<ffffffff811110c2>] __alloc_pages_slowpath+0x572/0x580
Dec 8 10:33:53 frank kernel: [ 3732.187403] [<ffffffff81111221>] __alloc_pages_nodemask+0x151/0x160
Dec 8 10:33:53 frank kernel: [ 3732.187419] [<ffffffff8114beac>] kmem_getpages+0x6c/0x190
Dec 8 10:33:53 frank kernel: [ 3732.187431] [<ffffffff8114cfca>] fallback_alloc+0x1ca/0x290
Dec 8 10:33:53 frank kernel: [ 3732.187442] [<ffffffff8114cd10>] ____cache_alloc_node+0xb0/0x1a0
Dec 8 10:33:53 frank kernel: [ 3732.187457] [<ffffffff8114c1a2>] kmem_cache_alloc_node+0xa2/0x220
Dec 8 10:33:53 frank kernel: [ 3732.187474] [<ffffffff8114c39f>] __kmalloc_node+0x7f/0x110
Dec 8 10:33:53 frank kernel: [ 3732.187489] [<ffffffff81461cf4>] __alloc_skb+0x84/0x1a0
Dec 8 10:33:53 frank kernel: [ 3732.187500] [<ffffffff81462150>] __netdev_alloc_skb+0x40/0x80
Dec 8 10:33:53 frank kernel: [ 3732.187524] [<ffffffffa011eec8>] rtl8169_rx_fill+0xc8/0x280 [r8169]
Dec 8 10:33:53 frank kernel: [ 3732.187552] [<ffffffffa011f45c>] rtl8169_rx_interrupt+0x3dc/0x590 [r8169]
Dec 8 10:33:53 frank kernel: [ 3732.187577] [<ffffffffa01209ea>] rtl8169_poll+0x4a/0x298 [r8169]
Dec 8 10:33:53 frank kernel: [ 3732.187609] [<ffffffff81470b49>] net_rx_action+0x149/0x2e0
Dec 8 10:33:53 frank kernel: [ 3732.187621] [<ffffffff81076713>] __do_softirq+0xd3/0x240
Dec 8 10:33:53 frank kernel: [ 3732.187633] [<ffffffff8100d80c>] call_softirq+0x1c/0x30
Dec 8 10:33:53 frank kernel: [ 3732.187643] [<ffffffff8100f8c5>] do_softirq+0xb5/0x110
Dec 8 10:33:53 frank kernel: [ 3732.187653] [<ffffffff810763a5>] irq_exit+0xb5/0xd0
Dec 8 10:33:53 frank kernel: [ 3732.187662] [<ffffffff8100ed1c>] do_IRQ+0x7c/0x100
Dec 8 10:33:53 frank kernel: [ 3732.187679] [<ffffffff8100d013>] ret_from_intr+0x0/0x11
Dec 8 10:33:53 frank kernel: [ 3732.187692] [<ffffffff81039766>] native_safe_halt+0x6/0x10
Dec 8 10:33:53 frank kernel: [ 3732.187704] [<ffffffff810168c2>] default_idle+0x62/0x120
Dec 8 10:33:53 frank kernel:...
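To see at a glance which frames of a trace like the one above belong to the driver, the frames can be parsed and filtered by their module tag. The Python sketch below does this for the `[<address>] symbol+0xoffset/0xsize [module]` layout seen in these logs; the function names are hypothetical, and truncated frames (such as those cut off by "...") are skipped rather than completed.

```python
import re

FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\] "
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_frames(lines):
    """Return (symbol, module) pairs; module is None for core-kernel frames."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append((match["symbol"], match["module"]))
    return frames


def module_frames(lines, module):
    """Return the symbols of all frames that belong to the given module."""
    return [sym for sym, mod in parse_frames(lines) if mod == module]


# Frames copied from the trace above (log prefixes shortened).
trace = [
    "[ 3732.187500] [<ffffffff81462150>] __netdev_alloc_skb+0x40/0x80",
    "[ 3732.187524] [<ffffffffa011eec8>] rtl8169_rx_fill+0xc8/0x280 [r8169]",
    "[ 3732.187552] [<ffffffffa011f45c>] rtl8169_rx_interrupt+0x3dc/0x590 [r8169]",
    "[ 3732.187577] [<ffffffffa01209ea>] rtl8169_poll+0x4a/0x298 [r8169]",
    "[ 3732.187609] [<ffffffff81470b49>] net_rx_action+0x149/0x2e0",
]
print(module_frames(trace, "r8169"))
```

For this trace the driver frames are the receive path (`rtl8169_poll`, `rtl8169_rx_interrupt`, `rtl8169_rx_fill`), which is where the failing allocation is requested.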


I ran it pretty hard with the KOTD 2.6.31.7-0.0.0.8.a22d080-desktop, and it seems OK so far.

(In reply to comment #7)
> I ran it pretty hard with the KOTD 2.6.31.7-0.0.0.8.a22d080-desktop, and it
> seems OK so far.

OK, I will leave NEEDINFO on you until you can confirm that 2.6.31.7-0.0.0.8.a22d080-desktop fixes it for sure.


Well.

Using the KOTD it hung. That took about 7 minutes.

The last messages I see before it goes completely dark are these:

Dec 17 12:29:27 frank kernel: [ 1632.647547] [<ffffffff81140dfc>] kmem_getpages+0x6c/0x190
Dec 17 12:29:27 frank kernel: [ 1632.647558] [<ffffffff81141da9>] fallback_alloc+0x179/0x240
Dec 17 12:29:27 frank kernel: [ 1632.647570] [<ffffffff81141b50>] ____cache_alloc_node+0xa0/0x180
Dec 17 12:29:27 frank kernel: [ 1632.647585] [<ffffffff81141332>] kmem_cache_alloc_node+0xa2/0x260
Dec 17 12:29:27 frank kernel: [ 1632.647599] [<ffffffff81142a4f>] __kmalloc_node+0x7f/0x160
Dec 17 12:29:27 frank kernel: [ 1632.647614] [<ffffffff81459da4>] __alloc_skb+0x84/0x1a0
Dec 17 12:29:27 frank kernel: [ 1632.647626] [<ffffffff8145a200>] __netdev_alloc_skb+0x40/0x80
Dec 17 12:29:27 frank kernel: [ 1632.647645] [<ffffffffa010afa8>] rtl8169_rx_fill+0xc8/0x280 [r8169]
Dec 17 12:29:27 frank kernel: [ 1632.647673] [<ffffffffa010b53c>] rtl8169_rx_interrupt+0x3dc/0x590 [r8169]
Dec 17 12:29:27 frank kernel: [ 1632.647702] [<ffffffffa010caca>] rtl8169_poll+0x4a/0x298 [r8169]
Dec 17 12:29:27 frank kernel: [ 1632.647734] [<ffffffff8146a21d>] net_rx_action+0x17d/0x320
Dec 17 12:29:27 frank kernel: [ 1632.647751] [<ffffffff81073b33>] __do_softirq+0xd3/0x2d0
Dec 17 12:29:27 frank kernel: [ 1632.647762] [<ffffffff8100d8bc>] call_softirq+0x1c/0x30
Dec 17 12:29:27 frank kernel: [ 1632.647775] [<ffffffff8100f975>] do_softirq+0xb5/0x110
Dec 17 12:29:27 frank kernel: [ 1632.647788] [<ffffffff8107409d>] irq_exit+0xbd/0xd0
Dec 17 12:29:27 frank kernel: [ 1632.647798] [<ffffffff8100edcc>] do_IRQ+0x7c/0x100
Dec 17 12:29:27 frank kernel: [ 1632.647809] [<ffffffff8100d093>] ret_from_intr+0x0/0x11
Dec 17 12:29:27 frank kernel: [ 1632.647821] [<ffffffff815564e6>] _spin_unlock_irq+0x36/0x90
Dec 17 12:29:28 frank kernel: [ 1632.647832] [<ffffffff811100f8>] shrink_inactive_list+0x268/0x6f0
Dec 17 12:29:28 frank kernel: [ 1632.647843] [<ffffffff8111062b>] shrink_list+0xab/0xd0
Dec 17 12:29:28 frank kernel: [ 1632.647852] [<ffffffff811107f5>] shrink_zone+0x1a5/0x260
Dec 17 12:29:28 frank kernel: [ 1632.647863] [<ffffffff811111e9>] balance_pgdat+0x639/0x6c0
Dec 17 12:29:28 frank kernel: [ 1632.647873] [<ffffffff81111387>] kswapd+0x117/0x170
Dec 17 12:29:28 frank kernel: [ 1632.647885] [<ffffffff8108c196>] kthread+0xb6/0xc0
Dec 17 12:29:28 frank kernel: [ 1632.647896] [<ffffffff8100d7ba>] child_rip+0xa/0x20
Dec 17 12:29:28 frank kernel: [ 1632.647903] Mem-Info:
Dec 17 12:29:28 frank kernel: [ 1632.647906] Node 0 DMA per-cpu:
Dec 17 12:29:28 frank kernel: [ 1632.647912] CPU 0: hi: 0, btch: 1 usd: 0
Dec 17 12:29:28 frank kernel: [ 1632.647916] CPU 1: hi: 0, btch: 1 usd: 0
Dec 17 12:29:28 frank kernel: [ 1632.647921] CPU 2: hi: 0, btch: 1 usd: 0
Dec 17 12:29:28 frank kernel: [ 1632.647925] CPU 3: hi: 0, btch: 1 usd: 0
Dec 17 12:29:28 frank kernel: [ 1632.647929] Node 0 DMA32 per-cpu:
Dec 17 12:29:28 frank kernel: [ 1632.647935] CPU 0: hi: 186, btch: 31 usd: 168
Dec 17 12:29:28 frank kernel: [ 1632.647939] CPU 1: hi: 186, btch: 31 usd: 167
Dec 17 12:29:28 frank kernel: [ 1632.647944]...


Changed in debian:
status: Unknown → Confirmed
pie86 (bonfus) wrote :

Important update:
I *do not experience* this bug anymore since I changed C-State option in my BIOS settings.

http://www.fit-pc2.com/forum/viewtopic.php?f=9&t=1224&start=0
http://www.fit-pc2.com/forum/viewtopic.php?f=9&t=361

Can you test the latest KOTD which is 2.6.33? http://ftp.suse.com/pub/projects/kernel/kotd/master/

If that fails could you file a bug upstream at bugzilla.kernel.org?

Let me know the URL or test results. Thanks.

Mimue (michael-mueller12) wrote :

I experience this bug too when I back up large amounts of data to my NFS server. I use Areca as the backup utility and wanted to back up about 5 GB of data. After a while (more than an hour), eth0 times out. See my syslog for details.

hallenrm (hallenrm-yahoo) wrote :

I have discovered that this problem no longer affects my desktop since I removed the connection to the headphones. I am therefore led to believe that it is due to a bug in the pulseaudio module.

Changed in opensuse:
status: Unknown → Incomplete
Oleg Yaroshevych (brainunit) wrote :

I experience this bug too with Ubuntu 10.04 Desktop x64. Drops connection every hour.

Asus UL30A

Eric Munson (emunson) wrote :

This bug was not affecting me until the 2.6.32-24 kernel update, now the adapter that feeds my internal network doesn't work (connection to the world is still functional).

Boris Devouge (bdevouge) wrote :

Apparently pcie_aspm=off as a kernel parameter fixes the issue. Not sure what this implies regarding PM.
From : https://bugzilla.redhat.com/show_bug.cgi?id=538920#c78
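When testing this workaround, it is worth confirming that the parameter actually reached the running kernel. A minimal Python sketch follows; it parses a command-line string such as the contents of /proc/cmdline, and the function names are hypothetical. It splits on whitespace only, so quoted values containing spaces are not handled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'ro' or 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def aspm_disabled(cmdline):
    """Return True if pcie_aspm=off is present on the command line."""
    return parse_cmdline(cmdline).get("pcie_aspm") == "off"


# Command line from the original report (ProcCmdLine), without the workaround.
reported = ("BOOT_IMAGE=/boot/vmlinuz-2.6.31-14-generic "
            "root=UUID=80af4521-9258-4c03-a24c-0b8fcd54ff83 ro quiet splash")
print(aspm_disabled(reported))
print(aspm_disabled(reported + " pcie_aspm=off"))

# On a live system, read the running kernel's command line instead:
#   with open("/proc/cmdline") as f:
#       print(aspm_disabled(f.read()))
```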

Trying to reproduce here. Will report back.

Closed due to lack of response to Comment #10. Thanks.

Changed in opensuse:
importance: Unknown → Medium
status: Incomplete → Unknown
Lee Jones (lag) wrote :

How did your tests go Boris?

Changed in linux (Ubuntu):
assignee: nobody → Lee Jones (lag)
importance: Undecided → High
Lee Jones (lag) wrote :

I'm currently working on a very similar bug to this.

Would you mind running this kernel whilst using your device and posting the logs back here, please?

http://people.canonical.com/~ljones/lp535315-netdev-watchdog-maverick/

Thanks.

Mike (mike-fdb) wrote :

I have a similar crash with the atl2 driver. Hardware:
Ethernet controller: Atheros Communications L2 Fast Ethernet (rev a0)

Mike (mike-fdb) wrote :

Similar problems on Lucid with the backported Maverick kernel. The network started to disappear when I replaced a faulty hard disk and enabled Cool'n'Quiet and powernowd. I'm now testing whether disabling power saving helps with this problem.

carloslp (carloslp) wrote :

This bug is very similar to #535315.

m1fcj (hakan-koseoglu) wrote :

pcie_aspm=off does nothing here. Crash happens within a minute of heavy traffic, even quicker since I moved to GBit at home.

Lee Jones (lag) on 2011-02-07
Changed in linux (Ubuntu):
assignee: Lee Jones (lag) → nobody

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix
urusha (urusha) wrote :

It seems this issue still exists in Oneiric.
pcie_aspm=off doesn't help.
Some info is in the attachment.

Changed in debian:
status: Confirmed → Fix Released
Changed in fedora:
importance: Unknown → Critical
status: Unknown → Won't Fix