r8169 driver causes CPU soft lockup

Bug #448827 reported by Hugh Cole-Baker
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am running Karmic on a Dell Mini 12 (i686 architecture, not lpia) with the 2.6.31-13-generic kernel. If I leave the machine idle for several minutes while it is downloading files, it inevitably locks up and requires a reboot. The error message from the kernel is "BUG: soft lockup - CPU#0 stuck for 61s". The stack trace printed after the error message suggests that the lockup happens in the r8169 driver.
Each time the lockup happens I have printed the stack trace of the stuck CPU with Alt-SysRq-L, and it always shows the CPU stuck at the same point.
I couldn't find any way to save the stack trace as a text file, so I have attached an image of the stack trace to this bug.
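
For anyone who does manage to capture the console output as text, a small script can pick out the soft-lockup messages. This is only a sketch: the pattern matches the wording quoted above, which may differ between kernel versions, and the function name `find_soft_lockups` is made up for illustration.

```python
import re

# Matches the message quoted in this report; other kernels may word it differently.
SOFT_LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s")


def find_soft_lockups(log_text):
    """Return a list of (cpu, seconds) tuples, one per soft-lockup message."""
    return [(int(cpu), int(secs)) for cpu, secs in SOFT_LOCKUP_RE.findall(log_text)]


if __name__ == "__main__":
    sample = "BUG: soft lockup - CPU#0 stuck for 61s"
    print(find_soft_lockups(sample))
```

Run against a saved log, this lists which CPU was stuck and for how long, which makes it easier to compare occurrences.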

ProblemType: Bug
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D0c', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info: Error: [Errno 2] No such file or directory
Card0.Amixer.values: Error: [Errno 2] No such file or directory
CurrentDmesg:
 [ 29.114318] svc: failed to register lockdv1 RPC service (errno 97).
 [ 29.117186] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
 [ 29.135916] NFSD: starting 90-second grace period
 [ 35.644069] eth0: no IPv6 routers present
 [ 64.747223] WDT500/501-P driver 0.10 at 0x0240 (Interrupt 11). heartbeat=60 sec (nowayout=0)
Date: Sun Oct 11 17:16:08 2009
DistroRelease: Ubuntu 9.10
HibernationDevice: RESUME=UUID=545485ac-bffb-4592-990f-f3eaf842c237
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. Inspiron 1210
Package: linux-image-2.6.31-13-generic 2.6.31-13.43
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.31-13-generic root=UUID=1e19a5a9-7546-4972-8b5e-85cbf6d3fc64 ro pci=nomsi video=uvesafb:1024x768-32,mtrr:3,ywrap
ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-13.43-generic
RelatedPackageVersions: linux-firmware 1.21
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
Uname: Linux 2.6.31-13-generic i686
dmi.bios.date: 10/07/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A01
dmi.board.name: 0X605H
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.asset.tag: **********
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A01
dmi.modalias: dmi:bvnDellInc.:bvrA01:bd10/07/2008:svnDellInc.:pnInspiron1210:pvrA01:rvnDellInc.:rn0X605H:rvrA01:cvnDellInc.:ct8:cvrA01:
dmi.product.name: Inspiron 1210
dmi.product.version: A01
dmi.sys.vendor: Dell Inc.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices: aplay: device_list:223: no soundcards found...
Architecture: i386
ArecordDevices: arecord: device_list:223: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/dsp', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer', '/dev/sequencer2', '/dev/sequencer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg:
 [ 24.395819] svc: failed to register lockdv1 RPC service (errno 97).
 [ 24.402708] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
 [ 24.422448] NFSD: starting 90-second grace period
 [ 24.968019] eth0: no IPv6 routers present
DistroRelease: Ubuntu 10.04
HibernationDevice: RESUME=UUID=545485ac-bffb-4592-990f-f3eaf842c237
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. Inspiron 1210
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-19-generic root=UUID=1e19a5a9-7546-4972-8b5e-85cbf6d3fc64 ro video=uvesafb:1024x768-32,mtrr:3,ywrap
ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Regression: No
RelatedPackageVersions: linux-firmware 1.33
Reproducible: Yes
RfKill: Error: [Errno 2] No such file or directory
Tags: lucid kconfig needs-upstream-testing
Uname: Linux 2.6.32-19-generic i686
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
dmi.bios.date: 10/07/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A01
dmi.board.name: 0X605H
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.asset.tag: **********
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: A01
dmi.modalias: dmi:bvnDellInc.:bvrA01:bd10/07/2008:svnDellInc.:pnInspiron1210:pvrA01:rvnDellInc.:rn0X605H:rvrA01:cvnDellInc.:ct8:cvrA01:
dmi.product.name: Inspiron 1210
dmi.product.version: A01
dmi.sys.vendor: Dell Inc.

Revision history for this message
Hugh Cole-Baker (sigmaris) wrote :

I tried specifying pci=nomsi as a kernel boot argument, but the machine still locked up. I then added noapic as a boot argument, and this seems to prevent the bug: the system has now been running for 24+ hours while using the network and has not locked up.
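
When comparing test runs, it helps to confirm which boot arguments were actually in effect. The following is a minimal sketch that splits a kernel command line into parameters; the helper names `parse_cmdline` and `read_cmdline` are hypothetical, and the sample string is the ProcCmdLine value from the bug description.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value}, with None for bare flags."""
    params = {}
    for token in cmdline.split():
        # Split on the first "=" only, so values like root=UUID=... stay intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    sample = ("BOOT_IMAGE=/boot/vmlinuz-2.6.31-13-generic "
              "root=UUID=1e19a5a9-7546-4972-8b5e-85cbf6d3fc64 ro pci=nomsi "
              "video=uvesafb:1024x768-32,mtrr:3,ywrap")
    params = parse_cmdline(sample)
    print("pci =", params.get("pci"))
    print("noapic present:", "noapic" in params)
```

On the affected machine, `parse_cmdline(read_cmdline())` would show whether noapic or pci=nomsi was set for the current boot.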

Revision history for this message
Paul Daniels (walkabout) wrote :

I regularly use my machine to LTSP-boot client machines, SSH into them, and use nc to send partition images across the network. After upgrading to Karmic this was no longer possible because of the lockups. As an alternative fix, I have installed Realtek's 8168 driver.

lspci reports my card as a "RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)"

I obtained the "r8168-8.014.00" driver and applied the patches as specified on http://www.jamesonwilliams.com/hardy-r8168
Towards the end of the comments section there is a patch that allows the driver to compile with the 2.6.31-14-generic kernel, which is required because some data structures have changed.

Using this driver should also allow things like WoL to work. I suspect the r8169 code is to blame. BZFlag isn't locking things up any more either. Given the prevalence of these chipsets, I would have thought that sorting this out would be a priority.

Revision history for this message
Hugh Cole-Baker (sigmaris) wrote :

After upgrading to the 2.6.31-14-generic kernel in Karmic, the lockups have stopped, even without noapic, so it looks like this bug may have been fixed.

Changed in linux (Ubuntu):
status: New → Fix Released
Revision history for this message
Hugh Cole-Baker (sigmaris) wrote :

Actually, my mistake: it looks like the bug _wasn't_ fixed, and it may just occur less frequently. The system still freezes after running for a while with the 2.6.31-14-generic kernel.

Changed in linux (Ubuntu):
status: Fix Released → New
Revision history for this message
Luca Reverberi (socketreve) wrote :
Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Hugh,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 448827

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Hugh Cole-Baker (sigmaris) wrote : apport information

Attached files: AlsaDevices.txt, BootDmesg.txt, Card0.Amixer.info.txt, Card0.Amixer.values.txt, Card0.Codecs.codec.0.txt, Lspci.txt, Lsusb.txt, PciMultimedia.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Revision history for this message
Hugh Cole-Baker (sigmaris) wrote :

I upgraded to the 10.04 Beta version, and the bug was still present. I ran apport under 10.04 to get updated info, then tried the mainline kernel (2.6.34-999-generic #201004051003), but the bug still happens.
It is quite easy for me to reproduce: boot up the system, connect it via the built-in Ethernet, leave it for a while, then SSH in and run any network-intensive task. It will eventually freeze, with no response over the network or on the console.
After it has frozen it responds only to the magic SysRq keys, and there seems to be no way to get it back into a usable state. All I can do is print debugging info to the console, and since the console is only 640x480 that isn't much use: the screen is too small to read the backtrace(s). None of the debugging info gets saved to /var/log/, presumably because the rest of the kernel is locked up.
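
Others trying to reproduce this may find that Alt-SysRq-L does nothing, because the kernel.sysrq setting can restrict which SysRq functions are allowed. The sketch below checks that setting before a test run. It assumes the bitmask described in the kernel's SysRq documentation, where 1 enables every function and bit 8 permits debugging dumps; the helper names are hypothetical.

```python
# Bit that permits debugging dumps (such as the Alt-SysRq-L backtrace),
# according to the kernel's SysRq documentation.
SYSRQ_ENABLE_DUMP = 0x0008


def sysrq_allows_dump(value):
    """Return True if a kernel.sysrq value permits backtrace dumps."""
    # A value of 1 means "all functions enabled"; otherwise test the dump bit.
    return value == 1 or bool(value & SYSRQ_ENABLE_DUMP)


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current kernel.sysrq value (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    for value in (0, 1, 8, 176):
        print(value, sysrq_allows_dump(value))
```

Calling `sysrq_allows_dump(read_sysrq())` on the test machine before triggering the lockup shows whether a backtrace can be requested at all.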

tags: removed: needs-upstream-testing
Revision history for this message
Hugh Cole-Baker (sigmaris) wrote :

I checked back recently and decided to try the workaround in Luca's post. I couldn't find C-State settings in the BIOS, but I did find "Intel SpeedStep", which I disabled on the assumption that it was another name for a related option. So far the system has been running without lockups, so this bug may be connected to SpeedStep.

Revision history for this message
Hugh Cole-Baker (sigmaris) wrote :

Provided the only kernel log I could obtain, in the form of a screenshot. The kernel writes no other logs after locking up.

Changed in linux (Ubuntu):
status: Incomplete → New
Brad Figg (brad-figg)
tags: added: acpi
tags: added: acpi-event
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: b73a1py79
Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix