Dell T7500 Hangs/Freezes Repeatedly

Bug #568549 reported by sog
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Expired  Medium      Unassigned

Bug Description

I have a Dell T7500 workstation running a fresh install of Lucid (AMD64) that hangs completely, multiple times per day. There is no obvious trigger, which makes the issue difficult to pin down.

It does not appear to be related to USB devices or the sound card, because the machine has frozen while those devices were unavailable due to a separate bug.

The hang is complete, not just an X lockup: the machine stops responding to SSH as well. SSH sessions opened before a freeze show no seriously abnormal behavior beforehand; they simply stop responding.

I'm attaching the recommended logs, and would be happy to provide any other information that would assist in isolating the root cause of the problem.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: sog 1878 F.... pulseaudio
 /dev/snd/controlC1: sog 1878 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf7ffc000 irq 16'
   Mixer name : 'Analog Devices AD1984A'
   Components : 'HDA:11d4194a,1028026d,00100400'
   Controls : 34
   Simple ctrls : 20
Card1.Amixer.info:
 Card hw:1 'XFi'/'Creative X-Fi 20K2 Unknown'
   Mixer name : '20K2'
   Components : ''
   Controls : 29
   Simple ctrls : 10
DistroRelease: Ubuntu 10.04
Frequency: Once a day.
HibernationDevice: RESUME=UUID=1f268f45-7aa6-403f-a945-2667971b22e9
MachineType: Dell Inc. Precision WorkStation T7500
NonfreeKernelModules: nvidia
Package: linux (not installed)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-22-generic root=UUID=85807720-72d9-42f8-89ac-c13db1fccf70 ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.32-22.33-generic 2.6.32.11+drm33.2
Regression: No
RelatedPackageVersions: linux-firmware 1.34
Reproducible: No
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
Tags: lucid needs-upstream-testing
Uname: Linux 2.6.32-22-generic x86_64
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare vboxusers
dmi.bios.date: 09/09/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A03
dmi.board.name: 0D881F
dmi.board.vendor: Dell Inc.
dmi.board.version: A06
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA03:bd09/09/2009:svnDellInc.:pnPrecisionWorkStationT7500:pvr:rvnDellInc.:rn0D881F:rvrA06:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision WorkStation T7500
dmi.sys.vendor: Dell Inc.

sog (sogrady) wrote :

Further info: this does not appear to be related to the proprietary NVIDIA drivers. Disabling them has had no effect; the machine still locks up even when they are not installed or active.

tags: added: kj-triage
sog (sogrady) wrote :

More data. The machine has hung three times in two hours. The third time, I had SSH'd into the machine and was tailing /var/log/messages to see whether there was any indication it was about to hang. Here's the output:

sog@bishop:~$ sudo tail -f /var/log/messages
Apr 28 09:34:48 bishop kernel: [ 33.462707] [drm] nouveau 0000:03:00.0: 0xBFF3: parsing clock script 0
Apr 28 09:34:48 bishop kernel: [ 33.462965] [drm] nouveau 0000:03:00.0: 0xB95B: parsing clock script 1
Apr 28 09:34:48 bishop kernel: [ 33.466590] Console: switching to colour frame buffer device 320x100
Apr 28 09:34:48 bishop kernel: [ 33.558302] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Apr 28 09:34:48 bishop kernel: [ 33.878202] [drm] nouveau 0000:03:00.0: Allocating FIFO number 2
Apr 28 09:34:48 bishop kernel: [ 33.906195] [drm] nouveau 0000:03:00.0: nouveau_channel_alloc: initialised FIFO 2
Apr 28 09:34:59 bishop kernel: [ 45.102229] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
Apr 28 09:36:37 bishop kernel: [ 138.393011] __ratelimit: 9 callbacks suppressed
Apr 28 09:36:37 bishop kernel: [ 138.393015] chromium-browse[6206]: segfault at 20 ip 000000000112ed48 sp 00007fff15f9d360 error 4 in chromium-browser[400000+277e000]
Apr 28 09:37:21 bishop kernel: [ 181.493911] CE: hpet increasing min_delta_ns to 15000 nsec
Apr 28 09:52:56 bishop pulseaudio[1643]: ratelimit.c: 16 events suppressed
Write failed: Broken pipe
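
In the tail above, the last kernel line is at 09:37:21 and the last line of any kind is at 09:52:56, so the log alone does not show when the machine actually stopped. One way to bound the time of a freeze is a small heartbeat that appends a timestamp to a file and flushes it to disk at a fixed interval. This is a minimal sketch, not something that was run for this report; the function names and the log path in the usage note are made up for illustration.

```shell
# Append one timestamped line to the file given as $1, then flush
# buffered writes so the line has a chance to survive a hard reset.
heartbeat_once() {
    printf '%s alive\n' "$(date '+%b %d %H:%M:%S')" >> "$1"
    sync
}

# Write a heartbeat to file $1 every $2 seconds (default 10) until killed.
heartbeat_loop() {
    while :; do
        heartbeat_once "$1"
        sleep "${2:-10}"
    done
}
```

Started in the background, for example with `heartbeat_loop /var/tmp/heartbeat.log 10 &`, the last line of the file after a forced restart gives the latest time the machine was known to be alive, to within the chosen interval. Calling sync on every beat adds some disk activity, so a very short interval may not be desirable.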

sog (sogrady) wrote :

Might it be the AMD64 pulseaudio builds? This has been a problem in the past:

http://ubuntuforums.org/showthread.php?t=1317918

Jeremy Foshee (jeremyfoshee) wrote :

Hi sog,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal)? It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 568549

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: New → Incomplete
sog (sogrady) wrote : AlsaDevices.txt

apport information

tags: added: apport-collected
description: updated
sog (sogrady) wrote : AplayDevices.txt

apport information

sog (sogrady) wrote : ArecordDevices.txt

apport information

sog (sogrady) wrote : BootDmesg.txt

apport information

sog (sogrady) wrote : Card0.Amixer.values.txt

apport information

sog (sogrady) wrote : Card0.Codecs.codec.0.txt

apport information

sog (sogrady) wrote : Card1.Amixer.values.txt

apport information

sog (sogrady) wrote : CurrentDmesg.txt

apport information

sog (sogrady) wrote : IwConfig.txt

apport information

sog (sogrady) wrote : Lspci.txt

apport information

sog (sogrady) wrote : Lsusb.txt

apport information

sog (sogrady) wrote : PciMultimedia.txt

apport information

sog (sogrady) wrote : ProcCpuinfo.txt

apport information

sog (sogrady) wrote : ProcInterrupts.txt

apport information

sog (sogrady) wrote : ProcModules.txt

apport information

sog (sogrady) wrote : UdevDb.txt

apport information

sog (sogrady) wrote :

This is still an issue for this machine. I've completed the requested apport-collect, and I will try the latest upstream kernel ASAP.

sog (sogrady) wrote : UdevLog.txt

apport information

sog (sogrady) wrote : WifiSyslog.txt

apport information

sog (sogrady) wrote :

So far, I'm having good luck with the 2.6.32-22-generic build. The workstation has been up for one day and 19 hours, a record so far. I'll report back on whether there are any regressions with the 2.6.32.23.24 build I just applied, but for now, we might be good.

Martin Bogomolni (martinbogo) wrote :

The hardware power management in the T7500 causes the random crashes and freezes. When the machine changes between the ACPI "C" power states (Intel Turbo Boost/Intel SpeedStep), the entire machine will freeze and become unresponsive. Kernels newer than 2.6.29 have some code that helps, but the issue is still not solved even in kernel 2.6.34.

You must _disable_ all power management features in the BIOS of the T7500 for stable operation.
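
Whether a BIOS change actually altered the idle states offered to the kernel can be checked from the running system. The following is a minimal sketch, not something taken from this report: it assumes the kernel exposes cpuidle information under /sys/devices/system/cpu/cpuN/cpuidle, and the function name list_cstates is made up for illustration.

```shell
# Print "stateN NAME" for each idle state (C-state) found in a cpuidle
# sysfs directory. The directory is optional; the default assumes cpu0
# on a kernel that provides the cpuidle sysfs interface.
list_cstates() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state in "$dir"/state*; do
        # If the glob matched nothing, the literal pattern is left in
        # place and has no readable "name" file, so it is skipped.
        [ -r "$state/name" ] || continue
        printf '%s %s\n' "${state##*/}" "$(cat "$state/name")"
    done
}
```

Running list_cstates before and after changing the BIOS setting and comparing the two outputs shows whether deeper states are still being offered. State names vary by platform and driver, and empty output only means that nothing readable was found in that directory.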

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu development release http://cdimage.ubuntu.com/daily-live/current/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
sog (sogrady) wrote :

@Martin Bogomolni's BIOS suggestion originally fixed the issue for me, but the bug has been reintroduced in one of the kernel updates to Lucid. The machine hangs completely, forcing a manual restart. It's a major issue.

Changed in linux (Ubuntu):
status: Expired → In Progress
sog (sogrady) wrote :

This bug is not only present in 11.10, but more frequent there. Since upgrading yesterday afternoon, the workstation has hung three times, twice within an hour this morning. Same symptoms: an unpredictable freeze that leaves the machine unresponsive and requires a hard restart.

It remains a major issue.

Riccardo Poli (rpoli) wrote :

I just wanted to report that I eventually cured the problem by disabling a BIOS option related to power-saving states.

sog (sogrady) wrote :

Thanks! I've disabled the C-States option and all other related power management features, and am still experiencing the bug.

penalvch (penalvch) wrote :

sog, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: bios-outdated-a16 oneiric
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: In Progress → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired