Thinkpad T60 hard hang after upgrade to LTS 8.04

Bug #230847 reported by marc staveley
This bug affects 1 person
Affects: linux (Debian) | Status: Fix Released | Importance: Unknown
Affects: linux (Ubuntu) | Status: Won't Fix | Importance: Undecided | Assigned to: Unassigned

Bug Description

After upgrading to Hardy (LTS 8.04) my Thinkpad T60 hard hangs most nights (overnight when powered on but not being used) and sometimes during the day (while being used).

The daytime hangs seem to happen during heavy wired network use, but this could be a red herring.

The only way to get the machine back is to hard power off and back on.
There are no messages in the log files (or dmesg) after the reboot to indicate what happened.

I am not the only person experiencing this problem; there are a number of reports on the ThinkWiki web site, but no details:
http://www.thinkwiki.org/wiki/Installing_Ubuntu_8.04_(Hardy_Heron)_on_a_ThinkPad_T60

I'm willing to instrument to catch this, if someone can tell me what to do.
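
One low-tech way to instrument for a hang that leaves nothing in the logs is a heartbeat file: a small process appends a timestamp every few seconds and forces it to disk, so the last line after the forced power cycle shows roughly when the machine froze. The following is only a minimal sketch in Python; the log path and interval are hypothetical and not taken from this report, and it narrows down when the hang happened, not why.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one timestamp line and fsync it so it survives a hard hang."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # push the line past the page cache
    return stamp


def last_heartbeat(path):
    """Return the last complete timestamp line, or None if there is none."""
    try:
        with open(path) as fh:
            # Ignore a final line that was cut off before its newline.
            lines = [ln.strip() for ln in fh if ln.endswith("\n")]
    except FileNotFoundError:
        return None
    lines = [ln for ln in lines if ln]
    return lines[-1] if lines else None


def run(path="/var/tmp/heartbeat.log", interval=10.0):
    """Hypothetical path and interval; loops until interrupted."""
    while True:
        write_heartbeat(path)
        time.sleep(interval)
```

Starting `run()` in the background before leaving the machine overnight, and calling `last_heartbeat()` on the same path after the reboot, gives the time of the last successful write.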

I am not a Linux newbie: I have been working with Linux since the 0.98.0 kernel days, though I have been out of the kernel development loop for the last 5 years.

Gutsy (7.10) did not have this problem.

lsb_release -rd
Description: Ubuntu 8.04
Release: 8.04

lspci:
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express Memory Controller Hub (rev 03)
00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1c.2 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 3 (rev 02)
00:1c.3 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 4 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc M52 [Mobility Radeon X1300]
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
15:00.0 CardBus bridge: Texas Instruments PCI1510 PC card Cardbus Controller

Tags: cft-2.6.27
marc staveley (marc-ubuntu-staveley) wrote :

It happened again today, this time under light network load but while Rhythmbox was playing MP3s.

A fraction of a second of music repeated continuously while the machine did not respond otherwise (stuck in sound chip?)

marc staveley (marc-ubuntu-staveley) wrote :

It is likely this is the same as Debian bug #468069 (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=468069): "linux-2.6: CONFIG_E1000_NAPI causes hangs on T60 on high traffic load".

Can a kernel package be made with this option turned off?

marc staveley (marc-ubuntu-staveley) wrote :

Okay, so I didn't read far enough. The Debian guys fixed this by replacing the e1000 driver with e1000e, which they think doesn't have this problem. The e1000e driver comes with the kernel I'm running (sprite 2.6.24-16-generic), but I can't get it to bind to the Ethernet hardware. It loads fine but doesn't create the device.
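
Whether e1000 or e1000e has actually bound to a network interface can be read from sysfs, where the interface's device directory carries a `driver` symlink pointing at the bound driver. The following is a minimal sketch in Python; the interface name `eth0` in the usage note is an assumption, and `sysfs_root` is a parameter added here only so the function can be tried against a test directory.

```python
import os


def bound_driver(iface, sysfs_root="/sys"):
    """Return the name of the driver bound to iface, or None if none is bound."""
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        # No such interface, or no driver has claimed the device.
        return None
    # The symlink target ends in the driver name, e.g. .../drivers/e1000
    return os.path.basename(os.readlink(link))
```

For example, `bound_driver("eth0")` would be expected to return `"e1000"` or `"e1000e"` on a machine where one of those drivers has claimed the interface, and `None` when the module loaded but created no device.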

wfaust (junk-coloraid) wrote :

I had similar problems on my T60 with Suse and Debian. I spent days searching for and testing various solutions published on the net. Most only fixed the problem at first sight, and then various problems appeared when running the machine 24/7. Receiving data at high speed, or many small two-way requests (POP3 email, ...), especially tended to trigger the problems. I even tried various CardBus/ExpressCard network cards and none worked reliably, which made things even worse as I had no reliable network connection at all.

In the end, all problems were solved by disabling ASPM in the network driver (e1000 or e1000e). I published a description of the fix for e1000 and how to install it on Suse without recompiling a kernel. As this is just a patch for the original SourceForge e1000 driver, I see no major reason why the description wouldn't also apply to Ubuntu, provided the SourceForge e1000 driver compiles and installs fine. Maybe you want to try it besides disabling NAPI:

http://forums.suselinuxsupport.de/index.php?showtopic=66152&mode=threaded&pid=270086

Last time I checked, the current original 2.6.25 kernel had ASPM disabled by default and used the e1000e module.
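
To see whether ASPM is currently enabled on a device's PCI Express link, the `LnkCtl:` line in the output of `lspci -vv` can be inspected. The following is a rough sketch in Python that extracts that state per device. It assumes the `LnkCtl:` line has the form `ASPM <state>;` as printed by common pciutils versions, which may differ on other versions, and the capability lines are usually only shown when `lspci` runs as root.

```python
import re
import subprocess


def aspm_by_device(lspci_vv_text):
    """Map PCI address -> ASPM state taken from each device's LnkCtl line."""
    result = {}
    current = None
    for line in lspci_vv_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device, e.g. "02:00.0 Ethernet ..."
            current = line.split(" ", 1)[0]
            continue
        match = re.search(r"LnkCtl:\s*ASPM ([^;]+);", line)
        if match and current is not None:
            result[current] = match.group(1).strip()
    return result


def read_lspci():
    """Run lspci -vv and return its output (assumes pciutils is installed)."""
    return subprocess.run(
        ["lspci", "-vv"], capture_output=True, text=True, check=True
    ).stdout
```

With this, `aspm_by_device(read_lspci()).get("02:00.0")` would report the state for the 82573L at the address shown in the lspci listing above; a value of `Disabled` is what the patched driver is meant to produce.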

marc staveley (marc-ubuntu-staveley) wrote :

5 days and counting without a hang.

I followed wfaust's advice and built a new e1000 driver with their patches applied to turn off ASPM. The patches are against 7.6.15; I tried 8.0.1, but they don't apply against that version.

marc staveley (marc-ubuntu-staveley) wrote :

I upgraded to the latest Hardy kernel packages (linux-image-2.6.24-17-generic) which seems to have a downgraded e1000 driver (ver 7.3.20-k2-NAPI). 3 days and counting without a kernel hang, so I assume this version of the driver doesn't have the ASPM code.

marc staveley (marc-ubuntu-staveley) wrote :

I spoke too soon. I had two hard hangs with linux-image-2.6.24-17-generic so I have installed the patched e1000 driver and have not had a hang since.

wfaust (junk-coloraid) wrote :

Marc,

your problem seems to be identical to the one on my T60. You asked about 8.0.1: the device IDs for the 82573L were removed from e1000 8.0.1 because the e1000e driver is now the default. As the patched 7.6.15 seems to work fine, I currently have no urgent need to re-add the device IDs to the latest e1000 driver.
Or is there a specific need or wanted feature in 8.0.1?

Did you check http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid and the README file shipped with the latest e1000 driver? Did you try one of the scripts or the Intel PreBoot solution in order to get rid of the problem completely?

marc staveley (marc-ubuntu-staveley) wrote :

I have no need for 8.0.1. The patched 7.6.15 seems to be working fine (no hangs in 8 days). The only problem is that I have to rebuild the driver for each new kernel push from Ubuntu (2 new kernels in 2 weeks). But I can live with that.

I looked at http://www.thinkwiki.org/wiki/Problem_with_e1000:_EEPROM_Checksum_Is_Not_Valid and tried the scripts, etc., but that didn't fix the hard hangs, just the checksum reads.

The other "solution" they suggest is to apply a patch to e1000e very similar to the one I have already applied to e1000. I'll stick with what I've got until Ubuntu/Debian catches up.

wfaust (junk-coloraid) wrote :

I also tried the Lenovo and e1000 fix scripts over the last few days (the Lenovo script patches a different address in the EEPROM). The result was rather negative. It didn't fix anything, and all I got was problems at boot time, as the e1000 module was no longer loaded after the Lenovo script changed the EEPROM. I changed the EEPROM back to the old setting, but the problem with loading the module remained. I now have to force loading the e1000 module during boot to get things working again.

Changed in linux:
status: Unknown → Fix Released
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Leann Ogasawara (leannogasawara) wrote :

Hi Marc,

I see you had sent a reply to this bug report, but it is not showing up here, so I'm pasting your response as a comment and will try to answer as best I can:

Marc: "Leann, I am willing to test, but when I went looking for linux-image-2.6.27-* I could not find them. What repository are they in?"

You'll need to enable the Intrepid repository in order to get the 2.6.27 kernel.

Marc: "Also can you tell me what ethernet driver is used for the Intel Corporation 82573L Gigabit Ethernet Controller - e1000 or e1000e? Also does the driver still have the ASPM code enabled?"

Please attach the output of 'sudo lspci -vvnn' so I can see the device ID for the controller in order to see which driver is used.

Marc: "ps. please bear in mind that it sometimes takes a few days for a hang to happen."

No worries, thanks for testing.

Leann Ogasawara (leannogasawara) wrote :

Hi All,

There is a serious bug which may affect some people subscribed to this report so I wanted to pass along the information. Due to an unresolved bug in the e1000e driver in the 2.6.27 Linux kernel, this driver/kernel should not be used on Intel ethernet hardware supported by the e1000e driver (Intel GigE). Doing so may render your network hardware permanently inoperable.

Older Intel ethernet hardware which uses the e1000 driver is not affected by this; however, some hardware which used the e1000 driver in previous Ubuntu releases, such as hardware that uses a PCI Express bus, has been moved from e1000 to e1000e in the latest kernel releases. If in doubt, do not use this driver/kernel and subscribe to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263555 to be notified when the bug is fixed.

Thanks.

Michael Milligan (milli) wrote :

I upgraded to linux-image-2.6.27-4-generic and found that e1000e has been removed, but the PCI IDs were not added back to the e1000 driver... :-/ Can we get a -4-generic with an e1000 with PCI IDs from e1000e added back in?

Bryan Wu (cooloney) wrote :

This bug report is being closed because we received no response to the previous inquiry for information. Please reopen if this is still an issue in the current Ubuntu release, Jaunty Jackalope 9.04 - http://www.ubuntu.com/getubuntu/download. If the issue remains in Jaunty, please test the latest upstream kernel build - https://wiki.ubuntu.com/KernelMainlineBuilds . To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

Changed in linux (Ubuntu):
status: New → Won't Fix