Disk corruption and complete freezes. :(

Bug #381327 reported by Dan Shoutis
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

After some digging and tinkering, I suspect iwlagn / wireless networking.

Symptoms / timeline:
 - I've had this laptop (Lenovo T500) less than two weeks now.
 - I have an external SATA hard drive connected to the laptop with the help of a PCMCIA adapter.
 - The system hangs completely from time to time.
 - At first I suspected VirtualBox: The hangs were happening when working with Windows instances on the external SATA drive.
 - Additionally, the VirtualBox images were being corrupted, typically while downloading and installing XP SP2.
 - On one occasion the drive's filesystem itself was corrupted, although this might have been caused by the hard power-off: I had to rescue some directories from lost+found after an fsck.
 - Then, I suspected the external drive of being bad, or maybe the adapter causing problems. (It has been working great for years.)
 - Oops, there were files on my internal drive being corrupted too! (Found a bunch of binary crap inside a header file. Forcing a reinstall of the package via aptitude made it better. Right now debsums reports that everything is matching up so far, besides a bunch of .desktop files....)
 - And, I've now gotten a handful of lockups without vbox running at all.
 - The typical freeze: No mouse movement, music playing loops infinitely over the last 1/10 of a second or so of sound, caps lock key flashes; nothing to do but hold down the power button.
 - Two nontypical freezes as well:
     - One: firefox would fail with "Bus error" when started from the command line (possibly any new process would have, but I was trying to get work done at the time and didn't investigate); restarting made the issue go away.
     - Two: no new processes would start at all. I was able to switch to a console and try to log in there; the login worked but bash never came up.
     - That second freeze was spewing kernel output once I got to a console, including messages about iwlagn or Intel wireless, I forget which.
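
For anyone trying to capture those messages next time, here is a minimal Python sketch that filters kernel log lines for wireless-related keywords. The function name and the keyword list are assumptions chosen for this report, not an official list of what iwlagn prints.

```python
import re

# Keywords are guesses based on this report (driver name, the DMAR thread,
# and the mac80211 function prefix); adjust them as needed.
DEFAULT_KEYWORDS = ("iwlagn", "DMAR", "ieee80211")


def filter_kernel_lines(lines, keywords=DEFAULT_KEYWORDS):
    """Return the lines that mention any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

It could be fed the saved output of dmesg or an open /var/log/kern.log, e.g. filter_kernel_lines(open("/var/log/kern.log")), assuming the log survived the freeze.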

I'm currently stress-testing the system on a wired connection.

Possibly related:
 http://ubuntuforums.org/showthread.php?t=832383&highlight=flash+64+alpha&page=3
 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/300693
 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/303276 (However, I'm on wireless B/G and never N so far.)
 lkml thread "bad DMAR interaction with iwlagn and SATA"

Besides the obvious and eagerly anticipated fix, this is enough of a problem that a workaround would be very very very nice. (I'd almost rather have the hard lockups every time than occasional silent disk corruption, and I'm considering doing a clean install over a wired network just in case.)

I'm happy to attach any additional logs or information missed by ubuntu-bug.

Thanks and cheers.

ProblemType: Bug
Architecture: amd64
DistroRelease: Ubuntu 9.04
HibernationDevice: RESUME=UUID=2e67e80f-9fe0-4bb3-93fc-7b8a545e747d
MachineType: LENOVO 2081CTO
Package: linux-image-2.6.28-11-generic 2.6.28-11.42
ProcCmdLine: root=UUID=6f5a17ec-eb58-42ce-b69b-f99e8c5b7473 ro quiet splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.28-11.42-generic
SourcePackage: linux

Adonis Papaderos (ado-papas) wrote :

I am experiencing something similar to your problem, and I have found that in my case it is caused by disk failures. You could use dd from a livecd (or smartctl) to see if your problem has the same source as mine.

Dan Shoutis (dan-shoutis) wrote :

Unfortunately*, the drives seem fine. No SMART errors logged on the internal or external drive, and a badblocks run on the entire eSATA drive came out clean.

* (Well, it would suck to have a failing drive, too.)

Leann Ogasawara (leannogasawara) wrote :

Hi Dan,

Let us know how the stress testing over wired ethernet goes. If you suspect this might be related to wifi, I'd also suggest installing linux-backports-modules-jaunty to see if it helps, since it has an updated compat-wireless stack. If that is still problematic, I'd suggest trying the latest compat-wireless stack available upstream - http://wireless.kernel.org/en/users/Download . If this issue does exist upstream, we'll want to make sure the upstream developers are notified. I'd also suggest trying the latest mainline kernel build, 2.6.30-rc8 as of this posting - https://wiki.ubuntu.com/KernelMainlineBuilds . Please let us know your results. Thanks.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: compat-wireless needs-upstream-testing
Dan Shoutis (dan-shoutis) wrote :

A few updates.

Running on a wired connection has resolved the corruption and hanging, at least so far, including in situations that were triggering it before, such as pulling a torrent onto the eSATA drive.
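
A minimal Python sketch of how such a transfer could be checked for silent corruption: hash the data before writing, read the file back, and compare. The function name, file names, and sizes are made up for illustration. Note that a read straight after a write may be served from the page cache rather than the disk, so a clean result is only weak evidence.

```python
import hashlib
import os


def write_and_verify(directory, size_bytes=1 << 20, rounds=3):
    """Write random files into directory, read them back, and
    return the indices of rounds whose SHA-256 hashes differ."""
    mismatches = []
    for i in range(rounds):
        data = os.urandom(size_bytes)
        expected = hashlib.sha256(data).hexdigest()
        path = os.path.join(directory, f"verify-{i}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the kernel to push the data to the device
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        os.remove(path)
        if actual != expected:
            mismatches.append(i)
    return mismatches
```

Running it against a directory on the eSATA drive while on wireless, and again while wired, would give a rough comparison; an empty list means no mismatch was seen.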

However, it has panicked twice now upon shutdown after the "will now halt" message. (This is out of 20+ shutdowns or so, I estimate). I took a picture the second time & it's attached; the 'ieee80211_stop' line seems to point at wireless problems still.

I've installed linux-crashdump, but I haven't been able to persuade it to do its job yet. Google hasn't been very useful, and it's deadline season at work, so I can only pick at the issue. (Perhaps I need to edit the kernel parameters for grub?)
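
On the kernel parameter question: kdump-based crash dumps generally need a crashkernel= boot parameter to reserve memory for the capture kernel. Whether linux-crashdump on 9.04 adds one automatically is an assumption I have not verified. A minimal Python sketch that checks a command line for it, using the ProcCmdLine value from this report (the function name has_crashkernel is made up for illustration):

```python
def has_crashkernel(cmdline):
    """Return True if a kernel command line contains a crashkernel= parameter."""
    return any(token.startswith("crashkernel=") for token in cmdline.split())


# Command line copied from the ProcCmdLine field of this report.
reported = "root=UUID=6f5a17ec-eb58-42ce-b69b-f99e8c5b7473 ro quiet splash"
print(has_crashkernel(reported))  # prints False
```

On a running system the same check could be applied to the contents of /proc/cmdline.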

Dantroline (daniel-arasweb) wrote :

I am running a T500 also and having some similar problems, although the disks are fine. I am getting random freezes, and already have had one corrupted installation which refused to boot and had to be redone (always back up your data!). I have done RAM tests and they were without fault. I am running Ubuntu 64bit 9.04.

I'm not getting panicked shut-downs, however. The problems I am seeing are with firefox/flash (they freeze, and until I re-installed the latest flash-nonfree, they caused a keyboard/mouse freeze and then a total lockup). With javascript and flash I also am finding random wireless network failures - I lose my connection and a new connection is renegotiated by the wifi card.

The problem as I see it is that rogue programs (adobe flash is the worst offender) which take up 100% of one of the processors are not caught out and all resources get squeezed - it's usually impossible to catch the program and close it in time (and frankly, one should never have to!). The freezes vary from regular lockups, to some kind of crash state where the caps-lock light flashes. I've never experienced this kind of instability with Ubuntu before.

Leann Ogasawara (leannogasawara) wrote :

@Dantroline, would you care to open a new bug report? The issue you've described seems to be different from what Dan has reported. It's helpful to the kernel team if bug reports target one specific issue against a specific set of hardware.

@Dan, thanks for the feedback as well as attaching the photo. When you have time, if you could test if the panic occurs with the latest mainline kernel - https://wiki.ubuntu.com/KernelMainlineBuilds , that would be great. Additionally, if it does happen again with the mainline kernel, would you be able to get a photo of the beginning of the panic and attach it here?

Dantroline (daniel-arasweb) wrote :

@Leann: I will collect my info together and do that. Cheers.

Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired