random kernel crashes

Bug #706924 reported by Audrius Šaikūnas
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

This issue has been haunting me for about a year.

Basically what happens is:
Within about 5 minutes of a *cold* boot, the kernel sometimes crashes. If the machine survives those 5 minutes without a crash, everything runs perfectly. If the kernel crashes and I reboot, no further crashes occur.

This can be illustrated by 'last' logs:
 % last | grep crash -C 5
angel tty7 :0 Sat Jan 22 13:52 - down (13:20)
reboot system boot 2.6.35-23-generi Sat Jan 22 13:52 - 03:13 (13:20)
angel tty7 :0 Sat Jan 22 13:51 - crash (00:01)
reboot system boot 2.6.35-23-generi Sat Jan 22 13:50 - 03:13 (13:22)
angel pts/3 :0.0 Fri Jan 21 11:30 - down (14:41)
--
angel tty7 :0 Wed Jan 19 14:17 - down (12:57)
reboot system boot 2.6.35-23-generi Wed Jan 19 14:08 - 03:14 (13:06)
angel pts/3 :0.0 Wed Jan 19 14:03 - crash (00:04)
angel tty7 :0 Wed Jan 19 14:03 - crash (00:04)
reboot system boot 2.6.35-23-generi Wed Jan 19 14:03 - 03:14 (13:11)
angel pts/4 213.197.181.238 Tue Jan 18 18:34 - 18:48 (00:13)
--
angel tty7 :0 Tue Jan 11 13:40 - down (11:14)
reboot system boot 2.6.35-23-generi Tue Jan 11 13:40 - 00:55 (11:14)
angel tty7 :0 Tue Jan 11 13:39 - crash (00:01)
reboot system boot 2.6.35-23-generi Tue Jan 11 13:38 - 00:55 (11:16)
angel pts/3 :0.0 Mon Jan 10 15:01 - down (12:15)
angel tty7 :0 Mon Jan 10 15:01 - down (12:15)
reboot system boot 2.6.35-23-generi Mon Jan 10 15:00 - 03:16 (12:15)
angel tty7 :0 Mon Jan 10 14:59 - crash (00:00)
reboot system boot 2.6.35-23-generi Mon Jan 10 14:59 - 03:16 (12:17)
angel pts/3 :1.0 Mon Jan 10 14:57 - down (00:01)
--
reboot system boot 2.6.35-23-generi Fri Jan 7 13:59 - 00:46 (10:47)
reboot system boot 2.6.35-23-generi Fri Jan 7 13:53 - 13:55 (00:02)
angel pts/3 :0.0 Fri Jan 7 12:59 - crash (00:54)
angel tty7 :0 Fri Jan 7 12:58 - 13:52 (00:53)
reboot system boot 2.6.35-23-generi Fri Jan 7 12:58 - 13:55 (00:57)
--
angel tty7 :1 Tue Jan 4 15:36 - down (10:48)
reboot system boot 2.6.35-23-generi Tue Jan 4 15:35 - 02:24 (10:48)
angel pts/3 :0.0 Mon Jan 3 14:30 - crash (1+01:04)
angel tty7 :0 Mon Jan 3 14:30 - crash (1+01:04)
reboot system boot 2.6.35-23-generi Mon Jan 3 16:19 - 02:24 (1+10:05)
angel pts/3 :0.0 Mon Jan 3 16:16 - crash (00:02)
angel tty7 :0 Mon Jan 3 16:16 - crash (00:02)
reboot system boot 2.6.35-23-generi Mon Jan 3 16:15 - 02:24 (1+10:08)
angel pts/3 :0.0 Sun Jan 2 16:30 - down (07:38)

As you can see, crashes always occur within the first 5 minutes of a cold boot and never again.
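
To make the pattern easier to check than by reading the listing by eye, the crashed sessions can be pulled out of the 'last' output with a small script. This is only a sketch: the function name and regular expression are my own, and it assumes the line layout shown above, where a crashed session ends in "- crash (HH:MM)" or "- crash (D+HH:MM)". The number it reports is the length of the login session, so it is at best an upper bound on the time between login and the crash.

```python
import re

# Tail of a `last` line for a session that ended in a crash, e.g.
# "... Sat Jan 22 13:51 - crash (00:01)" or "... - crash (1+01:04)".
# The optional "D+" prefix is a day count.
CRASH_RE = re.compile(r"-\s+crash\s+\((?:(\d+)\+)?(\d+):(\d+)\)\s*$")


def crash_session_minutes(last_output):
    """Return the length, in minutes, of each session marked 'crash'."""
    minutes = []
    for line in last_output.splitlines():
        match = CRASH_RE.search(line)
        if match is None:
            continue  # normal logout, shutdown, or reboot record
        days = int(match.group(1) or 0)
        hours = int(match.group(2))
        mins = int(match.group(3))
        minutes.append(days * 24 * 60 + hours * 60 + mins)
    return minutes


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = (
        "angel tty7 :0 Sat Jan 22 13:51 - crash (00:01)\n"
        "reboot system boot 2.6.35-23-generi Sat Jan 22 13:50 - 03:13 (13:22)\n"
        "angel pts/3 :0.0 Wed Jan 19 14:03 - crash (00:04)\n"
        "angel pts/3 :0.0 Mon Jan 3 14:30 - crash (1+01:04)\n"
    )
    print(crash_session_minutes(sample))
```

Feeding it the full output of 'last' (for example via subprocess, or by pasting the text into a string) gives one number per crashed session, which makes the occasional long session stand out from the short ones.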

I've tried different kernels (2.6.32.x, 2.6.35.x), different Ubuntu versions (10.04 and 10.10), and different GPU drivers (r600 radeon, radeonhd and fglrx), and tested with and without the vmware modules. The crashes never stopped.

I've also attached a typical crash log.
I've also observed one difference between the radeon and fglrx drivers:
* When using fglrx, the kernel usually crashes in a 'hard' way (the caps lock and scroll lock LEDs blink, and the magic keys don't do anything). The only way out is a hard reset.
* When using radeon, the kernel oopses. There are two ways the kernel can crash:
** Usable: some of the running processes crash completely, but I can still launch new processes and upload crash logs somewhere. I can also do "init 6" to restart the machine.
** Unusable: the computer is still running, but I can't launch any new process. If I run, say, "cat /var/log/syslog", it just hangs (no reaction to SIGTERM/Ctrl+C). The only way out is to use the magic keys to reboot the machine.

If you have any questions regarding anything, please ask.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: linux-image-2.6.35-23-generic 2.6.35-23.41
Regression: No
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.35-23.37-generic 2.6.35.7
Uname: Linux 2.6.35-23-generic x86_64
NonfreeKernelModules: fglrx wl
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: NVidia [HDA NVidia], device 0: STAC92xx Analog [STAC92xx Analog]
   Subdevices: 0/1
   Subdevice #0: subdevice #0
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC1', '/dev/snd/hwC1D0', '/dev/snd/pcmC1D3p', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/6702/fd/23: Stale NFS file handle
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'NVidia'/'HDA NVidia at 0xf0880000 irq 17'
   Mixer name : 'IDT 92HD73C1X5'
   Components : 'HDA:111d7675,102802a1,00100103'
   Controls : 24
   Simple ctrls : 15
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xafeec000 irq 47'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100100'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [off]
Date: Mon Jan 24 15:26:37 2011
Frequency: Once every few days.
HibernationDevice: RESUME=UUID=f8dbe2eb-0fd8-4d91-b46e-7ce738690ee0
MachineType: Alienware M17x
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-23-generic root=UUID=197a0956-7cf0-4dd6-973e-f6ba5066d23b ro apparmor=0 memory_corruption_check=0 nosplash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
RelatedPackageVersions: linux-firmware 1.38.3
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
StagingDrivers: udlfb
Title: [STAGING]
WifiSyslog:
 Jan 24 15:00:53 heaven kernel: [ 1236.180548] /dev/vmmon[5070]: PTSC: initialized at 2801000000 Hz using TSC
 Jan 24 15:01:00 heaven kernel: [ 1242.817066] /dev/vmnet: open called by PID 5081 (vmware-vmx)
 Jan 24 15:01:00 heaven kernel: [ 1242.817084] /dev/vmnet: port on hub 8 successfully opened
dmi.bios.date: 07/30/2010
dmi.bios.vendor: Alienware
dmi.bios.version: A05
dmi.board.vendor: Alienware
dmi.board.version: A05
dmi.chassis.type: 8
dmi.chassis.vendor: Alienware
dmi.chassis.version: A05
dmi.modalias: dmi:bvnAlienware:bvrA05:bd07/30/2010:svnAlienware:pnM17x:pvrA0523:rvnAlienware:rn:rvrA05:cvnAlienware:ct8:cvrA05:
dmi.product.name: M17x
dmi.product.version: A0523
dmi.sys.vendor: Alienware

Audrius Šaikūnas (tuxmarkv) wrote :
Jeremy Foshee (jeremyfoshee) wrote :

Hi Audrius,

If you could also please test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Audrius Šaikūnas (tuxmarkv) wrote :

I'm sorry, but I cannot test upstream kernels, because they don't include the aufs filesystem module, which my root file system sits on. If anybody has any suggestions on this, please advise.

Audrius Šaikūnas (tuxmarkv) wrote :

I've managed to salvage another crash log.

Audrius Šaikūnas (tuxmarkv) wrote :

I've found a post from 6 (!) years ago that describes a similar bug:
http://www.linuxhelp.net/forums/index.php?showtopic=4090

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired