I would like to share some observations that hopefully help someone else to get closer to the source of this issue. My suspicion is a kernel problem. I kept a terminal with 'tail -f /var/log/syslog' open to see if something unusual happens before the computer freezes. And yes, every time just before the computer freezes, this is what is written en masse to the syslog:
My observation is, that my computer dies in stages. It has 4 CPU cores and it seems as if they get stuck one after the other in whatever this "soft lock up" is, leading to a complete freeze in short time. In my case, I was running an svn command when this task got stuck and the messages above started showing up. At this time my computer was still responding and I was able to change to a terminal window, where I tried to kill the svn task. Even a 'kill -9' as root did not stop it. Apparently, the kernal was not able to process signals any more.
Another observation is, that there is no actual crash. The kernel seems to keep writing the syslog messages above even after the computer freezes. The kernel is running, but it seems that no event que is processed any more, which looks like a complete freeze to a user.
If anyone can tell me what to look for in my syslog files I will happily do so to shed some more light on this mysterious and annoying problem.
I recently upgraded to Ubunto 12.04LTS and since then observe the same problem. Initially I followd the discussion in <https:/ /bugs.launchpad .net/ubuntu/ +source/ xserver- xorg-video- intel/+ bug/993187>, but it seems to have gone to a dead end.
I would like to share some observations that hopefully help someone else to get closer to the source of this issue. My suspicion is a kernel problem. I kept a terminal with 'tail -f /var/log/syslog' open to see if something unusual happens before the computer freezes. And yes, every time just before the computer freezes, this is what is written en masse to the syslog:
Apr 10 18:58:24 ceylon kernel: [ 2384.656005] ioatdma 0000:00:08.0: reset codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd i5000_edac tpm_infineon psmouse edac_core soundcore snd_page_alloc ioatdma ppdev i5k_amb mac_hid tpm_tis shpchp serio_raw dca parport_pc lp parport usbhid hid usb_storage firewire_ohci firewire_core crc_itu_t floppy e1000e mptsas mptscsih mptbase scsi_transport_sas codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd i5000_edac tpm_infineon psmouse edac_core soundcore snd_page_alloc ioatdma ppdev i5k_amb mac_hid tpm_tis shpchp serio_raw dca parport_pc lp parport usbhid hid usb_storage firewire_ohci firewire_core crc_itu_t floppy e1000e mptsas mptscsih mptbase scsi_transport_sas ffffffff8165e64 0>] [<ffffffff8165e 640>] _raw_spin_ lock_bh+ 0x30/0x30 effae0 EFLAGS: 00000286 0(0000) GS:ffff88023fd8 0000(0000) knlGS:000000000 0000000 372>] ? ioat1_cleanup+ 0x32/0xa0 [ioatdma] 626>] ioat1_cleanup_ event+0x16/ 0x30 [ioatdma] 06e>] ioat_dma_ tx_status+ 0x6e/0x100 [ioatdma] d92>] ? tcp_service_ net_dma+ 0x92/0x140 d77>] tcp_service_ net_dma+ 0x77/0x140 a13>] tcp_recvmsg+ 0x333/0xaf0 bdb>] inet_recvmsg+ 0x6b/0x80 e0d>] sock_recvmsg+ 0x11d/0x140 acd>] ? lru_cache_ add_lru+ 0x2d/0x50 138>] ? sched_clock_ cpu+0xa8/ 0x110 4ec>] ? task_sched_ runtime+ 0x5c/0xa0 de3>] ? native_ sched_clock+ 0x13/0x80 fa5>] ? sched_clock_ local+0x25/ 0x90 138>] ? sched_clock_ cpu+0xa8/ 0x110 78e>] sys_recvfrom+ 0xee/0x170 769>] ? read_tsc+0x9/0x20 d05>] ? poll_select_ set_timeout+ 0x85/0xa0 a82>] system_ call_fastpath+ 0x16/0x1b 372>] ? ioat1_cleanup+ 0x32/0xa0 [ioatdma] 626>] ioat1_cleanup_ event+0x16/ 0x30 [ioatdma] 06e>] ioat_dma_ tx_status+ 0x6e/0x100 [ioatdma] d92>] ? tcp_service_ net_dma+ 0x92/0x140 d77>] tcp_service_ net_dma+ 0x77/0x140 a13>] tcp_recvmsg+ 0x333/0xaf0 bdb>] inet_recvmsg+ 0x6b/0x80 e0d>] sock_recvmsg+ 0x11d/0x140 acd>] ? lru_cache_ add_lru+ 0x2d/0x50 138>] ? sched_clock_ cpu+0xa8/ 0x110 4ec>] ? task_sched_ runtime+ 0x5c/0xa0 de3>] ? native_ sched_clock+ 0x13/0x80 fa5>] ? sched_clock_ local+0x25/ 0x90 138>] ? sched_clock_ cpu+0xa8/ 0x110 78e>] sys_recvfrom+ 0xee/0x170 769>] ? read_tsc+0x9/0x20 d05>] ? poll_select_ set_timeout+ 0x85/0xa0 a82>] system_ call_fastpath+ 0x16/0x1b
Apr 10 18:58:24 ceylon kernel: [ 2384.856005] ioatdma 0000:00:08.0: reset
[... repeats lots of times]
Apr 10 18:58:51 ceylon kernel: [ 2412.256005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:51 ceylon kernel: [ 2412.456005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:52 ceylon kernel: [ 2412.603756] BUG: soft lockup - CPU#3 stuck for 22s! [svn:5127]
Apr 10 18:58:52 ceylon kernel: [ 2412.603760] Modules linked in: rfcomm bnep bluetooth binfmt_misc ext2 vesafb nvidia(P) snd_hda_
Apr 10 18:58:52 ceylon kernel: [ 2412.603803] CPU 3
Apr 10 18:58:52 ceylon kernel: [ 2412.603805] Modules linked in: rfcomm bnep bluetooth binfmt_misc ext2 vesafb nvidia(P) snd_hda_
Apr 10 18:58:52 ceylon kernel: [ 2412.603838]
Apr 10 18:58:52 ceylon kernel: [ 2412.603840] Pid: 5127, comm: svn Tainted: P W O 3.2.0-40-generic #64-Ubuntu FUJITSU SIEMENS CELSIUS R640 /D1808
Apr 10 18:58:52 ceylon kernel: [ 2412.603846] RIP: 0010:[<
Apr 10 18:58:52 ceylon kernel: [ 2412.603855] RSP: 0018:ffff880203
Apr 10 18:58:52 ceylon kernel: [ 2412.603857] RAX: ffff88022deb2000 RBX: 000000000000003c RCX: 0000000000000001
Apr 10 18:58:52 ceylon kernel: [ 2412.603860] RDX: ffff88022bd70018 RSI: 0000000000000004 RDI: ffff88022bd70468
Apr 10 18:58:52 ceylon kernel: [ 2412.603863] RBP: ffff880203effb18 R08: 000000000000003c R09: dead000000200200
Apr 10 18:58:52 ceylon kernel: [ 2412.603865] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603867] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880203effad8
Apr 10 18:58:52 ceylon kernel: [ 2412.603870] FS: 00007f0f5ad8d78
Apr 10 18:58:52 ceylon kernel: [ 2412.603873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 10 18:58:52 ceylon kernel: [ 2412.603875] CR2: 00007f0f4d2f0798 CR3: 000000020f239000 CR4: 00000000000006e0
Apr 10 18:58:52 ceylon kernel: [ 2412.603878] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 10 18:58:52 ceylon kernel: [ 2412.603883] Process svn (pid: 5127, threadinfo ffff880203efe000, task ffff880215bac500)
Apr 10 18:58:52 ceylon kernel: [ 2412.603885] Stack:
Apr 10 18:58:52 ceylon kernel: [ 2412.603887] ffffffffa0122372 ffff880203effbc8 00000002132290c0 0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603893] ffff88022bd70418 0000000000000004 ffff880203effb8c ffff880203effb38
Apr 10 18:58:52 ceylon kernel: [ 2412.603897] ffffffffa0122626 0000000000000001 ffff88022bd70418 ffff880203effb68
Apr 10 18:58:52 ceylon kernel: [ 2412.603902] Call Trace:
Apr 10 18:58:52 ceylon kernel: [ 2412.603911] [<ffffffffa0122
Apr 10 18:58:52 ceylon kernel: [ 2412.603917] [<ffffffffa0122
Apr 10 18:58:52 ceylon kernel: [ 2412.603922] [<ffffffffa0121
Apr 10 18:58:52 ceylon kernel: [ 2412.603927] [<ffffffff81585
Apr 10 18:58:52 ceylon kernel: [ 2412.603930] [<ffffffff81585
Apr 10 18:58:52 ceylon kernel: [ 2412.603934] [<ffffffff81589
Apr 10 18:58:52 ceylon kernel: [ 2412.603939] [<ffffffff815ad
Apr 10 18:58:52 ceylon kernel: [ 2412.603943] [<ffffffff8152c
Apr 10 18:58:52 ceylon kernel: [ 2412.603949] [<ffffffff81125
Apr 10 18:58:52 ceylon kernel: [ 2412.603953] [<ffffffff81092
Apr 10 18:58:52 ceylon kernel: [ 2412.603958] [<ffffffff81061
Apr 10 18:58:52 ceylon kernel: [ 2412.603963] [<ffffffff8101b
Apr 10 18:58:52 ceylon kernel: [ 2412.603966] [<ffffffff81091
Apr 10 18:58:52 ceylon kernel: [ 2412.603969] [<ffffffff81092
Apr 10 18:58:52 ceylon kernel: [ 2412.603973] [<ffffffff81530
Apr 10 18:58:52 ceylon kernel: [ 2412.603976] [<ffffffff8101b
Apr 10 18:58:52 ceylon kernel: [ 2412.603981] [<ffffffff8118c
Apr 10 18:58:52 ceylon kernel: [ 2412.603985] [<ffffffff81666
Apr 10 18:58:52 ceylon kernel: [ 2412.603988] Code: 48 83 ec 08 66 66 66 66 90 48 89 fb e8 8a 00 a1 ff 48 89 df e8 82 05 9e ff 66 90 48 83 c4 08 5b 5d c3 66 0f 1f 84 00 00 00 00 00 <55> 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] Call Trace:
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffffa0122
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffffa0122
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffffa0121
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81585
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81585
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81589
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff815ad
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8152c
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81125
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81092
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81061
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8101b
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81091
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81092
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81530
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8101b
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8118c
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81666
Apr 10 18:58:52 ceylon kernel: [ 2412.656005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:52 ceylon kernel: [ 2412.856005] ioatdma 0000:00:08.0: reset
[... starts repeating from here]
My observation is, that my computer dies in stages. It has 4 CPU cores and it seems as if they get stuck one after the other in whatever this "soft lock up" is, leading to a complete freeze in short time. In my case, I was running an svn command when this task got stuck and the messages above started showing up. At this time my computer was still responding and I was able to change to a terminal window, where I tried to kill the svn task. Even a 'kill -9' as root did not stop it. Apparently, the kernal was not able to process signals any more.
Another observation is, that there is no actual crash. The kernel seems to keep writing the syslog messages above even after the computer freezes. The kernel is running, but it seems that no event que is processed any more, which looks like a complete freeze to a user.
If anyone can tell me what to look for in my syslog files I will happily do so to shed some more light on this mysterious and annoying problem.