Comment 42 for bug 1154006

Revision history for this message
Frank Schilder (f-schilder) wrote : Re: 12.04 freezes many times daily

I recently upgraded to Ubunto 12.04LTS and since then observe the same problem. Initially I followd the discussion in <https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/993187>, but it seems to have gone to a dead end.

I would like to share some observations that hopefully help someone else to get closer to the source of this issue. My suspicion is a kernel problem. I kept a terminal with 'tail -f /var/log/syslog' open to see if something unusual happens before the computer freezes. And yes, every time just before the computer freezes, this is what is written en masse to the syslog:

Apr 10 18:58:24 ceylon kernel: [ 2384.656005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:24 ceylon kernel: [ 2384.856005] ioatdma 0000:00:08.0: reset
[... repeats lots of times]
Apr 10 18:58:51 ceylon kernel: [ 2412.256005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:51 ceylon kernel: [ 2412.456005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:52 ceylon kernel: [ 2412.603756] BUG: soft lockup - CPU#3 stuck for 22s! [svn:5127]
Apr 10 18:58:52 ceylon kernel: [ 2412.603760] Modules linked in: rfcomm bnep bluetooth binfmt_misc ext2 vesafb nvidia(P) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd i5000_edac tpm_infineon psmouse edac_core soundcore snd_page_alloc ioatdma ppdev i5k_amb mac_hid tpm_tis shpchp serio_raw dca parport_pc lp parport usbhid hid usb_storage firewire_ohci firewire_core crc_itu_t floppy e1000e mptsas mptscsih mptbase scsi_transport_sas
Apr 10 18:58:52 ceylon kernel: [ 2412.603803] CPU 3
Apr 10 18:58:52 ceylon kernel: [ 2412.603805] Modules linked in: rfcomm bnep bluetooth binfmt_misc ext2 vesafb nvidia(P) snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd i5000_edac tpm_infineon psmouse edac_core soundcore snd_page_alloc ioatdma ppdev i5k_amb mac_hid tpm_tis shpchp serio_raw dca parport_pc lp parport usbhid hid usb_storage firewire_ohci firewire_core crc_itu_t floppy e1000e mptsas mptscsih mptbase scsi_transport_sas
Apr 10 18:58:52 ceylon kernel: [ 2412.603838]
Apr 10 18:58:52 ceylon kernel: [ 2412.603840] Pid: 5127, comm: svn Tainted: P W O 3.2.0-40-generic #64-Ubuntu FUJITSU SIEMENS CELSIUS R640 /D1808
Apr 10 18:58:52 ceylon kernel: [ 2412.603846] RIP: 0010:[<ffffffff8165e640>] [<ffffffff8165e640>] _raw_spin_lock_bh+0x30/0x30
Apr 10 18:58:52 ceylon kernel: [ 2412.603855] RSP: 0018:ffff880203effae0 EFLAGS: 00000286
Apr 10 18:58:52 ceylon kernel: [ 2412.603857] RAX: ffff88022deb2000 RBX: 000000000000003c RCX: 0000000000000001
Apr 10 18:58:52 ceylon kernel: [ 2412.603860] RDX: ffff88022bd70018 RSI: 0000000000000004 RDI: ffff88022bd70468
Apr 10 18:58:52 ceylon kernel: [ 2412.603863] RBP: ffff880203effb18 R08: 000000000000003c R09: dead000000200200
Apr 10 18:58:52 ceylon kernel: [ 2412.603865] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603867] R13: 0000000000000000 R14: 0000000000000000 R15: ffff880203effad8
Apr 10 18:58:52 ceylon kernel: [ 2412.603870] FS: 00007f0f5ad8d780(0000) GS:ffff88023fd80000(0000) knlGS:0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 10 18:58:52 ceylon kernel: [ 2412.603875] CR2: 00007f0f4d2f0798 CR3: 000000020f239000 CR4: 00000000000006e0
Apr 10 18:58:52 ceylon kernel: [ 2412.603878] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603881] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 10 18:58:52 ceylon kernel: [ 2412.603883] Process svn (pid: 5127, threadinfo ffff880203efe000, task ffff880215bac500)
Apr 10 18:58:52 ceylon kernel: [ 2412.603885] Stack:
Apr 10 18:58:52 ceylon kernel: [ 2412.603887] ffffffffa0122372 ffff880203effbc8 00000002132290c0 0000000000000000
Apr 10 18:58:52 ceylon kernel: [ 2412.603893] ffff88022bd70418 0000000000000004 ffff880203effb8c ffff880203effb38
Apr 10 18:58:52 ceylon kernel: [ 2412.603897] ffffffffa0122626 0000000000000001 ffff88022bd70418 ffff880203effb68
Apr 10 18:58:52 ceylon kernel: [ 2412.603902] Call Trace:
Apr 10 18:58:52 ceylon kernel: [ 2412.603911] [<ffffffffa0122372>] ? ioat1_cleanup+0x32/0xa0 [ioatdma]
Apr 10 18:58:52 ceylon kernel: [ 2412.603917] [<ffffffffa0122626>] ioat1_cleanup_event+0x16/0x30 [ioatdma]
Apr 10 18:58:52 ceylon kernel: [ 2412.603922] [<ffffffffa012106e>] ioat_dma_tx_status+0x6e/0x100 [ioatdma]
Apr 10 18:58:52 ceylon kernel: [ 2412.603927] [<ffffffff81585d92>] ? tcp_service_net_dma+0x92/0x140
Apr 10 18:58:52 ceylon kernel: [ 2412.603930] [<ffffffff81585d77>] tcp_service_net_dma+0x77/0x140
Apr 10 18:58:52 ceylon kernel: [ 2412.603934] [<ffffffff81589a13>] tcp_recvmsg+0x333/0xaf0
Apr 10 18:58:52 ceylon kernel: [ 2412.603939] [<ffffffff815adbdb>] inet_recvmsg+0x6b/0x80
Apr 10 18:58:52 ceylon kernel: [ 2412.603943] [<ffffffff8152ce0d>] sock_recvmsg+0x11d/0x140
Apr 10 18:58:52 ceylon kernel: [ 2412.603949] [<ffffffff81125acd>] ? lru_cache_add_lru+0x2d/0x50
Apr 10 18:58:52 ceylon kernel: [ 2412.603953] [<ffffffff81092138>] ? sched_clock_cpu+0xa8/0x110
Apr 10 18:58:52 ceylon kernel: [ 2412.603958] [<ffffffff810614ec>] ? task_sched_runtime+0x5c/0xa0
Apr 10 18:58:52 ceylon kernel: [ 2412.603963] [<ffffffff8101bde3>] ? native_sched_clock+0x13/0x80
Apr 10 18:58:52 ceylon kernel: [ 2412.603966] [<ffffffff81091fa5>] ? sched_clock_local+0x25/0x90
Apr 10 18:58:52 ceylon kernel: [ 2412.603969] [<ffffffff81092138>] ? sched_clock_cpu+0xa8/0x110
Apr 10 18:58:52 ceylon kernel: [ 2412.603973] [<ffffffff8153078e>] sys_recvfrom+0xee/0x170
Apr 10 18:58:52 ceylon kernel: [ 2412.603976] [<ffffffff8101b769>] ? read_tsc+0x9/0x20
Apr 10 18:58:52 ceylon kernel: [ 2412.603981] [<ffffffff8118cd05>] ? poll_select_set_timeout+0x85/0xa0
Apr 10 18:58:52 ceylon kernel: [ 2412.603985] [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
Apr 10 18:58:52 ceylon kernel: [ 2412.603988] Code: 48 83 ec 08 66 66 66 66 90 48 89 fb e8 8a 00 a1 ff 48 89 df e8 82 05 9e ff 66 90 48 83 c4 08 5b 5d c3 66 0f 1f 84 00 00 00 00 00 <55> 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 66 66 66 66 90
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] Call Trace:
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffffa0122372>] ? ioat1_cleanup+0x32/0xa0 [ioatdma]
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffffa0122626>] ioat1_cleanup_event+0x16/0x30 [ioatdma]
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffffa012106e>] ioat_dma_tx_status+0x6e/0x100 [ioatdma]
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81585d92>] ? tcp_service_net_dma+0x92/0x140
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81585d77>] tcp_service_net_dma+0x77/0x140
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81589a13>] tcp_recvmsg+0x333/0xaf0
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff815adbdb>] inet_recvmsg+0x6b/0x80
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8152ce0d>] sock_recvmsg+0x11d/0x140
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81125acd>] ? lru_cache_add_lru+0x2d/0x50
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81092138>] ? sched_clock_cpu+0xa8/0x110
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff810614ec>] ? task_sched_runtime+0x5c/0xa0
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8101bde3>] ? native_sched_clock+0x13/0x80
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81091fa5>] ? sched_clock_local+0x25/0x90
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81092138>] ? sched_clock_cpu+0xa8/0x110
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8153078e>] sys_recvfrom+0xee/0x170
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8101b769>] ? read_tsc+0x9/0x20
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff8118cd05>] ? poll_select_set_timeout+0x85/0xa0
Apr 10 18:58:52 ceylon kernel: [ 2412.604002] [<ffffffff81666a82>] system_call_fastpath+0x16/0x1b
Apr 10 18:58:52 ceylon kernel: [ 2412.656005] ioatdma 0000:00:08.0: reset
Apr 10 18:58:52 ceylon kernel: [ 2412.856005] ioatdma 0000:00:08.0: reset
[... starts repeating from here]

My observation is, that my computer dies in stages. It has 4 CPU cores and it seems as if they get stuck one after the other in whatever this "soft lock up" is, leading to a complete freeze in short time. In my case, I was running an svn command when this task got stuck and the messages above started showing up. At this time my computer was still responding and I was able to change to a terminal window, where I tried to kill the svn task. Even a 'kill -9' as root did not stop it. Apparently, the kernal was not able to process signals any more.

Another observation is, that there is no actual crash. The kernel seems to keep writing the syslog messages above even after the computer freezes. The kernel is running, but it seems that no event que is processed any more, which looks like a complete freeze to a user.

If anyone can tell me what to look for in my syslog files I will happily do so to shed some more light on this mysterious and annoying problem.