soft lockup - CPU#1 stuck for 61s! [cron:3954]

Bug #376363 reported by rah003
This bug affects 1 person
Affects         Status      Importance   Assigned to   Milestone
linux (Ubuntu)  Won't Fix   Medium       Unassigned

Bug Description

Binary package hint: evolution

had@ace2:~$ lsb_release -rd
Description: Ubuntu 9.04
Release: 9.04

had@ace2:~$ apt-cache policy evolution
evolution:
  Installed: 2.26.1-0ubuntu1
  Candidate: 2.26.1-0ubuntu1
  Version table:
 *** 2.26.1-0ubuntu1 0
        500 http://ubuntu.ynet.sk jaunty/main Packages
        100 /var/lib/dpkg/status

syslog:
May 14 07:51:07 ace2 kernel: [ 1778.117008] BUG: soft lockup - CPU#1 stuck for 61s! [cron:3954]
May 14 07:51:07 ace2 kernel: [ 1778.117008] Modules linked in: hidp xt_limit xt_tcpudp ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp xt_state aes_x86_64 aes_generic binfmt_misc ppdev bridge stp bnep vboxnetflt vboxdrv input_polldev joydev lp parport snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq arc4 snd_timer ecb snd_seq_device iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle mmc_block iwlagn iwlcore iptable_filter snd tpm_infineon mac80211 video ip_tables soundcore tpm iTCO_wdt asus_laptop x_tables sdhci_pci sdhci snd_page_alloc tpm_bios intel_agp usbhid psmouse pcspkr serio_raw ricoh_mmc btusb cfg80211 iTCO_vendor_support output led_class nvidia(P) ohci1394 ieee1394 e1000e vesafb fbcon tileblit font bitblit softcursor
May 14 07:51:07 ace2 kernel: [ 1778.117008] CPU 1:
May 14 07:51:07 ace2 kernel: [ 1778.117008] Modules linked in: hidp xt_limit xt_tcpudp ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp xt_state aes_x86_64 aes_generic binfmt_misc ppdev bridge stp bnep vboxnetflt vboxdrv input_polldev joydev lp parport snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq arc4 snd_timer ecb snd_seq_device iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle mmc_block iwlagn iwlcore iptable_filter snd tpm_infineon mac80211 video ip_tables soundcore tpm iTCO_wdt asus_laptop x_tables sdhci_pci sdhci snd_page_alloc tpm_bios intel_agp usbhid psmouse pcspkr serio_raw ricoh_mmc btusb cfg80211 iTCO_vendor_support output led_class nvidia(P) ohci1394 ieee1394 e1000e vesafb fbcon tileblit font bitblit softcursor
May 14 07:51:07 ace2 kernel: [ 1778.117008] Pid: 3954, comm: cron Tainted: P W 2.6.28-11-generic #42-Ubuntu
May 14 07:51:07 ace2 kernel: [ 1778.117008] RIP: 0010:[<ffffffff8041fa65>] [<ffffffff8041fa65>] __read_lock_failed+0x5/0x20
May 14 07:51:07 ace2 kernel: [ 1778.117008] RSP: 0018:ffff8801355b1d90 EFLAGS: 00000297
May 14 07:51:07 ace2 kernel: [ 1778.117008] RAX: ffff8801355b9660 RBX: ffff8801355b1d98 RCX: ffff8801355b1de8
May 14 07:51:07 ace2 kernel: [ 1778.117008] RDX: 0000000000000001 RSI: ffff880122d9b000 RDI: ffff880135dbca44
May 14 07:51:07 ace2 kernel: [ 1778.117008] RBP: ffff8801355b1d98 R08: 0000000000000008 R09: 00007fffc40aeb00
May 14 07:51:07 ace2 kernel: [ 1778.117008] R10: 0000000000000008 R11: 0000000000000246 R12: ffff880028040970
May 14 07:51:07 ace2 kernel: [ 1778.117008] R13: ffff8801355b9698 R14: ffffffff809b4c00 R15: ffff8801351141c8
May 14 07:51:07 ace2 kernel: [ 1778.117008] FS: 00007f4dbc08e780(0000) GS:ffff880137803a80(0000) knlGS:0000000000000000
May 14 07:51:07 ace2 kernel: [ 1778.117008] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 14 07:51:07 ace2 kernel: [ 1778.117008] CR2: 00007f64fe355000 CR3: 000000013559d000 CR4: 00000000000006a0
May 14 07:51:07 ace2 kernel: [ 1778.117008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 14 07:51:07 ace2 kernel: [ 1778.117008] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 14 07:51:07 ace2 kernel: [ 1778.117008] Call Trace:
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8069e3ff>] ? _read_lock+0xf/0x20
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff802f39a0>] do_path_lookup+0x50/0x200
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff802f14b5>] ? getname+0x45/0xb0
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff802f48db>] user_path_at+0x7b/0xb0
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8026c44c>] ? lock_hrtimer_base+0x2c/0x60
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8026c50f>] ? hrtimer_try_to_cancel+0x3f/0x90
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8026c57a>] ? hrtimer_cancel+0x1a/0x30
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8069d580>] ? do_nanosleep+0x40/0xc0
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff802ebca8>] vfs_stat_fd+0x28/0x60
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8026be60>] ? hrtimer_wakeup+0x0/0x30
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff802ebd87>] sys_newstat+0x27/0x50
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8026ccaf>] ? sys_nanosleep+0x6f/0x80
May 14 07:51:07 ace2 kernel: [ 1778.117008] [<ffffffff8021253a>] system_call_fastpath+0x16/0x1b
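
For anyone triaging a trace like the one above, the lockup header and the call trace can be pulled out of syslog mechanically. The following is a minimal Python sketch and not part of the original report: the function names `parse_lockup` and `reliable_frames` and both regular expressions are assumptions written against the log lines quoted here. Frames printed with a leading `?` are addresses the kernel found on the stack but could not confirm as part of the call chain, so the sketch skips them.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#1 stuck for 61s! [cron:3954]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches e.g. "[<ffffffff802f39a0>] do_path_lookup+0x50/0x200",
# with an optional "? " marker in front of the symbol name.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_lockup(line):
    """Return CPU, duration, command and PID from a soft lockup line, or None."""
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m["cpu"]),
        "seconds": int(m["secs"]),
        "comm": m["comm"],
        "pid": int(m["pid"]),
    }


def reliable_frames(trace_lines):
    """Return symbol names of call-trace frames that are not marked with '?'.

    Pass only the lines that follow "Call Trace:"; the RIP line has the
    same bracketed-address shape and would otherwise be picked up too.
    """
    frames = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if m and not m["guess"]:
            frames.append(m["sym"])
    return frames
```

Applied to the call trace above, the unmarked frames are do_path_lookup, user_path_at, vfs_stat_fd, sys_newstat and system_call_fastpath. Together with the RIP in __read_lock_failed, this suggests cron was spinning on a read lock during a stat() path lookup, although that reading is an interpretation rather than a confirmed diagnosis.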

What happened:
I had just tried to send mail with Evolution when the whole system became very unresponsive. After about 30 seconds I also noticed that the network connection had stopped working, even though the laptop still showed it was connected to the wireless network. I had to reboot.
After the restart, the same thing happened again as soon as Evolution started and tried to resend the previously unsent mail. This time, however, it was a hard lock: X stopped responding completely (no mouse, no keyboard). There is no useful info in syslog from this time.

Let me know if there is anything else I can provide to help track this down.

affects: evolution (Ubuntu) → linux (Ubuntu)
Revision history for this message
Moritz Naumann (mnaumann) wrote :

As a wild guess, this could be related to a firmware (microcode) issue in iwlagn. I suggest installing linux-backports-modules-jaunty, which will install an updated (and apparently less problematic) firmware.

Revision history for this message
Moritz Naumann (mnaumann) wrote :

I should provide references to explain what makes me think this can be related to the iwlagn module, as well as other Intel wireless modules:

* I just reviewed a syslog (https://bugs.launchpad.net/linux/+bug/200509) of a user who had this issue and also had "iwlagn: Microcode SW error detected. Restarting 0x2000000." (which by itself is similar to bug #200509)

* A possibly related discussion on lkml:
   http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-07/msg12580.html
   (thread continues at http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-08/msg00141.html )

Revision history for this message
rah003 (rah-atlas) wrote :

Hey Moritz,
thanks for the suggestion, but it is not really an option.
I tried backports already, and with that I have the same issue as before (when I was using Intrepid): the network keeps disconnecting frequently, especially under load (e.g. when transferring data files bigger than 1 GB), and since I work with large data files a lot, this makes wireless totally useless to me. So for now it seems less painful to live with the lockup issue, especially since it happens only about once a week.

Since you think it is network related rather than Evolution related, it might help that I see the following info in syslog every few seconds (real MAC address and IP addresses removed):
May 19 08:38:20 ace2 NetworkManager: <info> (wlan0): supplicant connection state: completed -> group handshake
May 19 08:38:20 ace2 NetworkManager: <info> (wlan0): supplicant connection state: group handshake -> completed
May 19 08:38:51 ace2 NetworkManager: <info> (wlan0): supplicant connection state: completed -> group handshake
May 19 08:38:51 ace2 NetworkManager: <info> (wlan0): supplicant connection state: group handshake -> completed
May 19 08:38:54 ace2 kernel: [ 290.065971] Inbound IN=wlan0 OUT= MAC=00:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:00 SRC=YYY.YYY.YYY.YYY DST=ZZZ.ZZZ.ZZZ.ZZZ LEN=92 TOS=0x00 PREC=0x00 TTL=49 ID=54402 DF PROTO=TCP SPT=6667 DPT=59313 WINDOW=5840 RES=0x00 ACK PSH URGP=0
M
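
The "completed -> group handshake" transitions in the NetworkManager lines above typically correspond to the access point renewing the WPA group key, so their spacing is worth measuring. The following is a hedged Python sketch rather than anything from the original thread: the function name `rekey_intervals`, the regular expression, and the default year 2009 are assumptions (syslog timestamps carry no year, and the month abbreviation is parsed assuming an English locale).

```python
import re
from datetime import datetime

# Matches NetworkManager supplicant state lines such as:
# "May 19 08:38:20 ace2 NetworkManager: <info> (wlan0): supplicant
#  connection state: completed -> group handshake"
STATE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ NetworkManager: <info>\s+"
    r"\((?P<iface>\w+)\): supplicant connection state:\s+"
    r"(?P<old>.+?) -> (?P<new>.+)$"
)


def rekey_intervals(lines, iface="wlan0", year=2009):
    """Return seconds between successive entries into 'group handshake'.

    The year is an assumption supplied by the caller because the syslog
    timestamp format does not include one.
    """
    times = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if m and m["iface"] == iface and m["new"] == "group handshake":
            stamp = "%d %s" % (year, m["ts"])
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    # Pair each handshake with the next one and take the difference.
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

For the four NetworkManager lines quoted above, the two handshakes are 31 seconds apart. Whether that interval is configured on the access point or is a symptom of the driver problem cannot be determined from these lines alone.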

and also this one (seen for the first time today, never noticed before):
May 19 15:15:57 ace2 kernel: [24112.997210] no space for new kewModules linked in: xt_limit xt_tcpudp ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp xt_state aes_x86_64 aes_generic hidp binfmt_misc ppdev bridge stp bnep vboxnetflt vboxdrv input_polldev joydev lp parport snd_hda_intel snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event arc4 snd_seq ecb mmc_block snd_timer iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iwlagn snd_seq_device iptable_mangle iwlcore iptable_filter snd ip_tables mac80211 psmouse soundcore sdhci_pci tpm_infineon tpm x_tables video asus_laptop serio_raw pcspkr snd_page_alloc ricoh_mmc sdhci intel_agp tpm_bios iTCO_wdt iTCO_vendor_support btusb cfg80211 output led_class nvidia(P) usbhid ohci1394 ieee1394 e1000e vesafb fbcon tileblit font bitblit softcursor
May 19 15:15:57 ace2 kernel: [24112.997338] Pid: 10, comm: events/1 Tainted: P W 2.6.28-11-generic #42-Ubuntu
May 19 15:15:57 ace2 kernel: [24112.997343] Call Trace:
May 19 15:15:57 ace2 kernel: [24112.997358] [<ffffffff80250927>] warn_slowpath+0xb7/0xf0
May 19 15:15:57 ace2 kernel: [24112.997368] [<ffffffff80373739>] ? ext4_check_descriptors+0x299/0x2a0
May 19 15:15:57 ace2 kernel: [24112.997376] [<ffffffff802fca21>] ? alloc_inode+0x1f1/0x220
May 19 15:15:57 ace2 kernel: [24112.997384] [<ffffffff8031765e>] ? inotify_d_instantiate+0x4e/0x60
May 19 15:15:57 ace2 kernel: [24112.997402] [<ffffffffa09ab2b0>] ? iwl4965_mac_set_key+0x0/0x160 [iwlagn]
May 19 15:15:57 ace2 kernel: [24112.997410] [<ffffffff803bff62>] ? debugfs_mknod+0xd2/0x130
May 19 15:15:57 ace2 kernel: [24112.997416] [<ffffffff802fb499>] ?...


Revision history for this message
Andres Mujica (andres.mujica) wrote :

Thanks for testing and confirming against the latest Jaunty release. Please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux-image-2.6.28-11-generic 376363

If the issue remains in Jaunty, if you could also test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine this issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
Revision history for this message
rah003 (rah-atlas) wrote : apport-collect data

Architecture: amd64
DistroRelease: Ubuntu 9.04
HibernationDevice: RESUME=UUID=6ffc7339-ab76-48b0-a727-80f0da694366
MachineType: ASUSTeK Computer Inc. V1S
NonfreeKernelModules: nvidia
Package: linux-image-2.6.28-11-generic 2.6.28-11.42
PackageArchitecture: amd64
ProcCmdLine: root=UUID=d190b0cc-1d74-450d-8e51-c34e650705b2 ro quiet splash vga=775 acpi=force
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.28-13.44-generic
Uname: Linux 2.6.28-13-generic x86_64
UserGroups: adm admin cdrom dialout fax lpadmin plugdev sambashare saned vboxusers video www-data

Revision history for this message
rah003 (rah-atlas) wrote :

Uploaded the requested info. I will try to reproduce the issue with the latest upstream kernel over the weekend. Right now I am running kernel 2.6.28-13 and haven't run into the problem for the past week (though I have been trying to avoid putting heavy load on the network connection, which is what triggered the issue most often).

tags: removed: needs-kernel-logs
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Brad Figg (brad-figg) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Confirmed → Won't Fix