cpu soft lockup causes system lockup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Incomplete | High | Unassigned |
Bug Description
This large computational server, which has many users, ran for 141 days and then suffered a CPU soft lockup.
The soft lockup repeated continuously.
System load gradually built up; after several hours it was in the 20s. Shortly thereafter, the system
became unresponsive and had to be reset. After the reboot, the system was fine.
Here is the syslog output for the first CPU soft lockup:
Sep 16 14:20:26 flatfish kernel: [12283888.860058] BUG: soft lockup - CPU#7 stuck for 22s! [ps:31125]
Sep 16 14:20:26 flatfish kernel: [12283888.864001] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 autofs4 microcode clip atm parport_pc ppdev binfmt_misc nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc shpchp psmouse radeon i7300_edac ttm drm_kms_helper drm edac_core dm_multipath dcdbas joydev serio_raw i2c_algo_bit mac_hid lp parport ses enclosure uas usb_storage usbhid hid sata_sil24 bnx2 megaraid_sas dm_raid45 xor dm_mirror dm_region_hash dm_log
Sep 16 14:20:26 flatfish kernel: [12283888.864001] CPU 7
Sep 16 14:20:26 flatfish kernel: [12283888.864001] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 autofs4 microcode clip atm parport_pc ppdev binfmt_misc nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc shpchp psmouse radeon i7300_edac ttm drm_kms_helper drm edac_core dm_multipath dcdbas joydev serio_raw i2c_algo_bit mac_hid lp parport ses enclosure uas usb_storage usbhid hid sata_sil24 bnx2 megaraid_sas dm_raid45 xor dm_mirror dm_region_hash dm_log
Sep 16 14:20:26 flatfish kernel: [12283888.864001]
Sep 16 14:20:26 flatfish kernel: [12283888.864001] Pid: 31125, comm: ps Tainted: G D W 3.2.0-23-generic #36-Ubuntu Dell Inc. PowerEdge R900/0TT975
Sep 16 14:20:26 flatfish kernel: [12283888.864001] RIP: 0010:[<
Sep 16 14:20:26 flatfish kernel: [12283888.864001] RSP: 0018:ffff8807c4
Sep 16 14:20:26 flatfish kernel: [12283888.864001] RAX: 000000000000c49f RBX: 00ff881f00000000 RCX: ffff881f6c5f8e00
Sep 16 14:20:26 flatfish kernel: [12283888.864001] RDX: 000000000000c4a0 RSI: ffffffff81c28020 RDI: ffff881f59846680
Sep 16 14:20:26 flatfish kernel: [12283888.864001] RBP: ffff8807c4d55a48 R08: 0000000000000009 R09: 0000000000000000
Sep 16 14:20:26 flatfish kernel: [12283888.864001] R10: ffff881eb5cbafc0 R11: 0000000000000009 R12: ffff8807c4d5590a
Sep 16 14:20:26 flatfish kernel: [12283888.864001] R13: 000000000000ffff R14: ffff8807c4d559ce R15: ffff8807c4d559a8
Sep 16 14:20:26 flatfish kernel: [12283888.988018] FS: 00007fed758f870
Sep 16 14:20:26 flatfish kernel: [12283888.988018] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 16 14:20:26 flatfish kernel: [12283888.988018] CR2: 00007fff2da5dff8 CR3: 0000001f5f72f000 CR4: 00000000000006e0
Sep 16 14:20:26 flatfish kernel: [12283888.988018] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 16 14:20:26 flatfish kernel: [12283888.988018] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 16 14:20:26 flatfish kernel: [12283888.988018] Process ps (pid: 31125, threadinfo ffff8807c4d54000, task ffff881f6d99ade0)
Sep 16 14:20:26 flatfish kernel: [12283888.988018] Stack:
Sep 16 14:20:26 flatfish kernel: [12283888.988018] ffff8807c4d55a58 ffffffff8165c46e ffff8807c4d55ac8 ffffffffa023a96e
Sep 16 14:20:26 flatfish kernel: [12283888.988018] ffff8807c4d55aa8 0000000000000001 ffff8807c4d55a98 ffff881eb5cbaff8
Sep 16 14:20:26 flatfish kernel: [12283888.988018] 00000009bf59e000 ffff881f59846680 ffff8807c4d55b58 ffff881eb5cbafc0
Sep 16 14:20:26 flatfish kernel: [12283888.988018] Call Trace:
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8165c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffffa023a
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffffa023a
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffffa023b
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81182
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81184
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81185
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81186
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81186
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81318
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81186
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81187
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8118c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81197
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8165c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81187
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81195
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81197
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81664
Sep 16 14:20:26 flatfish kernel: [12283888.988018] Code: 90 90 90 90 90 90 90 90 90 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 74 13 66 0f 1f 84 00 00 00 00 00 f3 90 <0f> b7 07 66 39 d0 75 f6 5d c3 0f 1f 40 00 8b 17 55 31 c0 48 89
Sep 16 14:20:26 flatfish kernel: [12283888.988018] Call Trace:
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8165c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffffa023a
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffffa023a
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffffa023b
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81182
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81184
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81185
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81186
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81186
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81318
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81186
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81187
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8118c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81197
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8165c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81187
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81195
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81197
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff8117c
Sep 16 14:20:26 flatfish kernel: [12283888.988018] [<ffffffff81664
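Because the soft lockup repeated continuously, it can help to pull the individual reports out of syslog and count them per CPU. The following is a minimal sketch, not part of the original report: the names `SOFT_LOCKUP_RE`, `SoftLockup`, `parse_soft_lockup` and `count_by_cpu` are hypothetical, and the pattern assumes the message format shown in the log above (kernel 3.2); other kernel versions may word the message differently.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Assumed message format, taken from the first line of the log above:
# "[12283888.860058] BUG: soft lockup - CPU#7 stuck for 22s! [ps:31125]"
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<stamp>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    stamp: float      # kernel timestamp printed in square brackets
    cpu: int          # CPU number reported as stuck
    stuck_secs: int   # how long the watchdog says the CPU was stuck
    comm: str         # name of the task that was running
    pid: int          # PID of that task


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed report, or None if the line is not a soft lockup."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(
        stamp=float(m.group("stamp")),
        cpu=int(m.group("cpu")),
        stuck_secs=int(m.group("secs")),
        comm=m.group("comm"),
        pid=int(m.group("pid")),
    )


def count_by_cpu(lines: Iterable[str]) -> Counter:
    """Count soft lockup reports per CPU across an iterable of syslog lines."""
    counts: Counter = Counter()
    for line in lines:
        event = parse_soft_lockup(line)
        if event is not None:
            counts[event.cpu] += 1
    return counts


sample = (
    "Sep 16 14:20:26 flatfish kernel: [12283888.860058] "
    "BUG: soft lockup - CPU#7 stuck for 22s! [ps:31125]"
)
event = parse_soft_lockup(sample)
per_cpu = count_by_cpu([sample, "Sep 16 14:20:26 flatfish kernel: [12283888.864001] CPU 7"])
```

To run it over a whole log, `count_by_cpu(open("/var/log/syslog"))` would work, assuming the file is readable and still contains the relevant period.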
This system relies heavily on NFS v3 for user home directories, scratch directories, and data.
This system has 16 cores and 128 GB of RAM.
We have previously seen this behavior with openSUSE 11.3, where we opened a bug
report: https:/
lsb_release -rd
Description: Ubuntu 12.04 LTS
Release: 12.04
I'm sending the results of "ubuntu-bug linux".
---
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 Sep 16 17:31 seq
crw-rw---T 1 root audio 116, 33 Sep 16 17:31 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu8
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=
InstallationMedia: Ubuntu-Server 12.04 LTS "Precise Pangolin" - Release amd64 (20120424.1)
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. PowerEdge R900
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
LANGUAGE=en_US:
TERM=xterm
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageV
linux-
linux-
linux-firmware 1.79
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.2.0-23-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
WifiSyslog:
Sep 20 08:57:06 lamprey kernel: [314190.440030] megaraid_sas 0000:1a:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.
Sep 20 09:10:23 lamprey kernel: [314782.500029] megaraid_sas 0000:1a:00.0: vpd r/w failed. This is likely a firmware bug on this device. Contact the card vendor for a firmware update.
dmi.bios.date: 10/09/2007
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.1.0
dmi.board.name: 0TT975
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.name: PowerEdge R900
dmi.sys.vendor: Dell Inc.
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1053491
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.