System locks up unrecoverably after CPU soft lockup

Bug #1689951 reported by Gordon P. Hemsley
This bug affects 3 people
Affects   Status     Importance  Assigned to  Milestone
Compiz    New        Undecided   Unassigned
Nvidia    New        Undecided   Unassigned
linux     New        Undecided   Unassigned
systemd   New        Undecided   Unassigned
Ubuntu    Confirmed  Undecided   Unassigned

Bug Description

I recently updated from 16.04 LTS to 17.04.

Ever since then, I have had repeated situations where the system locks up (often while asleep) in such a way that it no longer responds to any kind of input and must be hard rebooted.

When checking the syslog, I see repeated instances of messages like this:

May 10 07:36:40 Hickory kernel: [224616.143587] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [compiz:4946]
May 10 07:36:40 Hickory kernel: [224616.143590] Modules linked in: ccm pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM iptable_mangle vboxdrv(OE) bridge stp llc rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c snd_hrtimer cmac bnep binfmt_misc usblp dm_crypt snd_hda_codec_hdmi eeepc_wmi asus_wmi mxm_wmi sparse_keymap arc4 intel_rapl x86_pkg_temp_thermal uvcvideo intel_powerclamp videobuf2_vmalloc videobuf2_memops btusb kvm_intel btrtl btbcm btintel kvm videobuf2_v4l2
May 10 07:36:40 Hickory kernel: [224616.143640] irqbypass videobuf2_core bluetooth crct10dif_pclmul snd_usb_audio videodev crc32_pclmul joydev input_leds ghash_clmulni_intel media snd_usbmidi_lib pcbc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hda_core crypto_simd snd_hwdep glue_helper cryptd iwlmvm snd_pcm nvidia_uvm(POE) snd_seq_midi mac80211 snd_seq_midi_event snd_rawmidi intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device snd_timer cfg80211 snd mei_me mei soundcore lpc_ich shpchp tpm_infineon mac_hid wmi cuse coretemp parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj usbhid hid btrfs xor raid6_pq dm_mirror dm_region_hash dm_log uas usb_storage nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper firewire_ohci syscopyarea
May 10 07:36:40 Hickory kernel: [224616.143681] sysfillrect sysimgblt fb_sys_fops drm ahci firewire_core libahci r8169 crc_itu_t mii fjes video
May 10 07:36:40 Hickory kernel: [224616.143688] CPU: 6 PID: 4946 Comm: compiz Tainted: P D W OEL 4.10.0-20-generic #22-Ubuntu
May 10 07:36:40 Hickory kernel: [224616.143689] Hardware name: ASUS All Series/Z87-A, BIOS 1602 10/29/2013
May 10 07:36:40 Hickory kernel: [224616.143691] task: ffff9e0517ef9680 task.stack: ffffc26c43cfc000
May 10 07:36:40 Hickory kernel: [224616.143694] RIP: 0010:native_queued_spin_lock_slowpath+0x12b/0x1a0
May 10 07:36:40 Hickory kernel: [224616.143696] RSP: 0018:ffffc26c43cff628 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
May 10 07:36:40 Hickory kernel: [224616.143698] RAX: 0000000000000000 RBX: ffffe3ec8986faf0 RCX: ffff9e059ed99e40
May 10 07:36:40 Hickory kernel: [224616.143699] RDX: 00000000001c0101 RSI: 0000000000000101 RDI: ffffe3ec8986faf0
May 10 07:36:40 Hickory kernel: [224616.143700] RBP: ffffc26c43cff628 R08: 00000000001c0000 R09: 0000000000000000
May 10 07:36:40 Hickory kernel: [224616.143701] R10: 0000000000000000 R11: 0000000000000000 R12: ffffe3ec85b31580
May 10 07:36:40 Hickory kernel: [224616.143702] R13: ffffc26c43cff6a8 R14: ffff9e03e1beb1f0 R15: 0000000000000000
May 10 07:36:40 Hickory kernel: [224616.143704] FS: 00007f1e47daf780(0000) GS:ffff9e059ed80000(0000) knlGS:0000000000000000
May 10 07:36:40 Hickory kernel: [224616.143705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 07:36:40 Hickory kernel: [224616.143706] CR2: ffffffffff600000 CR3: 0000000397e3f000 CR4: 00000000001406e0
May 10 07:36:40 Hickory kernel: [224616.143708] Call Trace:
May 10 07:36:40 Hickory kernel: [224616.143711] _raw_spin_lock+0x20/0x30
May 10 07:36:40 Hickory kernel: [224616.143714] __page_check_address+0xdd/0x1c0
May 10 07:36:40 Hickory kernel: [224616.143716] try_to_unmap_one+0x81/0x630
May 10 07:36:40 Hickory kernel: [224616.143719] ? page_counter_uncharge+0x22/0x40
May 10 07:36:40 Hickory kernel: [224616.143721] rmap_walk_anon+0xde/0x270
May 10 07:36:40 Hickory kernel: [224616.143723] rmap_walk+0x48/0x60
May 10 07:36:40 Hickory kernel: [224616.143724] try_to_unmap+0x107/0x130
May 10 07:36:40 Hickory kernel: [224616.143726] ? page_remove_rmap+0x280/0x280
May 10 07:36:40 Hickory kernel: [224616.143727] ? __page_set_anon_rmap+0x70/0x70
May 10 07:36:40 Hickory kernel: [224616.143729] ? page_get_anon_vma+0x90/0x90
May 10 07:36:40 Hickory kernel: [224616.143731] ? invalid_mkclean_vma+0x20/0x20
May 10 07:36:40 Hickory kernel: [224616.143732] migrate_pages+0x9a3/0xbe0
May 10 07:36:40 Hickory kernel: [224616.143734] ? __ClearPageMovable+0x10/0x10
May 10 07:36:40 Hickory kernel: [224616.143736] ? isolate_freepages_block+0x390/0x390
May 10 07:36:40 Hickory kernel: [224616.143737] compact_zone+0x482/0x890
May 10 07:36:40 Hickory kernel: [224616.143740] ? ktime_get+0x41/0xb0
May 10 07:36:40 Hickory kernel: [224616.143741] compact_zone_order+0x90/0xb0
May 10 07:36:40 Hickory kernel: [224616.143743] try_to_compact_pages+0x1a2/0x260
May 10 07:36:40 Hickory kernel: [224616.143745] __alloc_pages_direct_compact+0x46/0xf0
May 10 07:36:40 Hickory kernel: [224616.143747] __alloc_pages_slowpath+0x49f/0xba0
May 10 07:36:40 Hickory kernel: [224616.143749] __alloc_pages_nodemask+0x209/0x260
May 10 07:36:40 Hickory kernel: [224616.143752] alloc_pages_current+0x95/0x140
May 10 07:36:40 Hickory kernel: [224616.143753] kmalloc_order+0x18/0x40
May 10 07:36:40 Hickory kernel: [224616.143755] kmalloc_order_trace+0x24/0xa0
May 10 07:36:40 Hickory kernel: [224616.143757] __kmalloc+0x1c7/0x1e0
May 10 07:36:40 Hickory kernel: [224616.143766] nvkms_alloc+0x27/0x60 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143777] _nv001951kms+0x1a/0x30 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143786] ? _nv001898kms+0x37/0xe10 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143788] ? kmalloc_order+0x18/0x40
May 10 07:36:40 Hickory kernel: [224616.143789] ? kmalloc_order_trace+0x24/0xa0
May 10 07:36:40 Hickory kernel: [224616.143791] ? __kmalloc+0x1c7/0x1e0
May 10 07:36:40 Hickory kernel: [224616.143793] ? __check_object_size+0x100/0x1d7
May 10 07:36:40 Hickory kernel: [224616.143796] ? _copy_from_user+0x4e/0x80
May 10 07:36:40 Hickory kernel: [224616.143814] ? _nv000319kms+0x40/0x40 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143821] ? _nv000171kms+0x31/0x40 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143829] ? nvKmsIoctl+0x163/0x1e0 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143837] ? nvkms_ioctl_common+0x45/0x80 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143845] ? nvkms_ioctl+0x71/0xa0 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143932] ? nvidia_frontend_compat_ioctl+0x40/0x50 [nvidia]
May 10 07:36:40 Hickory kernel: [224616.143998] ? nvidia_frontend_unlocked_ioctl+0xe/0x10 [nvidia]
May 10 07:36:40 Hickory kernel: [224616.144001] ? do_vfs_ioctl+0xa3/0x610
May 10 07:36:40 Hickory kernel: [224616.144002] ? __schedule+0xe4/0x6f0
May 10 07:36:40 Hickory kernel: [224616.144004] ? SyS_ioctl+0x79/0x90
May 10 07:36:40 Hickory kernel: [224616.144006] ? entry_SYSCALL_64_fastpath+0x1e/0xad
May 10 07:36:40 Hickory kernel: [224616.144007] Code: 48 03 04 d5 e0 53 d4 a1 48 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 <8b> 17 66 85 d2 75 f7 be 01 00 00 00 eb 10 89 d0 f0 0f b1 37 39
May 10 07:36:48 Hickory kernel: [224624.139731] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [Chrome_ChildThr:26728]
May 10 07:36:48 Hickory kernel: [224624.139744] Modules linked in: ccm pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM iptable_mangle vboxdrv(OE) bridge stp llc rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c snd_hrtimer cmac bnep binfmt_misc usblp dm_crypt snd_hda_codec_hdmi eeepc_wmi asus_wmi mxm_wmi sparse_keymap arc4 intel_rapl x86_pkg_temp_thermal uvcvideo intel_powerclamp videobuf2_vmalloc videobuf2_memops btusb kvm_intel btrtl btbcm btintel kvm videobuf2_v4l2
May 10 07:36:48 Hickory kernel: [224624.139779] irqbypass videobuf2_core bluetooth crct10dif_pclmul snd_usb_audio videodev crc32_pclmul joydev input_leds ghash_clmulni_intel media snd_usbmidi_lib pcbc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hda_core crypto_simd snd_hwdep glue_helper cryptd iwlmvm snd_pcm nvidia_uvm(POE) snd_seq_midi mac80211 snd_seq_midi_event snd_rawmidi intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device snd_timer cfg80211 snd mei_me mei soundcore lpc_ich shpchp tpm_infineon mac_hid wmi cuse coretemp parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj usbhid hid btrfs xor raid6_pq dm_mirror dm_region_hash dm_log uas usb_storage nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper firewire_ohci syscopyarea
May 10 07:36:48 Hickory kernel: [224624.139812] sysfillrect sysimgblt fb_sys_fops drm ahci firewire_core libahci r8169 crc_itu_t mii fjes video
May 10 07:36:48 Hickory kernel: [224624.139817] CPU: 4 PID: 26728 Comm: Chrome_ChildThr Tainted: P D W OEL 4.10.0-20-generic #22-Ubuntu
May 10 07:36:48 Hickory kernel: [224624.139818] Hardware name: ASUS All Series/Z87-A, BIOS 1602 10/29/2013
May 10 07:36:48 Hickory kernel: [224624.139819] task: ffff9e0509ef2d00 task.stack: ffffc26c4c554000
May 10 07:36:48 Hickory kernel: [224624.139822] RIP: 0010:native_queued_spin_lock_slowpath+0x17b/0x1a0
May 10 07:36:48 Hickory kernel: [224624.139822] RSP: 0000:ffffc26c4c557d48 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
May 10 07:36:48 Hickory kernel: [224624.139823] RAX: 00000000001c0101 RBX: ffffe3ec8986faf0 RCX: 0000000000000001
May 10 07:36:48 Hickory kernel: [224624.139824] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffe3ec8986faf0
May 10 07:36:48 Hickory kernel: [224624.139825] RBP: ffffc26c4c557d48 R08: 0000000000000101 R09: ffff9e053982ec80
May 10 07:36:48 Hickory kernel: [224624.139825] R10: 00007fad07f00618 R11: 00007fad07f00618 R12: ffff9e03e1beb008
May 10 07:36:48 Hickory kernel: [224624.139826] R13: 3e00000000355001 R14: ffffc26c4c557e30 R15: ffff9e0299d4c320
May 10 07:36:48 Hickory kernel: [224624.139827] FS: 00007facf69df700(0000) GS:ffff9e059ed00000(0000) knlGS:0000000000000000
May 10 07:36:48 Hickory kernel: [224624.139827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 07:36:48 Hickory kernel: [224624.139828] CR2: 00007facab001280 CR3: 00000002961cd000 CR4: 00000000001406e0
May 10 07:36:48 Hickory kernel: [224624.139829] Call Trace:
May 10 07:36:48 Hickory kernel: [224624.139833] _raw_spin_lock+0x20/0x30
May 10 07:36:48 Hickory kernel: [224624.139836] __migration_entry_wait+0x1c/0x180
May 10 07:36:48 Hickory kernel: [224624.139836] migration_entry_wait+0x74/0x80
May 10 07:36:48 Hickory kernel: [224624.139839] do_swap_page+0x5b3/0x770
May 10 07:36:48 Hickory kernel: [224624.139841] ? ep_ptable_queue_proc+0xa0/0xa0
May 10 07:36:48 Hickory kernel: [224624.139842] handle_mm_fault+0x873/0x1360
May 10 07:36:48 Hickory kernel: [224624.139843] ? __seccomp_filter+0x67/0x250
May 10 07:36:48 Hickory kernel: [224624.139845] __do_page_fault+0x23e/0x4e0
May 10 07:36:48 Hickory kernel: [224624.139847] do_page_fault+0x22/0x30
May 10 07:36:48 Hickory kernel: [224624.139848] page_fault+0x28/0x30
May 10 07:36:48 Hickory kernel: [224624.139849] RIP: 0033:0x417bc0
May 10 07:36:48 Hickory kernel: [224624.139850] RSP: 002b:00007facf69de9b8 EFLAGS: 00010202
May 10 07:36:48 Hickory kernel: [224624.139851] RAX: 00007facaae01580 RBX: 00007facab001280 RCX: 00007facaac00bc1
May 10 07:36:48 Hickory kernel: [224624.139851] RDX: 00007facab001281 RSI: 00007faca5a00590 RDI: 00007fad07f00610
May 10 07:36:48 Hickory kernel: [224624.139852] RBP: 00007facaa700188 R08: 00000000ffffffff R09: 00007facab5016e8
May 10 07:36:48 Hickory kernel: [224624.139852] R10: 00007fad07f00618 R11: 00007fad07f00618 R12: 00007fad07f00608
May 10 07:36:48 Hickory kernel: [224624.139853] R13: 00000000000000b0 R14: 00007fad07f00048 R15: 00007fad07f00600
May 10 07:36:48 Hickory kernel: [224624.139854] Code: c0 74 e6 4d 85 c9 c6 07 01 74 30 41 c7 41 08 01 00 00 00 e9 52 ff ff ff 83 fa 01 0f 84 b0 fe ff ff 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 8b 09 4d 85
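For anyone comparing their own logs against the traces above, here is a minimal sketch that pulls the "soft lockup" header lines out of a syslog and counts them per process. The function names and the regular expression are hypothetical and were written only against the message format shown in this report; other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern is an assumption based on the lines quoted in this report, e.g.:
#   NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [compiz:4946]
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]*):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return a (cpu, seconds, comm, pid) tuple for each soft lockup line."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append(
                (
                    int(match["cpu"]),
                    int(match["secs"]),
                    match["comm"],
                    int(match["pid"]),
                )
            )
    return events


def summarize(events):
    """Count soft lockup reports per process name."""
    return Counter(comm for _cpu, _secs, comm, _pid in events)
```

To use it, pass any iterable of log lines, for example an open file handle on a copy of the syslog, to `find_soft_lockups` and print the result of `summarize`.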

I run BOINC when I'm not at my computer, which will often use a lot of CPU/GPU/RAM, but it's limited in what it can use, and my computer is very powerful.

I've also seen instances where individual applications will lock up suddenly, and I am unable to kill them because they are waiting on an uninterruptible communication with the CPU. The only recourse I have found in those cases has also been to reboot.
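One way to check for applications stuck like this is to look for processes the kernel reports in state D (uninterruptible sleep). That the hangs described here correspond to state D is an assumption, not something the report confirms. The sketch below reads `/proc/<pid>/stat` on Linux; the function names are hypothetical.

```python
import os


def parse_stat(text):
    """Parse the start of a /proc/<pid>/stat record into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the record was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

A process that stays in the returned list across several calls, and that ignores `kill -9`, would be consistent with the behaviour described above.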

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
pdecat (pdecat) wrote :

I had similar repeated error messages at the end of my logs, but the very first error was:

> kernel: kernel BUG at /build/linux-lz1RHE/linux-4.10.0/include/linux/swapops.h:129!

cf. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838

Francois Thirioux (fthx) wrote :

I hit the same bug in Artful.

no longer affects: duplicity
Gordon P. Hemsley (gphemsley) wrote :

This bug stopped occurring when I installed the kernel update mentioned in bug 1674838.
