System locks up unrecoverably after CPU soft lockup

Bug #1689951 reported by Gordon P. Hemsley on 2017-05-10
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Compiz | | Undecided | Unassigned |
Nvidia | | Undecided | Unassigned |
linux | New | Undecided | Unassigned |
systemd | New | Undecided | Unassigned |
Ubuntu | | Undecided | Unassigned |

Bug Description

I recently updated from 16.04 LTS to 17.04.

Ever since then, I have had repeated situations where the system locks up (often while asleep) in such a way that it no longer responds to any kind of input and must be hard rebooted.

When checking the syslog, I see repeated instances of messages like this:

May 10 07:36:40 Hickory kernel: [224616.143587] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [compiz:4946]
May 10 07:36:40 Hickory kernel: [224616.143590] Modules linked in: ccm pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM iptable_mangle vboxdrv(OE) bridge stp llc rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c snd_hrtimer cmac bnep binfmt_misc usblp dm_crypt snd_hda_codec_hdmi eeepc_wmi asus_wmi mxm_wmi sparse_keymap arc4 intel_rapl x86_pkg_temp_thermal uvcvideo intel_powerclamp videobuf2_vmalloc videobuf2_memops btusb kvm_intel btrtl btbcm btintel kvm videobuf2_v4l2
May 10 07:36:40 Hickory kernel: [224616.143640] irqbypass videobuf2_core bluetooth crct10dif_pclmul snd_usb_audio videodev crc32_pclmul joydev input_leds ghash_clmulni_intel media snd_usbmidi_lib pcbc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hda_core crypto_simd snd_hwdep glue_helper cryptd iwlmvm snd_pcm nvidia_uvm(POE) snd_seq_midi mac80211 snd_seq_midi_event snd_rawmidi intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device snd_timer cfg80211 snd mei_me mei soundcore lpc_ich shpchp tpm_infineon mac_hid wmi cuse coretemp parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj usbhid hid btrfs xor raid6_pq dm_mirror dm_region_hash dm_log uas usb_storage nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper firewire_ohci syscopyarea
May 10 07:36:40 Hickory kernel: [224616.143681] sysfillrect sysimgblt fb_sys_fops drm ahci firewire_core libahci r8169 crc_itu_t mii fjes video
May 10 07:36:40 Hickory kernel: [224616.143688] CPU: 6 PID: 4946 Comm: compiz Tainted: P D W OEL 4.10.0-20-generic #22-Ubuntu
May 10 07:36:40 Hickory kernel: [224616.143689] Hardware name: ASUS All Series/Z87-A, BIOS 1602 10/29/2013
May 10 07:36:40 Hickory kernel: [224616.143691] task: ffff9e0517ef9680 task.stack: ffffc26c43cfc000
May 10 07:36:40 Hickory kernel: [224616.143694] RIP: 0010:native_queued_spin_lock_slowpath+0x12b/0x1a0
May 10 07:36:40 Hickory kernel: [224616.143696] RSP: 0018:ffffc26c43cff628 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
May 10 07:36:40 Hickory kernel: [224616.143698] RAX: 0000000000000000 RBX: ffffe3ec8986faf0 RCX: ffff9e059ed99e40
May 10 07:36:40 Hickory kernel: [224616.143699] RDX: 00000000001c0101 RSI: 0000000000000101 RDI: ffffe3ec8986faf0
May 10 07:36:40 Hickory kernel: [224616.143700] RBP: ffffc26c43cff628 R08: 00000000001c0000 R09: 0000000000000000
May 10 07:36:40 Hickory kernel: [224616.143701] R10: 0000000000000000 R11: 0000000000000000 R12: ffffe3ec85b31580
May 10 07:36:40 Hickory kernel: [224616.143702] R13: ffffc26c43cff6a8 R14: ffff9e03e1beb1f0 R15: 0000000000000000
May 10 07:36:40 Hickory kernel: [224616.143704] FS: 00007f1e47daf780(0000) GS:ffff9e059ed80000(0000) knlGS:0000000000000000
May 10 07:36:40 Hickory kernel: [224616.143705] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 07:36:40 Hickory kernel: [224616.143706] CR2: ffffffffff600000 CR3: 0000000397e3f000 CR4: 00000000001406e0
May 10 07:36:40 Hickory kernel: [224616.143708] Call Trace:
May 10 07:36:40 Hickory kernel: [224616.143711] _raw_spin_lock+0x20/0x30
May 10 07:36:40 Hickory kernel: [224616.143714] __page_check_address+0xdd/0x1c0
May 10 07:36:40 Hickory kernel: [224616.143716] try_to_unmap_one+0x81/0x630
May 10 07:36:40 Hickory kernel: [224616.143719] ? page_counter_uncharge+0x22/0x40
May 10 07:36:40 Hickory kernel: [224616.143721] rmap_walk_anon+0xde/0x270
May 10 07:36:40 Hickory kernel: [224616.143723] rmap_walk+0x48/0x60
May 10 07:36:40 Hickory kernel: [224616.143724] try_to_unmap+0x107/0x130
May 10 07:36:40 Hickory kernel: [224616.143726] ? page_remove_rmap+0x280/0x280
May 10 07:36:40 Hickory kernel: [224616.143727] ? __page_set_anon_rmap+0x70/0x70
May 10 07:36:40 Hickory kernel: [224616.143729] ? page_get_anon_vma+0x90/0x90
May 10 07:36:40 Hickory kernel: [224616.143731] ? invalid_mkclean_vma+0x20/0x20
May 10 07:36:40 Hickory kernel: [224616.143732] migrate_pages+0x9a3/0xbe0
May 10 07:36:40 Hickory kernel: [224616.143734] ? __ClearPageMovable+0x10/0x10
May 10 07:36:40 Hickory kernel: [224616.143736] ? isolate_freepages_block+0x390/0x390
May 10 07:36:40 Hickory kernel: [224616.143737] compact_zone+0x482/0x890
May 10 07:36:40 Hickory kernel: [224616.143740] ? ktime_get+0x41/0xb0
May 10 07:36:40 Hickory kernel: [224616.143741] compact_zone_order+0x90/0xb0
May 10 07:36:40 Hickory kernel: [224616.143743] try_to_compact_pages+0x1a2/0x260
May 10 07:36:40 Hickory kernel: [224616.143745] __alloc_pages_direct_compact+0x46/0xf0
May 10 07:36:40 Hickory kernel: [224616.143747] __alloc_pages_slowpath+0x49f/0xba0
May 10 07:36:40 Hickory kernel: [224616.143749] __alloc_pages_nodemask+0x209/0x260
May 10 07:36:40 Hickory kernel: [224616.143752] alloc_pages_current+0x95/0x140
May 10 07:36:40 Hickory kernel: [224616.143753] kmalloc_order+0x18/0x40
May 10 07:36:40 Hickory kernel: [224616.143755] kmalloc_order_trace+0x24/0xa0
May 10 07:36:40 Hickory kernel: [224616.143757] __kmalloc+0x1c7/0x1e0
May 10 07:36:40 Hickory kernel: [224616.143766] nvkms_alloc+0x27/0x60 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143777] _nv001951kms+0x1a/0x30 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143786] ? _nv001898kms+0x37/0xe10 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143788] ? kmalloc_order+0x18/0x40
May 10 07:36:40 Hickory kernel: [224616.143789] ? kmalloc_order_trace+0x24/0xa0
May 10 07:36:40 Hickory kernel: [224616.143791] ? __kmalloc+0x1c7/0x1e0
May 10 07:36:40 Hickory kernel: [224616.143793] ? __check_object_size+0x100/0x1d7
May 10 07:36:40 Hickory kernel: [224616.143796] ? _copy_from_user+0x4e/0x80
May 10 07:36:40 Hickory kernel: [224616.143814] ? _nv000319kms+0x40/0x40 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143821] ? _nv000171kms+0x31/0x40 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143829] ? nvKmsIoctl+0x163/0x1e0 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143837] ? nvkms_ioctl_common+0x45/0x80 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143845] ? nvkms_ioctl+0x71/0xa0 [nvidia_modeset]
May 10 07:36:40 Hickory kernel: [224616.143932] ? nvidia_frontend_compat_ioctl+0x40/0x50 [nvidia]
May 10 07:36:40 Hickory kernel: [224616.143998] ? nvidia_frontend_unlocked_ioctl+0xe/0x10 [nvidia]
May 10 07:36:40 Hickory kernel: [224616.144001] ? do_vfs_ioctl+0xa3/0x610
May 10 07:36:40 Hickory kernel: [224616.144002] ? __schedule+0xe4/0x6f0
May 10 07:36:40 Hickory kernel: [224616.144004] ? SyS_ioctl+0x79/0x90
May 10 07:36:40 Hickory kernel: [224616.144006] ? entry_SYSCALL_64_fastpath+0x1e/0xad
May 10 07:36:40 Hickory kernel: [224616.144007] Code: 48 03 04 d5 e0 53 d4 a1 48 89 08 8b 41 08 85 c0 75 09 f3 90 8b 41 08 85 c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 <8b> 17 66 85 d2 75 f7 be 01 00 00 00 eb 10 89 d0 f0 0f b1 37 39
May 10 07:36:48 Hickory kernel: [224624.139731] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [Chrome_ChildThr:26728]
May 10 07:36:48 Hickory kernel: [224624.139744] Modules linked in: ccm pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) xt_CHECKSUM iptable_mangle vboxdrv(OE) bridge stp llc rfcomm ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c snd_hrtimer cmac bnep binfmt_misc usblp dm_crypt snd_hda_codec_hdmi eeepc_wmi asus_wmi mxm_wmi sparse_keymap arc4 intel_rapl x86_pkg_temp_thermal uvcvideo intel_powerclamp videobuf2_vmalloc videobuf2_memops btusb kvm_intel btrtl btbcm btintel kvm videobuf2_v4l2
May 10 07:36:48 Hickory kernel: [224624.139779] irqbypass videobuf2_core bluetooth crct10dif_pclmul snd_usb_audio videodev crc32_pclmul joydev input_leds ghash_clmulni_intel media snd_usbmidi_lib pcbc snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel aesni_intel snd_hda_codec aes_x86_64 snd_hda_core crypto_simd snd_hwdep glue_helper cryptd iwlmvm snd_pcm nvidia_uvm(POE) snd_seq_midi mac80211 snd_seq_midi_event snd_rawmidi intel_cstate intel_rapl_perf snd_seq iwlwifi snd_seq_device snd_timer cfg80211 snd mei_me mei soundcore lpc_ich shpchp tpm_infineon mac_hid wmi cuse coretemp parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_logitech_hidpp hid_logitech_dj usbhid hid btrfs xor raid6_pq dm_mirror dm_region_hash dm_log uas usb_storage nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper firewire_ohci syscopyarea
May 10 07:36:48 Hickory kernel: [224624.139812] sysfillrect sysimgblt fb_sys_fops drm ahci firewire_core libahci r8169 crc_itu_t mii fjes video
May 10 07:36:48 Hickory kernel: [224624.139817] CPU: 4 PID: 26728 Comm: Chrome_ChildThr Tainted: P D W OEL 4.10.0-20-generic #22-Ubuntu
May 10 07:36:48 Hickory kernel: [224624.139818] Hardware name: ASUS All Series/Z87-A, BIOS 1602 10/29/2013
May 10 07:36:48 Hickory kernel: [224624.139819] task: ffff9e0509ef2d00 task.stack: ffffc26c4c554000
May 10 07:36:48 Hickory kernel: [224624.139822] RIP: 0010:native_queued_spin_lock_slowpath+0x17b/0x1a0
May 10 07:36:48 Hickory kernel: [224624.139822] RSP: 0000:ffffc26c4c557d48 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
May 10 07:36:48 Hickory kernel: [224624.139823] RAX: 00000000001c0101 RBX: ffffe3ec8986faf0 RCX: 0000000000000001
May 10 07:36:48 Hickory kernel: [224624.139824] RDX: 0000000000000101 RSI: 0000000000000001 RDI: ffffe3ec8986faf0
May 10 07:36:48 Hickory kernel: [224624.139825] RBP: ffffc26c4c557d48 R08: 0000000000000101 R09: ffff9e053982ec80
May 10 07:36:48 Hickory kernel: [224624.139825] R10: 00007fad07f00618 R11: 00007fad07f00618 R12: ffff9e03e1beb008
May 10 07:36:48 Hickory kernel: [224624.139826] R13: 3e00000000355001 R14: ffffc26c4c557e30 R15: ffff9e0299d4c320
May 10 07:36:48 Hickory kernel: [224624.139827] FS: 00007facf69df700(0000) GS:ffff9e059ed00000(0000) knlGS:0000000000000000
May 10 07:36:48 Hickory kernel: [224624.139827] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 07:36:48 Hickory kernel: [224624.139828] CR2: 00007facab001280 CR3: 00000002961cd000 CR4: 00000000001406e0
May 10 07:36:48 Hickory kernel: [224624.139829] Call Trace:
May 10 07:36:48 Hickory kernel: [224624.139833] _raw_spin_lock+0x20/0x30
May 10 07:36:48 Hickory kernel: [224624.139836] __migration_entry_wait+0x1c/0x180
May 10 07:36:48 Hickory kernel: [224624.139836] migration_entry_wait+0x74/0x80
May 10 07:36:48 Hickory kernel: [224624.139839] do_swap_page+0x5b3/0x770
May 10 07:36:48 Hickory kernel: [224624.139841] ? ep_ptable_queue_proc+0xa0/0xa0
May 10 07:36:48 Hickory kernel: [224624.139842] handle_mm_fault+0x873/0x1360
May 10 07:36:48 Hickory kernel: [224624.139843] ? __seccomp_filter+0x67/0x250
May 10 07:36:48 Hickory kernel: [224624.139845] __do_page_fault+0x23e/0x4e0
May 10 07:36:48 Hickory kernel: [224624.139847] do_page_fault+0x22/0x30
May 10 07:36:48 Hickory kernel: [224624.139848] page_fault+0x28/0x30
May 10 07:36:48 Hickory kernel: [224624.139849] RIP: 0033:0x417bc0
May 10 07:36:48 Hickory kernel: [224624.139850] RSP: 002b:00007facf69de9b8 EFLAGS: 00010202
May 10 07:36:48 Hickory kernel: [224624.139851] RAX: 00007facaae01580 RBX: 00007facab001280 RCX: 00007facaac00bc1
May 10 07:36:48 Hickory kernel: [224624.139851] RDX: 00007facab001281 RSI: 00007faca5a00590 RDI: 00007fad07f00610
May 10 07:36:48 Hickory kernel: [224624.139852] RBP: 00007facaa700188 R08: 00000000ffffffff R09: 00007facab5016e8
May 10 07:36:48 Hickory kernel: [224624.139852] R10: 00007fad07f00618 R11: 00007fad07f00618 R12: 00007fad07f00608
May 10 07:36:48 Hickory kernel: [224624.139853] R13: 00000000000000b0 R14: 00007fad07f00048 R15: 00007fad07f00600
May 10 07:36:48 Hickory kernel: [224624.139854] Code: c0 74 e6 4d 85 c9 c6 07 01 74 30 41 c7 41 08 01 00 00 00 e9 52 ff ff ff 83 fa 01 0f 84 b0 fe ff ff 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 66 89 07 5d c3 f3 90 4c 8b 09 4d 85
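For anyone comparing their own logs against the traces above, here is a minimal Python sketch that pulls the "soft lockup" header lines out of a syslog and counts them per process name. The regular expression is modeled only on the two header lines quoted in this report, and the function names are hypothetical; other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Matches the header line the watchdog logs for each soft lockup, e.g.
# "NMI watchdog: BUG: soft lockup - CPU#6 stuck for 23s! [compiz:4946]".
# The pattern is an assumption based on the lines quoted in this report.
SOFT_LOCKUP = re.compile(
    r"NMI watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft-lockup header found in the given log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def count_by_command(events):
    """Count soft-lockup reports per process name (the 'comm' field)."""
    return Counter(event["comm"] for event in events)
```

To run it against a real log, pass an open file object to `find_soft_lockups`, for example `find_soft_lockups(open("/var/log/syslog", errors="replace"))`. The path is the usual Ubuntu default and may differ on other systems.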

I run BOINC when I'm not at my computer, which will often use a lot of CPU/GPU/RAM, but it's limited in what it can use, and my computer is very powerful.

I've also seen instances where individual applications lock up suddenly, and I am unable to kill them because they are waiting on an uninterruptible communication with the CPU. The only recourse I have found in those cases has also been to reboot.
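If the unkillable applications described above are in uninterruptible sleep (an assumption; the report does not show their process state), they would appear with state `D` in `/proc/<pid>/stat`. The following Python sketch lists such processes. The function names are hypothetical, and it relies only on the documented `pid (comm) state ...` layout of that file.

```python
import os


def parse_stat(stat_text):
    """Split the contents of /proc/<pid>/stat into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as its boundaries.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left])
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as stat_file:
                pid, comm, state = parse_stat(stat_file.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or its stat file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_tasks()` on a Linux system while an application is hung would show whether it is stuck in state `D`, in which case signals, including SIGKILL, are not acted on until the wait ends.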

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu:
status: New → Confirmed
pdecat (pdecat) wrote :

I had similar repeated error messages at the end of my logs, but the very first error was:

> kernel: kernel BUG at /build/linux-lz1RHE/linux-4.10.0/include/linux/swapops.h:129!

cf. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1674838

Francois Thirioux (fthx) wrote :

I hit the same bug in Artful.

no longer affects: duplicity
Gordon P. Hemsley (gphemsley) wrote :

This bug stopped occurring when I installed the kernel update mentioned in bug 1674838.
