Comment 24 for bug 1530405

Keith Burns (alagalah) wrote : Re: [Bug 1530405] Re: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]

I found that the root cause on my rig was a low-end GPU and the nouveau
driver. The vendor upgraded my GPU and installed the Nvidia driver, and it's
been perfect ever since.
NOTE: nothing in the logs indicated a video issue.
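
Since the logs gave no hint that video was involved, it may help to check which GPU kernel module is actually loaded. Below is a minimal sketch in Python, not something I ran at the time. It assumes the usual /proc/modules layout (module name in the first column); the GPU_MODULES list and the function name are just illustrative, and the list only matches exact names (so helper modules such as nvidia_drm are not reported).

```python
# Illustrative list of GPU driver module names (an assumption, not exhaustive).
GPU_MODULES = ("nouveau", "nvidia", "radeon", "amdgpu", "i915")


def loaded_gpu_modules(proc_modules_text):
    """Return GPU driver module names found in /proc/modules-style text."""
    found = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field is the module name.
        if fields and fields[0] in GPU_MODULES:
            found.append(fields[0])
    return found


if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            print(loaded_gpu_modules(f.read()))
    except OSError:
        print("could not read /proc/modules")
```

If "nouveau" shows up on a machine with these lockups, switching drivers (as happened on my rig) might be worth trying, though that is only one data point.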

On Mon, Aug 22, 2016, 3:45 AM Jimmy Pan <dspjmr@163.com> wrote:

> This problem is driving me crazy. Looks like this is related to the
> nvidia driver. You can shut down the system with 3rd party nvidia driver
> installed in recovery x mode. However, you cannot enter normal mode at
> all.
>
> https://bugs.launchpad.net/bugs/1530405
>
> Title:
> NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]
>
> Status in linux package in Ubuntu:
> Triaged
>
> Bug description:
> I'm using Ubuntu Xenial 16.04 and my computer (ASUS M32BF) will
> randomly freeze up, sometimes before the login screen, sometimes while
> I'm in the middle of using a program. This sometimes happens on the
> Wily 15.10 live cd as well, and on both kernel 4.3.0-2, and kernel
> 4.2.0-22.
>
> Important part of log:
>
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112129] NMI watchdog: BUG:
> soft lockup - CPU#0 stuck for 22s! [kerneloops:814]
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112134] Modules linked in:
> rfcomm bnep nls_iso8859_1 kvm_amd kvm eeepc_wmi asus_wmi crct10dif_pclmul
> sparse_keymap crc32_pclmul aesni_intel aes_x86_64 arc4 lrw gf128mul
> rtl8821ae glue_helper snd_hda_codec_realtek ablk_helper
> snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel btcoexist
> snd_hda_codec rtl_pci snd_hda_core joydev input_leds snd_hwdep rtlwifi
> snd_pcm fam15h_power cryptd snd_seq_midi serio_raw snd_seq_midi_event
> snd_rawmidi mac80211 snd_seq snd_seq_device snd_timer cfg80211 btusb btrtl
> btbcm btintel bluetooth snd soundcore edac_mce_amd k10temp edac_core
> i2c_piix4 shpchp mac_hid parport_pc ppdev lp parport autofs4
> hid_logitech_hidpp uas usb_storage hid_logitech_dj usbhid hid amdkfd
> amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect
> sysimgblt fb_sys_fops r8169 psmouse mii drm ahci libahci wmi fjes video
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112183] CPU: 0 PID: 814
> Comm: kerneloops Not tainted 4.3.0-2-generic #11-Ubuntu
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112184] Hardware name:
> ASUSTeK COMPUTER INC. K30BF_M32BF_A_F_K31BF_6/K30BF_M32BF_A_F_K31BF_6, BIOS
> 0501 07/09/2015
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112186] task:
> ffff88031146d400 ti: ffff88030e838000 task.ti: ffff88030e838000
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112188] RIP:
> 0010:[<ffffffff810819d6>] [<ffffffff810819d6>] __do_softirq+0x76/0x250
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112194] RSP:
> 0018:ffff88031fc03f30 EFLAGS: 00000202
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112196] RAX:
> ffff88030e83c000 RBX: 0000000000000000 RCX: 0000000040400040
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112197] RDX:
> 0000000000000000 RSI: 000000000000613e RDI: 0000000000000380
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112198] RBP:
> ffff88031fc03f80 R08: 00000029f8fa1411 R09: ffff88031fc169f0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112199] R10:
> 0000000000000020 R11: 0000000000000004 R12: ffff88031fc169c0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112201] R13:
> ffff88030df8e200 R14: 0000000000000000 R15: 0000000000000001
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112202] FS:
> 00007f6c266ac880(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112203] CS: 0010 DS: 0000
> ES: 0000 CR0: 0000000080050033
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112205] CR2:
> 00007ffef000bff8 CR3: 000000030f368000 CR4: 00000000000406f0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112206] Stack:
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112207] 404000401fc03f78
> ffff88030e83c000 00000000ffff0121 ffff88030000000a
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112209] 000000021fc0d640
> 0000000000000000 ffff88031fc169c0 ffff88030df8e200
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112211] 0000000000000000
> 0000000000000001 ffff88031fc03f90 ffffffff81081d23
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112213] Call Trace:
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112215] <IRQ>
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112219]
> [<ffffffff81081d23>] irq_exit+0xa3/0xb0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112222]
> [<ffffffff817fda02>] smp_apic_timer_interrupt+0x42/0x50
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112225]
> [<ffffffff817fb862>] apic_timer_interrupt+0x82/0x90
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112226] <EOI>
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112230]
> [<ffffffff810a5457>] ? finish_task_switch+0x67/0x1c0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112232]
> [<ffffffff817f645c>] __schedule+0x36c/0x980
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112234]
> [<ffffffff817f6aa3>] schedule+0x33/0x80
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112236]
> [<ffffffff817f9e4f>] do_nanosleep+0x6f/0xf0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112239]
> [<ffffffff810ea59c>] hrtimer_nanosleep+0xdc/0x1f0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112241]
> [<ffffffff810e9500>] ? __hrtimer_init+0x90/0x90
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112243]
> [<ffffffff817f9e3a>] ? do_nanosleep+0x5a/0xf0
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112245]
> [<ffffffff810ea72a>] SyS_nanosleep+0x7a/0x90
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112247]
> [<ffffffff817faaf2>] entry_SYSCALL_64_fastpath+0x16/0x71
> Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112248] Code: 45 d4 89 4d b4
> 65 48 8b 04 25 c4 3e 01 00 c7 45 c8 0a 00 00 00 48 89 45 b8 65 c7 05 31 24
> f9 7e 00 00 00 00 fb 66 0f 1f 44 00 00 <b8> ff ff ff ff 49 c7 c4 c0 b0 e0
> 81 0f bc 45 d4 83 c0 01 89 45
>
> This soft lockup happens either in kerneloops or swapper/0.
>
> This might have something to do with networking, because the soft
> lockups appear to happen immediately before or after some
> network-related activity. nm-applet says NetworkManager is not running,
> but "service network-manager status" says it is. "service
> network-manager stop" does not work, and I need to "kill -9" the pids
> for the processes. When I start the service again after killing it,
> nm-applet's "Edit Connection" works for a few moments, then it won't
> delete any connections, and once closed, it won't re-open. In the end,
> even "kill -9" stops working (the processes get reparented to init, but
> do not die). Usually, however, I never even see the login screen.
>
> I've been able to boot into the Wily livecd by using Windows 10 ->
> Shift-Restart -> UEFI Settings -> then booting my USB with the livecd
> (booting it without going through Windows often ends in a soft lockup).
>
>
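
For anyone comparing their own logs against the quoted report, here is a rough Python sketch that pulls the CPU number, stall duration, process name, and pid out of "soft lockup" lines. It is only an illustration: the regular expression is derived from the single message format shown in the quoted log, the function name is made up, and the code strips "> " quote prefixes and rejoins wrapped lines because mail clients fold the log the way it appears above.

```python
import re

# Pattern based on the message format in the quoted log (an assumption that
# other kernels print the same layout).
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return (cpu, seconds, comm, pid) tuples for each soft lockup message."""
    # Drop email quote prefixes and undo hard line wrapping.
    lines = [ln.lstrip("> ").strip() for ln in log_text.splitlines()]
    joined = " ".join(ln for ln in lines if ln)
    return [
        (int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in LOCKUP_RE.finditer(joined)
    ]
```

Counting how often each process name appears in the result would show whether the lockups cluster in kerneloops or swapper/0, as the report describes.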