NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]

Bug #1530405 reported by CrystalMageX
This bug affects 53 people
Affects: linux (Ubuntu)
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

I'm using Ubuntu Xenial 16.04 and my computer (ASUS M32BF) will randomly freeze up, sometimes before the login screen, sometimes while I'm in the middle of using a program. This sometimes happens on the Wily 15.10 live cd as well, and on both kernel 4.3.0-2, and kernel 4.2.0-22.

Important part of log:

Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112129] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112134] Modules linked in: rfcomm bnep nls_iso8859_1 kvm_amd kvm eeepc_wmi asus_wmi crct10dif_pclmul sparse_keymap crc32_pclmul aesni_intel aes_x86_64 arc4 lrw gf128mul rtl8821ae glue_helper snd_hda_codec_realtek ablk_helper snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel btcoexist snd_hda_codec rtl_pci snd_hda_core joydev input_leds snd_hwdep rtlwifi snd_pcm fam15h_power cryptd snd_seq_midi serio_raw snd_seq_midi_event snd_rawmidi mac80211 snd_seq snd_seq_device snd_timer cfg80211 btusb btrtl btbcm btintel bluetooth snd soundcore edac_mce_amd k10temp edac_core i2c_piix4 shpchp mac_hid parport_pc ppdev lp parport autofs4 hid_logitech_hidpp uas usb_storage hid_logitech_dj usbhid hid amdkfd amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops r8169 psmouse mii drm ahci libahci wmi fjes video
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112183] CPU: 0 PID: 814 Comm: kerneloops Not tainted 4.3.0-2-generic #11-Ubuntu
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112184] Hardware name: ASUSTeK COMPUTER INC. K30BF_M32BF_A_F_K31BF_6/K30BF_M32BF_A_F_K31BF_6, BIOS 0501 07/09/2015
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112186] task: ffff88031146d400 ti: ffff88030e838000 task.ti: ffff88030e838000
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112188] RIP: 0010:[<ffffffff810819d6>] [<ffffffff810819d6>] __do_softirq+0x76/0x250
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112194] RSP: 0018:ffff88031fc03f30 EFLAGS: 00000202
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112196] RAX: ffff88030e83c000 RBX: 0000000000000000 RCX: 0000000040400040
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112197] RDX: 0000000000000000 RSI: 000000000000613e RDI: 0000000000000380
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112198] RBP: ffff88031fc03f80 R08: 00000029f8fa1411 R09: ffff88031fc169f0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112199] R10: 0000000000000020 R11: 0000000000000004 R12: ffff88031fc169c0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112201] R13: ffff88030df8e200 R14: 0000000000000000 R15: 0000000000000001
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112202] FS: 00007f6c266ac880(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112205] CR2: 00007ffef000bff8 CR3: 000000030f368000 CR4: 00000000000406f0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112206] Stack:
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112207] 404000401fc03f78 ffff88030e83c000 00000000ffff0121 ffff88030000000a
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112209] 000000021fc0d640 0000000000000000 ffff88031fc169c0 ffff88030df8e200
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112211] 0000000000000000 0000000000000001 ffff88031fc03f90 ffffffff81081d23
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112213] Call Trace:
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112215] <IRQ>
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112219] [<ffffffff81081d23>] irq_exit+0xa3/0xb0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112222] [<ffffffff817fda02>] smp_apic_timer_interrupt+0x42/0x50
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112225] [<ffffffff817fb862>] apic_timer_interrupt+0x82/0x90
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112226] <EOI>
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112230] [<ffffffff810a5457>] ? finish_task_switch+0x67/0x1c0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112232] [<ffffffff817f645c>] __schedule+0x36c/0x980
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112234] [<ffffffff817f6aa3>] schedule+0x33/0x80
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112236] [<ffffffff817f9e4f>] do_nanosleep+0x6f/0xf0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112239] [<ffffffff810ea59c>] hrtimer_nanosleep+0xdc/0x1f0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112241] [<ffffffff810e9500>] ? __hrtimer_init+0x90/0x90
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112243] [<ffffffff817f9e3a>] ? do_nanosleep+0x5a/0xf0
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112245] [<ffffffff810ea72a>] SyS_nanosleep+0x7a/0x90
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112247] [<ffffffff817faaf2>] entry_SYSCALL_64_fastpath+0x16/0x71
Dec 31 19:13:12 COMPUTERNAME kernel: [ 64.112248] Code: 45 d4 89 4d b4 65 48 8b 04 25 c4 3e 01 00 c7 45 c8 0a 00 00 00 48 89 45 b8 65 c7 05 31 24 f9 7e 00 00 00 00 fb 66 0f 1f 44 00 00 <b8> ff ff ff ff 49 c7 c4 c0 b0 e0 81 0f bc 45 d4 83 c0 01 89 45

This soft lockup happens either in kerneloops or swapper/0.
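
To see which processes the watchdog names, the "soft lockup" lines can be pulled out of a kernel log and tallied per process. This is only a rough sketch: the function name lockup_summary is made up for illustration, and it assumes the log lines look like the excerpt above.

```shell
# Tally soft-lockup reports per process name.
# Reads kernel log text on stdin, prints "<count> <process>" lines,
# most frequent first. (lockup_summary is a hypothetical helper name.)
lockup_summary() {
    # Keep only the process name from "... stuck for 22s! [name:pid]".
    sed -n 's/.*soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[\(.*\):[0-9][0-9]*].*/\1/p' |
        # Count how often each name occurs.
        awk '{ count[$0]++ } END { for (name in count) print count[name], name }' |
        sort -rn
}
```

It could be fed from a saved log or from the running kernel, for example "lockup_summary < /var/log/kern.log" or "dmesg | lockup_summary".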

This might have something to do with networking, because the soft lockups appear to happen immediately before or after some network-related activity. nm-applet says NetworkManager is not running, but "service network-manager status" says it is. "service network-manager stop" does not work, and I need to "kill -9" the PIDs of the processes. When I start the service again after killing it, nm-applet's "Edit Connection" works for a few moments, then it won't delete any connections, and once closed, it won't re-open. In the end, even "kill -9" stops working (the processes get reparented to init, but do not die). Usually, however, I never even see the login screen.

I've been able to boot into the Wily live CD by using Windows 10 -> Shift-Restart -> UEFI Settings -> then booting my USB stick with the live CD (booting it without going through Windows often ends in soft lockups).

CrystalMageX (crystalmagex) wrote :

Output of "lspci -vnvn":

CrystalMageX (crystalmagex) wrote :

Output of "uname -a" :

CrystalMageX (crystalmagex) wrote :

Output of "cat /proc/version_signature" :

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1530405

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
CrystalMageX (crystalmagex) wrote :

Due to the nature of the issue, I am unable to run the command "apport-collect", because I cannot get into my desktop or a tty. I have somehow managed to get into recovery mode without it becoming unusable, but unfortunately "apport-collect" requires internet access to run, and recovery mode doesn't have networking. Would "apport-bug linux --save bug.apport" work instead?

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: amd64 wily
tags: added: xenial
Changed in linux (Ubuntu):
importance: Undecided → Critical
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc7-wily

Changed in linux (Ubuntu):
importance: Critical → High
status: Confirmed → Incomplete
CrystalMageX (crystalmagex) wrote :

This issue has always been happening, even on the Wily 15.10 live CD. I installed Ubuntu from the 15.10 disk and then upgraded to Xenial 16.04, and I have not found any kernel version that works: both kernel 4.3.0-2 and kernel 4.2.0-22 hang after a while.

CrystalMageX (crystalmagex) wrote :

With mainline kernel build "Linux version 4.4.0-040400rc7-generic (kernel@tangerine) (gcc version 5.2.1 20151010 (Ubuntu 5.2.1-22ubuntu2) ) #201512272230 SMP Mon Dec 28 03:32:16 UTC 2015" (http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc7-wily), the issue seems to be resolved. No hangs, crashes, glitches, or anything out of the ordinary has occurred on this kernel for about an hour now.

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

CrystalMageX, the next step is to fully reverse commit bisect from kernel 4.3 to 4.4-rc7 in order to identify the last bad commit, followed immediately by the first good one. Once this commit has been identified, then it may be reviewed as a candidate for backporting into your release. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection#How_do_I_reverse_bisect_the_upstream_kernel.3F ?

Please note, finding adjacent kernel versions is not fully commit bisecting.

After the fix commit (not kernel version) has been identified, then please mark this report Status Confirmed.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs
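
A reverse bisect like the one requested above can be set up with git's custom bisect terms, so that the older commit is marked as broken and the newer one as fixed. This is a minimal sketch, not necessarily the procedure described on the wiki page: it assumes a local clone of the upstream kernel tree, and the function name reverse_bisect_start is hypothetical.

```shell
# Start a "reverse" bisect: the OLD commit is known broken, the NEW commit
# is known fixed, and git searches for the first fixed commit.
# $1 = path to a kernel git tree, $2 = older (broken) ref, $3 = newer (fixed) ref
# (reverse_bisect_start is a hypothetical helper name.)
reverse_bisect_start() {
    repo=$1
    broken_ref=$2
    fixed_ref=$3
    # Rename git's default good/bad terms so they match the situation.
    git -C "$repo" bisect start --term-old broken --term-new fixed &&
        git -C "$repo" bisect broken "$broken_ref" &&
        git -C "$repo" bisect fixed "$fixed_ref"
}
```

For the versions named in this report, the call would look something like "reverse_bisect_start ~/linux v4.3 v4.4-rc7" (the ~/linux path is an assumption). After each build and test, "git bisect broken" or "git bisect fixed" is run inside the tree until git reports the first fixed commit.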

tags: added: kernel-fixed-upstream-4.4-rc7
tags: added: needs-reverse-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Triaged
penalvch (penalvch) wrote :

Joseph Salisbury, given this is now marked Triaged, could you please advise to the fix commit as requested in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1530405/comments/10 ?

Joseph Salisbury (jsalisbury) wrote :

The bug has been marked as 'Triaged' because there is a fix available upstream. Even though the fix has not been identified yet, marking the bug incomplete will cause it to expire.

tags: added: kernel-da-key
Marc Branchaud (marcnarc) wrote :

I'm still seeing this problem in stock amd64 16.04 (kernel 4.4.0-21):

Apr 28 13:46:37 rincewind kernel: [ 108.345612] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [QQmlThread:1159]
Apr 28 13:46:37 rincewind kernel: [ 108.347049] Modules linked in: pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) ftdi_sio pl2303 usbserial snd_usb_audio snd_usbmidi_lib snd_hda_codec_hdmi nvidia_uvm(POE) coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep kvm snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd gpio_ich dcdbas dell_smm_hwmon input_leds shpchp soundcore serio_raw mei_me mei irqbypass i7core_edac 8250_fintek edac_core lpc_ich mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid broadcom bcm_phy_lib nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops tg3 drm ahci ptp libahci pps_core fjes
Apr 28 13:46:37 rincewind kernel: [ 108.347082] CPU: 0 PID: 1159 Comm: QQmlThread Tainted: P OE 4.4.0-21-generic #37-Ubuntu
Apr 28 13:46:37 rincewind kernel: [ 108.347084] Hardware name: Dell Inc. Vostro 430/054KM3, BIOS 2.2.0 07/06/2010
Apr 28 13:46:37 rincewind kernel: [ 108.347085] task: ffff8804264b8000 ti: ffff880429d98000 task.ti: ffff880429d98000
Apr 28 13:46:37 rincewind kernel: [ 108.347086] RIP: 0010:[<ffffffff81823f35>] [<ffffffff81823f35>] _raw_spin_unlock_irqrestore+0x15/0x20
Apr 28 13:46:37 rincewind kernel: [ 108.347092] RSP: 0018:ffff880429d9b910 EFLAGS: 00000282
Apr 28 13:46:37 rincewind kernel: [ 108.347093] RAX: 0000000000000009 RBX: 0000000000000000 RCX: 0000000000000051
Apr 28 13:46:37 rincewind kernel: [ 108.347094] RDX: ffffffffc0621d16 RSI: 0000000000000282 RDI: 0000000000000282
Apr 28 13:46:37 rincewind kernel: [ 108.347095] RBP: ffff880429d9b910 R08: 0000000000000000 R09: 0000000000000020
Apr 28 13:46:37 rincewind kernel: [ 108.347096] R10: ffff880427f9df18 R11: ffffffffc0651de0 R12: ffffffffc0a5a638
Apr 28 13:46:37 rincewind kernel: [ 108.347097] R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000001
Apr 28 13:46:37 rincewind kernel: [ 108.347099] FS: 00007f10badef700(0000) GS:ffff88043fc00000(0000) knlGS:0000000000000000
Apr 28 13:46:37 rincewind kernel: [ 108.347100] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 28 13:46:37 rincewind kernel: [ 108.347101] CR2: 0000000001154a70 CR3: 0000000001e0a000 CR4: 00000000000006f0
Apr 28 13:46:37 rincewind kernel: [ 108.347102] Stack:
Apr 28 13:46:37 rincewind kernel: [ 108.347103] ffff880429d9b920 ffffffffc015ee4a ffff880427f9df40 ffffffffc063b959
Apr 28 13:46:37 rincewind kernel: [ 108.347104] ffff8800b9034008 ffff880429064008 0000000000000000 ffff88042ac1c2a0
Apr 28 13:46:37 rincewind kernel: [ 108.347106] 00000000c1d00060 ffffffffc0621d16 ffffffffc09b8b08 00000000c1d00060
Apr 28 13:46:37 rincewind kernel: [ 108.347108] Call Trace:
Apr 28 13:46:37 rincewind kernel: [ 108.347209] [<ffffffffc015ee4a>] os_release_spinlock+0x1a/0x20 [nvidia]
Apr 28 13:46:37 rincewind kernel: [ 108.347295] [<ffffffffc063b959>] _nv016600rm+0x5f9/0x6e0 [nvidia]
Apr 28...

Marc Branchaud (marcnarc) wrote :

The NMI messages went away after I upgraded the kernel to 4.5.2-040502 and the nVidia driver to 364.19.

However, I was still experiencing intermittent lockups.

Those went away after I disabled KDE's File Indexing (turning off baloo-file-indexer). I think that maybe my original NMI problems could also have stemmed from that.

Dang, I've had to disable file indexing after upgrading in the past, but I completely forgot about it this time around. Spent a week trying to track down this problem. Yeesh!

I've taken my complaints to bug #1548051. Sorry for the noise...

odror (ozdror) wrote :

I am having the same issue.
Nvidia gtx 980 and i7-5820k

It was not as much a problem when downgrading the video driver to 355.11

liuxu (n-i-9) wrote :

Me too.
Nvidia gtx 960m and i7 6700hq.
New install, nothing changed.

Keith Burns (alagalah) wrote :

I too am experiencing this issue on a brand new system (received Fri Jun3) with latest BIOS for X99 deluxe and installed Ubuntu 16.04:

dmesg: https://gist.github.com/a15923ba58575c5e62501c17a43e05b5

dmidecode: https://gist.github.com/29f7ff9030f7cb78791431868c609260

lsb_release -a: https://gist.github.com/99ad04476602f00e130223744b535573

uname -a: Linux thing1 4.5.0-040500-generic #201603140130 SMP Mon Mar 14 05:32:22 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Fabio C. Barrionuevo (luzfcb) wrote :

This problem was apparently fixed for me after I installed 4.5.0-040500-generic ( from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/ ), but after the 2016-05-28 Ubuntu package updates, I am unable to start Ubuntu 16.04 with any of the installed kernels:

4.5.0-040500-generic ( from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/ )
4.4.0-22-generic
4.4.0-21-generic

I can confirm that this problem also occurs with x86_64

So far, I have found two things.

I can only start Ubuntu 16.04 if I either:

1 - Disable Multi Core support in the Dell BIOS,

or

2 - add "acpi=off" to the GRUB kernel command line at boot.

I think it's either a processor bug or a bug in how the ACPI features are identified/accessed.
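
To make a parameter such as "acpi=off" persistent instead of typing it at every boot, it can be appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. The following is a rough sketch rather than a tested procedure: the function name add_kernel_param is hypothetical, it assumes the value is a single double-quoted string, and it only handles simple parameters without quotes, spaces, or slashes.

```shell
# Append one kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT.
# $1 = parameter (e.g. acpi=off), $2 = config file (defaults to /etc/default/grub)
# (add_kernel_param is a hypothetical helper name.)
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}
    # Do nothing if the parameter already appears on that line
    # (a simple substring check, good enough for a sketch).
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Keep a backup, then insert the parameter before the closing quote.
    cp "$file" "$file.bak" &&
        sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file.bak" > "$file"
}
```

On a real system this would be run as root, followed by "update-grub" so the change ends up in the generated boot configuration.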

I really want to find a definitive solution to this problem.

If you need more information, or need me to run a program or script, or compile a kernel to test (the last time I did that was on Ubuntu 8.04), please tell me what needs to be run and how. I'll be happy to help.

I've attached the kernel log that contains the problem traceback.

This is lscpu from the system started after disabling Multi Core support in the BIOS:

fabio@luuzfcb:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 2
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 61
Model name: Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
Stepping: 4
CPU MHz: 799.968
CPU max MHz: 3000.0000
CPU min MHz: 500.0000
BogoMIPS: 4788.86
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap xsaveopt dtherm ida arat pln pts

Relevant traceback from kern.log:

May 28 13:56:30 luuzfcb kernel: [ 9535.840881] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [kworker/2:3:6541]
May 28 13:56:30 luuzfcb kernel: [ 9535.840885] Modules linked in: drbg ansi_cprng ctr ccm rfcomm nvram msr xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables x_tables nf_nat nf_conntrack br_netfilter bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c bbswitch(OE) bnep nls_iso8859_1 nvidia_uvm(POE) nvidia_modeset(POE) nvidia(POE) uvcvideo btusb btrtl btbcm btintel videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core bluetooth videodev media arc4 iwlmvm snd_soc_rt5640 dell_wmi sparse_keymap mxm_wmi intel_rapl x86_pkg_temp_thermal intel_powerclamp dell_led snd_hda_codec_hdmi snd_soc_rl6231 snd_soc_ssm4567 coretemp kvm_intel snd_soc_core mac80211 snd_hda_codec_realtek ...

Keith Burns (alagalah) wrote :

I have not had a repeat of this issue after disabling USB 3 (it disables 3.1, not 3) in my BIOS. A bug has been raised with ASUS regarding v3101 of their BIOS for the X99 Deluxe. Hope this helps someone.

Fabio C. Barrionuevo (luzfcb) wrote :

Some new discoveries on my Dell Latitude 3450 (Core i7-5500U).

I can start my system if I:

* enable Multi Core support in the BIOS
* and disable Hyperthreading in the BIOS

Garret (garretbowser) wrote :

I started experiencing this very issue about 3 months ago. My server, a Dell XPS 8700, would lock up repeatedly over the course of a day but stay "active" (meaning my email server wouldn't crash), sometimes for a week, even though I could not log in. This server has the latest standard updates for 14.04 LTS, so whatever kernel version that is. Following troubleshooting recommendations from the internet, I replaced the PSU, to no avail. What appears to have solved the issue is removing the video card, thus rendering the nouveau driver inoperable. In the past four days I have not seen a single NMI watchdog soft lockup error in my syslog. Granted, I now have to run my video through the built-in video card, but I am fine with that.

I know there is a lot of logic in an OS kernel and I am just one case but perhaps this will provide some insight when troubleshooting this issue.

Frederick Astacio (fxastacio) wrote :

the error occurred when shutting down the laptop

Jimmy Pan (dspjmr) wrote :

This problem is driving me crazy. Looks like this is related to the nvidia driver. You can shut down the system with 3rd party nvidia driver installed in recovery x mode. However, you cannot enter normal mode at all.

Keith Burns (alagalah) wrote : Re: [Bug 1530405] Re: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kerneloops:814]

I found that the root cause on my rig was a low-end GPU and the nouveau driver. The vendor upgraded my GPU and installed the Nvidia driver, and it's been perfect ever since.
NOTE: nothing in the logs indicates a video issue.

Jimmy Pan (dspjmr) wrote :

I don't think the GPU can be called low end; it's a pretty new Nvidia GTX 960M.

I am not saying the cause is in the log. I am just offering a clue about the bug; you cannot diagnose every problem from logs alone.

By the way, if I set nomodeset without an Nvidia driver installed, the system can be shut down every time, though it cannot suspend.

Jimmy Pan (dspjmr) wrote :

Do we have a plan when this will be fixed?

asimsalam (asimsalam) wrote :

Like Frederick Astacio, I see this every time I shut down the laptop. I am running elementary OS, which is an Ubuntu derivative.
Is there an ETA for the fix?

Александр (venom96669) wrote :

Same here.
The bug showed up on Ubuntu Server 16.04.
I tried everything I could.
In the end I upgraded to 16.10, and the bug is still there.
This is quite critical.
The server does not run normally for more than 5 hours.

uname -r
4.8.0-22-generic

wvengen (wvengen) wrote :

I'm seeing this issue after desktop upgrade to yakkety with kernel 4.8.0-22-generic but not with kernel 4.4.16-040416-generic.

NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [ps:3908]
Hardware name: Apple Inc. MacBookPro11,1/Mac-189A3D4F975D5FFC, BIOS MBP111.88Z.0138.B07.1402121134 02/12/2014
CPU: 2 PID: 3908 Comm: ps Tainted: P D OE 4.8.0-22-generic #24-Ubuntu
RIP: 0010:[<ffffffffa92cf24b>] [<ffffffffa92cf24b>] native_queued_spin_lock_slowpath+0x17b/0x1a0
Call Trace:
 [<ffffffffa9a9ef10>] _raw_spin_lock+0x20/0x30
 [<ffffffffa94a9236>] pid_revalidate+0x56/0x110
 [<ffffffffa943eb5b>] lookup_fast+0x2eb/0x310
 [<ffffffffa9441e81>] path_openat+0x181/0x1450
 [<ffffffffa94432b4>] ? putname+0x54/0x60
 [<ffffffffa9444491>] do_filp_open+0x91/0x100
 [<ffffffffa9452ea6>] ? __alloc_fd+0x46/0x180
 [<ffffffffa9431505>] do_sys_open+0x135/0x280
 [<ffffffffa943166e>] SyS_open+0x1e/0x20
 [<ffffffffa9a9f076>] entry_SYSCALL_64_fastpath+0x1e/0xa8

johnmne (phi-reporter) wrote :

This happened to me too, about 2 months ago:

I installed Ubuntu 16.04.
Then I installed the graphics driver (nvidia-367) about a month ago.

Everything was OK until 21 September 2016.
I think the linux image update that I downloaded on 21 Sep (according to the update history file) totally ruined everything.

The error message in "dmesg" was:
"NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [nvidia-persiste ..."

It happened on startup.
But a significant delay was present *also* when shutting down.

Andreas John (derjohn) wrote :

This just happened to me when upgrading from 4.7.8 to 4.8.4. The kernel is self-compiled, but with the kernel config taken from the Ubuntu Mainline PPA.

I run a MacBook 11,3, so my setup is very similar to wvengen's.

Rob Knop (rknop-l) wrote :

I am having this problem too. I'm not sure the root of the problem. It's intermittent. Sometimes machines boot, sometimes they do not. Sometimes they boot, but then fail to mount an nfs filesystem. (In those cases, top shows mount.nfs using 100% of the CPU. I can't kill the process, and I can't reboot... the "reboot" command tells me that it times out. Hurray, systemd... sigh.)

The machine in question has nvidia-352 and Linux 4.4.0-45 installed.

I couldn't figure out if the problem was kernel, NFS, systemd, or nvidia. Finding this bug suggests to me that the problem is kernel. However, the errors I'm getting suggest that it's "mount.nfs" that is the process that's stuck.

This only started happening a few days ago when I did an apt-get upgrade. I probably hadn't upgraded for a few weeks before that.

Any hope of this bug getting fixed any time soon? It sounds like there's a fix out there upstream; is there, really?

Thomas (thuffner) wrote :

I am seeing this as well, running Lubuntu 16.04 Xenial and 4.4.0-45-generic

Attached is my dmesg showing similar error:

[Tue Nov 1 21:52:16 2016] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]

Thomas (thuffner) wrote :

I see the same on Yakkety with 4.8.0-26-generic

Alexia Death (alexiade) wrote :

My i7 was perfectly stable with the 4.4 kernel, but after the Yakkety update, 4.8 failed to boot properly, with soft lockups. Downgrading to the 4.4 kernel fixed the issue for now.

Thomas (thuffner) wrote :

I was running the latest kernel and the issue went away.

But it returned after updating to 4.8.0-28-generic

Federico Alves (h-sales) wrote :

I am affected on Xenial 16.04.1. It all started with an upgrade. It has made my system useless, and affected my entire business. I have several VMware virtual machines, all identical, so it is not any driver. It also does not depend on which server I run them on. This is the kernel, which after an update has no quality control or sufficient testing.

Roberto Longobardi (seccanj) wrote :

I am affected with 16.04, kernel 4.4.0-51-generic.

The only way I could make my system work has been to install initial Ubuntu 16.04 and not upgrade any single package since installation.

I used to have hangs even before login, so I enabled auto-login without password and things got better.

I still have lockups during shutdown, so I need to hard-shutdown the PC every time, but at least the working sessions go smooth.

My system is an ASUS K550V, Intel Core i7-6700HQ, NVIDIA Geforce GTX 950M.

gerald.yang (gerald-yang-tw) wrote :

I am seeing this issue too, but in my case, it seems there is something wrong in bbswitch-dkms, if I remove it manually by 'sudo rm /lib/modules/4.4.0-xx-generic/updates/dkms/bbswitch.ko' I can boot into system with both intel and nvidia mode (prime-select intel or prime-select nvidia).

Since nvidia-prime depends on bbswitch-dkms, I can not remove bbswitch-dkms by 'apt remove', so the temporary solution for me is to remove the kernel module.

turoyo dee (lpturoyo) wrote :

@seccanj , I have experienced this same issue and was able to solve with the same steps.

 - install 16.04.1

 - get intel-microcode from "unity" -> "additional drivers"

 - blacklist nouveau

 - $ update-initramfs -u

this resolves boot and shutdown lockup.
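
The "blacklist nouveau" step above is commonly done by dropping a small file into /etc/modprobe.d before regenerating the initramfs. A minimal sketch, assuming the standard modprobe.d layout; the function name blacklist_module and the file name blacklist-<module>.conf are made up for illustration.

```shell
# Write a modprobe blacklist entry for one kernel module.
# $1 = module name, $2 = config directory (defaults to /etc/modprobe.d)
# (blacklist_module and the blacklist-<module>.conf name are hypothetical.)
blacklist_module() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    # One "blacklist <name>" line stops the module from being auto-loaded.
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}
```

Run as root, "blacklist_module nouveau" followed by "update-initramfs -u" and a reboot would correspond to the last two steps of the list.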

however, I've accidentally updated the kernel twice after resolving it with these steps.

and in both cases, even after uninstalling the new kernel [ even latest 4.4.0-53]
the issue returns randomly on boot and even once on shutdown

[I'm using Dell 7559 6700HQ]
* also using acpi_osi="!Windows 2015" does not resolve this

Jimmy Pan (dspjmr) wrote :

@lpturoyo,

Do you need to install the Nvidia drivers, and does this mean you cannot update any packages? Also, after installing the new kernel, does it have only the shutdown problem, or the boot problem as well?

penalvch (penalvch) wrote :

turoyo dee, it will help immensely if you filed a new report with the Ubuntu repository kernel (not mainline/upstream) via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.

For more on why this is helpful, please see https://wiki.ubuntu.com/ReportingBugs.

turoyo dee (lpturoyo) wrote :

@dspjmr,

in my setup, I do not bother to install/run nvidia, and boot/shutdown works fine.
during the times I have accidentally updated the kernel ( due to some other package installation ), I only experience the boot issue ( and it is very random, about 1 of 5 or maybe 1 of 8 boots ).

The only time I ever experience shutdown problem is on vanilla install ( step 1 in my setup )
but once I get the intel-microcode and blacklist nouveau, everything works fine.

@penalvch,

ok, I'll check, I just landed on these bug reports via google.

Jimmy Pan (dspjmr) wrote :

@lpturoyo

Thank you very much, I can shut down normally now. My system had already installed the latest packages before I blacklisted nouveau, so it should be safe to update.

Nonetheless, the shutdown problem should be related to nouveau.

Jimmy Pan (dspjmr) wrote :

One problem with the method in #44: the fan very easily ends up spinning all the time if nouveau is blacklisted, so I have to enable it again. I don't think this is a sensible workaround.

Roger Winans (solvaholic) wrote :

I downloaded lubuntu-16.10-desktop-amd64.iso and created a new VirtualBox VM on my MacBook to install it.

Each time I tried to boot the VM I got that 'NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!'

After reading Keith Burns (alagalah) comment #19, I remembered enabling the USB 3.0 port in the VM. Once I changed the port back to USB 2.0, the VM booted without issue.

hth, and thanks to @alagalah

(VirtualBox Version 5.0.32 r112930; macOS Sierra Version 10.12.3)

Liam Gladdy (liam) wrote :

I too have this bug after a clean install of 16.10 this weekend (with a low end Nvidia graphics card)

The server would hang after a few minutes to a few hours. I tried upgrading to 17.04's current alpha too, and it didn't help. I enabled netconsole to get some logs, and I've attached them here in case they're useful.

Before the reinstall I had blacklisted nouveau, so it would make sense if that was causing my issues here. I've blacklisted it again now, so I expect this to be resolved for me.

turoyo dee (lpturoyo) wrote :

After re-testing, now with 16.04.2, I was able to isolate that the acpi_osi parameter solves the shutdown issue from nouveau "failed to adjust lnkctl speed"

update grub (edit /etc/default/grub, then run sudo update-grub) with:

GRUB_CMDLINE_LINUX_DEFAULT="acpi_osi=\"!Windows 2015\""

* I removed the quiet splash just to see boot logs in progress
* this worked with 16.04.1, I just didn't figure it out then

reference for solution:
https://bugzilla.kernel.org/show_bug.cgi?id=153281

Roberto Longobardi (seccanj) wrote :

Just wanted to report that I no longer have this issue since I installed the Solus distribution: https://solus-project.com/

I was also affected by the fan bug just cited in the previous comment, which also seems to have been solved with Solus: https://bugzilla.kernel.org/show_bug.cgi?id=153281

I had tried the latest Fedora 25 just before Solus, but couldn't even complete the installation wizard: it hung with "soft lockup - CPU#0 stuck for 22s!".

I've been running the Solus distribution for some days now and haven't had any fan or NMI watchdog issues since. No boot options have been required either.

My machine is an Asus K550V laptop, Intel Core i7-6700HQ with NVidia Geforce GTX 950M.

Hope this helps.

Jimmy Pan (dspjmr) wrote :

@seccanj, thank you very much bro, you are a life saver.

Now my machine can shut down and even suspend; it has finally become a usable machine.

Roberto Longobardi (seccanj) wrote :

Glad it works for you too.

It would be interesting to investigate what this distribution has that prevents the bug from showing up.

Maybe the desktop environment, Budgie, which is unique?

There's a Budgie version for Arch: https://www.archlinux.org/packages/community/x86_64/budgie-desktop/ which may be worth trying for this purpose.

Jimmy Pan (dspjmr) wrote :

@seccanj, sorry, I @'d the wrong person; I haven't tried this distribution yet. I tried the method in #48.

@lpturoyo, thank you very much.

Fernando (ferlahozseg) wrote :

I have a new laptop Asus UX360UAK with this issue too: soft lockup CPU stuck for XX seconds

Jordan Silva (jordansilva) wrote :

I have a new laptop Avell B155 V4 with i7 and GTX 950M and I get the same error. I'm not able to install or run the live Ubuntu, it gets stuck on loading screen.

kngharv (kngharv) wrote :

Dell Inspiron 15 (2016) model with Nvidia gtx960m. I am on 16.10, kernel version 4.8.0-44-generic.

nouveau driver:
Package: xserver-xorg-video-nouveau
Version: 1:1.0.14+git1703080733.b71de8~gd~y

I also encounter this problem.

At the GRUB menu, I can at least log in with full graphics by adding

nouveau.modeset=0

Then the computer behaves normally... until I leave it idle for too long and the laptop goes into some sort of suspend; then the kernel crashes.

It has been like this for the past couple of kernel updates.
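
One way to confirm that a boot parameter such as nouveau.modeset=0 actually reached the kernel is to look for it in /proc/cmdline. A minimal sketch follows; the helper name cmdline_has is hypothetical and not an existing tool.

```shell
#!/bin/sh
# Sketch: succeed if an exact boot parameter appears on the kernel command line.
# The second argument is optional and only exists to allow testing with a file.
cmdline_has() {
    param=$1
    file=${2:-/proc/cmdline}
    # Put each whitespace-separated parameter on its own line, then match
    # the whole line literally (-x whole line, -F fixed string, -q quiet).
    tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$param"
}

# Example:
#   cmdline_has nouveau.modeset=0 && echo "nouveau modesetting is off for this boot"
```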

Alexey Deryushkin (aderyushkin) wrote :

I have the same problem trying to boot from Ubuntu 16.04.2LTS Boot USB Stick on my new Dell Precision 5520 with Xeon. 17.04 works OK.

Vasile Gorcinschi (vgorcinschi) wrote :

I have the same issue when shutting down from Ubuntu 16.04 LTS. Machine: Dell Precision 3520

yossarian_uk (morgancoxuk) wrote :

I had the exact same issue with Asus -> N552VW

It would boot the live CD once from a cold boot; after a reboot I got the same error as above and it would hang during boot, and it would refuse to ever boot again.

The workaround from kngharv worked fine however.

Also I discovered enabling secure-boot also worked (although you have to disable that to install Nvidia driver)

What can we do to make this work out of the box on affected h/w?

I can imagine a newbie trying out Linux for the first time on their Asus laptop and just thinking Linux doesn't work.

tags: added: kde-neon yakkety
Walter Hunt (angrypuppy) wrote :

Been seeing this sporadically since 17.04 upgrade. I have an NVIDIA GeForce GT 740 running the proprietary driver, so it isn't nouveau in my case. Only common activity that sticks in my mind is that it seems to have occurred for me when (or shortly after) doing something with VirtualBox - starting a VM, doing something in VM. VM is set to hog 2 of my 4 cores.

Doesn't always happen - right now, usual desktop/apps and Windows7 VM seem to be running fine. I think I've seen it 3-4 times in 1.5 weeks of light usage.

When it does happen, I still have mouse, but everything else is hit or miss - sometimes I can alt-tab, sometimes not, sometimes taskbar thingy pops up but does nothing, sometimes I can switch apps. I think each time there was a momentary audio screwup, but then the audio picks up and plays fine.

Linux FrankenMac.walterhome.net 4.10.0-21-generic #23-Ubuntu SMP Fri Apr 28 16:14:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Running Kubuntu with the PPA for more up-to-date KDE stuff, forget the name.

Please let me know anything I can do to help track this down or debug it, I got time. :)

DimanNe (dimanne) wrote :

+1
I have encountered the same bug (NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s!). Logs could be found in the attachment.

Nabeel Omer (nabeelomer) wrote :
Frank Baehnisch (fbhnisch) wrote :

Same here on Lubuntu 16.04
Can't be nvidia related, i have an intel gpu.
NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s!

Simon Lambourn (simon-lambourn-o) wrote :

I can repeatably get this problem if I boot without a network connection. I get the message "waiting for a start job to complete ... Raise Network Interfaces" followed by "NMI Watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]"
Booting with a network cable attached has no problems.
This is on 4.10.0-26 and AMD Athlon 64 X2, using an AMD GPU. Nouveau module is not loaded.

Philipp Wrann (philippwrann) wrote :

I get those messages every time I shut down, so basically every day.

I installed Ubuntu 16.04 on an NVMe SSD that I fitted into an MSI notebook.
Integrated Intel (i7 7700) + dedicated Nvidia graphics (GTX 1050)

PLEASE help me, turning off my laptop using the power button feels very unhealthy.

Today I tried running lshw to give you some more information about my chipsets, etc., but the result was a kind of system freeze. I could not kill any processes anymore and one CPU core was at 100%, though htop did not show any process with 100% CPU load. Networking and many other things were also blocked.

Last time i got:
[992.132362] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [gpu-manager:8250]
[1020.132749] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [gpu-manager:8250]
[1025.232819] INFO: rcu_sched self-detected stall on CPU
...followed by some irq stuff

I also got this message with kworker instead of gpu-manager
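
Because the task named by the watchdog varies (gpu-manager in the lines above, kworker at other times), it can help to tally which tasks appear in the log. A minimal sketch, assuming kernel log lines in the format quoted above; summarize_lockups is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: read kernel log lines on stdin and print CPU, duration, task name
# and PID for every soft-lockup report; all other lines are dropped.
summarize_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\(.*\):\([0-9][0-9]*\)].*/cpu=\1 secs=\2 comm=\3 pid=\4/p'
}

# Example: count reports per task name.
#   dmesg | summarize_lockups | cut -d' ' -f3 | sort | uniq -c
```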

I highly suspect the Nvidia GPU to be the issue here, or some other hardware-based kernel issue. Or maybe NFS: I mounted 2 NFS shares on this machine for the first time, of which only 1 is available (I use this notebook in different offices).

uname -r: Kernel: 4.10.0-27-generic

Philipp Wrann (philippwrann) wrote :

I can find the following line in the logs over and over again, could this be connected to the power-off issue i described above?

nouveau 0000:01:00.0: Refused to change power state, currently in D3

Dean Schulze (dean-w-schulze-q) wrote :

I get this error every time I shut down, starting yesterday, 2017-07-24. I have to finish shutting down by holding down the power button for several seconds. lshw hangs the system completely.

This never happened before yesterday so it must be a recent patch or change of some sort.

$ uname -a
Linux XPS-15-9560 4.10.0-27-generic #30~16.04.2-Ubuntu SMP Thu Jun 29 16:07:46 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
$ uname -r
4.10.0-27-generic

Philipp Wrann (philippwrann) wrote :

Today, after making some updates the issue still exists, but the shutdown screen got a little bit more verbose.

It tells me

"A stop job is running for Light Display Manager" followed by the original error from this issue.

If i do the following i can shut down:

1) sudo systemctl stop lightdm.service
2) switch to tty1 (ctrl+alt+f1)
3) sudo poweroff

If I type "sudo systemctl stop lightdm.service && sudo poweroff" it won't work.

The lightdm logs only contain 1 non-debug output:

[+33.52s] CRITICAL: session_get_login1_session_id: assertion 'session != NULL' failed

The lightdm/x-0.log.1.gz contains some errors too

Errors from xkbcomp are not fatal to the X server
(EE)
(EE) Backtrace:
(EE) 0: /usr/lib/xorg/Xorg (xorg_backtrace+0x4e) [0x55b252dc7c6e]
(EE) 1: /usr/lib/xorg/Xorg (0x55b252c15000+0x1b6ff9) [0x55b252dcbff9]
(EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (0x7f68dad65000+0x354b0) [0x7f68dad9a4b0]
(EE) 3: /usr/lib/xorg/modules/libfb.so (_fbGetWindowPixmap+0xd) [0x7f68d4028bdd]
(EE) 4: /usr/lib/xorg/Xorg (0x55b252c15000+0x1353d7) [0x55b252d4a3d7]
(EE) 5: /usr/lib/xorg/Xorg (0x55b252c15000+0x1354a5) [0x55b252d4a4a5]
(EE) 6: /usr/lib/xorg/Xorg (0x55b252c15000+0x135c12) [0x55b252d4ac12]
(EE) 7: /usr/lib/xorg/Xorg (0x55b252c15000+0x1347c3) [0x55b252d497c3]
(EE) 8: /usr/lib/xorg/Xorg (0x55b252c15000+0xe4c58) [0x55b252cf9c58]
(EE) 9: /usr/lib/xorg/Xorg (0x55b252c15000+0x132b64) [0x55b252d47b64]
(EE) 10: /usr/lib/xorg/Xorg (0x55b252c15000+0x57f17) [0x55b252c6cf17]
(EE) 11: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0xf0) [0x7f68dad85830]
(EE) 12: /usr/lib/xorg/Xorg (_start+0x29) [0x55b252c57069]
(EE)
(EE) Segmentation fault at address 0x10
(EE)
Fatal server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE)
(EE)

Both logs contain the following strings (maybe some redirected input?):
^@^@^@^@^@ (this goes on for some time)

I hope this is of some help for you

Philipp Wrann (philippwrann) wrote :

I created a separate issue in the lightdm tracker but could not link it properly; here is the link:
https://bugs.launchpad.net/lightdm/+bug/1707574

Dean Schulze (dean-w-schulze-q) wrote :

I installed updates yesterday (August 3) and now the OS shutdown command does nothing except hide the mouse cursor and leave the desktop hung. No response from keyboard or mouse. I now have to do a hard shutdown using the power button. Executing "shutdown" from a terminal does nothing.

Having to do a hard shutdown seems like a recipe to corrupt the OS files.

Stuart Page (sdpagent) wrote :

I believe this issue is affecting me too after a fresh install of Ubuntu Server 16.04.3 with the HWE kernel 4.10 today (fully updated). After seeing the issue, I was unable to log in or SSH in.

mrvst (maravento) wrote :

To fix it edit /etc/sysctl.conf and add:
kernel.watchdog_thresh=30
reboot
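
As background for the setting above: the kernel documents the soft-lockup threshold as twice kernel.watchdog_thresh (default 10), which matches the "stuck for 22s" reports in this thread. A value of 30 would move the report to roughly 60 seconds, so it may only delay or suppress the message rather than remove the underlying stall. A small sketch of the arithmetic; the helper name softlockup_seconds is made up for illustration.

```shell
#!/bin/sh
# Soft-lockup reports fire after about 2 * kernel.watchdog_thresh seconds.
softlockup_seconds() {
    echo $(( $1 * 2 ))
}

# Read the live value if available; fall back to the documented default of 10.
current=$(cat /proc/sys/kernel/watchdog_thresh 2>/dev/null || echo 10)
echo "watchdog_thresh=$current: soft lockup reported after about $(softlockup_seconds "$current")s"

# To load the /etc/sysctl.conf change without rebooting:
#   sudo sysctl -p
```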

Philipp Wrann (philippwrann) wrote :

I managed to solve the issue for my case!!

The problem was an NFS share defined in /etc/fstab. It seems it was supposed to be unmounted, but networking had already stopped. So I switched to autofs and successfully rebooted twice without the problem.

That's how I defined the mount:
server:/path/to/nfs-share /media/nfs-share nfs auto,nofail,noatime,nolock,intr,tcp,actimeo=1800 0 0
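
The autofs configuration itself is not shown above. A minimal sketch of what such a setup could look like follows, reusing the server path and mount point from the line above. The map file name auto.nfs, the 60-second idle timeout, the trimmed option list and the helper write_autofs_maps are all assumptions; the helper writes into a scratch directory so the files can be reviewed before anything is copied to /etc.

```shell
#!/bin/sh
# Sketch: generate autofs map files for the NFS share into a review directory.
# File names, timeout and mount options are assumptions, not from the thread.
write_autofs_maps() {
    out_dir=$1
    mkdir -p "$out_dir"
    # Master map entry: "/-" declares a direct map; unmount after 60s idle.
    printf '%s\n' '/- /etc/auto.nfs --timeout=60' > "$out_dir/auto.master.snippet"
    # Direct map entry: <mount point> <options> <server:/export>
    printf '%s\n' '/media/nfs-share -fstype=nfs,noatime,nolock,tcp,actimeo=1800 server:/path/to/nfs-share' > "$out_dir/auto.nfs"
}

# Possible use:
#   write_autofs_maps ./autofs-review
#   # append auto.master.snippet to /etc/auto.master, copy auto.nfs to /etc,
#   # remove the share from /etc/fstab, then: sudo systemctl restart autofs
```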

Scott Deagan (scott-deagan) wrote :

Not sure if this is the same issue, but I'm getting:

[ 32.468026] watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [nvidia-smi:594]

on Kubuntu 17.04 (kernel 4.13.3-041303-generic, i7-4790, GTX 1070, Nvidia 384.69). My desktop still works fine, I do not experience any crashes or freezes, but shutdown takes ages (several minutes).

This is a fairly recent issue for me (I'm sure things were working fine with the 4.12 kernels).

Alena Laskavaia (elaskavaia) wrote :

Started happening to me now on boot when I connect to a docking station (which brings a secondary monitor)
4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 09:02:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

After reading this thread I realised that it's probably multiple deadlock problems that trigger the watchdog message, with different causes; it could be one bug in the kernel or multiple problems all resulting in a deadlock. I actually always had the reboot problem (it locked up on reboot) but I ignored it; now it does not boot at all, which is more of an issue.

So I had to apply 2 solutions:
- for the shutdown issue I applied the fstab fix from #72; now it shuts down properly, but the hang still happens on boot when attached to the secondary monitor
- for the booting issue I switched from the nouveau display driver to Nvidia 375.66 and it seems to work now (it could be that I have just stopped hitting it temporarily and it is really triggered by a random race condition)

Dean Schulze (dean-w-schulze-q) wrote :

A friend pointed out that the problem I was experiencing on shutdown (Dell XPS-15 9560) was due to the X.org Nouveau driver. I switched to NVidia driver (375.66) and my laptop shuts down normally now.

DiagonalArg (diagonalarg) wrote :

Happening on attempted bootup of a Thinkpad W520, using a USB stick loaded with Ubuntu 17.10 installer.

DiagonalArg (diagonalarg) wrote :

*That's the Ubuntu Mate 17.10 installer. I get a repeated Call trace which ends with the line, "perf: interrupt took too long (8809 > 8338), lowering t_max_sample_rate to 22500".

gaurav arora (gauravv7) wrote :

I have faced this issue, but maybe with different logs (PFA log statement).

My laptop is a Dell 7757: i7 7700 + Nvidia 1050 Ti mobile, 8 GB RAM, 128 GB SSD and 1 TB SATA, dual-booting Ubuntu 18.04.1 LTS with the 4.15 kernel + Windows 10 Home (although not using it much).
SATA operation is set to RAID On in the Dell BIOS. Earlier I tried SATA operation as AHCI. Both gave the same errors.

The currently running video mode is an Xorg session.
I was trying to install the Nvidia drivers (the .run file from the Nvidia website), but every time I do, or whenever such a computational task comes up, this error appears.
Shutdown/reboot also hangs forever and I have to hard shut down.
Random system hangs or trackpad hangs are also usual.
I tried acpi=off in GRUB, which fixes the issues, but then my i7 runs on a single core only, which is unacceptable.

gaurav arora (gauravv7) wrote :

Adding to the above #78: when I start the system, before logging into Ubuntu 18.04 LTS with my username and password, I opened the console view (tty4). Shutting down and rebooting from there works fine, and I receive no messages about a stuck CPU worker as described above. I think it's some other service that hangs the CPU workers (CPU resource deadlock) and that starts after logging into the system for the first time. Since it's a new laptop, I have not installed apps or services to start with init.d, and I have put no triggers in /etc/rc* for bluetooth/screen settings.

After this I tried installing the Nvidia drivers for the GTX 1050 Ti from the .run file I downloaded from the Nvidia site, and it gave an error about the nouveau kernel driver. So I disabled nouveau using the Nvidia installer's helper for this, updated the initramfs and rebooted.

No issues for me now with shutdown/reboot or random system hangs, after successfully running Nvidia graphics.

Elvis de Freitas Souza (edigitalb) wrote : apport information

ProblemType: Bug
ApportVersion: 2.20.10-0ubuntu13
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: typer 2994 F.... pulseaudio
 /dev/snd/pcmC0D0p: typer 2994 F...m pulseaudio
CurrentDesktop: LXQt
DistroRelease: Ubuntu 18.10
InstallationDate: Installed on 2018-12-20 (2 days ago)
InstallationMedia: Lubuntu 18.10 "Cosmic Cuttlefish" - Release amd64 (20181017.2)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 1c4f:0002 SiGma Micro Keyboard TRACER Gamma Ivory
 Bus 001 Device 002: ID 046d:c083 Logitech, Inc.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Gigabyte Technology Co., Ltd. B360M-D3H
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.18.0-12-generic root=UUID=ba2e86e4-48ce-47f3-a413-98536fe6febd ro quiet splash resume=UUID=5b571422-2d63-4044-88be-ecd5e66732d8 vt.handoff=1
ProcVersionSignature: Ubuntu 4.18.0-12.13-generic 4.18.17
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-12-generic N/A
 linux-backports-modules-4.18.0-12-generic N/A
 linux-firmware 1.175.1
RfKill:

Tags: cosmic
Uname: Linux 4.18.0-12-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sudo
_MarkForUpload: True
dmi.bios.date: 04/19/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F4
dmi.board.asset.tag: Default string
dmi.board.name: B360M D3H-CF
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF4:bd04/19/2018:svnGigabyteTechnologyCo.,Ltd.:pnB360M-D3H:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB360MD3H-CF:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B360M-D3H
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

tags: added: apport-collected cosmic
Elvis de Freitas Souza (edigitalb) wrote : AlsaInfo.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : CRDA.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : CurrentDmesg.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : IwConfig.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : Lspci.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : ProcCpuinfo.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : ProcCpuinfoMinimal.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : ProcEnviron.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : ProcInterrupts.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : ProcModules.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : PulseList.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : UdevDb.txt

apport information

Elvis de Freitas Souza (edigitalb) wrote : WifiSyslog.txt

apport information

Brad Figg (brad-figg)
tags: added: cscc