Hard lockup during boot with linux-image-4.4.0-112-generic

Bug #1745349 reported by Lars Behrens
This bug affects 14 people
Affects         Status       Importance  Assigned to       Milestone
linux (Ubuntu)  In Progress  High        Joseph Salisbury
Xenial          In Progress  High        Joseph Salisbury

Bug Description

Systems with ASUS H110M-C boards don't boot here with kernel 4.4.0-112; instead they show:

NMI watchdog: Watchdog detected hard lockup on cpu 0
NMI watchdog: Watchdog detected hard lockup on cpu 1
NMI watchdog: Watchdog detected hard lockup on cpu 3
NMI watchdog: Watchdog detected hard lockup on cpu 2

I cannot provide any logs of course.

Booting with 4.4.0-109 works.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-112-generic 4.4.0-112.135
ProcVersionSignature: Ubuntu 4.4.0-109.132-generic 4.4.98
Uname: Linux 4.4.0-109-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: lr3200 2013 F.... pulseaudio
Date: Thu Jan 25 11:38:55 2018
HibernationDevice: RESUME=UUID=8e35131d-6d40-4f4f-a3db-c7336e72cbce
IwConfig:
 lo no wireless extensions.

 enp5s0 no wireless extensions.

 enp3s0 no wireless extensions.
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 002: ID 046d:c077 Logitech, Inc. M105 Optical Mouse
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: System manufacturer System Product Name
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-109-generic root=UUID=4dc5bb0a-ab7c-48fa-ac23-bcdf23755d93 ro
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-109-generic N/A
 linux-backports-modules-4.4.0-109-generic N/A
 linux-firmware 1.157.15
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/12/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3601
dmi.board.asset.tag: Default string
dmi.board.name: H110M-C
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3601:bd12/12/2017:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnH110M-C:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Lars Behrens (lars-behrens-u) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
importance: Undecided → High
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-da-key
tags: added: pti
Joseph Salisbury (jsalisbury) wrote :
Ingemar Fällman (mrsmurf) wrote :

I have similar issues on my Dell R630 servers, but only on those with extra NICs.

In my case it is not triggered on boot, but when I start the OpenStack Neutron agent components, which start configuring the extra NIC (Intel 10-Gigabit X540-AT2 rev 01).

[ 204.789174] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[ 205.193525] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
[ 205.716981] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
[ 206.012783] NMI watchdog: Watchdog detected hard LOCKUP on cpu 14
[ 206.313395] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
[ 207.761181] NMI watchdog: Watchdog detected hard LOCKUP on cpu 9
[ 208.288291] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
[ 209.190285] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
[ 209.560277] NMI watchdog: Watchdog detected hard LOCKUP on cpu 8
[ 210.040526] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[ 211.555861] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
[ 216.922401] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [neutron-rootwra:17362]
[ 216.926401] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sudo:17410]
[ 216.946400] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [sudo:17389]
[ 216.950399] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [cinder-volume:7662]
[ 216.958398] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [neutron-rootwra:17349]
[ 237.065467] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[ 239.587762] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
[ 244.924190] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sudo:17410]
[ 244.948189] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [cinder-volume:7662]
[ 244.956188] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [neutron-rootwra:17349]
[ 264.237850] NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
[ 265.156735] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5
[ 272.953977] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [neutron-rootwra:17349]
[ 296.597072] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13

It works for me with 109, but I get the kernel lockup with 110.

Lars Behrens (lars-behrens-u) wrote : Re: [Bug 1745349] Re: Hard lockup during boot with linux-image-4.4.0-112-generic

On 25.01.2018 at 21:45, Joseph Salisbury wrote:
> Can you see if either of the following two kernels also exhibit the bug:
>
> 4.4.0-110: https://launchpad.net/~canonical-kernel-security-team/+archive/ubuntu/ppa/+build/14231477
>
> 4.4.0-111: https://launchpad.net/~canonical-kernel-security-team/+archive/ubuntu/ppa/+build/14241947

Yes, both show the same freeze and error message.

Lars Behrens (lars-behrens-u) wrote :
Lars Behrens (lars-behrens-u) wrote :
Lars Behrens (lars-behrens-u) wrote :

Is there any more info that you need? The bug is still tagged as "incomplete".

Ingemar Fällman (mrsmurf) wrote :

I can prevent the issue by adding the "noibpb" boot parameter to my kernel arguments.

ake sandgren (ake-sandgren) wrote :

The 4.4.0-116-generic still hangs my laptop during boot the same way -112 did. And noibpb still solves the problem.

Lars Behrens (lars-behrens-u) wrote :

On 22.02.2018 at 09:05, ake sandgren wrote:
> The 4.4.0-116-generic still hangs my laptop during boot the same way
> -112 did. And noibpb still solves the problem.

Confirming for ASUS H110M-C boards: still the same hard locks.

Mike Turner (q-mike-5) wrote :

4.4.0-116-generic does appear to solve the problem on my Asus Prime Z270K.

Mike Turner (q-mike-5) wrote :

Correction - 4.4.0-116-generic does not solve the problem.

The server was up for over an hour before hanging. As it doesn't usually have a console, I can't be certain of the cause of the hang. However, a subsequent reboot, with a console, involving rebuilding a 10TB RAID 5 array, showed exactly the same Watchdog Hard Lockup for each core during the boot process.

Now running with noibpb again.
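
For anyone applying the same workaround, one way to confirm that noibpb actually reached the running kernel is to inspect /proc/cmdline (the same string apport records as ProcKernelCmdLine above). This is only a minimal sketch; the helper name cmdline_has_param is made up for illustration and is not part of any Ubuntu tool.

```shell
# Return success if the kernel command line in $1 contains the
# parameter in $2 as a whole, space-separated word.
# (cmdline_has_param is a hypothetical helper name.)
cmdline_has_param() {
    cmdline=$1
    param=$2
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call on the running system:
#   cmdline_has_param "$(cat /proc/cmdline)" noibpb && echo "noibpb is active"
```

If the check fails after a reboot, the parameter was most likely not written into the boot entry that was actually selected.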

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Xenial):
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built an Artful test kernel with a revert of commit: ff2699c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Andreas Argyris (anargiri) wrote :

Hello, I also have the same issue on a Dell E7470 laptop using Ubuntu 16.04 LTS.

I tried the kernel with the reverted commit and it works fine.
I have also noticed the same issue on the 4.13.0-32 and 4.13.0-36 HWE kernels.

Let me know if you need any logs from my side.

Cheers

Mike Turner (q-mike-5) wrote :

I'm afraid that I am not an expert at Ubuntu. I think that I have installed your test kernel correctly, but I don't know how to prove it. I used sudo dpkg -i *.deb .

If I have installed it correctly, then it doesn't cure the problem. It runs fine with noibpb, but fails in the same way if I try to run without it.

Joseph Salisbury (jsalisbury) wrote :

@Mike Turner, you can confirm which kernel you are running by running the following command from a terminal: uname -a

@Lars Behrens, as the original bug reporter, can you test the kernel posted in comment #14?

Mike Turner (q-mike-5) wrote :

uname -a shows:-

Linux golf 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

so I obviously haven't installed your kernel correctly.
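
The test builds reported elsewhere in this thread carry a "~lp1745349" tag in the version field of uname (for example "#133~lp1745349Commit7de295e2a47849"), whereas the stock kernel above shows "#140-Ubuntu". Assuming that convention holds for every test build, a small check like the following can tell the two apart; is_lp_test_build is a hypothetical helper name.

```shell
# Return success if a `uname -v` string carries the lp1745349 test-build tag.
# Assumption: every test kernel for this bug has "~lp1745349" in its version
# field, as the builds quoted in this thread do.
is_lp_test_build() {
    case "$1" in
        *"~lp1745349"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call on the running system:
#   is_lp_test_build "$(uname -v)" && echo "test kernel" || echo "stock kernel"
```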

Lars Behrens (lars-behrens-u) wrote :

Sorry, no improvement on ASUS H110M-C Board with
linux-image-4.4.0-116-generic_4.4.0-116.140~lp1745349Commitff2699cReverted_amd64

Still 4 out of 5 boots result in hard lock.

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between 4.4.0-109 and 4.4.0-110. The kernel bisect will require testing of about 5 - 6 test kernels.

I built the first test kernel, up to the following commit:
d3d0f0a209ee29cf553b8b5580eb954b0d4aa970

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance
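
The estimate of "about 5 - 6 test kernels" matches how a bisect works: each tested build halves the remaining range of commits, so roughly log2(N) builds are needed for N candidate commits. The thread does not say how many commits lie between 4.4.0-109 and 4.4.0-110, so the sketch below only illustrates the arithmetic; bisect_steps is a hypothetical helper name.

```shell
# Print the smallest k such that 2^k >= n, i.e. roughly how many test
# builds a bisect needs to isolate one commit out of n candidates.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Example: bisect_steps 64 prints 6, bisect_steps 32 prints 5.
```

Under this approximation, 5 to 6 builds would correspond to somewhere between 17 and 64 candidate commits.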

Mike Turner (q-mike-5) wrote :

I'm definitely running your 110 kernel, and it exhibits the Watchdog Hard Lockup problem when running without noibpb.

Lars Behrens (lars-behrens-u) wrote :

No luck, unfortunately.
linux-image-4.4.0-110-generic_4.4.0-110.133~lp1745349Commitd3d0f0a209ee_amd64 shows the same hard lock.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
7de295e2a47849488acec80fc7c9973a4dca204e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

That one, Linux golf 4.4.0-110-generic #133~lp1745349Commit7de295e2a47849 SMP Tue Mar 6 13:00:57 UTC 20 x86_64 x86_64 x86_64 GNU/Linux doesn't appear to have the problem.

It has been running for a lot longer than it usually does.

Lars Behrens (lars-behrens-u) wrote :

Yay!

Three boots in a row worked with
linux-image-4.4.0-110-generic_4.4.0-110.133~lp1745349Commit7de295e2a47849_amd64

Was only able to test on one machine though, all my users are logged in. More testing tomorrow.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
e233ec086ab77c1fc5714368e93a0d6c99d92226

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

That one (e233e...) failed.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
49bb7a3cbf55099e79b27c611176b8db5533d566

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

Linux golf 4.4.0-110-generic #133~lp1745349Commit49bb7a3cbf5 SMP Tue Mar 6 16:37:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux doesn't seem to have the problem.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
47a07600efeb2a180b847bb0e3e99e138a49198c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

That one, 47a0... , failed.

Unfortunately, that is the last one that I can test for you until next week, as I have to go away for a few days.

I hope that someone else can help.

Lars Behrens (lars-behrens-u) wrote :

47a07600efe does not work here: hard lock on two PCs.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b0c3e8bd040546f4dfedff77e3a171a9b15c6571

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Lars Behrens (lars-behrens-u) wrote :

No success, hard locks with b0c3e8bd040

Joseph Salisbury (jsalisbury) wrote :

The bisect reported the following as the first bad commit:
b0c3e8bd0405 ("x86/mm: Only set IBPB when the new thread cannot ptrace current thread")

I built a Xenial test kernel with a revert of this commit. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Lars Behrens (lars-behrens-u) wrote :

linux-image-4.4.0-116-generic_4.4.0-116.140~lp1745349Commitb0c3e8bd0405Reverted_amd64
seems to work here, 3 reboots in a row on one machine without hard locks.

Joseph Salisbury (jsalisbury) wrote :

Before I request feedback from upstream, can you see if this bug is already fixed in the mainline kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4/

Lars Behrens (lars-behrens-u) wrote :
Joseph Salisbury (jsalisbury) wrote :

To narrow down the number of commits that could have fixed this, can you test the following two kernels as well:

v4.15 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/
v4.16-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc1/

Ingemar Fällman (mrsmurf) wrote :

If I remember correctly, from 4.15 onward the retpoline fixes are used for Spectre mitigation, so the noibpb code is not used.

Lars Behrens (lars-behrens-u) wrote :

linux-image-4.15.0-041500-generic_4.15.0-041500.201802011154_amd64
and
linux-image-4.16.0-041600rc1-generic_4.16.0-041600rc1.201802120030_amd64
both are working

Joseph Salisbury (jsalisbury) wrote :

I'll ping the patch author for feedback.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the proposed kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

Mike Turner (q-mike-5) wrote :

I'm afraid that 4.4.0-117-generic #141 - Ubuntu SMP Tue Mar 13 11:58:07 doesn't cure the problem for me. I still get the Watchdog Hard Lockups.

Joseph Salisbury (jsalisbury) wrote :

This bug is probably a duplicate of bug 1746418 in Artful.

Val (vk1266) wrote :

I have very similar issues with 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux. Booting with the noibpb option does help and I am currently using it as a workaround.

broadmind (gustavo-vegas) wrote :

I experienced a similar lockup during boot with the updates installed today (4/4/2018). The kernel version is 4.4.0-119-generic #143. I also tried to boot with an older one (4.4.0-116 perhaps), to no avail. Adding "noibpb" to GRUB_CMDLINE_LINUX= in /etc/default/grub allowed the machine to boot fully.
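
The workaround described above can be sketched as a small shell helper. This is a hedged illustration, not an official procedure: add_kernel_param is a hypothetical name, it assumes GNU sed and a GRUB_CMDLINE_LINUX="..." line already present in the file, and it assumes the parameter contains no regex metacharacters.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX in the given grub
# defaults file, unless it is already there. GRUB_CMDLINE_LINUX_DEFAULT
# is left untouched. (add_kernel_param is a hypothetical helper name.)
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on the line.
    if grep -Eq "^GRUB_CMDLINE_LINUX=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert the parameter before the closing quote, then drop the
    # leading space that appears when the value was empty.
    sed -i -E \
        -e "s/^GRUB_CMDLINE_LINUX=\"(.*)\"/GRUB_CMDLINE_LINUX=\"\1 ${param}\"/" \
        -e "s/^GRUB_CMDLINE_LINUX=\" /GRUB_CMDLINE_LINUX=\"/" \
        "$file"
}

# Example (as root): add_kernel_param /etc/default/grub noibpb
```

After editing /etc/default/grub, run sudo update-grub and reboot for the change to take effect. Keeping a backup copy of the file beforehand is a sensible precaution.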

Ivan Ivanov (her-ivanov) wrote :

Thanks broadmind (gustavo-vegas).
I also have the same lockup problem during boot with the updates installed today (4/4/2018). The kernel version is 4.4.0-119-generic #143. I also tried to boot with an older one (4.4.0-116 perhaps), to no avail. I can only boot in upstart mode on these kernels; there were no problems when booting 4.4.0-112.
Adding "noibpb" to GRUB_CMDLINE_LINUX= in /etc/default/grub allowed the machine to boot fully without problems.

Sander (3-sander) wrote :

Confirmed: 4.4.0-119-generic #143 does not solve this problem; there is still a hard freeze during boot. Adding "noibpb" to GRUB_CMDLINE_LINUX= in /etc/default/grub allowed the machine to boot. Interestingly enough, my server runs on VMware, so the problem exists there too.

Joseph Salisbury (jsalisbury) wrote :

Can those affected by this bug please test the following kernel:
https://launchpad.net/ubuntu/+source/linux/4.15.0-15.16/+build/14530348

To test the kernel, install both the linux-image and linux-image-extra .deb packages.

Mike Turner (q-mike-5) wrote :

I've just rebooted five times without any lock ups.

I'll leave it running, and see what happens.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Mike. That is the kernel that will ship with 18.04 (Bionic). We may need to perform a "reverse" kernel bisect to identify the commit in Bionic that fixes this bug.

Ivan Ivanov (her-ivanov) wrote :

No problems booting 4.15.0-15.16 (noibpb disabled in GRUB_CMDLINE_LINUX=).

Andreas Argyris (anargiri) wrote :

Hello,

I can also confirm that there are no boot problems with 4.15.0-15.16 and noibpb disabled in GRUB_CMDLINE_LINUX.
