Hard lockup during boot with linux-image-4.4.0-112-generic

Bug #1745349 reported by Lars Behrens on 2018-01-25
This bug affects 14 people
Affects          Status  Importance  Assigned to       Milestone
linux (Ubuntu)           High        Joseph Salisbury
Xenial                   High        Joseph Salisbury

Bug Description

Systems with ASUS H110M-C boards don't boot here with kernel 4.4.0-112. Instead they show:

NMI watchdog: Watchdog detected hard lockup on cpu 0
NMI watchdog: Watchdog detected hard lockup on cpu 1
NMI watchdog: Watchdog detected hard lockup on cpu 3
NMI watchdog: Watchdog detected hard lockup on cpu 2

I cannot provide any logs from the failing boot, of course.

Booting with 4.4.0-109 works.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-112-generic 4.4.0-112.135
ProcVersionSignature: Ubuntu 4.4.0-109.132-generic 4.4.98
Uname: Linux 4.4.0-109-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: lr3200 2013 F.... pulseaudio
Date: Thu Jan 25 11:38:55 2018
HibernationDevice: RESUME=UUID=8e35131d-6d40-4f4f-a3db-c7336e72cbce
IwConfig:
 lo no wireless extensions.

 enp5s0 no wireless extensions.

 enp3s0 no wireless extensions.
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 002: ID 046d:c077 Logitech, Inc. M105 Optical Mouse
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: System manufacturer System Product Name
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-109-generic root=UUID=4dc5bb0a-ab7c-48fa-ac23-bcdf23755d93 ro
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-109-generic N/A
 linux-backports-modules-4.4.0-109-generic N/A
 linux-firmware 1.157.15
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/12/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3601
dmi.board.asset.tag: Default string
dmi.board.name: H110M-C
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3601:bd12/12/2017:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnH110M-C:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
importance: Undecided → High
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-da-key
tags: added: pti
Ingemar Fällman (mrsmurf) wrote :

I have similar issues on my Dell R630 servers, but only on those with extra NICs.

In my case it is not triggered on boot but when I start the OpenStack Neutron agent components, which start configuring the extra NIC (Intel 10-Gigabit X540-AT2 rev 01).

[ 204.789174] NMI watchdog: Watchdog detected hard LOCKUP on cpu 7
[ 205.193525] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
[ 205.716981] NMI watchdog: Watchdog detected hard LOCKUP on cpu 3
[ 206.012783] NMI watchdog: Watchdog detected hard LOCKUP on cpu 14
[ 206.313395] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0
[ 207.761181] NMI watchdog: Watchdog detected hard LOCKUP on cpu 9
[ 208.288291] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
[ 209.190285] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
[ 209.560277] NMI watchdog: Watchdog detected hard LOCKUP on cpu 8
[ 210.040526] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[ 211.555861] NMI watchdog: Watchdog detected hard LOCKUP on cpu 15
[ 216.922401] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [neutron-rootwra:17362]
[ 216.926401] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sudo:17410]
[ 216.946400] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 23s! [sudo:17389]
[ 216.950399] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [cinder-volume:7662]
[ 216.958398] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [neutron-rootwra:17349]
[ 237.065467] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[ 239.587762] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
[ 244.924190] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sudo:17410]
[ 244.948189] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [cinder-volume:7662]
[ 244.956188] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [neutron-rootwra:17349]
[ 264.237850] NMI watchdog: Watchdog detected hard LOCKUP on cpu 11
[ 265.156735] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5
[ 272.953977] NMI watchdog: BUG: soft lockup - CPU#13 stuck for 23s! [neutron-rootwra:17349]
[ 296.597072] NMI watchdog: Watchdog detected hard LOCKUP on cpu 13

It works for me with 109, but I get the kernel lockup with 110.
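
For anyone who does manage to capture kernel log output like the above, here is a small sketch for listing which CPUs reported a hard lockup. The function name hard_lockup_cpus is made up for illustration, and it assumes the message wording shown in this report:

```shell
# Print the distinct CPU numbers from "hard lockup" watchdog lines,
# one per line, in ascending order. Reads kernel log text on stdin.
# Soft-lockup lines are ignored.
hard_lockup_cpus() {
    grep -i 'Watchdog detected hard lockup on cpu' \
        | sed 's/.*[Cc][Pp][Uu] *\([0-9][0-9]*\).*/\1/' \
        | sort -n -u
}

# Typical use on a machine that stayed up long enough to log anything
# (reading the kernel ring buffer may need elevated privileges):
#   dmesg | hard_lockup_cpus
```

If every CPU in the machine shows up in the output, the whole system was wedged rather than a single core.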

On 25.01.2018 at 21:45, Joseph Salisbury wrote:
> Can you see if either of the following two kernels also exhibit the bug:
>
> 4.4.0-110: https://launchpad.net/~canonical-kernel-security-team/+archive/ubuntu/ppa/+build/14231477
>
> 4.4.0-111: https://launchpad.net/~canonical-kernel-security-team/+archive/ubuntu/ppa/+build/14241947

Yes, both show the same freeze and error message.

Lars Behrens (lars-behrens-u) wrote :

Is there any more info that you need? The bug is still marked as "Incomplete".

Ingemar Fällman (mrsmurf) wrote :

I can prevent the issue by adding the "noibpb" boot parameter to my kernel arguments.

ake sandgren (ake-sandgren) wrote :

The 4.4.0-116-generic still hangs my laptop during boot the same way -112 did. And noibpb still solves the problem.

Lars Behrens (lars-behrens-u) wrote :

On 22.02.2018 at 09:05, ake sandgren wrote:
> The 4.4.0-116-generic still hangs my laptop during boot the same way
> -112 did. And noibpb still solves the problem.

Confirming for ASUS H110M-C boards: still the hard lockups.

Mike Turner (q-mike-5) wrote :

4.4.0-116-generic does appear to solve the problem on my Asus Prime Z270K.

Mike Turner (q-mike-5) wrote :

Correction - 4.4.0-116-generic does not solve the problem.

The server was up for over an hour before hanging. As it doesn't usually have a console, I can't be certain of the cause of the hang. However, a subsequent reboot, with a console, involving rebuilding a 10TB RAID 5 array, showed exactly the same Watchdog Hard Lockup for each core during the boot process.

Now running with noibpb again.

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Xenial):
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built an Artful test kernel with a revert of commit: ff2699c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Andreas Argyris (anargiri) wrote :

Hello, I also have the same issue on a Dell E7470 laptop using Ubuntu 16.04 LTS.

I tried the kernel with the reverted commit and it works fine.
I have also noticed the same issue on the 4.13.0-32 and 4.13.0-36 HWE kernels.

Let me know if you need any logs from my side

Cheers

Mike Turner (q-mike-5) wrote :

I'm afraid that I am not an expert at Ubuntu. I think that I have installed your test kernel correctly, but I don't know how to prove it. I used: sudo dpkg -i *.deb

If I have installed it correctly, then it doesn't cure the problem. It runs fine with noibpb, but fails in the same way if I try to run without it.

Joseph Salisbury (jsalisbury) wrote :

@Mike Turner, you can confirm which kernel you are running by running the following command from a terminal: uname -a

@Lars Behrens, as the original bug reporter, can you test the kernel posted in comment #14?
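
A slightly more targeted check than reading the uname -a output by eye: judging by the version strings posted in this report, the test builds carry an "lp1745349" marker in the build field that uname -v prints. A minimal sketch, where the helper name kernel_has_marker is hypothetical:

```shell
# Succeed if a kernel version string (as printed by `uname -v` or
# `uname -a`) contains the given build marker.
kernel_has_marker() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Check the running kernel for this bug's test-build marker.
if kernel_has_marker "$(uname -v)" "lp1745349"; then
    echo "running a test kernel: $(uname -r) $(uname -v)"
else
    echo "running a stock kernel: $(uname -r) $(uname -v)"
fi
```

If this reports a stock kernel after installing the .deb packages, the test kernel was either not installed or not the one selected at boot.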

Mike Turner (q-mike-5) wrote :

uname -a shows:

Linux golf 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

so I obviously haven't installed your kernel correctly.

Lars Behrens (lars-behrens-u) wrote :

Sorry, no improvement on ASUS H110M-C Board with
linux-image-4.4.0-116-generic_4.4.0-116.140~lp1745349Commitff2699cReverted_amd64

Still, 4 out of 5 boots result in a hard lockup.

Joseph Salisbury (jsalisbury) wrote :

I started a kernel bisect between 4.4.0-109 and 4.4.0-110. The kernel bisect will require testing of about 5-6 test kernels.
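
As a rough sanity check on that estimate: each bisect step tests one kernel and roughly halves the remaining candidate commits, so 5 to 6 test kernels would correspond to a range of something like 32 to 64 commits. The actual commit count for this range is not stated in the report. The numbers below are only illustrative, and bisect_steps is a hypothetical helper:

```shell
# Estimate how many test kernels a bisect needs for a range of N
# candidate commits, assuming each test halves the remaining range.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))      # keep the larger half (worst case)
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 32   # prints 5
bisect_steps 64   # prints 6
```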

I built the first test kernel, up to the following commit:
d3d0f0a209ee29cf553b8b5580eb954b0d4aa970

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

I'm definitely running your 110 kernel, and it exhibits the Watchdog Hard Lockup problem when running without noibpb.

Lars Behrens (lars-behrens-u) wrote :

No luck, unfortunately.
linux-image-4.4.0-110-generic_4.4.0-110.133~lp1745349Commitd3d0f0a209ee_amd64 shows the same hard lock.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
7de295e2a47849488acec80fc7c9973a4dca204e

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

That one, Linux golf 4.4.0-110-generic #133~lp1745349Commit7de295e2a47849 SMP Tue Mar 6 13:00:57 UTC 20 x86_64 x86_64 x86_64 GNU/Linux doesn't appear to have the problem.

It has been running for a lot longer than it usually does.

Lars Behrens (lars-behrens-u) wrote :

Yay!

Three boots in a row worked with
linux-image-4.4.0-110-generic_4.4.0-110.133~lp1745349Commit7de295e2a47849_amd64

Was only able to test on one machine though, all my users are logged in. More testing tomorrow.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
e233ec086ab77c1fc5714368e93a0d6c99d92226

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

That one (e233e...) failed.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
49bb7a3cbf55099e79b27c611176b8db5533d566

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

Linux golf 4.4.0-110-generic #133~lp1745349Commit49bb7a3cbf5 SMP Tue Mar 6 16:37:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux doesn't seem to have the problem.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
47a07600efeb2a180b847bb0e3e99e138a49198c

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Mike Turner (q-mike-5) wrote :

That one, 47a0... , failed.

Unfortunately, that is the last one that I can test for you until next week, as I have to go away for a few days.

I hope that someone else can help.

Lars Behrens (lars-behrens-u) wrote :

47a07600efe does not work here: hard lockups on two PCs.

Joseph Salisbury (jsalisbury) wrote :

I built the next test kernel, up to the following commit:
b0c3e8bd040546f4dfedff77e3a171a9b15c6571

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test that kernel and report back if it has the bug or not? I will build the next test kernel based on your test results.

Thanks in advance

Lars Behrens (lars-behrens-u) wrote :

No success, hard lockups with b0c3e8bd040.

Joseph Salisbury (jsalisbury) wrote :

The bisect reported the following as the first bad commit:
b0c3e8bd0405 ("x86/mm: Only set IBPB when the new thread cannot ptrace current thread")

I built a Xenial test kernel with a revert of this commit. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1745349

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Lars Behrens (lars-behrens-u) wrote :

linux-image-4.4.0-116-generic_4.4.0-116.140~lp1745349Commitb0c3e8bd0405Reverted_amd64
seems to work here, 3 reboots in a row on one machine without hard locks.

Joseph Salisbury (jsalisbury) wrote :

Before I request feedback from upstream, can you see if this bug is already fixed in the mainline kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc4/

Joseph Salisbury (jsalisbury) wrote :

To narrow down the number of commits that could have fixed this, can you test the following two kernels as well:

v4.15 Final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15/
v4.16-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16-rc1/

Ingemar Fällman (mrsmurf) wrote :

If I remember correctly, from 4.15 on the retpoline fixes are used for Spectre mitigation, so the code that noibpb controls is not used.

Lars Behrens (lars-behrens-u) wrote :

linux-image-4.15.0-041500-generic_4.15.0-041500.201802011154_amd64
and
linux-image-4.16.0-041600rc1-generic_4.16.0-041600rc1.201802120030_amd64
both are working

Joseph Salisbury (jsalisbury) wrote :

I'll ping the patch author for feedback.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the proposed kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

Mike Turner (q-mike-5) wrote :

I'm afraid that 4.4.0-117-generic #141 - Ubuntu SMP Tue Mar 13 11:58:07 doesn't cure the problem for me. I still get the Watchdog Hard Lockups.

Joseph Salisbury (jsalisbury) wrote :

This bug is probably a duplicate of bug 1746418 in Artful.

Val (vk1266) wrote :

I have very similar issues with 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux. Booting with the noibpb option does help and I am currently using it as a workaround.

broadmind (gustavo-vegas) wrote :

I have experienced a similar lockup during boot with newly installed updates today (4/4/2018). The kernel version is 4.4.0-119-generic #143. I also tried to boot with an older one (4.4.0-116 perhaps) to no avail. Adding "noibpb" to GRUB_CMDLINE_LINUX= in /etc/default/grub allowed the machine to boot fully.
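
The workaround described above can be sketched as a small script. This assumes GRUB_CMDLINE_LINUX is a single double-quoted line in the file, as in a default Ubuntu /etc/default/grub. The helper name add_grub_param is hypothetical, and the parameter is assumed to contain no characters that are special to sed:

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX in the given file,
# unless it is already present. A backup is kept as <file>.bak.
add_grub_param() {
    file=$1
    param=$2

    # Current value between the double quotes (may be empty).
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX="\(.*\)".*/\1/p' "$file")
    case " $current " in
        *" $param "*) return 0 ;;   # already set, nothing to do
    esac

    # First expression handles a non-empty value, second an empty one.
    sed -i.bak \
        -e "s/^GRUB_CMDLINE_LINUX=\"\(..*\)\"/GRUB_CMDLINE_LINUX=\"\1 ${param}\"/" \
        -e "s/^GRUB_CMDLINE_LINUX=\"\"/GRUB_CMDLINE_LINUX=\"${param}\"/" \
        "$file"
}

# On a real system, as root:
#   add_grub_param /etc/default/grub noibpb
#   update-grub
# and then reboot.
```

The GRUB_CMDLINE_LINUX_DEFAULT line is deliberately left alone, since the comments here only mention GRUB_CMDLINE_LINUX.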

Ivan Ivanov (her-ivanov) wrote :

Thanks broadmind (gustavo-vegas).
I also have the same lockup problem during boot with updates newly installed today (4/4/2018). The kernel version is 4.4.0-119-generic #143. I also tried to boot with an older one (4.4.0-116 perhaps) to no avail. I can only boot in upstart mode on these kernels. There were no problems when loading 4.4.0-112.
Adding "noibpb" to GRUB_CMDLINE_LINUX= in /etc/default/grub allowed the machine to boot fully without problems.

Sander (3-sander) wrote :

Confirmed, 4.4.0-119-generic #143 does not solve this problem; there is still a hard freeze during boot. Adding "noibpb" to GRUB_CMDLINE_LINUX= in /etc/default/grub allowed the machine to boot. Interestingly enough, my server is on VMware, so the problem exists there too.

Joseph Salisbury (jsalisbury) wrote :

Can those affected by this bug please test the following kernel:
https://launchpad.net/ubuntu/+source/linux/4.15.0-15.16/+build/14530348

To test the kernel, install both the linux-image and linux-image-extra .deb packages.

Mike Turner (q-mike-5) wrote :

I've just rebooted five times without any lock ups.

I'll leave it running, and see what happens.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing, Mike. That is the kernel that will ship with 18.04 (Bionic). We may need to perform a "reverse" kernel bisect to identify the commit in Bionic that fixes this bug.

Ivan Ivanov (her-ivanov) wrote :

No problems booting 4.15.0-15.16 (with noibpb not set in GRUB_CMDLINE_LINUX=).

Andreas Argyris (anargiri) wrote :

Hello,

I can also confirm that there are no boot problems with 4.15.0-15.16 with noibpb not set in GRUB_CMDLINE_LINUX.
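
To double-check whether noibpb is actually in effect for the current boot, the running kernel's command line can be inspected. A minimal sketch, where cmdline_has_param is a hypothetical helper and /proc/cmdline is assumed to be readable:

```shell
# Succeed if the kernel command line in $1 contains parameter $2 as a
# whole word (so it does not match inside a longer parameter name).
cmdline_has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if [ -r /proc/cmdline ]; then
    if cmdline_has_param "$(cat /proc/cmdline)" noibpb; then
        echo "noibpb is set for this boot"
    else
        echo "noibpb is not set for this boot"
    fi
fi
```

This only reflects the boot that is currently running; a change in /etc/default/grub takes effect after update-grub and a reboot.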
