System freezes when starting Xorg after installing linux-image-4.13.0-32-generic

Bug #1746418 reported by Sandra Ray on 2018-01-31
This bug affects 5 people
Affects           Importance  Assigned to
linux (Ubuntu)    High        Joseph Salisbury
  Artful          High        Joseph Salisbury

Bug Description

My laptop (Lenovo T440) freezes when starting Xorg (lightdm), but only since I installed the latest Ubuntu kernel for 17.10, linux-image-4.13.0-32-generic.

I'm running Xorg instead of Mir because Mir crashed when I initially upgraded from 17.04 to 17.10, but that's a different problem for a different day. It has been running stably on Xorg for at least 2-3 months.

If I boot with the previous kernel version on my /boot (linux-image-4.13.0-25-generic), everything works fine.

If I boot with the latest mainline kernel from the Ubuntu mainline repo (4.15.0), everything works.

Using GRUB's "Recovery Mode" option for the 4.13.0-32 kernel, I can use the CLI and access the encrypted root filesystem, which is how I generated the attached apport report. It freezes only when I tell it to resume normal booting, which is why I suspect Xorg is the problem.

When the system freezes, the fan starts to spin up. I cannot switch virtual consoles using CTRL-ALT-F{1-7}. CTRL-ALT-DEL pressed repeatedly doesn't do anything. To reboot, I have to hold the power button until the system powers off.

Sandra Ray (sandraray) wrote :
tags: added: kernel-fixed-upstream
tags: added: kernel-fixed-upstream-4.15

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1746418

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Sandra Ray (sandraray) on 2018-01-31
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
SkySurfer (mueller-g) wrote :

I was able to capture the output by disabling the Plymouth splash screen:

[ 61.359631] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[ 64.063967] watchdog: BUG: soft lockup - CPU6 stuck for 22s! [snap-exec:6105]
[ 64.063968] watchdog: BUG: soft lockup - CPU7 stuck for 22s! [systemd:6412]
[ 80.059966] watchdog: BUG: soft lockup - CPU0 stuck for 22s! [kworker/0::0:3]
[ 84.823028] NMI watchdog: Watchdog detected hard LOCKUP on cpu 5

Joseph Salisbury (jsalisbury) wrote :

I'd like to perform a "Reverse" bisect to figure out what commit fixes this bug. We need to identify the last kernel version that had the bug, and the first kernel version that fixed the bug.

Can you test the following kernels and report back:

v4.14 final: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14/
v4.15-rc1: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc1/
v4.15-rc4: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc4/

You don't have to test every kernel, just up until the first kernel that does not have the bug.
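The request above amounts to testing the candidate kernels in order, oldest first, and stopping at the first one that boots. A minimal bash sketch of that scan, assuming the candidates are listed oldest to newest and that CHECK is a hypothetical command you supply (for example, one that asks you after each manual boot) which exits 0 when that kernel still has the bug:

```shell
# reverse_scan CHECK VERSION...
# Tests each VERSION in the order given (oldest first). CHECK is a
# hypothetical, user-supplied command that exits 0 if the kernel still
# has the bug. Prints the last buggy version and the first fixed one.
reverse_scan() {
  local check=$1; shift
  local last_bad="" v
  for v in "$@"; do
    if "$check" "$v"; then
      last_bad=$v                       # still broken, keep going
    else
      echo "last bad: ${last_bad:-<none tested>}"
      echo "first fixed: $v"
      return 0
    fi
  done
  echo "no fixed kernel among the candidates" >&2
  return 1
}

# Example with a hypothetical interactive check:
#   ask_user() { local a; read -r -p "Does $1 still hang? [y/N] " a; [[ $a == [yY]* ]]; }
#   reverse_scan ask_user v4.14 v4.15-rc1 v4.15-rc4
```

If the very first candidate already boots, the sketch reports no tested bad kernel, which means the search has to move further back.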

Thanks in advance!

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
tags: added: performing-bisect
Sandra Ray (sandraray) wrote :

I can confirm that v4.14 final works.

Also, I made an obvious blunder in the initial report: I had to disable *Wayland* after upgrading to 17.10, not Mir. That must've been a crossover from an alternate universe where Ubuntu went with Mir :)

Dominik George (natureshadow) wrote :

I can confirm the issue on a Lenovo T470s. That machine runs 16.04 LTS, but with the same kernel package.

Disabling lightdm (and thus the startup of X) does not fix the issue here, but everything else is identical.

Dominik George (natureshadow) wrote :

I have to add: we have quite a few laptops of this kind that show the issue, but one doesn't, even though it is the same hardware with the same firmware versions.

Dominik George (natureshadow) wrote :

Another update: 4.4.0-112-generic from 16.04 shows the same issue. I do not know how to mark that in Launchpad.

Dominik George (natureshadow) wrote :

I verified again that it is not X. The hang also occurs when booting into multi-user.target and even after purging X.org.

I felt adventurous and did the following:

I booted into rescue mode. Then I had a shell loop start every service in /lib/systemd/system with a 1-second delay between them. All services came up without issues, and I got the system into the same state that multi-user.target would have reached.
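A loop like the one described might look like the bash sketch below. It is a reconstruction under stated assumptions, not the script actually used: it skips template units (name@.service), which cannot be started without an instance name, and, as a precaution added here, units whose names suggest a state change that would end the test (reboot, poweroff, halt, kexec, suspend, hibernate, rescue, emergency). That exclusion list is not exhaustive.

```shell
# start_units_slowly [DIR] [DELAY]
# Starts each *.service unit in DIR (default /lib/systemd/system) one at
# a time, pausing DELAY seconds (default 1) between units, and prints each
# name first, so the last name shown before a hang points at a suspect.
start_units_slowly() {
  local dir=${1:-/lib/systemd/system} delay=${2:-1}
  local path name
  for path in "$dir"/*.service; do
    [ -e "$path" ] || continue          # glob matched nothing
    name=${path##*/}
    case $name in
      *@.service) continue ;;           # template unit, needs an instance
      *reboot*|*poweroff*|*halt*|*kexec*|*suspend*|*hibernate*|*hybrid-sleep*|*rescue*|*emergency*)
        continue ;;                     # precaution: would end the test
    esac
    echo "starting $name"
    systemctl start "$name" || echo "failed: $name" >&2
    sleep "$delay"
  done
}
```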

Joseph Salisbury (jsalisbury) wrote :

@Sandra Ray, can you give 4.14-rc1 a try:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc1/

SkySurfer (mueller-g) wrote :

I can confirm that 4.14-rc1 is working fine on W540

Sandra Ray (sandraray) wrote :

4.14-rc1 boots ok!

Also, now that Dominik mentions it, I'm not 100% sure that the hang is happening in X; it could be in any of a number of startup programs.

Dominik George (natureshadow) wrote :

4.14-rc1 also works here.

Any updates?

Ingemar Fällman (mrsmurf) wrote :

Could it be the same issue as #1745349? It looks like it is the Spectre fix that causes the issue.
The problem goes away for me on the 4.4.0-112 kernel with the "noibpb" kernel argument on boot.
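To confirm that a given boot actually picked up the argument, you can look for it as a whole word in /proc/cmdline. A minimal bash sketch; the function name is hypothetical, and the optional FILE argument exists only so it can be tried against a saved command line:

```shell
# booted_with FLAG [FILE]
# Succeeds if FLAG appears as a separate word on the kernel command line.
# FILE defaults to /proc/cmdline.
booted_with() {
  local flag=$1 file=${2:-/proc/cmdline} word
  local -a words
  read -r -a words < "$file"            # split the single line into words
  for word in "${words[@]}"; do
    [ "$word" = "$flag" ] && return 0
  done
  return 1
}

# Example: booted_with noibpb && echo "running with noibpb"
```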

FaberfoX (faberfox) wrote :

I've been having this same issue on Ubuntu MATE 17.10 on a ThinkPad T450s since kernel 4.13.0-25, and have been booting the previous 4.13.0-21 instead. I just tried the suggestion in comment #14 (adding noibpb) and it fixed it here.

Joseph Salisbury (jsalisbury) wrote :

We now know the fix is in v4.14-rc1. Can folks affected by this bug confirm that v4.13 final is bad? If it is, I'll start a reverse bisect between 4.13 final and v4.14-rc1.

4.13 final is available from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13/
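Git can run such a reverse bisect directly if the usual good/bad terms are renamed; the --term-old/--term-new options need Git 2.7 or later. A sketch, assuming a checkout of the mainline kernel tree that has the v4.13 and v4.14-rc1 tags; the wrapper function name is hypothetical:

```shell
# reverse_bisect_start FIXED BROKEN
# Starts a bisect in which the older state is "broken" and the newer state
# is "fixed", so Git searches for the first commit that fixes the bug.
# With custom terms, the first revision given is the new (fixed) one.
reverse_bisect_start() {
  git bisect start --term-old=broken --term-new=fixed "$1" "$2"
}

# Typical session (not run here):
#   reverse_bisect_start v4.14-rc1 v4.13
#   # build, install and boot the commit Git checks out, then mark it:
#   git bisect broken    # this build still hangs
#   git bisect fixed     # this build boots
#   git bisect reset     # once Git reports the first fixed commit
```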

Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
SkySurfer (mueller-g) wrote :

I have tested 4.13 final from your link and it boots up.

Joseph Salisbury (jsalisbury) wrote :

Thanks for testing. That could mean that a SAUCE patch caused this regression. I'll review the git log. If nothing sticks out, we can perform a regular kernel bisect of the Ubuntu kernels.

Joseph Salisbury (jsalisbury) wrote :

Before starting the bisect, can you see if the latest Artful kernel still has the bug? It is in the updates as kernel version 4.13.0-36.

Changed in linux (Ubuntu):
importance: Medium → High
Changed in linux (Ubuntu Artful):
importance: Medium → High
Sandra Ray (sandraray) wrote :

4.13.0-36 hangs the system.

Adding "noibpb" to the kernel command line fixes the problem for me too.
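On Ubuntu the usual way to make a kernel argument persistent is to add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then run sudo update-grub. The bash sketch below performs the edit. The function name is hypothetical; it assumes GNU sed, the stock single-line double-quoted form of the variable, an argument without regex or sed special characters, and a root shell when editing the real file.

```shell
# add_kernel_arg ARG [FILE]
# Appends ARG to GRUB_CMDLINE_LINUX_DEFAULT in FILE (default
# /etc/default/grub) unless it is already there, keeping FILE.bak.
add_kernel_arg() {
  local arg=$1 file=${2:-/etc/default/grub}
  if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?$arg( [^\"]*)?\"" "$file"; then
    return 0                            # already present, nothing to do
  fi
  cp -- "$file" "$file.bak"             # backup of the original
  # Append before the closing quote, then drop a leading space left
  # behind when the value was empty.
  sed -i -E \
    -e "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 $arg\"/" \
    -e 's/^(GRUB_CMDLINE_LINUX_DEFAULT=") /\1/' "$file"
}

# Afterwards: sudo update-grub, then reboot.
```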

tags: added: pti
Joseph Salisbury (jsalisbury) wrote :

I built an Artful test kernel with a revert of commit: 8578993

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746418

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.
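Both packages can be installed with a single dpkg call once they are downloaded. A bash sketch, assuming the .deb files sit in one directory and follow the usual naming (linux-image-<version>... and linux-image-extra-<version>...); the helper name is hypothetical:

```shell
# kernel_debs [DIR]
# Prints the linux-image and linux-image-extra .deb files in DIR
# (default .) and fails unless exactly one of each is present.
kernel_debs() {
  local dir=${1:-.}
  local -a image extra
  image=( "$dir"/linux-image-[0-9]*.deb )   # excludes linux-image-extra-*
  extra=( "$dir"/linux-image-extra-*.deb )
  if (( ${#image[@]} != 1 || ${#extra[@]} != 1 )) ||
     [ ! -e "${image[0]}" ] || [ ! -e "${extra[0]}" ]; then
    echo "need exactly one linux-image and one linux-image-extra .deb in $dir" >&2
    return 1
  fi
  printf '%s\n' "${image[0]}" "${extra[0]}"
}

# Usage (relies on the file names containing no spaces):
#   sudo dpkg -i $(kernel_debs ~/Downloads/lp1746418)
```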

Thanks in advance!

Lars Ehrhardt (lehrhardt) wrote :

Hi there,

We tried this new kernel; our test machine still locked up during the boot process while starting services.

Sandra Ray (sandraray) wrote :

@jsalisbury, sorry, your test kernel still hangs (though it still boots with "noibpb" passed as a kernel flag).

Also, do you know if HTTPS could be enabled for kernel.ubuntu.com? While it's unlikely to happen, I wouldn't want some evil man in the middle to corrupt those beautiful kernel bits.

Joseph Salisbury (jsalisbury) wrote :

I built an Artful test kernel with a revert of commit 96d520d. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1746418

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Sandra Ray (sandraray) wrote :

@jsalisbury: 96d520d boots!

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the proposed kernel and post back whether it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

Sandra Ray (sandraray) wrote :

@jsalisbury : linux-image version 4.13.0-38.43 from artful-proposed still hangs. :(

Joseph Salisbury (jsalisbury) wrote :

This bug is probably a duplicate of bug 1745349 in Xenial.

Sandra Ray (sandraray) wrote :

For anyone looking for a temporary, unsupported workaround on ThinkPads that doesn't disable the Meltdown/Spectre protections via noibpb, the mainline kernels (4.14 and 4.15) seem to work.

You can find more information on running them at https://wiki.ubuntu.com/Kernel/MainlineBuilds

kolya (mar-kolya) wrote :

Looks like I've got this one too on Lenovo P50 with E3-1505M. I've created #1759997 since I was not able to trace it immediately to this bug.

The problem is that I had a fully working system until the 3.20180312.0 microcode update arrived, which made my system essentially unbootable. This was an unpleasant surprise and may be a widespread problem, since this microcode is now 'mainstream'.

Gerard Dethier (g-dethier) wrote :

Hi, I am also affected by this bug, but not on a Lenovo or ThinkPad laptop. Mine is an Asus Zenbook UX301L running Ubuntu 17.10. The latest working kernel version is 4.13.0-36-generic; versions 4.13.0-37 and 4.13.0-38 cause the freeze on boot. Adding the noibpb option solves the issue (at least with 4.13.0-38-generic).

Joseph Salisbury (jsalisbury) wrote :

Can those affected by this bug please test the following kernel:
https://launchpad.net/ubuntu/+source/linux/4.15.0-15.16/+build/14530348

To test the kernel, install both the linux-image and linux-image-extra .deb packages.

Sandra Ray (sandraray) wrote :

@jsalisbury, it boots!

Lars Ehrhardt (lehrhardt) wrote :

We can also confirm that kernel image 4.15.0-15.16 boots on one of our affected machines.

Sandra Ray (sandraray) wrote :

I was able to successfully boot the latest 17.10 kernel: linux-image-4.13.0-39-generic

This issue is resolved.

Thanks!
