Freezing on boot since kernel 4.15.0-72-generic release

Bug #1856387 reported by Anthony Buckley on 2019-12-14
This bug affects 2 people
Affects              Status         Importance  Assigned to
linux (Ubuntu)       Invalid        Undecided   Unassigned
  Bionic             Fix Committed  High        You-Sheng Yang
linux-oem (Ubuntu)   Invalid        Undecided   Unassigned
  Bionic             Fix Released   Undecided   You-Sheng Yang

Bug Description

[SRU Justification]

[Impact]
In bug 1840239, HPET was disabled on some systems because it caused the
TSC to be marked unstable when it was not. This caused a regression,
bug 1851216, in which some systems may hang early in boot, so a fix was
cherry-picked from v5.3-rc1. However, that fix introduced yet another
regression: some other systems may hang at boot because the previous
fix leaves the PIT disabled.

[Fix]
Commit 979923871f69 ("x86/timer: Don't skip PIT setup when APIC is
disabled or in legacy mode") from v5.6-rc1, also backported to v5.4.19
and v5.5.3, fixes PIT setup in this case.

[Test Case]
Simply boot a patched kernel on an affected system; it should not hang.

[Regression Potential]
Low. Stable patch and trivial backport.

[Other Info]
The same fix for bug 1851216 was also backported to Disco and Eoan, but
they were then fixed by commit 979923871f69, backported in bug 1866858
and bug 1867051, which pull v5.4 stable patches into Disco and Eoan
respectively, leaving B/OEM-B the only victims so far.

========== Original Bug Description ==========

After the update to install kernel 4.15.0-72-generic (a bit over a week ago) my computer will not boot. On boot, all I see is the purple screen with:
Loading Linux 4.15.0-72-generic ...
Loading initial ramdisk ...
and nothing happens. Just sits there. I've waited about 5-10 minutes on occasion but to no avail.
I've checked a number of logs in /var/log but not found anything.

If I go into the advanced options and select kernel 4.15.0-70-generic, the computer boots normally.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-72-generic 4.15.0-72.81
ProcVersionSignature: Ubuntu 4.15.0-70.79-generic 4.15.18
Uname: Linux 4.15.0-70-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: tony 1977 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Sat Dec 14 21:53:14 2019
HibernationDevice: RESUME=UUID=5475ce25-e091-45e2-9811-9b5cddc08dd1
InstallationDate: Installed on 2018-09-16 (454 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 04f2:b59e Chicony Electronics Co., Ltd
 Bus 001 Device 002: ID 046d:c063 Logitech, Inc. DELL Laser Mouse
 Bus 001 Device 004: ID 8087:0aaa Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: GIGABYTE Sabre 17WV8
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-70-generic root=UUID=9455257c-d3b7-4d61-853d-ab0b0ee40013 ro acpi=off
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-70-generic N/A
 linux-backports-modules-4.15.0-70-generic N/A
 linux-firmware 1.173.13
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/22/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F05
dmi.board.asset.tag: Tag 12345
dmi.board.name: Sabre 17WV8
dmi.board.vendor: GIGABYTE
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: GIGABYTE
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF05:bd05/22/2018:svnGIGABYTE:pnSabre17WV8:pvrNotApplicable:rvnGIGABYTE:rnSabre17WV8:rvrNotApplicable:cvnGIGABYTE:ct10:cvrN/A:
dmi.product.family: Sabre
dmi.product.name: Sabre 17WV8
dmi.product.version: Not Applicable
dmi.sys.vendor: GIGABYTE

Anthony Buckley (tony-buckley) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
You-Sheng Yang (vicamo) wrote :

Hi, two questions.

1. Do you have a dmesg from booting the -72 kernel? The one attached is from -70.

2. Could you try booting with an extra kernel parameter "hpet=disable"?

You-Sheng Yang (vicamo) wrote :

For 2), I mean booting -70 with "hpet=disable".
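
To confirm that a parameter such as "hpet=disable" actually reached the kernel on a given boot, the running command line can be inspected. A minimal sketch, assuming a Linux system that exposes /proc/cmdline; the cmdline_has helper is a hypothetical name for illustration, not part of any existing tool:

```shell
#!/bin/sh
# Return 0 if kernel parameter $1 appears as a whole word in command line $2.
# (cmdline_has is a hypothetical helper name used only for this sketch.)
cmdline_has() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Command line of the currently running kernel (empty if /proc is unavailable).
current=$(cat /proc/cmdline 2>/dev/null || true)

if cmdline_has "hpet=disable" "$current"; then
    echo "hpet=disable is active for this boot"
else
    echo "hpet=disable is not on the command line: $current"
fi
```

Matching on whole words avoids false positives, so a check for "acpi" does not match "acpi=off" as seen in the ProcKernelCmdLine field of this report.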

Anthony Buckley (tony-buckley) wrote :

Hello You-Sheng,
Thanks for responding.
I tried "hpet=disable" on a boot for both -70 and -72. No change. Kernel -72 still just stops, but -70 boots OK. As for the dmesg, I don't seem to be able to get any logging / messaging for -72. The only reference I can find for it is in the dpkg log when it was installed back on 4-Dec-2019. It's as if it does not even start. For what it's worth I've attached the dmesg for the -70 boot with "hpet=disable".

Anthony Buckley (tony-buckley) wrote :

Hello,
Just updating to say that I tried the upstream kernel below but the problem is still present.

Upstream kernel:

5.5.0-050500rc1-generic

Regards.
Tony

Anthony Buckley (tony-buckley) wrote :

Hello You-Sheng,
Thanks for responding again.
Yes, I was just looking at doing a bisect. I've done it before for another problem. I just have to familiarise myself again with the procedures for it. Will attend to it soon.
Regards

Anthony Buckley (tony-buckley) wrote :

Hello,
I have completed the bisect as requested and identified the problem commit. The bisect message is as follows:-

git bisect good
f723dd269d0740e09af47bb5590ffc4f61766153 is the first bad commit
commit f723dd269d0740e09af47bb5590ffc4f61766153
Author: Thomas Gleixner <email address hidden>
Date: Thu Nov 7 09:05:00 2019 +0100

    x86/timer: Skip PIT initialization on modern chipsets

    BugLink: https://bugs.launchpad.net/bugs/1851216

    Recent Intel chipsets including Skylake and ApolloLake have a special
    ITSSPRC register which allows the 8254 PIT to be gated. When gated, the
    8254 registers can still be programmed as normal, but there are no IRQ0
    timer interrupts.

    Some products such as the Connex L1430 and exone go Rugged E11 use this
    register to ship with the PIT gated by default. This causes Linux to fail
    to boot:

      Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
      apic=debug and send a report.

    The panic happens before the framebuffer is initialized, so to the user, it
    appears as an early boot hang on a black screen.

    Affected products typically have a BIOS option that can be used to enable
    the 8254 and make Linux work (Chipset -> South Cluster Configuration ->
    Miscellaneous Configuration -> 8254 Clock Gating), however it would be best
    to make Linux support the no-8254 case.

    Modern sytems allow to discover the TSC and local APIC timer frequencies,
    so the calibration against the PIT is not required. These systems have
    always running timers and the local APIC timer works also in deep power
    states.

    So the setup of the PIT including the IO-APIC timer interrupt delivery
    checks are a pointless exercise.

    Skip the PIT setup and the IO-APIC timer interrupt checks on these systems,
    which avoids the panic caused by non ticking PITs and also speeds up the
    boot process.

    Thanks to Daniel for providing the changelog, initial analysis of the
    problem and testing against a variety of machines.

    Reported-by: Daniel Drake <email address hidden>
    Signed-off-by: Thomas Gleixner <email address hidden>
    Tested-by: Daniel Drake <email address hidden>
    Cc: <email address hidden>
    Cc: <email address hidden>
    Cc: <email address hidden>
    Cc: <email address hidden>
    Cc: <email address hidden>
    Link: https://<email address hidden>

    (backported from commit c8c4076723daca08bf35ccd68f22ea1c6219e207)
    Signed-off-by: You-Sheng Yang <email address hidden>
    Acked-by: Stefan Bader <email address hidden>
    Acked-by: Connor Kuehl <email address hidden>
    Signed-off-by: Stefan Bader <email address hidden>

:040000 040000 9c51f067713006f928684555c3254e89bdc10361 ad4d7a34eed39a733c78e630f4d9125f67e001bb M arch

Regards

Anthony Buckley (tony-buckley) wrote :

Hello again, sorry I meant to add the git bisect log (FYI). See below:-

# bad: [48d6312566e04b7a713cc7c15ae7dcd37efcfa95] UBUNTU: Ubuntu-4.15.0-72.81
# good: [ad85666cf30fb921558424a18cabbf396361a90c] UBUNTU: Ubuntu-4.15.0-70.79
git bisect start 'Ubuntu-4.15.0-72.81' 'Ubuntu-4.15.0-70.79'
# good: [c76da031386f02b6738e6ba9a132dc3b817e8ba4] arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
git bisect good c76da031386f02b6738e6ba9a132dc3b817e8ba4
# bad: [4e160cf6dea3a8521bb8488532f18c69868622cf] mlxsw: spectrum: Set LAG port collector only when active
git bisect bad 4e160cf6dea3a8521bb8488532f18c69868622cf
# good: [7adb99a811f00df95c29751a0e77ffab97c96fd1] selftests: lib.mk set KSFT_TAP_LEVEL to prevent nested TAP headers
git bisect good 7adb99a811f00df95c29751a0e77ffab97c96fd1
# good: [b04e95663b835ee0e34d64ce9a6178b083b788ff] net: ena: remove inline keyword from functions in *.c
git bisect good b04e95663b835ee0e34d64ce9a6178b083b788ff
# bad: [4d4aa7f60b63e5b5477e2b3536633b4c28ef5f65] dm snapshot: rework COW throttling to fix deadlock
git bisect bad 4d4aa7f60b63e5b5477e2b3536633b4c28ef5f65
# good: [069a7c0f0d8cba118849543a4fc72384d3679fd6] UBUNTU: [Packaging] dkms -- dkms-build quieten wget verbiage
git bisect good 069a7c0f0d8cba118849543a4fc72384d3679fd6
# bad: [46888ff0dc436827f5fdfb0a77d7f50845f154ab] thermal: int340x: processor_thermal: Add GeminiLake support
git bisect bad 46888ff0dc436827f5fdfb0a77d7f50845f154ab
# bad: [f723dd269d0740e09af47bb5590ffc4f61766153] x86/timer: Skip PIT initialization on modern chipsets
git bisect bad f723dd269d0740e09af47bb5590ffc4f61766153
# good: [f25dc28338aa6277fa1e832416802c83bf8ed4e2] efi: efi_get_memory_map -- increase map headroom
git bisect good f25dc28338aa6277fa1e832416802c83bf8ed4e2
# first bad commit: [f723dd269d0740e09af47bb5590ffc4f61766153] x86/timer: Skip PIT initialization on modern chipsets
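
The log above came from manual bisection, since a kernel that hangs at boot has to be judged by hand at every step. Where the good/bad check can be scripted, "git bisect run" automates the same search. A minimal sketch in a throwaway repository; the file names, tags, and commit layout are invented for illustration and have nothing to do with the kernel tree:

```shell
#!/bin/sh
# Illustration only: a throwaway repository, NOT the Ubuntu kernel tree.
repo=$(mktemp -d) || exit 1
cd "$repo" || exit 1
git init -q . || exit 1
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# Ten commits; commit 7 introduces the pretend regression.
i=1
while [ "$i" -le 10 ]; do
    echo "$i" > step.txt
    if [ "$i" -ge 7 ]; then echo "broken" > status.txt; else echo "ok" > status.txt; fi
    git add step.txt status.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then git tag good-release; fi
    i=$((i + 1))
done
git tag bad-release

# Same argument order as in the log above: bad revision first, then good.
git bisect start bad-release good-release >/dev/null

# The test command exits 0 for a good revision and non-zero for a bad one.
git bisect run sh -c 'grep -qx ok status.txt' >/dev/null

# The bisect log records the culprit on its "# first bad commit" line.
first_bad=$(git bisect log | sed -n 's/^# first bad commit: \[[0-9a-f]*\] //p')
git bisect reset >/dev/null 2>&1
echo "first bad commit subject: $first_bad"

# Clean up the throwaway repository.
cd / && rm -rf "$repo"
```

For a boot hang like this one there is no exit status to script against, so each step still needs a manual "git bisect good" or "git bisect bad" after the test boot.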

You-Sheng Yang (vicamo) wrote :

Hi Anthony,

As you may have found, this commit landed via bug 1851216 to address a possible system hang caused by the change in bug 1840239. Since it is a backport from v5.3-rc1, and you report that the hang also reproduces on v5.5-rc1, you may want to either 1) try a slightly newer v5.5-rc5 mainline kernel[1], or 2) file an upstream bug in the kernel bugzilla[2].

[1]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
[2]: https://bugzilla.kernel.org/

Anthony Buckley (tony-buckley) wrote :

Hi You-Sheng,
Thanks for responding. Sadly, the problem is not fixed in v5.5-rc5 mainline kernel.
I'll look at lodging a bug in bugzilla.
Also I'll try putting some debug code around in that commit to see if I can identify anything.
Regards.

Tom Ivar Johansen (tijohansen) wrote :

Hi,
I seem to have the same problem. I have no experience with linux kernels or bug reports, but I am a computer engineer so with guidance I will be able to contribute to debugging.

I am running "Ubuntu 4.15.0-46.49-generic 4.15.18" on an HP ZBook; the lspci output is attached.

Both 4.15.0-72 and 4.15.0-74 failed as described by Anthony Buckley.

Anthony Buckley (tony-buckley) wrote :

I have filed a bug in bugzilla. Hopefully it's been done OK as I've not done this before.

It is as follows:-

Bug 206125 - Freezing on boot since kernel 4.15.0-72-generic release

Regards.

You-Sheng Yang (vicamo) wrote :

Anthony, you should state that this is still reproducible with v5.5-rc5, not only with Ubuntu 4.15.0-72-generic.

Include bugzilla url for further reference: https://bugzilla.kernel.org/show_bug.cgi?id=206125 .

Anthony Buckley (tony-buckley) wrote :

Thanks for your feedback, You-Sheng. I've added a comment to that effect now.
Regards.

Anthony Buckley (tony-buckley) wrote :

I have tested a proposed patch by Thomas Gleixner (<email address hidden>) at both the identified commit and also at the latest version and in both cases my computer booted successfully.
Comment also posted in bugzilla.
https://bugzilla.kernel.org/show_bug.cgi?id=206125
Regards

Tom Ivar Johansen (tijohansen) wrote :

I can confirm that I have applied the same patch to 4.15.0-76.86-generic. Without the patch my system failed as described above. With the patch it seems to work.

sirkku (sirkusmaisteri) wrote :

It seems that I have the same issue with my HP ZBook that Ubuntu doesn't boot since kernel 4.15.0-72-generic release.

You-Sheng Yang (vicamo) wrote :

Hi, for those who still suffer from this issue: it is a regression caused by commit f723dd269d07 ("x86/timer: Skip PIT initialization on modern chipsets"), which was backported to 4.15.0-71 in bug 1851216 as a fix for some other platforms. Please try the latest mainline kernel[1] if possible, as there might be a further upstream fix for this regression; we could then cherry-pick it back to 4.15 and fix both groups of hardware platforms. As far as we know, this issue was still reproducible on v5.5-rc5, so you may want to go straight to something newer than that.

[1]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

Anthony Buckley (tony-buckley) wrote :

Hello all,
Does anyone know if this bug will cause a problem upgrading to Ubuntu 20.04 LTS? I assume it will as we're stuck using an older kernel and 20.04 is based on kernel 5.4. Or, will it simply upgrade and keep us on the older kernel?
Regards.

You-Sheng Yang (vicamo) wrote :

@Anthony, you can try focal kernel directly on your Bionic installation first.

  # Write the focal sources list as root; the file lives under /etc.
  $ printf "deb http://archive.ubuntu.com/ubuntu/ focal main restricted universe multiverse\ndeb http://archive.ubuntu.com/ubuntu/ focal-updates main restricted universe multiverse\n" | sudo tee /etc/apt/sources.list.d/focal.list
  $ sudo apt update
  $ sudo apt install linux-modules-extra-5.4.0-26-generic linux-firmware/focal

It also seems that the upstream fix, commit 979923871f69 ("x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode"), has been backported to v5.4.19, so the focal kernel has included it via bug 1863588 since at least 5.4.0-15. It should therefore be fine for you to use the focal kernel now.

Or, if you would rather not pull kernels from Focal, just use the 5.3 kernels from Bionic, as they should have the same fix since 5.3.0-46.
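
The version thresholds mentioned here can be checked against a running kernel. A rough sketch, assuming GNU "sort -V" is available; version_ge and pit_fix_status are hypothetical helper names, and the two thresholds are simply the ones quoted in this comment (5.4.0-15 and 5.3.0-46), so other kernel series are reported as unknown rather than guessed:

```shell
#!/bin/sh
# Return 0 if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print "fixed", "older" or "unknown" for a `uname -r` style release string.
# Thresholds are the ones quoted in this thread, not an authoritative list.
pit_fix_status() {
    abi=${1%-*}                      # strip the flavour, e.g. "-generic"
    case "$abi" in
        5.4.0-*) min="5.4.0-15" ;;
        5.3.0-*) min="5.3.0-46" ;;
        *) echo "unknown"; return 0 ;;
    esac
    if version_ge "$abi" "$min"; then
        echo "fixed"
    else
        echo "older"
    fi
}

echo "$(uname -r): $(pit_fix_status "$(uname -r)")"
```

An "older" result only means the release is below the version at which the fix is known to be present according to this thread.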

Changed in linux (Ubuntu Bionic):
status: New → In Progress
assignee: nobody → You-Sheng Yang (vicamo)
You-Sheng Yang (vicamo) wrote :

Bug 1851216 backports commit c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") to Bionic and Disco, which then has a follow-up commit 979923871f69 ("x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode") landed in Eoan and Focal and on, leaving Bionic the only victim suffering from this issue and not yet EOL-ed.

You-Sheng Yang (vicamo) wrote :

Disco and OEM-OSP1-B have been fixed as well.

You-Sheng Yang (vicamo) wrote :

PPA for testing https://launchpad.net/~vicamo/+archive/ubuntu/ppa-1856387 . It would take several hours to publish built binaries.

Anthony Buckley (tony-buckley) wrote :

Hello You-Sheng,
I've only just noticed this. Thanks for responding. I've been a bit busy lately, but I'll either try one of the bionic 5.3 kernels or try your ppa test hopefully soon.
Thanks. Regards.

You-Sheng Yang (vicamo) wrote :

Please do try my ppa so that we can verify if it actually works and solve this problem for other Bionic users as well. Thank you.

Anthony Buckley (tony-buckley) wrote :

OK. I must confess I'm a bit vague on ppa's. Does it clone the kernel source and then I do a build and test?

Anthony Buckley (tony-buckley) wrote :

Hello You-Sheng,
(hope you are well by the way)
I've applied your ppa as described

sudo add-apt-repository ppa:vicamo/ppa-1856387
sudo apt-get update

However I'm not sure what to do next. I assume it installed some packages, but how do I test. I tried a reboot, but I couldn't see how it would work as the latest kernel I have available is 4.15.0-99 and I understand the changes are in the 5... kernels.
Do I need to get or build a new kernel?

Anthony Buckley (tony-buckley) wrote :

Hello again You-Sheng,
I think I get something. I think I understand what you mean by it would take several hours to publish built binaries. You want me to follow those links on the ppa page and do those git clones?
I'm doing that anyway just see what happens.
Regards.
Tony

Anthony Buckley (tony-buckley) wrote :

Hello yet again You-Sheng,
OK, I'm missing something here unfortunately. I've done the first clone:-

git clone -b bug-1856387/fix-PIT-skip/bionic https://git.launchpad.net/~vicamo/+git/ubuntu-kernel

but obviously the second will be a problem trying to clone into 'ubuntu-kernel'.
What is:-
git clone -b bug-1856387/fix-PIT-skip/bionic git+ssh://<email address hidden>/~vicamo/+git/ubuntu-kernel

Regards

You-Sheng Yang (vicamo) wrote :

Sorry for the late reply. I'm not going to make you compile a kernel, at least for now. I promise.

So it's pretty simple. After you have run the following two commands:

  $ sudo add-apt-repository ppa:vicamo/ppa-1856387
  $ sudo apt-get update

apt has rebuilt its database of installable packages, and all you need to do next is install one or more of the packages listed under the PPA's "View package details" link[1]. Since you were on the 4.15 generic kernel, please try the following:

  $ sudo apt install linux-modules-extra-4.15.0-100-generic=4.15.0-100.101+lp1856387 \
      linux-headers-4.15.0-100-generic=4.15.0-100.101+lp1856387

Note the "=" sign, which selects the exact package version from the PPA. apt will install all the prerequisite packages as well. Then reboot, select the 4.15.0-100-generic kernel from grub's menu, and see whether you now boot into the GUI as you should have long ago.

If you're also interested in having a try on -oem kernels, use:

  $ sudo apt install linux-image-unsigned-4.15.0-1080-oem=4.15.0-1080.90+lp1856387 \
      linux-headers-4.15.0-1080-oem=4.15.0-1080.90+lp1856387

[1]: https://launchpad.net/~vicamo/+archive/ubuntu/ppa-1856387/+packages
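
After rebooting, it helps to confirm that the running kernel really is the PPA test build and not a stock kernel picked by grub. A minimal sketch, assuming the Ubuntu-specific /proc/version_signature file is present and that the "+lp1856387" suffix from the package version shows up in it (both are assumptions; is_test_build is a hypothetical helper name):

```shell
#!/bin/sh
# Return 0 if the given version signature carries the PPA's "+lp1856387" suffix.
is_test_build() {
    case "$1" in
        *"+lp1856387"*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/version_signature is assumed to exist on Ubuntu kernels only.
if [ -r /proc/version_signature ]; then
    sig=$(cat /proc/version_signature)
    if is_test_build "$sig"; then
        echo "running the PPA test build: $sig"
    else
        echo "not the PPA test build: $sig"
    fi
else
    echo "/proc/version_signature not available (not an Ubuntu kernel?)"
fi
```

If the check reports a stock build, the wrong entry was probably selected in grub's advanced options menu.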

Anthony Buckley (tony-buckley) wrote :

Well done You-Sheng!
Tried both packages and they worked fine. Don't worry about the delay, I have more than enough to keep me occupied at the moment.
Thanks much for your efforts here. What happens next, just wait for 20.04.1 to release? Do I need to clean anything up or can I just leave as is for the moment?
Regards.
Tony

You-Sheng Yang (vicamo) wrote :

@Anthony, thank you. As commented in #23, it should be safe to upgrade your system to Eoan/Focal or later if you feel like it. I'll still do the follow-ups and send the fix to Bionic.

You-Sheng Yang (vicamo) on 2020-06-12
Changed in linux-oem (Ubuntu Bionic):
status: New → In Progress
You-Sheng Yang (vicamo) on 2020-06-12
Changed in linux-oem (Ubuntu Bionic):
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux-oem (Ubuntu):
status: New → Invalid
Stefan Bader (smb) on 2020-06-16
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
Anthony Buckley (tony-buckley) wrote :

Thanks You-Sheng, I'll set aside some time shortly to upgrade.
Regards.

You-Sheng Yang (vicamo) on 2020-06-22
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
AceLan Kao (acelankao) on 2020-07-03
Changed in linux-oem (Ubuntu Bionic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem - 4.15.0-1093.103

---------------
linux-oem (4.15.0-1093.103) bionic; urgency=medium

  * bionic/linux-oem: 4.15.0-1093.103 -proposed tracker (LP: #1887026)

  * [SRU] plug headset won't proper reconfig ouput to it on machine with default
    output (LP: #1882248)
    - SAUCE: ALSA: hda - let hs_mic be picked ahead of hp_mic

  * Freezing on boot since kernel 4.15.0-72-generic release (LP: #1856387)
    - x86/timer: Don't skip PIT setup when APIC is disabled or in legacy mode

  [ Ubuntu: 4.15.0-112.113 ]

  * bionic/linux: 4.15.0-112.113 -proposed tracker (LP: #1887048)
  * Packaging resync (LP: #1786013)
    - update dkms package versions
  * CVE-2020-11935
    - SAUCE: aufs: do not call i_readcount_inc()
    - SAUCE: aufs: bugfix, IMA i_readcount
  * CVE-2020-10757
    - mm: Fix mremap not considering huge pmd devmap
  * Update lockdown patches (LP: #1884159)
    - efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
    - efi: Restrict efivar_ssdt_load when the kernel is locked down
    - powerpc/xmon: add read-only mode
    - powerpc/xmon: Restrict when kernel is locked down
    - [Config] CONFIG_XMON_DEFAULT_RO_MODE=y
    - SAUCE: acpi: disallow loading configfs acpi tables when locked down
  * seccomp_bpf fails on powerpc (LP: #1885757)
    - SAUCE: selftests/seccomp: fix ptrace tests on powerpc
  * Introduce the new NVIDIA 418-server and 440-server series, and update the
    current NVIDIA drivers (LP: #1881137)
    - [packaging] add signed modules for the 418-server and the 440-server
      flavours

  [ Ubuntu: 4.15.0-111.112 ]

  * bionic/linux: 4.15.0-111.112 -proposed tracker (LP: #1886999)
  * Bionic update: upstream stable patchset 2020-05-07 (LP: #1877461)
    - SAUCE: mlxsw: Add missmerged ERR_PTR hunk
  * linux 4.15.0-109-generic network DoS regression vs -108 (LP: #1886668)
    - SAUCE: Revert "netprio_cgroup: Fix unlimited memory leak of v2 cgroups"

 -- Kelsey Skunberg <email address hidden> Tue, 14 Jul 2020 12:21:34 -0600

Changed in linux-oem (Ubuntu Bionic):
status: Fix Committed → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic