intel-microcode on ASUS makes kernel stuck during loading initramfs on bionic-updates, bionic-security

Bug #1829620 reported by Mark on 2019-05-18
This bug affects 24 people
Affects / Importance / Assigned to:
- intel-microcode (Ubuntu) / Undecided / Steve Beattie
- linux (Ubuntu) / Undecided / Unassigned
- linux-hwe (Ubuntu) / Undecided / Unassigned
- linux-hwe-edge (Ubuntu) / Undecided / Unassigned

Bug Description

Description:
- my system gets stuck at "Booting, Loading initramfs" (the first 2 lines of booting, after grub)
- does not even show the enter cryptsetup passphrase
- affected kernels:
# apt list --installed |grep linux-signed
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
linux-signed-generic/bionic-security,bionic-updates,now 4.15.0.50.52 amd64 [installed]
linux-signed-generic-hwe-18.04/bionic-security,bionic-updates,now 4.18.0.20.70 amd64 [installed]
linux-signed-generic-hwe-18.04-edge/bionic-security,bionic-updates,now 5.0.0.15.71 amd64 [installed]

- the setup is not new, has been working perfectly before (about 7 days since my last restart?)

System:
- HW: ASUS Zenbook 14 UX433FN
- Ubuntu 18.04, running latest HWE, fully updated
- grub(-pc), cryptsetup (crypttab entries for custom encrypted LUKS setup),

Suspected/possible cause?:
- recent intel-microcode package update
- recent kernel package updates

Steps taken:
- tried to remove "splash quiet" from grub/kernel cmd line (also tried adding nosplash, noplymouth)
- completely removed nvidia drivers (apt purge *nvidia*)
- completely purged and reinstalled grub (grub-pc)
- completely purged and reinstalled all kernels (headers, modules, image, ..)
- toggle BIOS "fastboot" (now using OFF)
- toggle UEFI SecureBoot (now using ON)
- remove plymouth (apt remove *plymouth* , but the workaround is working with plymouth installed)

Workaround:
- so far, I'm only able to boot with non-Ubuntu kernel! (linux-image-liquorix-amd64)
- which needs "splash" option ON
- reinstall cryptsetup & update-grub (as suggested in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829620/comments/10 )

I am not sure how to get you more debug info, as this setup has been working before and this is a very early boot-process bug, so I can't even access dmesg etc.

EDIT:

Hypothesis:
Only affects ASUS with i7-8565U Whiskey Lake Intel CPU

Upstream Bug Report:
https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/1

WORKAROUND 1: disable intel microcode updates during boot
From this bug: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920
1/ add the boot parameter: dis_ucode_ldr to /etc/default/grub
2/ update-grub
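
The two steps above can be scripted roughly as follows. This is only a sketch: add_kernel_param is a hypothetical helper name (it is not part of GRUB), and it assumes the file holds a single active, double-quoted GRUB_CMDLINE_LINUX_DEFAULT line like the /etc/default/grub posted later in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB): append one kernel parameter to the
# active GRUB_CMDLINE_LINUX_DEFAULT line of a GRUB defaults file.
# Assumes a simple parameter without '/' or '&' and a double-quoted value.
add_kernel_param() {
    file=$1
    param=$2
    # Skip the edit if the parameter is already on the uncommented line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote; commented-out
    # lines (starting with '#') are left untouched.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Example (run as root, then regenerate the GRUB configuration):
#   add_kernel_param /etc/default/grub dis_ucode_ldr
#   update-grub
```

If the machine cannot boot at all, the same parameter can be typed once at the GRUB menu (press "e" on the entry and append it to the "linux" line), and the file can be edited after a successful boot.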

WORKAROUND 2: downgrade (and hold) intel-microcode to older version from bionic/main
apt install --reinstall intel-microcode=3.20180312.0~ubuntu18.04.1
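
The command above performs the downgrade but not the "hold". A minimal sketch of both steps, assuming the standard apt-mark tool is used for the hold; pin_microcode and the DRY_RUN variable are hypothetical names introduced here for illustration only:

```shell
#!/bin/sh
# Hypothetical helper: downgrade intel-microcode to a given version and put
# the package on hold so that later upgrades do not pull the new one back in.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
pin_microcode() {
    version=$1
    run=${DRY_RUN:+echo}
    $run apt install --reinstall "intel-microcode=${version}"
    $run apt-mark hold intel-microcode
}

# Example (run as root):
#   pin_microcode '3.20180312.0~ubuntu18.04.1'
# Check the hold with:   apt-mark showhold
# Release it later with: apt-mark unhold intel-microcode
```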

---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
CurrentDesktop: KDE
DistroRelease: KDE neon 18.04
InstallationDate: Installed on 2012-12-23 (2337 days ago)
InstallationMedia: Kubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.1)
Package: linux-hwe-edge (not installed)
Tags: bionic wayland-session
Uname: Linux 5.0.0-17.1-liquorix-amd64 x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm libvirtd lpadmin netdev plugdev sudo vboxusers video
_MarkForUpload: True
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
CurrentDesktop: KDE
DistroRelease: KDE neon 18.04
InstallationDate: Installed on 2012-12-23 (2339 days ago)
InstallationMedia: Kubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.1)
Package: linux-hwe-edge
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 5.0.0-15.16~18.04.1-generic 5.0.6
Tags: third-party-packages bionic wayland-session
Uname: Linux 5.0.0-15-generic x86_64
UnreportableReason: This is not an official KDE package. Please remove all third-party packages and try again.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm libvirtd lpadmin netdev plugdev sudo video
_MarkForUpload: True

Mark (markthecodehamster) wrote :

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="quiet splash ro rootflags=subvol=@ubuntu mem_sleep_default=deep acpi_backlight=vendor pcie_aspm=force acpi_osi=Linux dm_mod.use_blk_mq=1 quiet kaslr ro resume=/dev/mapper/swapDevice"
GRUB_CMDLINE_LINUX_DEFAULT="ro kaslr resume=/dev/mapper/swapDevice mem_sleep_default=deep dm_mod.use_blk_mq=1 acpi_osi=Linux acpi_backlight=vendor splash"
GRUB_CMDLINE_LINUX=""
# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"
# Uncomment to disable graphical terminal (grub-pc only)
GRUB_TERMINAL=console
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
GRUB_GFXMODE=auto
GRUB_GFXPAYLOAD_LINUX=text
# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true
# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"
# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

Mark (markthecodehamster) wrote :

/etc/crypttab

# Example crypttab file. Fields are: name, underlying device, passphrase, cryptsetup options.
ubuntuDevice UUID=2b44dbfa-f195-4a27-b126-c895c31a1bd5 none luks,initramfs,discard
swapDevice UUID=676be184-ca1a-464c-82f6-dfea11adb1b9 none luks,initramfs,discard

Mark (markthecodehamster) wrote :

/etc/fstab

# <file system> <mount point> <type> <options> <dump> <pass>
LABEL=ubuntu / btrfs defaults,compress=zstd,commit=120,ssd,subvol=@ubuntu 0 1
LABEL=boot /boot ext4 defaults 0 2
tmpfs /tmp tmpfs nosuid,nodev,noexec,size=5g,mode=1777 0 0 #moved to /etc/systemd/system/tmp.mount
tmpfs /run/shm tmpfs defaults,noexec,nosuid,nodev 0 0
proc /proc proc defaults,noexec,nosuid,nodev,hidepid=2 0 0
LABEL=ubuntu /home btrfs defaults,noexec,nosuid,nodev,compress=zstd,ssd,relatime,commit=120,subvol=@home 0 2
LABEL=swap none swap sw 0 0
LABEL=ubuntu /.snapshots btrfs defaults,noexec,nosuid,nodev,compress=zstd:6,ssd,relatime,commit=120,subvol=@snapshots 0 0
LABEL=SYSTEM /boot/efi vfat defaults,utf8,umask=007,gid=46 0 0

description: updated

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1829620

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected bionic wayland-session
description: updated

apport information

> apport-collect 1829620
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

I have run apport-collect and uploaded its output,
BUT this is not on an original ubuntu kernel, as the issue prevents me from booting any of those.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux-hwe (Ubuntu):
status: New → Confirmed
Changed in linux-hwe-edge (Ubuntu):
status: New → Confirmed
Simon Allan (sysko-supinfo) wrote :

I don't know how I can help on this one, but a coworker and myself, both on Ubuntu, have been hit by the same problem

(stuck at loading initramfs)

if we try with an older kernel in the grub list it works the first time and then it does not work anymore after the next reboot

Simon Allan (sysko-supinfo) wrote :

I'm interested to know how you installed the custom kernel

Simon Allan (sysko-supinfo) wrote :

uninstalling cryptsetup and running update-grub seems to fix the problem in my case

Mark (markthecodehamster) wrote :

Hi Simon,

> a coworker and myself, both on Ubuntu, have been hit by the same problem

thanks for confirming the error! Could you state what HW you are running? The bug apparently does not hit everybody.

> if we try with an older kernel in the grub list it works the first time and then it does not work anymore after the next reboot

OK, I can confirm this behavior. This nondeterminism is what confused me: I thought I had fixed the bug and then it hit again. Even my "hack" with a non-Ubuntu kernel suffers from this: first run OK, then stuck!
I would strongly expect /boot/, grub, and the initramfs to stay constant across reboots. So is it something stateful in the BIOS that changes during hard power-offs/resets?

> I'm interested to know how you installed the custom kernel

It's from this PPA:
deb http://ppa.launchpad.net/damentz/liquorix/ubuntu bionic main
(and, as I discovered thanks to your comment, unfortunately even this kernel does not work around the issue)

> uninstalling cryptsetup and running update-grub seems to fix the problem in my case

Do you use cryptsetup for encryption? I'm trying REinstalling; I hope uninstalling is not the only option, as it's not applicable to me.

PS: could someone please bump priority of this bug?

Mark (markthecodehamster) wrote :

> uninstalling cryptsetup and running update-grub seems to fix the problem in my case

Awesome, thank you Simon!
For me this fixes the issue with the non-Ubuntu kernel (I don't have the old versions of the official kernels to try on).
So the final workaround is:
- have non-ubuntu (or old?) kernel
- splash ON
- reinstall cryptsetup

Then this kernel boots OK even after repeated reboots.
Without reinstalling cryptsetup this works only once until broken again.
Latest ubuntu kernels still do not work (for me, do they for you?)

So at this stage I'm at least able to use the computer again.

Mark (markthecodehamster) wrote :

Adding cryptsetup to the list of affected, as even for non-ubuntu kernels, action is required (reinstall cryptsetup) to fix this.

description: updated
Simon Allan (sysko-supinfo) wrote :

> Do you use cryptsetup for encryption?
no

> Hardware
We use a laptop Asus vivobook S15 (we both have the very same model)

TJ (tj) wrote :

Mark:

With a LUKS encrypted system, when a new kernel is installed "update-initramfs -u -k $KERNEL_VERSION" is executed.

As part of that, the cryptsetup hook scripts are called. They examine /etc/fstab and /etc/crypttab to determine if the root file-system, or swap (which may be used for hibernation), is encrypted.

If so cryptsetup and its supporting libraries and scripts are copied into the initrd.img-$KERNEL_VERSION file that is being built.

Additionally, the /etc/crypttab entry for the root file-system device is added to the initramfs's /conf/conf.d/cryptroot.

At boot-time the initramfs cryptsetup scripts read this file and should unlock the LUKS container.

As that is not happening you should, when dropped at the initramfs shell, check for the existence of the config and the tooling:

# ls -l /conf/conf.d/cryptroot /bin/cryptroot-unlock /sbin/cryptsetup /lib/cryptsetup/askpass

If they are present you can manually unlock using:

# cryptsetup open /dev/sdXY sdXY_crypt

Note: identify the LUKS container using:

# blkid | grep crypt_LUKS

and replace my example 'sdXY' with your device name.

After a successful unlock scan for LVM volumes:

# vgchange -ay

Then let the init system resume operations by pressing Ctrl+D or typing:

# exit

If you're not in the initramfs but are looking at the broken system's file system from a LiveISO or similar you can check whether the cryptsetup tools are included in the initrd.img file using this command:

# sudo ls -l /boot/initrd.img*

# sudo lsinitramfs /boot/initrd.img-$KERNEL_VERSION | grep crypt

cryptroot-keyfiles
cryptroot-keyfiles/cryptswap1.key
cryptroot-keyfiles/LUKS_VG02.key
sbin/cryptsetup
usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
conf/conf.d/cryptroot
bin/cryptroot-unlock
lib/cryptsetup
lib/cryptsetup/askpass
lib/x86_64-linux-gnu/libgcrypt.so.20.2.1
lib/x86_64-linux-gnu/libcryptsetup.so.12
lib/x86_64-linux-gnu/libgcrypt.so.20
lib/x86_64-linux-gnu/libcryptsetup.so.12.2.0
lib/modules/5.1.0-050100-lowlatency/kernel/crypto
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/crypto_simd.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/xor.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/cryptd.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/ecdh_generic.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/async_tx
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/async_tx/async_tx.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/async_tx/async_memcpy.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/async_tx/async_raid6_recov.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/async_tx/async_pq.ko
lib/modules/5.1.0-050100-lowlatency/kernel/crypto/async_tx/async_xor.ko
lib/modules/5.1.0-050100-lowlatency/kernel/drivers/md/dm-crypt.ko
lib/modules/5.1.0-050100-lowlatency/kernel/arch/x86/crypto
lib/modules/5.1.0-050100-lowlatency/kernel/arch/x86/crypto/glue_helper.ko
lib/modules/5.1.0-050100-lowlatency/kernel/arch/x86/crypto/aes-x86_64.ko
lib/modules/5.1.0-050100-lowlatency/kernel/arch/x86/crypto/aesni-intel.ko
scripts/local-bottom/cryptopensc
scripts/local-block/cryptroot
scripts/local-top/cryptroot
scripts/local-top/cryptopensc
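
The lsinitramfs check above can be scripted roughly as follows. This is a sketch only: check_initrd_crypt is a hypothetical helper name, and the four paths are simply the ones named in the "ls -l" line and the listing above; other releases may lay out the initramfs differently.

```shell
#!/bin/sh
# Hypothetical helper: read an lsinitramfs listing on stdin and report
# whether the cryptsetup config and tooling named above are present.
# Returns non-zero if any of the expected paths is missing.
check_initrd_crypt() {
    listing=$(cat)
    missing=0
    for path in conf/conf.d/cryptroot bin/cryptroot-unlock \
                sbin/cryptsetup lib/cryptsetup/askpass; do
        # -x: whole-line match, -F: treat the path as a fixed string
        if printf '%s\n' "$listing" | grep -qxF "$path"; then
            echo "ok      $path"
        else
            echo "MISSING $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   sudo lsinitramfs /boot/initrd.img-$KERNEL_VERSION | check_initrd_crypt
```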

TJ (tj) wrote :

Mark

I just realised your report isn't about cryptsetup being the problem and you never reach the initramfs shell.

Your report that the last thing you see is "Booting, Loading initramfs" tells us the kernel isn't starting. Those messages come from GRUB when it loads the kernel and initrd.img into memory.

I'm wondering if the recent Intel microcode updates and related kernel patches could be responsible? Can you tell us what CPU the system has?

$ cat /proc/cpuinfo

Mark (markthecodehamster) wrote :

> > Hardware
> We use a laptop Asus vivobook S15 (we both have the very same model)

This leads me to a hypothesis: does the bug affect only(?) ASUS machines with later (Kaby Lake) Intel CPUs?

summary: - cryptsetup stuck at loading initramfs
+ ASUS microcode/kernel stuck at loading initramfs
Changed in cryptsetup (Ubuntu):
status: New → Invalid
description: updated
tags: removed: wayland-session

hi TJ, thank you very much for the helpful debug instructions!

> I just realised your report isn't about cryptsetup being the problem and you never reach the initramfs shell.

Yes, this is the case. Still, checked the lsinitramfs and the image files are OK, with all the cryptsetup binaries.

I have removed cryptsetup from the linked dependencies.

> $ cat /proc/cpuinfo

attached /proc/cpuinfo from my machine.
Given @Simon's comment that he's also on ASUS, my suspicion is that this affects only these machines.

Mark (markthecodehamster) wrote :

ASUS zenbook ux433fn CPU info on machine hit by this bug.

Mark (markthecodehamster) wrote :

> I'm wondering if the recent Intel microcode updates and related kernel patches could be responsible? Can you tell us what CPU the system has?

And BINGO!
This was my suspicion too, I found a way to disable microcode updates to verify this:
WORKAROUND: disable intel microcode updates
From this bug: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920
1/ add the boot parameter: dis_ucode_ldr to /etc/default/grub
2/ update-grub

Results:
- linux-generic (4.15.0.50):
   - proceeds to boot (I see some dmesg output, modules loading; gets stuck on some err on loading i915 module, probably due to the lack of microcode and new HW?)

- linux-generic-hwe (4.18):
  - not tested yet

- linux-generic-hwe-edge (5.0):
  - boots without an error! OK :)

I'll be attaching the apport report for the ubuntu kernel I'm now able to boot.

@Simon, can you please test this workaround, to verify that this hits ASUS machines and to confirm that intel-microcode is to blame?

tags: added: third-party-packages wayland-session
description: updated

apport information

tags: added: microcode
removed: cryptsetup plymouth splash wayland-session
description: updated

Confirmed
WORKAROUND 2: downgrade (and hold) intel-microcode to older version from bionic/main
apt install --reinstall intel-microcode=3.20180312.0~ubuntu18.04.1

Q: How come the liquorix kernel boots OK with the current microcode?

Problem scoped down to:
intel-microcode from HWE (18.04.2) causes ASUS machines to get stuck during boot with any of the official Ubuntu kernels (after grub, before the initramfs is loaded; it probably fails when the microcode is applied?)

description: updated
summary: - ASUS microcode/kernel stuck at loading initramfs
+ intel-microcode on ASUS makes kernel stuck during loading initramfs on
+ HWE
summary: intel-microcode on ASUS makes kernel stuck during loading initramfs on
- HWE
+ bionic-updates, bionic-security
Changed in intel-microcode (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon) wrote :

Marking as an SRU regression and assigning to the Security Team uploader for follow-up.

tags: added: regression-security regression-update
no longer affects: cryptsetup (Ubuntu)
Changed in intel-microcode (Ubuntu):
assignee: nobody → Steve Beattie (sbeattie)
TJ (tj) wrote :

Mark:

An additional kernel command-line option that might reveal more would be remove "quiet splash" and add "debug early_print=XXX" where XXX is vga for BIOS-mode or efi for UEFI mode boots.

You may find "earlycon" added into the mix may also add more early messages.

See https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html

Steve Beattie (sbeattie) wrote :

Hi, sorry the intel-microcode update is causing problems. Could one of you with the affected processors report the output of:

  iucode-tool -S

Mark, is it correct that you are *not* seeing this with the 5.0 kernel from linux-hwe-edge? Can you report the output of 'dmesg | grep microcode' when booted in this configuration, to make sure that the initramfs is loading the updated microcode in that case?

Thanks!

Steve Beattie (sbeattie) wrote :

Also, do systems successfully boot when 'mds=no' is passed on the kernel command line with the problematic microcode in place?

Mark (markthecodehamster) wrote :

TJ,

> An additional kernel command-line option that might reveal more would be remove "quiet splash" and add "debug early_print=XXX" where XXX is vga for BIOS-mode or efi for UEFI mode boots.
You may find "earlycon"

no info provided using these options, still stuck at the initrd line without any output.

Mark (markthecodehamster) wrote :

Hi Steve,

no problem, thank you for digging into this!

> with the affected processors report the output of:
  iucode-tool -S

# iucode-tool -S
iucode-tool: system has processor(s) with signature 0x000806eb
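
For readers matching this signature against microcode file names or processor lists: the value is the CPUID signature, which packs family, model, and stepping into bit fields. A minimal sketch of the standard decoding (decode_cpu_signature is a hypothetical helper name, not part of iucode-tool):

```shell
#!/bin/sh
# Hypothetical helper: split an x86 CPUID signature, as printed by
# "iucode-tool -S", into family / model / stepping.
decode_cpu_signature() {
    sig=$(( $1 ))
    stepping=$(( sig & 0xf ))
    model=$(( (sig >> 4) & 0xf ))
    family=$(( (sig >> 8) & 0xf ))
    ext_model=$(( (sig >> 16) & 0xf ))
    ext_family=$(( (sig >> 20) & 0xff ))
    # The extended model bits only count for base family 6 or 15.
    if [ "$family" -eq 6 ] || [ "$family" -eq 15 ]; then
        model=$(( (ext_model << 4) + model ))
    fi
    # The extended family is only added for base family 15.
    if [ "$family" -eq 15 ]; then
        family=$(( family + ext_family ))
    fi
    printf 'family=0x%x model=0x%x stepping=0x%x\n' "$family" "$model" "$stepping"
}

# Example:
#   decode_cpu_signature 0x000806eb
#   family=0x6 model=0x8e stepping=0xb
```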

> is it correct that you are *not* seeing this with 5.0 kernel from the linux-hwe-edge kernel?

no. All Ubuntu kernels with the latest intel-microcode suffer this bug on my HW.
I need to do any of these three workarounds to be able to boot:
1/ use non-ubuntu kernel (I have liquorix)
2/ disable microcode updates during boot: dis_ucode_ldr param
3/ downgrade the intel-microcode package to bionic/main version

> report the output of 'dmesg | grep microcode' when booted in this configuration to make sure that the initramfs is loading the updated microcode in that case

# uname -a
Linux mmm-U2442 5.0.0-15-generic #16~18.04.1-Ubuntu SMP Tue May 7 14:17:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# dmesg |grep microcode
[ 0.185333] MDS: Vulnerable: Clear CPU buffers attempted, no microcode

(^^^ this is with latest intel-microcode package (from 2019) and the dis_ucode_ldr...so apparently the ucode is not loaded)

Mark (markthecodehamster) wrote :

> Also, do systems successfully boot when 'mds=no' is passed on the kernel command line with the problematic microcode in place?

no, this does not help

Tyler Hicks (tyhicks) wrote :

Mark, thanks for all the testing. Unfortunately, I asked Steve to have you try the wrong 'mds=' option.

Can you try to boot with the latest Ubuntu kernel, with the problematic microcode, using 'mds=off' on the kernel command line? (Note that it is 'off' instead of 'no')

If that doesn't work, can you try to boot with 'mitigations=off' passed on the kernel command line?

Steve Beattie (sbeattie) wrote :

Simon, can you post the contents of /proc/cpuinfo as well? It'd help with tracking down which processors might be affected by this. Thanks!

Tyler Hicks (tyhicks) wrote :

Mark, one more request for now. You say that you can boot up a non-Ubuntu kernel with the problematic microcode. Can you boot up one of those kernels and then verify the microcode revision with the following command:

  $ sudo cat /sys/devices/system/cpu/cpu0/microcode/version

Please paste the results. Also, the kernel version would be useful. Thanks again!

Mark (markthecodehamster) wrote :

Tyler,

> Can you try to boot with the latest Ubuntu kernel, with the problematic microcode, using 'mds=off' on the kernel command line? (Note that it is 'off' instead of 'no')

I tried with mds=off, but to no avail.

> If that doesn't work, can you try to boot with 'mitigations=off' passed on the kernel command line?

on the other hand, mitigations=off did cut it!
I guess now we need to find which of the mitigations is causing it? Do you have a hint? What are all the params for turning off kernel mitigations, please?

Mark (markthecodehamster) wrote :

PS,
#dmesg | grep microcode
#
returns nothing on the ubuntu hwe-edge kernel with mitigations=off (=problematic ucode otherwise loaded)

Actually, is the microcode loaded at all?

# cat /sys/devices/system/cpu/cpu0/microcode/version
cat: /sys/devices/system/cpu/cpu0/microcode/version: No such file or directory

# uname -a
Linux mmm-U2442 5.0.0-15-generic #16~18.04.1-Ubuntu SMP Tue May 7 14:17:37 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
root@mmm-U2442:~# dmesg |grep -i microcode

# apt list --installed |grep microcode
amd64-microcode/bionic-updates,now 3.20180524.1~ubuntu0.18.04.2 amd64 [installed,automatic]
intel-microcode/bionic-security,bionic-updates,now 3.20190514.0ubuntu0.18.04.2 amd64 [installed,automatic]

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.0.0-15-generic root=UUID=XXX ro rootflags=subvol=@ubuntu ro kaslr resume=/dev/mapper/swapDevice mem_sleep_default=deep dm_mod.use_blk_mq=1 acpi_osi=Linux acpi_backlight=vendor dis_ucode_ldr mitigations=off

On 2019-05-21 05:28:18, Mark wrote:
> > If that doesn't work, can you try to boot with 'mitigations=off'
> > passed on the kernel command line?
>
> on the other hand, mitigations=off did cut it!
> I guess now we're to find which of the mitigations is causing it? Do
> you have a hint? Which are all the params for turning kernel
> mitigations,please?

Oh, that is interesting.

On an Intel processor, mitigations=off is equivalent to these options:

 nopti
 nospectre_v2
 spectre_v2_user=off
 spec_store_bypass_disable=off
 l1tf=off
 mds=off

So, if you could cycle through those options individually to identify
which one allows you to boot, I think it would really help a lot in
getting better information to Intel.
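
A small sketch of how the six trial command lines could be generated, one option at a time. The option list is exactly the one given above; mitigation_trials is a hypothetical helper name, and the base command line in the example is only a placeholder:

```shell
#!/bin/sh
# Hypothetical helper: print one candidate kernel command line per
# individual mitigation option, so each can be tried in turn from GRUB.
mitigation_trials() {
    base=$1
    for opt in nopti nospectre_v2 spectre_v2_user=off \
               spec_store_bypass_disable=off l1tf=off mds=off; do
        printf '%s %s\n' "$base" "$opt"
    done
}

# Example:
#   mitigation_trials 'ro kaslr'
# Each printed line is booted once; if exactly one of them boots, that
# option points at the mitigation involved.
```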

Tyler Hicks (tyhicks) wrote :

On 2019-05-21 05:37:09, Mark wrote:
> Actually, is the microcode loaded at all?
>
> # cat /sys/devices/system/cpu/cpu0/microcode/version
> cat: /sys/devices/system/cpu/cpu0/microcode/version: No such file or directory

That's odd. Try replacing cpu0 with cpu* to see if a version file exists
for any logical CPU:

 $ sudo cat /sys/devices/system/cpu/cpu*/microcode/version
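
As a complementary check when the sysfs file is absent, /proc/cpuinfo on x86 normally carries a "microcode" field per logical CPU. A minimal sketch that collects the distinct revisions from it (microcode_revisions is a hypothetical helper name; it assumes the usual "microcode : 0x.." line format):

```shell
#!/bin/sh
# Hypothetical helper: print the distinct microcode revisions found in a
# cpuinfo-style file (defaults to /proc/cpuinfo).
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}

# Example:
#   microcode_revisions
# More than one line of output would mean the logical CPUs disagree.
```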

Mark (markthecodehamster) wrote :

I'm very sorry, made a mistake

> > on the other hand, mitigations=off did cut it!
> > I guess now we're to find which of the mitigations is causing it? Do
> > you have a hint? Which are all the params for turning kernel
> > mitigations,please?
>
> Oh, that is interesting.

I was NOT able to boot with mitigations=off now. If you review the line

> BOOT_IMAGE=/vmlinuz-5.0.0-15-generic ... dis_ucode_ldr mitigations=off

I made a mistake: the microcode disable was still there. So, backtracking one level, mitigations=off has no effect and does not help. I still need dis_ucode_ldr.
Once again, sorry for that. I was thinking I'm going crazy when I couldn't boot with mitigations off now, so I'm actually happy it was the mistake between the chair and keyboard.

Back to

> That's odd. Try replacing cpu0 with cpu* to see if a version file exists
for any logical CPU:

# cat /sys/devices/system/cpu/cpu*/microcode/version
cat: '/sys/devices/system/cpu/cpu*/microcode/version': No such file or directory

but that is because I'm only able to boot if ucode is disabled.

I'll now try the non-ubuntu kernel and post the same.

Mark (markthecodehamster) wrote :

From non-ubuntu kernel:

# uname -a
Linux mmm-U2442 5.0.0-17.1-liquorix-amd64 #1 ZEN SMP PREEMPT liquorix 5.0-17ubuntu1~bionic (2019-05-17) x86_64 x86_64 x86_64 GNU/Linux

# dmesg |grep microcode
#

# cat /proc/cmdline
audit=0 BOOT_IMAGE=/vmlinuz-5.0.0-17.1-liquorix-amd64 root=UUID=824505cb-a96b-404d-9581-b0c2ffbc013b ro rootflags=subvol=@ubuntu ro kaslr resume=/dev/mapper/swapDevice mem_sleep_default=deep dm_mod.use_blk_mq=1 acpi_osi=Linux acpi_backlight=vendor earlycon print_early=efi debug mds=off

# cat /sys/devices/system/cpu/cpu0/microcode/version
0xb8

Summary:
* Ubuntu kernel:
- need dis_ucode_ldr, cpu0/microcode does not exist

* non-ubuntu:
- only need mds=off, microcode seems to load.

Mark (markthecodehamster) wrote :

Is this two separate bugs?
One that mds=on mitigation fails with the new microcode on any kernel,
the other that microcode in my HW (ASUS, Kaby lake) fails to load on ubuntu kernels?

Tyler Hicks (tyhicks) wrote :

On 2019-05-21 07:14:52, Mark wrote:
> Is this two separate bugs?
> One that mds=on mitigation fails with the new microcode on any kernel,
> the other that microcode in my HW (ASUS, Kaby lake) fails to load on ubuntu kernels?

Possibly but a single bug is sufficient from Ubuntu's standpoint at this
time. It seems like a problem with the microcode for your processor
(we've heard similar reports from another Linux distribution). We'll
speak with Intel about this. Thanks for all your assistance so far.

Mark (markthecodehamster) wrote :

> We'll speak with Intel about this

thank you! Hope they manage to fix it soon.
Let me know if you need some more testing.
Cheers,

Lei Zhao (leizmonk) wrote :

Anyone here think that this: https://bugs.launchpad.net/ubuntu/+bug/1829735 could be either a dupe of this bug or related?

Mark (markthecodehamster) wrote :

> Anyone here think that this: https://bugs.launchpad.net/ubuntu/+bug/1829735 could be either a dupe of this bug or related?

Hi Lei, thanks for reaching here! Very likely duplicate, as we have the same HW and the people describe similar symptoms.

Duplicates:
https://bugs.launchpad.net/ubuntu/+bug/1829735
https://bugs.launchpad.net/ubuntu/+source/linux-signed-hwe/+bug/1829784

Mark (markthecodehamster) wrote :

For anyone having a similar problem, try booting with microcode updates off:
add dis_ucode_ldr to the kernel command line (from GRUB if you cannot boot)

Chris Seeley (sealy-au) wrote :

>>For anyone having a similar problem, try booting with microcode updates off:
add dis_ucode_ldr to the kernel command line (from GRUB if you cannot boot)

  >>> can confirm this allows for boot.

  >>> have put the outputs of dmesg, uname -a, lscpu, and iucode-tool in files here: sealy.hypnos.feralhosting.com/logs

hope this helps

Lei Zhao (leizmonk) wrote :

Thanks Mark & Chris! Appreciate your help in diagnosing and finding this temporary workaround. Will keep my eyes on this thread for when there's a more permanent fix.

Simon Allan (sysko-supinfo) wrote :

Here's my /proc/cpuinfo

Tyler Hicks (tyhicks) wrote :

I was able to speak with folks at Intel about this and got some good
info from them:

* To avoid confusion, we need to be clear that the i7-8565U is a Whiskey
  Lake processor (*not* a Kaby Lake)
  - https://ark.intel.com/content/www/us/en/ark/products/149091/intel-core-i7-8565u-processor-8m-cache-up-to-4-60-ghz.html
* There is an upstream intel-microcode bug report that is already
  tracking this issue
  - https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/1
* The upstream bug report suggests that the problem is seen with both
  the 20190312 and 20190514 microcode releases
* This may be a bug in how the kernel is loading the microcode

description: updated
Mark (markthecodehamster) wrote :

> To avoid confusion, we need to be clear that the i7-8565U is a Whiskey
  Lake processor (*not* a Kaby Lake)

yep, sorry, I got confused in the intel naming scheme, it's whiskey I'm having problems with ;)

> This may be a bug in how the kernel is loading the microcode

can you review the liquorix patches? Their kernel boots with said microcode. Is there a PPA with vanilla kernel I should try?

Tyler Hicks (tyhicks) wrote :

On 2019-05-22 17:13:02, Mark wrote:
> > This may be a bug in how the kernel is loading the microcode
>
> can you review the liquorix patches? Their kernel boots with said
> microcode.

I compared the arch/x86/ source code directories in the liquorix and
Ubuntu kernels. While there are a large amount of differences, I didn't
see anything that would cause a microcode loading behavior change.

This has me wondering about a few things regarding your being able to
boot the liquorix kernel in comment #41.

A) 'dmesg | grep microcode' showed no output. This is confusing because
   the microcode loader should print information when it loads
   microcode unless it is disabled. Your /proc/cmdline doesn't show the
   dis_ucode_ldr kernel parameter being in use but the use of that
   option would prevent anything from being printed by the microcode
   loader.

B) You have the latest microcode revision (0xb8) loaded according to
   /sys/devices/system/cpu/cpu0/microcode/version. That means that
   either the kernel microcode loader did load microcode or you've
   recently installed a BIOS update which contains the latest microcode
   revision and the kernel microcode loader didn't do anything.

   Asus published BIOS version 302 for your laptop on 2019-04-02
   (https://www.asus.com/Laptops/ASUS-ZenBook-14-UX433FN/HelpDesk_BIOS/).
   It has no release notes that I can find. The release date on the
   BIOS update comes after the date stamped in the microcode file
   (2019-03-30). That would be quite the turnaround time but it is
   technically possible that it contains revision 0xb8.

To summarize, maybe you've installed a BIOS update that includes the new
microcode and that's allowing you to boot the liquorix kernel because it
isn't doing a microcode load because of A) and/or B) above?

The upstream bug report has mentions of Arch, Fedora, Ubuntu, and
upstream kernels all being affected by this bug. I'm skeptical that the
liquorix kernel is doing microcode loading correctly, when nobody else
is, which is why I'm trying to double check everything in comment #41.

> Is there a PPA with vanilla kernel I should try?

We have mainline vanilla kernel builds of 5.0.17 here:

  https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.17/

Unfortunately, the 5.0.18 build, which would more closely resemble the
liquorix kernel, isn't available yet.

Mark (markthecodehamster) wrote :

> or you've recently installed a BIOS update which contains the latest microcode
   revision and the kernel microcode loader didn't do anything.
   Asus published BIOS version 302 for your laptop on 2019-04-02
   (https://www.asus.com/Laptops/ASUS-ZenBook-14-UX433FN/HelpDesk_BIOS/).

I did update to BIOS v302 from the Asus website. That was roughly when the problems started, but I *think* I was able to boot with the new BIOS before. So it is probably some change in the kernel/microcode in addition to that?

> To summarize, maybe you've installed a BIOS update that includes the new
microcode and that's allowing you to boot the liquorix kernel because it
isn't doing a microcode load because of A) and/or B) above?

yes. Both cases: (updated BIOS + liquorix kernel) or (dis_ucode_ldr + ubuntu kernel) work for me.

Mark (markthecodehamster) wrote :

> https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0.17/

so I have tested with a kernel from this PPA and it did not help.

Tyler,
there's one strange thing I'd like to ask about: the boot process behaves statefully(?!), i.e. the previous boot leaves the machine in a "clean" or "dirty" state, and the kernel (at least liquorix) then boots differently.

1. Make the kernel get stuck, e.g. run Ubuntu 4.15 (-dis_ucode_ldr) -> "dirty". Reboot (I have to hard power off).
2. Boot the liquorix kernel (-dis_ucode_ldr, +mds=off) -> stuck!
3. Make the kernel pass once successfully, e.g. boot liquorix (+dis_ucode_ldr) -> I get to cryptsetup asking for the password in the initrd (it's not enough to reboot at this stage; I need to fill it in and boot to the desktop) -> "clean". Reboot.
4. Now repeat step 2, and I get to boot?!

How is 2 vs. 4 possible? Where is the info stored? Thank you

Mark (markthecodehamster) wrote :

Also, I'm still able to boot any kernel (with no extra boot params) when I force-downgrade intel-microcode to the 2018 version (from bionic/main).

Greg (buchovagabond) wrote :

The Intel microcode update definitely created the same issue for me on an Asus Zenbook 14 (UX433F) on 18.04.2 LTS.

Can also confirm the instructions above "add dis_ucode_ldr to the kernel command line (from GRUB if you cannot boot)" worked perfectly as a temporary workaround.
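
For reference, a rough sketch of making the dis_ucode_ldr workaround persistent instead of typing it at the GRUB menu on every boot. It assumes the stock Ubuntu layout, where /etc/default/grub contains a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line, and GNU sed; the helper name add_grub_param is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, unless it is already present.
# Assumptions: the value is enclosed in double quotes, GNU sed (-i) is
# available, and the parameter contains no "/" or other sed metacharacters.
add_grub_param() {
    file=$1
    param=$2
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0    # already present, nothing to do
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"/" "$file"
}

# Typical use on the affected machine (as root, keeping a backup first):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_grub_param /etc/default/grub dis_ucode_ldr
#   update-grub
```

Remember to remove the parameter again once a fixed kernel or microcode package is installed, since it also blocks future microcode updates from being loaded.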

This is my fourth Asus, and I've been using Linux since 1997, but I can't remember struggling as much with any other machine/distro combo as with this UX433F on Ubuntu.

Chris Seeley (sealy-au) wrote :

Steve, Mark, I've had BIOS version 302 installed since the 5th of April and was booting fine until the apt-get upgrade run on the 21st of May.

Note that in my /proc/cpuinfo the microcode version is listed as 0x98.

Also, how do you force a downgrade of the microcode package?

Chris Seeley (sealy-au) wrote :

^^^ Sorry, this cpuinfo was from a UX433FN booted on an affected kernel with dis_ucode_ldr.

Steve Beattie (sbeattie) wrote :

Chris, thanks for confirming which processor is in use.

To force downgrade the microcode package on bionic, do:

  sudo apt install intel-microcode=3.20180312.0~ubuntu18.04.1

For other releases, you will need to replace the version with one that is available for your release; you can use 'apt-cache policy intel-microcode' to see what is available.
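
As an illustration of the apt-cache policy step, the sketch below extracts just the version strings from the version table so that a downgrade target can be picked. The function name list_policy_versions is made up for this example, and the parsing assumes the usual apt-cache policy layout (a "Version table:" line followed by version and priority pairs).

```shell
#!/bin/sh
# Hypothetical helper: read `apt-cache policy PACKAGE` output on stdin and
# print one available version per line.
list_policy_versions() {
    awk '
        /Version table:/ { in_table = 1; next }
        # The installed version is marked with "***": "*** VERSION PRIORITY"
        in_table && $1 == "***" && NF == 3 { print $2; next }
        # Other versions appear as "VERSION PRIORITY"; source lines such as
        # "100 /var/lib/dpkg/status" have a non-numeric second field.
        in_table && NF == 2 && $2 ~ /^-?[0-9]+$/ { print $1 }
    '
}

# Only runs where apt-cache exists (Debian/Ubuntu systems).
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache policy intel-microcode | list_policy_versions
fi
```

Any version printed here can be passed to 'sudo apt install intel-microcode=VERSION'.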

In the worst case, if the version you want to downgrade to is no longer available in the archive (e.g. for xenial or trusty), you can navigate to that release's information page for the package (e.g. https://launchpad.net/ubuntu/xenial/+source/intel-microcode), select the specific version you want, manually download the appropriate binary deb for your system, and install it via 'sudo dpkg -i intel-microcode_VERSION_ARCH.deb'.

One downside to using older versions is that, unless you use apt pinning or a package hold to keep them in place, unattended-upgrades or some other update process will upgrade them to the current version available for that release, putting you right back where you started.
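
One way to keep an older version in place is an apt preferences entry, sketched below. The helper name make_pin_stanza and the file name under /etc/apt/preferences.d/ are illustrative choices, and the priority of 1001 relies on apt treating priorities above 1000 as permission to install a version even when that is a downgrade.

```shell
#!/bin/sh
# Hypothetical helper: print an apt preferences stanza that pins package $1
# to exactly version $2. A priority above 1000 allows downgrades.
make_pin_stanza() {
    printf 'Package: %s\nPin: version %s\nPin-Priority: 1001\n' "$1" "$2"
}

# Typical use on bionic (the target file name is an arbitrary choice):
#   make_pin_stanza intel-microcode '3.20180312.0~ubuntu18.04.1' \
#       | sudo tee /etc/apt/preferences.d/intel-microcode
#
# A simpler alternative that only freezes the currently installed version:
#   sudo apt-mark hold intel-microcode
```

Either way, the pin or hold has to be removed by hand later, otherwise the package will never receive the fixed microcode.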

An alternative approach would be to add IUCODE_TOOL_INITRAMFS=no to /etc/default/intel-microcode and then re-run 'update-initramfs -u' to remove the microcode bits from the initramfs image. NOTE: you will want to remember that you did this so that you can re-enable microcode updates once this issue has been resolved.
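
A minimal sketch of that alternative, assuming /etc/default/intel-microcode is a plain KEY=value file and GNU sed is available. The helper name set_conf_value is hypothetical; only the variable name and the update-initramfs call come from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a shell-style config file.
# Replaces an existing uncommented KEY= line, or appends one if none exists.
set_conf_value() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Typical use (as root):
#   set_conf_value /etc/default/intel-microcode IUCODE_TOOL_INITRAMFS no
#   update-initramfs -u
```

To re-enable microcode updates later, restore the previous value of the variable and run 'update-initramfs -u' again.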

Thanks.

Mark (markthecodehamster) wrote :

Just a heads-up that the recent update to intel-microcode (3.20190514.0ubuntu0.18.04.3) does not resolve the issue (it probably wasn't even expected to).

Simon Allan (sysko-supinfo) wrote :

for cosmic it's the very same package

  sudo apt install intel-microcode=3.20180312.0~ubuntu18.04.1

and so far it seems to fix the issue, thanks a lot for the easy to implement workaround.

TJ (tj) wrote :

Steve: Another possible system affected where disabling microcode loading appears to have fixed it. I'll leave it to you to decide whether that is a duplicate of this issue though:

LP: #1829402 "Purple screen hangup during boot"

Tom and I have spent considerable time with the affected user on IRC testing permutations, so I'm confident it is the same issue.

Simon Allan (sysko-supinfo) wrote :

@TJ,

We recently got a new employee with a slightly different model of our laptop. He got the purple screen hangup you're describing, and the workaround of pinning the intel-microcode version fixed the issue.

Olokun Ademola (olo96) wrote :

Hi.
I'm using Ubuntu 19.04 on a Samsung NP300E5C. I noticed I couldn't sudo update and I was getting a "can't write to /tmp" error. I rebooted my PC, and now it won't go past the purple screen. I've tried the "dis_ucode_ldr" fix but nothing happened. I'm fairly new to Linux; can someone please explain how I can go about fixing this bug?

Thanks

aditya lele (adityadlele) wrote :

Hi,

I think the discussions already confirm this. Just wanted to reiterate: I have an ASUS-FX533FD Zenbook laptop with an i7-8565U CPU. I also faced the same problem as mentioned in the bug description. I can confirm that it was fixed with "WORKAROUND 2: downgrade (and hold) intel-microcode to older version from bionic/main".

Thanks for your help.

Eric Chai (zcchai) wrote :

Hello,
I have the same issue on VirtualBox with a MacBook Pro as host. After the recent update, the boot stops on an all-purple screen. However, SSH login still works.

WORKAROUND 1 & 2 don't work. Are there any more workarounds?

Thanks

Tom Reynolds (tomreyn) wrote :

Eric: This issue should only affect Ubuntu running bare metal on Intel systems, so not Ubuntu running in a VM. Put differently: if you do not run Ubuntu as the primary operating system on this MacBook Pro then please file a separate bug report, or (ideally beforehand) see if you can get this sorted with commercial (https://ubuntu.com/support) or community / volunteer (https://ubuntu.com/support/community-support) support. Thanks.

Felix Ticona (felix-ticona) wrote :

In our case downgrading the microcode didn't work. Instead, we noticed that our kernel was outdated:
 uname -r
 4.15.0-51-generic
so we upgraded the kernel to the recent version with:

 sudo apt-get install --install-recommends linux-generic-hwe-18.04 xserver-xorg-hwe-18.04

With that the kernel was upgraded to:
 4.18.0-21-generic
and Ubuntu starts correctly.
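
A small sketch for checking whether a machine is already on the 4.18 HWE revision reported to boot here. The helper name kernel_at_least is hypothetical, the comparison relies on GNU 'sort -V', and it is only meaningful within the 4.18 HWE series, since the thread has no confirmed results for the 4.15 or 5.0 builds.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 (e.g. from `uname -r`)
# is at least version $2. A trailing "-generic" flavour suffix is ignored.
# Relies on `sort -V` (version sort) from GNU coreutils.
kernel_at_least() {
    have=${1%-generic}
    [ "$(printf '%s\n%s\n' "$2" "$have" | sort -V | head -n 1)" = "$2" ]
}

rel=$(uname -r)
case $rel in
    4.18.*)
        if kernel_at_least "$rel" 4.18.0-21; then
            echo "$rel is at or above the 4.18.0-21 revision reported to boot"
        else
            echo "$rel is older than 4.18.0-21; upgrading the HWE kernel may help"
        fi
        ;;
    *)
        echo "$rel is not a 4.18 HWE kernel; this comparison does not apply"
        ;;
esac
```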

Mark (markthecodehamster) wrote :

Hey, I'm the OP,

I can confirm @felix-ticona's finding: after the kernel upgrade (at least HWE to the -21 revision) the issue is "magically" gone.

# uname -a
Linux xxx 4.18.0-21-generic #22~18.04.1-Ubuntu SMP Thu May 16 15:07:19 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

# dmesg | grep command
[ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.18.0-21-generic root=UUID=xxx ro rootflags=subvol=@ubuntu ro kaslr resume=xxx mem_sleep_default=deep dm_mod.use_blk_mq=1 acpi_osi=Linux acpi_backlight=vendor quiet

Notice no dis_ucode_ldr (=no workaround 1)

# apt-cache policy intel-microcode
intel-microcode:
  Installed: 3.20190514.0ubuntu0.18.04.3
  Candidate: 3.20190514.0ubuntu0.18.04.3
  Version table:
 *** 3.20190514.0ubuntu0.18.04.3 500
        500 mirror://mirrors.ubuntu.com/mirrors.txt bionic-security/main amd64 Packages
        500 mirror://mirrors.ubuntu.com/mirrors.txt bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     3.20180312.0~ubuntu18.04.1 500
        500 mirror://mirrors.ubuntu.com/mirrors.txt bionic/main amd64 Packages

(=no workaround 2: old intel-microcode package holding)

TODO: test main kernel (4.15) and hwe-edge (5.0)

Mark (markthecodehamster) wrote :

Can someone pinpoint what changed between HWE kernel rebuilds -20 and -21?

Mark (markthecodehamster) wrote :

is it because of this? (is the file new?)

intel-microcode-blacklist.conf :

# The microcode module attempts to apply a microcode update when
# it autoloads. This is not always safe, so we block it by default.
blacklist microcode
