Purple screen hangup during boot

Bug #1829402 reported by Jackneill
This bug affects 2 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

Specs:
Ubuntu 19.04 fresh install, everything updated
Firmware is latest (Bios version 300)
Asus vivobook s14
LVM disk encryption

Steps to reproduce:
- Install ubuntu 19.04/18.10
- Reboot
- Purple screen indefinitely

Setting the kernel command-line arguments to only "ro nomodeset" seems to solve it.

After a debugging session in #ubuntu trying different command-line arguments, the default command line now seems to work too. I have not edited anything, but I am now unable to reproduce the problem.

ProblemType: Bug
DistroRelease: Ubuntu 19.04
Package: linux-image-5.0.0-15-generic 5.0.0-15.16
ProcVersionSignature: Ubuntu 5.0.0-15.16-generic 5.0.6
Uname: Linux 5.0.0-15-generic x86_64
ApportVersion: 2.20.10-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: markbartos 1778 F.... pulseaudio
Date: Thu May 16 17:15:07 2019
InstallationDate: Installed on 2019-05-16 (0 days ago)
InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 13d3:3526 IMC Networks
 Bus 001 Device 002: ID 13d3:56c1 IMC Networks
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: ASUSTeK COMPUTER INC. VivoBook_ASUSLaptop X430FA_S430FA
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-15-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash vt.handoff=1
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-15-generic N/A
 linux-backports-modules-5.0.0-15-generic N/A
 linux-firmware 1.178.1
SourcePackage: linux
StagingDrivers: r8822be
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/22/2018
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: X430FA.300
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: X430FA
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK COMPUTER INC.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrX430FA.300:bd11/22/2018:svnASUSTeKCOMPUTERINC.:pnVivoBook_ASUSLaptopX430FA_S430FA:pvr1.0:rvnASUSTeKCOMPUTERINC.:rnX430FA:rvr1.0:cvnASUSTeKCOMPUTERINC.:ct10:cvr1.0:
dmi.product.family: VivoBook
dmi.product.name: VivoBook_ASUSLaptop X430FA_S430FA
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK COMPUTER INC.

Revision history for this message
Jackneill (jackneill1000) wrote :

I should add that the behaviour seemed quite non-deterministic: kernel arguments (only nomodeset) that appeared to work sometimes did not.

To edit the kernel arguments, I went into the BIOS setup and pressed Esc to get the GRUB menu.

Revision history for this message
Jackneill (jackneill1000) wrote :

The laptop is brand new.

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Tom Reynolds (tomreyn) wrote :

Thanks for filing this bug report, Jack.
I'm one of the IRC support volunteers who tried to help here, but I'm mostly out of ideas.
As suggested on IRC, trying a mainline kernel may be a good idea (only if you can still reproduce the issue).

I should mention that a "USB-C Triple-4K Dock" (SerialNumber: DGWC0011000xxxxx) was connected to this system initially:
https://www.startech.com/Cards-Adapters/Laptop-docking-stations/triple-4k-usb-c-laptop-docking-station~DK30CH2DPPD
But from what I understood, the issue continued after this was disconnected.

Revision history for this message
Jackneill (jackneill1000) wrote :

Pressing ESC or F1 does nothing on the hanging purple screen. (I just got the error again.)

Editing the GRUB "ubuntu" menu entry to use "ro nomodeset" did nothing, although doing the same for a recovery entry got the system to boot properly.

Revision history for this message
Jackneill (jackneill1000) wrote :

I should probably add that I briefly installed Guix (the distro), and it rebooted correctly. Everything worked with Guix, although as far as I remember that was no more than one or two reboots, and maybe one power-off and start.

Revision history for this message
TJ (tj) wrote :

Can we collect more information about the circumstances in which this happens, since these can often be crucial in narrowing down the possibilities and identifying a solution?

1. Does the problem occur if the system is completely powered off between reboots (as opposed to what is called a 'warm' reboot) ?

2. Can 'journalctl --list-boots' be used to identify logs of a *failed* boot and use 'journalctl -b X' to grab the log file of the failed boot?

3. Has 'GRUB_TERMINAL=console' in /etc/default/grub been tried? (Edit the file to include that string without a leading # [comment prefix], then run "sudo update-grub" and reboot to test.)
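
Steps 2 and 3 could be sketched roughly as below. This is only an illustration: the helper name enable_grub_console is hypothetical, and it assumes /etc/default/grub either carries a commented-out "#GRUB_TERMINAL=console" line or has no such line at all. The journalctl and update-grub calls are shown as comments because they only make sense on the affected machine.

```shell
#!/bin/sh
# Hypothetical helper: make sure GRUB_TERMINAL=console is active in a
# GRUB defaults file (normally /etc/default/grub).
enable_grub_console() {
    file=$1
    # Uncomment an existing "#GRUB_TERMINAL=console" line, if there is one.
    sed 's/^#[[:space:]]*GRUB_TERMINAL=console/GRUB_TERMINAL=console/' \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    # If no active setting exists after that, append one.
    grep -q '^GRUB_TERMINAL=console' "$file" ||
        echo 'GRUB_TERMINAL=console' >> "$file"
}

# Typical use on the affected machine (needs root; not executed here):
#   journalctl --list-boots               # find the index of the failed boot
#   journalctl -b -1 > failed_boot.txt    # -1 = previous boot; adjust the index
#   cp /etc/default/grub /etc/default/grub.bak
#   enable_grub_console /etc/default/grub
#   update-grub
```

Keeping a backup copy of /etc/default/grub first makes it easy to undo the change once testing is finished.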

Revision history for this message
Jackneill (jackneill1000) wrote :

>1. Does the problem occur if the system is completely powered off between reboots (as opposed to what is called a 'warm' reboot) ?

Yes.

>2. Can 'journalctl --list-boots' be used to identify logs of a *failed* boot and use 'journalctl -b X' to grab the log file of the failed boot?

Yes.

boot -1: (cold powered off state, been powered off for some hours)
This was a successful boot without doing anything.
https://termbin.com/sp9t (boot_m1.txt)

<There was a purple screen hangup boot, but I guess that was not recorded>

boot 0: (powered off by long-pressing the power button, entered the BIOS, pressed Esc to get GRUB, edited a recovery menu entry, deleted 'recovery' and added 'nomodeset' to be able to boot)
https://termbin.com/fdwb (boot0.txt)

Revision history for this message
Jackneill (jackneill1000) wrote :

Attachment: boot_m1.txt

Revision history for this message
Jackneill (jackneill1000) wrote :

>2. Can 'journalctl --list-boots' be used to identify logs of a *failed* boot and use 'journalctl -b X' to grab the log file of the failed boot?

No, as can be inferred from the above comment. (Sorry, I mistakenly wrote 'yes'.)

Revision history for this message
Jackneill (jackneill1000) wrote :

>3. has 'GRUB_TERMINAL=console' in /etc/default/grub been tried (edit file to include that string without a leading # [comment prefix] then "sudo update-grub" and reboot to test)

When I have a successful boot, two error lines appear for a brief moment, then everything proceeds.

When I get the hanging screen (this time black), I can read these lines:

error: no video mode activated.
error: can't find command `hwmatch`.

Revision history for this message
TJ (tj) wrote :

15:38 <Jackneill> TJ-, for a brief moment i saw 2-3 lines of text, first line is "error: no video mode activated"
15:38 <Jackneill> was unable to see more.
15:38 <Jackneill> but this is a successful boot no purple screen this time
15:38 <TJ-> Jackneill: so, GRUB_TERMINAL=console was successfully booted to desktop?
15:39 <Jackneill> TJ-, yes, but its non-determinitsitc so its hard to tell
15:39 <Jackneill> if the error still exists or not
15:39 <Jackneill> requires many boots.
...
15:43 <Jackneill> TJ-, i got a bad boot this time.
15:43 <Jackneill> TJ-, hanging black screen with 2 lines: first is what i said above, second: 'error: can't find command `hwmatch`'
15:45 <TJ-> Jackneill: the fact you can see the text is a good thing, as it may point to the issue not being video at all, but possibly something else that is hanging the entire system
...
15:49 <TJ-> Jackneill: "hwmatch" is part of the GRUB boot-loader, and (is supposed) to be executed by GRUB at boot-time to check if the GPU is included in a blacklist: "/boot/grub/grub.cfg:122: if hwmatch ${prefix}/gfxblacklist.txt 3; then ..." - so it appears the hand-over from GRUB to Linux kernel is failing

So, in GRUB GFX mode, which sets the background colour to the Ubuntu themed 'purple' mentioned in this bug, the display is cleared when GRUB hands over to the Linux kernel. If the kernel is not starting for some reason that purple screen is all the user will see.

With GRUB_TERMINAL=console the display is not in a GFX mode, but text only, so we get to see some clue at least!
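
To see whether the hwmatch command that grub.cfg calls is actually installed as a GRUB module, a quick check along these lines might help. It is a sketch only: the helper name grub_module_present is made up, and it assumes modules are stored as NAME.mod files somewhere below /boot/grub (for example in a platform subdirectory).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a GRUB module file (NAME.mod) exists
# anywhere below a GRUB directory (default: /boot/grub).
grub_module_present() {
    name=$1
    dir=${2:-/boot/grub}
    # find prints matching paths; a non-empty result means the file exists.
    [ -n "$(find "$dir" -name "$name.mod" 2>/dev/null)" ]
}

# Example use on the affected machine (not executed here):
#   grub_module_present hwmatch && echo "hwmatch.mod found" || echo "hwmatch.mod missing"
#   grep -n hwmatch /boot/grub/grub.cfg    # shows where grub.cfg calls it
```

A missing module would explain the "can't find command" message, but not necessarily the hang itself.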

Revision history for this message
Tom Reynolds (tomreyn) wrote :

Possibly related to bug 699802. See comments 24 and 101 for potential workarounds.

If you'll test these, please test them thoroughly, doing multiple warm and cold boots with each changed setting, since successful boots are currently non-deterministic.

Revision history for this message
Tom Reynolds (tomreyn) wrote :

Also, before you do the testing, please attach your current
  /boot/grub/grub.cfg
  /boot/efi/EFI/ubuntu/grub.cfg
files to this bug report. Thanks.

Revision history for this message
Jackneill (jackneill1000) wrote :

Having done https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/699802/comments/101,
now I only get 'error: can't find command `hwmatch`'.

Revision history for this message
Jackneill (jackneill1000) wrote :

Filesystem setup:

$ df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                         9,7G     0  9,7G   0% /dev
tmpfs                        2,0G  1,8M  2,0G   1% /run
/dev/mapper/ubuntu--vg-root  232G  7,7G  213G   4% /
tmpfs                        9,8G   64M  9,7G   1% /dev/shm
tmpfs                        5,0M  4,0K  5,0M   1% /run/lock
tmpfs                        9,8G     0  9,8G   0% /sys/fs/cgroup
/dev/loop1                   3,8M  3,8M     0 100% /snap/gnome-system-monitor/81
/dev/loop2                   4,2M  4,2M     0 100% /snap/gnome-calculator/406
/dev/loop3                    88M   88M     0 100% /snap/go/3739
/dev/loop4                   152M  152M     0 100% /snap/gnome-3-28-1804/31
/dev/loop5                   1,0M  1,0M     0 100% /snap/gnome-logs/61
/dev/loop0                   928M  928M     0 100% /snap/android-studio/76
/dev/loop8                    90M   90M     0 100% /snap/core/6818
/dev/loop9                    36M   36M     0 100% /snap/gtk-common-themes/1198
/dev/loop10                  927M  927M     0 100% /snap/android-studio/75
/dev/loop11                  133M  133M     0 100% /snap/postman/81
/dev/loop12                   90M   90M     0 100% /snap/core/6673
/dev/loop6                    54M   54M     0 100% /snap/core18/941
/dev/loop13                  116M  116M     0 100% /snap/insomnia/29
/dev/loop7                   190M  190M     0 100% /snap/gitkraken/138
/dev/loop14                  3,8M  3,8M     0 100% /snap/gnome-system-monitor/77
/dev/loop15                  152M  152M     0 100% /snap/gnome-3-28-1804/40
/dev/loop16                   54M   54M     0 100% /snap/core18/970
/dev/loop17                  125M  125M     0 100% /snap/vscode/93
/dev/loop18                   89M   89M     0 100% /snap/core/6964
/dev/sda2                    705M  163M  491M  25% /boot
/dev/sda1                    511M  7,9M  504M   2% /boot/efi
/dev/loop19                   15M   15M     0 100% /snap/gnome-characters/258
/dev/loop20                   15M   15M     0 100% /snap/gnome-characters/254
tmpfs                        2,0G   28K  2,0G   1% /run/user/1000

EFI grub attached.

Revision history for this message
Jackneill (jackneill1000) wrote :

And indeed, I have no such module under /boot; the directory hierarchy is attached.

Revision history for this message
Jackneill (jackneill1000) wrote :

But as I said previously, despite this, sometimes I can boot and sometimes not.

Revision history for this message
Jackneill (jackneill1000) wrote :

Adding "earlyprintk=efi,keep" to the command line only gives me 1 line:

    'Memory KASLR using RDRAND RDTSC...'

If the display is quite low-res, I get only this line; if I also get 'error: can't find command `hwmatch`', then it is a bit more high-res.

Revision history for this message
Jackneill (jackneill1000) wrote :

But it's also possible for it to hang with absolutely no text on screen, even with that.

Revision history for this message
Tom Reynolds (tomreyn) wrote :

Please consider testing this:

1. Add this to /etc/default/grub :
GRUB_GFXPAYLOAD_LINUX="text"

2. Add this to the GRUB_CMDLINE_LINUX_DEFAULT option in /etc/default/grub :
earlyprintk=efi,keep

3. Run: sudo update-grub

4. Try all of the following combinations of kernel boot parameters by editing them at the grub menu (https://wiki.ubuntu.com/Kernel/KernelBootParameters):
  mitigations=off
  dis_ucode_ldr mitigations=off
  dis_ucode_ldr mds=off
(Only) if any of these combinations boot fine, try them again at least five times or until a boot fails. Take notes and finally report how often you tried them and how often they booted fine, and whether there was a failed boot for each of these combinations.
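
For the note-taking part, a small log like the following might make the final report easier to put together. This is just a sketch with made-up helper names (note_boot, boot_summary) and a made-up log format. Since a hung boot cannot log itself, each result has to be recorded by hand after the next successful boot.

```shell
#!/bin/sh
# Hypothetical note-taking helpers: one line per boot attempt, then a
# per-combination tally.

# note_boot LOGFILE RESULT PARAMS...   (RESULT is "ok" or "hang")
note_boot() {
    log=$1; result=$2; shift 2
    # Tab-separated: result, then the kernel parameters that were tried.
    printf '%s\t%s\n' "$result" "$*" >> "$log"
}

# boot_summary LOGFILE  ->  "<params>: <n> ok, <m> hang", one line per combination
boot_summary() {
    awk -F '\t' '
        { seen[$2] = 1; count[$2, $1]++ }
        END {
            for (p in seen)
                printf "%s: %d ok, %d hang\n", p, count[p, "ok"], count[p, "hang"]
        }' "$1" | sort
}

# Example (not executed here):
#   note_boot ~/boot-tests.log hang mitigations=off
#   note_boot ~/boot-tests.log ok dis_ucode_ldr mitigations=off
#   boot_summary ~/boot-tests.log
```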

Revision history for this message
Jackneill (jackneill1000) wrote :

1. GRUB_CMDLINE_LINUX_DEFAULT="mds=off"
    Same hang.
2. GRUB_CMDLINE_LINUX_DEFAULT="mitigations=off"
    Same hang.
3. GRUB_CMDLINE_LINUX_DEFAULT="dis_ucode_ldr"
    Seems to be fine. I did multiple reboots and cold boots, and it never hung.

Revision history for this message
Jackneill (jackneill1000) wrote :

I should add that I edited nothing else; for example, I did not set GRUB_GFXPAYLOAD_LINUX="text".

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Possible dupe of LP: #1829620.

Revision history for this message
TJ (tj) wrote :

This looks to be related to LP: #1829620 "intel-microcode on ASUS makes kernel stuck during loading initramfs on bionic-updates, bionic-security" which until now we thought only affected particular models. Intel are aware, and this issue is being tracked in that bug.

Revision history for this message
Tom Reynolds (tomreyn) wrote :

Possible duplicate of bug #1829620.

Revision history for this message
Tom Reynolds (tomreyn) wrote :

Thanks for testing, Jackneill.

To explain, the "dis_ucode_ldr" option disables loading of microcode updates by Linux. Microcode updates are binary blobs (no source code is available) which modify operation of a CPU until the next reboot. They are supplied by CPU manufacturers / vendors to mainboard manufacturers / vendors and are now more commonly shared with operating systems, too. Linux supports loading them into the CPU during early boot.

Unfortunately, with the latest microcode update (which was released at the same time as the kernel updates you reported about on 19.04 and 18.04 LTS), your CPU seems to not behave reliably, causing a high percentage of your boots into Ubuntu to fail. This root cause is confirmed by using the dis_ucode_ldr kernel boot parameter, which, as you report, ensures your system boots up fine every time.

I will now mark this bug report as a duplicate of bug #1829620 (I am convinced both Kai-Heng and TJ meant to point to the same bug report) and suggest you keep using "dis_ucode_ldr" temporarily until Intel releases a new microcode update and it becomes available in Ubuntu via https://packages.ubuntu.com/search?keywords=intel-microcode - the status of bug #1829620 should change to "fix released" at that time. Feel free to address me on IRC (#ubuntu) in case of any questions of a support nature, or add to the bug report if you would like to report more details on this issue.
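
To confirm after a reboot that the workaround is really in effect, something like the following could be used. The helper names (has_kernel_param, microcode_revision) are hypothetical. The sketch assumes the usual /proc/cmdline file and the "microcode" field that x86 kernels expose in /proc/cpuinfo.

```shell
#!/bin/sh
# Hypothetical helpers for checking the workaround on a running system.

# has_kernel_param PARAM [CMDLINE_FILE]: succeed if PARAM appears as a whole
# word on the kernel command line (default file: /proc/cmdline).
has_kernel_param() {
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# microcode_revision [CPUINFO_FILE]: print the first "microcode" value found
# in /proc/cpuinfo-style input.
microcode_revision() {
    awk -F ': *' '/^microcode/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Example use on the affected machine (not executed here):
#   has_kernel_param dis_ucode_ldr && echo "microcode loader disabled for this boot"
#   microcode_revision    # revision currently active in the CPU
```

To keep the parameter across reboots, it can be added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by "sudo update-grub", as in the earlier test steps.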
