Boots fine with 5.3.0-19, doesn't boot any more with 5.3.0-22

Bug #1852435 reported by Andrej Gelenberg
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

System doesn't boot anymore with the new kernel update linux-image-5.3.0-22-generic. Here is a screenshot showing where the boot process stops: https://photos.app.goo.gl/6b7D2SsGBpgKWLd1A

I installed the old kernel packages from the local deb cache (linux-image-5.3.0-19-generic), regenerated the initramfs, kept the same boot options, and was able to boot fine. I assume something went wrong with the new security patches. The Intel CPU microcode does not seem to be the issue, because the old kernel booted with the newly generated initrd image.

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-22-generic 5.3.0-22.24
ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
Uname: Linux 5.3.0-19-generic x86_64
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: andrej 3321 F.... pulseaudio
 /dev/snd/controlC0: andrej 3321 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
Date: Wed Nov 13 14:37:01 2019
InstallationDate: Installed on 2019-11-04 (9 days ago)
InstallationMedia: Ubuntu 19.10 "Eoan Ermine" - Release amd64 (20191017)
MachineType: LENOVO 20Q500E2GE
ProcFB: 0 i915drmfb
ProcKernelCmdLine: initrd=\initrd.img-5.3.0-19-generic BOOT_IMAGE=/vmlinuz-5.3.0-19-generic root=/dev/mapper/ubuntu-root ro
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-19-generic N/A
 linux-backports-modules-5.3.0-19-generic N/A
 linux-firmware 1.183.1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/11/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: R0ZET35W (1.13 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20Q500E2GE
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrR0ZET35W(1.13):bd10/11/2019:svnLENOVO:pn20Q500E2GE:pvrThinkPadL490:rvnLENOVO:rn20Q500E2GE:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad L490
dmi.product.name: 20Q500E2GE
dmi.product.sku: LENOVO_MT_20Q5_BU_SMB_FM_ThinkPad L490
dmi.product.version: ThinkPad L490
dmi.sys.vendor: LENOVO

Andrej Gelenberg (andrej-gelenberg) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Andrej Gelenberg (andrej-gelenberg) wrote :

Still doesn't boot with version 5.3.0-23.

Tyler Hicks (tyhicks) wrote :

Hi Andrej - Thanks for the bug report and sorry for the trouble.

The 5.3.0-22 kernel had a bunch of changes in addition to the Intel-related security fixes. Let's start by ruling some things out.

I'd like for you to *separately* try two different kernel command-line parameters.

The first is "mitigations=off", which is an easy way to disable both of the Intel CPU-related security fixes that landed in 5.3.0-22, in addition to the mitigations for all the pre-existing issues. This doesn't disable the i915 graphics driver security fixes, but I don't suspect that those are the problem here.

If that doesn't work, remove "mitigations=off" and try "dis_ucode_ldr", which disables the kernel's microcode loader, to rule out faulty CPU microcode.

If that doesn't work, please try combining the two options and report back the results.

It is important to note that both options are dangerous and leave your system vulnerable to known CPU security flaws. They should only be used temporarily for testing purposes.
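When reporting results for each combination, it helps to confirm which of the test parameters the running kernel actually received; /proc/cmdline holds the command line the kernel was booted with. A minimal Python sketch (the helper names are hypothetical, not part of any Ubuntu tool):

```python
from pathlib import Path

# The two test parameters suggested above.
TEST_PARAMS = ("mitigations=off", "dis_ucode_ldr")


def read_cmdline(path: Path = Path("/proc/cmdline")) -> str:
    """Return the command line the running kernel was booted with."""
    return path.read_text()


def active_test_params(cmdline: str) -> list[str]:
    """Return the test parameters present as whole tokens on a command line."""
    tokens = cmdline.split()
    return [param for param in TEST_PARAMS if param in tokens]
```

Usage: `active_test_params(read_cmdline())`. A plain whitespace split is enough for these two flags; it would mis-split parameters whose quoted values contain spaces.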

Also, do you perhaps use full disk encryption with LUKS/dm-crypt?

Finally, I suspect that your issue is actually TPM-related, but I'd like to rule out the security fixes and microcode updates first. I see the following TPM-related errors in your kernel logs:

[ 7.104690] tpm_tis STM7308:00: 2.0 TPM (device-id 0x0, rev-id 78)
[ 7.105311] tpm tpm0: tpm_try_transmit: send(): error -5
[ 7.105344] tpm tpm0: [Firmware Bug]: TPM interrupt not working, polling instead
...
[ 8.598278] Call Trace:
[ 8.598898] <IRQ>
[ 8.599497] dump_stack+0x63/0x8a
[ 8.600127] __report_bad_irq+0x3a/0xaf
[ 8.600768] note_interrupt.cold+0xb/0x61
[ 8.601397] handle_irq_event_percpu+0x73/0x80
[ 8.602020] handle_irq_event+0x3b/0x5a
[ 8.602657] handle_fasteoi_irq+0x9c/0x150
[ 8.603292] handle_irq+0x20/0x30
[ 8.603946] do_IRQ+0x50/0xe0
[ 8.604591] common_interrupt+0xf/0xf
[ 8.605201] </IRQ>
[ 8.605832] RIP: 0010:cpuidle_enter_state+0xc5/0x420
[ 8.606458] Code: ff e8 ef 8a 83 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3d 03 00 00 31 ff e8 22 e1 89 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 89 d1 01 00 00 41 c7 44 24 10 00 00 00 00 48 83 c4 18
[ 8.607134] RSP: 0018:ffffa40a4010be38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
[ 8.607835] RAX: ffff954b1036b340 RBX: ffffffffb555a700 RCX: 000000000000001f
[ 8.608531] RDX: 0000000000000000 RSI: 000000004041a68b RDI: 0000000000000000
[ 8.609225] RBP: ffffa40a4010be78 R08: 0000000200650069 R09: 000000007fffffff
[ 8.609948] R10: ffff954b1036a0e4 R11: ffff954b1036a0c4 R12: ffff954b10376500
[ 8.610674] R13: 0000000000000001 R14: 0000000000000001 R15: ffff954b10376500
[ 8.611382] ? cpuidle_enter_state+0xa1/0x420
[ 8.612089] cpuidle_enter+0x2e/0x40
[ 8.612820] call_cpuidle+0x23/0x40
[ 8.613542] do_idle+0x1eb/0x280
[ 8.614230] cpu_startup_entry+0x20/0x30
[ 8.614940] start_secondary+0x168/0x1c0
[ 8.615653] secondary_startup_64+0xa4/0xb0
[ 8.616379] handlers:
[ 8.617089] [<00000000382c6122>] tis_int_handler
[ 8.617815] Disabling IRQ #31

We've seen quite a few TPM related issues with 5.3 and these two changes, which landed in 5.3.0-22, look relate...
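To check other machines' logs for the same symptoms, the messages quoted in this report can be matched as plain substrings. A minimal Python sketch; the signature list is copied from the logs in this report and is illustrative, not an exhaustive catalogue of TPM failures, and the labels are hypothetical:

```python
# Log signatures copied from the kernel output posted in this report.
# Illustrative only: not an exhaustive list of TPM failure modes.
SIGNATURES = {
    "tpm_send_error": "tpm_try_transmit: send(): error",
    "tpm_irq_fallback": "TPM interrupt not working, polling instead",
    "irq_nobody_cared": "nobody cared",
    "tpm_irq_handler": "tis_int_handler",
}


def tpm_symptoms(log_text: str) -> list[str]:
    """Return the labels of every known signature found in a kernel log."""
    return [label for label, needle in SIGNATURES.items() if needle in log_text]
```

The input would typically be `dmesg` output from a kernel that does boot, such as 5.3.0-19, which shows the same IRQ message later in this thread.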

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Also, is it possible to try 5.3.0-24 from eoan-proposed?

Andi (keefer-a) wrote :

Hi Tyler,

I'm also affected by this issue and tried your suggested kernel parameters on 5.3.0-22:
- mitigations=off
- dis_ucode_ldr
- mitigations=off dis_ucode_ldr
but it didn't boot with any of them.
I also tried 5.3.0-23/5.3.0-24 with no success.
And yes, I use full disk encryption with LUKS + LVM.

Any other suggestions?

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Can you try disabling the TPM and also disabling Secure Boot entirely? By the way, do you use Secure Boot at all?

Thank you very much.
Cascardo.
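Before changing firmware settings, it can help to record what the running kernel currently sees. A minimal Python sketch, assuming an EFI boot with efivarfs mounted at /sys/firmware/efi/efivars; the helper names are hypothetical. It relies on the efivarfs file layout (4 attribute bytes followed by the variable data, here the 1-byte SecureBoot value) and on /sys/class/tpm listing the TPM devices the kernel registered:

```python
from pathlib import Path

# SecureBoot lives under the EFI global-variable GUID.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)
TPM_CLASS_DIR = Path("/sys/class/tpm")


def secure_boot_enabled(raw: bytes) -> bool:
    """Decode an efivarfs SecureBoot file: 4 attribute bytes, then 1 data byte."""
    if len(raw) < 5:
        raise ValueError("SecureBoot variable too short")
    return raw[4] == 1


def tpm_devices(class_dir: Path = TPM_CLASS_DIR) -> list[str]:
    """List TPM devices registered by the kernel (e.g. 'tpm0'), if any."""
    if not class_dir.is_dir():
        return []
    return sorted(p.name for p in class_dir.iterdir())


def report() -> str:
    """Summarize Secure Boot state and registered TPM devices."""
    sb = "unknown (not booted via EFI?)"
    if SECURE_BOOT_VAR.exists():
        sb = "enabled" if secure_boot_enabled(SECURE_BOOT_VAR.read_bytes()) else "disabled"
    return f"Secure Boot: {sb}; TPM devices: {tpm_devices() or 'none'}"
```

Running `print(report())` once with the TPM enabled and once after disabling it in the firmware shows whether the kernel still registers a TPM device.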

Andrej Gelenberg (andrej-gelenberg) wrote :

Secure Boot is disabled, but the TPM is needed for Windows disk encryption. The bug is also present in the latest vanilla kernel from kernel.org. I will test things and report my findings.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Can you try the test package at [1]?

[1] https://people.canonical.com/~cascardo/tpmrevert1/

Andrej Gelenberg (andrej-gelenberg) wrote :

I can confirm the bug is also present in the latest vanilla 5.4.0 kernel, and that it is in the TPM module (without the module, the kernel doesn't hang).

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Andrej.

Can you install the test package I made available at [1]? Just download the .deb files and install them with sudo dpkg -i *deb.

Thanks.
Cascardo.

[1] https://people.canonical.com/~cascardo/tpmrevert1/

Arkanon (arkanon) wrote :

Hi, Thadeu.

I have the same problem described in <http://bugs.launchpad.net/ubuntu/+source/linux/+bug/1847980>, and apparently in this report too. I could only boot my Lubuntu 19.10 with the "nomodeset" parameter, and then only the main monitor worked. My integrated video chipset is the Intel Corporation HD Graphics 530 (rev 06).

I installed the packages you indicated and now everything seems to work :)

The same problem occurs with Lubuntu 18.04.3 and its 5.3 kernel package. Do you think I can use the same packages on it?

Arkanon (arkanon) wrote :

OK, I just tested it, Thadeu. Your package also works perfectly with Lubuntu 18.04.3 :) Thanks very much!

Andrej Gelenberg (andrej-gelenberg) wrote :

I don't think it's an issue with Intel graphics. More things point to difficulties with the TPM. The following message also appears with the 5.3.0-19 kernel:
[ 8.621471] irq 31: nobody cared (try booting with the "irqpoll" option)
[ 8.621474] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.3.0-19-generic #20-Ubuntu
[ 8.621475] Hardware name: LENOVO 20Q500E2GE/20Q500E2GE, BIOS R0ZET35W (1.13 ) 10/11/2019
[ 8.621476] Call Trace:
[ 8.621478] <IRQ>
[ 8.621481] dump_stack+0x63/0x8a
[ 8.621483] __report_bad_irq+0x3a/0xaf
[ 8.621485] note_interrupt.cold+0xb/0x61
[ 8.621487] handle_irq_event_percpu+0x73/0x80
[ 8.621488] handle_irq_event+0x3b/0x5a
[ 8.621489] handle_fasteoi_irq+0x9c/0x150
[ 8.621492] handle_irq+0x20/0x30
[ 8.621493] do_IRQ+0x50/0xe0
[ 8.621495] common_interrupt+0xf/0xf
[ 8.621496] </IRQ>
[ 8.621498] RIP: 0010:cpuidle_enter_state+0xc5/0x420
[ 8.621499] Code: ff e8 ef 8a 83 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 3d 03 00 00 31 ff e8 22 e1 89 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 89 d1 01 00 00 41 c7 44 24 10 00 00 00 00 48 83 c4 18
[ 8.621501] RSP: 0018:ffff9937c00f3e38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 8.621503] RAX: ffff89d91e32b340 RBX: ffffffffb035a700 RCX: 000000000000001f
[ 8.621505] RDX: 0000000000000000 RSI: 000000004041c85e RDI: 0000000000000000
[ 8.621506] RBP: ffff9937c00f3e78 R08: 0000000201e126f9 R09: 000000007fffffff
[ 8.621507] R10: ffff89d91e32a0e4 R11: ffff89d91e32a0c4 R12: ffff89d91e336500
[ 8.621508] R13: 0000000000000001 R14: 0000000000000001 R15: ffff89d91e336500
[ 8.621510] ? cpuidle_enter_state+0xa1/0x420
[ 8.621511] cpuidle_enter+0x2e/0x40
[ 8.621513] call_cpuidle+0x23/0x40
[ 8.621514] do_idle+0x1eb/0x280
[ 8.621515] cpu_startup_entry+0x20/0x30
[ 8.621516] start_secondary+0x168/0x1c0
[ 8.621518] secondary_startup_64+0xa4/0xb0
[ 8.621519] handlers:
[ 8.621522] [<00000000cb091107>] tis_int_handler
[ 8.621523] Disabling IRQ #3
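The "nobody cared" message and the "Disabling IRQ" line come from the kernel's spurious-interrupt detection: the only registered handler on that line, tis_int_handler, kept returning without claiming the interrupts, so the kernel eventually switched the line off. Below is a simplified Python model of that check (note_interrupt() in kernel/irq/spurious.c), written for illustration only; the class and method names are hypothetical, and the time-based reset of the unhandled counter, shared lines, and threaded handlers are left out:

```python
# Simplified model of the spurious-interrupt check in note_interrupt()
# (kernel/irq/spurious.c). Omitted for brevity: the time-based reset of the
# unhandled counter, shared IRQ lines, and threaded handlers.
WINDOW = 100_000          # interrupts per sampling window
UNHANDLED_LIMIT = 99_900  # more unhandled than this in one window disables the line


class IrqLine:
    """Hypothetical stand-in for the kernel's per-IRQ bookkeeping."""

    def __init__(self, irq: int) -> None:
        self.irq = irq
        self.count = 0        # interrupts seen in the current window
        self.unhandled = 0    # interrupts no handler claimed in the current window
        self.disabled = False

    def note_interrupt(self, handled: bool) -> None:
        if self.disabled:
            return
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < WINDOW:
            return
        # Window complete: judge it, then start a fresh one.
        self.count = 0
        if self.unhandled > UNHANDLED_LIMIT:
            print(f'irq {self.irq}: nobody cared (try booting with the "irqpoll" option)')
            print(f"Disabling IRQ #{self.irq}")
            self.disabled = True
        self.unhandled = 0
```

In this model, a line whose handler never claims an interrupt is disabled at the end of its first full window, while a line that handles even slightly more than 0.1% of its interrupts (so that the unhandled count stays at or below 99,900 per window) stays enabled.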

Thadeu Lima de Souza Cascardo (cascardo) wrote :

I believe Arkanon mentioned that the new kernel fixes the graphics problem because it includes the fixes from 5.3.8 described in LP: #1847980, which are also in the kernel in -proposed. However, because of this issue, Arkanon could probably not test the graphics fix without being able to boot at all.

Andrej Gelenberg (andrej-gelenberg) wrote :

Hi Cascardo,

works like a treat. Boots fine with your package.

Thadeu Lima de Souza Cascardo (cascardo) wrote :

Andrej, by the way, did you get a chance to test the kernel I built?

Thanks.
Cascardo.

Andrej Gelenberg (andrej-gelenberg) wrote :

I already wrote to you: I did, and it works.

Gianfranco Costamagna (costamagnagianfranco) wrote :
Gianfranco Costamagna (costamagnagianfranco) wrote :

Also, I can say:
Disabling the TPM in the BIOS works as a workaround.
Kernel 5.3.0-19 was fine, 5.3.0-20 was not, and nothing works up to 5.3.0-24.
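Going only by the reports in this thread (5.3.0-19 boots; 5.3.0-20 up to 5.3.0-24 hang on machines with the TPM enabled), a check of the running kernel against that range could look like the minimal Python sketch below. The range comes from these comments, not from an official advisory, and the helper names are hypothetical. It deliberately covers only the Ubuntu 5.3.0 ABI range; the hang was also reproduced on vanilla 5.4.0, which this check does not flag:

```python
import re

# Range taken from the comments in this report, not from an official advisory:
# 5.3.0-19 boots, while 5.3.0-20 up to 5.3.0-24 were reported to hang.
LAST_GOOD_ABI = 19
LAST_REPORTED_BAD_ABI = 24

_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release: str) -> tuple[int, ...]:
    """Split an Ubuntu release string such as '5.3.0-22-generic' into numbers."""
    match = _RELEASE_RE.match(release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_bad_range(release: str) -> bool:
    """True if the release falls in the range reported as hanging in this thread."""
    major, minor, patch, abi = parse_release(release)
    return (major, minor, patch) == (5, 3, 0) and LAST_GOOD_ABI < abi <= LAST_REPORTED_BAD_ABI
```

Usage: `in_reported_bad_range(platform.release())`, where `platform.release()` returns the same string as `uname -r`.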

Gianfranco Costamagna (costamagnagianfranco) wrote :

My blame list contains:
    - tpm_tis_core: Turn on the TPM before probing IRQ's
    - tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts
    - tpm: Wrap the buffer from the caller to tpm_buf in tpm_send()

Gianfranco Costamagna (costamagnagianfranco) wrote :

I can confirm the fix worked on another ThinkPad device with the TPM enabled in the BIOS.

tags: added: regression-update
tags: added: regression-proposed
Thadeu Lima de Souza Cascardo (cascardo) wrote :

Hi, Andrej and Gianfranco.

Thanks a lot for your help testing this.

We were monitoring different reports about boot issues with 5.3.0-23, but we were not sure they were about the same issue. In fact, some of those reports were about linux-hwe-edge 5.3.0-23, which didn't have signed modules because of a difference in the bionic tools, so we wanted to identify which reports were which before marking bugs as duplicates.

Because the bug I marked this one as a duplicate of (LP: #1852586) received its confirmation first (or so I noticed), I ended up submitting the fix to the mailing list with that bug as the BugLink.

So this is now Fix Committed and should land in -proposed by next week. Sorry for the trouble caused, and thanks again for all the help identifying and testing these reverts.

Cascardo.

Warren Kumari (wkumari) wrote :

Cascardo's revert (https://people.canonical.com/~cascardo/tpmrevert1/) also works on a Panasonic Toughbook FZ-55 (thanks!). Annoyingly, the BIOS doesn't provide a way to disable the TPM on this device.

It would hang just after enumerating USB devices (which doesn't really seem to match other people's symptoms, but...).

mauro (m-dignazio) wrote :

I can confirm the fix with the packages in:

[1] https://people.canonical.com/~cascardo/tpmrevert1/

My post about the same problem is:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1858498

thanks
