sig=0x806ec/20200609, Hard lockups using microcode releases 20191115 on Intel Whiskey Lake

Bug #1862751 reported by You-Sheng Yang
This bug affects 12 people
Affects                   Status     Importance  Assigned to  Milestone
HWE Next                  Invalid    Undecided   Unassigned
OEM Priority Project      Confirmed  Medium      Unassigned
intel-microcode (Ubuntu)  Confirmed  Critical    Unassigned

Bug Description

This hangs Linux on warm and cold boot for all kernel versions, with a failure rate that varies by version. With v5.2 or older it is almost 100% reproducible; with v5.3 or newer it is much less likely. With the kernel boot parameters `initcall_debug ignore_loglevel=1 earlyprintk=efi earlycon=efifb`, the lockup is mostly seen inside lock access in `console_init()`, but it sometimes also happens before the early console is even initialized. On Ubuntu Bionic, the first known failing version of the intel-microcode package is 3.20191115.1ubuntu0.18.04.1. Reverting this package to any prior revision makes the hard lockup go away. Another possible temporary workaround is to disable SMP by passing `nosmp` to the kernel.

This is currently reproducible on Intel i5-8365U, model 142, stepping 12.
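
For readers matching this CPU against the microcode files, the sketch below shows how model 142, stepping 12 maps to the signature `0x806ec` from the bug title and to the file name `06-8e-0c` under `intel-ucode/`. Family 6 is an assumption here (it is not stated in this description, only implied by the `06-` prefix of the file name), and the function names are hypothetical.

```python
def cpuid_signature(family: int, model: int, stepping: int) -> int:
    """Pack display family/model/stepping into the CPUID leaf 1 EAX layout."""
    if family >= 15:
        base_family, ext_family = 15, family - 15
    else:
        base_family, ext_family = family, 0
    # The extended model bits are only used for family 6 and family 15+.
    ext_model = model >> 4 if (family == 6 or family >= 15) else 0
    return (
        (ext_family << 20)
        | (ext_model << 16)
        | (base_family << 8)
        | ((model & 0xF) << 4)
        | (stepping & 0xF)
    )


def ucode_filename(family: int, model: int, stepping: int) -> str:
    """File name under intel-ucode/: family-model-stepping, two hex digits each."""
    return f"{family:02x}-{model:02x}-{stepping:02x}"


# Family 6 is assumed; model and stepping are taken from the description above.
print(hex(cpuid_signature(6, 142, 12)), ucode_filename(6, 142, 12))
```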

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: intel-microcode 3.20191115.1ubuntu0.18.04.2
ProcVersionSignature: Ubuntu 5.0.0-1037.42-oem-osp1 5.0.21
Uname: Linux 5.0.0-1037-oem-osp1 x86_64
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gdm 1229 F.... pulseaudio
                      u 1855 F.... pulseaudio
Date: Tue Feb 11 05:10:57 2020
DistributionChannelDescriptor:
 # This is the distribution channel descriptor for the OEM CDs
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-bionic-amd64-20190418-59+beaver-osp1-gilly+X27
InstallationDate: Installed on 2020-02-10 (0 days ago)
InstallationMedia: Ubuntu 18.04 "Bionic" - Build amd64 LIVE Binary 20190418-12:10
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 0a5c:5843 Broadcom Corp.
 Bus 001 Device 002: ID 0bda:5532 Realtek Semiconductor Corp.
 Bus 001 Device 004: ID 8087:0029 Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. Latitude 5410
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.0.0-1037-oem-osp1 root=UUID=7b079ce4-5c1d-401e-b847-d2c8b8fc939b ro initcall_debug ignore_loglevel=1 earlyprintk=efi earlycon=efifb nosmp
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-5.0.0-1037-oem-osp1 N/A
 linux-backports-modules-5.0.0-1037-oem-osp1 N/A
 linux-firmware 1.173.14
SourcePackage: intel-microcode
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/29/2020
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 0.0.7
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr0.0.7:bd01/29/2020:svnDellInc.:pnLatitude5410:pvr:rvnDellInc.:rn:rvr:cvnDellInc.:ct10:cvr:
dmi.product.family: Latitude
dmi.product.name: Latitude 5410
dmi.product.sku: 09C9
dmi.sys.vendor: Dell Inc.

You-Sheng Yang (vicamo) wrote :
description: updated
You-Sheng Yang (vicamo) wrote :

Filed upstream bug as https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/24

microcode: sig=0x806ec, pf=0x80, revision=0xc6 # good
microcode: sig=0x806ec, pf=0x80, revision=0xca # hang
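
A minimal sketch for checking a kernel log line against the two revisions listed above. The `REPORTED` table only restates the good/hang labels from this comment; the function names are hypothetical and the regular expression assumes the exact `microcode: sig=..., pf=..., revision=...` format shown here.

```python
import re

# Revisions for sig=0x806ec as labelled in this comment; not an official list.
REPORTED = {0xC6: "good", 0xCA: "hang"}

LINE_RE = re.compile(
    r"microcode: sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), "
    r"revision=(0x[0-9a-fA-F]+)"
)


def parse_microcode_line(line: str):
    """Return sig, pf and revision as integers, or None if the line differs."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    sig, pf, revision = (int(group, 16) for group in match.groups())
    return {"sig": sig, "pf": pf, "revision": revision}


def classify(line: str, sig: int = 0x806EC) -> str:
    """Label a log line as 'good', 'hang' or 'unknown' using REPORTED."""
    info = parse_microcode_line(line)
    if info is None or info["sig"] != sig:
        return "unknown"
    return REPORTED.get(info["revision"], "unknown")
```

Feeding it the output of `dmesg | grep microcode` line by line would be one way to use it; any revision not named in this comment is reported as `unknown` rather than guessed.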

tags: added: oem-priority originate-from-1855642 somerville
You-Sheng Yang (vicamo) wrote :

PPA that reverts `intel-ucode/06-8e-0c`: https://launchpad.net/~vicamo/+archive/ubuntu/ppa-1862751

You-Sheng Yang (vicamo) wrote :

Previous reboot hang issue for some Skylake: bug 1854764. Trying to reach @sbeattie.

You-Sheng Yang (vicamo) wrote :

Rebuilt packages for F/E/B/X in PPA https://launchpad.net/~vicamo/+archive/ubuntu/ppa-1862751 and verified the locally built binaries for all series. Needs review and landing.

You-Sheng Yang (vicamo) wrote :

Another possible duplicate in bug 1858810.

Changed in oem-priority:
status: New → Confirmed
importance: Undecided → Critical
You-Sheng Yang (vicamo)
Changed in intel-microcode (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
You-Sheng Yang (vicamo) wrote :

Updated 3.20191115.1ubuntu0.19.10.4, 3.20191115.1ubuntu4 after bug 1862938.

Steve Beattie (sbeattie) wrote :

Hi You-Sheng Yang, are you still seeing this issue after the release of the 20200609 microcode update? Particularly after a warm reboot?

Bug 1883002 looks to be the same processor id and reports similar instability with the 20200609 microcode, particularly after a warm reboot.

Thanks.

You-Sheng Yang (vicamo) wrote :

Yes, this is fixed by the 20200609 release:

  microcode: microcode updated early to revision 0xd6, date = 2020-04-23
  microcode: sig=0x806ec, pf=0x80, revision=0xd6
  microcode: Microcode Update Driver: v2.2.

You-Sheng Yang (vicamo) wrote :

Verified with 3.20200609.0ubuntu0.18.04.1 by the way.

Changed in intel-microcode (Ubuntu):
status: Triaged → Fix Released
Changed in oem-priority:
status: Confirmed → Fix Released
Changed in hwe-next:
status: New → Fix Released
You-Sheng Yang (vicamo) wrote :

Unfortunately, after some more tries, this still happens with kernels of all series.

Changed in hwe-next:
status: Fix Released → Confirmed
Changed in oem-priority:
status: Fix Released → Confirmed
Changed in intel-microcode (Ubuntu):
status: Fix Released → Confirmed
Paul Menzel (paulmenzel) wrote :

Trying to find out whether bug #1883065 [1] is a duplicate of this one.

Is this a laptop? If yes, does having the power cable plugged in make a difference?

On a Dell Precision 3540 with a dedicated AMD graphics card, booting with `maxcpus=1` and then bringing more CPUs online always hangs after one or two additional CPUs when on battery. With the power cable plugged in it often works, but I have also seen it fail. I reported the issue to the Linux Kernel Mailing List as *Intel laptop: Starting with `maxcpus=1` and then bringing other CPUs online freezes system* [2].

[1]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883065
[2]: https://lkml.org/lkml/2020/6/11/474
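
A rough sketch of the "bring more CPUs online" step after booting with `maxcpus=1`, using the kernel's CPU hotplug files (`/sys/devices/system/cpu/cpuN/online`). The function name, the delay and the logging are assumptions for illustration, not the exact procedure used in the test above. It needs root on a real system, and on an affected machine it may hang the system.

```python
import time
from pathlib import Path


def online_cpus_one_by_one(sysfs_root="/sys/devices/system/cpu",
                           delay=1.0, log=print):
    """Write '1' to each offline CPU's 'online' file, lowest CPU number first.

    Returns the names of the CPUs this call brought online.
    """
    root = Path(sysfs_root)
    # Only CPUs with an 'online' control file can be hotplugged.
    cpu_dirs = sorted(
        (p for p in root.glob("cpu[0-9]*") if (p / "online").exists()),
        key=lambda p: int(p.name[3:]),
    )
    brought_up = []
    for cpu in cpu_dirs:
        control = cpu / "online"
        if control.read_text().strip() == "1":
            continue
        # Log before writing so the last message names the CPU that hung.
        log(f"bringing {cpu.name} online")
        control.write_text("1")
        brought_up.append(cpu.name)
        time.sleep(delay)
    return brought_up

# Example (as root): online_cpus_one_by_one(delay=2.0)
```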

Paul Menzel (paulmenzel) wrote :

PS: The Linux command line parameter `dis_ucode_ldr` can be used to disable applying microcode updates [1].

[1]: https://lkml.org/lkml/2020/6/11/651
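
To see which of the parameters mentioned in this thread (`nosmp`, `maxcpus=1`, `dis_ucode_ldr`) a given boot actually used, something like the following could be run against `/proc/cmdline`. This is only a sketch with hypothetical function names; it matches whole whitespace-separated tokens and does no further parsing.

```python
def workaround_params(cmdline: str) -> dict:
    """Report which workaround parameters from this thread are on the command line."""
    tokens = cmdline.split()
    return {
        "nosmp": "nosmp" in tokens,
        "maxcpus=1": "maxcpus=1" in tokens,
        "dis_ucode_ldr": "dis_ucode_ldr" in tokens,
    }


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line."""
    with open(path) as handle:
        return handle.read().strip()

# Example: print(workaround_params(read_cmdline()))
```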

You-Sheng Yang (vicamo) wrote :

This is a laptop, and yes, disabling SMP is a known workaround. I didn't check whether it's related to AC/battery. Upstream for the intel-microcode package is https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files, and this has been reported as https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files/issues/24.

You-Sheng Yang (vicamo)
summary: - Hard lockups using microcode releases 20191115 on Intel Whiskey Lake
+ sig=0x806ec/20200609, Hard lockups using microcode releases 20191115 on
+ Intel Whiskey Lake
Henrique de Moraes Holschuh (hmh) wrote :

Debian bug report:
https://bugs.debian.org/962757

Facts from the Debian bug report:
The hang occurs only on rev 0xd6 (20200609); previous revisions (including rev 0xca) work. Since the bug doesn't always trigger, maybe rev 0xd6 just made it easier to trigger.

Changed in oem-priority:
importance: Critical → Medium
Timo Aaltonen (tjaalton) wrote :

What's the status here?

Changed in hwe-next:
status: Confirmed → Incomplete
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: Incomplete → Invalid