linux-image-4.15.0-177-generic freezes on the welcome screen

Bug #1973167 reported by Robert Schlabbach
This bug affects 25 people
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Invalid       Undecided   Unassigned
Bionic          Fix Released  High        Unassigned

Bug Description

== SRU Justification ==
[Impact]
Freeze at boot. intel-lpss tries to synchronously load the idma64 module
from an asynchronous context.

[Fix]
Let userspace load the module to avoid the deadlock.

[Test]
Users confirmed the proposed fix can solve the issue.

[Where problems could occur]
Userspace can handle module loading just fine, so there shouldn't be
any surprises.

== Original Bug Report ==
After updating to linux-image-4.15.0-177-generic, my machine completely freezes on the Ubuntu Welcome screen, i.e. right after switching to GUI mode. The mouse pointer is frozen, the keyboard does not even respond to CAPS LOCK or NUM LOCK, pressing CTRL+ALT+F2 shows no reaction.

I cannot work with my machine at this point and have to hard reset it.

Selecting advanced boot options in grub and selecting the previous linux kernel 4.15.0-176 makes it work again. So this bad bug was introduced with the 4.15.0-177 kernel release.

I tried removing the nvidia-driver-510 package, presumably making Ubuntu use the "nouveau" driver, and with that, I could use the welcome screen and log in, but the machine still froze shortly afterwards.

So maybe this is some sort of interference with my GeForce 1080 graphics card, but it is not specific to the driver used.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-177-generic:amd64 4.15.0-177.186
ProcVersionSignature: Ubuntu 4.15.0-176.185-generic 4.15.18
Uname: Linux 4.15.0-176-generic x86_64
NonfreeKernelModules: nvidia_drm nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7.27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: robert 2082 F.... pulseaudio
 /dev/snd/controlC0: robert 2082 F.... pulseaudio
 /dev/snd/controlC2: robert 2082 F.... pulseaudio
CurrentDesktop: Unity:Unity7:ubuntu
Date: Thu May 12 13:36:28 2022
EcryptfsInUse: Yes
InstallationDate: Installed on 2015-10-29 (2386 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
MachineType: Supermicro Super Server
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-176-generic root=UUID=7c296e4d-0189-43a0-aef8-7d536b98a536 ro
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-176-generic N/A
 linux-backports-modules-4.15.0-176-generic N/A
 linux-firmware 1.173.21
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2019-07-29 (1017 days ago)
dmi.bios.date: 09/18/2020
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3.4
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: X11SAE
dmi.board.vendor: Supermicro
dmi.board.version: 1.01
dmi.chassis.asset.tag: To be filled by O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 0123456789
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.4:bd09/18/2020:svnSupermicro:pnSuperServer:pvr0123456789:rvnSupermicro:rnX11SAE:rvr1.01:cvnSupermicro:ct17:cvr0123456789:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: Super Server
dmi.product.version: 0123456789
dmi.sys.vendor: Supermicro

Robert Schlabbach (robert-s-t) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Robert Schlabbach (robert-s-t) wrote :

After finding someone reporting the same issue (https://forums.linuxmint.com/viewtopic.php?t=373747&p=2175201) and reading that the machine may come alive after a while, I found that the machine indeed works after 3 minutes. dmesg reveals that it was apparently the initialization of the intel_lpss driver that took 3 minutes:

[ 6.889189] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[ 7.112397] input: PC Speaker as /devices/platform/pcspkr/input/input5
[ 7.220480] RAPL PMU: API unit is 2^-32 Joules, 5 fixed counters, 655360 ms ovfl timer
[ 7.220481] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[ 7.220481] RAPL PMU: hw unit of domain package 2^-14 Joules
[ 7.220482] RAPL PMU: hw unit of domain dram 2^-14 Joules
[ 7.220482] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[ 7.220483] RAPL PMU: hw unit of domain psys 2^-14 Joules
[ 9.882380] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
[ 10.068162] IPv6: ADDRCONF(NETDEV_UP): eno1: link is not ready
[ 10.070009] IPv6: ADDRCONF(NETDEV_UP): eno2: link is not ready
[ 10.115669] IPv6: ADDRCONF(NETDEV_UP): eno2: link is not ready
[ 10.117735] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 10.615555] IPv6: ADDRCONF(NETDEV_UP): enp2s0: link is not ready
[ 14.114836] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 14.114883] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[ 15.199994] atlantic: link change old 0 new 1000
[ 15.200150] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0: link becomes ready
[ 187.423062] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
[ 187.432242] mei_me 0000:00:16.0: enabling device (0000 -> 0002)
[ 187.432264] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 187.441672] AVX2 version of gcm_enc/dec engaged.
[ 187.441673] AES CTR mode by8 optimization enabled
[ 187.445619] idma64 idma64.0: Found Intel integrated DMA 64-bit
[ 187.452528] idma64 idma64.1: Found Intel integrated DMA 64-bit

Note these lines:
[ 6.889189] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[ 187.423062] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
[ 187.432242] mei_me 0000:00:16.0: enabling device (0000 -> 0002)
[ 187.445619] idma64 idma64.0: Found Intel integrated DMA 64-bit
[ 187.452528] idma64 idma64.1: Found Intel integrated DMA 64-bit

For comparison, the same lines from the 4.15.0-176 dmesg:

[ 6.321873] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[ 6.340549] idma64 idma64.0: Found Intel integrated DMA 64-bit
[ 6.345409] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
[ 6.345610] idma64 idma64.1: Found Intel integrated DMA 64-bit
[ 6.350618] mei_me 0000:00:16.0: enabling device (0000 -> 0002)

So the initialization time increased from 30 ms to 180 s, i.e. by a factor of 6,000. This cannot be right.
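
The factor can be checked directly from the dmesg timestamps quoted above. A minimal Python sketch, assuming the usual "[ seconds.micros]" prefix on each dmesg line; the helper name span_seconds is hypothetical, and the sample lines are copied from the two logs in this comment:

```python
import re

# Matches the "[  seconds.micros]" prefix that dmesg puts on each line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def span_seconds(lines):
    """Return the time in seconds between the first and last timestamped line."""
    stamps = [float(m.group(1)) for m in map(TIMESTAMP.match, lines) if m]
    return stamps[-1] - stamps[0]


# First and last line of the 4.15.0-176 excerpt (fast case).
good = [
    "[ 6.321873] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)",
    "[ 6.350618] mei_me 0000:00:16.0: enabling device (0000 -> 0002)",
]
# First and last line of the 4.15.0-177 excerpt (slow case).
bad = [
    "[ 6.889189] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)",
    "[ 187.452528] idma64 idma64.1: Found Intel integrated DMA 64-bit",
]

print(f"176: {span_seconds(good) * 1000:.1f} ms")
print(f"177: {span_seconds(bad):.1f} s")
print(f"ratio: {span_seconds(bad) / span_seconds(good):.0f}x")
```

The exact ratio depends on which pair of lines is taken as start and end; with the lines above it lands a little above 6,000.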

Robert Schlabbach (robert-s-t) wrote :

After grepping dmesg for all lines about the "00:15" devices, I found two lines starting with "DMAR:", which reminded me of another machine running Debian that had issues with "DMAR:" devices related to the Intel IOMMU.

So I tried the workaround I knew from there:

Edit /etc/default/grub and add "intel_iommu=off" to the GRUB_CMDLINE_LINUX_DEFAULT value, then run "sudo update-grub" to update the grub configuration and reboot.

and voila, intel-lpss initializes within a few milliseconds again:

[ 6.518592] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[ 6.560693] idma64 idma64.0: Found Intel integrated DMA 64-bit
[ 6.569509] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
[ 6.569691] idma64 idma64.1: Found Intel integrated DMA 64-bit
[ 6.574164] mei_me 0000:00:16.0: enabling device (0000 -> 0002)

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.15.0-177-generic root=UUID=<blabla> ro intel_iommu=off
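
To confirm after a reboot that the parameter actually reached the kernel, the command line can be split into name/value pairs. A minimal Python sketch; kernel_params is a hypothetical helper, and the simple whitespace split assumes no quoted values containing spaces:

```python
def kernel_params(cmdline):
    """Split a kernel command line into a {name: value} dict (bare flags map to None)."""
    params = {}
    for token in cmdline.split():
        # Split only at the first "=", so "root=UUID=..." keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


# On a live system the string would come from open("/proc/cmdline").read();
# here it is the line shown above.
cmdline = "BOOT_IMAGE=/boot/vmlinuz-4.15.0-177-generic root=UUID=<blabla> ro intel_iommu=off"
params = kernel_params(cmdline)
print(params.get("intel_iommu"))
```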

So some change between 4.15.0-176 and 4.15.0-177 seemingly broke the Intel IOMMU. Question is whether it was a formerly "dormant" BIOS bug that was only unveiled by some change, or whether it is a newly introduced Linux bug that broke Intel IOMMU support...

Robert Schlabbach (robert-s-t) wrote :

Alas! I spoke too soon. The seeming "workaround" only lasted for one boot, and after that, the ~180 seconds delay is back to stay, although the Intel IOMMU is still disabled.

So this issue is "sporadic". Maybe even a race condition... :(

And maybe not even related to the intel-lpss driver: The line:

"intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)"

seems to come from drivers/pci/setup-res.c:

 if (cmd != old_cmd) {
  dev_info(&dev->dev, "enabling device (%04x -> %04x)\n",
    old_cmd, cmd);
  pci_write_config_word(dev, PCI_COMMAND, cmd);
 }

So it just enables the PCI device, which at some point leads to drivers/mfd/intel-lpss.c#intel_lpss_probe() being called. That requests the DMA module, leading to drivers/dma/idma64.c#idma64_probe() being called, which finally outputs:

dev_info(chip->dev, "Found Intel integrated DMA 64-bit\n");

So a lot of code runs between these two log lines:

[ 6.439056] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[ 187.141427] idma64 idma64.0: Found Intel integrated DMA 64-bit

Andy Townsend (someoneelseosm) wrote :

A similar issue here, different hardware and graphics to above machine. Dell 5000, Ubuntu 18.04, boots OK off "4.15.0-176-generic" but with "Linux 4.15.0-177-generic" hangs at the welcome screen.

"sudo lspci -vnvn" (when booted off 4.15.0-176-generic) returns:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5914] (rev 08)
 Subsystem: Dell Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [1028:0808]
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
 Latency: 0
 Capabilities: [e0] Vendor Specific Information: Len=10 <?>
 Kernel driver in use: skl_uncore

00:02.0 VGA compatible controller [0300]: Intel Corporation UHD Graphics 620 [8086:5917] (rev 07) (prog-if 00 [VGA controller])
 Subsystem: Dell UHD Graphics 620 [1028:0808]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 125
 Region 0: Memory at d0000000 (64-bit, non-prefetchable) [size=16M]
 Region 2: Memory at c0000000 (64-bit, prefetchable) [size=256M]
 Region 4: I/O ports at f000 [size=64]
 [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
 Capabilities: [40] Vendor Specific Information: Len=0c <?>
 Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
  DevCap: MaxPayload 128 bytes, PhantFunc 0
   ExtTag- RBE+
  DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
   RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
   MaxPayload 128 bytes, MaxReadReq 128 bytes
  DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
  DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
  DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
 Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
  Address: fee00018 Data: 0000
 Capabilities: [d0] Power Management version 2
  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [100 v1] Process Address Space ID (PASID)
  PASIDCap: Exec- Priv-, Max PASID Width: 14
  PASIDCtl: Enable- Exec- Priv-
 Capabilities: [200 v1] Address Translation Service (ATS)
  ATSCap: Invalidate Queue Depth: 00
  ATSCtl: Enable-, Smallest Translation Unit: 00
 Capabilities: [300 v1] Page Request Interface (PRI)
  PRICtl: Enable- Reset-
  PRISta: RF- UPRGI- Stopped+
  Page Request Capacity: 00008000, Page Request Allocation: 00000000
 Kernel driver in use: i915
 Kernel modules: i915

00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 08)
 Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [1028:0808]
 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66M...

Uwe G. (uwe-007) wrote :

I'm affected too.

Kubuntu 18.04, Acer Aspire 5 (A517-51G-5826), NVIDIA GeForce MX130

piotr (piotr-sawicki) wrote :

I'm affected too

Ubuntu 18.04.6 LTS, Asus UX305UA

John F (nunsuch) wrote :

Also affected - Asus UX430UA Intel i5 running Ubuntu 18.04.6
Noticed a 3-minute delay when processing UFW and possibly boot/efi. Sound is also affected; other functions not checked.
Reverting to 4.15.0-176 resolves the problem
Not sure what log to upload - please advise.

Yury Krasouski (krasoffski) wrote :

Have the same problem on ThinkPad T480s and 18.04.6

Robert Schlabbach (robert-s-t) wrote :

Trying to isolate the issue, I:

1. Booted my portable Ubuntu 18.04.6 installation on a different machine (different CPU, chipset, but still Intel). On the other machine, kernel 4.15.0-177 booted without issues, so apparently it does not depend on the installation, but rather on the hardware or BIOS whether the freeze occurs or not.

2. Replaced the entire /lib/modules/4.15.0-177-generic/ with the contents from /lib/modules/4.15.0-176-generic/ and rebuilt initramfs. This did NOT cure the freezes (only caused a lot of module signature errors, confirming that the modules really were replaced). So whatever is causing the freezes is not in one of the loadable modules, but rather within the kernel itself or the builtin modules.

Still, the 176-to-177 diff over the builtin modules is HUGE... :-/

Uwe G. (uwe-007) wrote :

I just installed kernel 4.15.0-180 ... no change ... problem is still there :-(

Andy Townsend (someoneelseosm) wrote :

> I just installed kernel 4.15.0-180 ... problem is still there

Also for me - no problem in -176, but problems in -177 and -180.

If there are logs that would be useful to see, just ask.

Robert Schlabbach (robert-s-t) wrote :

I have found a change between 176 and 177 that looks suspicious:

diff -upr --color 176/linux-source-4.15.0/kernel/module.c 177/linux-source-4.15.0/kernel/module.c
--- 176/linux-source-4.15.0/kernel/module.c 2022-03-29 19:39:48.000000000 +0200
+++ 177/linux-source-4.15.0/kernel/module.c 2022-04-14 22:22:50.000000000 +0200
@@ -3520,22 +3514,13 @@ static noinline int do_init_module(struc

- /*
- * We need to finish all async code before the module init sequence
- * is done. This has potential to deadlock. For example, a newly
- * detected block device can trigger request_module() of the
- * default iosched from async probing task. Once userland helper
- * reaches here, async_synchronize_full() will wait on the async
- * task waiting on request_module() and deadlock.
- *
- * This deadlock is avoided by perfomring async_synchronize_full()
- * iff module init queued any async jobs. This isn't a full
- * solution as it will deadlock the same if module loading from
- * async jobs nests more than once; however, due to the various
- * constraints, this hack seems to be the best option for now.
- * Please refer to the following thread for details.
- *
- * http://thread.gmane.org/gmane.linux.kernel/1420814
- */
- if (!mod->async_probe_requested && (current->flags & PF_USED_ASYNC))
+ /*
+ * We need to finish all async code before the module init sequence
+ * is done. This has potential to deadlock if synchronous module
+ * loading is requested from async (which is not allowed!).
+ *
+ * See commit 0fdff3ec6d87 ("async, kmod: warn on synchronous
+ * request_module() from async workers") for more details.
+ */
+ if (!mod->async_probe_requested)
   async_synchronize_full();

Maybe this is the deadlock we're all running into...?

Robert Schlabbach (robert-s-t) wrote :

My "prime suspect" is this commit:
https://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/commit/kernel/module.c?h=Ubuntu-4.15.0-177.186&id=3879f4364139acb2bd3932e6a15994f109c49d6b

Also see Linus' comments when this patch was submitted to the mainline kernel:
https://www.spinics.net/lists/kernel/msg4223720.html

"that might be a big deal slowing things down at boot time.
[...]
Comments? Maybe this is a "just apply it, see if somebody screams" situation?"

From what I understand, other measures taken in the kernel and modules make this issue no longer occur, so my hypothesis is that this may not have caused issues in the current kernel. So the mistake was that Canonical backported this commit to a very old kernel version that kernel.org no longer maintains...

Robert Schlabbach (robert-s-t) wrote :

FIXED IT! It is indeed the intel_lpss driver, at least in my case.

This commit:
https://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/commit/kernel/module.c?h=Ubuntu-4.15.0-177.186&id=3879f4364139acb2bd3932e6a15994f109c49d6b

will not work right when a module that is asynchronously loaded tries to synchronously load a module, which is not (or no longer?) allowed. It appears the intel_lpss driver did just that until this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/mfd/intel-lpss.c?id=569fac74627cc332a2097a7a4bfdc654b8e7f273

But this commit has not been backported to the 4.15 kernel, so the intel_lpss driver delivered with the 4.15.0-177 kernel package does not have it.

Applying this commit to the 4.15.0-177 source tree, rebuilding and replacing (only!) the intel_lpss.ko module makes the kernel load without delays for me. I confirmed the same with kernel 4.15.0-180.

So now we only need to convince the Ubuntu maintainers to backport https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/mfd/intel-lpss.c?id=569fac74627cc332a2097a7a4bfdc654b8e7f273 to the 4.15 kernel...
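
The circular wait described in this comment can be modelled in userspace. The following is only a sketch with Python threads, not kernel code: module_init and async_probe_worker are hypothetical stand-ins for do_init_module() (which waits for outstanding async work) and for an async probe that issues a synchronous module request. The timeout is an artifact of the simulation so that it terminates; the thread does not establish what ended the real ~180 s stall.

```python
import threading


def boot_stalls(sync_request, timeout=0.2):
    """Return True if the async worker's module request times out."""
    worker_done = threading.Event()    # set when the async probe worker finishes
    module_loaded = threading.Event()  # set when module init completes
    stalled = []

    def module_init():
        # Stand-in for do_init_module() calling async_synchronize_full():
        # module init does not finish until outstanding async work is done.
        worker_done.wait()
        module_loaded.set()

    def async_probe_worker():
        loader = threading.Thread(target=module_init)
        loader.start()
        if sync_request:
            # Stand-in for a synchronous module request: block until the
            # module is loaded. It cannot load while this worker is pending,
            # so only the timeout breaks the circular wait.
            stalled.append(not module_loaded.wait(timeout))
        else:
            # The worker does not wait for the module, so nothing blocks.
            stalled.append(False)
        worker_done.set()
        loader.join()

    worker = threading.Thread(target=async_probe_worker)
    worker.start()
    worker.join()
    return stalled[0]


print(boot_stalls(sync_request=True))   # worker and module init wait on each other
print(boot_stalls(sync_request=False))  # no synchronous wait, no stall
```

In this model, the sync_request=False case corresponds to a driver that no longer blocks on the module load from its async probe, which is the direction the referenced intel-lpss commit takes.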

Changed in linux (Ubuntu Bionic):
status: New → Confirmed
importance: Undecided → High

Jerome Pansanel (pansanel) wrote :

It also affects:
Kubuntu 18.04, Dell Latitude 7490.
I could boot by blacklisting the intel-lpss-pci module (in which case the mouse doesn't work) and loading it again after boot (after which the mouse works fine again).

msmalin (msmalin) wrote :

I'm on a Dell Precision 3520 with Kubuntu 18.04, I have the same issue with 4.15.0-177 and 4.15.0-180. This also seems to disable my sound, bluetooth, and wifi. Currently booting 4.15.0-176.

Alexis Scheuer (alexis-scheuer) wrote :

Hi!
After all, I am lucky: my Dell Latitude 5480 sometimes succeeds in starting Xubuntu 18.04.6 LTS normally, but often freezes on boot with "A start job is running for ..." (several messages of this type), both with linux-image-4.15.0-177-generic and linux-image-4.15.0-180-generic, while everything works fine with linux-image-4.15.0-166-generic (which I reinstalled).
I will give a try with 176.

Celso Macêdo (celsovsm) wrote :

Same problem here with ASUS VivoBook X510U

José Eduardo Esteves Filho (joseesteves) wrote :

Same problem here. Can't boot anything after 4.15.0-176. Boot sequence falls into the emergency mode or never boots.

Kai-Heng Feng (kaihengfeng) wrote :

Here's a test kernel with the commit:
https://people.canonical.com/~khfeng/lp1973167/

Thimo E (thimoe) wrote :

Thank you for your analysis and test kernel.
A lot of our machines (Supermicro X11 / Xeon W-2133 based) also suffer from the problem introduced by 4.15.0-177.

I can confirm that the kernel 4.15.0-182-generic #191+lp1973167 provided by Kai-Heng Feng fixes the issue on my HW.

Could you please proceed with the roll-out of this patch?

Joerg Mattiello (mjoe) wrote :

We are waiting for the new kernel too as we are currently blocked with new installations based on 18.04 LTS. Can you please give a date when it will be ready? Thanks!

Andy Townsend (someoneelseosm) wrote :

I have the same issue with 4.15.0-184 as with .177 and .180.

description: updated

Kai-Heng Feng (kaihengfeng) wrote :

Franck Iaropoli (franck-iaropoli-arm) wrote :

@Kai-Heng Feng thanks for your test kernel.
I can confirm that it works for our affected users :)

John F (nunsuch) wrote :

The problem continues with #180 and #184. Extended boot delay (3 minutes to complete), with the messages "(1 of 2) A start job is running for /boot/efi" (I don't know the exact delay) and "(2 of 2) A start job is running for Uncomplicated firewall" (takes 3 minutes to complete). Also no sound and no WiFi.

Kernel version #176 worked but was removed by an automatic update. However, version #177 seems to have stabilized and is now also working properly. I have temporarily disabled automatic updates to preserve version #177.

I can post logs if advised which one(s) to upload.

Victor Abend (vabend) wrote :

I'm having exactly the same problems. (Lenovo IdeaPad, Linux Mint 19)
Just tested with #187. Problems remain.
Last working Kernel was #176.

n3u20m4nc32 (n3u20m4nc32) wrote :

I have the same problem with an HP ProBook and can confirm that the problem is still there with kernel #187.

Andrew Lee Adams (andrew-lee-adams) wrote :

Same problem, Dell Precision 5510. I have paused updates, waiting for the fix to be released.

Uwe G. (uwe-007) wrote :

(I am not an expert.)
Can anyone tell me what the next steps are after "SRU requested" (#26) and how long this is likely to take?

Please post the version of the fixed kernel (if available), so that I know which one to update to.

Claudia (claudia-m) wrote :

Hi All,

We are also suffering with this problem (roughly 100 machines)
Kernel 4.15.0-182-generic #191+lp1973167 fixes this issue for me.

Any updates on the release schedule?

Thanks,
Claudia

Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed

zefciu (zefciu) wrote :

EDIT: nvm that was a fluke

Booted my Latitude 5490 with kernel version 188. I had problems with the previous three versions. Now everything seems to work fine.

Andreas Weber (aenduweber) wrote :

Kernel 188 does not fix it (extensive testing done), nor does the kernel changelog for 188 show an entry that would point to a fix for this bug.

Robert Schlabbach (robert-s-t) wrote :

Robert Schlabbach (robert-s-t) wrote :

It is in the pipeline for kernel 189 (4.15.0-189.200):
https://kernel.ubuntu.com/git/ubuntu/ubuntu-bionic.git/?h=Ubuntu-4.15.0-189.200

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/4.15.0-189.200 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

Andreas Weber (aenduweber) wrote :

Kernel 189 does fix the problem for us, so tag changed to 'verification-done-bionic'.

tags: added: verification-done-bionic
removed: verification-needed-bionic

Thimo E (thimoe) wrote :

I can also confirm kernel linux/4.15.0-189.200 from proposed fixes this issue.

Thank you for your support.

Robin Seidel (mrcode0x61) wrote :

The proposed fix (linux-image-4.15.0-189-generic/bionic-proposed) does NOT work for me as expected:

+ Could not observe delayed booting anymore
- but still no WiFi, Sound
- additionally, now the external monitor over USB-C does NOT work anymore either.

=> 176 is still the only way I get through my day.

Additional note: at least for 188 I can say that it sometimes worked (no delayed booting, WiFi, sound, etc.), but most of the time it did not. Therefore I strongly suspect a race condition / bad interleaving of some sort.

Device:
Ubuntu 18.04.6 on a Dell XPS 9560
Intel i7 7th Gen,
Nvidia GTX 1050M (though running on Intel HD Graphics 630)

This has been going on for nearly 2 months now, is there hope that this will be fixed or should I just move on to a newer distro / kernel 5.x ?

Johannès Jahan (jjahan) wrote :

I have the same issue and linux-image-4.15.0-189-generic did not solve it for me either (downloaded from https://ubuntu.pkgs.org/18.04/ubuntu-proposed-main-amd64/linux-image-4.15.0-189-generic_4.15.0-189.200_amd64.deb.html).

I still have no WiFi, no control over sound, brightness, or anything else, and HDMI output doesn't work.

My device :
Ubuntu 18.04.6 LTS
Asus Zenbook UX410UF-GV028T
Intel i7-8550U
Intel UHD Graphics 620

I don't know how to change the tag to 'verification-failed-bionic' though...

Robert Schlabbach (robert-s-t) wrote :

@Robin Seidel, @Johannès Jahan,

that was the risk, that there might be other drivers in the old 4.15 kernel that also still request a synchronous module load from an asynchronous context. The general change in the module loader exposes any such driver. My hope was that intel-lpss would be the only one, but apparently that is not the case.

What you could do is capture the output of "lsmod" in a working kernel (176) and then in this proposed kernel (189) and diff them. The 189 output should be missing some modules (or show them with fewer or 0 references). Those are the ones which failed to load due to the "wrong" way of doing things.

BTW, once the system is up, you can do "sudo modprobe <module name>" to load them. I have to do that in the 188 kernel (after waiting the 3 minutes) with "snd_hda_intel" to make the sound work.
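
The lsmod comparison suggested above can be sketched in Python. This assumes the standard three-column lsmod layout (module name, size, reference count, optionally followed by the users); parse_lsmod and suspicious_modules are hypothetical helpers, and the two sample listings are invented for illustration, not taken from any reporter's machine:

```python
def parse_lsmod(text):
    """Map module name -> reference count from lsmod output."""
    refs = {}
    for line in text.strip().splitlines()[1:]:  # skip the "Module Size Used by" header
        fields = line.split()
        if len(fields) >= 3:
            refs[fields[0]] = int(fields[2])
    return refs


def suspicious_modules(working, broken):
    """Modules that are missing, or have fewer references, in the broken kernel."""
    good, bad = parse_lsmod(working), parse_lsmod(broken)
    # A module absent from the broken listing counts as -1 references.
    return sorted(name for name, count in good.items() if bad.get(name, -1) < count)


# Invented sample data: lsmod captured under a working and a failing kernel.
working = """\
Module                  Size  Used by
snd_hda_intel          40960  3
snd_hda_codec         126976  1 snd_hda_intel
idma64                 20480  0
e1000e                249856  0
"""
broken = """\
Module                  Size  Used by
snd_hda_codec         126976  0
idma64                 20480  0
e1000e                249856  0
"""

print(suspicious_modules(working, broken))
```

With real data, the two strings would be the saved lsmod output from the 176 and 189 kernels; each name in the result is a candidate for a manual "sudo modprobe <module name>".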

Uwe G. (uwe-007) wrote :

I switched to HWE kernel. (5.4.0-121-generic)
Maybe this is helpful for someone here ...

https://wiki.ubuntu.com/Kernel/LTSEnablementStack

Kai-Heng Feng (kaihengfeng) wrote :

Seems like modules and headers packages are not installed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-189.200

---------------
linux (4.15.0-189.200) bionic; urgency=medium

  * bionic/linux: 4.15.0-189.200 -proposed tracker (LP: #1979525)

  * linux-image-4.15.0-177-generic freezes on the welcome screen (LP: #1973167)
    - mfd: intel-lpss: Use MODULE_SOFTDEP() instead of implicit request

  * Bionic update: upstream stable patchset 2022-06-03 (LP: #1977622)
    - etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead
    - mm: page_alloc: fix building error on -Werror=array-compare
    - tracing: Dump stacktrace trigger to the corresponding instance
    - gfs2: assign rgrp glock before compute_bitstructs
    - ALSA: usb-audio: Clear MIDI port active flag after draining
    - tcp: fix race condition when creating child sockets from syncookies
    - tcp: Fix potential use-after-free due to double kfree()
    - dmaengine: imx-sdma: Fix error checking in sdma_event_remap
    - net/packet: fix packet_sock xmit return value checking
    - netlink: reset network and mac headers in netlink_dump()
    - ARM: vexpress/spc: Avoid negative array index when !SMP
    - platform/x86: samsung-laptop: Fix an unsigned comparison which can never be
      negative
    - ALSA: usb-audio: Fix undefined behavior due to shift overflowing the
      constant
    - vxlan: fix error return code in vxlan_fdb_append
    - cifs: Check the IOCB_DIRECT flag, not O_DIRECT
    - brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
    - drm/msm/mdp5: check the return of kzalloc()
    - net: macb: Restart tx only if queue pointer is lagging
    - stat: fix inconsistency between struct stat and struct compat_stat
    - ata: pata_marvell: Check the 'bmdma_addr' beforing reading
    - dma: at_xdmac: fix a missing check on list iterator
    - powerpc/perf: Fix power9 event alternatives
    - openvswitch: fix OOB access in reserve_sfa_size()
    - ASoC: soc-dapm: fix two incorrect uses of list iterator
    - e1000e: Fix possible overflow in LTR decoding
    - ARC: entry: fix syscall_trace_exit argument
    - ext4: fix symlink file size not match to file content
    - ext4: fix overhead calculation to account for the reserved gdt blocks
    - ext4: force overhead calculation if the s_overhead_cluster makes no sense
    - staging: ion: Prevent incorrect reference counting behavour
    - block/compat_ioctl: fix range check in BLKGETSIZE
    - ax25: add refcount in ax25_dev to avoid UAF bugs
    - ax25: fix reference count leaks of ax25_dev
    - ax25: fix UAF bugs of net_device caused by rebinding operation
    - ax25: Fix refcount leaks caused by ax25_cb_del()
    - ax25: fix UAF bug in ax25_send_control()
    - ax25: fix NPD bug in ax25_disconnect
    - ax25: Fix NULL pointer dereferences in ax25 timers
    - ax25: Fix UAF bugs in ax25 timers
    - ASoC: atmel: Remove system clock tree configuration for at91sam9g20ek
    - net/sched: cls_u32: fix possible leak in u32_init_knode()
    - drm/panel/raspberrypi-touchscreen: Avoid NULL deref if not initialised
    - drm/panel/raspberrypi-touchscreen: Initialise the bridge in prepare

  * Bionic update: upstream stable patchset 2022-05-17 (LP: #197...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released

Andy Townsend (someoneelseosm) wrote :

The same system as https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1973167/comments/6 no longer freezes on boot with 4.15.0-189. Sound also works.

John F (nunsuch) wrote :

Problem resolved. I want to confirm that 4.15.0-189.200 fixed the problem. A big thank you to all those who worked on the fix -- much appreciated!
