4.18.0-14 doesn't boot past grub

Bug #1813657 reported by P.D.
This bug affects 15 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

4.18.0-14 doesn't boot past GRUB for me on a ThinkPad T510i. 4.18.0-13 worked fine. I had to revert to the 4.15 kernel. I'm using a distro based on Ubuntu 18.04.

System info:
CPU: Intel(R) Core(TM) i5 CPU M 520 @ 2.40GHz
        4 cores/threads
        2400.00 MHz
RAM: 7.59 GiB

I'm not sure how to provide crash info, as the kernel doesn't boot.
---
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
CurrentDesktop: X-Cinnamon
DistroRelease: Linux Mint 19.1
HibernationDevice: RESUME=UUID=fcc2ec6a-a02c-4188-ac0d-a8f924741962
InstallationDate: Installed on 2019-01-10 (18 days ago)
InstallationMedia: Linux Mint 19.1 "Tessa" - Release amd64 20181130
Lsusb:
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 0a5c:21e8 Broadcom Corp. BCM20702A0 Bluetooth 4.0
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: LENOVO 4349BR8
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-44-generic root=UUID=3478dfd0-edae-424a-a34f-9f230cb9cacc ro quiet splash
ProcVersionSignature: Ubuntu 4.15.0-44.47-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-44-generic N/A
 linux-backports-modules-4.15.0-44-generic N/A
 linux-firmware 1.173.3
Tags: tessa
Uname: Linux 4.15.0-44-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo vboxusers wireshark
_MarkForUpload: False
dmi.bios.date: 10/26/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6MET81WW (1.41 )
dmi.board.name: 4349BR8
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6MET81WW(1.41):bd10/26/2010:svnLENOVO:pn4349BR8:pvrThinkPadT510:rvnLENOVO:rn4349BR8:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.family: ThinkPad T510
dmi.product.name: 4349BR8
dmi.product.version: ThinkPad T510
dmi.sys.vendor: LENOVO

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1813657

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
P.D. (paed808) wrote : apport information (one comment per attachment)

AlsaInfo.txt
AudioDevicesInUse.txt
CRDA.txt
CurrentDmesg.txt
IwConfig.txt
Lspci.txt
ProcCpuinfoMinimal.txt
ProcEnviron.txt
ProcInterrupts.txt
ProcModules.txt
PulseList.txt
RfKill.txt
UdevDb.txt
WifiSyslog.txt

tags: added: apport-collected tessa
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Khaled El Mously (kmously) wrote :

@P.D.: Thanks for the bug report.

If 4.18.0-13 worked fine, why did you need to revert to a 4.15 kernel?

Would you be able to test some custom kernels that I provide you so we can try to narrow down the change that caused this? It should take 1-10 bisections to find out.

Thanks

P.D. (paed808) wrote :

@kmously I reverted back to 4.15 because I don't know of anything in 4.18 that would benefit me over 4.15, but I'm willing to test custom kernels because I want this to be fixed.

Khaled El Mously (kmously) wrote :

@P.D.: OK, I recommend switching to 4.18.0-13, since we'll be assuming that it's our last "good" kernel and we'll try to narrow down what's the issue between 4.18.0-13 and 4.18.0-14.

When you say "doesn't boot past grub", what exactly do you see after grub? Just a blank screen? Any kernel logs at all (or stack trace, emergency shell, etc.)?

P.D. (paed808) wrote :

Ok, I'll install 4.18.0-13 again.

When I say "it doesn't boot past grub", I mean that when I hit Enter to boot from the grub screen, it stays on the grub screen: I just see my grub background forever, and my disk activity light blinks a bit. I don't think it even said it was loading the ramdisk.

P.D. (paed808) wrote :

Never mind, it does say "loading ramdisk"; it just gets stuck doing it. I have a video of the problem if you want.

P.D. (paed808) wrote :

Recovery mode does boot with the problematic kernel, so I can drop into a root shell. Regular mode doesn't work.

Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v5.0-rc4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".

Thanks in advance.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.0-rc4/

P.D. (paed808) wrote :

5.0-rc4 mainline kernel boots up fine.

tags: added: kernel-fixed-upstream
Thomas Debesse (illwieckz) wrote :

Can you provide new dmesg output from the 4.18 kernel, now that you've discovered you can boot into recovery mode? Just dump it somewhere so you can retrieve it once you've booted graphically with another kernel.

Once you are at the recovery screen, select the option to enable networking (not for the network itself, but because it is known to also mount the filesystems in write mode), then select the option to open a root console, and type this:

dmesg > /var/log/dmesg.$(uname -r)

It will produce a file named like /var/log/dmesg.4.18.0-14-generic

Then reboot into a non-faulty kernel and attach this file here. You can delete it from your computer after that.

P.D. (paed808) wrote :

Attached is dmesg after booting the faulty kernel in recovery mode.

Török Edwin (edwintorok) wrote :

I tried to file a bug, but launchpad keeps timing out (see https://bugs.launchpad.net/ubuntu/+bug/1814585).

I think I have the same problem on Ubuntu 18.10.
4.18.0-13 was fine, 4.18.0-14 always crashes with BUG on boot.

Török Edwin (edwintorok) wrote :

Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.818888] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822564] [drm] RC6 disabled, disabling runtime PM support
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822604] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822613] PGD 0 P4D 0
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822623] Oops: 0000 [#1] SMP PTI
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822631] CPU: 0 PID: 176 Comm: systemd-udevd Not tainted 4.18.0-14-generic #15-Ubuntu
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822639] Hardware name: LENOVO 647814G/647814G, BIOS 7TET36WW (1.10 ) 05/11/2009
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822732] RIP: 0010:gen4_render_ring_flush+0x60/0x110 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822737] Code: 00 48 89 df e8 51 fe ff ff 48 3d 00 f0 ff ff 77 6c 44 89 20 48 8d 48 44 c7 40 04 02 40 00 7a 48 8b 53 78 48 8b 92 10 02 00 00 <48> 8b 52 08 48 c7 40 0c 00 00 00 00 83 ca 04 89 50 08 48 8d 50 14
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822817] RSP: 0018:ffffb15f80a3f978 EFLAGS: 00010287
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822825] RAX: ffffb15f90002000 RBX: ffff967c30438240 RCX: ffffb15f90002044
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822832] RDX: 0000000000000000 RSI: 00000000000001a8 RDI: 0000000000000150
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822838] RBP: ffffb15f80a3f988 R08: 0000000000000001 R09: 000000003c4a9d47
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822846] R10: 00000000181a474d R11: ffff967c30438240 R12: 0000000002000002
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822852] R13: ffff967c30438240 R14: ffff967c31389000 R15: ffff967c30dc8000
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822860] FS: 00007f8967b738c0(0000) GS:ffff967c3bc00000(0000) knlGS:0000000000000000
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822868] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822874] CR2: 0000000000000008 CR3: 0000000130cc4000 CR4: 00000000000006f0
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822881] Call Trace:
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822970] i915_request_alloc+0x24e/0x370 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823055] i915_gem_init+0x26b/0x470 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823131] i915_driver_load+0xab8/0xd80 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823143] ? mutex_lock+0x12/0x30
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823219] i915_pci_probe+0x46/0x60 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823229] local_pci_probe+0x46/0x90
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823238] pci_device_probe+0x11c/0x1a0
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823248] driver_probe_device+0x2e3/0x460
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823256] __driver_attach+0xe4/0x110
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823263] ? driver_probe_device+0x460/0x460
Feb 4 15:22:03 mama-tata-laptop ...
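
For comparing oops reports like the one above across machines, the faulting function and the call trace are the parts that matter. A minimal Python sketch for pulling them out of syslog or dmesg text follows; the regular expressions are assumptions based only on the line layout visible in this log, and `summarize_oops` and `SAMPLE` are hypothetical names, not part of any kernel tooling.

```python
import re

# "[ 1.822732] " style timestamp that precedes each kernel message.
TIMESTAMP_RE = re.compile(r"\[\s*\d+\.\d+\]\s*")
# symbol+0xoffset/0xsize, optionally followed by "[module]".
SYMBOL = r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:" + SYMBOL)
# Frames prefixed with "? " are unreliable guesses by the unwinder.
FRAME_RE = re.compile(r"(?P<guess>\? )?" + SYMBOL + r"\s*$")


def summarize_oops(lines):
    """Return (faulting function, module, reliable call-trace frames)."""
    func = module = None
    frames = []
    in_trace = False
    for raw in lines:
        # Keep only the message that follows the kernel timestamp, if any.
        text = TIMESTAMP_RE.split(raw, maxsplit=1)[-1].strip()
        if func is None and text.startswith("RIP:"):
            match = RIP_RE.match(text)
            if match:
                func, module = match.group("func"), match.group("module")
        elif text == "Call Trace:":
            in_trace = True
        elif in_trace:
            match = FRAME_RE.match(text)
            if match is None:
                in_trace = False  # first non-frame line ends the trace
            elif not match.group("guess"):
                frames.append(match.group("func"))
    return func, module, frames


# A few lines copied from the log above.
SAMPLE = """\
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822604] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822732] RIP: 0010:gen4_render_ring_flush+0x60/0x110 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822881] Call Trace:
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.822970] i915_request_alloc+0x24e/0x370 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823055] i915_gem_init+0x26b/0x470 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823131] i915_driver_load+0xab8/0xd80 [i915]
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823143] ? mutex_lock+0x12/0x30
Feb 4 15:22:03 mama-tata-laptop kernel: [ 1.823219] i915_pci_probe+0x46/0x60 [i915]
""".splitlines()

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Two reports that name the same faulting function and the same leading frames are likely, though not guaranteed, to be the same bug.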


Török Edwin (edwintorok) wrote :

From older, working kernels. Note that the FIFO underrun error is present in these successful boots too (I've been seeing it for years), but it never crashed on that before.

Khaled El Mously (kmously) wrote :

@P.D. Thanks for following up.

There are 869 commits between 4.18.0-13 and 4.18.0-14, so it should take up to 10 attempts to bisect the issue.
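
The "up to 10 attempts" figure follows from halving the range on every test: 869 candidates need at most ceil(log2(869)) = 10 tests. A small Python sketch of that arithmetic, with a simulated bisection to check it, is below. The function names and the position of the bad commit (600) are hypothetical, chosen only for illustration.

```python
def max_bisect_steps(n_commits):
    """Worst-case number of test kernels needed to isolate one bad commit.

    Equals ceil(log2(n_commits)), computed with integers only.
    """
    return (n_commits - 1).bit_length()


def bisect_first_bad(n_commits, is_bad):
    """Binary-search commits 1..n_commits (oldest first) for the first bad one.

    Assumes commit 0 (the last released good kernel) is good, commit
    n_commits is bad, and every commit after the first bad one is also bad.
    Returns (index of first bad commit, number of tests performed).
    """
    good, bad = 0, n_commits
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad, tests


if __name__ == "__main__":
    print(max_bisect_steps(869))
    # Hypothetical: pretend the regression entered at commit number 600.
    print(bisect_first_bad(869, lambda i: i >= 600))
```

The actual number of kernels can be lower than the worst case, depending on where the offending commit sits in the range.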

I have the first test kernel for you based on commit

40208c782496 9p locks: fix glock.client_id leak in do_lock

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-40208c7-zMA5/

Please let me know if this kernel shows the same problem or not. Thanks

Khaled El Mously (kmously) wrote :

@edwintorok: I don't know if it's certain that the issue you're seeing is necessarily the same issue described in the bug report, but feel free to test the same kernels as well and report your result. If you see different results from P.D. then we can continue the investigation of the second problem under another bug report. Thanks.

P.D. (paed808) wrote :

@kmously The kernel you provided booted up, but it doesn't seem like the Intel video drivers worked in that kernel (everything was low resolution, and my desktop environment, Cinnamon, ran in software rendering mode). I only installed these packages, though:

linux-headers-4.18.0-14_4.18.0-14.15_all.deb
linux-headers-4.18.0-14-generic_4.18.0-14.15_amd64.deb
linux-image-unsigned-4.18.0-14-generic_4.18.0-14.15_amd64.deb
linux-modules-4.18.0-14-generic_4.18.0-14.15_amd64.deb

Khaled El Mously (kmously) wrote :

@P.D. Thanks for the info.

We're only concerned with the boot issue here. The video driver problem could be due to a missing module that needs to be installed (e.g. from linux-modules-extra ) or a dkms driver.

I have the second test kernel for you based on commit

5a7784d6a34f scsi: hisi_sas: unmask interrupts ent72 and ent74

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-5a7784d-Yadl/

Please let me know if this kernel shows the same problem or not. Thanks

P.D. (paed808) wrote :

@kmously That kernel does not boot up.

Khaled El Mously (kmously) wrote :

@P.D. Thanks for the info.

I have the third kernel for you based on commit:

94785f13e73c efi/arm/libstub: Pack FDT after populating it

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-94785f1-kUm4/

Please let me know if this kernel shows the same problem or not. Thanks

P.D. (paed808) wrote :

@kmously That kernel boots up fine, and you were right about the video driver problem: it was solved by installing the modules-extra package (at least for that kernel).

Khaled El Mously (kmously) wrote :

@P.D. Great, thanks for the feedback.

I have the fourth kernel for you based on commit:

b5970159440d ARM: 8809/1: proc-v7: fix Thumb annotation of cpu_v7_hvc_switch_mm

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-b597015-sFIG/

Please let me know if this kernel shows the same problem or not. Thanks

Khaled El Mously (kmously) wrote :

@P.D. Sorry, please disregard the last message. (I used the wrong commit to build that kernel).

I will update shortly with a new kernel to test.

Martin Barlow (martin-barlow) wrote :

I believe I have the same issue going from 4.18.0-13 to 4.18.0-14 on a Lenovo T410. On boot, the last thing I see is

"Loading initial ramdisk ..."

On a subsequent boot with a different kernel, I can see an Oops and stack trace in syslog. Attaching. Let me know if I can assist or if it's a separate issue.

Khaled El Mously (kmously) wrote :

@P.D. OK, now I have the fourth kernel for you based on commit:

aa4fa5c8c67b net: hns3: Remove tx budget to clean more TX descriptors in a napi

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-aa4fa5c-Yud2/

Please let me know if this kernel shows the same problem or not. Thanks

Khaled El Mously (kmously) wrote :

@Martin Barlow: It's hard to say if it's the same issue or not. One way we can find out would be if you test the same kernels that I'm building for @P.D. to try to bisect this issue. If your problem is caused by the same commit that is causing @P.D.'s problem, then it's likely the same issue. And if it's not caused by the same commit, then we would have partially bisected your issue as well and we can continue that investigation using another bug report.

So, feel free to test the same kernels I posted above and report your results here.

Jan Schnackenberg (yehaa) wrote :

I'm currently seeing this issue on my father's computer.

After starting 4.18.0-14 the screen goes blank and nothing seems to happen anymore. Booting with 4.18.0-13 works without issues.

I noticed that after some time I can SSH into the machine. So I dumped dmesg with 4.18.0-14 and 4.18.0-13. I'll attach the logs shortly.

I tried to find differences between the two logs. I found these:

1. Additional line in 4.18.0-14

Jan Schnackenberg (yehaa) wrote :

Uhm... Sorry for that, I seem to have triggered some keyboard shortcut. To continue:

1. Additional line in 4.18.0-14
Line 355: [ 0.034459] Spectre V2 : Spectre v2 cross-process SMT mitigation: Enabling STIBP

2. Additional line in 4.18.0-14
Line 590: [ 0.124079] pci 0000:00:02.0: BIOS left Intel GPU interrupts enabled; disabling

3. 3 missing lines that were present in 4.18.0-13
[ 1.362647] ata_port ata2: hash matches
[ 1.362648] ata2: hash matches
[ 1.362709] acpi device:16: hash matches

4. A BUG message in 4.18.0-14
[ 3.103113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 3.103116] PGD 0 P4D 0
[ 3.103119] Oops: 0000 [#1] SMP PTI
[ 3.103122] CPU: 2 PID: 194 Comm: systemd-udevd Not tainted 4.18.0-14-generic #15-Ubuntu
[ 3.103123] Hardware name: /DH55TC, BIOS TCIBX10H.86A.0048.2011.1206.1342 12/06/2011
[ 3.103179] RIP: 0010:gen4_render_ring_flush+0x60/0x110 [i915]
[ 3.103180] Code: 00 48 89 df e8 51 fe ff ff 48 3d 00 f0 ff ff 77 6c 44 89 20 48 8d 48 44 c7 40 04 02 40 00 7a 48 8b 53 78 48 8b 92 10 02 00 00 <48> 8b 52 08 48 c7 40 0c 00 00 00 00 83 ca 04 89 50 08 48 8d 50 14
[ 3.103201] RSP: 0018:ffffb489011ab978 EFLAGS: 00010287
[ 3.103202] RAX: ffffb489101da000 RBX: ffff9a78d63c0b40 RCX: ffffb489101da044
[ 3.103204] RDX: 0000000000000000 RSI: 00000000000001a8 RDI: 0000000000000150
[ 3.103205] RBP: ffffb489011ab988 R08: 0000000000000001 R09: 0000000000000004
[ 3.103206] R10: ffff9a78ebfd1eb0 R11: ffff9a78d63c0b40 R12: 0000000002000022
[ 3.103207] R13: ffff9a78d63c0b40 R14: ffff9a78d5913800 R15: ffff9a78d5b08000
[ 3.103209] FS: 00007fc0c0bb98c0(0000) GS:ffff9a78e3280000(0000) knlGS:0000000000000000
[ 3.103211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.103212] CR2: 0000000000000008 CR3: 0000000215cf4001 CR4: 00000000000206e0
[ 3.103213] Call Trace:
[ 3.103246] i915_request_alloc+0x24e/0x370 [i915]
[ 3.103276] i915_gem_init+0x26b/0x470 [i915]
[ 3.103303] i915_driver_load+0xab8/0xd80 [i915]
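
A comparison like the one above is easier when the timestamps are stripped first, since they differ on every boot. A minimal Python sketch using the standard `difflib` module follows. The function names are hypothetical, and the sample lines are illustrative messages taken from this comment (the 3.090000 timestamp on the "good" side is made up), not the attached logs.

```python
import difflib
import re

# Leading "[    3.103113] " style timestamp of a dmesg line.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamp(line):
    """Remove the leading kernel timestamp and trailing newline."""
    return TIMESTAMP_RE.sub("", line.rstrip("\n"))


def dmesg_diff(good_lines, bad_lines):
    """Return (messages only in good log, messages only in bad log)."""
    good = [strip_timestamp(line) for line in good_lines]
    bad = [strip_timestamp(line) for line in bad_lines]
    matcher = difflib.SequenceMatcher(None, good, bad, autojunk=False)
    only_good, only_bad = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            only_good.extend(good[i1:i2])
        if tag in ("replace", "insert"):
            only_bad.extend(bad[j1:j2])
    return only_good, only_bad


# Illustrative input only; real use would read the two dmesg dumps from disk.
GOOD = [
    "[    1.362647] ata_port ata2: hash matches",
    "[    3.090000] fb: switching to inteldrmfb from EFI VGA",
]
BAD = [
    "[    0.124079] pci 0000:00:02.0: BIOS left Intel GPU interrupts enabled; disabling",
    "[    3.094622] fb: switching to inteldrmfb from EFI VGA",
    "[    3.103113] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008",
]

if __name__ == "__main__":
    missing, added = dmesg_diff(GOOD, BAD)
    print("only in good:", missing)
    print("only in bad:", added)
```

Messages that embed addresses or other per-boot values will still show up as differences, so the output needs a manual read.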

Now... One more interesting observation. I tried to boot 4.18.0-14 without the splash screen. For this I edited the entry (using "e" in grub) to remove the lines
               load_video
               gfxmode $linux_gfx_mode
and also the parameters "quiet" and "splash" from the kernel command line. The last line that was written to the screen was

[ 3.094622] fb: switching to inteldrmfb from EFI VGA

After that, the computer continued booting but the screen stayed blank. The BUG entry and call trace (etc.) appear a few lines after that in the dmesg output.

Regards,
Jan

Jan Schnackenberg (yehaa) wrote :
Jan Schnackenberg (yehaa) wrote :
Jan Schnackenberg (yehaa) wrote :

@Khaled El Mously: I could try your kernels tomorrow (I hope). Can you please tell me which packages I'd need to install? That's quite a list of files there. ;)

I'm currently guessing the packages with these "base-names":
linux-headers-4.18.0-14
linux-headers-4.18.0-14-generic
linux-image-unsigned-4.18.0-14-generic
linux-modules-4.18.0-14-generic
linux-modules-extra-4.18.0-14-generic

But I only base this on the fact that those are the ones that were installed the day before yesterday and caused this issue.

Regards,
Jan

P.D. (paed808) wrote :

@kmously The fourth kernel doesn't boot.

Khaled El Mously (kmously) wrote :

@P.D. Thanks for the feedback.

I have the fifth kernel for you based on commit:

ce68bab41c06 drm/i915: Fix ilk+ watermarks when disabling pipes

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-ce68bab-GzyB/

Please let me know if this kernel shows the same problem or not. Thanks

P.D. (paed808) wrote :

@kmously The fifth kernel boots up fine.

Khaled El Mously (kmously) wrote :

@Jan Schnackenberg: Thanks for that feedback and helpful analysis. The Spectre-related change looked suspicious at first. However, if that were indeed the cause of your issue, then it would almost certainly be a different issue from the one described in this bug report and experienced by @P.D., since at this point in the bisection we've eliminated Spectre-related changes from the pool of possible culprits.

In fact, at this point it's highly likely that the problem is a regression in the Intel i915 driver. This matches the stack traces in all the bug reports so far, and also coincides with a few i915 changes that we're narrowing in on in the bisection process.

@Jan Schnackenberg, you are encouraged to try the test kernels that I've linked above and report your results. Your problem is almost certainly the same problem described in this bug (and if it's not, we can continue that investigation as part of another bug).

As for the required .deb files, you can forgo the -headers files, so the following should be sufficient:

linux-image-unsigned-4.18.0-14-generic
linux-modules-4.18.0-14-generic
linux-modules-extra-4.18.0-14-generic

Thanks.

Khaled El Mously (kmously) wrote :

@P.D. Thanks for the feedback.

I have the sixth kernel for you based on commit:

1ab407cbd9c3 (tag: sixth-bisect) drm/i915: Mark pin flags as u64

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-1ab407c-XRVe/

Please let me know if this kernel shows the same problem or not. Thanks

P.D. (paed808) wrote :

@kmously The sixth kernel boots up fine.

Khaled El Mously (kmously) wrote :

@P.D. Thanks for the feedback.

I have the seventh kernel for you based on commit:

d8370b8fbadf drm/i915/execlists: Force write serialisation into context image vs execution

You can download the test kernel from:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-d8370b8-MWga/

Please let me know if this kernel shows the same problem or not. Thanks

Khaled El Mously (kmously) wrote :

@P.D.: I also have the eighth (and final) kernel almost ready for you.

It is based on commit:

325f8e18c8ac drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5

and it should be available at:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-325f8e1-2dRL

in about 40 minutes.

This should be the last kernel that you'll need to test for bisection. I think the problematic change is "drm/i915/execlists: Force write serialisation into context image vs execution". This can be confirmed by checking that the eighth kernel (which does not have that change) boots fine.

Thanks

P.D. (paed808) wrote :

@kmously The seventh kernel doesn't boot up.

P.D. (paed808) wrote :

I'll test the 8th one when it becomes available.

P.D. (paed808) wrote :

@kmously Unfortunately, the eighth kernel doesn't boot.

Khaled El Mously (kmously) wrote :

@P.D. That's surprising. In that case, it means the offending commit is "325f8e18c8ac drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5". I was expecting the offending commit to be "d8370b8fbadf drm/i915/execlists: Force write serialisation into context image vs execution" because that appears to have been identified upstream as causing a regression with symptoms similar to what you're experiencing (the regression has upstream commit ID 987abd5c62f9 and the fix for it is https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cf66b8a0ba142fbd1bf10ac8f3ae92d1b0cb7b8f )
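
The conclusion can be re-derived from the test results posted in this thread. In a bisection every test commit lies between the current bounds, so the most recent "good" result is the newest known-good commit and the most recent "bad" result is the oldest known-bad one. A short Python sketch under that assumption follows; `RESULTS` and `bisect_bounds` are hypothetical names, the first kernel is counted as "good" because it booted, and the wrongly built kernel (b5970159440d) is left out since it was withdrawn before any result was reported.

```python
# Results reported in this thread, in the order the kernels were tested.
RESULTS = [
    ("40208c782496", "good"),  # first kernel: booted (video issue aside)
    ("5a7784d6a34f", "bad"),   # second kernel: did not boot
    ("94785f13e73c", "good"),  # third kernel
    ("aa4fa5c8c67b", "bad"),   # fourth kernel
    ("ce68bab41c06", "good"),  # fifth kernel
    ("1ab407cbd9c3", "good"),  # sixth kernel
    ("d8370b8fbadf", "bad"),   # seventh kernel
    ("325f8e18c8ac", "bad"),   # eighth kernel
]


def bisect_bounds(results):
    """Return (newest known-good commit, oldest known-bad commit).

    Assumes the results come from one consistent bisection, so each
    tested commit lies strictly between the bounds known at that time.
    """
    last_good = first_bad = None
    for commit, verdict in results:
        if verdict == "good":
            last_good = commit
        elif verdict == "bad":
            first_bad = commit
        else:
            raise ValueError(f"unknown verdict {verdict!r} for {commit}")
    return last_good, first_bad


if __name__ == "__main__":
    print(bisect_bounds(RESULTS))
```

The oldest known-bad commit is the offending one only if no untested commits remain between the two bounds, which is what "final kernel" implies for the eighth test.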

I had built a kernel with the upstream fix for you to test and was expecting it would fix your problem. Do you mind trying it anyway to confirm that it does NOT fix your problem?

That kernel is built from commit:

619b700ded65 drm/i915/execlists: Apply a full mb before execution for Braswell

and can be found at:

https://kernel.ubuntu.com/~kmously/1813657/kernel-kmously-619b700-ouqM/

Thanks.

P.D. (paed808) wrote :

@kmously I can confirm that that kernel does NOT fix my problem, aka the system does not boot up with that kernel.

P.D. (paed808) wrote :

I looked up the "325f8e18c8ac drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5" commit, and it looks like the Tails distro had the same issue with that commit: https://redmine.tails.boum.org/code/issues/16224

Jan Schnackenberg (yehaa) wrote :

I just stumbled across Bug #1814555

That looks like the exact same problem (i915 driver causing the boot to fail). They even identified the same source commit.

Khaled El Mously (kmously) wrote :

@P.D. This issue will now be tracked under Bug #1814555

Under that bug, a test kernel was provided with the problematic fix removed (comment #8), and that seems to have positive testing results.

Feel free to follow Bug #1814555 and/or try out the test kernel from that bug to confirm that it fixes this problem. The fix is expected to land in the next Cosmic kernel, Ubuntu-4.18.0-15.16.

@P.D. and others: Thanks for filing the bug and helping troubleshoot the problem!

Dan Wilson (wildajo) wrote :

When I booted Kubuntu, the hard drive light flashed a few times, but otherwise nothing happened; just a black screen. I had to push the hardware restart button to get going again. On the next boot I selected advanced options and the previous (4.18.0-13) kernel, and everything started normally. I've uninstalled 4.18.0-14 until the bug is fixed.
