After upgrade to linux-image-extra-4.13.0-26-generic the system does not start at all

Bug #1742675 reported by Kamil
This bug affects 11 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

After upgrade to linux-image-extra-4.13.0-26-generic the system does not start at all.
Only booting linux-image-extra-4.10.0-42-generic through Advanced Options helps.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ubuntu-release-upgrader-core 1:16.04.23
ProcVersionSignature: Ubuntu 4.10.0-42.46~16.04.1-generic 4.10.17
Uname: Linux 4.10.0-42-generic i686
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: i386
CrashDB: ubuntu
CurrentDesktop: Unity
Date: Thu Jan 11 15:20:26 2018
InstallationDate: Installed on 2017-06-23 (201 days ago)
InstallationMedia: Ubuntu 16.04.2 LTS "Xenial Xerus" - Release i386 (20170215.2)
PackageArchitecture: all
SourcePackage: ubuntu-release-upgrader
Symptom: release-upgrade
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile..etc.update-manager.release-upgrades: 2017-06-24T18:05:02.372991

Jon Evans (evansj) wrote :

This is affecting me as well. I had to connect up a monitor and keyboard to see what's happening - it's a kernel panic on boot. Machine is a Dell Precision T5500.

Another data point: 4.13.0-26-generic (recovery mode) works OK.

Screenshot attached.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu-release-upgrader (Ubuntu):
status: New → Confirmed
Jon Evans (evansj) wrote :

I noticed that Dell BIOS A09 was ancient, so I upgraded to a (slightly) newer version, A16 from 2013. It still crashes with 4.13.0-26-generic but the panic message is different.

Kamil (kamil2018) wrote :

I suppose the following two bugs are related to the fact that I am running linux-image-extra-4.10.0-42-generic. Both appeared in PhpStorm only recently, after this main bug:
a) Once, some PHP files edited in PhpStorm were cached somewhere deep in Ubuntu, so when I ran them through the local web server, older versions were loaded. Restarting Ubuntu resolved it.
b) Now there is a persistent bug: PhpStorm continuously performs some indexing operations on files, and some parts of the UI blink.

As you can see, all of this is somehow related to the file system.

I'm not sure whether I need to post these bugs separately.

affects: ubuntu-release-upgrader (Ubuntu) → linux (Ubuntu)
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the proposed kernel and post back if it resolves this bug?
See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.

Thank you in advance!

tags: added: pti
tags: added: kernel-da-key needs-bisect
Kamil (kamil2018) wrote :

I would prefer to avoid testing it on my working computer.

However, I can try to test it in VirtualBox if that is acceptable for this bug.
Just explain all the steps, please.

Jon Evans (evansj) wrote :

@jsalisbury

Thanks, I followed the instructions and tried booting 4.13.0-30-generic. I still get a kernel panic. I've attached a screenshot from grub with the kernel version visible, and a screenshot of the panic.

Thanks

Peter LaDow (suudy) wrote :

I have a Dell Optiplex 960 that reboots during boot with 4.13.0-26, with no kernel panic message (though 4.10.0-42 works fine). I managed to get a console dump via serial, but it contained no information.

Initially I thought it was an out of date BIOS. The machine had A03, so I updated to A17. Still reboots.

I tried some other options, such as intel_iommu=off, and no luck.

I tried 4.13.0-30 from the proposed repository, and no luck.

Finally, I stumbled across bug 1735024. Adding acpi=off worked.

Just a suggestion for those having problems.
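
For anyone who wants to repeat this test: the thread does not show how the option was added, so the following is only a minimal sketch. It assumes the usual Ubuntu layout, where default kernel parameters live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. The helper name add_kernel_param is hypothetical; it only prints the modified file and changes nothing on disk.

```shell
# Hypothetical helper: print the contents of a grub defaults file with one
# extra kernel parameter appended to GRUB_CMDLINE_LINUX_DEFAULT.
# The file itself is not modified. The parameter must not contain "/",
# because it is substituted into a sed expression.
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file"
}

# Example (prints the would-be file, assuming /etc/default/grub exists):
# add_kernel_param acpi=off
```

After reviewing the output, the changed line can be copied into /etc/default/grub and activated with sudo update-grub. For a one-off test, the parameter can instead be appended to the linux line after pressing e in the grub menu.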

Peter LaDow (suudy) wrote :

Using ukuu, I tried 4.14.13-041413.201801101001 and it too fails without acpi=off.

Davis King (davis685) wrote :

I have the same issue: after an automatic upgrade to 4.13.0-30-generic I can't boot. It just kernel panics right away. If I boot with 4.10.0 it works fine. I'm also attaching a screenshot of my kernel panic message.

Jon Evans (evansj) wrote :

It still fails for me with acpi=off.

A bit of background: this PC has been running fine for the past 12 months or so, with regular updates including kernel updates. It has only started failing since 4.13.0-26-generic.

volker kempter (v-kempter) wrote :

For me (Lubuntu 16.04, 32-bit installation), no 4.13 or 4.14 kernel boots, either with or without acpi=off. Without acpi=off, boot stops right after leaving the grub menu; with acpi=off, it stops after the message "starting to show plymouth boot screen".

Kernel 4.14.14-041414 does not boot either; it also fails with acpi=off.

Booting without any additional boot options is possible with 4.10.0-43 and 4.4.0-112.

Kamil (kamil2018) wrote :

Question for experienced Ubuntu developers/sysadmins/users:
how long do bugs like this one usually stay unfixed?

Leszek (bigl-aff) wrote :

Similar thing in my situation: all kernels up to and including linux-image-4.13.0-25-generic work OK on my Asus Zenbook UX305FA, but linux-image-4.13.0-26-generic makes my laptop almost unusable (everything is many times slower), and every newer version up to the current linux-image-4.13.0-30-generic hangs during start. On the Plymouth screen it boots up to the progress bar and hangs just before the point where the mouse pointer would appear in a normal boot.

On the other hand, all kernels from the mainline PPA work without problems (including the latest 4.14.13 and 4.15-RC8).

volker kempter (v-kempter) wrote :

ad #16:

In my case, mainline kernels, including 4.15-rc8, also do not work (even with acpi=off).

However, 4.15-rc8 boots when the following 5 (!) options are added in grub (all seem to be necessary): acpi=off noapic nolapic irqpoll pnpbios=off. The first 3 options alone are not sufficient, and neither are the last two alone.

Leszek (bigl-aff) wrote :

I've tried your 5 options with linux-image-4.13.0-30-generic, but without luck.
On the other hand, I've tried mainline 4.14.14 (compiled today) and, like all mainline kernels, it works without problems.

volker kempter (v-kempter) wrote :

ad #19

I've tried 4.14.14 as well: no boot without additional boot options.

With the 5 boot options mentioned above it "sometimes" fails to boot; one time out of three, maybe.
I should mention that I'm using a Dell Latitude E6500 for these tests; the discrepancy between our results is probably due to the hardware.

Leszek (bigl-aff) wrote :

Yes, the differences in results are probably due to different hardware.

But based on your test results and mine, I see two probably different problems:

1. Mine looks very much like the result of some Ubuntu patch applied in 4.13.0-26 and above: I can use all mainline kernels and all kernels up to 4.13.0-25 without problems.

2. Your problem is much less predictable and gives random results.

But this is only my assumption, which may be wrong.

Leszek (bigl-aff) wrote :

3 additional tests:

1. The latest standard Ubuntu 16.04 kernel from the 4.4 series, linux-image-4.4.0-111-generic, works OK and additionally provides full Spectre+Meltdown protection.

2. The latest proposed Ubuntu 16.04 kernel from the 4.4 series, linux-image-4.4.0-112-generic, doesn't work.

3. I've tried the experimental kernel with retpoline patches from the Canonical Kernel Team PPA. It works, but it is really sloooow.

Kamil (kamil2018) wrote :

After a standard system update to 4.13.0-31 the problem still persists.

volker kempter (v-kempter) wrote :

ad #22 and 23:

Same for me: 4.13.0-31 is not working,
BUT 4.4.0-112 is working (without boot options, 32-bit installation).

Leszek (bigl-aff) wrote :

Since it is probably due to the Meltdown+Spectre patches, we have probably missed an additional parameter: the Intel microcode used to build the initramfs. It was changed to a new version on 2018-01-08 but was changed back to the old one yesterday because many users reported boot and stability problems. It certainly affects boot and general behavior. So, to be sure that I am using the correct microcode, I updated to the latest one as of today and regenerated the initramfs files for all installed kernels by running:

sudo update-initramfs -u -k 4.4.0-112-generic

and so on for all kernels.
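
The "and so on for all kernels" step can be sketched as a loop. This is only an illustration, assuming kernel images are installed as /boot/vmlinuz-<version>, which is the usual Ubuntu layout. The function names are hypothetical, and the commands are printed rather than executed so that they can be reviewed first.

```shell
# Hypothetical helper: list kernel versions that have an image in a boot
# directory (default /boot), one version per line.
list_kernel_versions() {
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue
        name=$(basename "$image")
        echo "${name#vmlinuz-}"
    done
}

# Print (do not run) one update-initramfs command per installed kernel.
print_initramfs_commands() {
    list_kernel_versions "$1" | while read -r version; do
        echo "sudo update-initramfs -u -k $version"
    done
}

# Example: print_initramfs_commands
```

update-initramfs also accepts the special version all, so sudo update-initramfs -u -k all should have the same effect in one step.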

Now linux-image-4.4.0-112-generic works for me most of the time. Sometimes it hangs during boot (about 1 in 3 boots), but once it boots everything is OK, and Meltdown+Spectre is fully fixed (according to the checker script).

But there are still problems with the latest versions of the 4.10 and 4.13 kernels.
Mainline kernels work for me since they don't have all the patches for Meltdown+Spectre (or, even if they have them in the code, an updated toolchain is needed at compile time to activate them, and the automated build system for the mainline PPA doesn't support that for now).

In general, it looks like in my case (Asus Zenbook UX305FA with Intel Core M-Y510) the Meltdown+Spectre patches are OK when applied to the 4.4 kernel, but they break my laptop when applied to more recent kernels.

On the other hand every kernel works on my "big" PC with AMD Athlon.

Aviv (avivdm) wrote :

I am affected as well: I cannot boot kernels 4.13.0-26 and 4.13.0-31, even with acpi=off. Reverted to 4.10.0-42.
Lubuntu 16.04 32 bit on Intel NUC 5PPYH.

Lou (lou-gregory42) wrote :

I am also affected: all versions of 4.13 released so far do not boot. I have also reverted to 4.10.0-42. Xubuntu 16.04.3 installed on a Dell E6430 laptop with Intel/nVidia graphics, no dual-boot. I would echo the sentiments of an earlier poster: how can this continue through 3+ releases with no resolution in sight?

volker kempter (v-kempter) wrote :

ad #27:

for a very recent kernel, you could try 4.14.0-17 from the canonical-kernel-team PPA: for me (Dell e6500) it runs reliably.

However, you need the boot option acpi=off.

volker kempter (v-kempter) wrote :

ad #28:
you need to use the canonical-kernel-team/unstable PPA!!!!!

Leszek (bigl-aff) wrote :

I can confirm that kernel 4.14.0-17 also runs on my hardware (in fact, even without any additional boot options).

As I understand it, this is mainly because this kernel hasn't received any Spectre+Meltdown patches. Time will show how it behaves with these patches. But for now it is the best option for getting a current kernel easily and automatically from a PPA.

Leszek (bigl-aff) wrote :

Sorry for the unintentional misinformation about kernel 4.14.0-17. In fact it has patches for Meltdown, but not for Spectre.

Anyway, let's hope that kernel 4.15 will soon be on this PPA (if Canonical decides to go with a non-LTS kernel for the LTS version).

Dmitriy Merzlov (rxwrxrx) wrote :

Upgrading to kernel 4.13.0-33.36 from ppa:canonical-kernel-team/ppa solved the issue.
No need to use "acpi=off"

The system is stable. (Dell E5570 i5-6300U Skylake)

Leszek (bigl-aff) wrote :

I can confirm that kernel 4.13.0-33.36 works OK on my hardware (Asus Zenbook UX305), and this kernel has all patches for Spectre+Meltdown applied.
The same result with kernel 4.15.2 from the Canonical mainline PPA. So you have a choice: vanilla kernel or Ubuntu kernel.
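
For readers who want a quick check of which mitigations a running kernel reports: the sketch below is not the checker script mentioned in this thread. It assumes a kernel that exposes /sys/devices/system/cpu/vulnerabilities (older kernels may not have this directory), and the function name is hypothetical.

```shell
# Hypothetical helper: print "<name>: <status>" for every file in the
# kernel's vulnerabilities directory. Prints nothing if the directory
# is missing or empty.
print_vulnerability_status() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Example: print_vulnerability_status
```

Running it under two different kernels and comparing the output gives a rough idea of which patches each kernel has active.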

Changed in linux (Ubuntu):
importance: Undecided → High
tags: removed: pti
Leszek (bigl-aff) wrote :

Sadly, the issue returned with kernel 4.13.0-35. As far as I can see from the changelog, some Spectre patches have been changed to a different approach. So this is probably the reason for this behavior.

Leszek (bigl-aff) wrote :

Also, the latest vanilla 4.15.3 works OK, so it looks like the problem is entirely in the Ubuntu kernel patches.

Jon Evans (evansj) wrote :

I updated again this morning and installed the latest 4.13.0-36-generic kernel. Unfortunately it still panics on boot. I've attached another slo-mo video.

Kamil (kamil2018) wrote :

I'm the author of this bug report.
The bug has been resolved by the recent upgrade.

BUT... in recent days the system has frozen rather often and required a restart to be usable again. It happened randomly in the following circumstances, for example:
a) manipulating complex JS widgets with the mouse;
b) running a heavy job in gulp through yarn;
c) and some other unrelated activities.

Question: is there a way to collect diagnostic data so that I can post a bug report for this behavior that is detailed enough for the new bug to be fixed?

Kamil (kamil2018) wrote :

Here is a new bug report for a new bug caused by recent updates that fixed #1742675:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753010

Jon Evans (evansj) wrote :

I am still experiencing the same bug. I regularly pull down the latest updates with apt update and apt dist-upgrade, and if a kernel update is included then I always test it.

This morning my system got updated with both linux-image-4.13.0-45-generic and linux-image-4.4.0-128-generic. I have modified my grub config so that it boots into 4.10.0-42-generic, the last hwe kernel that booted successfully.

4.4.0-128 boots OK
4.13.0-45 panics

I have captured another video, with boot_delay this time. I will transcribe it to text and attach another comment.
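
The grub change mentioned in this comment is not shown in the thread. One common way to pin a known-good kernel is to set GRUB_DEFAULT in /etc/default/grub to the full menu path of that entry. The sketch below assumes Ubuntu's default submenu and entry titles, which should be checked against /boot/grub/grub.cfg; the function name is hypothetical.

```shell
# Hypothetical helper: build a GRUB_DEFAULT line for a given kernel version,
# assuming the default Ubuntu titles "Advanced options for Ubuntu" (submenu)
# and "Ubuntu, with Linux <version>" (entry), joined with ">".
grub_default_for() {
    version="$1"
    printf 'GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux %s"\n' "$version"
}

# Example:
# grub_default_for 4.10.0-42-generic
```

The printed line would replace the existing GRUB_DEFAULT line in /etc/default/grub, followed by sudo update-grub.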

Jon Evans (evansj) wrote :

Attached is a transcription of the latest kernel boot panic.

This is the Oops:

[ 2.620000] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2.620000] IP: __bitmap_intersects+0x10/0x70
[ 2.620000] PGD 0
[ 2.620000] P4D 0
[ 2.620000]
[ 2.620000] Oops: 0000 [#1] SMP PTI
[ 2.620000] Modules linked in:
[ 2.620000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-45-generic #50~16.04.1-Ubuntu
[ 2.620000] Hardware name: Dell Inc. Precision WorkStation T5500 /0CRH6C, BIOS A16 05/28/2013
[ 2.620000] task: ffff989914e81600 task.stack: ffffb038c313800
[ 2.620000] RIP: 0010:__bitmap_intersects+0x10/0x70
[ 2.620000] RSP: 0000:ffffb038c313bb00 EFLAGS: 00010002
[ 2.620000] RAX: 0000000000000282 RBX: ffff989923fcb2c0 RCX: ffff989914e70780
[ 2.620000] RDX: 0000000000000040 RSI: ffffffff994603e0 RDI: 0000000000000000
[ 2.620000] RBP: ffffb038c313bb00 R08: ffff989916919000 R09: ffff989916406460
[ 2.620000] R10: 0000000000000001 R11: 0000000000000040 R12: 0000000000000000
[ 2.620000] R13: ffffb038c313bd88 R14: ffff989916906c60 R15: ffff989916906c60
[ 2.620000] FS: 0000000000000000(0000) GS:ffff989916e00000(0000) knlGS:0000000000000000
[ 2.620000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.620000] CR2: 0000000000000000 CR3: 000000016240a001 CR4: 00000000000206f0
[ 2.620000] Call Trace:
[ 2.620000] assign_irq_vector+0x58/0x450
[ 2.620000] ? radix_tree_lookup+0xd/0x10
[ 2.620000] x86_vector_alloc_irqs+0x10e/0x1a0
[ 2.620000] irq_domain_alloc_irqs_parent+0x1f/0x30
[ 2.620000] intel_irq_remapping_alloc+0x79/0x7c0
[ 2.620000] ? __radix_tree_create+0x17a/0x1f0
[ 2.620000] irq_domain_alloc_irqs_parent+0x1f/0x30
[ 2.620000] mp_irqdomain_alloc+0xa0/0x2c0
[ 2.620000] __irq_domain_alloc_irqs+0x144/0x330
[ 2.620000] alloc_isa_irq_from_domain.isra.10+0xc4/0xe0
[ 2.620000] mp_map_pin_to_irq+0x195/0x2f0
[ 2.620000] pin_2_irq+0x47/0x80
[ 2.620000] setup_IO_APIC+0x101/0x1c4
[ 2.620000] apic_bsp_setup+0xb7/0xc8
[ 2.620000] native_smp_prepare_cpus+0x2bd/0x332
[ 2.620000] kernel_init_freeable+0xd2/0x23b
[ 2.620000] ? rest_init+0xc0/0xc0
[ 2.620000] kernel_init+0xe/0x101
[ 2.620000] ret_from_fork+0x35/0x40
[ 2.620000] Code: 04 d7 49 09 c1 31 c0 4d 85 c9 0f 95 c0 5d c3 45 31 c9 eb c8 0f 1f 80 00 00 00 00 41 89 d2 55 41 c1 ea 06 45 85 d2 48 89 e5 74 2b <48> 8b 07 48 85 06 75 4e 31 c0 45 31 c9 eb 13 4c 8b 44 07 08 4c
[ 2.620000] RIP: __bitmap_intersects+0x10/0x70 RSP: ffffb038c313bb00
[ 2.620000] CR2: 0000000000000000
[ 2.620000] ---[ end trace 1dc91a86fe1886bb ]---
[ 2.620000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 2.620000]
[ 2.620000] --[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 2.620000]
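
To compare this trace with the different one seen with acpi=off, it can help to reduce each transcription to a plain list of function names. A minimal sketch, assuming the log is saved as text in the format above (timestamp in brackets, a "Call Trace:" line, then a "Code:" line); the function name is hypothetical.

```shell
# Hypothetical helper: read kernel log text on stdin and print the function
# names listed between "Call Trace:" and "Code:", one per line, with the
# timestamp, the "? " marker and the +0x.../0x... offsets removed.
call_trace_functions() {
    sed -n '/Call Trace:/,/Code:/p' \
        | sed -e '/Call Trace:/d' -e '/Code:/d' \
        | sed -e 's/^\[[^]]*\] *//' -e 's/^? *//' -e 's/+0x.*$//'
}

# Example (panic.txt is a placeholder file name):
# call_trace_functions < panic.txt
```

The output of two such runs can then be compared with diff to see where the call paths diverge.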

Jon Evans (evansj) wrote :

Another data point, 4.13.0-45 crashes on boot even with acpi=off. It looks like a different call trace though, which I can transcribe on request if anyone thinks it will help.

Kai-Heng Feng (kaihengfeng) wrote :

Maybe try the -hwe-edge kernel instead?

Jon Evans (evansj) wrote :

Thanks for the suggestion @kaihengfeng. I installed linux-generic-hwe-16.04-edge, which gave me a working 4.15.0-23-generic after a reboot.

The log from 4.13.0-45 failed at this point:

[ 2.444002] DMAR-IR: IOAPIC id 10 under DRHD base 0xdfffe000 IOMMU 0
[ 2.500002] DMAR-IR: IOAPIC id 9 under DRHD base 0xfedc0000 IOMMU 1
[ 2.556002] DMAR-IR: IOAPIC id 8 under DRHD base 0xfedc0000 IOMMU 1
[ 2.560000] DMAR-IR: Enabled IRQ remapping in xapic mode
[ 2.616002] Switched APIC routing to physical flat.
[ 2.620000] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2.620000] IP: __bitmap_intersects+0x10/0x70

and the same section of dmesg from the working 4.15.0-23 is:

[ 0.000000] DMAR-IR: IOAPIC id 10 under DRHD base 0xdfffe000 IOMMU 0
[ 0.000000] DMAR-IR: IOAPIC id 9 under DRHD base 0xfedc0000 IOMMU 1
[ 0.000000] DMAR-IR: IOAPIC id 8 under DRHD base 0xfedc0000 IOMMU 1
[ 0.000000] DMAR-IR: Enabled IRQ remapping in xapic mode
[ 0.000000] Switched APIC routing to physical flat.
[ 0.000000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.020000] tsc: Fast TSC calibration using PIT
[ 0.024000] tsc: Detected 2394.078 MHz processor
[ 0.024000] Calibrating delay loop (skipped), value calculated using timer frequency.. 4788.15 BogoMIPS (lpj=9576312)

... etc.

Maybe that will be useful for someone to track down what causes the crash.
