Unable to boot with i386 4.13.0-25 / 4.13.0-26 / 4.13.0-31 kernel on Xenial / Artful

Bug #1745118 reported by Po-Hsu Lin
This bug affects 8 people
Affects           Status         Importance   Assigned to   Milestone
Linux Mint        New            Undecided    Unassigned
linux (Ubuntu)    Fix Released   High         Unassigned
  Artful          Fix Released   High         Unassigned

Bug Description

Some SRU testing nodes cannot boot with the latest 32-bit 4.13 linux-hwe kernel.

Take node "fozzie" (Dell PowerEdge R320) for example: it works with 4.13.0-21.24~16.04.1 but not with the 4.13.0-25 / 4.13.0-26 / 4.13.0-31 kernels on Xenial.

From the BMC console, I can see the GRUB menu on boot, after which the system drops into a boot loop.

This can be reproduced on Artful 4.13 as well.

Note that this kernel works for some of the nodes in our test pool (Intel SDP - Denlow).
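
For triage, the data points in this report can be collected into a small lookup. The following is only a sketch in Python: the function name and the "unknown" fallback are illustrative assumptions, and the version sets contain only kernels that commenters on this bug explicitly reported as booting or failing on affected i386 hardware.

```python
import re

# ABI numbers of 4.13.0-N kernels, taken only from reports in this bug.
# Versions nobody reported on are deliberately left out.
REPORTED_BOOTING = {21, 36}
REPORTED_FAILING = {25, 26, 31, 32}


def classify_kernel(release: str) -> str:
    """Classify a `uname -r` style string such as '4.13.0-31-generic'.

    Returns 'boots', 'affected', or 'unknown'. The result only reflects
    what was reported for affected i386 machines in this bug.
    """
    match = re.match(r"^4\.13\.0-(\d+)", release)
    if match is None:
        return "unknown"  # not a 4.13.0-N kernel, no data in this report
    abi = int(match.group(1))
    if abi in REPORTED_FAILING:
        return "affected"
    if abi in REPORTED_BOOTING:
        return "boots"
    return "unknown"


if __name__ == "__main__":
    for rel in ("4.13.0-21-generic", "4.13.0-31-generic", "4.13.0-36-generic"):
        print(rel, classify_kernel(rel))
```

Versions between the reported ones are returned as "unknown" rather than guessed, since the thread does not say where exactly the regression was introduced or fixed.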

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.13.0-21-generic 4.13.0-21.24~16.04.1
ProcVersionSignature: Ubuntu 4.13.0-21.24~16.04.1-generic 4.13.13
Uname: Linux 4.13.0-21-generic i686
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: i386
Date: Wed Jan 24 08:49:57 2018
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-hwe-edge
UpgradeStatus: No upgrade log present (probably fresh install)
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jan 24 09:17 seq
 crw-rw---- 1 root audio 116, 33 Jan 24 09:17 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 16.04
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. PowerEdge R320
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-21-generic root=UUID=0845f9a0-ab8c-4dfa-8385-af21f2f2b9ad ro
ProcVersionSignature: Ubuntu 4.13.0-21.24~16.04.1-generic 4.13.13
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-21-generic N/A
 linux-backports-modules-4.13.0-21-generic N/A
 linux-firmware 1.157.14
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial uec-images
Uname: Linux 4.13.0-21-generic i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True
dmi.bios.date: 05/11/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.2.4
dmi.board.name: 0DY523
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.2.4:bd05/11/2012:svnDellInc.:pnPowerEdgeR320:pvr:rvnDellInc.:rn0DY523:rvrA03:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R320
dmi.sys.vendor: Dell Inc.

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew)
summary: Unable to boot with i386 4.13.0-25 / 4.13.0-26 / 4.13.0-31 kernel on
- Xenial
+ Xenial / Artful
affects: linux-hwe-edge (Ubuntu) → linux (Ubuntu)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1745118

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Po-Hsu Lin (cypressyew) wrote : apport information

Attached files: CRDA.txt, CurrentDmesg.txt, JournalErrors.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
Changed in linux (Ubuntu Artful):
status: New → Triaged
importance: Undecided → High
tags: added: kernel-da-key
William Grant (wgrant) wrote :

I've only been able to reproduce this on systems with an Intel IOMMU. Disabling the IOMMU in the firmware (usually labelled "VT-d") lets the latest 4.13 i386 kernel boot. It's also reproducible in qemu if you give it an IOMMU, e.g. "-machine q35 -device intel-iommu".
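
As a rough sketch of the qemu reproduction, the helper below assembles a command line around the two options quoted above. Only "-machine q35" and "-device intel-iommu" come from this comment; the binary name, memory size, disk image path, and the function name itself are assumptions for illustration.

```python
import shlex
from typing import List


def qemu_repro_command(disk_image: str,
                       binary: str = "qemu-system-i386") -> List[str]:
    """Build a qemu argv that exposes an Intel IOMMU to the guest.

    disk_image is a hypothetical path to a raw image with an affected
    i386 kernel installed; binary and the memory size are assumptions.
    """
    return [
        binary,
        "-machine", "q35",          # from the comment above
        "-device", "intel-iommu",   # from the comment above
        "-m", "2048",               # assumed memory size in MiB
        "-drive", f"file={disk_image},format=raw",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(qemu_repro_command("xenial-i386.img")))
```

The command is only printed, not executed; pass the list to subprocess.run if you want to launch the guest directly.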

The problem is that the IDT page (0xffc00000) overlaps the FIX_BTMAPS range. IOMMU detection tries to read ACPI tables, which eventually calls early_ioremap, which maps an ACPI table over the IDT, and then eventually unmaps it completely. The first kernel interrupt after that (usually in test_wp_bit) triple-faults when it can't find the IDT.
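
The overlap can be illustrated with the fixmap address arithmetic, where a fixmap index maps to the virtual address FIXADDR_TOP - (index << PAGE_SHIFT). This is a sketch in Python under stated assumptions: FIXADDR_TOP = 0xfffff000 is assumed as the usual i386 value, and the FIX_BTMAP_END / FIX_BTMAP_BEGIN indices used in the example are hypothetical, since the real values depend on the kernel configuration. Only the IDT address 0xffc00000 comes from this comment.

```python
PAGE_SHIFT = 12
PAGE_MASK = ~((1 << PAGE_SHIFT) - 1)
FIXADDR_TOP = 0xFFFFF000   # assumption: typical i386 value
IDT_PAGE = 0xFFC00000      # address quoted in the comment above


def fix_to_virt(idx: int) -> int:
    """Virtual address of a fixmap slot; higher index means lower address."""
    return FIXADDR_TOP - (idx << PAGE_SHIFT)


def btmaps_range(btmap_end_idx: int, btmap_begin_idx: int):
    """Return (lowest, highest) page address covered by the FIX_BTMAPS slots.

    btmap_end_idx is the smaller index (highest address) and
    btmap_begin_idx the larger index (lowest address).
    """
    return fix_to_virt(btmap_begin_idx), fix_to_virt(btmap_end_idx)


def page_in_btmaps(addr: int, btmap_end_idx: int, btmap_begin_idx: int) -> bool:
    """True if the page containing addr lies inside the FIX_BTMAPS range."""
    low, high = btmaps_range(btmap_end_idx, btmap_begin_idx)
    return low <= (addr & PAGE_MASK) <= high


if __name__ == "__main__":
    # Hypothetical indices: 512 slots ending at index 1023.
    low, high = btmaps_range(512, 1023)
    print(hex(low), hex(high), page_in_btmaps(IDT_PAGE, 512, 1023))
```

With those hypothetical indices the lowest early_ioremap slot lands exactly on the IDT page, which is the kind of collision described above: mapping and later unmapping that slot leaves the IDT address without a valid mapping.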

I've built an experimental kernel with the FIX_BTMAPS vs IDT conflict fixed at https://people.canonical.com/~wgrant/linux-image-4.13.0-31-generic_4.13.0-31.34~16.04.1_i386.deb. It works on my affected hardware and qemu, but it would be good to confirm that it's the same issue that others have.

Po-Hsu Lin (cypressyew) wrote :

Hi William,
I verified your kernel on 4 affected systems with 3 different CPUs (ES-2403, X3430, X3470); all of them boot with your 4.13.0-31 kernel.

Thanks!

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1745118

tags: added: iso-testing
Damien Buhl (damien-buhl) wrote :

It looks like the same issue occurs on an Intel Core 2 Duo CPU P9400 @ 2.40GHz running i686 Xubuntu with kernel linux-image-4.13.0-32-generic.

The machine loops back to the BIOS directly after GRUB, before any kernel output is printed.

It boots with the previous kernel 4.10.0-42-generic.

Alkis Georgopoulos (alkisg) wrote :

I bisected this issue and reported it upstream on 19 Jan:
https://bugzilla.kernel.org/show_bug.cgi?id=198529.

I can confirm that the patch/test kernel by wgrant works fine here.
Thanks a lot!

Kleber Sacilotto de Souza (kleber-souza) wrote :

SRU request sent to the kernel team mailing list:
https://lists.ubuntu.com/archives/kernel-team/2018-February/089895.html

Changed in linux (Ubuntu Artful):
status: Triaged → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
ubuntuslave (ubuntuslave) wrote :

Same problem here when trying to use kernel 4.13.0-32-generic on a MacBook Pro. Reverting to the previous 4.10 kernel is currently my only workaround on this machine.

Po-Hsu Lin (cypressyew) wrote :

Nodes fozzie, onza, and onibi (all three affected by this bug) can boot with the proposed 4.13.0-36-generic #40~16.04.1-Ubuntu i386 kernel.

It looks like the 4.13.0-32 kernel is still affected by this issue, so I can't deploy the Artful image with MAAS, which uses the 4.13.0-32 kernel. Artful deployment should return to normal once -36 has been released.

tags: added: verification-done-artful
removed: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-36.40

---------------
linux (4.13.0-36.40) artful; urgency=medium

  * linux: 4.13.0-36.40 -proposed tracker (LP: #1750010)

  * Rebuild without "CVE-2017-5754 ARM64 KPTI fixes" patch set

linux (4.13.0-35.39) artful; urgency=medium

  * linux: 4.13.0-35.39 -proposed tracker (LP: #1748743)

  * CVE-2017-5715 (Spectre v2 Intel)
    - Revert "UBUNTU: SAUCE: turn off IBPB when full retpoline is present"
    - SAUCE: turn off IBRS when full retpoline is present
    - [Packaging] retpoline files must be sorted
    - [Packaging] pull in retpoline files

linux (4.13.0-34.37) artful; urgency=medium

  * linux: 4.13.0-34.37 -proposed tracker (LP: #1748475)

  * libata: apply MAX_SEC_1024 to all LITEON EP1 series devices (LP: #1743053)
    - libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

  * KVM patches for s390x to provide facility bits 81 (ppa15) and 82 (bpb)
    (LP: #1747090)
    - KVM: s390: wire up bpb feature

  * artful 4.13 i386 kernels crash after memory hotplug remove (LP: #1747069)
    - Revert "mm, memory_hotplug: do not associate hotadded memory to zones until
      online"

  * CVE-2017-5715 (Spectre v2 Intel)
    - x86/feature: Enable the x86 feature to control Speculation
    - x86/feature: Report presence of IBPB and IBRS control
    - x86/enter: MACROS to set/clear IBRS and set IBPB
    - x86/enter: Use IBRS on syscall and interrupts
    - x86/idle: Disable IBRS entering idle and enable it on wakeup
    - x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup
    - x86/mm: Set IBPB upon context switch
    - x86/mm: Only set IBPB when the new thread cannot ptrace current thread
    - x86/entry: Stuff RSB for entry to kernel for non-SMEP platform
    - x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD to kvm
    - x86/kvm: Set IBPB when switching VM
    - x86/kvm: Toggle IBRS on VM entry and exit
    - x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature
    - x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control
    - x86/cpu/AMD: Add speculative control support for AMD
    - x86/microcode: Extend post microcode reload to support IBPB feature
    - KVM: SVM: Do not intercept new speculative control MSRs
    - x86/svm: Set IBRS value on VM entry and exit
    - x86/svm: Set IBPB when running a different VCPU
    - KVM: x86: Add speculative control CPUID support for guests
    - SAUCE: turn off IBPB when full retpoline is present

  * Artful 4.13 fixes for tun (LP: #1748846)
    - tun: call dev_get_valid_name() before register_netdevice()
    - tun: allow positive return values on dev_get_valid_name() call
    - tun/tap: sanitize TUNSETSNDBUF input

  * boot failure on AMD Raven + WestonXT (LP: #1742759)
    - SAUCE: drm/amdgpu: add atpx quirk handling (v2)

linux (4.13.0-33.36) artful; urgency=low

  * linux: 4.13.0-33.36 -proposed tracker (LP: #1746903)

  [ Stefan Bader ]
  * starting VMs causing retpoline4 to reboot (LP: #1747507) // CVE-2017-5715
    (Spectre v2 retpoline)
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
    - x86/retpol...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Po-Hsu Lin (cypressyew)
Changed in linux (Ubuntu):
status: Triaged → Fix Released