Fails to boot under Xen PV: BUG: unable to handle kernel paging request at edc21fd9

Bug #1789118 reported by Andy Smith on 2018-08-26
This bug affects 1 person
Affects          Importance   Assigned to
linux (Ubuntu)   High         Joseph Salisbury
Bionic           High         Joseph Salisbury

Bug Description

== SRU Justification ==
After an upgrade to Bionic, the bug reporter states the 32-bit kernel
does not boot under Xen PV mode. This bug does not affect 64-bit
kernels and is fixed by mainline commit 6a92b11169a6.

== Fix ==
6a92b11169a6 ("x86/EISA: Don't probe EISA bus for Xen PV guests")

== Regression Potential ==
Low. This commit only affects x86 kernels. This commit has also been cc'd to upstream stable, so it has had
additional upstream review.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

After the most recent 18.04 32-bit upgrade of linux-image-generic, it now refuses to boot under Xen PV mode:

.
.
.
[ 0.114370] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.114382] futex hash table entries: 256 (order: 2, 16384 bytes)
[ 0.114423] pinctrl core: initialized pinctrl subsystem
[ 0.134326] RTC time: 165:165:165, date: 165/165/65
[ 0.134442] NET: Registered protocol family 16
[ 0.134457] xen:grant_table: Grant tables using version 1 layout
[ 0.134502] Grant table initialized
[ 0.134544] audit: initializing netlink subsys (disabled)
[ 0.134611] audit: type=2000 audit(1535307799.132:1): state=initialized audit_enabled=0 res=1
[ 0.134678] EISA bus registered
[ 0.136019] PCI: setting up Xen PCI frontend stub
[ 0.136073] BUG: unable to handle kernel paging request at edc21fd9
[ 0.136084] IP: eisa_bus_probe+0x19/0x36
[ 0.136089] *pdpt = 0000000001ee6027 *pde = 0000000029cc6067 *pte = 0000000000000000
[ 0.136100] Oops: 0000 [#1] SMP
[ 0.136105] Modules linked in:
[ 0.136111] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.0-33-generic #36-Ubuntu
[ 0.136120] EIP: eisa_bus_probe+0x19/0x36
[ 0.136125] EFLAGS: 00010246 CPU: 0
[ 0.136130] EAX: edc21fd9 EBX: 00000000 ECX: 01e0d000 EDX: 00000200
[ 0.136138] ESI: c1d0d452 EDI: c1dd34a4 EBP: e9c89f24 ESP: e9c89f24
[ 0.136145] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: e021
[ 0.136154] CR0: 80050033 CR2: edc21fd9 CR3: 01e10000 CR4: 00042660
[ 0.136166] Call Trace:
[ 0.136173] do_one_initcall+0x49/0x174
[ 0.136179] ? parse_args+0x143/0x390
[ 0.136187] ? set_debug_rodata+0x14/0x14
[ 0.136193] kernel_init_freeable+0x149/0x1c5
[ 0.136201] ? rest_init+0xa0/0xa0
[ 0.136207] kernel_init+0xd/0xf0
[ 0.136213] ret_from_fork+0x2e/0x38
[ 0.140000] Code: ff b8 df 43 ae c1 e8 35 1b 88 ff e8 20 12 88 ff c9 c3 3e 8d 74 26 00 55 b9 04 00 00 00 31 d2 b8 d9 ff 0f 00 89 e5 e8 35 8d 35 ff <8b> 10 81 fa 45 49 53 41 75 0a c7 05 a0 76 ed c1 01 00 00 00 e8
[ 0.140000] EIP: eisa_bus_probe+0x19/0x36 SS:ESP: e021:e9c89f24
[ 0.140000] CR2: 00000000edc21fd9
[ 0.140000] ---[ end trace 8c00b3cb7d4f06ba ]---
[ 0.140013] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

This is: [ 0.000000] Linux version 4.15.0-33-generic (buildd@lgw01-amd64-038) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #36-Ubuntu SMP Wed Aug 15 13:44:35 UTC 2018 (Ubuntu 4.15.0-33.36-generic 4.15.18)

Switching to a 64-bit kernel allows boot to proceed.

I cannot include output of the commands you request (uname, version_signature, dmesg, lspci) because the guest doesn't boot. The above kernel output is from a just-installed clean guest however.
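
A note on reading the oops above: the `Code:` line can be decoded by hand to see what `eisa_bus_probe` was doing when it faulted. The following is a minimal Python sketch, not a disassembler. It assumes the standard x86 encodings `b8 imm32` (`mov eax, imm32`) and `81 fa imm32` (`cmp edx, imm32`), and the helper names are made up for illustration. A naive byte search like this could in principle match inside another instruction's operand, so treat the result as a hint only.

```python
import struct

# "Code:" bytes copied from the oops; <8b> marks the faulting instruction.
CODE = (
    "ff b8 df 43 ae c1 e8 35 1b 88 ff e8 20 12 88 ff c9 c3 3e 8d 74 26 00 "
    "55 b9 04 00 00 00 31 d2 b8 d9 ff 0f 00 89 e5 e8 35 8d 35 ff <8b> 10 "
    "81 fa 45 49 53 41 75 0a c7 05 a0 76 ed c1 01 00 00 00 e8"
)


def parse_code_line(code):
    """Return (raw bytes, index of the byte the kernel marked with <..>)."""
    tokens = code.split()
    fault_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    data = bytes(int(t.strip("<>"), 16) for t in tokens)
    return data, fault_index


def decode_probe(code):
    """Extract the constant loaded into EAX before the fault and the
    constant compared against EDX after it (hypothetical helper)."""
    data, fault = parse_code_line(code)
    # Last "mov eax, imm32" (opcode b8) before the faulting instruction.
    mov = data.rfind(b"\xb8", 0, fault)
    phys_addr = struct.unpack_from("<I", data, mov + 1)[0]
    # First "cmp edx, imm32" (bytes 81 fa) at or after the fault.
    cmp_pos = data.find(b"\x81\xfa", fault)
    signature = data[cmp_pos + 2:cmp_pos + 6]
    return phys_addr, signature


phys_addr, signature = decode_probe(CODE)
print(hex(phys_addr), signature)  # 0xfffd9 b'EISA'
```

Read this way, the faulting `mov` loads four bytes through the pointer in EAX (the oops shows EAX equal to CR2, edc21fd9) and the next instruction compares them with the ASCII string "EISA". The constant 0xfffd9 loaded just before the preceding call looks like the physical address being mapped, which would fit a probe for an EISA BIOS signature that is not accessible from this guest.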

Andy Smith (grifferz) wrote :

Additionally, this means that the netboot installer at e.g. http://gb.archive.ubuntu.com/ubuntu/dists/bionic/main/installer-i386/current/images/netboot/xen/vmlinuz also does not boot under Xen PV. Again, the 64-bit equivalent does still work.

affects: linux-meta (Ubuntu) → linux (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1789118

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Andy Smith (grifferz) wrote :

I am unable to run apport-collect because the guest does not boot.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Does the machine boot if you select 4.15.0-33 from the GRUB menu? If so, can you test the latest 4.15 upstream stable kernel, which can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.18/

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
status: New → Incomplete
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-key
Andy Smith (grifferz) wrote :

32-bit kernel http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.18/linux-image-4.15.18-041518-generic_4.15.18-041518.201804190330_i386.deb does not boot. In fact it crashes immediately without producing any console output at all, so it doesn't get as far as the currently-packaged linux-image-generic does.

The 64-bit package http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15.18/linux-image-4.15.18-041518-generic_4.15.18-041518.201804190330_amd64.deb boots and seems to behave as expected.

Would it help if I went back through package versions of linux-image-generic until I find one that does boot? I know for certain that one of them does because this was encountered only after an update.

Joseph Salisbury (jsalisbury) wrote :

Sorry, I requested testing of the 4.15.0-33 kernel, but should have requested 4.15.0-32.

Yes, it would be great if you could identify the last 4.15 kernel that booted and the first that did not. Once we know this, we can perform a bisect to identify the commit that introduced the regression. All of the 4.15 kernels can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/
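
The "last good, first bad" search described above is a binary search over an ordered list of builds. A minimal Python sketch follows. The build names and the `boots` predicate are hypothetical placeholders; in practice `boots` means installing that kernel in a 32-bit Xen PV guest and watching the console. It also assumes the breakage, once introduced, persists in every later build.

```python
def find_regression(versions, boots):
    """Return (last_good, first_bad) from an ordered list of builds.

    Assumes versions[0] boots, versions[-1] does not, and that every
    build after the first bad one is also bad.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(versions[mid]):
            good = mid   # regression is somewhere after mid
        else:
            bad = mid    # regression is at mid or earlier
    return versions[good], versions[bad]


# Illustrative data only: eight placeholder builds, the first five "boot".
versions = ["build-%d" % n for n in range(1, 9)]
known_good = set(versions[:5])

print(find_regression(versions, lambda v: v in known_good))
# ('build-5', 'build-6')
```

With eight builds this needs three boot tests instead of up to eight, which matters when each test means reinstalling a guest kernel.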

Changed in linux (Ubuntu Bionic):
status: Incomplete → Triaged
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: kernel-da-key
removed: kernel-key
Andy Smith (grifferz) wrote :

Okay, I went back through the kernel PPAs and in fact the most recent one that I can get to boot under 32-bit PV mode Xen is 4.13.16. 4.14-rc1 does not boot.

I realised then that it was possible that I had never tested any version of 32-bit Ubuntu 18.04 under Xen. It is possible that I only ever tested it 64-bit. The only reason why I am looking into this now is because a user with a 32-bit Ubuntu 16.04 did a release upgrade to 18.04.

On seeing that a large number of upstream kernel releases don't boot under 32-bit Xen, I did a quick check of the situation in Debian. I know that I tested the current Debian stable in both 32- and 64-bit, but that is based on kernel 4.9.x.

A 4.17.x kernel is available in stretch-backports, so I tried that, and sure enough that crashes too.

So, I'm a little shocked that I would be the first to notice this, but it seems like the upstream Linux kernel stopped working under 32-bit PV mode Xen quite some time ago.

I am more familiar with Debian, so I am going to build the latest upstream stable kernel release on Debian and check that it also crashes in the same way. If so, then I guess I will report it upstream and maybe do a bisection myself in Debian.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update, Andy. If you need any specific upstream kernel versions, they are all prebuilt here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/

Just let me know if you need assistance with a kernel bisect or building any specific kernels.

Andy Smith (grifferz) wrote :

I reported the problem to Xen, and they came up with this patch:

https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02775.html

I recompiled linux-image-4.15.0-33-generic with this patch applied and it now works.

As far as I can see there is no working 32-bit Xen PV kernel package in 18.04, so it would be good if this could be remedied. What needs to happen for that? Does the above fix need to reach the upstream kernel and then be considered for backport?

Andy Smith (grifferz) wrote :

Alternatively as the bug is in the EISA code, if the kernel was built without CONFIG_EISA=y then I think that would avoid it. I haven't hit this problem on Debian stable or testing where CONFIG_EISA is not set.
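
To check whether a given kernel build has the option enabled, its config file can be inspected. A minimal Python sketch follows, assuming the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` file format. The function name and the sample text are made up for illustration; on Debian and Ubuntu the real file is normally found at `/boot/config-<kernel release>`.

```python
import re


def config_value(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...) or
    None when the option is absent or commented out as 'is not set'."""
    pattern = re.compile(r"^%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1) if match else None


# Illustrative config fragments, not copied from any real package.
with_eisa = "CONFIG_X86_32=y\nCONFIG_EISA=y\nCONFIG_EISA_NAMES=y\n"
without_eisa = "CONFIG_X86_32=y\n# CONFIG_EISA is not set\n"

print(config_value(with_eisa, "CONFIG_EISA"))     # y
print(config_value(without_eisa, "CONFIG_EISA"))  # None
```

To use it on a real system, pass in the contents of the config file for the kernel in question, e.g. `config_value(open(path).read(), "CONFIG_EISA")`.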

Joseph Salisbury (jsalisbury) wrote :

Was there any discussion at Xen whether or not the patch you tested would be submitted to mainline?

Andy Smith (grifferz) wrote :

This patch was submitted here: https://lkml.org/lkml/2018/9/11/885

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the patch posted in comment #12. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1789118

Can you test this kernel and see if it resolves this bug? If it does, I'll submit it as an SRU, so we don't have to wait for the patch to come down from upstream via stable updates.

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
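
The rule in the two bullets above can be written down as a small Python sketch. The function name is hypothetical, and the package name stems are taken from the note as written. Versions are compared as numeric (major, minor) pairs, because a plain string comparison would sort "4.4" after "4.15".

```python
import re


def packages_for(kernel_version):
    """Return the .deb package stems to install for a test kernel,
    following the 'prior to 4.15' / '4.15 or newer' rule above."""
    match = re.match(r"(\d+)\.(\d+)", kernel_version)
    if not match:
        raise ValueError("unrecognised kernel version: %r" % kernel_version)
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]


print(packages_for("4.13.16"))
print(packages_for("4.15.0-33"))
```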

Thanks in advance!

Andy Smith (grifferz) wrote :

Hi Joseph,

As this problem only affects 32-bit, there would need to be an i686 kernel there. As far as I can see there are only amd64 packages.

I have already tested this patch on the linux source package from 18.04 (comment #9), but I don't have the binary packages around any more.

If you want me to do it again with packages you supply then I can do that, though it will take a little while as I will have to install a 17.10 image and then do-release-upgrade it to 18.04 first.

Thanks,
Andy

Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Stefan Bader (smb) on 2018-10-01
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Andy Smith (grifferz) wrote :

Successful boot with kernel from bionic-proposed.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-38.41

---------------
linux (4.15.0-38.41) bionic; urgency=medium

  * linux: 4.15.0-38.41 -proposed tracker (LP: #1797061)

  * Silent data corruption in Linux kernel 4.15 (LP: #1796542)
    - block: add a lower-level bio_add_page interface
    - block: bio_iov_iter_get_pages: fix size of last iovec
    - blkdev: __blkdev_direct_IO_simple: fix leak in error case
    - block: bio_iov_iter_get_pages: pin more pages for multi-segment IOs

linux (4.15.0-37.40) bionic; urgency=medium

  * linux: 4.15.0-37.40 -proposed tracker (LP: #1795564)

  * hns3: enable ethtool rx-vlan-filter on supported hw (LP: #1793394)
    - net: hns3: Add vlan filter setting by ethtool command -K

  * hns3: Modifying channel parameters will reset ring parameters back to
    defaults (LP: #1793404)
    - net: hns3: Fix desc num set to default when setting channel

  * hisi_sas: Add SATA FIX check for v3 hw (LP: #1794151)
    - scsi: hisi_sas: Add SATA FIS check for v3 hw

  * Fix potential corruption using SAS controller on HiSilicon arm64 boards
    (LP: #1794156)
    - scsi: hisi_sas: add memory barrier in task delivery function

  * hisi_sas: Reduce unnecessary spin lock contention (LP: #1794165)
    - scsi: hisi_sas: Tidy hisi_sas_task_prep()

  * Add functional level reset support for the SAS controller on HiSilicon D06
    systems (LP: #1794166)
    - scsi: hisi_sas: tidy host controller reset function a bit
    - scsi: hisi_sas: relocate some common code for v3 hw
    - scsi: hisi_sas: Implement handlers of PCIe FLR for v3 hw

  * HiSilicon SAS controller doesn't recover from PHY STP link timeout
    (LP: #1794172)
    - scsi: hisi_sas: tidy channel interrupt handler for v3 hw
    - scsi: hisi_sas: Fix the failure of recovering PHY from STP link timeout

  * getxattr: always handle namespaced attributes (LP: #1789746)
    - getxattr: use correct xattr length

  * Fix unusable NVIDIA GPU after S3 (LP: #1793338)
    - PCI: Reprogram bridge prefetch registers on resume

  * Fails to boot under Xen PV: BUG: unable to handle kernel paging request at
    edc21fd9 (LP: #1789118)
    - x86/EISA: Don't probe EISA bus for Xen PV guests

  * qeth: use vzalloc for QUERY OAT buffer (LP: #1793086)
    - s390/qeth: use vzalloc for QUERY OAT buffer

  * SRU: Enable middle button of touchpad on ThinkPad P72 (LP: #1793463)
    - Input: elantech - enable middle button of touchpad on ThinkPad P72

  * Dell new AIO requires a new uart backlight driver (LP: #1727235)
    - SAUCE: platform/x86: dell-uart-backlight: new backlight driver for DELL AIO
    - updateconfigs for Dell UART backlight driver

  * [Ubuntu] s390/crypto: Fix return code checking in cbc_paes_crypt.
    (LP: #1794294)
    - s390/crypto: Fix return code checking in cbc_paes_crypt()

  * hns3: Retrieve RoCE MSI-X config from firmware (LP: #1793221)
    - net: hns3: Fix MSIX allocation issue for VF
    - net: hns3: Refine the MSIX allocation for PF

  * net: hns: Avoid hang when link is changed while handling packets
    (LP: #1792209)
    - net: hns: add the code for cleaning pkt in chip
    - net: hns: add netif_carrier_off before change speed and duplex

  * Page leaki...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: In Progress → Fix Released
Andy Smith (grifferz) wrote :

As mentioned in my first comment, this bug also affects the netboot installer at e.g. http://gb.archive.ubuntu.com/ubuntu/dists/bionic/main/installer-i386/current/images/ so it is still not possible to boot the installer in 32-bit under Xen. The main OS kernel has been fixed but not the installer kernel. Who do we need to speak to in order to get that kernel fixed?
