Fresh Eoan upgrade fails to boot dom0 with message "decoding failed" (kernel 5.3.0)

Bug #1851091 reported by David on 2019-11-03
This bug affects 10 people
Affects          Status         Importance   Assigned to
xen (Ubuntu)     Fix Released   High         Stefan Bader
  Disco          Fix Released   High         Unassigned
  Eoan           Fix Released   High         Unassigned

Bug Description

[SRU Justification]

IMPORTANT NOTE: See comment #14 (the package must be uploaded to Disco and copied forward, with binaries, to newer releases).

[Impact]

Due to a kernel change, it is impossible to bring up an Eoan or later kernel as dom0 (the hypervisor starts and then fails with a "Decoding failed" message).

[Fix]

Cherry-pick 3 patches from upstream Xen that fix its lz4 decompression code.

[Testcase]

Boot a Xen host with an Eoan kernel as dom0. Without the fix, dom0 setup fails with the "Decoding failed" panic; with the fix, dom0 boots.

[Risk of Regression]

Low. All 3 patches only affect one specific decompression method (lz4), which is used early during boot. It either works or it does not, and right now the failure is 100% fatal.

---

A freshly upgraded amd64 system with Ubuntu 19.10 and Xen 4.9 won't boot dom0. It fails with the error shown in the attached screenshot ("Decoding failed ********** Panic on CPU 0: Could not set up DOM0 Guest OS").

The system booted fine before the upgrade from 19.04. Dom0 still boots if kernel 5.0.0.32 (from Ubuntu 19.04) is selected in grub.

Release: Ubuntu 19.10
kernel: 5.3.0-19-generic (also verified with 5.3.0-21-generic from eoan-proposed)
xen-system-amd64 version: 4.9.2-0ubuntu2
xen-utils-4.9 version: 4.9.2-0ubuntu2

David (dbourget) wrote :
description: updated
description: updated
summary: - Xen fails to boot dom0 with message "decoding failed" with dom0 5.3.x
+ Xen fails to boot dom0 with message "decoding failed" kernel 5.3
summary: - Xen fails to boot dom0 with message "decoding failed" kernel 5.3
+ Xen fails to boot dom0 with message "decoding failed" with kernel 5.3
summary: - Xen fails to boot dom0 with message "decoding failed" with kernel 5.3
+ Fresh Eoan upgrade fails to boot dom0 with message "decoding failed"
+ (kernel 5.3.0)
David (dbourget) wrote :

I should add two comments:

- I can boot non-Xen Ubuntu with the same kernel that doesn't work as dom0.

- Since I found reports of "decoding failed" issues due to the change to lz4 compression in 19.10, I tried changing back to gzip compression in /etc/initramfs-tools/initramfs.conf (and running update-initramfs -u), and this didn't help.
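For reference, the gzip-initramfs attempt described above can be sketched as follows. This is a minimal sketch, assuming the stock Ubuntu layout where /etc/initramfs-tools/initramfs.conf carries a COMPRESS= line and GNU sed is available; the helper name set_initramfs_compress and the --apply switch are hypothetical. As the comments in this report show, this change does not fix the dom0 boot failure; the fix that eventually landed was in the hypervisor's lz4 decompression code.

```shell
#!/bin/sh
# Sketch of the (unsuccessful) gzip-initramfs attempt from this report.
# Hypothetical helper name and --apply switch; assumes GNU sed (-i).

# Set COMPRESS=<method> in an initramfs-tools config file.
# $1: compression method (e.g. gzip), $2: config file (defaults to the system one)
set_initramfs_compress() {
    conf=${2:-/etc/initramfs-tools/initramfs.conf}
    if grep -q '^COMPRESS=' "$conf"; then
        # Replace the existing active setting in place.
        sed -i "s/^COMPRESS=.*/COMPRESS=$1/" "$conf"
    else
        # No active COMPRESS line: append one.
        printf 'COMPRESS=%s\n' "$1" >> "$conf"
    fi
}

# Only touch the real system when explicitly asked to (needs root).
if [ "${1:-}" = "--apply" ]; then
    set_initramfs_compress gzip
    update-initramfs -u
fi
```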

Andreas Hasenack (ahasenack) wrote :

Thanks for filing this bug in Ubuntu.

Subscribing Stefan Bader, who has handled xen uploads/bugs in the past, to see if he has any insights.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xen (Ubuntu):
status: New → Confirmed
Gray (gray-green) wrote :

Got the same after a clean install of the 19.10 release with Xen. "apt update; apt upgrade" didn't help.

David (dbourget) wrote :

Has anybody found a workaround?

ShadowEO (dreamcaster23) wrote :

Can confirm. I upgraded to Ubuntu 19.10, installed all the new kernel images from the Eoan apt repository, then pulled in Xen and was greeted with "Error: Decoding Failed" for dom0. Like the others, I tried a gzip-compressed initramfs, and it still failed.

David (dbourget) wrote :

Looks like Xen is completely broken in Eoan. This should get some attention from Xen maintainers...

ShadowEO (dreamcaster23) wrote :

I just followed the OP's advice and downgraded my kernel to 5.0.0.xx, and Xen 4.9 boots, albeit without Ethernet support (I assume I need a backported driver, or the version of Xen that came with 19.04). So this is definitely a problem with the new Eoan kernel, and it affects the current git tree too: I built my own kernel from Ubuntu's Eoan sources to check, and it could not boot at all.

It seems Eoan's Xen support may have rotted. Sad; I would have loved to use Xen instead of attempting to use VMware (my processor is now on VMware's unsupported list after ESXi 4.5).

Maurice Johnson (mejohns) wrote :

Unfortunately, I stumbled on this same problem.
What I see is that all but two kernel configuration items are the same across multiple kernel versions. The two items that have changed are:

CONFIG_XEN_SELFBALLOONING=y (not sure why this has gone away)
CONFIG_XEN_TMEM=m

These two are missing from the 5.3.x kernels, but Xen (dom0) expects SELF_BALLOONING. I am limited in how much I can experiment with this, as most of my hardware runs Debian 9, Debian 10 and Ubuntu 18.04. I have 19.10 running on a laptop that works with 5.0.0.xx and Xen.

Hope this helps. In the meantime, when I get a chance, I'll poke around on the laptop for more.
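The comparison described above can be reproduced with a short script. This is a minimal sketch, assuming Ubuntu's usual /boot/config-<version> files; the function name config_diff is hypothetical, and the default pattern only looks at Xen-related options. Note that the fix that eventually landed (see the changelog in this report) was in Xen's lz4 decompression code, not in these options.

```shell
#!/bin/sh
# Sketch: show kernel config lines matching a pattern that differ
# between two config files. config_diff is a hypothetical helper name.
# Usage: config_diff OLD_CONFIG NEW_CONFIG [GREP_PATTERN]
config_diff() {
    pattern=${3:-CONFIG_XEN_}
    old=$(mktemp)
    new=$(mktemp)
    # Sort so the order of options in each file does not matter.
    # "# CONFIG_... is not set" lines also match and are compared.
    grep -- "$pattern" "$1" | sort > "$old"
    grep -- "$pattern" "$2" | sort > "$new"
    # "<" lines exist only in OLD_CONFIG, ">" lines only in NEW_CONFIG.
    diff "$old" "$new"
    status=$?
    rm -f "$old" "$new"
    # Exit status follows diff: 0 = same, 1 = differences found.
    return "$status"
}
```

For example, `config_diff /boot/config-5.0.0-32-generic /boot/config-5.3.0-19-generic` compares the working 19.04 kernel with the failing Eoan one.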

David (dbourget) wrote :

Maurice: unfortunately I can't test these config options because I've given up and downgraded this machine. But thanks for posting. It will be interesting to see if someone can get it to work.

I have talked with Stefan Bader (subscribed), and he confirmed that it has been rather broken recently.
I'm not sure he'll get to it, but let's assign it to him to make clear what this bug is waiting for.

Changed in xen (Ubuntu):
assignee: nobody → Stefan Bader (smb)
Stefan Bader (smb) on 2019-12-12
Changed in xen (Ubuntu Disco):
importance: Undecided → High
Changed in xen (Ubuntu Eoan):
importance: Undecided → High
Changed in xen (Ubuntu):
importance: Undecided → High
Stefan Bader (smb) wrote :

I was able to find the Xen changes that fix the decoding problem. Additional work is needed to get around build failures with gcc-8 (and gcc-9), and I got the build fixed as well. Unfortunately, when compiled with gcc-9, the result is a non-working hypervisor: in my tests it triggers a host reset before printing any early boot messages. So right now the only way to get a working solution for Eoan/Focal is to upload to Disco, which does not build with gcc-9, and copy that forward (with binaries) to Eoan and Focal.

Stefan Bader (smb) wrote :

Tested from a PPA build on 2 hosts (one Intel-based and one AMD-based) running Eoan:

# apt-cache policy xen-hypervisor-4.9-amd64
xen-hypervisor-4.9-amd64:
  Installed: 4.9.2-0ubuntu4
  Candidate: 4.9.2-0ubuntu4
  Version table:
 *** 4.9.2-0ubuntu4 500
        500 http://ppa.launchpad.net/smb/eoan/ubuntu eoan/main amd64 Packages
        100 /var/lib/dpkg/status
     4.9.2-0ubuntu2 500
        500 http://archive.ubuntu.com/ubuntu eoan/universe amd64 Packages
# uname -a
Linux argabuthon 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# virsh version
Compiled against library: libvirt 5.4.0
Using library: libvirt 5.4.0
Using API: Xen 5.4.0
Running hypervisor: Xen 4.9.0
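When reporting test results like these, the relevant versions can be pulled out of the same command output. A minimal sketch, assuming the output formats shown above; installed_version, running_hypervisor and the --check switch are hypothetical names. The comparison uses dpkg --compare-versions against 4.9.2-0ubuntu5, the archive version that carries the lz4 fix per the changelog in this report; a PPA build such as the 4.9.2-0ubuntu4 tested above will report as older even though it includes the fix.

```shell
#!/bin/sh
# Sketch: extract versions from `apt-cache policy` and `virsh version` output.
# Function names and the --check switch are hypothetical.

# stdin: `apt-cache policy PKG` output; prints the Installed: version.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# stdin: `virsh version` output; prints e.g. "Xen 4.9.0".
running_hypervisor() {
    sed -n 's/^Running hypervisor: //p'
}

if [ "${1:-}" = "--check" ]; then
    v=$(apt-cache policy xen-hypervisor-4.9-amd64 | installed_version)
    echo "kernel: $(uname -r)"
    echo "hypervisor: $(virsh version | running_hypervisor)"
    # 4.9.2-0ubuntu5 is the archive version carrying the lz4 fix.
    if dpkg --compare-versions "$v" ge 4.9.2-0ubuntu5; then
        echo "xen-hypervisor-4.9-amd64 $v: includes the archive fix"
    else
        echo "xen-hypervisor-4.9-amd64 $v: older than 4.9.2-0ubuntu5"
    fi
fi
```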

Stefan Bader (smb) on 2019-12-12
Changed in xen (Ubuntu Disco):
status: New → Fix Committed

I managed to build a working xen.efi on Eoan with the following commits from the Xen git tree:

6561994b87af3e9cd28ee99c42e8b2697621687d (lz4 fix)
14b62ab3e5a79816edfc6dd3afce1bb68c106ac5 (lz4 fix)
45342cd88d564a7da2dfbbc921898805008b0b6c (gcc 9 fix)
2effc2f131145fdd40352085c11adb1e17164135 (gcc 9 fix)

and this hack:

--- xen-4.9.2.orig/xen/arch/x86/Rules.mk
+++ xen-4.9.2/xen/arch/x86/Rules.mk
@@ -35,6 +35,7 @@ endif
 ifneq ($(call cc-option,$(CC),-mindirect-branch-register,n),n)
 CFLAGS += -mindirect-branch=thunk-extern -mindirect-branch-register
 CFLAGS += -DCONFIG_INDIRECT_THUNK
+CFLAGS += -fcf-protection=none
 export CONFIG_INDIRECT_THUNK=y
 endif

The build still fails in the userspace packages with other GCC 9 issues, but I got what I needed for a working system (Intel CPU, direct EFI boot with xen.efi, Windows HVM guest).
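The manual build above can be scripted roughly as follows. This is a sketch, assuming the working directory is a Xen git checkout already at the 4.9.2 sources (the report does not name a tag or branch) and GNU sed; cherry_pick_fixes, add_cf_protection_none and the --apply switch are hypothetical names. The cherry-picks may need manual conflict resolution on a given tree, and the flag is placed inside the same ifneq block as in the Rules.mk hack.

```shell
#!/bin/sh
# Sketch: reproduce the manual xen.efi build fix from this report.
# Assumes a Xen git checkout at the 4.9.2 sources and GNU sed.

# Upstream commits listed in the report (2x lz4 fix, 2x gcc 9 fix).
XEN_FIX_COMMITS="
6561994b87af3e9cd28ee99c42e8b2697621687d
14b62ab3e5a79816edfc6dd3afce1bb68c106ac5
45342cd88d564a7da2dfbbc921898805008b0b6c
2effc2f131145fdd40352085c11adb1e17164135
"

# Cherry-pick each commit; stop at the first one that does not apply.
cherry_pick_fixes() {
    for c in $XEN_FIX_COMMITS; do
        git cherry-pick -x "$c" || return 1
    done
}

# Insert -fcf-protection=none right after the CONFIG_INDIRECT_THUNK define,
# mirroring the Rules.mk hack. Does nothing if the flag is already present.
add_cf_protection_none() {
    rules=${1:-xen/arch/x86/Rules.mk}
    grep -q -- '-fcf-protection=none' "$rules" && return 0
    sed -i '/^CFLAGS += -DCONFIG_INDIRECT_THUNK$/a CFLAGS += -fcf-protection=none' "$rules"
}

if [ "${1:-}" = "--apply" ]; then
    cherry_pick_fixes && add_cf_protection_none
fi
```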

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xen (Ubuntu Eoan):
status: New → Confirmed
Biterror (jmarin) wrote :

Sorry for a silly question, but can I expect to receive a working Xen version via apt update soon, or should I try to downgrade to an older Ubuntu Server version? (I tried to install 18.04 LTS, but the installer has some kind of bug and can't find my NVMe disks.) Thanks!

Stefan Bader (smb) on 2020-01-07
Changed in xen (Ubuntu Disco):
status: Fix Committed → In Progress
Changed in xen (Ubuntu Eoan):
status: Confirmed → In Progress
Stefan Bader (smb) on 2020-01-07
description: updated

An upload of xen to disco-proposed has been rejected from the upload queue for the following reason: "cruft in the package (d/changelog.orig), and could use -0ubuntu3 as the version".

Hello David, or anyone else affected,

Accepted xen into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/xen/4.9.2-0ubuntu5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.
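Enabling -proposed and installing the candidate package, as the linked EnableProposed page describes, can be sketched like this. It assumes an Ubuntu host, root privileges, and that a separate file under /etc/apt/sources.list.d/ is acceptable; the helper names and the --apply switch are hypothetical, and the wiki page remains the authoritative procedure. Remove the extra source again after testing.

```shell
#!/bin/sh
# Sketch: enable <release>-proposed and install xen from it.
# Hypothetical helper names; assumes Ubuntu and root for --apply.

# Print the sources.list line for a release's -proposed pocket.
proposed_source_line() {
    # $1: release codename, e.g. disco
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed main restricted universe multiverse\n' "$1"
}

# Write that line to its own list file (overwriting it).
enable_proposed() {
    release=$1
    list=${2:-/etc/apt/sources.list.d/$release-proposed.list}
    proposed_source_line "$release" > "$list"
}

if [ "${1:-}" = "--apply" ]; then
    release=$(lsb_release -cs)
    enable_proposed "$release"
    apt-get update
    # -t makes apt prefer versions from -proposed for this command only.
    apt-get install -t "$release-proposed" xen-hypervisor-4.9-amd64 xen-utils-4.9
fi
```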

Changed in xen (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
Robert Russell (rrbrussell) wrote :

Based on my testing with fresh installs of both Disco and Eoan, the 4.9.2-0ubuntu5 version fixes the problem.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xen - 4.9.2-0ubuntu6

---------------
xen (4.9.2-0ubuntu6) focal; urgency=medium

  * Build-depend on python2-dev.
  * Depend on python2.
  * Build using python2.
  * Build-depend on lmodern.

 -- Matthias Klose <email address hidden> Mon, 13 Jan 2020 14:51:35 +0100

Changed in xen (Ubuntu):
status: Confirmed → Fix Released
Stefan Bader (smb) on 2020-01-15
tags: added: verification-done verification-done-disco verification-done-eoan
removed: verification-needed verification-needed-disco
Andy Whitcroft (apw) on 2020-01-15
Changed in xen (Ubuntu Eoan):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xen - 4.9.2-0ubuntu5

---------------
xen (4.9.2-0ubuntu5) disco; urgency=medium

  * Fix FTBS in Eoan (LP: #1823441). Except the last two changes, these are
    all cherry picks from Xen upstream to handle gcc8 and gcc9 changes.
    - d/p/x86-e820-fix-build-with-gcc9.patch
    - d/p/x86-IO-APIC-fix-build-with-gcc9.patch
    - d/p/trace-fix-build-with-gcc9.patch
    - d/p/tools-libxc-fix-strncpy-size.patch
    - d/p/tools-misc-fix-hypothetical-buffer-overflow-in-xen-l.patch
    - d/p/tools-xentop-replace-use-of-deprecated-vwprintw.patch
    - d/p/tools-xenpmd-fix-possible-0-truncation.patch
    - d/p/xenpmd-make-32-bit-gcc-8.1-non-debug-build-work.patch
    - d/p/libacpi-fixes-for-iasl-20180427.patch
    - d/p/tools-blktap2-fix-possible-0-truncation.patch
    - d/p/tools-blktap2-fix-hypothetical-buffer-overflow.patch
    - d/p/libxl-arm-Fix-build-on-arm64-acpi-w-gcc-8.2.patch
    - d/p/ubuntu/flags-fcs-protect-none.patch
    - d/p/ubuntu/strip-note-gnu-property.patch
  * Fix decode failed panics with v5.2+ kernels (LP: #1851091)
    - d/p/0001-lz4-refine-commit-9143a6c55ef7-for-the-64-bit-case.patch
    - d/p/0002-lz4-pull-out-constant-tables.patch
    - d/p/0003-lz4-fix-system-halt-at-boot-kernel-on-x86_64.patch

 -- Stefan Bader <email address hidden> Wed, 11 Dec 2019 17:23:34 +0100

Changed in xen (Ubuntu Disco):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for xen has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Andreas Hasenack (ahasenack) wrote :

Was the SRU for Disco released before Eoan's?

Stefan Bader (smb) wrote :

@Andreas, it seems so, but it is the same package as for Eoan. Unfortunately this required a build in Disco that was then copied forward, so the bits are identical in both cases.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xen - 4.9.2-0ubuntu5

Changed in xen (Ubuntu Eoan):
status: Fix Committed → Fix Released