arm64: loop on boot after installing linux-generic-hwe-18.04-edge/bionic-proposed

Bug #1845820 reported by Patricia Domingues
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: dann frazier
Milestone: (none listed)

Bug Description

Description:
The system is stuck in a boot loop after installing linux-generic-hwe-18.04-edge/bionic-proposed.

System:
Cavium CN88XX
Board Model: crb-1s
SKU: CN8890-2000BG2601-AAP-PR-Y-G

1) Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-23-generic aarch64)
Description: Ubuntu 18.04.3 LTS
Release: 18.04

2) ((enable -proposed archive))
linux-generic-hwe-18.04-edge:
  Installed: 5.0.0.23.79
  Candidate: 5.3.0.12.83
  Version table:
     5.3.0.12.83 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic-proposed/main arm64 Packages
 *** 5.0.0.23.79 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic-updates/main arm64 Packages
        100 /var/lib/dpkg/status
     5.0.0.20.76 500
        500 http://ports.ubuntu.com/ubuntu-ports bionic-security/main arm64 Packages

3) Expected:
a. install linux-generic-hwe-18.04-edge/bionic-proposed 5.3.0.12.83 arm64 [upgradable from: 5.0.0.23.79]
b. reboot
c. ssh into the system

4) Actual:
a. installed linux-generic-hwe-18.04-edge/bionic-proposed
b. system is stuck in a boot loop

```
linux-generic-hwe-18.04-edge/bionic-proposed 5.3.0.12.83 arm64 [upgradable from: 5.0.0.23.79]
  Complete Generic Linux kernel and headers
```

Patricia Domingues (patriciasd) wrote :

adding machine console log after 'sudo reboot':
 https://paste.ubuntu.com/p/wYP45XV8xH/

dann frazier (dannf) wrote :

I can reproduce.

Moving to 'linux' package because:
 - linux-meta-hwe-edge is a meta package, it doesn't actually provide the kernel
 - While linux-hwe-edge is the package in which we're seeing the bug, the kernel team seems to always track such bugs against the 'linux' package of the series the kernel came from (i.e. eoan).

Changed in linux-meta-hwe-edge (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
status: New → Confirmed
dann frazier (dannf) wrote :

I bisected this down to the following commit, which suggests we need a fix for ThunderX IOMMU config, but can consider disabling ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT until then. To confirm, I verified that 5.3 boots fine with arm-smmu.disable_bypass=n on the cmdline.

commit 954a03be033c7cef80ddc232e7cbdb17df735663
Author: Douglas Anderson <REDACTED>
Date: Fri Mar 1 11:20:17 2019 -0800

    iommu/arm-smmu: Break insecure users by disabling bypass by default

    If you're bisecting why your peripherals stopped working, it's
    probably this CL. Specifically if you see this in your dmesg:
      Unexpected global fault, this could be serious
    ...then it's almost certainly this CL.

    Running your IOMMU-enabled peripherals with the IOMMU in bypass mode
    is insecure and effectively disables the protection they provide.
    There are few reasons to allow unmatched stream bypass, and even fewer
    good ones.

    This patch starts the transition over to make it much harder to run
    your system insecurely. Expected steps:

    1. By default disable bypass (so anyone insecure will notice) but make
       it easy for someone to re-enable bypass with just a KConfig change.
       That's this patch.

    2. After people have had a little time to come to grips with the fact
       that they need to set their IOMMUs properly and have had time to
       dig into how to do this, the KConfig will be eliminated and bypass
       will simply be disabled. Folks who are truly upset and still
       haven't fixed their system can either figure out how to add
       'arm-smmu.disable_bypass=n' to their command line or revert the
       patch in their own private kernel. Of course these folks will be
       less secure.
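
For anyone checking whether the workaround above is in effect, here is a minimal sketch that looks for the parameter in a kernel command line string such as the contents of /proc/cmdline. The function name and the boolean handling are illustrative assumptions, not part of the kernel or of this bug's fix. It only approximates the kernel's boolean parsing (first character 1/y/Y or 0/n/N, and a bare parameter counting as enabled), and it relies on the kernel treating '-' and '_' as equivalent in parameter names.

```python
from typing import Optional

# Parameter name as discussed in this bug, normalised to underscores.
PARAM = "arm_smmu.disable_bypass"


def smmu_bypass_disabled(cmdline: str) -> Optional[bool]:
    """Return True/False if the parameter is set on the command line,
    or None if it is absent (the kernel's built-in default then applies).

    Hypothetical helper: a simplified approximation of kernel bool parsing.
    """
    result = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # '-' and '_' are interchangeable in kernel parameter names.
        if key.replace("-", "_") != PARAM:
            continue
        if not sep:
            # A bare boolean parameter with no '=' is treated as enabled.
            result = True
        elif value[:1] in ("1", "y", "Y"):
            result = True
        elif value[:1] in ("0", "n", "N"):
            result = False
        # A later occurrence overrides an earlier one.
    return result


def running_kernel_setting(path: str = "/proc/cmdline") -> Optional[bool]:
    """Apply the check to the running kernel's command line."""
    with open(path) as f:
        return smmu_bypass_disabled(f.read())
```

A result of False corresponds to the configuration reported as booting on the affected system; None means the command line does not override the build-time default.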

dann frazier (dannf)
no longer affects: linux-meta-hwe-edge (Ubuntu)
dann frazier (dannf)
Changed in linux (Ubuntu):
assignee: nobody → dann frazier (dannf)
Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-17.18

---------------
linux (5.3.0-17.18) eoan; urgency=medium

  * eoan/linux: 5.3.0-17.18 -proposed tracker (LP: #1846641)

  * CVE-2019-17056
    - nfc: enforce CAP_NET_RAW for raw sockets

  * CVE-2019-17055
    - mISDN: enforce CAP_NET_RAW for raw sockets

  * CVE-2019-17054
    - appletalk: enforce CAP_NET_RAW for raw sockets

  * CVE-2019-17053
    - ieee802154: enforce CAP_NET_RAW for raw sockets

  * CVE-2019-17052
    - ax25: enforce CAP_NET_RAW for raw sockets

  * CVE-2019-15098
    - ath6kl: fix a NULL-ptr-deref bug in ath6kl_usb_alloc_urb_from_pipe()

  * xHCI on AMD Stoney Ridge cannot detect USB 2.0 or 1.1 devices.
    (LP: #1846470)
    - x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect

  * Re-enable linux-libc-dev build on i386 (LP: #1846508)
    - [Packaging] Build only linux-libc-dev for i386
    - [Debian] final-checks -- ignore archtictures with no binaries

  * arm64: loop on boot after installing linux-generic-hwe-18.04-edge/bionic-
    proposed (LP: #1845820)
    - [Config] Disable CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT

  * Revert ESE DASD discard support (LP: #1846219)
    - SAUCE: Revert "s390/dasd: Add discard support for ESE volumes"

  * Miscellaneous Ubuntu changes
    - update dkms package versions

linux (5.3.0-16.17) eoan; urgency=medium

  * eoan/linux: 5.3.0-16.17 -proposed tracker (LP: #1846204)

  * zfs fails to build on s390x with debug symbols enabled (LP: #1846143)
    - SAUCE: s390: Mark atomic const ops always inline

linux (5.3.0-15.16) eoan; urgency=medium

  * eoan/linux: 5.3.0-15.16 -proposed tracker (LP: #1845987)

  * Drop i386 build for 19.10 (LP: #1845714)
    - [Packaging] Remove x32 arch references from control files
    - [Debian] final-checks -- Get arch list from debian/control

  * ZFS kernel modules lack debug symbols (LP: #1840704)
    - [Debian] Fix conditional for setting zfs debug package path

  * Use pyhon3-sphinx instead of python-sphinx for building html docs
    (LP: #1845808)
    - [Packaging] Update sphinx build dependencies to python3 packages

  * Kernel panic with 19.10 beta image (LP: #1845454)
    - efi/tpm: Don't access event->count when it isn't mapped.
    - efi/tpm: don't traverse an event log with no events
    - efi/tpm: only set efi_tpm_final_log_size after successful event log parsing

linux (5.3.0-14.15) eoan; urgency=medium

  * eoan/linux: 5.3.0-14.15 -proposed tracker (LP: #1845728)

  * Drop i386 build for 19.10 (LP: #1845714)
    - [Debian] Remove support for producing i386 kernels
    - [Debian] Don't use CROSS_COMPILE for i386 configs

  * udevadm trigger will fail when trying to add /sys/devices/vio/
    (LP: #1845572)
    - SAUCE: powerpc/vio: drop bus_type from parent device

  * Trying to online dasd drive results in invalid input/output from the kernel
    on z/VM (LP: #1845323)
    - SAUCE: s390/dasd: Fix error handling during online processing

  * intel-lpss driver conflicts with write-combining MTRR region (LP: #1845584)
    - SAUCE: mfd: intel-lpss: add quirk for Dell XPS 13 7390 2-in-1

  * Support Hi1620 zip hw accelerator (LP: #1845355)
    - [Config] Enable HiSilicon QM/ZIP as module...


Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (linux-gcp-5.3/5.3.0-1008.9~18.04.1)

All autopkgtests for the newly accepted linux-gcp-5.3 (5.3.0-1008.9~18.04.1) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

linux-gcp-5.3/unknown (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#linux-gcp-5.3

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Liz Fong-Jones (lizthegrey) wrote :

FYI: SolidRun Honeycomb/ClearFog LX2K users depend upon this feature as well (ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT=n); despite this original issue being filed for Cavium boards, it's still true for more recent hardware as well.

Thus, please don't remove it without compatibility testing. Thanks!
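
As a rough aid for that kind of compatibility testing, the sketch below reads a kernel config file and reports how an option is set. The helper name is hypothetical. The path /boot/config-<release> is an assumption about where the installed kernel's config lives, so adjust it if your system differs.

```python
import platform
from typing import Optional

OPTION = "CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT"


def config_value(config_text: str, option: str = OPTION) -> Optional[str]:
    """Return the option's value ('y', 'm', ...), 'n' if the file says
    '# <option> is not set', or None if the option is not mentioned at all.

    Hypothetical helper for illustration only.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        # Matching on 'OPTION=' avoids hitting longer names sharing the prefix.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def running_kernel_config_value(option: str = OPTION) -> Optional[str]:
    """Check the config of the running kernel (path is an assumption)."""
    path = f"/boot/config-{platform.release()}"
    with open(path) as f:
        return config_value(f.read(), option)
```

A value of 'n' matches the configuration this bug's fix selected; 'y' means bypass is disabled by default unless overridden on the command line.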
