ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on

Bug #1673564 reported by Ciprian Barbu on 2017-03-16
This bug affects 4 people
Affects          Importance  Assigned to
linux (Ubuntu)   High        Unassigned
Xenial           High        Unassigned
Yakkety          High        Unassigned
Zesty            High        dann frazier

Bug Description

[Impact]
VMs can cause interrupts to be disabled on the host CPU, resulting in hangs.

[Test Case]
Download the attached vm-start-stop.tar.gz and unpack it. Run the start.sh script within. This will start 30 parallel loops where a VM is defined, started, allowed 60s to run, then destroyed. If the bug exists, within 10 minutes you'll see qemu processes start to hang, along with soft lockup messages on the console. Note that this script is compatible with xenial libvirt syntax - it likely needs tweaks to run in newer Ubuntu releases, just due to QEMU/libvirt changes.
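The attached archive is not reproduced here, but the shape of the test described above can be sketched roughly as follows. This is only an illustration: the function names, the `vm-N.xml` naming and the `VIRSH` override are assumptions, not the contents of start.sh.

```shell
#!/bin/bash
# Sketch only: the real logic lives in the attached vm-start-stop.tar.gz.
# VIRSH, RUN_SECONDS, vm_cycle, run_loops and the vm-N.xml names are
# illustrative assumptions.

VIRSH="${VIRSH:-virsh}"            # set VIRSH=echo for a dry run that only prints commands
RUN_SECONDS="${RUN_SECONDS:-60}"   # how long each VM is allowed to run

# One define/start/run/destroy cycle for the domain XML $1 with domain name $2.
vm_cycle() {
    local xml="$1" name="$2"
    $VIRSH define "$xml" || return 1
    $VIRSH start "$name" || return 1
    sleep "$RUN_SECONDS"
    $VIRSH destroy "$name"
    # Domains that own an NVRAM file may need "undefine --nvram" instead.
    $VIRSH undefine "$name"
}

# Start $1 parallel loops; each loop repeats vm_cycle $2 times on its own domain.
run_loops() {
    local loops="$1" iterations="$2" i n
    for i in $(seq 1 "$loops"); do
        (
            for n in $(seq 1 "$iterations"); do
                vm_cycle "vm-$i.xml" "vm-$i" || break
            done
        ) &
    done
    wait
}

# Example: run_loops 30 1000, then watch the host console for "soft lockup".
```

Running it with `VIRSH=echo RUN_SECONDS=0` prints the virsh commands instead of executing them, which is a cheap way to check the loop structure before touching a real host.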

[Regression Risk]
The code changes are restricted to ARM - but there are a lot of them. While we've attempted to stress test the proposed changes on both impacted and non-impacted (non-ThunderX) systems, it is possible that there are issues that our test isn't finding, which would likely surface as KVM guest crashes/hangs.

Ciprian Barbu (ciprian-barbu) wrote :

Launch script showing how to start qemu with AAVMF and vhost on. It requires an AAVMF vars file, which can be a simple zero filled 64M data file.
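For readers without access to the attachment, a setup of this kind might look roughly like the sketch below. It is not the attached script: the file names, the firmware path, the memory size, the `gic-version=3` machine option and the tap-based netdev are all assumptions chosen for illustration.

```shell
# Sketch only: not the attached launch script. All paths and sizes below are
# illustrative assumptions.

VARS="${VARS:-./AAVMF_VARS.fd}"                  # hypothetical vars file name
CODE="${CODE:-/usr/share/AAVMF/AAVMF_CODE.fd}"   # assumed firmware location
DISK="${DISK:-./guest.img}"                      # hypothetical guest disk image

# Create the zero-filled 64M variable store at path $1.
make_vars() {
    dd if=/dev/zero of="$1" bs=1048576 count=64 2>/dev/null
}

# Print (rather than run) a QEMU command line that combines AAVMF with vhost=on.
qemu_cmd() {
    echo qemu-system-aarch64 -enable-kvm -machine virt,gic-version=3 -cpu host \
        -m 1024 -nographic \
        -drive "if=pflash,format=raw,readonly=on,file=$CODE" \
        -drive "if=pflash,format=raw,file=$VARS" \
        -drive "if=virtio,format=raw,file=$DISK" \
        -netdev tap,id=net0,vhost=on \
        -device virtio-net-pci,netdev=net0
}
```

Calling `make_vars "$VARS"` and then inspecting the output of `qemu_cmd` shows the two ingredients the report combines: a pflash-backed AAVMF firmware and a vhost-accelerated virtio network device.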

Ciprian Barbu (ciprian-barbu) wrote :

Serial console log, including debug messages from the Firmware and kernel errors. Could not get dmesg output, I might try again.

no longer affects: qemu (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in edk2 (Ubuntu):
status: New → Confirmed
Ciprian Barbu (ciprian-barbu) wrote :

One additional comment: running qemu-kvm with a three-part image (kernel, initramfs and disk) does not crash on the 4.8 kernel, even with vhost enabled. Similarly, a guest with AAVMF and no vhost behaves fine. Only the combination of the two fails, regardless of which guest, firmware or AAVMF binaries we tried.

Ciprian Barbu (ciprian-barbu) wrote :

The bug shows on the CRB boards as well

dann frazier (dannf) on 2017-03-27
Changed in linux (Ubuntu):
status: New → Confirmed
dann frazier (dannf) wrote :

Thanks for the steps. As you noted, I can reproduce on xenial w/ the 4.8 hwe kernel, but not with the GA 4.4. I'm marking the kernel packages as affected because the host kernel shouldn't hang, even if the guest does something bad (e.g. if edk2 is buggy).

Hi, Dann,
Thanks for looking into this!
One more thing: we blacklisted the module "vhost_net", and that bypasses the issue.
I know it's not the right direction for finding a fix, but maybe it helps with the debug.
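A workaround of this kind could be applied along the lines of the sketch below. The helper names and the blacklist file name are arbitrary choices for illustration; the target directory is a parameter so the helper can be tried without root.

```shell
# Sketch only: helper names and the .conf file name are hypothetical.

# Write a modprobe blacklist entry for vhost_net into directory $1
# (default /etc/modprobe.d, which needs root).
blacklist_vhost_net() {
    local dir="${1:-/etc/modprobe.d}"
    printf 'blacklist vhost_net\n' > "$dir/blacklist-vhost-net.conf"
}

# Succeed if vhost_net appears in a /proc/modules-style file ($1, default /proc/modules).
vhost_net_loaded() {
    grep -q '^vhost_net ' "${1:-/proc/modules}"
}

# Example on a live host (as root):
#   blacklist_vhost_net && { vhost_net_loaded && modprobe -r vhost_net; }
```

Note that this only sidesteps the hang by keeping the vhost-net path unused. It does not address the underlying problem.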

dann frazier (dannf) wrote :

I bisected this down to the upstream commit below. Now, that's not to say that this commit is necessarily bad - it may just make an existing problem more reproducible.

commit 7235acdb1144460d9f520f0d931f3cbb79eb244c
Author: Jason Wang <email address hidden>
Date: Mon Apr 25 22:14:32 2016 -0400

    vhost: simplify work flushing

    We used to implement the work flushing through tracking queued seq,
    done seq, and the number of flushing. This patch simplify this by just
    implement work flushing through another kind of vhost work with
    completion. This will be used by lockless enqueuing patch.

    Signed-off-by: Jason Wang <email address hidden>
    Reviewed-by: Michael S. Tsirkin <email address hidden>
    Signed-off-by: Michael S. Tsirkin <email address hidden>

David Daney (david.daney) wrote :

We are testing this patch. It is not a definitive solution at this point, but we would like wider testing to see whether it helps with this bug.

The attachment "Experimental workaround" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
David Daney (david.daney) wrote :

The experimental patch in comment #9, although it seems to prevent the hang under some workloads, puts the code in the wrong function. We really think it should be at the end of __vgic_v3_save_state().

We will have a revised patch after some further testing.

dann frazier (dannf) wrote :

Thanks, the patch is working well in our internal testing. I believe this rules out edk2, so I'll drop that task from this bug.

no longer affects: edk2 (Ubuntu)
dann frazier (dannf) wrote :

I've uploaded a test kernel w/ a potential workaround developed by Cavium to:
  ppa:yarmouth-team/thunderx

The build should complete in ~4 hours. Mind giving that a test?

dann frazier (dannf) wrote :

Once the build is complete, you'll be able to install it with:

sudo apt-add-repository ppa:yarmouth-team/thunderx
sudo apt update
sudo apt install linux-image-extra-4.8.0-49-generic=4.8.0-49.52~16.04.1+lp1673564.1

The PPA web page is at:
  https://launchpad.net/~yarmouth-team/+archive/ubuntu/thunderx

dann frazier (dannf) wrote :

Note that, while this symptom is seemingly only reproducible with >= 4.8 (hwe-y), there is another symptom of this problem that is reproducible with 4.4 (hwe-x): VMs hang during teardown, which is easily reproducible using a parallel VM start/stop test. I'll therefore mark this as impacting >= xenial.

Changed in linux (Ubuntu Xenial):
status: New → Confirmed
Changed in linux (Ubuntu Yakkety):
status: New → Confirmed
Changed in linux (Ubuntu Zesty):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Yakkety):
importance: Undecided → High
Changed in linux (Ubuntu Zesty):
importance: Undecided → High
description: updated
Yu-Chiang Huang (tjjh89017) wrote :

@dannf: even with the ppa:yarmouth-team/thunderx kernel, the server still hangs.
I'm using OpenStack on arm64, just as ciprian-barbu did.

R150-T60
BIOS version T12
Ubuntu 16.04.2
tested kernel: official-4.4, thunderx-4.8
program: openstack mitaka, qemu command is the same as ciprian-barbu's `launch.sh` in Comment #1

Seth Forshee (sforshee) on 2017-07-14
Changed in linux (Ubuntu):
status: Confirmed → Fix Committed

This bug was nominated against a series that is no longer supported, ie yakkety. The bug task representing the yakkety nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Yakkety):
status: Confirmed → Won't Fix
dann frazier (dannf) wrote :
dann frazier (dannf) on 2017-08-03
description: updated
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
assignee: nobody → dann frazier (dannf)
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.11.0-13.19

---------------
linux (4.11.0-13.19) artful; urgency=low

  * CVE-2017-7533
    - dentry name snapshots

linux (4.11.0-12.18) artful; urgency=low

  * linux: 4.11.0-12.18 -proposed tracker (LP: #1707635)
    - no change rebuild to pick up the new binutils.

  * Adt tests of src:linux time out often on armhf lxc containers (LP: #1705495)
    - [Packaging] tests -- reduce rebuild test to one flavour
    - [Packaging] tests -- reduce rebuild test to one flavour -- use filter

  * [ARM64] config EDAC_GHES=y depends on EDAC_MM_EDAC=y (LP: #1706141)
    - [Config] set EDAC_MM_EDAC=y for ARM64

  * [Hyper-V] hv_netvsc: Exclude non-TCP port numbers from vRSS hashing
    (LP: #1690174)
    - hv_netvsc: Exclude non-TCP port numbers from vRSS hashing

  * ath10k doesn't report full RSSI information (LP: #1706531)
    - ath10k: add per chain RSSI reporting

  * ideapad_laptop don't support v310-14isk (LP: #1705378)
    - platform/x86: ideapad-laptop: Add several models to no_hw_rfkill

  * Ubuntu 16.04.3: Qemu fails on P9 (LP: #1686019)
    - KVM: PPC: Pass kvm* to kvmppc_find_table()
    - KVM: PPC: Use preregistered memory API to access TCE list
    - KVM: PPC: VFIO: Add in-kernel acceleration for VFIO
    - powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
    - powerpc/iommu/vfio_spapr_tce: Cleanup iommu_table disposal
    - powerpc/vfio_spapr_tce: Add reference counting to iommu_table
    - powerpc/mmu: Add real mode support for IOMMU preregistered memory
    - KVM: PPC: Reserve KVM_CAP_SPAPR_TCE_VFIO capability number
    - KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers

  * hns: ethtool selftest crashes system (LP: #1705712)
    - net/hns:bugfix of ethtool -t phy self_test

  * ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - arm64: Add wor...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
dann frazier (dannf) wrote :

I've run the vm-start-stop test on a ThunderX system for an hour on the proposed kernel w/o any issues.
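Alongside a run like this, two quick host-side checks can help confirm what is being tested. The sketch below is only a suggestion: the function names are hypothetical, and it assumes the usual Ubuntu location `/boot/config-$(uname -r)` for the kernel config and that lockups show up in the log with the text "soft lockup". `CONFIG_CAVIUM_ERRATUM_30115` is the option named in the zesty changelog for this bug.

```shell
# Sketch only: function names are hypothetical. File paths are passed in so the
# checks can also be run against saved copies of a config or console log.

# Succeed if the kernel config in $1 enables the Cavium erratum 30115 workaround.
has_erratum_30115() {
    grep -q '^CONFIG_CAVIUM_ERRATUM_30115=y' "$1"
}

# Print how many lines of the log in $1 mention a soft lockup (0 if none).
count_soft_lockups() {
    grep -c 'soft lockup' "$1" || true
}

# Example on a live host:
#   has_erratum_30115 "/boot/config-$(uname -r)" && echo "workaround built in"
#   dmesg > /tmp/dmesg.txt && count_soft_lockups /tmp/dmesg.txt
```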

tags: added: verification-done-zesty
removed: patch verification-needed-zesty
dann frazier (dannf) wrote :

I've been able to backport the fixes as far back as 4.7. 4.7 is when the new vgic reimplementation was merged, so the upstream patchset would need significant surgery to apply to anything older. To fix Ubuntu 16.04's GA kernel (4.4-based), we'd probably need to develop a new (hopefully simpler) solution. In the meantime, I recommend that anyone using ThunderX w/ 16.04 stick with the HWE kernel (currently 4.10-based).
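To see whether a 16.04 host is already on a kernel at least as new as a fixed build, a version comparison along these lines may be enough. This is a sketch under assumptions: the helper names are made up, the threshold `4.10.0-33` is taken from the zesty upload 4.10.0-33.37 that this report lists as carrying the fix, and `sort -V` (GNU coreutils) is assumed to be available.

```shell
# Sketch only: helper names and the MIN_FIXED default are assumptions.

MIN_FIXED="${MIN_FIXED:-4.10.0-33}"   # ABI of the zesty build listed as fixed in this report

# Strip the flavour suffix from a "uname -r" style string, e.g.
# 4.10.0-33-generic becomes 4.10.0-33.
kernel_abi() {
    printf '%s\n' "$1" | sed 's/-[a-z].*$//'
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Example on a live host:
#   version_ge "$(kernel_abi "$(uname -r)")" "$MIN_FIXED" || echo "kernel predates the fix"
```

A newer version number alone does not prove the patches are present in a custom or third-party build, so this is a first filter rather than a verification.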

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-33.37

---------------
linux (4.10.0-33.37) zesty; urgency=low

  * linux: 4.10.0-33.37 -proposed tracker (LP: #1709303)

  * CVE-2017-1000112
    - Revert "udp: consistently apply ufo or fragmentation"
    - udp: consistently apply ufo or fragmentation

  * CVE-2017-1000111
    - Revert "net-packet: fix race in packet_set_ring on PACKET_RESERVE"
    - packet: fix tp_reserve race in packet_set_ring

  * ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - irqchip/gic-v3: Add missing system register definitions
    - arm64: KVM: Do not use stack-protector to compile EL2 code
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - [Config] CONFIG_CAVIUM_ERRATUM_30115=y
    - arm64: Add workaround for Cavium Thunder erratum 30115
    - KVM: arm64: vgic-v3: Add ICV_DIR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_RPR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_CTLR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_PMR_EL1 handler
    - KVM: arm64: Enable GICv3 common sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Log which GICv3 system registers are trapped
    - arm64: KVM: Make unexpected reads from WO registers inject an undef
    - KVM: arm64: Log an error if trapping a read-from-write-only GICv3 access
    - KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access

  * ibmvscsis: Do not send aborted task response (LP: #1689365)
    - target: Fix unknown fabric callback queue-full errors
    - ibmvscsis: Do not send aborted task response
    - ibmvscsis: Clear left-over abort_cmd pointers
    - ibmvscsis: Fix the incorrect req_lim_delta

  * hisi_sas performance improvements (LP: #1708734)
    - scsi: hisi_sas: define hisi_sas_device.device_id as int
    - scsi: hisi_sas: optimise the usage of hisi_hba.lock
    - scsi: hisi_sas: relocate sata_done_v2_hw()
    - scsi: hisi_sas: optimise DMA slot memory

  * hisi_sas...


Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Raghuram Kota (rkota) wrote :

Backporting to the Xenial (Ubuntu 16.04) kernel isn't feasible. 16.04 users are requested to use the hwe-z kernel, as noted on the following wiki page: https://wiki.ubuntu.com/HardwareSupport/Machines/Servers/Cavium/ThunderXCRB1S

Changed in linux (Ubuntu Xenial):
status: Confirmed → Won't Fix