Ubuntu 16.04 (4.4.0-127) hangs on boot with virtio-scsi MQ enabled

Bug #1775235 reported by Felipe Franciosi on 2018-06-05
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
High
Unassigned
Xenial
High
Joseph Salisbury

Bug Description

== SRU Justification ==
The bug reporter noticed that Xenial guests running on Nutanix AHV stopped
booting after they were upgraded to 4.4.0-127. Only guests with scsi mq
enabled suffered from this problem. AHV is one of the few hypervisor
products to offer multiqueue for virtio-scsi devices.

Upon further investigation, the saw that the kernel would hang during the
scanning of scsi targets. More specifically, immediately after coming
across a target without any luns present.

It was found the following commit introduced this regression:
commit f1f609d8015e1d34d39458924dcd9524fccd4307
Author: Jay Vosburgh <email address hidden>
Date: Thu Apr 19 21:40:00 2018 +0200

The patch spins on the target's 'reqs' counter waiting for the target to quiesce.

Further study revealed that virtio-scsi itself is broken in a way that it
doesn't increment the 'reqs' counter when submitting requests on MQ in
certain conditions. That caused the counter to go to -1 (on the completion
of the first request) and the CPU to hang indefinitely.

This regression is fixed by the requested SAUCE patch.

== Fix ==
UBUNTU: SAUCE: (no-up) virtio-scsi: Increment reqs counter.

== Regression Potential ==
Low. Limited to virtio and fixes a regression.

== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

We noticed that Ubuntu 16.04 guests running on Nutanix AHV stopped booting after they were upgraded to the latest kernel (4.4.0-127). Only guests with scsi mq enabled suffered from this problem. AHV is one of the few hypervisor products to offer multiqueue for virtio-scsi devices.

Upon further investigation, we could see that the kernel would hang during the scanning of scsi targets. More specifically, immediately after coming across a target without any luns present. That's the first time the kernel destroys a target (given it doesn't have luns). This could be confirmed with gdb (attached to qemu's gdbserver):

#0 0xffffffffc0045039 in ?? ()
#1 0xffff88022c753c98 in ?? ()
#2 0xffffffff815d1de6 in scsi_target_destroy (starget=0xffff88022ad62400)
    at /build/linux-E14mqW/linux-4.4.0/drivers/scsi/scsi_scan.c:322

This shows the guest vCPU stuck on virtio-scsi's implementation of target_destroy. Despite lacking symbols, we managed to examine the virtio_scsi_target_state to see that the 'reqs' counter was invalid:

(gdb) p *(struct virtio_scsi_target_state *)starget->hostdata
$6 = {tgt_seq = {sequence = 0}, reqs = {counter = -1}, req_vq = 0xffff88022cbdd9e8}
(gdb)

This drew our attention to the following patch which is exclusive to the Ubuntu kernel:
commit f1f609d8015e1d34d39458924dcd9524fccd4307
Author: Jay Vosburgh <email address hidden>
Date: Thu Apr 19 21:40:00 2018 +0200

In a nutshell, the patch spins on the target's 'reqs' counter waiting for the target to quiesce:
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -785,6 +785,10 @@ static int virtscsi_target_alloc(struct scsi_target *starget)
 static void virtscsi_target_destroy(struct scsi_target *starget)
 {
        struct virtio_scsi_target_state *tgt = starget->hostdata;
+
+ /* we can race with concurrent virtscsi_complete_cmd */
+ while (atomic_read(&tgt->reqs))
+ cpu_relax();
        kfree(tgt);
 }

Personally, I think this is a catastrophic way of waiting for a target to quiesce since virtscsi_target_destroy() is called with IRQs disabled from scsi_scan.c:scsi_target_destroy(). Devices which take a long time to quiesce during a target_destroy() could hog the CPU for relatively long periods of time.

Nevertheless, further study revealed that virtio-scsi itself is broken in a way that it doesn't increment the 'reqs' counter when submitting requests on MQ in certain conditions. That caused the counter to go to -1 (on the completion of the first request) and the CPU to hang indefinitely.

The following patch fixes the issue:

--- old/linux-4.4.0/drivers/scsi/virtio_scsi.c 2018-06-04 10:23:07.000000000 -0700
+++ new/linux-4.4.0/drivers/scsi/virtio_scsi.c 2018-06-05 10:03:29.083428545 -0700
@@ -641,9 +641,10 @@
                                scsi_target(sc->device)->hostdata;
        struct virtio_scsi_vq *req_vq;

- if (shost_use_blk_mq(sh))
+ if (shost_use_blk_mq(sh)) {
                req_vq = virtscsi_pick_vq_mq(vscsi, sc);
- else
+ atomic_inc(&tgt->reqs);
+ } else
                req_vq = virtscsi_pick_vq(vscsi, tgt);

        return virtscsi_queuecommand(vscsi, req_vq, sc);

Signed-off-by: Felipe Franciosi <email address hidden>

Please consider this a urgent fix as all of our customers which use Ubuntu 16.04 and have MQ enabled for better performance will be affected by your latest update. Our workaround is to recommend that they disable SCSI MQ while you work on the issue.

Best regards,
Felipe

CVE References

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1775235

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with the patch you posted in the description. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1775235

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

Changed in linux (Ubuntu Xenial):
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
status: In Progress → Invalid
Felipe Franciosi (felipe-1) wrote :

Hi Joseph,

Thanks for looking at this so promptly. I have downloaded and tested your kernel. It appears to work well both with scsi mq enabled and disabled (over virtio-scsi). I also ran some basic io integrity tests and didn't spot any problems.

Before you commit this, may I propose a v2 of my own patch which is functionally identical but slightly more elegant? It does the atomic_inc() within virtscsi_pick_vq_mq(). That's probably preferable given the other uses of atomic_*() for non-mq are done within virtscsi_pick_vq().

Have a look:
-------------8<-------------
--- old/linux-4.4.0/drivers/scsi/virtio_scsi.c 2018-06-04 10:23:07.000000000 -0700
+++ new/linux-4.4.0/drivers/scsi/virtio_scsi.c 2018-06-07 06:20:58.596764040 -0700
@@ -588,11 +588,13 @@
 }

 static struct virtio_scsi_vq *virtscsi_pick_vq_mq(struct virtio_scsi *vscsi,
+ struct virtio_scsi_target_state *tgt,
                                                  struct scsi_cmnd *sc)
 {
        u32 tag = blk_mq_unique_tag(sc->request);
        u16 hwq = blk_mq_unique_tag_to_hwq(tag);

+ atomic_inc(&tgt->reqs);
        return &vscsi->req_vqs[hwq];
 }

@@ -642,7 +644,7 @@
        struct virtio_scsi_vq *req_vq;

        if (shost_use_blk_mq(sh))
- req_vq = virtscsi_pick_vq_mq(vscsi, sc);
+ req_vq = virtscsi_pick_vq_mq(vscsi, tgt, sc);
        else
                req_vq = virtscsi_pick_vq(vscsi, tgt);

Signed-off-by: Felipe Franciosi <email address hidden>

-------------8<-------------

I'm happy to test it again if you'd like, but it should be functionally identical.

Joseph Salisbury (jsalisbury) wrote :

I built a second test kernel with the updated patch you posted in comment #3. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1775235

Can you test this kernel and see if it resolves this bug?

Felipe Franciosi (felipe-1) wrote :

Hi Joseph,

We gave it a try internally and it looks good. Passed the same set of tests I ran with the previous kernel you provided: works with mq disabled, enabled and also passes a simple io integrity test.

Thanks,
F.

Joseph Salisbury (jsalisbury) wrote :
description: updated
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Felipe Franciosi (felipe-1) wrote :

Hi Brad,

We have tested the -proposed kernel and it resolves the issue for us. I have updated the "verification-needed-xenial" tag to "verification-done-xenial".

Thanks again for the prompt response on this.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :
Download full text (29.8 KiB)

This bug was fixed in the package linux - 4.4.0-130.156

---------------
linux (4.4.0-130.156) xenial; urgency=medium

  * linux: 4.4.0-130.156 -proposed tracker (LP: #1776822)

  * CVE-2018-3665 (x86)
    - x86/fpu: Fix early FPU command-line parsing
    - x86/fpu: Fix 'no387' regression
    - x86/fpu: Disable MPX when eagerfpu is off
    - x86/fpu: Default eagerfpu=on on all CPUs
    - x86/fpu: Fix FNSAVE usage in eagerfpu mode
    - x86/fpu: Fix math emulation in eager fpu mode
    - x86/fpu: Fix eager-FPU handling on legacy FPU machines

linux (4.4.0-129.155) xenial; urgency=medium

  * linux: 4.4.0-129.155 -proposed tracker (LP: #1776352)

  * Xenial update to 4.4.134 stable release (LP: #1775771)
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - kernel/signal.c: avoid undefined behaviour in kill_something_info
    - xfs: remove racy hasattr check from attr ops
    - do d_instantiate/unlock_new_inode combinations safely
    - firewire-ohci: work around oversized DMA reads on JMicron controllers
    - NFSv4: always set NFS_LOCK_LOST when a lock is lost.
    - ALSA: hda - Use IS_REACHABLE() for dependency on input
    - ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
    - kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
    - tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into
      account
    - PCI: Add function 1 DMA alias quirk for Marvell 9128
    - tools lib traceevent: Simplify pointer print logic and fix %pF
    - perf callchain: Fix attr.sample_max_stack setting
    - tools lib traceevent: Fix get_field_str() for dynamic strings
    - dm thin: fix documentation relative to low water mark threshold
    - nfs: Do not convert nfs_idmap_cache_timeout to jiffies
    - watchdog: sp5100_tco: Fix watchdog disable bit
    - kconfig: Don't leak main menus during parsing
    - kconfig: Fix automatic menu creation mem leak
    - kconfig: Fix expr_free() E_NOT leak
    - ipmi/powernv: Fix error return code in ipmi_powernv_probe()
    - Btrfs: set plug for fsync
    - btrfs: Fix out of bounds access in btrfs_search_slot
    - Btrfs: fix scrub to repair raid6 corruption
    - scsi: fas216: fix sense buffer initialization
    - HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
    - jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
    - powerpc/numa: Use ibm,max-associativity-domains to discover possib...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers