Bugfix for handling of shadow doorbell buffer

Bug #1788222 reported by David Coronel on 2018-08-21
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Undecided
Marcelo Cerri
Xenial
Undecided
Marcelo Cerri
Bionic
Undecided
Marcelo Cerri
Cosmic
Undecided
Marcelo Cerri
linux-gcp (Ubuntu)
Undecided
Unassigned

Bug Description

This request is to pull in the following patch for NVMe bug from https://lore.kernel.org/patchwork/patch/974880/ in the linux-gcp kernel:

===
This patch adds full memory barrier into nvme_dbbuf_update_and_check_event
function to ensure that the shadow doorbell is written before reading
EventIdx from memory. This is a critical bugfix for initial patch that
added support for shadow doorbell into NVMe driver[...].

CVE References

Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: New → In Progress

This fix will included on the main kernels, so no need to track the linux-gcp derivatives.

no longer affects: linux-gcp (Ubuntu Xenial)
no longer affects: linux-gcp (Ubuntu Bionic)
no longer affects: linux-gcp (Ubuntu Cosmic)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Michal Wnukowski (wnukowski) wrote :

Applied the patch from -proposed:
root@Ubuntu1804:~# echo "deb http://archive.ubuntu.com/ubuntu/ bionic-proposed restricted main multiverse universe" | tee -a /etc/apt/sources.list
root@Ubuntu1804:~# apt-get update
root@Ubuntu1804:~# apt-get dist-upgrade
root@Ubuntu1804:~# shutdown -r 0

Verified that the -proposed contains fix:
root@Ubuntu1804:~# objdump -d /lib/modules/$(uname -r)/kernel/drivers/nvme/host/nvme.ko | grep -A 50 __nvme_submit_cmd\>: | grep fence
adf: 0f ae f8 sfence
ae6: 0f ae f0 mfence

(second mfence is the one added by patch)

Verified that the patch solves the issue:

./orion -run advanced -num_large 0 -size_small 8 -type rand -simulate concat -write 40 -duration 120 -matrix row -testname gcp_test

(no hang after 36 hours)

I assume that the same patch will be added to Xenial and Cosmic.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Marcelo Cerri (mhcerri) on 2018-09-06
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-gcp (Ubuntu):
status: New → Confirmed
Marcelo Cerri (mhcerri) wrote :

The 4.4 xenial kernel is not affected by the problem.

no longer affects: linux (Ubuntu Xenial)
Launchpad Janitor (janitor) wrote :
Download full text (32.9 KiB)

This bug was fixed in the package linux - 4.15.0-34.37

---------------
linux (4.15.0-34.37) bionic; urgency=medium

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)

  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
    - tls: retrun the correct IV in getsockopt
    - xhci: workaround for AMD Promontory disabled ports w...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (33.0 KiB)

This bug was fixed in the package linux-gcp - 4.15.0-1019.20

---------------
linux-gcp (4.15.0-1019.20) bionic; urgency=medium

  * linux-gcp: 4.15.0-1019.20 -proposed tracker (LP: #1788752)

  [ Ubuntu: 4.15.0-34.37 ]

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)
  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
 ...

Changed in linux-gcp (Ubuntu):
status: Confirmed → Fix Released
status: Confirmed → Fix Released
Marcelo Cerri (mhcerri) on 2018-09-11
no longer affects: linux-gcp (Ubuntu Xenial)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
status: New → In Progress
Marcelo Cerri (mhcerri) wrote :

Xenial has an earlier version of the patches that have introduced the issue and it will require a custom fix.

Marcelo Cerri (mhcerri) wrote :

The fix for xenial was submitted for review:

https://lists.ubuntu.com/archives/kernel-team/2018-September/095382.html

And the debian packages for a test kernel are available at:

http://kernel.ubuntu.com/~mhcerri/gcp/lp1788222/

Marcelo Cerri (mhcerri) wrote :

I added all debian to a tarball to simplify the download:

http://kernel.ubuntu.com/~mhcerri/gcp/lp1788222.tar.gz

In order to install the test kernel, download the packages and run the following command:

apt install \
./linux-headers-4.4.0-135-generic_4.4.0-135.161+lp1788222.0_amd64.deb \
./linux-headers-4.4.0-135_4.4.0-135.161+lp1788222.0_all.deb \
./linux-image-4.4.0-135-generic_4.4.0-135.161+lp1788222.0_amd64.deb \
./linux-image-extra-4.4.0-135-generic_4.4.0-135.161+lp1788222.0_amd64.deb

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Michal Wnukowski (wnukowski) wrote :

I was able to verify that the fix in Xenial 4.4.0-138.164 works. The procedure was the same as in comment #4 [1], but with less run time.

[1] https://bugs.launchpad.net/ubuntu/+source/linux-gcp/+bug/1788222/comments/4

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :
Download full text (28.0 KiB)

This bug was fixed in the package linux - 4.4.0-138.164

---------------
linux (4.4.0-138.164) xenial; urgency=medium

  * linux: 4.4.0-138.164 -proposed tracker (LP: #1795582)

  * Linux 4.4.155 stable release build is broken on ppc64 (LP: #1795662)
    - powerpc/fadump: Return error when fadump registration fails

  * Kernel hang on drive pull caused by regression introduced by commit
    287922eb0b18 (LP: #1791790)
    - block: Fix a race between blk_cleanup_queue() and timeout handling

  * qeth: use vzalloc for QUERY OAT buffer (LP: #1793086)
    - s390/qeth: use vzalloc for QUERY OAT buffer

  * Page leaking in cachefiles_read_backing_file while vmscan is active
    (LP: #1793430)
    - SAUCE: cachefiles: Page leaking in cachefiles_read_backing_file while vmscan
      is active

  * Bugfix for handling of shadow doorbell buffer (LP: #1788222)
    - nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event

  * Xenial update to 4.4.155 stable release (LP: #1792419)
    - net: 6lowpan: fix reserved space for single frames
    - net: mac802154: tx: expand tailroom if necessary
    - 9p/net: Fix zero-copy path in the 9p virtio transport
    - net: lan78xx: Fix misplaced tasklet_schedule() call
    - spi: davinci: fix a NULL pointer dereference
    - drm/i915/userptr: reject zero user_size
    - powerpc/fadump: handle crash memory ranges array index overflow
    - powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
    - fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
    - 9p/virtio: fix off-by-one error in sg list bounds check
    - net/9p/client.c: version pointer uninitialized
    - net/9p/trans_fd.c: fix race-condition by flushing workqueue before the
      kfree()
    - dm cache metadata: save in-core policy_hint_size to on-disk superblock
    - iio: ad9523: Fix displayed phase
    - iio: ad9523: Fix return value for ad952x_store()
    - vmw_balloon: fix inflation of 64-bit GFNs
    - vmw_balloon: do not use 2MB without batching
    - vmw_balloon: VMCI_DOORBELL_SET does not check status
    - vmw_balloon: fix VMCI use when balloon built into kernel
    - tracing: Do not call start/stop() functions when tracing_on does not change
    - tracing/blktrace: Fix to allow setting same value
    - kthread, tracing: Don't expose half-written comm when creating kthreads
    - uprobes: Use synchronize_rcu() not synchronize_sched()
    - 9p: fix multiple NULL-pointer-dereferences
    - PM / sleep: wakeup: Fix build error caused by missing SRCU support
    - pnfs/blocklayout: off by one in bl_map_stripe()
    - ARM: tegra: Fix Tegra30 Cardhu PCA954x reset
    - mm/tlb: Remove tlb_remove_table() non-concurrent condition
    - iommu/vt-d: Add definitions for PFSID
    - iommu/vt-d: Fix dev iotlb pfsid use
    - osf_getdomainname(): use copy_to_user()
    - sys: don't hold uts_sem while accessing userspace memory
    - userns: move user access out of the mutex
    - ubifs: Fix memory leak in lprobs self-check
    - Revert "UBIFS: Fix potential integer overflow in allocation"
    - ubifs: Check data node size before truncate
    - ubifs: Fix synced_i_size calculation for xattr inodes
    - pwm: ti...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers