Bugfix for handling of shadow doorbell buffer

Bug #1788222 reported by David Coronel
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Committed
Undecided
Marcelo Cerri
Xenial
Fix Released
Undecided
Marcelo Cerri
Bionic
Fix Released
Undecided
Marcelo Cerri
Cosmic
Fix Committed
Undecided
Marcelo Cerri
linux-gcp (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

This request is to pull in the following patch for NVMe bug from https://lore.kernel.org/patchwork/patch/974880/ in the linux-gcp kernel:

===
This patch adds full memory barrier into nvme_dbbuf_update_and_check_event
function to ensure that the shadow doorbell is written before reading
EventIdx from memory. This is a critical bugfix for initial patch that
added support for shadow doorbell into NVMe driver[...].

CVE References

Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Revision history for this message
Kleber Sacilotto de Souza (kleber-souza) wrote :

This fix will included on the main kernels, so no need to track the linux-gcp derivatives.

no longer affects: linux-gcp (Ubuntu Xenial)
no longer affects: linux-gcp (Ubuntu Bionic)
no longer affects: linux-gcp (Ubuntu Cosmic)
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
Michal Wnukowski (wnukowski) wrote :

Applied the patch from -proposed:
root@Ubuntu1804:~# echo "deb http://archive.ubuntu.com/ubuntu/ bionic-proposed restricted main multiverse universe" | tee -a /etc/apt/sources.list
root@Ubuntu1804:~# apt-get update
root@Ubuntu1804:~# apt-get dist-upgrade
root@Ubuntu1804:~# shutdown -r 0

Verified that the -proposed contains fix:
root@Ubuntu1804:~# objdump -d /lib/modules/$(uname -r)/kernel/drivers/nvme/host/nvme.ko | grep -A 50 __nvme_submit_cmd\>: | grep fence
adf: 0f ae f8 sfence
ae6: 0f ae f0 mfence

(second mfence is the one added by patch)

Verified that the patch solves the issue:

./orion -run advanced -num_large 0 -size_small 8 -type rand -simulate concat -write 40 -duration 120 -matrix row -testname gcp_test

(no hang after 36 hours)

I assume that the same patch will be added to Xenial and Cosmic.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-gcp (Ubuntu):
status: New → Confirmed
Revision history for this message
Marcelo Cerri (mhcerri) wrote :

The 4.4 xenial kernel is not affected by the problem.

no longer affects: linux (Ubuntu Xenial)
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (32.9 KiB)

This bug was fixed in the package linux - 4.15.0-34.37

---------------
linux (4.15.0-34.37) bionic; urgency=medium

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)

  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
    - tls: retrun the correct IV in getsockopt
    - xhci: workaround for AMD Promontory disabled ports w...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (33.0 KiB)

This bug was fixed in the package linux-gcp - 4.15.0-1019.20

---------------
linux-gcp (4.15.0-1019.20) bionic; urgency=medium

  * linux-gcp: 4.15.0-1019.20 -proposed tracker (LP: #1788752)

  [ Ubuntu: 4.15.0-34.37 ]

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)
  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
 ...

Changed in linux-gcp (Ubuntu):
status: Confirmed → Fix Released
status: Confirmed → Fix Released
Marcelo Cerri (mhcerri)
no longer affects: linux-gcp (Ubuntu Xenial)
Changed in linux (Ubuntu Bionic):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Marcelo Cerri (mhcerri)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Marcelo Cerri (mhcerri)
status: New → In Progress
Revision history for this message
Marcelo Cerri (mhcerri) wrote :

Xenial has an earlier version of the patches that have introduced the issue and it will require a custom fix.

Revision history for this message
Marcelo Cerri (mhcerri) wrote :

The fix for xenial was submitted for review:

https://lists.ubuntu.com/archives/kernel-team/2018-September/095382.html

And the debian packages for a test kernel are available at:

http://kernel.ubuntu.com/~mhcerri/gcp/lp1788222/

Revision history for this message
Marcelo Cerri (mhcerri) wrote :

I added all debian to a tarball to simplify the download:

http://kernel.ubuntu.com/~mhcerri/gcp/lp1788222.tar.gz

In order to install the test kernel, download the packages and run the following command:

apt install \
./linux-headers-4.4.0-135-generic_4.4.0-135.161+lp1788222.0_amd64.deb \
./linux-headers-4.4.0-135_4.4.0-135.161+lp1788222.0_all.deb \
./linux-image-4.4.0-135-generic_4.4.0-135.161+lp1788222.0_amd64.deb \
./linux-image-extra-4.4.0-135-generic_4.4.0-135.161+lp1788222.0_amd64.deb

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Michal Wnukowski (wnukowski) wrote :

I was able to verify that the fix in Xenial 4.4.0-138.164 works. The procedure was the same as in comment #4 [1], but with less run time.

[1] https://bugs.launchpad.net/ubuntu/+source/linux-gcp/+bug/1788222/comments/4

tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (28.0 KiB)

This bug was fixed in the package linux - 4.4.0-138.164

---------------
linux (4.4.0-138.164) xenial; urgency=medium

  * linux: 4.4.0-138.164 -proposed tracker (LP: #1795582)

  * Linux 4.4.155 stable release build is broken on ppc64 (LP: #1795662)
    - powerpc/fadump: Return error when fadump registration fails

  * Kernel hang on drive pull caused by regression introduced by commit
    287922eb0b18 (LP: #1791790)
    - block: Fix a race between blk_cleanup_queue() and timeout handling

  * qeth: use vzalloc for QUERY OAT buffer (LP: #1793086)
    - s390/qeth: use vzalloc for QUERY OAT buffer

  * Page leaking in cachefiles_read_backing_file while vmscan is active
    (LP: #1793430)
    - SAUCE: cachefiles: Page leaking in cachefiles_read_backing_file while vmscan
      is active

  * Bugfix for handling of shadow doorbell buffer (LP: #1788222)
    - nvme-pci: add a memory barrier to nvme_dbbuf_update_and_check_event

  * Xenial update to 4.4.155 stable release (LP: #1792419)
    - net: 6lowpan: fix reserved space for single frames
    - net: mac802154: tx: expand tailroom if necessary
    - 9p/net: Fix zero-copy path in the 9p virtio transport
    - net: lan78xx: Fix misplaced tasklet_schedule() call
    - spi: davinci: fix a NULL pointer dereference
    - drm/i915/userptr: reject zero user_size
    - powerpc/fadump: handle crash memory ranges array index overflow
    - powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
    - fs/9p/xattr.c: catch the error of p9_client_clunk when setting xattr failed
    - 9p/virtio: fix off-by-one error in sg list bounds check
    - net/9p/client.c: version pointer uninitialized
    - net/9p/trans_fd.c: fix race-condition by flushing workqueue before the
      kfree()
    - dm cache metadata: save in-core policy_hint_size to on-disk superblock
    - iio: ad9523: Fix displayed phase
    - iio: ad9523: Fix return value for ad952x_store()
    - vmw_balloon: fix inflation of 64-bit GFNs
    - vmw_balloon: do not use 2MB without batching
    - vmw_balloon: VMCI_DOORBELL_SET does not check status
    - vmw_balloon: fix VMCI use when balloon built into kernel
    - tracing: Do not call start/stop() functions when tracing_on does not change
    - tracing/blktrace: Fix to allow setting same value
    - kthread, tracing: Don't expose half-written comm when creating kthreads
    - uprobes: Use synchronize_rcu() not synchronize_sched()
    - 9p: fix multiple NULL-pointer-dereferences
    - PM / sleep: wakeup: Fix build error caused by missing SRCU support
    - pnfs/blocklayout: off by one in bl_map_stripe()
    - ARM: tegra: Fix Tegra30 Cardhu PCA954x reset
    - mm/tlb: Remove tlb_remove_table() non-concurrent condition
    - iommu/vt-d: Add definitions for PFSID
    - iommu/vt-d: Fix dev iotlb pfsid use
    - osf_getdomainname(): use copy_to_user()
    - sys: don't hold uts_sem while accessing userspace memory
    - userns: move user access out of the mutex
    - ubifs: Fix memory leak in lprobs self-check
    - Revert "UBIFS: Fix potential integer overflow in allocation"
    - ubifs: Check data node size before truncate
    - ubifs: Fix synced_i_size calculation for xattr inodes
    - pwm: ti...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
status: Fix Committed → Fix Released
Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.