fscache cookie refcount updated incorrectly during fscache object allocation

Bug #1776277 reported by Kiran Kumar Modukuri on 2018-06-11
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Unassigned
Trusty
Undecided
Unassigned
Xenial
Undecided
Unassigned
Bionic
Undecided
Unassigned

Bug Description

== SRU Justification ==

[Impact]
Oops during heavy NFS + FSCache + Cachefiles use:

 kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
 kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

[Cause]
 1)Two threads are trying to do operate on a cookie and two objects.
 2a)One thread tries to unmount the filesystem and in process goes over
   a huge list of objects marking them dead and deleting the objects.
   cookie->usage is also decremented in following path
      nfs_fscache_release_super_cookie
       -> __fscache_relinquish_cookie
        ->__fscache_cookie_put
        ->BUG_ON(atomic_read(&cookie->usage) <= 0);

 2b)second thread tries to lookup an object for reading data in
    following path

     fscache_alloc_object
      1) cachefiles_alloc_object
          -> fscache_object_init
            -> assign cookie, but usage not bumped.
     2) fscache_attach_object -> fails in cant_attach_object because the
        cookie's backing object or cookie's->parent object are going away
     3)fscache_put_object
           -> cachefiles_put_object
            ->fscache_object_destroy
            ->fscache_cookie_put
            ->BUG_ON(atomic_read(&cookie->usage) <= 0);
[Fix]
 Bump up the cookie usage in fscache_object_init,
 when it is first being assigned a cookie atomically such that the cookie
 is added and bumped up if its refcount is not zero.
 remove the assignment in the attach_object.

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

description: updated

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1776277

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Changed in linux (Ubuntu Bionic):
status: New → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Launchpad Janitor (janitor) wrote :
Download full text (12.2 KiB)

This bug was fixed in the package linux - 4.17.0-9.10

---------------
linux (4.17.0-9.10) cosmic; urgency=medium

  * linux: 4.17.0-9.10 -proposed tracker (LP: #1787988)

  * Cosmic update to 4.17.17 stable release (LP: #1787973)
    - x86/speculation/l1tf: Exempt zeroed PTEs from inversion
    - Linux 4.17.17

  * Cosmic update to 4.17.16 stable release (LP: #1787972)
    - x86/l1tf: Fix build error seen if CONFIG_KVM_INTEL is disabled
    - x86: i8259: Add missing include file
    - x86/platform/UV: Mark memblock related init code and data correctly
    - x86/mm/pti: Clear Global bit more aggressively
    - xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
    - x86/mm: Disable ioremap free page handling on x86-PAE
    - kbuild: verify that $DEPMOD is installed
    - crypto: ccree - fix finup
    - crypto: ccree - fix iv handling
    - crypto: ccp - Check for NULL PSP pointer at module unload
    - crypto: ccp - Fix command completion detection race
    - crypto: x86/sha256-mb - fix digest copy in sha256_mb_mgr_get_comp_job_avx2()
    - crypto: vmac - require a block cipher with 128-bit block size
    - crypto: vmac - separate tfm and request context
    - crypto: blkcipher - fix crash flushing dcache in error path
    - crypto: ablkcipher - fix crash flushing dcache in error path
    - crypto: skcipher - fix aligning block size in skcipher_copy_iv()
    - crypto: skcipher - fix crash flushing dcache in error path
    - ioremap: Update pgtable free interfaces with addr
    - x86/mm: Add TLB purge to free pmd/pte page interfaces
    - Linux 4.17.16

  * Cosmic update to 4.17.16 stable release (LP: #1787972) // CVE-2018-9363
    - Bluetooth: hidp: buffer overflow in hidp_process_report

  * linux-cloud-tools-common: Ensure hv-kvp-daemon.service starts before
    walinuxagent.service (LP: #1739107)
    - [Debian] hyper-v -- Ensure that hv-kvp-daemon.service starts before
      walinuxagent.service

  * Miscellaneous Ubuntu changes
    - [Packaging] retpoline -- fix temporary filenaming

linux (4.17.0-8.9) cosmic; urgency=medium

  * linux: 4.17.0-8.9 -proposed tracker (LP: #1787259)

  * Cosmic update to v4.17.15 stable release (LP: #1787257)
    - parisc: Enable CONFIG_MLONGCALLS by default
    - parisc: Define mb() and add memory barriers to assembler unlock sequences
    - Mark HI and TASKLET softirq synchronous
    - stop_machine: Disable preemption after queueing stopper threads
    - sched/deadline: Update rq_clock of later_rq when pushing a task
    - zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature
    - xen/netfront: don't cache skb_shinfo()
    - bpf, sockmap: fix leak in bpf_tcp_sendmsg wait for mem path
    - bpf, sockmap: fix bpf_tcp_sendmsg sock error handling
    - scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management
      enabled
    - scsi: qla2xxx: Fix memory leak for allocating abort IOCB
    - init: rename and re-order boot_cpu_state_init()
    - root dentries need RCU-delayed freeing
    - make sure that __dentry_kill() always invalidates d_seq, unhashed or not
    - fix mntput/mntput race
    - fix __legitimize_mnt()/mntput() race
    - ARM: dts: imx6sx: fix irq for pcie bridge
  ...

Changed in linux (Ubuntu):
status: Triaged → Fix Released
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
David Coronel (davecore) wrote :

I have confirmation that the kernel in -proposed fixes the issue in xenial.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Launchpad Janitor (janitor) wrote :
Download full text (32.9 KiB)

This bug was fixed in the package linux - 4.15.0-34.37

---------------
linux (4.15.0-34.37) bionic; urgency=medium

  * linux: 4.15.0-34.37 -proposed tracker (LP: #1788744)

  * Bionic update: upstream stable patchset 2018-08-09 (LP: #1786352)
    - MIPS: c-r4k: Fix data corruption related to cache coherence
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - fs: don't scan the inode cache before SB_BORN is set
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - do d_instantiate/unlock_new_inode combinations safely
    - mmc: sdhci-iproc: remove hard coded mmc cap 1.8v
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - mmc: sdhci-iproc: add SDHCI_QUIRK2_HOST_OFF_CARD_ON for cygnus
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN|OUT] macros
    - arm64: lse: Add early clobbers to some input/output asm operands
    - powerpc/64s: Clear PCR on boot
    - IB/hfi1: Use after free race condition in send context error path
    - IB/umem: Use the correct mm during ib_umem_release
    - idr: fix invalid ptr dereference on item delete
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - mm/kasan: don't vfree() nonexistent vm_area
    - kasan: free allocated shadow memory on MEM_CANCEL_ONLINE
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - KVM: s390: vsie: fix < 8k check for the itdba
    - KVM: x86: Update cpuid properly when CR4.OSXAVE or CR4.PKE is changed
    - kvm: x86: IA32_ARCH_CAPABILITIES is always supported
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/pseries: Restore default security feature flags on setup
    - powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
    - MIPS: generic: Fix machine compatible matching
    - mac80211: mesh: fix wrong mesh TTL offset calculation
    - ARC: Fix malformed ARC_EMUL_UNALIGNED default
    - ptr_ring: prevent integer overflow when calculating size
    - arm64: dts: rockchip: fix rock64 gmac2io stability issues
    - arm64: dts: rockchip: correct ep-gpios for rk3399-sapphire
    - libata: Fix compile warning with ATA_DEBUG enabled
    - selftests: sync: missing CFLAGS while compiling
    - selftest/vDSO: fix O=
    - selftests: pstore: Adding config fragment CONFIG_PSTORE_RAM=m
    - selftests: memfd: add config fragment for fuse
    - ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
    - ARM: OMAP3: Fix prm wake interrupt for resume
    - ARM: OMAP2+: Fix sar_base inititalization for HS omaps
    - ARM: OMAP1: clock: Fix debugfs_create_*() usage
    - tls: retrun the correct IV in getsockopt
    - xhci: workaround for AMD Promontory disabled ports w...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-158.208

---------------
linux (3.13.0-158.208) trusty; urgency=medium

  * linux: 3.13.0-158.208 -proposed tracker (LP: #1788764)

  * CVE-2018-3620 // CVE-2018-3646
    - SAUCE: x86/fremap: Invert the offset when converting to/from a PTE

  * BUG: scheduling while atomic (Kernel : Ubuntu-3.13 + VMware: 6.0 and late)
    (LP: #1780470)
    - VSOCK: sock_put wasn't safe to call in interrupt context
    - VSOCK: Fix lockdep issue.
    - VSOCK: Detach QP check should filter out non matching QPs.

  * CacheFiles: Error: Overlong wait for old active object to go away.
    (LP: #1776254)
    - cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
    - cachefiles: Wait rather than BUG'ing on "Unexpected object collision"

  * fscache cookie refcount updated incorrectly during fscache object allocation
    (LP: #1776277)
    - fscache: Fix reference overput in fscache_attach_object() error handling

  * FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false (LP: #1774336)
    - Revert "UBUNTU: SAUCE: CacheFiles: fix a read_waiter/read_copier race"
    - fscache: Allow cancelled operations to be enqueued
    - cachefiles: Fix refcounting bug in backing-file read monitoring

 -- Kleber Sacilotto de Souza <email address hidden> Fri, 24 Aug 2018 15:08:23 +0000

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-135.161

---------------
linux (4.4.0-135.161) xenial; urgency=medium

  * linux: 4.4.0-135.161 -proposed tracker (LP: #1788766)

  * [Regression] APM Merlin boards fail to recover link after interface down/up
    (LP: #1785739)
    - net: phylib: fix interrupts re-enablement in phy_start
    - net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT

  * qeth: don't clobber buffer on async TX completion (LP: #1786057)
    - s390/qeth: don't clobber buffer on async TX completion

  * nvme: avoid cqe corruption (LP: #1788035)
    - nvme: avoid cqe corruption when update at the same time as read

  * CacheFiles: Error: Overlong wait for old active object to go away.
    (LP: #1776254)
    - cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
    - cachefiles: Wait rather than BUG'ing on "Unexpected object collision"

  * fscache cookie refcount updated incorrectly during fscache object allocation
    (LP: #1776277) // fscache cookie refcount updated incorrectly during fscache
    object allocation (LP: #1776277)
    - fscache: Fix reference overput in fscache_attach_object() error handling

  * FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false (LP: #1774336)
    - Revert "UBUNTU: SAUCE: CacheFiles: fix a read_waiter/read_copier race"
    - fscache: Allow cancelled operations to be enqueued
    - cachefiles: Fix refcounting bug in backing-file read monitoring

  * linux-cloud-tools-common: Ensure hv-kvp-daemon.service starts before
    walinuxagent.service (LP: #1739107)
    - [Debian] hyper-v -- Ensure that hv-kvp-daemon.service starts before
      walinuxagent.service

 -- Khalid Elmously <email address hidden> Sun, 26 Aug 2018 23:56:50 -0400

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers