FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false

Bug #1774336 reported by Daniel Axtens
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Daniel Axtens
Trusty
Fix Released
Undecided
Unassigned
Xenial
Fix Released
Undecided
Unassigned
Artful
Fix Released
Undecided
Unassigned
Bionic
Fix Released
Undecided
Unassigned

Bug Description

== SRU Justification ==

[Impact]
Oops during heavy NFS + FSCache use:

[81738.886634] FS-Cache:
[81738.888281] FS-Cache: Assertion failed
[81738.889461] FS-Cache: 6 == 5 is false
[81738.890625] ------------[ cut here ]------------
[81738.891706] kernel BUG at /build/linux-hVVhWi/linux-4.4.0/fs/fscache/operation.c:494!

6 == 5 represents an operation being DEAD when it was not expected to be.

[Cause]
There is a race in fscache and cachefiles.

One thread is in cachefiles_read_waiter:
 1) object->work_lock is taken.
 2) the operation is added to the to_do list.
 3) the work lock is dropped.
 4) fscache_enqueue_retrieval is called, which takes a reference.

Another thread is in cachefiles_read_copier:
 1) object->work_lock is taken
 2) an item is popped off the to_do list.
 3) object->work_lock is dropped.
 4) some processing is done on the item, and fscache_put_retrieval() is called, dropping a reference.

Now if the this process in cachefiles_read_copier takes place *between* steps 3 and 4 in cachefiles_read_waiter, a reference will be dropped before it is taken, which leads to the objects reference count hitting zero, which leads to lifecycle events for the object happening too soon, leading to the assertion failure later on.

(This is simplified and clarified from the original upstream analysis for this patch at https://www.redhat.com/archives/linux-cachefs/2018-February/msg00001.html and from a similar patch with a different approach to fixing the bug at https://www.redhat.com/archives/linux-cachefs/2017-June/msg00002.html)

[Fix]

 (Old sauce patch being reverted) Move fscache_enqueue_retrieval under the lock in cachefiles_read_waiter. This means that the object cannot be popped off the to_do list until it is in a fully consistent state with the reference taken.

 (New upstream patch) Explicitly take a reference to the object while it is being enqueued. Adjust another part of the code to deal with the greater range of object states this exposes.

[Testcase]
A user has run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.
 - The change makes things more conservative (taking more references) so that's reassuring.
 - There may be performance impacts but none have been observed so far.

Daniel Axtens (daxtens)
Changed in linux (Ubuntu):
status: New → Confirmed
assignee: nobody → Daniel Axtens (daxtens)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Changed in linux (Ubuntu Artful):
status: New → Fix Committed
Changed in linux (Ubuntu Bionic):
status: New → Fix Committed
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Revision history for this message
David Coronel (davecore) wrote :

I have confirmation from a user running NFS stress tests on Xenial with this kernel and has not seen any issues. Changed tag to verification-done-xenial.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-46.51

---------------
linux (4.13.0-46.51) artful; urgency=medium

  * linux: 4.13.0-46.51 -proposed tracker (LP: #1776333)

  * register on binfmt_misc may overflow and crash the system (LP: #1775856)
    - fs/binfmt_misc.c: do not allow offset overflow

  * CVE-2018-11508
    - compat: fix 4-byte infoleak via uninitialized struct field

  * rfi-flush: Switch to new linear fallback flush (LP: #1744173)
    - SAUCE: rfi-flush: Factor out init_fallback_flush()
    - SAUCE: rfi-flush: Move rfi_flush_fallback_area to end of paca
    - powerpc/64s: Improve RFI L1-D cache flush fallback
    - powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
    - powerpc/rfi-flush: Differentiate enabled and patched flush types
    - powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration

  * Fix enabling bridge MMIO windows (LP: #1771344)
    - powerpc/eeh: Fix enabling bridge MMIO windows

  * CVE-2018-1130
    - dccp: check sk for closed state in dccp_sendmsg()

  * CVE-2018-7757
    - scsi: libsas: fix memory leak in sas_smp_get_phy_events()

  * cpum_sf: ensure sample freq is non-zero (LP: #1772593)
    - s390/cpum_sf: ensure sample frequency of perf event attributes is non-zero

  * wlp3s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22)
    (LP: #1720930)
    - iwlwifi: mvm: fix "failed to remove key" message

  * CVE-2018-6927
    - futex: Prevent overflow by strengthen input validation

  * After update to 4.13-43 Intel Graphics are Laggy (LP: #1773520)
    - SAUCE: Revert "drm/i915/edp: Allow alternate fixed mode for eDP if
      available."

  * ELANPAD ELAN0612 does not work, patch available (LP: #1773509)
    - SAUCE: Input: elan_i2c - add ELAN0612 to the ACPI table

  * kernel backtrace when receiving large UDP packages (LP: #1772031)
    - iov_iter: fix page_copy_sane for compound pages

  * FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false (LP: #1774336)
    - SAUCE: CacheFiles: fix a read_waiter/read_copier race

  * CVE-2018-5803
    - sctp: verify size of a new chunk in _sctp_make_chunk()

  * enable mic-mute hotkey and led on Lenovo M820z and M920z (LP: #1774306)
    - ALSA: hda/realtek - Enable mic-mute hotkey for several Lenovo AIOs

  * CVE-2018-7755
    - SAUCE: floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl

  * CVE-2018-5750
    - ACPI: sbshc: remove raw pointer from printk() message

 -- Khalid Elmously <email address hidden> Mon, 11 Jun 2018 23:25:30 +0000

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (49.5 KiB)

This bug was fixed in the package linux - 4.15.0-24.26

---------------
linux (4.15.0-24.26) bionic; urgency=medium

  * linux: 4.15.0-24.26 -proposed tracker (LP: #1776338)

  * Bionic update: upstream stable patchset 2018-06-06 (LP: #1775483)
    - drm: bridge: dw-hdmi: Fix overflow workaround for Amlogic Meson GX SoCs
    - i40e: Fix attach VF to VM issue
    - tpm: cmd_ready command can be issued only after granting locality
    - tpm: tpm-interface: fix tpm_transmit/_cmd kdoc
    - tpm: add retry logic
    - Revert "ath10k: send (re)assoc peer command when NSS changed"
    - bonding: do not set slave_dev npinfo before slave_enable_netpoll in
      bond_enslave
    - ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
    - ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts
    - KEYS: DNS: limit the length of option strings
    - l2tp: check sockaddr length in pppol2tp_connect()
    - net: validate attribute sizes in neigh_dump_table()
    - llc: delete timers synchronously in llc_sk_free()
    - tcp: don't read out-of-bounds opsize
    - net: af_packet: fix race in PACKET_{R|T}X_RING
    - tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
    - net: fix deadlock while clearing neighbor proxy table
    - team: avoid adding twice the same option to the event list
    - net/smc: fix shutdown in state SMC_LISTEN
    - team: fix netconsole setup over team
    - packet: fix bitfield update race
    - tipc: add policy for TIPC_NLA_NET_ADDR
    - pppoe: check sockaddr length in pppoe_connect()
    - vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
    - amd-xgbe: Add pre/post auto-negotiation phy hooks
    - sctp: do not check port in sctp_inet6_cmp_addr
    - amd-xgbe: Improve KR auto-negotiation and training
    - strparser: Do not call mod_delayed_work with a timeout of LONG_MAX
    - amd-xgbe: Only use the SFP supported transceiver signals
    - strparser: Fix incorrect strp->need_bytes value.
    - net: sched: ife: signal not finding metaid
    - tcp: clear tp->packets_out when purging write queue
    - net: sched: ife: handle malformed tlv length
    - net: sched: ife: check on metadata length
    - llc: hold llc_sap before release_sock()
    - llc: fix NULL pointer deref for SOCK_ZAPPED
    - net: ethernet: ti: cpsw: fix tx vlan priority mapping
    - virtio_net: split out ctrl buffer
    - virtio_net: fix adding vids on big-endian
    - KVM: s390: force bp isolation for VSIE
    - s390: correct module section names for expoline code revert
    - microblaze: Setup dependencies for ASM optimized lib functions
    - commoncap: Handle memory allocation failure.
    - scsi: mptsas: Disable WRITE SAME
    - cdrom: information leak in cdrom_ioctl_media_changed()
    - m68k/mac: Don't remap SWIM MMIO region
    - block/swim: Check drive type
    - block/swim: Don't log an error message for an invalid ioctl
    - block/swim: Remove extra put_disk() call from error path
    - block/swim: Rename macros to avoid inconsistent inverted logic
    - block/swim: Select appropriate drive on device open
    - block/swim: Fix array bounds check
    - block/swim: Fix IO error at end of medium
    -...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-153.203

---------------
linux (3.13.0-153.203) trusty; urgency=medium

  * linux: 3.13.0-153.203 -proposed tracker (LP: #1776819)

  * CVE-2018-3665 (x86)
    - x86/fpu: Print out whether we are doing lazy/eager FPU context switches
    - x86/fpu: Default eagerfpu=on on all CPUs
    - x86/fpu: Fix math emulation in eager fpu mode

linux (3.13.0-152.202) trusty; urgency=medium

  * linux: 3.13.0-152.202 -proposed tracker (LP: #1776350)

  * CVE-2017-15265
    - ALSA: seq: Fix use-after-free at creating a port

  * register on binfmt_misc may overflow and crash the system (LP: #1775856)
    - fs/binfmt_misc.c: do not allow offset overflow

  * CVE-2018-1130
    - dccp: check sk for closed state in dccp_sendmsg()
    - ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped

  * add_key04 in LTP syscall test cause kernel oops (NULL pointer dereference)
    with T kernel (LP: #1775316) // CVE-2017-12193
    - assoc_array: Fix a buggy node-splitting case

  * CVE-2017-12154
    - kvm: nVMX: Don't allow L2 to access the hardware CR8

  * CVE-2018-7757
    - scsi: libsas: fix memory leak in sas_smp_get_phy_events()

  * CVE-2018-6927
    - futex: Prevent overflow by strengthen input validation

  * FS-Cache: Assertion failed: FS-Cache: 6 == 5 is false (LP: #1774336)
    - SAUCE: CacheFiles: fix a read_waiter/read_copier race

  * CVE-2018-5803
    - sctp: verify size of a new chunk in _sctp_make_chunk()

  * WARNING: CPU: 28 PID: 34085 at /build/linux-
    90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90()
    (LP: #1771480)
    - net/core: generic support for disabling netdev features down stack
    - SAUCE: Backport helper function netdev_upper_get_next_dev_rcu

  * CVE-2018-7755
    - SAUCE: floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl

  * CVE-2018-5750
    - ACPI: sbshc: remove raw pointer from printk() message

 -- Stefan Bader <email address hidden> Thu, 14 Jun 2018 07:00:42 +0200

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (29.8 KiB)

This bug was fixed in the package linux - 4.4.0-130.156

---------------
linux (4.4.0-130.156) xenial; urgency=medium

  * linux: 4.4.0-130.156 -proposed tracker (LP: #1776822)

  * CVE-2018-3665 (x86)
    - x86/fpu: Fix early FPU command-line parsing
    - x86/fpu: Fix 'no387' regression
    - x86/fpu: Disable MPX when eagerfpu is off
    - x86/fpu: Default eagerfpu=on on all CPUs
    - x86/fpu: Fix FNSAVE usage in eagerfpu mode
    - x86/fpu: Fix math emulation in eager fpu mode
    - x86/fpu: Fix eager-FPU handling on legacy FPU machines

linux (4.4.0-129.155) xenial; urgency=medium

  * linux: 4.4.0-129.155 -proposed tracker (LP: #1776352)

  * Xenial update to 4.4.134 stable release (LP: #1775771)
    - MIPS: ptrace: Expose FIR register through FP regset
    - MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
    - KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
    - affs_lookup(): close a race with affs_remove_link()
    - aio: fix io_destroy(2) vs. lookup_ioctx() race
    - ALSA: timer: Fix pause event notification
    - mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
    - libata: Blacklist some Sandisk SSDs for NCQ
    - libata: blacklist Micron 500IT SSD with MU01 firmware
    - xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
    - Revert "ipc/shm: Fix shmat mmap nil-page protection"
    - ipc/shm: fix shmat() nil address after round-down when remapping
    - kasan: fix memory hotplug during boot
    - kernel/sys.c: fix potential Spectre v1 issue
    - kernel/signal.c: avoid undefined behaviour in kill_something_info
    - xfs: remove racy hasattr check from attr ops
    - do d_instantiate/unlock_new_inode combinations safely
    - firewire-ohci: work around oversized DMA reads on JMicron controllers
    - NFSv4: always set NFS_LOCK_LOST when a lock is lost.
    - ALSA: hda - Use IS_REACHABLE() for dependency on input
    - ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
    - kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
    - tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into
      account
    - PCI: Add function 1 DMA alias quirk for Marvell 9128
    - tools lib traceevent: Simplify pointer print logic and fix %pF
    - perf callchain: Fix attr.sample_max_stack setting
    - tools lib traceevent: Fix get_field_str() for dynamic strings
    - dm thin: fix documentation relative to low water mark threshold
    - nfs: Do not convert nfs_idmap_cache_timeout to jiffies
    - watchdog: sp5100_tco: Fix watchdog disable bit
    - kconfig: Don't leak main menus during parsing
    - kconfig: Fix automatic menu creation mem leak
    - kconfig: Fix expr_free() E_NOT leak
    - ipmi/powernv: Fix error return code in ipmi_powernv_probe()
    - Btrfs: set plug for fsync
    - btrfs: Fix out of bounds access in btrfs_search_slot
    - Btrfs: fix scrub to repair raid6 corruption
    - scsi: fas216: fix sense buffer initialization
    - HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
    - jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
    - powerpc/numa: Use ibm,max-associativity-domains to discover possib...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (4.1 KiB)

This bug was fixed in the package linux - 4.15.0-29.31

---------------
linux (4.15.0-29.31) bionic; urgency=medium

  * linux: 4.15.0-29.31 -proposed tracker (LP: #1782173)

  * [SRU Bionic][Cosmic] kernel panic in ipmi_ssif at msg_done_handler
    (LP: #1777716)
    - ipmi_ssif: Fix kernel panic at msg_done_handler

  * Update to ocxl driver for 18.04.1 (LP: #1775786)
    - misc: ocxl: use put_device() instead of device_unregister()
    - powerpc: Add TIDR CPU feature for POWER9
    - powerpc: Use TIDR CPU feature to control TIDR allocation
    - powerpc: use task_pid_nr() for TID allocation
    - ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
    - ocxl: Expose the thread_id needed for wait on POWER9
    - ocxl: Add an IOCTL so userspace knows what OCXL features are available
    - ocxl: Document new OCXL IOCTLs
    - ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()

  * Critical upstream bugfix missing in Ubuntu 18.04 - frequent Xorg crash after
    suspend (LP: #1776887)
    - ocxl: Document the OCXL_IOCTL_GET_METADATA IOCTL

  * Hard LOCKUP observed on stressing Ubuntu 18 04 (LP: #1777194)
    - powerpc: use NMI IPI for smp_send_stop
    - powerpc: Fix smp_send_stop NMI IPI handling

  * IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
    CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n (LP: #1773964)
    - rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

  * [Regression] EXT4-fs error (device sda2): ext4_validate_block_bitmap:383:
    comm stress-ng: bg 4705: bad block bitmap checksum (LP: #1781709)
    - SAUCE: Revert "UBUNTU: SAUCE: ext4: fix ext4_validate_inode_bitmap: comm
      stress-ng: Corrupt inode bitmap"
    - SAUCE: ext4: check for allocation block validity with block group locked

linux (4.15.0-28.30) bionic; urgency=medium

  * linux: 4.15.0-28.30 -proposed tracker (LP: #1781433)

  * Cannot set MTU higher than 1500 in Xen instance (LP: #1781413)
    - xen-netfront: Fix mismatched rtnl_unlock
    - xen-netfront: Update features after registering netdev

linux (4.15.0-27.29) bionic; urgency=medium

  * linux: 4.15.0-27.29 -proposed tracker (LP: #1781062)

  * [Regression] EXT4-fs error (device sda1): ext4_validate_inode_bitmap:99:
    comm stress-ng: Corrupt inode bitmap (LP: #1780137)
    - SAUCE: ext4: fix ext4_validate_inode_bitmap: comm stress-ng: Corrupt inode
      bitmap

linux (4.15.0-26.28) bionic; urgency=medium

  * linux: 4.15.0-26.28 -proposed tracker (LP: #1780112)

  * failure to boot with linux-image-4.15.0-24-generic (LP: #1779827) // Cloud-
    init causes potentially huge boot delays with 4.15 kernels (LP: #1780062)
    - random: Make getrandom() ready earlier

linux (4.15.0-25.27) bionic; urgency=medium

  * linux: 4.15.0-25.27 -proposed tracker (LP: #1779354)

  * hisi_sas_v3_hw: internal task abort: timeout and not done. (LP: #1777736)
    - scsi: hisi_sas: Update a couple of register settings for v3 hw

  * hisi_sas: Add missing PHY spinlock init (LP: #1777734)
    - scsi: hisi_sas: Add missing PHY spinlock init

  * hisi_sas: improve read performance by pre-allocating slot DMA buffers
    (LP: #1777727)
    - scsi: hisi_sas: use dma_zalloc_cohe...

Read more...

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Daniel Axtens (daxtens)
description: updated
Andy Whitcroft (apw)
tags: added: kernel-fixup-verification-needed-bionic
removed: verification-needed-bionic
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Revision history for this message
Andy Whitcroft (apw) wrote :

This bug was erroneously marked for verification in bionic; verification is not required and verification-needed-bionic is being removed.

tags: removed: verification-needed-bionic
Andy Whitcroft (apw)
tags: added: verification-done-bionic
Brad Figg (brad-figg)
tags: added: cscc
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.