Request backport of ceph commits into bionic

Bug #1834235 reported by Chris Newcomer on 2019-06-25
This bug affects 1 person
Affects            Importance   Assigned to
linux (Ubuntu)     High         Unassigned
  Bionic           High         Connor Kuehl

Bug Description

[Impact]

A deadlock may occur if iput_final() decides to wait for readahead pages
while a lock is held.

In order to resolve this, the following two patches install an
asynchronous "iput" for the ceph inodes so that a hold-and-wait deadlock
doesn't occur. A more detailed example is shown in the original patch:
https://github.com/ceph/ceph-client/commit/093ea205acd4b047cf5aacabc0c6ffecf198d2a9
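
The idea behind deferring the final "iput" can be illustrated with a small userspace sketch. This is not the kernel code from the patches: it is a Python analogy, and every name in it (Inode, session_mutex, inode_work, final_put, async_put, run_demo) is hypothetical. It assumes the final release needs a lock that the caller may already hold, so the caller queues the release to a worker instead of running it in place.

```python
import queue
import threading


class Inode:
    """Toy stand-in for a reference-counted inode (not the kernel structure)."""

    def __init__(self, name):
        self.name = name
        self.refs = 1
        self.released = threading.Event()


session_mutex = threading.Lock()  # hypothetical lock the caller may be holding
inode_work = queue.Queue()        # single queue for inode-related work


def final_put(inode):
    """Final release. It needs session_mutex, so calling it while already
    holding that (non-reentrant) lock would block forever."""
    with session_mutex:
        inode.released.set()


def async_put(inode):
    """Drop a reference. If it was the last one, hand the final release
    to the worker instead of running it in the caller's context."""
    inode.refs -= 1
    if inode.refs == 0:
        inode_work.put(inode)


def worker():
    """Run queued final releases outside of any caller-held lock."""
    while True:
        inode = inode_work.get()
        if inode is None:  # sentinel: stop the worker
            break
        final_put(inode)


def run_demo():
    t = threading.Thread(target=worker)
    t.start()
    inode = Inode("demo")
    with session_mutex:
        async_put(inode)  # returns immediately, no hold-and-wait
        released_under_lock = inode.released.is_set()
    inode.released.wait(timeout=5)  # worker proceeds once the lock is free
    inode_work.put(None)
    t.join()
    return released_under_lock, inode.released.is_set()
```

Calling run_demo() drops the last reference while session_mutex is held. The release cannot complete until the lock is dropped, after which the worker finishes it. Calling final_put() directly inside the `with session_mutex:` block would instead hang, which is the hold-and-wait pattern described above.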

Requested patches:

3e1d0452edcee ceph: avoid iput_final() while holding mutex or in dispatch thread
1cf89a8dee5e6 ceph: single workqueue for inode related works

[Test Case]

The original requester tested these changes over a few days, with positive results, in the environment where they first experienced the regression. With this patch set applied to a test kernel, they report that the regression no longer occurs.

[Regression Potential]

Several prerequisite patches were required in order to cleanly cherry-pick the requested patches. A large number of changes increases the regression potential. However, these prerequisite patches have been in mainline since early 2018, and the blast radius is limited to ceph.

Original bug description follows:
------------------------------------
Our internal cluster has run into a few ceph client related issues. After root-causing them, we found that they are resolved by the following commits:
https://github.com/ceph/ceph-client/commit/f42a774a2123e6b29bb0ca296e166d0f089e9113
https://github.com/ceph/ceph-client/commit/093ea205acd4b047cf5aacabc0c6ffecf198d2a9
Can you please backport these into bionic?

3e1d0452edcee ceph: avoid iput_final() while holding mutex or in dispatch thread
1cf89a8dee5e6 ceph: single workqueue for inode related works

description: updated
Terry Rudd (terrykrudd) on 2019-06-25
Changed in linux (Ubuntu):
assignee: nobody → Connor Kuehl (connork)
importance: Undecided → High

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1834235

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Connor Kuehl (connork) on 2019-06-25
Changed in linux (Ubuntu):
status: Incomplete → In Progress
summary: - Request backport of ceph commits into bionic Edit
+ Request backport of ceph commits into bionic
Connor Kuehl (connork) on 2019-06-28
Changed in linux (Ubuntu Bionic):
importance: Undecided → High
assignee: nobody → Connor Kuehl (connork)
status: New → In Progress
Connor Kuehl (connork) on 2019-06-28
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic

Verified by the customer to fix the issue.

tags: added: verification-done-bionic
removed: verification-needed-bionic
Connor Kuehl (connork) on 2019-07-19
Changed in linux (Ubuntu):
status: In Progress → Invalid
assignee: Connor Kuehl (connork) → nobody
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-55.60

---------------
linux (4.15.0-55.60) bionic; urgency=medium

  * linux: 4.15.0-55.60 -proposed tracker (LP: #1834954)

  * Request backport of ceph commits into bionic (LP: #1834235)
    - ceph: use atomic_t for ceph_inode_info::i_shared_gen
    - ceph: define argument structure for handle_cap_grant
    - ceph: flush pending works before shutdown super
    - ceph: send cap releases more aggressively
    - ceph: single workqueue for inode related works
    - ceph: avoid dereferencing invalid pointer during cached readdir
    - ceph: quota: add initial infrastructure to support cephfs quotas
    - ceph: quota: support for ceph.quota.max_files
    - ceph: quota: don't allow cross-quota renames
    - ceph: fix root quota realm check
    - ceph: quota: support for ceph.quota.max_bytes
    - ceph: quota: update MDS when max_bytes is approaching
    - ceph: quota: add counter for snaprealms with quota
    - ceph: avoid iput_final() while holding mutex or in dispatch thread

  * QCA9377 isn't being recognized sometimes (LP: #1757218)
    - SAUCE: USB: Disable USB2 LPM at shutdown

  * hns: fix ICMP6 neighbor solicitation messages discard problem (LP: #1833140)
    - net: hns: fix ICMP6 neighbor solicitation messages discard problem
    - net: hns: fix unsigned comparison to less than zero

  * Fix occasional boot time crash in hns driver (LP: #1833138)
    - net: hns: Fix probabilistic memory overwrite when HNS driver initialized

  * use-after-free in hns_nic_net_xmit_hw (LP: #1833136)
    - net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()

  * hns: attempt to restart autoneg when disabled should report error
    (LP: #1833147)
    - net: hns: Restart autoneg need return failed when autoneg off

  * systemd 237-3ubuntu10.14 ADT test failure on Bionic ppc64el (test-seccomp)
    (LP: #1821625)
    - powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
    - powerpc: sys_pkey_mprotect() system call

  * [UBUNTU] pkey: Indicate old mkvp only if old and curr. mkvp are different
    (LP: #1832625)
    - pkey: Indicate old mkvp only if old and current mkvp are different

  * [UBUNTU] kernel: Fix gcm-aes-s390 wrong scatter-gather list processing
    (LP: #1832623)
    - s390/crypto: fix gcm-aes-s390 selftest failures

  * System crashes on hot adding a core with drmgr command (4.15.0-48-generic)
    (LP: #1833716)
    - powerpc/numa: improve control of topology updates
    - powerpc/numa: document topology_updates_enabled, disable by default

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * [UBUNTU] kernel: Fix wrong dispatching for control domain CPRBs
    (LP: #1832624)
    - s390/zcrypt: Fix wrong dispatching for control domain CPRBs

  * CVE-2019-11815
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().

  * Sound device not detected after resume from hibernate (LP: #1826868)
    - drm/i915: Force 2*96 MHz cdclk on glk/cnl when audio power is enabled
    - drm/i915: Save the old CDCLK atomic state
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc