Sharded OpWQ drops suicide_grace after waiting for work

Bug #1840348 reported by Kellen Renshaw on 2019-08-15
This bug affects 1 person
Affects          Status  Importance  Assigned to  Milestone
ceph (Ubuntu)            Medium      Dan Hill
Bionic                   Medium      Dan Hill
Eoan                     Medium      Dan Hill
Focal                    Medium      Dan Hill

Bug Description

[Impact]
The Sharded OpWQ will opportunistically wait for more work when processing an empty queue. While waiting, the worker thread's heartbeat timeout and suicide_grace values are modified: the `threadpool_default_timeout` grace is applied and suicide_grace is disabled.

After finding work, the original work queue grace/suicide_grace values are not re-applied. This can result in hung operations that do not trigger an OSD suicide recovery.
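The effect of a disabled suicide_grace can be illustrated with a small, self-contained model. This is only a sketch: `Handle`, `reset_timeout`, `check`, and `hung_op_after_wait` are hypothetical stand-ins written for this example, not Ceph's actual heartbeat code, and the 15 s / 150 s / 60 s values are assumed purely for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a worker thread's heartbeat handle (not Ceph's real type).
struct Handle {
    long timeout = 0;          // absolute deadline for "unhealthy"; 0 means disabled
    long suicide_timeout = 0;  // absolute deadline for suicide; 0 means disabled
};

// Arm both deadlines relative to `now`. A suicide_grace of 0 disables the suicide deadline.
void reset_timeout(Handle& h, long now, long grace, long suicide_grace) {
    assert(grace > 0);
    h.timeout = now + grace;
    h.suicide_timeout = suicide_grace ? now + suicide_grace : 0;
}

enum class Verdict { Healthy, Unhealthy, Suicide };

// Watchdog check: a disabled (zero) deadline can never fire.
Verdict check(const Handle& h, long now) {
    if (h.suicide_timeout != 0 && h.suicide_timeout < now) return Verdict::Suicide;
    if (h.timeout != 0 && h.timeout < now) return Verdict::Unhealthy;
    return Verdict::Healthy;
}

// Verdict for an op that hangs for `hung_for` seconds after the worker
// waited on an empty queue and the work queue values were not re-applied.
Verdict hung_op_after_wait(long hung_for) {
    Handle hb;
    long now = 0;
    reset_timeout(hb, now, 15, 150);  // assumed work queue values: 15 s grace, 150 s suicide
    reset_timeout(hb, now, 60, 0);    // empty queue: assumed default grace, suicide disabled
    // Work arrives here, but nothing restores the 15 s / 150 s values.
    return check(hb, now + hung_for);
}
```

In this model an op hung for three hours is only ever reported as unhealthy; the suicide verdict is unreachable because the suicide deadline stays at zero.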

The missing suicide recovery was observed on Luminous 12.2.11. The environment was consistently hitting a known authentication race condition (issue#37778 [0]) due to repeated OSD service restarts on a node exhibiting machine check exceptions (MCEs) from a faulty DIMM.

The auth race condition would stall PG operations. In some cases, the hung ops would persist for hours without suicide recovery.

[Test Case]
I have not identified a reliable reproducer. Currently testing the fix by exercising I/O.

Recommend letting this bake upstream before considering a back-port.

[Regression Potential]
This fix improves suicide_grace coverage of the Sharded OpWQ.

This change is made in a critical code path that drives client I/O. An OSD suicide will trigger a service restart, and repeated restarts (flapping) will adversely impact cluster performance.

The fix mitigates risk by keeping the applied suicide_grace value consistent with the value applied before entering `OSD::ShardedOpWQ::_process()`. The fix is also restricted to the empty queue edge-case that drops the suicide_grace timeout. The suicide_grace value is only re-applied when work is found after waiting on an empty queue.
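The scope of the fix described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: `Grace`, `Worker`, and `process_once` are hypothetical names, the condition wait is replaced by a boolean, and the first assignment stands in for the values the thread pool applies before calling `OSD::ShardedOpWQ::_process()`.

```cpp
#include <cassert>
#include <deque>

struct Grace { int timeout; int suicide; };  // seconds; suicide == 0 means disabled

// Hypothetical model of one sharded worker thread.
struct Worker {
    Grace applied{0, 0};     // values currently armed for this thread
    std::deque<int> queue;   // pending ops
    int reapply_count = 0;   // how often the restore path ran
};

// Models a single pass through the processing function.
// `arrives` simulates an op being queued while the thread waits.
// Returns the suicide grace in effect while the op runs, or -1 if no op was found.
int process_once(Worker& w, Grace wq, int default_timeout, bool arrives) {
    w.applied = wq;  // stand-in for the values applied before entry
    if (w.queue.empty()) {
        w.applied = {default_timeout, 0};    // waiting: default grace, suicide disabled
        if (arrives) w.queue.push_back(1);   // stand-in for the condition wait
        if (w.queue.empty()) return -1;      // still no work: nothing to protect
        w.applied = wq;                      // the fix: restore the values from before the wait
        ++w.reapply_count;
    }
    assert(!w.queue.empty());
    w.queue.pop_front();                     // the op now runs under w.applied
    return w.applied.suicide;
}
```

Only the path that waited on an empty queue and then found work touches the restore step; a pass that starts with queued work, or that finds nothing after waiting, behaves as before.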

- In-Progress -
Opened upstream tracker for issue#45076 [1] and fix pr#34575 [2]

[0] https://tracker.ceph.com/issues/37778
[1] https://tracker.ceph.com/issues/45076
[2] https://github.com/ceph/ceph/pull/34575

Eric Desrochers (slashd) on 2019-09-10
tags: added: sts
Dan Hill (hillpd) on 2020-02-13
Changed in ceph (Ubuntu):
status: New → Triaged
assignee: nobody → Dan Hill (hillpd)
importance: Undecided → Medium
James Page (james-page) wrote :

@hillpd any update on this bug?

Dan Hill (hillpd) on 2020-04-10
summary: - Ceph 12.2.11-0ubuntu0.18.04.2 doesn't honor suicide_grace
+ Sharded OpWQ drops suicide_grace after waiting for work
Dan Hill (hillpd) wrote :

There are two edge-cases in 12.2.11 where a worker thread's suicide_grace value gets dropped:
[0] In the ThreadPool context, ThreadPool::worker() drops suicide_grace while waiting on an empty work queue.
[1] In the ShardedThreadPool context, OSD::ShardedOpWQ::_process() drops suicide_grace while opportunistically waiting for more work (to prevent additional lock contention).

The ThreadPool context always re-assigns suicide_grace before driving any work. The ShardedThreadPool context does not follow this pattern. After delaying to find additional work, the default sharded work queue timeouts are not re-applied.

This oversight exists from Luminous onwards. Mimic and Nautilus have each reworked the ShardedOpWQ code path, but neither addressed the problem.

[0] https://github.com/ceph/ceph/blob/v12.2.11/src/common/WorkQueue.cc#L137
[1] https://github.com/ceph/ceph/blob/v12.2.11/src/osd/OSD.cc#L10476

description: updated
description: updated
Dan Hill (hillpd) wrote :

Attaching the proposed fix for 12.2.13 that I am testing.

Dan Hill (hillpd) on 2020-04-10
Changed in ceph (Ubuntu Bionic):
status: New → Confirmed
assignee: nobody → Dan Hill (hillpd)
Changed in ceph (Ubuntu Eoan):
assignee: nobody → Dan Hill (hillpd)
Changed in ceph (Ubuntu Bionic):
importance: Undecided → Medium
Changed in ceph (Ubuntu Eoan):
importance: Undecided → Medium
status: New → Confirmed
Changed in ceph (Ubuntu Focal):
status: Triaged → Confirmed
description: updated

The attachment "ceph_12.2.13-0ubuntu0.18.04.1+20200409sf00238701b1.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Dan Hill (hillpd) on 2020-04-13
description: updated
Dan Hill (hillpd) on 2020-04-16
description: updated
Brian Murray (brian-murray) wrote :

The Eoan Ermine has reached end of life, so this bug will not be fixed for that release.

Changed in ceph (Ubuntu Eoan):
status: Confirmed → Won't Fix