Add "osd op queue cut off" charm setting, default to high
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | New | Undecided | Unassigned |
Bug Description
[Impact]
OSD heartbeat and map updates can get starved during backfill and recovery scenarios. This issue becomes significantly more pronounced with fast storage devices. The OSD's peers effectively DDoS the strict priority queue, stalling critical time-sensitive operations. This can result in the OSD being marked down when it is unable to respond to MON heartbeat requests.
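The starvation mechanism can be illustrated with a toy sketch (not Ceph code): in a strict priority queue, any backlog of higher-priority items is fully drained before a lower-priority item is served, so a flood of recovery ops delays time-critical messages such as heartbeats indefinitely.

```python
import heapq

# Toy illustration of strict-priority starvation (not Ceph code).
# Lower number = higher priority, served first.
queue = []
for i in range(5):
    heapq.heappush(queue, (0, f"recovery-{i}"))  # flood of recovery ops
heapq.heappush(queue, (1, "heartbeat"))          # time-critical, lower priority

served = [heapq.heappop(queue)[1] for _ in range(6)]
print(served)
# The heartbeat is only served after every queued recovery op; with a
# continuous stream of recovery ops it would never be served at all.
```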
The solution is to change "osd op queue cut off" to "high", shifting recovery operations out of the strict queue and into the weighted priority queue. Recovery ops are still serviced at a higher priority, but no longer starve critical cluster communications.
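For reference, a sketch of the resulting ceph.conf fragment, assuming the charm renders its config flags into the [osd] section (the queue type line reflects the 'wpq' default noted below):

```ini
# Assumed ceph.conf rendering of the charm setting (illustrative only)
[osd]
osd op queue = wpq
osd op queue cut off = high
```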
[Test Case]
This was encountered while attempting to perform an upgrade from 12.2.11 to 12.2.12 on an all-SSD cluster while under heavy workloads.
[Other Info]
The "high" cut-off was intended to be the default in Luminous. The author of the weighted priority queue discusses the cut-off in a ceph-users thread [0]. The "osd op queue cut off" default was set to "high" by upstream in the Octopus 15.2.0 release [1].
[0] https:/
[1] https:/
The default osd op queue is 'wpq' for Luminous and above. With the weighted priority queue set, you can use the following workaround to address this issue:
Update the juju configuration to persist the new cut-off setting:

juju config ceph-osd config-flags='{"osd": {"osd op queue cut off": "high"}}'
Take care when modifying config-flags to check for any existing flags that may have been set. The new flag must be merged with any that are already configured.
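The merge step can be sketched as follows; the existing_flags value is a hypothetical example of flags already set on the application:

```python
import json

# Hypothetical existing config-flags value, as returned by:
#   juju config ceph-osd config-flags
existing_flags = '{"osd": {"osd max backfills": "1"}}'

flags = json.loads(existing_flags)
# Merge in the new setting without clobbering other osd options
flags.setdefault("osd", {})["osd op queue cut off"] = "high"

merged = json.dumps(flags)
print(merged)
# The merged JSON string is what gets passed back to:
#   juju config ceph-osd config-flags="$merged"
```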
Then modify the run-time config by running the following command on the ceph-mon:

sudo ceph tell osd.* config set osd_op_queue_cut_off high