Recovery operation takes higher priority than client I/O with mclock scheduler
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ceph (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Starting with Quincy, mclock_scheduler is the default OSD op queue. However, its default recovery settings are so aggressive that the impact on client I/O can be severe, depending on how much recovery work needs to be done. This is a bug; it has been fixed in the 'main' branch and backported to Quincy [0][1].
There is no upstream Quincy release with this fix yet. It will ship in 17.2.6, which is undergoing QA at the moment.
Workaround:
There are a couple of ways this can be mitigated in Quincy.
1. Use the 'wpq' as osd_op_queue. This has been the default in previous releases and works just fine. This will require restarting OSDs.
Steps:
i. Change osd_op_queue to 'wpq': `sudo ceph config set osd osd_op_queue wpq`
ii. Rolling restart of all the OSDs (with `noout` & `norebalance` flags set)
iii. Check that 'wpq' is now set: `ceph tell osd.* config get osd_op_queue`
2. Stick with the mclock scheduler but use the custom mclock profile. This allows users to modify the recovery parameters.
```
osd_mclock_
osd_mclock_
osd_mclock_
```
To be able to use this option, 17.2.4 or later is required due to another
bug [2]. So it is probably simpler and more straightforward to stick with 'wpq' until the fix for [0] is available or 17.2.6 is out.
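The two workarounds above can be sketched as a small shell helper. This is only an illustration under stated assumptions, not a tested procedure: it assumes OSDs run as local systemd units named `ceph-osd@<id>` (this differs under cephadm or other deployment tools), and the helper names `run`, `switch_to_wpq`, and `use_custom_mclock` are made up for this sketch. No specific mclock recovery options are assumed; they are passed in as `option=value` arguments.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Workaround 1: switch to 'wpq' and restart the given local OSD IDs one by one.
switch_to_wpq() {
    run sudo ceph config set osd osd_op_queue wpq
    run sudo ceph osd set noout
    run sudo ceph osd set norebalance
    for id in "$@"; do
        # Assumption: systemd unit naming ceph-osd@<id>.
        # In a real run, wait for the OSD to rejoin and the cluster to
        # settle before restarting the next one.
        run sudo systemctl restart "ceph-osd@${id}"
    done
    run sudo ceph osd unset norebalance
    run sudo ceph osd unset noout
    run sudo ceph tell 'osd.*' config get osd_op_queue
}

# Workaround 2: select the custom mclock profile, then apply caller-supplied
# option=value pairs (which options to set is left to the operator).
use_custom_mclock() {
    run sudo ceph config set osd osd_mclock_profile custom
    for kv in "$@"; do
        run sudo ceph config set osd "${kv%%=*}" "${kv#*=}"
    done
}
```

For example, `switch_to_wpq 0 1 2` prints the command sequence for OSDs 0, 1, and 2 on the current host; setting `DRY_RUN=0` would execute it, one host at a time.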
NB: This affects the Quincy release only. Older releases (Pacific, Octopus, et al.) use
'wpq', so their recovery parameters can be modified as usual. This changed
only starting with Quincy.
[0] https:/
[1] https:/
[2] https:/
Ceph 17.2.6 has been released now.
https://ceph.io/en/news/blog/2023/v17-2-6-quincy-released/
which has the fix for this issue.
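To decide whether a given node still needs one of the workarounds, the upstream version can be compared against 17.2.6. The following is a rough sketch, assuming GNU `sort -V` is available; the function name `quincy_needs_workaround` is hypothetical. It only looks at the upstream version string, so it cannot tell whether a distribution package has backported the fix to an earlier version.

```shell
#!/bin/sh
# Returns 0 (true) if the given upstream version is a Quincy release
# older than 17.2.6, and 1 otherwise.
quincy_needs_workaround() {
    ver="$1"
    case "$ver" in
        17.*) ;;           # Quincy: compare against 17.2.6 below
        *) return 1 ;;     # other releases are outside the scope of this report
    esac
    [ "$ver" = "17.2.6" ] && return 1
    # Version-aware sort: if $ver sorts first, it is older than 17.2.6.
    lowest="$(printf '%s\n%s\n' "$ver" "17.2.6" | sort -V | head -n 1)"
    [ "$lowest" = "$ver" ]
}
```

A possible use is `quincy_needs_workaround "$(ceph-osd --version | awk '{print $3}')" && echo "apply workaround"`, assuming the version is the third field of the `ceph-osd --version` output.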