Need for per-unit blacklist of osd-devices
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ceph OSD Charm | Fix Released | Medium | Frode Nordahl | | |
Bug Description
Over time, nodes running Ceph OSDs will eventually develop bad disks. While Ceph itself handles the bulk of this problem domain, the charm plays an important part in the operational handling of it.
A node with a bad disk that is still visible to the operating system can, in some circumstances, complicate Juju and ceph-osd charm operation on that node.
While handling 'config-changed' events, the ceph-osd charm attempts to initialize and format any disk device listed in the 'osd-devices' config option that is not currently active. When this operation fails because of a bad disk, the charm ends up in an error state, leaving the node inoperable through Juju.
At initial deployment time, getting an error for an unsuccessful initialization is useful and expected. Having a ceph-osd unit enter an error state because of a bad disk further down the road is not desirable. Note that it may not make operational sense to swap the physical disk immediately, and the node should remain operable even with a bad disk.
There currently exist three config options that could have an effect on this behaviour: 'osd-reformat', 'ignore-
However, config options are set at the application level in the Juju model, and in a large cluster it may not be desirable to change any of these cluster-wide, as that would affect how the rest of the cluster is managed and operated.
Suggestion:
- Add device blacklist handling to ceph-osd charm
- The list could be managed using actions 'blacklist-
- The blacklisted disks could be listed under the 'blacklisted' key returned by the existing 'list-disks' action
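To make the suggestion concrete, here is a rough sketch of how a per-unit blacklist could be applied before the charm tries to initialize devices. It is not the charm's code: the function names, the JSON file used for storage, and the assumption that 'osd-devices' is a space-separated list of device paths are all hypothetical and only meant to illustrate the idea.

```python
import json
from pathlib import Path


def load_blacklist(path):
    """Return the set of device paths blacklisted on this unit.

    `path` is a hypothetical per-unit state file; a missing file
    means nothing has been blacklisted yet.
    """
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text()))


def save_blacklist(path, devices):
    """Persist the blacklist as a sorted JSON list."""
    Path(path).write_text(json.dumps(sorted(devices)))


def blacklist_add(path, devices):
    """Add devices to the unit's blacklist (what an 'add' action might do)."""
    current = load_blacklist(path)
    current.update(devices)
    save_blacklist(path, current)
    return current


def blacklist_remove(path, devices):
    """Remove devices from the unit's blacklist (what a 'remove' action might do)."""
    current = load_blacklist(path)
    current.difference_update(devices)
    save_blacklist(path, current)
    return current


def devices_to_initialize(osd_devices, blacklist):
    """Split the configured devices into usable and skipped lists.

    `osd_devices` is assumed to be the space-separated value of the
    application-level 'osd-devices' option; `blacklist` is per unit.
    """
    configured = osd_devices.split()
    usable = [d for d in configured if d not in blacklist]
    skipped = [d for d in configured if d in blacklist]
    return usable, skipped
```

With something along these lines, the 'config-changed' handler would only attempt to initialize the `usable` devices, and the `skipped` ones could be reported under the 'blacklisted' key of 'list-disks'. The application-level 'osd-devices' value stays the same for every unit, so the rest of the cluster is unaffected.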
tags: | added: sts |
Changed in charm-ceph-osd: | |
assignee: | nobody → Frode Nordahl (fnordahl) |
description: | updated |
Changed in charm-ceph-osd: | |
importance: | Undecided → Medium |
status: | New → Triaged |
Changed in charm-ceph-osd: | |
milestone: | none → 17.11 |
Changed in charm-ceph-osd: | |
status: | Fix Committed → Fix Released |
Fix proposed to branch: master
Review: https://review.openstack.org/517989