upgrade-charm ceph-osd causes ceph-osd to bounce services, causing degradation and CRUSH remapping
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ceph OSD Charm | Triaged | Wishlist | Unassigned |
Bug Description
During a typical upgrade to the 19.10 version of the ceph-osd charm on a xenial-queens cloud, I watched the cluster state and the CRUSH map bounce around, reaching as much as 14% degraded across 202 OSDs. At one point I saw 40 OSDs out at once.
I set noout at the beginning of the window, but in Luminous/Mimic this does not also prevent rebalancing. We should probably have upgrade-charm temporarily set a no-rebalance flag, or provide actions to enable and disable that behavior, similar to the [un]set-noout actions on ceph-mon.
I suggest that upgrade-charm gain the ability to set noout, norebalance, norecover and nobackfill intelligently, and then unset those flags when it completes, so that the brief ceph-osd service restarts do not cause data movement in the cluster.
I know this is risky: a service may fail to restart, in which case the cluster should start recovering at some point. Exposing these choices as manual operator actions would therefore make sense.
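A minimal sketch of the automatic approach, assuming the flags are toggled with the standard `ceph osd set <flag>` / `ceph osd unset <flag>` commands from a node that has an admin keyring. The names `osd_flags` and `run_ceph` are hypothetical and do not come from the charm's code. The command runner is injectable so the sequence can be checked without a live cluster.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence

# Flags proposed in the report ("norecover" is Ceph's actual flag name).
MAINTENANCE_FLAGS = ("noout", "norebalance", "norecover", "nobackfill")

Runner = Callable[[Sequence[str]], None]


def run_ceph(args: Sequence[str]) -> None:
    """Run a ceph CLI command; raises CalledProcessError on failure."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def osd_flags(flags: Sequence[str] = MAINTENANCE_FLAGS,
              run: Runner = run_ceph) -> Iterator[None]:
    """Hypothetical helper: set cluster-wide OSD flags, then unset them.

    Only flags that were successfully set are unset, in reverse order,
    inside a finally block. They are therefore cleared even if the
    wrapped restart raises an exception.
    """
    applied: List[str] = []
    try:
        for flag in flags:
            run(["osd", "set", flag])
            applied.append(flag)
        yield
    finally:
        for flag in reversed(applied):
            run(["osd", "unset", flag])
```

Usage would look like `with osd_flags(): restart_osd_services()`, where `restart_osd_services` is a hypothetical stand-in for the charm's restart logic. Because the unset happens in `finally`, a failed restart re-enables recovery instead of leaving the cluster pinned. Note that if an `unset` call itself fails, the remaining flags stay set and need manual attention.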
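For the manual, operator-driven variant, here is a sketch of a paired set/unset command in the spirit of the ceph-mon [un]set-noout actions. It only builds and runs the same `ceph osd set|unset` commands. The script and its `flag_commands` function are hypothetical, and wiring them into a charm's action framework is not shown.

```python
import argparse
import subprocess
from typing import List, Optional, Sequence

# Flags proposed in the report ("norecover" is Ceph's actual flag name).
MAINTENANCE_FLAGS = ("noout", "norebalance", "norecover", "nobackfill")


def flag_commands(action: str,
                  flags: Sequence[str] = MAINTENANCE_FLAGS) -> List[List[str]]:
    """Build the ceph commands for one action; unset reverses the order."""
    if action not in ("set", "unset"):
        raise ValueError(f"unknown action: {action}")
    ordered = list(flags) if action == "set" else list(reversed(flags))
    return [["ceph", "osd", action, flag] for flag in ordered]


def main(argv: Optional[Sequence[str]] = None) -> None:
    parser = argparse.ArgumentParser(
        description="Set or unset OSD maintenance flags (hypothetical action).")
    parser.add_argument("action", choices=["set", "unset"])
    parser.add_argument("--dry-run", action="store_true",
                        help="print the commands instead of running them")
    args = parser.parse_args(argv)
    for cmd in flag_commands(args.action):
        if args.dry_run:
            print(" ".join(cmd))
        else:
            # Stop at the first failing command so the operator can inspect.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

An operator would run the `set` form before `juju upgrade-charm`, confirm that all OSDs are back up, and then run `unset`. Keeping the two steps separate leaves the judgment about when to allow recovery with the operator, which is the trade-off the report describes.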
tags: added: charm-upgrade
Changed in charm-ceph-osd:
importance: Undecided → Wishlist
status: New → Triaged