upgrade-charm ceph-osd causes ceph-osd to bounce services, causing degradation and CRUSH remapping

Bug #1851999 reported by Drew Freiberger
Affects: Ceph OSD Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

During a typical charm upgrade to the 19.10 version of the ceph-osd charm on a xenial-queens cloud, I watched the state of my cluster and the CRUSH map bounce around, reaching a 14% degraded state on 202 OSDs. At one point I saw 40 OSDs out at once.

I set noout at the beginning of the window, but that does not prevent rebalancing in luminous/mimic. We should probably add some method to upgrade-charm that temporarily sets norebalance, or provide a flag/action to enable and disable that feature, similar to the [un]set-noout actions on ceph-mon.

I suggest that upgrade-charm grow the ability to smartly set noout, norebalance, norecover, and nobackfill, and then unset those flags when complete, to prevent data movement in the cluster during the brief ceph-osd service restarts.
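Until the charm supports something like this, an operator could approximate it by hand. The following is only a rough sketch, not anything the charm provides: it assumes the `ceph` CLI is available with admin credentials on the host where it runs, and the helper names `run_ceph` and `maintenance_flags` are made up for illustration. It sets the four flags with `ceph osd set`, and unsets whichever ones were set, in reverse order, once the wrapped work finishes or fails.

```python
import subprocess
from contextlib import contextmanager

# Cluster-wide OSD flags that pause data movement during brief restarts.
FLAGS = ("noout", "norebalance", "norecover", "nobackfill")


def run_ceph(args):
    """Run `ceph <args>`; raises CalledProcessError on a non-zero exit."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def maintenance_flags(flags=FLAGS, run=run_ceph):
    """Hypothetical helper: set the flags, yield, then unset them.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    set_flags = []
    try:
        for flag in flags:
            run(["osd", "set", flag])
            set_flags.append(flag)
        yield
    finally:
        # Unset only what was actually set, in reverse order, even if the
        # wrapped work raised, so the cluster can recover afterwards.
        for flag in reversed(set_flags):
            run(["osd", "unset", flag])


# Example use (assumption: the operator triggers the upgrade and waits for
# all ceph-osd units to settle inside the block before the flags are unset):
#
# with maintenance_flags():
#     ...  # upgrade the charm and wait for the OSD services to come back
```

Note that the flags are cleared even when the wrapped step fails, which lets recovery begin. An operator who would rather inspect the cluster first could set and unset the flags manually instead.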

I know this is dangerous: a service may fail to restart, in which case the cluster should start recovering at some point. Making these manual operator choices and actions would therefore make sense.

tags: added: charm-upgrade
Changed in charm-ceph-osd:
importance: Undecided → Wishlist
status: New → Triaged