charm pause/resume actions remove the OSD, rather than just stopping a service

Bug #1793507 reported by Chris MacNaughton
This bug affects 2 people
Affects: Ceph OSD Charm
Status: Fix Released
Importance: Medium
Assigned to: David Ames

Bug Description

Per Ceph's documentation, taking an OSD out of the cluster causes Ceph to start rebalancing (http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#take-the-osd-out-of-the-cluster); it is surprising that `pause` would cause Ceph to start a rebalance.
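
To make the distinction concrete, here is a minimal Python sketch (not the charm's actual code) contrasting what an operator would expect `pause` to do with the step the linked documentation describes. The function names are hypothetical, and the systemd unit name `ceph-osd@<id>` is an assumption about how the OSD daemons are managed on the unit. The functions only build the command lines; they do not run anything.

```python
from typing import List


def stop_service_cmd(osd_id: int) -> List[str]:
    """Command a 'pause' would be expected to run: stop the daemon only."""
    # Assumption: the OSD is managed by systemd under the unit ceph-osd@<id>.
    return ["systemctl", "stop", f"ceph-osd@{osd_id}"]


def mark_out_cmd(osd_id: int) -> List[str]:
    """Command that takes the OSD out of the cluster, which starts a rebalance."""
    return ["ceph", "osd", "out", str(osd_id)]


if __name__ == "__main__":
    # Hypothetical OSD ids, used only for illustration.
    for osd_id in (0, 1):
        print(" ".join(stop_service_cmd(osd_id)))
        print(" ".join(mark_out_cmd(osd_id)))
```

The point of the sketch is only that these are two different operations: the first touches a local service, the second changes cluster membership and moves data.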

Tags: sts
David Ames (thedac) wrote :

We should rename the pause and resume actions.

Possibly extend the existing code to take a single osd out at a time.
Or possibly remove entirely.

James Page (james-page)
Changed in charm-ceph-osd:
status: New → Triaged
importance: Undecided → Medium
Felipe Reyes (freyes) wrote :

Considering the charm deployment guide has a section where it describes an upgrade of the charms using pause/resume ( https://docs.openstack.org/project-deploy-guide/charm-deployment-guide/latest/app-upgrade-openstack.html#ha-with-pause-resume ), I think we should reconsider this bug's importance.

Last week I was dragged to a live session where the user was upgrading their env using that guide (which led to bug 1802917).

tags: added: sts
David Ames (thedac)
Changed in charm-ceph-osd:
assignee: nobody → David Ames (thedac)
milestone: none → 19.04
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)

Fix proposed to branch: master
Review: https://review.openstack.org/620175

Changed in charm-ceph-osd:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/620175
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=1e0c6548b886fba21055feb16396ac2ad13da7a8
Submitter: Zuul
Branch: master

commit 1e0c6548b886fba21055feb16396ac2ad13da7a8
Author: David Ames <email address hidden>
Date: Mon Nov 26 14:40:48 2018 -0800

    Rename pause/resume osd-out/osd-in

    The actions pause and resume actually take all osds on a unit out of the
    cluster. This is incredibly misleading.

    This change renames to osd-out and osd-in to better describe what the
    actions actually do.

    Change-Id: I76793999f5d3382563eff308a5d7c4db18d065a0
    Closes-Bug: #1793507
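
As an illustration of how the renamed actions might be invoked, here is a small Python sketch that builds the Juju command line. It assumes the Juju 2.x `juju run-action <unit> <action>` syntax; the unit name `ceph-osd/0` and the helper name `run_action_cmd` are hypothetical. The helper only constructs the command and does not execute it.

```python
from typing import List

# Action names introduced by the commit above.
OSD_ACTIONS = ("osd-out", "osd-in")


def run_action_cmd(unit: str, action: str) -> List[str]:
    """Build a 'juju run-action' command line for one of the renamed actions."""
    if action not in OSD_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    # Assumption: Juju 2.x CLI syntax.
    return ["juju", "run-action", unit, action]


if __name__ == "__main__":
    # Hypothetical unit name, used only for illustration.
    print(" ".join(run_action_cmd("ceph-osd/0", "osd-out")))
    print(" ".join(run_action_cmd("ceph-osd/0", "osd-in")))
```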

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Felipe Reyes (freyes) wrote :

David, do you plan to backport this change? I understand that, given the nature of the patch, backporting it may not be a good idea; on the other hand, leaving open the possibility of an unwanted rebalance is not good either.

David Ames (thedac)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released