Fuel deletes nodes without cleaning up or warning user

Bug #1424060 reported by Chris Clason on 2015-02-20
This bug affects 3 people
Affects             Importance  Assigned to
Fuel for OpenStack  Critical    Ryan Moe
4.1.x               Critical    Ryan Moe
5.0.x               Critical    Ryan Moe
5.1.x               Critical    Ryan Moe
6.0.x               Critical    Ryan Moe
6.1.x               Critical    Ryan Moe

Bug Description

Fuel allows users to delete nodes without warning them that it does no clean-up on the node before pulling it out of service. This is extremely troubling for Ceph nodes: if enough of them are removed, there is a real risk of data loss.

At a minimum, Fuel should tell users that they need to follow the proper procedure to remove OSDs from the Ceph cluster when they attempt to delete a Ceph node. Ideally, Fuel would automatically handle draining the OSDs and removing them from the Ceph cluster before taking a node out of service.

For compute nodes, it would be nice to evacuate the host prior to removing the node from service as well.

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
milestone: none → 6.1
importance: Undecided → High
status: New → Confirmed
Andrew Woodward (xarses) on 2015-02-26
Changed in fuel:
importance: High → Critical
assignee: Fuel Python Team (fuel-python) → Ryan Moe (rmoe)
Andrew Woodward (xarses) wrote :

causes data loss == Critical

Dmitry Borodaenko (angdraug) wrote :

This problem is further aggravated by bug #1415954, which can cause an unwarranted deployment failure when adding nodes to an existing cluster, prompting the user to delete the nodes in error status and lose data that was rebalanced to those nodes during deployment.

Bogdan Dobrelya (bogdando) wrote :

We should probably modify the delete task to abort if it finds any running OSDs and point the user to an ops guide on how to remove them.

Ryan Moe (rmoe) wrote :

That's the plan. I have an implementation of it that I'll be testing today. Right now the check isn't very sophisticated, just a pgrep on any node pending deletion with a Ceph OSD role, but it should be enough for a first pass.
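A minimal sketch of what such a process check could look like, assuming it runs locally on the node being inspected. The function name `osd_process_running` and the injectable `run` parameter are hypothetical, and this is not the code from the proposed review; Fuel would also need its own mechanism to run the check on each node pending deletion.

```python
import subprocess


def osd_process_running(run=subprocess.run):
    """Return True if a process named exactly 'ceph-osd' exists on this host."""
    # pgrep exits with 0 when at least one process matches and 1 when none do.
    result = run(["pgrep", "-x", "ceph-osd"], stdout=subprocess.DEVNULL)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit code means pgrep itself failed, so refuse to guess.
    raise RuntimeError("pgrep failed with exit code %d" % result.returncode)
```

Raising on an unexpected exit code keeps a broken check from being read as "no OSD is running".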

Fix proposed to branch: master
Review: https://review.openstack.org/161045

Changed in fuel:
status: Confirmed → In Progress
tags: added: docs
Mykola Golub (mgolub) wrote :

Just checking whether an osd daemon is still running is not safe. The daemon could be down while the node still holds valid data that is available only on this node, so removing it would lead to data loss.

I think we should follow the documentation:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

I.e. if the node is going to be removed from the cluster, the first command to run is:

 ceph osd out {osd-num}

which starts migrating placement groups off the OSD. Don't proceed with removal until it completes. If the osd daemon is down but the data is available from another replica, that replica will be used for rebalancing. If no other replica is available, the migration gets stuck, and in this case manual intervention is necessary.

There could be different ways to check whether the migration is complete. The simplest is to run

  ceph pg stat

and check that all pgs are active+clean, but it might give a false positive when the problem is due to some other osd. Still it might be safer to just abort here?

If a user wants a faster way, she should do this at her own risk.
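A minimal sketch of the 'ceph pg stat' check, assuming the plain-text summary has the shape "<total> pgs: <count> <state>, <count> <state>; <usage>". That shape is an assumption and may differ between Ceph releases; the function name `all_pgs_active_clean` is hypothetical, and obtaining the line from the cluster is left out.

```python
import re


def all_pgs_active_clean(pg_stat_line):
    """Return True only if every pg in the summary is exactly active+clean."""
    # Assumed shape: "<total> pgs: <n> <state>, <n> <state>; <usage...>"
    match = re.search(r"(\d+) pgs: ([^;]+)", pg_stat_line)
    if match is None:
        return False  # unknown format: abort rather than guess
    total = int(match.group(1))
    clean = 0
    for part in match.group(2).split(","):
        try:
            count, state = part.strip().split(None, 1)
            count = int(count)
        except ValueError:
            return False  # malformed entry: abort rather than guess
        if state == "active+clean":
            clean += count
    return total > 0 and clean == total
```

The check is deliberately conservative: any state other than exactly active+clean (including active+clean+scrubbing) makes it return False, so the caller waits or aborts.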

Ryan Moe (rmoe) wrote :

I agree that just checking for running OSD processes is not ideal. What would you suggest we check for? Is there a way we can check that a particular node has data that needs to be migrated? The idea is to not let the user delete nodes if it could potentially result in data loss, not to actually remove the OSDs from the cluster and wait for rebalancing to finish (they will have to do this work manually before Fuel will allow them to remove the node). That is a worthwhile feature to have in Fuel but it would require a blueprint.

Mykola Golub (mgolub) wrote :

As I have already written, the simplest way to ensure all data is migrated is to wait, after 'ceph osd out', until 'ceph pg stat' reports all pgs in the active+clean state.

The drawback of this approach is that the osd removal might get stuck because of a problem with some other osd (all data from the osd being removed has been migrated, but not all pgs can reach active+clean because another osd is currently down).

The more complicated approach would be to parse the output of the 'ceph pg dump' and 'ceph pg <pg> query' commands, ensuring that no pgs remain on the osd being removed. I can provide more details for developers if necessary, but I think it is good enough to start with the simpler approach: in any case it is better for the user to resolve all potential issues before proceeding with removal.

Mykola Golub (mgolub) wrote :

Also trying the node deletion in a lab cluster (fuel v6.0.0), I noticed the following issues:

When a node is deleted, it is not removed from the crush and osd maps, so removed nodes stay in the osd tree forever in the down state.

For a 3 OSD cluster with replication factor 2, I was able to remove down to 1 node without any warning from Fuel, i.e. leaving the cluster in a degraded state. When I tried to remove the last node, there was no warning when I marked it as pending removal (the only warning was "Compute nodes are recommended for deployment"), but there was an error when I actually tried to deploy the changes: "Number of OSD nodes (1) cannot be less than the Ceph object replication factor (2). Please either assign ceph-osd role to more nodes, or reduce Ceph replication factor in the Settings tab." So Fuel did not allow me to lose data (good), but it looks like the error was supposed to be triggered earlier, when I was reducing the cluster from 2 OSDs to 1?
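A minimal sketch of a check that would fire at the earlier point described above, by counting the OSD nodes that would remain once pending deletions are applied. The function and parameter names are hypothetical, and this is not Fuel's actual validation code.

```python
def deletion_keeps_enough_osd_nodes(osd_nodes, pending_deletion, replication_factor):
    """Return True if enough OSD nodes remain after the pending deletions."""
    # Count only the nodes that will still exist once the deletions are applied.
    remaining = set(osd_nodes) - set(pending_deletion)
    return len(remaining) >= replication_factor
```

With three OSD nodes and replication factor 2, removing one node passes, while a second removal leaves one node and is rejected. As the thread notes, this count alone is not enough to prevent data loss.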

Ryan Moe (rmoe) wrote :

In 6.0 the only check we do is ensuring that there are at least replica factor OSDs in the environment. But as you know this is not enough to prevent data loss. All we need is a check that can run on each OSD prior to deletion that tells us "Yes, this node is ok to delete" or "No, we can't delete this node". I'm interested in the details of your pg query solution.

Mikolaj Golub (to-my-trociny) wrote :

The "pg query" solution could be:

after ceph osd out <osd_num>

1) run ceph pg dump, iterate over all pgs and check that osd_num is not in the up or acting set of any pg. If it is, return false (may not delete).
2) run 'ceph status' or 'ceph health detail' to check whether lost objects are reported. If they are, get the list of pgs with lost objects (e.g. 'ceph health detail' will report it) and iterate over the list, running 'ceph pg <pg_num> query' and checking whether osd_num is present in the recovery_state/might_have_unfound list. If it is, return false (may not delete).
3) return true (may delete).
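
The three steps above can be sketched as follows. This is an illustration only: the input shapes are assumed to mirror the JSON output of 'ceph pg dump' and 'ceph pg <pg_num> query' (a list of pg entries with "up" and "acting" lists, and query results with a "recovery_state" list whose entries may carry "might_have_unfound"). The exact layout may vary between Ceph releases, the function name `may_delete_osd` is hypothetical, and collecting the data from the cluster is left out.

```python
def may_delete_osd(osd_num, pg_stats, unfound_pg_queries):
    """Return True if the osd appears safe to delete, following steps 1-3.

    pg_stats: pg entries, each with "up" and "acting" lists of osd ids.
    unfound_pg_queries: query results for the pgs reported with lost objects.
    """
    # Step 1: the osd must not be in the up or acting set of any pg.
    for pg in pg_stats:
        if osd_num in pg.get("up", []) or osd_num in pg.get("acting", []):
            return False

    # Step 2: the osd must not be listed as possibly holding unfound objects.
    for query in unfound_pg_queries:
        for state in query.get("recovery_state", []):
            for peer in state.get("might_have_unfound", []):
                # Assumption: the id may be a string such as "3" or "3(1)".
                if str(peer.get("osd", "")).split("(")[0] == str(osd_num):
                    return False

    # Step 3: nothing references the osd.
    return True
```

For example, with a single pg whose up and acting sets are [0, 1], osd 2 may be deleted unless a pg with lost objects lists it under might_have_unfound.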

Reviewed: https://review.openstack.org/161045
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=1b42ec374a2b07d0d15f2f5d6ea037f4f413f080
Submitter: Jenkins
Branch: master

commit 1b42ec374a2b07d0d15f2f5d6ea037f4f413f080
Author: Ryan Moe <email address hidden>
Date: Tue Mar 3 16:36:41 2015 -0800

    Prevent deletion of nodes which have running OSD processes

    Node deletion will fail if a node with the ceph-osd role still
    has PGs placed on any of its OSDs. Removing too many OSDs can result in
    data loss. The end-user will be required to manually remove
    those OSDs from the cluster and allow it to rebalance to ensure
    no data is lost when deleting these nodes as described here:
    http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

    DocImpact
    Change-Id: I41173a83a3268455148652680a534e47296af319
    Closes-bug: #1424060

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/165254
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=a58858190a037556b0d1f6d6ab0881b958e4bcfa
Submitter: Jenkins
Branch: master

commit a58858190a037556b0d1f6d6ab0881b958e4bcfa
Author: Ryan Moe <email address hidden>
Date: Tue Mar 17 16:42:23 2015 -0700

    Do not run check_ceph_osds when deleting a cluster

    When deleting a cluster there is no reason to care whether
    or not any Ceph nodes still contain data.

    Related-bug: #1424060
    Change-Id: Ibca91d215850b6902608c5e11363a53850312720

tags: added: on-verification
tags: added: qa-accept
removed: on-verification
Vitaly Sedelnik (vsedelnik) wrote :

This is a feature request, not a bug. Won't Fix for all versions prior to 6.1.
