Fuel deletes nodes without cleaning up or warning user

Bug #1424060 reported by Chris Clason
This bug affects 3 people
Affects             Status        Importance  Assigned to  Milestone
Fuel for OpenStack  Fix Released  Critical    Ryan Moe
4.1.x               Won't Fix     Critical    Ryan Moe
5.0.x               Won't Fix     Critical    Ryan Moe
5.1.x               Won't Fix     Critical    Ryan Moe
6.0.x               Won't Fix     Critical    Ryan Moe
6.1.x               Fix Released  Critical    Ryan Moe

Bug Description

Fuel allows users to delete nodes without warning them that it does absolutely no clean-up on the node before pulling it out of service. This is extremely troubling for Ceph nodes: if enough of them are removed, there is a real risk of data loss.

At a minimum, when a user attempts to delete a Ceph node, Fuel should tell them that they need to follow the proper procedure to remove OSDs from the Ceph cluster. Ideally, Fuel would automatically handle draining the OSDs and removing them from the Ceph cluster before taking the node out of service.

For compute nodes, it would be nice to evacuate the host prior to removing the node from service as well.

Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
milestone: none → 6.1
importance: Undecided → High
status: New → Confirmed
Andrew Woodward (xarses)
Changed in fuel:
importance: High → Critical
assignee: Fuel Python Team (fuel-python) → Ryan Moe (rmoe)
Revision history for this message
Andrew Woodward (xarses) wrote :

causes data loss == Critical

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

This problem is further aggravated by bug #1415954, which can cause an unwarranted deployment failure when adding nodes to an existing cluster, prompting user to delete the nodes in error status and lose data that was rebalanced to the nodes during deployment.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

We should probably modify the delete task to abort if it finds any running OSDs, and point the user to an ops guide on how to remove them.

Revision history for this message
Ryan Moe (rmoe) wrote :

That's the plan. I have an implementation that I'll be testing today. Right now the check isn't very sophisticated, just a pgrep on any node pending deletion that has a Ceph OSD role, but it should be enough for a first pass.
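
For illustration, a minimal sketch of such a first-pass check (hypothetical Python, not the actual fuel-astute change; the process name and error handling are assumptions):

    # Hypothetical first-pass check run on a node pending deletion:
    # pgrep exits 0 when at least one ceph-osd process exists, 1 when none.
    import subprocess

    def ceph_osd_running():
        return subprocess.call(['pgrep', 'ceph-osd']) == 0

    if ceph_osd_running():
        raise RuntimeError('Running Ceph OSDs found; remove them from the '
                           'cluster before deleting this node.')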

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/161045

Changed in fuel:
status: Confirmed → In Progress
tags: added: docs
Revision history for this message
Mykola Golub (mgolub) wrote :

Just checking whether an OSD daemon is still running is not safe. The daemon could be down while the node still holds valid data available only on that node, so removing it would lead to data loss.

I think we should follow the documentation:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

I.e. if the node is going to be removed from the cluster, the first command to run is:

 ceph osd out {osd-num}

which starts migrating placement groups out of the OSD. Do not proceed with removal until this completes. If the OSD daemon is down but the data is available from another replica, that replica will be used for rebalancing. If there is no other replica available, the migration gets stuck and manual intervention is necessary.

There are different ways to check whether the migration is complete; the simplest looks like

  ceph pg stat

and check that all PGs are active+clean, though this might give a false positive when the problem is due to some other OSD. Still, it might be safer to just abort here?

If a user wants a faster way, they should do it at their own risk.
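
To make the sequence concrete, a rough Python sketch of the procedure described above (the 'ceph pg stat' parsing is an assumption; its exact output format varies between Ceph releases):

    # Mark the OSD out, then poll until every PG is active+clean.
    # 'ceph pg stat' prints something like
    # "v1234: 512 pgs: 512 active+clean; ..." (format varies by release).
    import re
    import subprocess
    import time

    def all_pgs_active_clean():
        stat = subprocess.check_output(['ceph', 'pg', 'stat']).decode()
        total = re.search(r'(\d+) pgs:', stat)
        clean = re.search(r'(\d+) active\+clean', stat)
        return bool(total and clean and total.group(1) == clean.group(1))

    def drain_osd(osd_num):
        subprocess.check_call(['ceph', 'osd', 'out', str(osd_num)])
        while not all_pgs_active_clean():
            # Migration in progress; in practice you would want to abort
            # or alert if no progress is made for too long.
            time.sleep(30)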

Revision history for this message
Ryan Moe (rmoe) wrote :

I agree that just checking for running OSD processes is not ideal. What would you suggest we check for? Is there a way we can check that a particular node has data that needs to be migrated? The idea is to not let the user delete nodes if it could potentially result in data loss, not to actually remove the OSDs from the cluster and wait for rebalancing to finish (they will have to do this work manually before Fuel will allow them to remove the node). That is a worthwhile feature to have in Fuel but it would require a blueprint.

Revision history for this message
Mykola Golub (mgolub) wrote :

As I have already written, the simplest way to ensure all data is migrated is, after 'ceph osd out', to wait until 'ceph pg stat' reports all PGs in the active+clean state.

The drawback of this approach is that OSD removal might get stuck due to a problem with some other OSD (all data from the OSD being removed has been migrated, but not all PGs can reach active+clean because another OSD is currently down).

The more complicated approach would be to parse the output of the 'ceph pg dump' and 'ceph pg <pg> query' commands, ensuring that no PGs remain on the OSD being removed. I can provide more details for developers if necessary, but I think it is good enough to start with the simpler approach: in any case, it is better for the user to resolve all potential issues before proceeding with removal.

Revision history for this message
Mykola Golub (mgolub) wrote :

Also, while trying node deletion in a lab cluster (Fuel v6.0.0), I noticed the following issues:

When a node is deleted it is not removed from the CRUSH and OSD maps, so removed nodes stay in the OSD tree in the down state forever.

For the 3-OSD cluster with replication factor 2 I was able to remove down to 1 node without any warning from Fuel, i.e. leaving the cluster in a degraded state. When I tried to remove the last node, there was no warning when I added the removal to pending changes (only the note that "Compute nodes are recommended for deployment"), but there was an error when I actually tried to deploy the changes: "Number of OSD nodes (1) cannot be less than the Ceph object replication factor (2). Please either assign ceph-osd role to more nodes, or reduce Ceph replication factor in the Settings tab." So Fuel did not allow me to lose data (good), but it looks like the error was supposed to be triggered earlier, when I was reducing the cluster from 2 OSDs to 1?

Revision history for this message
Ryan Moe (rmoe) wrote :

In 6.0 the only check we do is ensuring that the environment has at least as many OSD nodes as the replication factor. But as you know, this is not enough to prevent data loss. All we need is a check that can run on each OSD prior to deletion that tells us "Yes, this node is ok to delete" or "No, we can't delete this node". I'm interested in the details of your pg query solution.

Revision history for this message
Mikolaj Golub (to-my-trociny) wrote :

The "pg query" solution could be:

after ceph osd out <osd_num>

1) Run 'ceph pg dump', iterate over all PGs, and check that osd_num is not in the up or acting set of any PG. If it is, return false (may not delete).
2) Run 'ceph status' or 'ceph health detail' to check whether lost objects are reported. If they are, get the list of PGs with lost objects (e.g. 'ceph health detail' will report it), iterate over the list, running 'ceph pg <pg_num> query' and checking whether osd_num is present in the recovery_state/might_have_unfound list. If it is, return false (may not delete).
3) Return true (may delete).
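
A rough Python sketch of these three steps (JSON field names such as 'pg_stats' and 'might_have_unfound' are assumptions; the layout of 'ceph pg dump' and 'ceph pg <pg> query' output differs between Ceph releases):

    # Hypothetical check: returns True only when no PG still references
    # the OSD and no unfound objects might live on it.
    import json
    import subprocess

    def may_delete_osd(osd_num):
        # 1) No PG may still list the OSD in its up or acting set.
        dump = json.loads(subprocess.check_output(
            ['ceph', 'pg', 'dump', '--format', 'json']))
        for pg in dump['pg_stats']:
            if osd_num in pg['up'] or osd_num in pg['acting']:
                return False

        # 2) If unfound objects are reported, the OSD must not appear in
        #    any affected PG's recovery_state/might_have_unfound list.
        #    The 'health detail' line format assumed here is illustrative.
        health = subprocess.check_output(
            ['ceph', 'health', 'detail']).decode()
        for line in health.splitlines():
            if not line.startswith('pg ') or 'unfound' not in line:
                continue
            pgid = line.split()[1]
            query = json.loads(subprocess.check_output(
                ['ceph', 'pg', pgid, 'query']))
            for state in query.get('recovery_state', []):
                for peer in state.get('might_have_unfound', []):
                    # 'osd' is an int or a string depending on release.
                    if str(peer.get('osd')) == str(osd_num):
                        return False

        # 3) Nothing ties the OSD to data the cluster still needs.
        return True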

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/161045
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=1b42ec374a2b07d0d15f2f5d6ea037f4f413f080
Submitter: Jenkins
Branch: master

commit 1b42ec374a2b07d0d15f2f5d6ea037f4f413f080
Author: Ryan Moe <email address hidden>
Date: Tue Mar 3 16:36:41 2015 -0800

    Prevent deletion of nodes which have running OSD processes

    Node deletion will fail if a node with the ceph-osd role still
    has PGs placed on any of its OSDs. Removing too many OSDs can result in
    data loss. The end-user will be required to manually remove
    those OSDs from the cluster and allow it to rebalance to ensure
    no data is lost when deleting these nodes as described here:
    http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#removing-osds-manual

    DocImpact
    Change-Id: I41173a83a3268455148652680a534e47296af319
    Closes-bug: #1424060

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/165254
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=a58858190a037556b0d1f6d6ab0881b958e4bcfa
Submitter: Jenkins
Branch: master

commit a58858190a037556b0d1f6d6ab0881b958e4bcfa
Author: Ryan Moe <email address hidden>
Date: Tue Mar 17 16:42:23 2015 -0700

    Do not run check_ceph_osds when deleting a cluster

    When deleting a cluster there is no reason to care whether
    or not any Ceph nodes still contain data.

    Related-bug: #1424060
    Change-Id: Ibca91d215850b6902608c5e11363a53850312720

tags: added: on-verification
tags: added: qa-accept
removed: on-verification
Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

This is a feature request, not a bug. Won't Fix for all versions prior to 6.1.
