Orchestrator doesn't clean deleted node from the cloud upon node removal

Bug #1471172 reported by Dmitry Nikishov on 2015-07-03
This bug affects 16 people
Affects | Importance | Assigned to
Fuel for OpenStack | High | Matthew Mosesohn
6.0.x | Medium | Fuel Library (Deprecated)
6.1.x | Medium | Fuel Library (Deprecated)
7.0.x | Medium | Fuel Library (Deprecated)
8.0.x | Medium | Fuel Library (Deprecated)
Mitaka | High | Fuel Library (Deprecated)

Bug Description

MOS 6.0

HA, any configuration

Upon node removal, the orchestrator doesn't clean up the cloud: the removed node is still present in the OpenStack databases and in the clusters.

For example, when a controller node is deleted, the user has to manually clean up the following:
- delete the Neutron agents associated with the removed nodes from the DB
- delete the Nova services associated with the removed nodes from the DB (or else certain OSTF tests will fail)
- delete the removed node from the Corosync/Pacemaker cluster (or else deployment of new nodes might fail)
- delete the removed node from the Ceph cluster (or else ceph -s will report HEALTH_WARN for the missing/offline mon)
- delete the corresponding devices from the Swift rings
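The manual steps listed above can be collected into one ordered command plan. The following is a minimal sketch, assuming the era's standard CLIs (`neutron agent-delete`, `nova service-delete`, crmsh's `crm node delete`, `ceph mon remove`, `swift-ring-builder ... remove/rebalance`) and Swift builder files under `/etc/swift/`. The `RemovedNode` type, the `cleanup_plan` function and all parameter names are hypothetical, not part of Fuel; the agent and service IDs are assumed to have been looked up beforehand with `neutron agent-list` and `nova service-list`. The function only builds the commands and does not execute anything.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class RemovedNode:
    """Hypothetical description of a node that was removed from the environment."""
    hostname: str
    neutron_agent_ids: List[str] = field(default_factory=list)  # from `neutron agent-list`
    nova_service_ids: List[int] = field(default_factory=list)   # from `nova service-list`
    ceph_mon_id: str = ""   # name of the Ceph monitor that ran on the node, if any
    swift_ip: str = ""      # storage IP under which the node's devices appear in the rings


def cleanup_plan(node: RemovedNode,
                 rings: Sequence[str] = ("account", "container", "object")) -> List[List[str]]:
    """Return the cleanup commands (as argv lists) in the order they should run."""
    cmds: List[List[str]] = []
    # 1. Neutron agents registered by the removed node.
    for agent_id in node.neutron_agent_ids:
        cmds.append(["neutron", "agent-delete", agent_id])
    # 2. Nova services registered by the removed node.
    for service_id in node.nova_service_ids:
        cmds.append(["nova", "service-delete", str(service_id)])
    # 3. Pacemaker cluster membership (assumes crmsh is installed).
    cmds.append(["crm", "node", "delete", node.hostname])
    # 4. Ceph monitor, only if the node ran one.
    if node.ceph_mon_id:
        cmds.append(["ceph", "mon", "remove", node.ceph_mon_id])
    # 5. Swift ring devices: remove by IP search value, then rebalance each ring.
    if node.swift_ip:
        for ring in rings:
            builder = f"/etc/swift/{ring}.builder"  # assumed builder location
            cmds.append(["swift-ring-builder", builder, "remove", node.swift_ip])
            cmds.append(["swift-ring-builder", builder, "rebalance"])
    return cmds
```

Printing each entry with `shlex.join(cmd)` gives a reviewable runbook; keeping the plan separate from execution lets an operator check it before touching a production cloud.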

This isn't obvious, and it's not very good from a UX perspective. The orchestrator should automatically clean up the cloud upon node removal.

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 7.0
importance: Undecided → Medium
status: New → Incomplete
status: Incomplete → Confirmed
Bogdan Dobrelya (bogdando) wrote :

At least "delete removed node from Corosync/Pacemaker cluster" was addressed in the 6.1 scope. I'm setting the status to Incomplete for 6.1 and later milestones, as it requires QA confirmation.

Aleksandr Didenko (adidenko) wrote :

The following ones are already addressed in 6.1:
- delete removed node from Corosync/Pacemaker cluster (or else deployment of new nodes might fail)
- delete removed node from Ceph cluster (or else ceph -s will report HEALTH_WARN for missing/offline mon)

I'm afraid we won't be able to backport such fixes into 6.0, due to the large number of architectural and other changes (like Corosync/Pacemaker package versions and compatibility).

So in 6.1 the user still has to manually clean up the following:
- delete the Neutron agents associated with the removed nodes from the DB
- delete the Nova services associated with the removed nodes from the DB (or else certain OSTF tests will fail)
But this is not critical, since it does not cause a denial of service and the cloud should operate normally.
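Finding which Neutron agents to delete amounts to filtering the agent list. The sketch below is a minimal illustration, assuming the records mirror the fields of the Neutron `GET /v2.0/agents` response (`id`, `host`, `alive`) and that the set of removed hostnames is known. The function name `stale_agent_ids` is hypothetical. It deliberately requires an agent to be both dead and on a removed host, so a live agent is never selected.

```python
from typing import Dict, Iterable, List, Set


def stale_agent_ids(agents: Iterable[Dict], removed_hosts: Set[str]) -> List[str]:
    """Return IDs of Neutron agents that are dead and belong to a removed node.

    Each record is assumed to carry the `id`, `host` and `alive` fields of
    the Neutron agents API; other fields are ignored.
    """
    stale: List[str] = []
    for agent in agents:
        # Skip agents that still report in, even if the hostname matches.
        if agent["host"] in removed_hosts and not agent["alive"]:
            stale.append(agent["id"])
    return stale
```

Each returned ID would then be passed to `neutron agent-delete`.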

Bogdan Dobrelya (bogdando) wrote :

Adding the Swift-related bug as a duplicate.

Mike Scherbakov (mihgen) wrote :

Please create a separate ticket for the docs team to explain the manual steps to be done when we remove a node, if we can't automate this in the 7.0 timeframe.

tags: added: release-notes
tags: added: docs
Vladimir Kuklin (vkuklin) wrote :

https://bugs.launchpad.net/fuel/+bug/1482203 created. Marking this bug as Won't Fix for 7.0.

tags: added: feature
description: updated
Bogdan Dobrelya (bogdando) wrote :

This is a life-cycle-management feature, not a bug. Please submit a blueprint and attach this bug to it.

tags: added: life-cycle-management
tags: added: known-issue
Dmitry Pyzhov (dpyzhov) on 2015-10-12
Changed in fuel:
milestone: 7.0 → 8.0
no longer affects: fuel/8.0.x
Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-library
Changed in fuel:
assignee: Alex Schultz (alex-schultz) → nobody
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
milestone: 8.0 → 9.0
status: Triaged → New
no longer affects: fuel/future
no longer affects: fuel/future
Changed in fuel:
status: New → Won't Fix

The wrong link was pasted;
the correct one is:
https://bugs.launchpad.net/fuel/+bug/1471172/comments/7

Changed in fuel:
milestone: 10.0 → next
status: Won't Fix → Confirmed
no longer affects: fuel/future

Fix proposed to branch: master
Review: https://review.openstack.org/309330

Changed in fuel:
status: Confirmed → In Progress

This is a swarm blocker, because all tests that remove or reinstall a controller fail at the OSTF step.

If it is hard to implement on the library side, it is possible to relax the OSTF requirements (allow a couple of tests that check node statuses to fail).

tags: added: swarm-blocker
Dmitry Pyzhov (dpyzhov) wrote :

@Alexandr, this is a new feature request. Tests shouldn't rely on parts of LCM that are not supposed to be implemented in 9.0. As far as I know, there is a separate bug for the fix in OSTF. Removing the 'swarm-blocker' tag.

tags: removed: swarm-blocker
tags: added: swarm-blocker
tags: removed: swarm-blocker

Reviewed: https://review.openstack.org/309330
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=c02971c21a1c0149926a0d1e048d9f798cc387e0
Submitter: Jenkins
Branch: master

commit c02971c21a1c0149926a0d1e048d9f798cc387e0
Author: Matthew Mosesohn <email address hidden>
Date: Fri Apr 22 12:57:11 2016 +0300

    Purge nova services for deleted nodes

    Deleted hosts create downed services in nova, which should
    be cleaned up on the next deploy cycle.

    Change-Id: Ie644062c9befec1745adcc8ca9e76a895cf0304c
    Depends-On: I01db215e77a3532a6fa7bf46ab7e20e281e8c165
    Depends-On: I303ab218bc6a48cf2c60727feecc522040a80a68
    Partial-Bug: #1471172
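The commit above purges Nova services left behind by deleted hosts on the next deploy cycle. The actual change lives in fuel-library and is not reproduced here; the sketch below only illustrates the selection rule implied by the commit message, under stated assumptions. It assumes service records shaped like the Nova `os-services` API output (`id`, `host`, `state` with values `"up"`/`"down"`) and that the deployer knows the hostnames in the current deployment. The function name `services_to_purge` is hypothetical. Keying on "not in the current deployment" rather than on an explicit list of removed nodes means no removal history has to be tracked between deploy runs.

```python
from typing import Dict, Iterable, List, Set


def services_to_purge(services: Iterable[Dict], deployed_hosts: Set[str]) -> List[int]:
    """Pick Nova services whose host is no longer deployed and that are down.

    A service that is still "up" is kept even if its host is unknown,
    so a misconfigured host list cannot remove a working service.
    """
    return [
        svc["id"]
        for svc in services
        if svc["host"] not in deployed_hosts and svc["state"] == "down"
    ]
```

Each selected ID would correspond to one `nova service-delete` call.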

no longer affects: fuel/newton

Related fix proposed to branch: master
Change author: Evgeny Konstantinov <email address hidden>
Review: https://review.fuel-infra.org/22320

Reviewed: https://review.fuel-infra.org/22320
Submitter: Evgeny Konstantinov <email address hidden>
Branch: master

Commit: 2845bca5e1ccec1e894672bf3fc1c07493619608
Author: Evgeny Konstantinov <email address hidden>
Date: Wed Jun 22 10:30:56 2016

Add Nova resolved issues 9.0

Change-Id: Ia8c83f6dfffa2df143cf5c494e41e8676aff8028
Related-Bug: #1556819
Related-Bug: #1471172

tags: added: release-notes-done
removed: release-notes