When env is deleted, Fuel must remove nodes from Cobbler even if nodes are not responding via RPC/ssh

Bug #1283825 reported by Mike Scherbakov
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Evgeniy L
Milestone: (none shown)

Bug Description

Env: Fuel ISO #180.

I am not sure this is what happened; the logs need to be investigated. Still, it looks like when I clicked "remove env" while the nodes were inaccessible by ssh and RPC, Fuel did not remove the systems from Cobbler. So when I turned the nodes on, they booted from disk, as the Cobbler system information forces them to do.

This becomes a real issue in production systems if a node was offline during removal, as the only workaround is to go into the Cobbler UI and remove the systems manually.
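
For reference, the manual workaround could probably be scripted against Cobbler's XML-RPC API instead of clicking through the web UI. This is only a minimal sketch and not Fuel code: `connect` and `remove_systems` are hypothetical helper names, the endpoint URL and credentials are placeholders, and it assumes the Cobbler remote calls `login`, `remove_system` and `sync` are reachable from where the script runs.

```python
import xmlrpc.client


def connect(url, user, password):
    """Open a Cobbler XML-RPC session and return (server, token).

    The URL and credentials are supplied by the caller; nothing here is
    taken from Fuel itself.
    """
    server = xmlrpc.client.ServerProxy(url)
    token = server.login(user, password)
    return server, token


def remove_systems(server, token, names):
    """Remove each named system record, continuing past individual failures."""
    failed = {}
    for name in names:
        try:
            server.remove_system(name, token)
        except Exception as exc:
            # Keep going so that one bad record does not block the rest.
            failed[name] = str(exc)
    # Ask Cobbler to regenerate its boot configuration once, at the end.
    server.sync(token)
    return failed


# Example usage (placeholder endpoint and credentials):
#   server, token = connect("http://127.0.0.1/cobbler_api", "user", "password")
#   print(remove_systems(server, token, ["node-1", "node-2", "node-3"]))
```

The function returns a mapping of system name to error text for any record that could not be removed, so an empty result means every listed system is gone.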

Tags: astute
Mike Scherbakov (mihgen) wrote :
Vladimir Sharshov (vsharshov) wrote :

This is very strange, because we always delete nodes from Cobbler when running reset/delete env tasks. Of course, that only happens if Nailgun sends data about these nodes to the orchestrator, but it does send it, as far as I can see.

Extracts from logs:

2014-02-23T20:31:27 debug: [10792] Cobbler initialize with username: cobbler, password: cobbler
2014-02-23T20:31:27 info: [10792] Removing system from cobbler: node-1
2014-02-23T20:31:27 info: [10792] System has been successfully removed from cobbler: node-1
2014-02-23T20:31:27 info: [10792] Removing system from cobbler: node-2
2014-02-23T20:31:27 info: [10792] System has been successfully removed from cobbler: node-2
2014-02-23T20:31:27 info: [10792] Removing system from cobbler: node-3
2014-02-23T20:31:27 info: [10792] System has been successfully removed from cobbler: node-3
2014-02-23T20:31:27 debug: [10792] Cobbler syncing

2014-02-23T20:35:45 warning: [10792] 08fae517-5662-4834-9c84-09d0c5c22644: Removing of nodes ["1", "2", "3"] finished with errors. Nodes [{"uid"=>"1", "error"=>"Node not answered by RPC."}] are inaccessible
2014-02-23T20:35:45 info: [10792] 08fae517-5662-4834-9c84-09d0c5c22644: Finished removing of nodes: ["1", "2", "3"]

Was node-1 or any other node still present in the Cobbler web UI after the reset env task finished?

Changed in fuel:
status: New → Incomplete
Mike Scherbakov (mihgen) wrote :

Looks like there were some messages stuck in RabbitMQ. When I restarted Naily, they were delivered, and among the other log entries I saw the Cobbler removal.

> Was node-1 or any other node still present in the Cobbler web UI after the reset env task finished?
Not sure if the task finished correctly, or whether it was a reset or a removal at that moment, but yes, the nodes were present in Cobbler.

Please check
> This is very strange, because we always delete nodes from Cobbler when running reset/delete env tasks.
whether this also happens when some nodes are inaccessible by RPC or any other means, or when some other exception occurs along the way. If the operation fails on one node, we must ensure that it is still applied to the other nodes, and that the code which is supposed to run afterwards does run.
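
The requirement above can be sketched roughly as follows. This is an illustration only, not the orchestrator's code: `remove_nodes`, `erase_node` and `finalize` are hypothetical names, and the error records merely mimic the `uid`/`error` shape seen in the log extract.

```python
def remove_nodes(nodes, erase_node, finalize):
    """Apply erase_node to every node, then always run finalize.

    A failure on one node is recorded and does not stop the loop, and
    finalize (for example, the Cobbler cleanup) runs even if something
    unexpected escapes the loop.
    """
    errors = []
    try:
        for node in nodes:
            try:
                erase_node(node)
            except Exception as exc:
                # Record the failure and carry on with the remaining nodes.
                errors.append({"uid": node["uid"], "error": str(exc)})
    finally:
        finalize()
    return errors
```

Returning the per-node errors instead of raising lets the caller report "finished with errors" while still treating the follow-up step as done.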

Changed in fuel:
milestone: 4.1 → 5.0
Vladimir Sharshov (vsharshov) wrote :

> whether this also happens when some nodes are inaccessible by RPC or any other means

Yes. All nodes are removed from the cluster as long as the Cobbler service is available. In the other cases as well (some or all nodes offline, MCollective failure), it will still delete these nodes from Cobbler.

> Looks like there were some messages stuck in RabbitMQ. When I restarted Naily, they were delivered, and among the other log entries I saw the Cobbler removal.

This is the new behavior of Naily and RabbitMQ. If Naily is restarted or killed, all tasks will run again. This is part of the stop/reset architecture.
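
A toy model of why tasks can run again after a restart, assuming (this is my assumption, not something confirmed in this report) that a message is acknowledged only after the task completes, so the broker requeues anything left unacknowledged when the consumer goes away. `ToyQueue` is a hypothetical in-memory stand-in and not a RabbitMQ client.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a broker queue with explicit acknowledgements."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = []

    def get(self):
        # Hand a message to the consumer; it stays tracked until acked.
        msg = self.ready.popleft()
        self.unacked.append(msg)
        return msg

    def ack(self, msg):
        # The consumer confirms the task is done; the message is dropped.
        self.unacked.remove(msg)

    def consumer_died(self):
        # Everything the dead consumer had not acked goes back to the
        # front of the queue, in its original order.
        self.ready.extendleft(reversed(self.unacked))
        self.unacked.clear()
```

In this model a task that was picked up but not acknowledged before the restart is delivered again, while an acknowledged one is not.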

> Looks like there were some messages stuck in RabbitMQ. When I restarted Naily, they were delivered, and among the other log entries I saw the Cobbler removal.

Thanks, I will try to reproduce it!

Evgeniy L (rustyrobot) wrote :

Nailgun doesn't send offline nodes to the orchestrator. As a result, they were not deleted from Cobbler.
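
To illustrate the difference, here is a minimal sketch of the node selection. It is not the actual Nailgun code or the patch: `nodes_for_orchestrator` and the dictionary fields are hypothetical, with the `uid` and `node-N` naming modelled on the log extract.

```python
def nodes_for_orchestrator(nodes, include_offline=True):
    """Build the list of nodes sent with a remove task.

    With include_offline=False, offline nodes are filtered out and so
    never reach the step that removes them from Cobbler. With
    include_offline=True, every node is sent.
    """
    return [
        {"uid": str(node["id"]), "name": "node-%d" % node["id"]}
        for node in nodes
        if include_offline or node["online"]
    ]
```

With one offline and one online node, the filtered variant yields only the online node, which matches the symptom described in this report.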

Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Evgeniy L (rustyrobot)
status: Incomplete → In Progress
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: 5.0 → 4.1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/75909

Evgeniy L (rustyrobot) wrote :

So, I've provided a patch: https://review.openstack.org/#/c/75909/

It was tested on a 2-node VirtualBox installation.
I checked that
* nodes are deleted from Cobbler even when not all of them are available
* if MCollective tries to erase the MBR of an offline node, it doesn't fail the entire task

Here you can see the orchestrator's logs for this case: http://paste.openstack.org/show/69042/

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/75909
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=5a89ca1bc95fb6e35a17c097fda11c79c79796ff
Submitter: Jenkins
Branch: master

commit 5a89ca1bc95fb6e35a17c097fda11c79c79796ff
Author: Evgeniy L <email address hidden>
Date: Mon Feb 24 20:07:11 2014 +0400

    Fix nodes deletion from cobbler in case of offline nodes

    We must send offline nodes to orchestrator
    to delete it from cobbler.

    Change-Id: I59b1f427ae61aa36e9a301b089cb2f08a107a725
    Closes-bug: #1283825

Changed in fuel:
status: In Progress → Fix Committed
Andrey Sledzinskiy (asledzinskiy) wrote :

Verified on Iso#112-with-gerrit

Changed in fuel:
status: Fix Committed → Fix Released