[nailgun] nodes don't return to discovered state

Bug #1273006 reported by Andrew Woodward
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Triaged | Medium | Fuel Sustaining |
Newton | Triaged | Medium | Fuel Sustaining |

Bug Description

Case: if a node was previously in an error or deployed state and, for whatever reason, the administrator wants to reset it, they should be able to force the node to load the discovery image, and the node should switch its status back to the discovered state.

If you force the node back into the discovery image, it doesn't return to the discovered state. It just stays in its old state, even though it should transition back to discovered so that tools can perform the necessary steps to re-provision/redeploy it.
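The requested transition can be sketched as a handler that runs when the discovery (bootstrap) agent reports in from a node. This is a minimal sketch under stated assumptions, not Nailgun's code: `Node`, `on_discovery_agent_report` and `RESETTABLE_STATUSES` are hypothetical, and "ready" is assumed to be the status string for a deployed node.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified node record; not Nailgun's real DB model.
@dataclass
class Node:
    node_id: int
    status: str
    error_type: Optional[str] = None

# Statuses from which a discovery-image boot should reset the node.
# "error" and "ready" (deployed) are assumptions based on the bug text.
RESETTABLE_STATUSES = {"error", "ready"}

def on_discovery_agent_report(node: Node) -> Node:
    """Handle a report from the discovery agent running on `node`.

    If the node was in an error or deployed state and has been forced back
    into the discovery image, move it back to "discover" and clear the old
    error so that tooling can re-provision/redeploy it.
    """
    if node.status in RESETTABLE_STATUSES:
        node.status = "discover"
        node.error_type = None
    return node
```

Restricting the reset to an explicit set of statuses leaves a node that is, for example, in the middle of provisioning untouched.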

Changed in fuel:
status: New → Triaged
Changed in fuel:
milestone: none → 4.1
Evgeniy L (rustyrobot) wrote :

Hi Andrew,

It looks like the reset environment feature (https://blueprints.launchpad.net/fuel/+spec/nailgun-reset-env), doesn't it?

Andrew Woodward (xarses) wrote :

Not exactly. This is to allow a node that may be in a funky state to run the bootstrap agent, and when that happens, the node's data in the DB must be reset as if it were a fresh node. This could be included in the 'reset environment' work, but after re-reading the BP, it is likely out of scope.

Changed in fuel:
milestone: 4.1 → 5.0
Mike Scherbakov (mihgen) wrote :

Looks like you simply need to remove the node from the environment, don't you?
If it is in a "funky" state, you really need to reset it / remove it from Nailgun, so that Cobbler serves it the bootstrap image.

We have the node removal feature for this, as well as Delete Environment.
Will this solve your issue?

Also, why does this bug have the multi-l3 tag?

Ryan Moe (rmoe)
Changed in fuel:
milestone: 5.0 → 5.1
Dmitry Borodaenko (angdraug) wrote :

Changing to Incomplete until questions from comment #3 are addressed.

Changed in fuel:
status: Triaged → Incomplete
Andrew Woodward (xarses)
tags: removed: multi-l3
Andrew Woodward (xarses) wrote :

No, deleting the node from the environment is not the goal here. We want to reset it to discovery so that we can redeploy the node in the current environment. Usually this happens because a hardware error or some other issue caused the node to fail deployment. However, when you force the node back into discovery, its error state is not reset, so it cannot be redeployed without going through the unnecessary process of deleting/removing the node (which causes it to lose all of its configuration).

Issues:

1) If you force a node into the bootstrap discovery image, the node should be reset back to the discovered state. If you subsequently start the deployment, it should re-provision and bootstrap the node as if it were being added to the cluster for the first time.

2) You should be able to reset a node without deleting it from the DB and cluster (i.e. it should keep its role, ID, and interface mappings).
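The two issues together can be sketched as an in-place reset that keeps the node's identity and configuration but marks it for a fresh provisioning run. This is a minimal sketch only: the `Node` record, the `pending_addition` flag and both functions are hypothetical illustrations, not Nailgun's actual model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical node record: only the fields Issue 2 asks to preserve,
# plus the flag needed to schedule a fresh provisioning run.
@dataclass
class Node:
    node_id: int
    status: str
    roles: List[str] = field(default_factory=list)
    # NIC name -> networks assigned to it (the "interface mappings").
    interfaces: Dict[str, List[str]] = field(default_factory=dict)
    pending_addition: bool = False

def reset_in_place(node: Node) -> Node:
    """Issue 1: send the node back to "discover" and mark it for provisioning.
    Issue 2: leave its ID, roles and interface mappings untouched."""
    node.status = "discover"
    node.pending_addition = True
    return node

def nodes_to_provision(nodes: List[Node]) -> List[int]:
    """On the next deployment, re-provision every node marked for addition,
    as if it were being added to the cluster for the first time."""
    return [n.node_id for n in nodes if n.pending_addition]
```

The reset touches only status and scheduling flags, so nothing the administrator configured has to be re-entered.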

Changed in fuel:
status: Incomplete → Triaged
Dmitry Ilyin (idv1985)
summary: - nodes don't return to discovered state
+ [nailgun] nodes don't return to discovered state
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 5.1 → 6.0
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/132809

Dima Shulyak (dshulyak) wrote :

It is possible to provide the desired functionality by refactoring the ResetEnvironment task.
It should be possible to specify which nodes should be reset, and to reset everything only when no nodes are given.
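The node-selection rule described above can be sketched as follows. `select_nodes_to_reset` is a hypothetical helper, not part of the real ResetEnvironment task. Rejecting IDs outside the cluster is an added assumption.

```python
from typing import Iterable, List, Optional

def select_nodes_to_reset(cluster_node_ids: Iterable[int],
                          requested_ids: Optional[Iterable[int]] = None) -> List[int]:
    """Return the IDs of the nodes a refactored reset task should touch.

    With no requested IDs, behave like the existing environment reset and
    reset every node in the cluster; otherwise reset only the requested ones.
    """
    cluster = list(cluster_node_ids)
    requested = set(requested_ids or [])
    if not requested:
        return cluster
    unknown = requested - set(cluster)
    if unknown:
        raise ValueError(f"nodes not in this cluster: {sorted(unknown)}")
    # Keep the cluster's ordering for the selected nodes.
    return [node_id for node_id in cluster if node_id in requested]
```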

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/132809
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=0ff69ab5fde7bba3b2cb75a5efa20027c01781bf
Submitter: Jenkins
Branch: master

commit 0ff69ab5fde7bba3b2cb75a5efa20027c01781bf
Author: Dima Shulyak <email address hidden>
Date: Wed Nov 5 11:33:26 2014 +0200

    Refactor reset_to_discover procedure

    Added reset_to_discover method which should be used
    when node resetted to discovery state.
    - Reconfigure volumes
    - Move roles to pendings roles
    - Remove assigned ip address
    - Set pending addition

    Change-Id: I07f1d12f904e8faa76e35f00a49a978a6643eb20
    Closes-Bug: 1389572
    Related-Bug: 1273006
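For illustration, the four steps listed in the commit message can be sketched as below. This is not the code from that commit: `Node`, its fields and `default_volumes` are hypothetical stand-ins. The sketch assumes the status change itself is left to the caller, since the commit message does not list it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical node record with only the fields the commit message mentions.
@dataclass
class Node:
    roles: List[str] = field(default_factory=list)
    pending_roles: List[str] = field(default_factory=list)
    ip_addrs: List[str] = field(default_factory=list)
    volumes: List[Dict[str, object]] = field(default_factory=list)
    pending_addition: bool = False

def default_volumes(node: Node) -> List[Dict[str, object]]:
    """Hypothetical stand-in for regenerating the default disk layout."""
    return [{"name": "os", "size_gb": 50}]

def reset_to_discover(node: Node) -> Node:
    # Reconfigure volumes: drop the old layout and regenerate the default one.
    node.volumes = default_volumes(node)
    # Move roles to pending roles, keeping order and avoiding duplicates.
    node.pending_roles = node.roles + [
        r for r in node.pending_roles if r not in node.roles]
    node.roles = []
    # Remove assigned IP addresses (assumed to be reallocated on redeployment).
    node.ip_addrs = []
    # Set pending addition so the next deployment provisions the node again.
    node.pending_addition = True
    return node
```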

Changed in fuel:
milestone: 6.0 → 6.1
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: Triaged → Confirmed
Dmitry Pyzhov (dpyzhov)
tags: added: feature
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 6.1 → next
Dmitry Pyzhov (dpyzhov)
tags: removed: nailgun
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Maciej Kwiek (maciej-iai)
Changed in fuel:
assignee: Maciej Kwiek (maciej-iai) → Fuel Python Team (fuel-python)
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: next → 7.0
Vladimir Sharshov (vsharshov) wrote :

Known issue. Moving to 8.0

tags: added: known-issue
Changed in fuel:
status: Confirmed → Won't Fix
tags: removed: feature
Ihor Kalnytskyi (ikalnytskyi) wrote :

Alexey Shtokolov, why did you remove the "feature" tag? It's not a bug: we never announced such functionality, so this is a small enhancement. There should be a new handler for resetting a node, which changes the node status to "discover" and sends Astute a message to remove the node from Cobbler and reboot it.

So I'm assigning the "feature" tag again.
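The proposed handler can be sketched as below. Everything here is hypothetical. A plain dict stands in for the Nailgun DB, and the `publish` callable stands in for the RPC channel to Astute. The message layout, including the `method`, `respond_to` and `actions` values, is invented for illustration and is not Astute's real API.

```python
import json
import uuid
from typing import Callable, Dict

def reset_node_handler(nodes: Dict[int, dict], node_id: int,
                       publish: Callable[[str], None]) -> str:
    """Sketch of the proposed "reset node" handler; returns the task UUID."""
    node = nodes.get(node_id)
    if node is None:
        raise KeyError(f"node {node_id} not found")
    # 1. Change the node status to "discover".
    node["status"] = "discover"
    # 2. Ask Astute to remove the node from Cobbler and reboot it.
    task_uuid = str(uuid.uuid4())
    message = {
        "method": "reset_node",            # hypothetical Astute method
        "respond_to": "reset_node_resp",   # hypothetical callback name
        "args": {
            "task_uuid": task_uuid,
            "nodes": [{"uid": str(node_id), "slave_name": node["slave_name"]}],
            "actions": ["remove_from_cobbler", "reboot"],  # hypothetical
        },
    }
    publish(json.dumps(message))
    return task_uuid
```

Following the comment, the status is flipped before Astute confirms anything. A more cautious variant would set "discover" only in the response handler, once Cobbler removal and the reboot have succeeded.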

tags: added: feature
Alexey Shtokolov (ashtokolov) wrote :

Igor Kalnitsky, this story looks like the re-installation feature implemented by the Telco team in 7.0.
I've asked Michael Polenchuk to close this bug if it was covered by that feature.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 7.0 → 8.0
status: Won't Fix → Confirmed
no longer affects: fuel/8.0.x
Changed in fuel:
status: Confirmed → Triaged
Dmitriy Novakovskiy (dnovakovskiy) wrote :

So this appears to be a feature for the following UX:

1. User decides to bring a node back to the "discovered" state (from any currently applied one, incl. "provisioned" and "error")
2. User uses the CLI or API to trigger this operation for a given node ID
3. Fuel removes the node from the DB (and, if necessary, from the MOS cloud it's currently assigned to) and prompts the user to reboot it in PXE boot mode[1]

[1] Optionally, attempt to reboot it remotely if the node is accessible

Given that, I suggest we add this as a CLI-only option, with a disclaimer: "Please be aware that any resources currently running on a selected node (VMs, Volumes etc) will be lost"
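The proposed CLI-only flow can be sketched with Python's standard `argparse`. The command name `fuel-node-reset`, its `--node`/`--yes` flags, and the injected `reset` callback (which would remove the node from the DB and prompt for or trigger the PXE reboot) are hypothetical and not part of the real Fuel CLI.

```python
import argparse
from typing import Callable, List, Optional

DISCLAIMER = ("Please be aware that any resources currently running on a "
              "selected node (VMs, Volumes etc) will be lost")

def main(argv: Optional[List[str]] = None,
         confirm: Callable[[str], str] = input,
         reset: Callable[[int], None] = lambda node_id: None) -> int:
    """Hypothetical CLI: show the disclaimer, confirm, then reset one node."""
    parser = argparse.ArgumentParser(prog="fuel-node-reset")
    parser.add_argument("--node", type=int, required=True, help="node ID")
    parser.add_argument("--yes", action="store_true",
                        help="skip the interactive confirmation")
    args = parser.parse_args(argv)
    print(DISCLAIMER)
    if not args.yes and confirm("Continue? [y/N] ").strip().lower() != "y":
        print("Aborted.")
        return 1
    reset(args.node)
    print(f"Node {args.node} reset; reboot it in PXE boot mode.")
    return 0
```

Passing `confirm` and `reset` as parameters keeps the sketch testable. For example, `main(["--node", "5", "--yes"])` prints the disclaimer and calls `reset(5)` without prompting.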

Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Alexander Kislitsky (akislitsky) wrote :

We passed SCF in 8.0. Moving the bug to 9.0.

Changed in fuel:
milestone: 8.0 → 9.0
Amine (amine-bouabdallah) wrote :

We stumbled into this bug when we wanted to remove one compute node for redeployment.
Once the compute node has been removed, then rebooted and bootstrapped, it no longer appears in the Fuel Web UI.

Is there a workaround for this bug until it is fully addressed?

Thank you.

We are using MOS 7.0.
