FUEL doesn't check nodes via mcollective

Bug #1373988 reported by Gleb
This bug affects 1 person
Affects               Status          Importance   Assigned to         Milestone
Fuel for OpenStack    Fix Released    High         Vladimir Sharshov
  5.1.x               Fix Committed   High         Vladimir Sharshov

Bug Description

I think Fuel must check whether nodes can be managed via mcollective before creating a task.
I had to use a non-standard bootstrap image, and mcollective didn't work with it, but Fuel couldn't detect the problem.
Everything looked fine in the UI until I started the deployment.

I suggest that Fuel run 'mco ping' before creating a new task,
and that it reject the task if any nodes aren't reachable.
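A minimal sketch of the suggested pre-flight check, in Python. It assumes that each reply line printed by 'mco ping' looks like "<identity>   time=<ms> ms", followed by a statistics footer, and that those identities match the node names the task will act on. The function names are hypothetical and are not part of Fuel or Astute.

```python
import re

# Assumed shape of one 'mco ping' reply line, e.g. "node-1    time=37.23 ms".
REPLY_RE = re.compile(r"^(\S+)\s+time=([\d.]+) ms$")


def responding_nodes(ping_output: str) -> set[str]:
    """Collect the identities that answered, skipping the statistics footer."""
    nodes = set()
    for line in ping_output.splitlines():
        match = REPLY_RE.match(line.strip())
        if match:
            nodes.add(match.group(1))
    return nodes


def unreachable_nodes(expected: set[str], ping_output: str) -> set[str]:
    """Nodes the task needs that did not reply to the ping."""
    return expected - responding_nodes(ping_output)


def validate_task(expected: set[str], ping_output: str) -> None:
    """Reject the task up front if any required node is unreachable."""
    missing = unreachable_nodes(expected, ping_output)
    if missing:
        raise RuntimeError(f"unreachable via mcollective: {sorted(missing)}")
```

On a real Fuel master, the output would come from running 'mco ping' (for example with subprocess.run). The parsing is kept separate from the command so it can be tested without a live mcollective setup.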

Tags: astute
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
milestone: none → 6.0
importance: Undecided → High
Vladimir Kuklin (vkuklin) wrote :

Actually, the problem was with the mcollective credentials, so let's add a simple check that runs '/bin/true' on the nodes to verify that everything works.

Changed in fuel:
status: New → Confirmed
assignee: Fuel Library Team (fuel-library) → Fuel Astute Team (fuel-astute)
Matthew Mosesohn (raytrac3r) wrote :

The check we already use in Astute is just fine. The one that checks nailgun_systemtype over mcollective is suitable and already defined.
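A check of this kind can be sketched as comparing the system type each node reports over mcollective with the type the operation expects. This sketch makes three assumptions. Replies arrive as a mapping from node id to the reported type string. Nodes that never answered are absent from that mapping. Nodes running the bootstrap image report "bootstrap". The function name is hypothetical.

```python
def check_system_types(
    expected_ids: list[str],
    replies: dict[str, str],
    wanted_type: str = "bootstrap",
) -> tuple[list[str], list[str]]:
    """Split problem nodes into those that never answered and those in the wrong state.

    replies maps node id -> system type reported over mcollective
    (hypothetical shape; a node that did not answer has no entry).
    """
    # No reply at all: transport or credentials problem.
    silent = sorted(n for n in expected_ids if n not in replies)
    # Replied, but is not running the expected image.
    wrong_type = sorted(
        n for n in expected_ids
        if n in replies and replies[n].strip() != wanted_type
    )
    return silent, wrong_type
```

A silent node points at a transport or credentials problem like the one in this bug. A wrong type means the node answered but is not running the expected image. In either case, the task can be refused before anything is erased.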

tags: added: astute
Vladimir Sharshov (vsharshov) wrote :

For the provisioning operation, the first mcollective call is 'erase_node' from the mcollective 'erase_node' agent.
At this step we do not analyze problems such as inaccessible or failed nodes. In the first minutes after the deployment starts, the only error we can get is a problem with rebooting. But even without the mcollective erase, a node can reboot successfully and then either install the OS, fail at that step, or simply reboot into the bootstrap stage again.

To solve this, we just need to stop the deployment if any node has a problem with erasing. That should solve the problem.

What do you think?
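A minimal sketch of the proposed error handling. After the erase step, every node in the task must have reported success, and a single failed or silent node stops the whole provisioning run. EraseResult, DeploymentAborted and check_erase_results are hypothetical names, and the actual Astute change may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class EraseResult:
    """Hypothetical per-node result of the erase_node step."""
    node_id: str
    ok: bool
    error: str = ""


class DeploymentAborted(Exception):
    """Raised so the caller stops provisioning instead of going on to reboot nodes."""


def check_erase_results(node_ids: list[str], results: list[EraseResult]) -> None:
    by_id = {r.node_id: r for r in results}
    problems: dict[str, str] = {}
    for node_id in node_ids:
        result = by_id.get(node_id)
        if result is None:
            # No reply at all, e.g. a bootstrap image where mcollective is broken.
            problems[node_id] = "no reply from erase_node agent"
        elif not result.ok:
            problems[node_id] = result.error or "erase failed"
    if problems:
        # Fail the whole run rather than continuing with the nodes that succeeded.
        raise DeploymentAborted(problems)
```

Failing on the first step that touches every node surfaces a broken bootstrap image within seconds. Otherwise, it would only show up later as a confusing reboot or installation error.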

Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Vladimir Sharshov (vsharshov)
Matthew Mosesohn (raytrac3r) wrote :

+1, error handling in erase_node is a good approach.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/135581

Changed in fuel:
status: Confirmed → In Progress
Evgeniy L (rustyrobot) wrote :

"Non-standard bootstrap image": I'm not sure this is really High importance, or that it should be backported to 5.1.1.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/135581
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=84cca6c1789552df6d9a775faad026ad85abe8d5
Submitter: Jenkins
Branch: master

commit 84cca6c1789552df6d9a775faad026ad85abe8d5
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Wed Nov 19 15:08:32 2014 +0300

    Check nodes availability using mcollective

    If we talk about provisioning operation,
    first mcollective operation will be 'erase_node'
    from mcollective 'erase_node' agent.

    We just need stop deployment if any of nodes
    got problem with erasing.

    This fix help to detect problem with bootstrap
    image as early as possible.

    Change-Id: I3047f7cc097eda5e63099f0299f7696c15fba10f
    Closes-Bug: #1373988

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/136397

Dmitry Borodaenko (angdraug) wrote :

Yes, we need this in 5.1.1. A non-standard bootstrap image is what we have to use in production every time we come across hardware that, for whatever reason, doesn't work correctly with our stock bootstrap image.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/5.1)

Reviewed: https://review.openstack.org/136397
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=dade74af41d4972fe05a1c16ae1db2a2e60c6715
Submitter: Jenkins
Branch: stable/5.1

commit dade74af41d4972fe05a1c16ae1db2a2e60c6715
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Wed Nov 19 15:08:32 2014 +0300

    Check nodes availability using mcollective

    If we talk about provisioning operation,
    first mcollective operation will be 'erase_node'
    from mcollective 'erase_node' agent.

    We just need stop deployment if any of nodes
    got problem with erasing.

    This fix help to detect problem with bootstrap
    image as early as possible.

    Change-Id: I3047f7cc097eda5e63099f0299f7696c15fba10f
    Closes-Bug: #1373988
    (cherry picked from commit 84cca6c1789552df6d9a775faad026ad85abe8d5)

Anastasia Palkina (apalkina) wrote :

Verified on ISO #49 for CentOS and Ubuntu

"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "auth_required": true, "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-09_22-41-06", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "49", "api": "1.0", "nailgun_sha": "22bd43b89a17843f9199f92d61fc86cb0f8772f1", "production": "docker", "fuelmain_sha": "3aab16667f47dd8384904e27f70f7a87ba15f4ee", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}}}, "fuellib_sha": "2c99931072d951301d395ebd5bf45c8d401301bb"}

Changed in fuel:
status: Fix Committed → Fix Released