Deployment was stuck as one node was stuck on reboot

Bug #1438933 reported by Sergii Golovatiuk
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Łukasz Oleś

Bug Description

During a large deployment, one node was stuck on reboot for 20 minutes:

root@node-16:~# uptime -s
2015-03-31 16:58:09

However, in astute.log I see:

2015-03-31T18:27:32 debug: [535] 135c09a6-b082-40c9-9eaf-1da3d3af4e22: MC agent 'puppetd', method 'enable', results: {:sender=>"25", :statuscode=>0, :statusmsg=>"OK", :data=>{:output=>"Already enabled"}}
2015-03-31T18:27:33 debug: [535] Retry #1 to run mcollective agent on nodes: '16'

which means the reboot was issued somewhere around 16:25-26.

We should add failure tolerance for this step, as we already do for provisioning.

Changed in fuel:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Łukasz Oleś (loles)
milestone: none → 6.1
Łukasz Oleś (loles) wrote :

Deployment fails if pre_deployment_action fails on any node. It doesn't fail during pre_deploy action and during deploy. I will prepare a fix.

Łukasz Oleś (loles)
Changed in fuel:
status: Triaged → Won't Fix
status: Won't Fix → In Progress
Bogdan Dobrelya (bogdando) wrote :

@Lukasz, do you have an update, or a fix on review? If there is any WIP, could you please link it?

summary: - Deployment was stuck as one one was stuck on reboot
+ Deployment was stuck as one node was stuck on reboot
Bogdan Dobrelya (bogdando) wrote :

Please correct me if I'm wrong, but this issue should be fixed within the scope of https://blueprints.launchpad.net/fuel/+spec/200-nodes-support, hence it is superseded and Won't Fix.

Changed in fuel:
status: In Progress → Won't Fix
Łukasz Oleś (loles) wrote :

It should, but we missed the pre-deployment actions. I'm working on it.

Changed in fuel:
status: Won't Fix → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/183081

tags: added: module-astute
Changed in fuel:
assignee: Łukasz Oleś (loles) → Evgeniy L (rustyrobot)
Evgeniy L (rustyrobot)
Changed in fuel:
assignee: Evgeniy L (rustyrobot) → Łukasz Oleś (loles)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/183081
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=b09729c64b695b2e6fcc88c31843321759ec45d5
Submitter: Jenkins
Branch: master

commit b09729c64b695b2e6fcc88c31843321759ec45d5
Author: Łukasz Oleś <email address hidden>
Date: Wed May 13 03:19:16 2015 +0200

    Remove nodes which failed to provision

    Currently during provision some nodes may fail but provision
    will success. This failed nodes are causing pre deployment actions
    to fail.
    This change removes failed nodes from deployment info and from all tasks.
    It is safe to do because currently we allow only compute nodes to fail.

    Change-Id: I5c3b677ca49ad9d2fd93a6ca1f524edc91e0766d
    Closes-bug: #1438933
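
The approach described in the commit message above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual fuel-astute change (Astute itself is written in Ruby). The function name `remove_failed_nodes`, the `uid`, `role` and `uids` keys, and the choice to drop tasks that are left with no target nodes are all assumptions made for this example.

```python
from copy import deepcopy


def remove_failed_nodes(deployment_info, tasks, failed_uids):
    """Return copies of deployment_info and tasks without the failed nodes.

    Hypothetical data shapes (assumed for this sketch only):
      deployment_info: list of dicts with "uid" and "role" keys
      tasks:           list of dicts with a "uids" list of target nodes
    """
    failed = {str(uid) for uid in failed_uids}

    # Only compute nodes are allowed to fail; anything else aborts.
    critical = [
        node["uid"]
        for node in deployment_info
        if str(node["uid"]) in failed and node.get("role") != "compute"
    ]
    if critical:
        raise RuntimeError(f"non-compute nodes failed to provision: {critical}")

    # Drop the failed nodes from the deployment info.
    kept_info = [
        deepcopy(node)
        for node in deployment_info
        if str(node["uid"]) not in failed
    ]

    # Drop the failed nodes from every task's target list.
    kept_tasks = []
    for task in tasks:
        uids = [uid for uid in task["uids"] if str(uid) not in failed]
        if uids:  # assumption: a task with no remaining targets is skipped
            kept_tasks.append({**task, "uids": uids})

    return kept_info, kept_tasks
```

With node 16 marked as failed, a pre-deployment task that targeted only node 16 would be dropped, and tasks that targeted several nodes would continue on the remaining ones, so a single stuck compute node would no longer fail the whole deployment.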

Changed in fuel:
status: In Progress → Fix Committed