Bug #1435610 “Fault Tolerance is broken in Task-based Deployment...” : Bugs : Fuel for OpenStack

Andrew Woodward (xarses) wrote on 2015-03-24:

#1

snapshot https://drive.google.com/file/d/0B6PBRlkJOUbYQnRCVnJOUktwSEU/view?usp=sharing

Dima Shulyak (dshulyak) wrote on 2015-03-25:

#2

There is a comment from Vova S. exactly on this topic:

https://github.com/stackforge/fuel-astute/blob/master/lib/astute/deployment_engine/granular_deployment.rb#L221-223

I think this can be improved in the following way: we should fail only the nodes that have tasks assigned to them after the failed one.
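
To make the proposal concrete, here is a minimal sketch in Python (Astute itself is written in Ruby). The task list shape and the function name `nodes_to_fail` are hypothetical and only loosely modelled on the hook format used in this thread; this is not Astute's actual code.

```python
def nodes_to_fail(tasks, failed_index):
    """Return uids of nodes that should be marked as failed.

    `tasks` is an ordered list of dicts with a 'uids' list (hypothetical shape).
    A node is failed if the failed task ran on it, or if it still has
    tasks scheduled after the failed one.
    """
    affected = set(tasks[failed_index]["uids"])  # nodes where the task failed
    for task in tasks[failed_index + 1:]:
        affected.update(task["uids"])  # nodes that cannot finish their remaining tasks
    return sorted(affected)


tasks = [
    {"name": "task_a", "uids": ["1", "2", "3"]},
    {"name": "task_b", "uids": ["1"]},
    {"name": "task_c", "uids": ["1", "2"]},
]
# task_b fails: node 3 has no remaining work, so it is left alone.
print(nodes_to_fail(tasks, failed_index=1))  # ['1', '2']
```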

Dima Shulyak (dshulyak) on 2015-03-27

Changed in fuel:
importance:	High → Medium

Dmitry Pyzhov (dpyzhov) on 2015-03-27

tags:

added: feature-tasks

Dmitry Pyzhov (dpyzhov) on 2015-03-27

tags:

added: module-tasks
removed: feature-tasks

Dima Shulyak (dshulyak) on 2015-04-03

Changed in fuel:
milestone:	6.1 → 7.0

Dmitry Pyzhov (dpyzhov) on 2015-04-23

Changed in fuel:
importance:	Medium → High
assignee:	Fuel Python Team (fuel-python) → Vladimir Sharshov (vsharshov)
milestone:	7.0 → 6.1

Vladimir Sharshov (vsharshov) on 2015-04-30

Changed in fuel:
status:	Confirmed → In Progress

Vladimir Sharshov (vsharshov) wrote on 2015-04-30:

#3

Guys, at the moment we cannot change this behavior because of our reporting logic. Nodes that have already reached the 'ready' status, except the primary-controller, will be excluded from future tasks. In our case this means that post tasks following a failed task can run without the necessary nodes, because we have already informed Nailgun about the 'ready' status of those nodes.

Example:

Nodes: 1,2,3
post_hook_cirros (runs on node 1)
post_hook_host (runs on all nodes (1,2,3))

If post_hook_cirros fails and we change our code, we get the following on re-run:

post_hook_cirros (runs on node 1, because this node is marked as failed)
post_hook_host (runs on node 1 only, not on 2 and 3, because those nodes were already marked as ready in another task)

All we can do to help the user is show a detailed message about the failed hook, and we already do this (sorry, I did not take a screenshot for the cirros error, but I did for another similar error):
https://www.dropbox.com/s/ybgij3zdmm1yvw2/Nailgun-info-error.png?dl=0

I suggest moving it to 7.0.
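
The re-run example in this comment can be reproduced with a small Python sketch. The function `rerun_targets`, the status strings and the `always_include` parameter (standing in for the primary-controller exception) are assumptions made for illustration; they are not Nailgun's or Astute's real interfaces.

```python
def rerun_targets(hooks, node_status, always_include=()):
    """Drop nodes already reported as 'ready' from each hook's node list.

    `hooks` is an ordered list of (hook_name, uids) pairs; `node_status`
    maps uid -> status. Both shapes are hypothetical.
    """
    result = {}
    for name, uids in hooks:
        result[name] = [
            uid for uid in uids
            if node_status[uid] != "ready" or uid in always_include
        ]
    return result


hooks = [("post_hook_cirros", [1]), ("post_hook_host", [1, 2, 3])]
status = {1: "error", 2: "ready", 3: "ready"}  # state after the failed first run
print(rerun_targets(hooks, status))
# {'post_hook_cirros': [1], 'post_hook_host': [1]}
```

post_hook_host silently loses nodes 2 and 3 on the re-run, which is the problem described above.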

Vladimir Sharshov (vsharshov) wrote on 2015-04-30:

#4

Potential solution: we can change Nailgun's behavior and always send post hooks for all nodes in the cluster regardless of node status, but we need to check, and change where necessary, all post hook tasks, because they currently assume that they run only on the nodes being deployed (except the host hook).

Vladimir Sharshov (vsharshov) wrote on 2015-04-30:

#5

Another solution is tied to a possible future change in another bug, https://bugs.launchpad.net/fuel/+bug/1439776. The two are closely connected.

Vladimir Sharshov (vsharshov) on 2015-04-30

Changed in fuel:
status:	In Progress → Confirmed

Dmitry Pyzhov (dpyzhov) on 2015-04-30

tags:	added: feature
Changed in fuel:
milestone:	6.1 → 7.0

Mike Scherbakov (mihgen) wrote on 2015-04-30:

#6

This is clearly not a feature. This is a bug. It was not designed to work this way. If we can't fix it in 6.1, let's see if we need to provide a piece of documentation for this.

tags:

removed: feature

Aviram Bar-Haim (aviramb) wrote on 2015-05-12:

#7

Upload_cirros.rb fails for us at the end of CentOS installations using ISOs 361 and 395.
Do we have an open bug for this issue?

Failure message:
Deployment has failed. Method granular_deploy. Failed to execute hook 'shell'.
---
priority: 800
fail_on_error: true
type: shell
uids:
- '4'
parameters:
  retries: 3
  cmd: ruby /etc/puppet/modules/osnailyfacter/modular/astute/upload_cirros.rb
  timeout: 180
  interval: 20
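
For readers unfamiliar with the hook parameters above, the following Python sketch shows one plausible reading of `retries`, `timeout` and `interval`. It is an assumption, not Astute's implementation: in particular, whether `retries` counts total attempts or extra attempts, and how a timeout is treated, are guesses. The function name `run_shell_hook` is hypothetical.

```python
import subprocess
import time


def run_shell_hook(cmd, retries, timeout, interval):
    """Run `cmd` up to `retries` times; return True on the first zero exit code.

    Assumed semantics: each attempt may take at most `timeout` seconds,
    and failed attempts are separated by `interval` seconds.
    """
    for attempt in range(1, retries + 1):
        try:
            completed = subprocess.run(cmd, shell=True, timeout=timeout)
            if completed.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # treat a timeout like a failed attempt
        if attempt < retries:
            time.sleep(interval)
    return False


# A harmless command in place of upload_cirros.rb; succeeds on the first attempt.
print(run_shell_hook("exit 0", retries=3, timeout=180, interval=20))  # True
```

With `fail_on_error: true`, a `False` result here would correspond to the "Failed to execute hook 'shell'" message quoted above.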

Vladimir Sharshov (vsharshov) wrote on 2015-07-22:

#8

Still relevant, and https://bugs.launchpad.net/fuel/+bug/1435610/comments/3 is the best explanation of why we cannot change it at the moment.

tags:	added: covered-by-bp
Changed in fuel:
assignee:	Vladimir Sharshov (vsharshov) → nobody
assignee:	nobody → Fuel Python Team (fuel-python)

Vladimir Sharshov (vsharshov) wrote on 2015-07-22:

#9

Covered by: https://blueprints.launchpad.net/fuel/+spec/progress-bar-based-on-tasks

Vladimir Sharshov (vsharshov) wrote on 2015-08-06:

#10

Moving to 8.0. This is an architectural limitation and cannot be fixed without https://blueprints.launchpad.net/fuel/+spec/progress-bar-based-on-tasks

tags:	added: known-issue
Changed in fuel:
status:	Confirmed → Won't Fix

Nastya Urlapova (aurlapova) on 2015-08-10

tags:

added: qa-agree-8.0

Ihor Kalnytskyi (ikalnytskyi) wrote on 2015-09-22:

#11

Add 'feature' tag, since it's covered by a blueprint and requires changes.

tags:

added: feature

Dmitry Pyzhov (dpyzhov) on 2015-10-12

Changed in fuel:
milestone:	7.0 → 8.0
no longer affects:	fuel/8.0.x

Dmitry Pyzhov (dpyzhov) on 2015-10-22

tags:

added: area-python

Alexey Shtokolov (ashtokolov) on 2016-01-20

Changed in fuel:
milestone:	8.0 → 9.0

Fuel Devops McRobotson (fuel-devops-robot) on 2016-04-19

Changed in fuel:
milestone:	9.0 → 10.0

Dmitry Pyzhov (dpyzhov) on 2016-04-27

Changed in fuel:
assignee:	Fuel Python Team (fuel-python) → Fuel Toolbox (fuel-toolbox)

Alexey Shtokolov (ashtokolov) on 2016-05-11

Changed in fuel:
assignee:	Fuel Toolbox (fuel-toolbox) → Vladimir Sharshov (vsharshov)
summary:	- upload_cirros failed and marked all nodes failed
	+ Fault Tolerance is broken in Task-based Deployment

OpenStack Infra (hudson-openstack) wrote on 2016-05-24: Related fix proposed to fuel-astute (master)

#12

Related fix proposed to branch: master
Review: https://review.openstack.org/320605

OpenStack Infra (hudson-openstack) on 2016-05-31

Changed in fuel:
assignee:	Vladimir Sharshov (vsharshov) → Bulat Gaifullin (bgaifullin)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-05-31: Fix proposed to fuel-library (master)

#13

Fix proposed to branch: master
Review: https://review.openstack.org/323440

Changed in fuel:
assignee:	Bulat Gaifullin (bgaifullin) → Vladimir Kuklin (vkuklin)

OpenStack Infra (hudson-openstack) on 2016-06-01

Changed in fuel:
assignee:	Vladimir Kuklin (vkuklin) → Bulat Gaifullin (bgaifullin)

OpenStack Infra (hudson-openstack) on 2016-06-01

Changed in fuel:
assignee:	Bulat Gaifullin (bgaifullin) → Vladimir Kuklin (vkuklin)

Vladimir Sharshov (vsharshov) on 2016-06-01

Changed in fuel:
assignee:	Vladimir Kuklin (vkuklin) → Vladimir Sharshov (vsharshov)

Alexey Shtokolov (ashtokolov) wrote on 2016-06-01:

#14

ETA: 6/07

OpenStack Infra (hudson-openstack) wrote on 2016-06-02: Related fix proposed to fuel-qa (master)

#15

Related fix proposed to branch: master
Review: https://review.openstack.org/324808

OpenStack Infra (hudson-openstack) wrote on 2016-06-03: Fix merged to fuel-library (master)

#16

Reviewed: https://review.openstack.org/323440
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=a7a6b04d3efa0218f5667eda02418793722b3aa0
Submitter: Jenkins
Branch: master

commit a7a6b04d3efa0218f5667eda02418793722b3aa0
Author: Vladimir Kuklin <email address hidden>
Date: Tue May 31 18:10:20 2016 +0300

Add fault tolerance to task groups

    This commit is a part of defining fault tolerance groups
    for deployment to allow task executor to detect whether
    we should stop the deployment and exit in case of failure of
    tasks belonging to particular groups.

It allows a user to specify critical nodes (e.g. by running

Related to Change-Id I1969b953eca667c09248a6b67ffee37bfd20f474 and
Ica2a4ae64b4dfa4f7fccfbc95108d1412c40dc3f

Change-Id: Id866cd578c7c76dd5a1dfc43fb219e1c2ecd4abd
Partial-bug: #1435610

OpenStack Infra (hudson-openstack) on 2016-06-03

Changed in fuel:
assignee:	Vladimir Sharshov (vsharshov) → Bulat Gaifullin (bgaifullin)

OpenStack Infra (hudson-openstack) wrote on 2016-06-06: Fix proposed to fuel-library (master)

#17

Fix proposed to branch: master
Review: https://review.openstack.org/325886

Changed in fuel:
assignee:	Bulat Gaifullin (bgaifullin) → Vladimir Kuklin (vkuklin)

OpenStack Infra (hudson-openstack) on 2016-06-06

Changed in fuel:
assignee:	Vladimir Kuklin (vkuklin) → Bulat Gaifullin (bgaifullin)

OpenStack Infra (hudson-openstack) on 2016-06-06

Changed in fuel:
assignee:	Bulat Gaifullin (bgaifullin) → Vladimir Sharshov (vsharshov)

OpenStack Infra (hudson-openstack) wrote on 2016-06-06: Fix merged to fuel-web (master)

#18

Reviewed: https://review.openstack.org/323183
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=ebe80dc4ef0dc9d0d79216f787d2357c9a03fd1b
Submitter: Jenkins
Branch: master

commit ebe80dc4ef0dc9d0d79216f787d2357c9a03fd1b
Author: Bulat Gaifullin <email address hidden>
Date: Mon May 30 19:51:18 2016 +0300

Added fault_tolerance_group to deployment metadata

    This property contains list of groups, that is built from
    tasks with type 'group' and each task may contain property
    fault_tolerance, that shall be moved from openstack.yaml
    to deployment tasks.
    For plugins this attribute is filled from roles_metadata
    for all tasks with type group (for backward compatibility).

    DocImpact
    Partial-Bug: 1435610
    Change-Id: I1969b953eca667c09248a6b67ffee37bfd20f474
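
As a rough illustration of the commit message above, the Python sketch below collects a list of groups from tasks of type 'group'. The dictionary keys (`id`, `roles`, `node_ids`), the `nodes_by_role` mapping, the function name and the default used when a task has no `fault_tolerance` are all assumptions made for this example; the real Nailgun serialization may look different.

```python
def build_fault_tolerance_groups(tasks, nodes_by_role, default=0):
    """Build one group entry per task of type 'group' (hypothetical shapes).

    `default` is a placeholder for tasks without a fault_tolerance property;
    the actual default is not stated in this thread.
    """
    groups = []
    for task in tasks:
        if task.get("type") != "group":
            continue  # only group tasks carry fault_tolerance
        node_ids = sorted(
            {uid for role in task.get("roles", [])
             for uid in nodes_by_role.get(role, [])}
        )
        groups.append({
            "name": task["id"],
            "node_ids": node_ids,
            "fault_tolerance": task.get("fault_tolerance", default),
        })
    return groups


tasks = [
    {"id": "primary-controller", "type": "group",
     "roles": ["primary-controller"], "fault_tolerance": 0},
    {"id": "compute", "type": "group", "roles": ["compute"], "fault_tolerance": 1},
    {"id": "netconfig", "type": "puppet"},
]
nodes_by_role = {"primary-controller": ["1"], "compute": ["2", "3"]}
for group in build_fault_tolerance_groups(tasks, nodes_by_role):
    print(group)
```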

OpenStack Infra (hudson-openstack) wrote on 2016-06-06: Fix proposed to fuel-web (master)

#19

Fix proposed to branch: master
Review: https://review.openstack.org/326086

Changed in fuel:
assignee:	Vladimir Sharshov (vsharshov) → Bulat Gaifullin (bgaifullin)

OpenStack Infra (hudson-openstack) wrote on 2016-06-06: Fix proposed to fuel-web (stable/mitaka)

#20

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326088

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix merged to fuel-library (master)

#21

Reviewed: https://review.openstack.org/325886
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=54002b3308f44a1219e80df2d4f803cda4668df3
Submitter: Jenkins
Branch: master

commit 54002b3308f44a1219e80df2d4f803cda4668df3
Author: Vladimir Kuklin <email address hidden>
Date: Tue May 31 18:10:20 2016 +0300

Adjust fault tolerance for task groups to zero tolerance for critical roles

Set fault tolerance to 0 for critical deployment groups

Related to Change-Id I1969b953eca667c09248a6b67ffee37bfd20f474 and
Ica2a4ae64b4dfa4f7fccfbc95108d1412c40dc3f

Change-Id: I5197adc796603dfb40cf1efa57427344b358d353
Partial-bug: #1435610

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix proposed to fuel-library (stable/mitaka)

#22

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326317

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Related fix proposed to fuel-qa (stable/mitaka)

#23

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326447

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Related fix merged to fuel-qa (master)

#24

Reviewed: https://review.openstack.org/324808
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=df1a70a4ce6af9944e8925123f5f75e0934ae201
Submitter: Jenkins
Branch: master

commit df1a70a4ce6af9944e8925123f5f75e0934ae201
Author: Alexander Kurenyshev <email address hidden>
Date: Thu Jun 2 17:56:53 2016 +0300

Add new check for Operational cluster status

    We have a new behaviour when deployment task
    will be in a ready state even when some
    non-important nodes are in an Error state.
    This path adds check for cluster that it's
    in Operational state

Related-Bug: #1435610
Change-Id: I53175e4a84f2fbeedc056e39f2976c5f1a690fc1

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix merged to fuel-astute (master)

#25

Reviewed: https://review.openstack.org/320605
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=5a9f87c08062d3f0a23116b1a339da3252a69f24
Submitter: Jenkins
Branch: master

commit 5a9f87c08062d3f0a23116b1a339da3252a69f24
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue May 24 20:46:30 2016 +0300

Gracefully stop if tolerance limit exceeded

    Several changes:

    - support fault tolerance groups;
    - support an internal deployment stop instead of raising in
      case of error;
    - do not show the last run summary debug report from mcollective;
    - fix detection of offline nodes before deployment starts;
    - support fail on error behavior.

    Support fault tolerance groups

      Nailgun sends a fault tolerance group, which informs Astute about
      the allowed number of failed nodes in this deployment and the
      importance of every node in this task.

      If the number of errors exceeds the allowed number, the
      deployment will stop.

    Support internal deployment stop instead of raising in case of error

      Before this change Astute ended processing, marked all nodes
      as error and did not wait for the puppet processes on the nodes.

      Now we use the same approach as for stop deployment:
      mark failed nodes as error, other nodes as skipped (stopped),
      and ready nodes as ready. Astute will also wait until the current
      tasks end.

    Do not show last run summary debug report from mcollective

      At the moment it is not very useful, but it quickly fills the
      log file and makes debugging difficult.

    Fix detection of offline nodes before deployment starts

      Astute gets a response from mcollective to detect node availability.
      If a node does not respond, it is marked as failed. This also
      supports the fault tolerance mechanism.

    Support fail on error behavior

      From now on, a task that has fail_on_error set to false is
      marked as skipped instead of failed in case of error.

Change-Id: Ica2a4ae64b4dfa4f7fccfbc95108d1412c40dc3f
Closes-Bug: #1435610
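
The two core rules from this commit message (stop when a group's tolerance is exceeded, then report failed nodes as error, finished nodes as ready and the rest as skipped) can be sketched in Python as follows. Astute is written in Ruby; the group dictionary shape and the function names here are hypothetical and chosen only for illustration.

```python
def tolerance_exceeded(groups, failed_nodes):
    """True if any group has more failed nodes than its fault_tolerance allows."""
    failed = set(failed_nodes)
    return any(
        len(failed & set(group["node_ids"])) > group["fault_tolerance"]
        for group in groups
    )


def final_statuses(all_nodes, failed_nodes, ready_nodes):
    """Statuses after a graceful stop: failed -> error, finished -> ready, rest -> skipped."""
    statuses = {}
    for uid in all_nodes:
        if uid in failed_nodes:
            statuses[uid] = "error"
        elif uid in ready_nodes:
            statuses[uid] = "ready"
        else:
            statuses[uid] = "skipped"
    return statuses


groups = [
    {"name": "controller", "node_ids": ["1"], "fault_tolerance": 0},
    {"name": "compute", "node_ids": ["2", "3", "4"], "fault_tolerance": 1},
]
print(tolerance_exceeded(groups, ["2"]))       # False: one compute failure is tolerated
print(tolerance_exceeded(groups, ["2", "3"]))  # True: two compute failures exceed the limit
print(final_statuses(["1", "2", "3", "4"], {"2", "3"}, {"1"}))
# {'1': 'ready', '2': 'error', '3': 'error', '4': 'skipped'}
```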

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix proposed to fuel-astute (stable/mitaka)

#26

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326485

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix merged to fuel-library (stable/mitaka)

#27

Reviewed: https://review.openstack.org/326317
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=5e79256424fd82f39506f13a4313cc985765d9d5
Submitter: Jenkins
Branch: stable/mitaka

commit 5e79256424fd82f39506f13a4313cc985765d9d5
Author: Vladimir Kuklin <email address hidden>
Date: Tue May 31 18:10:20 2016 +0300

Adjust fault tolerance for task groups to zero tolerance for critical roles

Set fault tolerance to 0 for critical deployment groups

Related to Change-Id I1969b953eca667c09248a6b67ffee37bfd20f474 and
Ica2a4ae64b4dfa4f7fccfbc95108d1412c40dc3f

Change-Id: I5197adc796603dfb40cf1efa57427344b358d353
Partial-bug: #1435610

tags:

added: in-stable-mitaka

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix merged to fuel-web (stable/mitaka)

#28

Reviewed: https://review.openstack.org/326088
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=66b1609df3b3a80a0e9c04ec392a1f4f608601a6
Submitter: Jenkins
Branch: stable/mitaka

commit 66b1609df3b3a80a0e9c04ec392a1f4f608601a6
Author: Bulat Gaifullin <email address hidden>
Date: Mon May 30 19:51:18 2016 +0300

Added fault_tolerance_group to deployment metadata

    This property contains list of groups, that is built from
    tasks with type 'group' and each task may contain property
    fault_tolerance, that shall be moved from openstack.yaml
    to deployment tasks.
    For plugins this attribute is filled from roles_metadata
    for all tasks with type group (for backward compatibility).

    DocImpact
    Partial-Bug: 1435610
    Change-Id: I1969b953eca667c09248a6b67ffee37bfd20f474

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Related fix merged to fuel-qa (stable/mitaka)

#29

Reviewed: https://review.openstack.org/326447
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=df643556114bbf767439cf3d98fa33b615987e50
Submitter: Jenkins
Branch: stable/mitaka

commit df643556114bbf767439cf3d98fa33b615987e50
Author: Alexander Kurenyshev <email address hidden>
Date: Thu Jun 2 17:56:53 2016 +0300

Add new check for Operational cluster status

    We have a new behaviour when deployment task
    will be in a ready state even when some
    non-important nodes are in an Error state.
    This path adds check for cluster that it's
    in Operational state

Related-Bug: #1435610
Change-Id: I53175e4a84f2fbeedc056e39f2976c5f1a690fc1

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix merged to fuel-web (master)

#30

Reviewed: https://review.openstack.org/326086
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=e66e5f3197ec1b3f641ace2c135c284013b26e75
Submitter: Jenkins
Branch: master

commit e66e5f3197ec1b3f641ace2c135c284013b26e75
Author: Bulat Gaifullin <email address hidden>
Date: Mon Jun 6 21:39:17 2016 +0300

Reworked calculate_fault_tolerance to make it more clear

Change-Id: I1a4dd0985ce0d00ef9ed39d7e3fd7895212ba012
Partial-Bug: 1435610

OpenStack Infra (hudson-openstack) wrote on 2016-06-07: Fix merged to fuel-astute (stable/mitaka)

#31

Reviewed: https://review.openstack.org/326485
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=1a8e53cb5bf41b7afc2333cef91b162455ebe1f9
Submitter: Jenkins
Branch: stable/mitaka

commit 1a8e53cb5bf41b7afc2333cef91b162455ebe1f9
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue May 24 20:46:30 2016 +0300

Gracefully stop if tolerance limit exceeded

    Several changes:

    - support fault tolerance groups;
    - support an internal deployment stop instead of raising in
      case of error;
    - do not show the last run summary debug report from mcollective;
    - fix detection of offline nodes before deployment starts;
    - support fail on error behavior.

    Support fault tolerance groups

      Nailgun sends a fault tolerance group, which informs Astute about
      the allowed number of failed nodes in this deployment and the
      importance of every node in this task.

      If the number of errors exceeds the allowed number, the
      deployment will stop.

    Support internal deployment stop instead of raising in case of error

      Before this change Astute ended processing, marked all nodes
      as error and did not wait for the puppet processes on the nodes.

      Now we use the same approach as for stop deployment:
      mark failed nodes as error, other nodes as skipped (stopped),
      and ready nodes as ready. Astute will also wait until the current
      tasks end.

    Do not show last run summary debug report from mcollective

      At the moment it is not very useful, but it quickly fills the
      log file and makes debugging difficult.

    Fix detection of offline nodes before deployment starts

      Astute gets a response from mcollective to detect node availability.
      If a node does not respond, it is marked as failed. This also
      supports the fault tolerance mechanism.

    Support fail on error behavior

      From now on, a task that has fail_on_error set to false is
      marked as skipped instead of failed in case of error.

    Change-Id: Ica2a4ae64b4dfa4f7fccfbc95108d1412c40dc3f
    Closes-Bug: #1435610
    (cherry picked from commit 5a9f87c08062d3f0a23116b1a339da3252a69f24)

Maksym Strukov (unbelll) on 2016-06-21

tags:

added: on-verification

Maksym Strukov (unbelll) wrote on 2016-06-21:

#32

Verified as fixed in 9.0-mos-490

tags:

removed: on-verification

Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  High        Bulat Gaifullin    Fuel for OpenStack 10.0
Mitaka              Fix Released   High        Vladimir Sharshov  Fuel for OpenStack 9.0
