IBP fails deployment if one of the nodes failed provisioning

Bug #1438844 reported by Mike Scherbakov
This bug affects 2 people
Affects             Status        Importance  Assigned to  Milestone
Fuel for OpenStack  Fix Released  Critical    Łukasz Oleś
6.0.x               Invalid       Medium      Łukasz Oleś

Bug Description

Reproduced on the scale lab. Astute has code that handles the situation where some nodes fail to provision, so that deployment can still continue on the remaining nodes. This code seems to work with classic provisioning, but not with IBP.
This bug doesn't cover WHY provisioning failed for some of the nodes.
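
For illustration, the expected behaviour can be sketched roughly as follows. This is a minimal sketch and not astute's actual code: `ProvisionResult`, `provision_all`, and the node uid '7' are hypothetical names and values introduced here. The only point is that a per-node failure is recorded instead of raised, so the caller can continue deployment with the nodes that did provision.

```ruby
# Hypothetical sketch: none of these names come from astute itself.
ProvisionResult = Struct.new(:succeeded, :failed)

# Runs the given block once per node. The block is assumed to return the
# exit code of a per-node provision command (0 means success).
def provision_all(nodes)
  succeeded, failed = nodes.partition do |node|
    begin
      yield(node).zero?
    rescue StandardError
      false # an exception on one node counts as a failure of that node only
    end
  end
  ProvisionResult.new(succeeded, failed)
end

# Made-up node list; uid '32' mirrors the node that failed in the log below.
nodes = [{ 'uid' => '7' }, { 'uid' => '32' }]

# Simulate node 32 returning a non-zero exit code.
result = provision_all(nodes) { |node| node['uid'] == '32' ? 1 : 0 }

puts "provisioned: #{result.succeeded.map { |n| n['uid'] }.join(', ')}"
puts "failed: #{result.failed.map { |n| n['uid'] }.join(', ')}"
```

With a result like this, the caller can decide whether enough nodes survived to go on, rather than aborting the whole task on the first non-zero exit code.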

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "233"
  build_id: "2015-03-26_21-32-43"
  nailgun_sha: "b163f6fc77d6639aaffd9dd992e1ad96951c3bbf"
  python-fuelclient_sha: "e5e8389d8d481561a4d7107a99daae07c6ec5177"
  astute_sha: "3f1ece0318e5e93eaf48802fefabf512ca1dce40"
  fuellib_sha: "9c7716bc2ce6075065d7d9dcf96f4c94662c0b56"
  ostf_sha: "a4cf5f218c6aea98105b10c97a4aed8115c15867"
  fuelmain_sha: "320b5f46fc1b2798f9e86ed7df51d3bda1686c10"

Exception in astute:
2015-03-31T16:23:01 err: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: Provision command returned non zero exit code on node: 32
2015-03-31T16:23:01 err: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: At least one of nodes have failed during provisioning
2015-03-31T16:23:01 err: [538] Error occured while provisioning: #<Astute::FailedImageProvisionError: At least one of nodes have failed during provisioning>
2015-03-31T16:23:01 info: [538] Casting message to Nailgun: {"method"=>"provision_resp", "args"=>{"task_uuid"=>"64325f39-e53c-438d-aaf8-b5f44c139688", "status"=>"error", "progress"=>100, "msg"=>"At least one of nodes have failed during provisioning", "error_type"=>"provision"}}
2015-03-31T16:23:01 info: [538] Casting message to Nailgun: {"method"=>"provision_resp", "args"=>{"task_uuid"=>"64325f39-e53c-438d-aaf8-b5f44c139688", "status"=>"error", "error"=>"At least one of nodes have failed during provisioning", "progress"=>100}}
2015-03-31T16:23:10 debug: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"16", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"", :exit_code=>0, :stderr=>""}}
2015-03-31T16:23:10 debug: [538] Unlock discovery for failed nodes. Result: [{"uid"=>"16", "exit code"=>0}]
2015-03-31T16:23:10 err: [538] Error running provisioning: At least one of nodes have failed during provisioning, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:116:in `image_provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:245:in `provision_piece'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:148:in `block (3 levels) in provision_and_watch_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:301:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:301:in `sleep_not_greater_than'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:142:in `block (2 levels) in provision_and_watch_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:141:in `loop'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:141:in `block in provision_and_watch_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:140:in `catch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:140:in `provision_and_watch_progress'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:54:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:79:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:50:in `provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

2015-03-31T16:23:10 debug: [538] Dispatching aborted by image_provision
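
When triaging a log like the one above, it can help to pull out which nodes reported a failed provision command. The snippet below is a small sketch under stated assumptions: `FAILED_NODE` and `failed_node_uids` are hypothetical helpers, and the pattern only matches the exact message wording seen in this report, with each log entry on a single line.

```ruby
# Sample entries copied from the log above, each on a single line.
log = <<~LOG
  2015-03-31T16:23:01 err: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: Provision command returned non zero exit code on node: 32
  2015-03-31T16:23:10 debug: [538] Dispatching aborted by image_provision
LOG

# Hypothetical pattern, based only on the message text seen in this report.
FAILED_NODE = /Provision command returned non zero exit code on node: (\d+)/

# Returns the unique node uids (as strings) that reported a failure.
def failed_node_uids(text)
  text.scan(FAILED_NODE).flatten.uniq
end

puts failed_node_uids(log).inspect
```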

It was initially investigated together with Lukasz. Should be easy to fix.

Tags: scale
Mike Scherbakov (mihgen) wrote :

Marked as Critical, as scale lab is blocked by this issue, regardless of the fact that the original issue is in provisioning (whether in code or infrastructure). Fuel has to handle failures properly.

Changed in fuel:
importance: Undecided → Critical
Łukasz Oleś (loles)
Changed in fuel:
assignee: nobody → Łukasz Oleś (loles)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/169477

Andrey Maximov (maximov) wrote :

Set Medium priority because IBP is an experimental feature in 6.0.1.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/169477
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=23cc4ca330c0af2d33bcce27f15f194779992d00
Submitter: Jenkins
Branch: master

commit 23cc4ca330c0af2d33bcce27f15f194779992d00
Author: Łukasz Oleś <email address hidden>
Date: Mon Mar 30 08:12:17 2015 +0200

    Do not fail if some images were not provisioned

    It is adding fault tolerance for image based provision method.
    Also all test were changed to use image based provision method.

    Change-Id: Id933fc589b132ccb09487b2924b49efd3f714148
    Closes-bug: 1438844

Changed in fuel:
status: In Progress → Fix Committed
Łukasz Oleś (loles) wrote :

This bug does not exist in 6.0.1. The buggy code was introduced in 6.1

Leontii Istomin (listomin) wrote :

Has not been reproduced on at least builds 425, 497, 511 and 521.

Changed in fuel:
status: Fix Committed → Fix Released