Reproduced on the scale lab. Astute has code that handles the case where some nodes fail to provision, so that deployment can still continue on the remaining nodes. This code appears to work with classic provisioning, but not with image-based provisioning (IBP).
This bug doesn't cover WHY provisioning failed for some of the nodes.
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "6.1"
api: "1.0"
build_number: "233"
build_id: "2015-03-26_21-32-43"
nailgun_sha: "b163f6fc77d6639aaffd9dd992e1ad96951c3bbf"
python-fuelclient_sha: "e5e8389d8d481561a4d7107a99daae07c6ec5177"
astute_sha: "3f1ece0318e5e93eaf48802fefabf512ca1dce40"
fuellib_sha: "9c7716bc2ce6075065d7d9dcf96f4c94662c0b56"
ostf_sha: "a4cf5f218c6aea98105b10c97a4aed8115c15867"
fuelmain_sha: "320b5f46fc1b2798f9e86ed7df51d3bda1686c10"
Exception in astute:
2015-03-31T16:23:01 err: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: Provision command returned non zero exit code on node: 32
2015-03-31T16:23:01 err: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: At least one of nodes have failed during provisioning
2015-03-31T16:23:01 err: [538] Error occured while provisioning: #<Astute::FailedImageProvisionError: At least one of nodes have failed during provisioning>
2015-03-31T16:23:01 info: [538] Casting message to Nailgun: {"method"=>"provision_resp", "args"=>{"task_uuid"=>"64325f39-e53c-438d-aaf8-b5f44c139688", "status"=>"error", "progress"=>100, "msg"=>"At least one of nodes have failed during provisioning", "error_type"=>"provision"}}
2015-03-31T16:23:01 info: [538] Casting message to Nailgun: {"method"=>"provision_resp", "args"=>{"task_uuid"=>"64325f39-e53c-438d-aaf8-b5f44c139688", "status"=>"error", "error"=>"At least one of nodes have failed during provisioning", "progress"=>100}}
2015-03-31T16:23:10 debug: [538] 64325f39-e53c-438d-aaf8-b5f44c139688: MC agent 'execute_shell_command', method 'execute', results: {:sender=>"16", :statuscode=>0, :statusmsg=>"OK", :data=>{:stdout=>"", :exit_code=>0, :stderr=>""}}
2015-03-31T16:23:10 debug: [538] Unlock discovery for failed nodes. Result: [{"uid"=>"16", "exit code"=>0}]
2015-03-31T16:23:10 err: [538] Error running provisioning: At least one of nodes have failed during provisioning, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:116:in `image_provision'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:245:in `provision_piece'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:148:in `block (3 levels) in provision_and_watch_progress'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:301:in `call'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:301:in `sleep_not_greater_than'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:142:in `block (2 levels) in provision_and_watch_progress'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:141:in `loop'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:141:in `block in provision_and_watch_progress'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:140:in `catch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:140:in `provision_and_watch_progress'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/provision.rb:54:in `provision'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/orchestrator.rb:79:in `provision'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:50:in `provision'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/dispatcher.rb:37:in `image_provision'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:142:in `dispatch_message'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:103:in `block in dispatch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `call'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:64:in `block in each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/task_queue.rb:56:in `each'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `each_with_index'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:101:in `dispatch'",
"/usr/lib64/ruby/gems/2.1.0/gems/astute-6.0.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]
2015-03-31T16:23:10 debug: [538] Dispatching aborted by image_provision
It was initially investigated together with Lukasz. Should be easy to fix.
Marked as Critical because the scale lab is blocked by this issue. Whatever the root cause of the provisioning failure (code or infrastructure), Fuel has to handle such failures properly.
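For illustration only, here is a rough sketch of the behaviour expected from the IBP path: failed nodes are reported, and the task goes on with the nodes that did provision instead of aborting as a whole. This is not Astute's actual code. The names `split_by_exit_code`, `provision_report` and `AllNodesFailedError` are hypothetical, the per-node result shape (`"uid"`, `"exit code"`) is borrowed from the debug line in the log above, and treating "every node failed" as the only fatal case is an assumption.

```ruby
# Hypothetical sketch, not Astute's real API.

# Assumed to be raised only when no node at all was provisioned.
class AllNodesFailedError < StandardError; end

# results: array of hashes such as {"uid" => "32", "exit code" => 1},
# mirroring the shape seen in the Astute debug log.
# Returns [ok_uids, failed_uids].
def split_by_exit_code(results)
  ok, failed = results.partition { |r| r["exit code"] == 0 }
  [ok.map { |r| r["uid"] }, failed.map { |r| r["uid"] }]
end

# Builds a per-task summary instead of raising on the first failed node.
def provision_report(results)
  ok_uids, failed_uids = split_by_exit_code(results)

  # Assumption: with zero provisioned nodes there is nothing to continue with.
  raise AllNodesFailedError, "All nodes failed during provisioning" if ok_uids.empty?

  {
    "provisioned" => ok_uids,   # deployment may continue on these nodes
    "failed"      => failed_uids # to be reported as errored nodes
  }
end

# Usage: node 32 failed, node 16 succeeded, so the task is not aborted.
report = provision_report([
  { "uid" => "16", "exit code" => 0 },
  { "uid" => "32", "exit code" => 1 }
])
puts report.inspect
```

The point of the sketch is only the control flow: a partial failure produces a report with both groups, and an exception is reserved for the case where nothing can proceed.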