I've hit the same problem sporadically with 4.1 ISOs at the bootstrap stage, during pre-deployment network verification. The last time, 2 of 4 nodes reported problems with the "net_probe" MCollective agent. Here is an example from naily.log:
2014-01-23T16:53:10 debug: [9064] 085e75e9-8d1c-4cc5-9756-a1e3aed2f30e: MC agent 'net_probe', method 'start_frame_listeners', results: {:sender=>"5", :statuscode=>5, :statusmsg=>"Transport endpoint is not connected", :data=>{}}
2014-01-23T16:53:10 err: [9064] MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 5 - Reason: Transport endpoint is not connected
2014-01-23T16:53:10 err: [9064] Error running RPC method verify_networks: 085e75e9-8d1c-4cc5-9756-a1e3aed2f30e: MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 5 - Reason: Transport endpoint is not connected
, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:114:in `check_results_with_retries'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:60:in `method_missing'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:79:in `block in start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:40:in `check_network'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:157:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:108:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:103:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:70:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:38:in `block (2 levels) in server_loop'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:49:in `block (2 levels) in consume_one'"]
2014-01-23T16:53:10 info: [9064] Casting message to fuel: {"method"=>"verify_networks_resp", "args"=>{"task_uuid"=>"085e75e9-8d1c-4cc5-9756-a1e3aed2f30e", "status"=>"error", "error"=>"Error occurred while running method 'verify_networks'. Inspect Orchestrator logs for the details."}}
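To see at a glance which nodes failed and why, the failed-node entries can be pulled out of naily.log. The following is a minimal Python sketch, not part of Astute or Naily. It assumes the message layout shown above: a "MCollective call failed in agent '<agent>', method '<method>', failed nodes:" header followed by one "ID: <node> - Reason: <text>" line per node. The helper name `failed_nodes` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Header of an MCollective failure message, as logged by Astute in naily.log (assumed layout).
FAILED_CALL = re.compile(r"MCollective call failed in agent '([^']+)', method '([^']+)', failed nodes:")
# Each failed node is reported on its own line right after the header.
FAILED_NODE = re.compile(r"^ID: (\S+) - Reason: (.+)$")


def failed_nodes(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return unique (agent, method, node_id, reason) tuples found in the log lines."""
    seen = set()
    result = []
    current = None  # (agent, method) from the most recent failure header
    for line in lines:
        line = line.rstrip("\n")
        header = FAILED_CALL.search(line)
        if header:
            current = header.groups()
            continue
        node = FAILED_NODE.match(line)
        if node and current:
            entry = (*current, *node.groups())
            # The same failure is logged twice (the "err" line and the RPC error line).
            if entry not in seen:
                seen.add(entry)
                result.append(entry)
        else:
            current = None  # any other line ends the list of failed nodes
    return result
```

Passing an open file handle for naily.log to `failed_nodes` would, for the excerpt above, report node 5 once, with reason "Transport endpoint is not connected".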
I checked all 4 bootstrapped nodes and found processes like these on 2 of them:
root 715 0.1 0.1 86620 22872 ? Sl 16:52 0:03 /usr/bin/ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid --config=/etc/mcollective/server.cfg
root 2898 0.0 0.1 84380 19260 ? S 17:03 0:00 \_ /usr/bin/ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid --config=/etc/mcollective/server.cfg
root 2901 0.0 0.3 743688 52912 ? Sl 17:03 0:00 \_ /usr/bin/python /usr/bin/net_probe.py -c /tmp/net_probe20140123-715-1co0tjp-0
Network verification started working again only after I killed those "/usr/bin/net_probe.py" processes on the 2 problem nodes.
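The workaround above can be sketched in Python as follows. This is a hypothetical helper, not a Fuel tool. It assumes `ps auxf`-style output with 11 columns, the last being the full command line, and that matching a command-line token ending in "net_probe.py" is specific enough. It also assumes plain SIGTERM is sufficient, since the report does not say which signal was used. By default it only reports the PIDs (`dry_run=True`).

```python
import os
import signal
from typing import Iterable, List


def net_probe_pids(ps_lines: Iterable[str]) -> List[int]:
    """Return PIDs of net_probe.py processes found in `ps auxf`-style output lines."""
    pids = []
    for line in ps_lines:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND (COMMAND may contain spaces)
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        command = fields[10]
        if any(token.endswith("net_probe.py") for token in command.split()):
            pids.append(int(fields[1]))
    return pids


def kill_net_probe(ps_lines: Iterable[str], dry_run: bool = True) -> List[int]:
    """Send SIGTERM to leftover net_probe.py processes (assumption: SIGTERM suffices)."""
    pids = net_probe_pids(ps_lines)
    if not dry_run:
        for pid in pids:
            try:
                os.kill(pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # the process already exited
    return pids
```

On a problem node, something like `kill_net_probe(subprocess.run(["ps", "auxf"], capture_output=True, text=True).stdout.splitlines(), dry_run=False)` would apply it. For the listing above, it would select only PID 2901 and leave both mcollectived processes alone.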