Comment 9 for bug 1259935

Aleksandr Didenko (adidenko) wrote :

I've hit the same problem sporadically on 4.1 ISOs, at the bootstrap stage during pre-deployment network verification. The last time, 2 of 4 nodes reported problems with the "net_probe" MCollective agent. Here is an example from naily.log:

2014-01-23T16:53:10 debug: [9064] 085e75e9-8d1c-4cc5-9756-a1e3aed2f30e: MC agent 'net_probe', method 'start_frame_listeners', results: {:sender=>"5", :statuscode=>5, :statusmsg=>"Transport endpoint is not connected", :data=>{}}
2014-01-23T16:53:10 err: [9064] MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 5 - Reason: Transport endpoint is not connected
2014-01-23T16:53:10 err: [9064] Error running RPC method verify_networks: 085e75e9-8d1c-4cc5-9756-a1e3aed2f30e: MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 5 - Reason: Transport endpoint is not connected
, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:114:in `check_results_with_retries'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:60:in `method_missing'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:79:in `block in start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:40:in `check_network'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:157:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:108:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:103:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:70:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:38:in `block (2 levels) in server_loop'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:49:in `block (2 levels) in consume_one'"]
2014-01-23T16:53:10 info: [9064] Casting message to fuel: {"method"=>"verify_networks_resp", "args"=>{"task_uuid"=>"085e75e9-8d1c-4cc5-9756-a1e3aed2f30e", "status"=>"error", "error"=>"Error occurred while running method 'verify_networks'. Inspect Orchestrator logs for the details."}}

I checked all 4 bootstrapped nodes and found processes like these on 2 of them:

root 715 0.1 0.1 86620 22872 ? Sl 16:52 0:03 /usr/bin/ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid --config=/etc/mcollective/server.cfg
root 2898 0.0 0.1 84380 19260 ? S 17:03 0:00 \_ /usr/bin/ruby /usr/sbin/mcollectived --pid=/var/run/mcollectived.pid --config=/etc/mcollective/server.cfg
root 2901 0.0 0.3 743688 52912 ? Sl 17:03 0:00 \_ /usr/bin/python /usr/bin/net_probe.py -c /tmp/net_probe20140123-715-1co0tjp-0

Network verification only started working after those "/usr/bin/net_probe.py" processes were killed on the 2 problem nodes.
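
For anyone who needs to repeat this workaround, here is a minimal sketch of how the leftover processes could be located and terminated on a node. It is only an illustration, not part of Fuel or Astute: the function names find_stale_probe_pids and kill_stale_probes are hypothetical, and it assumes `ps -eo pid,args` is available on the bootstrap image. It matches any process whose command line contains /usr/bin/net_probe.py, so it should only be run when no network verification is in progress.

```python
import os
import signal
import subprocess


def find_stale_probe_pids(ps_output, needle="/usr/bin/net_probe.py"):
    """Return PIDs of processes whose command line contains `needle`.

    Expects `ps -eo pid,args` style text: PID first, then the command line.
    (Hypothetical helper, written for this comment only.)
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and malformed lines
        pid, args = parts
        if needle in args:
            pids.append(int(pid))
    return pids


def kill_stale_probes(dry_run=True):
    """List leftover net_probe.py processes and optionally SIGTERM them."""
    out = subprocess.check_output(["ps", "-eo", "pid,args"]).decode()
    # Never signal ourselves, in case our own command line matches.
    pids = [p for p in find_stale_probe_pids(out) if p != os.getpid()]
    for pid in pids:
        if dry_run:
            print("would kill", pid)
        else:
            os.kill(pid, signal.SIGTERM)
    return pids
```

Calling kill_stale_probes() with the default dry_run=True only prints the matching PIDs; pass dry_run=False to actually send SIGTERM. This clears the symptom only. It does not explain why the probe processes were left running in the first place.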