[update] Patching fails with message "MCollective agents didn't respond within the allotted time."

Bug #1359088 reported by Tatyanka
Affects: Fuel for OpenStack
Status: Confirmed
Importance: High
Assigned to: Fuel Python (Deprecated)

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1"
  api: "1.0"
  build_number: "459"
  build_id: "2014-08-20_02-01-17"
  astute_sha: "efe3cb3668b9079e68fb1534fd4649ac45a344e1"
  fuellib_sha: "fa23adb05c58fdad5011a3ad806467eb3d883217"
  ostf_sha: "c6ecd0137b5d7c1576fa65baef0fc70f9a150daa"
  nailgun_sha: "36d27ff737b361f92093986d061bbfc1670bee45"
  fuelmain_sha: "365fc0bfe9d5e4ce38101d9158f66347bf32c310"

Steps to Reproduce:
1. Deploy CentOS simple Neutron GRE on the 5-0-26 ISO (release 2014.1)
2. Upgrade the master node using the 459 tarball
3. Navigate to the Actions tab in the UI and run the update

Expected:
Update passes, the environment is ready, and OSTF passes.

Actual result:
Update failed.
In nailgun-agent on the node:

2014-08-20T04:38:16.530256+00:00 notice: E, [2014-08-20T04:38:16.514288 #28382] ERROR -- : Connection refused - connect(2) (http://10.108.26.2:8000)
2014-08-20T04:38:16.530256+00:00 notice: /usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:803:in `initialize'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:803:in `new'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:803:in `create_socket'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:752:in `connect'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/timeout.rb:131:in `timeout'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:751:in `connect'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:609:in `query'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient/session.rb:164:in `query'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient.rb:1080:in `do_get_block'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient.rb:884:in `do_request'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpclient.rb:978:in `protect_keep_alive_disconnected'/usr/lib/ruby/gems/1.8/gems/httpclient-2.3.2/lib/httpcli
2014-08-20T04:38:16.530556+00:00 notice: ent.rb:883:in `d

In the mcollective agent:
2014-08-19T13:32:24.147397+00:00 debug: E, [2014-08-19T13:32:20.175654 #1444] ERROR -- : rabbitmq.rb:50:in `on_hbread_fail' Heartbeat read failed from 'stomp://mcollective@10.108.26.2:61613': {"read_fail_count"=>0, "lock_fail_count"=>1, "ticker_interval"=>29.5, "lock_fail"=>true}

Tags: update
Vladimir Sharshov (vsharshov) wrote :

Did this error appear in the nailgun-agent log after the upgrade or during the upgrade, or was this output obtained by manually running /opt/nailgun/bin/agent?

Dima Shulyak (dshulyak) wrote :

in mcollective agent
2014-08-19T13:32:24.147397+00:00 debug: E, [2014-08-19T13:32:20.175654 #1444] ERROR -- : rabbitmq.rb:50:in `on_hbread_fail' Heartbeat read failed from 'stomp://mcollective@10.108.26.2:61613': {"read_fail_count"=>0, "lock_fail_count"=>1, "ticker_interval"=>29.5, "lock_fail"=>true}

This one is related to the initial deployment, not to patching; a heartbeat read failure is a routine event and is handled properly by rabbitmq/mcollective.

Maybe patching failed because the overall execution time was increased by about 3 minutes of reconnection attempts? (See the sketch after the retry log below.)

2014-08-20T06:10:05 debug: [431] Retry #1 to run mcollective agent on nodes: '1'
2014-08-20T06:10:48 debug: [431] Retry #2 to run mcollective agent on nodes: '1'
2014-08-20T06:11:30 debug: [431] Retry #3 to run mcollective agent on nodes: '1'
2014-08-20T06:12:13 debug: [431] Retry #4 to run mcollective agent on nodes: '1'
2014-08-20T06:12:55 debug: [431] Retry #5 to run mcollective agent on nodes: '1'
2014-08-20T06:13:37 debug: [431] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"1", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}
2014-08-20T06:13:37 debug: [431] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"1", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}
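
The ~42-second spacing between retries means the five retries add roughly three and a half minutes before the node is reported as failed. Below is a minimal Ruby sketch of that kind of bounded retry loop, purely to illustrate the timing; the constants and method names are assumptions for illustration, not the actual Astute code:

# Hypothetical bounded retry loop; RETRIES and DELAY are assumptions
# chosen to match the ~42 s gaps visible in the log above.
RETRIES = 5
DELAY   = 42 # seconds between attempts

def run_mcollective_agent(node_uid)
  false # placeholder for the real agent call; returns true on success
end

def run_with_retries(node_uid)
  RETRIES.times do |attempt|
    puts "Retry ##{attempt + 1} to run mcollective agent on nodes: '#{node_uid}'"
    return true if run_mcollective_agent(node_uid)
    sleep DELAY
  end
  # Only after the last retry is the node reported as failed, so roughly
  # RETRIES * DELAY extra seconds have been added to the overall run.
  { "uid" => node_uid, "status" => "error", "error_type" => "deploy" }
end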

Dima Shulyak (dshulyak) wrote :

According to the logs, patching was started at:

2014-08-20 05:53:30.244 DEBUG [7f01215a0740] (__init__) RPC cast to orchestrator:

The nailgun-agent error fired at 04:38:16; maybe it is somehow related to the snapshot/resume procedure? But I don't think this is related to patching.

Changed in fuel:
status: New → Confirmed
Dima Shulyak (dshulyak) wrote :

I reproduced this behaviour with the following procedure (a small automation sketch follows the logs below):

1. Start a deployment.
2. Wait until the deployment has started on the slaves.
3. Find the corresponding VM id on the host system and suspend it with virsh suspend.
4. Wait for some time to pass and look at the logs below.
5. Resume it with virsh resume.

2014-08-20 10:24:34 INFO [417] Casting message to Nailgun: {"method"=>"deploy_resp", "args"=>{"task_uuid"=>"8b0fe7d8-2e4f-4509-8eea-1f9af1f347ee", "nodes"=>[{"uid"=>"12", "status"=>"error", "error_type"=>"deploy", "role"=>"compute"}]}}
2014-08-20 10:24:34 DEBUG [417] Nodes statuses: {"succeed"=>[], "error"=>[], "running"=>["13"]}
2014-08-20 10:24:34 DEBUG [417] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"12", "status"=>"error", "error_type"=>"deploy", "role"=>"compute"}]}
2014-08-20 10:24:34 DEBUG [417] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"12", "status"=>"error", "error_type"=>"deploy", "role"=>"compute"}]}
2014-08-20 10:23:52 DEBUG [417] Retry #5 to run mcollective agent on nodes: '12'
2014-08-20 10:23:09 DEBUG [417] Retry #4 to run mcollective agent on nodes: '12'
2014-08-20 10:22:27 DEBUG [417] Retry #3 to run mcollective agent on nodes: '12'
2014-08-20 10:21:44 DEBUG [417] Retry #2 to run mcollective agent on nodes: '12'
2014-08-20 10:21:02 DEBUG [417] Retry #1 to run mcollective agent on nodes: '12'

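A small Ruby sketch that automates the suspend/resume part of this reproduction via virsh; the domain name and pause length below are assumptions, not values from the report:

# Hypothetical helper: suspend a slave VM during deployment, wait long
# enough for the mcollective retries to be exhausted, then resume it.
DOMAIN = "fuel-slave-1" # assumed virsh domain of the node being deployed
PAUSE  = 5 * 60         # seconds; long enough for 5 retries at ~42 s each

system("virsh", "suspend", DOMAIN) or abort "failed to suspend #{DOMAIN}"
sleep PAUSE
system("virsh", "resume", DOMAIN)  or abort "failed to resume #{DOMAIN}"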