Execution expired on non-primary controller

Bug #1349684 reported by Mike Scherbakov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Confirmed
High
Vladimir Sharshov

Bug Description

  release: "5.1"
  api: "1.0"
  build_number: "369"
  build_id: "2014-07-29_02-01-14"
  astute_sha: "aa5aed61035a8dc4035ab1619a8bb540a7430a95"
  fuellib_sha: "851b29ac6434c1671078ef293bb25a04dc492f49"
  ostf_sha: "9c0454b2197756051fc9cee3cfd856cf2a4f0875"
  nailgun_sha: "98ea1ce54f4d084e2533c7ea23aa98551f955ec5"
  fuelmain_sha: "2379783517e4830868f66b5fbab512eec4695679"

With enabled patch https://review.openstack.org/#/c/104267/6 , which makes parallel run of 2nd and 3rd controllers, BVT test failed: http://jenkins-product.srt.mirantis.net:8080/job/fuel_master.ubuntu.bvt_2/208/.

less 10.108.0.2/var/log/docker-logs/astute/astute.log:
2014-07-29T00:47:40 debug: [372] f738b612-c864-425a-a99f-f70d01cebcdb: MC agent 'puppetd', method 'last_run_
summary', results: {:sender=>"2", :statuscode=>0, :statusmsg=>"OK", :data=>{:status=>"running", :changes=>ni
l, :runtime=>1406591259, :enabled=>1, :events=>nil, :idling=>0, :version=>nil, :stopped=>0, :time=>{"last_ru
n"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "out_of_sync"=>0, "restarted"=>0}, :lastrun=>0,
:running=>1, :output=>"Currently running; last completed run 1406591259 seconds ago"}}
2014-07-29T00:47:40 debug: [372] f738b612-c864-425a-a99f-f70d01cebcdb: MC agent 'puppetd', method 'last_run_
summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:idling=>0, :status=>"running",
:runtime=>1406591260, :stopped=>0, :resources=>{"changed"=>0, "failed"=>1, "total"=>0, "restarted"=>0, "out_
of_sync"=>0}, :lastrun=>0, :version=>nil, :time=>{"last_run"=>0}, :changes=>nil, :running=>1, :enabled=>1, :
events=>nil, :output=>"Currently running; last completed run 1406591260 seconds ago"}}
2014-07-29T00:47:40 debug: [372] Nodes statuses: {"succeed"=>[], "error"=>[], "running"=>["2", "3"]}
2014-07-29T00:47:41 debug: [372] Got progress for nodes: [{"uid"=>"2", "progress"=>4}, {"uid"=>"3", "progres
s"=>4}]
2014-07-29T00:47:41 debug: [372] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"
=>"2", "progress"=>4, "status"=>"deploying", "role"=>"controller"}, {"uid"=>"3", "progress"=>4, "status"=>"d
eploying", "role"=>"controller"}]}
2014-07-29T00:48:25 debug: [372] f738b612-c864-425a-a99f-f70d01cebcdb: MC agent 'puppetd', method 'last_run_
summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:idling=>0, :status=>"running",
:runtime=>1406591263, :stopped=>0, :resources=>{"changed"=>0, "failed"=>1, "total"=>0, "restarted"=>0, "out_
of_sync"=>0}, :lastrun=>0, :version=>nil, :time=>{"last_run"=>0}, :changes=>nil, :running=>1, :enabled=>1, :
events=>nil, :output=>"Currently running; last completed run 1406591263 seconds ago"}}
2014-07-29T00:48:25 debug: [372] f738b612-c864-425a-a99f-f70d01cebcdb: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"2", :statuscode=>5, :statusmsg=>"execution expired", :data=>{:changes=>nil, :events=>nil, :version=>nil, :time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "out_of_sync"=>0, "restarted"=>0}}}
2014-07-29T00:48:25 err: [372] MCollective call failed in agent 'puppetd', method 'last_run_summary', failed nodes:
ID: 2 - Reason: execution expired

2014-07-29T00:48:25 err: [372] Timeout of deployment is exceeded.

It is unclear:
1) why execution expired
2) why astute gave up with timeout error?

Revision history for this message
Mike Scherbakov (mihgen) wrote :
Revision history for this message
Mike Scherbakov (mihgen) wrote :

According to logs, puppet continued to run:
less node-2.test.domain.local/var/log/puppet.log
 "stop"
Mon Jul 28 23:49:15 +0000 2014 Service[p_rabbitmq-server](provider=pacemaker) (debug): LSO: {"node-3"=>"stop", "node-2"=>"stop", "node-1"=>"start"}
Mon Jul 28 23:49:15 +0000 2014 Service[p_rabbitmq-server](provider=pacemaker) (debug): STATUS IS: stopped

grep puppet node-2.test.domain.local/commands/ps.txt
root 3789 8.7 11.5 443104 236652 ? Rs 23:40 0:46 /usr/bin/ruby /usr/bin/puppet apply /etc/puppet/manifests/site.pp --logdest syslog --trace --no-report --debug --evaltrace --logdest /var/log/puppet.log

I could not find syslog for node-2, wondering where it is in snapshot...

description: updated
Revision history for this message
Mike Scherbakov (mihgen) wrote :
Download full text (3.1 KiB)

mcollective.log on node-2 has errors and exception:

2014-07-29T00:48:26.637543+01:00 debug: D, [2014-07-28T23:48:25.132097 #1103] DEBUG -- : rabbitmq.rb:64:in
`on_hbfire' Received heartbeat from stomp://mcollective@10.108.0.2:61613: receive_fire, curt1406591305.1319
2014-07-29T00:48:26.637543+01:00 debug: E, [2014-07-28T23:48:25.150784 #1103] ERROR -- : agent.rb:111:in `h
andlemsg' puppetd#last_run_summary failed: #<Class:0x7fbe518c5e58>: execution expired
2014-07-29T00:48:26.637543+01:00 debug: D, [2014-07-28T23:48:25.150920 #1103] DEBUG -- : rabbitmq.rb:66:in
`on_hbfire' Publishing heartbeat to stomp://mcollective@10.108.0.2:61613: send_fire, curt1406591305.15084last_sleep30.498820066452
2014-07-29T00:48:26.637543+01:00 debug: E, [2014-07-28T23:48:25.151117 #1103] ERROR -- : agent.rb:112:in `handlemsg' /usr/lib/ruby/1.8/timeout.rb:64:in `puppet_pid'
2014-07-29T00:48:26.637543+01:00 debug: /usr/share/mcollective/plugins/mcollective/agent/puppetd.rb:122:in `puppet_daemon_status'
2014-07-29T00:48:26.637543+01:00 debug: /usr/share/mcollective/plugins/mcollective/agent/puppetd.rb:100:in `set_status'
2014-07-29T00:48:26.637543+01:00 debug: /usr/share/mcollective/plugins/mcollective/agent/puppetd.rb:53:in `last_run_summary_action'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/rpc/agent.rb:88:in `send'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/rpc/agent.rb:88:in `handlemsg'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/agents.rb:126:in `dispatch'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/1.8/timeout.rb:67:in `timeout'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/agents.rb:125:in `dispatch'
ialize'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/agents.rb:121:in `new'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/agents.rb:121:in `dispatch'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/runner.rb:84:in `agentmsg'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/runner.rb:58:in `run'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/runner.rb:53:in `loop'
2014-07-29T00:48:26.637543+01:00 debug: /usr/lib/ruby/vendor_ruby/mcollective/runner.rb:53:in `run'
2014-07-29T00:48:26.637543+01:00 debug: /usr/sbin/mcollectived:52
2014-07-29T00:48:26.637543+01:00 debug: D, [2014-07-28T23:48:25.169685 #1103] DEBUG -- : pluginmanager.rb:83:in `[]' Returning cached plugin security_plugin with class MCollective::Security::Psk
2014-07-29T00:48:26.637751+01:00 debug: D, [2014-07-28T23:48:25.169904 #1103] DEBUG -- : pluginmanager.rb:83:in `[]' Returning cached plugin security_plugin with class MCollective::Security::Psk

....
2014-07-29T00:48:26.637751+01:00 debug: D, [2014-07-28T23:48:25.170218 #1103] DEBUG -- : rabbitmq.rb:258:in `publish' Sending a broadcast message to RabbitMQ target '/reply-queue/amq.gen-hleQms2YqpE1GApozUUEOg' with headers '{"expi...

Read more...

Changed in fuel:
importance: Undecided → High
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

no reproducer so far

Changed in fuel:
status: New → Incomplete
Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

From the snapshot log /10.108.0.2/var/log/docker-logs/mcollective.log :

E, [2014-07-29T00:15:12.942800 #408] ERROR -- : rabbitmq.rb:50:in `on_hbread_fail' Heartbeat read failed from 'stomp://mcollective@10.108.0.2:61613': {"ticker_interval"=>29.5, "read_fail_count"=>0, "lock_fail"=>true, "lock_fail_count"=>1}

So this bug is duplicate of https://bugs.launchpad.net/fuel/+bug/1356954

Changed in fuel:
status: Incomplete → Confirmed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.