astute erroneously decides that puppet on nodes has hung and aborts the deployment

Bug #1385079 reported by Alexei Sheplyakov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Invalid
Medium
Alexei Sheplyakov
6.0.x
Invalid
Medium
Alexei Sheplyakov
6.1.x
Invalid
Medium
Alexei Sheplyakov

Bug Description

Environment: Icehouse on Ubuntu 14.04

For some reason astute thinks that puppet agent on the (controller) node has hung:

2014-10-23T11:59:15 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'uploadfile', method 'upload', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:msg=>"File was uploaded!"}}
2014-10-23T11:59:15 info: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: Required attrs/metadata passed via facts extension. Starting deployment.
2014-10-23T11:59:15 debug: [429] Waiting for puppet to finish deployment on all
                           nodes (timeout = 5400 sec)...
2014-10-23T11:59:16 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414061955, :output=>"Currently idling; last completed run 1414061955 seconds ago"}}
2014-10-23T11:59:16 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414061956, :output=>"Currently idling; last completed run 1414061956 seconds ago"}}
2014-10-23T11:59:46 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414061986, :output=>"Currently idling; last completed run 1414061986 seconds ago"}}
2014-10-23T12:00:16 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414062016, :output=>"Currently idling; last completed run 1414062016 seconds ago"}}
2014-10-23T12:00:46 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414062046, :output=>"Currently idling; last completed run 1414062046 seconds ago"}}
2014-10-23T12:01:16 debug: [429] puppetd_runonce completed within 120 seconds.
2014-10-23T12:01:16 warning: [429] Following nodes have puppet hung: '3'
2014-10-23T12:01:16 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414062076, :output=>"Currently idling; last completed run 1414062076 seconds ago"}}
2014-10-23T12:01:16 debug: [429] Nodes statuses: {"succeed"=>[], "error"=>["3"], "running"=>[]}
2014-10-23T12:01:16 debug: [429] Puppet on node "3" will be restarted. 1 retries remained.

After several attempts astute aborts the deployment:

2014-10-23T12:04:48 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414062288, :output=>"Currently idling; last completed run 1414062288 seconds ago"}}
2014-10-23T12:05:18 debug: [429] puppetd_runonce completed within 120 seconds.
2014-10-23T12:05:18 warning: [429] Following nodes have puppet hung: '3'
2014-10-23T12:05:20 debug: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: MC agent 'puppetd', method 'last_run_summary', results: {:sender=>"3", :statuscode=>0, :statusmsg=>"OK", :data=>{:time=>{"last_run"=>0}, :resources=>{"failed"=>1, "changed"=>0, "total"=>0, "restarted"=>0, "out_of_sync"=>0}, :changes=>nil, :events=>nil, :version=>nil, :status=>"idling", :running=>0, :enabled=>1, :idling=>1, :stopped=>0, :lastrun=>0, :runtime=>1414062320, :output=>"Currently idling; last completed run 1414062320 seconds ago"}}
2014-10-23T12:05:20 debug: [429] Nodes statuses: {"succeed"=>[], "error"=>["3"], "running"=>[]}
2014-10-23T12:05:20 debug: [429] Node "3" has failed to deploy. There is no more retries for puppet run.
2014-10-23T12:05:20 debug: [429] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}
2014-10-23T12:05:20 debug: [429] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}
2014-10-23T12:05:20 info: [429] Casting message to Nailgun: {"method"=>"deploy_resp", "args"=>{"task_uuid"=>"ce4d0357-a19b-4aff-9d13-43b2ef37008d", "nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}}
2014-10-23T12:05:20 info: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: Spent 364.952956756 seconds on puppet run for following nodes(uids): 3
2014-10-23T12:05:20 info: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: Finished deployment of nodes => roles: [{"3"=>"controller"}]
2014-10-23T12:05:20 warning: [429] ce4d0357-a19b-4aff-9d13-43b2ef37008d: Critical nodes with uids: 3 fail to deploy. Stop deployment

However logging into the node and running

LC_ALL=C puppet apply --verbose --debug /etc/puppet/manifests/site.pp

yields no errors.

Tags: astute
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :
Mike Scherbakov (mihgen)
Changed in mos:
assignee: Fuel for Openstack (fuel) → Fuel Library Team (fuel-library)
Changed in mos:
milestone: none → 6.0
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The code to build Fuel 5.1.1 ISO with Ubuntu 14.04 is available here:

https://github.com/asheplyakov/fuel-main
https://github.com/asheplyakov/fuel-library
https://github.com/asheplyakov/fuel-astute
https://github.com/asheplyakov/fuel-web

Use the `ubuntu14_5.1' branch.
The ready to use ISO is being uploaded to public storage, I'll post the link as soon as the upload completes.
The test environment (fuel master node + 2 nodes) is available upon a request.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The Fuel 5.1 with Ubuntu 14.04 ISO can be found here: https://drive.google.com/file/d/0B74GIIFE6F3XWVlGa01ZOUtYRkk/edit

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Looks like a duplicate of bug #1384332.

tags: added: astute
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

I don't think this bug is similar to #1384332. Puppet doesn't report any errors (in fact puppet log is empty).

Dima Shulyak (dshulyak)
Changed in mos:
assignee: Fuel Library Team (fuel-library) → Dima Shulyak (dshulyak)
Revision history for this message
Dima Shulyak (dshulyak) wrote :

The problem with hung mcollective is that puppet agent conflicts wtih current puppet workflow, therefore it was disabled for current puppet version.

root@node-1:~# cat /etc/default/puppet
# Defaults for puppet - sourced by /etc/init.d/puppet

# Enable puppet agent service?
# Setting this to "yes" allows the puppet agent service to run.
# Setting this to "no" keeps the puppet agent service from running.
START=no

# Startup options
DAEMON_OPTS=""

Also there might be other differences in 3.4.3 (14.04) and 3.4.2 (12.04) versions of puppet packages

Changed in mos:
assignee: Dima Shulyak (dshulyak) → MOS Linux (mos-linux)
status: Confirmed → Triaged
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

Guys, hi! As i know, problem with package was solved.I think we can close this bug as fix commited.

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

In fact I'm still experiencing this problem (with Ubuntu 14.04)

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

seems that issue was reproduced on our CI - puppet hungs on node-3 and deployment was marked as failed

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Revision history for this message
Dmitry Ukov (dukov) wrote :

I saw this problem on our deployments. And we were able to determine the flow.
1. Astute starts puppet
2. Every 2 seconds it checks puppet status: pid file exists, procces presented in process list, lock file exists and non-empty
3. Sometimes (especially when environment is slow) puppet pid-file appears before process shows in process list. So if pid-file extists and process is not presented in process table, mcollective agent removes pid file (process appears in process list a few seconds later).
4. Sleep 2 seconds.
5. On next iteration Astute checks status again: pid file exists, procces presented in process list, lock file exists and non-empty
6. Pid file does not exists (removed in step 3), puppet process presented in process table. Conclusion: puppet hangs. Astute marks deployment as failed

We've solved this issue by increasing sleep time for astute
conf[:PUPPET_DEPLOY_INTERVAL] = 15

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

This issue was reproduced again on CI during build verification tests:

http://jenkins-product.srt.mirantis.net:8080/job/6.0.ubuntu.bvt_2/122/

Here is the part of Astute's logs:

http://paste.openstack.org/show/139424/

Diagnostic snapshot is attached.

Revision history for this message
Dmitry Ukov (dukov) wrote :

Update :
to prevent puppet from running on node start up we have changes cobbler snippet template (fuel-library/deployment/puppet/cobbler/templates/snippets/ubuntu_puppet_config.erb)
add following line to this template

in-target update-rc.d puppet disable && \

Changed in mos:
milestone: 6.0 → 6.1
Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

The bug is caused by

1) mcollective shipped with Ubuntu 14.04 is unable to communicate via rabbitmq
2) puppet agent is enabled by default in Ubuntu 14.04

The 1st problem has been solved by updating mcollective (to version 2.3.1), the second one has been addressed by fuel-library

Revision history for this message
Alexei Sheplyakov (asheplyakov) wrote :

Fuel 6.0.x is not supposed to deploy Ubuntu 14.04, setting bug status as Invalid for MOS 6.0.x

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.