Test for controller replacement fails by deployment timeout: 'rabbitmq.pp' task returns error on new controller

Bug #1519366 reported by Artem Panchenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Confirmed
High
Kyrylo Galanov

Bug Description

Fuel version info (8.0 build #200): http://paste.openstack.org/show/479824/

System test 'deploy_ha_neutron_tun_ctrl_replacement' fails, because deployment (removing old and adding new controller) takes more than 2 hours 10 minutes. There are errors in puppet logs on new controller node:

node-5 2015-11-24T02:00:21.171243 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) Could not evaluate: Command is still failing after 180 seconds expired!
node-5 2015-11-24T02:00:21.171823 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /etc/puppet/modules/rabbitmq/lib/puppet/provider/rabbitmqctl.rb:30:in `run_with_retries'
node-5 2015-11-24T02:00:21.172589 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /etc/puppet/modules/rabbitmq/lib/puppet/provider/rabbitmq_user/rabbitmqctl.rb:17:in `retry_rabbitmqctl'
node-5 2015-11-24T02:00:21.172589 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /etc/puppet/modules/rabbitmq/lib/puppet/provider/rabbitmq_user/rabbitmqctl.rb:71:in `exists?'
node-5 2015-11-24T02:00:21.172589 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/property/ensure.rb:81:in `retrieve'
node-5 2015-11-24T02:00:21.172589 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/type.rb:1035:in `retrieve'
node-5 2015-11-24T02:00:21.172589 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/type.rb:1063:in `retrieve_resource'
node-5 2015-11-24T02:00:21.172589 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:223:in `from_resource'
node-5 2015-11-24T02:00:21.173127 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:19:in `evaluate'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction.rb:174:in `apply'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction.rb:187:in `eval_resource'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction.rb:117:in `call'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction.rb:117:in `block (2 levels) in evaluate'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util.rb:327:in `block in thinmark'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/1.9.1/benchmark.rb:295:in `realtime'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util.rb:326:in `thinmark'
node-5 2015-11-24T02:00:21.174125 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction.rb:117:in `block in evaluate'
node-5 2015-11-24T02:00:21.174309 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/graph/relationship_graph.rb:118:in `traverse'
node-5 2015-11-24T02:00:21.174309 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction.rb:108:in `evaluate'
node-5 2015-11-24T02:00:21.174309 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/resource/catalog.rb:164:in `block in apply'
node-5 2015-11-24T02:00:21.174309 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util/log.rb:149:in `with_destination'
node-5 2015-11-24T02:00:21.175026 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/transaction/report.rb:108:in `as_logging_destination'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/resource/catalog.rb:163:in `apply'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/configurer.rb:125:in `block in apply_catalog'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util.rb:161:in `block in benchmark'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/1.9.1/benchmark.rb:295:in `realtime'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util.rb:160:in `benchmark'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/configurer.rb:124:in `apply_catalog'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/configurer.rb:192:in `run'
node-5 2015-11-24T02:00:21.176045 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application/apply.rb:268:in `apply_catalog'
node-5 2015-11-24T02:00:21.176234 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application/apply.rb:218:in `main'
node-5 2015-11-24T02:00:21.176234 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application/apply.rb:146:in `run_command'
node-5 2015-11-24T02:00:21.176234 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application.rb:364:in `block (2 levels) in run'
node-5 2015-11-24T02:00:21.176234 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application.rb:470:in `plugin_hook'
node-5 2015-11-24T02:00:21.176234 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application.rb:364:in `block in run'
node-5 2015-11-24T02:00:21.178088 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util.rb:478:in `exit_on_fail'
node-5 2015-11-24T02:00:21.178088 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/application.rb:364:in `run'
node-5 2015-11-24T02:00:21.178088 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util/command_line.rb:137:in `run'
node-5 2015-11-24T02:00:21.178088 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/lib/ruby/vendor_ruby/puppet/util/command_line.rb:91:in `execute'
node-5 2015-11-24T02:00:21.178088 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) /usr/bin/puppet:4:in `<main>'

Steps to reproduce:

1. Create cluster with neutron vlan
2. Add 3 node with controller role
3. Add 1 node with compute
4. Deploy cluster
5. Remove one controller add new controller
6. Deploy changes

Expected result: cluster is deployed within 7800 sec (2h10m)

Actual result: cluster deployment took more than 7800 sec (2h10m), there are errors in puppet logs on new controller

Diagnostic snapshot is attached.

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :
tags: added: swarm-blocker
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

I've looked at how RabbitMQ started on node-5. Here are excerpts from 10.109.0.2/var/log/docker-logs/remote/node-5.test.domain.local/lrmd.log from the snapshot: http://paste.openstack.org/show/479961/

In general, yes, joining cluster timed out on 1st attempt and so RabbitMQ was restarted. It successfully joined the cluster on the 2nd attempt. And anyway, all that took 5 minutes (instead of 2 minutes) and did not cause unrecoverable errors in puppet (please correct me if I am wrong).

So, to sum up, it looks like rabbitmq was not the cause of the job time out.

Changed in fuel:
status: New → Incomplete
assignee: MOS Oslo (mos-oslo) → Artem Panchenko (apanchenko-8)
Revision history for this message
Vladimir Khlyunev (vkhlyunev) wrote :
Changed in fuel:
assignee: Artem Panchenko (apanchenko-8) → Fuel Library Team (fuel-library)
status: Incomplete → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
removed: area-mos
tags: added: swarm-fail-driver
removed: swarm-blocker
Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

Investigating another rabbitmq issue https://bugs.launchpad.net/fuel/+bug/1528287

tags: added: team-bugfixx
tags: added: team-bugfix
removed: team-bugfixx
Changed in fuel:
status: In Progress → Confirmed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.