Rabbitmq_user provider failed to retry list_users code block

Bug #1429095 reported by Anastasia Palkina
This bug affects 1 person

Affects: Fuel for OpenStack
Status: Won't Fix
Importance: Medium
Assigned to: Dmitry Ilyin

Bug Description

"build_id": "2015-03-05_22-54-44",
"ostf_sha": "8df5f2fcdae3bc9ea7d700ffd64db820baf51914",
"build_number": "165",
"release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-03-05_22-54-44", "ostf_sha": "8df5f2fcdae3bc9ea7d700ffd64db820baf51914", "build_number": "165", "api": "1.0", "nailgun_sha": "f12221d79e0d97c7b4405331e11a54fc5dcfcd4e", "production": "docker", "python-fuelclient_sha": "4eb787f1ad969bd23c93d192865543dbd45a8626", "astute_sha": "ca7635a356a90404d3dedb5cf26f1d16e07144a9", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "0e45b31db1677651d6ddb1c852d62ebfd8875dcd", "fuellib_sha": "07288d7bfde840b7ec47292ff96a3b670a79c859"}}},
"auth_required": true,
"api": "1.0",
"nailgun_sha": "f12221d79e0d97c7b4405331e11a54fc5dcfcd4e",
"production": "docker",
"python-fuelclient_sha": "4eb787f1ad969bd23c93d192865543dbd45a8626",
"astute_sha": "ca7635a356a90404d3dedb5cf26f1d16e07144a9",
"feature_groups": ["mirantis"],
"release": "6.1",
"fuelmain_sha": "0e45b31db1677651d6ddb1c852d62ebfd8875dcd",
"fuellib_sha": "07288d7bfde840b7ec47292ff96a3b670a79c859"

1. Create new environment (Ubuntu)
2. Choose Neutron, GRE
3. Add 3 controller, 2 compute, 1 cinder
4. Start deployment. It was successful
5. Start OSTF tests. It was successful
6. But there is error in puppet.log on second controller (node-6):

2015-03-06 11:04:56 ERR

 (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) Could not evaluate: Execution of '/usr/sbin/rabbitmqctl -q list_users' returned 2: Error: {aborted,{no_exists,[rabbit_user,{internal_user,'_','_','_'}]}}

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Changed in fuel:
importance: High → Medium
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

At that moment in time, 2015-03-06 11:04:56, rabbit was being clustered as part of the deployment process:

2015-03-06T11:04:56.351252+00:00 info: INFO: Stopping node 'rabbit@node-6' ... ...done.
2015-03-06T11:05:06.991297+00:00 info: INFO: p_rabbitmq-server: get_monitor(): rabbit app is running and is member of healthy cluster

The minor issue is that the "Error: {aborted,{no_exists,[rabbit_user,{internal_user,'_','_','_'}]}}" error was expected to be handled by the retry code block of the Rabbitmq_user provider, but it was not:

2015-03-06T10:58:28.110932+00:00 notice: (Scope(Class[main])) MODULAR: openstack-controller.pp
2015-03-06T11:04:56.086379+00:00 debug: Executing '/usr/sbin/rabbitmqctl -q list_users'
2015-03-06T11:04:56.790255+00:00 err: (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) Could not evaluate: Execution of '/usr/sbin/rabbitmqctl -q list_users' returned 2: Error: {aborted,{no_exists,[rabbit_user,{internal_user,'_','_','_'}]}}

The deployment did not fail, though, so the issue is a minor one:
2015-03-06T11:07:32.477530+00:00 notice: (Scope(Class[main])) MODULAR: openstack-controller.pp
2015-03-06T11:09:32.338623+00:00 debug: Executing '/usr/sbin/rabbitmqctl -q list_users'
2015-03-06T11:09:32.834132+00:00 debug: Command succeeded
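For context, the retry wrapper being discussed can be sketched as follows. The helper name `run_with_retries` and its `count`/`step` parameters mirror the provider code quoted later in this report, but the body is an illustrative assumption, not the actual fuel-library implementation:

```ruby
# Minimal sketch of a retry wrapper for rabbitmqctl calls (assumption,
# not the real provider code). The block is retried while it raises,
# up to `count` attempts, sleeping `step` seconds between attempts.
def run_with_retries(count = 30, step = 6)
  count.times do |attempt|
    begin
      return yield
    rescue StandardError
      # e.g. "Error: {aborted,{no_exists,[rabbit_user,...]}}" raised
      # while the node is still (re)joining the cluster
      raise if attempt == count - 1
      sleep step
    end
  end
end

# Hypothetical usage: keep calling until the broker can answer.
# users = run_with_retries { rabbitmqctl('-q', 'list_users') }
```

The bug here is that `list_users` was invoked outside such a wrapper, so a single transient clustering error surfaced as a Puppet resource failure.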

summary: - (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) Could not evaluate:
- Execution of '/usr/sbin/rabbitmqctl -q list_users' returned 2: Error:
- {aborted,{no_exists,[rabbit_user,{internal_user,'_','_','_'}]}}
+ Rabbitmq_user provider failed to retry list_users code block
Changed in fuel:
status: New → Confirmed
Revision history for this message
Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #253

"build_id": "2015-03-30_22-54-44", "ostf_sha": "80036bd4e433b0a1a5b3dc5732608a52135bb2b4", "build_number": "253", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-03-30_22-54-44", "ostf_sha": "80036bd4e433b0a1a5b3dc5732608a52135bb2b4", "build_number": "253", "api": "1.0", "nailgun_sha": "8bc89eee197089ae38a023dd0215caae219f24b1", "production": "docker", "python-fuelclient_sha": "05ec53f94206decdce19bb9373523022e5616b83", "astute_sha": "7292fc2a673cb1c32a688a46fd4836ca0500a957", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "320b5f46fc1b2798f9e86ed7df51d3bda1686c10", "fuellib_sha": "6d366b4e7d2d6722c245c4691a6605e2e3bc3b4a"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "8bc89eee197089ae38a023dd0215caae219f24b1", "production": "docker", "python-fuelclient_sha": "05ec53f94206decdce19bb9373523022e5616b83", "astute_sha": "7292fc2a673cb1c32a688a46fd4836ca0500a957", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "320b5f46fc1b2798f9e86ed7df51d3bda1686c10", "fuellib_sha": "6d366b4e7d2d6722c245c4691a6605e2e3bc3b4a"

1. Create new environment (CentOS)
2. Choose Neutron, GRE
3. Choose Ceph for images, RadosGW
4. Choose Sahara, Murano, Ceilometer
5. Add 3 controller, 1 compute, 1 cinder+mongo, 3 ceph, 2 mongo
6. Deployment was successful
7. Error on third controller (node-10):

2015-03-31 17:06:12 ERR

 (/Stage[main]/Nova::Rabbitmq/Rabbitmq_user[nova]) Could not evaluate: Execution of '/usr/sbin/rabbitmqctl -q list_users' returned 2: Error: {aborted,{no_exists,[rabbit_user,{internal_user,'_','_','_'}]}}

Revision history for this message
Anastasia Palkina (apalkina) wrote :
Dmitry Ilyin (idv1985)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Dmitry Ilyin (idv1985)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/170083

Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
assignee: Dmitry Ilyin (idv1985) → Vladimir Kuklin (vkuklin)
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Dmitry Ilyin (idv1985)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/170083
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=f66e62c4cc2380ace65944896bbdd7ae933e94a2
Submitter: Jenkins
Branch: master

commit f66e62c4cc2380ace65944896bbdd7ae933e94a2
Author: Dmitry Ilyin <email address hidden>
Date: Wed Apr 1 21:07:30 2015 +0300

    Improve rabbitmq retry function

    * Refactor retry function
    * Cover all calls with retries

    Change-Id: Iaf38c93a9f09bfdb44b6b6b94d7f815903d29ddd
    Closes-Bug: 1429095

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :

It seems the issue was reproduced at https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.partition_preservation/76/console

Scenario:
1. Revert the snapshot
2. Create a ceilometer alarm
3. Mark 'mongo' and 'mysql' partitions to be preserved on one of the controllers
4. Reinstall the controller
5. Verify that the alarm is present after the node reinstallation
6. Verify IST has been received for the reinstalled controller
7. Run network verification
8. Run OSTF

There were 3 failures with
`/etc/puppet/modules/rabbitmq/lib/puppet/provider/rabbitmqctl.rb:30:in `run_with_retries'`
while trying to list users.

After the revert, list_users returned valid information.
A possible cause of the timeout is that the test reinstalls the node, and the retry timeouts are not adjusted for that case.

Revision history for this message
Alexandr Kostrikov (akostrikov-mirantis) wrote :

From the moment puppet started checking, something strange was happening with rabbitmq, and it lasted until 00:50:28.
The puppet timeout fired at 00:38:37, which is about 10 minutes too early.
It seems the overall timeout (30 * (6 + 10) = 480 seconds, i.e. 8 minutes) is about two times too small for this case:
run_with_retries(count=30, step=6, timeout=10)
sleep step

http://paste.openstack.org/show/493963/
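The arithmetic above can be double-checked with a quick calculation, assuming each of the 30 attempts burns its full 10-second command timeout plus the 6-second sleep between attempts:

```ruby
count   = 30  # retry attempts
step    = 6   # sleep between attempts, seconds
timeout = 10  # per-attempt command timeout, seconds

# Worst case: every attempt times out, then we sleep before the next one.
worst_case = count * (step + timeout)  # => 480 seconds
puts "#{worst_case} s = #{worst_case / 60} minutes"
```

With rabbitmq unavailable from roughly 00:30 to 00:50:28, an 8-minute budget indeed gives up about halfway through the outage, matching the 00:38:37 failure time.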

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Dmitry Ilyin (idv1985) wrote :

Now we are using the upstream version of the rabbitmq module, and it looks like this bug is fixed there.

Changed in fuel:
status: Confirmed → Won't Fix