rabbitmqctl list_channels timeout causes RabbitMQ restart

Bug #1566816 reported by Dmitry Stepanenko
This bug affects 1 person
Affects                  Status    Importance  Assigned to    Milestone
Mirantis OpenStack (status tracked in 10.0.x)
  10.0.x                 Invalid   High        Alexey Galkin
  9.x                    Invalid   High        Alexey Galkin

Bug Description

Detailed bug description:

BVT_2 build #197 failed. Several log messages (nova-all.log, neutron-all.log, neutron/server.log) report that the AMQP server is unreachable. lrmd.log contains several messages like this:

2016-04-03T14:22:01.180966+00:00 info: INFO: p_rabbitmq-server[16942]: get_status(): failed with code 69. Command output: Error: unable to connect to node 'rabbit@messaging-node-2': nodedown

There are also several messages saying that 'rabbitmqctl list_channels' timed out:
2016-04-03T14:07:13.201907+00:00 err: ERROR: p_rabbitmq-server[3919]: get_monitor(): 'rabbitmqctl list_channels' timed out, per-node explanation:

RabbitMQ was restarted because of these timeouts and as a result the test failed.

Expected results:
RabbitMQ works seamlessly, test passes

Actual result:
RabbitMQ restarts, test fails

Reproducibility:

Seen once, in build #197 of the BVT_2 task.

See the attachment for further clarification.

Revision history for this message
Dmitry Stepanenko (dstepanenko) wrote :
Dina Belova (dbelova)
Changed in mos:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → MOS Oslo (mos-oslo)
milestone: none → 9.0
tags: added: area-oslo
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

version

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote : Re: RabbitMQ falls down with error "Error: unable to connect to node 'rabbit@messaging-node-2': nodedown"

The issue was found only once, but we should still check the log files to make sure that we don't have a bug here.

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

QA team, please reproduce the issue and provide us with an environment where 'rabbitmqctl list_channels' times out. Here are the suggested steps:

1. After deploying the environment and before running the test, execute the following command on one of the controllers:
 crm_resource --resource p_rabbitmq-server --set-parameter max_rabbitmqctl_timeouts --parameter-value 1000000
   If rabbitmq restarts, wait for it to come up.

2. Run some tests.
3. After running the tests, examine the end of /var/log/node-X.domain.tld/lrmd.log for each controller. If you see lines like the following there:
2016-04-03T14:17:16.020640+00:00 err: ERROR: p_rabbitmq-server[20677]: get_monitor(): 'rabbitmqctl list_channels' timed out 5 of max. 3 time(s) in a row and is not responding. The resource is failed.

then you have reproduced the issue.
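
Step 3 can be partly automated. The following is a minimal sketch, not part of the original instructions: it scans lrmd.log lines for the 'rabbitmqctl list_channels' timeout message. The regular expression is modeled only on the two log lines quoted in this report, so the exact wording in other versions of the resource agent may differ, and the function names are hypothetical.

```python
import re
import sys

# Pattern modeled on the lrmd.log lines quoted in this bug report.
# Assumption: other versions of the resource agent may word the message differently.
TIMEOUT_RE = re.compile(
    r"(?P<timestamp>\S+)\s+err: ERROR: p_rabbitmq-server\[(?P<pid>\d+)\]: "
    r"get_monitor\(\): 'rabbitmqctl list_channels' timed out"
    r"(?: (?P<count>\d+) of max\. (?P<max>\d+) time\(s\) in a row)?"
)


def find_list_channels_timeouts(lines):
    """Return one dict per log line that reports a list_channels timeout."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match is None:
            continue
        count = match.group("count")
        maximum = match.group("max")
        hits.append({
            "timestamp": match.group("timestamp"),
            "pid": int(match.group("pid")),
            # The counters are absent in the shorter form of the message.
            "count": int(count) if count is not None else None,
            "max": int(maximum) if maximum is not None else None,
        })
    return hits


def scan_log(path):
    """Scan a single lrmd.log file (hypothetical helper)."""
    with open(path, errors="replace") as handle:
        return find_list_channels_timeouts(handle)


if __name__ == "__main__":
    # Usage: python find_timeouts.py /var/log/node-X.domain.tld/lrmd.log ...
    for log_path in sys.argv[1:]:
        for hit in scan_log(log_path):
            print(log_path, hit)
```

Any hit means the timeout occurred on that controller. A hit where both counters are present corresponds to the "timed out N of max. M time(s) in a row" form shown in step 3.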

Changed in mos:
assignee: MOS Oslo (mos-oslo) → Alexey Galkin (agalkin)
summary: - RabbitMQ falls down with error "Error: unable to connect to node
- 'rabbit@messaging-node-2': nodedown"
+ rabbitmqctl list_channels timeout causes RabbitMQ restart
description: updated
Changed in mos:
milestone: 9.0 → 9.0-updates
Revision history for this message
Alexey Galkin (agalkin) wrote :

This bug cannot be reproduced. Temporarily moved to Invalid.
