Can not get rabbit channel list in 20 second

Bug #1470138 reported by Egor Kotko
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Fuel Library (Deprecated)

Bug Description

{"build_id": "2015-06-29_18-10-01", "build_number": "17", "release_versions": {"2014.2.2-7.0": {"VERSION": {"build_id": "2015-06-29_18-10-01", "build_number": "17", "api": "1.0", "fuel-library_sha": "29935efd3d6c0bfb385d14b94528df418ac81ef3", "nailgun_sha": "026d4ee9570fc844c9b312e3d52c643528b655f6", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-7.0", "production": "docker", "python-fuelclient_sha": "79f6129e3b6b440c96ac9e96041fad4fd2e64379", "astute_sha": "776157f722b13aff5f59bc098cf948793e6498ef", "fuel-ostf_sha": "69e7fa120e8efa7ed74d98efc63079d2f5c69d7b", "release": "7.0", "fuelmain_sha": "4f2dff3bdc327858fa45bcc2853cfbceae68a40c"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "29935efd3d6c0bfb385d14b94528df418ac81ef3", "nailgun_sha": "026d4ee9570fc844c9b312e3d52c643528b655f6", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-7.0", "production": "docker", "python-fuelclient_sha": "79f6129e3b6b440c96ac9e96041fad4fd2e64379", "astute_sha": "776157f722b13aff5f59bc098cf948793e6498ef", "fuel-ostf_sha": "69e7fa120e8efa7ed74d98efc63079d2f5c69d7b", "release": "7.0", "fuelmain_sha": "4f2dff3bdc327858fa45bcc2853cfbceae68a40c"}

http://jenkins-product.srt.mirantis.net:8080/view/7.0_swarm/job/7.0.system_test.centos.services_ha/10/testReport/%28root%29/deploy_murano_ha_with_gre/deploy_murano_ha_with_gre/
Test murano_ha_with_gre failed on OSTF:

Can not get rabbit channel list in 20 second

Scenario:
1. Retrieve cluster status for each controller.
2. Check that the number of rabbit nodes is the same as the number of controllers.
3. Check crm status for rabbit
4. List channels
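
For illustration, step 4 can be sketched as a call to `rabbitmqctl list_channels` with a 20 second limit. This is a minimal sketch and not the actual OSTF code: the function name `list_channels`, its `cmd` and `timeout` parameters, and the use of `RuntimeError` are assumptions made for this example. Only the `rabbitmqctl list_channels` command and the error text come from this report.

```python
import subprocess
import sys


def list_channels(cmd=("rabbitmqctl", "list_channels"), timeout=20):
    """Run the channel listing command and return its output lines.

    Hypothetical helper: raises RuntimeError when the command does not
    finish within `timeout` seconds, mirroring the error text in this report.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode output to str
            timeout=timeout,      # the child is killed if it runs too long
            check=True,           # non-zero exit raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        raise RuntimeError(
            "Can not get rabbit channel list in {} second".format(timeout)
        )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # On a controller node this would list the open channels.
    for line in list_channels():
        print(line)
```

The command is passed in as a parameter so the timeout path can be exercised without a running RabbitMQ node.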

rabbitmqctl list_channels command output:
http://paste.openstack.org/show/328689/

OSTF error log:
http://paste.openstack.org/show/328688/

Tags: rabbitmq
Egor Kotko (ykotko) wrote :
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
Bogdan Dobrelya (bogdando) wrote :

By design, the rabbitmq pacemaker resource agent resets the node if list_channels does not respond. That appears to be what happened here: the affected node was restarted as part of the failover procedure, which caused the OSTF test to fail. This should only be considered a bug if the restarted node failed to recover normal operation. Please clarify the end state.
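
The behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not the resource agent's code: the `monitor` function and its `probe` and `restart` callables are assumptions for this example. The numeric values are the standard OCF return codes.

```python
# Standard OCF resource agent return codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def monitor(probe, restart):
    """Hypothetical monitor step.

    probe()   -> True if list_channels answered in time (assumed callable).
    restart() -> resets the local rabbitmq node (assumed callable).
    """
    if probe():
        return OCF_SUCCESS
    # list_channels did not respond: reset the node and report a failure.
    restart()
    return OCF_ERR_GENERIC
```

Under this model, a health check that runs while the node is being reset would be expected to see the channel listing fail.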

Changed in fuel:
status: Confirmed → Incomplete
Bogdan Dobrelya (bogdando) wrote :

The diagnostic logs snapshot looks corrupted.

Bogdan Dobrelya (bogdando) wrote :

According to the logs, node-1 failed while the OSTF HA check was in progress (2015-06-30T03:29:21). The failure resulted in a partitioned cluster, which later recovered successfully. AMQP connections are present in the logs starting from 2015-06-30T03:29:41.

The issue looks invalid, as entering and recovering from partitions is an ordinary situation in distributed systems. Please describe the final state of the deployed environment: was it operational after it recovered from the partitions?

Logs from autoheal: http://paste.openstack.org/show/348381/

Oleksiy Molchanov (omolchanov) wrote :

Marking as invalid: no update for more than a month.

Changed in fuel:
status: Incomplete → Invalid