[OSTF] HA tests failures after several stops/starts of RabbitMQ

Bug #1605266 reported by Andrey Lavrentyev
This bug affects 2 people

Affects             Status   Importance  Assigned to     Milestone
Fuel for OpenStack  Invalid  High        Kyrylo Galanov
  Mitaka            Invalid  High        Kyrylo Galanov

Bug Description

Detailed bug description:
HA tests failures after several stops/starts of RabbitMQ.
AssertionError: Failed 6 OSTF tests; should fail 0 tests. Names of failed tests:
  - Check state of haproxy backends on controllers (failure) Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
  - Check data replication over mysql (failure) Can not connect to mysql. Please check that mysql is running and there is connectivity by management network Please refer to OpenStack logs for more details.
  - Check if amount of tables in databases is the same on each node (failure) Can list tables Please refer to OpenStack logs for more details.
  - Check galera environment state (failure) Verification of galera cluster node status failed Please refer to OpenStack logs for more details.
  - RabbitMQ availability (failure) Cannot retrieve cluster nodes Please refer to OpenStack logs for more details.
  - RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.109.11.4 from controller node! Please refer to OpenStack logs for more details.

Failures on Swarm job: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/2/consoleFull
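
For reference, a rough sketch of manual checks that correspond to the failing OSTF tests, run on any controller. The haproxy stats socket path and passwordless mysql root access are assumptions based on a typical MOS 9.x controller and may differ on this environment:

 # RabbitMQ availability / replication: all controllers should be listed as running nodes
 rabbitmqctl cluster_status

 # Galera checks: expect wsrep_cluster_size equal to the number of controllers and state 'Synced'
 mysql -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_local_state_comment');"

 # haproxy backends check: print proxy, server and status (UP/DOWN) for every entry
 # (the stats socket path is an assumption; check haproxy.cfg if it differs)
 echo "show stat" | socat stdio UNIX-CONNECT:/var/lib/haproxy/stats | awk -F, '{print $1, $2, $18}'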

Steps to reproduce:
Steps from the 'ha_neutron_test_3_1_rabbit_failover' test:
1. SSH to a controller and determine the current RabbitMQ master (see the sketch after these steps)
2. Destroy a node that is not the RabbitMQ master
3. Check that the RabbitMQ master stays the same
4. Run OSTF HA tests
5. Power on the destroyed slave node
6. Check that the RabbitMQ master is still the same
7. Run OSTF HA tests
8. Destroy the RabbitMQ master node
9. Check that a new RabbitMQ master is elected
10. Run OSTF HA tests
11. Power on the destroyed node
12. Check that a new RabbitMQ master was not elected
13. Run OSTF HA tests
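
A minimal sketch of how the RabbitMQ master in step 1 can be identified, assuming RabbitMQ is managed by Pacemaker as a multi-state resource (the usual Fuel setup; the exact resource name may differ on this environment):

 # The node listed under "Masters" for the rabbitmq resource is the current master
 pcs status --full | grep -A 3 -i rabbitmq

 # Cross-check with RabbitMQ itself: every controller should appear as a running cluster node
 rabbitmqctl cluster_status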

Expected results:
All OSTF HA tests pass

Actual result:
6 OSTF tests failed

Description of the environment:
9.1 snapshot #31
[root@nailgun log]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8460.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8460.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-mirror-9.0.0-1.mos140.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-openstack-metadata-9.0.0-1.mos8743.noarch
 fuel-notify-9.0.0-1.mos8460.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8460.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8743.noarch
 fuel-library9.0-9.0.0-1.mos8460.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos936.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-nailgun-9.0.0-1.mos8743.noarch

Notes:
A similar issue is observed in the Swarm job https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive_vlan/2/console when executing the 'neutron_l3_migration_after_reset_vlan' tests.

Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

There is something wrong with the network: node-4 reports all of its haproxy backends as down because of:

2016-07-21T05:34:39.013846+00:00 node-4 haproxy[11954]: Server nova-novncproxy/node-1 is DOWN, reason: Layer4 connection problem, info: "General socket error (Network is unreachable)"

There is also something wrong in the haproxy namespace: from node-4, the other controllers are not accessible:

root@node-4:~# ip netns exec haproxy ping 10.109.11.3
PING 10.109.11.3 (10.109.11.3) 56(84) bytes of data.
^C
--- 10.109.11.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1003ms

But, on the other hand, node-4 is accessible from node-3:

root@node-3:~# ip netns exec haproxy ping 10.109.11.4
PING 10.109.11.4 (10.109.11.4) 56(84) bytes of data.
64 bytes from 10.109.11.4: icmp_seq=1 ttl=63 time=1.08 ms
64 bytes from 10.109.11.4: icmp_seq=2 ttl=63 time=3.62 ms
^C
--- 10.109.11.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.080/2.352/3.625/1.273 ms
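
A sketch of follow-up checks on node-4 that could narrow down the "Network is unreachable" error, assuming the problem is inside the haproxy namespace (e.g. an interface or route missing after the node was powered back on):

 # Compare addresses and routes inside the namespace with a healthy controller (e.g. node-3)
 ip netns exec haproxy ip addr
 ip netns exec haproxy ip route

 # Retest connectivity to node-3 once the routes have been compared
 ip netns exec haproxy ping -c 2 10.109.11.3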

tags: added: area-library l23network
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → l23network (l23network)
milestone: none → 10.0
tags: added: swarm-fail
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

It seems the bug is related to the CI environment; we would prefer to have a live environment for debugging. Marking as Incomplete, please re-open as soon as this issue happens again.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
assignee: l23network (l23network) → Kyrylo Galanov (kgalanov)
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

ETA: 8/8/16

tags: added: tricky
Dmitry Pyzhov (dpyzhov)
tags: added: 9.1-proposed
Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

I could not reproduce it locally, and it was not reproduced on CI during the latest jobs. Please reopen if you can provide a broken environment.

Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

Please reopen if the issue comes up again.

Changed in fuel:
status: Incomplete → Invalid