[OSTF] HA tests failures after several stops/starts of RabbitMQ
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Fuel for OpenStack |
Invalid
|
High
|
Kyrylo Galanov | ||
Mitaka |
Invalid
|
High
|
Kyrylo Galanov |
Bug Description
Detailed bug description:
HA tests failures after several stops/starts of RabbitMQ.
AssertionError: Failed 6 OSTF tests; should fail 0 tests. Names of failed tests:
- Check state of haproxy backends on controllers (failure) Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
- Check data replication over mysql (failure) Can not connect to mysql. Please check that mysql is running and there is connectivity by management network Please refer to OpenStack logs for more details.
- Check if amount of tables in databases is the same on each node (failure) Can list tables Please refer to OpenStack logs for more details.
- Check galera environment state (failure) Verification of galera cluster node status failed Please refer to OpenStack logs for more details.
- RabbitMQ availability (failure) Cannot retrieve cluster nodes Please refer to OpenStack logs for more details.
- RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.109.11.4 from controller node! Please refer to OpenStack logs for more details.
Failures on Swarm job: https:/
Steps to reproduce:
Steps from 'ha_neutron_
1. SSH to controller and get rabbit master
2. Destroy not rabbit master node
3. Check that rabbit master stay as was
4. Run ostf ha
5. Turn on destroyed slave
6. Check rabbit master is the same
7. Run ostf ha
8. Destroy rabbit master node
9. Check that new rabbit-master appears
10. Run ostf ha
11. Power on destroyed node
12. Check that new rabbit-master was not elected
13. Run ostf ha
Expected results:
All tests are passed
Actual result:
Failed 6 OSTF tests
Description of the environment:
9.1 snapshot #31
[root@nailgun log]# shotgun2 short-report
cat /etc/fuel_build_id:
495
cat /etc/fuel_
495
cat /etc/fuel_release:
9.0
cat /etc/fuel_
mitaka-9.0
rpm -qa | egrep 'fuel|astute|
fuel-release-
fuel-misc-
python-
fuel-bootstrap
fuel-migrate-
rubygem-
fuel-mirror-
shotgun-
fuel-openstack
fuel-notify-
nailgun-
python-
fuel-9.
fuel-utils-
fuel-setup-
fuel-provision
fuel-library9.
network-
fuel-agent-
fuel-ui-
fuel-ostf-
fuelmenu-
fuel-nailgun-
Notes:
Similar issue is observed in Swarm job: https:/
description: | updated |
summary: |
- [OSTF] HA tests failures after several stops/starts RabbitMQ + [OSTF] HA tests failures after several stops/starts of RabbitMQ |
description: | updated |
tags: | added: swarm-fail |
Changed in fuel: | |
status: | Incomplete → Confirmed |
Changed in fuel: | |
assignee: | l23network (l23network) → Kyrylo Galanov (kgalanov) |
tags: | added: 9.1-proposed |
Changed in fuel: | |
status: | Confirmed → Incomplete |
there something with network, node-4 failed with all haproxy backends because of:
2016-07- 21T05:34: 39.013846+ 00:00 node-4 haproxy[11954]: Server nova-novncproxy /node-1 is DOWN, reason: Layer4 connection problem, info: "General socket error (Network is unreachable)"
there something with haproxy namespace. from the node-4 all other controllers not accessible:
root@node-4:~# ip netns exec haproxy ping 10.109.11.3
PING 10.109.11.3 (10.109.11.3) 56(84) bytes of data.
^C
--- 10.109.11.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1003ms
but, on the other hand, from the node-3 the node-4 accessible:
root@node-3:~# ip netns exec haproxy ping 10.109.11.4 352/3.625/ 1.273 ms
PING 10.109.11.4 (10.109.11.4) 56(84) bytes of data.
64 bytes from 10.109.11.4: icmp_seq=1 ttl=63 time=1.08 ms
64 bytes from 10.109.11.4: icmp_seq=2 ttl=63 time=3.62 ms
^C
--- 10.109.11.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.080/2.