[OSTF] HA tests failures after several stops/starts of RabbitMQ

Bug #1605266 reported by Andrey Lavrentyev
This bug affects 2 people

Affects             Status   Importance  Assigned to     Milestone
Fuel for OpenStack  Invalid  High        Kyrylo Galanov
  Mitaka            Invalid  High        Kyrylo Galanov

Bug Description

Detailed bug description:
HA tests failures after several stops/starts of RabbitMQ.
AssertionError: Failed 6 OSTF tests; should fail 0 tests. Names of failed tests:
  - Check state of haproxy backends on controllers (failure) Some haproxy backend has down state.. Please refer to OpenStack logs for more details.
  - Check data replication over mysql (failure) Can not connect to mysql. Please check that mysql is running and there is connectivity by management network Please refer to OpenStack logs for more details.
  - Check if amount of tables in databases is the same on each node (failure) Can list tables Please refer to OpenStack logs for more details.
  - Check galera environment state (failure) Verification of galera cluster node status failed Please refer to OpenStack logs for more details.
  - RabbitMQ availability (failure) Cannot retrieve cluster nodes Please refer to OpenStack logs for more details.
  - RabbitMQ replication (failure) Failed to establish AMQP connection to 5673/tcp port on 10.109.11.4 from controller node! Please refer to OpenStack logs for more details.

Failures on Swarm job: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive/2/consoleFull
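
For reference, a rough sketch of manual checks that correspond to the failing OSTF tests, run on any controller. The haproxy stats socket path and passwordless mysql root access are assumptions based on a typical MOS 9.x controller and may differ on this environment:

 # RabbitMQ availability / replication: all controllers should be listed as running nodes
 rabbitmqctl cluster_status

 # Galera checks: expect wsrep_cluster_size equal to the number of controllers and state 'Synced'
 mysql -e "SHOW STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_local_state_comment');"

 # haproxy backends check: print proxy, server and status (UP/DOWN) for every entry
 # (the stats socket path is an assumption; check haproxy.cfg if it differs)
 echo "show stat" | socat stdio UNIX-CONNECT:/var/lib/haproxy/stats | awk -F, '{print $1, $2, $18}'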

Steps to reproduce:
Steps from the 'ha_neutron_test_3_1_rabbit_failover' test:
1. SSH to a controller and determine the current RabbitMQ master (see the sketch after these steps)
2. Destroy a node that is not the RabbitMQ master
3. Check that the RabbitMQ master stays the same
4. Run OSTF HA tests
5. Power on the destroyed slave node
6. Check that the RabbitMQ master is still the same
7. Run OSTF HA tests
8. Destroy the RabbitMQ master node
9. Check that a new RabbitMQ master is elected
10. Run OSTF HA tests
11. Power on the destroyed node
12. Check that a new RabbitMQ master was not elected
13. Run OSTF HA tests
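
A minimal sketch of how the RabbitMQ master in step 1 can be identified, assuming RabbitMQ is managed by Pacemaker as a multi-state resource (the usual Fuel setup; the exact resource name may differ on this environment):

 # The node listed under "Masters" for the rabbitmq resource is the current master
 pcs status --full | grep -A 3 -i rabbitmq

 # Cross-check with RabbitMQ itself: every controller should appear as a running cluster node
 rabbitmqctl cluster_status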

Expected results:
All OSTF HA tests pass

Actual result:
6 OSTF tests failed

Description of the environment:
9.1 snapshot #31
[root@nailgun log]# shotgun2 short-report
cat /etc/fuel_build_id:
 495
cat /etc/fuel_build_number:
 495
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6349.noarch
 fuel-misc-9.0.0-1.mos8460.noarch
 python-packetary-9.0.0-1.mos140.noarch
 fuel-bootstrap-cli-9.0.0-1.mos285.noarch
 fuel-migrate-9.0.0-1.mos8460.noarch
 rubygem-astute-9.0.0-1.mos750.noarch
 fuel-mirror-9.0.0-1.mos140.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-openstack-metadata-9.0.0-1.mos8743.noarch
 fuel-notify-9.0.0-1.mos8460.noarch
 nailgun-mcagents-9.0.0-1.mos750.noarch
 python-fuelclient-9.0.0-1.mos325.noarch
 fuel-9.0.0-1.mos6349.noarch
 fuel-utils-9.0.0-1.mos8460.noarch
 fuel-setup-9.0.0-1.mos6349.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8743.noarch
 fuel-library9.0-9.0.0-1.mos8460.noarch
 network-checker-9.0.0-1.mos74.x86_64
 fuel-agent-9.0.0-1.mos285.noarch
 fuel-ui-9.0.0-1.mos2717.noarch
 fuel-ostf-9.0.0-1.mos936.noarch
 fuelmenu-9.0.0-1.mos274.noarch
 fuel-nailgun-9.0.0-1.mos8743.noarch

Notes:
A similar issue is observed in the Swarm job https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.ha_neutron_destructive_vlan/2/console when executing the 'neutron_l3_migration_after_reset_vlan' tests.

Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

There is something wrong with the network: node-4 reports all of its haproxy backends as down because of:

2016-07-21T05:34:39.013846+00:00 node-4 haproxy[11954]: Server nova-novncproxy/node-1 is DOWN, reason: Layer4 connection problem, info: "General socket error (Network is unreachable)"

There is also something wrong in the haproxy namespace: from node-4, the other controllers are not accessible:

root@node-4:~# ip netns exec haproxy ping 10.109.11.3
PING 10.109.11.3 (10.109.11.3) 56(84) bytes of data.
^C
--- 10.109.11.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1003ms

But, on the other hand, node-4 is accessible from node-3:

root@node-3:~# ip netns exec haproxy ping 10.109.11.4
PING 10.109.11.4 (10.109.11.4) 56(84) bytes of data.
64 bytes from 10.109.11.4: icmp_seq=1 ttl=63 time=1.08 ms
64 bytes from 10.109.11.4: icmp_seq=2 ttl=63 time=3.62 ms
^C
--- 10.109.11.4 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.080/2.352/3.625/1.273 ms
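
A sketch of follow-up checks on node-4 that could narrow down the "Network is unreachable" error, assuming the problem is inside the haproxy namespace (e.g. an interface or route missing after the node was powered back on):

 # Compare addresses and routes inside the namespace with a healthy controller (e.g. node-3)
 ip netns exec haproxy ip addr
 ip netns exec haproxy ip route

 # Retest connectivity to node-3 once the routes have been compared
 ip netns exec haproxy ping -c 2 10.109.11.3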

tags: added: area-library l23network
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → l23network (l23network)
milestone: none → 10.0
tags: added: swarm-fail
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

It seems the bug is related to the CI environment; we would prefer to have a live environment for debugging. Marking as Incomplete, please re-open as soon as this issue happens again.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :
Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
assignee: l23network (l23network) → Kyrylo Galanov (kgalanov)
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

ETA: 8/8/16

tags: added: tricky
Dmitry Pyzhov (dpyzhov)
tags: added: 9.1-proposed
Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

I could not reproduce it locally, and it was not reproduced on CI during the latest jobs. Please reopen if you can provide a broken environment.

Revision history for this message
Kyrylo Galanov (kgalanov) wrote :

Please reopen if the issue comes up again.

Changed in fuel:
status: Incomplete → Invalid