Openstack operations finish with 502 error after failover, with errors on oslo.messaging
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Invalid | Critical | Alexey Khivin |
5.1.x | Invalid | Critical | Alexey Khivin |
6.0.x | Invalid | Critical | Alexey Khivin |
6.1.x | Invalid | Critical | Alexey Khivin |
Bug Description
Steps:
Deploy HA on Ubuntu: 3 controllers, 1 compute, 1 cinder, with Neutron GRE.
1. When the cluster is ready, run OSTF (tests pass).
2. Run a safe reboot of the primary controller.
3. Wait while the system recovers, then run OSTF (OSTF passes again).
4. Kill HAProxy (kill -9 $(pidof haproxy)) on the primary controller.
5. Wait until HAProxy PIDs appear on all controllers and crm reports no problem with the haproxy resource.
6. Run OSTF (HA suite passes, sanity tests pass), then create a volume in Horizon.
7. The environment was left idle for about 12 hours, then I ran OSTF again. The result was the same: HA suite passes, sanity suite passes, and all smoke tests pass except the Cinder tests.
Actual result:
Cinder operations randomly fail with the following errors:
http://
[root@nailgun ~]# cat /etc/fuel/
VERSION:
feature_groups:
- mirantis
production: "docker"
release: "5.1.1"
api: "1.0"
build_number: "48"
build_id: "2014-12-
astute_sha: "ef8aa0fd0e3ce2
fuellib_sha: "a3043477337b4a
ostf_sha: "64cb59c681658a
nailgun_sha: "500e36d08a45db
fuelmain_sha: "7626c5aeedcde7
Other case:
Ubuntu HA with nova-network, 3 controllers + 3 mongo + cinder + compute, with Ceilometer enabled
1. Shut down the primary controller.
2. Wait until the OSTF HA suite passes.
3. Run the OSTF smoke suite.
Instance creation and volume creation fail with:
http://
root@node-18:~# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-18' ...
[{nodes,
{running_
{cluster_
{partitions,[]}]
...done.
RabbitMQ reports that the cluster is OK.
crm status says the same:
Master/Slave Set: master_
Masters: [ node-17 ]
Slaves: [ node-18 ]
root@node-18:~# python test_rabbitmq.py /etc/nova/nova.conf
Connecting to 127.0.0.1:5673... Done!
Connecting to 10.108.3.3:5673... FAILED [[Errno 113] No route to host]
Connecting to 10.108.3.4:5673... Done!
Using host 127.0.0.1:5673
Declaring queue test-rabbit-
Publishing messages... Done!
Consuming message from 127.0.0.1:5673... Done!
Consuming message from 10.108.3.4:5673... Done!
Deleting queue test-rabbit-
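The first part of the output above (the "Connecting to ..." lines) can be approximated with a plain TCP reachability check. This is only a rough sketch and not the actual test_rabbitmq.py, whose source is not included in this report; the function name `check_endpoints` is hypothetical, and the host list is copied from the output above rather than parsed from nova.conf. It checks only that the port accepts a TCP connection, not that AMQP publishing or consuming works.

```python
import socket


def check_endpoints(endpoints, timeout=3.0):
    """Try a TCP connect to each (host, port) pair.

    Returns a dict mapping (host, port) to None on success,
    or to the error text on failure. Hypothetical helper,
    not part of test_rabbitmq.py.
    """
    results = {}
    for host, port in endpoints:
        try:
            # create_connection raises OSError on refusal, timeout or no route
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = None
        except OSError as exc:
            results[(host, port)] = str(exc)
    return results


if __name__ == "__main__":
    # Addresses taken from the output above; adjust for your environment.
    hosts = [("127.0.0.1", 5673), ("10.108.3.3", 5673), ("10.108.3.4", 5673)]
    for (host, port), err in check_endpoints(hosts).items():
        status = "Done!" if err is None else "FAILED [%s]" % err
        print("Connecting to %s:%d... %s" % (host, port, status))
```

An unreachable controller (such as the one that was shut down) would show up as FAILED here, while the surviving nodes should connect.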
I'll attach a new snapshot (please note: the controller nodes are node-16, node-18 and node-17).
Changed in fuel: | |
milestone: | 5.1.2 → 5.1.1 |
milestone: | 5.1.1 → 5.1.2 |
description: | updated |
no longer affects: | fuel/6.1.x |
summary: |
- Cinder operations finish with 502 error after failover
+ OS operations finish with 502 error after failover, with errors on oslo.messaging |
description: | updated |
summary: |
- OS operations finish with 502 error after failover, with errors on oslo.messaging
+ Openstack operations finish with 502 error after failover, with errors on oslo.messaging |
tags: | added: ha |
The logs indicate an issue with the RabbitMQ cluster. As I mentioned in https://bugs.launchpad.net/fuel/+bug/1399181/comments/4, the RabbitMQ failover procedure can take quite a while, and the OSTF HA health checks should be run *prior* to any other checks. Please clarify what the OSTF HA check reported before step 6: did you receive success for "RabbitMQ availability"?
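To illustrate the ordering suggested in this comment (wait for the HA health check to pass, and only then run the other suites), here is a minimal polling sketch. It is an assumption-laden illustration, not part of OSTF: `wait_until` is a hypothetical helper, and the `check` callable stands in for whatever reports "RabbitMQ availability" in your environment. The timeout and interval values are arbitrary.

```python
import time


def wait_until(check, timeout=600.0, interval=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or the timeout expires.

    Returns True if the check passed in time, False otherwise.
    check is a caller-supplied callable (hypothetical stand-in for
    an HA health check); sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The smoke and sanity suites would then be started only if `wait_until(...)` returns True; a False result means the failover had not completed within the chosen timeout.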