OpenStack cluster does not work after failover of primary controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Fuel for OpenStack | Fix Released | Critical | Vladimir Kuklin |
4.1.x | Fix Released | Critical | Registry Administrators |
5.0.x | Fix Released | Critical | Vladimir Kuklin |
Bug Description
{"build_id": "2014-05-
Steps to Reproduce:
1. Deploy an environment with 3 controllers + 2 computes using Nova VLAN networking.
2. When the deployment finishes successfully, run OSTF to make sure everything works.
3. Run Rally benchmark tests (create/delete instance) and OSTF.
4. While the tests are running, force off the primary controller (in my deployment it is node-1).
5. Wait until the VIPs and other HA services have recovered.
6. Run OSTF.
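Step 5 is the least precise part of the reproduction. One way to make it repeatable is to poll `crm_mon -1` until a chosen set of Pacemaker resources reports `Started`. This is a hypothetical helper, not part of Fuel or OSTF: it assumes the crm_mon 1.1.x text layout (primitives printed as `name (class:provider:type): Started node`, clone sets as a `Clone Set:` line followed by `Started: [ ... ]`) and does not handle master/slave sets. The function names, timeout and polling interval are illustrative.

```python
import re
import subprocess
import time

# "Clone Set: clone_p_haproxy [p_haproxy]" opens a clone block.
CLONE_RE = re.compile(r"Clone Set:\s+(\S+)")
# Primitive line, e.g. "vip__public_old (ocf::mirantis:ns_IPaddr2): Started node-2".
PRIMITIVE_RE = re.compile(r"(\S+)\s+\(\S+\):\s+Started\b")


def started_resources(crm_output):
    """Return names of primitives and clone sets that crm_mon reports as Started."""
    started = set()
    current_clone = None
    for raw in crm_output.splitlines():
        line = raw.strip()
        clone = CLONE_RE.match(line)
        if clone:
            current_clone = clone.group(1)
            continue
        if current_clone and line.startswith("Started:"):
            # A clone counts as started if at least one node is listed.
            if line[len("Started:"):].strip(" []"):
                started.add(current_clone)
            continue
        primitive = PRIMITIVE_RE.match(line)
        if primitive:
            started.add(primitive.group(1))
            current_clone = None
    return started


def wait_for_recovery(required, timeout=600.0, interval=10.0):
    """Poll `crm_mon -1` until every name in `required` is Started; False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        output = subprocess.run(
            ["crm_mon", "-1"], capture_output=True, text=True
        ).stdout
        if set(required) <= started_resources(output):
            return True
        time.sleep(interval)
    return False
```

For example, `wait_for_recovery(["vip__public_old", "clone_p_haproxy", "clone_p_mysql"])` would block until those resources report Started on at least one surviving controller, or return False after 10 minutes. Note that a "recovered" Pacemaker state is only a precondition: as the rest of this report shows, OSTF can still fail if the computes never reconnect to RabbitMQ.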
Expected Result:
The OpenStack cluster is operational. OSTF passes. The user can successfully create/delete instances in Horizon.
Actual result:
OSTF fails; instances are not created/deleted.
Queue status:
http://
RabbitMQ cluster status:
[root@node-2 ~]# rabbitmqctl cluster_status
Cluster status of node 'rabbit@node-2' ...
[{nodes,
{running_
{partitions,[]}]
...done.
[root@node-2 ~]#
Pacemaker status (crm):
[root@node-2 ~]# crm_mon -1
Last updated: Thu May 22 15:47:26 2014
Last change: Thu May 22 12:40:37 2014 via cibadmin on node-3.
Stack: classic openais (with plugin)
Current DC: node-2.
Version: 1.1.10-
3 Nodes configured, 3 expected votes
9 Resources configured
Online: [ node-2.
OFFLINE: [ node-1.
vip__managemen
vip__public_old (ocf::mirantis:
Clone Set: clone_p_haproxy [p_haproxy]
Started: [ node-2.
Stopped: [ node-1.
Clone Set: clone_p_mysql [p_mysql]
Started: [ node-2.
Stopped: [ node-1.
openstack-
[root@node-2 ~]#
On the compute node I do not see any RabbitMQ connection at all:
-leasefile-ro --domain=novalocal --no-hosts --addn-
[root@node-4 ~]# lsof -p 21836 | grep IP
nova-comp 21836 nova 20u IPv4 89794 0t0 TCP node-4:
nova-comp 21836 nova 21u IPv4 94273 0t0 TCP node-4:
nova-comp 21836 nova 22u IPv4 94275 0t0 TCP node-4:
nova-comp 21836 nova 23u IPv4 94287 0t0 TCP node-4:
[root@node-4 ~]#
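To check this systematically rather than by reading `lsof` output by eye, the NAME column of each TCP entry can be parsed for connections to the AMQP port. This is a hypothetical sketch: it assumes RabbitMQ listens on the default AMQP port 5672 (adjust `ports` if your deployment uses another), and it accepts both the numeric port and the `amqp` service name that plain `lsof` prints when it resolves port names. The helper runs `lsof -nP -a -p PID -i TCP`, which restricts output to the process's TCP sockets with numeric hosts and ports.

```python
import re
import subprocess

# Matches the NAME column of an lsof TCP entry,
# e.g. "TCP node-4:45678->node-2:5672 (ESTABLISHED)".
CONN_RE = re.compile(
    r"TCP\s+(?P<local>\S+)->(?P<remote>\S+)\s+\((?P<state>[A-Z_0-9]+)\)"
)

# Assumption: default AMQP port; "amqp" covers lsof output without -P.
DEFAULT_PORTS = ("5672", "amqp")


def amqp_connections(lsof_output, ports=DEFAULT_PORTS):
    """Return (remote, state) pairs for TCP connections to any port in `ports`."""
    found = []
    for line in lsof_output.splitlines():
        match = CONN_RE.search(line)
        if not match:
            continue  # listening sockets, UDP, or non-network files
        remote = match.group("remote")
        # The port is whatever follows the last colon (works for "[::1]:5672" too).
        if remote.rsplit(":", 1)[-1] in ports:
            found.append((remote, match.group("state")))
    return found


def lsof_tcp(pid):
    """Run lsof for one PID, TCP sockets only, numeric hosts and ports."""
    result = subprocess.run(
        ["lsof", "-nP", "-a", "-p", str(pid), "-i", "TCP"],
        capture_output=True,
        text=True,
    )
    return result.stdout
```

An empty list from `amqp_connections(lsof_tcp(21836))` would correspond to the situation described above: nova-compute holds no connection to RabbitMQ at all. Returning the TCP state as well makes half-closed sockets (for example `CLOSE_WAIT` to the dead controller) visible instead of silently counting them as healthy.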
[root@node-4 ~]# lsof -p 21836 | grep 56714-05-22 15:48:54.561 21836 DEBUG nova.compute.
There are also a lot of errors on the compute nodes:
2014-05-22 15:48:54.561 21836 DEBUG nova.openstack.
2014-05-22 15:49:54.561 21836 ERROR nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
2014-05-22 15:49:54.561 21836 TRACE nova.servicegro
3
[root@node-4 ~]#
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Dmitry Borodaenko (dborodaenko)
Changed in fuel:
assignee: Dmitry Borodaenko (dborodaenko) → Vladimir Kuklin (vkuklin)
tags: added: to-be-covered-by-tests
no longer affects: fuel/5.1.x
Changed in fuel:
milestone: 5.0 → 5.1
Changed in fuel:
status: Fix Committed → Fix Released
This commit seems to improve recovery significantly: /review.openstack.org/95007
https:/
I am not sure it will make the described test steps pass (OSTF might still fail if the controller is lost in the middle of a test run), but it does reduce post-failover recovery time considerably.