Keystone becomes not operable if there is no connectivity on br-mgmt

Bug #1438276 reported by Tatyanka
This bug affects 1 person
Affects             Status        Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released  Critical    Aleksandr Didenko
6.0.x               Invalid       High        Aleksandr Didenko

Bug Description

{"build_id": "2015-03-26_09-08-29", "ostf_sha": "a4cf5f218c6aea98105b10c97a4aed8115c15867", "build_number": "231", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-03-26_09-08-29", "ostf_sha": "a4cf5f218c6aea98105b10c97a4aed8115c15867", "build_number": "231", "api": "1.0", "nailgun_sha": "7f0e0af1f54db840230745ee4f7aec6824dac9b9", "production": "docker", "python-fuelclient_sha": "e5e8389d8d481561a4d7107a99daae07c6ec5177", "astute_sha": "631f96d5a09cc48bfbddcbf056b946c8a80438f0", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "320b5f46fc1b2798f9e86ed7df51d3bda1686c10", "fuellib_sha": "345a98b34dd0cd450a45d405ac47a6a9fa48b6d8"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "7f0e0af1f54db840230745ee4f7aec6824dac9b9", "production": "docker", "python-fuelclient_sha": "e5e8389d8d481561a4d7107a99daae07c6ec5177", "astute_sha": "631f96d5a09cc48bfbddcbf056b946c8a80438f0", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "320b5f46fc1b2798f9e86ed7df51d3bda1686c10", "fuellib_sha": "345a98b34dd0cd450a45d405ac47a6a9fa48b6d8"}

Steps to reproduce:
1. Deploy HA on CentOS with Neutron:
- 3 controllers
- 2 computes
2. When the cluster is ready, run the OSTF HA, smoke and sanity suites.
3. As soon as the tests pass, ssh to any controller and block incoming and outgoing traffic on br-mgmt.
4. Wait until the cluster recovers after failover (I waited for ~30 minutes).
5. Manually check RabbitMQ health and Galera health, and check crm.
6. Try to log in to Horizon.

Actual result:
Authorization failed. After ssh to a node and running ". openrc; nova list", the command failed with a 401 from keystone.
Telnet to memcached on each controller: telnet to the controller where traffic is blocked fails (as expected); on the other 2 controllers the connection to memcached succeeds.
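
The telnet check above can be scripted. The following is a minimal Python sketch, not part of the original report. It assumes memcached listens on its default port 11211; the function names are made up for illustration. It only tests whether a TCP connection can be opened, which is what the telnet check verifies.

```python
import socket

MEMCACHED_PORT = 11211  # memcached default port (assumed unchanged)


def is_reachable(host, port=MEMCACHED_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


def check_controllers(hosts, port=MEMCACHED_PORT):
    """Map each host to True/False depending on memcached reachability."""
    return {host: is_reachable(host, port) for host in hosts}
```

For example, check_controllers(["node-2", "node-4", "node-5"]) would be run from a healthy controller, with the names replaced by the management addresses of the real controllers.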
On the controller where traffic is blocked, check the haproxy backends for keystone:
[root@node-5 ~]# haproxy-status | grep keystone
2015/03/30 10:17:58 socat[4902] E connect(3, AF=1 "/var/lib/haproxy/stats", 24): Connection refused
[root@node-5 ~]#
Check the haproxy backends for keystone from a healthy controller:
[root@node-2 ~]# haproxy-status | grep keystone
keystone-1 FRONTEND Status: OPEN Sessions: 0 Rate: 0
keystone-1 node-2 Status: UP/L7OK Sessions: 0 Rate: 0
keystone-1 node-4 Status: UP/L7OK Sessions: 0 Rate: 0
keystone-1 node-5 Status: DOWN/L4TOUT Sessions: 0 Rate: 0
keystone-1 BACKEND Status: UP Sessions: 0 Rate: 0
keystone-2 FRONTEND Status: OPEN Sessions: 0 Rate: 0
keystone-2 node-2 Status: UP/L7OK Sessions: 0 Rate: 0
keystone-2 node-4 Status: UP/L7OK Sessions: 0 Rate: 0
keystone-2 node-5 Status: DOWN/L4TOUT Sessions: 0 Rate: 0
keystone-2 BACKEND Status: UP Sessions: 0 Rate: 0
[root@node-2 ~]#
(node-5 is our node with blocked traffic)
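
Spotting the DOWN servers in this output can be automated. The sketch below is an illustration only: it assumes the line layout seen in the transcript above ("proxy server Status: state ..."), and the function name down_servers is hypothetical. The sample text is copied from the healthy controller's output.

```python
import re

# Layout assumed from the transcript: "<proxy> <server> Status: <state> ..."
LINE_RE = re.compile(
    r"^(?P<proxy>\S+)\s+(?P<server>\S+)\s+Status:\s+(?P<status>\S+)"
)


def down_servers(status_output):
    """Return (proxy, server, status) tuples for servers not reported as UP."""
    down = []
    for line in status_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # shell prompts, socat errors, blank lines
        if match["server"] in ("FRONTEND", "BACKEND"):
            continue  # aggregate rows, not individual servers
        if not match["status"].startswith("UP"):
            down.append((match["proxy"], match["server"], match["status"]))
    return down


SAMPLE = """\
keystone-1 FRONTEND Status: OPEN Sessions: 0 Rate: 0
keystone-1 node-2 Status: UP/L7OK Sessions: 0 Rate: 0
keystone-1 node-5 Status: DOWN/L4TOUT Sessions: 0 Rate: 0
keystone-1 BACKEND Status: UP Sessions: 0 Rate: 0
"""

print(down_servers(SAMPLE))  # [('keystone-1', 'node-5', 'DOWN/L4TOUT')]
```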

Run ". openrc; nova list" one more time from a healthy controller: the result is again a 401 from keystone.

Edit keystone.conf on both healthy controllers: remove the node with the failed memcached from the [memcache] and [cache] sections, restart keystone on both controllers, and run ". openrc; keystone token-get": it passes.
Run ". openrc; nova list": it fails with a 401 error from keystone. The user cannot pass authorization in Horizon, and services also fail to communicate (because keystone returns a 401 error every time).
http://paste.openstack.org/show/197570/
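
For reference, the manual keystone.conf edit described above could be sketched in Python as follows. This is an illustration only: the option names (servers in [memcache], memcache_servers in [cache]), the helper name and the 192.0.2.x addresses are assumptions, not taken from the affected environment. Note that configparser drops comments when it rewrites a file, and that per this report the edit alone did not make nova list work again.

```python
import configparser
import io

# Assumed (section, option) pairs holding comma-separated host:port lists.
SERVER_OPTIONS = (("memcache", "servers"), ("cache", "memcache_servers"))


def drop_memcached_host(conf_text, failed_host):
    """Return conf_text with failed_host removed from the memcached lists."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section, option in SERVER_OPTIONS:
        if not parser.has_option(section, option):
            continue
        servers = [s.strip() for s in parser.get(section, option).split(",")]
        # Compare on the host part only, ignoring the port.
        kept = [s for s in servers if s and s.rsplit(":", 1)[0] != failed_host]
        parser.set(section, option, ",".join(kept))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical addresses; 192.0.2.6 stands for the isolated controller.
SAMPLE = """\
[memcache]
servers = 192.0.2.3:11211,192.0.2.5:11211,192.0.2.6:11211

[cache]
memcache_servers = 192.0.2.3:11211,192.0.2.5:11211,192.0.2.6:11211
"""

print(drop_memcached_host(SAMPLE, "192.0.2.6"))
```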

Tags: ha keystone
Tatyanka (tatyana-leontovich) wrote :

Created the same issue on mos-keystone: https://bugs.launchpad.net/mos/+bug/1438279 (because I am not sure that this is a library issue). Could you please take a look and, if it is related only to keystone, close this one as a duplicate?

summary: - Keystone becomes not operatable if there is not connectivity on br-mgmt
+ Keystone becomes not operable if there is not connectivity on br-mgmt
summary: - Keystone becomes not operable if there is not connectivity on br-mgmt
+ Keystone becomes not operable if there is no connectivity on br-mgmt
Miroslav Anashkin (manashkin) wrote :

There is a CACHES variable in /etc/openstack-dashboard/local_settings.
It points to all Memcached server instances.
Maybe the failed Memcached instance needs to be excluded from this file as well, followed by an Apache restart.
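
A minimal sketch of what such a CACHES setting might look like with the failed instance left out. The addresses are hypothetical, and the backend path is the memcached backend shipped with Django releases of that period; the actual file on a deployed controller may differ.

```python
# Hypothetical management addresses of the controllers running Memcached.
ALL_MEMCACHED = ["192.0.2.3:11211", "192.0.2.5:11211", "192.0.2.6:11211"]
FAILED = {"192.0.2.6:11211"}  # instance on the isolated controller

CACHES = {
    "default": {
        # Assumed backend; check the value already present in local_settings.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        # Keep only the instances that are still reachable.
        "LOCATION": [s for s in ALL_MEMCACHED if s not in FAILED],
    }
}
```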

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Bogdan Dobrelya (bogdando) wrote :

IIRC, we decided to simulate failover by unplugging interfaces from br-mgmt instead of using iptables rules, as the latter do not work correctly. If possible, please reproduce the issue in the accepted way.

Changed in fuel:
status: New → Incomplete
Aleksandr Didenko (adidenko) wrote :

If you stop keystone on any controller in a Neutron environment, you'll get a lot of 500 errors from nova-api. This happens because the neutron service is misconfigured: it uses management_IP instead of management_VIP as the auth_host parameter.
We also use management_IP instead of management_VIP in the 'openrc' file; it would be better to use management_VIP there too.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/169251

Changed in fuel:
status: Incomplete → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/169251
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=573de5bd1d4eb53a8a2191d3e7aa210fd87b3aa0
Submitter: Jenkins
Branch: master

commit 573de5bd1d4eb53a8a2191d3e7aa210fd87b3aa0
Author: Aleksandr Didenko <email address hidden>
Date: Tue Mar 31 13:19:53 2015 +0300

    Fix auth_host for neutron configs

    * Use management_vip as auth_host in neutron configuration.
    * Switch openrc from internal_address to management_vip.
    * Remove unused parameters from openstack::network class.

    Change-Id: I0b20c6a081753160f6a09031c6ce3dd39b46b869
    Related-bug: #1438276

Changed in fuel:
status: In Progress → Fix Committed
Aleksandr Didenko (adidenko) wrote :

It's not confirmed on 6.0, marking invalid.

Anastasia Palkina (apalkina) wrote :

Verified on ISO #295

"build_id": "2015-04-08_22-54-31", "ostf_sha": "c3b06dba5c96d225882e9f1a465f74eaa6374fbf", "build_number": "295", "release_versions": {"2014.2-6.1": {"VERSION": {"build_id": "2015-04-08_22-54-31", "ostf_sha": "c3b06dba5c96d225882e9f1a465f74eaa6374fbf", "build_number": "295", "api": "1.0", "nailgun_sha": "e5f101635455e94415969907deca510d2ded6b73", "openstack_version": "2014.2-6.1", "production": "docker", "python-fuelclient_sha": "5c94b59bafc8dc1cbecb088020f4ef14ce62044a", "astute_sha": "5041b2fb508e6860c3cb96474ca31ec97e549e8b", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "2ca546b86e651d5638dbb1be9bae44b86c84a893", "fuellib_sha": "dee81f2be4d9063808a6755271ee818314997006"}}}, "auth_required": true, "api": "1.0", "nailgun_sha": "e5f101635455e94415969907deca510d2ded6b73", "openstack_version": "2014.2-6.1", "production": "docker", "python-fuelclient_sha": "5c94b59bafc8dc1cbecb088020f4ef14ce62044a", "astute_sha": "5041b2fb508e6860c3cb96474ca31ec97e549e8b", "feature_groups": ["mirantis"], "release": "6.1", "fuelmain_sha": "2ca546b86e651d5638dbb1be9bae44b86c84a893", "fuellib_sha": "dee81f2be4d9063808a6755271ee818314997006"

Changed in fuel:
status: Fix Committed → Fix Released