HA nova cluster in error state with 504 error from keystone

Bug #1376686 reported by Tatyanka
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Confirmed
High
MOS Keystone

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.0"
  api: "1.0"
  build_number: "540"
  build_id: "2014-10-02_00-00-10"
  astute_sha: "a3e5da62af91b99f958ab958161d3dcec09c657b"
  fuellib_sha: "df3377534acd70389b551025539152039e260f5c"
  ostf_sha: "f238ecaef0cad7d31aca7e6af2f2523d8fcdc9e7"
  nailgun_sha: "e691dcfbb6cdf1164a70f389ec854185b7c73f32"
  fuelmain_sha: "0a7cfb07241741dba3ecbd14210512b9eb1d88ae"

http://jenkins-product.srt.mirantis.net:8080/view/0_master_swarm/job/master_fuelmain.system_test.centos.thread_5/164/consoleFull
http://pastebin.com/b6aKct63
Deployment od cluster with 3 controllers 1 compute failes with next error in puppet:
http://pastebin.com/CsMPaeM4

It is happens on first controller: [root@nailgun ~]# fuel node
id | status | name | cluster | ip | mac | roles | pending_roles | online
---|-------------|---------------------|---------|-------------|-------------------|------------|---------------|-------
1 | error | slave-02_controller | 1 | 10.108.30.3 | 64:40:c2:7a:4c:70 | controller | | True
3 | provisioned | slave-03_controller | 1 | 10.108.30.5 | 64:3b:3a:86:fc:2f | controller | | True
5 | error | slave-05_compute | 1 | 10.108.30.7 | 64:c1:29:dd:ba:41 | compute | | True
2 | provisioned | slave-01_controller | 1 | 10.108.30.4 | 64:23:54:a0:25:a1 | controller | | True
4 | provisioned | slave-04_compute | 1 | 10.108.30.6 | 64:59:33:96:a3:ce | compute | | True

if we look at openrc:[root@node-1 ~]# cat ~/openrc
#!/bin/sh
export OS_NO_CACHE='true'
export OS_TENANT_NAME='admin'
export OS_USERNAME='admin'
export OS_PASSWORD='admin'
export OS_AUTH_URL='http://10.108.32.2:5000/v2.0/'

if we try to run command keystone service-list we wiil see:

[root@node-1 ~]# keystone --debug service-list
DEBUG:keystoneclient.session:REQ: curl -i -X POST http://10.108.32.2:5000/v2.0/tokens -H "Content-Type: application/json" -H "User-Agent: python-keystoneclient" -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "admin"}}}'
INFO:urllib3.connectionpool:Starting new HTTP connection (1): 10.108.32.2
Authorization Failed: Unable to establish connection to http://10.108.32.2:5000/v2.0/tokens

at the same time
[root@node-1 ~]# ps uax| grep haproxy
root 12678 0.0 0.0 103244 864 pts/0 S+ 11:35 0:00 grep haproxy
[root@node-1 ~]# ip netns exec haproxy ifconfig
hapr-ns Link encap:Ethernet HWaddr 7A:63:DF:74:AD:B9
          inet addr:240.0.0.2 Bcast:0.0.0.0 Mask:255.255.255.252
          inet6 addr: fe80::7863:dfff:fe74:adb9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:18 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1489 (1.4 KiB) TX bytes:1804 (1.7 KiB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:5 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:440 (440.0 b) TX bytes:440 (440.0 b)

Also crm says that qourum do not created, but I can not see error in puppet about this
[root@node-1 ~]# crm status
Last updated: Thu Oct 2 11:36:52 2014
Last change: Thu Oct 2 00:15:03 2014 via crm_attribute on node-1.test.domain.local
Stack: classic openais (with plugin)
Current DC: node-1.test.domain.local - partition WITHOUT quorum
Version: 1.1.10-14.el6_5.3-368c726
1 Nodes configured, 2 expected votes
7 Resources configured

Online: [ node-1.test.domain.local ]

Tags: ha
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Revision history for this message
Aleksandr Didenko (adidenko) wrote :
Download full text (5.8 KiB)

Puppet tried to evaluate Nova_floating_range resource 3 times during 3 apply attempts:

--
2014-10-02T00:46:10.103006+01:00 info: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.31.128-10.108.31.254]) Starting to evaluate the resource
2014-10-02T00:47:10.390866+01:00 err: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.31.128-10.108.31.254]) Could not evaluate: Authentication failed with response code 504
--
2014-10-02T00:59:27.169018+01:00 info: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.31.128-10.108.31.254]) Starting to evaluate the resource
2014-10-02T01:00:27.184067+01:00 err: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.31.128-10.108.31.254]) Could not evaluate: Authentication failed with response code 504
--
2014-10-02T01:10:50.201245+01:00 info: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.31.128-10.108.31.254]) Starting to evaluate the resource
2014-10-02T01:11:50.232199+01:00 err: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.31.128-10.108.31.254]) Could not evaluate: Authentication failed with response code 504

And all 3 times got 504 from keystone. According to haproxy logs, keystone backend was running, but it timed out (504 error):

--
2014-10-02T00:47:10.122969+01:00 info: 10.108.32.3:44245 [01/Oct/2014:23:46:10.118] keystone-1 keystone-1/node-1 0/0/0/-1/60004 504 194 - - sH-- 1/0/0/0/0 0/0 "POST /v2.0/tokens HTTP/1.1"
--
2014-10-02T01:00:27.173470+01:00 info: 10.108.32.3:53001 [01/Oct/2014:23:59:27.283] keystone-1 keystone-1/node-1 0/0/0/-1/60003 504 194 - - sH-- 1/0/0/0/0 0/0 "POST /v2.0/tokens HTTP/1.1"
--
2014-10-02T01:11:50.218234+01:00 info: 10.108.32.3:60821 [02/Oct/2014:00:10:50.234] keystone-1 keystone-1/node-1 0/0/0/-1/60003 504 194 - - sH-- 0/0/0/0/0 0/0 "POST /v2.0/tokens HTTP/1.1"

There's a strange pause in keystone-all.log for more than 60 seconds (00:46:10 -00:47:10) which most likely caused those 504 gateway timeout errors:

2014-10-02T00:46:10.114923+01:00 debug: 2014-10-01 23:46:10.120 5741 DEBUG keystone.middleware.core [-] Auth token not in the request header. Will not build auth context. process_request /usr/lib/python2.6/site-packages/keystone/middleware/core.py:271
2014-10-02T00:46:10.118667+01:00 debug: 2014-10-01 23:46:10.124 5741 DEBUG keystone.common.wsgi [-] arg_dict: {} __call__ /usr/lib/python2.6/site-packages/keystone/common/wsgi.py:181
2014-10-02T00:46:10.122182+01:00 debug: 2014-10-01 23:46:10.127 5741 DEBUG keystone.notifications [-] CADF Event: {'typeURI': 'http://schemas.dmtf.org/cloud/audit/1.0/event', 'initiator': {'typeURI': 'service/security/account/user', 'host': {'address': '10.108.32.2'}, 'id': 'openstack:4b6b8303-0a70-4011-a1c6-3d83f5dafa0f', 'name': u'0a524d9318014004be49590ac737d7c5'}, 'target': {'typeURI': 'service/security/account/user', 'id': 'openstack:1b969859-2b86-4b54-9fdb-a02fe679566b'}, 'observer': {'typeURI': 'service/security', 'id': 'openstack:2898e02f-11ad-447b-9844-3b69e636ea91'}, 'eventType': 'activity', 'eventTime': '2014-10-01T23:46:10.127696+0000', 'action': 'authenticate', 'outcome': 'pending', 'id': 'openstack:a035cd34-ae9...

Read more...

Changed in fuel:
status: New → Triaged
assignee: Fuel Library Team (fuel-library) → MOS Keystone (mos-keystone)
Revision history for this message
Alexander Makarov (amakarov) wrote :

Please provide environment with repro.

Changed in fuel:
status: Triaged → Incomplete
Changed in fuel:
assignee: MOS Keystone (mos-keystone) → Aleksandr Didenko (adidenko)
Changed in fuel:
assignee: Aleksandr Didenko (adidenko) → Tatyana (tatyana-leontovich)
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

I've managed to reproduce it:

Puppet log:
2014-11-14T16:34:14.870574+00:00 info: (/Stage[main]/Neutron::Server::Notifications/Nova_admin_tenant_id_setter[nova_admin_tenant_id]) Starting to evaluate the resource
2014-11-14T16:35:14.908851+00:00 err: Received error response from Keystone server at http://10.108.2.2:35357/v2.0/tokens: Gateway Time-out

Haproxy log:
2014-11-14T16:35:14.885928+00:00 info: 10.108.2.3:41559 [14/Nov/2014:16:34:15.012] keystone-2 keystone-2/node-1 2/0/0/-1/60006 504 194 - - sH-- 0/0/0/0/0 0/0 "POST /v2.0/tokens HTTP/1.1"

So request "POST /v2.0/tokens HTTP/1.1" was sent at 16:34:14, and it timed out at 16:35:14.

Snapshot with additional keystone logs in fuel-snapshot-2014-11-14_17-28-37/node-1.test.domain.local/var/log/ attached.

Here is access to the env (default root password):
ssh -p 3022 root@172.18.204.14

Changed in fuel:
status: Incomplete → Confirmed
assignee: Tatyana (tatyana-leontovich) → MOS Keystone (mos-keystone)
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.