ERR: Authorization Failed: Gateway Timeout (HTTP 504)

Bug #1334306 reported by Anastasia Palkina
This bug affects 2 people
Affects               Status     Importance  Assigned to                Milestone
Fuel for OpenStack    Confirmed  Medium      Aleksandr Didenko
  5.0.x               Confirmed  Medium      Fuel Library (Deprecated)
  5.1.x               Confirmed  Medium      Fuel Library (Deprecated)
  6.0.x               Confirmed  Medium      Aleksandr Didenko

Bug Description

{
   "build_id": "2014-06-25_00-31-14",
   "mirantis": "yes",
   "build_number": "270",
   "ostf_sha": "4d2efa822344b6ca022ec4086b6f083c07d90e14",
   "nailgun_sha": "eeb88eecafa11a200de8f169a29975506dda29b2",
   "production": "docker",
   "api": "1.0",
   "fuelmain_sha": "e1fe73e77b7a89a035540390f5a6f6e5c8fb3615",
   "astute_sha": "694b5a55695e01e1c42185bfac9cc7a641a9bd48",
   "release": "5.1",
   "fuellib_sha": "d204858549ce3e118935fb2a9ed8a907dd197bb5"
}

1. Create a new environment (CentOS, HA mode).
2. Choose GRE segmentation.
3. Add 3 controllers and 1 compute node.
4. Check (enable) the OpenStack debugging option.
5. Start the deployment. It fails because the deployment timeout is exceeded.
6. puppet.log on the first controller (node-1) contains errors:

2014-06-25 14:03:35 ERR

 Execution of '/usr/bin/keystone --os-tenant-name services --os-username neutron --os-password DW32vBoG --os-auth-url http://192.168.0.1:35357/v2.0 tenant-list' returned 1: Authorization Failed: Gateway Timeout (HTTP 504)

Anastasia Palkina (apalkina) wrote:
Bogdan Dobrelya (bogdando) wrote:

The first 504 errors encountered are not Keystone-related:
2014-06-25T14:00:28.114376 node-1 ./node-1.domain.tld/puppet-apply.log:2014-06-25T14:00:28.114376+01:00 debug: Non-fatal error: "Execution of '/usr/bin/neutron --os-tenant-name services --os-username neutron --os-password DW32vBoG --os-auth-url http://192.168.0.1:35357/v2.0 net-list' returned 1: <html><body><h1>504 Gateway Time-out</h1>
2014-06-25T14:01:31.366807 node-1 ./node-1.domain.tld/puppet-apply.log:2014-06-25T14:01:31.366807+01:00 err: Could not prefetch neutron_net provider 'neutron': Can't prefetch net-list. Neutron or Keystone API not availaible.

The Keystone errors appear about 3 minutes later.

The Galera cluster looks OK, though:
2014-06-25T13:51:46.013399 node-1 ./node-1.domain.tld/mysqld.log:2014-06-25T13:51:46.013399+01:00 err: 140625 12:51:46 [Note] /usr/sbin/mysqld: ready for connections.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergii Golovatiuk (sgolovatiuk)
Sergii Golovatiuk (sgolovatiuk) wrote:

This bug was addressed in the "ha-pacemaker-improvements" blueprint. I cannot reproduce it on build 274.

Changed in fuel:
status: New → Invalid
Egor Kotko (ykotko) wrote:
Changed in fuel:
status: Invalid → Confirmed
Vladimir Kuklin (vkuklin) wrote:

It looks like a performance issue, since a 504 timeout may be returned when the system is slow or under load. We need atop diagnostic info here.

Sergii Golovatiuk (sgolovatiuk) wrote:

Analyzing the log, I found:
Wed Jun 25 12:59:23 +0000 2014 Exec[haproxy reload for neutron](provider=shell) (debug): Executing '/bin/sh-cexport OCF_ROOT="/usr/lib/ocf"; (ip netns list | grep haproxy) && ip netns exec haproxy /usr/lib/ocf/resource.d/mirantis/ns_haproxy reload'

Wed Jun 25 12:59:25 +0000 2014 Puppet (debug): Executing '/usr/bin/neutron --os-tenant-name services --os-username neutron --os-password DW32vBoG --os-auth-url http://192.168.0.1:35357/v2.0 net-list'
Wed Jun 25 13:00:28 +0000 2014 Puppet (debug): Non-fatal error: "Execution of '/usr/bin/neutron --os-tenant-name services --os-username neutron --os-password DW32vBoG --os-auth-url http://192.168.0.1:35357/v2.0 net-list' returned 1: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

This means that HAProxy was reloaded only 2 seconds before neutron started sending requests through it.
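Given such a short window, a provider could retry a call a few times when its output looks like a transient 504 from HAProxy. A minimal Ruby sketch, assuming the failed command surfaces as a RuntimeError and using a hypothetical `with_transient_retries` helper and `TRANSIENT_504` pattern (neither is part of fuel-library):

```ruby
# Hypothetical helper (not from fuel-library): re-runs the block while it
# raises a RuntimeError whose message looks like a transient 504 from HAProxy.
TRANSIENT_504 = /(504\s+Gateway\s+Time-out)|\(HTTP\s+504\)/

def with_transient_retries(attempts: 5, interval: 2)
  tries = 0
  begin
    tries += 1
    yield
  rescue RuntimeError => e
    # Re-raise non-transient errors, or any error once attempts are used up.
    raise unless e.message =~ TRANSIENT_504 && tries < attempts
    sleep interval
    retry
  end
end

# Usage: a fake CLI call that fails twice with the HTML 504 page, then succeeds.
calls = 0
result = with_transient_retries(attempts: 3, interval: 0) do
  calls += 1
  raise "returned 1: <html><body><h1>504 Gateway Time-out</h1>" if calls < 3
  "net-list ok"
end
puts result # prints "net-list ok"
```

The attempt count and interval are illustrative; they would need to cover the time HAProxy actually takes to become available again after a reload.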

OpenStack Infra (hudson-openstack) wrote: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/116193

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix proposed to fuel-library (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/116195

OpenStack Infra (hudson-openstack) wrote: Change abandoned on fuel-library (stable/5.0)

Change abandoned by Sergii Golovatiuk (<email address hidden>) on branch: stable/5.0
Review: https://review.openstack.org/116195

OpenStack Infra (hudson-openstack) wrote: Change abandoned on fuel-library (master)

Change abandoned by Sergii Golovatiuk (<email address hidden>) on branch: master
Review: https://review.openstack.org/116193

Changed in fuel:
status: In Progress → Fix Committed
Artem Panchenko (apanchenko-8) wrote:

This issue was reproduced again during system tests on CI:

http://jenkins-product.srt.mirantis.net:8080/view/0_master_swarm/job/master_fuelmain.system_test.centos.thread_3/150/testReport/(root)/deploy_neutron_vlan_ha/deploy_neutron_vlan_ha/

Deployment failed due to puppet errors on node-1:

http://paste.openstack.org/show/101721/

I manually checked that Keystone was up at that moment but unavailable via the VIP. Here is the corosync status:

http://paste.openstack.org/show/101722/

and parts of its logs:

http://paste.openstack.org/show/101723/

Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
status: Confirmed → Fix Committed
Vladimir Kuklin (vkuklin) wrote:

Artem, the corosync logs show that the cluster was affected by network partitioning. That is why Pacemaker stopped all the resources on node-1, and HAProxy therefore returned error 504. We are not going to handle 504 errors from Keystone in the neutron providers, because Keystone must already be ready by the time neutron creates networks.

Sergii Golovatiuk (sgolovatiuk) wrote:

There is a wrong regexp in provider.rb. The error message clearly states <html><body><h1>504 Gateway Time-out</h1>

and the following expression does not match it:

/(\(HTTP\s+400\))|(\[Errno 111\]\s+Connection\s+refused)|(503\s+Service\s+Unavailable)|(Max\s+retries\s+exceeded)|(Unable\sto\sestablish\sconnection\sto) |\(HTTP\s+50[34]\)/

It should be

/(\(HTTP\s+400\))|(\[Errno 111\]\s+Connection\s+refused)|(503\s+Service\s+Unavailable)|(Max\s+retries\s+exceeded)|(Unable\sto\sestablish\sconnection\sto)|(504\s+Gateway\s+Time-out)/

because the 503 and 504 responses have their own error descriptions and are not wrapped in "(HTTP ...)".
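The difference can be checked quickly in Ruby. A minimal sketch: `OLD_RE` is the expression quoted above (stray space included), `PROPOSED_RE` is the suggested replacement, and the two sample strings are copied from the logs in this report. One caveat worth checking: the proposed expression drops the `\(HTTP\s+50[34]\)` alternative, so it no longer matches the keystone CLI form "Gateway Timeout (HTTP 504)" from the bug description. `COMBINED_RE` is a hypothetical union (not from the review) that keeps both forms:

```ruby
# Expression currently in provider.rb, as quoted in this comment.
OLD_RE = /(\(HTTP\s+400\))|(\[Errno 111\]\s+Connection\s+refused)|(503\s+Service\s+Unavailable)|(Max\s+retries\s+exceeded)|(Unable\sto\sestablish\sconnection\sto) |\(HTTP\s+50[34]\)/

# Proposed replacement.
PROPOSED_RE = /(\(HTTP\s+400\))|(\[Errno 111\]\s+Connection\s+refused)|(503\s+Service\s+Unavailable)|(Max\s+retries\s+exceeded)|(Unable\sto\sestablish\sconnection\sto)|(504\s+Gateway\s+Time-out)/

# Hypothetical union accepting both the HTML page and the CLI "(HTTP 50x)" form.
COMBINED_RE = Regexp.union(PROPOSED_RE, /\(HTTP\s+50[34]\)/)

# Samples taken from the neutron and keystone log lines in this report.
HTML_504 = "returned 1: <html><body><h1>504 Gateway Time-out</h1>"
CLI_504  = "returned 1: Authorization Failed: Gateway Timeout (HTTP 504)"

{ "OLD_RE" => OLD_RE, "PROPOSED_RE" => PROPOSED_RE, "COMBINED_RE" => COMBINED_RE }.each do |name, re|
  puts format("%-12s html=%-5s cli=%s", name, re.match?(HTML_504), re.match?(CLI_504))
end
```

OLD_RE matches only the CLI message, PROPOSED_RE only the HTML page, and COMBINED_RE both.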

Changed in fuel:
status: Fix Committed → In Progress
Changed in fuel:
importance: High → Medium
Kirill Omelchenko (komelchenko) wrote:

Also affected:
http://jenkins-product.srt.mirantis.net:8080/view/0_master_swarm/job/master_fuelmain.system_test.centos.thread_3/151/testReport/%28root%29/deploy_ha_vlan/deploy_ha_vlan/

{
   "build_id": "2014-08-29_00-01-17",
   "ostf_sha": "4dcd99cc4bfa19f52d4b87ed321eb84ff03844da",
   "build_number": "486",
   "auth_required": true,
   "api": "1.0",
   "nailgun_sha": "a762c6029ba852e73ad6aef89fbcd0a8afb79d87",
   "production": "docker",
   "fuelmain_sha": "c450b341ea416813e8358026754d85228f4513eb",
   "astute_sha": "bc60b7d027ab244039f48c505ac52ab8eb0a990c",
   "feature_groups": [
      "mirantis"
   ],
   "release": "5.1",
   "fuellib_sha": "639ac9e633b13e9cbdb93abee0423a881d70b105"
}

2014-08-29 01:22:36 ERR
 (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.26.128-10.108.26.254]) Could not evaluate: Oops - not sure what happened: 757: unexpected token at '<html><body><h1>504 Gateway Time-out</h1>
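The "757: unexpected token at '<html>..." wording is the message format of Ruby's `JSON::ParserError`, which suggests the HTML 504 page was handed to a JSON parser. A minimal sketch of a guard, assuming a hypothetical `parse_api_body` helper and `TransientApiError` class (neither comes from the Nova provider) and assuming error pages shaped like the HAProxy 504 page shown earlier in this report:

```ruby
require 'json'

# Hypothetical error class for gateway failures that are worth retrying.
class TransientApiError < StandardError; end

# Hypothetical helper (not from the Nova provider): detect an HAProxy-style
# HTML error page before JSON parsing, so it is reported as a gateway error
# rather than as "unexpected token at '<html>...".
def parse_api_body(body)
  if (m = body.match(%r{<h1>\s*(50[234])\s+([^<]+)</h1>}))
    raise TransientApiError, "gateway returned #{m[1]} #{m[2].strip}"
  end
  JSON.parse(body)
rescue JSON::ParserError => e
  # Any other non-JSON body is still a hard failure.
  raise ArgumentError, "unparseable API response: #{e.message[0, 80]}"
end

page = "<html><body><h1>504 Gateway Time-out</h1>\n" \
       "The server didn't respond in time.\n</body></html>"
begin
  parse_api_body(page)
rescue TransientApiError => e
  puts e.message # prints "gateway returned 504 Gateway Time-out"
end
p parse_api_body('{"floating_ips": []}')
```

Raising a dedicated error class lets a caller decide whether to retry, instead of failing with an opaque parse error.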

Kirill Omelchenko (komelchenko) wrote:

Also affected:

HA + Ceilometer:
- 3x Controllers
- 3x MongoDB

{
   "build_id": "2014-09-12_00-01-11",
   "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
   "build_number": "4",
   "auth_required": true,
   "api": "1.0",
   "nailgun_sha": "d389bc6489fe296c9c210f7c65ac84e154a8b82b",
   "production": "docker",
   "fuelmain_sha": "d899675a5a393625f8166b29099d26f45d527035",
   "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
   "feature_groups": [
      "experimental"
   ],
   "release": "5.1",
   "release_versions": {
      "2014.1.1-5.1": {
         "VERSION": {
            "build_id": "2014-09-12_00-01-11",
            "ostf_sha": "64cb59c681658a7a55cc2c09d079072a41beb346",
            "build_number": "4",
            "api": "1.0",
            "nailgun_sha": "d389bc6489fe296c9c210f7c65ac84e154a8b82b",
            "production": "docker",
            "fuelmain_sha": "d899675a5a393625f8166b29099d26f45d527035",
            "astute_sha": "f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13",
            "feature_groups": [
               "experimental"
            ],
            "release": "5.1",
            "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"
         }
      }
   },
   "fuellib_sha": "395fd9d20a003603cc9ad26e16cb13c1c45e24e6"
}
