Deployment of HA nova-flat cluster failed with (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.78.128-10.108.78.254]) Could not evaluate: Oops - not sure what happened: 757: unexpected token at '<html><body><h1>504 Gateway Time-out</h1>
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Fuel for OpenStack | Fix Released | Critical | Unassigned | |
| 5.1.x | Fix Released | Critical | Aleksandr Didenko | |
Bug Description

{
  "build_id": "2014-11-
  "ostf_sha": "9c6fadca272427
  "build_number": "85",
  "auth_
  "api": "1.0",
  "nailgun_sha": "8330f6221e190d
  "production": "docker",
  "fuelmain_sha": "f536d11fb40fed
  "astute_sha": "3c374c9f7bfbdb
  "feature_
    "mirantis"
  ],
  "release": "6.0",
  "release_
  ],
  "fuellib_sha": "c7b71bd1ee939b
}
Steps:
1. Create a cluster: CentOS, HA, flat nova-network, 3 controller and 2 compute nodes
2. Deploy the cluster
Expected: the cluster is deployed successfully
Actual: deployment failed with errors in the puppet log on node-2
2014-11-10 07:22:03 ERR
(/Stage[
Logs are attached
Changed in fuel:
  milestone: none → 6.0
Changed in fuel:
  assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Changed in fuel:
  status: Triaged → In Progress
Changed in fuel:
  importance: Critical → High
Changed in fuel:
  status: In Progress → Fix Committed
Changed in fuel:
  status: In Progress → Fix Committed
Changed in fuel:
  assignee: Registry Administrators (registry) → nobody
It looks like the problem is caused by HAProxy. When it starts, it reports all backends as "UP"; only after its health checks run does it mark the failing ones "DOWN".
Here is an easy experiment to reproduce this: go to any controller in your HA env, edit the /etc/haproxy/conf.d/170-nova-novncproxy.cfg file and change the node-* IPs to nonexistent ones. Then kill haproxy (it will be restarted automatically by pacemaker) and run this command:
while : ; do date ; echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio | grep 'nova-novncproxy,BACKEND' ; sleep 1 ; done
Tue Nov 11 11:09:35 UTC 2014
2014/11/11 11:09:35 socat[25769] E connect(3, AF=1 "///var/lib/haproxy/stats", 26): Connection refused
Tue Nov 11 11:09:36 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,0,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:37 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,1,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:38 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,2,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:39 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,3,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:40 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,0,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:41 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,1,1,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:42 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,2,2,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:43 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,3,3,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
As you can see, haproxy marks the backend as UP at first and only then marks it DOWN.
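For reference, the backend state in those stat lines is the 18th CSV field of HAProxy's "show stat" output. A minimal sketch (the helper name is mine, not from the report) that extracts it:

```shell
#!/bin/sh
# backend_status NAME: read "show stat" CSV on stdin and print the status
# (UP/DOWN) of the BACKEND row for proxy NAME. The status is CSV field 18
# in HAProxy's stats output.
backend_status() {
    grep "^$1,BACKEND," | cut -d, -f18
}

# Truncated sample rows in the shape of the bug report's output:
up_row='nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0'
down_row='nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0'

echo "$up_row"   | backend_status nova-novncproxy    # prints UP
echo "$down_row" | backend_status nova-novncproxy    # prints DOWN
```

In a live environment the same function can consume `echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio` instead of the canned rows.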
The same thing happens with our "wait-for-haproxy-keystone-backend" exec. In this particular case the chronology was:
2014-11-10T07:20:53.314581+00:00 notice: Proxy keystone-1 started.
2014-11-10T07:20:53.314581+00:00 notice: Proxy keystone-2 started.
*** keystone backend is UP according to haproxy ^^
2014-11-10T07:20:53.597929+00:00 notice: (/Stage[main]/Osnailyfacter::Cluster_ha/Exec[wait-for-haproxy-keystone-backend]/returns) executed successfully
*** puppet thinks keystone backend is ready ^^
2014-11-10T07:20:53.601276+00:00 info: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.78.128-10.108.78.254]) Starting to evaluate the resource
*** puppet tries to evaluate Nova_floating_range but keystone is down ^^
2014-11-10T07:20:55.481416+00:00 alert: Server keystone-1/node-4 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2014-11-10T07:20:55.522536+00:00 alert: Server keystone-1/node-5 is DOWN, reason: Layer4 timeout, check duration: 2000ms. 1 active and 0 b...
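One possible mitigation (a sketch of the general idea only, not the actual committed fix) is to require several consecutive UP readings before declaring a backend ready, so the single optimistic UP that haproxy reports before its first health check completes is not enough to pass the wait:

```shell
#!/bin/sh
# wait_consecutive_up NEEDED CMD...: poll CMD once per second until it has
# printed "UP" NEEDED times in a row; any other reading resets the streak.
wait_consecutive_up() {
    needed=$1; shift
    streak=0
    while [ "$streak" -lt "$needed" ]; do
        if [ "$("$@")" = "UP" ]; then
            streak=$((streak + 1))
        else
            streak=0
        fi
        sleep 1
    done
}

# Hypothetical usage: wait until the keystone-1 backend has been UP for
# 3 consecutive checks before letting puppet continue (status is CSV
# field 18 of HAProxy's "show stat" output).
# wait_consecutive_up 3 sh -c \
#   "echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio \
#    | grep '^keystone-1,BACKEND,' | cut -d, -f18"
```

This trades a few seconds of deployment time for not racing haproxy's first round of health checks.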