Deployment of HA nova-flat cluster failed with (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.78.128-10.108.78.254]) Could not evaluate: Oops - not sure what happened: 757: unexpected token at '<html><body><h1>504 Gateway Time-out</h1>
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Fuel for OpenStack | Fix Released | Critical | Unassigned | |
| 5.1.x | Fix Released | Critical | Aleksandr Didenko | |
Bug Description

{
  "build_id": "2014-11-
  "ostf_sha": "9c6fadca272427
  "build_number": "85",
  "auth_
  "api": "1.0",
  "nailgun_sha": "8330f6221e190d
  "production": "docker",
  "fuelmain_sha": "f536d11fb40fed
  "astute_sha": "3c374c9f7bfbdb
  "feature_
    "mirantis"
  ],
  "release": "6.0",
  "release_
  ],
  "fuellib_sha": "c7b71bd1ee939b
}
Steps:
1. Create a cluster: CentOS, HA, flat nova-network, 3 controller and 2 compute nodes
2. Deploy the cluster
Expected: the cluster is deployed successfully
Actual: deployment failed with errors in the puppet log on node-2
2014-11-10 07:22:03 ERR
(/Stage[
Logs are attached
Changed in fuel:
  milestone: none → 6.0
Changed in fuel:
  assignee: Fuel Library Team (fuel-library) → Aleksandr Didenko (adidenko)
Changed in fuel:
  status: Triaged → In Progress
Changed in fuel:
  importance: Critical → High
Changed in fuel:
  status: In Progress → Fix Committed
Changed in fuel:
  status: In Progress → Fix Committed
Changed in fuel:
  assignee: Registry Administrators (registry) → nobody
It looks like the problem is caused by HAProxy. When it starts, it reports all backends as "UP"; only after its health checks run does it mark the failing ones "DOWN".
Here is an easy experiment to reproduce this: go to any controller in your HA env, edit the /etc/haproxy/conf.d/170-nova-novncproxy.cfg file and change the node-* IPs to nonexistent ones. Then kill haproxy (it will be restarted automatically by pacemaker) and run this command:
while : ; do date ; echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio | grep 'nova-novncproxy,BACKEND' ; sleep 1 ; done
Tue Nov 11 11:09:35 UTC 2014
2014/11/11 11:09:35 socat[25769] E connect(3, AF=1 "///var/lib/haproxy/stats", 26): Connection refused
Tue Nov 11 11:09:36 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,0,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:37 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,1,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:38 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,2,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:39 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0,,0,3,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:40 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,0,0,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:41 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,1,1,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:42 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,2,2,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
Tue Nov 11 11:09:43 UTC 2014
nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0,,1,3,3,,1,18,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,,0,0,0,0,0,0,-1,,,0,0,0,0,
As you can see, haproxy marks the backend as UP at first and only then marks it DOWN.
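For reference, the backend state in those stat lines is the 18th CSV field of HAProxy's "show stat" output. A minimal sketch (the helper name is mine, not from the report) that extracts it:

```shell
#!/bin/sh
# backend_status NAME: read "show stat" CSV on stdin and print the status
# (UP/DOWN) of the BACKEND row for proxy NAME. The status is CSV field 18
# in HAProxy's stats output.
backend_status() {
    grep "^$1,BACKEND," | cut -d, -f18
}

# Truncated sample rows in the shape of the bug report's output:
up_row='nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,UP,3,3,0'
down_row='nova-novncproxy,BACKEND,0,0,0,0,800,0,0,0,0,0,,0,0,0,0,DOWN,0,0,0'

echo "$up_row"   | backend_status nova-novncproxy    # prints UP
echo "$down_row" | backend_status nova-novncproxy    # prints DOWN
```

In a live environment the same function can consume `echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio` instead of the canned rows.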
The same thing happens with our "wait-for-haproxy-keystone-backend" exec. In this particular case the chronology was:
2014-11-10T07:20:53.314581+00:00 notice: Proxy keystone-1 started.
2014-11-10T07:20:53.314581+00:00 notice: Proxy keystone-2 started.
*** keystone backend is UP according to haproxy ^^
2014-11-10T07:20:53.597929+00:00 notice: (/Stage[main]/Osnailyfacter::Cluster_ha/Exec[wait-for-haproxy-keystone-backend]/returns) executed successfully
*** puppet thinks keystone backend is ready ^^
2014-11-10T07:20:53.601276+00:00 info: (/Stage[main]/Osnailyfacter::Cluster_ha/Nova_floating_range[10.108.78.128-10.108.78.254]) Starting to evaluate the resource
*** puppet tries to evaluate Nova_floating_range but keystone is down ^^
2014-11-10T07:20:55.481416+00:00 alert: Server keystone-1/node-4 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2014-11-10T07:20:55.522536+00:00 alert: Server keystone-1/node-5 is DOWN, reason: Layer4 timeout, check duration: 2000ms. 1 active and 0 b...
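One possible mitigation (a sketch of the general idea only, not the actual committed fix) is to require several consecutive UP readings before declaring a backend ready, so the single optimistic UP that haproxy reports before its first health check completes is not enough to pass the wait:

```shell
#!/bin/sh
# wait_consecutive_up NEEDED CMD...: poll CMD once per second until it has
# printed "UP" NEEDED times in a row; any other reading resets the streak.
wait_consecutive_up() {
    needed=$1; shift
    streak=0
    while [ "$streak" -lt "$needed" ]; do
        if [ "$("$@")" = "UP" ]; then
            streak=$((streak + 1))
        else
            streak=0
        fi
        sleep 1
    done
}

# Hypothetical usage: wait until the keystone-1 backend has been UP for
# 3 consecutive checks before letting puppet continue (status is CSV
# field 18 of HAProxy's "show stat" output).
# wait_consecutive_up 3 sh -c \
#   "echo show stat | socat unix-connect:///var/lib/haproxy/stats stdio \
#    | grep '^keystone-1,BACKEND,' | cut -d, -f18"
```

This trades a few seconds of deployment time for not racing haproxy's first round of health checks.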