Deployment failed because HAProxy stats backend was unavailable

Bug #1459357 reported by Alexander Kurenyshev
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Released
High
Dmitry Ilyin
6.1.x
Invalid
High
Denis Meltsaykin
7.0.x
Fix Released
High
Dmitry Ilyin

Bug Description

Steps to reproduce:
1. Create cluster
2. Add 3 nodes with controller role
3. Add 2 node with compute role
4. Setup bonding for all interfaces except admin/pxe
5. Run network verification
6. Deploy the cluster

Expected result:
Deploy passed.

Actual result:
Deploy failed.

At the Astute log:
2015-05-27T04:29:21 err: [653] Error running RPC method granular_deploy: Deployment failed on nodes 3, trace:
["/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:68:in `block (3 levels) in deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:51:in `each_slice'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:51:in `block (2 levels) in deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:49:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:49:in `block in deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:45:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/deployment_engine.rb:45:in `deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:171:in `deploy_cluster'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/orchestrator.rb:56:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/dispatcher.rb:111:in `granular_deploy'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:142:in `dispatch_message'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:103:in `block in dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `call'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:64:in `block in each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/task_queue.rb:56:in `each'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `each_with_index'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:101:in `dispatch'",
 "/usr/lib64/ruby/gems/2.1.0/gems/astute-6.1.0/lib/astute/server/server.rb:85:in `block in perform_main_job'"]

At the puppet log from node-3:
Wed May 27 04:19:26 +0000 2015 Puppet (err): Could not run: execution expired
/usr/lib/ruby/1.8/timeout.rb:64:in `open'
/usr/lib/ruby/1.8/net/http.rb:560:in `connect'
/usr/lib/ruby/1.8/timeout.rb:53:in `timeout'
/usr/lib/ruby/1.8/timeout.rb:101:in `timeout'
/usr/lib/ruby/1.8/net/http.rb:560:in `connect'
/usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
/usr/lib/ruby/1.8/net/http.rb:542:in `start'
/usr/lib/ruby/1.8/open-uri.rb:242:in `open_http'
/usr/lib/ruby/1.8/open-uri.rb:616:in `buffer_open'
/usr/lib/ruby/1.8/open-uri.rb:164:in `open_loop'
/usr/lib/ruby/1.8/open-uri.rb:162:in `catch'
/usr/lib/ruby/1.8/open-uri.rb:162:in `open_loop'
/usr/lib/ruby/1.8/open-uri.rb:132:in `open_uri'
/usr/lib/ruby/1.8/open-uri.rb:518:in `open'
/usr/lib/ruby/1.8/open-uri.rb:30:in `open'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:135:in `get_csv_url'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:20:in `csv'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:14:in `csv'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:12:in `times'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:12:in `csv'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:41:in `stats'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:101:in `exists?'
/etc/puppet/modules/haproxy/lib/puppet/provider/haproxy_backend_status/haproxy.rb:71:in `ensure'
/usr/lib/ruby/site_ruby/1.8/puppet/property.rb:471:in `send'
/usr/lib/ruby/site_ruby/1.8/puppet/property.rb:471:in `retrieve'
/usr/lib/ruby/site_ruby/1.8/puppet/type.rb:1035:in `retrieve'
/usr/lib/ruby/site_ruby/1.8/puppet/type.rb:1063:in `retrieve_resource'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:223:in `from_resource'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/resource_harness.rb:17:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:174:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:187:in `eval_resource'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:327:in `thinmark'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:326:in `thinmark'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:117:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/graph/relationship_graph.rb:118:in `traverse'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction.rb:108:in `evaluate'
/usr/lib/ruby/site_ruby/1.8/puppet/resource/catalog.rb:164:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/util/log.rb:149:in `with_destination'
/usr/lib/ruby/site_ruby/1.8/puppet/transaction/report.rb:108:in `as_logging_destination'
/usr/lib/ruby/site_ruby/1.8/puppet/resource/catalog.rb:163:in `apply'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:125:in `apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:161:in `benchmark'
/usr/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:160:in `benchmark'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:124:in `apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:192:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application/apply.rb:268:in `apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/application/apply.rb:218:in `main'
/usr/lib/ruby/site_ruby/1.8/puppet/application/apply.rb:146:in `run_command'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:364:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:470:in `plugin_hook'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:364:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:478:in `exit_on_fail'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:364:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util/command_line.rb:137:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util/command_line.rb:91:in `execute'
/usr/bin/puppet:4

Fuel used:
 feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "469"
  build_id: "2015-05-26_16-19-56"
  nailgun_sha: "f737675091bd1903aace0e36812e855ce47dfec7"
  python-fuelclient_sha: "e19f1b65792f84c4a18b5a9473f85ef3ba172fce"
  astute_sha: "0bd72c72369e743376864e8e8dabfe873d40450a"
  fuel-library_sha: "5cde6f16688486a37f3b280501e98a70883e0d06"
  fuel-ostf_sha: "87819878bc0ca572900e1f6933d9b99e666d6f62"
  fuelmain_sha: "13b3e9cf074ba1cf1ae06509c55fbab613c73f4e"

Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :
Ryan Moe (rmoe)
summary: - Deploy failed with "Error running RPC method granular_deploy"
+ Deployment failed because HAProxy stats backend was unavailable
Ryan Moe (rmoe)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Ryan Moe (rmoe)
Revision history for this message
Ryan Moe (rmoe) wrote :

I'm not sure exactly what's going on here. Only node-3 can't connect to the haproxy stats port. During the time node-3 fails to connect, node-2(the other non-primary controller) is able to connect to that port with no issue.

Node-3:
Wed May 27 04:16:40 +0000 2015 Haproxy_backend_status[mysql](provider=haproxy) (debug): Get CSV from url 'http://10.109.11.34:10000/;csv'
Wed May 27 04:16:58 +0000 2015 Puppet (err): Could not run: execution expired

Node-2:
Wed May 27 04:17:03 +0000 2015 Haproxy_backend_status[mysql](provider=haproxy) (debug): Get CSV from url 'http://10.109.11.34:10000/;csv'
Wed May 27 04:17:03 +0000 2015 Haproxy_backend_status[mysql](provider=haproxy) (debug): Return: up

There doesn't seem to be a problem with the management network on node-3 as it's able to join the corosync and galera clusters.

Ryan Moe (rmoe)
Changed in fuel:
assignee: Ryan Moe (rmoe) → Fuel Library Team (fuel-library)
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)
Andrey Maximov (maximov)
Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Dmitry Ilyin (idv1985)
Andrey Maximov (maximov)
Changed in fuel:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/185923
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=3119253955ac49b0ff9c49b84b56206f621e5524
Submitter: Jenkins
Branch: master

commit 3119253955ac49b0ff9c49b84b56206f621e5524
Author: Vladimir Kuklin <email address hidden>
Date: Wed May 27 21:45:12 2015 +0300

    Create a wrapper to apache start and stop

    Stop/Start apache service, on failure repeat operation after timeout

    Closes-bug: #1457893
    Related-Bug: #1459357
    Change-Id: I906139554c7cf770ffdac29f95d3d97f29b87f43
    Signed-off-by: Sergii Golovatiuk <email address hidden>

Revision history for this message
Dmitry Ilyin (idv1985) wrote :

Let's try to reproduce this bug with the updates haproxy backend provider
https://review.openstack.org/#/c/179131/
At least there are improved debug messages

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Latest swarm results showed alb bonding test passing successfully. Let's mark this one as Incomplete until there is an env that we can debug.

Changed in fuel:
status: Confirmed → Incomplete
Revision history for this message
Nastya Urlapova (aurlapova) wrote :
Changed in fuel:
status: Incomplete → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/179131
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=c2f2189978b750b4d9605999ab6a18199d310a89
Submitter: Jenkins
Branch: master

commit c2f2189978b750b4d9605999ab6a18199d310a89
Author: Dmitry Ilyin <email address hidden>
Date: Thu Apr 30 18:25:18 2015 +0300

    Refactor Haproxy_backend_status

    * Clean the code up
    * Improved debugging

    Change-Id: Ie1ba15e2f40c903efdf970447676987a14156ed4
    Related-Bug: 1449219
    Related-Bug: 1459357

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/209924
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=ff2274e44ab732d943976d1b21c8997dc90a7d94
Submitter: Jenkins
Branch: master

commit ff2274e44ab732d943976d1b21c8997dc90a7d94
Author: vsaienko <email address hidden>
Date: Fri Aug 14 11:45:12 2015 +0300

    Tune tweak::apache_wrappers module

    - Sometimes apache fails to start after stop, due to unclosed
      resources. The problem frequently reproduced with keystone wsgi
      module, and didn't reproduced with horizon or radosgw.
      'apachectl restart' is recommended if doing start/stop rapidly
      https://wiki.apache.org/httpd/CouldNotBindToAddress
    - Redefine restart => 'apachectl graceful' for apache service
    - Remove disabling of GarbageCollector

    Related-Bug: #1472675
    Related-Bug: #1484066
    Related-Bug: #1457893
    Related-Bug: #1459357

    Change-Id: I34843639eacc9bcb6d451d3376440c8bfe9014f7

Changed in fuel:
status: Fix Committed → Confirmed
status: Confirmed → Fix Committed
tags: added: on-verification
Revision history for this message
Vasiliy Pleshakov (vpleshakov) wrote :
Download full text (4.6 KiB)

Verified with fuel-7.0-256-2015-08-30_09-35-52.iso
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "256"
  build_id: "2015-08-30_09-35-52"
  nailgun_sha: "3189ccfb8c1dac888e351f535b03bdbc9d392406"
  python-fuelclient_sha: "9643fa07f1290071511066804f962f62fe27b512"
  fuel-agent_sha: "1e8f38bbb864ed99aa8fe862b6367e82afec3263"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "53c86cba593ddbac776ce5a3360240274c20738c"
  fuel-library_sha: "bc04a7092d92400c79e6ea6ede25e7b67c6a6355"
  fuel-ostf_sha: "644db51186dc23c9b27e9b5486c120c8363dc87c"
  fuelmain_sha: "0e54d68392b359bc122e5bbba9249c729eeaf579"

Deploy passed.

[root@fuel remote]# grep haproxy_backend_status * -ri|grep csv -A1
node-18.domain.tld/puppet-apply.log:2015-09-15T14:53:47.692579+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-18.domain.tld/puppet-apply.log:2015-09-15T14:53:47.696191+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Got status: 'DOWN'
--
node-18.domain.tld/puppet-apply.log:2015-09-15T14:53:47.699614+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-18.domain.tld/puppet-apply.log:2015-09-15T14:53:47.702681+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Got status: 'DOWN'
node-18.domain.tld/puppet-apply.log:2015-09-15T14:53:53.703071+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-18.domain.tld/puppet-apply.log:2015-09-15T14:53:53.706283+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Got status: 'UP'
--
SKIP
--
node-18.domain.tld/puppet-apply.log:2015-09-15T15:22:37.910465+00:00 debug: (Haproxy_backend_status[keystone-public](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-18.domain.tld/puppet-apply.log:2015-09-15T15:22:38.186860+00:00 debug: (Haproxy_backend_status[keystone-public](provider=haproxy)) Got status: 'UP'
--
SKIP
--
node-19.domain.tld/puppet-apply.log:2015-09-15T16:01:59.032336+00:00 debug: (Haproxy_backend_status[keystone-public](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-19.domain.tld/puppet-apply.log:2015-09-15T16:01:59.037852+00:00 debug: (Haproxy_backend_status[keystone-public](provider=haproxy)) Got status: 'UP'
--
node-20.domain.tld/puppet-apply.log:2015-09-15T15:39:45.491916+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-20.domain.tld/puppet-apply.log:2015-09-15T15:39:45.495927+00:00 debug: (Haproxy_backend_status[mysql](provider=haproxy)) Got status: 'UP'
--
SKIP
--
node-20.domain.tld/puppet-apply.log:2015-09-15T16:02:28.390916+00:00 debug: (Haproxy_backend_status[keystone-public](provider=haproxy)) Get CSV from url: 'http://192.168.0.9:10000/;csv'
node-20.domain.tld/puppet-apply.log:2015-09-15T16:02:28.394767+00:00 debug: (Haproxy_backend_status[keystone-public](provider=haproxy)) Got status: 'UP'

root@node-18:~# cat /etc/networ...

Read more...

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification
Revision history for this message
lidiwen (lidiwencs) wrote :

Does this issue fix on Fuel-6.1.iso?

Revision history for this message
Nastya Urlapova (aurlapova) wrote :

We have to backport it to 6.1-updates, request from customer was received.

Changed in fuel:
milestone: 6.1 → 7.0-updates
tags: added: customer-found
Revision history for this message
Abhimanyu (abhikec09) wrote :

I am facing the same issue in FUEL 7.0.

Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Such a scenario is being tested in the 6.1.system_test.ubuntu.bonding_ha swarm test, which is successful for a long time. Therefore, marking this report as Invalid for 6.1.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.