Failed haproxy reload breaks deployment of primary controller

Bug #1264388 reported by Dmitry Borodaenko
This bug affects 5 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Dmitry Borodaenko
Milestone: (none listed)

Bug Description

Fuel 4.0 ISO #213, HA Ubuntu Neutron w/ VLAN segmentation, 3x controller+ceph, 2x compute+ceph.

Deployment failed starting with the following error on the primary controller:

 (/Stage[main]/Openstack::Controller_ha/Exec[restart_haproxy]) Failed to call refresh: export OCF_ROOT="/usr/lib/ocf"; /usr/lib/ocf/resource.d/mirantis/haproxy reload returned 1 instead of one of [0,] at /etc/puppet/modules/openstack/manifests/controller_ha.pp:177

It is the same error as observed on 12/23. At that time, the problem was solved by removing this commit from the build:

https://review.openstack.org/61940

Dmitry Borodaenko (angdraug) wrote :

The haproxy.cfg isn't updated with the Fuel version until:

2013-12-26T21:40:29.614107+00:00 notice: (/Stage[main]/Openstack::Controller_ha/Concat[/etc/haproxy/haproxy.cfg]/File[/var/lib/puppet/concat/_etc_haproxy_haproxy.cfg]/ensure) created

Until then, default haproxy.cfg from the deb package is used. With it, the haproxy reload command from the bug description fails as follows:

root@node-1:~# export OCF_ROOT="/usr/lib/ocf"; /usr/lib/ocf/resource.d/mirantis/haproxy reload
haproxy[12281]: INFO: haproxy daemon is not running
haproxy[12281]: INFO: Haproxy daemon is not running. Starting it.
haproxy[12281]: INFO: haproxy daemon is not running
[ALERT] 359/234739 (12441) : [/usr/sbin/haproxy.main()] Cannot create pidfile
haproxy[12281]: ERROR: Error. haproxy daemon returned error 0.

Dmitry Borodaenko (angdraug) wrote :

By 21:47:37, when the haproxy reload failed, haproxy.cfg was already fully populated:

Dec 26 21:47:27 node-1 puppet-apply[1076]: (/Stage[main]/Openstack::Controller_ha/Concat[/etc/haproxy/haproxy.cfg]/File[/etc/haproxy/haproxy.cfg]/content) content changed '{md5}c3bfb0c86138552475dea458e8ab36f3' to '{md5}a252435ce06257d853095acb67187156'

Dmitry Borodaenko (angdraug) wrote :

Compared against a successful deployment with https://review.openstack.org/61940 reverted, the difference in the logs starts with:

2013-12-26T21:47:37.256453+00:00 notice: (/Stage[main]/Openstack::Controller_ha/Exec[restart_haproxy]/returns) [ALERT] 359/214735 (28437) : Starting proxy ceilometer: cannot bind socket
2013-12-26T21:47:37.257114+00:00 notice: (/Stage[main]/Openstack::Controller_ha/Exec[restart_haproxy]/returns) [ALERT] 359/214735 (28437) : Starting proxy ceilometer: cannot bind socket

The above seems to be the reason why haproxy failed to restart. netstat output from node-1 in the failed deployment contains the following:

tcp 0 0 0.0.0.0:8777 0.0.0.0:* LISTEN 20325/python

Compared to netstat from a successful deployment:

tcp 0 0 192.168.0.2:8777 0.0.0.0:* LISTEN 16730/haproxy
tcp 0 0 10.108.1.3:8777 0.0.0.0:* LISTEN 16730/haproxy
tcp 0 0 192.168.0.3:8777 0.0.0.0:* LISTEN 15417/python
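
The difference between the two netstat listings can be checked mechanically. The following is a minimal sketch, assuming the `netstat -lntp` style column layout shown above; the helper name `wildcard_listeners` is hypothetical and not part of Fuel.

```python
def wildcard_listeners(netstat_lines, port):
    """Return (local_address, program) for LISTEN sockets bound to 0.0.0.0:<port>."""
    hits = []
    for line in netstat_lines:
        fields = line.split()
        # Assumed layout: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        host, _, local_port = fields[3].rpartition(":")
        if host == "0.0.0.0" and local_port == str(port):
            hits.append((fields[3], fields[6]))
    return hits


# Lines copied from the failed and the successful deployment above.
failed = ["tcp 0 0 0.0.0.0:8777 0.0.0.0:* LISTEN 20325/python"]
ok = [
    "tcp 0 0 192.168.0.2:8777 0.0.0.0:* LISTEN 16730/haproxy",
    "tcp 0 0 10.108.1.3:8777 0.0.0.0:* LISTEN 16730/haproxy",
    "tcp 0 0 192.168.0.3:8777 0.0.0.0:* LISTEN 15417/python",
]

print(wildcard_listeners(failed, 8777))  # the python process holding 0.0.0.0:8777
print(wildcard_listeners(ok, 8777))      # no wildcard listener
```

Only the failed deployment has a process listening on the wildcard address for port 8777.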

tags: added: ceilometer
tags: added: ha
Dmitry Borodaenko (angdraug) wrote :

Presence of this bug is also confirmed in ISO #214.

In a failed deployment as well as a successful one, ceilometer.conf is configured to use the internal IP (192.168.0.3) as api.host (IP address to bind for ceilometer-api service), so misconfiguration of ceilometer can be ruled out as a root cause.

Changed in fuel:
status: New → Confirmed
Mike Scherbakov (mihgen)
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
milestone: 4.0 → 4.1
Dmitry Borodaenko (angdraug) wrote :

The primary suspect for the root cause is that the ceilometer-api service is not restarted between the time the package is installed and the time haproxy for ceilometer is configured. On Ubuntu, the service is started with its default configuration (binding to 0.0.0.0:8777) as soon as the package is installed. On CentOS, this problem does not occur.
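
The suspected conflict can be reproduced in isolation. This is a minimal sketch, assuming Linux socket semantics and no SO_REUSEADDR on either socket: one socket stands in for a service listening on 0.0.0.0, and a second bind to a specific address on the same port stands in for haproxy. It uses an ephemeral port and 127.0.0.1 rather than port 8777 and the management VIP; `can_bind` is a hypothetical helper.

```python
import errno
import socket


def can_bind(host, port):
    """Try to bind and listen on host:port; return False on EADDRINUSE."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        s.close()


# Stand-in for a service started with its default configuration.
wildcard = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
wildcard.bind(("0.0.0.0", 0))  # port 0: the kernel picks a free port
wildcard.listen(1)
port = wildcard.getsockname()[1]

# Stand-in for haproxy binding a specific address on the same port.
blocked = not can_bind("127.0.0.1", port)
wildcard.close()

print("specific-address bind blocked by wildcard listener:", blocked)
```

While the wildcard listener is up, the second bind fails with "address already in use", which matches the "cannot bind socket" alert in the haproxy output.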

tags: added: ubuntu
Dmitry Borodaenko (angdraug) wrote :

In a deployment on Ubuntu with Neutron/VLAN, with Ceilometer enabled and Savanna and Murano disabled, this problem occurred only on a secondary controller; the deployment didn't fail and completed successfully.

Changed in fuel:
status: Confirmed → In Progress
assignee: Fuel Library Team (fuel-library) → Dmitry Borodaenko (dborodaenko)
Dmitry Borodaenko (angdraug) wrote :

Adding a straight requirement like Service['ceilometer-api'] -> Haproxy_service['ceilometer'] creates a dependency loop in Puppet: there is a dependency chain somewhere from haproxy_service back to ceilometer-api. Most likely it involves Savanna, since deployment still fails if Savanna is enabled and Murano is disabled.

Dmitry Borodaenko (angdraug) wrote :

The dependency loop is even shorter: haproxy_service -> haproxy_done -> galera -> mysql-server -> openstack::db::mysql -> openstack::ceilometer.
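
The loop can be illustrated with a small graph check. This is a sketch only: the chain is the one listed above, read as "left is applied before right", and it assumes that openstack::ceilometer is what manages Service['ceilometer-api'] (an assumption made for illustration, not confirmed by the manifests quoted here). `find_cycle` is a hypothetical helper, not Puppet's own algorithm.

```python
def find_cycle(edges):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    graph = {}
    for before, after in edges:
        graph.setdefault(before, []).append(after)
        graph.setdefault(after, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    state = {node: WHITE for node in graph}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in graph:
        if state[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


chain = ["haproxy_service", "haproxy_done", "galera", "mysql-server",
         "openstack::db::mysql", "openstack::ceilometer"]
existing = list(zip(chain, chain[1:]))
# Assumption: the ceilometer class is what manages the ceilometer-api service.
existing.append(("openstack::ceilometer", "ceilometer-api"))
# The proposed requirement: configure ceilometer-api before its haproxy entry.
proposed = existing + [("ceilometer-api", "haproxy_service")]

print(find_cycle(existing))
print(" -> ".join(find_cycle(proposed)))
```

Under these assumptions the existing ordering is acyclic, and adding the single new edge closes the loop through galera and mysql-server.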

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/65591

Dmitry Borodaenko (angdraug) wrote :

The proposed solution is insufficient due to a fundamental limitation in the Puppet concat module: it does not allow for a configuration file to be generated more than once per Puppet run. Because of that, adding a dependency from ceilometer-api to haproxy still creates a dependency cycle.

There are two possible approaches to resolve this:

1) Patch haproxy to allow loading configuration from multiple files (a patch was published at http://marc.info/?l=haproxy&m=129235503410444 but never got merged), and rewrite the haproxy Puppet module to use separate files instead of concat.

2) Patch Puppet inifile module to support HAProxy configuration file format (section headers are based on indentation instead of [], setting names and values are separated by whitespace instead of =, parts of the same section with the same name may show up many times), then rewrite the haproxy Puppet module to use inifile instead of concat.

I'm in favor of proceeding with the first option.
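
For reference, the format differences listed under option 2 can be sketched as a tiny parser. This is an illustration only, following the description above (an unindented line starts a section, indented lines are whitespace-separated settings, and a section name may repeat); it is not the inifile module's code, and the sample configuration is made up for the example.

```python
def parse_sections(text):
    """Parse text into a list of (header, [(name, value), ...]) pairs.

    A list is used instead of a dict because the same section header
    may appear more than once.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented line: start of a new section.
            sections.append((line, []))
        elif sections:
            # Indented line: setting name, then the rest as its value.
            parts = line.split(None, 1)
            value = parts[1] if len(parts) > 1 else ""
            sections[-1][1].append((parts[0], value))
    return sections


# Made-up sample, not taken from a real Fuel deployment.
sample = """\
listen ceilometer
  bind 192.168.0.2:8777
  server node-1 192.168.0.3:8777 check
listen ceilometer
  bind 10.108.1.3:8777
"""

for header, settings in parse_sections(sample):
    print(header, settings)
```

An ini-style parser keyed on section name would merge or overwrite the two "listen ceilometer" blocks, which is one reason the inifile module would need patching.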

Dmitry Borodaenko (angdraug) wrote :

Approach #1 is implemented and is now being tested.
Review: https://review.openstack.org/65591

Ivan Ponomarev (ivanzipfer) wrote :

We hit the problem described in this bug. With this patch applied, we get:
 (/Stage[cluster_head]/Cluster::Haproxy/Service[haproxy]/ensure) change from stopped to running failed: execution expired
Logs are attached.

{"build_id": "2014-01-30_01-17-41",
"ostf_sha": "338ddf840c229918d1df8c6597588b853d02de4c", "build_number": "68",
"nailgun_sha": "6f9766853968f0bba68a0487b43fe67ba02a05f7",
"fuelmain_sha": "7d8768f2ac7e1e54d16c135e4ebd64722e49179e",
"astute_sha": "d002c3bf626cff96a1d4aec9eb92fc4d5f4542c4", "release": "4.1",
"fuellib_sha": "6cb8a54d91bd552539970b01a80ba5635876bd2c"}

Dmitry Borodaenko (angdraug) wrote :

Patch Set 8 of the patch worked for me on Ubuntu but caused an "execution expired" problem on CentOS; I'm now downloading the logs to confirm whether it was the same issue. I'm currently testing an update that will resolve the remaining ordering problems and work on both CentOS and Ubuntu.

Dmitry Borodaenko (angdraug) wrote :

The current version of the fix (Patch Set 13) has passed testing and is now waiting for code review to finish.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/65591
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=8ebe363bca2a25b443d57d171f96b5cacccf9cc4
Submitter: Jenkins
Branch: master

commit 8ebe363bca2a25b443d57d171f96b5cacccf9cc4
Author: Dmitry Borodaenko <email address hidden>
Date: Fri Jan 3 12:41:31 2014 -0800

    Finish configuring a service before adding it to HAProxy

    Starting an HAProxy instance for ceilometer-api (and possibly other
    similar services) will fail on Ubuntu if the package for that service is
    installed but not yet configured: the service is still running in its
    default configuration binding to 0.0.0.0 instead of just the management
    IP, preventing HAProxy from binding to the management VIP on the same
    port.

    To prevent a dependency loop (Ceilometer, like most other OpenStack
    services, depends on MySQL), Galera dependencies have to be narrowed
    down from the whole of HAProxy setup to just the registration of MySQL
    service with it. HAProxy related parts of controller_ha and galera
    classes are refactored to support reloading HAProxy for each service
    independently, and to group together HAProxy options specific to
    individual services.

    To support granular addition of services into HAProxy, haproxy module
    updated to puppetlabs-haproxy 1.0.1 (0d4c50e) and refactored to split
    haproxy.cfg into multiple files under /etc/haproxy/conf.d using include
    directive as implemented in the patch by Brane F. Gračnar:

    http://marc.info/?l=haproxy&m=129235503410444

    The define_backends option for haproxy class introduced in Fuel is
    renamed to define_backups and ported to the new haproxy module.

    Support for selecting exported haproxy::balancermember resources by tag
    is dropped since it is no longer used (and not relevant since Fuel no
    longer runs puppetmaster).

    An ordered list of controllers is passed from osnailyfacter::cluster_ha
    to openstack::controller_ha so that haproxy::balancermember always sets
    the primary controller as the active server in failover configurations.

    HAProxy is now configured as part the "main" stage, so "cluster_head"
    stage is no longer necessary. MySQL and RabbitMQ listen blocks are
    populated before HAProxy is first started, other listen blocks are added
    incrementally with HAProxy being reloaded via OCF. In HA mode, all
    dbsync invokations now require HAProxy MySQL (Ceilometer and Heat were
    missing), and all OpenStack services now require HAProxy Keystone.

    HAProxy configuration for RadosGW is changed from active-passive
    failover to active-active load balancing between controller nodes.
    Backend IP addresses for RadosGW servers are changed from storage to
    internal.

    HAProxy mode option handling is cleaned up so that the default "http"
    mode now propagates properly for listening services that do not specify
    "tcp" mode explicitly.

    Syslog facility for Ceilometer changed from SYSLOG to LOG_SYSLOG to
    catch up to the cleanup done in LP #1263675.

    Added "--no-class_inherits_from_params_clas...


Changed in fuel:
status: In Progress → Fix Committed
Nastya Urlapova (aurlapova) wrote :

{
build_id: "2014-02-14_14-14-37",
mirantis: "no",
build_number: "122",
nailgun_sha: "c00e635e9093a6cf9cfe559737aedadc8b7260a5",
ostf_sha: "f86abe5544b5ffcf621e0c450bca15737c92361f",
fuelmain_sha: "439577ed2795d86d57351d6e7cec6ed0049101c9",
astute_sha: "7eed50fc30cec675fff7787c37fcf6da6dd518ee",
release: "4.1",
fuellib_sha: "ed8b7b8385b5726af43c5e57d6fba263bdc55c08"
}

Changed in fuel:
status: Fix Committed → Fix Released