Introduce restart_bundle containers to detect config changes and restart pacemaker resources
During the containerization work we regressed on the restart of
pacemaker resources when a config change for the service was detected.
In baremetal we used to do the following:
1) If a puppet config change was detected, we'd touch a file with the
service name under /var/lib/tripleo/pacemaker-restarts/<service>
2) A post deployment bash script (extraconfig/tasks/pacemaker_resource_restart.sh)
would test for the service file's existence and restart the pcs service via
'pcs resource restart --wait=600 service' on the bootstrap node.
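Roughly, the old post-deployment flow amounted to something like the sketch below. Only the flag directory and the pcs invocation come from the description above; the function wrapper and loop are assumptions about what pacemaker_resource_restart.sh did, not its actual code:

```shell
# Hedged sketch of the pre-containerization restart flow: walk the flag
# files touched during the puppet run and restart each matching pcmk
# resource. The directory and pcs command are from the commit message;
# the rest is illustrative.
restart_flagged_services() {
    local dir=${1:-/var/lib/tripleo/pacemaker-restarts}
    local flag service
    for flag in "$dir"/*; do
        [ -e "$flag" ] || continue       # no flag files, nothing to do
        service=$(basename "$flag")      # file name == pacemaker resource
        pcs resource restart --wait=600 "$service"
        rm -f "$flag"                    # consume the flag once handled
    done
}
```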
With this patchset we make use of paunch's ability to detect that a config
hash change happened and respawn a temporary container (called
<service>_restart_bundle) which will simply always restart the pacemaker
service from the bootstrap node whenever invoked, but only if the pcmk
resource already exists. For this reason we add config_volume and bind
mount it inside the container, so that the TRIPLEO_CONFIG_HASH env
variable gets generated for these *_restart_bundle containers.
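Schematically, each *_restart_bundle container does something along these lines when paunch respawns it on a TRIPLEO_CONFIG_HASH change. This is a hedged sketch: the existence guard and the exact pcs commands are assumptions based on the description above, not the template code itself:

```shell
# Illustrative sketch of a <service>_restart_bundle container's job:
# always restart the given pacemaker resource from the bootstrap node,
# but only if the resource has already been created.
restart_bundle() {
    local service="$1"
    # Guard: skip silently if the pcmk resource does not exist yet
    if pcs resource show "$service" >/dev/null 2>&1; then
        echo "$service restart invoked"
        pcs resource restart --wait=600 "$service"
    fi
}
```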
We tested this change as follows:
A) Deployed an HA overcloud with this change and observed that pcmk resources
were not restarted needlessly during initial deploy
B) Reran the exact same overcloud deploy with no changes, observed that
no spurious restarts would take place
C) Added an env file to trigger a config change for haproxy [1], redeployed and observed that it restarted
haproxy only:
Jun 06 16:22:37 overcloud-controller-0 dockerd-current[15272]: haproxy-bundle restart invoked
D) Added a trigger [2] for mysql config change, redeployed and observed restart:
Jun 06 16:40:52 overcloud-controller-0 dockerd-current[15272]: galera-bundle restart invoked
E) Added a trigger [3] for a rabbitmq config change, redeployed and observed restart:
Jun 06 17:03:41 overcloud-controller-0 dockerd-current[15272]: rabbitmq-bundle restart invoked
F) Added a trigger [4] for a redis config change, redeployed and observed restart:
Jun 07 08:42:54 overcloud-controller-0 dockerd-current[15272]: redis-bundle restart invoked
G) Reran a deploy with no changes and observed that no spurious restarts
were triggered
Reviewed: https://review.openstack.org/572840
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a6389da22d43fa61d4db16b676b23c3a4c468dfd
Submitter: Zuul
Branch: master
commit a6389da22d43fa61d4db16b676b23c3a4c468dfd
Author: Michele Baldessari <email address hidden>
Date: Tue Jun 5 14:19:24 2018 +0000
[1] haproxy config change trigger:
    parameter_defaults:
      ExtraConfig:
        tripleo::haproxy::haproxy_globals_override:
          'maxconn': 1111
[2] mysql config change trigger:
    parameter_defaults:
      ExtraConfig:
        mysql_max_connections: 1111
[3] rabbitmq config change trigger (default partition handling is 'ignore'):
    parameter_defaults:
      ExtraConfig:
        rabbitmq_config_variables:
          cluster_partition_handling: 'pause_minority'
          queue_master_locator: '<<"min-masters">>'
          loopback_users: '[]'
[4] redis config change trigger:
    parameter_defaults:
      ExtraConfig:
        redis::tcp_backlog: 666
        redis::params::tcp_backlog: 666
Change-Id: I62870c055097569ceab2ff67cf0fe63122277c5b
Co-Authored-By: Damien Ciabrini <email address hidden>
Closes-Bug: #1775196