config changes do not trigger a service restart for pcmk-managed services

Bug #1775196 reported by Michele Baldessari on 2018-06-05
This bug affects 2 people
Affects: tripleo
Importance: High
Assigned to: Michele Baldessari

Bug Description

Currently, a config change does not restart pacemaker-managed resources, whereas on baremetal such a change used to trigger a restart.

A config change in, say, haproxy is expected to trigger a restart of the resource managed by pacemaker. This was first discussed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1559105

Changed in tripleo:
milestone: rocky-2 → rocky-3

Fix proposed to branch: master
Review: https://review.openstack.org/572840

Changed in tripleo:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/572840
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a6389da22d43fa61d4db16b676b23c3a4c468dfd
Submitter: Zuul
Branch: master

commit a6389da22d43fa61d4db16b676b23c3a4c468dfd
Author: Michele Baldessari <email address hidden>
Date: Tue Jun 5 14:19:24 2018 +0000

    Introduce restart_bundle containers to detect config changes and restart pacemaker resources

    During the containerization work we regressed on the restart of
    pacemaker resources when a config change for the service was detected.
    In baremetal we used to do the following:
    1) If a puppet config change was detected we'd touch a file with the
       service name under /var/lib/tripleo/pacemaker-restarts/<service>
    2) A post deployment bash script (extraconfig/tasks/pacemaker_resource_restart.sh)
       would test for the service file's existence and restart the pcs service via
       'pcs resource restart --wait=600 service' on the bootstrap node.
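
The two baremetal steps above can be sketched roughly as follows. This is an illustration only, not the contents of pacemaker_resource_restart.sh: the function names, the `MARKER_DIR` and `DRY_RUN` variables, and the removal of the marker after the restart are assumptions, and the bootstrap-node check is left out.

```shell
# Sketch of the baremetal flow: marker files request a restart and a
# post-deploy step consumes them. MARKER_DIR and DRY_RUN are hypothetical
# knobs so the sketch can run on a machine without pacemaker.
MARKER_DIR="${MARKER_DIR:-/var/lib/tripleo/pacemaker-restarts}"
DRY_RUN="${DRY_RUN:-1}"

# Step 1: a detected config change leaves a marker named after the service.
request_restart() {
    mkdir -p "$MARKER_DIR"
    touch "$MARKER_DIR/$1"
}

# Step 2: restart every service that has a marker.
process_restarts() {
    local marker service
    for marker in "$MARKER_DIR"/*; do
        [ -e "$marker" ] || continue   # nothing matched: no restarts requested
        service="$(basename "$marker")"
        if [ "$DRY_RUN" = "1" ]; then
            # Dry run: only print the command that would be executed.
            echo "pcs resource restart --wait=600 $service"
        else
            pcs resource restart --wait=600 "$service"
        fi
        rm -f "$marker"                # assumption: the marker is cleared afterwards
    done
}
```

With `DRY_RUN=1`, calling `request_restart haproxy` followed by `process_restarts` prints the `pcs` command instead of running it.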

    With this patchset we make use of paunch's ability to detect if a config
    hash change happened to respawn a temporary container (called
    <service>_restart_bundle) which will simply always restart the pacemaker
    service from the bootstrap node whenever invoked, but only if the pcmk
    resource already exists. For this reason we add config_volume and bind
    mount it inside the container, so that the TRIPLEO_CONFIG_HASH env
    variable gets generated for these *_restart_bundle containers.
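
The idea in the paragraph above (rerun on a config hash change, restart only if the resource already exists) can be sketched as below. Nothing here calls paunch or pacemaker: `config_hash`, `resource_exists`, `maybe_restart`, `STATE_DIR` and `EXISTING_RESOURCES` are hypothetical names, `cksum` stands in for whatever hash TRIPLEO_CONFIG_HASH really uses, and the stored-hash file is only a stand-in for paunch's own comparison.

```shell
# Sketch of hash-based change detection for a restart_bundle-style step.
STATE_DIR="${STATE_DIR:-/tmp/restart-bundle-sketch}"

# Reduce every file under a config directory to one checksum.
config_hash() {
    (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort | cksum | cut -d' ' -f1)
}

# Stub: a real cluster would ask pacemaker whether the resource exists.
# EXISTING_RESOURCES is a hypothetical space-separated list of resource names.
resource_exists() {
    case " ${EXISTING_RESOURCES:-} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report a restart only when the hash changed and the resource already exists.
maybe_restart() {
    local resource config_dir new old
    resource="$1"
    config_dir="$2"
    mkdir -p "$STATE_DIR"
    new="$(config_hash "$config_dir")"
    old="$(cat "$STATE_DIR/$resource.hash" 2>/dev/null || true)"
    echo "$new" > "$STATE_DIR/$resource.hash"
    [ "$new" != "$old" ] || return 0        # unchanged config: nothing to do
    resource_exists "$resource" || return 0 # initial deploy: resource not created yet
    echo "$resource restart invoked"
}
```

On a first run the hash is new but the resource does not exist yet, so nothing is restarted; an unchanged rerun is also a no-op. These are the two situations that tests A and B of the commit message check for.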

    We tested this change as follows:
    A) Deployed an HA overcloud with this change and observed that pcmk resources
       were not restarted needlessly during initial deploy
    B) Reran the exact same overcloud deploy with no changes and observed that
       no spurious restarts took place
    C) Added an env file to trigger a config change of haproxy [1], redeployed
       and observed that it restarted haproxy only:
       Jun 06 16:22:37 overcloud-controller-0 dockerd-current[15272]: haproxy-bundle restart invoked
    D) Added a trigger [2] for mysql config change, redeployed and observed restart:
       Jun 06 16:40:52 overcloud-controller-0 dockerd-current[15272]: galera-bundle restart invoked
    E) Added a trigger [3] for a rabbitmq config change, redeployed and observed restart:
       Jun 06 17:03:41 overcloud-controller-0 dockerd-current[15272]: rabbitmq-bundle restart invoked
    F) Added a trigger [4] for a redis config change, redeployed and observed restart:
       Jun 07 08:42:54 overcloud-controller-0 dockerd-current[15272]: redis-bundle restart invoked
    G) Reran a deploy with no changes and observed that no spurious restarts
       were triggered

    [1] haproxy config change trigger:
    parameter_defaults:
      ExtraConfig:
        tripleo::haproxy::haproxy_globals_override:
          'maxconn': 1111

    [2] mysql config change trigger:
    parameter_defaults:
      ExtraConfig:
        mysql_max_connections: 1111

    [3] rabbitmq config change trigger (default partition handling is 'ignore'):
    parameter_defaults:
      ExtraConfig:
        rabbitmq_config_variables:
          cluster_par...


Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/567821
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=a0dfc6c0c694091eae195d4090a63e339f3daa39
Submitter: Zuul
Branch: master

commit a0dfc6c0c694091eae195d4090a63e339f3daa39
Author: Michele Baldessari <email address hidden>
Date: Fri May 11 12:12:32 2018 +0200

    rerun *_init_bundles all the time

    In the same spirit as change I1f07272499b419079466cf9f395fb04a082099bd
    we want to rerun all pacemaker _init_bundles all the time, for a few main
    reasons:
    1) We will eventually support scaling-up roles that contain
       pacemaker-managed services and we need to rerun _init_bundles so that
       pacemaker properties are created for the newly added nodes.
    2) When you replace a controller the pacemaker properties will be
       recreated for the newly added node.
    3) We need to create appropriate iptables rules whenever we add a
       service to an existing deployment.

    We do this by adding the DeployIdentifier to the environment so that
    paunch will retrigger a run at every redeploy.

    Partial-Bug: #1775196
    Change-Id: Ifd48d74507609fc7f4abc269b61b2868bfbc9272
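
Why adding the DeployIdentifier forces a rerun can be illustrated with a small sketch. It assumes that a container is rerun whenever a signature over its definition changes and that the DeployIdentifier value differs on every deploy; `container_signature`, the image name and the `DEPLOY_IDENTIFIER` variable name are hypothetical and do not reflect paunch internals.

```shell
# Sketch: a signature over the container definition (image plus environment).
# If the signature differs from the previous deploy, the container is rerun.
container_signature() {
    printf '%s\n' "$@" | cksum | cut -d' ' -f1
}

# Without a per-deploy value the signature is stable, so no rerun happens.
static_first="$(container_signature init_bundle_image)"
static_second="$(container_signature init_bundle_image)"

# With a per-deploy value in the environment the signature changes each time.
deploy_first="$(container_signature init_bundle_image "DEPLOY_IDENTIFIER=1")"
deploy_second="$(container_signature init_bundle_image "DEPLOY_IDENTIFIER=2")"

if [ "$static_first" = "$static_second" ]; then
    echo "static definition: no rerun"
fi
if [ "$deploy_first" != "$deploy_second" ]; then
    echo "definition with deploy identifier: rerun"
fi
```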

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/574264

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/574263
Reason: The gate is having serious troubles with docker.io, we need to abandon this patch so it leaves the gate and when it's stable again I will restore this patch. Please do not restore or do anything, I'll take care of it as soon as things work again.

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/574264
Reason: The gate is having serious troubles with docker.io, we need to abandon this patch so it leaves the gate and when it's stable again I will restore this patch. Please do not restore or do anything, I'll take care of it as soon as things work again.


Reviewed: https://review.openstack.org/574263
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=4ca4ce2d8a0fd81fa3fd6f8f2c6070ef29707c52
Submitter: Zuul
Branch: stable/queens

commit 4ca4ce2d8a0fd81fa3fd6f8f2c6070ef29707c52
Author: Michele Baldessari <email address hidden>
Date: Tue Jun 5 14:19:24 2018 +0000

    Introduce restart_bundle containers to detect config changes and restart pacemaker resources

    During the containerization work we regressed on the restart of
    pacemaker resources when a config change for the service was detected.
    In baremetal we used to do the following:
    1) If a puppet config change was detected we'd touch a file with the
       service name under /var/lib/tripleo/pacemaker-restarts/<service>
    2) A post deployment bash script (extraconfig/tasks/pacemaker_resource_restart.sh)
       would test for the service file's existence and restart the pcs service via
       'pcs resource restart --wait=600 service' on the bootstrap node.

    With this patchset we make use of paunch's ability to detect if a config
    hash change happened to respawn a temporary container (called
    <service>_restart_bundle) which will simply always restart the pacemaker
    service from the bootstrap node whenever invoked, but only if the pcmk
    resource already exists. For this reason we add config_volume and bind
    mount it inside the container, so that the TRIPLEO_CONFIG_HASH env
    variable gets generated for these *_restart_bundle containers.

    We tested this change as follows:
    A) Deployed an HA overcloud with this change and observed that pcmk resources
       were not restarted needlessly during initial deploy
    B) Reran the exact same overcloud deploy with no changes and observed that
       no spurious restarts took place
    C) Added an env file to trigger a config change of haproxy [1], redeployed
       and observed that it restarted haproxy only:
       Jun 06 16:22:37 overcloud-controller-0 dockerd-current[15272]: haproxy-bundle restart invoked
    D) Added a trigger [2] for mysql config change, redeployed and observed restart:
       Jun 06 16:40:52 overcloud-controller-0 dockerd-current[15272]: galera-bundle restart invoked
    E) Added a trigger [3] for a rabbitmq config change, redeployed and observed restart:
       Jun 06 17:03:41 overcloud-controller-0 dockerd-current[15272]: rabbitmq-bundle restart invoked
    F) Added a trigger [4] for a redis config change, redeployed and observed restart:
       Jun 07 08:42:54 overcloud-controller-0 dockerd-current[15272]: redis-bundle restart invoked
    G) Reran a deploy with no changes and observed that no spurious restarts
       were triggered

    [1] haproxy config change trigger:
    parameter_defaults:
      ExtraConfig:
        tripleo::haproxy::haproxy_globals_override:
          'maxconn': 1111

    [2] mysql config change trigger:
    parameter_defaults:
      ExtraConfig:
        mysql_max_connections: 1111

    [3] rabbitmq config change trigger (default partition handling is 'ignore'):
    parameter_defaults:
      ExtraConfig:
        rabbitmq_config_variables:
          clus...


tags: added: in-stable-queens

Reviewed: https://review.openstack.org/574264
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1065da35a04579aeb384048b163187179b855d35
Submitter: Zuul
Branch: stable/queens

commit 1065da35a04579aeb384048b163187179b855d35
Author: Michele Baldessari <email address hidden>
Date: Fri May 11 12:12:32 2018 +0200

    rerun *_init_bundles all the time

    In the same spirit as change I1f07272499b419079466cf9f395fb04a082099bd
    we want to rerun all pacemaker _init_bundles all the time, for a few main
    reasons:
    1) We will eventually support scaling-up roles that contain
       pacemaker-managed services and we need to rerun _init_bundles so that
       pacemaker properties are created for the newly added nodes.
    2) When you replace a controller the pacemaker properties will be
       recreated for the newly added node.
    3) We need to create appropriate iptables rules whenever we add a
       service to an existing deployment.

    We do this by adding the DeployIdentifier to the environment so that
    paunch will retrigger a run at every redeploy.

    Partial-Bug: #1775196
    Change-Id: Ifd48d74507609fc7f4abc269b61b2868bfbc9272
    (cherry picked from commit a0dfc6c0c694091eae195d4090a63e339f3daa39)

tags: added: pike-backport-potential

This issue was fixed in the openstack/tripleo-heat-templates 8.0.4 release.

This issue was fixed in the openstack/tripleo-heat-templates 9.0.0.0b4 development milestone.
