Comment 1 for bug 1797542

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-pacemaker (master)

Reviewed: https://review.openstack.org/609998
Committed: https://git.openstack.org/cgit/openstack/puppet-pacemaker/commit/?id=90033be6533218efd93182c3dcf6c883b03c95d3
Submitter: Zuul
Branch: master

commit 90033be6533218efd93182c3dcf6c883b03c95d3
Author: Michele Baldessari <email address hidden>
Date: Fri Oct 26 15:37:17 2018 +0200

    Improve deep_compare detection

    In the initial implementation of resource updates in puppet-pacemaker,
    in order to detect if a resource changed we did the following:
    1) dump the cib and make a copy
    2) remove the resource from the dump
    3) readd the resource with current parameters
    4) via crm_diff --cib we would detect if there were any changes related
       to the resource by checking a specific xpath on the the diff
    5) If 4) matched the xpath we would push the newly created CIB back to
       the cluster

    This approach has a number of drawbacks:
    A) In case of bundles it would remove the resource running inside the
    bundle for a very brief amount of time
    B) pcs would remove all constraints related to the resource
    automatically, so we had to recreate them forcefully during the update
    operation and then update the CIB
    C) We sometimes missed to trigger a resource update because the XPATH on
    4) was not always matching and so we missed updating a resource from
    time to time.

    We now change approach slightly to make it more robust. Namely,
    instead of removing a resource from the offline CIB we actually just
    update it via pcs update commands on the offline CIB. In the case of
    bundles we need to actually remove all currently existing storage maps
    and then add the newly defined set, because pcs does not allow to update
    a resource in a declarative way [1]. We also extend the xpath at C) to match
    most things and also be careful about not matching empty meta attributes
    which get added by pcs due to a bug [2].
    This new approach solves A) and B) and is also more precise and solve C)
    fully as well.

    Given the complexity of the changes here is how I tested this change:
    1) 1 mount point added
    Oct 25 14:46:31 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    2) 1 mount point removed
    Oct 25 15:56:38 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    3) 3 mount points added
    Oct 26 01:46:42 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    4) 2 mount points removed
    Oct 26 02:42:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    5) 1 mount point added plus replicas number changed
    Correctly get a reduced replicas number and correct mount point
     Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
       rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
       rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-0

    6) VIP netmask changed (from /32 to /24)
    Oct 26 03:53:04 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-192.168.24.10
    Oct 26 03:53:20 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-10.0.0.101
    Oct 26 03:53:36 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.1.12
    Oct 26 03:53:52 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.1.10
    Oct 26 03:54:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.3.17
    Oct 26 03:54:24 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource ip-172.17.4.13
    [root@controller-0 10]# ip -o a |grep '/32'
    [root@controller-0 10]#

    7) VIP netmask changed (from /24 to /32) and bundle replicas number changed
    rabbitmq-bundle went from two replicas (see 5.) to three:
     Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
       rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
       rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-0
       rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-1

    VIP netmasks correctly moved from /24 to /32

    8) Redeploy with no changes
    No resources were restarted

    9) Change mount point order in puppet does not restart the service
    Oct 26 04:59:36 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned false for resource rabbitmq-bundle

    10) Change a single option of a mountpoint
    Oct 26 05:32:13 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    11) Change a parameter of a pacemaker remote resource
    Added the following hiera keys:
      pacemaker::resource::remote::deep_compare: true
      pacemaker_remote_reconnect_interval: 105

    and correctly got:
    [root@controller-0 ~]# pcs resource show compute-0 |grep reconnect
      Attributes: reconnect_interval=105 server=172.17.1.14

    12) Change 1 mount point on three bundles
    Oct 26 10:40:30 controller-0 dockerd-current[19339]: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle

    13) Try and change a meta attribute
    Added an 'ordered=true' to the rabbitmq bundle
    [root@controller-0 ~]# pcs resource show rabbitmq-bundle |grep ordered
       Meta Attrs: container-attribute-target=host notify=true ordered=true

    14) Delete a single location constraint and redeploy
    [root@controller-0 ~]# pcs constraint remove location-rabbitmq-bundle
    ..redeployed..
    [root@controller-0 ~]# pcs constraint |grep rabbit
      Resource: rabbitmq-bundle
        Constraint: location-rabbitmq-bundle (resource-discovery=exclusive)
            Expression: rabbitmq-role eq true

    Other things verified during the tests above:
    - Constraints stayed the same
    - Properties stayed the same

    [1] https://bugzilla.redhat.com/show_bug.cgi?id=1598197
    [2] https://bugzilla.redhat.com/show_bug.cgi?id=1568353
    Closes-Bug: #1797542

    Change-Id: I048fbb6b0319bac33c82ad2d7639d0f594fb73bb