deep_compare misses changes under certain conditions

Bug #1797542 reported by Michele Baldessari
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
puppet-pacemaker
Fix Released
High
Michele Baldessari

Bug Description

We currently run a crm_diff --cib -o orig -n new and then parse the diff xml to detect if we need to change a resource. Namely we check on the following xpath:
"/diff/change[@operation and contains(@path, \"#{resource_name}\")]/change-result"

If we do find 'change' entries which contain an 'operation' attribute and the resource name is in the diff *and* it ends in /change-result we deem the resource needs an update.

This seems to be not always the case and it seems that when the number of changes in a bind mount list is more than one, the '/change-result' part is a section on its own.

Changed in puppet-pacemaker:
status: New → Triaged
importance: Undecided → High
Changed in puppet-pacemaker:
assignee: nobody → Michele Baldessari (michele)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-pacemaker (master)
Download full text (6.9 KiB)

Reviewed: https://review.openstack.org/609998
Committed: https://git.openstack.org/cgit/openstack/puppet-pacemaker/commit/?id=90033be6533218efd93182c3dcf6c883b03c95d3
Submitter: Zuul
Branch: master

commit 90033be6533218efd93182c3dcf6c883b03c95d3
Author: Michele Baldessari <email address hidden>
Date: Fri Oct 26 15:37:17 2018 +0200

    Improve deep_compare detection

    In the initial implementation of resource updates in puppet-pacemaker,
    in order to detect if a resource changed we did the following:
    1) dump the cib and make a copy
    2) remove the resource from the dump
    3) readd the resource with current parameters
    4) via crm_diff --cib we would detect if there were any changes related
       to the resource by checking a specific xpath on the the diff
    5) If 4) matched the xpath we would push the newly created CIB back to
       the cluster

    This approach has a number of drawbacks:
    A) In case of bundles it would remove the resource running inside the
    bundle for a very brief amount of time
    B) pcs would remove all constraints related to the resource
    automatically, so we had to recreate them forcefully during the update
    operation and then update the CIB
    C) We sometimes missed to trigger a resource update because the XPATH on
    4) was not always matching and so we missed updating a resource from
    time to time.

    We now change approach slightly to make it more robust. Namely,
    instead of removing a resource from the offline CIB we actually just
    update it via pcs update commands on the offline CIB. In the case of
    bundles we need to actually remove all currently existing storage maps
    and then add the newly defined set, because pcs does not allow to update
    a resource in a declarative way [1]. We also extend the xpath at C) to match
    most things and also be careful about not matching empty meta attributes
    which get added by pcs due to a bug [2].
    This new approach solves A) and B) and is also more precise and solve C)
    fully as well.

    Given the complexity of the changes here is how I tested this change:
    1) 1 mount point added
    Oct 25 14:46:31 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    2) 1 mount point removed
    Oct 25 15:56:38 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    3) 3 mount points added
    Oct 26 01:46:42 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    4) 2 mount points removed
    Oct 26 02:42:07 controller-0 journal: Debug: pcmk_resource_has_changed (ng version) returned true for resource rabbitmq-bundle
    and rabbitmq-bundle was correctly restarted with the correct bind mounts

    5) 1 mount point added plus replicas number changed
    Correctly get a reduced rep...

Read more...

Changed in puppet-pacemaker:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-pacemaker 0.7.2

This issue was fixed in the openstack/puppet-pacemaker 0.7.2 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.