Minor update job logs errors when moving away VIP in HA control plane

Bug #1892570 reported by Damien Ciabrini
Affects   Status         Importance   Assigned to       Milestone
tripleo   Fix Released   Medium       Damien Ciabrini

Bug Description

During a minor update of an HA controller node, an initial step
moves any virtual IP (VIP) hosted on the node being updated to
another node. When the HA control plane has only one node, this step
logs an error, because no other node is available to host the virtual
IP:

 * 2020-08-05 11:06:29 | <192.168.24.3> (0, b'\n{"cmd": "CLUSTER_NODE=$(crm_node -n)\\necho \\"Retrieving all the VIPs which are hosted on this node\\"\\nVIPS_TO_MOVE=$(crm_mon --as-xml | xmllint --xpath \'//resource[@resource_agent = \\"ocf::heartbeat:IPaddr2\\" and @role = \\"Started\\" and @managed = \\"true\\" and ./node[@name = \\"\'${CLUSTER_NODE}\'\\"]]/@id\' - | sed -e \'s/id=//g\' -e \'s/\\"//g\')\\nfor v in ${VIPS_TO_MOVE}; do\\n echo \\"Moving VIP $v on another node\\"\\n pcs resource move $v --wait=300\\ndone\\necho \\"Removing the location constraints that were created to move the VIPs\\"\\nfor v in ${VIPS_TO_MOVE}; do\\n echo \\"Removing location ban for VIP $v\\"\\n ban_id=$(cibadmin --query | xmllint --xpath \'string(//rsc_location[@rsc=\\"\'${v}\'\\" and @node=\\"\'${CLUSTER_NODE}\'\\" and @score=\\"-INFINITY\\"]/@id)\' -)\\n if [ -n \\"$ban_id\\" ]; then\\n pcs constraint remove ${ban_id}\\n else\\n echo \\"Could not retrieve and clear location constraint for VIP $v\\" 2>&1\\n fi\\ndone\\n", "stdout": "Retrieving all the VIPs which are hosted on this node\\nMoving VIP ip-192.168.24.16 on another node\\nWarning: Creating location constraint \'cli-ban-ip-192.168.24.16-on-node-0000333831\' with a score of -INFINITY for resource ip-192.168.24.16 on node-0000333831.\\n\\tThis will prevent ip-192.168.24.16 from running on node-0000333831 until the constraint is removed\\n\\tThis will be the case even if node-0000333831 is the last node in the cluster\\nRemoving the location constraints that were created to move the VIPs\\nRemoving location ban for VIP ip-192.168.24.16", "stderr": "Error: resource \'ip-192.168.24.16\' is not running on any node", "rc": 0, "start": "2020-08-05 11:06:25.880027", "end": "2020-08-05 11:06:29.046251", "delta": "0:00:03.166224", "changed": true, "invocation": {"module_args": {"_raw_params": "CLUSTER_NODE=$(crm_node -n)\\necho \\"Retrieving all the VIPs which are hosted on this node\\"\\nVIPS_TO_MOVE=$(crm_mon --as-xml | 
xmllint --xpath \'//resource[@resource_agent = \\"ocf::heartbeat:IPaddr2\\" and @role = \\"Started\\" and @managed = \\"true\\" and ./node[@name = \\"\'${CLUSTER_NODE}\'\\"]]/@id\' - | sed -e \'s/id=//g\' -e \'s/\\"//g\')\\nfor v in ${VIPS_TO_MOVE}; do\\n echo \\"Moving VIP $v on another node\\"\\n pcs resource move $v --wait=300\\ndone\\necho \\"Removing the location constraints that were created to move the VIPs\\"\\nfor v in ${VIPS_TO_MOVE}; do\\n echo \\"Removing location ban for VIP $v\\"\\n ban_id=$(cibadmin --query | xmllint --xpath \'string(//rsc_location[@rsc=\\"\'${v}\'\\" and @node=\\"\'${CLUSTER_NODE}\'\\" and @score=\\"-INFINITY\\"]/@id)\' -)\\n if [ -n \\"$ban_id\\" ]; then\\n pcs constraint remove ${ban_id}\\n else\\n echo \\"Could not retrieve and clear location constraint for VIP $v\\" 2>&1\\n fi\\ndone\\n", "_uses_shell": true, "warn": true, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b'')
  * 2020-08-05 11:06:29 | "stderr": "Error: resource 'ip-192.168.24.16' is not running on any node"
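The escaped script in the log above is easier to follow once unwrapped. The sketch below restates its two loops as shell functions. The names `move_vips` and `clear_vip_bans` are introduced here for illustration and are not part of the original task, and the `xmllint` lookups that compute the VIP list and the constraint id are left out, so both are passed in as arguments. Neither loop checks the exit status of `pcs resource move`, which appears to be why the failed move only surfaces on stderr while the task still reports `"rc": 0`.

```shell
#!/bin/sh
# Readable restatement of the two loops from the logged task.
# move_vips and clear_vip_bans are illustrative names (assumptions),
# not part of the original task; pcs is the CLI used in the log.

# Move every VIP given as an argument off the local node.
# As in the logged script, a failing pcs call does not stop the loop.
move_vips() {
    for v in "$@"; do
        echo "Moving VIP $v on another node"
        pcs resource move "$v" --wait=300
    done
}

# Remove the location constraints created by the moves.
# Arguments are constraint ids; the original script looks them up
# with cibadmin and xmllint, and that lookup is omitted here.
clear_vip_bans() {
    for ban_id in "$@"; do
        pcs constraint remove "$ban_id"
    done
}
```

On a one-node control plane, the ban created by `pcs resource move` leaves the VIP with nowhere to run, so the move cannot succeed no matter how long `--wait` is.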

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/746927
Reason: The gate is currently hitting the "docker api 429" issue, see #tripleo channel for more details. I'll abandon that patch so it's cleared from the gate. Please do not restore it as I'll take care of it when the gate is stable again. Thanks for your understanding and patience!

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/746927
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=164fac75a0fbff2130f39a5f4d7c5931f5ea3b08
Submitter: Zuul
Branch: master

commit 164fac75a0fbff2130f39a5f4d7c5931f5ea3b08
Author: Damien Ciabrini <email address hidden>
Date: Wed Aug 19 15:40:00 2020 +0200

    minor update: only migrate HA VIP away when needed

    When update tasks runs in a HA controller node, pacemaker is
    stopped, along with all HA resources hosted on the node. If
    any VIP is hosted on that node, it is moved to another node
    prior to stopping pacemaker to limit service downtime.

    If the HA controller node doesn't manage VIP (no HAProxy) or
    the control plane only has 1 node, there is no need to try and
    move VIP away before stopping pacemaker.

    Tested on a 1-node HA control plane, and also on a control
    plane with external balancer (no HAproxy service, thus no VIP
    managed in pacemaker). The dedicated ansible task no longer
    tries to move VIP if it doesn't need to.

    Closes-Bug: #1892570
    Change-Id: Id9b9c413ee37dcda422e69ebef4aca81e4877156
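
The condition described in this commit message can be sketched in shell as follows. This is not the merged change itself, which lives in the tripleo-heat-templates update tasks. `vip_move_needed` and its arguments are hypothetical names used only for illustration, and the caller is assumed to already know the number of control plane nodes and the VIPs hosted on the local node.

```shell
#!/bin/sh
# Decide whether VIPs should be moved before stopping pacemaker.
# Hypothetical helper, not taken from tripleo-heat-templates.
#   $1             number of nodes in the HA control plane
#   remaining args ids of the VIP resources hosted on this node
# Returns 0 when a move makes sense, 1 otherwise.
vip_move_needed() {
    node_count=$1
    shift
    # A one-node control plane has no other node to move a VIP to.
    [ "$node_count" -gt 1 ] || return 1
    # No VIP managed in pacemaker, for example with an external balancer.
    [ "$#" -gt 0 ] || return 1
    return 0
}
```

A caller could wrap the existing move step as `if vip_move_needed 3 ip-192.168.24.16; then ...; fi`, so that the one-node case and the no-VIP case both skip the move and log no error.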

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/748130

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/748130
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=cc17467c5018d4f05b53af0a0046def5c5a8633e
Submitter: Zuul
Branch: stable/ussuri

commit cc17467c5018d4f05b53af0a0046def5c5a8633e
Author: Damien Ciabrini <email address hidden>
Date: Wed Aug 19 15:40:00 2020 +0200

    minor update: only migrate HA VIP away when needed

    When update tasks runs in a HA controller node, pacemaker is
    stopped, along with all HA resources hosted on the node. If
    any VIP is hosted on that node, it is moved to another node
    prior to stopping pacemaker to limit service downtime.

    If the HA controller node doesn't manage VIP (no HAProxy) or
    the control plane only has 1 node, there is no need to try and
    move VIP away before stopping pacemaker.

    Tested on a 1-node HA control plane, and also on a control
    plane with external balancer (no HAproxy service, thus no VIP
    managed in pacemaker). The dedicated ansible task no longer
    tries to move VIP if it doesn't need to.

    Closes-Bug: #1892570
    Change-Id: Id9b9c413ee37dcda422e69ebef4aca81e4877156
    (cherry picked from commit 164fac75a0fbff2130f39a5f4d7c5931f5ea3b08)

tags: added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/748486

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/748486
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=ad090f94cccd050240d7e9293830994da44d0683
Submitter: Zuul
Branch: stable/train

commit ad090f94cccd050240d7e9293830994da44d0683
Author: Damien Ciabrini <email address hidden>
Date: Wed Aug 19 15:40:00 2020 +0200

    minor update: only migrate HA VIP away when needed

    When update tasks runs in a HA controller node, pacemaker is
    stopped, along with all HA resources hosted on the node. If
    any VIP is hosted on that node, it is moved to another node
    prior to stopping pacemaker to limit service downtime.

    If the HA controller node doesn't manage VIP (no HAProxy) or
    the control plane only has 1 node, there is no need to try and
    move VIP away before stopping pacemaker.

    Tested on a 1-node HA control plane, and also on a control
    plane with external balancer (no HAproxy service, thus no VIP
    managed in pacemaker). The dedicated ansible task no longer
    tries to move VIP if it doesn't need to.

    Closes-Bug: #1892570
    Change-Id: Id9b9c413ee37dcda422e69ebef4aca81e4877156
    (cherry picked from commit 164fac75a0fbff2130f39a5f4d7c5931f5ea3b08)
    (cherry picked from commit cc17467c5018d4f05b53af0a0046def5c5a8633e)

tags: added: in-stable-train

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
