VIPs managed by pacemaker should have INFINITY resource-stickiness

Bug #1763586 reported by Michele Baldessari
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Michele Baldessari
Milestone: —

Bug Description

Right now we do not add any resource stickiness to the VIPs. This has one consequence when we configure Instance HA (IHA): when a fenced compute node comes back (i.e. it recovers), pacemaker is free to move the VIPs around the controllers to optimize resource placement (see http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_prevent_resources_from_moving_after_recovery.html). We can observe the VIP moving in a message like the following:
Apr 12 06:37:04 [979790] controller-1 pengine: notice: LogAction: * Move ip-10.0.0.110 ( controller-1 -> controller-0 )
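
For reference, the VIP's current placement can be checked on any controller, and its stickiness queried directly (a sketch; the resource name ip-10.0.0.110 is taken from the log line above):

  # Where is the VIP currently running?
  pcs status resources | grep ip-10.0.0.110
  # Query the resource-stickiness meta attribute of the VIP (unset at this point)
  crm_resource --resource ip-10.0.0.110 --meta --get-parameter resource-stickiness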

This movement of the VIP is highly undesirable because in Instance HA the fence_compute agent needs to talk to keystone via the VIP; while the VIP is on the move, we may see errors such as:
Apr 12 06:37:23 [979787] controller-1 stonith-ng: warning: log_action: fence_compute[259311] stderr: [ Starting new HTTP connection (1): 10.0.0.110 ]
Apr 12 06:37:23 [979787] controller-1 stonith-ng: warning: log_action: fence_compute[259311] stderr: [ keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.0.0.110:5000 ]
Apr 12 06:37:28 [979787] controller-1 stonith-ng: warning: log_action: fence_compute[261144] stderr: [ REQ: curl -g -i -X GET http://10.0.0.110:5000 -H "Accept: application/json" -H "User-Agent: python-keystoneclient" ]
Apr 12 06:37:28 [979787] controller-1 stonith-ng: warning: log_action: fence_compute[261144] stderr: [ Starting new HTTP connection (1): 10.0.0.110 ]
Apr 12 06:37:28 [979787] controller-1 stonith-ng: warning: log_action: fence_compute[261144] stderr: [ keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.0.0.110:5000 ]

By setting the resource-stickiness of the VIPs to INFINITY, we make them prefer as strongly as possible to keep running where they are.
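
The equivalent one-off change on a running cluster would look like this (a minimal sketch; in TripleO the setting is applied by puppet-tripleo rather than by hand, and the resource name is the example from the logs above):

  # Pin the VIP to its current node by making its stickiness absolute
  pcs resource meta ip-10.0.0.110 resource-stickiness=INFINITY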

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/561136

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/561136
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=2131880c710d04d1d6030f02f9e257bbfe14d339
Submitter: Zuul
Branch: master

commit 2131880c710d04d1d6030f02f9e257bbfe14d339
Author: Michele Baldessari <email address hidden>
Date: Fri Apr 13 08:48:18 2018 +0200

    Add resource-stickiness=INFINITY to VIPs

    (commit message body repeats the bug description above)

    Change-Id: I6862452d2250ac4c2c3e04840983510a3cd13536
    Closes-Bug: #1763586

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/561300

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/queens)

Reviewed: https://review.openstack.org/561300
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=cb114de452ad1ca6a95b0da4c17d5d747eddd103
Submitter: Zuul
Branch: stable/queens

commit cb114de452ad1ca6a95b0da4c17d5d747eddd103
Author: Michele Baldessari <email address hidden>
Date: Fri Apr 13 08:48:18 2018 +0200

    Add resource-stickiness=INFINITY to VIPs

    (commit message body identical to the master commit above)

    Change-Id: I6862452d2250ac4c2c3e04840983510a3cd13536
    Closes-Bug: #1763586
    (cherry picked from commit 2131880c710d04d1d6030f02f9e257bbfe14d339)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 9.0.0

This issue was fixed in the openstack/puppet-tripleo 9.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 8.3.2

This issue was fixed in the openstack/puppet-tripleo 8.3.2 release.
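
With a fixed release deployed, the new meta attribute should show up in the VIP's configuration (a sketch, assuming the pcs 0.9 CLI of that era and the same example resource name; newer pcs uses "pcs resource config" instead):

  # The output should include: Meta Attrs: resource-stickiness=INFINITY
  pcs resource show ip-10.0.0.110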
