Resource stickiness issue with service_ips

Bug #2059394 reported by DUFOUR Olivier
Affects: OpenStack Designate-Bind Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

The goal of service_ips is to allow a reserved range of IP addresses to be used by the designate-bind units in an OpenStack deployment.

# Problem :
It has been observed that, on some occasions, a unit can end up with multiple addresses from service_ips. This can happen whenever one of the designate-bind units is created slightly later than the others, or during maintenance when a unit is rebooted.

# Environment :
* Juju 3.4
* Ubuntu Jammy
* OpenStack Yoga (yoga/stable) and Bobcat (charm latest/edge)

# Reproducer :
* Deploy the attached bundle.
* On one of the designate-bind units, run "sudo crm status" to check the resource allocation :
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Mar 28 07:21:37 2024
  * Last change: Thu Mar 28 07:12:08 2024 by root via cibadmin on juju-1e6d0c-2-lxd-0
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ juju-1e6d0c-0-lxd-0 juju-1e6d0c-1-lxd-1 juju-1e6d0c-2-lxd-0 ]

Full List of Resources:
  * service_ip_10.244.40.201 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.202 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-1-lxd-1
  * service_ip_10.244.40.203 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-1-lxd-1 (already wrong right after the test deployment)

* Put 2 of the nodes in standby
* The remaining unit now holds all IP addresses from service_ips :
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Mar 28 07:22:54 2024
  * Last change: Thu Mar 28 07:22:44 2024 by root via crm_attribute on juju-1e6d0c-0-lxd-0
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Node juju-1e6d0c-1-lxd-1: OFFLINE (standby)
  * Node juju-1e6d0c-2-lxd-0: standby
  * Online: [ juju-1e6d0c-0-lxd-0 ]

Full List of Resources:
  * service_ip_10.244.40.201 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.202 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.203 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
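
For reference, the standby step above can be sketched with crmsh; the node names are the ones from this deployment, and this has not been re-validated outside it:

```shell
# Put two of the three nodes in standby so their resources migrate away
sudo crm node standby juju-1e6d0c-1-lxd-1
sudo crm node standby juju-1e6d0c-2-lxd-0

# Verify that all service_ip resources now run on the remaining node
sudo crm status
```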

* Set all nodes back online
 * Expected behaviour : each unit gets one distinct IP address
 * Current behaviour : one unit keeps all 3 IP addresses from service_ips
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Mar 28 07:32:05 2024
  * Last change: Thu Mar 28 07:24:10 2024 by root via crm_attribute on juju-1e6d0c-0-lxd-0
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ juju-1e6d0c-0-lxd-0 juju-1e6d0c-1-lxd-1 juju-1e6d0c-2-lxd-0 ]

Full List of Resources:
  * service_ip_10.244.40.201 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.202 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.203 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
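
Bringing the nodes back can likewise be sketched with crmsh (same caveat: node names are from this deployment):

```shell
# Bring the standby nodes back online
sudo crm node online juju-1e6d0c-1-lxd-1
sudo crm node online juju-1e6d0c-2-lxd-0

# With the default stickiness in place, the three IPs are expected
# to stay on juju-1e6d0c-0-lxd-0 instead of redistributing
sudo crm status
```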

# Root cause
The root cause is most likely the default value of resource-stickiness, which ensures that a resource (here, an IP address) does not move back once a unit has gone offline and come back online.
There is a colocation rule to ensure the resources run on different units, but it is overridden by the default stickiness.
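
The default can be inspected with standard Pacemaker tooling on a unit; a sketch, assuming crmsh and crm_attribute are available as usual:

```shell
# Query the cluster-wide default resource-stickiness (rsc_defaults)
sudo crm_attribute --type rsc_defaults --name resource-stickiness --query

# Show the full CIB configuration, including the colocation constraints
sudo crm configure show
```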

# Workaround :
* remove the default stickiness manually
or
* ensure the resources are created with their stickiness set to 0
(this might be easier to fix from the charm's point of view)

juju ssh designate-bind/leader 'for service in $(sudo crm resource status | grep service_ip | awk '"'"'{print $2}'"'"'); do sudo crm resource meta $service set resource-stickiness 0; done'
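
Alternatively, instead of looping over each resource, the cluster-wide default could be cleared in one step; a sketch using standard crmsh `rsc_defaults` syntax, not validated against this charm:

```shell
# Set the cluster-wide default stickiness to 0 so the colocation
# rule can redistribute the service IPs across distinct units
juju ssh designate-bind/leader 'sudo crm configure rsc_defaults resource-stickiness=0'
```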

Revision history for this message
DUFOUR Olivier (odufourc) wrote :