Resource stickiness issue with service_ips
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Designate-Bind Charm | New | Undecided | Unassigned | |
Bug Description
The goal of service_ips is to reserve a range of IP addresses for the designate-bind units in an OpenStack deployment.
# Problem :
It has been observed that a unit can occasionally end up with multiple addresses from service_ips. This can happen when one of the designate-bind units is created a bit later than the others, or during maintenance when a unit is rebooted.
# Environment :
* Juju 3.4
* Ubuntu Jammy
* OpenStack Yoga (yoga/stable) and Bobcat (charm latest/edge)
# Reproducer :
* Deploy the attached bundle.
* On one of the designate-bind units, run "sudo crm status" to check the resource allocation :
Cluster Summary:
* Stack: corosync
* Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Mar 28 07:21:37 2024
* Last change: Thu Mar 28 07:12:08 2024 by root via cibadmin on juju-1e6d0c-2-lxd-0
* 3 nodes configured
* 3 resource instances configured
Node List:
* Online: [ juju-1e6d0c-0-lxd-0 juju-1e6d0c-1-lxd-1 juju-1e6d0c-2-lxd-0 ]
Full List of Resources:
* service_
* service_
* service_
* Put 2 nodes in maintenance
* The remaining unit should now hold all IP addresses from service_ips :
Cluster Summary:
* Stack: corosync
* Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Mar 28 07:22:54 2024
* Last change: Thu Mar 28 07:22:44 2024 by root via crm_attribute on juju-1e6d0c-0-lxd-0
* 3 nodes configured
* 3 resource instances configured
Node List:
* Node juju-1e6d0c-
* Node juju-1e6d0c-
* Online: [ juju-1e6d0c-0-lxd-0 ]
Full List of Resources:
* service_
* service_
* service_
* Bring all nodes back online
* Expected behaviour : each unit gets one distinct IP address
* Current behaviour : one unit keeps all 3 IP addresses from service_ips
Cluster Summary:
* Stack: corosync
* Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Thu Mar 28 07:32:05 2024
* Last change: Thu Mar 28 07:24:10 2024 by root via crm_attribute on juju-1e6d0c-0-lxd-0
* 3 nodes configured
* 3 resource instances configured
Node List:
* Online: [ juju-1e6d0c-0-lxd-0 juju-1e6d0c-1-lxd-1 juju-1e6d0c-2-lxd-0 ]
Full List of Resources:
* service_
* service_
* service_
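The check done by eye on the status output above (does any node hold more than one service IP?) can be written as a small helper. This is only a minimal sketch in Python: the resource names (`ip_a`, `ip_b`, `ip_c`) are hypothetical placeholders, and the placement mapping is assumed to be filled in by hand. It does not parse real "crm status" output.

```python
from collections import defaultdict


def nodes_with_multiple_ips(placement):
    """Return {node: [resources]} for nodes hosting more than one resource.

    `placement` maps a resource name to the node it currently runs on.
    """
    by_node = defaultdict(list)
    for resource, node in placement.items():
        by_node[node].append(resource)
    return {node: sorted(res) for node, res in by_node.items() if len(res) > 1}


# Hypothetical placement mirroring the "current behaviour":
# every service IP resource is still on the same node.
stuck = {
    "ip_a": "juju-1e6d0c-0-lxd-0",
    "ip_b": "juju-1e6d0c-0-lxd-0",
    "ip_c": "juju-1e6d0c-0-lxd-0",
}
print(nodes_with_multiple_ips(stuck))
```

An empty result would correspond to the expected behaviour, where each unit holds one distinct address.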
# Root cause
The root cause is most likely because of the default value of resources-
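There is a colocation rule to ensure that the services run on different units, but it is blocked by the default stickiness: once the other nodes come back online, the resources prefer to stay where they are rather than spread out again.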
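To illustrate how stickiness can outweigh an anti-colocation preference, here is a toy scoring model in Python. It is a simplified sketch and not Pacemaker's actual placement algorithm: the function names, the resource and node names, and the numeric scores are all assumptions chosen for illustration, and the anti-colocation score is assumed to be finite.

```python
def preferred_node(resource, placement, nodes, stickiness, anti_colocation):
    """Pick the best-scoring node for `resource` in a toy scoring model.

    Each node scores:
      +stickiness       if the resource already runs there
      -anti_colocation  for every other resource placed there
    On a tie with the current node, the resource stays where it is.
    """
    current = placement[resource]

    def score(node):
        others = sum(1 for r, n in placement.items() if r != resource and n == node)
        bonus = stickiness if node == current else 0
        return bonus - anti_colocation * others

    best = max(nodes, key=score)
    return current if score(best) == score(current) else best


def rebalance(placement, nodes, stickiness, anti_colocation):
    """Re-evaluate each resource once, in name order, and return the new placement."""
    result = dict(placement)
    for resource in sorted(result):
        result[resource] = preferred_node(
            resource, result, nodes, stickiness, anti_colocation
        )
    return result


nodes = ["n0", "n1", "n2"]                           # hypothetical node names
stuck = {"ip_a": "n0", "ip_b": "n0", "ip_c": "n0"}   # all IPs on one node

# Stickiness larger than the colocation penalty: nothing moves.
print(rebalance(stuck, nodes, stickiness=100, anti_colocation=10))
# Stickiness of 0: the penalty wins and the resources spread out.
print(rebalance(stuck, nodes, stickiness=0, anti_colocation=10))
```

In this model the resources only spread out again when the stickiness is smaller than the penalty for sharing a node, which is consistent with the behaviour observed above and with the workarounds suggested below.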
# Workaround :
* remove the default stickiness manually
or
* ensure the resources are created with their stickiness set to 0
(this might be easier to fix from a charm point of view)
juju ssh designate-