Resource stickiness issue with service_ips

Bug #2059394 reported by DUFOUR Olivier
Affects: OpenStack Designate-Bind Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

The goal of service_ips is to allow a reserved range of IP addresses to be used by the designate-bind units in an OpenStack deployment.

# Problem :
It has been observed that, on some occasions, a unit can end up with multiple addresses from service_ips. This can happen whenever one of the designate-bind units is created slightly later than the others, or during maintenance when a unit is rebooted.

# Environment :
* Juju 3.4
* Ubuntu Jammy
* OpenStack Yoga (yoga/stable) and Bobcat (charm latest/edge)

# Reproducer :
* Deploy the attached bundle.
* On one of the designate-bind units, run "sudo crm status" to check the resource allocation :
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Mar 28 07:21:37 2024
  * Last change: Thu Mar 28 07:12:08 2024 by root via cibadmin on juju-1e6d0c-2-lxd-0
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ juju-1e6d0c-0-lxd-0 juju-1e6d0c-1-lxd-1 juju-1e6d0c-2-lxd-0 ]

Full List of Resources:
  * service_ip_10.244.40.201 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.202 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-1-lxd-1
  * service_ip_10.244.40.203 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-1-lxd-1 (already wrong right after the test deployment)

* Put 2 of the nodes in standby
* The remaining unit now holds all IP addresses from service_ips :
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Mar 28 07:22:54 2024
  * Last change: Thu Mar 28 07:22:44 2024 by root via crm_attribute on juju-1e6d0c-0-lxd-0
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Node juju-1e6d0c-1-lxd-1: OFFLINE (standby)
  * Node juju-1e6d0c-2-lxd-0: standby
  * Online: [ juju-1e6d0c-0-lxd-0 ]

Full List of Resources:
  * service_ip_10.244.40.201 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.202 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.203 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
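
For reference, the standby step above can be sketched with crmsh; the node names are the ones from this deployment, and this has not been re-validated outside it:

```shell
# Put two of the three nodes in standby so their resources migrate away
sudo crm node standby juju-1e6d0c-1-lxd-1
sudo crm node standby juju-1e6d0c-2-lxd-0

# Verify that all service_ip resources now run on the remaining node
sudo crm status
```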

* Set all nodes back online
 * Expected behaviour : each unit gets one distinct IP address
 * Current behaviour : one unit keeps all 3 IP addresses from service_ips
Cluster Summary:
  * Stack: corosync
  * Current DC: juju-1e6d0c-2-lxd-0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Thu Mar 28 07:32:05 2024
  * Last change: Thu Mar 28 07:24:10 2024 by root via crm_attribute on juju-1e6d0c-0-lxd-0
  * 3 nodes configured
  * 3 resource instances configured

Node List:
  * Online: [ juju-1e6d0c-0-lxd-0 juju-1e6d0c-1-lxd-1 juju-1e6d0c-2-lxd-0 ]

Full List of Resources:
  * service_ip_10.244.40.201 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.202 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
  * service_ip_10.244.40.203 (ocf:heartbeat:IPaddr2): Started juju-1e6d0c-0-lxd-0
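
Bringing the nodes back can likewise be sketched with crmsh (same caveat: node names are from this deployment):

```shell
# Bring the standby nodes back online
sudo crm node online juju-1e6d0c-1-lxd-1
sudo crm node online juju-1e6d0c-2-lxd-0

# With the default stickiness in place, the three IPs are expected
# to stay on juju-1e6d0c-0-lxd-0 instead of redistributing
sudo crm status
```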

# Root cause
The root cause is most likely the default value of resource-stickiness, which ensures that a resource (here, an IP address) does not move back once a unit has gone offline and come back online.
There is a colocation rule to ensure the resources run on different units, but it is overridden by the default stickiness.
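
The default can be inspected with standard Pacemaker tooling on a unit; a sketch, assuming crmsh and crm_attribute are available as usual:

```shell
# Query the cluster-wide default resource-stickiness (rsc_defaults)
sudo crm_attribute --type rsc_defaults --name resource-stickiness --query

# Show the full CIB configuration, including the colocation constraints
sudo crm configure show
```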

# Workaround :
* remove the default stickiness manually
or
* ensure the resources are created with their stickiness set to 0
(this might be easier to fix from the charm's point of view)

juju ssh designate-bind/leader 'for service in $(sudo crm resource status | grep service_ip | awk '"'"'{print $2}'"'"'); do sudo crm resource meta $service set resource-stickiness 0; done'
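
Alternatively, instead of looping over each resource, the cluster-wide default could be cleared in one step; a sketch using standard crmsh `rsc_defaults` syntax, not validated against this charm:

```shell
# Set the cluster-wide default stickiness to 0 so the colocation
# rule can redistribute the service IPs across distinct units
juju ssh designate-bind/leader 'sudo crm configure rsc_defaults resource-stickiness=0'
```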

Revision history for this message
DUFOUR Olivier (odufourc) wrote :