OpenStack HA Cluster Charm

Resources with order and/or colocation constraint can't settle in HACluster

Bug #1952492 reported by Gabriel Cocenza on 2021-11-27

This bug affects 1 person

Affects: OpenStack HA Cluster Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have the following setup: three nodes running cloned resources A and B. Resource B requires resource A to be started and running on the same node.
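In crm shell terms, this setup corresponds roughly to two clones plus a colocation and an order constraint. The sketch below only builds the commands and does not run them. The primitive names `res_A`/`res_B` and the constraint ids are hypothetical; the `cl_` clone prefix and the `crm -w -F configure` invocation mirror the log lines in this report. If such commands are applied one at a time, the clones can exist (and start) before the constraints do.

```python
from typing import List

# Invocation style copied from the ha-relation-changed log lines in this report.
CRM = ["crm", "-w", "-F", "configure"]


def clone_pair_commands(res_a: str, res_b: str) -> List[List[str]]:
    """Clone res_a and res_b, then require each res_b instance to run on a
    node where res_a is already running (hypothetical helper)."""
    cl_a, cl_b = f"cl_{res_a}", f"cl_{res_b}"
    return [
        CRM + ["clone", cl_a, res_a],
        CRM + ["clone", cl_b, res_b],
        # Colocation: place cl_b relative to cl_a with an infinite score.
        CRM + ["colocation", f"{cl_b}-with-{cl_a}", "inf:", cl_b, cl_a],
        # Order: start cl_a before cl_b (Mandatory orders are symmetrical by default).
        CRM + ["order", f"{cl_a}-then-{cl_b}", "Mandatory:", cl_a, cl_b],
    ]
```

For example, `clone_pair_commands("res_A", "res_B")` yields four argv lists, the last two being the colocation and order constraints between `cl_res_B` and `cl_res_A`.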

While the cloned resources are being configured on the three nodes, the cluster gets stuck trying to start a resource on a different node, without honoring the order and colocation constraints. The logs show that the ha-relation-changed hook does not configure all the resources, clones, groups, constraints and so on before the cluster starts allocating the resources.

The logs show that after configuring two resources, the hook got stuck for 20 minutes because the cluster tried to start the resources on different nodes.

unit-hacluster-kubernetes-master-0: 23:09:49 WARNING unit.hacluster-kubernetes-master/0.ha-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
unit-hacluster-kubernetes-master-0: 23:20:40 DEBUG juju.worker.uniter.remotestate update status timer triggered for hacluster-kubernetes-master/0
unit-hacluster-kubernetes-master-0: 23:24:45 DEBUG juju.worker.uniter.remotestate update status timer triggered for hacluster-kubernetes-master/0

After I solved the issue manually, the hook started rolling again:

unit-hacluster-kubernetes-master-0: 23:25:00 WARNING unit.hacluster-kubernetes-master/0.ha-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
unit-hacluster-kubernetes-master-0: 23:25:00 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: crm -w -F configure primitive res_kube_controller_manager_snap.kube_controller_manager.daemon systemd:snap.kube-controller-manager.daemon meta migration-threshold="INFINITY" failure-timeout="5s" op monitor interval="5s"
unit-hacluster-kubernetes-master-0: 23:25:07 WARNING unit.hacluster-kubernetes-master/0.ha-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
unit-hacluster-kubernetes-master-0: 23:25:07 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: crm -w -F configure primitive res_kube_proxy_snap.kube_proxy.daemon systemd:snap.kube-proxy.daemon meta migration-threshold="INFINITY" failure-timeout="5s" op monitor interval="5s"
unit-hacluster-kubernetes-master-0: 23:25:14 WARNING unit.hacluster-kubernetes-master/0.ha-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
unit-hacluster-kubernetes-master-0: 23:25:14 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: crm -w -F configure primitive res_kube_scheduler_snap.kube_scheduler.daemon systemd:snap.kube-scheduler.daemon meta migration-threshold="INFINITY" failure-timeout="5s" op monitor interval="5s"
unit-hacluster-kubernetes-master-0: 23:25:16 WARNING unit.hacluster-kubernetes-master/0.ha-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
unit-hacluster-kubernetes-master-0: 23:25:17 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: crm -w -F configure primitive res_kubernetes-master_cee3595_vip ocf:heartbeat:IPaddr2 params ip="10.5.2.203" meta migration-threshold="INFINITY" failure-timeout="5s" op monitor timeout="20s" interval="10s" depth="0"
unit-hacluster-kubernetes-master-0: 23:25:17 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: Configuring Groups: {'grp_kubernetes-master_vips': 'res_kubernetes-master_cee3595_vip'}
unit-hacluster-kubernetes-master-0: 23:25:19 WARNING unit.hacluster-kubernetes-master/0.ha-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
unit-hacluster-kubernetes-master-0: 23:25:19 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: crm -w -F configure group grp_kubernetes-master_vips res_kubernetes-master_cee3595_vip
unit-hacluster-kubernetes-master-0: 23:25:19 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: Configuring Master/Slave (ms): {}
unit-hacluster-kubernetes-master-0: 23:25:19 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: Configuring Master/Slave (ms): {}

Only at this point, after the issue had been solved manually, does the hook configure the orders:
unit-hacluster-kubernetes-master-0: 23:25:19 DEBUG unit.hacluster-kubernetes-master/0.juju-log ha:17: Configuring Orders: {'clone-then-api': 'Mandatory: cl_res_kube_apiserver_snap.kube_apiserver.daemon res_kube_apiserver_snap.kube_apiserver.daemon', 'manager-after-api': 'Mandatory: res_kube_apiserver_snap.kube_apiserver.daemon res_kube_controller_manager_snap.kube_controller_manager.daemon'}
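The logged Groups mapping is applied as `crm -w -F configure group <name> <members>`. Assuming the Orders mapping is applied the same way (an inference from the log, not confirmed by it), each entry would become one `crm configure order` command. The hypothetical helper below shows that translation for the mapping logged above; since each command is issued separately, the orders only take effect after every primitive before them has already been handed to the cluster.

```python
from typing import Dict, List

# Invocation style copied from the ha-relation-changed log lines in this report.
CRM = ["crm", "-w", "-F", "configure"]


def order_commands(orders: Dict[str, str]) -> List[List[str]]:
    """Turn an Orders mapping like the logged one into crm argv lists
    (hypothetical helper; relies on dicts keeping insertion order)."""
    commands = []
    for name, spec in orders.items():
        # spec has the form "<kind>: <first> <then>", e.g. "Mandatory: A B"
        commands.append(CRM + ["order", name] + spec.split())
    return commands
```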

My guess is that during the execution of the ha-relation-changed hook we need to put the cluster into maintenance-mode, configure it, and then take it out of maintenance-mode. If resources do not start as soon as they are added to the cluster, the cluster has time to receive the complete configuration and can then make the correct placement decisions.
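The proposed workaround could look roughly like the following Python sketch. It is not the charm's code: `maintenance_mode`, `configure_all`, `run_crm` and the injectable `run` callable are hypothetical names. It relies on Pacemaker's `maintenance-mode` cluster property, set with `crm configure property`, which makes the cluster refrain from starting, stopping and monitoring resources until the property is cleared.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, Sequence

# Hypothetical type: anything that executes one crm argv list.
Runner = Callable[[Sequence[str]], object]


def run_crm(argv: Sequence[str]) -> None:
    """Run one crm command; raises CalledProcessError if it fails."""
    subprocess.run(list(argv), check=True)


@contextmanager
def maintenance_mode(run: Runner = run_crm) -> Iterator[None]:
    """Keep Pacemaker from starting or stopping resources inside the block."""
    run(["crm", "configure", "property", "maintenance-mode=true"])
    try:
        yield
    finally:
        # Always leave maintenance mode, even if a configure step failed.
        run(["crm", "configure", "property", "maintenance-mode=false"])


def configure_all(commands: Iterable[Sequence[str]], run: Runner = run_crm) -> None:
    """Apply every primitive/clone/group/constraint command, so the cluster
    makes its placement decisions only once the full configuration exists."""
    with maintenance_mode(run):
        for argv in commands:
            run(argv)
```

Clearing the property in a `finally` block is a deliberate choice: if one configure command fails, the cluster is not left in maintenance-mode, although it may then act on a partial configuration.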
