Comment 0 for bug 1961448

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

On a fully functional deployment, hacluster has the correct binding for the hanode endpoint, that is, the one matching the IP assigned to the unit. Changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) causes network-get to fail, as expected, and the hanode-relation-changed hook fails with it. Because network-get fails under the incorrect binding, the private-address property disappears from the relation-data, so the IP cannot be written to the ring0_addr properties in corosync.conf.

Setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores network-get, but it does not restore the missing private-address property in the relation-data. The hanode-relation-changed hook therefore keeps failing, and ring0_addr still cannot be written to corosync.conf because private-address is not found in the relation-data.

How can I force a refresh of the relation-data so that it re-reads its parameters from network-get?

As I understand it, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as the network-get command succeeds.
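To make that expectation concrete, here is a minimal sketch of how the three properties could be derived from network-get output. It assumes the payload is the JSON printed by network-get with --format json and that the first ingress address is used for both private-address and ingress-address. The function name and the sample payload are hypothetical; this is not Juju's actual implementation.

```python
import json


def essential_settings(network_get_json):
    """Derive the three default relation settings from network-get output.

    Assumption: the first ingress address is used for both
    private-address and ingress-address.
    """
    info = json.loads(network_get_json)
    ingress = info.get("ingress-addresses") or []
    egress = info.get("egress-subnets") or []
    if not ingress:
        # Models a failed or empty network-get: nothing can be derived.
        return {}
    return {
        "private-address": ingress[0],
        "ingress-address": ingress[0],
        "egress-subnets": ",".join(egress),
    }


# Made-up sample payload, for illustration only.
SAMPLE = json.dumps({
    "bind-addresses": [
        {"interfacename": "eth0",
         "addresses": [{"address": "10.0.0.5", "cidr": "10.0.0.0/24"}]},
    ],
    "egress-subnets": ["10.0.0.5/32"],
    "ingress-addresses": ["10.0.0.5"],
})

print(essential_settings(SAMPLE))
print(essential_settings("{}"))
```

Under this model, an empty result corresponds to the state described above, where the properties are absent from the relation-data.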

Is something, such as a hook error or a blocked state, preventing the relation-data from being refreshed or from re-querying network-get?

Things I have tried:

1) First, before invoking "juju bind hacluster <correct_binding>", I tried clearing the errors caused by the wrong binding until the status was back to active/idle, by:

a) juju resolved --no-retry
b) writing ring0_addr values in corosync.conf manually

Still, changing the binding to the correct one resulted in errors because the private-address property was missing.

2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways:

a) juju resolved --no-retry
b) writing ring0_addr values in corosync.conf manually
c) setting the private-address properties manually through relation-set
d) restarting jujud
e) restarting the lxd container

None of those worked. Even after I set the property manually, the code at [0] still read "None" for the private-address properties in the relation-data, as if they had not been set.
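The symptom can be reproduced in isolation with a small model. The sketch below is not the charm-helpers code at [0]; it represents relation-data as a plain dict keyed by unit name (Juju keeps relation settings per unit), and the function names and addresses are hypothetical. It only shows how a missing private-address key surfaces as None when a unit reads its peers' settings.

```python
def peer_addresses(relation_data, local_unit, addr_key="private-address"):
    """Return {remote_unit: address or None} for every unit except the local one."""
    return {
        unit: settings.get(addr_key)  # a missing key yields None
        for unit, settings in relation_data.items()
        if unit != local_unit
    }


def units_missing_address(relation_data, local_unit, addr_key="private-address"):
    """List the remote units whose address key is absent from the relation data."""
    peers = peer_addresses(relation_data, local_unit, addr_key)
    return sorted(unit for unit, addr in peers.items() if addr is None)


# Made-up relation data: hacluster/1 has lost its private-address.
RELATION_DATA = {
    "hacluster/0": {"private-address": "10.0.0.5"},
    "hacluster/1": {},
    "hacluster/2": {"private-address": "10.0.0.7"},
}

print(units_missing_address(RELATION_DATA, "hacluster/0"))
```

A check like this, run against the real values returned by relation-get for each unit, would show which units are missing the key.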

[0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187