Activity log for bug #1961448

Date Who What changed Old value New value Message
2022-02-18 21:51:13 Rodrigo Barbieri bug added bug
2022-02-21 13:47:08 Kabanov Oleg bug added subscriber Kabanov Oleg
2022-02-21 16:39:53 James Page bug task added juju
2022-02-21 16:42:14 Rodrigo Barbieri description
    Old value:
    On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (there matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding).
    Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data.
    How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state?
    Things I have tried:
    1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       Still, changing the binding to the correct one resulted in errors due to the lack of private-address property.
    2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       c) setting the private-address properties manually through relation-set
       d) restarting jujud
       e) restarting the lxd container
       None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set.
    [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187

    New value:
    On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (therefore matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding).
    Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data.
    How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state?
    Things I have tried:
    1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       Still, changing the binding to the correct one resulted in errors due to the lack of private-address property.
    2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       c) setting the private-address properties manually through relation-set
       d) restarting jujud
       e) restarting the lxd container
       None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set.
    [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187
2022-02-22 10:55:29 Joseph Phillips juju: status New Triaged
2022-02-22 10:55:31 Joseph Phillips juju: importance Undecided High
2022-02-22 10:55:33 Joseph Phillips juju: assignee Joseph Phillips (manadart)
2022-02-22 10:55:46 Joseph Phillips juju: milestone 2.9.26
2022-03-06 16:52:24 Joseph Phillips juju: status Triaged Fix Committed
2022-03-06 18:13:15 Joseph Phillips juju: status Fix Committed Triaged
2022-03-09 11:03:13 Canonical Juju QA Bot juju: milestone 2.9.26 2.9.27
2022-03-18 12:36:59 Canonical Juju QA Bot juju: milestone 2.9.27 2.9.28
2022-03-25 21:00:05 Rodrigo Barbieri tags sts
2022-03-30 09:21:16 Joseph Phillips juju: status Triaged Incomplete
2022-03-30 13:21:12 Canonical Juju QA Bot juju: milestone 2.9.28 2.9.29
2022-03-30 16:26:01 Rodrigo Barbieri description
    Old value:
    On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (therefore matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding).
    Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data.
    How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state?
    Things I have tried:
    1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       Still, changing the binding to the correct one resulted in errors due to the lack of private-address property.
    2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       c) setting the private-address properties manually through relation-set
       d) restarting jujud
       e) restarting the lxd container
       None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set.
    [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187

    New value:
    Juju version used: 2.9.12
    On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (therefore matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding).
    Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data.
    How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state?
    Things I have tried:
    1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       Still, changing the binding to the correct one resulted in errors due to the lack of private-address property.
    2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways:
       a) juju resolved --no-retry
       b) writing ring0_addr values in corosync.conf manually
       c) setting the private-address properties manually through relation-set
       d) restarting jujud
       e) restarting the lxd container
       None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set.
    [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187
2022-04-01 09:14:02 Joseph Phillips juju: milestone 2.9.29
2022-04-06 21:47:08 OpenStack Infra charm-hacluster: status New In Progress
2022-04-12 21:12:52 Felipe Reyes charm-hacluster: assignee Rodrigo Barbieri (rodrigo-barbieri2010)
2022-04-12 21:12:55 Felipe Reyes charm-hacluster: milestone 22.04
2022-04-12 21:22:06 OpenStack Infra charm-hacluster: status In Progress Fix Committed
2022-05-10 16:48:03 Alex Kavanagh charm-hacluster: status Fix Committed Fix Released
2022-05-13 22:21:38 OpenStack Infra tags sts in-stable-focal sts
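
Illustrative note on the failure mode recorded in the description entries of this log: the hook fails because private-address is missing from the relation-data, so no ring0_addr can be written. The following is a minimal sketch of how a charm could detect that condition per unit instead of passing "None" onward. It is not the fix that was committed for this bug. Every name in it (pick_address, collect_ring0_addrs, the unit names and the addresses) is hypothetical, relation data is modeled as plain dicts rather than read through the relation-get hook tool, and the fallback to ingress-address is an assumption, since the log does not say whether that key survives when private-address disappears.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for per-unit relation settings. A real charm would
# obtain these through the relation-get hook tool, not from a literal dict.
RelationData = Dict[str, Dict[str, str]]


def pick_address(
    settings: Dict[str, str],
    keys: Tuple[str, ...] = ("private-address", "ingress-address"),
) -> Optional[str]:
    """Return the first non-empty value found under the given keys."""
    for key in keys:
        value = settings.get(key)
        if value:
            return value
    return None


def collect_ring0_addrs(relation_data: RelationData) -> Tuple[List[str], List[str]]:
    """Return (usable addresses, units that expose no address at all)."""
    addrs: List[str] = []
    missing: List[str] = []
    for unit in sorted(relation_data):
        addr = pick_address(relation_data[unit])
        if addr is None:
            # This unit would otherwise produce a "None" ring0_addr.
            missing.append(unit)
        else:
            addrs.append(addr)
    return addrs, missing


if __name__ == "__main__":
    # Made-up example data: one healthy unit, one unit whose address keys
    # have vanished, one unit that only has ingress-address.
    example = {
        "hacluster/0": {"private-address": "10.0.0.10"},
        "hacluster/1": {"egress-subnets": "10.0.0.0/24"},
        "hacluster/2": {"ingress-address": "10.0.0.12"},
    }
    addrs, missing = collect_ring0_addrs(example)
    print("ring0_addr candidates:", addrs)
    print("units without an address:", missing)
```

Reporting the units without an address separately would let a hook set a blocked status naming them, rather than failing with a traceback while rendering corosync.conf.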