Canonical Juju

Activity log for bug #1961448

Date	Who	What changed	Old value	New value	Message
2022-02-18 21:51:13	Rodrigo Barbieri	bug			added bug
2022-02-21 13:47:08	Kabanov Oleg	bug			added subscriber Kabanov Oleg
2022-02-21 16:39:53	James Page	bug task added		juju
2022-02-21 16:42:14	Rodrigo Barbieri	description	On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (there matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding). Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data. How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state? Things I have tried: 1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually Still, changing the binding to the correct one resulted in errors due to the lack of private-address property. 2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually c) setting the private-address properties manually through relation-set d) restarting jujud e) restarting the lxd container None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set. [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187	On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (therefore matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding). Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data. How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state? Things I have tried: 1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually Still, changing the binding to the correct one resulted in errors due to the lack of private-address property. 2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually c) setting the private-address properties manually through relation-set d) restarting jujud e) restarting the lxd container None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set. [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187
2022-02-22 10:55:29	Joseph Phillips	juju: status	New	Triaged
2022-02-22 10:55:31	Joseph Phillips	juju: importance	Undecided	High
2022-02-22 10:55:33	Joseph Phillips	juju: assignee		Joseph Phillips (manadart)
2022-02-22 10:55:46	Joseph Phillips	juju: milestone		2.9.26
2022-03-06 16:52:24	Joseph Phillips	juju: status	Triaged	Fix Committed
2022-03-06 18:13:15	Joseph Phillips	juju: status	Fix Committed	Triaged
2022-03-09 11:03:13	Canonical Juju QA Bot	juju: milestone	2.9.26	2.9.27
2022-03-18 12:36:59	Canonical Juju QA Bot	juju: milestone	2.9.27	2.9.28
2022-03-25 21:00:05	Rodrigo Barbieri	tags		sts
2022-03-30 09:21:16	Joseph Phillips	juju: status	Triaged	Incomplete
2022-03-30 13:21:12	Canonical Juju QA Bot	juju: milestone	2.9.28	2.9.29
2022-03-30 16:26:01	Rodrigo Barbieri	description	On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (therefore matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding). Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data. How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state? Things I have tried: 1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually Still, changing the binding to the correct one resulted in errors due to the lack of private-address property. 2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually c) setting the private-address properties manually through relation-set d) restarting jujud e) restarting the lxd container None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set. [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187	Juju version used: 2.9.12 On a fully functional deployment where hacluster has the correct binding for the hanode endpoint (therefore matching the IP assigned to the unit), changing the binding to an incorrect one (by running juju bind hacluster <wrong_binding> --force) expectedly causes network-get to fail and hanode-relation-changed hook failure, resulting in failure to write the IP to the ring0_addr properties in corosync.conf because the private-address property disappears from the relation-data (due to failure of network-get due to incorrect binding). Now, setting the binding back to the correct one (through juju bind hacluster <correct_binding>) restores the network-get functionality, but it does not restore the missing private-address property from the relation-data. Therefore the hanode-relation-changed hook failure persists and the ring0_addr still cannot be written to corosync.conf because the private-address property is not found in the relation-data. How to force refresh the relation-data to re-read parameters from network-get ? As I understand, the properties private-address, ingress-address and egress-subnets are "essential" properties that are present in every endpoint, as long as network-get command is successful. Is something blocking the relation-data to being refreshed or re-querying network-get ? like a hook error or blocked state? Things I have tried: 1) First I tried smoothing out the errors from the wrong binding change until status was clear and back to active/idle, before invoking "juju bind hacluster <correct_binding>", such as: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually Still, changing the binding to the correct one resulted in errors due to the lack of private-address property. 2) With the correct binding now set, I then tried to refresh the property and overcome the errors in several ways: a) juju resolved --no-retry b) writing ring0_addr values in corosync.conf manually c) setting the private-address properties manually through relation-set d) restarting jujud e) restarting the lxd container None of those would work, and despite having set the property manually, the code at [0] still re-read "None" from the private-address properties in the relation-data as if they weren't set. [0] https://github.com/juju/charm-helpers/blob/446cbfdad83e15b5cfd20f862d3c3b5b1956b998/charmhelpers/contrib/hahelpers/cluster.py#L187
2022-04-01 09:14:02	Joseph Phillips	juju: milestone	2.9.29
2022-04-06 21:47:08	OpenStack Infra	charm-hacluster: status	New	In Progress
2022-04-12 21:12:52	Felipe Reyes	charm-hacluster: assignee		Rodrigo Barbieri (rodrigo-barbieri2010)
2022-04-12 21:12:55	Felipe Reyes	charm-hacluster: milestone		22.04
2022-04-12 21:22:06	OpenStack Infra	charm-hacluster: status	In Progress	Fix Committed
2022-05-10 16:48:03	Alex Kavanagh	charm-hacluster: status	Fix Committed	Fix Released
2022-05-13 22:21:38	OpenStack Infra	tags	sts	in-stable-focal sts