Re-configuring resource parameters dysfunctional (cannot reconfigure VIPs)

Bug #1513889 reported by Peter Sabaini
This bug affects 11 people
Affects                              Status        Importance  Assigned to  Milestone
OpenStack HA Cluster Charm           In Progress   Medium      Unassigned
hacluster (Juju Charms Collection)   Invalid       Undecided   Unassigned

Bug Description

I've updated a resource parameter (vip_cidr) that had been set erroneously on the principal, but the parameter change didn't take effect. From a cursory reading of hacluster/hooks/hooks.py, configure_cluster() is deliberately designed that way: there is a marker file that, once present, makes the function exit early.

The cluster should support reconfiguration, or at least document the fact that it doesn't.
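
For illustration, the failure mode looks roughly like this (a sketch only; the application name and CIDR value are made up, and the exact juju syntax depends on the Juju version):

juju config mysql vip_cidr=24

# config-changed runs and settles without error, but the marker file makes
# configure_cluster() return early, so the pacemaker resources keep their old
# parameters:
juju run -u mysql-hacluster/0 -- crm configure show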

Revision history for this message
Giorgio Di Guardia (giorgiodg) wrote :

+1

Revision history for this message
Alvaro Uria (aluria) wrote :

This issue happens not only with the vip_cidr parameter, but also with vip.

On the other hand, when removing the relation (i.e. mysql - mysql-hacluster), the directory /var/lib/pacemaker does not get removed (as /var/lib/corosync does). Re-creating the relation will therefore reuse the old data.

James Page (james-page)
Changed in hacluster (Juju Charms Collection):
status: New → Invalid
tags: added: cpe cpe-sa
summary: - Re-configuring resource parameters dysfunctional
+ Re-configuring resource parameters dysfunctional (cannot reconfigure VIPs)
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

+1 on this

If you ever configure an incorrect VIP, you won't be able to change it afterwards. The relation data gets passed from the primary unit to the subordinate hacluster, but hacluster does not react properly to the config change.

http://paste.ubuntu.com/25096416/

Although VIPs normally do not change often, when they do it is impossible to change them without manual intervention. Also, config-changed hooks silently succeed, making you wonder what is wrong with the service.

It seems that the only workaround is to remove the relation between the primary application and the subordinate, wait until hacluster is cleaned up, remove the leftovers and the VIPs from the interfaces, and re-add the relation, as sketched below.
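
A sketch of that workaround, with illustrative application names (heat and its hacluster subordinate, matching the log below); the exact leftovers to remove will vary:

juju remove-relation heat hacluster-heat
# wait for the hacluster units' stop hooks to finish, then on each node:
#   - clean out leftover state such as /etc/corosync and /var/lib/pacemaker
#   - drop the old VIP from the interface if it is still assigned (ip addr del ...)
juju add-relation heat hacluster-heat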

After running remove-relation, the stop hook hangs for ~2 minutes after printing 'Purging configuration files for corosync':

(openstack-client) ubuntu@maas:~/bundles$ juju debug-log -i unit-hacluster-heat*
unit-hacluster-heat-1: 10:04:20 DEBUG unit.hacluster-heat/1.stop dpkg: warning: while removing corosync, directory '/etc/corosync' not empty so not removed
unit-hacluster-heat-1: 10:04:20 DEBUG unit.hacluster-heat/1.stop Processing triggers for man-db (2.7.5-1) ...
unit-hacluster-heat-0: 10:04:20 DEBUG unit.hacluster-heat/0.stop Removing corosync (2.3.5-3ubuntu1) ...
unit-hacluster-heat-2: 10:04:20 DEBUG unit.hacluster-heat/2.stop Removing corosync (2.3.5-3ubuntu1) ...
unit-hacluster-heat-0: 10:04:21 DEBUG unit.hacluster-heat/0.stop Purging configuration files for corosync (2.3.5-3ubuntu1) ...
unit-hacluster-heat-2: 10:04:21 DEBUG unit.hacluster-heat/2.stop Purging configuration files for corosync (2.3.5-3ubuntu1) ...
unit-hacluster-heat-0: 10:04:21 DEBUG unit.hacluster-heat/0.stop dpkg: warning: while removing corosync, directory '/etc/corosync' not empty so not removed
unit-hacluster-heat-0: 10:04:21 DEBUG unit.hacluster-heat/0.stop Processing triggers for man-db (2.7.5-1) ...
unit-hacluster-heat-2: 10:04:21 DEBUG unit.hacluster-heat/2.stop dpkg: warning: while removing corosync, directory '/etc/corosync' not empty so not removed
unit-hacluster-heat-2: 10:04:21 DEBUG unit.hacluster-heat/2.stop Processing triggers for man-db (2.7.5-1) ...
unit-hacluster-heat-0: 10:04:20 DEBUG unit.hacluster-heat/0.stop Purging configuration files for pacemaker (1.1.14-2ubuntu1.1) ...
unit-hacluster-heat-2: 10:04:20 DEBUG unit.hacluster-heat/2.stop Purging configuration files for pacemaker (1.1.14-2ubuntu1.1) ...
unit-hacluster-heat-1: 10:04:19 DEBUG unit.hacluster-heat/1.stop Purging configuration files for pacemaker (1.1.14-2ubuntu1.1) ...
unit-hacluster-heat-1: 10:04:19 DEBUG unit.hacluster-heat/1.stop Removing corosync (2.3.5-3ubuntu1) ...
unit-hacluster-heat-1: 10:04:20 DEBUG unit.hacluster-heat/1.stop Purging configuration files for corosync (2.3.5-3ubuntu1) ...
unit-hacluster-heat-0: 10:06:23 ERROR unit.hacluster-heat/0.juju-log Pacemaker is down. Please manually start it.
unit-hacluster-heat-0: 10:06:23 INFO juju.worker.uniter.operation ran "stop" hook
unit-hacluster-heat-2: 10:06:23 ERROR unit.hacluster-heat/2.juju-log Pacemaker is down. Please manually start it.
unit-hacluster-heat-1: 10:06:22 ERROR unit.hacluster-heat/1.juju-log Pacemaker is down. Please manually start it.
unit-hacluster-heat-1: 10:...


Changed in charm-hacluster:
status: New → Confirmed
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Also, in addition to the steps in comment #3, I had to manually run 'pkill -f corosync' on each node before I could properly `systemctl restart corosync`, or before the charm could report anything other than:

  hacluster-heat/7 blocked executing <addr> Pacemaker is down. Please manually start it.

It appears that even after the packages are purged, the related processes are not killed.
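
In other words, roughly the following on each affected node (a small sketch of the manual step above):

# the packages were purged but the daemons kept running, so kill them first
sudo pkill -f corosync
# then restart the service once corosync is installed again (e.g. after re-adding the relation)
sudo systemctl restart corosync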

tags: added: cpec
Ante Karamatić (ivoks)
tags: added: cpe-onsite
removed: cpe cpe-sa cpec
James Page (james-page)
Changed in charm-hacluster:
importance: Undecided → Medium
status: Confirmed → Triaged
Revision history for this message
Xav Paice (xavpaice) wrote :

We changed the VIP on an application recently and did not remove/re-add the hacluster relation. The new VIP (.43) was configured as an extra resource outside the resource group, and the old VIP (.42) was left alone:

vault-2# crm configure edit

node 1000: vault-1
node 1001: vault-2
node 1002: vault-3
primitive res_vault-ext_225dab8_vip IPaddr2 \
        params ip=172.27.84.42 \
        meta migration-threshold=INFINITY failure-timeout=5s \
        op monitor timeout=20s interval=10s depth=0
primitive res_vault-ext_8f67fe7_vip IPaddr2 \
        params ip=172.27.84.43 \
        meta migration-threshold=INFINITY failure-timeout=5s \
        op monitor timeout=20s interval=10s depth=0
group grp_vault-ext_vips res_vault-ext_225dab8_vip
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.18-2b07d5c5a9 \
        cluster-infrastructure=corosync \
        cluster-name=debian \
        no-quorum-policy=stop \
        cluster-recheck-interval=60 \
        stonith-enabled=false \
        last-lrm-refresh=1631512275
rsc_defaults rsc-options: \
        resource-stickiness=100 \
        failure-timeout=180

I then deleted res_vault-ext_225dab8_vip and added res_vault-ext_8f67fe7_vip to grp_vault-ext_vips manually as a clean-up. The pacemaker status is good and the cluster operates correctly; however, the charm status is now "waiting", with "Resource: res_vault-ext_225dab8_vip not yet configured".
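
For reference, that manual clean-up can also be scripted (a sketch assuming crmsh and the resource names from the output above; the modgroup subcommand may not exist on older crmsh releases):

# move the new VIP into the resource group, then drop the old primitive
sudo crm configure modgroup grp_vault-ext_vips add res_vault-ext_8f67fe7_vip
sudo crm resource stop res_vault-ext_225dab8_vip
sudo crm configure delete res_vault-ext_225dab8_vip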

This prevents a clean-up even when the charm itself does not support removing resources that are no longer needed (i.e. manual removal is the only option), because, as pointed out by the author of LP:#1943422, the old resource is still present in the relation data:

juju run -u hacluster-vault/0 -- relation-get -r 300 - vault/0
corosync_mcastport: "4440"
egress-subnets: 172.27.84.62/32
ingress-address: 172.27.84.62
json_groups: '{"grp_vault-ext_vips": "res_vault-ext_225dab8_vip res_vault-ext_8f67fe7_vip"}'
json_resource_params: '{"res_vault-ext_225dab8_vip": " params ip=\"172.27.84.42\" meta
  migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor timeout=\"20s\"
  interval=\"10s\" depth=\"0\"", "res_vault-ext_8f67fe7_vip": " params ip=\"172.27.84.43\" meta
  migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor timeout=\"20s\"
  interval=\"10s\" depth=\"0\""}'
json_resources: '{"res_vault-ext_225dab8_vip": "ocf:heartbeat:IPaddr2", "res_vault-ext_8f67fe7_vip":
  "ocf:heartbeat:IPaddr2"}'

Revision history for this message
Joe Guo (guoqiao) wrote :

I tried to remove `res_vault-ext_225dab8_vip` from the relation data with relation-set, but it doesn't work.

The CLI commands I tried:

1) full command:

juju run -u hacluster-vault/0 -- relation-set -r 300 json_groups='{"grp_vault-ext_vips": "res_vault-ext_8f67fe7_vip"}' json_resource_params='{"res_vault-ext_8f67fe7_vip": " params ip=\"172.27.84.43\" meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor timeout=\"20s\" interval=\"10s\" depth=\"0\""}' json_resources='{"res_vault-ext_8f67fe7_vip": "ocf:heartbeat:IPaddr2"}'

2) command with the --file option:

cat 0.yaml

json_groups: '{"grp_vault-ext_vips": "res_vault-ext_8f67fe7_vip"}'
json_resource_params: '{"res_vault-ext_8f67fe7_vip": " params ip=\"172.27.84.43\" meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor timeout=\"20s\" interval=\"10s\" depth=\"0\""}'
json_resources: '{"res_vault-ext_8f67fe7_vip": "ocf:heartbeat:IPaddr2"}'

juju scp 0.yaml hacluster-vault/0:/tmp/
juju run -u hacluster-vault/0 -- relation-set -r 300 --file /tmp/0.yaml

3) modifying only a single key:

juju run -u hacluster-vault/0 -- relation-set -r 300 json_resources='{"res_vault-ext_8f67fe7_vip": "ocf:heartbeat:IPaddr2"}'

With any of the above methods there is no error, but also no effect: the relation data remains the same.

Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

I was able to successfully delete old resources without removing the relations. Basically, relation-set needs to be used like this:

juju run -u glance/1 -- relation-set -r ha:43 json_delete_resources='["res_glance_ens3_vip","res_glance_242d562_vip"]'

Notice that I'm running relation-set on glance, because it is glance that writes the data into that relation.

So there is a json_delete_resources key that triggers the clean-up of VIPs in the hacluster charm. When VIPs are changed, most charms (like glance) replace the relation data with the new VIP data, which hacluster then applies; however, it does not clean up the old resources in pacemaker, so you end up with the new VIP plus the old ones you already had.

Some other charms, like placement, behave differently. Every time the VIP for placement is changed, the charm appends to the relation data instead of overriding it, so the relation-set ends up looking a bit like what Joe Guo posted above. It is therefore necessary to reset the relation data to only the VIPs you want, manually removing the old ones, in addition to passing json_delete_resources to clean the old resource out of pacemaker:

juju run -u placement/0 -- relation-set -r ha:48 json_groups='{"grp_placement_vips": "res_placement_df14adf_vip"}' json_resources='{"res_placement_df14adf_vip": "ocf:heartbeat:IPaddr2", "res_placement_haproxy": "lsb:haproxy"}' json_delete_resources='["res_placement_ens3_vip","res_placement_72e5569_vip"]'

The charms that append data to the relation (like placement) do so every time they write to the relation, even after the VIP config has been changed, so they must be storing the old VIPs somewhere. Our ongoing investigation points to the unit's sqlite database, but I still need to test further to devise a proper series of steps to clean that up.

tags: added: sts
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-hacluster (master)
Changed in charm-hacluster:
status: Triaged → In Progress
Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

The above change addresses the issue from the hacluster point of view. However, reactive charms such as placement and vault use the charm-interface-hacluster interface and therefore behave differently from classic charms. Classic charms always provide the VIP value from config in the relation data. Reactive charms using that interface append the reconfigured values and end up providing multiple VIPs in the relation data. The interface and/or the charms using it need a separate fix so that the hacluster charm always receives the correct data from the relation.

no longer affects: charm-placement
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-hacluster (master)

Change abandoned by "Rodrigo Barbieri <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/charm-hacluster/+/818996
Reason: this is still a very nice improvement to have, but we do not currently have a meaningful demand or motivation for further working on it at the moment. Abandoning it
