After series upgrade (bionic->focal) of gnocchi and its subordinate hacluster, hacluster hanode-relation-changed hook errors because the crm parameters are out of order

Bug #1942787 reported by Aurelien Lourot
This bug affects 4 people
Affects                      Status  Importance  Assigned to  Milestone
Gnocchi Charm                New     Undecided   Unassigned
OpenStack HA Cluster Charm   New     Undecided   Unassigned

Bug Description

Seen together with ~gabor.meszaros. After upgrading gnocchi and its corresponding hacluster subordinate, the hacluster hanode-relation-changed hook errors because the crm parameters are out of order (depth, timeout, interval instead of timeout, interval, depth). Both charms appear to set and read the parameters in the right order in their latest implementations, so most likely the relation still contained data written before the upgrade, which is now considered to be in the wrong order.

2021-09-06 10:23:37 DEBUG juju-log hanode:20: Configuring and (maybe) restarting corosync
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Writing file /etc/systemd/system/corosync.service.d/overrides.conf root:root 444
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Writing file /etc/systemd/system/pacemaker.service.d/overrides.conf root:root 444
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Writing file /etc/default/corosync root:root 444
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Writing file /etc/corosync/uidgid.d/hacluster root:root 444
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Changing permissions on existing content: 33024 -> 256
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Changing permissions on existing content: 33056 -> 288
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Found sufficient values in local config to populate corosync.conf
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Writing file /etc/corosync/corosync.conf root:root 444
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Pacemaker is ready
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Applying global cluster configuration
2021-09-06 10:23:37 DEBUG juju-log hanode:20: Configuring no-quorum-policy to stop
2021-09-06 10:23:38 WARNING hanode-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
2021-09-06 10:23:39 WARNING hanode-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
2021-09-06 10:23:39 DEBUG juju-log hanode:20: Configuring cluster-recheck-interval to 60 seconds
2021-09-06 10:23:39 WARNING hanode-relation-changed ERROR: (unpack_config) warning: Blind faith: not fencing unseen nodes
2021-09-06 10:23:39 DEBUG juju-log hanode:20: Checking monitor host configuration
2021-09-06 10:23:40 INFO juju-log hanode:20: Disabling STONITH
2021-09-06 10:23:41 INFO juju-log hanode:20: Setting cluster symmetry
2021-09-06 10:23:41 WARNING juju-log hanode:20: Inconsistent or absent enable-resources setting []
2021-09-06 10:23:41 WARNING juju-log hanode:20: Unable to calculated desired symmetric-cluster setting
2021-09-06 10:23:41 DEBUG juju-log hanode:20: Deleting Resources
2021-09-06 10:23:42 DEBUG juju-log hanode:20: Configuring Resources: {'res_gnocchi_b16e788_vip': 'ocf:heartbeat:IPaddr2', 'res_gnocchi_c41e929_vip': 'ocf:heartbeat:IPaddr2', 'res_gnocchi_haproxy': 'lsb:haproxy'}
2021-09-06 10:23:43 INFO juju-log hanode:20: Updating resource res_gnocchi_b16e788_vip
2021-09-06 10:23:43 DEBUG juju-log hanode:20: File content:
b'primitive res_gnocchi_b16e788_vip ocf:heartbeat:IPaddr2 \\\n\t params ip="10.64.5.11" meta migration-threshold="INFINITY" failure-timeout="5s" op monitor depth="0" timeout="20s" interval="10s"'
2021-09-06 10:23:43 INFO juju-log hanode:20: Update command: crm configure load update /tmp/tmpbfect71t
2021-09-06 10:23:44 WARNING hanode-relation-changed ERROR: 1: syntax in primitive: Attribute order error: timeout must appear before any instance attribute parsing 'primitive res_gnocchi_b16e788_vip ocf:heartbeat:IPaddr2 params ip=10.64.5.11 meta migration-threshold=INFINITY failure-timeout=5s op monitor depth=0 timeout=20s interval=10s'
2021-09-06 10:23:44 WARNING juju-log hanode:20: crm command exit code: 1
2021-09-06 10:23:44 WARNING hanode-relation-changed Traceback (most recent call last):
2021-09-06 10:23:44 WARNING hanode-relation-changed File "/var/lib/juju/agents/unit-hacluster-gnocchi-4/charm/hooks/hanode-relation-changed", line 754, in <module>
2021-09-06 10:23:44 WARNING hanode-relation-changed hooks.execute(sys.argv)
2021-09-06 10:23:44 WARNING hanode-relation-changed File "/var/lib/juju/agents/unit-hacluster-gnocchi-4/charm/charmhelpers/core/hookenv.py", line 956, in execute
2021-09-06 10:23:44 WARNING hanode-relation-changed self._hooks[hook_name]()
2021-09-06 10:23:44 WARNING hanode-relation-changed File "/var/lib/juju/agents/unit-hacluster-gnocchi-4/charm/hooks/hanode-relation-changed", line 297, in hanode_relation_changed
2021-09-06 10:23:44 WARNING hanode-relation-changed ha_relation_changed()
2021-09-06 10:23:44 WARNING hanode-relation-changed File "/var/lib/juju/agents/unit-hacluster-gnocchi-4/charm/hooks/hanode-relation-changed", line 487, in ha_relation_changed
2021-09-06 10:23:44 WARNING hanode-relation-changed raise Exception(msg)
2021-09-06 10:23:44 WARNING hanode-relation-changed Exception: Cannot update pcmkr resource: res_gnocchi_b16e788_vip
2021-09-06 10:23:44 ERROR juju.worker.uniter.operation runhook.go:139 hook "hanode-relation-changed" (via explicit, bespoke hook script) failed: exit status 1

$ juju run -u hacluster-gnocchi/4 -- relation-get -r 67 json_resource_params gnocchi/4
{"res_gnocchi_b16e788_vip": " params ip=\"10.64.5.11\" meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor depth=\"0\" timeout=\"20s\" interval=\"10s\"", "res_gnocchi_c41e929_vip": " params ip=\"10.64.0.148\" meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor depth=\"0\" timeout=\"20s\" interval=\"10s\"", "res_gnocchi_haproxy": " meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor interval=\"5s\""}
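The stale ordering can be spotted without reading the whole value by eye. The following is a minimal sketch, not charm code: it assumes the relation value is a JSON object mapping resource names to crm parameter strings, as in the output above, and that attributes are written as key="value" pairs separated by single spaces. The function name find_misordered and the abbreviated sample data are illustrative only.

```python
import json
import re

# Matches an "op monitor" clause in which depth is listed before timeout.
# Assumption: attributes are key="value" pairs separated by whitespace.
DEPTH_FIRST = re.compile(r'op monitor\s+depth="[^"]*"\s+timeout=')


def find_misordered(json_resource_params):
    """Return the names of resources whose monitor op lists depth before timeout."""
    params = json.loads(json_resource_params)
    return sorted(name for name, spec in params.items() if DEPTH_FIRST.search(spec))


# Abbreviated sample modelled on the relation data in this report.
sample = json.dumps({
    "res_gnocchi_b16e788_vip": ' params ip="10.64.5.11" op monitor depth="0" timeout="20s" interval="10s"',
    "res_gnocchi_haproxy": ' op monitor interval="5s"',
})
print(find_misordered(sample))  # ['res_gnocchi_b16e788_vip']
```

The haproxy resource is not reported because its monitor op carries no depth attribute at all.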

A workaround is to set the data on the relation as follows, with depth moved after timeout and interval:
{"res_gnocchi_b16e788_vip": " params ip=\"10.64.5.11\" meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor timeout=\"20s\" interval=\"10s\" depth=\"0\"", "res_gnocchi_c41e929_vip": " params ip=\"10.64.0.148\" meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor timeout=\"20s\" interval=\"10s\" depth=\"0\"", "res_gnocchi_haproxy": " meta migration-threshold=\"INFINITY\" failure-timeout=\"5s\" op monitor interval=\"5s\""}
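The reordered value can also be produced programmatically rather than typed by hand. This is a sketch under the assumption that the stale layout is always depth, timeout, interval, as seen in this report; the function name reorder_monitor_attrs is hypothetical and is not part of either charm. Applying the result to the relation would still require relation-set on the principal unit.

```python
import json
import re

# Captures depth and the timeout/interval pair that follows it.
# Assumption: the stale layout is always depth, timeout, interval.
STALE = re.compile(r'(depth="[^"]*") (timeout="[^"]*" interval="[^"]*")')


def reorder_monitor_attrs(json_resource_params):
    """Return the JSON value with depth moved after timeout and interval."""
    params = json.loads(json_resource_params)
    fixed = {name: STALE.sub(r"\2 \1", spec) for name, spec in params.items()}
    return json.dumps(fixed)


# Sample modelled on one resource from this report.
stale = json.dumps({
    "res_gnocchi_b16e788_vip": ' params ip="10.64.5.11" meta migration-threshold="INFINITY" '
                               'failure-timeout="5s" op monitor depth="0" timeout="20s" interval="10s"',
})
print(reorder_monitor_attrs(stale))
```

Because the pattern only matches when depth directly precedes timeout, running the function on already corrected data leaves it unchanged, and the failure-timeout meta attribute is not touched.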

Aurelien Lourot (aurelien-lourot) wrote :

Gabor, do you remember which revisions of these two charms were in place before and after the upgrade? Thanks!

Gábor Mészáros (gabor.meszaros) wrote :

I haven't upgraded the charm; it's a pure bionic -> focal series upgrade.
Charm revisions are gnocchi 46 and hacluster-gnocchi 76.

tags: added: series-upgrade
removed: charm-upgrade
Gábor Mészáros (gabor.meszaros) wrote :

Facing the same issue again.

summary: - After charm-upgrade of gnocchi and its subordinate hacluster, hacluster
- hanode-relation-changed hook errors because the crm parameters are out
- of order
+ After series upgrade (bionic->focal) of gnocchi and its subordinate
+ hacluster, hacluster hanode-relation-changed hook errors because the crm
+ parameters are out of order
Gábor Mészáros (gabor.meszaros) wrote :

This time I spent a bit of time automating the workaround.

haunit=hacluster-gnocchi/2

# Extract the principal unit on which the hacluster subordinate is deployed
unit=$(juju run -u $haunit 'relation-list -r $(relation-ids ha | cut -d: -f2)')
# Dump the current relation data in case something goes wrong
juju run -u $haunit 'relation-get -r $(relation-ids ha) json_resource_params '$unit
# Rewrite the json_resource_params field, moving timeout to the front
juju run -u $haunit 'rel_id=$(relation-ids ha); relation-get -r $rel_id json_resource_params '$unit' | sed "s/depth=..0.. timeout=..20s.. interval=..10s../timeout=\\\"20s\\\" interval=\\\"10s\\\" depth=\\\"0\\\"/g" > /tmp/json_resource_params_fix'
juju run -u $unit 'rel_id=$(relation-ids ha); relation-set -r $rel_id json_resource_params="$(cat /tmp/json_resource_params_fix)"'
# Re-dump relation data to confirm it got rewritten as desired
juju run -u $haunit 'relation-get -r $(relation-ids ha) json_resource_params '$unit
# Resolve and retry if unit is in error state
juju resolved $haunit

The commands above do not work as written: I couldn't figure out how to update the backslashes in place.
I ended up dumping the relation data to /tmp/json_resource_params_fix and editing it manually,
then loaded it back with the relation-set command above.
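The backslashes matter because the relation value is itself a JSON document, so every double quote inside a parameter string has to be stored as \". A small sketch of that point, using a hypothetical resource name: letting a JSON serializer produce the text avoids counting backslashes through several layers of shell quoting.

```python
import json

# Hypothetical resource entry in the corrected order.
spec = ' op monitor timeout="20s" interval="10s" depth="0"'

# json.dumps escapes the embedded double quotes for us.
encoded = json.dumps({"res_example_vip": spec})
print(encoded)
# {"res_example_vip": " op monitor timeout=\"20s\" interval=\"10s\" depth=\"0\""}

# Decoding gives back the original, unescaped parameter string.
decoded = json.loads(encoded)["res_example_vip"]
```

Text generated this way could be written to a file such as /tmp/json_resource_params_fix and loaded with relation-set, as described above, instead of being edited by hand.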

tags: added: aubergine
Bayani Carbone (bcarbone) wrote :

This worked for me (the previous fix was just missing an extra pair of backslashes each time):

haunits=$(juju status gnocchi | grep "hacluster-gnocchi/" | awk '{print $1}' | cut -d* -f1)
for haunit in $haunits
do
    echo $haunit
    # Extract the principal unit on which the hacluster subordinate is deployed
    unit=$(juju run -u $haunit 'relation-list -r $(relation-ids ha | cut -d: -f2)')
    # Dump the current relation data in case something goes wrong
    juju run -u $haunit 'relation-get -r $(relation-ids ha) json_resource_params '$unit
    # Rewrite the json_resource_params field, moving timeout to the front
    juju run -u $haunit 'rel_id=$(relation-ids ha); relation-get -r $rel_id json_resource_params '$unit' | sed "s/depth=..0.. timeout=..20s.. interval=..10s../timeout=\\\\\"20s\\\\\" interval=\\\\\"10s\\\\\" depth=\\\\\"0\\\\\"/g" > /tmp/json_resource_params_fix'
    #juju run -u $unit 'rel_id=$(relation-ids ha); relation-set -r $rel_id json_resource_params="$(cat /tmp/json_resource_params_fix)"'
    # Re-dump relation data to confirm it got rewritten as desired
    juju run -u $haunit 'relation-get -r $(relation-ids ha) json_resource_params '$unit
done

Billy Olsen (billy-olsen) wrote :

I think I see what's going on here. Bug #1843830 was seen during the 19.10 development cycle, when the crmsh version in Eoan included commit [0]. To fix it, the hacluster reactive interface and the charmhelpers code were changed to correct the position of the depth parameter for VIP resources. That resolves the problem for any new VIP resources set between the principal charm and the hacluster subordinate charm, and for charms that update or reconfigure their ha resources on charm upgrade. However, the gnocchi charm does not update the hacluster resources on charm-upgrade, so its VIP resource parameters were never updated.

To reproduce this, one would need to deploy a sufficiently old version of the charm on bionic, one that puts the depth parameter for VIP resources in the wrong position, then upgrade the charm to the latest revision, and then perform the series upgrade from bionic to focal.

[0] - https://github.com/ClusterLabs/crmsh/commit/d6a9a3ebc8e17bf4d31b0660d2f75eea13a85611
