unit missing in allowed_units when leadership changes

Bug #1989505 reported by Rodrigo Barbieri
This bug affects 6 people
Affects: MySQL InnoDB Cluster Charm
Status: Triaged
Importance: High
Assigned to: Rodrigo Barbieri

Bug Description

Using the mysql-innodb-cluster and mysql-router charms with charms such as keystone, after mysql-innodb-cluster leadership changes (for example from /0 to /2), any newly added keystone unit fails with the error "Allowed_units list provided but this unit not present" because of a relation-data discrepancy.

Steps to reproduce:

1) Deploy 3 units of keystone, mysql-router and mysql-innodb-cluster. A bundle is attached for convenience.

2) If, once deployment completes, the mysql-innodb-cluster leader is any unit other than /0, move leadership to /0 and clear the relation-data from the previous leader:

a) check data using:

juju show-unit keystone-mysql-router/0

b) clear data using:

juju run -u mysql-innodb-cluster/2 -- relation-set -r db-router:6 MRUP_allowed_units= MRUP_password= db_host= mysqlrouter_allowed_units= mysqlrouter_password= wait_timeout=

3) Now confirm that only mysql-innodb-cluster/0 provides the [MRUP|mysqlrouter]_allowed_units relation-data, and that no other mysql-innodb-cluster unit provides those properties, by running:

juju show-unit keystone-mysql-router/0

4) Move mysql-innodb-cluster leadership to any unit other than /0 (I usually move it to /2). This can be done by simply stopping the jujud daemon on mysql-innodb-cluster/0 (and on /1, to make sure leadership goes straight to /2 if desired). After the leadership change completes, restart the jujud daemon on mysql-innodb-cluster/0.

5) add a new keystone unit:

juju add-unit keystone

6) Wait for it to finish deploying; the problem is then visible in juju status.

7) See that the relation-data is stale for mysql-innodb-cluster/0, as the new leader has published new data, but the new keystone unit can only make use of the stale relation-data:

juju show-unit keystone-mysql-router/0
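
The discrepancy in step 7 can also be checked mechanically once the per-unit relation-data has been copied out of the `juju show-unit` output. The sketch below is only an illustration: the dictionary layout, the sample values and the helper name `units_missing` are assumptions made for this example, not the real `juju show-unit` format or part of any charm.

```python
def units_missing(per_unit_data, new_unit, key="mysqlrouter_allowed_units"):
    """Return the cluster units whose allowed-units list lacks new_unit.

    per_unit_data maps a mysql-innodb-cluster unit name to the relation-data
    it publishes (hypothetical layout, transcribed by hand).
    """
    missing = []
    for unit in sorted(per_unit_data):
        raw = per_unit_data[unit].get(key)
        if raw is None:
            continue  # this unit does not publish the key at all
        # The value is a space-separated string, possibly wrapped in quotes.
        allowed = raw.strip('"').split()
        if new_unit not in allowed:
            missing.append(unit)
    return missing


# Made-up sample: /0 still holds the old list, /2 (new leader) has the new one.
relation_data = {
    "mysql-innodb-cluster/0": {
        "mysqlrouter_allowed_units":
            '"keystone-mysql-router/0 keystone-mysql-router/1 keystone-mysql-router/2"',
    },
    "mysql-innodb-cluster/1": {},
    "mysql-innodb-cluster/2": {
        "mysqlrouter_allowed_units":
            '"keystone-mysql-router/0 keystone-mysql-router/1 '
            'keystone-mysql-router/2 keystone-mysql-router/3"',
    },
}
print(units_missing(relation_data, "keystone-mysql-router/3"))
# ['mysql-innodb-cluster/0']
```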

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :
tags: added: sts
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

According to [1], and given that a unit cannot determine who the leader is unless it is itself the unit calling is_leader, the mysql-innodb-cluster units MUST somehow agree on their relation-data.

[1] https://github.com/juju-solutions/charms.reactive/blob/8496d04e6497200ca8530d84eff4a3191509ed0f/charms/reactive/relations.py#L817
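
To make the consequence concrete, here is a toy model of a merged view of relation-data. It is not the charms.reactive code linked in [1]; it assumes, consistently with the behaviour described in this report, that when several units set the same key the value from the lowest-numbered unit wins. The function name and sample data are made up for illustration.

```python
def merged_view(per_unit_data):
    """Merge per-unit relation-data, letting the lowest-numbered unit win."""
    def unit_number(name):
        return int(name.rsplit("/", 1)[1])

    merged = {}
    # Apply highest-numbered units first so lower-numbered ones overwrite them.
    for unit in sorted(per_unit_data, key=unit_number, reverse=True):
        merged.update(per_unit_data[unit])
    return merged


data = {
    # Stale data left behind by the former leader.
    "mysql-innodb-cluster/0": {"MRUP_allowed_units": "keystone-mysql-router/0"},
    # Fresh data published by the new leader.
    "mysql-innodb-cluster/2": {
        "MRUP_allowed_units": "keystone-mysql-router/0 keystone-mysql-router/3",
    },
}
print(merged_view(data)["MRUP_allowed_units"])
# keystone-mysql-router/0
```

Under that assumption, the reader never sees the new leader's list while the lower-numbered unit still publishes the same key.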

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Actually, we can figure out who the leader is using the juju_leases command, which must be run on a controller. It is unknown whether the command can be run from an agent to obtain the result. Even if that were possible, it would be prone to race conditions.

The best workaround is to move leadership back to the lowest-numbered unit. Fixing the relation-data with relation-set is only temporary, because the data becomes stale again as soon as it is updated by a leader unit that is higher-numbered. Moving leadership back is also temporary: leadership will eventually change again, and the issue is re-triggered when the relation-data is next updated.

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Given that the desired leader is the lowest-numbered unit, it is better to block communication for, or stop the jujud agents of, every unit except the desired leader than to invoke juju_revoke_lease repeatedly until the desired leader happens to be selected.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-mysql-innodb-cluster (master)
Changed in charm-mysql-innodb-cluster:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-mysql-innodb-cluster (master)

Reviewed: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/859901
Committed: https://opendev.org/openstack/charm-mysql-innodb-cluster/commit/662656055d0a55dcadf9649801300c719c47e54c
Submitter: "Zuul (22348)"
Branch: master

commit 662656055d0a55dcadf9649801300c719c47e54c
Author: Rodrigo Barbieri <email address hidden>
Date: Thu Sep 29 18:29:01 2022 -0300

    Invoke create_databases_and_users on upgrade

    Fix for missing allowed_units is implemented through an
    interface code change, but the 'create_databases_and_users'
    method needs to be invoked to update the relation-data
    during upgrade so the fix can be delivered.

    Partial-bug: #1989505
    Change-Id: Ia2a90a0e210524ec485ce0d8493350699f8a6f7e

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-mysql-innodb-cluster (stable/jammy)
Nishant Dash (dash3) wrote :

Hello,

Running into a similar case. Using 8.0.31 (8.0/stable 35) on a focal-ussuri deployment.

When trying to migrate a unit (add a new unit, remove an old unit) for an application with a corresponding mysql-router, we found that the db-router relation data contained the allowed units in an inconsistent state. Specifically, the master innodb unit reported the right units of the application, but the non-master nodes did not have the newly added unit in their allowed lists. This was the case for every app we had to deal with, from cinder to n-c-c, etc.

Workaround:
Scenario
mysql-innodb-cluster/0 -> non master, R/O
mysql-innodb-cluster/2 -> non master, R/W
mysql-innodb-cluster/4 -> master R/O

1) grab the relation id of db-router from `juju show-unit $APP-mysql-router/3`

Taking aodh as an example,
2) grab LIST, i.e, correct entry from `juju show-unit $APP-mysql-router/<X>` under master innodb

3)
LIST="aodh-mysql-router/26 aodh-mysql-router/27 aodh-mysql-router/28"
juju run -u mysql-innodb-cluster/0 -- "relation-set -r db-router:36 mysqlrouter_allowed_units='\"$LIST\"' MRUP_allowed_units='\"$LIST\"'"

- repeat same command as above for mysql-innodb-cluster/2

4) Then you can verify that all innodbs will have the same entries, and can also see it propagate to the unit itself in allowed units like so,
juju show-unit $APP/{A,B,C} | grep allowed
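
When several applications need the same treatment, the command from step 3 can be generated rather than typed by hand. The sketch below only formats the command string shown above and does not run anything. The function name `relation_set_command` is hypothetical, and the `MRUP` prefix is simply copied from the commands in this report.

```python
def relation_set_command(unit, relation_id, allowed_units, prefix="MRUP"):
    """Build the step 3 `juju run` command for one mysql-innodb-cluster unit."""
    units = " ".join(allowed_units)
    # Same quoting as the manual command: the value is wrapped in escaped
    # double quotes inside single quotes.
    value = "'" + '\\"' + units + '\\"' + "'"
    return (
        f'juju run -u {unit} -- "relation-set -r {relation_id} '
        f'mysqlrouter_allowed_units={value} {prefix}_allowed_units={value}"'
    )


allowed = ["aodh-mysql-router/26", "aodh-mysql-router/27", "aodh-mysql-router/28"]
# Print one command per non-master unit from the scenario above.
for unit in ("mysql-innodb-cluster/0", "mysql-innodb-cluster/2"):
    print(relation_set_command(unit, "db-router:36", allowed))
```

Review the printed commands before running them, since the relation id and the unit list differ per application.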

Changed in charm-mysql-innodb-cluster:
assignee: nobody → Rodrigo Barbieri (rodrigo-barbieri2010)
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

@Nishant, that is a valid workaround, until you add/remove another unit and hit the problem again. The best way to overcome the problem and avoid it repeating itself is to move unit leadership back to the lowest numbered unit. In your example it is mysql-innodb-cluster/0. This way the relation-data that will be read will be the one maintained by the leader. If leadership changes again, you just have to move it back to lowest numbered unit.

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-mysql-innodb-cluster (stable/jammy)

Reviewed: https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/866666
Committed: https://opendev.org/openstack/charm-mysql-innodb-cluster/commit/87d3fd3b65560db8c0c752cac0b3b549c86f4379
Submitter: "Zuul (22348)"
Branch: stable/jammy

commit 87d3fd3b65560db8c0c752cac0b3b549c86f4379
Author: Rodrigo Barbieri <email address hidden>
Date: Thu Sep 29 18:29:01 2022 -0300

    Invoke create_databases_and_users on upgrade

    Fix for missing allowed_units is implemented through an
    interface code change, but the 'create_databases_and_users'
    method needs to be invoked to update the relation-data
    during upgrade so the fix can be delivered.

    Partial-bug: #1989505
    Change-Id: Ia2a90a0e210524ec485ce0d8493350699f8a6f7e
    (cherry picked from commit 662656055d0a55dcadf9649801300c719c47e54c)

tags: added: in-stable-jammy
Changed in charm-mysql-innodb-cluster:
status: In Progress → Triaged
importance: Undecided → High