removing an instance and adding it back does not work

Bug #2006760 reported by Rodrigo Barbieri
This bug affects 1 person
Affects: MySQL InnoDB Cluster Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

On a fresh jammy 3-unit deployment using charm revision 39 from the 8.0/stable channel, removing an instance and then adding it back results in an error:

juju run-action mysql-innodb-cluster/leader --wait remove-instance address=10.5.3.85

The remove action fails due to bug LP#1954306, but it actually partially succeeds in removing the instance:

{"address": "10.5.3.85:3306", "instanceErrors":
      ["NOTE: group_replication is stopped."], "memberState": "OFFLINE", "mode": "R/O",
      "readReplicas": {}, "role": "HA", "status": "(MISSING)", "version": "8.0.32"}

The instance is not removed from the cluster, but it is taken offline and group_replication is stopped.
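
For reference, this state can be re-checked at any point with the charm's cluster-status action, which reports the same JSON structure (a minimal example, assuming the action name from the charm's action list):

juju run-action mysql-innodb-cluster/leader --wait cluster-status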

Trying to add it back now:

juju run-action mysql-innodb-cluster/leader --wait add-instance address=10.5.3.85

The action succeeds on the leader, but the status does not change. Trying to work around this and start group_replication again, the only action that does so is update-unit-acls, but it cannot be run due to the condition at [1]. Hacking the code to remove the condition, or starting group_replication manually (see the example after the output below), results in the following state:

{"address": "10.5.3.85:3306",
      "instanceErrors": ["ERROR: GR Recovery channel receiver stopped with an error:
      Fatal error: Invalid (empty) username when attempting to connect to the master
      server. Connection attempt terminated. (13117) at 2023-02-09 15:42:58.656640"],
      "mode": "R/O", "readReplicas": {}, "recovery": {"receiverError": "Fatal error:
      Invalid (empty) username when attempting to connect to the master server. Connection
      attempt terminated.", "receiverErrorNumber": 13117, "state": "CONNECTION_ERROR"}
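
(For clarity, "starting group_replication manually" above means running the standard statement directly on the removed unit, e.g.:

juju ssh mysql-innodb-cluster/1
sudo mysql -e "START GROUP_REPLICATION;"

where the unit name is only an example and root socket access to MySQL is assumed.)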

At this point, another workaround is to forcibly remove the instance, but that hits bugs LP#2006759 and LP#1983158.
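
For reference, the forcible removal can be done through the MySQL Shell AdminAPI directly, roughly as follows; this is a sketch, and the connection user and addresses are examples rather than the charm's actual credentials:

juju ssh mysql-innodb-cluster/leader
mysqlsh clusteruser@10.5.3.82 -e "dba.getCluster().removeInstance('10.5.3.85:3306', {force: true})"

removeInstance() with the force option is the standard AdminAPI call for dropping an unreachable or broken member.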

[1] https://github.com/openstack/charm-mysql-innodb-cluster/blob/0a3bb225c1a653767f542e1f9023ad27735a5bc5/src/lib/charm/openstack/mysql_innodb_cluster.py#L2034

Tags: sts
Alex Kavanagh (ajkavanagh) wrote:

I'm fairly sure this is due to, or related to, "During scale-out of cluster (zaza-openstack-tests) the leader fails to join in the new instance when related to prometheus" (https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/2015256), where the create_user() call for the prometheus user causes a write to the db whilst the unit is configured, but not yet joined, to the cluster; this causes the join_instance() to fail at that point.

I'm going to mark this as a dup, but if you feel it is not, then please un-dup it and add further comments/evidence. Thanks.

Rodrigo Barbieri (rodrigo-barbieri2010) wrote:

@Alex: Yes and no. I found one of the causes of this but hadn't had time to post back. The thing is, there are a lot of usability issues being addressed in [1], and fixing those usability issues exposes the problem that when a unit is removed, it stays in SUPER_READ_ONLY mode, and in that mode it cannot be added back. SSH'ing to the unit and disabling SUPER_READ_ONLY fixes it (see the example below). A possible solution is to disable SUPER_READ_ONLY before trying to add the instance back, or right after removing it, just to make it a clean removal; but fixing the usability issues and exposing the problem was my top priority at [1]. I am curious to see if your patch for the bug marked as duplicate will change anything. I will test that soon.

[1] https://review.opendev.org/c/openstack/charm-mysql-innodb-cluster/+/875041
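
(For clarity, the manual SUPER_READ_ONLY fix described above looks like this; a sketch, with the unit name as an example and root socket access assumed:

juju ssh mysql-innodb-cluster/1
sudo mysql -e "SET GLOBAL super_read_only = OFF;")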
