MySQL InnoDB Cluster Charm

Activity log for bug #2044821

Date	Who	What changed	Old value	New value	Message
2023-11-27 17:34:11	John Lettman	bug			added bug
2023-11-27 20:41:05	John Lettman	description	(old value: identical to the new value, without the closing question about a separate action)	(new value follows)

Under certain circumstances, power loss causes MySQL state files to become corrupted on charm units. If present on any unit, this corruption causes the charm to fail the `reboot-cluster-from-complete-outage` action when it tries to find the most up-to-date binary logs, preventing power-loss recovery.

The following is a snapshot of the action result: https://pastebin.canonical.com/p/t3QvgHMPFc/

As a result, the following "clone method" workaround is required to recover from this critical outage:

1. Obtain the passwords:
   - `juju run --unit mysql-innodb-cluster/leader leader-get cluster-password`
   - `juju run --unit mysql-innodb-cluster/leader leader-get mysql.passwd`
2. Access each downed unit and clone the instance from the working unit:
   - SSH to the downed unit: `juju ssh mysql-innodb-cluster/XX`
   - Obtain a MySQL shell: `mysql -u root -p # use 'mysql.passwd' when prompted`
   - Clone the working unit (please note errata below):
     ```sql
     STOP GROUP_REPLICATION \W;
     SET GLOBAL super_read_only = 0;
     CLONE INSTANCE FROM 'clusteruser'@'[IP of the working unit]':3306 IDENTIFIED BY '[use cluster-password]' REQUIRE SSL;
     ```
3. Join each downed unit back into the cluster:
   - Grab a new MySQL shell: `mysqlsh`
   - Join the cluster:
     ```python
     shell.connect('clusteruser:[use cluster-password]@[IP of the working unit]')
     cluster = dba.get_cluster()
     cluster.add_instance('clusteruser:[use cluster-password]@[IP of this unit]')
     ```

Errata for the "clone method":

- Where `CLONE INSTANCE` fails, stating the plugin is not loaded, it may need to be loaded:
  ```sql
  INSTALL PLUGIN clone SONAME 'mysql_clone.so';
  ```
- Where an error is raised regarding `clone_valid_donor_list`, the IP of the current unit may need to be added:
  ```sql
  SET GLOBAL clone_valid_donor_list = '[IP of this unit]:3306';
  ```

If possible, could this also be made into a separate action?
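
As a rough illustration of how step 2 of the workaround could be scripted, the sketch below assembles the three SQL statements for one downed unit. It is not part of the charm: the function name `build_clone_sql` and its parameters are hypothetical, the default user and port are taken from the description above, and the interactive-client `\W` command is left out on the assumption that the statements would be sent through a non-interactive client.

```python
from typing import List


def build_clone_sql(donor_ip: str, cluster_password: str,
                    user: str = "clusteruser", port: int = 3306) -> List[str]:
    """Return the step 2 SQL statements, in order, for one downed unit.

    Hypothetical helper for illustration only; it builds strings and does
    not connect to MySQL.
    """

    def quote(value: str) -> str:
        # Escape backslashes and single quotes for a MySQL string literal.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"

    clone = (
        "CLONE INSTANCE FROM {user}@{host}:{port} "
        "IDENTIFIED BY {password} REQUIRE SSL;"
    ).format(
        user=quote(user),
        host=quote(donor_ip),
        port=port,
        password=quote(cluster_password),
    )

    return [
        "STOP GROUP_REPLICATION;",          # leave the group before cloning
        "SET GLOBAL super_read_only = 0;",  # allow the clone to overwrite data
        clone,                              # copy data from the working unit
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the working unit's IP and the
    # cluster-password obtained in step 1.
    for statement in build_clone_sql("192.0.2.10", "example-password"):
        print(statement)
```

Generating the statements from one place keeps the donor address and password consistent across units; the errata cases (loading the clone plugin, setting `clone_valid_donor_list`) would still need to be handled by hand.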