Comment 0 for bug 2044821

John Lettman (jplettman) wrote :

Under certain circumstances, power loss causes MySQL state files to become corrupted on charm units. If present on any unit, this corruption causes the charm to fail the `reboot-cluster-from-complete-outage` action when it tries to find the most up-to-date binary logs, preventing power-loss recovery.

The following is a snapshot of the action result:
https://pastebin.canonical.com/p/t3QvgHMPFc/

As a result, the following "clone method" workaround is required to recover from this critical outage:

1. Obtain the passwords:
   - `juju run --unit mysql-innodb-cluster/leader leader-get cluster-password`
   - `juju run --unit mysql-innodb-cluster/leader leader-get mysql.passwd`
2. Access each downed unit and clone the instance from the working unit:
   - SSH to the downed unit: `juju ssh mysql-innodb-cluster/XX`
   - Obtain a MySQL shell: `mysql -u root -p # use 'mysql.passwd' when prompted`
   - Clone the working unit (please note **errata** below):
     ```sql
     STOP GROUP_REPLICATION \W;
     SET GLOBAL super_read_only = 0;
     CLONE INSTANCE FROM 'clusteruser'@'[IP of the working unit]':3306 IDENTIFIED BY '[use cluster-password]' REQUIRE SSL;
     ```
3. Join each downed unit back into the cluster:
   - Start MySQL Shell: `mysqlsh` (the commands below use Python mode; switch with `\py` if the prompt is in JavaScript or SQL mode)
   - Join the cluster:
     ```python
     shell.connect('clusteruser:[use cluster-password]@[IP of the working unit]')
     cluster = dba.get_cluster()
     cluster.add_instance('clusteruser:[use cluster-password]@[this unit's IP]')
     ```
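Once the units have been re-added, it can help to confirm that no member is still missing or recovering. The following is a minimal sketch, not part of the charm: `members_not_online` is a hypothetical helper, the IP addresses are made-up sample data, and the input is assumed to be shaped like the result of `cluster.status()` (a `defaultReplicaSet` entry whose `topology` maps each member to its `address` and `status`) and to behave like a plain Python dict.

```python
def members_not_online(status):
    """Return {address: state} for every member whose state is not ONLINE.

    Assumption: `status` looks like the result of cluster.status(), i.e.
    {"defaultReplicaSet": {"topology": {label: {"address": ..., "status": ...}}}}
    """
    topology = status["defaultReplicaSet"]["topology"]
    return {
        member.get("address", label): member.get("status", "UNKNOWN")
        for label, member in topology.items()
        if member.get("status") != "ONLINE"
    }


# Hypothetical sample data; inside mysqlsh one would pass cluster.status()
# instead (converted to a plain dict if necessary).
sample = {
    "defaultReplicaSet": {
        "topology": {
            "10.0.0.11:3306": {"address": "10.0.0.11:3306", "status": "ONLINE"},
            "10.0.0.12:3306": {"address": "10.0.0.12:3306", "status": "RECOVERING"},
            "10.0.0.13:3306": {"address": "10.0.0.13:3306", "status": "(MISSING)"},
        }
    }
}
print(members_not_online(sample))
```

An empty result would suggest that every unit has rejoined; a unit still reported as recovering may only need more time.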

**Errata** for the "clone method":
- Where `CLONE INSTANCE` fails with an error stating that the clone plugin is not loaded, load the plugin first:
  ```sql
  INSTALL PLUGIN clone SONAME 'mysql_clone.so';
  ```
- Where an error is raised regarding `clone_valid_donor_list`, the donor (the working unit) may need to be added to that list on the unit being restored:
  ```sql
  SET GLOBAL clone_valid_donor_list = '[IP of the working unit]:3306';
  ```
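Because the statements above are filled in by hand for every downed unit, a small helper can reduce placeholder mistakes. This is only a sketch under stated assumptions: `sql_quote` and `clone_statements` are hypothetical names, the default `sql_mode` is assumed for string escaping, the `\W` client flag is left out, and `clone_valid_donor_list` is assumed to name the donor (working unit) address, as the MySQL documentation describes for the recipient side.

```python
def sql_quote(value):
    """Quote a string as a MySQL string literal (default sql_mode assumed)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def clone_statements(donor_ip, cluster_password, port=3306,
                     load_plugin=False, set_donor_list=False):
    """Build the statements to run on a downed unit, in order."""
    statements = ["STOP GROUP_REPLICATION;", "SET GLOBAL super_read_only = 0;"]
    if load_plugin:
        # Erratum 1: the clone plugin is not loaded yet.
        statements.append("INSTALL PLUGIN clone SONAME 'mysql_clone.so';")
    if set_donor_list:
        # Erratum 2: the donor is missing from clone_valid_donor_list.
        donor = sql_quote(f"{donor_ip}:{port}")
        statements.append(f"SET GLOBAL clone_valid_donor_list = {donor};")
    statements.append(
        f"CLONE INSTANCE FROM 'clusteruser'@{sql_quote(donor_ip)}:{port} "
        f"IDENTIFIED BY {sql_quote(cluster_password)} REQUIRE SSL;"
    )
    return statements


# Hypothetical values; substitute the working unit's IP and cluster-password.
for statement in clone_statements("10.0.0.11", "example-password",
                                  load_plugin=True, set_donor_list=True):
    print(statement)
```

The printed statements can then be pasted into the `mysql` prompt on the downed unit. The two flags default to off so that the errata steps are only emitted when the corresponding error was actually seen.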