Activity log for bug #2044821

| Date | Who | What changed | Old value | New value | Message |
|------|-----|--------------|-----------|-----------|---------|
| 2023-11-27 17:34:11 | John Lettman | bug | | | added bug |
| 2023-11-27 20:41:05 | John Lettman | description | see below | see below | |

Old value of the description (2023-11-27 20:41:05): the same text as the new value below, without the closing request to turn the workaround into a separate action.

New value of the description:

Under certain circumstances, power loss causes MySQL state files to become corrupted on charm units.
If present on any unit, this corruption causes the charm to fail the `reboot-cluster-from-complete-outage` action when it tries to find the most up-to-date binary logs, which prevents recovery from the power loss. A snapshot of the action result is available at https://pastebin.canonical.com/p/t3QvgHMPFc/

As a result, the following "clone method" workaround is required to recover from this critical outage:

1. Obtain the passwords:
   - `juju run --unit mysql-innodb-cluster/leader leader-get cluster-password`
   - `juju run --unit mysql-innodb-cluster/leader leader-get mysql.passwd`
2. Access each downed unit and clone the instance from the working unit:
   - SSH to the downed unit: `juju ssh mysql-innodb-cluster/XX`
   - Open a MySQL client session with `mysql -u root -p`, entering the `mysql.passwd` value when prompted.
   - Clone the working unit. `\W` only tells the mysql client to display warnings; see the **errata** below if the clone fails.
     ```sql
     STOP GROUP_REPLICATION \W;
     SET GLOBAL super_read_only = 0;
     CLONE INSTANCE FROM 'clusteruser'@'[IP of the working unit]':3306 IDENTIFIED BY '[use cluster-password]' REQUIRE SSL;
     ```
3. Join each downed unit back into the cluster:
   - Start MySQL Shell in Python mode: `mysqlsh --py` (or switch with `\py` inside `mysqlsh`)
   - Join the cluster:
     ```python
     # Connect to the working unit, then add this (freshly cloned) unit to the cluster
     shell.connect('clusteruser:[use cluster-password]@[IP of the working unit]')
     cluster = dba.get_cluster()
     cluster.add_instance('clusteruser:[use cluster-password]@[IP of this unit]')
     ```

**Errata** for the "clone method":

- If `CLONE INSTANCE` fails because the clone plugin is not loaded, load it (the plugin must be installed on both the donor and the recipient):
  ```sql
  INSTALL PLUGIN clone SONAME 'mysql_clone.so';
  ```
- If an error mentions `clone_valid_donor_list`, add the working (donor) unit's address to that list on the downed unit. The variable lists the hosts that a recipient is allowed to clone from:
  ```sql
  SET GLOBAL clone_valid_donor_list = '[IP of the working unit]:3306';
  ```

If possible, could this workaround also be made into a separate charm action?
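The request to turn this workaround into a separate action suggests a helper that assembles the per-unit commands. The following is a minimal, hypothetical sketch of such a helper in plain Python. It is not part of the charm and does not run anything. It only builds the text of the three manual steps for one downed unit, and it assumes the `clusteruser` account, port 3306 and the `mysql-innodb-cluster` application name used above.

The function names and the example IPs and passwords are illustrative. Some behaviour is this sketch's own choice:

- It omits the `\W` client command.
- It places the errata statements after `super_read_only` is cleared.
- It sets `clone_valid_donor_list` to the donor's (working unit's) address.
- It percent-encodes the password in the mysqlsh URI strings, on the assumption that reserved characters would otherwise break them.

```python
# Hypothetical helper: assembles (but never runs) the text of the "clone method"
# workaround for one downed unit. Function names and example IPs/passwords are
# illustrative; 'clusteruser', port 3306 and the app name come from the manual steps.
from urllib.parse import quote

APP = "mysql-innodb-cluster"


def leader_get_commands(app: str = APP) -> list:
    """Step 1: argv lists that read both passwords from the leader unit."""
    return [
        ["juju", "run", "--unit", f"{app}/leader", "leader-get", key]
        for key in ("cluster-password", "mysql.passwd")
    ]


def sql_literal(value: str) -> str:
    """Quote a value as a MySQL string literal (escape backslashes, then quotes)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def clone_sql(donor_ip: str, cluster_password: str, port: int = 3306,
              load_plugin: bool = False, set_donor_list: bool = False) -> str:
    """Step 2 plus the optional errata: SQL to paste into `mysql` on the downed unit."""
    lines = [
        "STOP GROUP_REPLICATION;",
        "SET GLOBAL super_read_only = 0;",
    ]
    if load_plugin:
        # Placed after super_read_only is cleared, because INSTALL PLUGIN
        # registers the plugin in the mysql.plugin table.
        lines.append("INSTALL PLUGIN clone SONAME 'mysql_clone.so';")
    if set_donor_list:
        # The recipient must list the donor (working unit) as a valid clone source.
        donor = f"{donor_ip}:{port}"
        lines.append(f"SET GLOBAL clone_valid_donor_list = {sql_literal(donor)};")
    lines.append(
        f"CLONE INSTANCE FROM 'clusteruser'@{sql_literal(donor_ip)}:{port} "
        f"IDENTIFIED BY {sql_literal(cluster_password)} REQUIRE SSL;"
    )
    return "\n".join(lines)


def add_instance_py(donor_ip: str, recipient_ip: str, cluster_password: str) -> str:
    """Step 3: Python to paste into `mysqlsh --py` on the downed unit."""
    # Percent-encode the password so reserved characters survive in the URI strings.
    pw = quote(cluster_password, safe="")
    return "\n".join([
        f"shell.connect('clusteruser:{pw}@{donor_ip}')",
        "cluster = dba.get_cluster()",
        f"cluster.add_instance('clusteruser:{pw}@{recipient_ip}')",
    ])


if __name__ == "__main__":
    # Example values only; substitute the real unit IPs and cluster-password.
    for argv in leader_get_commands():
        print(" ".join(argv))
    print(clone_sql("10.0.0.10", "example-password", set_donor_list=True))
    print(add_instance_py("10.0.0.10", "10.0.0.11", "example-password"))
```

A real charm action would also need to run these commands on the right units and check their results. The sketch deliberately leaves that out so the generated text can be reviewed before it is pasted.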