MySQL InnoDB Cluster Charm

The `reboot-cluster-from-complete-outage` action fails after power-loss binary log corruption

Bug #2044821 reported by John Lettman on 2023-11-27

This bug affects 1 person

| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MySQL InnoDB Cluster Charm | New | Undecided | Unassigned | |

Bug Description

Under certain circumstances, a power loss corrupts MySQL state files (such as the binary logs) on charm units. If any unit has this corruption, the `reboot-cluster-from-complete-outage` action fails while trying to determine which unit has the most up-to-date binary logs. This blocks recovery from the power loss.
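To illustrate why one bad unit can block recovery: choosing the "most up-to-date" unit can be modelled as finding the unit whose executed GTID set (for example, the value of `SELECT @@GLOBAL.gtid_executed`) contains every other unit's set. The sketch below is not the charm's or MySQL Shell's code. It is a hypothetical Python outline in which `parse_gtid_set`, `contains` and `most_up_to_date` are illustrative names, it assumes untagged GTIDs (`uuid:1-5:7`), and a unit whose GTID set is missing or unparseable is reported and skipped instead of failing the whole comparison.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive range of transaction numbers


def parse_gtid_set(text: str) -> Dict[str, List[Interval]]:
    """Parse an untagged GTID set such as 'uuid:1-5:7,uuid2:1-3' (hypothetical helper)."""
    result: Dict[str, List[Interval]] = {}
    cleaned = text.replace("\n", "").strip()
    if not cleaned:
        return result  # an empty set is valid: no transactions executed
    for member in cleaned.split(","):
        uuid, *ranges = member.strip().split(":")
        if not uuid or not ranges:
            raise ValueError(f"malformed GTID set member: {member!r}")
        for item in ranges:
            start, _, end = item.partition("-")
            lo, hi = int(start), int(end or start)  # int() raises ValueError on junk
            if lo < 1 or hi < lo:
                raise ValueError(f"bad interval: {item!r}")
            result.setdefault(uuid.lower(), []).append((lo, hi))
    return result


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals so coverage checks are simple."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def contains(a: Dict[str, List[Interval]], b: Dict[str, List[Interval]]) -> bool:
    """True if GTID set `a` includes every transaction in GTID set `b`."""
    for uuid, intervals in b.items():
        covered = _merge(a.get(uuid, []))
        for start, end in intervals:
            if not any(s <= start and end <= e for s, e in covered):
                return False
    return True


def most_up_to_date(executed: Dict[str, Optional[str]]) -> Tuple[Optional[str], List[str]]:
    """Return (best_unit, skipped_units); units with unreadable GTID sets are skipped."""
    parsed: Dict[str, Dict[str, List[Interval]]] = {}
    skipped: List[str] = []
    for unit, text in executed.items():
        if text is None:
            skipped.append(unit)
            continue
        try:
            parsed[unit] = parse_gtid_set(text)
        except ValueError:
            skipped.append(unit)
    for unit, gtids in parsed.items():
        if all(contains(gtids, other) for other in parsed.values()):
            return unit, skipped
    return None, skipped  # divergent histories: no unit contains all the others
```

Skipping unreadable units is only safe if an operator confirms they hold no transactions the others lack; how the charm should make that trade-off is the open question, and this sketch does not settle it.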

The following is a snapshot of the action result:
https://pastebin.canonical.com/p/t3QvgHMPFc/

As a result, the following "clone method" workaround is required to recover from this critical outage:

1. Obtain the passwords:
   - `juju run --unit mysql-innodb-cluster/leader leader-get cluster-password`
   - `juju run --unit mysql-innodb-cluster/leader leader-get mysql.passwd`
2. Access each downed unit and clone the instance from the working unit:
   - SSH to the downed unit: `juju ssh mysql-innodb-cluster/XX`
   - Obtain a MySQL shell: `mysql -u root -p # use 'mysql.passwd' when prompted`
   - Clone the working unit (see the **Errata** below if this fails):
     ```sql
     STOP GROUP_REPLICATION \W;
     SET GLOBAL super_read_only = 0;
     CLONE INSTANCE FROM 'clusteruser'@'[IP of the working unit]':3306 IDENTIFIED BY '[use cluster-password]' REQUIRE SSL;
     ```
3. Join each downed unit back into the cluster:
   - Open MySQL Shell in Python mode (the commands below use the Python API): `mysqlsh --py`
   - Join the cluster:
     ```python
     shell.connect('clusteruser:[use cluster-password]@[IP of the working unit]')
     cluster = dba.get_cluster()
     cluster.add_instance('clusteruser:[use cluster-password]@[IP of this unit]')
     ```
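Steps 2 and 3 require pasting the same password and IPs into several places for every downed unit. A hypothetical helper such as `recovery_plan` below could render those commands for one unit. It doubles single quotes for the SQL string literals and percent-encodes the password for the MySQL Shell URI, assuming MySQL Shell decodes percent-encoded characters in URI-like connection strings. It only builds text to paste; it connects to nothing, and it leaves out the `\W` client command from step 2.

```python
from typing import List, Tuple
from urllib.parse import quote


def sql_literal(value: str) -> str:
    """Quote a value as a MySQL string literal (default sql_mode, backslash escapes on)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def recovery_plan(donor_ip: str, recipient_ip: str, cluster_password: str,
                  user: str = "clusteruser", port: int = 3306) -> Tuple[List[str], List[str]]:
    """Render the step 2 SQL and step 3 mysqlsh (Python mode) lines for one downed unit."""
    sql = [
        "STOP GROUP_REPLICATION;",
        "SET GLOBAL super_read_only = 0;",
        f"CLONE INSTANCE FROM {sql_literal(user)}@{sql_literal(donor_ip)}:{port} "
        f"IDENTIFIED BY {sql_literal(cluster_password)} REQUIRE SSL;",
    ]
    # Percent-encode characters such as '@' and ':' that would break the URI.
    secret = quote(cluster_password, safe="")
    donor_uri = f"{user}:{secret}@{donor_ip}:{port}"
    recipient_uri = f"{user}:{secret}@{recipient_ip}:{port}"
    mysqlsh_py = [
        f"shell.connect({donor_uri!r})",  # repr() yields a valid Python string literal
        "cluster = dba.get_cluster()",
        f"cluster.add_instance({recipient_uri!r})",
    ]
    return sql, mysqlsh_py


if __name__ == "__main__":
    sql, py = recovery_plan("10.0.0.5", "10.0.0.7", "example-password")
    print("\n".join(sql))
    print("\n".join(py))
```

The same rendering logic is roughly what a dedicated action would need, which is one way to approach the request at the end of this report.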

**Errata** for the "clone method":
- If `CLONE INSTANCE` fails because the clone plugin is not loaded, load it first:
  ```sql
  INSTALL PLUGIN clone SONAME 'mysql_clone.so';
  ```
- If an error mentions `clone_valid_donor_list`, add the working (donor) unit's address to it on the downed unit, since this variable lists the donors that an instance may clone from:
  ```sql
  SET GLOBAL clone_valid_donor_list = '[IP of the working unit]:3306';
  ```
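The two errata can be triaged from the error text. The sketch below is hypothetical: `errata_fixes` is an illustrative name, matching on the substrings "not loaded" and `clone_valid_donor_list` is an assumption about how the errors are worded, and it only accepts an IPv4 donor address. Because `clone_valid_donor_list` names permitted donors, it fills in the working unit's address. The `INFORMATION_SCHEMA.PLUGINS` query is a read-only way to check the plugin before changing anything.

```python
import ipaddress
from typing import List

# Read-only check that can be run first on the downed unit.
CHECK_CLONE_PLUGIN = (
    "SELECT PLUGIN_NAME, PLUGIN_STATUS FROM INFORMATION_SCHEMA.PLUGINS "
    "WHERE PLUGIN_NAME = 'clone';"
)


def errata_fixes(error_text: str, donor_ip: str, port: int = 3306) -> List[str]:
    """Map a CLONE INSTANCE error message to errata SQL (substring matching is an assumption)."""
    donor = ipaddress.IPv4Address(donor_ip)  # raises ValueError for anything but IPv4
    lowered = error_text.lower()
    fixes: List[str] = []
    if "plugin" in lowered and "not loaded" in lowered:
        fixes.append("INSTALL PLUGIN clone SONAME 'mysql_clone.so';")
    if "clone_valid_donor_list" in lowered:
        # The list names permitted donors, so it gets the working unit's address.
        fixes.append(f"SET GLOBAL clone_valid_donor_list = '{donor}:{port}';")
    return fixes
```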

If possible, could this workaround also be made into a separate action?


John Lettman (jplettman) on 2023-11-27: description updated
