unit-mysql-innodb-cluster-0 /var/log/mysql/error.log:

2020-06-04T19:00:16.144352Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: 172.17.108.12:3306'
2020-06-04T19:02:08.894844Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: 172.17.108.10:3306'
2020-06-04T19:03:13.320409Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 172.17.108.12:3306 has become unreachable.'
2020-06-04T19:03:13.320507Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 172.17.108.10:3306 has become unreachable.'
2020-06-04T19:03:13.320524Z 0 [ERROR] [MY-011495] [Repl] Plugin group_replication reported: 'This server is not able to reach a majority of members in the group. This server will now block all updates. The server will remain blocked until contact with the majority is restored. It is possible to use group_replication_force_members to force a new group membership.'
2020-06-04T19:03:16.519849Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2020-06-04T19:03:16.520033Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2020-06-04T19:03:16.528336Z 160 [Warning] [MY-013373] [Repl] Plugin group_replication reported: 'Started auto-rejoin procedure attempt 1 of 1000'

The symptoms here look similar to what I saw in a CI failure for https://review.opendev.org/#/c/742500.

Automatic rejoin retries are set to 1000 for members that lose connectivity to the cluster; however, members are expelled 5 seconds after the loss of connectivity.
https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-cluster-working-with-cluster.html#configuring-automatic-rejoin-of-instances
https://opendev.org/openstack/charm-mysql-innodb-cluster/src/commit/6d979f9ab3dba1938baa5262c20d5ad423aee273/src/config.yaml#L21-L26
https://opendev.org/openstack/charm-mysql-innodb-cluster/src/commit/6d979f9ab3dba1938baa5262c20d5ad423aee273/src/reactive/mysql_innodb_cluster_handlers.py#L221-L223

Also the period of time between retries seems to be 5 minutes:

"After an unsuccessful auto-rejoin attempt, the member waits 5 minutes before the next try."
https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_autorejoin_tries

But in this case there was only one attempt.

I don't think bumping up the group_replication_member_expel_timeout will fully fix this but at least it will make it equal to the one used in newer versions upstream.

https://dev.mysql.com/doc/refman/8.0/en/group-replication-options.html#sysvar_group_replication_member_expel_timeout
"Up to and including MySQL 8.0.20, the value of group_replication_member_expel_timeout defaults to 0, meaning that there is no waiting period and a suspected member is liable for expulsion immediately after the 5-second detection period ends. From MySQL 8.0.21, the value defaults to 5, meaning that a suspected member is liable for expulsion 5 seconds after the 5-second detection period."

The version we use in Focal is 8.0.20-0ubuntu0.20.04.1 so the default of "5" does not apply to us yet.
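
To make the timing arithmetic above concrete, here is a rough Python sketch. It only encodes the numbers quoted from the MySQL documentation (5-second detection period, default expel timeout of 0 up to 8.0.20 and 5 from 8.0.21, 5-minute wait between auto-rejoin attempts). The function names are made up for illustration and are not part of the charm or of MySQL, and the retry figure ignores however long each rejoin attempt itself takes.

```python
from typing import Tuple

DETECTION_PERIOD_S = 5   # suspicion detection period, as quoted from the MySQL docs
REJOIN_WAIT_S = 5 * 60   # wait after an unsuccessful auto-rejoin attempt, as quoted


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from e.g. '8.0.20-0ubuntu0.20.04.1'."""
    upstream = version.split("-", 1)[0]  # drop the distro packaging suffix
    major, minor, patch = (int(part) for part in upstream.split("."))
    return major, minor, patch


def default_expel_timeout(version: str) -> int:
    """Default group_replication_member_expel_timeout for a given MySQL version."""
    return 5 if parse_version(version) >= (8, 0, 21) else 0


def seconds_until_expellable(expel_timeout: int) -> int:
    """Seconds after loss of contact before a suspected member may be expelled."""
    return DETECTION_PERIOD_S + expel_timeout


def min_rejoin_wait_seconds(tries: int) -> int:
    """Total waiting time between attempts if every attempt fails (hypothetical helper).

    Assumption: only the 5-minute waits are counted, not the attempts themselves.
    """
    return max(tries - 1, 0) * REJOIN_WAIT_S


if __name__ == "__main__":
    focal = "8.0.20-0ubuntu0.20.04.1"
    timeout = default_expel_timeout(focal)
    print(f"{focal}: default expel timeout = {timeout}s, "
          f"expellable after {seconds_until_expellable(timeout)}s")
    print(f"with timeout 5: expellable after {seconds_until_expellable(5)}s")
    print(f"1000 failed rejoin tries wait at least {min_rejoin_wait_seconds(1000)}s")
```

With the Focal package this gives an expulsion window of 5 seconds, versus 10 seconds with the newer upstream default, which matches the gap between the "unreachable" and "expelled" messages in the log above being only a few seconds.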