Comment 1 for bug 2011605

Alex Kavanagh (ajkavanagh) wrote :

mysql8 clustering is sensitive to networking problems and to delayed responses from other nodes. My guess is that everything just took too long and the clustering code (in mysql8, not the charm) gave up. Recovering from that state can be quite tricky, but basically you have to pick a node, force it to be the lead in the cluster, and then force the other two back into the cluster.
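
As a rough sketch of that recovery sequence, the snippet below only builds the juju command lines; it does not run them. It assumes the charm's `reboot-cluster-from-complete-outage`, `rejoin-instance` (with an `address` parameter) and `cluster-status` actions, and the juju 2.9 `juju run-action --wait` syntax (juju 3.x uses `juju run` instead). Please confirm the action names with `juju actions mysql-innodb-cluster` first. The unit name and addresses are hypothetical placeholders, and `recovery_commands` is just an illustrative helper, not part of the charm.

```python
from typing import List


def recovery_commands(seed_unit: str, other_addresses: List[str]) -> List[List[str]]:
    """Build (but do not execute) juju commands for a complete-outage recovery.

    seed_unit: the unit chosen to become the lead of the rebuilt cluster.
    other_addresses: cluster addresses of the nodes to force back in.
    """
    # Step 1: reboot the cluster from the chosen node so it becomes the lead.
    commands = [
        ["juju", "run-action", "--wait", seed_unit,
         "reboot-cluster-from-complete-outage"],
    ]
    # Step 2: rejoin each of the remaining nodes, one at a time.
    for address in other_addresses:
        commands.append(
            ["juju", "run-action", "--wait", seed_unit,
             "rejoin-instance", f"address={address}"]
        )
    # Step 3: ask for the cluster status to check the result.
    commands.append(
        ["juju", "run-action", "--wait", seed_unit, "cluster-status"]
    )
    return commands


if __name__ == "__main__":
    # Hypothetical unit name and addresses, for illustration only.
    for argv in recovery_commands("mysql-innodb-cluster/0",
                                  ["10.0.0.11", "10.0.0.12"]):
        print(" ".join(argv))
```

Printing the commands rather than running them is deliberate: which node to pick as the lead depends on which one has the most recent data, so each step is best reviewed and run by hand.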

The logs are full of "Error on opening connect to ..." repeated over and over, indicating that the other node(s) are simply "not there".

Was this being tested on a resource-constrained system?

This looks very similar to https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/1917332, for example, except that in that case the lead node was still running.

If all the units were upgraded at the same time (series-upgrade) with no settling between them, then this bug is very similar to https://bugs.launchpad.net/charm-mysql-innodb-cluster/+bug/1907202

Otherwise it may be something else - mysql8 is tricky!

Could you please provide some more context on how the units are being series-upgraded? That would be great. Thanks.