Comment 8 for bug 1926449

Bas de Bruijne (basdbruijne) wrote :

Thanks for the investigation. I'm afraid the test run Jeffrey pointed out is a bit of a red herring: I intervened manually while it was running, which polluted the crashdumps.

I took a look at https://solutions.qa.canonical.com/testruns/b49db99c-959f-417e-beca-cf4a2521709a, which has the following state:
=====
mysql-innodb-cluster/0 active idle 3/lxd/1 10.246.166.115 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
  logrotated/10 active idle 10.246.166.115 Unit is ready.
mysql-innodb-cluster/1* active idle 4/lxd/1 10.246.167.161 Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
  logrotated/11 active idle 10.246.167.161 Unit is ready.
mysql-innodb-cluster/2 blocked idle 5/lxd/2 10.246.166.105 Cluster is inaccessible from this instance. Please check logs for details.
  logrotated/12 active idle 10.246.166.105 Unit is ready.
=====
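
For anyone triaging several runs, here is a minimal Python sketch that pulls the non-active units out of status text like the above. It assumes the whitespace-separated layout shown here (unit name, workload status, agent status, then the rest of the line); `non_active_units` is a hypothetical helper name, not part of Juju.

```python
def non_active_units(status_text):
    """Return (unit, workload_status, rest_of_line) for units not in 'active'.

    Assumes each unit line looks like:
        <unit-name>[*] <workload> <agent> <remaining fields and message>
    Lines that do not look like unit lines (e.g. '=====') are skipped.
    """
    results = []
    for line in status_text.splitlines():
        parts = line.split(None, 3)
        # A unit line has at least name, workload, agent; names contain '/'.
        if len(parts) < 3 or "/" not in parts[0]:
            continue
        unit = parts[0].rstrip("*")  # '*' marks the leader unit
        workload = parts[1]
        if workload != "active":
            rest = parts[3] if len(parts) > 3 else ""
            results.append((unit, workload, rest))
    return results
```

Run against the status above, this should report only mysql-innodb-cluster/2 as blocked.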

The crashdumps can be downloaded here: https://oil-jenkins.canonical.com/artifacts/b49db99c-959f-417e-beca-cf4a2521709a/generated/generated/kubernetes-maas/juju-crashdump-kubernetes-maas-2023-06-24-19.11.12.tar.gz

Units 0 and 1 are clustered, but unit 2 did not join for some reason. In the crashdump, the Juju relation between units 2 and 1 was clearly enabled.

The following message comes up a lot in the logs:
=====
2023-06-24 15:33:04 ERROR unit.mysql-innodb-cluster/2.juju-log server.go:325 Cluster is unavailable: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 2, in <module>
RuntimeError: Dba.get_cluster: This function is not available through a session to an instance belonging to an unmanaged replication group
=====
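
To see when this error starts and stops relative to the leader-change messages, a rough Python sketch like the following could list the timestamps of every matching log record. It assumes the log format seen in the excerpt above: each record starts with a `YYYY-MM-DD HH:MM:SS` timestamp, and lines without one (such as the traceback) belong to the preceding record. The function names are hypothetical.

```python
import re

# A record starts with a timestamp such as '2023-06-24 15:33:04 '.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")


def records(log_text):
    """Yield multi-line log records, attaching continuation lines
    (e.g. tracebacks) to the timestamped line that precedes them."""
    current = []
    for line in log_text.splitlines():
        if TIMESTAMP.match(line) and current:
            yield "\n".join(current)
            current = []
        current.append(line)
    if current:
        yield "\n".join(current)


def matching_timestamps(log_text, needle):
    """Return the timestamp of every record that contains `needle`."""
    return [
        rec[:19]  # 'YYYY-MM-DD HH:MM:SS' is 19 characters long
        for rec in records(log_text)
        if TIMESTAMP.match(rec) and needle in rec
    ]
```

Calling `matching_timestamps(log_text, "unmanaged replication group")` on the unit log and comparing the result with the timestamps of the leader-change messages would show whether the two overlap.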

I don't see anything else suspicious. There are some messages about a leader change. Is it possible that this issue is caused by an unfortunately timed leader change?