Cluster stuck with status: Not all instances configured for clustering

Bug #2015774 reported by Bas de Bruijne
This bug affects 6 people
Affects: MySQL InnoDB Cluster Charm
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

In test run https://solutions.qa.canonical.com/v2/testruns/51ba50bc-a0d0-43b0-8a08-e6d9006544ac (yoga/focal) the MySQL cluster is stuck with the following status:

================
mysql-innodb-cluster/0 waiting idle 0/lxd/6 10.246.65.16 Instance not yet configured for clustering
  filebeat/68 active idle 10.246.65.16 Filebeat ready.
  landscape-client/68 maintenance idle 10.246.65.16 Need computer-title and juju-info to proceed
  logrotated/63 active idle 10.246.65.16 Unit is ready.
  nrpe/78 active idle 10.246.65.16 icmp,5666/tcp Ready
  prometheus-grok-exporter/69 active idle 10.246.65.16 9144/tcp Unit is ready
  telegraf/68 active idle 10.246.65.16 9103/tcp Monitoring mysql-innodb-cluster/0 (source version/commit 23.01-4-...)
mysql-innodb-cluster/1* waiting idle 2/lxd/7 10.246.64.226 Not all instances configured for clustering
  filebeat/20 active idle 10.246.64.226 Filebeat ready.
  landscape-client/20 maintenance idle 10.246.64.226 Need computer-title and juju-info to proceed
  logrotated/15 active idle 10.246.64.226 Unit is ready.
  nrpe/26 active idle 10.246.64.226 icmp,5666/tcp Ready
  prometheus-grok-exporter/21 active idle 10.246.64.226 9144/tcp Unit is ready
  telegraf/20 active idle 10.246.64.226 9103/tcp Monitoring mysql-innodb-cluster/1 (source version/commit 23.01-4-...)
mysql-innodb-cluster/2 waiting idle 4/lxd/8 10.246.65.22 Instance not yet configured for clustering
  filebeat/72 active idle 10.246.65.22 Filebeat ready.
  landscape-client/72 maintenance idle 10.246.65.22 Need computer-title and juju-info to proceed
  logrotated/66 active idle 10.246.65.22 Unit is ready.
  nrpe/80 active idle 10.246.65.22 icmp,5666/tcp Ready
  prometheus-grok-exporter/72 active idle 10.246.65.22 9144/tcp Unit is ready
  telegraf/72 active idle 10.246.65.22 9103/tcp Monitoring mysql-innodb-cluster/2 (source version/commit 23.01-4-...)
================

In the logs of the leader unit we can see that it fails to connect to the other units:
================
2023-04-08 13:41:55 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed configuring instance 192.168.33.214: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
mysqlsh.DBError: MySQL Error (2003): Dba.configure_instance: Can't connect to MySQL server on '192.168.33.214' (113)

2023-04-08 13:41:55 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Checking cluster status.
2023-04-08 13:41:56 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Adding instance, 192.168.33.214, to the cluster.
2023-04-08 13:41:58 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed adding instance 192.168.33.214 to cluster: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
ERROR: Unable to connect to the target instance '192.168.33.214:3306'. Please verify the connection settings, make sure the instance is available and try again.
Traceback (most recent call last):
  File "<string>", line 3, in <module>
mysqlsh.DBError: MySQL Error (2003): Cluster.add_instance: Could not open connection to '192.168.33.214:3306': Can't connect to MySQL server on '192.168.33.214' (113)

2023-04-08 13:41:58 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Configuring instance for clustering: 192.168.33.219.
2023-04-08 13:42:01 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed configuring instance 192.168.33.219: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
mysqlsh.DBError: MySQL Error (2003): Dba.configure_instance: Can't connect to MySQL server on '192.168.33.219' (113)

2023-04-08 13:42:01 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Checking cluster status.
2023-04-08 13:42:02 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Adding instance, 192.168.33.219, to the cluster.
2023-04-08 13:42:04 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed adding instance 192.168.33.219 to cluster: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
ERROR: Unable to connect to the target instance '192.168.33.219:3306'. Please verify the connection settings, make sure the instance is available and try again.
Traceback (most recent call last):
  File "<string>", line 3, in <module>
mysqlsh.DBError: MySQL Error (2003): Cluster.add_instance: Could not open connection to '192.168.33.219:3306': Can't connect to MySQL server on '192.168.33.219' (113)
================

The non-leader units show no errors, and their logs indicate that peers were found:
================
2023-04-08 13:30:51 INFO unit.mysql-innodb-cluster/0.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:138:check_quorum
2023-04-08 13:30:51 DEBUG unit.mysql-innodb-cluster/0.juju-log server.go:316 Found peers: 192.168.33.168,192.168.33.219
2023-04-08 13:30:52 DEBUG unit.mysql-innodb-cluster/0.juju-log server.go:316 Expect 2 peers
================

I'm not sure what the cause of the connection issue is.
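For what it's worth, the "(113)" appended to the MySQL client error 2003 above is the OS-level errno; on Linux, errno 113 is EHOSTUNREACH ("No route to host"), which points at a routing or firewall problem between the units rather than a MySQL-side failure. A minimal Python sketch to decode it:

```python
import errno
import os

# MySQL client error 2003 ("Can't connect to MySQL server") appends the
# OS errno in parentheses; on Linux, 113 maps to EHOSTUNREACH.
name = errno.errorcode[113]   # symbolic name for errno 113
message = os.strerror(113)    # human-readable description

print(name, "-", message)     # EHOSTUNREACH - No route to host
```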

Crashdumps and configs can be found here:
https://oil-jenkins.canonical.com/artifacts/51ba50bc-a0d0-43b0-8a08-e6d9006544ac/index.html

tags: added: cdo-qa foundations-engine
Alex Kavanagh (ajkavanagh) wrote:

So the 'reason' it didn't cluster is that the leader was unable to contact the followers:

2023-04-08 13:41:49 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:138:check_quorum
2023-04-08 13:41:49 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Found peers: 192.168.33.214,192.168.33.219
2023-04-08 13:41:49 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Expect 2 peers
2023-04-08 13:41:49 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Reached quorum
2023-04-08 13:41:49 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Invoking reactive handler: reactive/mysql_innodb_cluster_handlers.py:172:configure_instances_for_clustering
2023-04-08 13:41:49 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Configuring instances for clustering.
2023-04-08 13:41:49 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Configuring instance for clustering: 192.168.33.214.
2023-04-08 13:41:55 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed configuring instance 192.168.33.214: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
mysqlsh.DBError: MySQL Error (2003): Dba.configure_instance: Can't connect to MySQL server on '192.168.33.214' (113)

2023-04-08 13:41:55 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Checking cluster status.
2023-04-08 13:41:56 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Adding instance, 192.168.33.214, to the cluster.
2023-04-08 13:41:58 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed adding instance 192.168.33.214 to cluster: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
ERROR: Unable to connect to the target instance '192.168.33.214:3306'. Please verify the connection settings, make sure the instance is available and try again.
Traceback (most recent call last):
  File "<string>", line 3, in <module>
mysqlsh.DBError: MySQL Error (2003): Cluster.add_instance: Could not open connection to '192.168.33.214:3306': Can't connect to MySQL server on '192.168.33.214' (113)

2023-04-08 13:41:58 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Configuring instance for clustering: 192.168.33.219.
2023-04-08 13:42:01 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed configuring instance 192.168.33.219: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
mysqlsh.DBError: MySQL Error (2003): Dba.configure_instance: Can't connect to MySQL server on '192.168.33.219' (113)

2023-04-08 13:42:01 DEBUG unit.mysql-innodb-cluster/1.juju-log server.go:316 Checking cluster status.
2023-04-08 13:42:02 INFO unit.mysql-innodb-cluster/1.juju-log server.go:316 Adding instance, 192.168.33.219, to the cluster.
2023-04-08 13:42:04 ERROR unit.mysql-innodb-cluster/1.juju-log server.go:316 Failed adding instance 192.168.33.219 to cluster: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
ERROR: Unable to connect to the target instance '1...


Changed in charm-mysql-innodb-cluster:
status: New → Incomplete
Changed in charm-mysql-innodb-cluster:
status: Incomplete → Triaged
importance: Undecided → High
Russell Myers (russellmyers) wrote:

I have a similar issue when adding a fourth or fifth node to a running cluster. I was doing this in an effort to replace a host in a test environment. The node ends up in the state "Instance not yet configured for clustering", and when I check the juju logs on the leader I see the same messages as above:

2023-09-12 15:31:34 INFO unit.mysql-innodb-cluster/2.juju-log server.go:325 cluster:1: Configuring instance for clustering: 192.168.6.227.

ERROR: Unable to connect to the target instance '192.168.6.227:3306'. Please verify the connection settings, make sure the instance is available and try again.
mysqlsh.DBError: MySQL Error (1130): Cluster.add_instance: Could not open connection to '192.168.6.227:3306': Host '192.168.6.5' is not allowed to connect to this MySQL server
2023-09-12 15:31:36 WARNING unit.mysql-innodb-cluster/2.juju-log server.go:325 cluster:1: Instance: 192.168.6.6, already clustered.
2023-09-12 15:31:36 WARNING unit.mysql-innodb-cluster/2.juju-log server.go:325 cluster:1: Instance: 192.168.6.7, already clustered.
2023-09-12 15:31:36 WARNING unit.mysql-innodb-cluster/2.juju-log server.go:325 cluster:1: Instance: 192.168.6.211, already clustered.
2023-09-12 15:31:37 INFO unit.mysql-innodb-cluster/2.juju-log server.go:325 cluster:1: Adding instance, 192.168.6.227, to the cluster.

On the new node:

mysql> select user, host from mysql.user;
+------------------+---------------+
| user             | host          |
+------------------+---------------+
| clusteruser      | 192.168.6.227 |
| clusteruser      | 192.168.6.6   |
| clusteruser      | 192.168.6.7   |
| clusteruser      | localhost     |
| debian-sys-maint | localhost     |
| mysql.infoschema | localhost     |
| mysql.session    | localhost     |
| mysql.sys        | localhost     |
| root             | localhost     |
+------------------+---------------+

It looks like it doesn't add the clusteruser account for every host. If I add it manually and then run the add-instance action from Juju, all is well.
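The manual fix described above amounts to creating the missing account on the new unit before retrying. A rough sketch (the leader address 192.168.6.5 comes from the status output below; the password placeholder and the exact privilege list are assumptions — mirror whatever the charm granted for the existing clusteruser entries):

```sql
-- Run on the NEW unit's local mysqld. 192.168.6.5 is the leader's address
-- in this deployment; <cluster-password> must match the existing
-- clusteruser accounts. Privileges here are an assumption.
CREATE USER 'clusteruser'@'192.168.6.5' IDENTIFIED BY '<cluster-password>';
GRANT ALL PRIVILEGES ON *.* TO 'clusteruser'@'192.168.6.5' WITH GRANT OPTION;
FLUSH PRIVILEGES;
```

Then re-run the charm's add-instance action as described above.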

Russell Myers (russellmyers) wrote:

Model Controller Cloud/Region Version SLA Timestamp
openstack-nb1 nb1-juju-1 maas-nb1/default 3.1.5 unsupported 20:28:00Z

App Version Status Scale Charm Channel Rev Exposed Message
mysql-innodb-cluster 8.0.34 waiting 5 mysql-innodb-cluster 8.0/stable 56 no Instance not yet configured for clustering

Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/0 active idle 0/lxd/0 192.168.6.6 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/1 active idle 1/lxd/0 192.168.6.7 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/2* active idle 2/lxd/0 192.168.6.5 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/3 active idle 1/lxd/16 192.168.6.211 Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/4 waiting idle 0/lxd/19 192.168.6.227 Instance not yet configured for clustering

Russell Myers (russellmyers) wrote:

I just discovered that, in two environments, this only happens when the third original unit is the leader. If either of the first two units is the leader, adding an additional unit runs without issue. In the failing case I have to go into the mysql db on the new unit, add the clusteruser entry for the third host, and then run the add-instance action. The user gets created for the first two hosts, but not the third.
