MySQL InnoDB Cluster Charm

Juju hook runtime error causes cluster status to be blocked but cluster is OK

Bug #1998356 reported by Jadon Naas on 2022-11-30

Affects                      Status    Importance  Assigned to  Milestone
MySQL InnoDB Cluster Charm   Triaged   High        Unassigned

Bug Description

Juju marked a mysql-innodb-cluster unit as blocked after a runtime error in a hook, even though Juju was later able to run the unit's hooks successfully and communicate with the unit. The database cluster itself appeared to be healthy, judging by the MySQL error logs of all three units.

This bug was observed in an automated IntegrationsQA test on Jammy Yoga. The test was marked as a failure because the mysql-innodb-cluster/2 unit stayed in the "blocked" state for an extended period of time. The test run eventually timed out and was killed.

The Juju status message for the unit was "Cluster is inaccessible from this instance. Please check logs for details."

The Juju agent log for mysql-innodb-cluster/2 shows the following error message:

2022-11-29 00:49:35 ERROR unit.mysql-innodb-cluster/2.juju-log server.go:316 db-router:206: Cluster is unavailable: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
File "<string>", line 2, in <module>
RuntimeError: Dba.get_cluster: This function is not available through a session to an instance belonging to an unmanaged replication group

According to the test logs, mysql-innodb-cluster/2 entered the blocked state at 00:49:35 UTC. Here is the test log message:

DEBUG:root:mysql-innodb-cluster/2 workload status is blocked since 2022-11-29 00:49:35+00:00

However, the MySQL error log for mysql-innodb-cluster/2 reported that the server was online and part of the cluster's replication group immediately before the issue. Here are the relevant log entries:

2022-11-29T00:48:52.127627Z 27 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_1000@10.246.168.177:3306',replication started in log 'FIRST' at position 4
2022-11-29T00:48:52.285937Z 26 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='10.246.168.177', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2022-11-29T00:48:52.300058Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'
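
To make the ordering of events concrete, here is a minimal sketch that computes the gap between the MySQL "declared online" entry and the moment the test log says the unit became blocked. It only does timestamp arithmetic on the two values quoted above. The function names are hypothetical, and it assumes the trailing "Z" in the MySQL error log means the timestamp is in UTC.

```python
from datetime import datetime, timezone

# Timestamps copied verbatim from the logs quoted in this report.
MYSQL_ONLINE = "2022-11-29T00:48:52.300058Z"  # MY-011490: declared online
JUJU_BLOCKED = "2022-11-29 00:49:35+00:00"    # workload status blocked since


def parse_mysql_ts(ts: str) -> datetime:
    """Parse a MySQL error-log timestamp (assumed UTC because of the 'Z')."""
    naive = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


def parse_test_log_ts(ts: str) -> datetime:
    """Parse the ISO 8601 timestamp printed by the test log."""
    return datetime.fromisoformat(ts)


def seconds_between(online: str, blocked: str) -> float:
    """Seconds from the 'declared online' entry to the blocked status."""
    delta = parse_test_log_ts(blocked) - parse_mysql_ts(online)
    return delta.total_seconds()


if __name__ == "__main__":
    gap = seconds_between(MYSQL_ONLINE, JUJU_BLOCKED)
    print(f"Unit was marked blocked {gap:.1f} s after being declared online")
```

With the values above, the gap comes out at just under 43 seconds, so the unit had already been declared online in the replication group when the hook error put it into the blocked state.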