Juju hook runtime error causes cluster status to be blocked but cluster is OK

Bug #1998356 reported by Jadon Naas
Affects: MySQL InnoDB Cluster Charm
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

Juju marked a mysql-innodb-cluster unit as blocked after a runtime error in a hook, even though Juju was later able to run its hooks successfully and communicate with the unit. The database cluster itself appeared to be healthy, judging by the MySQL error logs of all three units.

This bug was observed in an automated IntegrationsQA test on Jammy Yoga. The test was marked as a failure because the mysql-innodb-cluster/2 unit was marked as "blocked" for an extended period of time. The test run timed out and was killed.

The Juju status for the unit was "Cluster is inaccessible from this instance. Please check logs for details.".

Juju's logs for the mysql-innodb-cluster/2 Juju agent show the following error message:

2022-11-29 00:49:35 ERROR unit.mysql-innodb-cluster/2.juju-log server.go:316 db-router:206: Cluster is unavailable: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
Traceback (most recent call last):
  File "<string>", line 2, in <module>
RuntimeError: Dba.get_cluster: This function is not available through a session to an instance belonging to an unmanaged replication group

According to the test logs, mysql-innodb-cluster/2 entered a blocked state at 00:49:35 UTC. Here is the test log message:

DEBUG:root:mysql-innodb-cluster/2 workload status is blocked since 2022-11-29 00:49:35+00:00

However, the MySQL error log for mysql-innodb-cluster/2 reported that the server was online and part of the cluster's replication group shortly before the issue, at 00:48:52 UTC. Here are the relevant logs:

2022-11-29T00:48:52.127627Z 27 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_1000@10.246.168.177:3306',replication started in log 'FIRST' at position 4
2022-11-29T00:48:52.285937Z 26 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='10.246.168.177', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2022-11-29T00:48:52.300058Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'
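
To put a number on how close together the two events were, the "declared online" timestamp from the MySQL error log can be compared with the "blocked since" timestamp from the test log. This is a minimal sketch in Python that uses only the two timestamps quoted in this report; the variable names are illustrative and not part of any charm or test code.

```python
from datetime import datetime, timezone

# Timestamp of the MY-011490 "declared online" message in the MySQL error log.
declared_online = datetime.strptime(
    "2022-11-29T00:48:52.300058Z", "%Y-%m-%dT%H:%M:%S.%fZ"
).replace(tzinfo=timezone.utc)

# Timestamp from the test log line "workload status is blocked since ...".
blocked_since = datetime.fromisoformat("2022-11-29 00:49:35+00:00")

# Elapsed time between the server joining the group and the unit being blocked.
gap_seconds = (blocked_since - declared_online).total_seconds()
print(f"Declared online {gap_seconds:.1f}s before the unit was marked blocked")
```

The unit was therefore marked blocked less than a minute after MySQL reported it as online within the replication group.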

The logs for the other two mysql-innodb-cluster units also showed they could see the mysql-innodb-cluster/2 unit as a cluster member.

OpenStack Juju Crashdump:

https://oil-jenkins.canonical.com/artifacts/bbf9de63-9833-4090-90fc-2f8e6845eeec/generated/generated/openstack/juju-crashdump-openstack-2022-11-29-04.26.31.tar.gz

Full list of artifacts for the test run:
https://oil-jenkins.canonical.com/artifacts/bbf9de63-9833-4090-90fc-2f8e6845eeec/index.html

Changed in charm-mysql-innodb-cluster:
status: New → Triaged
importance: Undecided → High