Mysql gets into a bad state, Relay log read failure, leaves group

Bug #1996098 reported by Alexander Balderson
This bug affects 1 person
Affects: MySQL InnoDB Cluster Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

During a deployment of Focal Ussuri with Masakari, mysql-innodb-cluster went down and reported that the cluster was inaccessible from this instance. As a result, some services (keystone) were unable to connect to MySQL and requests were dropped.

MySQL appears to be at version 8.0.31, running the 8.0/stable charm.

At the time of the outage, unit 0 appears to start replication and then goes down, reporting:

2022-11-07T00:26:07.695870Z 54 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2022-11-07T00:26:17.699058Z 0 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Shutting down an outgoing connection. This happens because something might be wrong on a bi-directional connection to node 192.168.33.40:33061. Please check the connection status to this member'
2022-11-07T00:26:17.716934Z 2 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address 192.168.33.40:3306.'
2022-11-07T00:26:18.718306Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2022-11-07T00:26:18.718544Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 192.168.33.40:3306, 192.168.33.79:3306, 192.168.33.53:3306 on view 16677726854339640:15.'
2022-11-07T00:26:21.083469Z 56 [ERROR] [MY-010596] [Repl] Error reading relay log event for channel 'group_replication_applier': corrupted data in log event
2022-11-07T00:26:21.083542Z 56 [ERROR] [MY-013121] [Repl] Slave SQL for channel 'group_replication_applier': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a keyring key required to open an encrypted relay log file, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: MY-013121
2022-11-07T00:26:21.083570Z 56 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2022-11-07T00:26:21.083616Z 54 [ERROR] [MY-011452] [Repl] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2022-11-07T00:26:21.083702Z 54 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2022-11-07T00:26:21.084194Z 56 [ERROR] [MY-010586] [Repl] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 0
2022-11-07T00:26:21.084255Z 860 [ERROR] [MY-011622] [Repl] Plugin group_replication reported: 'Unable to evaluate the group replication applier execution status. Group replication recovery will shutdown to avoid data corruption.'
2022-11-07T00:26:21.084300Z 860 [ERROR] [MY-011620] [Repl] Plugin group_replication reported: 'Fatal error during the incremental recovery process of Group Replication. The server will leave the group.'
2022-11-07T00:26:21.084336Z 860 [Warning] [MY-011645] [Repl] Plugin group_replication reported: 'Skipping leave operation: concurrent attempt to leave the group is on-going.'
2022-11-07T00:26:21.084355Z 860 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2022-11-07T00:26:24.181786Z 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'

From here the cluster never comes back up.
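
For triage, it can help to pull the first [ERROR] entry out of a log excerpt like the one above, since the later errors are consequences of it. The following is a minimal sketch, not part of the charm or of MySQL. It assumes the MySQL 8.0 error-log layout seen in the excerpt (timestamp, thread id, [level], [MY-code], [subsystem], message). The names LOG_LINE, LogEvent, parse_line and first_error are hypothetical, and SAMPLE holds three lines copied from the log.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed layout (taken from the excerpt above):
#   <timestamp> <thread id> [<level>] [MY-<code>] [<subsystem>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<thread>\d+)\s+"
    r"\[(?P<level>[^\]]+)\]\s+\[(?P<code>MY-\d+)\]\s+\[(?P<subsystem>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


class LogEvent(NamedTuple):
    ts: str
    thread: int
    level: str
    code: str
    subsystem: str
    message: str


def parse_line(line: str) -> Optional[LogEvent]:
    """Parse one error-log line; return None if it does not match the layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEvent(
        m["ts"], int(m["thread"]), m["level"], m["code"], m["subsystem"], m["message"]
    )


def first_error(lines: Iterable[str]) -> Optional[LogEvent]:
    """Return the first entry whose level is ERROR, or None if there is none."""
    for line in lines:
        event = parse_line(line)
        if event is not None and event.level == "ERROR":
            return event
    return None


# Three lines copied verbatim from the log excerpt in this report.
SAMPLE = [
    "2022-11-07T00:26:18.718306Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'",
    "2022-11-07T00:26:21.083469Z 56 [ERROR] [MY-010596] [Repl] Error reading relay log event for channel 'group_replication_applier': corrupted data in log event",
    "2022-11-07T00:26:21.083570Z 56 [ERROR] [MY-011451] [Repl] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'",
]

if __name__ == "__main__":
    event = first_error(SAMPLE)
    if event is not None:
        print(event.code, event.ts, event.message)
```

On the sample lines this selects MY-010596 (the relay log read error on thread 56), which precedes the applier abort and the member leaving the group.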

The first testrun can be found at:
https://solutions.qa.canonical.com/v2/testruns/3e0f9b9c-ebce-4fdd-9e18-d5de9f2122c8/
with crashdump at:
https://oil-jenkins.canonical.com/artifacts/3e0f9b9c-ebce-4fdd-9e18-d5de9f2122c8/generated/generated/openstack/juju-crashdump-openstack-2022-11-07-00.27.34.tar.gz
