Comment 3 for bug 1938708

Dmitrii Shcherbakov (dmitriis) wrote :

To fix this, we can make the followers (replicas) crash-safe, which is not currently done:

https://dev.mysql.com/doc/refman/8.0/en/replication-solutions-unexpected-replica-halt.html
"In order for replication to be resilient to unexpected halts of the server (sometimes described as crash-safe) it must be possible for the replica to recover its state before halting. This section describes the impact of an unexpected halt of a replica during replication, and how to configure a replica for the best chance of recovery to continue replication."

The relevant option, relay_log_recovery, is not explicitly enabled and defaults to OFF:

"Set relay_log_recovery = ON, which enables automatic relay log recovery immediately following server startup. This global variable defaults to OFF and is read-only at runtime, but you can set it to ON with the --relay-log-recovery option at replica startup following an unexpected halt of a replica. Note that this setting ignores the existing relay log files, in case they are corrupted or inconsistent. The relay log recovery process starts a new relay log file and fetches transactions from the source beginning at the replication SQL thread position recorded in the applier metadata repository. The previous relay log files are removed over time by the replica's normal purge mechanism."
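As a rough sketch of what enabling this could look like, assuming the setting is delivered through the [mysqld] section of the server's option file (which file that is, and how it gets rendered, depends on the deployment and is not covered here):

```ini
[mysqld]
# Assumption: set in the option file because the variable is read-only at
# runtime, so it only takes effect at the next server start.
# On startup, ignore the existing relay log files and re-fetch transactions
# from the source, starting at the recorded applier position.
relay_log_recovery = ON
```

Since the variable cannot be changed on a running server, a replica would need a restart to pick this up.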

sync_relay_log could additionally be set to 1 to make the setup safer (but slower):

"Set sync_relay_log=1, which instructs the replication receiver thread to synchronize the relay log to disk after each received transaction is written to it.

This means the replica's record of the current position read from the source's binary log (in the applier metadata repository) is never ahead of the record of transactions saved in the relay log. Note that although this setting is the safest, it is also the slowest due to the number of disk writes involved."
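A minimal sketch of the stricter variant, again assuming the [mysqld] section of the option file is where replica settings are managed (an assumption about the deployment, not something confirmed in this bug):

```ini
[mysqld]
# Sync the relay log to disk after each received transaction is written to it.
# Safest setting per the manual, at the cost of extra disk writes.
sync_relay_log = 1
```

Whether the extra disk writes are acceptable would need to be measured on the affected deployment before making this a default.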