Improper transition from REPLICATION_FAIL to ONLINE
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
mysql-mmm | New | Undecided | Unassigned |
Bug Description
This is with version 2.2.1.
Suppose trap_period and max_backlog are both set high. Then we can hit the following unpleasant surprise, which could be called a feature only with a stretch and is otherwise a bug:
- slave thread dies
- the host goes into REPLICATION_FAIL
- the slave thread is restarted
- the slave is far behind, but it is still put in ONLINE because the trap that would move it into REPLICATION_DELAY has not fired yet
- it stays in ONLINE long enough to be assigned the READER role
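Why a lagging slave can look healthy can be seen with a small model of a trapped backlog check. This is a hypothetical sketch, not the mysql-mmm implementation: it assumes only that the check is reported as failed once the lag has exceeded max_backlog continuously for trap_period seconds. The function name `make_rep_backlog_check` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical model of a "trapped" replication-backlog check: the check is
# only reported as failed once the lag has exceeded max_backlog continuously
# for at least trap_period seconds.
sub make_rep_backlog_check {
    my (%args) = @_;
    my $max_backlog = $args{max_backlog};
    my $trap_period = $args{trap_period};
    my $failing_since;    # time at which the lag first exceeded max_backlog

    return sub {
        my ($now, $seconds_behind) = @_;
        if ($seconds_behind <= $max_backlog) {
            undef $failing_since;    # healthy again: reset the trap
            return 1;                # check passes
        }
        $failing_since = $now unless defined $failing_since;
        # Still inside the trap period: report success despite the lag.
        return $now - $failing_since < $trap_period ? 1 : 0;
    };
}

# A slave 3600 s behind, with max_backlog = 60 and trap_period = 300.
my $check = make_rep_backlog_check(max_backlog => 60, trap_period => 300);
for my $t (0, 10, 100, 299, 300) {
    printf "t=%3d backlog check %s\n", $t, $check->($t, 3600) ? 'ok' : 'FAILED';
}
```

In this model the check keeps passing for the first 300 seconds even though the slave is an hour behind, which is the window in which the host can be put in ONLINE and handed the READER role.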
Suggestion for a fix:
In this code in lib/Monitor/
if ($state eq 'REPLICATION_DELAY' || $state eq 'REPLICATION_FAIL') {
Do not put the host in ONLINE right away if the previous state was REPLICATION_FAIL and there is any lag at all; alternatively, there is nothing wrong with assuming REPLICATION_DELAY unconditionally. Put it in REPLICATION_DELAY first. Then, if the slave catches up fast, the next iteration will put it in ONLINE.
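A minimal sketch of the suggested rule, assuming the monitor can see whether the replication threads are running and the raw Seconds_Behind_Master value. The function `next_state` and its arguments are hypothetical; the REPLICATION_DELAY to ONLINE step is assumed here to promote once the lag is back within max_backlog, which may differ from what the real monitor does.

```perl
use strict;
use warnings;

# Hypothetical sketch of the proposed transition rule.
#   $state          current host state
#   $threads_ok     true if both replication threads are running
#   $seconds_behind raw Seconds_Behind_Master; undef if unknown
#   $max_backlog    allowed lag in seconds
sub next_state {
    my ($state, $threads_ok, $seconds_behind, $max_backlog) = @_;

    # Only the two replication-problem states are handled here.
    return $state
        unless $state eq 'REPLICATION_DELAY' || $state eq 'REPLICATION_FAIL';

    # Replication threads still broken: stay in (or go to) REPLICATION_FAIL.
    return 'REPLICATION_FAIL' unless $threads_ok;

    my $lag_known = defined $seconds_behind;

    if ($state eq 'REPLICATION_FAIL') {
        # Proposed change: coming out of REPLICATION_FAIL with any (or unknown)
        # lag goes to REPLICATION_DELAY first, whether or not a trap has fired.
        return ($lag_known && $seconds_behind == 0) ? 'ONLINE' : 'REPLICATION_DELAY';
    }

    # REPLICATION_DELAY: promote on a later iteration once the slave catches up.
    return ($lag_known && $seconds_behind <= $max_backlog) ? 'ONLINE' : 'REPLICATION_DELAY';
}

# Scenario from the report: threads restarted, slave 3600 s behind.
my $s = next_state('REPLICATION_FAIL', 1, 3600, 60);
print "$s\n";    # prints REPLICATION_DELAY
```

The variant that assumes REPLICATION_DELAY unconditionally would simply return 'REPLICATION_DELAY' from the REPLICATION_FAIL branch and leave promotion entirely to the next iteration.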