Joiner state seqno is ahead of group seqno after IST

Bug #1195071 reported by Jervin R
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Galera
New
Undecided
Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.5
New
Undecided
Unassigned
5.6
New
Undecided
Unassigned

Bug Description

From a customer cluster, after a successfull SST, and some network seizures causing the joiner to drop on and off the cluster an IST is requested.

From the joiner:

130623 4:28:09 [Note] WSREP: Receiving IST: 125 writesets, seqnos 474944918-474945043
130623 4:28:10 [Note] WSREP: 1 (DB3): State transfer to 0 (DB1) complete.
130623 4:28:10 [Note] WSREP: Member 1 (DB3) synced with group.
130623 4:28:10 [Note] WSREP: IST received: cfa1bbe6-ad61-11e2-0800-ddbec9296193:474945223

From the donor:

130623 4:28:10 [Note] WSREP: IST request: cfa1bbe6-ad61-11e2-0800-ddbec9296193:474944918-474945043|tcp://192.168.20.120:4568
130623 4:28:10 [Note] WSREP: async IST sender starting to serve tcp://192.168.20.120:4568 sending 474944919-474945223
130623 4:28:10 [Note] WSREP: async IST sender served
130623 4:28:10 [Note] WSREP: 1 (DB3): State transfer to 0 (DB1) complete.

Afterwards from the joiner:

130623 4:28:10 [ERROR] WSREP: Local state seqno (474945223) is greater than group seqno (474945133): states diverged. Aborting to avoid potential data loss. Remove '/mnt/data/var/lib/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
         at galera/src/replicator_str.cpp:state_transfer_required():33

Earlier in the joiner, there were 2 view updates:

130623 4:26:22 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 8,
        members = 2/3 (joined/total),
        act_id = 474945133,
        last_appl. = 474944861,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = cfa1bbe6-ad61-11e2-0800-ddbec9296193
130623 4:26:22 [Note] WSREP: Flow-control interval: [253, 256]
130623 4:26:55 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 10,
        members = 2/3 (joined/total),
        act_id = 474945223,
        last_appl. = 474944861,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = cfa1bbe6-ad61-11e2-0800-ddbec9296193
130623 4:26:55 [Note] WSREP: Flow-control interval: [253, 256]

But seems only the first one persisted leading to the joiner reporting advanced seqno.

Attaching full relevant logs from donor and joiner shortly.

Revision history for this message
Jervin R (revin) wrote :

Joiner log.

tags: added: i32918
Revision history for this message
Jervin R (revin) wrote :

Donor log.

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

What seems to be happening here is:

1) For joiner here were 3 cluster configuration changes: C6 (seqno 474945043), C8 (seqno 474945133) and C10 (seqno 474945223). There also were C7 and C9 between the other nodes, because joiner was so stuck that was kicked out of the cluster.

2) Joiner processed C6 and requested IST. But it managed to do so only after C10 happened (i.e. it took joiner >55 seconds to process C6 message), so donor gave it IST up to 474945223

3) It went on to processing the rest of the slave queue after receiving IST (474945223) and hence protective heuristics mistakenly fired up on processing C8. It is interesting that such behaviour - sending IST up to the latest configuration change was introduced to fix another bug.

In this case just restarting the node (without removing grastate.dat) should have worked just fine.

This shows a bigger problem: current protective heuristics based merely on sequence numbers is not sufficient. See https://bugs.launchpad.net/galera/+bug/843752

summary: - Joiner state seqno out of sync with group seqno after SST
+ Joiner state seqno is ahead of group seqno after IST
tags: added: i38589
Revision history for this message
Przemek (pmalkowski) wrote :

Another incident like that on PXC 5.6.30, during IST attempt, the joiner node had serious overload problems with very high IO latency, hence the communication with the rest of the cluster was delayed, so the cluster configuration was flapping. As a result, it failed later with:

2016-07-07 11:42:39 2745 [ERROR] WSREP: Local state seqno (6421528) is greater than group seqno (6421160): states diverged. Aborting to avoid potential data loss. Remove '/var/lib/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
         at galera/src/replicator_str.cpp:state_transfer_required():39

Full log in attachment.

tags: added: i70853
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.