Galera

Joiner state seqno is ahead of group seqno after IST

Bug #1195071 reported by Jervin R on 2013-06-27

This bug report is a duplicate of: Bug #843752: Galera doesn't detect that a node has diverged, same grastate.dat on nodes having different data. Edit Remove

This bug affects 1 person

	Status	Importance	Assigned to
Galera	New	Undecided	Unassigned
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.5	New	Undecided	Unassigned
5.6	New	Undecided	Unassigned

Bug Description

From a customer cluster, after a successfull SST, and some network seizures causing the joiner to drop on and off the cluster an IST is requested.

From the joiner:

130623 4:28:09 [Note] WSREP: Receiving IST: 125 writesets, seqnos 474944918-474945043
130623 4:28:10 [Note] WSREP: 1 (DB3): State transfer to 0 (DB1) complete.
130623 4:28:10 [Note] WSREP: Member 1 (DB3) synced with group.
130623 4:28:10 [Note] WSREP: IST received: cfa1bbe6-ad61-11e2-0800-ddbec9296193:474945223

From the donor:

130623 4:28:10 [Note] WSREP: IST request: cfa1bbe6-ad61-11e2-0800-ddbec9296193:474944918-474945043|tcp://192.168.20.120:4568
130623 4:28:10 [Note] WSREP: async IST sender starting to serve tcp://192.168.20.120:4568 sending 474944919-474945223
130623 4:28:10 [Note] WSREP: async IST sender served
130623 4:28:10 [Note] WSREP: 1 (DB3): State transfer to 0 (DB1) complete.

Afterwards from the joiner:

130623 4:28:10 [ERROR] WSREP: Local state seqno (474945223) is greater than group seqno (474945133): states diverged. Aborting to avoid potential data loss. Remove '/mnt/data/var/lib/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
at galera/src/replicator_str.cpp:state_transfer_required():33

Earlier in the joiner, there were 2 view updates:

130623 4:26:22 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 8,
        members = 2/3 (joined/total),
        act_id = 474945133,
        last_appl. = 474944861,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = cfa1bbe6-ad61-11e2-0800-ddbec9296193
130623 4:26:22 [Note] WSREP: Flow-control interval: [253, 256]
130623 4:26:55 [Note] WSREP: Quorum results:
        version = 2,
        component = PRIMARY,
        conf_id = 10,
        members = 2/3 (joined/total),
        act_id = 474945223,
        last_appl. = 474944861,
        protocols = 0/4/2 (gcs/repl/appl),
        group UUID = cfa1bbe6-ad61-11e2-0800-ddbec9296193
130623 4:26:55 [Note] WSREP: Flow-control interval: [253, 256]

But seems only the first one persisted leading to the joiner reporting advanced seqno.

Attaching full relevant logs from donor and joiner shortly.

Tags:

Revision history for this message

Jervin R (revin) wrote on 2013-06-27:

db1-joiner.log Edit (33.4 KiB, text/plain)

Joiner log.

tags:

added: i32918

Revision history for this message

Jervin R (revin) wrote on 2013-06-27:

db3-donor.log Edit (33.5 KiB, text/plain)

Donor log.

Revision history for this message

Alex Yurchenko (ayurchen) wrote on 2013-06-27:

What seems to be happening here is:

1) For joiner here were 3 cluster configuration changes: C6 (seqno 474945043), C8 (seqno 474945133) and C10 (seqno 474945223). There also were C7 and C9 between the other nodes, because joiner was so stuck that was kicked out of the cluster.

2) Joiner processed C6 and requested IST. But it managed to do so only after C10 happened (i.e. it took joiner >55 seconds to process C6 message), so donor gave it IST up to 474945223

3) It went on to processing the rest of the slave queue after receiving IST (474945223) and hence protective heuristics mistakenly fired up on processing C8. It is interesting that such behaviour - sending IST up to the latest configuration change was introduced to fix another bug.

In this case just restarting the node (without removing grastate.dat) should have worked just fine.

This shows a bigger problem: current protective heuristics based merely on sequence numbers is not sufficient. See https://bugs.launchpad.net/galera/+bug/843752

summary:

- Joiner state seqno out of sync with group seqno after SST
+ Joiner state seqno is ahead of group seqno after IST

Ovais Tariq (ovais-tariq) on 2014-01-24

tags:

added: i38589

Revision history for this message

Przemek (pmalkowski) wrote on 2016-07-07:

fsql20-logs.txt Edit (35.8 KiB, text/plain)

Another incident like that on PXC 5.6.30, during IST attempt, the joiner node had serious overload problems with very high IO latency, hence the communication with the rest of the cluster was delayed, so the cluster configuration was flapping. As a result, it failed later with:

2016-07-07 11:42:39 2745 [ERROR] WSREP: Local state seqno (6421528) is greater than group seqno (6421160): states diverged. Aborting to avoid potential data loss. Remove '/var/lib/mysql//grastate.dat' file and restart if you wish to continue. (FATAL)
at galera/src/replicator_str.cpp:state_transfer_required():39

Full log in attachment.

tags:

added: i70853