Comment 7 for bug 966679

Alex Yurchenko (ayurchen) wrote :

Well,

1) "Desynced" and "disconnected from the primary component" are different things. During RSU the node is still part of the primary component and has the full state, except for the schema change it is undergoing. But doing RSU makes an implicit contract that this "does not matter", either because STATEMENT-based replication events are used or because the table is not touched at all. The important part is that the node has all cluster data (in the slave queue), and the "desynced" state is just a helper that tells the application the database may be _temporarily_ "behind schedule".

2) If you want strictly synchronous reads, setting wsrep_causal_reads will give you that. In the RSU case a SELECT will block until _at least_ all events that it can possibly depend on are applied; that's the guarantee. However, if the slave queue is empty, the SELECT may return well before the DDL concludes, but that is then native server behaviour.
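To make the guarantee in 2) concrete, here is a toy sketch in Python. It is not Galera code: the class `ToyNode` and all of its methods are hypothetical, and the wait target is simplified to "the last seqno this node has received", which is an assumption made for illustration. It only shows the idea that a causal read waits for the slave queue to drain up to the point where the read started, while a plain read does not.

```python
from collections import deque


class ToyNode:
    """Hypothetical toy model of a node: a slave queue plus applied state."""

    def __init__(self):
        self.slave_queue = deque()  # events received but not yet applied
        self.received_seqno = 0     # seqno of the last event received
        self.applied_seqno = 0      # seqno of the last event applied
        self.data = {}

    def receive(self, key, value):
        # A replicated event arrives and is queued, not applied yet.
        self.received_seqno += 1
        self.slave_queue.append((self.received_seqno, key, value))

    def apply_one(self):
        # Apply the oldest queued event to the local data.
        seqno, key, value = self.slave_queue.popleft()
        self.data[key] = value
        self.applied_seqno = seqno

    def plain_read(self, key):
        # No waiting: may return stale data while the queue is non-empty.
        return self.data.get(key)

    def causal_read(self, key):
        # Wait until everything received before the read started is applied.
        # Applying inline here stands in for blocking on the applier.
        target = self.received_seqno
        while self.applied_seqno < target:
            self.apply_one()
        return self.data.get(key)


node = ToyNode()
node.receive("x", 1)
node.receive("x", 2)
stale = node.plain_read("x")   # read without waiting for the queue
fresh = node.causal_read("x")  # read after the queue has drained
```

In this model the plain read sees nothing yet, while the causal read sees the latest queued value. Note that neither read waits for anything that is not in the queue, which mirrors the remark about returning before the DDL concludes.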

3) We can prohibit all SELECTs in the desynced state, as we do for a non-primary configuration. Note, however, that this is not as strict as you might be hoping for. It is impossible to do without races (as I explained above), and the race interval can be arbitrarily long. E.g. in the lab we could easily get a situation where the upper layer of the Galera stack (the database) was several configuration changes behind the bottom layer, which is what essentially defines prim/non_prim. I.e. when the node becomes non-primary, the database does not learn about it until it has processed all of the preceding slave queue events, and theoretically that could take days. It is _virtual synchrony_: events happen in the same order, but not at the same time.
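The race in 3) can be illustrated with another toy sketch in Python. Again this is not Galera code: `ToyStack` and its methods are hypothetical names, and the single ordered queue is a simplifying assumption. The point is only that a configuration change is delivered in order behind the preceding events, so the database layer keeps believing it is primary until it has processed them.

```python
from collections import deque


class ToyStack:
    """Hypothetical two-layer model: bottom layer view vs. database view."""

    def __init__(self):
        self.queue = deque()         # ordered events waiting for the database
        self.bottom_primary = True   # what the bottom layer already knows
        self.db_primary = True       # what the database currently believes

    def deliver_writeset(self, name):
        self.queue.append(("writeset", name))

    def deliver_view(self, primary):
        # The bottom layer sees the change at once; the database only
        # sees it when the event reaches the head of the queue.
        self.bottom_primary = primary
        self.queue.append(("view", primary))

    def process_one(self):
        kind, payload = self.queue.popleft()
        if kind == "view":
            self.db_primary = payload
        return kind

    def select_allowed(self):
        # A check in the database can only use the database's own view.
        return self.db_primary


stack = ToyStack()
stack.deliver_writeset("ws1")
stack.deliver_writeset("ws2")
stack.deliver_view(primary=False)

allowed_before = stack.select_allowed()  # database has not caught up yet
while stack.queue:
    stack.process_one()
allowed_after = stack.select_allowed()   # database has processed the change
```

Here `allowed_before` is still true although the bottom layer is already non-primary; only after the two preceding writesets are processed does the check start refusing. The length of that window depends entirely on how long the queue takes to drain.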

Conclusion: want a strong guarantee not to read stale data? Use wsrep_causal_reads. Everything else is weak.