Comment 4 for bug 1260713

Alex Yurchenko (ayurchen) wrote :

OK, so what can we deduce from Markus' last report:

1) There are serious performance issues on slave:
140414 8:40:38 [Warning] WSREP: last inactive check more than PT1.5S ago (PT4.51434S), skipping check
(probably a result of sync_binlog and innodb_flush_log_at_trx_commit=1)

As a result, slave threads may work in a highly concurrent manner, so we may have some serious races there, and transactions that were executed serially on master may be executed in parallel on slave.

2) What can be the cause of the failure:
a) Node inconsistency (child table data present on slave was not present on master)
b) foreign key checks turned off on master
c) a bug in mysqld/innodb/xtradb: not all parent keys were included into a writeset, so galera tried to apply two mutually dependent transactions out of order.
d) a bug in galera library so that it failed to correctly calculate dependencies.

- Fred's recent comments rule out b)
- d) is highly unlikely. The code to compute the dependency is very simple, and had it had such a bug, we'd have heard a much bigger outcry.
- I imagine that a) is also rather unlikely there.

This leaves us with c). There is nothing special about the failed constraint and no reason why the primary key wouldn't be added to a writeset. However, there are many code paths in innodb that touch the parent keys, and each of them needs to be patched to add the key to a writeset. Perhaps some sort of statement invoked an unpatched path (and that is in agreement with Raghu's initial report). So if we could get hold of the statement that deleted the child rows, that would help.
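To make hypothesis c) concrete, here is a toy model of how a missing parent key can let two dependent transactions be applied in parallel. It is only a sketch under assumptions: the `WriteSet` class, the `depends_on` rule (two writesets conflict if they share a key) and the table names and row ids are all made up for illustration and are not Galera's actual data structures or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    """Hypothetical, simplified writeset: a sequence number plus declared keys."""
    seqno: int
    keys: frozenset  # (table, primary key) pairs the transaction declared


def depends_on(ws, earlier):
    # Toy dependency rule (an assumption): ws must wait for `earlier`
    # if the two writesets share at least one key.
    return bool(ws.keys & earlier.keys)


def can_apply_in_parallel(ws, in_flight):
    # ws may start while `in_flight` writesets are still being applied
    # only if it depends on none of them.
    return not any(depends_on(ws, e) for e in in_flight)


# Transaction 1 deletes child row 42, which references parent row 7.
# Correct writeset: the parent key is included.
delete_child_ok = WriteSet(1, frozenset({("child", 42), ("parent", 7)}))
# Buggy writeset (hypothesis c): the parent key was not added.
delete_child_buggy = WriteSet(1, frozenset({("child", 42)}))

# Transaction 2 deletes parent row 7.
delete_parent = WriteSet(2, frozenset({("parent", 7)}))

# With the parent key present, the parent delete has to wait.
print(can_apply_in_parallel(delete_parent, [delete_child_ok]))     # False
# Without it, the parent delete may run before the child delete has
# been applied, and the foreign key check on the slave fails.
print(can_apply_in_parallel(delete_parent, [delete_child_buggy]))  # True
```

In this model the dependency calculation itself is correct in both cases; the out-of-order apply comes purely from incomplete input, which is the difference between c) and d).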

However, a faster fix/workaround would be to disable foreign key checks on slaves (commits are still strictly ordered, so we should never get a situation with dangling parent references).