wsrep_slave_threads >1 causes foreign key constraint violations
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
percona-xtradb-cluster-5.6 (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
When running OpenStack against Percona XtraDB Cluster 5.6 (Xenial), we observe foreign key violation crashes on the slave servers when wsrep_slave_threads > 1. This happens semi-regularly, every few days, in at least one production environment.
Unfortunately, it is not straightforward to reproduce in a test environment, and reproducing the race likely requires a fairly performant test setup. It has happened many times in production, particularly when deploying large Heat templates against an OpenStack cloud, but we do not currently have a test case or even precise OpenStack steps to reproduce it. I suggest we try using Rally or otherwise concoct some Heat templates to reproduce the issue.
The servers in question were also running on HDD storage.
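One way to picture why the problem only shows up with wsrep_slave_threads > 1: with several applier threads, a replica may apply write-sets concurrently or out of order whenever it considers them independent. The toy model below is not Galera's implementation. It assumes, as a simplification, that independence is decided purely by overlapping row keys, and the `WriteSet` class, the `dependency_map` helper and the key tuples are all hypothetical. It shows how a child-row change whose write-set lacks a key for the referenced parent row would look independent of a later delete of that parent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    """Hypothetical, simplified write-set: a sequence number plus the
    (table, primary key) tuples it touches."""
    seqno: int
    keys: frozenset


def dependency_map(writesets):
    """Toy scheduler: for each write-set (given in commit order), return the
    highest earlier seqno whose keys overlap with it. 0 means this toy model
    would allow it to be applied in parallel with every earlier write-set."""
    deps = {}
    for i, ws in enumerate(writesets):
        dep = 0
        for earlier in writesets[:i]:
            if ws.keys & earlier.keys:  # shared key: must wait for `earlier`
                dep = max(dep, earlier.seqno)
        deps[ws.seqno] = dep
    return deps


# seqno 1: a child row in `stack` stops referencing parent row 1 in `raw_template`.
child_with_parent_key = WriteSet(1, frozenset({("stack", 10), ("raw_template", 1)}))
child_without_parent_key = WriteSet(1, frozenset({("stack", 10)}))
# seqno 2: the parent row is deleted.
delete_parent = WriteSet(2, frozenset({("raw_template", 1)}))

print(dependency_map([child_with_parent_key, delete_parent]))     # {1: 0, 2: 1}
print(dependency_map([child_without_parent_key, delete_parent]))  # {1: 0, 2: 0}
```

With the parent key present, write-set 2 has to wait for write-set 1. Without it, the toy model treats the two as independent, so with more than one applier thread the delete could run first and fail with a foreign key error like the one in the log.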
I did find the following similar bug, though it has little detail:
https:/
2018-10-08 02:26:11 283550 [ERROR] Slave SQL: Could not execute Delete_rows event on table heat.raw_template; Cannot delete or update a parent row: a foreign key constraint fails (`heat`.`stack`, CONSTRAINT `stack_ibfk_2` FOREIGN KEY (`prev_
2018-10-08 02:26:11 283550 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 90837000
2018-10-08 02:26:11 283550 [Warning] WSREP: Failed to apply app buffer: seqno: 90837000, status: 1
at galera/
Retrying 2th time
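The log above shows a parent-row delete failing because a child row still references it. The sketch below reproduces that class of error with Python's built-in sqlite3 module, assuming (as a simplification) that a replica applies a child update and a parent delete in the reverse of their commit order. The table and column names (`raw_template`, `stack`, `prev_template_id`) and the `apply_in_order` helper are hypothetical stand-ins loosely modeled on the Heat tables in the log, whose real column name is truncated above. SQLite is used only because it ships with Python; it does not model Galera.

```python
import sqlite3


def apply_in_order(events):
    """Apply (description, sql, params) events to a fresh in-memory 'replica'.

    Returns (description, outcome) pairs, where outcome is "ok" or the text
    of the sqlite3.IntegrityError raised for that event.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE raw_template (id INTEGER PRIMARY KEY);
        CREATE TABLE stack (
            id INTEGER PRIMARY KEY,
            prev_template_id INTEGER REFERENCES raw_template(id)
        );
        INSERT INTO raw_template (id) VALUES (1), (2);
        INSERT INTO stack (id, prev_template_id) VALUES (10, 1);
        """
    )
    results = []
    for description, sql, params in events:
        try:
            with conn:  # each event commits on success, rolls back on error
                conn.execute(sql, params)
            results.append((description, "ok"))
        except sqlite3.IntegrityError as exc:
            results.append((description, str(exc)))
    conn.close()
    return results


# Commit order on the writer: repoint the child first, then delete the old parent.
MASTER_ORDER = [
    ("repoint stack 10 to template 2",
     "UPDATE stack SET prev_template_id = ? WHERE id = ?", (2, 10)),
    ("delete template 1", "DELETE FROM raw_template WHERE id = ?", (1,)),
]

print(apply_in_order(MASTER_ORDER))                  # both events "ok"
print(apply_in_order(list(reversed(MASTER_ORDER))))  # delete fails: FOREIGN KEY constraint failed
```

Applied in commit order, both events succeed; applied in reverse, the delete is rejected because `stack` row 10 still points at template 1, which is the same shape of failure as the `stack_ibfk_2` error above.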
Changed in percona-xtradb-cluster-5.6 (Ubuntu):
status: New → Confirmed
Hi, Trent.
Is this bug related?
https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1692745
I remember hitting this earlier on 5.6 and was not able to reproduce it on 5.7; it should have been fixed since then.
It happens especially when there is network latency between PXC units. Running specific Rally tests usually triggered the issue somewhere between 20 minutes and 3 hours into the test.
But, as seen above, that issue has been fixed. I'll try to redo the testing and confirm that.