All nodes on cluster crash on foreign key check
Affects | Status | Importance | Assigned to | Milestone | ||
---|---|---|---|---|---|---|
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | |||||
5.6 |
Fix Released
|
Critical
|
Unassigned | |||
5.7 |
Fix Released
|
Critical
|
Unassigned |
Bug Description
On a 3 nodes pxc cluster, the whole cluster goes down when foreign key constrain fail.
2017-05-22 20:37:29 983 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_
2017-05-22 20:37:29 983 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 203
2017-05-22 20:37:29 983 [Warning] WSREP: Failed to apply app buffer: seqno: 203, status: 1
at galera/
How to reproduce:
simplified verion of https:/
-- Prepare -
CREATE DATABASE fk;
CREATE TABLE fk.parent (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, b varchar(10));
CREATE TABLE fk.child (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, pID INT, CONSTRAINT fk_parent FOREIGN KEY (pID) REFERENCES fk.parent (ID) ON DELETE NO ACTION ON UPDATE NO ACTION);
-- Node1
cat << EOF > node1.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN=
while : ; do
ID=\`\$MYSQL -BN -e "INSERT INTO fk.parent VALUES (NULL, 'test'); SELECT LAST_INSERT_
sleep 1
\$MYSQL -e "DELETE FROM fk.parent WHERE ID=\$ID"
\$MYSQLADMIN ping > /dev/null
if [[ \$? -ne 0 ]]
then
break
fi
done
EOF
chmod +x node1.sh
-- add delay
tc qdisc add dev eth0 root handle 1: netem delay 35ms
./node1.sh
-- Node2
cat << EOF > node2.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN=
while : ; do
ID=\`\$MYSQL -BN -e "INSERT INTO fk.child SELECT NULL, ID FROM fk.parent ORDER BY ID DESC LIMIT 1; SELECT LAST_INSERT_
sleep 2
\$MYSQL -e "DELETE FROM fk.child WHERE ID=\$ID"
\$MYSQLADMIN ping > /dev/null
if [[ \$? -ne 0 ]]
then
break
fi
done
EOF
chmod +x node2.sh
tc qdisc add dev eth0 root handle 1: netem delay 35ms
./node2.sh
-- Cleanup
DELETE FROM fk.child; DELETE FROM fk.parent;
-- tested
5.7.17-13-57
5.6.35-81.0-56
my.cnf attached
tags: | added: i206952 |
This affected us badly during data load last year. We worked around it by shutting down the other nodes of the cluster until all data was loaded, then running on a single thread for the replication.