All nodes on cluster crash on foreign key check

Bug #1692745 reported by Marcelo Altmann
26
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
Status tracked in 5.6
5.6
Fix Released
Critical
Unassigned
5.7
Fix Released
Critical
Unassigned

Bug Description

On a 3 nodes pxc cluster, the whole cluster goes down when foreign key constrain fail.

2017-05-22 20:37:29 983 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 158, Error_code: 1451
2017-05-22 20:37:29 983 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 203
2017-05-22 20:37:29 983 [Warning] WSREP: Failed to apply app buffer: seqno: 203, status: 1
  at galera/src/trx_handle.cpp:apply():351

How to reproduce:
simplified verion of https://jira.mariadb.org/browse/MDEV-12398

-- Prepare -

CREATE DATABASE fk;
CREATE TABLE fk.parent (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, b varchar(10));
CREATE TABLE fk.child (ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY, pID INT, CONSTRAINT fk_parent FOREIGN KEY (pID) REFERENCES fk.parent (ID) ON DELETE NO ACTION ON UPDATE NO ACTION);
-- Node1
cat << EOF > node1.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN="mysqladmin -u root -psekret"
while : ; do

    ID=\`\$MYSQL -BN -e "INSERT INTO fk.parent VALUES (NULL, 'test'); SELECT LAST_INSERT_ID();"\`
    sleep 1
    \$MYSQL -e "DELETE FROM fk.parent WHERE ID=\$ID"
    \$MYSQLADMIN ping > /dev/null
    if [[ \$? -ne 0 ]]
    then
       break
    fi
done
EOF

chmod +x node1.sh

-- add delay
tc qdisc add dev eth0 root handle 1: netem delay 35ms

./node1.sh

-- Node2
cat << EOF > node2.sh
#!/bin/bash
MYSQL="mysql -u root -psekret"
MYSQLADMIN="mysqladmin -u root -psekret"
while : ; do

    ID=\`\$MYSQL -BN -e "INSERT INTO fk.child SELECT NULL, ID FROM fk.parent ORDER BY ID DESC LIMIT 1; SELECT LAST_INSERT_ID();"\`
    sleep 2
    \$MYSQL -e "DELETE FROM fk.child WHERE ID=\$ID"
    \$MYSQLADMIN ping > /dev/null
    if [[ \$? -ne 0 ]]
    then
       break
    fi
done
EOF

chmod +x node2.sh

tc qdisc add dev eth0 root handle 1: netem delay 35ms
./node2.sh

-- Cleanup
DELETE FROM fk.child; DELETE FROM fk.parent;

-- tested
5.7.17-13-57
5.6.35-81.0-56

my.cnf attached

Tags: i206952
Revision history for this message
Marcelo Altmann (marcelo.altmann) wrote :
Revision history for this message
Steve Shipway (sshipway) wrote :

This affected us badly during data load last year. We worked around it by shutting down the other nodes of the cluster until all data was loaded, then running on a single thread for the replication.

Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

I can confirm the crash on PXC 5.7 (latest), will commence testing on PXC 5.6

Changed in percona-xtradb-cluster:
assignee: nobody → Jericho Rivera (jericho-rivera)
Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

Confirmed on PXC 5.6 as well:

Node2:
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 160, Error_code: 1451
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 56
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 911 [Warning] WSREP: Failed to apply app buffer: seqno: 56, status: 1
May 30 02:51:01 localhost mysqld: #011 at galera/src/trx_handle.cpp:apply():351

Node3:
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [ERROR] Slave SQL: Could not execute Delete_rows event on table fk.parent; Cannot delete or update a parent row: a foreign key constraint fails (`fk`.`child`, CONSTRAINT `fk_parent` FOREIGN KEY (`pID`) REFERENCES `parent` (`ID`) ON DELETE NO ACTION ON UPDATE NO ACTION), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 160, Error_code: 1451
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 152, 56
May 30 02:51:01 localhost mysqld: 2017-05-30 02:51:01 936 [Warning] WSREP: Failed to apply app buffer: seqno: 56, status: 1
May 30 02:51:01 localhost mysqld: #011 at galera/src/trx_handle.cpp:apply():351

Revision history for this message
Florian (kartoffelheinz) wrote :

I Can also confirm the issue on PXC 5.6, though we "only" experience 1 out of 3 nodes to crash in a 3-way-master setup (same error though).

We were able to narrow the issue down to 2 consecutive transactions (first deletes from table that is referenced by foreign key which seem to be applied out of order by the 2 slaves. The transaction usually fails only on one node after 4 tries (which crashes the node then, because he thinks he is no longer consistent). The other slave node tries 1 or 2 times and then gets it right. I suspect some kind of out-of-order-committing, maybe there is a race condition in the code that makes sure transaction will be applied in order?

Btw, we double checked repl.commit_order to make sure it has been set to never apply out-of-order on any of the nodes... :)

Would be glad if someone from Percona could have a look if there is a configurable workaround until no patched version is released.

tags: added: i206952
Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Bug was fixed as part of upstream refresh from Codership 5.6.36 and 5.7.18.

PXC recent release PXC-5.6.37 and PXC-5.7.19 carry the said fix.

Release: https://www.percona.com/blog/2017/09/20/percona-xtradb-cluster-5-6-37-26-21-is-now-available/

Release https://www.percona.com/blog/2017/09/22/percona-xtradb-cluster-5-7-19-29-22-now-available/

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

* Fixes for certfication key generation in the presence of foreign key
constraints (MW-369)

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-935

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.