Percona XtraDB Cluster - HA scalable solution for MySQL

Non-Deterministic Execution Of Transactions from Same Threads

Reported by Jervin R on 2013-08-28
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
MySQL patches by Codership
High
Vladislav Klyachin
Percona XtraDB Cluster
Status tracked in Trunk
5.6
Undecided
Unassigned
Trunk
Undecided
Unassigned

Bug Description

Based on PXC 5.5.31, if I have a query on different transaction that inserts a row, and a query on another transaction within the same thread id that depends on the first row i.e. FK constraint, when multiple slave threads are configured, the second transaction sometimes gets executed first causing FK violation. test case below:

CREATE TABLE `a` (
  `id` bigint(20) DEFAULT NULL,
  UNIQUE KEY `id_pk` (`id`)
) ENGINE=InnoDB;

CREATE TABLE `b` (
  `id` bigint(20) DEFAULT NULL,
  `t` int(11) NOT NULL,
  `aid` bigint(20) DEFAULT NULL,
  UNIQUE KEY `act_email_uk` (`id`,`t`),
  KEY `aid_fk` (`aid`),
  CONSTRAINT `aid_fk` FOREIGN KEY (`aid`) REFERENCES `a` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB;

Test script:

<snip>
#!/bin/bash

MY="mysql percona"

for i in {1000..2000}; do
   t=$(($i-1))
   $MY -e "BEGIN; INSERT INTO a SET id = ${i}; COMMIT; BEGIN; INSERT INTO b SET id=${i}, t=${t}, aid=${i}; COMMIT;"
done
</snip>

If I set wsrep_slave_threads = 1 or wrap both statements on the same transaction, everything is fine. below is the error:

130827 22:19:10 [ERROR] Slave SQL: Could not execute Write_rows event on table percona.b; Cannot add or update a child row: a foreign key constraint fails (`percona`.`b`, CONSTRAINT `aid_fk` FOREIGN KEY (`aid`) REFERENCES `a` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 95, Error_code: 1452
130827 22:19:10 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 44481
130827 22:19:10 [ERROR] WSREP: Failed to apply trx: source: 7550f2c3-0f7d-11e3-b44f-7b192cf0d736 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 1180 trx_id: 497138 seqnos (l: 280, g: 44481, s: 44480, d: 44211, ts: 1377656358652661293)
130827 22:19:10 [ERROR] WSREP: Failed to apply app buffer: seqno: 44481, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130827 22:19:10 [ERROR] WSREP: Node consistency compromized, aborting...

Below is my.cnf used, 3nodes in all

[mysqld]
log-error = /var/log/mysql/error.log

socket = /var/run/mysqld/mysqld.sock
port = 3306
pid-file = /var/lib/mysql/mysqld.pid
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /var/tmp
default_storage_engine = InnoDB
skip-external-locking
max_connections = 300
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 1M
net_buffer_length = 8K
read_buffer_size = 1M
read_rnd_buffer_size = 1M
join_buffer_size = 1M
transaction_isolation = REPEATABLE-READ
character-set-server = utf8
collation-server = utf8_unicode_ci
thread_cache_size = 8

binlog_format = ROW
sync_binlog = 0
binlog_cache_size = 1M
server_id = 100
#log_slave_updates = 1
log-bin = mysql-bin

innodb_file_format = barracuda
innodb_strict_mode = 1
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql
innodb_buffer_pool_size = 512M
innodb_additional_mem_pool_size = 2M
innodb_log_file_size = 512M
innodb_log_buffer_size = 2M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 50
innodb_fast_shutdown = 1
#innodb_flush_method = O_DIRECT
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = 1
innodb_doublewrite = 1

wsrep_node_name = uxdbc01
wsrep_cluster_name = uxdbc
wsrep_debug = 1
wsrep_node_address = 192.168.56.53
wsrep_node_incoming_address = 192.168.56.53
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "ist.recv_addr=192.168.56.53:4568;gmcast.listen_addr=tcp://192.168.56.53:4567;"
wsrep_slave_threads = 3
wsrep_sst_method = xtrabackup
wsrep_sst_receive_address = 192.168.56.53
wsrep_cluster_address = gcomm://

Jay Janssen (jay-janssen) wrote :

I can reproduce this in my EC2 PXC test environment on PXC 5.5.31

Percona-XtraDB-Cluster-shared-5.5.31-23.7.5.438.rhel6.x86_64
Percona-XtraDB-Cluster-client-5.5.31-23.7.5.438.rhel6.x86_64
Percona-XtraDB-Cluster-galera-2.6-1.152.rhel6.x86_64
Percona-XtraDB-Cluster-server-5.5.31-23.7.5.438.rhel6.x86_64

Jay Janssen (jay-janssen) wrote :

130828 12:04:56 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.31' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), wsrep_2
3.7.5.r3880
130828 13:02:30 [ERROR] Slave SQL: Could not execute Write_rows event on table test.b; Cannot add or upda
te a child row: a foreign key constraint fails (`test`.`b`, CONSTRAINT `aid_fk` FOREIGN KEY (`aid`) REFER
ENCES `a` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event'
s master log FIRST, end_log_pos 92, Error_code: 1452
130828 13:02:30 [ERROR] Slave SQL: Could not execute Write_rows event on table test.b; Cannot add or upda
te a child row: a foreign key constraint fails (`test`.`b`, CONSTRAINT `aid_fk` FOREIGN KEY (`aid`) REFER
ENCES `a` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event'
s master log FIRST, end_log_pos 92, Error_code: 1452
130828 13:02:30 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 2397802
130828 13:02:30 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 2397804
130828 13:02:30 [ERROR] WSREP: Failed to apply trx: source: 5a26db1a-0fd9-11e3-93a3-57798c06b48f version:
 2 local: 0 state: APPLYING flags: 1 conn_id: 11273 trx_id: 9566534 seqnos (l: 2405865, g: 2397804, s: 23
97803, d: 2397232, ts: 1377694950684317742)
130828 13:02:30 [ERROR] WSREP: Failed to apply app buffer: seqno: 2397804, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130828 13:02:30 [ERROR] WSREP: Node consistency compromized, aborting...

Changed in codership-mysql:
assignee: nobody → Seppo Jaakola (seppo-jaakola)
Jeremy L (kzch3zywf6) wrote :

We are also experiencing this issue and can reproduce it easily with some of our test workload and wsrep_slave_threads > 1. Let me know if it would be useful for us to post additional detail and/or debugging information. We are using Percona XtraDB Cluster on Ubuntu 10.04 in our test environment. We have not yet deployed this in production... so we can easily test and would love to see some progress on this bug before we make our final recommendation to management regarding Percona / Galera as a clustering solution.

ii percona-toolkit 2.2.4 Advanced MySQL and system command-line tools
ii percona-xtrabackup 2.1.4-657-1.lucid Open source backup tool for InnoDB and XtraD
ii percona-xtradb-cluster-client-5.5 5.5.31-23.7.5-438.lucid Percona Server database client binaries
ii percona-xtradb-cluster-common-5.5 5.5.31-23.7.5-438.lucid Percona Server database common files (e.g. /
ii percona-xtradb-cluster-galera-2.x 152.lucid Galera components of Percona XtraDB Cluster
ii percona-xtradb-cluster-server-5.5 5.5.31-23.7.5-438.lucid Percona Server database server binaries

Changed in codership-mysql:
assignee: Seppo Jaakola (seppo-jaakola) → Vladislav Klyachin (klyachin)
importance: Undecided → High
milestone: none → 5.5.32-23.7.6
status: New → Confirmed

Able to confirm with lp:percona-xtradb-cluster trunk as well, changing wsrep_slave_threads on non-writer node to >1 (on writer node has no effect) produces

130902 17:25:00 [ERROR] Slave SQL: Could not execute Write_rows event on table sbtest.b; Cannot add or update a child row: a foreign key constraint fails (`sbtest`.`b`, CONSTRAINT `aid_fk` FOREIGN KEY (`aid`) REFERENCES `a` (`id`) ON DELETE CASCADE), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 94, Error_code: 1452
130902 17:25:00 [Warning] WSREP: RBR event 2 Write_rows apply warning: 151, 914
130902 17:25:00 [ERROR] WSREP: Failed to apply trx: source: 8ce7edb7-13c5-11e3-b1ae-36191a17aa6e version: 2 local: 0 state: APPLYING flags: 1 conn_id: 469 trx_id: 2191 seqnos (l: 917, g: 914, s: 913, d: 683, ts: 1378122900523716090)
130902 17:25:00 [ERROR] WSREP: Failed to apply app buffer: seqno: 914, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():52
         at galera/src/replicator_smm.cpp:apply_trx_ws():118
130902 17:25:00 [ERROR] WSREP: Node consistency compromized, aborting...

Changed in percona-xtradb-cluster:
status: New → Confirmed
milestone: none → 5.5.33-23.7.6
tags: added: i35006
Changed in codership-mysql:
importance: High → Critical
Vladislav Klyachin (klyachin) wrote :

This problem arises when the referenced table (`a` in the ticket description) does not contain a PRIMARY KEY.
The easy workaround is to define PRIMARY KEY:

CREATE TABLE `a` (
  `id` bigint(20) DEFAULT NULL,
  PRIMARY KEY `id_pk` (`id`)
) ENGINE=InnoDB;

Galera documentation generally requires tables to have PKs.
The fix for the common case will be shortly.

Changed in codership-mysql:
importance: Critical → High
Jeremy L (kzch3zywf6) wrote :

Unfortunately, I believe the PK explanation above is a red-herring... we have primary keys set on all of our tables and we are still seeing the slaves drop out. We can reproduce it when running a specific set of our benchmarking tests. Unfortunately, it's not clear exactly what in the benchmarks is triggering the issue, but it happens quickly (within 5-10 minutes of running) and with some work we might be able to extract the specific elements. Here is an extract of the failure and the part of the schema involved:

Could not execute Delete_rows event on table configuration.configurations;
Cannot delete or update a parent row: a foreign key constraint fails
(`configuration`.`auto_network_nic_attributes`,
  CONSTRAINT `fk_auto_network_nic_attributes_networks_network_id`
  FOREIGN KEY (`network_id`)
  REFERENCES `networks` (`network_id`)
),
Error_code: 1451;
handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 399, Error_code: 1451

mysql> SHOW CREATE TABLE configuration.configurations;
CREATE TABLE `configurations` (
  `configuration_id` int(11) NOT NULL,
  `scratch_depot_id` int(11) DEFAULT NULL,
  `specification_version` int(10) unsigned NOT NULL,
  `error` text,
  `runnable` tinyint(1) NOT NULL,
  `routable` tinyint(1) NOT NULL,
  `deleted_at` datetime DEFAULT NULL,
  `mac_lan_hint` varchar(32) DEFAULT NULL,
  `disable_internet` tinyint(1) NOT NULL,
  `region` varchar(60) NOT NULL DEFAULT 'integ',
  PRIMARY KEY (`configuration_id`),
  KEY `fk_configurations_account_depot_allocations_scratch_depot_id` (`scratch_depot_id`),
  CONSTRAINT `fk_configurations_account_depot_allocations_scratch_depot_id` FOREIGN KEY (`scratch_depot_id`) REFERENCES `account_depot_allocations` (`account_depot_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> SHOW CREATE TABLE configuration.auto_network_nic_attributes;
CREATE TABLE `auto_network_nic_attributes` (
  `vm_id` int(11) NOT NULL,
  `nic_id` int(11) NOT NULL,
  `network_id` int(11) DEFAULT NULL,
  `address` int(10) unsigned DEFAULT NULL,
  `hostname` varchar(32) DEFAULT NULL,
  PRIMARY KEY (`vm_id`,`nic_id`),
  UNIQUE KEY `UQ_network_address` (`network_id`,`address`),
  UNIQUE KEY `UQ_network_hostname` (`network_id`,`hostname`),
  KEY `configuration_id` (`network_id`),
  CONSTRAINT `fk_auto_network_nic_attributes_networks_network_id` FOREIGN KEY (`network_id`) REFERENCES `networks` (`network_id`),
  CONSTRAINT `fk_auto_network_nic_attributes_nics_vm_id_nic_id` FOREIGN KEY (`vm_id`, `nic_id`) REFERENCES `nics` (`vm_id`, `nic_id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Alex Yurchenko (ayurchen) wrote :

Jeremy,

PK explanation above is not a guess but a hard fact for the bug as stated in the description. Which leaves us with either:
- the bug description is incorrect (specifically it refers to tables without a primary key and without a unique key that InnoDB would use as PK)
- your problem belongs to another bug report

Testing after the merge latest commit shows no issue here.

Changed in percona-xtradb-cluster:
status: Confirmed → Fix Committed
Changed in codership-mysql:
status: Confirmed → Fix Committed

@Jeremy,

That looks like a different issue from this one (though FK is involved in both). Can you report it as a separate issue with additional details?

Jeremy L (kzch3zywf6) wrote :

Will do. I'll post a link to the other ticket here once I have identified another ticket with more closely matches the issue we are seeing... or open a new one with a repro of our issue if I can't find anything. Thanks

Changed in codership-mysql:
status: Fix Committed → Fix Released
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers