Inconsistency and connection deadlocks with cross-node record updates

Bug #1560206 reported by Brad House
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status tracked in 5.6
  5.6: Confirmed / Importance: Undecided / Assigned to: Kenn Takara
  5.7: Fix Released / Importance: Undecided / Assigned to: Kenn Takara

Bug Description

My sequence of events is essentially identical to the one in this blog post (a transaction with SELECT FOR UPDATE, math performed on the record, then the record updated and committed). Since this comes directly from Codership, I am assuming it is intended to be supported:
http://galeracluster.com/2015/09/support-for-mysql-transaction-isolation-levels-in-galera-cluster/
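
For reference, the pattern in question is roughly the following (the counters table and its column names are illustrative, not taken from my attached test case):

BEGIN;
-- lock the row on the local node and read its current value
SELECT value FROM counters WHERE id = 1 FOR UPDATE;
-- the application computes the new value from the value it just read;
-- the literal below stands in for that computed value
UPDATE counters SET value = 42 WHERE id = 1;
COMMIT;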

However, what I am experiencing is:
1) Data inconsistency: dirty reads are occurring, so the calculation used for the update is wrong, and conflicting updates from different nodes are not triggering deadlock errors, so we end up with inconsistency due to these lost updates (see the two-session sketch after this list).
2) Connection lockups occur, and the only way to unblock the client is to restart the DB node(s) holding the locked connections. Running
"SHOW PROCESSLIST;" shows all connections from the application in a
Sleep state, yet they did NOT receive responses.
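
My understanding from the Codership post above is that in a conflict like this the second COMMIT should fail with a deadlock error (ER_LOCK_DEADLOCK) so the application can retry. Instead both commits succeed and one update is silently lost. A rough sketch of the interleaving, in time order (again with illustrative table and column names):

-- Session A is connected to node 1, session B to node 2.

-- A:
BEGIN;
SELECT value FROM counters WHERE id = 1 FOR UPDATE;  -- reads 10

-- B (FOR UPDATE locks are node-local, so this does not block):
BEGIN;
SELECT value FROM counters WHERE id = 1 FOR UPDATE;  -- also reads 10

-- A:
UPDATE counters SET value = 11 WHERE id = 1;
COMMIT;  -- succeeds

-- B:
UPDATE counters SET value = 11 WHERE id = 1;
COMMIT;  -- expected: fails with ER_LOCK_DEADLOCK so B can retry;
         -- observed: succeeds, and A's update is effectively lost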

The server version in use is Percona-XtraDB-Cluster-server-56-5.6.28-25.14.1.el7.x86_64 on CentOS 7.2. It is a 3-node cluster running on a local LAN connected via dual 1Gbps links, with a Linux IPVS load balancer doing round-robin in front of it for my application.

I have attached a test case that reproduces this issue consistently. The same test case works fine when pointed at a single DB node in the cluster.

Config settings:

/etc/my.cnf:
[mysqld]
datadir = /var/lib/mysql
# move tmpdir because /tmp is a memory-backed tmpfs filesystem; mysql uses this for on-disk sorting
tmpdir = /var/lib/mysql/tmp

[mysqld_safe]
pid-file = /run/mysqld/mysql.pid
syslog
!includedir /etc/my.cnf.d

/etc/my.cnf.d/base.cnf:
[mysqld]
bind-address = 0.0.0.0
key_buffer = 256M
max_allowed_packet = 16M
max_connections = 256
# Some optimizations
thread_concurrency = 10
sort_buffer_size = 2M
query_cache_limit = 100M
query_cache_size = 256M
log_bin
binlog_format = ROW
gtid_mode = ON
log_slave_updates
enforce_gtid_consistency = 1
group_concat_max_len = 102400
innodb_buffer_pool_size = 10G
innodb_log_file_size = 64M
innodb_file_per_table = 1
innodb_file_format = barracuda
default_storage_engine = innodb
# SSD Tuning
innodb_flush_neighbors = 0
innodb_io_capacity = 6000

/etc/my.cnf.d/cluster.cnf:
# Galera cluster
[mysqld]
wsrep_provider = /usr/lib64/libgalera_smm.so
wsrep_sst_method = xtrabackup-v2
wsrep_sst_auth = "sstuser:s3cretPass"
wsrep_cluster_name = cluster
wsrep_slave_threads = 32
wsrep_max_ws_size = 2G
wsrep_provider_options = "gcache.size = 5G; pc.recovery = true"
wsrep_cluster_address = gcomm://10.30.30.11,10.30.30.12,10.30.30.13
wsrep_sync_wait = 0
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = 1
innodb_flush_log_at_trx_commit = 0
sync_binlog = 0
innodb_support_xa = 0
innodb_flush_method = ALL_O_DIRECT

[sst]
progress = 1
time = 1
streamfmt = xbstream

Changed in percona-xtradb-cluster:
assignee: nobody → Kenn Takara (kenn-takara)
Revision history for this message
Brad House (t-brad) wrote :

bump.

This seems like a fairly serious issue to me, and I provided a test case to reproduce it. I'm surprised this hasn't had any movement.

Revision history for this message
Kenn Takara (kenn-takara) wrote :

Hi Brad,

I've reproduced this on the latest builds of PXC 5.6 and 5.7. The problem is that the server is sending unexpected responses to the client, causing the client to wait for additional data that will never arrive because the server killed the transaction. This results in the client threads hanging.

Thanks for the explicit test case; that helped a lot!

Revision history for this message
Kenn Takara (kenn-takara) wrote :

The bug appears to have been fixed by the latest 5.7 upstream merge.
However, it still appears in 5.6.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-526
