FTWRL grammar causes permanent deadlock

Bug #1280768 reported by Raghavendra D Prabhu
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC); status tracked in 5.6

    Series  Status     Importance  Assigned to  Milestone
    5.5     Won't Fix  Critical    Unassigned   -
    5.6     Confirmed  Critical    Unassigned   -

Bug Description

While testing with the FTWRL grammar - https://bazaar.launchpad.net/~raghavendra-prabhu/randgen/pxc/view/head:/conf/galera/galera_stress-ftwrl.yy - in RQG, I noticed the following:

Processlist: http://paste.wnohang.net/6a3ff4

The server deadlocks completely.

The test is run as:

    timeout -s 11 150m perl runall-new.pl --basedir=/rqg/workspace/PXC-5.6-mrandgen/BTYPE/release/Host/pxc-rqg/Percona-XtraDB-Cluster-5.6.15-25.3.729.Linux.x86_64 --vardir=/rqg/workspace/PXC-5.6-mrandgen/BTYPE/release/Host/pxc-rqg/92/ftwrl-BLP --galera=ms --mysqld=--innodb_buffer_pool_size=500M --mysqld=--query_cache_type=0 --mysqld=--wsrep_slave_threads=16 --mysqld=--query_cache_size=0 --mysqld=--innodb_flush_log_at_trx_commit=0 --mysqld=--innodb_file_per_table --reporter=Shutdown,Backtrace,ErrorLog,ErrorLogAlarm,Deadlock --grammar=conf/galera/galera_stress-ftwrl.yy --gendata=conf/galera/galera_stress-56.zz --threads=20 --queries=75000 --mysqld=--wsrep-provider=/rqg/workspace/PXC-5.6-mrandgen/BTYPE/release/Host/pxc-rqg/Percona-XtraDB-Cluster-5.6.15-25.3.729.Linux.x86_64/lib/libgalera_smm.so --mysqld=--innodb_flush_method=O_DIRECT --seed=time
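
The grammar stresses FLUSH TABLES WITH READ LOCK / UNLOCK TABLES alongside ordinary DML and DDL across the 20 RQG threads. A minimal illustrative sketch of the kind of statement mix involved (the table name t1 and its columns are hypothetical, not taken from the actual grammar or gendata files):

    -- one thread repeatedly takes and releases the global read lock
    FLUSH TABLES WITH READ LOCK;
    UNLOCK TABLES;

    -- other threads run concurrent DML/DDL that replicates through Galera
    INSERT INTO t1 (pk, col_int) VALUES (1, 1);
    UPDATE t1 SET col_int = col_int + 1 WHERE pk = 1;
    ALTER TABLE t1 ADD INDEX idx_col_int (col_int);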

Node1 (the second node) fails with node consistency issues
shortly after starting. Node0 hangs after a while.

Will provide backtraces and logs shortly.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

bt full.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

mysql> show global status like '%wsrep%';
+------------------------------+--------------------------------------+
| Variable_name | Value |
+------------------------------+--------------------------------------+
| wsrep_local_state_uuid | 24d1faa2-96f2-11e3-aa17-d3aa3a466c95 |
| wsrep_protocol_version | 5 |
| wsrep_last_committed | 414 |
| wsrep_replicated | 427 |
| wsrep_replicated_bytes | 235269 |
| wsrep_repl_keys | 5208 |
| wsrep_repl_keys_bytes | 51506 |
| wsrep_repl_data_bytes | 156435 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 13 |
| wsrep_received_bytes | 585 |
| wsrep_local_commits | 108 |
| wsrep_local_cert_failures | 3 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_avg | 0.078522 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_avg | 0.076923 |
| wsrep_local_cached_downto | 1 |
| wsrep_flow_control_paused_ns | 2704155 |
| wsrep_flow_control_paused | 0.000000 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 2 |
| wsrep_cert_deps_distance | 7.489130 |
| wsrep_apply_oooe | 0.578313 |
| wsrep_apply_oool | 0.019277 |
| wsrep_apply_window | 3.084337 |
| wsrep_commit_oooe | 0.000000 |
| wsrep_commit_oool | 0.009732 |
| wsrep_commit_window | 2.496350 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 443 |
| wsrep_causal_reads | 0 |
| wsrep_incoming_addresses | 127.0.0.1:19300 |
| wsrep_cluster_conf_id | 2 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 24d1faa2-96f2-11e3-aa17-d3aa3a466c95 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 33 ...


Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Pruned backtrace:

Thread 28 (Thread 0x7fa03c9f3700 (LWP 20478)):
#0 0x00007fa07c59f5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fa05aaaf15d in wait (this=0x24de6a0, seqno=<value optimized out>, lock=...) at galerautils/src/gu_lock.hpp:56
#2 galera::Monitor<galera::ReplicatorSMM::ApplyOrder>::drain_common (this=0x24de6a0, seqno=<value optimized out>, lock=...) at galera/src/monitor.hpp:424
#3 0x00007fa05aaa4ec4 in drain (this=0x24ddc10, recv_ctx=0x7f9fec000990, view_info=..., repl_proto=5, next_state=galera::Replicator::S_SYNCED, seqno_l=<value optimized out>) at galera/src/monitor.hpp:248
#4 galera::ReplicatorSMM::process_conf_change (this=0x24ddc10, recv_ctx=0x7f9fec000990, view_info=..., repl_proto=5, next_state=galera::Replicator::S_SYNCED, seqno_l=<value optimized out>)
    at galera/src/replicator_smm.cpp:1341
#5 0x00007fa05aa84e57 in galera::GcsActionSource::dispatch (this=<value optimized out>, recv_ctx=0x7f9fec000990, act=<value optimized out>, exit_loop=<value optimized out>)
    at galera/src/gcs_action_source.cpp:142
#6 0x00007fa05aa85073 in galera::GcsActionSource::process (this=0x24de1f8, recv_ctx=0x7f9fec000990, exit_loop=@0x7fa03c9f204f) at galera/src/gcs_action_source.cpp:177
#7 0x00007fa05aaa3983 in galera::ReplicatorSMM::async_recv (this=0x24ddc10, recv_ctx=0x7f9fec000990) at galera/src/replicator_smm.cpp:354
#8 0x00007fa05aab8673 in galera_recv (gh=<value optimized out>, recv_ctx=<value optimized out>) at galera/src/wsrep_provider.cpp:226
#9 0x00000000005be0df in wsrep_replication_process (thd=0x7f9fec000990)
    at /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/release/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.15/sql/wsrep_thd.cc:309
#10 0x00000000005ae500 in start_wsrep_THD (arg=0x5be090) at /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/release/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.15/sql/mysqld.cc:5484
#11 0x00007fa07c59b9d1 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fa07aaa3b6d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7fa03c8ae700 (LWP 20484)):
#0 0x00007fa07aa9a343 in poll () from /lib64/libc.so.6
#1 0x0000000000b829e3 in vio_io_wait (vio=<value optimized out>, event=<value optimized out>, timeout=28800000)
    at /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/release/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.15/vio/viosocket.c:867
#2 0x0000000000b82af3 in vio_socket_io_wait (vio=<value optimized out>, event=<value optimized out>)
    at /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/release/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.15/vio/viosocket.c:100
#3 0x0000000000b82d18 in vio_read (vio=0x33529b0, buf=0x348c080 "\001", size=4)
    at /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/release/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.15/vio/viosocket.c:155
#4 0x00000000006757d3 in net_read_raw_loop (net=0x34886f0, count=4) at /mnt/workspace/build-xtradb-cluster-binaries-56/BUILD_TYPE/release/label_exp/centos6-64/Percona-XtraDB-Cluster-5.6.15/sql/net_serv.cc:669
#5 0x000000000067606b in net_read_packet_header (net=<value optimized out>, complen...

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

On closer inspection, this looks like an issue in Galera wherein certification and TOI (total order isolation) execution deadlock with each other, and the FLUSH* statements hang as a result.
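
One plausible interleaving consistent with that description, sketched in SQL (the session split, table name t1, and comments are illustrative assumptions, not a confirmed reproduction):

    -- Session A on node0: takes the global read lock, which blocks commits
    FLUSH TABLES WITH READ LOCK;

    -- Session B on node0: DDL that must run in total order isolation (TOI)
    ALTER TABLE t1 ADD COLUMN c2 INT;

    -- Meanwhile node0's applier threads cannot commit replicated write-sets
    -- past the global read lock, the TOI action waits for the appliers to
    -- drain (see the drain_common frame in the backtrace above), and the
    -- FLUSH is never released: every party waits on another, permanently.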

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

Bug #1520491 has test cases for the same issue using FLUSH TABLES, so marking this as a duplicate of the newer bug.
