PXC crashes if fc_limit=20000 and under stress

Bug #1579987 reported by Kenn Takara
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Fix Released
Importance: Undecided
Assigned to: Kenn Takara
Milestone: 5.6.29-25.15

Bug Description

On a CentOS 7 machine with 256MB of RAM, running PXC 5.6:
(1) Start up a 3-node cluster
(2) Set fc_limit=20000 on node 3
(3) Run "flush tables with read lock;" on node 3
(4) Start a stress workload on node 1

Sometimes node 3 will block and its receive queue will eventually get stuck
at ~32K entries while the other nodes keep running; other times the queue
will keep growing.

(5) After the limit has been reached, run "unlock tables;" on node 3

This starts draining the queue, and at some point the drain triggers an
assert.

========== gdb stack trace ==========

#0  0x00007ffff5f635f7 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff5f64ce8 in __GI_abort () at abort.c:90
#2  0x00007ffff5f5c566 in __assert_fail_base (
    fmt=0x7ffff60ac228 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7fffda6e12be "conn->recv_q_size >= size",
    file=file@entry=0x7fffda6e030c "gcs/src/gcs.cpp", line=line@entry=1775,
    function=function@entry=0x7fffda6e1f80 <GCS_FIFO_POP_HEAD(gcs_conn*, long)::__PRETTY_FUNCTION__> "void GCS_FIFO_POP_HEAD(gcs_conn_t*, ssize_t)") at assert.c:92
#3  0x00007ffff5f5c612 in __GI___assert_fail (
    assertion=0x7fffda6e12be "conn->recv_q_size >= size",
    file=0x7fffda6e030c "gcs/src/gcs.cpp", line=1775,
    function=0x7fffda6e1f80 <GCS_FIFO_POP_HEAD(gcs_conn*, long)::__PRETTY_FUNCTION__> "void GCS_FIFO_POP_HEAD(gcs_conn_t*, ssize_t)") at assert.c:101
#4  0x00007fffda64864a in GCS_FIFO_POP_HEAD (conn=0x197eee0, size=140736414615528)
    at gcs/src/gcs.cpp:1775
#5  0x00007fffda648814 in gcs_recv (conn=0x197eee0, action=0x7fffd8146cf0)
    at gcs/src/gcs.cpp:1811
#6  0x00007fffda6a6927 in galera::Gcs::recv (this=0x1978030, act=...)
    at galera/src/galera_gcs.hpp:118
#7  0x00007fffda681399 in galera::GcsActionSource::process (this=0x1978178,
    recv_ctx=0x7fffb80009a0, exit_loop=@0x7fffd8146d9f: false)
    at galera/src/gcs_action_source.cpp:175
#8  0x00007fffda69dcf9 in galera::ReplicatorSMM::async_recv (this=0x1977a50,
    recv_ctx=0x7fffb80009a0) at galera/src/replicator_smm.cpp:368
#9  0x00007fffda6bac6e in galera_recv (gh=0x1946c70, recv_ctx=0x7fffb80009a0)
    at galera/src/wsrep_provider.cpp:239
#10 0x000000000065b3c3 in wsrep_replication_process (thd=0x7fffb80009a0)
    at /root/dev/pxc/sql/wsrep_thd.cc:313
#11 0x00000000006377f2 in start_wsrep_THD (arg=0x65b30b <wsrep_replication_process(THD*)>)
    at /root/dev/pxc/sql/mysqld.cc:5792
#12 0x00007ffff7bc6dc5 in start_thread (arg=0x7fffd8148700) at pthread_create.c:308
#13 0x00007ffff602421d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
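
For context, the assertion in frames #2-#4 guards the queue's byte
accounting: a pop must never claim more bytes than the queue believes it
holds. Below is a minimal sketch of that invariant; only the asserted
condition is taken verbatim from the trace, while the struct, names, and the
decrement are assumptions made for illustration.

========== C sketch: the failing invariant ==========

#include <assert.h>
#include <sys/types.h>

/* Hedged sketch of the accounting guarded by the assertion at
 * gcs/src/gcs.cpp:1775 (frame #4); not the actual gcs.cpp code. */
struct conn_sketch { ssize_t recv_q_size; /* total bytes queued */ };

static void fifo_pop_head_sketch(struct conn_sketch *conn, ssize_t size)
{
    /* Fires when a popped entry reports a size larger than the queue's
     * byte count -- e.g. a header read from freed memory. */
    assert(conn->recv_q_size >= size);
    conn->recv_q_size -= size;
}

int main(void)
{
    struct conn_sketch c = { 4096 };
    fifo_pop_head_sketch(&c, 1024);              /* fine: 4096 >= 1024 */
    fifo_pop_head_sketch(&c, 140736414615528L);  /* aborts, as in frame #0 */
    return 0;
}

The size in frame #4 (140736414615528, i.e. in the 0x7fff... range typical
of stack addresses) is implausibly large for a replication action, which is
at least consistent with the pop reading an entry whose backing memory had
already been freed.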

Revision history for this message
Kenn Takara (kenn-takara) wrote:

Set fc_limit=20000, then block the node ("flush tables with read lock").
The queue will eventually approach the flow-control limit (which in practice
is ~34800), but before that it reaches the capacity of the receive queue
(which is fixed at startup; on a 256MB machine it holds 32K entries). If you
then run "unlock tables", this hits an assert in the code.

There is a bug in the FIFO queue implementation (galerautils/src/gu_fifo.c):
if the circular buffer has wrapped (so that -->tail ... head-->) and head
and tail fall in the same row, then when head reaches the end of that row it
will free() the row, erasing some of the queued entries.
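
To make the failure mode concrete, here is a self-contained C sketch of such
a two-level FIFO: a ring of lazily allocated rows, with head/tail treated as
absolute slot indices. Every name here (fifo_t, fifo_push, fifo_pop, N_ROWS,
ROW_LEN) is hypothetical rather than the gu_fifo.c API; the tail-row check
in fifo_pop() stands in for the guard described as missing above.

========== C sketch: row freed while tail still occupies it ==========

#include <stdio.h>
#include <stdlib.h>

#define N_ROWS  4                  /* slots in the ring of rows */
#define ROW_LEN 4                  /* items per row             */
#define CAP     (N_ROWS * ROW_LEN)

typedef struct {
    long *rows[N_ROWS];            /* rows allocated on demand     */
    long  head;                    /* absolute index of next pop   */
    long  tail;                    /* absolute index of next push  */
    long  used;
} fifo_t;

static void fifo_push(fifo_t *q, long v)
{
    long row = (q->tail / ROW_LEN) % N_ROWS;
    if (q->rows[row] == NULL)                  /* allocate row lazily */
        q->rows[row] = malloc(ROW_LEN * sizeof(long));
    q->rows[row][q->tail % ROW_LEN] = v;
    q->tail = (q->tail + 1) % CAP;
    q->used++;
}

static long fifo_pop(fifo_t *q)
{
    long row = (q->head / ROW_LEN) % N_ROWS;
    long col =  q->head % ROW_LEN;
    long v   =  q->rows[row][col];

    q->head = (q->head + 1) % CAP;
    q->used--;

    if (col == ROW_LEN - 1) {
        /* head just stepped off this row.  Freeing it unconditionally
         * is the reported failure mode: after a wrap, tail may occupy
         * the *same* row, so free() would discard live entries.  The
         * tail-row check below is the guard that prevents this. */
        if ((q->tail / ROW_LEN) % N_ROWS != row) {
            free(q->rows[row]);
            q->rows[row] = NULL;
        }
    }
    return v;
}

int main(void)
{
    fifo_t q = {0};
    long i;

    for (i = 0; i < CAP - 1; i++)
        fifo_push(&q, i);          /* fill: tail -> slot 15          */
    fifo_pop(&q);
    fifo_pop(&q);                  /* head -> slot 2                 */
    fifo_push(&q, 100);            /* slot 15                        */
    fifo_push(&q, 101);            /* tail wraps into row 0, slot 0  */

    /* head and tail now share row 0; without the guard, finishing
     * row 0 would free() it and lose entry 101. */
    while (q.used > 0)
        printf("%ld ", fifo_pop(&q));
    printf("\n");                  /* 2 3 ... 14 100 101             */

    for (i = 0; i < N_ROWS; i++)
        free(q.rows[i]);           /* release whatever is left       */
    return 0;
}

Removing the tail-row check reproduces the described corruption: after the
wrap in main(), head finishes row 0 while tail has already written entry 101
into it, so an unconditional free() would discard that entry and later pops
would read freed memory.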

The upper_limit and lower_limit can also be set higher than the capacity of
the receive queue, which does not seem to be the behavior we want.
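
A hedged sketch of the kind of guard that observation implies (the name and
placement are illustrative, not the actual gcs.cpp change):

========== C sketch: capping flow-control limits ==========

static long cap_fc_limit(long requested_limit, long recv_q_capacity)
{
    /* Clamp the flow-control thresholds to the queue's capacity so
     * flow control can engage before the queue is physically full. */
    return requested_limit < recv_q_capacity ? requested_limit
                                             : recv_q_capacity;
}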

Changed in percona-xtradb-cluster:
status: New → Fix Committed
Changed in percona-xtradb-cluster:
milestone: none → 5.6.29-25.15
Changed in percona-xtradb-cluster:
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1901
