Percona XtraDB Cluster 5.7 single node crash with ALTER TABLE statement

Bug #1729648 reported by Mrten on 2017-11-02
This bug affects 1 person.

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

An ALTER TABLE statement that resets the AUTO_INCREMENT counter on a table with a foreign key index crashes one node of my three-node percona-xtradb-cluster-server-5.7 (5.7.19-29.22-3.trusty) cluster. Attached is a minimal test case that consistently crashes a single node in my three-node cluster. The same test case does not crash any node on my 5.5 cluster.
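For reference, a hypothetical sketch of a test case along the lines the report describes. The actual attached test case is not shown here; the table names, columns, and exact statement order below are assumptions, not the reporter's script:

```sql
-- Hypothetical repro sketch: two InnoDB tables linked by a foreign key,
-- then an ALTER TABLE that resets AUTO_INCREMENT on the parent.
CREATE TABLE parent (
    id INT AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
    id INT AUTO_INCREMENT PRIMARY KEY,
    parent_id INT,
    CONSTRAINT fk_parent FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent () VALUES ();
DELETE FROM parent;

-- The kind of statement the report says triggers the slave-node crash:
ALTER TABLE parent AUTO_INCREMENT = 1;
```

Per the later comments, the crash only appears on the applier (slave) nodes and only with multiple applier threads configured, so running this against one node of a three-node cluster would be the expected setup.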

Relevant log entries on the crashed node:

2017-11-02T16:01:15.616852Z 13 [ERROR] WSREP: Trx 299 tries to abort slave trx 300. This could be caused by:
        1) unsupported configuration options combination, please check documentation.
        2) a bug in the code.
        3) a database corruption.
 Node consistency compromized, need to abort. Restart the node to resync with cluster.
16:01:15 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=536870912
read_buffer_size=1048576
max_used_connections=1
max_threads=129
thread_count=34
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 919388 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f8bb740e000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f9730b02a40 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xf0dc1c]
/usr/sbin/mysqld(handle_fatal_signal+0x461)[0x7ad821]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f99be18b330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f99bd5c8c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f99bd5cc028]
/usr/sbin/mysqld[0xff19bb]
/usr/sbin/mysqld(_Z27wsrep_innobase_kill_one_trxPvPK5trx_tPS0_m+0x790)[0xff2960]
/usr/sbin/mysqld[0x104c9f3]
/usr/sbin/mysqld(_Z10lock_tablemP12dict_table_t9lock_modeP9que_thr_t+0x11a)[0x105357a]
/usr/sbin/mysqld(_Z18lock_table_for_trxP12dict_table_tP5trx_t9lock_mode+0xaf)[0x10536cf]
/usr/sbin/mysqld(_ZN11ha_innobase26commit_inplace_alter_tableEP5TABLEP18Alter_inplace_infob+0x2d4)[0x10239c4]
/usr/sbin/mysqld(_Z17mysql_alter_tableP3THDPKcS2_P24st_ha_create_informationP10TABLE_LISTP10Alter_info+0x3a99)[0xd4c169]
/usr/sbin/mysqld(_ZN19Sql_cmd_alter_table7executeEP3THD+0x80b)[0xe3f30b]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x5763)[0xcdafa3]
/usr/sbin/mysqld(_Z11mysql_parseP3THDP12Parser_state+0x645)[0xcde595]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcm+0x6b2)[0xe7f372]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x6e)[0xe7d7de]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x67f)[0x7c759f]
/usr/lib/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xd5)[0x7f99b1533635]
/usr/lib/libgalera_smm.so(+0x1e8df4)[0x7f99b1570df4]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0x18a)[0x7f99b15735ea]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x11e)[0x7f99b157686e]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x1b0)[0x7f99b1552da0]
/usr/lib/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x57)[0x7f99b1554577]
/usr/lib/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x7b)[0x7f99b1576f9b]
/usr/lib/libgalera_smm.so(galera_recv+0x1d)[0x7f99b158830d]
/usr/sbin/mysqld[0x7c890f]
/usr/sbin/mysqld(start_wsrep_THD+0x222)[0x79eef2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f99be183184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f99bd68fffd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f8bb74660e9): is an invalid pointer
Connection ID (thread ID): 13
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.

Krunal Bauskar (krunal-bauskar) wrote :

Please check the steps to reproduce:

1. Start a 3-node cluster.

2. Run the said test case against node-1.

Expect a crash.

I tried to reproduce it as described above but could not. So either:

a. Something is missing from the steps, or
b. Can you share your configuration (my.cnf for all 3 nodes)?

Changed in percona-xtradb-cluster:
status: New → Incomplete
Mrten (bugzilla-ii) wrote :

Binary-searching our configs for the crash trigger found a hit: add

wsrep_slave_threads=32

At the default (=1) no crash occurs. With values above 1 the crash takes multiple runs to appear, but from around 16 upward it is almost certain on every execution of the test case.

Mrten (bugzilla-ii) wrote :

I should also mention that it is always the slave (applier) nodes that crash, never the node you run the test case on.

Mrten (bugzilla-ii) wrote :

Keep in mind that wsrep_auto_increment_control needs to be sane (OFF :) as well.
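Putting the two comments above together, a hypothetical my.cnf fragment for reproducing the crash might look like this. The values come from the reporter's comments; this is a repro configuration, not a recommended one:

```ini
[mysqld]
# More than 1 applier thread is needed to trigger the crash;
# ~16 and up makes it near-certain on each run (default is 1).
wsrep_slave_threads=32

# Per the reporter, cluster-controlled auto-increment offsets
# must be disabled for the test case to behave as described.
wsrep_auto_increment_control=OFF
```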

Krunal Bauskar (krunal-bauskar) wrote :

The issue can be reproduced with the upstream build (Codership). You can file it with Codership too, so that when they fix the bug the fix is inherited by all variants.

Mrten (bugzilla-ii) wrote :

Thanks, reported here: https://github.com/codership/galera/issues/487

Can you remove the Incomplete status to prevent this bug from expiring?

Mrten (bugzilla-ii) wrote :

Do you perhaps have a more current contact address for upstream?

GitHub is pretty quiet (the last reply on an issue was in March), and the Google group linked from galera.com/community has not seen replies from Codership employees for months.

Changed in percona-xtradb-cluster:
status: Incomplete → New

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-2016
