Comment 1 for bug 1629566

Przemek (pmalkowski) wrote:

I can confirm that the donor crashes when wsrep_desync is set during an SST session and is later turned off.

[7cf5e9886947] {root} ((none)) > select @@version,@@version_comment\G
*************************** 1. row ***************************
        @@version: 5.6.30-76.3-56-log
@@version_comment: Percona XtraDB Cluster (GPL), Release rel76.3, Revision 64742ac, WSREP version 25.16, wsrep_25.16
1 row in set (0.00 sec)

First, the donor is chosen to serve a full SST, and while the SST is in progress, an explicit desync is executed on the donor:

[7cf5e9886947] {root} ((none)) > set global wsrep_desync = ON;
Query OK, 0 rows affected (0.00 sec)

Log from donor:

2016-10-02 13:10:47 10465 [Note] WSREP: Member 1.0 (node3) requested state transfer from 'node1'. Selected 0.0 (node1)(SYNCED) as donor.
2016-10-02 13:10:47 10465 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 26066)
2016-10-02 13:10:47 10465 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-02 13:10:47 10465 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.3:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix '' --binlog 'mysql-binlog' --gtid '3e6fc59a-75a5-11e6-8d27-c6583ca1a844:26066''
2016-10-02 13:10:47 10465 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20161002 13:10:47.426)
WSREP_SST: [INFO] Using socat as streamer (20161002 13:10:47.428)
WSREP_SST: [INFO] Using /tmp/tmp.85qn89WXgV as xtrabackup temporary directory (20161002 13:10:47.441)
WSREP_SST: [INFO] Using /tmp/tmp.vOdDwTkzdD as innobackupex temporary directory (20161002 13:10:47.443)
WSREP_SST: [INFO] Streaming GTID file before SST (20161002 13:10:47.447)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.17.0.3:4444; RC=( ${PIPESTATUS[@]} ) (20161002 13:10:47.448)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20161002 13:10:47.455)
2016-10-02 13:10:49 10465 [Note] WSREP: (864f40f1, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 172.17.0.3 4444 (20161002 13:10:57.457)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.17.0.3:4444; RC=( ${PIPESTATUS[@]} ) (20161002 13:10:57.460)
2016-10-02 13:11:08 10465 [Note] WSREP: 0.0 (node1): State transfer to 1.0 (node3) complete.
2016-10-02 13:11:08 10465 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 26066)
2016-10-02 13:11:08 10465 [Note] WSREP: SYNC message ignored as node 0.0 (node1) was re-transitioned to DONOR mode before it synced.
WSREP_SST: [INFO] Total time on donor: 0 seconds (20161002 13:11:08.717)
WSREP_SST: [INFO] Cleaning up temporary directories (20161002 13:11:08.719)
2016-10-02 13:11:43 10465 [Note] WSREP: 1.0 (node3): State transfer from 0.0 (node1) complete.
2016-10-02 13:11:43 10465 [Note] WSREP: Member 1.0 (node3) synced with group.

After that, the donor is left in the Joined state (the SYNC message was ignored, as the log above shows, so the node never returned to Synced):

[7cf5e9886947] {root} ((none)) > show status like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Joined |
+---------------------------+--------+
1 row in set (0.00 sec)

So, when the desync is then explicitly disabled again:

[7cf5e9886947] {root} ((none)) > set global wsrep_desync = 0;
Query OK, 0 rows affected (0.00 sec)

[7cf5e9886947] {root} ((none)) > show status like 'wsrep_local_state_comment';
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111)
ERROR:
Can't connect to the server

The rest of the donor's log:

2016-10-02 13:13:28 10465 [Note] WSREP: 0.0 (node1): State transfer to 1.0 (node3) complete.
2016-10-02 13:13:28 10465 [ERROR] WSREP: FSM: no such a transition JOINED -> JOINED
13:13:28 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=25165824
read_buffer_size=131072
max_used_connections=2
max_threads=202
thread_count=3
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 105041 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f219c000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f21c16bb998 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x907295]
/usr/sbin/mysqld(handle_fatal_signal+0x4b4)[0x66b074]
/lib64/libpthread.so.0(+0xf7e0)[0x7f21c14057e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f21bf63e5e5]
/lib64/libc.so.6(abort+0x175)[0x7f21bf63fdc5]
/usr/lib64/libgalera_smm.so(_ZN6galera3FSMINS_10Replicator5StateENS_13ReplicatorSMM10TransitionENS_10EmptyGuardENS3_11StateActionEE8shift_toES2_+0x212)[0x7f21bd5e4302]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM12process_joinEll+0x178)[0x7f21bd5df478]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x21d)[0x7f21bd5ba87d]
/usr/lib64/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7f21bd5baed3]
/usr/lib64/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x73)[0x7f21bd5df8a3]
/usr/lib64/libgalera_smm.so(galera_recv+0x24)[0x7f21bd5efe14]
/usr/sbin/mysqld[0x5a21f1]
/usr/sbin/mysqld(start_wsrep_THD+0x36e)[0x58b83e]
/lib64/libpthread.so.0(+0x7aa1)[0x7f21c13fdaa1]
/lib64/libc.so.6(clone+0x6d)[0x7f21bf6f4aad]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
161002 13:13:28 mysqld_safe Number of processes running now: 0
161002 13:13:28 mysqld_safe WSREP: not restarting wsrep node automatically
161002 13:13:28 mysqld_safe mysqld from pid file /tmp/mysql.pid ended
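For illustration only, here is a minimal C++ sketch of the failure mode suggested by the error message and the backtrace (shift_to() called from process_join()). This is not Galera's actual code; the State enum, the allowed-transition set, and the FSM struct below are all made up for the example. The point is that the donor was left in JOINED (the SYNC message was ignored, see the first log excerpt), so when SET GLOBAL wsrep_desync = 0 later drives another join event, the state machine is asked for a JOINED -> JOINED transition it does not define, and it aborts:

// Toy sketch of the failure mode, NOT Galera's implementation.
#include <cstdio>
#include <cstdlib>
#include <set>
#include <utility>

enum State { SYNCED, DONOR_DESYNCED, JOINED };

const char* name(State s)
{
    switch (s) {
    case SYNCED:         return "SYNCED";
    case DONOR_DESYNCED: return "DONOR/DESYNCED";
    case JOINED:         return "JOINED";
    }
    return "?";
}

struct FSM
{
    State state = SYNCED;

    // Transitions the machine knows about (hypothetical subset).
    std::set<std::pair<State, State> > allowed = {
        { SYNCED,         DONOR_DESYNCED }, // donor selected / desync set
        { DONOR_DESYNCED, JOINED         }, // SST finished, catching up
        { JOINED,         SYNCED         }, // normal return to the group
    };

    void shift_to(State next)
    {
        if (allowed.count(std::make_pair(state, next)) == 0) {
            // Matches the log line: FSM: no such a transition JOINED -> JOINED
            std::fprintf(stderr, "FSM: no such a transition %s -> %s\n",
                         name(state), name(next));
            std::abort(); // "mysqld got signal 6"
        }
        state = next;
    }
};

int main()
{
    FSM fsm;
    fsm.shift_to(DONOR_DESYNCED); // SST starts with wsrep_desync = ON
    fsm.shift_to(JOINED);         // SST ends; SYNC was ignored, node stays JOINED
    fsm.shift_to(JOINED);         // wsrep_desync = 0 -> second join event -> abort
    return 0;
}

In the real server the same pattern appears to hold: process_join() is delivered on a node that is already JOINED, and galera::FSM::shift_to() aborts instead of treating the transition as a no-op.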