Falling join node after a successful SST

Bug #1714684 reported by Stanislav Gritsenko
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC
New
Undecided
Unassigned

Bug Description

I observe a strange situation when I try to add a second node to the percona galera cluster.
After a long file upload using xtrabackup-v2 (3.5TB of data and about 1,000,000 files) and successfully passed all the stages, judging by the log, the second node falls with an absolutely uninformative error:

2017-09-02 01:31:14 17707 [Note] InnoDB: Using atomics to ref count buffer pool pages
2017-09-02 01:31:14 17707 [Note] InnoDB: The InnoDB memory heap is disabled
2017-09-02 01:31:14 17707 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-09-02 01:31:14 17707 [Note] InnoDB: Memory barrier is not used
2017-09-02 01:31:14 17707 [Note] InnoDB: Compressed tables use zlib 1.2.7
2017-09-02 01:31:14 17707 [Note] InnoDB: Using Linux native AIO
2017-09-02 01:31:14 17707 [Note] InnoDB: Using CPU crc32 instructions
2017-09-02 01:31:14 17707 [Warning] InnoDB: innodb_file_io_threads is deprecated. Please use innodb_read_io_threads and innodb_write_io_threads instead
2017-09-02 01:31:14 17707 [Note] InnoDB: Initializing buffer pool, size = 10.0G
2017-09-02 01:31:15 17707 [Note] InnoDB: Completed initialization of buffer pool
2017-09-02 01:31:16 17707 [Note] InnoDB: Highest supported file format is Barracuda.
2017-09-02 01:32:27 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000047 of size 134217728 bytes
2017-09-02 02:44:02 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000048 of size 134217728 bytes
2017-09-02 03:03:40 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000049 of size 134217728 bytes
2017-09-02 03:21:14 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000050 of size 134217728 bytes
2017-09-02 03:40:19 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000051 of size 134217728 bytes
2017-09-02 03:44:41 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000052 of size 134217728 bytes
2017-09-02 03:49:21 17707 [Note] WSREP: Created page /var/lib/mysql/gcache.page.000053 of size 134217728 bytes
2017-09-02 03:53:24 17707 [Note] InnoDB: 128 rollback segment(s) are active.
2017-09-02 03:53:24 17707 [Note] InnoDB: Waiting for purge to start
2017-09-02 03:53:24 17707 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 7400309372985
2017-09-02 03:53:24 17707 [ERROR] Aborting

2017-09-02 03:53:24 17707 [Note] WSREP: Signalling cancellation of the SST request.
2017-09-02 03:53:24 17707 [Note] WSREP: SST request was cancelled
2017-09-02 03:53:24 17707 [Note] WSREP: Closing send monitor...
2017-09-02 03:53:24 17707 [Note] WSREP: Closed send monitor.
2017-09-02 03:53:24 17707 [Note] WSREP: gcomm: terminating thread
2017-09-02 03:53:24 17707 [Note] WSREP: gcomm: joining thread
2017-09-02 03:53:24 17707 [Note] WSREP: gcomm: closing backend
2017-09-02 03:53:26 17707 [Note] WSREP: Service disconnected.
2017-09-02 03:53:26 17707 [Note] WSREP: Waiting to close threads......
2017-09-02 03:53:26 17707 [Note] WSREP: rollbacker thread exiting

-------------------------------------

This info correspond with strace log data:

2049 03:53:24 <... setpriority resumed> ) = 0
2049 03:53:24 getpriority(PRIO_PROCESS, 2049) = 20
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 7400309372985\n", 137 <unfinished ...>
2049 03:53:24 select(0, NULL, NULL, NULL, {1, 0} <unfinished ...>
17707 03:53:24 <... write resumed> ) = 137
17707 03:53:24 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7fb32cbf9000
17707 03:53:24 mprotect(0x7fb32cbf9000, 4096, PROT_NONE) = 0
17707 03:53:24 clone(child_stack=0x7fb32d3f8e30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb32d3f99d0, tls=0x7fb32d3f9700, child_tidptr=0x7fb32d3f99d0) = 2050
2050 03:53:24 set_robust_list(0x7fb32d3f99e0, 24 <unfinished ...>
17707 03:53:24 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0 <unfinished ...>
2050 03:53:24 <... set_robust_list resumed> ) = 0
17707 03:53:24 <... mmap resumed> ) = 0x7fb32c3f8000
17707 03:53:24 mprotect(0x7fb32c3f8000, 4096, PROT_NONE) = 0
2050 03:53:24 futex(0x7fbf9746b58c, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
17707 03:53:24 clone( <unfinished ...>
2051 03:53:24 set_robust_list(0x7fb32cbf89e0, 24 <unfinished ...>
17707 03:53:24 <... clone resumed> child_stack=0x7fb32cbf7e30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb32cbf89d0, tls=0x7fb32cbf8700, child_tidptr=0x7fb32cbf89d0) = 2051
2051 03:53:24 <... set_robust_list resumed> ) = 0
17707 03:53:24 mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0 <unfinished ...>
2051 03:53:24 futex(0x7fb4cd3ef90c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1504324414, 807821000}, ffffffff <unfinished ...>
17707 03:53:24 <... mmap resumed> ) = 0x7fb32bbf7000
17707 03:53:24 mprotect(0x7fb32bbf7000, 4096, PROT_NONE) = 0
17707 03:53:24 clone( <unfinished ...>
2052 03:53:24 set_robust_list(0x7fb32c3f79e0, 24 <unfinished ...>
17707 03:53:24 <... clone resumed> child_stack=0x7fb32c3f6e30, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fb32c3f79d0, tls=0x7fb32c3f7700, child_tidptr=0x7fb32c3f79d0) = 2052
2052 03:53:24 <... set_robust_list resumed> ) = 0
2052 03:53:24 futex(0x7fb334add74c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1504324409, 808928000}, ffffffff <unfinished ...>
2047 03:53:24 <... pread resumed> "K\315\1\320\0\t\276\200\0\2\321b\0\2\321\\\0\0\6\271\7U\2727E\277\0\0\0\0\0\0\0\0\0\0\0\0\0!=\251\0\233\16\356\6\353\0\0\0\1\0\1\0\206\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\1\0\0\3<\234infimum\0\t\6\0\10\3\0\0supremum\0NJ% \0\20\7;\335whois_record_delta/#sql-291557_8e08aawhois_record_delta/FK9F837BD2D0C7A375\0\n \350NJ% \0\30\0077\313whois_record_delta/#sql-291557_a77c0bwhois_record_delta/FKA6CD4E737B556202\0\n\4\276NJ% \0 \7\0\335whois_record_delta/#sql-291557_a7a611whois_record_delta/FKB4869798B8CF12D0\0\n\4\277NJ% \0(\7\0014whois_record_delta/#sql-291557_a7ba06whois_record_delta/FKC8DFC7E4D0C7A375\0\n \340`\\7\0\0000\7\2Kwhois_record_delta/registry_data_academy_add_2017_07_31whois_record_delta/FKA03AB7BF7B556202\0\3`Md`;\0\0008\7\2\270whois_record_delta/registry_data_accountants_add_2017_04_13whois_record_delta/FK5DCE39197B556202\0\t\352\21c_:\0\0@\7\3$whois_record_delta/registry_data_accountant_add_2017_07_28whois_record_delta/FK27E110F37B556202\0\4Z\36c_:\5\0H\7\3\220whois_record_delta/registry_data_accountant_add_2017_08_28whois_record_delta/FK27E18552D0C7A375\0\4Z\35^Z5\0\0P\7\3\367whois_record_delta/registry_data_actor_add_2017_07_06whois_record_delta/FK38C0C9FED0C7A375\0\0029\24^Z5\0\0X\7\4^whois_rec"..., 16384, 10462691328) = 16384
2047 03:53:24 pread(12, "\230S\314\376\0\n \354\0\3`J\0\3`M\0\0\6\271\0079B\302E\277\0\0\0\0\0\0\0\0\0\0\0\0\0\7*~\0\200\35\222 ~\0\0\0\5\0\0\0\31\0\0\0\0\246\0\312m\0\0\0\0\0\0\0\0\0\f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\1\0\0\3\24\17infimum\0\t\3\0\10\3\0\0supremum\0J% \0\20\5\2\303whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A9420710653J%$\0\30\5\1{whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A9479B00024J%$\0 \5\0\327whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A947B556202J% \0(\5\1\315whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A94B8CF12D0J% \0000\5\0\205whois_record_delta/#sql-291557_abd422whois_record_delta/FKF3D90A94D0C7A375J% \0008\5#zwhois_record_delta/#sql-291557_abd422whois_record_delta/FKFED1313020710653J%$\0@\5\3gwhois_record_delta/#sql-291557_abd422whois_record_delta/FKFED1313079B00024J%$\0H\5\2qwhois_record_delta/#sql-291557_abd422whois_record_delta/FKFED131307B556202J% \0P\5\3\271whois_record_delta/#sql-291557_abd422whois_record_delta/FKFED13130B8CF12D0J% \0X\5\3\25whois_record_delta/#sql-291557_abd422whois_record_delta/FKFED13130C7212EEFJ%$\0`\5\2\37whois_record_delta/#sql-291557_abd422whois_record_delta/FKFED13130D0C7A"..., 16384, 10875502592) = 16384
2047 03:53:24 pread(12, <unfinished ...>
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [ERROR] Aborting\n\n", 44) = 44
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [Note] WSREP: Signalling cancellation of the SST request.\n", 84) = 84
17707 03:53:24 write(2, "2017-09-02 03:53:24 17707 [Note] WSREP: SST request was cancelled\n", 66) = 66
17707 03:53:24 futex(0x7fb885bdb224, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fb885bdb220, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
------------------

All logs has been attached.

Revision history for this message
Stanislav Gritsenko (stalker2011) wrote :
Revision history for this message
Stanislav Gritsenko (stalker2011) wrote :

have added archive with
- output of dmesg
- output of pt-summary
- output of pt-mysql-summary
- full copy of /var/log/messages (or syslog)
- full copy of /etc/my.cnf and any other cnf file that might be included
- full copy of error logs
- full copy of /var/lib/mysql/innobackup.*.log
- output of rpm -qa |egrep -i "mysql|percona|maria" (or dpkg -l |egrep -i "mysql|percona|maria" |grep "ii")

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

- From the log and output it suggest that SST is still active.
- Did you run out of disk space on JOINER ?

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-2005

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.