PXC crashes while running pt-table-checksum

Bug #1618097 reported by Marc A. Mueller
This bug affects 1 person
Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have a PXC of 4 nodes. The nodes are Docker containers based on CentOS 7, running:
Percona-XtraDB-Cluster-56.x86_64 1:5.6.30-25.16.1.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-client-56.x86_64 1:5.6.30-25.16.1.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-galera-3.x86_64 3.16-1.rhel7 @percona-release-x86_64
Percona-XtraDB-Cluster-server-56.x86_64 1:5.6.30-25.16.1.el7 @percona-release-x86_64
Percona-XtraDB-Cluster-shared-56.x86_64 1:5.6.30-25.16.1.el7 @percona-release-x86_64
percona-release.noarch 0.1-3 @/percona-release-0.1-3.noarch
percona-xtrabackup.x86_64 2.3.5-1.el7 @percona-release-x86_64

NODE3 and NODE4 are running on the same Docker host. NODE4 is configured as a slave, but the slave had not been started yet. While running
  pt-table-checksum --host=NODE3 --user=root --password=$pw --recursion-method=cluster
NODE4 crashed with the following output:

2016-08-29 12:09:50 1 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.168.0.3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2016-08-29 12:11:36 1 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='192.168.0.3', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='192.168.0.3', master_port= 3306, master_log_file='mysql-bin.000074', master_log_pos= 844022, master_bind=''.
2016-08-29 14:11:22 1 [Warning] IP address '192.168.3.2' could not be resolved: Name or service not known
InnoDB: Page directory corruption: infimum not pointed to
2016-08-29 14:11:57 7f42347f8700 InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex [32768x'0', not printed here]
InnoDB: End of page dump
2016-08-29 14:11:57 7f42347f8700 InnoDB: uncompressed page, stored checksum in field1 0, calculated checksums for field1: crc32 536728786, innodb 1575996416, none 3735928559, stored checksum in field2 0, calculated checksums for field2: crc32 536728786, innodb 1371122432, none 3735928559, page LSN 0 0, low 4 bytes of LSN at page end 0, page number (if stored to page already) 0, space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be a freshly allocated page
InnoDB: Page directory corruption: supremum not pointed to
2016-08-29 14:11:57 7f42347f8700 InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex [32768x'0', not printed here]
InnoDB: End of page dump
2016-08-29 14:11:58 7f42347f8700 InnoDB: uncompressed page, stored checksum in field1 0, calculated checksums for field1: crc32 536728786, innodb 1575996416, none 3735928559, stored checksum in field2 0, calculated checksums for field2: crc32 536728786, innodb 1371122432, none 3735928559, page LSN 0 0, low 4 bytes of LSN at page end 0, page number (if stored to page already) 0, space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be a freshly allocated page
14:11:58 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster

key_buffer_size=524288
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=4
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61454 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f4200000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f42347f79a0 thread_stack 0x40000
mysqld(my_print_stacktrace+0x3b)[0x90b6fb]
mysqld(handle_fatal_signal+0x471)[0x67bee1]
/usr/lib64/libpthread.so.0(+0xf100)[0x7f4280b47100]
mysqld[0x98f726]
mysqld[0xa3f81a]
mysqld[0x9e5619]
mysqld[0x9257d9]
mysqld(_ZN7handler13ha_index_nextEPh+0x6d)[0x5b9d3d]
mysqld(_ZN7handler15read_range_nextEv+0x20)[0x5be530]
mysqld(_ZN7handler21multi_range_read_nextEPPc+0xb2)[0x5b5252]
mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x5a)[0x81308a]
mysqld[0x836b0d]
mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x101)[0x6def21]
mysqld(_ZN4JOIN4execEv+0x2e8)[0x6de148]
mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_P10SQL_I_ListI8st_orderESB_S7_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x275)[0x72c7a5]
mysqld(_Z13handle_selectP3THDP13select_resultm+0x165)[0x72d005]
mysqld(_Z21mysql_execute_commandP3THD+0x5ffe)[0x7076ae]
mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x618)[0x7092a8]
mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x614)[0x896c74]
mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x6a)[0x894faa]
mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x628)[0x5b0cd8]
/usr/lib64/galera3/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xd5)[0x7f426237dee5]
/usr/lib64/galera3/libgalera_smm.so(+0x1f5730)[0x7f42623bb730]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM9apply_trxEPvPNS_9TrxHandleE+0xd4)[0x7f42623be164]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM11process_trxEPvPNS_9TrxHandleE+0x10e)[0x7f42623c13de]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x1b8)[0x7f426239d738]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x57)[0x7f426239eed7]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x7b)[0x7f42623c195b]
/usr/lib64/galera3/libgalera_smm.so(galera_recv+0x1d)[0x7f42623d17dd]
mysqld[0x5b1834]
mysqld(start_wsrep_THD+0x293)[0x595c73]
/usr/lib64/libpthread.so.0(+0x7dc5)[0x7f4280b3fdc5]
/usr/lib64/libc.so.6(clone+0x6d)[0x7f427ed90ced]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f4200008fd0): is an invalid pointer
Connection ID (thread ID): 3
Status: NOT_KILLED

You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
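
Several mysqld frames in the backtrace above carry only an address and no symbol name (for example mysqld[0x98f726]), which makes the innermost part of the crash hard to read. The following is a minimal sketch, not part of the original report, of how the frame lines could be split into module, symbol, offset and address so that the unresolved addresses can be collected. The names FRAME_RE, parse_frame and unresolved_addresses are hypothetical, and the line format is assumed to be exactly the one shown in the log.

```python
import re

# Matches backtrace frames in the three shapes seen in the log:
#   mysqld(_ZN7handler13ha_index_nextEPh+0x6d)[0x5b9d3d]
#   /usr/lib64/libpthread.so.0(+0xf100)[0x7f4280b47100]
#   mysqld[0x98f726]
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    # Return a dict describing one frame, or None if the line is not a frame.
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),
        # An empty or missing symbol is normalised to None.
        "symbol": match.group("symbol") or None,
        "offset": match.group("offset"),
        "address": match.group("address"),
    }


def unresolved_addresses(lines, module="mysqld"):
    # Collect addresses of frames in `module` that have no symbol name.
    addresses = []
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["module"] == module and frame["symbol"] is None:
            addresses.append(frame["address"])
    return addresses


if __name__ == "__main__":
    sample = [
        "mysqld(handle_fatal_signal+0x471)[0x67bee1]",
        "mysqld[0x98f726]",
        "mysqld[0xa3f81a]",
        "mysqld(_ZN7handler13ha_index_nextEPh+0x6d)[0x5b9d3d]",
    ]
    for address in unresolved_addresses(sample):
        print(address)
```

The collected addresses could then be passed to a symbolizer such as addr2line together with a matching, unstripped mysqld binary, assuming one is available for this exact build.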

Revision history for this message
Krunal Bauskar (krunal-bauskar) wrote :

If I understand your setup correctly:

Setup:
-----

You have 4 nodes (3 PXC nodes and 1 independent slave using PXC binaries with wsrep_provider=none).
n1, n2, n3 (pxc-node)
n4 (independent slave configured to replicate from pxc-cluster-node n3).

Process:
-------
Cluster nodes are active and running, and you can execute the needed workload on the cluster nodes without any issues. Eventually you introduce n4.

a. How do you sync n4 from n3, or do you let it catch up on its own?
b. At what stage do you run pt-table-checksum? (After n4 has caught up completely?)
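
Question b can be answered mechanically from the output of SHOW SLAVE STATUS on n4. The sketch below is an illustration rather than anything posted in this thread: it assumes the vertical (\G) output has been captured as text, and the function names parse_slave_status and slave_caught_up are hypothetical. Slave_IO_Running, Slave_SQL_Running and Seconds_Behind_Master are the standard MySQL 5.6 field names; a Seconds_Behind_Master of 0 is only a rough indicator that the slave has caught up.

```python
def parse_slave_status(text):
    # Turn the "Key: value" lines of vertical SHOW SLAVE STATUS output
    # into a dict. Row separator lines contain no colon and are skipped.
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def slave_caught_up(status):
    # Both replication threads must be running and the reported lag must be 0.
    # A stopped slave reports Seconds_Behind_Master as NULL, which fails here.
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
        and status.get("Seconds_Behind_Master") == "0"
    )


if __name__ == "__main__":
    # Illustrative output for a slave that is configured but not started.
    captured = """*************************** 1. row ***************************
                  Master_Host: 192.168.0.3
             Slave_IO_Running: No
            Slave_SQL_Running: No
        Seconds_Behind_Master: NULL
"""
    print(slave_caught_up(parse_slave_status(captured)))
```

With a slave that was configured but never started, as in the original description, both thread fields read No and the check reports that the slave has not caught up.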

Changed in percona-xtradb-cluster:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona XtraDB Cluster because there has been no activity for 60 days.]

Changed in percona-xtradb-cluster:
status: Incomplete → Expired
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1924
