intermittent assertion failure in compact backup prepare

Bug #1372531 reported by David Bennett
Affects             Status   Importance  Assigned to     Milestone
Percona XtraBackup  Triaged  Medium      Alexey Kopytov
  (moved to https://jira.percona.com/projects/PXB)
2.1                 Triaged  Medium      Alexey Kopytov
2.2                 Triaged  Medium      Alexey Kopytov
2.3                 Triaged  Medium      Alexey Kopytov

Bug Description

While running SST MTR tests with production builds, the XtraBackup prepare stage of a PXC SST transfer intermittently fails with a SIGABRT while rebuilding an index, which causes the SST transfer to fail.

==== Platform: CentOS 5 x86_64

==== Binaries (from Jenkins production builds):

  Percona-XtraDB-Cluster-5.6.20-rel68.0-25.7.886.Linux.x86_64.tar.gz (from Jenkins)
  percona-xtrabackup-2.2.4-5022-debug-Linux-x64_64.tar.gz (revno 5022)
     (PXB compiled with -DCMAKE_CXX_FLAGS=-m64 -DCMAKE_C_FLAGS=-m64 -DWITH_DEBUG=1)

==== MTR test: t/xb_galera_sst_advanced.sh

==== Configuration

  # parallel+compact+progressfile+time
  [xtrabackup]
  parallel=4
  compact

  [sst]
  time=1
  streamfmt=xbstream
  progress=/tmp/progress2-conf6.log

==== Pertinent MTR test output (xb_galera_sst_advancedconf6)

...
2014-09-21 22:06:54: run.sh: Made 29 attempts to connect to server
2014-09-21 22:06:55: run.sh: Made 30 attempts to connect to server
2014-09-21 22:06:56: run.sh: Server process PID=2749 died.
2014-09-21 22:06:56: run.sh: Can't start the server. Server log (if exists):
2014-09-21 22:06:56: run.sh: ----------------
2014-09-21 22:06:56: run.sh: Error log for server with id: 1
...
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/home/dbennett/work/2/Percona-XtraDB-Cluster-5.6.20-rel68.0-25.7.886.Linux.x86_64/percona-xtradb-cluster-tests/var/w1/var901/my.cnf --apply-log $rebuildcmd ${DATA} &>/home/dbennett/logs/innobackup.14092122061411351585.prepare.log (20140921 22:06:43.525)
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20140921 22:06:46.433)
WSREP_SST: [INFO] Removing the sst_in_progress file (20140921 22:06:46.435)
2014-09-21 22:06:46 2749 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '127.0.0.1:19320' --auth 'root:password' --datadir '/home/dbennett/work/2/Percona-XtraDB-Cluster-5.6.20-rel68.0-25.7.886.Linux.x86_64/percona-xtradb-cluster-tests/var/w1/var901/data/' --defaults-file '/home/dbennett/work/2/Percona-XtraDB-Cluster-5.6.20-rel68.0-25.7.886.Linux.x86_64/percona-xtradb-cluster-tests/var/w1/var901/my.cnf' --parent '2749' '' : 1 (Operation not permitted)
2014-09-21 22:06:46 2749 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2014-09-21 22:06:46 2749 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2014-09-21 22:06:46 2749 [ERROR] Aborting
...

==== Pertinent XtraBackup stderr log output

...
[01] Checking if there are indexes to rebuild in table sbtest/sbtest1 (space id: 6)
[01] Found index k_1
[01] Rebuilding 1 index(es).
2014-09-21 22:06:46 7ff4f37fe940 InnoDB: Assertion failure in thread 140690033994048 in file ut0byte.ic line 109
InnoDB: Failing assertion: ptr
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:06:46 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
xtrabackup(my_print_stacktrace+0x32) [0xba05cd]
xtrabackup(handle_fatal_signal+0x335) [0xb57021]
/lib64/libpthread.so.0 [0x7ff507fd3ca0]
/lib64/libc.so.6(gsignal+0x35) [0x7ff50684ffc5]
/lib64/libc.so.6(abort+0x110) [0x7ff506851a70]
xtrabackup [0x7708ed]
xtrabackup [0x77091f]
xtrabackup [0x771add]
xtrabackup [0x77c7d0]
xtrabackup(row_merge_build_indexes(trx_t*, dict_table_t*, dict_table_t*, bool, dict_index_t**, unsigned long const*, unsigned long, TABLE*, dtuple_t const*, unsigned long const*, unsigned long, ib_sequence_t&)+0x419) [0x77dad5]
xtrabackup [0x5f2a46]
xtrabackup [0x5f2d86]
/lib64/libpthread.so.0 [0x7ff507fcb83d]
/lib64/libc.so.6(clone+0x6d) [0x7ff5068f4fcd]

David Bennett (dbpercona) wrote :

MTR test output

David Bennett (dbpercona) wrote :

XtraBackup STDERR output

summary: - XtraBackup compact option causing intermittent SST failure in PXC
+ intermittent SST failure in PXC causing XtraBackup assertion failure
Alexey Kopytov (akopytov) wrote : Re: intermittent SST failure in PXC causing XtraBackup assertion failure

There are some static (and thus, unresolved) functions in the most interesting part of the stacktrace. Can you show the output of:

addr2line -fie /path/to/xtrabackup 0x77091f 0x771add 0x77c7d0
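
As an illustration of which frames count as "unresolved" here, the following is a minimal C++ sketch that picks out backtrace lines printed without a symbol name, so their addresses can be handed to addr2line. The function name `unresolved_frames` and the line format it expects are assumptions based only on the backtrace quoted in this report; it is not part of XtraBackup.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the addresses of frames that were printed
// without a symbol name, e.g. "xtrabackup [0x7708ed]". Frames such as
// "xtrabackup(my_print_stacktrace+0x32) [0xba05cd]" already carry a symbol
// and are skipped, as are frames that belong to other binaries.
std::vector<std::string> unresolved_frames(const std::vector<std::string>& lines,
                                           const std::string& binary) {
    // Expected shape: <module> [0x<hex>] with nothing between the two parts.
    const std::regex frame(R"(^\s*(\S+) \[(0x[0-9a-fA-F]+)\]\s*$)");
    std::vector<std::string> addrs;
    for (const std::string& line : lines) {
        std::smatch m;
        if (std::regex_match(line, m, frame) && m[1].str() == binary) {
            addrs.push_back(m[2].str());
        }
    }
    return addrs;
}
```

Passing the backtrace lines and the module name "xtrabackup" returns only the bare addresses, which can then be appended to an addr2line command line.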

Changed in percona-xtrabackup:
status: New → Incomplete
David Bennett (dbpercona) wrote :

The C++ symbol for row_merge_build_indexes() was mangled, so I added -C to demangle it.

$ addr2line -Cfie ./xtrabackup 0x7708ed 0x77091f 0x771add 0x77c7d0 0x77dad5 0x5f2a46 0x5f2d86
ut_align_down
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/include/ut0byte.ic:113
page_align
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/include/page0page.ic:51
page_cur_is_after_last
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/include/page0cur.ic:141
row_merge_read_clustered_index
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/row/row0merge.cc:1419
row_merge_build_indexes(trx_t*, dict_table_t*, dict_table_t*, bool, dict_index_t**, unsigned long const*, unsigned long, TABLE*, dtuple_t const*, unsigned long const*, unsigned long, ib_sequence_t&)
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/row/row0merge.cc:3534
xb_rebuild_indexes_for_table
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/xtrabackup/src/compact.cc:844
xb_rebuild_indexes_thread_func
/home/dbennett/launchpad/percona-xtrabackup/storage/innobase/xtrabackup/src/compact.cc:916

Alexey Kopytov (akopytov) wrote :

Thanks. Unfortunately, the resolved symbols do not shed any light in this case. They only show that reading the clustered index encounters a null (zero) record pointer where one is not expected.

We need a core file and a copy of the backup directory that triggers the crash to analyze further.
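
For readers unfamiliar with the call chain in the resolved backtrace (page_cur_is_after_last, page_align, ut_align_down), here is a simplified C++ sketch of why a null record pointer ends in "Failing assertion: ptr" in a debug build. The names `align_down`, `page_align_sketch`, and `kPageSize` are hypothetical stand-ins, and the 16 KiB page size is assumed to be the InnoDB default; the real InnoDB code differs in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed page size (InnoDB's default is 16 KiB); hypothetical constant.
constexpr std::size_t kPageSize = 16 * 1024;

// Simplified analogue of ut_align_down(): round a pointer down to a
// power-of-two boundary. The check on `ptr` corresponds to the
// "Failing assertion: ptr" message; with assertions enabled, a null
// pointer makes the process abort (SIGABRT, signal 6).
inline void* align_down(const void* ptr, std::size_t align) {
    assert(align > 0 && (align & (align - 1)) == 0);  // must be a power of two
    assert(ptr != nullptr);
    const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
    return reinterpret_cast<void*>(addr & ~(static_cast<std::uintptr_t>(align) - 1));
}

// Simplified analogue of page_align(): find the start of the page frame
// that contains the given record.
inline void* page_align_sketch(const void* rec) {
    return align_down(rec, kPageSize);
}
```

With a valid record pointer the call simply returns the start of the enclosing page; with a null pointer the second assertion fires before any address arithmetic happens, which matches the abort seen in the log but says nothing about why the pointer was null.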

David Bennett (dbpercona) wrote :

I tried to reproduce the problem from a snapshot of the database taken after the SIGABRT, but the crash did not recur.

I then took a snapshot of the database (using tar) after mysql_install_db ran and before mysqld started, and again could not reproduce the crash.

XtraBackup revno 5024 added a core dump option in addition to the logged thread backtrace. This made it possible to reproduce the crash scenario with a full core dump.

https://drive.google.com/a/percona.com/file/d/0B0C8v83ulmUbc1pHRzlNR1B3elE/edit?usp=sharing

Changed in percona-xtrabackup:
status: Incomplete → Confirmed
assignee: nobody → Alexey Kopytov (akopytov)
Alexey Kopytov (akopytov) wrote :
summary: - intermittent SST failure in PXC causing XtraBackup assertion failure
+ intermittent assertion failure in compact backup prepare
Alexey Kopytov (akopytov) wrote :

Most likely a duplicate of bug #1192834. Please reopen if reproducible with that one fixed.

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-703
