Crash during apply with index compaction enabled

Bug #1192834 reported by Raghavendra D Prabhu
Affects              Status        Importance  Assigned to         Milestone
Percona XtraBackup   Won't Fix     High        Sergei Glushchenko
2.1                  Won't Fix     High        Unassigned
2.2                  Won't Fix     High        Sergei Glushchenko
2.3                  Fix Released  High        Sergei Glushchenko
2.4                  Won't Fix     High        Sergei Glushchenko

(Percona XtraBackup moved to https://jira.percona.com/projects/PXB)

Bug Description

When an xtrabackup backup is taken with

[xtrabackup]
compress
compact
parallel=4

and --rebuild-indexes is passed on the other node, it crashes with:

130620 11:16:40 InnoDB: Assertion failure in thread 139853708203776 in file rem0rec.c line 561

Full log here - https://gist.github.com/ronin13/3075e8ad983bf57514ec

With compact disabled, it worked fine.

The data on the source node was generated by this RQG command:

perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=4000:user=root:database=test:password=test --gendata=conf/galera/galera_stress.zz --grammar=conf/galera/galera_stress.yy --threads=5 --queries=10000000000 --duration=30 --seed=time --reporter=Shutdown,Backtrace,QueryTimeout --mask=0 --mask-level=0 --engine=InnoDB

where the galera_*.{yy,zz} are from https://bazaar.launchpad.net/~elenst/randgen/mariadb-patches/revision/881

This was a non-debug build.
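
For reference, the crash can also be hit outside of SST with a plain backup/prepare cycle along these lines (a minimal sketch; the path is a placeholder, and the commands mirror the test cases further down in this report):

backup_dir=/tmp/backup    # placeholder path

# Take a compact backup on the source node...
innobackupex --no-timestamp --compact $backup_dir

# ...then prepare it with index rebuild, as the joiner does during SST.
# The --apply-log --rebuild-indexes step is where the assertion fires.
innobackupex --apply-log --rebuild-indexes $backup_dir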

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Crashed again - the logs here: https://gist.github.com/8603d0ee263de936728a

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Had another crash - this one without partitions - https://gist.github.com/9f3050a1cc1ef7c28d58

tags: added: pxc
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Is the crash in lp:1154495 similar in nature to this? (The crash in that bug is not described there).

Revision history for this message
Frederic Descamps (lefred) wrote :

I've tried to reproduce this, but it works for me without any problems.

OS: CentOS 6.3
PXC: 5.5.31

my.cnf settings:

   [sst]
   streamfmt=xbstream

   [xtrabackup]
   compress
   compact
   parallel=2
   compress-threads=2
   rebuild-threads=2

The https://github.com/lefred/percona-cluster Vagrant environment was used to test it.

Test 1: sysbench running on node1, node3 joins doing SST and chooses node2 as DONOR -> works
Test 2: sysbench running on node1, node2 is down, node3 joins doing SST (node1 is DONOR) -> works

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I tested with an xtrabackup built from bzr (which is newer than the xtrabackup release); however, I will test with the latest code and see how it goes.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I can repeat this with PXB 2.1.4. Prepare log attached.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I have narrowed this down - this happens when binlogging is enabled.

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

binlog_format = STATEMENT is fine here, but binlog_format = ROW is what causes the problem. (Unfortunately, ROW is the only option for PXC.)
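
For anyone trying to reproduce this, the format can be checked and switched with standard statements (nothing bug-specific here):

mysql -e "SHOW GLOBAL VARIABLES LIKE 'binlog_format'"
mysql -e "SET GLOBAL binlog_format = 'ROW'"   # or set it in my.cnf and restart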

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Able to reproduce this with a single node.

Cnf is:

[mysqld]
socket = /pxc/datadir/pxc.sock
port = 4000
basedir=/pxc
datadir=/pxc/datadir
binlog_format = ROW
open_files_limit = 65536
thread_stack = 256K
thread_cache_size = 512
tmp_table_size = 32M
max_heap_table_size = 32M
max_connections = 10000
open-files-limit = 65535
table_open_cache = 8192
table_definition_cache = 8192
key_buffer_size = 64M
innodb_buffer_pool_size = 500M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
wsrep_node_address=127.0.0.1
innodb_log_files_in_group = 2
innodb_log_file_size = 20M
innodb_file_per_table = 1
innodb_file_format = Barracuda

wsrep_sst_receive_address=127.0.0.1:4001
wsrep_node_address=127.0.0.1
wsrep_node_incoming_address=127.0.0.1
wsrep_cluster_address='gcomm://127.0.0.1:5010,127.0.0.1:6010'
wsrep_provider=/pxc/lib/libgalera_smm.so
wsrep_provider_options = "gmcast.listen_addr=tcp://127.0.0.1:4010"

wsrep_slave_threads=2
wsrep_cluster_name=Archie
wsrep_sst_method=xtrabackup
wsrep_node_name=Arch1

innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2

[client]
user=root
password=test
socket = /pxc/datadir/pxc.sock

[xtrabackup]
compact

[sst]
streamfmt=xbstream

Backup Log:

http://paste.wnohang.net/5ed7a2

Prepare log:

http://paste.wnohang.net/cec1a8

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I created a fresh data directory without "innodb_file_format = Barracuda" and was able to repeat it.

Revision history for this message
hardbop200 (hardbop200) wrote :

I'm experiencing the same issue, although the error message is slightly different.

Error message here: https://gist.github.com/8fbf16d257cae44c6805.git

my.cnf here: https://gist.github.com/421a9576295cf6ef844f.git

MySQL version:

mysql Ver 14.14 Distrib 5.5.39-36.0, for debian-linux-gnu (x86_64) using EditLine wrapper

Xtrabackup version:

xtrabackup version 2.2.4 based on MySQL server 5.6.17 Linux (x86_64) (revision id: )

xtrabackup_info:

uuid = 1eef22ed-4a40-11e4-98be-f23c91737845
name =
tool_name = innobackupex
tool_command = --no-timestamp /var/backups/mysql/db-bkup-2014-10-02
tool_version = 1.5.1-xtrabackup
ibbackup_version = xtrabackup version 2.2.4 based on MySQL server 5.6.17 Linux (x86_64) (revision id: )
server_version = 5.5.39-36.0-log
start_time = 2014-10-02 09:22:53
end_time = 2014-10-02 09:26:38
lock_time = 12
binlog_pos = filename 'mysql-bin.000014', position 36303733
innodb_from_lsn = 0
innodb_to_lsn = 679984765165
partial = N
incremental = N
format = file
compact = N
compressed = N
encrypted = N

Revision history for this message
hardbop200 (hardbop200) wrote :

Regarding comment #11, I am able to stop and restart mysql and the backup goes through. (Sorry, I can't figure out how to edit my post.)

Revision history for this message
Skyline Servers (nimk) wrote :

Any bumps needed here or is a solution in sight?

It was throwing the following in syslog:

Nov 24 07:19:25 db02 mysqld: WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql/ (20141124 07:19:25.536)
Nov 24 07:19:25 db02 mysqld: WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20141124 07:19:25.547)
Nov 24 07:19:37 db02 mysqld: WSREP_SST: [ERROR] Cleanup after exit with status:1 (20141124 07:19:37.391)
Nov 24 07:19:37 db02 mysqld: WSREP_SST: [INFO] Removing the sst_in_progress file (20141124 07:19:37.407)
Nov 24 07:19:37 db02 mysqld: 2014-11-24 07:19:37 9103 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.100.101.23' --auth 'wsrep_user:' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '9103' '' : 1 (Operation not permitted)
Nov 24 07:19:37 db02 mysqld: 2014-11-24 07:19:37 9103 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
Nov 24 07:19:37 db02 mysqld: 2014-11-24 07:19:37 9103 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
Nov 24 07:19:37 db02 mysqld: 2014-11-24 07:19:37 9103 [ERROR] Aborting

The prepare log would end with something like this, but with many more zeros:

Expanding ./mysql/innodb_table_stats.ibd
Expanding ./mysql/innodb_index_stats.ibd
Expanding ./mysql/slave_master_info.ibd
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = ./
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 2097152
xtrabackup: Starting InnoDB instance for recovery.
xtrabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
InnoDB: Using atomics to ref count buffer pool pages
InnoDB: The InnoDB memory heap is disabled
InnoDB: Mutexes and rw_locks use GCC atomic builtins
InnoDB: Memory barrier is not used
InnoDB: Compressed tables use zlib 1.2.7
InnoDB: Not using CPU crc32 instructions
InnoDB: Initializing buffer pool, size = 100.0M
InnoDB: Completed initialization of buffer pool
InnoDB: Highest supported file format is Barracuda.
InnoDB: The log sequence numbers 882919419 and 882919419 in ibdata files do not match the log sequence number 888968847 in the ib_logfiles!
InnoDB: Database was not shutdown normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages
InnoDB: from the doublewrite buffer...
InnoDB: Last MySQL binlog file position 0 12199785, file name mysql-bin.000004
InnoDB: 128 rollback segment(s) are active.
InnoDB: Waiting for purge to start
InnoDB: Page directory corruption: infimum not pointed to
2014-11-24 07:58:12 7f5d6d7a6700 InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex deadbeef00000000000000000000000000000000000000000000000000000000000000000000434f4d504143545000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Something that I did soon before this all went down, which may or may not be relevant, is that I imported a...


Revision history for this message
Nate (nate-s) wrote :

I've got a reproduction of this on my host:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.2 LTS
Release: 14.04
Codename: trusty

$ xtrabackup --version
xtrabackup version 2.2.10 based on MySQL server 5.6.22 Linux (x86_64) (revision id: )

The log from my creating the backup is here:

https://gist.github.com/njvack/c0d4ea4fc052179461a4

and here's the full log from trying to apply the log:

https://gist.github.com/njvack/747d17f938c2aed27594

Revision history for this message
Nate (nate-s) wrote :

I can't edit the comment on this bug, but it's worth noting that everything is fine if I make the backup without using --compact.

Revision history for this message
Nate (nate-s) wrote :

Also, the my.cnf:

https://gist.github.com/njvack/64fa99e005a11b4ec55c

My binlog_format appears to be STATEMENT.

Revision history for this message
Nickolay Ihalainen (ihanick) wrote :

For the record, the duplicate bug
https://bugs.launchpad.net/percona-xtrabackup/+bug/1444156
contains a patch disabling the purge thread.

The sequence of events is:

1. xtrabackup creates the ibd file with secondary index pages filled with "empty" pages; each empty page starts with deadbeef.
2. xtrabackup starts the InnoDB engine, including the purge thread.
3a. xtrabackup fills the secondary index pages with correct copies.
3b. at the same time as 3a, the purge thread may try to modify a secondary index page.

A correct solution to the problem should include freezing purge and change buffer activity until the expand step finishes its work.

The patch that disables purge will not work correctly with the export option, because undo segment changes have to be applied before exporting a tablespace.
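
To see the first half of that picture, here is a rough check (my own sketch, not part of the patch; the .ibd path is hypothetical): count the 16 KiB pages of an expanded .ibd, e.g. one left behind by a failed prepare, whose first four bytes are the deadbeef filler mentioned above (the same marker visible in the page dump a few comments up).

# Rough sketch (bash + GNU coreutils), assuming the default 16K page size:
ibd=./sakila/payment.ibd        # hypothetical path to an expanded .ibd
page_size=16384
pages=$(( $(stat -c %s "$ibd") / page_size ))
empty=0
for ((p = 0; p < pages; p++)); do
    sig=$(dd if="$ibd" bs=$page_size skip=$p count=1 2>/dev/null | od -An -N4 -tx1 | tr -d ' ')
    [ "$sig" = "deadbeef" ] && empty=$((empty + 1))
done
echo "$empty of $pages pages still carry the deadbeef filler"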

Revision history for this message
Nickolay Ihalainen (ihanick) wrote :
Revision history for this message
Alexey Kopytov (akopytov) wrote :

#1372531 marked as a duplicate of this one.

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Test case #1:

##########################################################################
# Bug 1192834: Crash during apply with index compaction enabled #
##########################################################################

. inc/common.sh

start_server --innodb_file_per_table
load_dbase_schema sakila
load_dbase_data sakila

run_cmd $MYSQL $MYSQL_ARGS -e "DELETE FROM PAYMENT LIMIT 100" sakila

sleep 2

backup_dir="$topdir/backup"

innobackupex --no-timestamp --compact $backup_dir
vlog "Backup created in directory $backup_dir"

stop_server

# Remove datadir
rm -r $mysql_datadir

# Restore sakila

innobackupex --apply-log --rebuild-indexes $backup_dir

vlog "Restoring MySQL datadir"
mkdir -p $mysql_datadir
innobackupex --copy-back $backup_dir

start_server

Crash:

09:06:04 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Thread pointer: 0x118cb9000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
0 xtrabackup 0x000000010fbda9c8 my_print_stacktrace + 72
1 xtrabackup 0x000000011031ed15 handle_fatal_signal + 597
2 libsystem_platform.dylib 0x00007fff826dcf1a _sigtramp + 26
3 xtrabackup 0x0000000110ad9092 plugin_list + 45218
4 libsystem_c.dylib 0x00007fff82ddeb53 abort + 129
5 xtrabackup 0x000000010fc9f4e4 _Z27btr_cur_search_to_nth_levelP12dict_index_tmPK8dtuple_tmmP9btr_cur_tmPKcmP5mtr_t + 4500
6 xtrabackup 0x000000010ff3a103 _ZL17btr_pcur_open_lowP12dict_index_tmPK8dtuple_tmmP10btr_pcur_tPKcmP5mtr_t + 195
7 xtrabackup 0x000000010ff3a706 _Z22row_search_index_entryP12dict_index_tPK8dtuple_tmP10btr_pcur_tP5mtr_t + 166
8 xtrabackup 0x000000010ff324a8 _ZL33row_purge_remove_sec_if_poss_leafP12purge_node_tP12dict_index_tPK8dtuple_t + 344
9 xtrabackup 0x000000010ff315c0 _ZL28row_purge_remove_sec_if_possP12purge_node_tP12dict_index_tPK8dtuple_t + 64
10 xtrabackup 0x000000010ff30bca _ZL18row_purge_del_markP12purge_node_t + 298
11 xtrabackup 0x000000010ff305e5 _ZL21row_purge_record_funcP12purge_node_tPhPK9que_thr_tb + 133
12 xtrabackup 0x000000010ff2fc20 _ZL9row_purgeP12purge_node_tPhP9que_thr_t + 96
13 xtrabackup 0x000000010ff2fa4b _Z14row_purge_stepP9que_thr_t + 459
14 xtrabackup 0x000000010feb7d4e _ZL12que_thr_stepP9que_thr_t + 1022
15 xtrabackup 0x000000010feb73aa _ZL19que_run_threads_lowP9que_thr_t + 298
16 xtrabackup 0x000000010feb710e _Z15que_run_threadsP9que_thr_t + 206
17 xtrabackup ...


Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Test case #2:

##########################################################################
# Bug 1192834: Crash during apply with index compaction enabled #
##########################################################################

. inc/common.sh

start_server --innodb_file_per_table
load_dbase_schema sakila
load_dbase_data sakila

function start_uncomitted_transaction()
{
    run_cmd $MYSQL $MYSQL_ARGS sakila <<EOF
START TRANSACTION;
DELETE FROM payment;
SELECT SLEEP(10000);
EOF
}

start_uncomitted_transaction &
job_master=$!

sleep 2

backup_dir="$topdir/backup"

innobackupex --no-timestamp --compact $backup_dir
vlog "Backup created in directory $backup_dir"

kill -SIGKILL $job_master
stop_server

# Remove datadir
rm -r $mysql_datadir

# Restore sakila

innobackupex --apply-log --rebuild-indexes $backup_dir

vlog "Restoring MySQL datadir"
mkdir -p $mysql_datadir
innobackupex --copy-back $backup_dir

start_server

Crash:

InnoDB: Starting in background the rollback of uncommitted transactions
2015-04-28 16:07:16 10aa60000 InnoDB: Rolling back trx with id 1920, 16049 rows to undo
InnoDB: 128 rollback segment(s) are active.

InnoDB: Progress in percents: 1InnoDB: Waiting for purge to start
2015-04-28 16:07:16 10aa60000 InnoDB: Assertion failure in thread 4473618432 in file btr0cur.cc line 700
InnoDB: Failing assertion: fil_page_get_type(page) == 17855
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
09:07:16 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Thread pointer: 0x10aa60000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
0 xtrabackup 0x0000000101cab9c8 my_print_stacktrace + 72
1 xtrabackup 0x00000001023efd15 handle_fatal_signal + 597
2 libsystem_platform.dylib 0x00007fff826dcf1a _sigtramp + 26
3 ??? 0x0000000000000005 0x0 + 5
4 libsystem_c.dylib 0x00007fff82ddeb53 abort + 129
5 xtrabackup 0x0000000101d704e4 _Z27btr_cur_search_to_nth_levelP12dict_index_tmPK8dtuple_tmmP9btr_cur_tmPKcmP5mtr_t + 4500
6 xtrabackup 0x000000010200b103 _ZL17btr_pcur_open_lowP12dict_index_tmPK8dtuple_tmmP10btr_pcur_tPKcmP5mtr_t + 195
7 xtrabackup 0x000000010200b706 _Z22row_search_index_entryP12dict_index_tPK8dtuple_tmP10btr_pcur_tP5mtr_t + 166
8 xtrabackup 0x...


tags: added: i66437
Revision history for this message
Andrew Garner (abg) wrote :

FWIW, this bug also affected openstack-ansible where they have been using the compact option for Galera SST for some time:

https://bugs.launchpad.net/openstack-ansible/+bug/1590166

I don't think this compact option makes much sense for Galera SST, but eh.

I saw this bug was mentioned in the 2.4.0-rc release notes:

https://www.percona.com/doc/percona-xtrabackup/2.4/release-notes/2.4/2.4.0-rc1.html

But it seems to affect all current releases (apparently openstack-ansible is using XB 2.2.13, and I encountered this under XB 2.3.5), so some clarification in the documentation about when this feature is appropriate would be useful until a fix is implemented.
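
Until then, the workaround already noted in this thread is to drop compact from the [xtrabackup] section used for SST on the donor, e.g. (mirroring the options from the original report, minus compact):

   [xtrabackup]
   compress
   parallel=4
   # 'compact' removed - the apply/--rebuild-indexes step only crashes when
   # the backup was taken with compact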

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

This bug is mentioned in 2.4.0-rc1 release notes with the following statement:
"Compact Backups currently don’t work due to bug #1192834."

In the subsequent release notes there is no mention of whether compact backups work in 2.4 or not. So do they work now, or does the above statement still apply? I also see that this bug has been open for almost 4 years.

Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

So it looks like compact backups still don't work. I can consistently reproduce crashes of compact backups:

InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: xtrabackup: Last MySQL binlog file position 663425927, file name mysql-bin.000117
InnoDB: Creating shared tablespace for temporary tables
InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
InnoDB: File './ibtmp1' size is now 12 MB.
InnoDB: 96 redo rollback segment(s) found. 1 redo rollback segment(s) are active.
InnoDB: 32 non-redo rollback segment(s) are active.
InnoDB: page_cleaner: 1000ms intended loop took 48773ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)
Starting 4 threads to rebuild indexes.
[01] Checking if there are indexes to rebuild in table some_meta/meta_data (space id: 6)
[04] Checking if there are indexes to rebuild in table teststore_main/like_a_boss_buffer (space id: 9)
[03] Checking if there are indexes to rebuild in table teststore_main/like_a_boss_buffer (space id: 6155)
[03] Found index uc_key
[03] Found index updated_idx
[03] Rebuilding 2 index(es).
06:00:54 UTC - xtrabackup got signal 11 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
/usr/bin/xtrabackup(my_print_stacktrace+0x2c)[0xce46fc]
/usr/bin/xtrabackup(handle_fatal_signal+0x261)[0x9d0991]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8f0)[0x7fbe4312f8f0]
/usr/bin/xtrabackup(_Z22row_merge_create_indexP5trx_tP12dict_table_tPK11index_def_tPK16dict_add_v_col_t+0x95)[0x8f6aa5]
/usr/bin/xtrabackup[0x729d75]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80ca)[0x7fbe431280ca]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fbe40e0d8ad]

Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

tags: added: i183230
Revision history for this message
Ovais Tariq (ovais-tariq) wrote :

Any updates here?

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

2.3 will likely be fixed in the next release. 2.4 will likely not allow taking compact backups at all.

Revision history for this message
Joilson Cardoso (joilson.cardoso) wrote :

Hey, we got the same error. We'd like to know an estimated time for this to be fixed for Linux CentOS.

regards,
Joilson Cardoso

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

--backup --compact has been disabled in 2.4: https://github.com/percona/percona-xtrabackup/pull/392. All related options will be removed in the next major release.

Changed in percona-xtrabackup:
status: Fix Committed → Won't Fix
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-372
