XtraBackup 2.4 crashes during prepare stage of SST with MariaDB Galera Cluster 10.0

Bug #1630313 reported by Geoff Montee
This bug affects 1 person
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

A user running MariaDB Galera Cluster 10.0.27 sees the following crash during the prepare stage of every SST when using Percona XtraBackup 2.4.4:

InnoDB: End of page dump
InnoDB: Uncompressed page, stored checksum in field1 1053245982, calculated checksums for field1: crc32 617549409/1329081819, innodb 1053245982, none 3735928559, stored checksum in field2 2978102621, calculated checksums for field2: crc32 617549409/1329081819, innodb 2978102621, none 3735928559, page LSN 5636 1857813411, low 4 bytes of LSN at page end 1857813411, page number (if stored to page already) 5, space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be a freshly allocated page
[FATAL] InnoDB: Apparent corruption of an index page [page id: space=0, page number=5] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
2016-09-20 11:04:48 0x7fed96eba740 InnoDB: Assertion failure in thread 140658416002880 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
11:04:48 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(my_print_stacktrace+0x2c)[0xcb280c]
innobackupex(handle_fatal_signal+0x261)[0xaecee1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fed96a998d0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fed94d93067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fed94d94448]
innobackupex[0x713436]
innobackupex(_ZN2ib5fatalD1Ev+0x15d)[0x9315dd]
innobackupex[0x97be8a]
innobackupex(_Z27buf_dblwr_write_single_pageP10buf_page_tb+0x698)[0x97d288]
innobackupex(_Z14buf_flush_pageP10buf_pool_tP10buf_page_t11buf_flush_tb+0x77d)[0x7d96bd]
innobackupex(_Z30buf_flush_single_page_from_LRUP10buf_pool_t+0x1d3)[0x7dcfe3]
innobackupex(_Z22buf_LRU_get_free_blockP10buf_pool_t+0x221)[0xd35051]
innobackupex(_Z22buf_page_init_for_readP7dberr_tmRK9page_id_tRK11page_size_tm+0x17d)[0x8fdb7d]
innobackupex[0xd43539]
innobackupex(_Z19buf_read_recv_pagesbmPKmm+0x453)[0xd45023]
innobackupex(_Z26recv_apply_hashed_log_recsm+0xccb)[0x91674b]
innobackupex[0x918ba8]
innobackupex(_Z35recv_recovery_from_checkpoint_startm+0x1994)[0x91a8c4]
innobackupex(_Z34innobase_start_or_create_for_mysqlv+0x3934)[0x955484]
innobackupex[0x709c3d]
innobackupex[0x70bc76]
innobackupex(main+0xc2f)[0x71c02f]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fed94d7fb45]
innobackupex[0x7304c4]

Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

Reverting to Percona XtraBackup 2.2 fixed the problem for this user.

I have not yet been able to reproduce the problem myself.
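
For anyone else trying to reproduce: during an xtrabackup-based SST, the prepare stage on the joiner boils down to something like the following (a sketch; the .sst path and --use-memory value are assumptions, not taken from the user's configuration):

innobackupex --apply-log --use-memory=1G /var/lib/mysql/.sst

The crash above occurs while this apply-log step replays the redo log.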

Tags: mariadb
Revision history for this message
Geoff Montee (geoff-montee) wrote :

The user seeing this crash also reported that unsetting log_bin stopped the crash when using Percona XtraBackup 2.4.4. The user had the following log_bin settings:

log_bin = /var/log/mysql/mariadb-bin
log_bin_index = /var/log/mysql/mariadb-bin.index
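
In other words, the workaround amounted to commenting these out (disabling binary logging) before the backup was taken. A sketch of the workaround, not a recommendation, since the binary log is usually needed for replication and point-in-time recovery:

# workaround sketch: binary logging disabled
#log_bin = /var/log/mysql/mariadb-bin
#log_bin_index = /var/log/mysql/mariadb-bin.index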

Revision history for this message
Richard (richard-stracke) wrote :

I assume this bug is also related to

https://bugs.launchpad.net/percona-xtrabackup/+bug/1528271

xtrabackup: xtrabackup_logfile detected: size=4099145728, start_lsn=(24208263937616)
xtrabackup: using the following InnoDB configuration for recovery:
xtrabackup: innodb_data_home_dir = .
xtrabackup: innodb_data_file_path = ibdata1:512M:autoextend
xtrabackup: innodb_log_group_home_dir = .
xtrabackup: innodb_log_files_in_group = 1
xtrabackup: innodb_log_file_size = 4099145728

which differs from the backup-my.cnf created by xtrabackup 2.4:

# This MySQL options file was generated by innobackupex.

# The MySQL server
[mysqld]
innodb_checksum_algorithm=innodb
innodb_log_checksum_algorithm=innodb
innodb_data_file_path=ibdata1:512M:autoextend
innodb_log_files_in_group=4
innodb_log_file_size=943718400
innodb_fast_checksum=false
innodb_page_size=16384
innodb_log_block_size=512
innodb_undo_directory=.
innodb_undo_tablespaces=0
server_id=169119664

redo_log_version=0

I assume the root cause is that the wrong innodb_log_file_size is used during recovery.
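
The mismatch is easy to confirm with standard tools (a sketch; $BACKUP_DIR is an assumed placeholder for the backup's --target-dir):

# redo settings recorded from the server at backup time
grep -E '^innodb_log_file(s_in_group|_size)' "$BACKUP_DIR/backup-my.cnf"
# the consolidated log file that the prepare step actually recovers from
ls -l "$BACKUP_DIR/xtrabackup_logfile"

Here backup-my.cnf records 4 x 943718400 = 3774873600 bytes of redo capacity, while xtrabackup_logfile is 4099145728 bytes, and prepare adopts the latter as its innodb_log_file_size.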

Perhaps this is a follow-up to the recently fixed

https://bugs.launchpad.net/percona-xtrabackup/+bug/1517629

Richard Stracke

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Possibly a duplicate of bug 1641426.

tags: added: mariadb
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Will mark it as a duplicate. If the issue still occurs, please feel free to reopen.

Revision history for this message
Wagner Bianchi (wagnerbianchijr) wrote :

I recently hit the same issue after starting an SST to join a new server to a one-node cluster. The background is that the cluster crashed after a suboptimal migration from 5.5 to 10.1, and we suspect that data corruption happened somewhere along the way.

The error below happened in the middle of the prepare phase, after a 17-hour SST (it's 8 TB of data):

InnoDB: Progress in percent: 0 1 InnoDB: Resetting invalid page [page id: space=0, page number=5] type 0 to 7 when flushing.
InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex 80e7a79000000006000000000000000000000d47276...
InnoDB: End of page dump
InnoDB: Uncompressed page, stored checksum in field1 2162665360, calculated checksums for field1: crc32 3723529258/2303307544, innodb 2162665360, none 3735928559, stored checksum in field2 1805232181, calculated checksums for field2: crc32 3723529258/2303307544, innodb 1805232181, none 3735928559, page LSN 3399 661564699, low 4 bytes of LSN at page end 661564699, page number (if stored to page already) 6, space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be a freshly allocated page
[FATAL] InnoDB: Apparent corruption of an index page [page id: space=0, page number=6] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
2017-06-24 03:07:27 0x7f076dffb700 InnoDB: Assertion failure in thread 139669886973696 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
00:07:27 UTC - xtrabackup got signal 6 ;

Some facts here:

 - prior to starting the new JOINER nodes, we ran mysql_upgrade on the donor, the only alive node;
 - we have pc.weight configured for the one master and the many slaves, to avoid affecting the primary component on the alive node in case joiner nodes crash;
 - all nodes are now running the same version, MariaDB 10.1.23;
 - we added pigz as the compressor for SST;
 - we had fixed some corrupted indexes before.

Given this, I think it could be:

 - a page size mismatch between the database's row-compressed tables and the way the apply-log step is done by xtrabackup, as mentioned by Richard Stracke on #1630313; here I don't yet see a way to make it work better, or even to influence the way xtrabackup creates its own my.cnf (aka backup-my.cnf);

 - a corrupted index that is silently tolerated on the alive database but, when scanned by innobackupex, leads to an assertion and finally signal 6;

Some questions:

Is there a way to identify the problematic index from the above error? And is there any other way to find out whether something is corrupted on the InnoDB side, short of running innochecksum against 10 TB of data?
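
(For reference: the failing page in both tracebacks is in space=0, the system tablespace, so the error points at ibdata1 rather than at a user table's .ibd file. A targeted check of just those pages might look like the sketch below; innochecksum must run while mysqld is shut down, and the ibdata1 path is an assumption.)

# check only the pages named in the crash reports (system tablespace)
innochecksum -s 5 -e 6 /var/lib/mysql/ibdata1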

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Hi Wagner,

There is bug 1672374 about the 6th page being uninitialized; the patch I am working on will reset the page type for a few more predefined pages.
