Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Comment 5 for bug 1630313

Wagner Bianchi (wagnerbianchijr) wrote on 2017-06-26:

I recently hit the same issue after starting an SST to join a new server to a one-node cluster. The background is that the cluster crashed after a suboptimal migration from 5.5 to 10.1, and we also suspect that data corruption occurred during that migration.

The error below occurred in the middle of the prepare phase, after a 17-hour SST (8TB of data):

InnoDB: Progress in percent: 0 1 InnoDB: Resetting invalid page [page id: space=0, page number=5] type 0 to 7 when flushing.
InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex 80e7a79000000006000000000000000000000d47276...
InnoDB: End of page dump
InnoDB: Uncompressed page, stored checksum in field1 2162665360, calculated checksums for field1: crc32 3723529258/2303307544, innodb 2162665360, none 3735928559, stored checksum in field2 1805232181, calculated checksums for field2: crc32 3723529258/2303307544, innodb 1805232181, none 3735928559, page LSN 3399 661564699, low 4 bytes of LSN at page end 661564699, page number (if stored to page already) 6, space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be a freshly allocated page
[FATAL] InnoDB: Apparent corruption of an index page [page id: space=0, page number=6] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
2017-06-24 03:07:27 0x7f076dffb700 InnoDB: Assertion failure in thread 139669886973696 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
00:07:27 UTC - xtrabackup got signal 6 ;
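
As a cross-check of the log above, the following is a minimal Python sketch that decodes the first 20 bytes of the quoted page dump and compares them with the values InnoDB printed. It assumes the usual InnoDB FIL header layout (4-byte checksum, 4-byte page number, two 4-byte neighbour page pointers, then the LSN, all big-endian). The function names and dictionary keys are made up for illustration, and the byte offset only holds if space 0 is a single data file with 16384-byte pages. The remainder of the dump is elided in the log, so nothing past byte 20 is decoded.

```python
import re
import struct

# First 20 bytes of the hex dump quoted in the log; the rest is elided there.
HEX_PREFIX = (
    "80e7a790"  # bytes 0-3
    "00000006"  # bytes 4-7
    "00000000"  # bytes 8-11
    "00000000"  # bytes 12-15
    "00000d47"  # bytes 16-19
)

FATAL_LINE = (
    "[FATAL] InnoDB: Apparent corruption of an index page "
    "[page id: space=0, page number=6] to be written to data file."
)

# Taken from "Page dump in ascii and hex (16384 bytes)".
PAGE_SIZE = 16384


def decode_fil_header_prefix(hex_prefix):
    """Decode the leading FIL header fields (assumed layout, big-endian)."""
    raw = bytes.fromhex(hex_prefix)
    checksum, page_no, prev_page, next_page, lsn_high = struct.unpack(
        ">5I", raw[:20]
    )
    return {
        "checksum": checksum,
        "page_number": page_no,
        "prev_page": prev_page,
        "next_page": next_page,
        "lsn_high": lsn_high,  # upper 4 bytes of the 8-byte page LSN
    }


def parse_page_id(line):
    """Extract (space id, page number) from an InnoDB 'page id' message."""
    match = re.search(r"space=(\d+), page number=(\d+)", line)
    if match is None:
        raise ValueError("no page id found in line")
    return int(match.group(1)), int(match.group(2))


header = decode_fil_header_prefix(HEX_PREFIX)
space_id, page_no = parse_page_id(FATAL_LINE)

# Byte offset of the page, assuming space 0 is one contiguous data file.
offset = page_no * PAGE_SIZE

print(header)
print(space_id, page_no, offset)
```

The decoded checksum, page number and upper LSN word agree with the "stored checksum in field1", "page number (if stored to page already)" and "page LSN" values in the log, so the dump header is at least self-consistent with the page id in the fatal message.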

Some facts here:

- before starting new JOINER nodes, we ran mysql_upgrade on the donor, the only node still alive;
- we have pc.weight configured for one master and many slaves, so that a crashing joiner node does not affect the primary component on the surviving node;
- all nodes are now running the same version, MariaDB 10.1.23;
- we added pigz as the compressor for SST;
- we have fixed some corrupted indexes before.

Given this, I think the cause could be one of the following:

- a page size mismatch between row-compressed tables and the way xtrabackup performs apply-log, as mentioned by RStrace on #1630313. I don't yet see a way to make this work better, or even to influence how xtrabackup generates its own my.cnf (backup-my.cnf);

- a corrupted index that goes unnoticed on the live database but, when scanned by innobackupex, triggers an assertion and finally signal 6.

Some questions:

Is there a way to identify the problematic index from the error above? Is there any other way to check for corruption on the InnoDB side, besides running innochecksum over 10TB of data?
