Failed recovery of table if backup was taken while altering tablespace

Bug #1532878 reported by Shahriyar Rzayev
This bug affects 7 people
Affects: Percona XtraBackup (moved to https://jira.percona.com/projects/PXB); status tracked in 2.4
Series 2.4: Status Invalid, Importance Undecided, Assigned to Unassigned

Bug Description

Hi,

Here are the steps to reproduce:

I have this table:

CREATE TABLE `sbtest1` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=1000001 DEFAULT CHARSET=latin1 MAX_ROWS=1000000

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|  4240148 |
+----------+
1 row in set (2,40 sec)

Which you can easily populate using:

sysbench --test=/usr/share/doc/sysbench/tests/db/insert.lua --oltp-test-mode=complex --num-threads=1000 --mysql-db=dbtest --mysql-user=root --mysql-password=Baku12345# --db-driver=mysql --max-requests=0 run

Create General Tablespace:

mysql> create tablespace sbtest1 add datafile '/var/lib/mysql_tablespaces/sbtest1.ibd' engine=innodb;
Query OK, 0 rows affected (0,12 sec)

Then alter sbtest1 table's tablespace:

mysql> alter table sbtest1 tablespace=sbtest1;
Query OK, 0 rows affected (36,78 sec)
Records: 0 Duplicates: 0 Warnings: 0

Of course we want our file-per-table tablespace back. Run the following ALTER in parallel with the backup command:

alter table sbtest1 tablespace=innodb_file_per_table;

./xtrabackup --defaults-file=/etc/my.cnf --backup --datadir=/var/lib/mysql/ --target-dir=/home/backup_dir/full/ --user=root --password=Baku12345# --no-version-check

Then prepare the full backup:

./xtrabackup --defaults-file=/home/backup_dir/full/backup-my.cnf --prepare --target-dir=/home/backup_dir/full

Some interesting output:

InnoDB: Ignoring data file './dbtest/sbtest1.ibd' with space ID 26. Another data file called ./dbtest/#sql-ib41-1284613578.ibd exists with the same space ID.
InnoDB: Ignoring data file './dbtest/#sql-ib41-1284613578.ibd' with space ID 26. Another data file called ./dbtest/sbtest1.ibd exists with the same space ID.
InnoDB: xtrabackup: Last MySQL binlog file position 581966806, file name mysql-bin.000004
InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create directories yourself, InnoDB does not create them.
InnoDB: Cannot open datafile for read-only: './sbtest1.ibd' OS error: 71
InnoDB: Operating system error number 2 in a file operation.
InnoDB: The error means the system cannot find the path specified.
InnoDB: If you are installing InnoDB, remember that you must create directories yourself, InnoDB does not create them.
InnoDB: Could not find a valid tablespace file for `sbtest1`
InnoDB: Shutdown completed; log sequence number 2383808274
160111 12:02:35 completed OK!

Shut down MySQL, move the original /var/lib/mysql/ aside, copy back all files from the backup, and start MySQL:

systemctl stop mysqld.service
mv /var/lib/mysql /var/lib/mysql_original
[root@mysql-57 full]# rsync -avrP * /var/lib/mysql/
[root@mysql-57 ~]# chown -R mysql:mysql /var/lib/mysql/

systemctl start mysqld.service
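
For reference, the equivalent restore with xtrabackup's built-in --copy-back would look roughly like this (a sketch, assuming datadir=/var/lib/mysql in /etc/my.cnf and the prepared backup in /home/backup_dir/full/):

# stop the server and move the old datadir aside, leaving an empty one
systemctl stop mysqld.service
mv /var/lib/mysql /var/lib/mysql_original && mkdir /var/lib/mysql
# xtrabackup copies the prepared backup into the datadir taken from my.cnf
./xtrabackup --defaults-file=/etc/my.cnf --copy-back --target-dir=/home/backup_dir/full/
chown -R mysql:mysql /var/lib/mysql/
systemctl start mysqld.service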

Check the table's data:

mysql> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0,00 sec)

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :
Revision history for this message
Roel Van de Paar (roel11) wrote :

The upstream fix will only come in 5.8.

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Shahriyar,

Can you please check whether, after the fix for bug 1555626, the behaviour changed to a fatal error at the backup stage:

[FATAL] InnoDB: An optimized(without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.
    PXB will not be able take a consistent backup. Retry the backup operation

Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

On PXB 2.4.3 and PS 5.7.12

160716 04:00:07 >> log scanned up to (24765211836553)
160716 04:00:09 >> log scanned up to (24765212279702)
InnoDB: Last flushed lsn: 24765208267313 load_index lsn 24765213489073
[FATAL] InnoDB: An optimized(without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.
    PXB will not be able take a consistent backup. Retry the backup operation
2016-07-16 04:00:10 0x7fb14a101700 InnoDB: Assertion failure in thread 140399428507392 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
04:00:10 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
innobackupex(my_print_stacktrace+0x2c)[0xc4e5ec]
innobackupex(handle_fatal_signal+0x262)[0xa548a2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7fb16f764330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fb16dcc0c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fb16dcc4028]
innobackupex[0x6f5d75]
innobackupex(_ZN2ib5fatalD1Ev+0xee)[0x924afe]
innobackupex[0x901feb]
innobackupex(_Z19recv_parse_log_recsm7store_tb+0x404)[0x906884]
innobackupex[0x718728]
innobackupex[0x71c759]
innobackupex[0x71cc43]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7fb16f75c184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fb16dd8437d]

Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

tags: added: i113961
Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Jericho,

Does it mean the bug is no longer reproducible as reported?

Revision history for this message
Maciek Brynski (maver1ck) wrote :

Hi,
I have a very similar bug.
I'm trying to bootstrap a cluster (2 DB nodes + 1 arbitrator).
During the start of the second server I get the following error in innobackup.backup.log.
What can I do?

161106 11:59:57 version_check Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_group=xtrabackup;port=3306;mysql_socket=/var/run/mysqld/mysqld.sock' as 'dbbackup' (using password: YES).
161106 11:59:57 version_check Connected to MySQL server
161106 11:59:57 version_check Executing a version check against the server...
161106 11:59:57 version_check Done.
161106 11:59:57 Connecting to MySQL server host: localhost, user: dbbackup, password: set, port: 3306, socket: /var/run/mysqld/mysqld.sock
Using server version 5.7.14-8-57-log
xtrabackup version 2.4.4 based on MySQL server 5.7.13 Linux (x86_64) (revision id: df58cf2)
xtrabackup: uses posix_fadvise().
xtrabackup: cd to /data_ssd/mysql
xtrabackup: open files limit requested 10440, set to 10440
xtrabackup: using the following InnoDB configuration:
xtrabackup: innodb_data_home_dir = .
xtrabackup: innodb_data_file_path = ibdata1:12M:autoextend
xtrabackup: innodb_log_group_home_dir = ./
xtrabackup: innodb_log_files_in_group = 4
xtrabackup: innodb_log_file_size = 1073741824
xtrabackup: using O_DIRECT
InnoDB: Number of pools: 1
161106 12:00:00 >> log scanned up to (86550191631785)
xtrabackup: Generating a list of tablespaces
InnoDB: Allocated tablespace ID 954142 for database/table, old maximum was 0
161106 12:00:00 [01] Streaming ./ibdata1
161106 12:00:01 >> log scanned up to (86550191879568)
161106 12:00:02 >> log scanned up to (86550192114963)
161106 12:00:03 >> log scanned up to (86550192957921)
161106 12:00:04 >> log scanned up to (86550193121705)
161106 12:00:05 >> log scanned up to (86550193271731)
161106 12:00:06 >> log scanned up to (86550193497677)
161106 12:00:07 >> log scanned up to (86550193825055)
InnoDB: Last flushed lsn: 86550191221755 load_index lsn 86550194084059
[FATAL] InnoDB: An optimized(without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.
    PXB will not be able take a consistent backup. Retry the backup operation
2016-11-06 12:00:08 0x7fab5d9e6700 InnoDB: Assertion failure in thread 140373986797312 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
12:00:08 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, someth...


Przemek (pmalkowski)
tags: added: i167828
Revision history for this message
Dov (dov.endress) wrote :

Sergei,

This bug is fully reproducible, as noted:

xtrabackup -version
xtrabackup version 2.4.5 based on MySQL server 5.7.13 Linux (x86_64) (revision id: e41c0be)

host (none)> select @@version;
+---------------+
| @@version     |
+---------------+
| 5.7.17-11-log |
+---------------+
1 row in set (0.00 sec)

laslsql10p (none)>

170305 16:00:49 >> log scanned up to (38864632546319)
InnoDB: Last flushed lsn: 38856200245485 load_index lsn 38864780967142
[FATAL] InnoDB: An optimized(without redo logging) DDLoperation has been performed. All modified pages may not have been flushed to the disk yet.
    PXB will not be able take a consistent backup. Retry the backup operation
2017-03-05 16:00:55 0x7f8ef08a4700 InnoDB: Assertion failure in thread 140251897677568 in file ut0ut.cc line 916
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:00:55 UTC - xtrabackup got signal 6 ;
This could be because you hit a bug or data is corrupted.
This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x10000
/usr/bin/innobackupex(my_print_stacktrace+0x2c)[0xceaf1c]
/usr/bin/innobackupex(handle_fatal_signal+0x262)[0x9ca752]
/lib64/libpthread.so.0(+0xf370)[0x7f8f24920370]
/lib64/libc.so.6(gsignal+0x37)[0x7f8f226201d7]
/lib64/libc.so.6(abort+0x148)[0x7f8f226218c8]
/usr/bin/innobackupex[0x6f2e09]
/usr/bin/innobackupex(_ZN2ib5fatalD1Ev+0xee)[0x8982de]
/usr/bin/innobackupex[0x83454b]
/usr/bin/innobackupex(_Z19recv_parse_log_recsm7store_tb+0x404)[0x838d74]
/usr/bin/innobackupex[0x71d430]
/usr/bin/innobackupex[0x720739]
/usr/bin/innobackupex[0x720c23]
/lib64/libpthread.so.0(+0x7dc5)[0x7f8f24918dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f8f226e273d]

Please report a bug at https://bugs.launchpad.net/percona-xtrabackup

root@DC> grep 'xtrabackup got signal 6' xtrabackup-20170*
xtrabackup-20170203.log:08:40:02 UTC - xtrabackup got signal 6 ;
xtrabackup-20170208.log:07:29:24 UTC - xtrabackup got signal 6 ;
xtrabackup-20170209.log:08:17:19 UTC - xtrabackup got signal 6 ;
xtrabackup-20170210.log:08:40:02 UTC - xtrabackup got signal 6 ;
xtrabackup-20170210.log:10:18:40 UTC - xtrabackup got signal 6 ;
xtrabackup-20170211.log:08:33:44 UTC - xtrabackup got signal 6 ;
xtrabackup-20170211.log:12:46:51 UTC - xtrabackup got signal 6 ;
xtrabackup-20170214.log:07:49:45 UTC - xtrabackup got signal 6 ;
xtrabackup-20170215.log:07:47:37 UTC - xtrabackup got signal 6 ;
xtrabackup-20170216.log:07:31:58 UTC - xtrabackup got signal 6 ;
xtrabackup-2017022...


Revision history for this message
Aurimas Mikalauskas (aurimas-mikalauskas) wrote :

Hello, fellas. I'm hitting the same bug, but in different circumstances. The master is creating and dropping explicit InnoDB temporary tables quite often, and if I take a backup from the slave while replication is running, I keep hitting this issue.

The issue only went away when I stopped replication. So I ended up setting replicate-wild-ignore-table for these tables, which worked out well in my case. But I wonder what the long-term solution is. As I understand it, any DDL performed while the backup is running will cause this assertion failure, and it's not that unusual to have DDLs running while taking a backup, no?
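
For illustration, the replication filter I mean looks roughly like this in the slave's my.cnf (a sketch; dbtest.tmp_% is a hypothetical pattern standing in for whatever temporary tables the master creates):

[mysqld]
# do not replicate the explicit temporary tables the master keeps creating and dropping
replicate-wild-ignore-table=dbtest.tmp_%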

Thanks!

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Aurimas,

As a workaround you can try SET GLOBAL old_alter_table=1 before you start the backup and restore the original value afterwards. The underlying issue is that MySQL 5.7 skips redo logging for some DDLs. When xtrabackup detects this, it aborts the backup to prevent creating a corrupted one. The only available workaround is to make the server stop skipping redo logging for DDLs, either by forcing ALGORITHM=COPY or by avoiding such DDLs.
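
A minimal sketch of that workaround (the original value is assumed to be 0 here):

mysql> SET GLOBAL old_alter_table=1;  -- ALTERs started from now on fall back to the COPY algorithm
-- ... run the backup from another session ...
mysql> SET GLOBAL old_alter_table=0;  -- restore the previous value once the backup is done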

tags: added: i182889
Revision history for this message
Suresh Durgappa (sureshlaz) wrote :

Hi Sergei,

Thanks for the workaround, but apparently it did not work when we tested it. Maybe there are some DDLs executed by the application which I am unable to pinpoint.

What I fail to understand is that DDLs during a backup were supported in 5.6 but aren't supported in 5.7. Will this be considered a bug and fixed in the near future?

Just FYI - https://www.percona.com/forums/questions-discussions/percona-xtrabackup/46578-innobackup-fails-on-different-positions

Thanks,
Suresh

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Hi Suresh,

It is true that DDLs during a backup are supported in 5.6 and aren't supported in 5.7. The cause is a change in the server which makes it impossible to recover DDLs from the redo logs. You are right, old_alter_table may not work for already-running sessions. Another option is to block DDLs while the backup is running. If you are using Percona Server you may take a backup lock - LOCK TABLES FOR BACKUP - at the beginning of the backup and release it when the backup is done. It will block DDLs but will not affect DMLs.
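
A minimal sketch of that backup-lock approach on Percona Server (the lock must be held by the same session for the whole backup):

mysql> LOCK TABLES FOR BACKUP;  -- blocks DDL, lets DML through
-- keep this session open and run xtrabackup from another terminal
mysql> UNLOCK TABLES;           -- release the lock once xtrabackup reports "completed OK!"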

Since these DDLs simply aren't written to the redo logs, all we can do in xtrabackup is work around the issue by either blocking DDLs or forcing DDLs to use the COPY algorithm. I am afraid there will not be a perfect solution for everyone.
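
And the other variant, forcing an individual DDL to use the COPY algorithm (sbtest1 reused from the report; k_2 is a hypothetical index name for illustration):

mysql> ALTER TABLE sbtest1 ADD INDEX k_2 (c), ALGORITHM=COPY;  -- COPY keeps the change fully redo-logged instead of using the optimized index build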

Revision history for this message
Suresh Durgappa (sureshlaz) wrote :

Hi Sergei,

Thanks for your response.

We are using Percona Server and our APIs are tightly coupled with the database in terms of DDL operations. It will be a hard sell to convince the developers to get rid of DDLs. Nevertheless, I will pitch the idea.

I'm still wondering how this was handled in 5.6. Sorry for persisting.

Thanks,
Suresh

Revision history for this message
Sergei Glushchenko (sergei.glushchenko) wrote :

Since the originally described behaviour (crash on prepare) is no longer reproducible, I will mark this bug as "Invalid". For those who want to avoid the "An optimized(without redo logging) DDLoperation has been performed" error, there will be a couple of options in the upcoming 2.4.8:
https://blueprints.launchpad.net/percona-xtrabackup/+spec/mdl-lock-tables
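
For example (a sketch based on the 2.4.8 options, with the paths and credentials reused from the original report; --lock-ddl-per-table is the per-table MDL alternative):

# hold a backup lock for the whole run so no optimized DDL can slip in
./xtrabackup --defaults-file=/etc/my.cnf --backup --lock-ddl --target-dir=/home/backup_dir/full/ --user=root --password=Baku12345# --no-version-check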

Revision history for this message
Rick Pizzi (pizzi) wrote :

Tried the new --lock-ddl option, but as soon as the backup starts it locks the SQL thread of one of my channels ... https://bugs.launchpad.net/percona-xtrabackup/+bug/1706873

Revision history for this message
Rick Pizzi (pizzi) wrote :

The old_alter_table=1 workaround works for us.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-1360
