innobackupex continuing with FTWRL even when xtrabackup failed

Reported by Jay Janssen on 2013-04-19
This bug affects 1 person
Affects               Importance   Assigned to
Percona XtraBackup    Medium       Alexey Kopytov
  2.0                 Medium       Alexey Kopytov
  2.1                 Medium       Alexey Kopytov

Bug Description

>> log scanned up to (300606271182)
xtrabackup: error: log block numbers mismatch:
xtrabackup: error: expected log block no. 587121624, but got no. 587142096 from the log file.
xtrabackup: error: it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying being too slow, or log files being too small.
xtrabackup: Error: xtrabackup_copy_logfile() failed.
[01] ...done

130418 01:35:49 innobackupex: Continuing after ibbackup has suspended
130418 01:35:49 innobackupex: Starting mysql with options: --defaults-file='/etc/mysql/my.cnf' --password=xxxxxxxx --user='xtrabackup' --socket='/var/run/mysqld/mysqld.sock' --unbuffered --
130418 01:35:49 innobackupex: Connected to database with mysql child process (pid=15080)
130418 01:35:51 innobackupex: Starting to lock all tables...
130418 01:36:05 innobackupex: All tables locked and flushed to disk

130418 01:36:05 innobackupex: Starting to backup non-InnoDB tables and files
innobackupex: in subdirectories of '/var/lib/mysql'
innobackupex: Backing up files '/var/lib/mysql/reg1_00009972/*.{frm,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (187 files)

...

In this case there are thousands of databases and millions of .frm files, so FTWRL (FLUSH TABLES WITH READ LOCK) was held for 2 hours before the backup ultimately failed anyway.

Note that the user has the default 5 MB InnoDB log files (innodb_log_file_size), hence the xtrabackup failure.
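The block numbers in the error message appear consistent with this explanation. The quick check below assumes two log files (innodb_log_files_in_group = 2, the default), InnoDB's 512-byte log blocks, and a 2048-byte header at the start of each log file that holds no redo data; none of these values are stated in the report. Under those assumptions, two 5 MB files hold exactly as many blocks as the gap between the expected and received block numbers, which suggests the log wrapped around once before xtrabackup read that block.

```python
# Assumptions (not from the report): 2 log files per group (the default),
# 512-byte redo log blocks, and a 2048-byte header per log file.
LOG_BLOCK_SIZE = 512
LOG_FILE_HDR_SIZE = 4 * LOG_BLOCK_SIZE   # 2048 bytes, not usable for redo
LOG_FILE_SIZE = 5 * 1024 * 1024          # default 5M innodb_log_file_size
LOG_FILES_IN_GROUP = 2


def blocks_per_cycle(file_size: int, n_files: int) -> int:
    """Redo blocks InnoDB can write before it starts overwriting old ones."""
    return n_files * (file_size - LOG_FILE_HDR_SIZE) // LOG_BLOCK_SIZE


# Block numbers copied from the xtrabackup error message above.
expected_no = 587121624
got_no = 587142096

cycle = blocks_per_cycle(LOG_FILE_SIZE, LOG_FILES_IN_GROUP)
gap = got_no - expected_no
print(cycle, gap, gap == cycle)  # prints: 20472 20472 True
```

In other words, xtrabackup fell roughly one full log capacity (about 10 MB of redo) behind the server, so the block it wanted had already been overwritten.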

This was 2.0.4, so I'm not 100% sure this isn't already fixed, but I figured I'd file anyway.

summary: - innobackupex continuing with FLTWRL even when xtrabackup failed
+ innobackupex continuing with FTWRL even when xtrabackup failed
Alexey Kopytov (akopytov) wrote:

The root cause is that if the log copying thread fails, it does not terminate the process immediately. Instead, only the thread is terminated. The main xtrabackup thread later checks the 'log_copying_succeed' flag and terminates the process if it is FALSE. But that check happens only after xtrabackup has finished copying InnoDB files and has signalled innobackupex to lock tables and proceed with copying non-InnoDB files.

The fix should be to terminate the xtrabackup process immediately on log copying failure. I don't see any reason to delay checking its status and fail much later.
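To make the difference in ordering concrete, here is a minimal single-threaded model of the two behaviours described above. It is a sketch, not XtraBackup code: it does not use real threads, `run_backup`, its parameters and the file names are hypothetical, and the action strings are simplified labels. Only the flag name `log_copying_succeed` comes from the comment. With the deferred check, the read lock is still taken after the failure; with fail-fast termination, the run stops before innobackupex ever issues FTWRL.

```python
from typing import List, Optional


def run_backup(innodb_files: List[str],
               log_fails_before: Optional[int] = None,
               fail_fast: bool = False) -> List[str]:
    """Return the ordered actions of a simplified (hypothetical) backup run.

    log_fails_before: index of the InnoDB file being copied when the
        log-copying failure happens; None means log copying never fails.
    fail_fast: True models the proposed fix (stop immediately);
        False models the reported behaviour (flag checked only at the end).
    """
    actions: List[str] = []
    log_copying_succeed = True
    for i, name in enumerate(innodb_files):
        if i == log_fails_before:
            # The log copier fails; in the deferred design it only clears a flag.
            log_copying_succeed = False
            if fail_fast:
                actions.append("exit: log copying failed")
                return actions
        actions.append("copy " + name)
    # xtrabackup suspends; innobackupex takes the global read lock and
    # copies the non-InnoDB files.
    actions.append("FLUSH TABLES WITH READ LOCK")
    actions.append("copy non-InnoDB files")
    # Deferred design: the flag is only examined once xtrabackup resumes.
    if not log_copying_succeed:
        actions.append("exit: log copying failed")
        return actions
    actions.append("backup completed")
    return actions


files = ["ibdata1", "db1/t1.ibd", "db1/t2.ibd"]  # hypothetical file names
print(run_backup(files, log_fails_before=1))
print(run_backup(files, log_fails_before=1, fail_fast=True))
# ['copy ibdata1', 'exit: log copying failed']
```

The first call still copies every InnoDB file and takes FTWRL before failing, which is the costly path seen in this report when millions of .frm files follow.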
