innobackupex should remove stale xtrabackup_suspended file on start

Reported by Alexey Kopytov on 2012-06-01
40
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Percona XtraBackup
Medium
George Ormond Lorch III
1.6
Undecided
Unassigned
2.0
Medium
George Ormond Lorch III
2.1
Medium
George Ormond Lorch III

Bug Description

I'm guessing that one night the system crashed while running innobackupex and left the $suspend_file, /tmp/xtrabackup_suspended, on the filesystem. It took a while for me to figure out that this then causes innobackupex to think xtrabackup has suspended itself the moment it starts, and then when xtrabackup actually does suspend itself innobackupex is waiting for it to end and does not re-remove the $suspend_file, leading to a wait deadlock.

A cron job runs innobackupex, and I'm wondering if perhaps I should add removing /tmp/xtrabackup_suspended to the beginning of the job? I'm wondering if there is some inbuilt method for locking out or at least detecting an interrupted, or worse a concurrently running, innobackupex that I should be taking advantage of?

I suppose not even detecting the presence of an already-existing $suspend_file could be considered a bug of sorts in innobackupex?

This is the process if there is a xtrabackup_suspended on tmpdir.

1- innobackupex starts
2- innobackupex launches xtrabackup
3- innobackupex sees that there is a xtrabackup_suspended, so it thinks that xtrabackup finishes copying *.ibd files. Therefore, both, xtrabackup and innobackupex start their tasks at the same time

[01] Streaming /m01/mysql/ibdata/ibdata1

120924 15:37:57 innobackupex: Continuing after ibbackup has suspended
120924 15:37:57 innobackupex: Starting mysql with options: --host='127.0.0.1' --unbuffered --
120924 15:37:57 innobackupex: Connected to database with mysql child process (pid=5652)
>> log scanned up to (4110378914)
120924 15:38:02 innobackupex: Starting to lock all tables...
>> log scanned up to (4110378914)
>> log scanned up to (4110378914)
[01] ...done
[01] Streaming ./employees5/dept_emp.ibd

As we can see here, innobackupex starts to lock tables but xtrabackup didn't finish.

4- innobackupex finishes copying *.frm files and delete xtrabackup_suspended

120924 15:38:39 innobackupex: Finished backing up .frm, .MRG, .MYD, .MYI, .TRG, .TRN, .ARM, .ARZ, .CSV, .CSM and .opt files
innobackupex: Resuming ibbackup
[01] ...done
[01] Streaming ./employees10/dept_manager.ibd

5- xtrabackup finishes copying *.ibd files and creates a xtrabackup_suspended

[01] Streaming ./employees10/departments.ibd
[01] ...done
>> log scanned up to (4110378914)

6- xtrabackups waits to innobackupex to delete xtrabackup_suspended, but innobackupex finished all its task before, so nobody is going to delete that file. therefore, xtrabackup stays in "log scanned up to" forever.

>> log scanned up to (4110378914)
>> log scanned up to (4110378914)
>> log scanned up to (4110378914)
>> log scanned up to (4110378914)
>> log scanned up to (4110378914)
....

This problem could also cause a disk space problem, because it will stay copying redo log *forever*.

tags: added: i26194
tags: added: i28451
Jonas Bygdén (jbygden) wrote :

Was this reintroduced in 2.1.3?

We upgraded to 2.1.3 last week, and have started seeing this since. If I downgrade to 2.0.5, the problem disappears.

Vipul Sabhaya (vipuls) wrote :

We're seeing something similar in 2.1.5 potentially as well. Here is a log exhibiting similar behavior.

 >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    [01] ...done
    >> log scanned up to (284142539120)
    xtrabackup: Creating suspend file '/tmp/xtrabackup_suspended_2' with pid '23557'

    131121 20:13:18 innobackupex: Continuing after ibbackup has suspended
    131121 20:13:18 innobackupex: Starting to lock all tables...
    131121 20:13:18 innobackupex: All tables locked and flushed to disk

    131121 20:13:18 innobackupex: Starting to backup non-InnoDB tables and files
    innobackupex: in subdirectories of '/var/lib/mysql'
    innobackupex: Backing up files '/var/lib/mysql/mysql/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (72 files)
    >> log scanned up to (284142539120)
    innobackupex: Backing up file '/var/lib/mysql/rvr/db.opt'
    innobackupex: Backing up files '/var/lib/mysql/performance_schema/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (18 files)
    innobackupex: Backing up file '/var/lib/mysql/scheduler/db.opt'
    innobackupex: Backing up file '/var/lib/mysql/reporting/db.opt'
    innobackupex: Backing up files '/var/lib/mysql/publisher/*.{frm,isl,MYD,MYI,MAD,MAI,MRG,TRG,TRN,ARM,ARZ,CSM,CSV,opt,par}' (43 files)
    >> log scanned up to (284142539120)
    131121 20:13:19 innobackupex: Finished backing up non-InnoDB tables and files

    131121 20:13:19 innobackupex: Waiting for log copying to finish

    xtrabackup: The latest check point (for incremental): '284142539120'
    xtrabackup: Stopping log copying thread.
    .>> log scanned up to (284142539120)

    xtrabackup: Creating suspend file '/tmp/xtrabackup_log_copied' with pid '23557'
    xtrabackup: Transaction log of lsn (284142539120) to (284142539120) was copied.
    131121 20:13:20 innobackupex: All tables unlocked

    innobackupex: Backup created in directory '/var/lib/mysql'
    innobackupex: MySQL binlog position: filename 'mysql-bin.001369', position 107
    131121 20:13:20 innobackupex: Connection to database server closed
    131121 20:13:20 innobackupex: completed OK!
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)
    >> log scanned up to (284142539120)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers