Orphaned xtrabackup_pid file Breaks Cluster SST

Reported by Jervin R on 2013-05-03
This bug affects 1 person
Affects              Importance  Assigned to
Percona XtraBackup   Medium      Sergei Glushchenko
2.0                  High        Sergei Glushchenko
2.1                  Medium      Sergei Glushchenko

Bug Description

Suppose an orphaned xtrabackup_pid file is left inside tmpdir for some reason, e.g. by a manual backup that was killed and did not clean up properly, and the MySQL user is not able to delete it from wsrep_sst_xtrabackup. The SST then fails, even after the prepare phase on the JOINER is complete. On the DONOR you get this message:

130502 22:19:31 [Note] WSREP: Provider paused at 16e4deca-b395-11e2-0800-2919c4e8dc9a:0
130502 22:19:42 [Note] WSREP: Provider resumed.
rm: cannot remove `/tmp/xtrabackup_pid': Operation not permitted
130502 22:19:44 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup --role 'donor' --address '192.168.56.54:4444/xtrabackup_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '16e4deca-b395-11e2-0800-2919c4e8dc9a:0'
130502 22:19:44 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'donor' --address '192.168.56.54:4444/xtrabackup_sst' --auth '(null)' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '16e4deca-b395-11e2-0800-2919c4e8dc9a:0': 1 (Operation not permitted)
130502 22:19:44 [Warning] WSREP: 0 (uxdbc01): State transfer to 1 (uxdbc02) failed: -1 (Operation not permitted)

I filed this bug here because I think innobackupex should do this as a pre-flight check: if a pid file already exists and cannot be deleted, it should bail out immediately rather than after the prepare stage, by which point the user may already have lost a significant amount of time.

Jervin R (revin) on 2013-05-03
tags: added: 31156
tags: added: i31156
removed: 31156
Alexey Kopytov (akopytov) wrote :

The fix is to test whether xtrabackup_pid exists when innobackupex starts, try to remove it, and fail if it cannot be removed (similar to the suspend_file check in init()).
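
The check described above could look roughly like the following. This is only an illustrative Python sketch of the logic, not the actual innobackupex patch; the function name preflight_pid_check and its parameters are hypothetical, and only the file name xtrabackup_pid and the use of tmpdir come from this report.

```python
import os


def preflight_pid_check(tmpdir, pid_name="xtrabackup_pid"):
    """Remove a stale pid file from tmpdir before any backup work starts.

    Hypothetical helper, not part of innobackupex.
    Returns True if a stale file was found and removed, False if none existed.
    Raises RuntimeError if the file exists but cannot be removed, so the
    caller can abort immediately instead of failing late in the SST.
    """
    pid_path = os.path.join(tmpdir, pid_name)
    try:
        os.remove(pid_path)
    except FileNotFoundError:
        # Nothing left over from an earlier run: safe to continue.
        return False
    except OSError as exc:
        # Covers "Operation not permitted" and any other removal failure.
        raise RuntimeError(f"cannot remove stale {pid_path}: {exc}") from exc
    return True
```

A caller would run this once at startup, before taking any backup step, and exit with a non-zero status if it raises. That way the DONOR reports the problem before the JOINER has spent time on the transfer and prepare phases.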
