Orphaned xtrabackup_pid file Breaks Cluster SST
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraBackup (moved to https://jira.percona.com/projects/PXB) | Fix Released | Medium | Sergei Glushchenko |
2.0 | Fix Released | High | Sergei Glushchenko |
2.1 | Fix Released | Medium | Sergei Glushchenko |
Bug Description
Suppose an orphaned xtrabackup_pid file is left inside tmpdir for some reason, e.g. by a manual backup that was killed and did not clean up properly, and the MySQL user is not able to delete it from wsrep_sst_xtrabackup. The SST then fails, even after the prepare phase on the JOINER is complete. On the DONOR you get messages like these:
130502 22:19:31 [Note] WSREP: Provider paused at 16e4deca-
130502 22:19:42 [Note] WSREP: Provider resumed.
rm: cannot remove `/tmp/xtrabacku
130502 22:19:44 [ERROR] WSREP: Failed to read from: wsrep_sst_
130502 22:19:44 [ERROR] WSREP: Process completed with error: wsrep_sst_
130502 22:19:44 [Warning] WSREP: 0 (uxdbc01): State transfer to 1 (uxdbc02) failed: -1 (Operation not permitted)
I filed this bug here because I think innobackupex should do this as a pre-flight check: if a pid file already exists and cannot be deleted, it should bail out immediately, rather than after the prepare stage, by which point the user may have lost a significant amount of time.
Related branches
- Alexey Kopytov (community): Approve
- George Ormond Lorch III (community): Approve (g2)
- Diff: 36 lines (+21/-0), 2 files modified: innobackupex (+7/-0), test/t/bug1175860.sh (+14/-0)
- Alexey Kopytov (community): Approve
- George Ormond Lorch III (community): Approve (g2)
- Diff: 38 lines (+23/-0), 2 files modified: innobackupex (+9/-0), test/t/bug1175860.sh (+14/-0)
tags: added: 31156
tags: added: i31156, removed: 31156
The fix is to test whether xtrabackup_pid exists when innobackupex starts, try to remove it, and fail if it cannot be removed (similar to the suspend_file check in init()).
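The pre-flight logic can be sketched roughly as follows. This is only an illustration in Python, not the actual patch: the real change lives in the Perl innobackupex script, and the names `preflight_pid_file` and `StalePidFileError` are hypothetical. The idea is to attempt the removal before any backup work starts, so that a file that cannot be deleted causes an immediate failure.

```python
import os


class StalePidFileError(RuntimeError):
    """Hypothetical error: a leftover pid file exists and cannot be removed."""


def preflight_pid_file(tmpdir, name="xtrabackup_pid"):
    """Remove a leftover pid file from tmpdir, or fail before any work starts.

    Returns True if a stale file was removed, False if none was present.
    """
    path = os.path.join(tmpdir, name)
    try:
        # Attempt the removal directly instead of checking first,
        # which avoids a race between the check and the unlink.
        os.unlink(path)
    except FileNotFoundError:
        # Nothing left over from a previous run: safe to continue.
        return False
    except OSError as exc:
        # Typically a permission problem: bail out now rather than
        # after the (potentially long) backup and prepare stages.
        raise StalePidFileError(
            f"cannot remove stale pid file {path}: {exc}"
        ) from exc
    return True
```

A caller would run this once at startup, e.g. `preflight_pid_file("/tmp")`, and abort with a clear message if `StalePidFileError` is raised.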