Comment 17 for bug 1183322

Alexey Kopytov (akopytov) wrote :

I don't see a way to fix this (i.e. to remove the one-descriptor-per-tablespace requirement in XtraBackup) without risking data loss on table renames.

Suppose that the fix for bug #1079700 is reverted, i.e. instead of opening all tablespaces on backup start, XtraBackup just opens and closes tablespaces as it copies them. In that case it has to detect tablespace renames and handle them somehow to avoid losing data:

1. It can detect a renamed tablespace: if, at the time it opens the file for copying, the space ID from the first page (or the inode from the filesystem, it doesn't matter which) doesn't match the space ID or inode the tablespace had when the list of files was created, the tablespace has been renamed. So detecting the condition is the easy part.

2. What can be done to still copy the corresponding tablespace even though its name changed while the backup was in progress? XtraBackup has to discover its new name in the filesystem. To do so, it has to scan all tablespaces in all directories under the datadir and find the one with the same space ID or inode, or consider the tablespace removed if none can be found.

3. The above would work fine if either the tablespace could never be renamed again while the directory scan is in progress, or we could scan directories atomically (i.e. get a snapshot of the directories' contents and iterate over it without interference from concurrent file renames/removals/creations). Nothing guarantees either of those conditions:

  a) there's always a possibility that the tablespace we are looking for is renamed again while the directory scan is in progress (and that in itself may take minutes on servers with huge numbers of tablespaces)

  b) there's no way to scan a directory atomically, let alone multiple directories. The opendir()/readdir() combo provides no atomicity: if a file is renamed between the opendir() call and subsequent readdir() calls, then depending on timing we could see the old file name, the new file name, neither, or even both! I am not aware of any utility that handles this in a reasonable way: "rm -rf", rmdir, cp, rsync -- all may produce inconsistent and unexpected results if the directory changes concurrently while the scan is in progress. So if XtraBackup scans the directory and doesn't find a new name for the tablespace it is looking for, does that mean the tablespace was removed and can be omitted from the backup? No -- it could also mean that the tablespace has been renamed yet again, and we are back to the original problem the scan was meant to solve.
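To make the detect-and-rescan idea in points 1-3 concrete, here is a toy shell sketch. It uses a throwaway directory and made-up file names, and it compares only inodes (a real implementation would also read the space ID from the first page); it is not XtraBackup's actual code.

```shell
#!/bin/sh
# Toy illustration of points 1-2 above; paths and names are made up.
datadir=$(mktemp -d)
mkdir "$datadir/db"
touch "$datadir/db/t1.ibd"

# Record the inode when the list of files is built.
orig_inode=$(stat -c %i "$datadir/db/t1.ibd")

# Meanwhile, the server runs RENAME TABLE (simulated here with mv).
mv "$datadir/db/t1.ibd" "$datadir/db/t1_new.ibd"

# Point 1: detection -- the recorded path no longer has the recorded inode.
cur_inode=$(stat -c %i "$datadir/db/t1.ibd" 2>/dev/null)
if [ "$cur_inode" != "$orig_inode" ]; then
    # Point 2: rescan the datadir for a file with the recorded inode.
    new_path=$(find "$datadir" -name '*.ibd' -inum "$orig_inode" -print -quit)
    if [ -n "$new_path" ]; then
        echo "tablespace now lives at $new_path"
    else
        # Point 3's problem: "not found" is ambiguous -- the file may have
        # been removed, or renamed *again* while find was walking the tree.
        echo "tablespace removed... or renamed again mid-scan?"
    fi
fi
rm -rf "$datadir"
```

Note that the find in the rescan step is exactly the non-atomic directory walk that point (b) shows cannot be made race-free.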

With the above in mind, I don't see a better way to handle tablespace renames than what I implemented as the fix for bug #1079700. Any attempt to fix it differently would risk losing data.

So I'm converting this bug back to a documentation request. We should document that:

1. The number of file descriptors XtraBackup requires during the backup stage is at most (number_of_tablespaces_to_copy * 2 + 10), where number_of_tablespaces_to_copy is either the total number of InnoDB tablespaces on the server (for full backups) or the number of tablespaces matching the specified conditions (for partial backups).
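For a full backup, the formula above can be estimated directly from the datadir by counting .ibd files. A rough sketch (using a throwaway directory here; in practice you would point find at the real datadir):

```shell
#!/bin/sh
# Rough estimate of descriptors needed: count of .ibd files * 2 + 10.
# A throwaway datadir with three tablespaces stands in for the real one.
datadir=$(mktemp -d)
mkdir "$datadir/db"
touch "$datadir/db/t1.ibd" "$datadir/db/t2.ibd" "$datadir/db/t3.ibd"

n=$(find "$datadir" -name '*.ibd' | wc -l)
fds_needed=$(( n * 2 + 10 ))
echo "$fds_needed"    # 3 tablespaces -> 16 descriptors
rm -rf "$datadir"
```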

2. When XtraBackup fails with the following message, it means that the operating system limit on available file descriptors has been exceeded:

InnoDB: Error number 24 means 'Too many open files'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/operating-system-error-codes.html
InnoDB: Error: could not open single-table tablespace file
InnoDB: ./<databasename>/<tablename>.ibd!

3. There are two kinds of limits on file descriptors:
  a) the per-user limit, which can be checked and adjusted for the current session with the "ulimit -n" command (or made persistent by editing /etc/security/limits.conf);
  b) the system-wide limit, which can be checked and adjusted via /proc/sys/fs/file-max or the sysctl utility (or made persistent by editing /etc/sysctl.conf).
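Gathering the commands from points (a) and (b) in one place (the sysctl/ulimit invocations for raising the limits are shown as comments since they need appropriate privileges; the chosen values are only examples):

```shell
#!/bin/sh
# Check the two limits described above (Linux):
ulimit -n                     # a) per-user limit for the current session
cat /proc/sys/fs/file-max     # b) system-wide limit
# To raise them (root privileges needed for the second):
#   ulimit -n 65536                  # this session only
#   sysctl -w fs.file-max=1000000    # system-wide, until reboot
```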

4. Most Linux distributions ship with rather strict per-user limits but fairly high system-wide limits by default. For example, on my CentOS 5 VM I see 1024 file descriptors for a non-root user, but 207006 as the system-wide limit in /proc/sys/fs/file-max. With those defaults, the user limit would have to be raised to back up more than ~500 tablespaces, and the system-wide limit to back up more than ~100,000 tablespaces.
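A back-of-the-envelope check of those figures, inverting the formula from point 1 as max_tablespaces = (fd_limit - 10) / 2:

```shell
#!/bin/sh
# Maximum tablespaces that fit under each default limit.
echo $(( (1024 - 10) / 2 ))      # per-user default: 507, i.e. ~500
echo $(( (207006 - 10) / 2 ))    # system-wide default: 103498, ~100,000
```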