Comment 8 for bug 870119

Alexey Kopytov (akopytov) wrote:

We've been able to reproduce the problem.

The reason is that the InnoDB file I/O subsystem may reuse file descriptors: when the number of open files reaches innodb_open_files, it closes old ones. This works for InnoDB, because if InnoDB needs to access a table whose file has been closed, it simply reopens it.
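To make the InnoDB side concrete, here is a minimal Python model of an LRU-limited descriptor pool. It is a sketch under stated assumptions, not InnoDB's actual code. The class `LruFilePool` and its methods are hypothetical. The model captures only the property described above: every access looks up the descriptor through the pool, so a file closed by eviction is transparently reopened. It assumes a POSIX system because it uses `os.pread`.

```python
import os
from collections import OrderedDict


class LruFilePool:
    """Hypothetical model of an innodb_open_files-style limit.

    When the pool is full, the least recently used descriptor is closed.
    Callers never hold on to a raw fd; they always go through read(),
    so an evicted file is simply reopened on its next access.
    """

    def __init__(self, max_open):
        self.max_open = max_open
        self._open = OrderedDict()  # path -> fd, least recently used first

    def _fd_for(self, path):
        if path in self._open:
            self._open.move_to_end(path)  # mark as most recently used
            return self._open[path]
        if len(self._open) >= self.max_open:
            _, victim_fd = self._open.popitem(last=False)  # evict the LRU entry
            os.close(victim_fd)
        fd = os.open(path, os.O_RDONLY)
        self._open[path] = fd
        return fd

    def read(self, path, size, offset):
        # Resolve (or reopen) the descriptor on every call instead of caching it.
        return os.pread(self._fd_for(path), size, offset)

    def close_all(self):
        for fd in self._open.values():
            os.close(fd)
        self._open.clear()
```

Eviction is harmless here only because no caller keeps a raw descriptor across calls. A caller that bypasses `read()` and stores the fd itself loses that guarantee.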

However, this does not work for XtraBackup, which holds on to a single file descriptor for the whole time it is copying a file. When the --parallel option is used, another thread may need to open a file just as the number of open files reaches innodb_open_files. fil_try_to_close_file_in_LRU() may then close a file descriptor that is still in use by the copying thread, and that descriptor number is soon reused when the other file is opened. This results in obscure failures like this one.
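The reuse step relies on standard POSIX behavior: `open()` returns the lowest-numbered descriptor not currently open. The following single-threaded Python sketch replays the race in sequence to show the effect. The function name and the `a.ibd`/`b.ibd` file names are hypothetical, and a POSIX system is assumed. A "copier" keeps a descriptor for file A. Someone else closes it and opens file B. The copier's next read through its stale descriptor number silently returns B's data.

```python
import os


def demonstrate_fd_reuse(dirpath):
    """Hypothetical replay of the race: a cached fd number ends up naming a different file."""
    a = os.path.join(dirpath, "a.ibd")
    b = os.path.join(dirpath, "b.ibd")
    with open(a, "wb") as f:
        f.write(b"A" * 16)
    with open(b, "wb") as f:
        f.write(b"B" * 16)

    # The copy thread opens A and keeps this fd for the whole copy.
    copier_fd = os.open(a, os.O_RDONLY)
    first = os.pread(copier_fd, 4, 0)

    # Another thread hits the open-files limit and closes the LRU file,
    # which happens to be the one the copier is still using...
    os.close(copier_fd)
    # ...then opens B; POSIX hands out the lowest free descriptor number.
    other_fd = os.open(b, os.O_RDONLY)

    # The copier still uses its old number and now reads B without any error.
    second = os.pread(copier_fd, 4, 0)
    reused = other_fd == copier_fd
    os.close(other_fd)
    return first, second, reused
```

No error is reported at the point of the mix-up, because the descriptor number is valid. It now refers to a different file. This is why such failures surface later and look unrelated to file descriptors.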

Another important part of this problem is the fact that XtraBackup leaks file descriptors, which is bug #713267. But even after that bug is fixed, it will still be possible to hit this bug by setting a very low value of innodb_open_files for XtraBackup and then using a very high --parallel value. So, in addition to fixing bug #713267, what needs to be done is to make XtraBackup fail when it hits the innodb_open_files limit, rather than follow the default InnoDB behavior of closing arbitrary files.
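The proposed policy can be sketched as a pool that refuses to open past the limit instead of evicting. This is an illustrative model, not the actual XtraBackup patch. `StrictFilePool`, `OpenFilesLimitError` and the error message text are hypothetical. A lock is included because --parallel copy threads would share the pool. Descriptors are released explicitly once a file has been copied, which is what the fix for the leak in bug #713267 would need to guarantee.

```python
import os
import threading


class OpenFilesLimitError(RuntimeError):
    """Hypothetical error raised instead of closing a possibly in-use descriptor."""


class StrictFilePool:
    """Hypothetical fail-fast pool: never closes a file behind a caller's back."""

    def __init__(self, max_open):
        self.max_open = max_open
        self._open = {}  # path -> fd
        self._lock = threading.Lock()  # shared by all copy threads

    def open(self, path):
        with self._lock:
            if path in self._open:
                return self._open[path]
            if len(self._open) >= self.max_open:
                # Fail loudly rather than evict a descriptor another thread may hold.
                raise OpenFilesLimitError(
                    f"open files limit ({self.max_open}) reached while opening {path}; "
                    "increase innodb_open_files or lower --parallel"
                )
            fd = os.open(path, os.O_RDONLY)
            self._open[path] = fd
            return fd

    def close(self, path):
        # Release the descriptor once the file has been fully copied.
        with self._lock:
            fd = self._open.pop(path)
        os.close(fd)
```

With this design, a too-low innodb_open_files combined with a high --parallel value produces an immediate, explicit error instead of a silently corrupted copy.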