Slow file extension when posix_fallocate is used on SSD file storage.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Server (moved to https://jira.percona.com/projects/PS) | Fix Released | Medium | Laurynas Biveinis |
5.1 | Invalid | Undecided | Unassigned |
5.5 | Fix Released | Medium | Laurynas Biveinis |
5.6 | Fix Released | Medium | Laurynas Biveinis |
Bug Description
Analysis: posix_fallocate was called with 0 as the offset and the desired file size as len, so every extension covered the whole file from its start rather than only the newly added tail. This is not optimal for SSDs.
Fix: call posix_fallocate with the correct offset, i.e. the current file size, and extend the file from there by len bytes.
Suggested fix for 5.5 (5.6 is also affected):
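To make the fix concrete, here is a minimal C sketch of the corrected call pattern, assuming a plain POSIX file descriptor rather than InnoDB's own file handles. The helper name `extend_file_to` is hypothetical: it reads the current size with `fstat()` and preallocates only the tail `[current_size, desired_size)`, instead of calling `posix_fallocate(fd, 0, desired_size)`. Note that `posix_fallocate()` returns an error number directly rather than setting `errno`.

```c
#define _POSIX_C_SOURCE 200809L /* posix_fallocate(), fileno() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: grow the file behind fd to desired_size bytes by
 * preallocating only the missing tail, i.e. offset = current file size and
 * len = desired_size - current size. Returns 0 on success, otherwise an
 * errno-style code (posix_fallocate() returns its error directly). */
int extend_file_to(int fd, off_t desired_size)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return errno;

    off_t current_size = st.st_size;
    if (desired_size <= current_size)
        return 0; /* already large enough: nothing to preallocate */

    off_t len = desired_size - current_size;
    int err = posix_fallocate(fd, current_size, len);
    if (err != 0)
        fprintf(stderr, "preallocating %lld bytes at offset %lld failed: %s\n",
                (long long) len, (long long) current_size, strerror(err));
    return err;
}
```

For a file that is currently 4 KiB, `extend_file_to(fd, 65536)` amounts to `posix_fallocate(fd, 4096, 61440)`; a request that would not grow the file returns 0 without touching it.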
```diff
--- fil0fil.c.ORIG 2014-02-27 18:03:33.135333993 +0200
+++ fil0fil.c 2014-02-27 18:04:21.795335296 +0200
@@ -4953,20 +4953,30 @@
#ifdef HAVE_POSIX_
if (srv_use_
- offset_high = (size_after_extend - file_start_page_no)
- * page_size / (4ULL * 1024 * 1024 * 1024);
- offset_low = (size_after_extend - file_start_page_no)
- * page_size % (4ULL * 1024 * 1024 * 1024);
+ ib_int64_t start_offset = start_page_no * page_size;
+ ib_int64_t end_offset = (size_after_extend - start_page_no) * page_size;
+ ib_int64_t desired_size = size_after_
mutex_
- success = os_file_
- offset_low, offset_high);
+
+ if (posix_
+ fprintf(stderr, "InnoDB: Error: preallocating file "
+ "space for file \'%s\' failed. Current size "
+ " %lld, len %lld, desired size %lld\n",
+ node->name, start_offset, end_offset, desired_size);
+ success = FALSE;
+ } else {
+ success = TRUE;
+ }
+
mutex_
+
if (success) {
node->size += (size_after_extend - start_page_no);
space->size += (size_after_extend - start_page_no);
os_
}
+
fil_
goto complete_io;
}
```
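The offset arithmetic in the patch can be checked with a small standalone calculation. This sketch assumes a single-file tablespace (so `file_start_page_no` is 0 and page numbers within the space equal page numbers within the file) and 16 KiB pages; `old_split` and `new_range` are hypothetical names. The old code computed the whole target size measured from offset 0 and split it into `offset_high`/`offset_low` in units of 4 GiB; the new code computes a single 64-bit start offset (the current size) and the number of bytes to add, which the patch stores in `end_offset` and prints as `len`. The patch's `desired_size` line is cut off above, so it is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB (4ULL * 1024 * 1024 * 1024)

/* Old behaviour (hypothetical name): total target size in bytes, measured
 * from offset 0, split into 4 GiB units as offset_high / offset_low. */
void old_split(uint64_t size_after_extend, uint64_t file_start_page_no,
               uint64_t page_size, uint64_t *offset_high, uint64_t *offset_low)
{
    assert(size_after_extend >= file_start_page_no);
    uint64_t bytes = (size_after_extend - file_start_page_no) * page_size;
    *offset_high = bytes / FOUR_GIB;
    *offset_low = bytes % FOUR_GIB;
}

/* New behaviour (hypothetical name): start at the current end of the file
 * and cover only the pages being added. */
void new_range(uint64_t start_page_no, uint64_t size_after_extend,
               uint64_t page_size, int64_t *start_offset, int64_t *len)
{
    assert(size_after_extend >= start_page_no);
    *start_offset = (int64_t) (start_page_no * page_size);
    *len = (int64_t) ((size_after_extend - start_page_no) * page_size);
}
```

For example, when extending from 280000 to 300000 pages, the old code asks for 4915200000 bytes from offset 0 (`offset_high` = 1, `offset_low` = 620232704), while the new code asks for only 327680000 bytes starting at offset 4587520000.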
Related branches
- Sergei Glushchenko (community): Approve (g2)
  Diff: 34 lines (+16/-6), 1 file modified: storage/innobase/fil/fil0fil.c (+16/-6)
- Sergei Glushchenko (community): Approve (g2)
  Diff: 28 lines (+15/-3), 1 file modified: storage/innobase/fil/fil0fil.cc (+15/-3)
tags: added: xtradb
tags: added: contribution
@Jan,
Yes, it does look like the offset is always taken to be 0 unconditionally
(in os_file_set_size). However, regarding the I/O: fallocate (which
posix_fallocate uses unless fallocate is unavailable) is a no-op for the
already written/allocated parts of the file (i.e., it won't zero out any
written data), so the 0 offset shouldn't cause harm here (unless the
filesystem does something awry). Also, since fallocate doesn't involve
any data I/O (due to lazy extent allocation), it shouldn't add to the
I/O pressure.
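The claim that preallocating from offset 0 leaves already written data alone can be checked directly. This is a minimal sketch assuming a POSIX system and a writable file descriptor; `prealloc_from_zero_keeps_data` is a hypothetical helper that writes some bytes at the start of the file, preallocates `[0, total)` the way the old code did, and reads the bytes back. It shows that existing contents survive; it does not measure how much I/O the call performs.

```c
#define _POSIX_C_SOURCE 200809L /* pread(), pwrite(), posix_fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if data written at offset 0 is unchanged
 * after posix_fallocate(fd, 0, total), 0 if it changed, -1 on error. */
int prealloc_from_zero_keeps_data(int fd, const char *data, size_t n, off_t total)
{
    char buf[256];

    assert(fd >= 0);
    if (n > sizeof buf)
        return -1;

    /* Simulate pages that were already written before the extension. */
    if (pwrite(fd, data, n, 0) != (ssize_t) n)
        return -1;

    /* Old behaviour: preallocate the whole range starting at offset 0. */
    if (posix_fallocate(fd, 0, total) != 0)
        return -1;

    if (pread(fd, buf, n, 0) != (ssize_t) n)
        return -1;
    return memcmp(buf, data, n) == 0;
}
```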
However, there is another side to posix_fallocate: it falls back to
pwrite on filesystems/kernels where fallocate is not supported (say,
tmpfs, which gained fallocate support in 2011 or so). In that case it may
end up doing I/O; but here again, it shouldn't modify already written
data (as per the posix_fallocate specification), so the I/O should be
the same as extending from that offset.
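Whether a given filesystem/kernel combination takes the native path or the write-based fallback can be probed with the Linux-specific `fallocate(2)` system call, which glibc's `posix_fallocate()` tries first. A minimal sketch, assuming Linux and glibc: `native_fallocate_supported` is a hypothetical helper. `EOPNOTSUPP` corresponds to the case where glibc's `posix_fallocate()` falls back to emulating preallocation with writes; `ENOSYS` (kernels predating the system call) is treated the same way here. A successful probe really allocates the requested range, so it is best run against a scratch file on the same filesystem as the data files.

```c
#define _GNU_SOURCE /* Linux-specific fallocate(2) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical probe: 1 if the filesystem behind fd supports native
 * preallocation (fallocate(2) with mode 0), 0 if it does not, -1 on any
 * other error. On success the range [offset, offset + len) is really
 * allocated and the file may grow. */
int native_fallocate_supported(int fd, off_t offset, off_t len)
{
    assert(fd >= 0);
    if (fallocate(fd, 0, offset, len) == 0)
        return 1;
    /* EOPNOTSUPP: the filesystem lacks fallocate, so glibc's
     * posix_fallocate() would emulate it by writing to the file.
     * ENOSYS: kernel predates the system call; treated the same way. */
    if (errno == EOPNOTSUPP || errno == ENOSYS)
        return 0;
    return -1;
}
```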
In the SSD case, which filesystem/kernel combinations were in use?