Slow file extend when posix_fallocate used on SSD file storage.

Bug #1286114 reported by Jan Lindström
Affects         Status        Importance  Assigned to        Milestone
Percona Server  Fix Released  Medium      Laurynas Biveinis
  (moved to https://jira.percona.com/projects/PS)
5.1             Invalid       Undecided   Unassigned
5.5             Fix Released  Medium      Laurynas Biveinis
5.6             Fix Released  Medium      Laurynas Biveinis

Bug Description

Analysis: posix_fallocate was called with an offset of 0 and a length equal to the desired file size, so every extension covers the whole file again. This is not optimal on SSDs.

Fix: Call posix_fallocate with the correct offset, i.e. the current file size, and extend the file by len bytes from there.

Suggested fix (for 5.5; 5.6 is also affected):

--- fil0fil.c.ORIG 2014-02-27 18:03:33.135333993 +0200
+++ fil0fil.c 2014-02-27 18:04:21.795335296 +0200
@@ -4953,20 +4953,30 @@

 #ifdef HAVE_POSIX_FALLOCATE
  if (srv_use_posix_fallocate) {
- offset_high = (size_after_extend - file_start_page_no)
- * page_size / (4ULL * 1024 * 1024 * 1024);
- offset_low = (size_after_extend - file_start_page_no)
- * page_size % (4ULL * 1024 * 1024 * 1024);
+ ib_int64_t start_offset = start_page_no * page_size;
+ ib_int64_t end_offset = (size_after_extend - start_page_no) * page_size;
+ ib_int64_t desired_size = size_after_extend*page_size;

   mutex_exit(&fil_system->mutex);
- success = os_file_set_size(node->name, node->handle,
- offset_low, offset_high);
+
+ if (posix_fallocate(node->handle, start_offset, end_offset) != 0) {
+ fprintf(stderr, "InnoDB: Error: preallocating file "
+ "space for file \'%s\' failed. Current size "
+ " %lld, len %lld, desired size %lld\n",
+ node->name, start_offset, end_offset, desired_size);
+ success = FALSE;
+ } else {
+ success = TRUE;
+ }
+
   mutex_enter(&fil_system->mutex);
+
   if (success) {
    node->size += (size_after_extend - start_page_no);
    space->size += (size_after_extend - start_page_no);
    os_has_said_disk_full = FALSE;
   }
+
   fil_node_complete_io(node, fil_system, OS_FILE_READ);
   goto complete_io;
  }

tags: added: xtradb
tags: added: contribution
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Jan,

Yes, it does look like the offset is always taken to be 0 unconditionally
(in os_file_set_size). However, regarding the I/O, fallocate (which
posix_fallocate uses unless fallocate is unavailable) is a no-op for the
already written/allocated parts of the file (i.e., it won't zero out any
written data), so the offset shouldn't do harm here (unless the filesystem
did something awry). Also, since fallocate doesn't involve
any I/O (due to lazy extent allocation), it shouldn't add
I/O pressure.

However, there is another side to posix_fallocate: it falls
back to pwrite on filesystems/kernels where fallocate is not supported
(say, tmpfs, for which support was added in 2011 or so). There it may
end up doing I/O, but again it shouldn't do anything to
already written data (as per the posix_fallocate spec), so the
I/O should be the same as when extending from that offset.

In the SSD case, which filesystem/kernel combinations were in use?
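
The claim above, that posix_fallocate over an already written range leaves the data untouched, can be checked with a small probe. This is only a sketch: fallocate_preserves_data, the marker string, and the probe path are made up for illustration and are not InnoDB code. It says nothing about how much I/O a given filesystem performs for the call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical probe (not InnoDB code): write a marker at offset 0,
   preallocate [0, total) with posix_fallocate(), then verify that the
   marker is intact and the file is at least `total` bytes long.
   Returns 0 if both hold, -1 otherwise. The probe file is removed. */
int fallocate_preserves_data(const char *path, off_t total)
{
    static const char marker[] = "innodb-page";
    char readback[sizeof marker];
    struct stat st;
    int ok = -1;
    int fd;

    assert(total > 0); /* posix_fallocate() rejects a non-positive length */

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, marker, sizeof marker) != (ssize_t) sizeof marker)
        goto done;
    /* Offset 0 on purpose: the range overlaps the bytes just written.
       posix_fallocate() returns 0 on success or an error number. */
    if (posix_fallocate(fd, 0, total) != 0)
        goto done;
    if (pread(fd, readback, sizeof readback, 0) != (ssize_t) sizeof readback)
        goto done;
    if (memcmp(marker, readback, sizeof marker) != 0)
        goto done;
    if (fstat(fd, &st) != 0 || st.st_size < total)
        goto done;
    ok = 0;

done:
    close(fd);
    unlink(path);
    return ok;
}
```

Calling fallocate_preserves_data("/some/probe/file", 1 << 20) on the filesystem in question should return 0 if the data survives; timing the call separately would be needed to see the slowdown reported here.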

Jan Lindström (jplindst) wrote :

This was observed with a Fusion-io ioDrive2, driver version 3.3.4, build 5833069, file system nvmfs, Linux 3.4.12. Based on performance tests, fallocate over already written/allocated parts is not merely a no-op there (this could be a missing file system feature).

R: Jan

Raghavendra D Prabhu (raghavendra-prabhu) wrote :

@Jan,

Ah, yes, that explains it: the filesystem support for fallocate may be lacking or incomplete here. (For other common in-tree filesystems, such as ext4, XFS, tmpfs, and btrfs, there shouldn't be any issues.)

Laurynas Biveinis (laurynas-biveinis) wrote :

Jan, your patch replaced file_start_page_no with start_page_no in offset calculations. Was that intentional?
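
The difference matters if a tablespace spans several files. The sketch below assumes, based on the lines the patch removes, that file_start_page_no is the tablespace page number of the file's first page, start_page_no is the current tablespace size in pages, and size_after_extend is the desired size in pages. The struct and function names are hypothetical, not InnoDB code.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range to preallocate inside one data file. */
struct extend_range {
    int64_t offset; /* byte offset in the file where the new space begins */
    int64_t len;    /* number of bytes to add */
};

/* Hypothetical helper. The meaning of the page-number parameters is an
   assumption taken from how the original code uses them:
     file_start_page_no  first page of this file within the tablespace
     start_page_no       current tablespace size, in pages
     size_after_extend   desired tablespace size, in pages
   All arithmetic is done in 64 bits so files over 4 GiB do not wrap. */
struct extend_range
compute_extend_range(uint64_t file_start_page_no, uint64_t start_page_no,
                     uint64_t size_after_extend, uint64_t page_size)
{
    struct extend_range r;

    assert(file_start_page_no <= start_page_no);
    assert(start_page_no <= size_after_extend);

    /* Offset is relative to this file, not to the whole tablespace. */
    r.offset = (int64_t) ((start_page_no - file_start_page_no) * page_size);
    r.len = (int64_t) ((size_after_extend - start_page_no) * page_size);
    return r;
}
```

Under these assumptions, a file starting at page 512 in a 640-page tablespace with 16384-byte pages is extended from byte 2097152 of that file, whereas using start_page_no alone would give byte 10485760. For a single-file tablespace file_start_page_no is 0 and the two agree.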

Jan Lindström (jplindst) wrote :

Yes, it is, but you may change that. The idea is to call posix_fallocate(fd, current_size_of_file, size_to_extent). I had a slightly different version of fil0fil.cc when I first fixed this issue.
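
The call shape described above can be sketched outside InnoDB as follows. extend_from_current_size is a hypothetical helper, and taking the current size from fstat is an assumption made for the example; the server tracks file sizes itself. Note that posix_fallocate returns 0 on success and an error number on failure, rather than -1 with errno set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not InnoDB code): grow the file behind `fd` by
   `extend_by` bytes, starting at its current end of file instead of at
   offset 0. Returns 0 on success or an errno-style error number. */
int extend_from_current_size(int fd, off_t extend_by)
{
    struct stat st;

    assert(extend_by > 0); /* posix_fallocate() rejects a non-positive length */

    /* Assumption for this sketch: the current size comes from fstat(). */
    if (fstat(fd, &st) != 0)
        return errno;

    /* Only the range [current size, current size + extend_by) is requested,
       so the filesystem is never asked about already allocated blocks. */
    return posix_fallocate(fd, st.st_size, extend_by);
}
```

After a successful call the file size is the old size plus extend_by, since posix_fallocate grows the file when offset + len exceeds its current size.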

Laurynas Biveinis (laurynas-biveinis) wrote :

Adjusted the title, as Percona Server does not have such a public option.

summary: - Slow file extend when innodb_use_fallocate=1 and SSD file storage.
+ Slow file extend when posix_fallocate used on SSD file storage.
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PS-1482
