Comment 4 for bug 892831

Raghavendra D Prabhu (raghavendra-prabhu) wrote : Re: Re: [Bug 892831] Re: Fallocate support in innodb

Hi Stewart,

* On Tue, Nov 22, 2011 at 12:49:14AM -0000, Stewart Smith <email address hidden> wrote:
>On Mon, 21 Nov 2011 13:43:38 -0000, Raghavendra D Prabhu <email address hidden> wrote:
>> Regarding unwritten extents, I had a doubt regarding that*. However, after
>> discussing with XFS developers, I understood that since unwritten extents became
>> default years ago, the performance impact in converting unwritten extents to
>> written one are negligible now, far outweighed by benefits of fallocate and, of
>> course, better than writing zeroes.
>
>There's still a performance impact of converting them - it's file system
>metadata IO.
>
>>
>> Regarding fallocate, I went with fallocate instead of the posix variant
>> because posix_fallocate silently falls back to the old legacy behavior on
>> unsupported systems, which may not be desirable.
>
>We use posix_fallocate() in NDB because of the portability (IIRC to
>Solaris) and just live with the fact that this may not always be
>optimal.
>
>By preallocating and then writing zeros you get the best of both worlds:
>you tell the allocator that you want huge chunks of disk and you don't
>have the performance impact of unwritten extents.

Thanks, got it. I will add posix_fallocate support to that as well.
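
To make the preallocate-then-zero idea above concrete, here is a minimal C sketch. It is not taken from the patch: the helper name prealloc_and_zero, the 64 KB chunk size and the "return an errno value" convention are all assumptions made for illustration. It reserves the range with posix_fallocate() and then overwrites it with zeroes, so the allocator sees one large request and the extents end up written.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the InnoDB patch): reserve the first `len`
 * bytes of `fd`, then zero-fill them. Returns 0 on success, otherwise an
 * errno value. */
static int prealloc_and_zero(int fd, off_t len)
{
    static const char zeros[64 * 1024]; /* static storage, so all zero */

    assert(len >= 0);
    if (len == 0)
        return 0; /* posix_fallocate() rejects a zero length */

    /* Ask for the whole range up front. posix_fallocate() returns the
     * error number directly instead of setting errno. Where the file
     * system lacks native support, glibc may emulate the call by
     * touching every block, which is the silent fallback discussed
     * above. */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0)
        return err;

    /* Overwrite the reserved range with zeroes in fixed-size chunks. */
    off_t off = 0;
    while (off < len) {
        size_t chunk = sizeof zeros;
        if ((off_t)chunk > len - off)
            chunk = (size_t)(len - off);

        ssize_t n = pwrite(fd, zeros, chunk, off);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted, retry the same chunk */
            return errno;
        }
        if (n == 0)
            return EIO; /* no progress, give up rather than spin */
        off += n;
    }

    /* Flush the data once at the end instead of after every chunk. */
    return fdatasync(fd) == 0 ? 0 : errno;
}
```

A caller would open the file read-write and call prealloc_and_zero(fd, size) once. Syncing a single time at the end, rather than per chunk, is a choice made for this sketch only.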

>
>This is mostly only a benefit when doing parallel operations or direct
>IO on non-empty file systems. IIRC InnoDB does not do parallel init of files.
Currently, InnoDB init of a single file takes a really long time due to repeated
writing/syncing, so a lot of time will be saved there.

Also, init is only one part; the rest deals with the autoextension of shared
tablespace (ibdata) and single-tablespace (ibd) files, which is where many
performance issues lurk. Currently the autoextension code, as I saw it, sits
behind several layers of mutexes and whatnot, based on the assumption that it
is something time-consuming/complex when it shouldn't be.

I have seen people on bugs.mysql asking for a separate thread for it, and also
for fsyncing only during extension (facebook/innodb_io patches) and using
fdatasync otherwise.

I also noticed that ibd files have no auto-extend increment option. They are
extended in small increments up to an extent and, after that, based on the
actual request in short increments (a multiple of the extent size --
FSP_FREE_ADD). Again, this can cause severe fragmentation of the file as a
whole on heavily loaded systems. This is where fallocate can help the most,
IMO. Currently InnoDB doesn't provide a variable to set this size (an
auto-extend increment variable is allowed only for shared ibdata files). So I
have added a variable called innodb_auto_extend_increment_single which, when
non-zero, defines the auto-extend increment for these files. Since fallocate is
O(1) for practical purposes, the size shouldn't matter, to a certain extent.
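
To illustrate the kind of extension I mean, here is a minimal C sketch, not the code from the branch. The helper names round_up_to_increment and extend_by_increment are made up for this example, and the increment is assumed to be in bytes (the real unit of innodb_auto_extend_increment_single may differ). It grows a file in whole increments with the Linux-specific fallocate(2) call and reports failure to the caller instead of silently falling back to writing zeroes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Round `needed` up to the next multiple of `increment`. */
static off_t round_up_to_increment(off_t needed, off_t increment)
{
    assert(increment > 0 && needed >= 0);
    return ((needed + increment - 1) / increment) * increment;
}

/* Hypothetical helper: grow `fd` from `cur_size` so that it covers at least
 * `min_size`, always extending by whole increments. Returns the new size, or
 * -1 with errno set (for example EOPNOTSUPP when the file system has no
 * fallocate support), so the caller can decide what to do next. */
static off_t extend_by_increment(int fd, off_t cur_size, off_t min_size,
                                 off_t increment)
{
    if (min_size <= cur_size)
        return cur_size; /* already large enough, nothing to do */

    off_t grow = round_up_to_increment(min_size - cur_size, increment);

    /* Mode 0 allocates the blocks and extends the file size in one call. */
    if (fallocate(fd, 0, cur_size, grow) != 0)
        return -1;

    return cur_size + grow;
}
```

With a 64 MB increment, for example, a request for one more extent would reserve a full 64 MB in a single call rather than extending a few extents at a time.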
>
>
>--
>Stewart Smith
>
>https://bugs.launchpad.net/bugs/892831
>
>Title:
> Fallocate support in innodb
>
>Status in Percona Server with XtraDB:
> Triaged
>
>Bug description:
> Currently innodb physically writes zeroes to file for --
>
> innodb table space creation (ibdata), log file creation(ib_logfile*),
> innodb single tablespace creation (ibd), extension of table space
> files (both ibdata and ibd)
>
> --- all of which make the process really slow. So I decided to add
> fallocate support to all of the above. Even though benefit should come
> from fast creation of initial files*, most benefit will be visible in
> extension, since it can actively affect the queries and also adds
> overhead with mutexes etc. Fallocate is by far a O(1) operation. I
> have tested it on XFS/ext4 filesystem on my box for small sizes and
> results are really good. But needs to be benchmarked on better
> systems.
>
> The code is here (commits from 3547 to 3550) --
> https://code.launchpad.net/~raghavendra-prabhu/+junk/mysql-server-fallocate
> and is based on latest mysql server tip from here --
> bazaar.launchpad.net/%2Bbranch/mysql-server/ . It needs to be built
> with -DWITH_FALLOCATE=ON to cmake, system should also support it
> (added a feature test for that).
>
> * Earlier, I have seen a case of innodb ibdata file being set to 2-3
> TB and that physical writing of zeroes taking hours even on RAID, so
> on a downtime or fresh boxes adding time significantly.
>
> PS: The only caveat so far is that on old ext4 (<= 2009) systems,
> Direct I/O with fallocate falls back to buffered IO. XFS doesn't have
> any such issues.
>
>
Regards,
--------------------------
Raghavendra D Prabhu (TZ: GMT + 530)
Call: +91 96118 00062
mailto:<email address hidden>
Percona, Inc. - http://www.percona.com / Blog: http://www.mysqlperformanceblog.com/
Skype: percona.raghavendrap
GPG: 0xD72BE977
