xtrabackup 2.0 generates a lot of IO compared to 1.6

Bug #1093385 reported by Rene' Cannao' on 2012-12-24
This bug affects 3 people
Affects             Importance  Assigned to
Percona XtraBackup  High        Sergei Glushchenko
2.0                 High        Sergei Glushchenko
2.1                 High        Sergei Glushchenko
2.2                 High        Sergei Glushchenko

Bug Description

After migrating from XtraBackup 1.6.4 to 2.0.x, we noticed that backup execution time increased by a factor of ~2.5 on every system where it was deployed (regardless of the size of the backups, which ranged from tens to hundreds of GB).

===

How to reproduce:

mkdir /data/rene_tmp
cd /data/rene_tmp
wget http://www.percona.com/redir/downloads/XtraBackup/XtraBackup-1.6.4/Linux/binary/x86_64/xtrabackup-1.6.4.tar.gz
tar -zxf xtrabackup-1.6.4.tar.gz
wget http://www.percona.com/redir/downloads/XtraBackup/XtraBackup-2.0.4/binary/Linux/x86_64/percona-xtrabackup-2.0.4-484.tar.gz
tar -zxf percona-xtrabackup-2.0.4-484.tar.gz

With 1.6.4:
export PATH=/data/rene_tmp/xtrabackup-1.6.4/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/ec2/bin:/opt/iam/bin:/root/bin:/opt/ec2/bin:/opt/iam/bin
xtrabackup-1.6.4/bin/innobackupex-1.5.1 --defaults-file /etc/my.cnf --user=root --password=XXXX --slave-info --stream=tar /data/mysql-tmp 2>/tmp/innobackupex-log > /dev/null

With 2.0.4:
export PATH=/data/rene_tmp/percona-xtrabackup-2.0.4/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/opt/ec2/bin:/opt/iam/bin:/root/bin:/opt/ec2/bin:/opt/iam/bin
percona-xtrabackup-2.0.4/bin/innobackupex-1.5.1 --defaults-file /etc/my.cnf --user=root --password=XXXX --slave-info --stream=tar /data/mysql-tmp 2>/tmp/innobackupex-log > /dev/null

===

Attached are the outputs of "vmstat 10" and "iostat -k -d -x 10 /dev/sd[fihg]" for XtraBackup-1.6.4 and XtraBackup-2.0.4.

Note: all the instances are EC2 instances with EBS volumes configured as RAID10. We noticed the same behavior with 4 or 8 EBS volumes, and with chunk sizes of 64kB or 256kB.

Related branches

lp:~sergei.glushchenko/percona-xtrabackup/2.0-xb-bug1093385
Alexey Kopytov (community): Approve on 2014-03-01
lp:~sergei.glushchenko/percona-xtrabackup/2.1-xb-bug1093385
Alexey Kopytov (community): Approve on 2014-03-01
lp:~sergei.glushchenko/percona-xtrabackup/2.2-xb-bug1093385
Alexey Kopytov (community): Approve on 2014-03-01
Rene' Cannao' (rene-cannao) wrote :
Alexey Kopytov (akopytov) wrote :

Might be related to https://bugs.launchpad.net/percona-xtrabackup/+bug/1095249/comments/1

In 1.6, XtraBackup uses posix_fadvise() incorrectly (as one-time calls before reading the file), which makes the hints useless. So it effectively does cached reads.

In 2.0, posix_fadvise() hints are used after reading every block. But, since no ranges are specified, FADV_DONTNEED also discards read-ahead data.

Also, POSIX_FADV_NOREUSE exists specifically for backup software.

I can confirm that using POSIX_FADV_NOREUSE with no range specified
generates more IO compared with the case when range is specified.

On my system with datafile ~886M Xtrabackup 2.0 reads ~980M, but with
ranges specified to only discard what we just read, this number becomes
~886M.
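
The difference between the two call styles can be sketched as follows. This is a minimal illustration in Python, not XtraBackup's actual code: the function name copy_out, the ranged flag, and the 1 MiB block size are all assumptions made for the example. os.posix_fadvise(fd, offset, len, advice) maps directly onto the C call, and a length of 0 means "to the end of the file".

```python
import os

BLOCK = 1024 * 1024  # read size per iteration; an arbitrary choice for this sketch


def copy_out(path, sink, ranged=True):
    """Read `path` sequentially into `sink`, dropping page cache as we go.

    ranged=True  : discard only the block that was just read.
    ranged=False : discard the whole file (offset 0, length 0), which can
                   also throw away pages the kernel has already read ahead.
    Returns the number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        # Tell the kernel the access pattern is sequential.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        while True:
            buf = os.read(fd, BLOCK)
            if not buf:
                break
            sink.write(buf)
            if ranged:
                # `total` is still the offset where this block started.
                os.posix_fadvise(fd, total, len(buf), os.POSIX_FADV_DONTNEED)
            else:
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            total += len(buf)
    finally:
        os.close(fd)
    return total
```

Both modes return the same data; any difference would only show up in the device-level read volume (for example in iostat), and how large it is depends on the kernel and its read-ahead settings.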

In the path of directio, there should be no fadvise/madvise at
all since these are purely the directives associated with linux
VM.

a) On a file opened with O_DIRECT, POSIX_FADV_SEQUENTIAL should
be a no-op but again, it depends on FS/kernel, so undefined, and
shouldn't be done

b) POSIX_FADV_DONTNEED again shouldn't be used with O_DIRECT.

Out of a) and b), b) is actually bad because a) operates on file
level, where b) works on mapping level. So, in places you may end
up flushing the data InnoDB is using (if mysqld is not using
O_DIRECT). Also, a) operates as directive (it doubles read-ahead
window but doesn't do anything till anyone reads the file) whereas b) actually
acts (though defined as a directive) immediately.

POSIX_FADV_NOREUSE actually does nothing: http://lxr.linux.no/#linux+v3.13.5/mm/fadvise.c#L113 (though without POSIX_FADV_NORMAL it actually means no/low readahead)

Dropping caches for InnoDB data shouldn't really do any harm to MySQL, since InnoDB has its buffer pool and doesn't rely on the system cache. Regarding POSIX_FADV_DONTNEED there was https://lkml.org/lkml/2011/6/23/35, didn't it make it into the kernel?

It won't do harm if InnoDB is doing O_DIRECT-only I/O, but if innodb_flush_method is set otherwise, it can.

Also, if/when PXB is running in O_DIRECT mode, it will not, unlike rsync, dirty any pages in the page cache, so that shouldn't apply here (this covers log file pages too with ALL_O_DIRECT, otherwise only data pages).

I also ran a quick test
===============================================

dd if=/dev/urandom of=file1 bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 19.8979 s, 10.5 MB/s
dd if=/dev/urandom of=file1 bs=1M count=200 0.00s user 19.86s system 99% cpu 19.928 total

linux-fadvise file1 POSIX_FADV_SEQUENTIAL
Going to fadvise file1 as mode POSIX_FADV_SEQUENTIAL
offset: 0
length: 209715200
mode: POSIX_FADV_SEQUENTIAL
WIN

linux-fincore -s file1
filename       size  total_pages  min_cached page  cached_pages  cached_size  cached_perc
--------  ---------  -----------  ---------------  ------------  -----------  -----------
file1     209715200        51200               -1             0            0         0.00
---
total cached size: 0

cat file1 >/dev/null

linux-fincore -s file1
filename       size  total_pages  min_cached page  cached_pages  cached_size  cached_perc
--------  ---------  -----------  ---------------  ------------  -----------  -----------
file1     209715200        51200                0         51200    209715200       100.00
---

linux-fadvise file1 POSIX_FADV_DONTNEED
Going to fadvise file1 as mode POSIX_FADV_DONTNEED
offset: 0
length: 209715200
mode: POSIX_FADV_DONTNEED
WIN

linux-fincore -s file1
filename       size  total_pages  min_cached page  cached_pages  cached_size  cached_perc
--------  ---------  -----------  ---------------  ------------  -----------  -----------
file1     209715200        51200               -1             0            0         0.00
---
total cached size: 0
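
For anyone without the linux-fadvise and linux-fincore tools, the same sequence can be approximated with a short Python sketch. It is only an illustration under several assumptions: the names cached_pages and quick_test are made up here, residency is measured by calling mmap(2) and mincore(2) through ctypes (Python has no stdlib wrapper for mincore), and the off_t argument is declared as a C long, which holds on typical 64-bit Linux.

```python
import ctypes
import mmap
import os

# Bind the libc calls we need; signatures follow the mmap(2)/mincore(2) man pages.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]


def cached_pages(path):
    """Return (pages resident in the page cache, total pages) for `path`."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0
    total = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        addr = _libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr in (None, ctypes.c_void_p(-1).value):  # NULL or MAP_FAILED
            raise OSError(ctypes.get_errno(), "mmap failed")
        try:
            vec = (ctypes.c_ubyte * total)()
            if _libc.mincore(addr, size, vec) != 0:
                raise OSError(ctypes.get_errno(), "mincore failed")
            # The low bit of each byte says whether that page is resident.
            return sum(b & 1 for b in vec), total
        finally:
            _libc.munmap(addr, size)
    finally:
        os.close(fd)


def quick_test(path):
    """Hint SEQUENTIAL, read the file, then hint DONTNEED; count pages at each step."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        before, total = cached_pages(path)
        while os.read(fd, 1 << 20):
            pass
        after_read, _ = cached_pages(path)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        after_drop, _ = cached_pages(path)
    finally:
        os.close(fd)
    return before, after_read, after_drop, total
```

The counts will not necessarily match the output above: dirty pages that have not been written back yet are not dropped by POSIX_FADV_DONTNEED, and memory-backed filesystems such as tmpfs keep everything resident.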

Well, I don't see how POSIX_FADV_DONTNEED can harm InnoDB.
The OS file cache is redundant for InnoDB. In fact, a page that exists in the OS cache will likely exist in the buffer pool too. And if a page is absent from the buffer pool, it will likely be absent from the OS cache as well. Redo logs are write-only unless we do recovery. There is also read-ahead in InnoDB.

I believe that POSIX_FADV_DONTNEED should work just fine for XtraBackup, for both data and log files, as long as we do not discard data cached by read-ahead.

Alexey Kopytov (akopytov) wrote :

Raghu,

On Mon, Feb 24 2014 18:13:44 +0400, Raghavendra D. Prabhu wrote:

> In the path of directio, there should be no fadvise/madvise at
> all since these are purely the directives associated with linux
> VM.
>
> a) On a file opened with O_DIRECT, POSIX_FADV_SEQUENTIAL should
> be a no-op but again, it depends on FS/kernel, so undefined, and
> shouldn't be done
>
> b) POSIX_FADV_DONTNEED again shouldn't be used with O_DIRECT.
>

Those hints are defined in a rather generic way by both POSIX and the
Linux man page:

POSIX_FADV_SEQUENTIAL
    The application expects to access the specified data sequentially
    (with lower offsets read before higher ones).
POSIX_FADV_DONTNEED
    The specified data will not be accessed in the near future.

It neither implies nor states explicitly that those hints are
incompatible with O_DIRECT. It is meant to be a way for applications to
explain how data is going to be accessed, so the kernel could
implement optimizations where applicable. If a specific implementation
is, say, incompatible with O_DIRECT, it must also be responsible for
handling it properly.

>
> Out of a) and b), b) is actually bad because a) operates on file
> level, where b) works on mapping level. So, in places you may end
> up flushing the data InnoDB is using (if mysqld is not using
> O_DIRECT). Also, a) operates as directive (it doubles read-ahead

We always access InnoDB files in the same mode as the server. So if
neither server nor XB use O_DIRECT, POSIX_FADV_DONTNEED may (again, up
to specific implementation) indeed stomp on server’s caches. I don’t see
a way around it, but as Sergei pointed out, there shouldn’t be much
impact on InnoDB either. If both the server and XB use O_DIRECT, then I
don’t see how using POSIX_FADV_DONTNEED may be bad.
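
As a rough illustration of "the same mode as the server", the open flags for a backup reader could be derived from innodb_flush_method like this. It is a hypothetical sketch, not XtraBackup's implementation: the function name backup_open_flags is invented, and only the two flush methods named in this thread (O_DIRECT and ALL_O_DIRECT) are treated as direct I/O.

```python
import os

# Flush methods treated as "direct I/O" here. Only the values mentioned in
# this thread are listed; that is an assumption of the sketch.
DIRECT_METHODS = {"O_DIRECT", "ALL_O_DIRECT"}


def backup_open_flags(innodb_flush_method):
    """Return open(2) flags that mirror the server's I/O mode for data files."""
    flags = os.O_RDONLY
    if (innodb_flush_method or "").upper() in DIRECT_METHODS:
        # os.O_DIRECT exists only on platforms that support it (e.g. Linux).
        flags |= getattr(os, "O_DIRECT", 0)
    return flags
```

Note that actually reading through a descriptor opened with O_DIRECT needs suitably aligned buffers and offsets, which this sketch does not cover.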

> window but doesn't do anything till anyone reads the file) whereas b) actually
> acts (though defined as a directive) immediately.
>
> POSIX_FADV_NOREUSE actually does nothing:
> http://lxr.linux.no/#linux+v3.13.5/mm/fadvise.c#L113 (though without
> POSIX_FADV_NORMAL it actually means no/low readahead)

We could still implement it, as that would provide more info to the
kernel, whether it uses it or not (yet). But OK, thanks for the heads up.

>>We always access InnoDB files in the same mode as the server. So if
>>neither server nor XB use O_DIRECT, POSIX_FADV_DONTNEED may (again, up
>>to specific implementation) indeed stomp on server’s caches. I don’t see
>>a way around it, but as Sergei pointed out, there shouldn’t be much
>>impact on InnoDB either. If both the server and XB use O_DIRECT, then I
>>don’t see how using POSIX_FADV_DONTNEED may be bad.

Yes, if the mode of access is guaranteed to be the same as the
server's, then it should be fine, unless someone uses O_DIRECT with
PXB and non-O_DIRECT with InnoDB (or something like ALL_O_DIRECT
with one but not with the other). And yes, using fadvise when doing
buffered I/O is quite desirable to reduce dirty page pressure.

> If both the server and XB use O_DIRECT, then I
> don’t see how using POSIX_FADV_DONTNEED may be bad.

Shouldn't be bad (since invalidation of the mapping is not a
function of file size).
