MySQL galera SST may fail innobackupex because of a race: /var/lib/mysql/ib_logfile0 failed: Destination file exists

Bug #1576073 reported by Bogdan Dobrelya
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Fuel for OpenStack
Fix Committed
High
Bogdan Dobrelya
Mitaka
Fix Committed
High
Bogdan Dobrelya

Bug Description

The bug is similar to the https://bugs.launchpad.net/fuel/+bug/1574999, although this one may happen a step down the road.
When an SST finished streaming data to a joiner, the "innobackupex--move-back" command may randomly fail because of a race /var/lib/mysql/ib_logfile0 failed: Destination file exists , with the following log /var/log/mysql/error.log:

WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20160427 14:21:09.268)
2016-04-27 14:21:21 22037 [Note] WSREP: 0.0 (n1): State transfer to 3.0 (n3) complete.
2016-04-27 14:21:21 22037 [Note] WSREP: Member 0.0 (n1) synced with group.
2016-04-27 14:21:21 22037 [Note] WSREP: 1.0 (n5): State transfer to 2.0 (n4) complete.
2016-04-27 14:21:21 22037 [Note] WSREP: Member 1.0 (n5) synced with group.
WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20160427 14:21:27.817)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20160427 14:21:27.819)
WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20160427 14:21:33.597)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/mysql/my.cnf --no-version-check --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20160427 14:21:33.598)
WSREP_SST: [ERROR] Cleanup after exit with status:1 (20160427 14:21:33.602)
2016-04-27 14:21:33 22037 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.10.10.4' --auth 'root:root' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '22037' '' : 1 (Operation not permitted)
2016-04-27 14:21:33 22037 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2016-04-27 14:21:33 22037 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2016-04-27 14:21:33 22037 [ERROR] Aborting

And here is the innobackup log with straces:
execve("/usrbin/innobackupex", ["innobackupex", "--defaults-file=/etc/mysql/my.cn"..., "--no-version-check", "--move-back", "--force-non-empty-directories", "/var/lib/mysql.sst"], [/
* 55 vars */]) = 0
(... snip ...)
write(2, "160428 07:43:06 innobackupex: St"..., 236160428 07:43:06 innobackupex: Starting the move-back operation

IMPORTANT: Please check that the move-back run completes successfully.
           At the end of a successful move-back run innobackupex
           prints "completed OK!".

) = 236
write(2, "innobackupex version 2.3.4 based"..., 94innobackupex version 2.3.4 based on MySQL server 5.6.24 Linux (x86_64) (revision id: e80c779)
) = 94
stat("/var/lib/mysql", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/var/lib/mysql", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
close(3) = 0
stat("/var/lib/mysql", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/var/lib/mysql", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
close(3) = 0
chdir("/var/lib/mysql//.sst") = 0
brk(0x25a5000) = 0x25a5000
mkdir("/var/lib/mysql", 0700) = -1 EEXIST (File exists)
stat("ib_logfile0", {st_mode=S_IFREG|0660, st_size=83886080, ...}) = 0
stat("/var/lib/mysql/", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/var/lib/mysql/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
close(3) = 0
stat("/var/lib/mysql/ib_logfile0", {st_mode=S_IFREG|0660, st_size=83886080, ...}) = 0
write(2, "Error: Move file ib_logfile0 to "..., 91Error: Move file ib_logfile0 to /var/lib/mysql/ib_logfile0 failed: Destination file exists
) = 91
exit_group(1) = ?
+++ exited with 1 +++

____
Note, to get strace debugs, I applied the patch to the
--- a/wsrep_sst_xtrabackup-v2
+++ b/wsrep_sst_xtrabackup-v2
@@ -578,8 +578,8 @@ fi

 INNOEXTRA=""
 INNOAPPLY="${INNOBACKUPEX_BIN} $disver $iapts --apply-log \$rebuildcmd \${DATA} &>\${DATA}/innobackup.prepare.log"
-INNOMOVE="${INNOBACKUPEX_BIN} --defaults-file=${WSREP_SST_OPT_CONF} $disver $impts --move-back --force-non-empty-directories \${DATA} &>\${DATA}/innobackup.move.log"
-INNOBACKUP="${INNOBACKUPEX_BIN} --defaults-file=${WSREP_SST_OPT_CONF} $disver $iopts \$tmpopts \$INNOEXTRA --galera-info --stream=\$sfmt \$itmpdir 2>\${DATA}/innobackup.backup.log"
+INNOMOVE="strace -f ${INNOBACKUPEX_BIN} --defaults-file=${WSREP_SST_OPT_CONF} $disver $impts --move-back --force-non-empty-directories \${DATA} &>\${DATA}/innobackup.move.log"
+INNOBACKUP="strace -f ${INNOBACKUPEX_BIN} --defaults-file=${WSREP_SST_OPT_CONF} $disver $iopts \$tmpopts \$INNOEXTRA --galera-info --stream=\$sfmt \$itmpdir 2>\${DATA}/innobackup.backup.log"

 if [ "$WSREP_SST_OPT_ROLE" = "donor" ]
 then

Packages used:
percona-xtradb-cluster-server-5.6 5.6.21-25.8-0ubuntu2, percona-xtrabackup 2.2.3-2.1build1 / 2.3.4-1.wily

Changed in fuel:
importance: Undecided → High
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
description: updated
tags: added: area-library galera
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Nope, not fixed ;(

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Perhaps we should drop xtrabackup-v2 and use rsync or something?..

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/310776

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/310776

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

The files /var/lib/mysql/ib_logfile* are created by recurring ocf ra monitor actions. Those invoke the /usr/bin/mysqld_safe --wsrep-recover, which creates the files. And that makes a race to SST's in progress making it to fail.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/311108

Changed in fuel:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/310776
Reason: wrong patch... it's not needed

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/311108
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=9442e58b945c073342e1f1410c09577df56c6e1f
Submitter: Jenkins
Branch: master

commit 9442e58b945c073342e1f1410c09577df56c6e1f
Author: Bogdan Dobrelya <email address hidden>
Date: Fri Apr 29 14:58:35 2016 +0200

    MySQL OCF RA monitor to not race with SST in progress

    When an SST's in progress, xtrabackup-v2 removes /var/lib/mysql/
    ibdata1 and ib_logfiles. That is must be done for the innobackupex to
    finish its job.
    When action monitor invokes the update_node_gtid(), which calls the
    /usr/bin/mysqld_safe --wsrep-recover, it creates those files back.
    This may race with active SST and fails it.

    Fix the check_if_sst() to verify both the sst_in_progress file presense AND
    the ibdata1 file absence. Do not invoke update_node_gtid when there is an
    active SST found.

    NOTE: the SST check may not rely only on the sst_in_progress file. When
    mysqld is starting, it creates its pidfile few moments later. But there's
    likely an SST's in progress already. This makes the OCF RA thinking that
    MySQL isn't running, thus removing the sst_in_progress file as
    a stale artifact. So it cannot reliably distinguish either mysqld has died
    in the middle of SST or it is yet to be started and is syncing. So, the
    ibdata1 file makes the check more reliable.

    Closes-bug: #1576073

    Change-Id: I427c04860203441f8f2f0d0b54375ec4ff2b3c3e
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/311687

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/311687
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=05f20dca018511861018afa84dd8d3873a2e2600
Submitter: Jenkins
Branch: master

commit 05f20dca018511861018afa84dd8d3873a2e2600
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 2 10:20:17 2016 +0200

    Fix typo

    Related-bug: #1576073

    Change-Id: I4650a9072ab5e2e9568b73b9a003b357fb938a6e
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/311763

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: master
Review: https://review.openstack.org/311763
Reason: doesn't fix the race condition

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/312415

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/312424

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/312415
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=472a213ca9bfec1e43456a9f19bdbc84c9307715
Submitter: Jenkins
Branch: stable/mitaka

commit 472a213ca9bfec1e43456a9f19bdbc84c9307715
Author: Bogdan Dobrelya <email address hidden>
Date: Fri Apr 29 14:58:35 2016 +0200

    MySQL OCF RA monitor to not race with SST in progress

    When an SST's in progress, xtrabackup-v2 removes /var/lib/mysql/
    ibdata1 and ib_logfiles. That is must be done for the innobackupex to
    finish its job.
    When action monitor invokes the update_node_gtid(), which calls the
    /usr/bin/mysqld_safe --wsrep-recover, it creates those files back.
    This may race with active SST and fails it.

    Fix the check_if_sst() to verify both the sst_in_progress file presense AND
    the ibdata1 file absence. Do not invoke update_node_gtid when there is an
    active SST found.

    NOTE: the SST check may not rely only on the sst_in_progress file. When
    mysqld is starting, it creates its pidfile few moments later. But there's
    likely an SST's in progress already. This makes the OCF RA thinking that
    MySQL isn't running, thus removing the sst_in_progress file as
    a stale artifact. So it cannot reliably distinguish either mysqld has died
    in the middle of SST or it is yet to be started and is syncing. So, the
    ibdata1 file makes the check more reliable.

    Closes-bug: #1576073

    Change-Id: I427c04860203441f8f2f0d0b54375ec4ff2b3c3e
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 9442e58b945c073342e1f1410c09577df56c6e1f)

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/312424
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=9ab9264b8e6d5382728f752ac11cc49fbb159002
Submitter: Jenkins
Branch: stable/mitaka

commit 9ab9264b8e6d5382728f752ac11cc49fbb159002
Author: Bogdan Dobrelya <email address hidden>
Date: Mon May 2 10:20:17 2016 +0200

    Fix typo

    Related-bug: #1576073

    Change-Id: I4650a9072ab5e2e9568b73b9a003b357fb938a6e
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 05f20dca018511861018afa84dd8d3873a2e2600)

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/newton
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/315989

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/316802

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/317978

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/317978

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/316802

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/315989

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.