[regression] MySQL may fail to SST because of a ./ib* files race condition

Bug #1574999 reported by Bogdan Dobrelya
This bug affects 11 people
Affects             Status         Importance  Assigned to           Milestone
Fuel for OpenStack  Fix Committed  High        Bogdan Dobrelya
6.1.x               Won't Fix      High        MOS Maintenance
7.0.x               Won't Fix      High        MOS Maintenance
8.0.x               Won't Fix      High        MOS Maintenance
Mitaka              Fix Released   High        Volodymyr Shypyguzov

Bug Description

When a Percona MySQL 5.6 node is started by the OCF RA, it may fail to finish an xtrabackup-v2 SST because of a race condition in accessing and removing the ./ib* files. An example log snippet:

2016-04-26 07:21:57 5036 [Note] WSREP: Requesting state transfer: success, donor: 3
WSREP_SST: [INFO] Proceeding with SST (20160426 07:21:58.143)
WSREP_SST: [INFO] Cleaning the existing datadir (20160426 07:21:58.144)
removed '/var/lib/mysql/ib_logfile0'
removed '/var/lib/mysql/ib_logfile1'
removed '/var/lib/mysql/ibdata1'
removed '/var/lib/mysql/backup-my.cnf'
removed '/var/lib/mysql/auto.cnf'
WSREP_SST: [INFO] Cleaning the binlog directory /var/log/mysql as well (20160426 07:21:58.178)
WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20160426 07:21:58.183)
2016-04-26 07:22:00 5036 [Note] WSREP: (8efee881, 'tcp://10.10.10.4:4567') turning message relay requesting off
xbstream: Can't create/write to file './ibdata1' (Errcode: 17 - File exists)

Packages used: percona-xtradb-cluster-server-5.6 5.6.21-25.8-0ubuntu2,
percona-xtrabackup 2.2.3-2.1build1 / 2.3.4-1.wily

How to reproduce:
* deploy an env
* unmanage the mysql clone resource
* pick a node and kill -9 mysqld
* Run mysqld binary for the node manually:
/usr/sbin/mysqld --verbose --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --open-files-limit=102400 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
* issue kill -STOP `pgrep xbstream` a couple of times
* touch /var/lib/mysql/ibdata1
* kill -CONT `pgrep xbstream`
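
The "File exists" failure from the log can be illustrated outside MySQL. This is only a minimal sketch, assuming xbstream creates each extracted file exclusively (Errcode 17 is EEXIST, which is what an exclusive create returns when the path is already there). exclusive_create is a hypothetical helper and is not part of any SST script:

```shell
# Hypothetical helper: create a file only if it does not exist yet.
# With noclobber (set -C), '>' refuses to overwrite an existing regular file.
exclusive_create() {
    # Run in a subshell so that noclobber does not leak into the caller.
    ( set -C; : > "$1" ) 2>/dev/null
}

demo_dir=$(mktemp -d)

# 1. The datadir has just been cleaned, so the first create succeeds.
exclusive_create "$demo_dir/ibdata1" && first=created

# 2. If anything recreates the file in between (the "touch" step above),
#    the next exclusive create of the same path fails.
exclusive_create "$demo_dir/ibdata1" || second=exists

echo "first=$first second=$second"
rm -rf "$demo_dir"
```

The second call stands in for xbstream resuming after kill -CONT and finding ibdata1 already present.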

This happens because the xtrabackup-v2 SST script does not manage the SST-time /var/lib/mysql/.sst dir as appropriate if innodb-data-home-dir is not set in the MySQL config. Once it is configured, the ibdata1 file goes into that .sst dir and the race disappears.
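
A quick way to tell whether a node is exposed is to check its config for that option. A minimal sketch, assuming a single flat config file (it does not follow !include directives or look at option groups); has_innodb_data_home_dir is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the given MySQL config file sets
# innodb-data-home-dir. MySQL accepts dashes and underscores interchangeably
# in option names, so both spellings are matched. Commented-out lines are
# ignored because the pattern is anchored at the start of the line.
has_innodb_data_home_dir() {
    grep -Eq '^[[:space:]]*innodb[-_]data[-_]home[-_]dir[[:space:]]*=' "$1"
}
```

For example, has_innodb_data_home_dir /etc/mysql/my.cnf || echo "innodb-data-home-dir is not set" (the config path here is an assumption and varies between deployments).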

Another race condition is reported in the duplicated bug https://bugs.launchpad.net/fuel/+bug/1576073

Changed in fuel:
importance: Undecided → High
milestone: none → 10.0
assignee: nobody → Fuel Library Team (fuel-library)
tags: added: area-library galera
description: updated
description: updated
Changed in fuel:
status: New → Confirmed
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Bogdan Dobrelya (bogdando)
status: Confirmed → In Progress
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/310524

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/310524
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=3061ef0b584f9e43bc4f6139cf798558b77392ee
Submitter: Jenkins
Branch: master

commit 3061ef0b584f9e43bc4f6139cf798558b77392ee
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Apr 27 17:44:35 2016 +0200

    Fix a race for the xtrabackup-v2 SST

    Set innodb-data-home-dir to make the xtrabackup-v2
    script to manage the /var/lib/mysql/.sst dir as appropriate.
    This leaves the race condition with the ./ibdata1 file behind.

    Closes-bug:#1574999

    Change-Id: I9c9c979264cf592cd9411a9ed5d2a26bd090421e
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/310684

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/310684
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=db6d56e60e18ea717db85c37a80dafcf9fbb6194
Submitter: Jenkins
Branch: stable/mitaka

commit db6d56e60e18ea717db85c37a80dafcf9fbb6194
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Apr 27 17:44:35 2016 +0200

    Fix a race for the xtrabackup-v2 SST

    Set innodb-data-home-dir to make the xtrabackup-v2
    script to manage the /var/lib/mysql/.sst dir as appropriate.
    This leaves the race condition with the ./ibdata1 file behind.

    Closes-bug:#1574999

    Change-Id: I9c9c979264cf592cd9411a9ed5d2a26bd090421e
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 3061ef0b584f9e43bc4f6139cf798558b77392ee)

Bogdan Dobrelya (bogdando) wrote : Re: MySQL may fail to SST because of a ./ibdata1 race condition

The fix was not complete; here is a continuation of the story: http://pastebin.com/WBE7ZjPB

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/312471

Changed in fuel:
status: Confirmed → In Progress
summary: - MySQL may fail to SST because of a ./ibdata1 race condition
+ MySQL may fail to SST because of a ./ib* files race condition
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/312835

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/312471
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=590a3f13f3fee83ae971e4c5d69d33a6487196d1
Submitter: Jenkins
Branch: master

commit 590a3f13f3fee83ae971e4c5d69d33a6487196d1
Author: Bogdan Dobrelya <email address hidden>
Date: Wed May 4 14:04:34 2016 +0200

    For OCF status, match mysqld by process id

    When started and doing SST, pidfile is not created immediately.
    Fix race conditions by making action status to search by
    the pid as well. Add a dummy_test to status check, which does
    select 1.

    Co-authored-by: Sergii Golovatiuk <email address hidden>
    Closes-bug: #1574999

    Change-Id: I86fb5433d100d2ea675b259a963a0e84268fa095
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
Bogdan Dobrelya (bogdando) wrote : Re: MySQL may fail to SST because of a ./ib* files race condition

The race condition's still in place:
May 5 06:28:40 n2 ocf-mysql-wss: INFO: p_mysql: mysql_start(): Starting MySQL
May 5 06:29:10 n2 ocf-mysql-wss: INFO: p_mysql: check_if_sst(): SST is in progress
May 5 06:29:10 n2 ocf-mysql-wss: INFO: p_mysql: mysql_start(): MySQL started
May 5 06:29:16 n2 ocf-mysql-wss: ERROR: p_mysql: mysql_status(): MySQL is not running
May 5 06:29:16 n2 ocf-mysql-wss: INFO: p_mysql: check_if_sst(): SST is in progress
May 5 06:29:16 n2 ocf-mysql-wss: WARNING: p_mysql: mysql_monitor(): found and purged a stale sst_in_progress file
May 5 06:29:16 n2 crmd[155]: notice: process_lrm_event: Operation p_mysql_monitor_60000: not running (node=n2, call=390, rc=7, cib-update=1943, confirmed=false)
May 5 06:29:33 n2 ocf-mysql-wss: INFO: p_mysql: mysql_start(): Starting MySQL
(a race with the SST in progress starts here)

Bogdan Dobrelya (bogdando) wrote :

The root cause is a non-matching awk expression:
+ mysql_monitor
++ ps -C mysqld -o pid= -o args=
++ awk -v v=/var/lib/mysql '/datadir=/ { if ($1 ~ !/wsrep-recover/) print $1}'
+ pid=
+ '[' '' ']'

I'd be better off using more pipes.

Bogdan Dobrelya (bogdando) wrote :

Hm, it does match (although it ignores the datadir variable), but it doesn't exclude the "wsrep_recover" case:
# ps -C mysqld -o pid= -o args=|grep -v defu; bash -xx test
10255 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep-recover --log-error=/dev/stdout.err --open-files-limit=102400 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_recover --log_error=/var/lib/mysql/wsrep_recovery.jHz8d2 --pid-file=/var/lib/mysql/n2-recover.pid
+ OCF_RESKEY_datadir=/var/lib/mysql
++ ps -C mysqld -o pid= -o args=
++ awk -v v=/var/lib/mysql '/datadir=/ { if ($1 ~ !/wsrep-recover/) print $1}'
+ pid=10255
+ echo 10255
10255
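
As far as I can tell, both traces follow from awk precedence: unary ! binds tighter than ~, so $1 ~ !/wsrep-recover/ first evaluates !/wsrep-recover/ against the whole line (giving 0 or 1) and then matches the pid against that digit as a regex. That is why pid 10255 (which contains a 0) is printed even though its command line has --wsrep-recover. A minimal sketch of the "more pipes" approach, mirroring the grep and awk stages that show up in the later RA debug trace; match_mysqld_pid is a hypothetical name and reads ps-style lines from stdin so it can be tried without a running mysqld:

```shell
# Hypothetical helper.
# $1:    datadir to match
# stdin: lines in the format of `ps -C mysqld -o pid= -o args=`
# Prints the pid of every mysqld started with that datadir, skipping the
# short-lived recovery process (--wsrep-recover or --wsrep_recover).
match_mysqld_pid() {
    # grep -F treats the datadir as a fixed string; '--' ends grep's own
    # options so that the pattern may start with dashes.
    grep -F -- "--datadir=$1" | awk '!/wsrep.recover/ { print $1 }'
}
```

Usage would be along the lines of ps -C mysqld -o pid= -o args= | match_mysqld_pid /var/lib/mysql. Note that a fixed-string match on the datadir would also accept a longer path sharing the same prefix.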

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/312839

Bogdan Dobrelya (bogdando) wrote : Re: MySQL may fail to SST because of a ./ib* files race condition

Getting better, but the "select 1" check doesn't work while an xtrabackup-v2 SST is in progress, so the recent patches still do not fix the race ...

mysql log:
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20160505 07:18:39.076)
160505 7:18:41 [Note] WSREP: (962a755b, 'tcp://10.10.10.5:4567') turning message relay requesting off
160505 7:18:51 [Note] WSREP: 0.0 (n2): State transfer to 3.0 (n4) complete.
160505 7:18:51 [Note] WSREP: Member 0.0 (n2) synced with group.
WSREP_SST: [INFO] NOTE: Joiner-Recv-SST took 17 seconds (20160505 07:18:56.743)
WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20160505 07:18:56.751)

OCF RA debug:
+ echo 'ocf-mysql-wss: 2016/05/05_07:18:42' 'ERROR: p_mysql: mysql_status(): PIDFile /var/run/mysqld/mysqld.pid of MySQL server not found. Sleeping for 2 seconds. 1 retries left'
+ sleep 2
+ '[' 1 -gt 0 ']'
+ '[' -f /var/run/mysqld/mysqld.pid ']'
++ ps -C mysqld -o pid= -o args=
++ grep /var/lib/mysql
++ awk '!/wsrep.recover/ {print $1}'
+ pid=13258
+ '[' 13258 ']'
+ dummy_test
+ /usr/bin/mysql -S /var/run/mysqld/mysqld.sock --connect_timeout=10 --user=root --password=root -s -N -e 'select 1;'
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (111 "Connection refused")

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/mitaka)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/312835
Reason: too early to backport, let's fix all of the issues discovered in the master first

Timur Nurlygayanov (tnurlygayanov) wrote :

This is a regression that now reproduces in 100% of cases after the "v2 tasks based deploy" feature.

summary: - MySQL may fail to SST because of a ./ib* files race condition
+ [regression] MySQL may fail to SST because of a ./ib* files race
+ condition
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/312911

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/312839
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=43fb2ba6fb59c21542ce21204262b72672a6ba70
Submitter: Jenkins
Branch: master

commit 43fb2ba6fb59c21542ce21204262b72672a6ba70
Author: Bogdan Dobrelya <email address hidden>
Date: Thu May 5 09:02:41 2016 +0200

    Fix mysqld process matching

    Make ps -C mysqld to match the datadir and exclude
    wsrep_recover/wsrep-recover as well

    Closes-bug: #1574999

    Change-Id: I133b0c4ea3000aa896ae412591cfd5882016bcf9
    Signed-off-by: Bogdan Dobrelya <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/313273

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/312911
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=03991059adcc99989431c27f38c5ffa4ae35b486
Submitter: Jenkins
Branch: master

commit 03991059adcc99989431c27f38c5ffa4ae35b486
Author: Bogdan Dobrelya <email address hidden>
Date: Thu May 5 13:43:24 2016 +0200

    Rework SST check, fix possible masters search

    * Fix racing of monitoring with SST
    * Fix printf multilines sorting
      Expected: printf -- '%s\n' ${a} | sort -u (returns a sorted multiline)
      Actual: printf -- '%s\n' "$a" | sort -u (returns a single string)
    * Fix possible masters search, by the greatest SEQNO found for a
      majority UUID
    (Those blocks each other in CI and must be fixed at once)

    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779

    Change-Id: I3d0d376e6bef3ccc3e738731b71f4dd60a59e653
    Signed-off-by: Bogdan Dobrelya <email address hidden>
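
The printf item in this commit message comes down to shell word splitting. A minimal sketch with made-up node names (the variable contents are an assumption for illustration only): unquoted, ${a} is split into one argument per word and printf emits one line per argument, so sort -u can sort and de-duplicate them; quoted, "$a" stays a single argument and sort sees a single line.

```shell
# Illustrative value only: a space-separated list with a duplicate.
a='n3 n1 n3 n2'

# Unquoted: word splitting yields four arguments, one output line each,
# so sort -u returns a sorted, de-duplicated multiline result.
unquoted=$(printf -- '%s\n' ${a} | sort -u)

# Quoted: one argument, one line; there is nothing for sort -u to reorder.
quoted=$(printf -- '%s\n' "$a" | sort -u)

echo "unquoted:"; echo "$unquoted"
echo "quoted:";   echo "$quoted"
```

The unquoted form relies on the default IFS and on the values containing no glob characters.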

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/313603

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/313674

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/313603
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=dcde07f439b6f5a597c8fba1748fbebc151fe4a8
Submitter: Jenkins
Branch: master

commit dcde07f439b6f5a597c8fba1748fbebc151fe4a8
Author: Bogdan Dobrelya <email address hidden>
Date: Fri May 6 17:30:21 2016 +0200

    Fix OCF MySQL monitor

    Return success if SST detected, otherwise check status
    and return error if it is not OK.

    Closes-bug: #1574999

    Change-Id: I5ee7807821ae1f21bcb3c74e15338acb8bb91ea1
    Signed-off-by: Bogdan Dobrelya <email address hidden>
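
The monitor flow described in this commit message can be sketched roughly as follows. This is not the actual OCF RA code: monitor_sketch is a hypothetical name, the marker path (sst_in_progress under the datadir) is an assumption based on the file name seen in the RA log above, and the status check is passed in as a command so the sketch can be run without MySQL. The stale-marker handling of the real RA is left out.

```shell
OCF_SUCCESS=0

# Hypothetical sketch.
# $1:   datadir
# rest: the status check command to run when no SST is detected
monitor_sketch() {
    datadir=$1
    shift
    # An SST in progress counts as success: the node is busy receiving
    # state, and a failed status probe here must not trigger a restart.
    if [ -f "$datadir/sst_in_progress" ]; then
        return "$OCF_SUCCESS"
    fi
    # No SST detected: the result of the status check decides.
    "$@"
}
```

With the marker present, monitor_sketch /var/lib/mysql false returns 0; without it, the exit code of the status command is returned as is.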

Changed in fuel:
status: In Progress → Fix Committed
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/313273
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=0197b09fc6ef25657a81a8398b6da0ea1b0603af
Submitter: Jenkins
Branch: stable/mitaka

commit 0197b09fc6ef25657a81a8398b6da0ea1b0603af
Author: Bogdan Dobrelya <email address hidden>
Date: Thu May 5 13:43:24 2016 +0200

    Rework SST check, fix possible masters search

    * Fix racing of monitoring with SST
    * Fix printf multilines sorting
      Expected: printf -- '%s\n' ${a} | sort -u (returns a sorted multiline)
      Actual: printf -- '%s\n' "$a" | sort -u (returns a single string)
    * Fix possible masters search, by the greatest SEQNO found for a
      majority UUID
    (Those blocks each other in CI and must be fixed at once)

    Fuel-CI: disable

    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779

    Change-Id: I3d0d376e6bef3ccc3e738731b71f4dd60a59e653
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 03991059adcc99989431c27f38c5ffa4ae35b486)
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    Signed-off-by: Sergii Golovatiuk <email address hidden>
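
One possible reading of the "greatest SEQNO found for a majority UUID" item, as a rough sketch only. The input format (one "node uuid seqno" line per node), the helper name pick_masters and the interpretation of "majority" as "most frequent" are all assumptions, not taken from the RA:

```shell
# Hypothetical sketch.
# stdin:  lines of "<node> <cluster-uuid> <seqno>"
# stdout: the node(s) holding the greatest seqno among nodes that share
#         the most frequently seen UUID
pick_masters() {
    awk '
        { node[NR] = $1; uuid[NR] = $2; seqno[NR] = $3 + 0; count[$2]++ }
        END {
            if (NR == 0) exit
            # Most frequent UUID (ties are resolved arbitrarily here).
            best = ""
            for (u in count)
                if (best == "" || count[u] > count[best]) best = u
            # Greatest seqno among the nodes with that UUID.
            first = 1
            for (i = 1; i <= NR; i++)
                if (uuid[i] == best && (first || seqno[i] > max)) {
                    max = seqno[i]; first = 0
                }
            # Every node at that seqno is a candidate.
            for (i = 1; i <= NR; i++)
                if (uuid[i] == best && seqno[i] == max) print node[i]
        }'
}
```

A node with a higher seqno but a minority UUID is deliberately ignored, since its seqno is not comparable with the others.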

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/313674
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=84263cb37ff9b47885c99170b573b84389944300
Submitter: Jenkins
Branch: stable/mitaka

commit 84263cb37ff9b47885c99170b573b84389944300
Author: Bogdan Dobrelya <email address hidden>
Date: Fri May 6 17:30:21 2016 +0200

    Fix OCF MySQL monitor

    Return success if SST detected, otherwise check status
    and return error if it is not OK.

    Closes-bug: #1574999

    Change-Id: I5ee7807821ae1f21bcb3c74e15338acb8bb91ea1
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/6.1)

Fix proposed to branch: stable/6.1
Review: https://review.openstack.org/315989

tags: added: tech-debt
Timur Nurlygayanov (tnurlygayanov) wrote :

This issue no longer reproduces on the latest 9.0 builds (checked on CI build #362), so it was moved to Fix Released status.

Please reopen it if the issue reproduces again.

Bogdan Dobrelya (bogdando) wrote :

Root cause summary:
* https://bugs.launchpad.net/fuel/+bug/1576073/comments/8
* https://bugs.launchpad.net/fuel/+bug/1574999/comments/19
* and the one that was given in the description:
"This happens as the SST-time /var/lib/mysql/.sst dir is not managed by the xtrabackup-v2 as appropriate, if there is no innodb-data-home-dir set in mysql conf"

The issue started blocking mostly once we enabled concurrent deployment for DB nodes.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/316802

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/317978

tags: added: on-verification
Volodymyr Shypyguzov (vshypyguzov) wrote :

Verified on 9.0 iso #458

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/8.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/8.0
Review: https://review.openstack.org/317978

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/316802

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/6.1)

Change abandoned by Bogdan Dobrelya (<email address hidden>) on branch: stable/6.1
Review: https://review.openstack.org/315989

Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.1, 7.0 and 8.0-updates, as this is too big a change to be accepted into a stable branch.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/374219

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/7.0)

Reviewed: https://review.openstack.org/374219
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f9a2d479f3687157d2b17a927a09ce5f995522d6
Submitter: Jenkins
Branch: stable/7.0

commit f9a2d479f3687157d2b17a927a09ce5f995522d6
Author: Denis Puchkin <email address hidden>
Date: Wed Sep 21 17:38:54 2016 +0300

    Backport mysql OCF from stable/mitaka

    backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

    Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/377597

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/377597
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=b3873f5f5a0bb1526b1269f163223ae48d6e21f5
Submitter: Jenkins
Branch: stable/8.0

commit b3873f5f5a0bb1526b1269f163223ae48d6e21f5
Author: Denis Puchkin <email address hidden>
Date: Tue Sep 27 13:20:25 2016 +0300

    Backport mysql OCF from stable/mitaka

    backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

    Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8
