Galera sst not finished before timeout of starting mysql in Pacemaker

Bug #1478310 reported by Sam Stoelinga on 2015-07-26
This bug affects 1 person
Affects             Status  Importance  Assigned to        Milestone
Fuel for OpenStack          High        Sergii Golovatiuk
6.1.x                       High        Rodion Tikunov
7.0.x                       High        Rodion Tikunov
8.0.x                       High        Sergii Golovatiuk

Bug Description

Description: If one of the nodes in the MySQL Galera cluster fails, a full state transfer (SST) is sometimes needed. This takes more than 10 minutes if the database is large, e.g. when using Zabbix.

Current situation:
The mysql database is unable to start because it gets killed before full state transfer is finished.

Expected situation:
Instead of increasing the timeout, we should add a specific check for whether a state transfer is still in progress and, if so, tell Pacemaker not to kill the starting mysql process yet. If this is not possible, we can consider raising the timeout above 600 seconds, but I suggest we instead decrease the timeout and use a specific check to see whether an SST is in progress.
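
The check proposed above could look roughly like the following. This is a minimal sketch of the decision logic only, not the code that was eventually merged. The marker file name `sst_in_progress`, the data directory argument, and the function and return names are assumptions made for illustration. Pacemaker enforces its own operation timeout separately, so this does not model Pacemaker itself.

```python
import os
import tempfile

# Hypothetical marker file: assumed to exist in the data directory
# while a state transfer is running and to be removed when it finishes.
SST_MARKER = "sst_in_progress"


def sst_in_progress(datadir):
    """Return True if the assumed SST marker file exists in datadir."""
    return os.path.isfile(os.path.join(datadir, SST_MARKER))


def start_check(mysqld_ready, datadir, elapsed, timeout):
    """Decide what a start operation should do at this point.

    Returns "started", "wait", or "fail" (illustrative names only).
    """
    if mysqld_ready:
        return "started"
    if sst_in_progress(datadir):
        # A transfer is still running: keep waiting, even past the timeout.
        return "wait"
    return "wait" if elapsed < timeout else "fail"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # No marker and past the timeout.
        print(start_check(False, d, 700, 600))
        # Marker present and past the timeout.
        open(os.path.join(d, SST_MARKER), "w").close()
        print(start_check(False, d, 700, 600))
```

The point of the design is that elapsed time alone never causes a failure while the marker is present, which is what allows the plain timeout to stay short.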

Steps to reproduce:
1. Create an HA environment
2. Import 10+ GB of data into the mysql database
3. Make one of the nodes fail on purpose so that it requires a full state transfer. Detailed steps for this:
Kill the mysqld_safe and mysqld processes, then run: rm -rf /var/lib/mysql/*
4. Wait for corosync/pacemaker to restart the mysql process on that node.
5. The log on failed node should show: WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio
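
To confirm step 5 without reading the log by eye, the line can be matched programmatically. The sketch below is an assumption-laden helper, not part of Fuel: the function name is hypothetical, and only the prefix of the quoted log line is matched because the socat options may differ between versions.

```python
import re

# Pattern built from the log line quoted in step 5. Only the prefix
# is matched; the remaining socat options are ignored.
SST_LISTEN = re.compile(
    r"WSREP_SST: \[INFO\] Evaluating socat -u TCP-LISTEN:(\d+)"
)


def sst_listen_port(log_text):
    """Return the port the joiner listens on for SST, or None if absent."""
    match = SST_LISTEN.search(log_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "WSREP_SST: [INFO] Evaluating socat -u "
        "TCP-LISTEN:4444,reuseaddr,nodelay,sndbuf=1048576,rcvbuf=1048576 stdio"
    )
    print(sst_listen_port(sample))
```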

Current workaround:
crm configure edit p_mysql

Change "op start interval=0 timeout=600"
to "op start interval=0 timeout=1600"
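
The same substitution can be expressed as a text transformation, which may help when the change has to be applied to several environments. This sketch operates only on configuration text in the form quoted above and does not call crm. The function name is hypothetical, and real crm output may quote the values differently.

```python
import re


def set_start_timeout(config_text, new_timeout):
    """Replace the timeout of the 'op start interval=0' entry in config_text."""
    pattern = re.compile(r"(op start interval=0 timeout=)\d+")
    # A function is used as the replacement so that the digits of
    # new_timeout cannot be misread as part of a group reference.
    return pattern.sub(lambda m: m.group(1) + str(new_timeout), config_text)


if __name__ == "__main__":
    print(set_start_timeout("op start interval=0 timeout=600", 1600))
```

Other operations, such as monitor entries, are left untouched because the pattern matches only the start operation.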

description: updated
Changed in fuel:
milestone: none → 7.0
assignee: nobody → Fuel Library Team (fuel-library)
status: New → Confirmed
importance: Undecided → High
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergii Golovatiuk (sgolovatiuk)
Changed in fuel:
status: Confirmed → In Progress
Ruslan Kamaldinov (ruhe) wrote :

Assigned to Denis per discussion with Andrey Maksimov.

Changed in fuel:
assignee: Sergii Golovatiuk (sgolovatiuk) → Denis Egorenko (degorenko)
Changed in fuel:
assignee: Denis Egorenko (degorenko) → Sergii Golovatiuk (sgolovatiuk)

Reviewed: https://review.openstack.org/211279
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=566c30b722a94e019d4c27f36d09fc87ae2cd10f
Submitter: Jenkins
Branch: master

commit 566c30b722a94e019d4c27f36d09fc87ae2cd10f
Author: Sergii Golovatiuk <email address hidden>
Date: Mon Aug 10 19:40:41 2015 +0200

    Include SST state to mysql_monitor

    Do not kill mysqld process while SST is in progress

    Closes-Bug: 1478310
    Change-Id: Iefd028e8b6c2fa867df4ed6089b6e9da87c339c9

Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Oleksiy Molchanov (omolchanov) wrote :

Verified

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "7.0"
  openstack_version: "2015.1.0-7.0"
  api: "1.0"
  build_number: "284"
  build_id: "284"
  nailgun_sha: "5c33995a2e6d9b1b8cdddfa2630689da5084506f"
  python-fuelclient_sha: "1ce8ecd8beb640f2f62f73435f4e18d1469979ac"
  fuel-agent_sha: "082a47bf014002e515001be05f99040437281a2d"
  fuel-nailgun-agent_sha: "d7027952870a35db8dc52f185bb1158cdd3d1ebd"
  astute_sha: "8283dc2932c24caab852ae9de15f94605cc350c6"
  fuel-library_sha: "f81fdabe6c05be7a3d11d88a7c3a8f3931921c73"
  fuel-ostf_sha: "1f08e6e71021179b9881a824d9c999957fcc7045"
  fuelmain_sha: "9ab01caf960013dc882825dc9b0e11ccf0b81cb0"

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released
Andrii Petrenko (aplsms) wrote :

This fix does not fix the bug:

After applying the patch, the monitor process looks for mysql.pid or for a mysql process in memory (not sure which), but during replication that condition is not satisfied. So the monitor kills the replication, and the sst_in_progress file is left in place.
From the pacemaker side the status is "OK", yet replication does not start.

In the end the cluster is broken, but it looks OK from pacemaker's perspective.
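
The failure mode described in this comment comes down to treating the marker file alone as proof of a running transfer. A hedged sketch of a stricter classification follows. All names and states are hypothetical, and how each input would actually be detected is deliberately left out.

```python
def classify(mysqld_ready, marker_exists, sst_process_running):
    """Classify node state from three observations (illustrative only).

    mysqld_ready:         mysqld is up and passes its normal check
    marker_exists:        the sst_in_progress file is present
    sst_process_running:  a transfer process is actually alive
    """
    if mysqld_ready:
        return "running"
    if marker_exists and sst_process_running:
        return "sst_in_progress"
    if marker_exists:
        # The marker is left over from a killed transfer. Reporting OK
        # here would hide a broken node.
        return "stale_marker"
    return "stopped"


if __name__ == "__main__":
    print(classify(False, True, False))
```

Separating "stale_marker" from "sst_in_progress" is what prevents the cluster from looking healthy while replication is not running.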

tags: added: customer-found support
Sergii Golovatiuk (sgolovatiuk) wrote :

Not all cases are resolved.

Changed in fuel:
status: Fix Released → Confirmed
Dmitry Pyzhov (dpyzhov) on 2016-01-21
no longer affects: fuel/mitaka
Changed in fuel:
milestone: 7.0 → 9.0
tags: added: area-library
tags: added: team-bugfix

Fix proposed to branch: master
Review: https://review.openstack.org/272665

Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/272665
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=0039c23202739fd47df54183ab6e3c7070fc4579
Submitter: Jenkins
Branch: master

commit 0039c23202739fd47df54183ab6e3c7070fc4579
Author: Sergii Golovatiuk <email address hidden>
Date: Tue Jan 26 18:21:28 2016 +0100

    Add SST check to mysql_status

    Change-Id: Ib7dd20a0322ecb57748188290d5c52467b82c765
    Closes-Bug: 1478310

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/273258
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=a6cb75f1e33b72bd508111ef29316d588eab913f
Submitter: Jenkins
Branch: stable/8.0

commit a6cb75f1e33b72bd508111ef29316d588eab913f
Author: Sergii Golovatiuk <email address hidden>
Date: Tue Jan 26 18:21:28 2016 +0100

    Add SST check to mysql_status

    Change-Id: Ib7dd20a0322ecb57748188290d5c52467b82c765
    Closes-Bug: 1478310
    (cherry picked from commit 0039c23202739fd47df54183ab6e3c7070fc4579)

tags: added: area-docs release-notes

Fix proposed to branch: stable/6.1
Change author: Sergii Golovatiuk <email address hidden>
Review: https://review.fuel-infra.org/17411

Change abandoned by Rodion Tikunov <email address hidden> on branch: stable/6.1
Review: https://review.fuel-infra.org/17411
Reason: Wrong repo selected

Reviewed: https://review.openstack.org/285169
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=5aeab5aaa6dda91aa6629b81c53a0e44f57147ca
Submitter: Jenkins
Branch: stable/7.0

commit 5aeab5aaa6dda91aa6629b81c53a0e44f57147ca
Author: Sergii Golovatiuk <email address hidden>
Date: Tue Jan 26 18:21:28 2016 +0100

    Add SST check to mysql_status

    Change-Id: Ib7dd20a0322ecb57748188290d5c52467b82c765
    Closes-Bug: #1478310
    (cherry picked from commit 0039c23202739fd47df54183ab6e3c7070fc4579)

tags: added: on-verification

Verified on MOS 7.0 + mu3 updates.

tags: removed: on-verification

Reviewed: https://review.openstack.org/284587
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=43de2a31699963d8c225d8c53c8cf0df7ba4bcb4
Submitter: Jenkins
Branch: stable/6.1

commit 43de2a31699963d8c225d8c53c8cf0df7ba4bcb4
Author: Sergii Golovatiuk <email address hidden>
Date: Tue Jan 26 18:21:28 2016 +0100

    Add SST check to mysql_status

    Conflicts:
     files/fuel-ha-utils/ocf/mysql-wss

    Change-Id: Ib7dd20a0322ecb57748188290d5c52467b82c765
    Closes-Bug: #1478310
    (cherry picked from commit 0039c23202739fd47df54183ab6e3c7070fc4579)

tags: added: on-verification

Verified on MOS 6.1 + mu6 updates.

Used steps to reproduce from bug description.

Actual result:
After filling the mysql database with 11 GB of data, killing the mysqld_safe and mysqld processes, and clearing the /var/lib/mysql folder, pacemaker did not kill the starting mysql process after 5 min. The full state transfer finished in ~10 min.

tags: removed: on-verification
tags: added: on-verification

Verified on:
[root@nailgun ~]# shotgun2 short-report
cat /etc/fuel_build_id:
 455
cat /etc/fuel_build_number:
 455
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0

Actual result:
After filling the mysql database with 11 GB of data, killing the mysqld_safe and mysqld processes, and clearing the /var/lib/mysql folder, pacemaker did not kill the starting mysql process after 5 min. The full state transfer finished in ~10 min.

tags: removed: on-verification
Changed in fuel:
status: Fix Committed → Fix Released

Related fix proposed to branch: master
Change author: Evgeny Konstantinov <email address hidden>
Review: https://review.fuel-infra.org/22326

Reviewed: https://review.fuel-infra.org/22326
Submitter: Evgeny Konstantinov <email address hidden>
Branch: master

Commit: dcd0ac5631e06625c76d2d62c17bc1fef67a7072
Author: Evgeny Konstantinov <email address hidden>
Date: Wed Jun 22 10:51:45 2016

Add resolved issues to relnotes 9.0

Change-Id: I87df13ac06921547312dd2165097d080528ec864
Related-Bug: #1587960
Related-Bug: #1544446
Related-Bug: #1478310
Related-Bug: #1543050
Related-Bug: #1495699

tags: added: release-notes-done
removed: release-notes