Fuel for OpenStack

MySQL OCF RA action monitor must check if a seed node is running the most recent of known GTIDs

Bug #1583173 reported by Bogdan Dobrelya on 2016-05-18

This bug affects 5 people

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Committed	Medium	Bogdan Dobrelya	Fuel for OpenStack 10.0
7.0.x	Fix Released	Critical	Denis Puchkin	Fuel for OpenStack 7.0-mu-6
8.0.x	Fix Released	Critical	Denis Puchkin	Fuel for OpenStack 8.0-mu-4
Mitaka	Fix Released	Critical	Sergii Golovatiuk	Fuel for OpenStack 9.1

Bug Description

This bug appears to be hard to reproduce.
I hit it only a couple of times while running Jepsen tests over a few days.

Details:
When the seed (aka master) node was started a long time ago and the OCF RA later reports "MySQL lost quorum or uninitialized" on the majority of the remaining DB nodes, the cluster either takes a *very* long time to auto-recover or fails to recover at all.

Only the seed node keeps running, even if its GTID is obsolete, that is, not the most recent among the remaining nodes. Recovering the DB cluster then requires manual intervention. For example, one may "nuke" all mysqld processes on the nodes and let the OCF RA pick the node with the most recent GTID. This makes for a poor UX, although it should not be a big deal.

Example snippet (4 of 5 nodes were affected): http://pastebin.com/Nrih7BT1

To fix this, the monitor action should perhaps check whether the current seed node (aka master) is running with a bad GTID, i.e. one that is not the most recent across the nodes, and report a failure if so.
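A minimal sketch of the proposed check, written in Python for illustration only; the OCF RA's own code is not reproduced here. It assumes each node's last known GTID is available as a Galera position string of the form `<cluster-uuid>:<seqno>`, and the helper names `parse_gtid` and `seed_is_stale` are hypothetical. The seed is treated as stale when another node of the same cluster UUID reports a higher seqno, which is the condition under which monitor would report a failure.

```python
from typing import Dict, Tuple


def parse_gtid(gtid: str) -> Tuple[str, int]:
    """Split a '<cluster-uuid>:<seqno>' position into (uuid, seqno)."""
    uuid, _, seqno = gtid.rpartition(":")
    return uuid, int(seqno)


def seed_is_stale(seed: str, gtids: Dict[str, str]) -> bool:
    """Return True if any node of the seed's cluster has a higher seqno.

    gtids maps node name -> last known GTID (hypothetical input format).
    A True result is where a monitor action would report failure.
    """
    seed_uuid, seed_seqno = parse_gtid(gtids[seed])
    for gtid in gtids.values():
        uuid, seqno = parse_gtid(gtid)
        # Only positions from the same cluster UUID are comparable.
        if uuid == seed_uuid and seqno > seed_seqno:
            return True
    return False


if __name__ == "__main__":
    known = {"node-1": "abc:100", "node-2": "abc:250", "node-3": "abc:250"}
    print(seed_is_stale("node-1", known))  # True: node-1 lags behind
```

Nodes reporting a different cluster UUID are skipped, since their seqnos cannot be meaningfully compared with the seed's.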

Bogdan Dobrelya (bogdando) wrote on 2016-05-18:

mysql_logs.tgz (2.0 MiB, application/x-tar)

Changed in fuel:
importance:	Undecided → Medium
milestone:	none → 10.0
tags:	added: galera
summary:	- MySQL OCF RA action monitor must check if a node is running the most recent of known GTIDs
	+ MySQL OCF RA action monitor must check if a seed node is running the most recent of known GTIDs
Changed in fuel:
status:	New → Triaged

OpenStack Infra (hudson-openstack) wrote on 2016-05-18: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/318162

Changed in fuel:
assignee:	nobody → Bogdan Dobrelya (bogdando)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-05-20: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/318162
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=8093431349441f6b486e45e8aab62e0a8927a8e2
Submitter: Jenkins
Branch: master

commit 8093431349441f6b486e45e8aab62e0a8927a8e2
Author: Bogdan Dobrelya <email address hidden>
Date: Wed May 18 17:01:54 2016 +0200

Detect a split-brain for Galera OCF RA

    * One and only seed node (the one with the wsrep-new-cluster) shall
      be running, eventually.
    * For action monitor, check if the node is the seed one
      and is running the most recent GTID, or fail

Closes-bug: #1583173

Change-Id: Iaa4855d769fe1e0203fcfb9981413273e0e4dda2
Signed-off-by: Bogdan Dobrelya <email address hidden>
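The first bullet of the commit message states an invariant: eventually, exactly one seed node should be running. Below is a minimal Python sketch of detecting a violation of that invariant. It assumes the seed can be recognized by `--wsrep-new-cluster` on its mysqld command line; the input mapping and the helpers `find_seeds` and `split_brain` are hypothetical and not taken from the RA. Bootstrapping through an empty `gcomm://` cluster address is not covered by this sketch.

```python
from typing import Dict, List


def find_seeds(cmdlines: Dict[str, List[str]]) -> List[str]:
    """Return, sorted, the nodes whose mysqld argv has --wsrep-new-cluster."""
    return sorted(
        node for node, argv in cmdlines.items() if "--wsrep-new-cluster" in argv
    )


def split_brain(cmdlines: Dict[str, List[str]]) -> bool:
    """True if more than one seed is running (the invariant is violated)."""
    return len(find_seeds(cmdlines)) > 1


if __name__ == "__main__":
    running = {
        "node-1": ["mysqld", "--wsrep-new-cluster"],
        "node-2": ["mysqld"],
        "node-3": ["mysqld", "--wsrep-new-cluster"],
    }
    print(find_seeds(running))   # ['node-1', 'node-3']
    print(split_brain(running))  # True
```

Zero seeds is deliberately not flagged, because the invariant only has to hold eventually and a cluster may have no seed for a while during bootstrap.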

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2016-08-29: Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/361943

Sergii Golovatiuk (sgolovatiuk) wrote on 2016-08-29:

`pcs resource restart clone_p_mysqld` creates a split-brain

Maksim Malchuk (mmalchuk) on 2016-08-29

tags:	added: area-library

OpenStack Infra (hudson-openstack) wrote on 2016-08-29: Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/361943
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=3478e0d1c8b35191e35a59a0f8cfc3f59030959f
Submitter: Jenkins
Branch: stable/mitaka

commit 3478e0d1c8b35191e35a59a0f8cfc3f59030959f
Author: Bogdan Dobrelya <email address hidden>
Date: Wed May 18 17:01:54 2016 +0200

Detect a split-brain for Galera OCF RA

Closes-bug: #1583173

    Change-Id: Iaa4855d769fe1e0203fcfb9981413273e0e4dda2
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 8093431349441f6b486e45e8aab62e0a8927a8e2)

Miroslav Anashkin (manashkin) on 2016-08-29

tags:	added: ct2 customer-found sla1 support

Dmitry Belyaninov (dbelyaninov) on 2016-08-31

tags:	added: on-verification

Dmitry Belyaninov (dbelyaninov) wrote on 2016-09-01:

Verified according to #1617400.
Cluster: 3 controllers + 1 compute
A few restarts: pcs resource restart clone_p_mysqld

There are no ERROR messages in the MySQL logs.

Snapshot #206

tags:	removed: on-verification

Alexander Rubtsov (arubtsov) wrote on 2016-09-15:

sla1 for 7.0-updates

OpenStack Infra (hudson-openstack) wrote on 2016-09-16: Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/371626

OpenStack Infra (hudson-openstack) wrote on 2016-09-19:

#10

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/372471

OpenStack Infra (hudson-openstack) wrote on 2016-09-19: Change abandoned on fuel-library (stable/7.0)

#11

Change abandoned by Mikhail Zhnichkov (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/371626

OpenStack Infra (hudson-openstack) wrote on 2016-09-19: Fix proposed to fuel-library (stable/7.0)

#12

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/372522

OpenStack Infra (hudson-openstack) wrote on 2016-09-21: Change abandoned on fuel-library (stable/7.0)

#13

Change abandoned by Mikhail Zhnichkov (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/372471

OpenStack Infra (hudson-openstack) wrote on 2016-09-21: Fix proposed to fuel-library (stable/7.0)

#14

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/374219

OpenStack Infra (hudson-openstack) wrote on 2016-09-21: Change abandoned on fuel-library (stable/7.0)

#15

Change abandoned by Mikhail Zhnichkov (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/372522
Reason: duplicate https://review.openstack.org/#/c/374219/

OpenStack Infra (hudson-openstack) wrote on 2016-09-26: Fix merged to fuel-library (stable/7.0)

#16

Reviewed: https://review.openstack.org/374219
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f9a2d479f3687157d2b17a927a09ce5f995522d6
Submitter: Jenkins
Branch: stable/7.0

commit f9a2d479f3687157d2b17a927a09ce5f995522d6
Author: Denis Puchkin <email address hidden>
Date: Wed Sep 21 17:38:54 2016 +0300

Backport mysql OCF from stable/mitaka

backport mysql ocf script from stable/mitaka

    Closes-bug: #1524826
    Closes-bug: #1542256
    Closes-bug: #1572239
    Closes-bug: #1572557
    Closes-bug: #1572601
    Closes-bug: #1574747
    Closes-bug: #1574497
    Closes-bug: #1576244
    Closes-bug: #1574999
    Closes-bug: #1578278
    Closes-bug: #1388779
    Closes-bug: #1574999
    Closes-bug: #1576244
    Closes-bug: #1583173
    Closes-bug: #1585125

Change-Id: I1cc6f95884a8fbd5c3418ede89bdf9ec6864bdc8

OpenStack Infra (hudson-openstack) wrote on 2016-09-27: Fix proposed to fuel-library (stable/8.0)

#17

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/377597