haproxy restarts can break cluster init

Bug #1348931 reported by Julia Kreger
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Ben Nemec

Bug Description

When control scale is set to greater than 2 and the MySQL configuration has been updated to point to the remote VIP, the controller that takes ownership of the VIP may not be the same node as the controller bootstrap node. In that case, there is a slim possibility that initialization steps connected through the VIP lose connectivity during the initialization process, because the other node re-attempts initialization and restarts haproxy. When this occurs, it can break the initialization sequence, possibly leaving it in a state that cannot be recovered from.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/109963

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/109963
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=d9a348837693b649a212c9ac612e0f1ee3808934
Submitter: Jenkins
Branch: master

commit d9a348837693b649a212c9ac612e0f1ee3808934
Author: Julia Kreger <email address hidden>
Date: Mon Jul 28 07:32:48 2014 -0400

    Adding haproxy reload check

    When an initializing controller node restarts haproxy as part of the
    initialization sequence, and that node happens to be in control of
    the VIP address but is not the bootstrap node, then the initial
    cluster initialization sequence can fail.

    This commit changes the haproxy script such that if haproxy is
    running, then it is reloaded instead of restarted.

    Change-Id: I1b35d3ebdb08e175fb0fd58c0c395b7b30103b4b
    Closes-bug: #1348931
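
The check described in the commit message above can be sketched roughly as follows. This is an illustrative Python sketch only, not the actual change, which lives in the tripleo-image-elements haproxy script and is not reproduced in this report. The function names (choose_action, service_is_active, apply_haproxy_config) and the use of systemctl to query and control the service are assumptions made for the example. The command runner is passed in as a parameter so the logic can be exercised without touching a real service.

```python
import subprocess


def choose_action(is_running):
    """Pick the service action: reload a running haproxy, otherwise restart it."""
    return "reload" if is_running else "restart"


def service_is_active(name, run=subprocess.call):
    """Return True if the service is active.

    Assumes a systemd host: `systemctl is-active --quiet NAME` exits 0
    when the unit is active and non-zero otherwise.
    """
    return run(["systemctl", "is-active", "--quiet", name]) == 0


def apply_haproxy_config(run=subprocess.call):
    """Reload haproxy if it is already running, else restart it.

    A reload avoids fully stopping the proxy, which is what interrupted
    initialization steps connected through the VIP in this bug.
    Returns the action that was issued.
    """
    action = choose_action(service_is_active("haproxy", run))
    run(["systemctl", action, "haproxy"])
    return action
```

Calling apply_haproxy_config() with no arguments would issue the real systemctl commands, so it should only be run on a host where that is intended.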

Changed in tripleo:
status: In Progress → Fix Committed
Jay Dobies (jdob)
Changed in tripleo:
status: Fix Committed → Fix Released
Ben Nemec (bnemec) wrote :

Reopening because this fix has significant issues caused by a broken Fedora systemd file, so we're going to have to revert it: https://review.openstack.org/#/c/112091/

Changed in tripleo:
status: Fix Released → Triaged
Changed in tripleo:
assignee: Julia Kreger (juliaashleykreger) → Ben Nemec (bnemec)
status: Triaged → In Progress
Changed in tripleo:
assignee: Ben Nemec (bnemec) → Julia Kreger (juliaashleykreger)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/112366
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=e3d26159bd1746fa8d10f77cbab798735da8d333
Submitter: Jenkins
Branch: master

commit e3d26159bd1746fa8d10f77cbab798735da8d333
Author: Ben Nemec <email address hidden>
Date: Wed Aug 6 17:41:46 2014 -0500

    Add workaround for reloading haproxy on Fedora

    We want to be able to reload haproxy, but the upstream service file
    for haproxy fails when we do. This overwrites the upstream service
    file with a fixed one that will eventually be added to the upstream
    package.

    Change-Id: I63bca52d7601850de87456fbb221a8400c6933aa
    Related-Bug: 1348931

Changed in tripleo:
assignee: Julia Kreger (juliaashleykreger) → Ben Nemec (bnemec)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/112367
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=753a5f2d6047d7b4bbaddfa90c0cd77db0872d78
Submitter: Jenkins
Branch: master

commit 753a5f2d6047d7b4bbaddfa90c0cd77db0872d78
Author: Ben Nemec <email address hidden>
Date: Wed Aug 6 17:43:30 2014 -0500

    Revert "Revert "Adding haproxy reload check""

    With the previous workaround commit this should work again.

    This reverts commit d35bc8570f197e8499ddf09657ff6b9c8224a20f.

    Change-Id: I1f0e66a3ec53ae15b4ef85444f78cc4abb9e45e2
    Closes-Bug: 1348931

Changed in tripleo:
status: In Progress → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released