Improve failed mysql node removal time in HA deployments

Bug #1639189 reported by Chris Jones
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Undecided
Assigned to: Chris Jones
Milestone: (none listed)
Nominated for Newton by Chris Jones

Bug Description

The current haproxy.pp in puppet-tripleo uses most of its global defaults when deploying HA mysql.

These settings (specifically "fall 5" and "inter 2000") mean that a failed database node will not be removed from the proxy for 10 seconds (5 failed checks at 2000 ms intervals). During this time, any OpenStack service that queries the database will return errors, causing confusion and delay for users.

Triggering the node removal much faster, specifically after 1 second ("inter 1s" with no "fall"), would be better for operators.
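The timing argument above is simple arithmetic: the removal window is roughly the check interval multiplied by the number of consecutive failures required. The following is only a rough sketch of that calculation, not anything taken from puppet-tripleo; the function name and the constant are made up for illustration, and the model ignores check timeouts and connection latency. One assumption worth flagging: the HAProxy documentation states that "fall" defaults to 3 when unspecified, so the sketch takes "fall" as an explicit argument rather than assuming that omitting it means 1.

```python
# Hypothetical helper (not part of puppet-tripleo or HAProxy): estimates how
# long a failed backend keeps receiving traffic before it is marked down.
# Simplified model: `fall` consecutive failed checks, spaced `inter_ms` apart.

HAPROXY_DOCUMENTED_DEFAULT_FALL = 3  # per the HAProxy manual when "fall" is unset


def removal_window_seconds(inter_ms: int, fall: int) -> float:
    """Return the approximate time in seconds before a dead node is removed."""
    if inter_ms <= 0:
        raise ValueError("inter_ms must be positive")
    if fall < 1:
        raise ValueError("fall must be at least 1")
    return inter_ms * fall / 1000.0


# Settings described in this bug: "inter 2000" with "fall 5".
old_window = removal_window_seconds(2000, 5)

# "inter 1s" with a single failed check, as the description intends.
intended_window = removal_window_seconds(1000, 1)

# "inter 1s" if the documented default for "fall" applies instead.
default_fall_window = removal_window_seconds(1000, HAPROXY_DOCUMENTED_DEFAULT_FALL)

print(old_window, intended_window, default_fall_window)
```

If the documented default for "fall" does apply in a given deployment, the window under "inter 1s" would be closer to 3 seconds than 1, which is still much shorter than the 10 seconds described above.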

Prior discussions around this can be seen at https://bugzilla.redhat.com/show_bug.cgi?id=1389413 and https://bugzilla.redhat.com/show_bug.cgi?id=1211781

Chris Jones (cmsj)
summary: - Improve mysql failover time in HA deployments
+ Improve failed mysql node removal time in HA deployments
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/393673
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=943e49435621062a1c898de4b2edccf48c59fc16
Submitter: Jenkins
Branch: master

commit 943e49435621062a1c898de4b2edccf48c59fc16
Author: Chris Jones <email address hidden>
Date: Fri Nov 4 10:01:24 2016 +0000

    Improve failed mysql node removal time in HA deploys.

    In HA deployments, we now check mysql nodes every 1s and remove them
    immediately if they are failed. Previously we would check every 2s and
    allow them to fail 5 checks before being removed, producing errors from
    other OpenStack services for 10s, which causes confusion and delay for
    operators.
    Additionally, these check options are now also a class parameter so can
    be overridden by operators.

    Closes-Bug: #1639189

    Change-Id: I0b915f790ae5a4b018a212d3aa83cca507be05e9

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/396092

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/newton)

Reviewed: https://review.openstack.org/396092
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=a99d3d3b93acaa3627e0488fb6fcfb5172e9f63f
Submitter: Jenkins
Branch: stable/newton

commit a99d3d3b93acaa3627e0488fb6fcfb5172e9f63f
Author: Chris Jones <email address hidden>
Date: Fri Nov 4 10:01:24 2016 +0000

    Improve failed mysql node removal time in HA deploys.

    In HA deployments, we now check mysql nodes every 1s and remove them
    immediately if they are failed. Previously we would check every 2s and
    allow them to fail 5 checks before being removed, producing errors from
    other OpenStack services for 10s, which causes confusion and delay for
    operators.
    Additionally, these check options are now also a class parameter so can
    be overridden by operators.

    Closes-Bug: #1639189

    Change-Id: I0b915f790ae5a4b018a212d3aa83cca507be05e9
    (cherry picked from commit 943e49435621062a1c898de4b2edccf48c59fc16)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 6.0.0

This issue was fixed in the openstack/puppet-tripleo 6.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 5.5.0

This issue was fixed in the openstack/puppet-tripleo 5.5.0 release.
