Change rabbitmq queues HA mode from ha-all to ha-exactly

Bug #1628998 reported by Michele Baldessari
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Michele Baldessari

Bug Description

From Marian Krcmarik:

It turns out that reducing the number of rabbitmq queue copies in the cluster significantly improves cluster performance, especially failover recovery time. Right now the cluster uses ha-all mode for rabbitmq queues. I suggest changing this to ha-exactly mode and reducing the number of queue copies to ceil(N/2), where N is the number of controllers in the cluster, so in the typical scenario of 3 controllers it would be 2.

It does not make much sense to keep copies of the queues across the whole cluster, since if the quorum of nodes is lost the remaining cluster nodes will be stopped anyway.
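
The arithmetic behind this suggestion can be sketched as follows. This is only an illustration, not code from tripleo: the function names are hypothetical, and the quorum check assumes a simple strict-majority rule (more than half of the controllers must be up). Under that assumption, ceil(N/2) copies is always one more than the number of nodes that can fail while the cluster stays quorate, so at least one copy of each queue sits on a surviving node (whether that copy is fully synchronized is a separate question).

```python
import json
import math


def ha_exactly_definition(controllers: int) -> str:
    """Build a policy definition that keeps ceil(N/2) copies of each queue."""
    if controllers < 1:
        raise ValueError("need at least one controller")
    copies = math.ceil(controllers / 2)
    # Compact separators reproduce the form shown in the pcs output.
    return json.dumps(
        {"ha-mode": "exactly", "ha-params": copies}, separators=(",", ":")
    )


def max_failures_with_quorum(controllers: int) -> int:
    """Nodes that may fail while a strict majority is still running.

    Assumption: quorum means floor(N/2) + 1 nodes are up.
    """
    return controllers - (controllers // 2 + 1)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, ha_exactly_definition(n), max_failures_with_quorum(n))
```

For 3 controllers this gives 2 copies and 1 tolerable node failure, which matches the value requested in this report.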

This is the current setting:
pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
I am requesting that the parameters of the rabbitmq pacemaker resource be changed to:
pcs resource show rabbitmq
 Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
  Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"
In my tests, this reduced recovery time significantly.
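
For readers unfamiliar with the queue pattern in the set_policy attribute, here is a small sketch of what `^(?!amq\.).*` selects. It uses Python's `re` module rather than RabbitMQ's own regex engine, on the assumption that both treat this negative lookahead the same way; the queue names are made up for illustration.

```python
import re

# Pattern copied from the set_policy attribute shown above.
POLICY_PATTERN = re.compile(r"^(?!amq\.).*")


def policy_applies(queue_name: str) -> bool:
    """Return True when the policy pattern matches the queue name."""
    return POLICY_PATTERN.match(queue_name) is not None


if __name__ == "__main__":
    # Hypothetical queue names, chosen only to exercise the pattern.
    for name in ("conductor", "notifications.info", "amq.gen-abc", "amqp_reply"):
        print(f"{name}: {policy_applies(name)}")
```

Only names that begin with the literal prefix `amq.` are excluded; a name such as `amqp_reply` still matches because the lookahead requires the dot.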

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/379584

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
Michele Baldessari (michele) wrote :

I added only a Partial-Bug line in the reviews because we still need to figure out how to handle the upgrade path.

tags: added: newton-backport-potential
Changed in tripleo:
milestone: ocata-1 → newton-rc3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/380775

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/380979

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/newton)

Change abandoned by Marios Andreou (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/380979
Reason: abandoning because Depends-On isn't branch aware, so we need to let it land first (https://review.openstack.org/#/c/379586/ depends on this change)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/379584
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1c5d16854417665f970ab6899759c25f865bf515
Submitter: Jenkins
Branch: master

commit 1c5d16854417665f970ab6899759c25f865bf515
Author: Michele Baldessari <email address hidden>
Date: Thu Sep 29 18:30:23 2016 +0200

    Change rabbitmq queues HA mode from ha-all to ha-exactly

    It turns out that reducing the number of rabbitmq queue copies in the
    cluster significantly improves cluster performance, especially
    failover recovery time. Right now the cluster uses ha-all mode for
    rabbitmq queues.

    It is best to change this to "ha-exactly" mode and reduce the number
    of queue copies to ceil(N/2), where N is the number of controllers in
    the cluster, so in the typical scenario of 3 controllers it would be 2
    by default.

    It does not make much sense to keep copies of the queues across the
    whole cluster, since if the quorum of nodes is lost the remaining
    cluster nodes will be stopped anyway. We let the user override this
    with a parameter.

    I.e. for a 3 node controlplane cluster we will go from this:
    pcs resource show rabbitmq
     Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
      Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"

    To this:
    pcs resource show rabbitmq
     Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
      Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"

    According to Marian Krcmarik's testing, recovery time from failure was
    reduced significantly.

    Partial-Bug: #1628998
    Change-Id: Iace6daf27a76cb8ef1050ada0de7ff1f530916c6

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/380775
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1d7231aae2b523c249da8bada3d33c889c94abb9
Submitter: Jenkins
Branch: master

commit 1d7231aae2b523c249da8bada3d33c889c94abb9
Author: Michele Baldessari <email address hidden>
Date: Sat Oct 1 17:42:54 2016 +0200

    Change the rabbitmq ha policies during an M/N Upgrade

    This takes care of the M->N upgrade path when changing
    the ha rabbitmq policy.

    Partial-Bug: #1628998

    Change-Id: I2468a096b5d7042bc801a742a7a85fb1521c1c02

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/381485

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/381489

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/newton)

Change abandoned by Michele Baldessari (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/381489
Reason: So I discussed this with Peter and John a little more and we decided that we will just pursue this change for Ocata (where most of it has landed already anyway)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Michele Baldessari (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/381485
Reason: So I discussed this with Peter and John a little more and we decided that we will just pursue this change for Ocata (where most of it has landed already anyway)

Changed in tripleo:
milestone: newton-rc3 → ocata-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Marios Andreou (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/380979
Reason: reabandon on request from bandini via irc

Revision history for this message
Emilien Macchi (emilienm) wrote :

Removing the backport tag since we won't backport it in newton.

tags: removed: newton-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/389309
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=71ed1dba52639fd34fd039c29195537dfff91d3f
Submitter: Jenkins
Branch: master

commit 71ed1dba52639fd34fd039c29195537dfff91d3f
Author: Michele Baldessari <email address hidden>
Date: Thu Oct 20 20:27:11 2016 +0200

    Fix up Newton->Ocata rabbitmq ha policy

    In ocata we changed the ha policy to "ha-exactly" via the following changes:
    - tht: Iace6daf27a76cb8ef1050ada0de7ff1f530916c6
    - puppet-tripleo: Ib62001c03e1e08f58cf0c6e0ba07a8879a584084

    We initially also took care of changing this policy (which is set in the
    pacemaker resource agent) for the M/N upgrade path:
    I2468a096b5d7042bc801a742a7a85fb1521c1c02

    In the end we decided against changing the policy in Newton as well (it
    was only for ocata) as it was too close to the release date and we took
    the safer path.
    This patch does two things:
    1) It renames the upgrade function to "newton_ocata" since that is the
    only upgrade path we need to take care of
    2) It reinstates the actual upgrade function which was mistakenly
    removed via an unrelated change in the ceilometer upgrade path:
    If9d6987cd0a8fc5d3f9de518ba422d97d5149732

    Closes-Bug: #1628998

    Change-Id: I3a97505d2ae1ae27f3080ffe74c33fdabffd2420

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 6.0.0.0b1

This issue was fixed in the openstack/tripleo-heat-templates 6.0.0.0b1 development milestone.
