RabbitMQ cluster upgrade failing

Bug #1474992 reported by Jacob Wagner
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
OpenStack-Ansible    Fix Released  High        git-harry
  Kilo               Fix Released  High        git-harry
  Trunk              Fix Released  High        git-harry

Bug Description

When upgrading RabbitMQ versions, the cluster fails to start because the rabbitmq-server service is not upgraded and restarted serially across the nodes.

- Here is the initial upgrade run

https://gist.github.com/jacobwagner/95a49a5e5fa0a181cf7e#file-gistfile1-txt-L209-L223

- Here is a rerun of the same setup-infra play

https://gist.github.com/jacobwagner/95a49a5e5fa0a181cf7e#file-gistfile1-txt-L209-L223

- Here is the startup log for one of the out-of-cluster services

https://gist.github.com/jacobwagner/95a49a5e5fa0a181cf7e#file-gistfile1-txt-L209-L223

It looks like all the services receive the upgraded packages, but the service (container) that upgrades first tries to reconnect to the services (containers) that have not yet been upgraded, and those connections fail with a version mismatch.
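The failure mode described above can be modelled in a few lines. This is a minimal sketch, not part of OpenStack-Ansible: `major_minor`, `mismatched_peers`, the node names and the version strings are all hypothetical, and it assumes (as a simplification) that a restarted node can only rejoin peers running the same major.minor release.

```python
from __future__ import annotations


def major_minor(version: str) -> tuple[int, int]:
    """Parse a version string such as '3.5.3' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def mismatched_peers(restarted: str, cluster: dict[str, str]) -> list[str]:
    """Return the peers a restarted node cannot rejoin.

    Assumption: two nodes are compatible only if major.minor match.
    """
    mine = major_minor(cluster[restarted])
    return sorted(
        name for name, version in cluster.items()
        if name != restarted and major_minor(version) != mine
    )


# Rolling case: rabbit1 is upgraded and restarted first while its peers
# still run the old release (hypothetical version numbers).
rolling = {"rabbit1": "3.5.3", "rabbit2": "3.4.4", "rabbit3": "3.4.4"}
print(mismatched_peers("rabbit1", rolling))   # ['rabbit2', 'rabbit3']

# Full-shutdown case: every node is upgraded before any node is restarted.
upgraded = {"rabbit1": "3.5.3", "rabbit2": "3.5.3", "rabbit3": "3.5.3"}
print(mismatched_peers("rabbit1", upgraded))  # []
```

In the rolling case the first upgraded node finds only incompatible peers, which is the reconnect failure seen in the logs; once every node is on the new release there is nothing left to mismatch.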

Nolan Brubaker (nolan-brubaker) wrote :

I am setting up a test lab to verify this bug.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-ansible-deployment (master)

Fix proposed to branch: master
Review: https://review.openstack.org/202681

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (master)

Reviewed: https://review.openstack.org/202681
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=6ea86e6274bedaff955d734222c9f4b1cc2c3a2a
Submitter: Jenkins
Branch: master

commit 6ea86e6274bedaff955d734222c9f4b1cc2c3a2a
Author: git-harry <email address hidden>
Date: Thu Jul 16 11:01:13 2015 +0100

    Fix rabbitmq playbook to allow upgrades

    The rabbitmq playbook is designed to run in parallel across the cluster.
    This causes an issue when upgrading RabbitMQ to a new major or minor
    version because RabbitMQ does not support online migration of
    datasets between major versions. While a minor release can be upgraded
    online, it is recommended to bring down the cluster for any upgrade
    action. The current configuration takes no account of this.

    Reference:
    https://www.rabbitmq.com/clustering.html#upgrading for further details.

    * A new variable has been added called `rabbitmq_upgrade`. This is set to
      false by default to prevent a new version being installed unintentionally.
      To run the upgrade, which will shut down the cluster, the variable can be
      set to true on the command line:

      Example:
        openstack-ansible -e rabbitmq_upgrade=true \
        rabbitmq-install.yml

    * A new variable has been added called `rabbitmq_ignore_version_state`,
      which can be set to "true" to skip the package and version state tasks.
      This allows a deployer to rerun the plays in an environment where the
      playbooks have been upgraded, the default version of rabbitmq has
      changed within the role, and the deployer has elected to upgrade the
      installation at that time. It ensures a deployer is able to recluster
      an environment as needed without affecting the package state.

      Example:
        openstack-ansible -e rabbitmq_ignore_version_state=true \
        rabbitmq-install.yml

    * A new variable has been added called `rabbitmq_primary_cluster_node`,
      which allows a deployer to elect / set the primary cluster node in an
      environment. This variable determines the restart order of RabbitMQ
      nodes: the primary node will be the last one down and the first one
      up in an environment. By default this variable is set to:
      rabbitmq_primary_cluster_node: "{{ groups['rabbitmq_all'][0] }}"

    scripts/run-upgrade.sh has been modified to pass 'rabbitmq_upgrade=true'
    on the command line so that RabbitMQ can be upgraded as part of the
    upgrade between OpenStack versions.

    DocImpact
    Change-Id: I17d4429b9b94d47c1578dd58a2fb20698d1fe02e
    Closes-bug: #1474992
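The controls described in this commit message can be sketched as a small decision model. This is a hypothetical illustration, not the role's actual Ansible tasks: `package_action`, `restart_order`, the host names and the version strings are made up, and the "keep" branch is an assumption about what happens when a version change is detected without `rabbitmq_upgrade=true` (the real role may fail instead). The primary node defaults to the first host, mirroring `groups['rabbitmq_all'][0]`, and is stopped last and started first.

```python
from __future__ import annotations

from typing import Optional, Sequence


def package_action(installed: Optional[str], target: str,
                   rabbitmq_upgrade: bool = False,
                   rabbitmq_ignore_version_state: bool = False) -> str:
    """Hypothetical decision for the rabbitmq-server package on one host."""
    if rabbitmq_ignore_version_state:
        # Package and version state tasks are skipped, so plays can be rerun
        # (e.g. to recluster) without touching the installed package.
        return "skip"
    if installed is None:
        return "install"   # fresh host: nothing to migrate
    if installed == target:
        return "none"      # already at the role's target version
    if rabbitmq_upgrade:
        return "upgrade"   # explicit opt-in: whole cluster is stopped first
    # Default (rabbitmq_upgrade=false): never change versions unintentionally.
    # Assumption: the real role may fail here instead of keeping the version.
    return "keep"


def restart_order(hosts: Sequence[str],
                  primary: Optional[str] = None) -> tuple[list[str], list[str]]:
    """Return (stop_order, start_order): primary is last down and first up."""
    if not hosts:
        raise ValueError("no RabbitMQ hosts given")
    # Same default as rabbitmq_primary_cluster_node: groups['rabbitmq_all'][0]
    if primary is None:
        primary = hosts[0]
    if primary not in hosts:
        raise ValueError(f"primary node {primary!r} is not in the host list")
    others = [h for h in hosts if h != primary]
    return others + [primary], [primary] + others


stop, start = restart_order(["rabbit1", "rabbit2", "rabbit3"])
print(stop)   # ['rabbit2', 'rabbit3', 'rabbit1']
print(start)  # ['rabbit1', 'rabbit2', 'rabbit3']
print(package_action("3.4.4", "3.5.3"))                         # keep
print(package_action("3.4.4", "3.5.3", rabbitmq_upgrade=True))  # upgrade
```

The ordering follows the usual RabbitMQ guidance that the last node to go down should be the first to come back up, since it holds the most recent cluster state; the remaining nodes then rejoin a primary that is already running the new version.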

Changed in openstack-ansible:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-ansible-deployment (kilo)

Reviewed: https://review.openstack.org/204680
Committed: https://git.openstack.org/cgit/stackforge/os-ansible-deployment/commit/?id=24300d361d7f5828992cddbb5c80b03249a7e9ba
Submitter: Jenkins
Branch: kilo

commit 24300d361d7f5828992cddbb5c80b03249a7e9ba
Author: git-harry <email address hidden>
Date: Thu Jul 16 11:01:13 2015 +0100

    Fix rabbitmq playbook to allow upgrades

    DocImpact
    Change-Id: I17d4429b9b94d47c1578dd58a2fb20698d1fe02e
    Closes-bug: #1474992
    (cherry picked from commit 6ea86e6274bedaff955d734222c9f4b1cc2c3a2a)

Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/openstack-ansible 11.2.14

This issue was fixed in the openstack/openstack-ansible 11.2.14 release.
