rabbitmq-ready fails when node size 1 and doesn't actually fail the deployment

Bug #1741345 reported by Alex Schultz
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Damien Ciabrini

Bug Description

Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Error: rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true returned 1 instead of one of [0]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: change from notrun to 0 failed: rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true returned 1 instead of one of [0]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Info: Class[Tripleo::Profile::Pacemaker::Rabbitmq_bundle]: Unscheduling all events on Class[Tripleo::Profile::Pacemaker::Rabbitmq_bundle]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Info: Creating state file /var/lib/puppet/state/state.yaml
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Notice: Applied catalog in 2074.61 seconds

http://logs.openstack.org/15/527515/2/check/tripleo-ci-centos-7-containers-multinode/dfe0070/logs/subnode-2/var/log/journal.txt.gz#_Jan_04_18_19_31

While looking into why deployments were taking so long, it was noted that step 2 was taking ~30 minutes to complete. The cause seems to be that we're waiting for rabbitmq to become ready because of https://github.com/openstack/puppet-tripleo/commit/2f33d74173b79117c962146ac2c88fe1e3836403. Unfortunately, rabbitmq never actually clusters in our multinode jobs, so the rabbitmq-ready exec eventually times out after ~2000 seconds.

The other issue is that this timeout doesn't actually fail the deployment, because we're not using --detailed-exitcodes:

https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L197
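
To illustrate why the failed exec above went unnoticed, here is a minimal sketch of how a wrapper could interpret the exit status of `puppet apply`. It assumes the documented `--detailed-exitcodes` meanings (0 = no changes, 2 = changes applied, 4 = failures, 6 = changes and failures, 1 = the run itself failed). The function name `puppet_run_failed` is hypothetical and is not taken from tripleo-heat-templates.

```python
def puppet_run_failed(exit_code: int, detailed: bool) -> bool:
    """Decide whether a `puppet apply` run should fail the deployment.

    Hypothetical helper for illustration only.
    """
    if detailed:
        # With --detailed-exitcodes, 0 (no changes) and 2 (changes applied)
        # are the only successful outcomes; 1, 4 and 6 indicate failures.
        return exit_code not in (0, 2)
    # Without the flag, only a non-zero status can be treated as a failure.
    # Resource failures such as the rabbitmq-ready timeout can still exit 0,
    # so they are invisible to this check.
    return exit_code != 0


if __name__ == "__main__":
    # A run whose exec timed out: exit 0 without the flag, 4 or 6 with it.
    print(puppet_run_failed(0, detailed=False))
    print(puppet_run_failed(4, detailed=True))
```

The point of the sketch is that the same failed resource is only detectable in the second case, which matches the behaviour described in this report.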

Changed in tripleo:
assignee: nobody → Damien Ciabrini (dciabrin)
wes hayutin (weshayutin) wrote :

Nice find Alex!

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/531261

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/531352

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/531352
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=1cfecc39dc37a01bbaea114773397c062367ae42
Submitter: Zuul
Branch: master

commit 1cfecc39dc37a01bbaea114773397c062367ae42
Author: Damien Ciabrini <email address hidden>
Date: Fri Jan 5 09:50:17 2018 +0000

    Fix rabbitmq-ready check for single node HA deployments

    The current rabbitmq-ready exec waits for rabbitmq to become clustered
    before it allows user creation. Unfortunately this doesn't work when
    the deployment contains a single node, because rabbit doesn't trigger
    the clustering mode at all.

    Set the exec test according to the number of rabbit nodes, in order
    to check for cluster state only when necessary.

    Closes-Bug: #1741345

    Change-Id: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
    Co-Authored-By: John Eckersberg <email address hidden>
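
The idea in the commit message above, choosing the readiness test from the number of rabbit nodes, can be sketched roughly as follows. This is an illustration only, not the Puppet code from the commit. The cluster check is the command shown in the log at the top of this report. The single-node check (`rabbitmqctl status`) and the function name `readiness_command` are assumptions made for the example.

```python
# Command from the failing exec in this report; only meaningful when
# more than one rabbit node is expected to form a cluster.
CLUSTER_CHECK = 'rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true'

# Assumed placeholder for the single-node case; the actual commit may
# use a different command.
SINGLE_NODE_CHECK = "rabbitmqctl status"


def readiness_command(rabbit_node_count: int) -> str:
    """Pick the readiness test according to the number of rabbit nodes."""
    if rabbit_node_count < 1:
        raise ValueError("at least one rabbit node is required")
    if rabbit_node_count == 1:
        # A single node never enters clustering mode, so waiting for
        # is_clustered() to return true would only end in a timeout.
        return SINGLE_NODE_CHECK
    return CLUSTER_CHECK


if __name__ == "__main__":
    print(readiness_command(1))
    print(readiness_command(3))
```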

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/531448

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/531448
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=958f824c85bf5c97ce4068ed63d976791ac3ed63
Submitter: Zuul
Branch: stable/pike

commit 958f824c85bf5c97ce4068ed63d976791ac3ed63
Author: Damien Ciabrini <email address hidden>
Date: Fri Jan 5 09:50:17 2018 +0000

    Fix rabbitmq-ready check for single node HA deployments

    The current rabbitmq-ready exec waits for rabbitmq to become clustered
    before it allows user creation. Unfortunately this doesn't work when
    the deployment contains a single node, because rabbit doesn't trigger
    the clustering mode at all.

    Set the exec test according to the number of rabbit nodes, in order
    to check for cluster state only when necessary.

    Closes-Bug: #1741345

    Change-Id: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
    Co-Authored-By: John Eckersberg <email address hidden>
    (cherry picked from commit 1cfecc39dc37a01bbaea114773397c062367ae42)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.4.7

This issue was fixed in the openstack/puppet-tripleo 7.4.7 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/531261
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6f834f60e619181890c4438d1315ebaa776c1d09
Submitter: Zuul
Branch: master

commit 6f834f60e619181890c4438d1315ebaa776c1d09
Author: Alex Schultz <email address hidden>
Date: Thu Jan 4 16:23:37 2018 -0700

    Use docker_config_scripts for puppet apply

    There are some configuration applies that we need to do during the
    deployment. These currently live as manually constructed bash runs which
    are missing the --detailed-exitcode handling to know when we have
    failures. In order to reduce the duplicated code and simplify this
    execution, this change creates a docker_config_scripts with
    docker_puppet_run.sh in containers-common that can be reused by any of
    the docker services. This allows us to properly handle
    --detailed-exitcodes while also reducing the amount of duplicated code
    bits that we have within THT.

    Additionally this change adds a new shared value for ContainersCommon to
    pull the required volumes for the docker_puppet_apply.sh script into a
    single place. Unfortunately the existing volumes from ContainersCommon
    includes a mount for /etc/puppet to /etc/puppet which causes problems
    because we need to be able to write out a hiera value. The /etc/puppet
    mount is needed for the bootstrap_host_exec function which is consumed
    by various docker_config tasks but the mount conflicts with the puppet
    apply logic being used.

    Depends-On: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
    Change-Id: Icf4a64ed76635e39bbb34c3a088c55e1f14fddca
    Related-Bug: #1741345
    Co-Authored-By: Damien Ciabrini <email address hidden>

Emilien Macchi (emilienm) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/533062

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/533062
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=4c2c3de1c6d459e611cb3bd1dddacde1f2122b94
Submitter: Zuul
Branch: stable/pike

commit 4c2c3de1c6d459e611cb3bd1dddacde1f2122b94
Author: Alex Schultz <email address hidden>
Date: Thu Jan 4 16:23:37 2018 -0700

    Use docker_config_scripts for puppet apply

    There are some configuration applies that we need to do during the
    deployment. These currently live as manually constructed bash runs which
    are missing the --detailed-exitcode handling to know when we have
    failures. In order to reduce the duplicated code and simplify this
    execution, this change creates a docker_config_scripts with
    docker_puppet_run.sh in containers-common that can be reused by any of
    the docker services. This allows us to properly handle
    --detailed-exitcodes while also reducing the amount of duplicated code
    bits that we have within THT.

    Additionally this change adds a new shared value for ContainersCommon to
    pull the required volumes for the docker_puppet_apply.sh script into a
    single place. Unfortunately the existing volumes from ContainersCommon
    includes a mount for /etc/puppet to /etc/puppet which causes problems
    because we need to be able to write out a hiera value. The /etc/puppet
    mount is needed for the bootstrap_host_exec function which is consumed
    by various docker_config tasks but the mount conflicts with the puppet
    apply logic being used.

    Depends-On: I940cec6d670df39ac6e2a3559a028acbeee99331
    Change-Id: Icf4a64ed76635e39bbb34c3a088c55e1f14fddca
    Related-Bug: #1741345
    Co-Authored-By: Damien Ciabrini <email address hidden>
    (cherry picked from commit 6f834f60e619181890c4438d1315ebaa776c1d09)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 8.2.0

This issue was fixed in the openstack/puppet-tripleo 8.2.0 release.
