tripleo

rabbitmq-ready fails when node size 1 and doesn't actually fail the deployment

Bug #1741345 reported by Alex Schultz on 2018-01-04

This bug affects 1 person

Affects  Status        Importance  Assigned to      Milestone
tripleo  Fix Released  Critical    Damien Ciabrini  tripleo queens-3

Bug Description

Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Error: rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true returned 1 instead of one of [0]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: change from notrun to 0 failed: rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true returned 1 instead of one of [0]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Info: Class[Tripleo::Profile::Pacemaker::Rabbitmq_bundle]: Unscheduling all events on Class[Tripleo::Profile::Pacemaker::Rabbitmq_bundle]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Info: Creating state file /var/lib/puppet/state/state.yaml
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Notice: Applied catalog in 2074.61 seconds

http://logs.openstack.org/15/527515/2/check/tripleo-ci-centos-7-containers-multinode/dfe0070/logs/subnode-2/var/log/journal.txt.gz#_Jan_04_18_19_31

While investigating why deployments were taking so long, it was noted that step 2 takes ~30 minutes to complete. The cause appears to be that we're waiting for rabbitmq to become ready because of https://github.com/openstack/puppet-tripleo/commit/2f33d74173b79117c962146ac2c88fe1e3836403. Unfortunately, rabbitmq never actually clusters in our multinode jobs, so the rabbitmq-ready exec eventually times out after ~2000 seconds.

The other issue is that this timeout doesn't actually fail the deployment: because we're not running puppet with --detailed-exitcodes, the failed exec does not produce a non-zero exit status.

https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L197
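
To illustrate the exit-code problem described above, here is a minimal Python sketch of how the exit status of `puppet apply --detailed-exitcodes` can be interpreted. The code values are the ones documented for that flag; the names `DETAILED_EXITCODES` and `is_failure` are hypothetical and not part of TripleO.

```python
# Exit codes documented for `puppet apply --detailed-exitcodes`.
DETAILED_EXITCODES = {
    0: "no changes, no failures",
    1: "run failed",
    2: "changes applied, no failures",
    4: "some resources failed",
    6: "changes applied and some resources failed",
}


def is_failure(exit_code: int) -> bool:
    """Treat anything other than 0 (no changes) or 2 (changes) as a failed run."""
    return exit_code not in (0, 2)


if __name__ == "__main__":
    for code, meaning in sorted(DETAILED_EXITCODES.items()):
        print(code, meaning, "FAIL" if is_failure(code) else "ok")
```

With this mapping, a failed rabbitmq-ready exec would surface as code 4 or 6 and could be turned into a deployment failure.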

Damien Ciabrini (dciabrin) on 2018-01-04

Changed in tripleo:
assignee:	nobody → Damien Ciabrini (dciabrin)

wes hayutin (weshayutin) wrote on 2018-01-04:

Nice find Alex!

OpenStack Infra (hudson-openstack) wrote on 2018-01-04: Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/531261

OpenStack Infra (hudson-openstack) wrote on 2018-01-05: Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/531352

Changed in tripleo:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2018-01-05: Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/531352
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=1cfecc39dc37a01bbaea114773397c062367ae42
Submitter: Zuul
Branch: master

commit 1cfecc39dc37a01bbaea114773397c062367ae42
Author: Damien Ciabrini <email address hidden>
Date: Fri Jan 5 09:50:17 2018 +0000

Fix rabbitmq-ready check for single node HA deployments

    The current rabbitmq-ready exec waits for rabbitmq to become clustered
    before it allows user creation. Unfortunately this doesn't work when
    the deployment contains a single node, because rabbit doesn't trigger
    the clustering mode at all.

    Set the exec test according to the number of rabbit nodes, in order
    to check for cluster state only when necessary.

    Closes-Bug: #1741345

    Change-Id: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
    Co-Authored-By: John Eckersberg <email address hidden>
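
The idea in the commit message above, choosing the readiness test from the number of rabbit nodes, can be sketched in Python as follows. This is an illustration only: the real change lives in a Puppet manifest, the function name `rabbitmq_ready_command` is hypothetical, and the single-node expression `rabbit:is_running().` is an assumption rather than a confirmed detail of the fix. The multi-node command is the one shown in the log at the top of this report.

```python
def rabbitmq_ready_command(node_count: int) -> str:
    """Build the shell check used to decide whether rabbitmq is ready.

    Multi-node deployments wait for clustering; a single node never
    enters clustering mode, so it needs a different test.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if node_count > 1:
        # Command taken from the failing exec in the bug description.
        expression = "rabbit_mnesia:is_clustered()."
    else:
        # Assumption: any check that is true on a healthy standalone node.
        expression = "rabbit:is_running()."
    return f'rabbitmqctl eval "{expression}" | grep -q true'


if __name__ == "__main__":
    print(rabbitmq_ready_command(1))
    print(rabbitmq_ready_command(3))
```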

Changed in tripleo:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2018-01-05: Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/531448

OpenStack Infra (hudson-openstack) wrote on 2018-01-05: Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/531448
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=958f824c85bf5c97ce4068ed63d976791ac3ed63
Submitter: Zuul
Branch: stable/pike

commit 958f824c85bf5c97ce4068ed63d976791ac3ed63
Author: Damien Ciabrini <email address hidden>
Date: Fri Jan 5 09:50:17 2018 +0000

Fix rabbitmq-ready check for single node HA deployments

    Set the exec test according to the number of rabbit nodes, in order
    to check for cluster state only when necessary.

    Closes-Bug: #1741345

    Change-Id: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
    Co-Authored-By: John Eckersberg <email address hidden>
    (cherry picked from commit 1cfecc39dc37a01bbaea114773397c062367ae42)

tags:	added: in-stable-pike

OpenStack Infra (hudson-openstack) wrote on 2018-01-08: Fix included in openstack/puppet-tripleo 7.4.7

This issue was fixed in the openstack/puppet-tripleo 7.4.7 release.

OpenStack Infra (hudson-openstack) wrote on 2018-01-11: Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/531261
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=6f834f60e619181890c4438d1315ebaa776c1d09
Submitter: Zuul
Branch: master

commit 6f834f60e619181890c4438d1315ebaa776c1d09
Author: Alex Schultz <email address hidden>
Date: Thu Jan 4 16:23:37 2018 -0700

Use docker_config_scripts for puppet apply

    There are some configuration applies that we need to do during the
    deployment. These currently live as manually constructed bash runs which
    are missing the --detailed-exitcodes handling to know when we have
    failures. In order to reduce the duplicated code and simplify this
    execution, this change creates a docker_config_scripts with
    docker_puppet_apply.sh in containers-common that can be reused by any of
    the docker services. This allows us to properly handle
    --detailed-exitcodes while also reducing the amount of duplicated code
    bits that we have within THT.

    Additionally this change adds a new shared value for ContainersCommon to
    pull the required volumes for the docker_puppet_apply.sh script into a
    single place. Unfortunately the existing volumes from ContainersCommon
    includes a mount for /etc/puppet to /etc/puppet which causes problems
    because we need to be able to write out a hiera value. The /etc/puppet
    mount is needed for the bootstrap_host_exec function which is consumed
    by various docker_config tasks but the mount conflicts with the puppet
    apply logic being used.

    Depends-On: I24e5e344b7f657ce5d42a7c7c45be7b5ed5e6445
    Change-Id: Icf4a64ed76635e39bbb34c3a088c55e1f14fddca
    Related-Bug: #1741345
    Co-Authored-By: Damien Ciabrini <email address hidden>
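
As a rough illustration of what a shared wrapper like the one described above has to do, the following Python sketch runs `puppet apply --detailed-exitcodes` and collapses the result into a plain success or failure status. It is not the actual shell script from tripleo-heat-templates; the function name `run_puppet_apply` and the injectable `runner` parameter are hypothetical and exist only to keep the sketch testable without puppet installed.

```python
import subprocess
from typing import Callable, Sequence


def run_puppet_apply(
    manifest: str,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run puppet apply and return 0 on success, 1 on any failure.

    With --detailed-exitcodes, 0 means no changes and 2 means changes
    were applied; every other code indicates a failed run.
    """
    argv = ["puppet", "apply", "--detailed-exitcodes", manifest]
    code = runner(argv)
    return 0 if code in (0, 2) else 1


if __name__ == "__main__":
    # Stub runner simulating a run in which some resources failed (code 4).
    print(run_puppet_apply("site.pp", runner=lambda argv: 4))
```

Keeping this logic in one place means each service does not have to reimplement the exit-code handling itself.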

Emilien Macchi (emilienm) wrote on 2018-01-11:

FYI we didn't catch that one: https://bugs.launchpad.net/tripleo/+bug/1742795

OpenStack Infra (hudson-openstack) wrote on 2018-01-12: Related fix proposed to tripleo-heat-templates (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/533062

OpenStack Infra (hudson-openstack) wrote on 2018-01-16: Related fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/533062
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=4c2c3de1c6d459e611cb3bd1dddacde1f2122b94
Submitter: Zuul
Branch: stable/pike

commit 4c2c3de1c6d459e611cb3bd1dddacde1f2122b94
Author: Alex Schultz <email address hidden>
Date: Thu Jan 4 16:23:37 2018 -0700

Use docker_config_scripts for puppet apply

    Depends-On: I940cec6d670df39ac6e2a3559a028acbeee99331
    Change-Id: Icf4a64ed76635e39bbb34c3a088c55e1f14fddca
    Related-Bug: #1741345
    Co-Authored-By: Damien Ciabrini <email address hidden>
    (cherry picked from commit 6f834f60e619181890c4438d1315ebaa776c1d09)