Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Error: rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true returned 1 instead of one of [0]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: change from notrun to 0 failed: rabbitmqctl eval "rabbit_mnesia:is_clustered()." | grep -q true returned 1 instead of one of [0]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Info: Class[Tripleo::Profile::Pacemaker::Rabbitmq_bundle]: Unscheduling all events on Class[Tripleo::Profile::Pacemaker::Rabbitmq_bundle]
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Info: Creating state file /var/lib/puppet/state/state.yaml
Jan 04 18:19:31 centos-7-inap-mtl01-0001686123 dockerd-current[29364]: Notice: Applied catalog in 2074.61 seconds
http://logs.openstack.org/15/527515/2/check/tripleo-ci-centos-7-containers-multinode/dfe0070/logs/subnode-2/var/log/journal.txt.gz#_Jan_04_18_19_31
While investigating why deployments were taking so long, we noticed that step 2 takes ~30 minutes to complete. The cause appears to be that we wait for rabbitmq to become ready, a check introduced in https://github.com/openstack/puppet-tripleo/commit/2f33d74173b79117c962146ac2c88fe1e3836403. Because rabbitmq never actually clusters in our multinode jobs, the rabbitmq-ready exec keeps retrying and eventually times out after ~2000 seconds.
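As a rough illustration of where a wait of that size can come from, here is a minimal sketch of the arithmetic for a check that is retried a fixed number of times with a sleep between attempts. The function name and the numbers passed to it are hypothetical and are not taken from the commit above; the real retry settings should be read from the rabbitmq-ready exec itself.

```python
def worst_case_wait(tries: int, try_sleep: float, per_attempt: float) -> float:
    """Approximate seconds spent before a retried check finally gives up.

    Simplified model: every attempt runs the command once, and a sleep
    separates consecutive attempts.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")
    return tries * per_attempt + (tries - 1) * try_sleep


# Hypothetical values chosen only for illustration.
print(worst_case_wait(tries=180, try_sleep=10, per_attempt=1.5))
```

With a retry count in the hundreds and a sleep of several seconds, a check that can never succeed costs tens of minutes per run, which matches the order of magnitude of the delay seen in step 2.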
The other issue is that this timeout doesn't actually fail the deployment, because we're not running puppet with --detailed-exitcodes:
https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/rabbitmq.yaml#L197
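To show why the missing flag hides the failure, here is a small sketch of how a caller might interpret the exit status of puppet apply. It relies on the documented meaning of the --detailed-exitcodes values (0 no changes, 1 run failed, 2 changes, 4 failures, 6 changes and failures); the helper name is hypothetical and this is not the logic used in tripleo-heat-templates.

```python
def run_had_failures(exit_code: int, detailed_exitcodes: bool) -> bool:
    """Return True if the exit status tells the caller that something failed."""
    if detailed_exitcodes:
        # 0 = no changes, 2 = changes applied; 1, 4 and 6 all signal failure.
        return exit_code in (1, 4, 6)
    # Without the flag, a run whose resources failed can still exit 0,
    # so a plain nonzero check is all the caller has to go on.
    return exit_code != 0


# A failed exec reported as 0 (no flag) goes unnoticed; as 4 or 6 it does not.
print(run_had_failures(0, detailed_exitcodes=False))
print(run_had_failures(6, detailed_exitcodes=True))
```

Note that once the flag is in use, exit code 2 has to be treated as success, otherwise every run that changes anything would be reported as a failure.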
Nice find Alex!