mistral fails to start the ceph container

Bug #1712432 reported by wes hayutin
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: John Trowbridge
Milestone: none listed

Bug Description

Running gate-tripleo-ci-centos-7-scenario001-multinode-oooq-container for promotion with freshly built containers.

This failed the same way twice in a row, while other container scenario jobs passed.
https://review.rdoproject.org/jenkins/job/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016/17/

https://logs.rdoproject.org/47/475747/108/openstack-manual/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016/Z0d5cd493202546698d4d3ea112e4c1a6/undercloud/var/log/mistral/executor.log.txt.gz#_2017-08-22_20_13_19_412

: provided hosts list is empty, only localhost is available\nERROR! playbooks must be a list of plays\n\nThe error appears to have been in '/tmp/ansible-mistral-actionPjtjb9/playbook.yaml': line 1, column 1, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n/usr/share/ceph-ansible/site-docker.yml.sample\n^ here\n"
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor Traceback (most recent call last):
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 109, in run_action
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor result = action.run(context.ctx())
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/actions/ansible.py", line 409, in run
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor log_errors=processutils.LogErrors.ALL)
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 400, in execute
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor cmd=sanitized_cmd)
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor ProcessExecutionError: Unexpected error while running command.
2017-08-22 20:13:19.412 31700 ERROR mistral.executors.default_executor Command: ansible-playbook /tmp/ansible-mistral-actionPjtjb9/playbook.yaml --user tripleo-admin --become --become-user root --extra-vars {"monitor_secret": "***", "ceph_conf_overrides": {"global":

wes hayutin (weshayutin)
Changed in tripleo:
importance: Critical → High
Giulio Fidente (gfidente) wrote :

The actual error is

  ERROR! playbooks must be a list of plays

which I have seen when the playbook /usr/share/ceph-ansible/site-docker.yml.sample was not installed. I am inclined to think that ceph-ansible wasn't installed on the undercloud, but I am not sure why this would happen only in the promotion job.

The two submissions which took care of installing the package are:

https://review.openstack.org/#/c/478977/
https://review.openstack.org/#/c/478986/

Is there any reason why they would not be effective for the promotion job?
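
For illustration, here is a minimal sketch of how a missing playbook file could produce this exact error. It assumes, based only on the "offending line" shown in the log, that the action treats its playbook argument as inline content when no such file exists; the helper name resolve_playbook and the temporary paths are hypothetical and not taken from tripleo-common.

```python
import os
import tempfile


def resolve_playbook(playbook, work_dir):
    """Return the path that would be handed to ansible-playbook.

    Hypothetical helper (an assumption, not the real tripleo-common code):
    an existing file is used as-is; any other string is treated as inline
    playbook content and written to work_dir/playbook.yaml.
    """
    if os.path.exists(playbook):
        return playbook
    path = os.path.join(work_dir, "playbook.yaml")
    with open(path, "w") as handle:
        handle.write(playbook)
    return path


# Simulate the package not being installed: the path does not exist.
work_dir = tempfile.mkdtemp(prefix="ansible-mistral-action")
missing = os.path.join(work_dir, "site-docker.yml.sample")

resolved = resolve_playbook(missing, work_dir)
with open(resolved) as handle:
    content = handle.read()

# The generated playbook.yaml contains only the path string, which is a
# YAML scalar rather than a list of plays.
print(resolved)
print(content)
```

Under that assumption, ansible-playbook is handed a file whose whole content is a path, which would explain both the "playbooks must be a list of plays" message and the path being reported as the offending line.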

wes hayutin (weshayutin) wrote :
Changed in tripleo:
assignee: nobody → John Trowbridge (trown)
John Trowbridge (trown) wrote :

Yes, it is because https://review.openstack.org/475747 is not merged, so we are not getting the ceph-ansible patch in tripleo-ci. We can rebase it after the current manual run if it is not merged by then. It would be even better to just merge https://review.openstack.org/475747, though.

Giulio Fidente (gfidente) wrote :

Though I have to admit that the error message surfaces really deep in mistral, and we should fail harder and sooner if the package isn't installed. The problem is that we wanted it to be optional, so right now the only thing we have is a warning message in the pre-deployment validations [1]. I am not sure that is visible in the promotion job; we probably don't run validations there.

1. https://review.openstack.org/#/c/483345/
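
As a rough sketch of what "fail harder and sooner" could look like, the check below tests for the playbook before anything is handed to mistral. It is not the validation from [1]; the names check_playbook and MissingPlaybookError and the required flag are hypothetical, introduced only to show how the package could stay optional while still failing early when a deployment needs it.

```python
import os

# Path taken from the error output in this bug.
CEPH_ANSIBLE_PLAYBOOK = "/usr/share/ceph-ansible/site-docker.yml.sample"


class MissingPlaybookError(RuntimeError):
    """Hypothetical error raised when a required playbook is absent."""


def check_playbook(path, required):
    """Check that a playbook file is present before deploying.

    Returns None when the file exists. When it is missing, returns a
    warning string if the playbook is optional and raises
    MissingPlaybookError if the deployment requires it.
    """
    if os.path.isfile(path):
        return None
    message = "playbook %s not found; is ceph-ansible installed?" % path
    if required:
        raise MissingPlaybookError(message)
    return message


# Optional case: report a warning and carry on.
warning = check_playbook(CEPH_ANSIBLE_PLAYBOOK, required=False)
if warning:
    print("WARNING: " + warning)
```

A caller that knows Ceph is part of the deployment would pass required=True, so the failure names the missing file up front instead of surfacing as an ansible syntax error inside the executor log.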

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.openstack.org/496822

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/496822
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=1ef01f4251cc1b2786a5d60e0f71d16f88012233
Submitter: Jenkins
Branch: master

commit 1ef01f4251cc1b2786a5d60e0f71d16f88012233
Author: John Trowbridge <email address hidden>
Date: Wed Aug 23 12:13:54 2017 -0400

    Add missing ceph-ansible install to appropriate releases

    We need the same change in 7c13e99a84be38fe2bb064d3a28605c7d1231d58
    in our other releases that support ceph-ansible.

    Change-Id: I4de51dc3e3a166dd25e45585c5eb38140650eeff
    Closes-Bug: 1712432

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart 2.1.1

This issue was fixed in the openstack/tripleo-quickstart 2.1.1 release.
