Containers multinode job deploying an empty overcloud

Bug #1703599 reported by Jiří Stránský
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Jiří Stránský
Milestone: (none listed)

Bug Description

2017-07-11 09:37:57.081552 | TASK [validate-simple : Validate the overcloud] ********************************
2017-07-11 09:37:57.081611 | task path: /home/jenkins/workspace/gate-tripleo-ci-centos-7-containers-multinode/.quickstart/usr/local/share/ansible/roles/validate-simple/tasks/main.yml:30
2017-07-11 09:37:57.103123 | Tuesday 11 July 2017 09:37:57 +0000 (0:00:02.919) 0:48:29.040 **********
2017-07-11 09:38:30.692796 | fatal: [undercloud]: FAILED! => {"changed": true, "cmd": "set -o pipefail && /home/jenkins/overcloud-validate.sh 2>&1 | awk '{ print strftime(\"%Y-%m-%d %H:%M:%S |\"), $0; fflush(); }' > /home/jenkins/overcloud_validate.log", "delta": "0:00:32.376437", "end": "2017-07-11 09:38:30.672787", "failed": true, "rc": 1, "start": "2017-07-11 09:37:58.296350", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

Details from validation log:

2017-07-11 09:38:09 | Failed to discover available identity versions when contacting http://192.168.24.14:5000/v2.0. Attempting to parse version from URL.
2017-07-11 09:38:12 | Unable to establish connection to http://192.168.24.14:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.24.14', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x3fe8cd0>: Failed to establish a new connection: [Errno 113] No route to host',))
2017-07-11 09:38:12 | + ramdisk_id=
2017-07-11 09:38:12 | ++ openstack image create pingtest_kernel --public --container-format aki --disk-format aki --file /home/jenkins/cirros_images/cirros-0.3.5-x86_64-vmlinuz
2017-07-11 09:38:12 | ++ awk '/ id / {print $4}'
2017-07-11 09:38:15 | Failed to discover available identity versions when contacting http://192.168.24.14:5000/v2.0. Attempting to parse version from URL.
2017-07-11 09:38:18 | Unable to establish connection to http://192.168.24.14:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.24.14', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x373bd10>: Failed to establish a new connection: [Errno 113] No route to host',))
2017-07-11 09:38:18 | + kernel_id=
2017-07-11 09:38:18 | + openstack image create pingtest_image --public --container-format ami --disk-format ami --property kernel_id= --property ramdisk_id= --file /home/jenkins/cirros_images/cirros-0.3.5-x86_64-blank.img
2017-07-11 09:38:21 | Failed to discover available identity versions when contacting http://192.168.24.14:5000/v2.0. Attempting to parse version from URL.
2017-07-11 09:38:24 | Unable to establish connection to http://192.168.24.14:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.24.14', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x495dd10>: Failed to establish a new connection: [Errno 113] No route to host',))
2017-07-11 09:38:24 | + cleanup
2017-07-11 09:38:24 | + openstack stack delete --yes pingtest_stack
2017-07-11 09:38:27 | Failed to discover available identity versions when contacting http://192.168.24.14:5000/v2.0. Attempting to parse version from URL.
2017-07-11 09:38:30 | Unable to establish connection to http://192.168.24.14:5000/v2.0/tokens: HTTPConnectionPool(host='192.168.24.14', port=5000): Max retries exceeded with url: /v2.0/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x2ef6c10>: Failed to establish a new connection: [Errno 113] No route to host',))
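
The log above shows the validation script carrying on after every failed Keystone call, so kernel_id and ramdisk_id end up empty and each later command fails the same way. A minimal sketch of a pre-flight check that would fail fast instead, assuming a hypothetical helper named endpoint_reachable that is not part of overcloud-validate.sh; the host and port in the usage note are simply the ones from the log:

```python
import socket


def endpoint_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration only. It checks plain TCP
    reachability and says nothing about whether the service behind
    the port is healthy.
    """
    try:
        # create_connection raises OSError (e.g. "No route to host",
        # "Connection refused", or a timeout) when the connect fails.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Called as endpoint_reachable("192.168.24.14", 5000) before the first openstack command, a False result would point at the unreachable endpoint directly rather than at the image creation steps.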

Jiří Stránský (jistr) wrote :

The container logs are missing because the docker service isn't running on the overcloud at all, and log collection from containers only happens when the docker service is running.

Jiří Stránský (jistr) wrote :

So there are *no services at all* deployed on the overcloud in the multinode job. Comparing broken multinode:

http://logs.openstack.org/49/482449/1/check/gate-tripleo-ci-centos-7-containers-multinode/afc4933/logs/postci.txt.gz

with working OVB:

http://logs.openstack.org/02/476602/25/check-tripleo/gate-tripleo-ci-centos-7-ovb-containers-oooq/6bc12cb/logs/postci.txt.gz

E.g. there's no docker puppet class mentioned in step_config of the multinode log.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/482545

Changed in tripleo:
assignee: nobody → Jiří Stránský (jistr)
status: Triaged → In Progress
Jiří Stránský (jistr) wrote : Re: Containers multinode job failing verification step and not collecting container logs

After some searching, the only patch I suspect as a possible cause of such a weird issue is this one: https://review.openstack.org/#/c/479400/

It added an environment file which has all entries commented out for now, so it effectively looks just like this:

    parameter_defaults:

I've already tried this in the past with Heat, and it broke my deployment back then, so it's fairly likely that this is the issue.
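
In YAML, a key with nothing but comments under it has a null value, which is what an environment file with all parameter_defaults entries commented out amounts to. A minimal sketch of a lint that flags such files before they reach Heat, assuming a hypothetical function named has_empty_parameter_defaults; it is a line-based heuristic, not a YAML parser, and it is not part of tripleo-heat-templates or Heat:

```python
def has_empty_parameter_defaults(env_text):
    """Return True if a top-level `parameter_defaults:` key has no entries.

    Hypothetical helper for illustration only. Comments and blank lines
    are ignored; the key counts as empty when no indented entry follows
    it before the next top-level key or the end of the file.
    """
    in_section = False
    for raw in env_text.splitlines():
        # Naive comment stripping: assumes '#' does not occur inside values.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        if not raw[0].isspace():
            # A top-level line: either closes the section or may open it.
            if in_section:
                return True
            key, _, value = line.partition(":")
            if key.strip() == "parameter_defaults":
                if value.strip() not in ("", "~", "null"):
                    return False  # inline value such as `{}`
                in_section = True
        elif in_section:
            return False  # an indented, uncommented entry exists
    return in_section
```

Running this over every environment file passed to the deploy command would flag the file described above, while a file with at least one uncommented parameter passes.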

tags: added: alert
summary: - Containers multinode job failing verification step and not collecting
- container logs
+ Containers multinode job deploying an empty overcloud
Alex Schultz (alex-schultz) wrote :

It should be noted that the containers-multinode job isn't voting on oooq-extras. We should probably be gating on all oooq jobs in oooq-extras.

Emilien Macchi (emilienm) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/482545
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=366825ec09ac44fe0440ed31685f5870032680a1
Submitter: Jenkins
Branch: master

commit 366825ec09ac44fe0440ed31685f5870032680a1
Author: Jiri Stransky <email address hidden>
Date: Tue Jul 11 14:33:57 2017 +0200

    Don't confuse Heat with empty parameter_defaults

    Apparently providing completely empty parameter_defaults in an
    environment file can confuse Heat, and it seems like it doesn't try to
    deploy any services on the overcloud in the multinode job. See the bug
    for more details about the bug symptoms.

    Change-Id: Ia9cb01b48087b78f66004263757590877219f743
    Closes-Bug: #1703599

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b3 development milestone.
