undercloud-container: neutron post config fails

Bug #1717281 reported by Emilien Macchi
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   Medium       Dan Prince

Bug Description

It seems like the Heat API isn't running when deploying with the Heat installer on the undercloud-container job:

http://logs.openstack.org/38/504038/3/check/gate-tripleo-ci-centos-7-undercloud-containers-nv/b425083/console.html#_2017-09-14_12_41_29_472058

Tags: ci
Changed in tripleo:
milestone: none → queens-1
Dan Prince (dan-prince) wrote :

It looks to me like the Heat API is working. The confusing part of the CI logs is this:

2017-09-14 12:41:29.472058 | 2017-09-14 12:41:29.000 | Unable to establish connection to http://127.0.0.1:8006/v1/admin/stacks/undercloud/events?marker=70a9d31c-7510-4093-8652-1ec011717df9&nested_depth=6&sort_dir=asc: HTTPConnectionPool(host='127.0.0.1', port=8006): Max retries exceeded with url: /v1/admin/stacks/undercloud/events?marker=70a9d31c-7510-4093-8652-1ec011717df9&nested_depth=6&sort_dir=asc (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x39e4510>: Failed to establish a new connection: [Errno 111] Connection refused',))

----

That message comes from the Heat "events" stream: the client was still listening for events when the stack failed and the installer stopped Heat. So there is a bit of a race here, but I don't think it is the root cause of the CI failure.
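
A minimal sketch of how an events follower could tolerate this race. It assumes events arrive as dicts with `resource_name` and `resource_status` keys (as Heat events do) and that a hypothetical `fetch_batch` callable raises the built-in `ConnectionError` once the API is gone. It is not the installer's actual code; it only illustrates stopping cleanly when the stack reports a terminal status or the API disappears.

```python
TERMINAL_SUFFIXES = ("_COMPLETE", "_FAILED")


def follow_events(fetch_batch, stack_name, max_polls=100):
    """Collect events until the stack itself reaches a terminal status.

    fetch_batch is a hypothetical callable returning a list of event dicts.
    Returns (events, final_status); final_status is None if the API went
    away before a terminal status for the stack was seen.
    """
    events = []
    final_status = None
    for _ in range(max_polls):
        try:
            batch = fetch_batch()
        except ConnectionError:
            # Stand-in for the client's own exception type. A refused
            # connection here means the API was stopped mid-poll, which is
            # a symptom of the shutdown and not the cause of the failure.
            break
        for event in batch:
            events.append(event)
            if (event["resource_name"] == stack_name
                    and event["resource_status"].endswith(TERMINAL_SUFFIXES)):
                final_status = event["resource_status"]
        if final_status is not None:
            # Stop polling before the installer tears the API down.
            break
    return events, final_status
```

With a loop like this, the "Connection refused" line would not be reported as an error once the stack had already reported its final status.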

Dan Prince (dan-prince)
Changed in tripleo:
assignee: nobody → Dan Prince (dan-prince)
summary: - undercloud-container: heat api isn't working
+ undercloud-container: neutron post config fails
Dan Prince (dan-prince) wrote :

There appears to be a new race condition in the undercloud POST scripts in t-h-t (tripleo-heat-templates).

I ran it locally once today using the latest t-h-t and tripleoupstream containers and saw it pass. Then I ran it again and it failed due to:

2017-09-14 20:28:07.000 | + grep start
2017-09-14 20:28:07.000 | + openstack network show ctlplane
2017-09-14 20:28:07.000 | Error while executing command: No Network found for ctlplane

---

You'll see the same thing in the CI logs if you look at the logs/var/log/undercloud_install.txt.gz file.

Still looking for the root cause.
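
One way to tell a timing race from a hard failure is to retry the failing check a bounded number of times. The sketch below is a diagnostic aid under stated assumptions, not the fix for this bug: `wait_for` and `ctlplane_exists` are hypothetical helper names, and the only real command used is the `openstack network show ctlplane` call from the trace above.

```python
import subprocess
import time


def wait_for(check, attempts=10, delay=3.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number.

    Raises TimeoutError if check() never succeeds within `attempts` tries.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"condition not met after {attempts} attempts")


def ctlplane_exists():
    """True once `openstack network show ctlplane` exits with status 0."""
    result = subprocess.run(
        ["openstack", "network", "show", "ctlplane"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Example (requires the openstack CLI and credentials in the environment):
# wait_for(ctlplane_exists, attempts=20, delay=5.0)
```

If the network appears after a few attempts, the failure is a pure ordering problem; if `wait_for` times out, something upstream never created the network.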

Dan Prince (dan-prince) wrote :

Looks like there may be issues with a few neutron services continually restarting:

78cff34a73b1 tripleoupstream/centos-binary-neutron-openvswitch-agent:latest "kolla_start" 18 minutes ago Restarting (1) 3 minutes ago neutron_ovs_agent
8e5b2468e13c tripleoupstream/centos-binary-neutron-dhcp-agent:latest "kolla_start" 18 minutes ago Up 18 minutes (unhealthy)

Martin André (mandre) wrote :

It seems to be coming from mistral, which returns a 500 error each time it's queried:

2017-09-14 12:40:34.000 | ++ openstack workbook list
2017-09-14 12:40:34.000 | ++ grep tripleo
2017-09-14 12:40:34.000 | ++ cut -f 2 -d ' '
2017-09-14 12:40:34.000 | Internal Server Error (HTTP 500)
2017-09-14 12:40:34.000 | ++ openstack workflow list
2017-09-14 12:40:34.000 | ++ grep tripleo
2017-09-14 12:40:34.000 | ++ cut -f 2 -d ' '
2017-09-14 12:40:34.000 | Internal Server Error (HTTP 500)
2017-09-14 12:40:34.000 | ++ ls /usr/share/openstack-tripleo-common/workbooks/access.yaml /usr/share/openstack-tripleo-common/workbooks/baremetal.yaml /usr/share/openstack-tripleo-common/workbooks/ceph-ansible.yaml /usr/share/openstack-tripleo-common/workbooks/deployment.yaml /usr/share/openstack-tripleo-common/workbooks/derive_params_formulas.yaml /usr/share/openstack-tripleo-common/workbooks/derive_params.yaml /usr/share/openstack-tripleo-common/workbooks/fernet-key-rotate.yaml /usr/share/openstack-tripleo-common/workbooks/package_update.yaml /usr/share/openstack-tripleo-common/workbooks/plan_management.yaml /usr/share/openstack-tripleo-common/workbooks/scale.yaml /usr/share/openstack-tripleo-common/workbooks/stack.yaml /usr/share/openstack-tripleo-common/workbooks/support.yaml /usr/share/openstack-tripleo-common/workbooks/swift_rings_backup.yaml /usr/share/openstack-tripleo-common/workbooks/validations.yaml
2017-09-14 12:40:34.000 | + for workbook in '$(ls /usr/share/openstack-tripleo-common/workbooks/*)'
2017-09-14 12:40:34.000 | + openstack workbook create /usr/share/openstack-tripleo-common/workbooks/access.yaml
2017-09-14 12:40:34.000 | Internal Server Error (HTTP 500)

For some reason it seems to be unable to talk to keystone (the timestamps match the 500 errors):
http://logs.openstack.org/38/504038/3/check/gate-tripleo-ci-centos-7-undercloud-containers-nv/b425083/logs/var/log/containers/mistral/api.txt.gz

Martin André (mandre) wrote :

Keystone is not configured to listen on localhost:

http://logs.openstack.org/38/504038/3/check/gate-tripleo-ci-centos-7-undercloud-containers-nv/b425083/logs/var/log/config-data/keystone/etc/httpd/conf.d/10-keystone_wsgi_admin.conf.gz

We need to set auth_url and auth_uri for mistral and not rely on the defaults, which point to localhost.

Martin André (mandre) wrote :

So we have in hiera:

    "mistral::keystone::authtoken::auth_uri": "http://192.168.24.1:5000/v2.0",
    "mistral::keystone::authtoken::auth_url": "http://192.168.24.1:5000",

But puppet-mistral doesn't pick them up and uses the default values instead.

This probably has to do with https://review.openstack.org/#/c/490985/.
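
A quick way to confirm this kind of mismatch is to inspect the rendered configuration and not the hiera data. The sketch below assumes the options land in a `[keystone_authtoken]` section of an INI-style mistral.conf. The helper name `loopback_auth_options` and the sample values (including port 35357) are illustrative, not taken from the CI logs.

```python
import configparser
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def loopback_auth_options(conf_text, section="keystone_authtoken",
                          options=("auth_uri", "auth_url")):
    """Return {option: value} for options that are unset or point at loopback.

    An unset option maps to None, since the service would then fall back
    to its built-in default.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    suspect = {}
    for name in options:
        value = parser.get(section, name, fallback=None)
        if value is None:
            suspect[name] = None
        elif urlparse(value).hostname in LOOPBACK_HOSTS:
            suspect[name] = value
    return suspect


# Illustrative input: one option rendered from hiera, one left at a
# localhost default (assumed value).
SAMPLE = """
[keystone_authtoken]
auth_uri = http://192.168.24.1:5000/v2.0
auth_url = http://localhost:35357
"""

suspects = loopback_auth_options(SAMPLE)
```

An empty result means both options point at a non-loopback address; anything else names the option that did not pick up the hiera value.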

Martin André (mandre) wrote :

Brad has a patch that should fix it at https://review.openstack.org/#/c/503755/

Changed in tripleo:
milestone: queens-1 → queens-2
Changed in tripleo:
milestone: queens-2 → queens-3
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-mistral 12.0.0

This issue was fixed in the openstack/puppet-mistral 12.0.0 release.

Matt Riedemann (mriedem)
Changed in tripleo:
status: Triaged → Fix Released