overcloud deploy hangs causing CI job timeout

Bug #1730179 reported by Martin André
This bug affects 1 person
Affects: tripleo
Status: Triaged
Importance: Critical
Assigned to: Unassigned

Bug Description

In many CI jobs, the overcloud deploy command hangs at the very beginning, waiting for messages on a queue with no timeout:

2017-11-05 08:19:06 | + openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates --libvirt-type qemu --timeout 80 -e /home/zuul/cloud-names.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/deployed-server-bootstrap-environment-centos.yaml --overcloud-ssh-user zuul -e /usr/share/openstack-tripleo-heat-templates/ci/environments/multinode.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml -e /opt/stack/new/tripleo-ci/test-environments/worker-config.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/debug.yaml --validation-errors-nonfatal --roles-file /home/zuul/overcloud_roles.yaml --compute-scale 0
2017-11-05 08:19:10 | The disable_upgrade_deployment flag is not set in the roles file. This flag is expected when you have a nova-compute or swift-storage role. Please check the contents of the roles file: [{'networks': ['External', 'InternalApi', 'Storage', 'StorageMgmt', 'Tenant'], 'CountDefault': 1, 'name': 'Controller', 'tags': ['primary', 'controller']}]
2017-11-05 08:19:13 | Waiting for messages on queue '20d291c0-503b-46db-a087-8b01f822747b' with no timeout.

http://logs.openstack.org/61/517661/1/gate/legacy-tripleo-ci-centos-7-nonha-multinode-oooq/adad4df/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
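
For triaging other jobs with the same symptom, a minimal sketch that checks whether a deploy log stops at the "Waiting for messages" line shown above. The function name `find_stalled_queue` is hypothetical, and the regex is based only on the single log line quoted in this report, so it may need adjusting for other client versions.

```python
import re

# Pattern taken from the "Waiting for messages on queue ..." line quoted above.
WAITING_RE = re.compile(
    r"Waiting for messages on queue '([0-9a-f-]+)' with no timeout\."
)


def find_stalled_queue(log_text):
    """Return the queue ID if the last non-empty log line is the wait message.

    Returns None when the log is empty or ends with any other line.
    """
    lines = [line for line in log_text.splitlines() if line.strip()]
    if not lines:
        return None
    match = WAITING_RE.search(lines[-1])
    return match.group(1) if match else None
```

A log whose final line is the wait message suggests the deploy never progressed past queue setup, which matches the hang described here.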

The zaqar logs contain many errors like the following, where the webhook notification task fails with "Connection refused":

2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook [-] webhook task got exception: HTTPConnectionPool(host='centos-7-vexxhost-ca-ymq-1-0000726558', port=38207): Max retries exceeded with url: /1586432a-a996-4131-b236-f23d4c4483f8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4d0b60bcd0>: Failed to establish a new connection: [Errno 111] Connection refused',)).: ConnectionError: HTTPConnectionPool(host='centos-7-vexxhost-ca-ymq-1-0000726558', port=38207): Max retries exceeded with url: /1586432a-a996-4131-b236-f23d4c4483f8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4d0b60bcd0>: Failed to establish a new connection: [Errno 111] Connection refused',))
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook Traceback (most recent call last):
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook File "/usr/lib/python2.7/site-packages/zaqar/notification/tasks/webhook.py", line 111, in execute
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook headers=headers)
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook File "/usr/lib/python2.7/site-packages/requests/api.py", line 112, in post
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook return request('post', url, data=data, json=json, **kwargs)
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook File "/usr/lib/python2.7/site-packages/requests/api.py", line 58, in request
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook return session.request(method=method, url=url, **kwargs)
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 518, in request
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook resp = self.send(prep, **send_kwargs)
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 639, in send
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook r = adapter.send(request, **kwargs)
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 502, in send
2017-11-05 08:19:13.950 4913 ERROR zaqar.notification.tasks.webhook raise ConnectionError(e, request=request)

http://logs.openstack.org/61/517661/1/gate/legacy-tripleo-ci-centos-7-nonha-multinode-oooq/adad4df/logs/undercloud/var/log/zaqar/zaqar.log.txt.gz#_2017-11-05_08_19_13_950
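
To confirm whether anything is listening on the webhook endpoint named in the error above, a rough diagnostic sketch follows. The helper names `webhook_endpoint` and `is_listening` are hypothetical and not part of zaqar; the regex assumes the `HTTPConnectionPool(host=..., port=...)` format seen in this log.

```python
import re
import socket

# Matches the HTTPConnectionPool(host='...', port=N) fragment in the zaqar error.
POOL_RE = re.compile(r"HTTPConnectionPool\(host='([^']+)', port=(\d+)\)")


def webhook_endpoint(log_line):
    """Extract (host, port) of the failing webhook subscriber, or None."""
    match = POOL_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (Errno 111), DNS failures, and timeouts.
        return False
```

Running `is_listening(*webhook_endpoint(line))` on the undercloud for a failing line would show whether the subscriber port is still refusing connections.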

Tags: alert ci
Martin André (mandre)
Changed in tripleo:
status: Confirmed → Triaged
Changed in tripleo:
milestone: none → queens-2
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-zaqar 11.3.1

This issue was fixed in the openstack/puppet-zaqar 11.3.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-zaqar 12.1.0

This issue was fixed in the openstack/puppet-zaqar 12.1.0 release.
