[master/Rocky]Fs035 job fails in promotion because of heat stack timeout

Bug #1801587 reported by Sagi (Sergey) Shnaidman on 2018-11-04
This bug affects 1 person
Affects: tripleo | Importance: Critical | Assigned to: Unassigned

Bug Description

The fs035 OVB job fails in the master promotion jobs because of a Heat stack timeout (three times in a row).

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/7631ec2/

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/7631ec2/logs/undercloud/var/log/containers/heat/heat-engine.log.1.gz

ERROR heat.engine.resource Traceback (most recent call last):
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 924, in _action_recorder
ERROR heat.engine.resource yield
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 1037, in _do_action
ERROR heat.engine.resource yield self.action_handler_task(action, args=handler_args)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346, in wrapper
ERROR heat.engine.resource step = next(subtask)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 986, in action_handler_task
ERROR heat.engine.resource done = check(handler_data)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 409, in check_create_complete
ERROR heat.engine.resource return self._check_status_complete(self.CREATE)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 463, in _check_status_complete
ERROR heat.engine.resource action=action)
ERROR heat.engine.resource ResourceFailure: resources[0]: Stack CREATE cancelled

Marios Andreou (marios-b) wrote :

OVB has been upgraded; hopefully this will address the bug (from a call I am in just now, thanks).

Sorin Sbarnea (ssbarnea) on 2018-11-29
summary: - [master/Rocky]Fs035 job fails in promotion becasue of heat stack timeout
+ [master/Rocky]Fs035 job fails in promotion because of heat stack timeout
Kieran Forde (kieran-forde) wrote :

I've seen that the OVB jobs are passing again, but this is unrelated to the upgrade of rdo-cloud (sort of).

The OVS upgrade should tackle the instability we've seen when rdo-cloud is under load.

However, the fix I found for the OVB issue was in heat.conf (the auth_uri setting).

When looking at the failures, I noticed that http://172.16.0.9:5000 was unreachable.

This is the internal Keystone URL; however, my understanding is that both Heat itself and any instances that Heat creates need to reach this URL.
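One way to repeat the check described above is a plain TCP probe of each Keystone endpoint from the node running heat-engine (or from an instance Heat created). The sketch below is a minimal Python illustration, assuming that a successful TCP connect is a good-enough proxy for "reachable"; it does not verify that Keystone actually answers HTTP requests, and the helper name `endpoint_reachable` is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def endpoint_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host:port succeeds.

    Hypothetical helper: this only tests TCP reachability, not Keystone health.
    """
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling `endpoint_reachable("http://172.16.0.9:5000")` from an affected node would be expected to return False if the internal endpoint is unreachable as described, while the public URL should return True.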

So I changed it to the public URL, https://phx2.cloud.rdoproject.org:13000, and restarted heat-engine on the controllers.
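A minimal sketch of the same change in Python, assuming the value lives under a `[clients_keystone]` section of heat.conf. The comment above does not say which section was edited, so treat the section name as an assumption. The helper `set_auth_uri` is hypothetical, and `configparser` drops comments when it rewrites a file, so this illustrates the edit rather than serving as a safe in-place editor. As described above, heat-engine then has to be restarted to pick up the new value.

```python
import configparser
import io

# Assumption: illustrative section name; the bug report does not name it.
SECTION = "clients_keystone"
PUBLIC_KEYSTONE = "https://phx2.cloud.rdoproject.org:13000"


def set_auth_uri(conf_text: str, new_uri: str, section: str = SECTION) -> str:
    """Return conf_text with auth_uri in `section` set to new_uri."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "auth_uri", new_uri)
    out = io.StringIO()
    parser.write(out)  # note: comments from conf_text are not preserved
    return out.getvalue()
```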

Since then things have been passing.

I will look at where this was set and how to fix the value.

Sorin Sbarnea (ssbarnea) wrote :

Let's create a logstash query and an elastic-recheck entry for this to measure recurrence.
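Until such a query exists, recurrence could be spot-checked locally by counting the failure signature from the traceback in the description (`ResourceFailure: ... Stack CREATE cancelled`) in downloaded heat-engine logs such as `heat-engine.log.1.gz`. This Python sketch is a hypothetical stand-in, not an elastic-recheck query, and the function names are illustrative.

```python
import gzip
import re
from pathlib import Path
from typing import Iterable

# Failure signature taken from the traceback in the bug description.
SIGNATURE = re.compile(r"ResourceFailure: .*Stack CREATE cancelled")


def count_signature(lines: Iterable[str], pattern: re.Pattern = SIGNATURE) -> int:
    """Count lines that contain the failure signature."""
    return sum(1 for line in lines if pattern.search(line))


def count_in_log(path: Path) -> int:
    """Count signature hits in a plain or gzip-compressed log file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
        return count_signature(fh)
```

Running `count_in_log` over the heat-engine logs of successive periodic runs would give a rough per-run count; a shared logstash query would still be needed to measure recurrence across all CI jobs.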
