[master/Rocky]Fs035 job fails in promotion because of heat stack timeout

Bug #1801587 reported by Sagi (Sergey) Shnaidman
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The fs035 OVB job is failing in the master promotion jobs because of a Heat stack timeout (3 times in a row).

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/7631ec2/

https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035-master/7631ec2/logs/undercloud/var/log/containers/heat/heat-engine.log.1.gz

ERROR heat.engine.resource Traceback (most recent call last):
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 924, in _action_recorder
ERROR heat.engine.resource yield
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 1037, in _do_action
ERROR heat.engine.resource yield self.action_handler_task(action, args=handler_args)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 346, in wrapper
ERROR heat.engine.resource step = next(subtask)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 986, in action_handler_task
ERROR heat.engine.resource done = check(handler_data)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 409, in check_create_complete
ERROR heat.engine.resource return self._check_status_complete(self.CREATE)
ERROR heat.engine.resource File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 463, in _check_status_complete
ERROR heat.engine.resource action=action)
ERROR heat.engine.resource ResourceFailure: resources[0]: Stack CREATE cancelled

Rabi Mishra (rabi) wrote :
chandan kumar (chkumar246) wrote :
summary: - Fs035 job fails in promotion becasue of heat stack timeout
+ [master/Rocky]Fs035 job fails in promotion becasue of heat stack timeout
chandan kumar (chkumar246) wrote : Re: [master/Rocky]Fs035 job fails in promotion becasue of heat stack timeout
chandan kumar (chkumar246) wrote :
chandan kumar (chkumar246) wrote :
chandan kumar (chkumar246) wrote :
Marios Andreou (marios-b) wrote :

OVB has been upgraded, which will hopefully address the bug (from a call I am in just now, thanks).

Sorin Sbarnea (ssbarnea)
summary: - [master/Rocky]Fs035 job fails in promotion becasue of heat stack timeout
+ [master/Rocky]Fs035 job fails in promotion because of heat stack timeout
Kieran Forde (kieran-forde) wrote :

I've seen that the OVB jobs are passing again, but this is unrelated to the upgrade of rdo-cloud (sort of).

The OVS upgrade should tackle the instability we've seen when rdo-cloud is under load.

BUT what I found to fix the OVB issue was the auth_uri setting in heat.conf.

What I noticed when looking at the failures was an inability to reach http://172.16.0.9:5000.

This is the internal Keystone URL; however, my understanding is that Heat itself and any instances that Heat creates need to reach this URL.
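The reachability problem described above can be checked with a short script. This is a minimal sketch, assuming Python 3 and a plain TCP connect as a stand-in for "can reach the URL"; the helper names `endpoint_host_port` and `can_connect` are hypothetical and not part of Heat or Keystone.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(url):
    """Split a URL into (host, port), defaulting the port from the scheme."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def can_connect(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = endpoint_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running `can_connect("http://172.16.0.9:5000")` from a host or instance that Heat manages would show whether the internal endpoint is reachable from there. A successful TCP connect does not prove that Keystone itself is answering correctly.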

So I changed it to the public URL, https://phx2.cloud.rdoproject.org:13000, and restarted heat-engine on the controllers.

Since then things have been passing.

I will look at where this was set and how to fix the value.
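For illustration, the auth_uri change described in this comment could be scripted roughly as follows. This is a sketch under stated assumptions: the report names only the option (auth_uri) and the file (heat.conf), so the `[clients_keystone]` section, the sample file contents, and the helper `set_auth_uri` are all assumptions. Python's `configparser` also drops comments and rejects duplicate options, so a real heat.conf may need a more tolerant tool.

```python
import configparser
import io

# Hypothetical heat.conf fragment. The section name is an assumption;
# the report only names the auth_uri option and the two URLs.
SAMPLE_CONF = """\
[clients_keystone]
auth_uri = http://172.16.0.9:5000
"""


def set_auth_uri(conf_text, new_uri, section="clients_keystone"):
    """Return conf_text with auth_uri in the given section set to new_uri."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "auth_uri", new_uri)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_auth_uri(SAMPLE_CONF, "https://phx2.cloud.rdoproject.org:13000")
print(updated)
```

As the comment notes, heat-engine has to be restarted on the controllers before a change like this takes effect.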

Sorin Sbarnea (ssbarnea) wrote :

Let's create a logstash query + elastic-recheck entry for this to measure recurrence.
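Until such a query exists, a rough local stand-in is to count the failure signature in a downloaded heat-engine.log. The sketch below is not logstash or elastic-recheck syntax. The signature string comes from the traceback in the bug description, while `count_signature` and the second sample line are hypothetical.

```python
# Failure signature taken from the traceback in the bug description.
SIGNATURE = "Stack CREATE cancelled"


def count_signature(log_lines, signature=SIGNATURE):
    """Count ERROR lines that contain the failure signature."""
    return sum(1 for line in log_lines if "ERROR" in line and signature in line)


sample = [
    "ERROR heat.engine.resource ResourceFailure: resources[0]: Stack CREATE cancelled",
    # Hypothetical non-matching line, included only to exercise the filter.
    "INFO heat.engine.service stack create started",
    "ERROR heat.engine.resource action=action)",
]
print(count_signature(sample))
```

Feeding the function the lines of each job's heat-engine.log gives a per-run count that can be compared across promotion runs.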

wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Triaged → Fix Released