test_api_reload_on_sighup failure

Bug #1607177 reported by Rabi Mishra
This bug affects 1 person
Affects: OpenStack Heat
Status: Fix Released
Importance: Undecided
Assigned to: Rabi Mishra

Bug Description

We can see intermittent gate failures with this traceback.

2016-07-28 03:22:45.675327 | 2016-07-28 03:22:45.674 |
2016-07-28 03:22:45.676987 | 2016-07-28 03:22:45.676 | Captured traceback:
2016-07-28 03:22:45.678756 | 2016-07-28 03:22:45.678 | ~~~~~~~~~~~~~~~~~~~
2016-07-28 03:22:45.680328 | 2016-07-28 03:22:45.679 | Traceback (most recent call last):
2016-07-28 03:22:45.681963 | 2016-07-28 03:22:45.681 | File "/opt/stack/new/heat/heat_integrationtests/functional/test_reload_on_sighup.py", line 120, in test_api_reload_on_sighup
2016-07-28 03:22:45.683813 | 2016-07-28 03:22:45.683 | self._reload('heat_api')
2016-07-28 03:22:45.685376 | 2016-07-28 03:22:45.685 | File "/opt/stack/new/heat/heat_integrationtests/functional/test_reload_on_sighup.py", line 115, in _reload
2016-07-28 03:22:45.688105 | 2016-07-28 03:22:45.687 | self._change_config(service, old_workers, new_workers)
2016-07-28 03:22:45.689700 | 2016-07-28 03:22:45.689 | File "/opt/stack/new/heat/heat_integrationtests/functional/test_reload_on_sighup.py", line 104, in _change_config
2016-07-28 03:22:45.691274 | 2016-07-28 03:22:45.690 | self.assertEqual(new_workers, len(post_reload_children))
2016-07-28 03:22:45.693934 | 2016-07-28 03:22:45.693 | File "/opt/stack/new/heat/.tox/integration/local/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
2016-07-28 03:22:45.695600 | 2016-07-28 03:22:45.695 | self.assertThat(observed, matcher, message)
2016-07-28 03:22:45.697112 | 2016-07-28 03:22:45.696 | File "/opt/stack/new/heat/.tox/integration/local/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
2016-07-28 03:22:45.698861 | 2016-07-28 03:22:45.698 | raise mismatch_error
2016-07-28 03:22:45.701050 | 2016-07-28 03:22:45.700 | testtools.matchers._impl.MismatchError: 3 != 4

Rabi Mishra (rabi) wrote :

It seems the stale children are only removed some time after the reload, up to a minute or so.

SIGHUP received at 2016-07-28 03:48:33.595 [1]:

2016-07-28 03:48:33.595 6673 ERROR heat.common.wsgi [-] SIGHUP received
2016-07-28 03:48:33.599 6673 INFO heat.common.wsgi [-] Starting 3 workers
2016-07-28 03:48:33.602 6673 INFO heat.common.wsgi [-] Started child 24039
2016-07-28 03:48:33.604 24039 INFO eventlet.wsgi.server [-] (24039) wsgi starting up on http://0.0.0.0:8004
2016-07-28 03:48:33.607 6673 INFO heat.common.wsgi [-] Started child 24040
2016-07-28 03:48:33.609 24040 INFO eventlet.wsgi.server [-] (24040) wsgi starting up on http://0.0.0.0:8004
2016-07-28 03:48:33.611 6673 INFO heat.common.wsgi [-] Started child 24041
2016-07-28 03:48:33.614 24041 INFO eventlet.wsgi.server [-] (24041) wsgi starting up on http://0.0.0.0:8004

Stale children removed by 2016-07-28 03:49:07.325 [1]:

2016-07-28 03:49:07.304 6673 INFO heat.common.wsgi [-] Removed stale child 7100
2016-07-28 03:49:07.325 6673 INFO heat.common.wsgi [-] Removed stale child 7099

That is almost 35 seconds after the SIGHUP. I will post a patch to increase self.conf.sighup_timeout.

[1] http://logs.openstack.org/11/347111/3/check/gate-heat-dsvm-functional-orig-mysql-lbaasv2/81f7fb6/logs/screen-h-api.txt.gz#_2016-07-28_03_49_07_304
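For reference, the delay can be worked out directly from the two timestamps quoted above. This is only a small sketch of the arithmetic; the helper name elapsed_seconds is made up for illustration and is not part of heat.

```python
from datetime import datetime

# Timestamp layout used by the screen-h-api log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps.

    Hypothetical helper for illustration only.
    """
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Values copied from the log excerpts in this comment.
sighup_received = "2016-07-28 03:48:33.595"
last_stale_child_removed = "2016-07-28 03:49:07.325"

delay = elapsed_seconds(sighup_received, last_stale_child_removed)
print(round(delay, 2))
```

So in this run the last stale child went away a bit under 34 seconds after the SIGHUP was received.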

Changed in heat:
assignee: nobody → Rabi Mishra (rabi)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/348147

Changed in heat:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/348147
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=abde6bfa2d518c12ba40e1e14edbcac94deea745
Submitter: Jenkins
Branch: master

commit abde6bfa2d518c12ba40e1e14edbcac94deea745
Author: rabi <email address hidden>
Date: Thu Jul 28 12:15:32 2016 +0530

    Increase default sighup_timeout

    It seems at times it takes more time than the current default
    for the child processes to exit normally. Increasing it to
    120 seconds.

    Change-Id: Ic54d2d7edc97cbe07ed4a4445cb865bd1b157f9d
    Closes-Bug: #1607177

Changed in heat:
status: In Progress → Fix Released
Rabi Mishra (rabi) wrote :

We increased the default sighup_timeout to 120 secs, but it seems that in some cases it takes longer than that (~180 secs) [1].

[1] http://logs.openstack.org/90/347590/3/gate/gate-heat-dsvm-functional-orig-mysql-lbaasv2/d1de1b5/logs/screen-h-api.txt.gz
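Since the exit time of the old children varies this much, one option on the test side is to poll the worker count until a deadline instead of relying on a single fixed wait. The following is only a rough sketch under that assumption, not what test_reload_on_sighup.py actually does; wait_for_worker_count and count_children are hypothetical names, and the caller would have to supply its own way of counting the service's child processes.

```python
import time


def wait_for_worker_count(count_children, expected, timeout=120.0,
                          interval=1.0, clock=time.monotonic,
                          sleep=time.sleep):
    """Poll until count_children() equals expected or timeout expires.

    Hypothetical helper for illustration: count_children is any callable
    returning the current number of child processes. Returns True if the
    expected count was observed before the deadline, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if count_children() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated example: one stale child lingers for the first two polls.
observed = iter([4, 4, 3])
settled = wait_for_worker_count(lambda: next(observed), expected=3,
                                timeout=5.0, interval=0.0)
print(settled)
```

With something like this, a generous timeout only costs time on runs where the children are actually slow to exit; the common case returns as soon as the count matches.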

Rabi Mishra (rabi)
Changed in heat:
milestone: none → newton-3
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 7.0.0.0b3

This issue was fixed in the openstack/heat 7.0.0.0b3 development milestone.
