tempest.api.orchestration.stacks.test_neutron_resources.NeutronResourcesTestJSON failed to reach CREATE_COMPLETE status within the required time

Bug #1288970 reported by Steve Baker
This bug affects 2 people
Affects         Status        Importance  Assigned to  Milestone
OpenStack Heat  Fix Released  High        Steve Baker
tempest         Fix Released  Undecided   Steve Baker

Bug Description

This is an intermittent error that occurs in ~20% of heat-slow runs. It is likely caused by failures of the orchestrated resources rather than by heat itself.

2014-03-06 19:34:36.985 | setUpClass (tempest.api.orchestration.stacks.test_neutron_resources.NeutronResourcesTestJSON)
2014-03-06 19:34:36.985 | setUpClass (tempest.api.orchestration.stacks.test_neutron_resources.NeutronResourcesTestJSON) ... FAIL
2014-03-06 19:37:56.452 | tempest.api.orchestration.stacks.test_server_cfn_init.ServerCfnInitTestJSON.test_can_log_into_created_server[slow]
2014-03-06 19:37:56.452 | tempest.api.orchestration.stacks.test_server_cfn_init.ServerCfnInitTestJSON.test_can_log_into_created_server[slow] ... ok
2014-03-06 19:39:15.598 | tempest.api.orchestration.stacks.test_server_cfn_init.ServerCfnInitTestJSON.test_stack_wait_condition_data[slow]
2014-03-06 19:39:15.599 | tempest.api.orchestration.stacks.test_server_cfn_init.ServerCfnInitTestJSON.test_stack_wait_condition_data[slow] ... ok
2014-03-06 19:39:24.591 | tempest.scenario.orchestration.test_autoscaling.AutoScalingTest.test_scale_up_then_down[compute,orchestration,slow]
2014-03-06 19:39:24.591 | tempest.scenario.orchestration.test_autoscaling.AutoScalingTest.test_scale_up_then_down[compute,orchestration,slow] ... skipped u'Skipped until Bug: 1257575 is resolved.'
2014-03-06 19:39:24.691 |
2014-03-06 19:39:24.691 | process-returncode
2014-03-06 19:39:24.692 | process-returncode ... FAIL
2014-03-06 19:39:24.734 |
2014-03-06 19:39:24.734 | ======================================================================
2014-03-06 19:39:24.734 | FAIL: setUpClass (tempest.api.orchestration.stacks.test_neutron_resources.NeutronResourcesTestJSON)
2014-03-06 19:39:24.735 | setUpClass (tempest.api.orchestration.stacks.test_neutron_resources.NeutronResourcesTestJSON)
2014-03-06 19:39:24.735 | ----------------------------------------------------------------------
2014-03-06 19:39:24.735 | _StringException: Traceback (most recent call last):
2014-03-06 19:39:24.735 | File "tempest/api/orchestration/stacks/test_neutron_resources.py", line 144, in setUpClass
2014-03-06 19:39:24.735 | raise e
2014-03-06 19:39:24.735 | TimeoutException: Request timed out
2014-03-06 19:39:24.735 | Details: Stack heat-1118778629 failed to reach CREATE_COMPLETE status within the required time (300 s).

Tags: gate-failure
Steve Baker (steve-stevebaker) wrote :

Here is an example of a typical failure:
http://logs.openstack.org/86/78286/2/check/check-tempest-dsvm-neutron-heat-slow/ee145bf/

Because there was a failure, the boot log of the orchestrated server is included in the output (search for tempest/api/orchestration/stacks/test_neutron_resources.py:143):
http://logs.openstack.org/86/78286/2/check/check-tempest-dsvm-neutron-heat-slow/ee145bf/logs/tempest.txt.gz

The timeout for this entire test is 300s, yet here the time for booting alone is 244s.

I believe this test would become significantly more reliable if the timeout were raised, perhaps to 600s.
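To illustrate why a 300s budget leaves so little headroom when booting alone takes about 244s, here is a minimal sketch of a status-polling loop with a build timeout. It is not tempest's actual implementation: `wait_for_stack_status`, `StackTimeout`, `FakeClock`, and the parameter names are hypothetical and chosen only for this example. The clock and sleep functions are injectable so the behaviour can be checked without really waiting.

```python
import time


class StackTimeout(Exception):
    """Raised when the stack does not reach the wanted status in time."""


def wait_for_stack_status(get_status, wanted="CREATE_COMPLETE",
                          build_timeout=300, build_interval=1,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns `wanted` or the budget runs out.

    Returns the elapsed time in seconds on success. All names here are
    illustrative assumptions, not tempest's real helper.
    """
    start = clock()
    while True:
        status = get_status()
        if status == wanted:
            return clock() - start
        if status.endswith("_FAILED"):
            # A failed stack will never complete, so stop polling early.
            raise RuntimeError("stack entered %s" % status)
        if clock() - start >= build_timeout:
            raise StackTimeout(
                "failed to reach %s within the required time (%d s)"
                % (wanted, build_timeout))
        sleep(build_interval)


class FakeClock:
    """Simulated clock: sleeping advances time instantly (demo helper)."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds
```

With a simulated stack that needs, say, 320s in total (an illustrative figure, not a measurement from the logs), a 300s budget raises `StackTimeout`, while a 600s budget returns normally.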

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/78756

Changed in tempest:
assignee: nobody → Steve Baker (steve-stevebaker)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/78756
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=27f02430864443c127dbf4a09a62497c60fd0aa4
Submitter: Jenkins
Branch: master

commit 27f02430864443c127dbf4a09a62497c60fd0aa4
Author: Steve Baker <email address hidden>
Date: Fri Mar 7 09:47:32 2014 +1300

    Raise orchestration build_timeout to 600 seconds

    Observed boot time for a single server has been around 250 seconds
    so a build_timeout default of 300 would explain why ~20% of heat-slow
    jobs are failing with stack timeout errors.

    Closes-Bug: #1288970
    Change-Id: I4feb1b89acf8db0e164468d0471aff71ff5c6a77

Changed in tempest:
status: In Progress → Fix Released
Attila Fazekas (afazekas) wrote :

The ERROR message linked below is visible in logs/screen-h-eng.txt.

Logstash:
message: "DB error resource with id" AND NOT build_status:"FAILURE" → 0 hits (no non-failed jobs)
message: "DB error resource with id" AND build_status:"FAILURE" → 85 hits

http://logs.openstack.org/19/82519/1/gate/gate-tempest-dsvm-neutron-heat-slow/e26bec1/logs/screen-h-eng.txt.gz?level=WARNING#_2014-04-02_17_21_10_558

Zane Bitter (zaneb)
Changed in heat:
assignee: nobody → Steve Baker (steve-stevebaker)
Angus Salkeld (asalkeld)
tags: added: gate-failure
Angus Salkeld (asalkeld)
Changed in heat:
importance: Undecided → High
status: New → Triaged
Angus Salkeld (asalkeld) wrote :

<stevebaker> asalkeld: that can be closed. that test uses cirros as of last week

Changed in heat:
status: Triaged → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
milestone: none → kilo-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: kilo-rc1 → 2015.1.0