bvt deployment waits too long to fail

Bug #1558505 reported by Aleksandra Fedorova
This bug affects 2 people
Affects              Status         Importance  Assigned to                Milestone
Fuel for OpenStack   Fix Committed  Medium      Kyrylo Galanov
  Mitaka             Won't Fix      Medium      Fuel Library (Deprecated)
  Newton             Fix Committed  Medium      Kyrylo Galanov

Bug Description

http://jenkins-product.srt.mirantis.net:8080/job/9.0.custom.ubuntu.bvt_2/416/

In this deployment (see the attached diagnostic snapshot), astute.log shows three broken sets of tasks (HA_proxy, mysql, and something related to keystone_admin) that hung in the 'running' state for more than half an hour without any changes.

We should catch such situations early and fail the build.

Aleksandra Fedorova (bookwar) wrote :
Changed in fuel:
status: New → Confirmed
assignee: nobody → Fuel Library Team (fuel-library)
tags: added: area-python
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Changed in fuel:
status: Confirmed → In Progress
Dmitry Pyzhov (dpyzhov)
tags: added: area-library
removed: area-python
Kyrylo Galanov (kgalanov) wrote :

Hi Aleksandra,

There are 3 levels of timeouts:
1. puppet resource provider timeout
2. astute task timeout
3. automated test timeout

I cannot shrink the timeouts based on intuition alone, because that might break large-scale deployments.
Which timeout specifically do you propose to decrease?

Best regards,
Kyrylo

Aleksandra Fedorova (bookwar) wrote :

Do we need to rely on timeouts here?

Can't we recognize that the deployment is stuck in a loop with no changes in task statuses, and fail based on that?
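
The idea in this comment can be sketched roughly as follows. This is an illustration only, not Astute's or the test framework's actual logic: `StallDetector`, `observe`, the status strings, and the 1800-second threshold are all hypothetical names and values chosen for the example. It assumes something polls task statuses periodically and passes them in as a dictionary. Note that the threshold is still a timeout of sorts, so on its own it cannot tell a hung task from a legitimately slow one.

```python
from typing import Dict, Optional


class StallDetector:
    """Flags a deployment whose task statuses have not changed for too long.

    Hypothetical helper for illustration; not part of Fuel or Astute.
    """

    def __init__(self, stall_after_seconds: float) -> None:
        self.stall_after_seconds = stall_after_seconds
        self._last_snapshot: Optional[Dict[str, str]] = None
        self._last_change_at: Optional[float] = None

    def observe(self, statuses: Dict[str, str], now: float) -> bool:
        """Record a status snapshot; return True if the deployment looks stalled."""
        if statuses != self._last_snapshot:
            # Something changed: remember the new state and restart the clock.
            self._last_snapshot = dict(statuses)
            self._last_change_at = now
            return False
        # Nothing changed. Only call it a stall if work is still pending.
        still_running = any(s == "running" for s in statuses.values())
        idle_for = now - self._last_change_at
        return still_running and idle_for >= self.stall_after_seconds


# Usage: three tasks stuck in 'running', as in the report above.
detector = StallDetector(stall_after_seconds=1800)
stuck = {"haproxy": "running", "mysql": "running", "keystone": "running"}
first_poll = detector.observe(stuck, now=0)      # first snapshot, clock starts
later_poll = detector.observe(stuck, now=1800)   # unchanged for half an hour
```

A caller could fail the build as soon as `observe` returns True, instead of waiting for each per-task timeout to expire.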

Kyrylo Galanov (kgalanov) wrote :

Aleksandra,

If you can provide criteria by which we can unambiguously determine that the deployment has failed before the timeout is reached, I think it is possible to implement that in code.
My concern is that decreasing timeouts can _possibly_ affect deployments that need more time to complete.

Changed in fuel:
status: In Progress → Confirmed
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check was performed automatically.)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in fuel:
assignee: Kyrylo Galanov (kgalanov) → Fuel Library Team (fuel-library)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/320281

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/320281
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=bf14da9bf265638ed661abc3557b0c41f9f42bc5
Submitter: Jenkins
Branch: master

commit bf14da9bf265638ed661abc3557b0c41f9f42bc5
Author: Kyrylo Galanov <email address hidden>
Date: Tue May 24 09:57:01 2016 +0200

    Decrease task timeouts according to actual duration

    Most of tasks had timeout equal to 3600 seconds whereas actual duration
    is no more than 1 minute.
    Minimal timeout is 60 seconds even if task is complete in a few seconds.
    New timeout ~ duration * 2.5

    Change-Id: Iea9ee8f5038f5fcfd9dcdfc2d9ba964eab035549
    Closes-bug: #1558505
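
The rule stated in the commit message (new timeout of about 2.5 times the observed duration, with a floor of 60 seconds) can be written out as a small calculation. This is a sketch of the arithmetic only, not the code from the fuel-library change: the function name `suggested_timeout`, the constant names, and rounding up to a whole second are assumptions made for the example.

```python
import math

# Values taken from the commit message above.
MIN_TIMEOUT_SECONDS = 60
SAFETY_FACTOR = 2.5


def suggested_timeout(observed_duration_seconds: float) -> int:
    """Return duration * 2.5, rounded up (an assumption), but never below 60 s."""
    scaled = math.ceil(observed_duration_seconds * SAFETY_FACTOR)
    return max(MIN_TIMEOUT_SECONDS, scaled)


# A task that finishes in a few seconds still gets the 60-second floor,
# while a one-minute task gets 150 seconds instead of the old 3600.
quick_task = suggested_timeout(5)
minute_task = suggested_timeout(60)
```

Under this rule a task that hangs is reported after a few minutes at most, rather than after the previous one-hour default.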

Changed in fuel:
status: In Progress → Fix Committed