Fuel for OpenStack

bvt deployment waits too long to fail

Bug #1558505 reported by Aleksandra Fedorova on 2016-03-17

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Committed	Medium	Kyrylo Galanov	Fuel for OpenStack 10.0
Mitaka	Won't Fix	Medium	Fuel Library (Deprecated)	Fuel for OpenStack 9.0
Newton	Fix Committed	Medium	Kyrylo Galanov	Fuel for OpenStack 10.0

Bug Description

http://jenkins-product.srt.mirantis.net:8080/job/9.0.custom.ubuntu.bvt_2/416/

In this deployment (see the attached diagnostic snapshot), astute.log shows three broken sets of tasks (HA_proxy, mysql, and something related to keystone_admin) that hung in the 'running' state for more than half an hour without any changes.

We should catch such situations early and fail the build.

Revision history for this message

Aleksandra Fedorova (bookwar) wrote on 2016-03-17:

fail_error_ceph_rados_gw-fuel-snapshot-2016-03-17_00-51-06.tar.xz (34.6 MiB, application/octet-stream)

Vladimir Kuklin (vkuklin) on 2016-03-17

Changed in fuel:
status:	New → Confirmed
assignee:	nobody → Fuel Library Team (fuel-library)
tags:	added: area-python

Kyrylo Galanov (kgalanov) on 2016-03-20

Changed in fuel:
assignee:	Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)

Kyrylo Galanov (kgalanov) on 2016-03-20

Changed in fuel:
status:	Confirmed → In Progress

Dmitry Pyzhov (dpyzhov) on 2016-03-21

tags:	added: area-library
	removed: area-python

Kyrylo Galanov (kgalanov) wrote on 2016-03-24:

Hi Aleksandra,

There are 3 levels of timeouts:
1. puppet resource provider timeout
2. astute task timeout
3. automated test timeout

I cannot shrink timeouts based on intuition alone, because that might break large-scale deployments.
Which timeout specifically do you propose to decrease?

Best regards,
Kyrylo

Aleksandra Fedorova (bookwar) wrote on 2016-03-25:

Do we need to rely on timeouts here?

Can't we recognize that deployment got into the loop without any changes in task statuses and fail based on that?
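
A minimal sketch of that idea, under stated assumptions: `get_task_statuses` is a hypothetical callable returning a mapping of task name to status, and the status names in `FINAL_STATUSES` are placeholders, not Astute's actual API. The watcher fails as soon as no status has changed for `stall_limit` seconds, instead of waiting for the full task timeout.

```python
import time
from typing import Callable, Dict

# Hypothetical terminal statuses; the real names used by Astute may differ.
FINAL_STATUSES = {"ready", "error"}


class DeploymentStalled(Exception):
    """Raised when no task status has changed for too long."""


def wait_for_deployment(
    get_task_statuses: Callable[[], Dict[str, str]],
    stall_limit: float = 600.0,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll task statuses; fail early if they stop changing."""
    last_seen = get_task_statuses()
    last_change = clock()
    while True:
        # Finished: every task has reached a terminal status.
        if all(status in FINAL_STATUSES for status in last_seen.values()):
            return last_seen
        sleep(poll_interval)
        current = get_task_statuses()
        now = clock()
        if current != last_seen:
            # Progress was made, so restart the stall window.
            last_seen = current
            last_change = now
        elif now - last_change >= stall_limit:
            raise DeploymentStalled(
                "no task status change for %.0f seconds" % (now - last_change)
            )
```

A single task that legitimately runs longer than `stall_limit` would also be reported as stalled, so the limit still has to be chosen with the slowest expected task in mind.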

Kyrylo Galanov (kgalanov) wrote on 2016-03-28:

Aleksandra,

If you can provide criteria for unambiguously determining that the deployment has failed before the timeout is reached, I think it is possible to implement that in code.
My concern is that decreasing timeouts can _possibly_ impact deployments that need more time to complete.

Kyrylo Galanov (kgalanov) on 2016-03-31

Changed in fuel:
status:	In Progress → Confirmed

Bug Checker Bot (bug-checker) wrote on 2016-03-31: Autochecker

(This check was performed automatically)
Please make sure that the bug description contains the following sections, filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags:	added: need-info

Kyrylo Galanov (kgalanov) on 2016-04-07

Changed in fuel:
assignee:	Kyrylo Galanov (kgalanov) → Fuel Library Team (fuel-library)

OpenStack Infra (hudson-openstack) wrote on 2016-05-24: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/320281

OpenStack Infra (hudson-openstack) wrote on 2016-05-27: Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/320281
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=bf14da9bf265638ed661abc3557b0c41f9f42bc5
Submitter: Jenkins
Branch: master

commit bf14da9bf265638ed661abc3557b0c41f9f42bc5
Author: Kyrylo Galanov <email address hidden>
Date: Tue May 24 09:57:01 2016 +0200

Decrease task timeouts according to actual duration

    Most tasks had a timeout of 3600 seconds, whereas their actual duration
    is no more than 1 minute.
    The minimum timeout is 60 seconds, even if a task completes in a few seconds.
    New timeout ~ duration * 2.5

Change-Id: Iea9ee8f5038f5fcfd9dcdfc2d9ba964eab035549
Closes-bug: #1558505
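
The rule in the commit message can be sketched as follows. This is an illustration, not the code from the fuel-library change: the function name `suggested_timeout` is hypothetical, and rounding up to whole seconds is an assumption, since the commit only says the new timeout is approximately `duration * 2.5` with a 60-second minimum.

```python
import math

MIN_TIMEOUT_S = 60      # minimum timeout stated in the commit message
DURATION_FACTOR = 2.5   # multiplier stated in the commit message


def suggested_timeout(observed_duration_s: float) -> int:
    """Return a task timeout in seconds derived from an observed duration."""
    if observed_duration_s < 0:
        raise ValueError("duration must be non-negative")
    # Rounding up is an assumption; the commit only says "~ duration * 2.5".
    scaled = math.ceil(observed_duration_s * DURATION_FACTOR)
    return max(MIN_TIMEOUT_S, scaled)
```

For example, a task observed to take 5 seconds gets the 60-second minimum, while a task observed to take 100 seconds gets 250 seconds, both far below the previous 3600 seconds.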