Timeout settings are misused for config download and some workflows

Bug #1868063 reported by Bogdan Dobrelya
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Bogdan Dobrelya

Bug Description

This continues https://bugs.launchpad.net/tripleo/+bug/1867968

The --config-download-timeout CLI option and the ansible_timeout parameter of the run_ansible_playbook utility set the Ansible SSH connection timeout, although the values given to them are meant to be the global deployment timeout. The timeout in the scale-down workflow is misused the same way: it sets the Ansible SSH timeout instead of a timeout for the deployment action.

The update/upgrade actions (including external ones) should also provide an Ansible connection (SSH) timeout. That parameter matters not only for day 1 but also for day 2 actions. Let's also check that we pass the connection timeout correctly in the workflows for all day 1/day 2 actions. (Extracted to https://bugs.launchpad.net/tripleo/+bug/1868075)

Additionally, Ansible execution should also be limited via its job_timeout setting (https://bugzilla.redhat.com/show_bug.cgi?id=1801502).
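To make the distinction concrete, here is a minimal Python sketch of how the two values could be kept apart when invoking ansible-runner. It assumes the deployment timeout is given in minutes, that the whole-run limit goes into ansible-runner's job_timeout setting (in seconds), and that the SSH connection timeout goes into Ansible's ANSIBLE_TIMEOUT environment variable. The helper name build_ansible_timeouts and its 30-second default are hypothetical, not the tripleoclient code.

```python
def build_ansible_timeouts(deployment_timeout_min, ssh_timeout_s=30):
    """Return (runner_settings, extra_env) for an ansible-runner call.

    Hypothetical helper (not the tripleoclient API).
    deployment_timeout_min: wall-clock limit for the whole playbook run,
        in minutes (assumed unit).
    ssh_timeout_s: per-connection SSH timeout in seconds (illustrative default).
    """
    if deployment_timeout_min <= 0 or ssh_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    # job_timeout bounds the entire ansible-runner job, in seconds.
    runner_settings = {"job_timeout": int(deployment_timeout_min * 60)}
    # ANSIBLE_TIMEOUT only bounds how long Ansible waits for a connection.
    extra_env = {"ANSIBLE_TIMEOUT": str(int(ssh_timeout_s))}
    return runner_settings, extra_env


if __name__ == "__main__":
    settings, env = build_ansible_timeouts(240)
    print(settings, env)
```

The point of the split is that a large value such as 240 minutes ends up limiting the run itself, instead of silently becoming a very long SSH wait while nothing bounds the deployment.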

See also related bug https://bugs.launchpad.net/tripleo/+bug/1867968

The impact is:
While the Heat stacks part and most of the workflows set their timeouts properly, the config-download part and a few bare-metal workflows do it wrongly: they configure the Ansible SSH timeout instead of the real deployment timeout.

Changed in tripleo:
importance: Undecided → High
milestone: none → ussuri-3
description: updated
description: updated
description: updated
tags: added: train-backport-potential
Changed in tripleo:
status: New → In Progress
assignee: nobody → Bogdan Dobrelya (bogdando)
description: updated
description: updated
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/713807

Changed in tripleo:
importance: High → Medium
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/716980

OpenStack Infra (hudson-openstack) wrote : Change abandoned on python-tripleoclient (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/716876
Reason: squashed it

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/716876
Reason: oops, that's a wrong one, sorry

Bogdan Dobrelya (bogdando) wrote : Re: Timeout settings are misused for config download

Raised it to High: historically there has been a big mess where the "deployment timeout" in fact configures the Ansible connection timeout, so nothing limits the real deployment duration.

Changed in tripleo:
importance: Medium → High
tags: added: stein-backport-potential
summary: - Timeout settings are misused for config download
+ Timeout settings are misused for config download and some workflows
tags: added: queens-backport-potential
description: updated
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/717190

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/717192

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/716980
Reason: ok let's see if https://review.opendev.org/#/c/713807/ can pass without it

Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Kevin Carter (kevin-carter)
Changed in tripleo:
assignee: Kevin Carter (kevin-carter) → Bogdan Dobrelya (bogdando)
Changed in tripleo:
assignee: Bogdan Dobrelya (bogdando) → Kevin Carter (kevin-carter)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to python-tripleoclient (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/717676

Changed in tripleo:
assignee: Kevin Carter (kevin-carter) → Bogdan Dobrelya (bogdando)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/717190
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=3d3afa62dc392236dd3191956ed2bf2f05f3b0e1
Submitter: Zuul
Branch: stable/train

commit 3d3afa62dc392236dd3191956ed2bf2f05f3b0e1
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Apr 2 14:45:15 2020 +0200

    [Train only] Add ssh timeout for baremetal_deploy

    Mistakenly and historically,
    tripleo.baremetal_deploy.v1.deploy_instances used to pass its
    timeout input into the ansible connection timeout instead of the
    global/deployment timeout.

    To preserve the established convention of setting the connection
    timeout to some value, add the missing input for the ansible
    connection timeout and give it a reasonable value.

    Passing the timeout input into the real deployment timeout
    parameter is fixed in the scope of the related bug.

    Related-Bug: #1868063

    Change-Id: I7d2fce9f07c98b1770c01cacbae2d17b8e143b3b
    Signed-off-by: Bogdan Dobrelya <email address hidden>

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/717676
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=d9174e83fdf78a02692e5baa4ae735ff009b2e14
Submitter: Zuul
Branch: master

commit d9174e83fdf78a02692e5baa4ae735ff009b2e14
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Apr 6 08:47:57 2020 +0200

    Invoke ansible from writeable workdirs

    In order to make it configurable via env/settings,
    use writable tmp paths for ansible runner. This also aligns the
    call with the way we invoke it in other places.

    Change-Id: I64999f19b4ce2083f05e09c40d6b89c8d8ba2cdd
    Related-bug: #1868063
    Signed-off-by: Bogdan Dobrelya <email address hidden>
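A rough sketch of the "writeable workdir" idea: create a fresh temporary private data directory for ansible-runner and write its env/settings and env/envvars files there, so that settings such as job_timeout can be passed through files. The helper prepare_runner_workdir is hypothetical, not the code from the change above; it writes JSON, which YAML loaders also accept for simple mappings like these.

```python
import json
import os
import tempfile


def prepare_runner_workdir(settings, envvars, base_dir=None):
    """Create a writable private data dir for ansible-runner (sketch).

    Hypothetical helper. Writes env/settings and env/envvars as JSON
    objects, which also parse as YAML mappings.
    """
    # mkdtemp creates a new, uniquely named directory owned by the caller.
    workdir = tempfile.mkdtemp(prefix="ansible-runner-", dir=base_dir)
    env_dir = os.path.join(workdir, "env")
    os.makedirs(env_dir)
    # Runner-level settings, e.g. {"job_timeout": 14400}.
    with open(os.path.join(env_dir, "settings"), "w") as f:
        json.dump(settings, f)
    # Environment variables for the Ansible process, e.g. ANSIBLE_TIMEOUT.
    with open(os.path.join(env_dir, "envvars"), "w") as f:
        json.dump(envvars, f)
    return workdir
```

Using a per-run temporary directory avoids writing into a possibly read-only or shared location and keeps settings from one run from leaking into the next.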

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/713807
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=9c602da452550fec4692832612739bc8812bf400
Submitter: Zuul
Branch: master

commit 9c602da452550fec4692832612739bc8812bf400
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Mar 19 09:25:21 2020 +0100

    Fix misused deployment vs connection timeouts

    Fix misused ansible connection timeout and deployment timeout passed in
    config download and ansible runner utility.

    Allow the ansible runner utility to be given a job_timeout as well.

    Also fix the misuse of timeout parameters in related workflows. Add
    --overcloud-ssh-port-timeout and use it to configure the ansible
    connection timeout for the DeleteNode interface of the involved
    workflows. Then use the timeout parameter as the real timeout instead
    of mistakenly passing it as a connection timeout.

    Add new unit test for ansible timeout in config_download. Add missing
    coverage for the existing timeout-related params in other unit tests.

    Closes-Bug: #1868063
    Co-authored-by: Kevin Carter <email address hidden>
    Change-Id: I2a4d151bcb83074af5bcf7d1b8c68d81d3c0400d
    Signed-off-by: Bogdan Dobrelya <email address hidden>
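For the scale-down case, the intent of the split can be sketched with argparse: one option bounds the whole action, another bounds only SSH connections, and each feeds exactly one setting. Only the --overcloud-ssh-port-timeout name comes from the commit above; the program name, the defaults, the minute unit for --timeout, and the output keys are illustrative assumptions.

```python
import argparse


def build_parser():
    # Illustrative parser only; defaults and units are assumptions.
    parser = argparse.ArgumentParser(prog="node-delete-sketch")
    parser.add_argument("--timeout", type=int, default=240,
                        help="Overall action timeout in minutes (assumed unit).")
    parser.add_argument("--overcloud-ssh-port-timeout", type=int, default=300,
                        help="Seconds to wait for SSH on each node.")
    return parser


def workflow_inputs(args):
    # Hypothetical mapping: each option feeds exactly one timeout.
    return {
        "job_timeout_s": args.timeout * 60,                     # whole action
        "connection_timeout_s": args.overcloud_ssh_port_timeout,  # SSH only
    }


if __name__ == "__main__":
    print(workflow_inputs(build_parser().parse_args()))
```

Keeping two distinct options means a user who raises the overall timeout for a large scale-down does not also stretch every SSH connection attempt, and vice versa.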

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/718339

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/718347

description: updated
tags: added: config-download
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 13.2.0

This issue was fixed in the openstack/python-tripleoclient 13.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/716980
Reason: not needed anymore after I192c086d5fee31eea19bc8d8ea3a7e889ee9dcc8

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/stein)

Reviewed: https://review.opendev.org/717192
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=217deba8419739efb9f43f840f981d57920eff44
Submitter: Zuul
Branch: stable/stein

commit 217deba8419739efb9f43f840f981d57920eff44
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Apr 2 14:45:15 2020 +0200

    Add ssh timeout for baremetal_deploy

    Mistakenly and historically,
    tripleo.baremetal_deploy.v1.deploy_instances used to pass its
    timeout input into the ansible connection timeout instead of the
    global/deployment timeout.

    To preserve the established convention of setting the connection
    timeout to some value, add the missing input for the ansible
    connection timeout and give it a reasonable value.

    Passing the timeout input into the real deployment timeout
    parameter is fixed in the scope of the related bug.

    Related-Bug: #1868063

    Change-Id: I7d2fce9f07c98b1770c01cacbae2d17b8e143b3b
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 3d3afa62dc392236dd3191956ed2bf2f05f3b0e1)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/train)

Reviewed: https://review.opendev.org/718339
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=a5e3f2685cbc05d4efbac2e79b31b9a3ad87cca8
Submitter: Zuul
Branch: stable/train

commit a5e3f2685cbc05d4efbac2e79b31b9a3ad87cca8
Author: Bogdan Dobrelya <email address hidden>
Date: Wed Apr 8 09:34:03 2020 +0200

    [Train only] Fix misused timeout values in the deployment workflow

    Mistakenly and historically,
    tripleo.deployment.v1.config_download_deploy used to take a wrong value
    for its config_download_timeout input. That input defines the global
    deployment timeout, but the caller passes the ssh connection timeout
    for it.

    This is fixed in the scope of the related bug (the caller on the
    tripleoclient side).

    However, to preserve the established convention of setting the
    connection timeout to some value, add the missing ssh connection
    timeout input and make it match the timeout value passed to
    run_ansible.

    This needs to be fixed for Train and earlier releases only (the
    workflow is no longer used after Train).

    Change-Id: Ia0c92af6cdcdd66dd5eabb945b4398e9228be464
    Related-Bug: #1868063
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/718347
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=780139aabca40d2f36fad6dddedbf4a01d102895
Submitter: Zuul
Branch: stable/train

commit 780139aabca40d2f36fad6dddedbf4a01d102895
Author: Bogdan Dobrelya <email address hidden>
Date: Thu Mar 19 09:25:21 2020 +0100

    Fix misused deployment vs connection timeouts

    Fix misused ansible connection timeout and deployment timeout passed in
    config download. Also fix the misuse of timeout parameters in the
    related workflow called by config_download.

    Add missing coverage for the existing timeout-related params in other
    unit tests.

    This partially backports https://review.opendev.org/713807.

    Closes-Bug: #1868063
    Depends-On: https://review.opendev.org/718339
    Change-Id: I2a4d151bcb83074af5bcf7d1b8c68d81d3c0400d
    Signed-off-by: Bogdan Dobrelya <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.4.0

This issue was fixed in the openstack/python-tripleoclient 12.4.0 release.
