[scale][astute] too big timeout for UploadTask during provisioning step

Bug #1629031 reported by Alexander Gordeev
This bug affects 1 person
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released   High        Vladimir Sharshov
Newton              Fix Committed  High        Vladimir Sharshov

Bug Description

Detailed bug description:
  IBP provisioning requires a file with provisioning data to be uploaded to every node before the actual provisioning script is executed.
  This upload task is executed synchronously, one node at a time.
  Until the task completes for one node, the upload for the next node does not start.
  If a node becomes unresponsive for some reason, it takes 11.5 minutes to detect the failure.
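
To make the cost of the sequential behaviour concrete, here is a minimal sketch of the arithmetic. It is only a model, not Astute code: the constant and the method name are hypothetical, and the 11.5-minute figure is the one observed in this report.

```ruby
# Hypothetical model of the sequential upload described above; not Astute code.
FAILURE_DETECTION_MINUTES = 11.5 # per unresponsive node, as observed in this report

# Extra delay in minutes when uploads run one node at a time and every
# unresponsive node has to time out before the next upload starts.
def extra_delay_minutes(unresponsive_nodes, per_node_minutes = FAILURE_DETECTION_MINUTES)
  unresponsive_nodes * per_node_minutes
end

delay = extra_delay_minutes(8)
puts format('%.1f minutes (about %.1f hours)', delay, delay / 60.0)
# prints "92.0 minutes (about 1.5 hours)"
```

Because the delays add up linearly, the total grows with the number of unresponsive nodes, which is why the problem is mostly visible at scale.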

Steps to reproduce:
 0. Emulate an mcollective outage on some slave nodes: stop the mcollective service on them.
 1. Start provisioning the nodes.

Expected results:
 The provisioning task provisions all nodes on which mcollective is still operational, with no significant delay caused by nodes that are unavailable to mcollective.
Actual result:
  The provisioning task provisions all nodes on which mcollective is still operational, but every unavailable node adds 11.5 minutes of delay to provisioning. E.g., for 8 unavailable nodes the delay is about 1.5 hours.
Reproducibility:
 Always
Workaround:
 ???
Impact:
 UX: abnormally long OS provisioning time at large scale, which could cause the provisioning task to fail by timeout.
Description of the environment:
 Fuel 9.0
Additional information:
 Perhaps a much shorter timeout should be set for non-responding mcollective agents in UploadTask. Since UploadTask for provisioning usually takes a few seconds, the timeout should be on the order of a dozen seconds, not a dozen minutes.
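
The suggestion above could be sketched roughly as follows. This is an illustration only, under the assumption that each upload can be wrapped in a bounded wait: `upload_to_all` and `UPLOAD_TIMEOUT_SECONDS` are hypothetical names, the block passed in stands in for the real MCollective call, and Ruby's standard `Timeout` module is used purely for demonstration (a real fix would more likely pass a timeout to the MCollective client itself).

```ruby
require 'timeout'

# Hypothetical sketch: bound each per-node upload so that one dead node
# cannot stall the queue. The block stands in for the real upload call.
UPLOAD_TIMEOUT_SECONDS = 12 # "a dozen seconds", as suggested above

def upload_to_all(node_ids, timeout: UPLOAD_TIMEOUT_SECONDS)
  failed = []
  node_ids.each do |node_id|
    begin
      Timeout.timeout(timeout) { yield node_id }
    rescue Timeout::Error
      failed << node_id # record the failure and continue with the next node
    end
  end
  failed
end

# Usage: node '2' simulates an unresponsive agent.
failed = upload_to_all(%w[1 2 3], timeout: 0.1) do |node_id|
  sleep 1 if node_id == '2'
end
p failed
```

With a short per-node limit, the worst-case delay becomes the number of unresponsive nodes multiplied by a few seconds rather than by 11.5 minutes.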

https://github.com/openstack/fuel-astute/blob/stable/mitaka/lib/astute/image_provision.rb#L48-L58

Alexander Gordeev (a-gordeev) wrote :
tags: added: module-astute scale
description: updated
Changed in fuel:
milestone: none → 9.2
assignee: nobody → Vladimir Sharshov (vsharshov)
importance: Undecided → High
status: New → Confirmed
tags: added: area-python
description: updated
Roman Rufanov (rrufanov) wrote :

Could you please provide an update on the proposed resolution?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/415247

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/415247
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=f475c45dfc0df2c50fc0797463c2957352c1648b
Submitter: Jenkins
Branch: master

commit f475c45dfc0df2c50fc0797463c2957352c1648b
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue Dec 27 17:50:14 2016 +0300

    Upload file task timeout support

    Astute will not retry and will not wait around 10 minutes for
    every node whose connection was lost in the case of an
    upload file task. For now it will wait only for the default upload
    timeout.

    The default timeout for upload can now be set in the config. For now
    it is 60 seconds. Also, the upload file task now supports a timeout
    parameter which will override the default.

    Change-Id: Ice8207f539566a50d4eb30c04ab563c3ee1278ec
    Closes-Bug: #1629031
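
The "config default with per-task override" behaviour described in the commit message can be illustrated with a small sketch. The names used here (`DEFAULT_CONFIG`, `:upload_timeout`, the `'timeout'` key, `effective_upload_timeout`) are assumptions made for illustration and are not taken from the fuel-astute source; only the 60-second default comes from the commit message.

```ruby
# Hypothetical sketch of "config default, per-task override".
# All names are illustrative assumptions, not Astute's actual API.
DEFAULT_CONFIG = { upload_timeout: 60 }.freeze # 60 seconds, per the commit message

# Return the task's own timeout if it sets one, otherwise the configured default.
def effective_upload_timeout(task_parameters, config = DEFAULT_CONFIG)
  task_parameters.fetch('timeout', config.fetch(:upload_timeout))
end

puts effective_upload_timeout({})                  # falls back to the default: 60
puts effective_upload_timeout({ 'timeout' => 15 }) # the task value wins: 15
```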

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/415478

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/415479

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/newton)

Reviewed: https://review.openstack.org/415478
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=47e27fb76cfdd9b9583d32427073066cab5be044
Submitter: Jenkins
Branch: stable/newton

commit 47e27fb76cfdd9b9583d32427073066cab5be044
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue Dec 27 17:50:14 2016 +0300

    Upload file task timeout support

    Change-Id: Ice8207f539566a50d4eb30c04ab563c3ee1278ec
    Closes-Bug: #1629031
    (cherry picked from commit f475c45dfc0df2c50fc0797463c2957352c1648b)

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/mitaka)

Reviewed: https://review.openstack.org/415479
Committed: https://git.openstack.org/cgit/openstack/fuel-astute/commit/?id=3107c809178ada8ef27a0089ff93190020f35ca1
Submitter: Jenkins
Branch: stable/mitaka

commit 3107c809178ada8ef27a0089ff93190020f35ca1
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Tue Dec 27 17:50:14 2016 +0300

    Upload file task timeout support

    Change-Id: Ice8207f539566a50d4eb30c04ab563c3ee1278ec
    Closes-Bug: #1629031
    (cherry picked from commit f475c45dfc0df2c50fc0797463c2957352c1648b)

tags: added: in-stable-mitaka
Andrew Kalach (akndex)
Changed in fuel:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-astute 11.0.0.0rc1

This issue was fixed in the openstack/fuel-astute 11.0.0.0rc1 release candidate.
