Timeout can happen for slow prestaging subclouds

Bug #1978704 reported by Kyle MacLeod
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Kyle MacLeod
Milestone: (none listed)

Bug Description

Brief Description
-----------------
During large-system testing we found that outlier subclouds can take longer to prestage than the current timeout allows.

Severity
--------
Major

Steps to Reproduce
------------------
Prestage a large number of subclouds with varying capabilities.

Expected Behavior
------------------
All subclouds are prestaged within the timeout period (currently 1hr)

Actual Behavior
----------------
Slow subclouds exceed the timeout.

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Large distributed cloud lab system.

Branch/Pull Time/Commit
-----------------------
StarlingX master

Last Pass
---------
Not applicable: the issue was found in newer testing with network constraints.

Timestamp/Logs
--------------

Test Activity
-------------
Feature Testing

Workaround
----------
In /etc/dcmanager/dcmanager.conf, change

playbook_timeout=3600
to
playbook_timeout=7200
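
For anyone applying this workaround on several systems, the edit can be scripted. The following is only a minimal sketch: the function names are hypothetical, it assumes the option appears as an uncommented "playbook_timeout=<integer>" line, and it makes no assumption about which section of the file the option lives in. This report does not say whether any dcmanager service must be restarted afterwards, so that step is left out.

```python
import re

# Matches an uncommented "playbook_timeout = <integer>" line. Group 1 keeps
# the key and its spacing so that only the numeric value is rewritten.
_TIMEOUT_LINE = re.compile(
    r"^([ \t]*playbook_timeout[ \t]*=[ \t]*)\d+[ \t]*$", re.MULTILINE
)


def set_playbook_timeout(conf_text: str, seconds: int) -> str:
    """Return conf_text with every playbook_timeout value set to `seconds`.

    Raises ValueError when no matching line exists, so a missing option is
    reported instead of being silently ignored.
    """
    new_text, count = _TIMEOUT_LINE.subn(
        lambda m: f"{m.group(1)}{seconds}", conf_text
    )
    if count == 0:
        raise ValueError("playbook_timeout not found")
    return new_text


def update_conf_file(
    path: str = "/etc/dcmanager/dcmanager.conf", seconds: int = 7200
) -> None:
    """Rewrite the file at `path` in place (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    updated = set_playbook_timeout(original, seconds)
    with open(path, "w", encoding="utf-8") as f:
        f.write(updated)


if __name__ == "__main__":
    # Dry run on an in-memory string rather than the real file.
    print(set_playbook_timeout("playbook_timeout=3600\n", 7200), end="")
```

Keeping the text transformation separate from the file write makes it possible to check the result on a copy of the file before touching /etc/dcmanager/dcmanager.conf.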

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/845783

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/845786

OpenStack Infra (hudson-openstack) wrote : Change abandoned on stx-puppet (master)

Change abandoned by "Kyle MacLeod <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/845783
Reason: Leaving timeout as-is, only the prestage timeout will be updated

Kyle MacLeod (kmacleod)
Changed in starlingx:
assignee: nobody → Kyle MacLeod (kmacleod)
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/845786
Committed: https://opendev.org/starlingx/distcloud/commit/9b321553cc101bd9cf6f38defe3fa01ad67a8d85
Submitter: "Zuul (22348)"
Branch: master

commit 9b321553cc101bd9cf6f38defe3fa01ad67a8d85
Author: Kyle MacLeod <email address hidden>
Date: Tue Jun 14 12:10:51 2022 -0400

    Increase playbook_timeout from 1h to 2h

    Test Plan:
    Verify timeout has increased from 3600 to 7200
    via logs. The timeout mechanism is unchanged.

    Closes-Bug: 1978704

    Signed-off-by: Kyle MacLeod <email address hidden>
    Change-Id: I8890b3765ef541453af3cc6415d634e901c30e15

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.distcloud