Patch upload fails in a large DC: the request times out after 600s

Bug #1978857 reported by Yuxing
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Yuxing

Bug Description

Brief Description
-----------------------------------------------------------
Applying patch2 to the subclouds of a large DC in parallel fails: the patch upload times out after 600 seconds

Severity
--------------------------------------------------------------
Minor

Steps to Reproduce
--------------------------------------------------------------
Run dcmanager patch orchestration

Expected Behavior
--------------------------------------------------------------
The patch should be uploaded to all subclouds

Actual Behavior
----------------------------------------------------------------
Some subclouds fail to have the patch uploaded.

Reproducibility
--------------------------------------------------------------
More likely to reproduce in a large DC

System Configuration
----------------------------------------------------------------
DC

Load info
----------------------------------------------------------------
21.12

Last Pass
-------------------------------------------------------------------
na

Timestamp/Logs
------------------------------------------------------------------
2022-06-09 19:59:54.425 712037 INFO dcmanager.orchestrator.patch_orch_thread [-] Patch WRCP_21.12_PATCH_0002 missing from subcloud3194
2022-06-09 19:59:54.425 712037 INFO dcmanager.orchestrator.patch_orch_thread [-] Uploading patches [u'WRCP_21.12_PATCH_0002'] to subcloud subcloud3194

2022-06-09 20:11:43.651 712037 WARNING dcmanager.orchestrator.patch_orch_thread [-] Failed to upload patch file /opt/dc-vault/patches/21.12/WRCP_21.12_PATCH_0002.patch to subcloud subcloudxxxx: ReadTimeout: HTTPSConnectionPool(host='fdff:719a:bf60:394::2', port=5492): Read timed out. (read timeout=600)
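
When triaging many subclouds, it can help to pull the failing subcloud, endpoint, and timeout value out of lines like the WARNING above. The following is a minimal sketch, not part of dcmanager: the pattern is written only against the single line quoted in this report, and the names `UPLOAD_FAILURE` and `parse_upload_failure` are hypothetical.

```python
import re

# Pattern written against the WARNING line quoted in this report; other
# dcmanager messages may be worded differently (assumption).
UPLOAD_FAILURE = re.compile(
    r"Failed to upload patch file (?P<patch_file>\S+) "
    r"to subcloud (?P<subcloud>\S+): ReadTimeout: "
    r"HTTPSConnectionPool\(host='(?P<host>[^']+)', port=(?P<port>\d+)\): "
    r"Read timed out\. \(read timeout=(?P<timeout>\d+)\)"
)


def parse_upload_failure(line):
    """Return the fields of a patch upload timeout line, or None if no match."""
    match = UPLOAD_FAILURE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields so they can be compared or aggregated.
    fields["port"] = int(fields["port"])
    fields["timeout"] = int(fields["timeout"])
    return fields
```

Feeding each line of the dcmanager log through `parse_upload_failure` and keeping the non-`None` results gives a list of affected subclouds together with the timeout that was in effect.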

Alarms
-----------------------------------
na

Test Activity
----------------------------------
Regression Testing

Workaround
-----------------------------------
na

Yuxing (yuxing)
Changed in starlingx:
assignee: nobody → Yuxing (yuxing)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/846053

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/846053
Committed: https://opendev.org/starlingx/distcloud/commit/4114969a0c75ecfa0f2e80e9bea5634a6afa398e
Submitter: "Zuul (22348)"
Branch: master

commit 4114969a0c75ecfa0f2e80e9bea5634a6afa398e
Author: Yuxing Jiang <email address hidden>
Date: Wed Jun 15 13:36:52 2022 -0400

    Extend timeout of patching REST API

    In an extreme case, the patching operation failed due to heavy
    network traffic. This commit extends the timeout of patching REST API
    to 900s to pass this scenario.

    Test:
    1 Deploy a DC with this change.
    2 Patch the subclouds with patch orchestration.

    Closes-Bug: 1978857

    Signed-off-by: Yuxing Jiang <email address hidden>
    Change-Id: I921dc98404805d60dfa1199cbf893cca91c3d637
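
To illustrate what a client-side read timeout means here, the sketch below connects to a local socket that accepts the TCP connection but never answers, so the client gives up once the timeout elapses. It is an illustration only: it uses the standard-library `http.client` to stay dependency-free, whereas the `ReadTimeout` / `HTTPSConnectionPool` message in the log suggests dcmanager goes through `requests`. The constant name `PATCH_REST_TIMEOUT_SECONDS` and the helpers `probe` and `demo` are hypothetical; the commit only states the new value of 900s.

```python
import http.client
import socket

# Hypothetical constant name; the commit message only gives the value (900 s).
PATCH_REST_TIMEOUT_SECONDS = 900


def probe(host, port, timeout=PATCH_REST_TIMEOUT_SECONDS):
    """Send a GET and report whether a response arrived within `timeout` seconds."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        conn.getresponse()
        return "ok"
    except socket.timeout:
        # The connection was established, but no response bytes arrived in time.
        return "read timed out (timeout=%s)" % timeout
    finally:
        conn.close()


def demo(timeout=0.2):
    """Run probe() against a local socket that accepts but never replies."""
    silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    silent.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    silent.listen(1)
    try:
        return probe("127.0.0.1", silent.getsockname()[1], timeout)
    finally:
        silent.close()
```

Calling `demo()` uses a short timeout so the behavior is visible quickly. A longer timeout only helps when the server eventually responds, which is the case the commit describes (slow responses under heavy network traffic).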

Changed in starlingx:
status: In Progress → Fix Released
Al Bailey (albailey1974) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to distcloud (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/distcloud/+/847853

OpenStack Infra (hudson-openstack) wrote : Fix merged to distcloud (master)

Reviewed: https://review.opendev.org/c/starlingx/distcloud/+/847853
Committed: https://opendev.org/starlingx/distcloud/commit/cca9a52b2c96044c86941ca8a8e533af01f97850
Submitter: "Zuul (22348)"
Branch: master

commit cca9a52b2c96044c86941ca8a8e533af01f97850
Author: Yuxing Jiang <email address hidden>
Date: Mon Jun 27 15:48:59 2022 -0400

    Extend timeout of patching REST API

    In an extreme case, the patching operation failed due to heavy
    network traffic. This commit extends the timeout of patching REST API
    to 900s to pass this scenario.

    Test:
    1 Deploy a DC with this change.
    2 Patch the subclouds with patch orchestration.

    Note:
    This commit is a re-do of:
    https://review.opendev.org/c/starlingx/distcloud/+/846053
    as it was un-do by:
    https://review.opendev.org/c/starlingx/distcloud/+/845484

    Closes-Bug: 1978857

    Signed-off-by: Yuxing Jiang <email address hidden>
    Change-Id: Ifecaec47f2c935bb5ff8d464a188f8ebad4d647c

Ghada Khalil (gkhalil)
tags: added: stx.7.0 stx.distcloud
Changed in starlingx:
importance: Undecided → Medium