DC large system: Uploading patches fails on a subset of subclouds due to timeout

Bug #2042170 reported by Lindley Werner Soares Vieira
This bug affects 1 person
Affects    Status       Importance  Assigned to                   Milestone
StarlingX  In Progress  Undecided   Lindley Werner Soares Vieira

Bug Description

Brief Description

Patch upload fails on a few seemingly random subclouds with the following timeout error:

dcmanager strategy-step list | grep failed
| subcloud619 | 3 | failed | updating patches: HTTPSConnectionPool(host='2620:10a:a001:ac12::4d62', port=5492): Read timed out. (read timeout=900) | 2023-09-22 17:04:08.081243 | 2023-09-22 17:06:47.261512 |

Log entry before the request is made:
2023-09-22 17:04:31.194 217081 INFO dcmanager.orchestrator.states.base [req-1d6af774-8b65-4eb3-9570-65f5990ef2f9 - - - - -] Stage: 2, State: updating patches, Subcloud: subcloud619, Details: Uploading patches ['22.12_RR'] to subcloud

Error:
2023-09-22 17:06:47.259 217081 ERROR dcmanager.orchestrator.orch_thread [req-1d6af774-8b65-4eb3-9570-65f5990ef2f9 - - - - -] (patch) Failed! Stage: 2, State: updating patches, Subcloud: subcloud619: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='2620:10a:a001:ac12::4d62', port=5492): Read timed out. (read timeout=900)

Provide the severity of the defect.

Minor: out of 250 subclouds, 1 or 2 fail to upload patches.

Steps to Reproduce

Upload RR patches to 250 subclouds in parallel:

dcmanager patch-strategy create --max-parallel-subclouds 250 --upload-only

Expected Behavior

Patches upload successfully to all 250 subclouds.

Actual Behavior
Patch upload fails on 1 or 2 subclouds.

Reproducibility

Reproducible (2/2)

System Configuration

DC1000-2/1000 AWS subclouds

cat /etc/build.info
SW_VERSION="22.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2022-12-19_02-22-00"
SRC_BUILD_ID="38"

JOB="wrcp-22.12-debian"
BUILD_BY="jenkins"
BUILD_NUMBER="50"
BUILD_HOST="yow-wrcp-lx.wrs.com"
BUILD_DATE="2022-12-19 07:22:00 +0000"
[sysadmin@controller-1 ~(keystone_admin)]$ sudo sw-patch query
Patch ID                     RR  Release  Patch State
===========================  ==  =======  ===========
22.12_RR                     Y   22.12    Applied
WRCP_22.12_PATCH_0001        Y   22.12    Committed
WRCP_22.12_PATCH_0002        Y   22.12    Committed
WRCP_22.12_PATCH_0003_SEP06  Y   22.12    Committed

Last Pass

Unknown

Timestamp/Logs

2023-09-22 17:04:08.081243
/folk/cgts_logs/logs/CGTS-52742

Alarms

[sysadmin@controller-1 ~(keystone_admin)]$ dcmanager strategy-step list | grep failed
| subcloud619 | 3 | failed | updating patches: HTTPSConnectionPool(host='2620:10a:a001:ac12::4d62', port=5492): Read timed out. (read timeout=900) | 2023-09-22 17:04:08.081243 | 2023-09-22 17:06:47.261512 |
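
When many subclouds are orchestrated at once, it can help to pull the failed rows out of the `dcmanager strategy-step list` output programmatically. The following is only a sketch: the field names (`subcloud`, `stage`, `state`, `details`, `started`, `finished`) are labels assumed from the row shown above, since the table header is not included in this report, and `parse_failed_steps` is a hypothetical helper, not part of dcmanager.

```python
from datetime import datetime

# Timestamp layout as it appears in the row quoted in this report.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# The failed row from this report, verbatim.
ROW = (
    "| subcloud619 | 3 | failed | updating patches: "
    "HTTPSConnectionPool(host='2620:10a:a001:ac12::4d62', port=5492): "
    "Read timed out. (read timeout=900) | 2023-09-22 17:04:08.081243 | "
    "2023-09-22 17:06:47.261512 |"
)


def parse_failed_steps(table_text):
    """Return one dict per failed row of a pipe-delimited step table.

    Field names are assumed labels; rows that do not have exactly six
    cells (borders, headers of another shape) are skipped.
    """
    failed = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 6 or cells[2] != "failed":
            continue
        started = datetime.strptime(cells[4], TIME_FORMAT)
        finished = datetime.strptime(cells[5], TIME_FORMAT)
        failed.append({
            "subcloud": cells[0],
            "stage": int(cells[1]),
            "state": cells[2],
            "details": cells[3],
            "started": started,
            "finished": finished,
            "elapsed_seconds": (finished - started).total_seconds(),
        })
    return failed


if __name__ == "__main__":
    for step in parse_failed_steps(ROW):
        print(step["subcloud"], step["elapsed_seconds"], step["details"])
```

The `elapsed_seconds` value is simply the difference between the two timestamps in the row, which makes it easy to compare the step duration against the reported read timeout.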

Test Activity

Feature Testing

Workaround

Upload the patches again.
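
The workaround amounts to retrying the upload after a read timeout. A minimal sketch of that idea follows. It is not the dcmanager implementation or the fix for this bug: `retry_on_timeout` and all of its parameters are hypothetical, and the built-in `TimeoutError` stands in for `requests.exceptions.ReadTimeout` (the exception seen in the log above) so that the example has no third-party dependency.

```python
import time


def retry_on_timeout(operation, attempts=3, delay_seconds=30.0,
                     retriable=(TimeoutError,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper. In a real caller, `retriable` would name the
    exception actually raised, e.g. requests.exceptions.ReadTimeout.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            sleep(delay_seconds)  # fixed pause before the next try


if __name__ == "__main__":
    outcomes = iter([TimeoutError("Read timed out."), "uploaded"])

    def fake_upload():
        # Stand-in for the real upload call: fails once, then succeeds.
        result = next(outcomes)
        if isinstance(result, Exception):
            raise result
        return result

    print(retry_on_timeout(fake_upload, attempts=2, delay_seconds=0.0))
```

A fixed delay keeps the sketch short; with 250 subclouds uploading in parallel, a randomized or growing delay might spread the retries out better, but that is a design choice this report does not settle.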

Changed in starlingx:
assignee: nobody → Lindley Werner Soares Vieira (lindley-vieira)
status: New → In Progress