Helm failure when applying app on subcloud

Bug #1999572 reported by Fabricio Henrique Ramos
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Fabricio Henrique Ramos
Milestone: -

Bug Description

Brief Description
-----------------
8 out of 1000 subclouds (applied in batches of 250) failed to apply the app due to a Helm failure.

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Install app 22.12 on the System Controller
Deploy 250 virtual subclouds
Check that there is a 50 ms delay between the System Controller and the subclouds on the OAM/MGMT networks. If not, add it using delayomatic
Apply app on 250 virtual subclouds in parallel

Expected Behavior
------------------
App applied on all subclouds

Actual Behavior
----------------
The app may fail to apply on some subclouds due to a Helm failure

Reproducibility
---------------
8 out of 1000 subclouds, distributed across the batches as follows:

Batch 1: 1 out of 250

Batch 2: 3 out of 250

Batch 3: 1 out of 250

Batch 4: 3 out of 250

System Configuration
--------------------
Distributed Cloud (DC1000-2)

Branch/Pull Time/Commit
-----------------------
-

Last Pass
---------
-

Timestamp/Logs
--------------
-

Test Activity
-------------
Scalability Testing

Workaround
----------
Connect to the subcloud and re-apply app

Changed in starlingx:
assignee: nobody → Fabricio Henrique Ramos (fhramos)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/867570

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/867570
Committed: https://opendev.org/starlingx/config/commit/c39815727f0801d4020c0d9cbfe0252e59718d41
Submitter: "Zuul (22348)"
Branch: master

commit c39815727f0801d4020c0d9cbfe0252e59718d41
Author: Fabricio Henrique Ramos <email address hidden>
Date: Tue Dec 13 17:28:42 2022 -0300

    Add retry mechanism for retrieve helm releases functions

    Networking stabilization on resource limited systems may cause pods
    to be restarted as the system reaches steady state. In these scenarios,
    the ability to retrieve helm release information is compromised.
    Provide a retry mechanism for these requests to improve reliability
    of application operations.

    Add retry mechanism for functions retrieve_helm_v2_releases
    and retrieve_helm_v3_releases, with values:
      - number of retries: 6
      - delay: 20s
    Also log the exception when an attempt catches an exception.

    Test Plan:
    PASS: Apply app no exception / no retries
    PASS: Apply app simulate raising exception function then retries
          and log the exception

    Closes-Bug: 1999572
    Signed-off-by: Fabricio Henrique Ramos <email address hidden>
    Change-Id: I9cdf007fe33a2289e2712cae78b591e512959641
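
The retry behaviour described in the commit message above can be sketched in plain Python. This is not the code merged in the review; it is a minimal illustration using only the standard library. The decorator name `retry`, its parameters, and the injectable `sleep` argument are hypothetical, and reading "number of retries: 6" as six attempts in total is an assumption.

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)


def retry(max_attempts=6, delay_seconds=20, sleep=time.sleep):
    """Retry the wrapped function when it raises, logging every failure.

    Hypothetical helper: max_attempts=6 and delay_seconds=20 mirror the
    values named in the commit message; treating 6 as the total number
    of attempts is an assumption.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Log the exception on every failed attempt.
                    LOG.warning("%s failed on attempt %d/%d: %s",
                                func.__name__, attempt, max_attempts, exc)
                    if attempt == max_attempts:
                        # Out of attempts: surface the last error to the caller.
                        raise
                    sleep(delay_seconds)
        return wrapper
    return decorator
```

A function that retrieves Helm releases could then be wrapped with `@retry()`, so a transient failure while pods restart would be retried with a 20 s pause instead of failing the whole apply. Passing a no-op `sleep` keeps unit tests fast.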

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/867568
Committed: https://opendev.org/starlingx/integ/commit/31680687c404a6bdd6f23043b542d83feb2fbd88
Submitter: "Zuul (22348)"
Branch: master

commit 31680687c404a6bdd6f23043b542d83feb2fbd88
Author: Leonardo Fagundes Luz Serrano <email address hidden>
Date: Tue Dec 13 17:21:55 2022 -0300

    Send helmv2-cli error messages to stderr

    Script was sending error logs only on stdout
    while other scripts down the line (eg. sysinv.helm.utils)
    expected these error messages on stderr.

    Test Plan:
    PASS Delete armada deployment and use helm v2 list.
         Error msg on stderr.

    Partial-Bug: 1999572

    Signed-off-by: Leonardo Fagundes Luz Serrano <email address hidden>
    Change-Id: Ic6cd2bd844382a47e9b1451ae7c3430951493da8
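
To illustrate why the output stream matters here, the sketch below shows a caller that captures stdout and stderr separately, in the way a consumer such as sysinv.helm.utils might. It is an assumption-laden example, not the actual sysinv or helmv2-cli code: `run_and_capture` and `FAILING_CLI` are hypothetical names, and the child process is a stand-in that simply writes an error to stderr and exits non-zero.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd and return (returncode, stdout, stderr) as text."""
    proc = subprocess.run(cmd,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          universal_newlines=True)
    return proc.returncode, proc.stdout, proc.stderr


# Hypothetical stand-in for a CLI wrapper that reports errors on stderr.
FAILING_CLI = [
    sys.executable, "-c",
    "import sys; print('helm list failed', file=sys.stderr); sys.exit(1)",
]

returncode, out, err = run_and_capture(FAILING_CLI)
if err:
    # A caller that inspects stderr only sees the failure if the
    # script actually writes its error message to that stream.
    print("error detected:", err.strip())
```

If the stand-in wrote the same message to stdout instead, `err` would be empty and a caller that checks only stderr would miss the failure.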

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.apps