Add timeout for helm commands

Bug #1896529 reported by Angie Wang
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Angie Wang

Bug Description

Brief Description
-----------------
A Sysinv runtime manifest apply triggers the application reapply evaluation, which regenerates the helm application overrides. If the application has user overrides, the "helm install --dry-run" command is invoked to merge them. In general, "helm install --dry-run" is relatively quick because it does not need to query the cluster. In a bad situation, however, it can hang, which can cause the runtime puppet manifest apply to fail and the Sysinv RPC to time out. For robustness, this command, along with the other helm commands in sysinv/helm/utils.py, should be given an operation timeout.
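
As an illustration of what an operation timeout on an external command can look like, here is a minimal Python sketch. It is not the code merged for this bug: the helper name run_with_timeout and its return convention are hypothetical, and it relies only on the standard library's subprocess.run timeout, which kills the child process once the limit expires.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of argv tokens) with an operation timeout.

    Hypothetical helper for illustration only; not taken from sysinv.
    Returns (returncode, stdout, stderr), or None if the command did
    not finish within timeout_s seconds.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_s,  # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        # The caller decides how to report a hung command instead of
        # blocking until an outer RPC timeout fires.
        return None
    return proc.returncode, proc.stdout, proc.stderr
```

A caller would pass the helm command as a list of tokens and treat a None result as a failed, timed-out operation rather than waiting indefinitely.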

Angie Wang (angiewang)
summary: - The hang of helm install could cause sysinv RPC timeout
+ Add timeout for helm commands
Changed in starlingx:
assignee: nobody → Angie Wang (angiewang)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/753152

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium priority - robustness fix

tags: added: stx.con
tags: added: stx.5.0 stx.config stx.containers
removed: stx.con
Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/753152
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=e379434dd669e38b53fc50c1cc3f246e2de54ba0
Submitter: Zuul
Branch: master

commit e379434dd669e38b53fc50c1cc3f246e2de54ba0
Author: Angie Wang <email address hidden>
Date: Mon Sep 21 17:23:05 2020 -0400

    Add timeouts for helm commands

    Armada is still connecting tiller to manage releases. To make it
    more robust, we should set "--tiller-connection-timeout" to the
    helmv2 commands to prevent helm commands hang in case tiller is
    not running or having connection issue. An additional operation
    timeout is added for some commands as well to enhance the robustness.

    Change-Id: I5b7bb06fec254455602a183e9151a8abe8541517
    Closes-Bug: 1896529
    Signed-off-by: Angie Wang <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/753501

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/753501
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=5d29b9e0d9fbee1733e915275763a2cc46e35407
Submitter: Zuul
Branch: master

commit 5d29b9e0d9fbee1733e915275763a2cc46e35407
Author: Angie Wang <email address hidden>
Date: Wed Sep 23 01:15:32 2020 -0400

    Remove the space of the helm command argument

    Remove the space of the helm command argument
    "--tiller-connection-timeout 5" as the space may
    cause the argument to be unknown sometimes.

    Change-Id: Ic41a7d1efcf897bbf4dd26b2b170a0f844b366c7
    Closes-Bug: 1896529
    Signed-off-by: Angie Wang <email address hidden>
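
One plausible way a space can make a flag "unknown" is when the flag and its value travel as a single argv token. The commit does not say how the command line was assembled, so the sketch below is an assumption, and flag_name is a simplified, hypothetical model of flag parsing rather than helm's actual parser.

```python
FLAG = "--tiller-connection-timeout"


def flag_with_space(seconds):
    """Flag and value joined by a space in ONE argv token."""
    return "%s %d" % (FLAG, seconds)


def flag_with_equals(seconds):
    """Flag and value joined by '=' in one argv token."""
    return "%s=%d" % (FLAG, seconds)


def flag_name(token):
    """Simplified model: the flag name is whatever sits between the
    leading '--' and the first '=' (hypothetical, for illustration)."""
    return token[2:].split("=", 1)[0]


# Passed as a single token, the space form yields a name that includes
# the value, which no parser would recognize as a defined flag.
space_name = flag_name(flag_with_space(5))
equals_name = flag_name(flag_with_equals(5))
```

Under this model, the "=" form keeps the flag name intact regardless of how the surrounding command is split into tokens.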
