Kube Application timeout when applying custom application on subcloud

Bug #2002311 reported by Igor Pires Soares
Affects     Status         Importance   Assigned to         Milestone
StarlingX   Fix Released   Medium       Igor Pires Soares

Bug Description

Brief Description

Custom application failed to apply on subcloud due to Kube Application timeout

Severity

Major

Steps to Reproduce

    Deploy and manage 500 subclouds
    Apply custom application on the subclouds, 250 per round

Expected Behavior

Custom application applied

Actual Behavior

Custom application failure due to Kube Application execution timeout.

Reproducibility

2 out of 500 subclouds.

System Configuration

Distributed Cloud

Last Pass

2022-11-29

Timestamp/Logs

sysinv 2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app [-] Kube Application execution progress monitor timed out.: sysinv.common.exception.KubeAppProgressMonitorTimeout: Kube Application execution progress monitor timed out.
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app   File "/usr/lib/python3/dist-packages/sysinv/conductor/kube_app.py", line 1892, in _make_fluxcd_operation_with_monitor
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app     rc = _check_progress()
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app   File "/usr/lib/python3/dist-packages/sysinv/conductor/kube_app.py", line 1870, in _check_progress
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app     time.sleep(1)
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app   File "/usr/lib/python3/dist-packages/eventlet/greenthread.py", line 36, in sleep
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app     hub.switch()
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 298, in switch
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app     return self.greenlet.switch()
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app sysinv.common.exception.KubeAppProgressMonitorTimeout: Kube Application execution progress monitor timed out.
2022-12-13 16:15:09.654 67535 ERROR sysinv.conductor.kube_app
sysinv 2022-12-13 16:15:09.658 67535 INFO sysinv.conductor.kube_app [-] lifecycle hook for application analytics-app (22.12-0) started {'lifecycle_type': 'fluxcd-request', 'relative_timing': 'post', 'operation': 'apply', 'extra': {'rc': False}}.
sysinv 2022-12-13 16:15:09.659 67535 INFO sysinv.conductor.kube_app [-] lifecycle hook for application analytics-app started {'mode': 'manual', 'lifecycle_type': 'rbd', 'relative_timing': 'post', 'operation': 'apply', 'extra': {}}.
sysinv 2022-12-13 16:15:09.661 67535 INFO sysinv.conductor.kube_app [-] lifecycle hook for application analytics-app started {'mode': 'manual', 'lifecycle_type': 'resource', 'relative_timing': 'post', 'operation': 'apply', 'extra': {}}.
sysinv 2022-12-13 16:15:09.905 67535 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.
sysinv 2022-12-13 16:15:09.905 67535 INFO sysinv.conductor.kube_app [-] Deregister the abort status of app analytics-app
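
For readers unfamiliar with the pattern in the traceback, the following is a minimal sketch of a polling progress monitor that sleeps between checks and raises once a deadline passes. It is not the sysinv implementation: the names `ProgressMonitorTimeout` and `wait_for_progress`, and all of the parameters, are hypothetical and chosen only for illustration.

```python
import time


class ProgressMonitorTimeout(Exception):
    """Hypothetical stand-in for sysinv's KubeAppProgressMonitorTimeout."""


def wait_for_progress(check_done, timeout_s, poll_interval_s=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll check_done() until it returns True or timeout_s elapses.

    clock and sleep are injectable so the loop can be exercised
    without waiting in real time.
    """
    deadline = clock() + timeout_s
    while True:
        if check_done():
            return True
        if clock() >= deadline:
            # The operation never reported completion in time.
            raise ProgressMonitorTimeout(
                "execution progress monitor timed out")
        sleep(poll_interval_s)
```

With a loop of this shape, an apply that can never succeed still occupies the monitor until the full timeout expires, which is the behavior the log above shows.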

Test Activity

Scalability Testing

Workaround

NA

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/869567

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/869567
Committed: https://opendev.org/starlingx/config/commit/8bcd48f728f53591b5fc8163f5fc16014aeb8e39
Submitter: "Zuul (22348)"
Branch: master

commit 8bcd48f728f53591b5fc8163f5fc16014aeb8e39
Author: Igor Soares <email address hidden>
Date: Mon Jan 9 11:03:21 2023 -0300

    Anticipate failure for retries exhausted timeout

    Anticipate failure for corner cases in which application apply
    operations timeout due to another operation in progress.

    Helm resource statuses are parsed in order to match that
    specific case and report a failure before the timeout is reached.

    Test Plan:
    PASS: AIO-SX full build and deployment
    PASS: Apply app with no exceptions

    Closes-Bug: 2002311
    Signed-off-by: Igor Soares <email address hidden>
    Change-Id: Idd145fe10a9b6b5705f42a2726a42143aa46faed
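
The early-failure check described in the commit message above could look roughly like the sketch below. It is an illustration only, not the merged change: the function name `find_early_failure` is hypothetical, and the status layout (a `conditions` list whose entries carry `type`, `status`, and `message`) and the "retries exhausted" message text are assumptions about how the Helm resource status is reported.

```python
def find_early_failure(helm_release_status,
                       markers=("retries exhausted",)):
    """Return the failure message if the status shows a terminal failure.

    helm_release_status is assumed to be the parsed `status` mapping of a
    Helm resource. Returns None when no terminal failure is detected, so
    the caller can keep polling.
    """
    for cond in helm_release_status.get("conditions", []):
        # Only a Ready condition that is explicitly False is of interest.
        if cond.get("type") != "Ready" or cond.get("status") != "False":
            continue
        message = cond.get("message", "")
        if any(marker in message.lower() for marker in markers):
            return message
    return None
```

A progress monitor could call a check like this on each poll and abort the apply as soon as it returns a message, instead of waiting for the timeout to expire.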

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.config stx.containers
Changed in starlingx:
assignee: nobody → Igor Pires Soares (ipiresso)