Upgrade activation fails because no nodes are available for apps to be scheduled on

Bug #2022008 reported by Igor Pires Soares
This bug affects 1 person
Affects: StarlingX
Status: In Progress
Importance: Undecided
Assigned to: Unassigned

Bug Description

Brief Description
-----------------
Upgrade activation fails, and the platform log reports the error "cert-manager, version 1.0-1, was not applied in the allocated time. Exiting for manual intervention..."

Severity
-----------------
Critical

Steps to Reproduce
-----------------
1. Install stx 6
2. Apply the monitor app
3. Follow the upgrade procedure to upgrade to stx 8, up to and including the activation step:
2023-05-03T17:31:43.752 controller-0 -bash: info HISTORY: PID=341306 UID=42425 system upgrade-activate

Expected Behavior
-----------------
Upgrade activation completes without failure.

Actual Behavior
-----------------

Upgrade activation fails:

system upgrade-show
+--------------+--------------------------------------+
| Property | Value |
+--------------+--------------------------------------+
| uuid | ffcedc5f-919d-4abd-a729-493ad9e608ea |
| state | activation-failed |
+--------------+--------------------------------------+

System Configuration
-----------------
Distributed Cloud (DC)

Last Pass
-----------------
This is an intermittent issue.

Timestamp/Logs
-----------------

2023-05-03T17:40:51.286 controller-0 configassistant[719733] info /usr/bin/migrate_helm_release.py:151 INFO [__main__] Searching for volumeattachments.storage.k8s.io resource related to stx-rbd-provisioner...
2023-05-03T17:40:51.579 controller-0 configassistant[719733] info /usr/bin/migrate_helm_release.py:232 INFO [__main__] Cleaned up helm2 data for stx-rbd-provisioner
2023-05-03T17:40:55.511 controller-0 root: info 64-upgrade-cert-manager.sh: cert-manager, version 1.0-1, is currently in the state: applied
2023-05-03T17:40:55.513 controller-0 root: info 64-upgrade-cert-manager.sh: creating cert manager resources backup
2023-05-03T17:40:55.822 controller-0 root: info 64-upgrade-cert-manager.sh: converting cert manager resources backup
2023-05-03T17:40:56.060 controller-0 root: info 64-upgrade-cert-manager.sh: removing extra args overrides from cert-manager
2023-05-03T17:40:58.998 controller-0 root: info 64-upgrade-cert-manager.sh: Applying cert-manager, version 1.0-1
2023-05-03T17:57:14.295 controller-0 root: info 64-upgrade-cert-manager.sh: cert-manager, version 1.0-1, was not applied in the allocated time. Exiting for manual intervention...
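The last log line marks the end of a bounded wait: the upgrade script applies cert-manager and then gives up once its time budget expires (here roughly 16 minutes, from 17:40:58 to 17:57:14). If no node can accept the application pods, the application presumably never reaches the "applied" state and the wait runs out. The script itself is shell and its exact logic is not shown in this report, so the following is only a hypothetical Python sketch of the wait-with-deadline pattern. The function name wait_for_state and the caller-supplied get_state callable are illustrative assumptions standing in for however the real script queries the application state.

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout_s: float,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: poll get_state() until it returns `target`.

    Returns True if the target state is observed before `timeout_s`
    elapses, and False on timeout (the point where the real script logs
    "Exiting for manual intervention...").
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        # Sleep for one polling interval, but never past the deadline.
        sleep(min(interval_s, max(0.0, deadline - clock())))


if __name__ == "__main__":
    # "applying" is an illustrative intermediate state for this demo.
    states = iter(["applying", "applying", "applied"])
    ok = wait_for_state(lambda: next(states), "applied", timeout_s=5, interval_s=0.01)
    print(ok)  # True
```

Injecting clock and sleep keeps the timeout logic testable without real waiting; time.monotonic is used so that wall-clock adjustments during an upgrade cannot shorten or extend the budget.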

Alarms
------------------
Test Activity
------------------
Regression

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/884932
Committed: https://opendev.org/starlingx/nfv/commit/0ece2f8e0ef0fb76b3d0f7b011dc645fbf52b9b6
Submitter: "Zuul (22348)"
Branch: master

commit 0ece2f8e0ef0fb76b3d0f7b011dc645fbf52b9b6
Author: Igor Soares <email address hidden>
Date: Wed May 31 15:54:19 2023 -0300

    Add logs to inform when nodes are tainted

    Add logs to inform when the 'services=disabled:NoExecute' taint is added
    to nodes as well as when they are removed.

    This aims to improve future log analysis by facilitating the
    identification of cases where taints could be mistakenly added to nodes.

    Test Plan:
    PASS: lock controller and check if the operation is properly logged
    PASS: unlock controller and check if the operation is properly logged

    Partial-Bug: 2022008
    Change-Id: Ie9c7432211621a9fbf7aa90282dfb91405f90c33
    Signed-off-by: Igor Soares <email address hidden>
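The commit makes additions and removals of the services=disabled:NoExecute taint visible in the logs. A NoExecute taint evicts pods that do not tolerate it and keeps new ones from being scheduled on that node, so if every eligible node carries it, application pods have nowhere to run. When analyzing a failure like this one, it may also help to inspect the live cluster directly. The sketch below is a hypothetical helper (not part of StarlingX or this nfv change) that scans the parsed JSON output of kubectl get nodes -o json for that taint; the function name and the sample node data are illustrative.

```python
import json
from typing import Any

# Taint named in the commit message, in key=value:effect form.
TAINT_KEY, TAINT_VALUE, TAINT_EFFECT = "services", "disabled", "NoExecute"


def nodes_with_services_disabled(node_list: dict[str, Any]) -> list[str]:
    """Return names of nodes carrying the services=disabled:NoExecute taint.

    `node_list` is the parsed JSON of `kubectl get nodes -o json`
    (a NodeList whose "items" array holds Node objects).
    """
    names = []
    for node in node_list.get("items", []):
        # spec.taints is absent when a node has no taints.
        for taint in node.get("spec", {}).get("taints") or []:
            if (
                taint.get("key") == TAINT_KEY
                and taint.get("value") == TAINT_VALUE
                and taint.get("effect") == TAINT_EFFECT
            ):
                names.append(node["metadata"]["name"])
                break
    return names


if __name__ == "__main__":
    # Illustrative sample data, not taken from the affected system.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "controller-0"},
       "spec": {"taints": [{"key": "services", "value": "disabled", "effect": "NoExecute"}]}},
      {"metadata": {"name": "controller-1"}, "spec": {}}
    ]}
    """)
    print(nodes_with_services_disabled(sample))  # ['controller-0']
```

Against a real cluster, the input could be obtained with json.loads(subprocess.run(["kubectl", "get", "nodes", "-o", "json"], capture_output=True, text=True, check=True).stdout). Matching the effect as well as the key and value avoids flagging a NoSchedule taint with the same key, which would not evict running pods.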
