Upgrade activate step fails during upgrading FluxCD controllers complaining about images download

Bug #1999971 reported by Dan Voiculeasa
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
Medium
Dan Voiculeasa

Bug Description

Brief Description
--------------------
DC Upgrade - Upgrade activate failed for SystemController with error: Cannot install fluxcd controllers

Error:

    /var/log/sysinv.log:sysinv 2022-12-16 20:45:12.102 896770 ERROR controllerconfig.upgrades.utils [-] Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:sysinv 2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils [-] Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils Traceback (most recent call last):
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils File "/usr/lib/python3/dist-packages/controllerconfig/upgrades/utils.py", line 111, in execute_migration_scripts
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils raise Exception(msg)
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils Exception: Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils Traceback (most recent call last):
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 60, in <module>
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils sys.exit(main())
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 38, in main
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils enable_fluxcd_controllers(from_release)
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 55, in enable_fluxcd_controllers
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils raise Exception('Cannot install fluxcd controllers')
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils Exception: Cannot install fluxcd controllers
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils
    /var/log/sysinv.log:2022-12-16 20:45:12.103 896770 ERROR controllerconfig.upgrades.utils
    /var/log/sysinv.log:sysinv 2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager [-] Upgrade activation failed: Exception: Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager Traceback (most recent call last):
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 12181, in activate_upgrade
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager upgrades_management.activate_upgrade(from_version,
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/controllerconfig/upgrades/management.py", line 360, in activate_upgrade
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager utils.execute_migration_scripts(from_load, to_load, utils.ACTION_ACTIVATE)
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager File "/usr/lib/python3/dist-packages/controllerconfig/upgrades/utils.py", line 111, in execute_migration_scripts
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager raise Exception(msg)
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager Exception: Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager Traceback (most recent call last):
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 60, in <module>
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager sys.exit(main())
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 38, in main
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager enable_fluxcd_controllers(from_release)
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 55, in enable_fluxcd_controllers
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager raise Exception('Cannot install fluxcd controllers')
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager Exception: Cannot install fluxcd controllers
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager
    /var/log/sysinv.log:2022-12-16 20:45:12.104 896770 ERROR sysinv.conductor.manager
    /var/log/sysinv.log:sysinv 2022-12-16 20:45:12.109 896770 ERROR zerorpc.core [-] : Exception: Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core Traceback (most recent call last):
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/zerorpc/core.py", line 167, in _async_task
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core functor.pattern.process_call(self._context, bufchan, event, functor)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/zerorpc/patterns.py", line 30, in process_call
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core result = functor(*req_event.args, **req_event.kwargs)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/zerorpc/decorators.py", line 44, in __call__
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core return self._functor(*args, **kargs)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/sysinv/zmq_rpc/zmq_rpc.py", line 39, in method
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core retval = getattr(self.target, func)(context, **kwargs)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 12190, in activate_upgrade
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core self.dbapi.software_upgrade_update(
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core self.force_reraise()
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core six.reraise(self.type_, self.value, self.tb)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core raise value
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/sysinv/conductor/manager.py", line 12181, in activate_upgrade
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core upgrades_management.activate_upgrade(from_version,
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/controllerconfig/upgrades/management.py", line 360, in activate_upgrade
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core utils.execute_migration_scripts(from_load, to_load, utils.ACTION_ACTIVATE)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/controllerconfig/upgrades/utils.py", line 111, in execute_migration_scripts
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core raise Exception(msg)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core Exception: Migration script /etc/upgrade.d/62-enable-fluxcd-controllers.py failed with returncode 1Script output:
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core Traceback (most recent call last):
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 60, in <module>
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core sys.exit(main())
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 38, in main
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core enable_fluxcd_controllers(from_release)
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core File "/etc/upgrade.d/62-enable-fluxcd-controllers.py", line 55, in enable_fluxcd_controllers
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core raise Exception('Cannot install fluxcd controllers')
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core Exception: Cannot install fluxcd controllers
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core
    /var/log/sysinv.log:2022-12-16 20:45:12.109 896770 ERROR zerorpc.core

Severity
---------------
<Critical: System/Feature is not usable after the defect>

Steps to Reproduce
-----------------
Upgrade SystemController from 22.06_patch2 to 22.12

Expected Behavior
---------------------
The upgrade should complete successfully

Actual Behavior
------------
The upgrade failed to be activated

Reproducibility
-----
Seen once

System Configuration
------------
SystemControlle Distributed cloud

Load info (eg: 2022-03-10_20-00-07)
----------------------
SW_VERSION="22.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2022-12-15_22-00-07"
SRC_BUILD_ID="31"JOB="wrcp-22.12-debian"
BUILD_BY="jenkins"
BUILD_NUMBER="43"
BUILD_HOST="redacted"
BUILD_DATE="2022-12-16 03:00:07 +0000"

Last Pass
---------------------------
With a build from Dec, 2nd

Timestamp/Logs
--------------
N/A

Alarms
---------------
N/A

Test Activity
---------------
 Feature Testing

Workaround
--------------
retry upgrade-activation

Revision history for this message
Dan Voiculeasa (dvoicule) wrote :

I suspect a race condition between 2 playbooks.
After analyzing the logs from 2 labs with this issue, I see strong hints that my instinct should be correct.

Changed in starlingx:
assignee: nobody → Dan Voiculeasa (dvoicule)
Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/868035

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/868035
Committed: https://opendev.org/starlingx/config/commit/3853336652cb672717f9d40f767cde0a3e274d17
Submitter: "Zuul (22348)"
Branch: master

commit 3853336652cb672717f9d40f767cde0a3e274d17
Author: Dan Voiculeasa <email address hidden>
Date: Sat Dec 17 20:49:29 2022 +0200

    Sync part of the functions that call playbooks

    It is observed that concurrent calls to docker images download section
    may create problems due to race condition(for example 1 task removes
    the tags while the other expects them to be there).
    This becomes obvious when running upgrades with a race between
    upgrade-static-images.yml and upgrade-fluxcd-controllers.yml.

    Sync with a lock the functions that call upgrade playbooks at
    conductor start, and the one handling upgrade-activate.

    This may not be a complete solution, other playbook calls have to be
    examined.

    Tests on AIO-SX:

    DESC: Emulated race condition by adding a sync for
    _k8s_application_audit function, adding some logging and
    restarting sysinv conductor.
    PASS: Observed _k8s_application_audit called first, and the functions
    that handled the upgrade playbook at conductor start being serialized.

    PASS: AIO-SX bootstrap

    Closes-Bug: 1999971
    Signed-off-by: Dan Voiculeasa <email address hidden>
    Change-Id: Id96fa62b652b438fb71fc1132e3858f1bd3dca50

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0 stx.apps stx.update
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (master)

Change abandoned by "Dan Voiculeasa <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/868033
Reason: It was decided to go with https://review.opendev.org/c/starlingx/config/+/868035 to avoid multiple ansible playbook test scenarios.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.