StarlingX

Bug #1887648
Comment #11

OpenStack Infra (hudson-openstack) wrote on 2020-10-09: Fix merged to config (master)

Reviewed: https://review.opendev.org/753310
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=5cb20a610ff2b76f5f196dd93928a9102a1055a1
Submitter: Zuul
Branch: master

commit 5cb20a610ff2b76f5f196dd93928a9102a1055a1
Author: Dan Voiculeasa <email address hidden>
Date: Fri Sep 18 12:59:22 2020 +0300

Change restore procedure

    During the restore playbook, a flag file is created before
    controller-0 is unlocked to indicate that the system is going through
    a system restore.
    The system restore state is exited through a command once all nodes
    are up and unlocked.
    The next commit will introduce sysinv commands to query and control
    the system restore state.

    Until all nodes are up, pods are stuck in `Terminating`. Armada will
    time out waiting for those pods if an armada apply is requested.
    This commit ensures auto-apply of apps does not occur during the
    system restore.

    While in the restore state, allow apps to have their images downloaded.
    If an image download fails, revert the status of the app to
    APP_RESTORE_REQUESTED instead of APP_APPLY_FAILURE. The automatic
    image download is retried for apps in APP_RESTORE_REQUESTED until the
    system restore state is exited.
    This leaves enough time for manual intervention to fix networking,
    Docker registry connectivity, or any other issues related to
    container images.

    Note: In multi-node setups, helm overrides may have been detected,
    so apps will be auto-applied after exiting the restore state. The
    auto-apply is started by a periodic audit thread.

    Change-Id: I44fc4aaa528e372a84115714f271b4f5e063f86e
    Partial-Bug: 1887648
    Signed-off-by: Dan Voiculeasa <email address hidden>
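
The behaviour described in the commit message can be illustrated with a minimal sketch. This is not the code from the change itself: the helper names, the flag file name, and the string values of the two status constants are assumptions made for illustration. Only the constant names APP_RESTORE_REQUESTED and APP_APPLY_FAILURE come from the commit message.

```python
import os
import tempfile

# Status names come from the commit message; the string values are assumed.
APP_RESTORE_REQUESTED = "restore-requested"
APP_APPLY_FAILURE = "apply-failed"


def restore_in_progress(flag_path):
    """Return True while the (hypothetical) restore flag file exists."""
    return os.path.isfile(flag_path)


def should_auto_apply(flag_path):
    """Auto-apply of apps is skipped while the system restore is ongoing."""
    return not restore_in_progress(flag_path)


def status_after_download_failure(flag_path):
    """Pick the app status to record after a failed image download."""
    if restore_in_progress(flag_path):
        # Keep the app eligible for another automatic download attempt.
        return APP_RESTORE_REQUESTED
    return APP_APPLY_FAILURE


def demo():
    """Exercise the helpers with a throwaway flag file."""
    with tempfile.TemporaryDirectory() as tmp:
        flag = os.path.join(tmp, ".restore_in_progress")  # hypothetical name
        open(flag, "w").close()  # restore playbook creates the flag
        during = (should_auto_apply(flag), status_after_download_failure(flag))
        os.remove(flag)  # operator exits the restore state
        after = (should_auto_apply(flag), status_after_download_failure(flag))
    return during, after
```

In this sketch, removing the flag file stands in for the command that exits the restore state. Once the flag is gone, a periodic audit would be free to auto-apply apps again, and a failed download would be recorded as an apply failure.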