commit 8f02a3cf7bc61956f7245ce02ed0c280ca07a75c
Author: Dan Voiculeasa <email address hidden>
Date: Fri Mar 17 02:32:46 2023 +0200
Defer certificate install during app downloading images
It is observed that when the docker registry is restarted while it is
in use (e.g. while an app is downloading images), it wrongly reports
some images as successfully downloaded when they are not. No error is
returned to the docker API client, so the failure is silently hidden.
By docker registry in use we mean an image push to the registry is in
progress.
Because the failed push is hidden, the failure only surfaces later,
when the components needing the images fail.
This behavior was observed during a particular case: upgrade of the
system. It is observed that the cause for docker registry restart is a
manifest that is run [1].
Defer the logic for installing the certificate (files and manifest).
Implement file deferral, which is needed.
Consider the condition for deferral to be the presence of apps whose
images will be downloaded by the framework as part of the
restore/upgrade procedure.
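The deferral condition can be illustrated with a minimal sketch. This is
not the sysinv implementation: the names DEFER_STATES, can_apply_now and
DeferredConfigQueue are hypothetical, and the two app states are taken
from the tests listed in this message. The sketch holds a config
operation (file update or runtime manifest) back while any app is in one
of those states and applies it on a later retry.

```python
from collections import deque

# Assumption: these are the app states that signal a pending image
# download, as used in the tests of this commit message.
DEFER_STATES = {"restore-requested", "applying"}


def can_apply_now(apps):
    """Return True (continue) if no app is in a deferring state."""
    return not any(app["status"] in DEFER_STATES for app in apps)


class DeferredConfigQueue:
    """Hypothetical in-memory queue of deferred config operations."""

    def __init__(self):
        self._pending = deque()

    def submit(self, config_type, apply_fn, apps):
        """Apply immediately if allowed, otherwise defer.

        Returns True if applied now, False if deferred (wait).
        """
        if can_apply_now(apps):
            apply_fn()
            return True
        self._pending.append((config_type, apply_fn))
        return False

    def retry(self, apps):
        """Apply deferred operations in order once the condition clears.

        Returns the list of config types that were applied.
        """
        applied = []
        while self._pending and can_apply_now(apps):
            config_type, apply_fn = self._pending.popleft()
            apply_fn()
            applied.append(config_type)
        return applied
```

Because the queue in this sketch lives only in memory, pending entries
would be lost if the process restarted.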
Note: outside the scope of this work, it seems deferrals are forgotten
and not re-attempted after a sysinv-conductor restart.
Tests:
PASS: Deploy AIO-DX SystemController DC,
Deploy AIO-SX Subcloud DC,
Deploy AIO-SX
PASS: Observe the new log entries for both deferred and instant
config type config_update_file filter_mapping ...
config type config_apply_runtime_manifest filter_mapping ...
config type ... False (wait)
config type ... True (continue)
PASS: Applied a docker certificate and observed the manifest and
files updated instantly, with no app in 'restore-requested' or
'applying' state
PASS: Changed one app state to 'restore-requested' and 'applying',
also alternating between them. Applied a docker certificate
and observed the manifest and files are deferred until the app
is moved out of these 2 states.
Observed the manifest applied after the wait is indeed the one restarting the docker registry
Reviewed: https://review.opendev.org/c/starlingx/config/+/877724
Committed: https://opendev.org/starlingx/config/commit/8f02a3cf7bc61956f7245ce02ed0c280ca07a75c
Submitter: "Zuul (22348)"
Branch: master
[1]: https://opendev.org/starlingx/config/src/commit/c937f46ecee2802473d786ab8c0addddb9039abc/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L13449-L13453
Closes-Bug: 2013800
Signed-off-by: Dan Voiculeasa <email address hidden>
Change-Id: Ie0e5d6cee625335431d73114d28edade4cf6663c