After upgrade, fail to pull images from local registry

Bug #2013800 reported by Dan Voiculeasa
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Dan Voiculeasa

Bug Description

After upgrade, apps fail to pull images from local registry.

Reproducibility
---------------
Seen once

Investigation
-------------

sysinv startup at 2022-09-21 18:54:57

Images for cert-manager version N started to be downloaded at 18:56:09. This happens for apps that are in the 'restore-requested' state (both cert-manager and platform-integ-apps).

sysinv 2022-09-21 18:56:09.500 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-webhook:v0.15.0 is not available in local registry, download started from public/private registry
sysinv 2022-09-21 18:56:09.501 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-acmesolver:v0.15.0 is not available in local registry, download started from public/private registry
sysinv 2022-09-21 18:56:09.503 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-cainjector:v0.15.0 is not available in local registry, download started from public/private registry
sysinv 2022-09-21 18:56:09.533 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-controller:v0.15.0 is not available in local registry, download started from public/private registry

A kubeadm join command follows. I don't know whether it is relevant.

sysinv 2022-09-21 18:56:09.856 129181 INFO sysinv.puppet.kubernetes [-] get_kubernetes_join_cmd join_cmd=kubeadm join [aefd::1]:6443 --token 1km87k.u8iq06q0tnzge8n3 --discovery-token-ca-cert-hash sha256:98379f5f54fac4f5f5e86fc136405b37d2ed1814acbc510f478e08a93f584652 --control-plane --certificate-key 090940907bd45c97e6fcc646a2ca3ed6c140c2d8f2baffefa1764efb432aeaff --apiserver-advertise-address aefd::2 --cri-socket /var/run/containerd/containerd.sock

Some puppet manifests started at 18:56:11 and finished at 18:56:23.

sysinv 2022-09-21 18:56:11.720 129181 INFO sysinv.agent.rpcapi [-] config_apply_runtime_manifest: fanout_cast: sending config 2a518e2b-cb46-4f53-898a-9f4c3efb892b {'classes': ['platform::dockerdistribution::runtime'], 'force': False, 'personalities': ['controller'], 'host_uuids': [u'1367468e-982d-4d16-a4d5-7332d381a03e']} to agent
sysinv 2022-09-21 18:56:11.723 105118 INFO sysinv.agent.manager [-] config_apply_runtime_manifest: 2a518e2b-cb46-4f53-898a-9f4c3efb892b {u'classes': [u'platform::dockerdistribution::runtime'], u'force': False, u'personalities': [u'controller'], u'host_uuids': [u'1367468e-982d-4d16-a4d5-7332d381a03e']} controller
sysinv 2022-09-21 18:56:11.724 129181 INFO sysinv.conductor.manager [-] found _audit_deferred_runtime_config request apply {'config_type': 'config_update_file', 'config_uuid': '0a34b9d8-5ea5-4594-a633-0f5e2cd50622', 'config_dict': {'file_content': '-----BEGIN CERTIFICATE-----\nMIICkzCCAjmgAwIBAgIRANQKInqS5mROnYSs4CJfQ58wCgYIKoZIzj0EAwIwMjEw\nMC4GA1UEAxMnV1JTLVdSQ1AtQ3VtdWx1cy1TdGVwQ2EgSW50ZXJtZWRpYXRlIENB\nMB4XDTIyMDkyMDIxNDcwMloXDTIyMTIxOTIxNDcwMlowFjEUMBIGA1UEChMLV1JD\nUC1TeXN0ZW0wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCvSfkGFrMA\nAC3E2Dijxs0dkLhgSSlVU0p+wXvYjadAGxtYMiP5I8ucQ/afY9Jc0AdGQWdSidKM\nfThHyMVv5UHutTYHHSzQxfC8PD+SHVeqtKFh2XRyubuQCqXk7lvaYizfI7GNXnT0\n01Fymkv2zEqf1YI4CvONs3QOvWITggAWGLV0ELAUO114+6awvq6zkffLvtG+ITBs\n8vFCwmnECAtvI1T6It6VNt60YD96ptXEjAuhLt13j2zRBjnKYFrXoC1154UTL4Ov\ndGlUnwcbY4LmsFj+gvLarWOqhSdl4OPXhn3hxtb8ggWUNbIpUaF9ki5nHkC0sOp4\n332MR4/GS7tbAgMBAAGjgYAwfjAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIw\nADAfBgNVHSMEGDAWgBRUbX1Un0W+9PMh2JCfO6jCCBbn4DA9BgNVHREENjA0gg5y\nZWdpc3RyeS5sb2NhbIcQJiABCqABoQMAAAAAAAACCIcQ/QEBeQAAAAAAAAAAAAAA\nAjAKBggqhkjOPQQDAgNIADBFAiBiOTPFsBfHdbD/XqIro50/t60YWrfL+eHqfUqa\n8ApMPAIhAPtA/o+JxnwXWLjvKCq1HfFyUm28BA6ddCoPjSA9UJbO\n-----END CERTIFICATE-----\n-----BEGIN CERTIFICATE-----\nMIIBwDCCAWagAwIBAgIQRj6NMu2lu0nKhWnzqQ9KsjAKBggqhkjOPQQDAjAqMSgw\nJgYDVQQDEx9XUlMtV1JDUC1DdW11bHVzLVN0ZXBDYSBSb290IENBMB4XDTIwMDYx\nODEyMDIzN1oXDTMwMDYxNjEyMDIzN1owMjEwMC4GA1UEAxMnV1JTLVdSQ1AtQ3Vt\ndWx1cy1TdGVwQ2EgSW50ZXJtZWRpYXRlIENBMFkwEwYHKoZIzj0CAQYIKoZIzj0D\nAQcDQgAEm18RRceX455fnU+yFDQCoHGgYizy8EBlb9Px1kpHKpxg++N1KQEZYjtw\nBXSRPwFu8WyAhVAtl9Y9XGxECGsWJ6NmMGQwDgYDVR0PAQH/BAQDAgEGMBIGA1Ud\nEwEB/wQIMAYBAf8CAQAwHQYDVR0OBBYEFFRtfVSfRb708yHYkJ87qMIIFufgMB8G\nA1UdIwQYMBaAFHjxZQ/NNdLHm0e6RxVp4rTCBM4TMAoGCCqGSM49BAMCA0gAMEUC\nIQCbi5xcv0fkGGWKX6gjpRYxz9/NS5Q5KdI4bmaha85SQAIgOxQ+5OYyqVCamWBC\noiwZyDqTrbC2VTyyVp249bktM/c=\n-----END CERTIFICATE-----\n', 'file_names': ['/etc/docker/certs.d/registry.local:9001/registry-cert.crt'], 'permissions': 256, 
'personalities': ['controller', 'worker'], 'nobackup': True}}

sysinv 2022-09-21 18:56:11.740 129181 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-0 with config_uuid: 0a34b9d8-5ea5-4594-a633-0f5e2cd50622

..
sysinv 2022-09-21 18:56:23.081 105118 INFO sysinv.agent.manager [-] Runtime manifest apply completed for classes [u'platform::dockerdistribution::runtime'].
sysinv 2022-09-21 18:56:23.138 105118 INFO sysinv.agent.manager [-] Agent config applied 2a518e2b-cb46-4f53-898a-9f4c3efb892b
sysinv 2022-09-21 18:56:23.196 129181 INFO sysinv.conductor.manager [-] _remove_config_from_reboot_config_list host: 1367468e-982d-4d16-a4d5-7332d381a03e,config_uuid: 2a518e2b-cb46-4f53-898a-9f4c3efb892b
sysinv 2022-09-21 18:56:23.219 129181 WARNING sysinv.conductor.manager [-] controller-0: iconfig out of date: target 409c4869-ec71-439b-a92b-66527342893d, applied 2a518e2b-cb46-4f53-898a-9f4c3efb892b
sysinv 2022-09-21 18:56:23.220 129181 WARNING sysinv.conductor.manager [-] SYS_I Raise system config alarm: host controller-0 config applied: 2a518e2b-cb46-4f53-898a-9f4c3efb892b vs. target: 409c4869-ec71-439b-a92b-66527342893d.
sysinv 2022-09-21 18:56:23.271 105118 INFO sysinv.agent.manager [-] Agent config applied 0a34b9d8-5ea5-4594-a633-0f5e2cd50622

One cert-manager image was pushed at 18:56:18.

sysinv 2022-09-21 18:56:18.956 129181 INFO sysinv.conductor.kube_app [-] Remove image registry.central:9001/quay.io/jetstack/cert-manager-acmesolver:v0.15.0 after push to local registry.
sysinv 2022-09-21 18:56:18.986 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-acmesolver:v0.15.0 download succeeded in 9 seconds

Manifests updated the CA and restarted the docker registry (between 18:56:19 and 18:56:22).

2022-09-21T18:56:19.524 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Class[Platform::Config::Timezone]: The container Stage[pre] will propagate my refresh event^[[0m
2022-09-21T18:56:19.526 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Exec[set-hostname](provider=posix): Executing check 'test `hostname` = `cat /etc/hostname`'^[[0m
2022-09-21T18:56:19.528 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Executing: 'test `hostname` = `cat /etc/hostname`'^[[0m
2022-09-21T18:56:19.531 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Exec[update-dc-ca-trust](provider=posix): Executing 'update-ca-trust'^[[0m
2022-09-21T18:56:19.533 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Executing: 'update-ca-trust'^[[0m
2022-09-21T18:56:19.956 ^[[mNotice: 2022-09-21 18:56:19 +0000 /Stage[pre]/Platform::Config::Dc_root_ca/Exec[update-dc-ca-trust]/returns: executed successfully^[[0m
2022-09-21T18:56:19.959 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 /Stage[pre]/Platform::Config::Dc_root_ca/Exec[update-dc-ca-trust]: The container Class[Platform::Config::Dc_root_ca] will propagate my refresh event^[[0m
2022-09-21T18:56:19.962 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Class[Platform::Config::Dc_root_ca]: The container Stage[pre] will propagate my refresh event^[[0m
2022-09-21T18:56:19.964 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Executing: '/bin/systemctl is-active crond'^[[0m
2022-09-21T18:56:19.970 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Executing: '/bin/systemctl is-enabled crond'^[[0m
2022-09-21T18:56:19.978 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Exec[sm-restart-docker-distribution](provider=posix): Executing 'sm-restart-safe service docker-distribution'^[[0m
2022-09-21T18:56:19.980 ^[[0;36mDebug: 2022-09-21 18:56:19 +0000 Executing: 'sm-restart-safe service docker-distribution'^[[0m
2022-09-21T18:56:21.350 ^[[mNotice: 2022-09-21 18:56:21 +0000 /Stage[post]/Platform::Dockerdistribution::Reload/Platform::Sm::Restart[docker-distribution]/Exec[sm-restart-docker-distribution]/returns: executed successfully^[[0m
2022-09-21T18:56:21.352 ^[[0;36mDebug: 2022-09-21 18:56:21 +0000 /Stage[post]/Platform::Dockerdistribution::Reload/Platform::Sm::Restart[docker-distribution]/Exec[sm-restart-docker-distribution]: The container Platform::Sm::Restart[docker-distribution] will propagate my refresh event^[[0m
2022-09-21T18:56:21.354 ^[[0;36mDebug: 2022-09-21 18:56:21 +0000 Exec[sm-restart-registry-token-server](provider=posix): Executing 'sm-restart-safe service registry-token-server'^[[0m
2022-09-21T18:56:21.356 ^[[0;36mDebug: 2022-09-21 18:56:21 +0000 Executing: 'sm-restart-safe service registry-token-server'^[[0m
2022-09-21T18:56:22.618 ^[[mNotice: 2022-09-21 18:56:22 +0000 /Stage[post]/Platform::Dockerdistribution::Reload/Platform::Sm::Restart[registry-token-server]/Exec[sm-restart-registry-token-server]/returns: executed successfully^[[0m
2022-09-21T18:56:22.619 ^[[0;36mDebug: 2022-09-21 18:56:22 +0000 /Stage[post]/Platform::Dockerdistribution::Reload/Platform::Sm::Restart[registry-token-server]/Exec[sm-restart-registry-token-server]: The container Platform::Sm::Restart[registry-token-server] will propagate my refresh event^[[0m
2022-09-21T18:56:22.621 ^[[0;36mDebug: 2022-09-21 18:56:22 +0000 Platform::Sm::Restart[registry-token-server]: The container Class[Platform::Dockerdistribution::Reload] will propagate my refresh event^[[0m
2022-09-21T18:56:22.623 ^[[0;36mDebug: 2022-09-21 18:56:22 +0000 Platform::Sm::Restart[docker-distribution]: The container Class[Platform::Dockerdistribution::Reload] will propagate my refresh event^[[0m
2022-09-21T18:56:22.625 ^[[0;36mDebug: 2022-09-21 18:56:22 +0000 Class[Platform::Dockerdistribution::Reload]: The container Stage[post] will propagate my refresh event^[[0m

The rest of the images were pushed at 18:56:25.

sysinv 2022-09-21 18:56:25.371 129181 INFO sysinv.conductor.kube_app [-] Remove image registry.central:9001/quay.io/jetstack/cert-manager-controller:v0.15.0 after push to local registry.
sysinv 2022-09-21 18:56:25.373 129181 INFO sysinv.conductor.kube_app [-] Remove image registry.central:9001/quay.io/jetstack/cert-manager-cainjector:v0.15.0 after push to local registry.
sysinv 2022-09-21 18:56:25.375 129181 INFO sysinv.conductor.kube_app [-] Remove image registry.central:9001/quay.io/jetstack/cert-manager-webhook:v0.15.0 after push to local registry.
sysinv 2022-09-21 18:56:25.401 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-controller:v0.15.0 download succeeded in 15 seconds
sysinv 2022-09-21 18:56:25.494 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-cainjector:v0.15.0 download succeeded in 16 seconds
sysinv 2022-09-21 18:56:25.530 129181 INFO sysinv.conductor.kube_app [-] Image registry.local:9001/quay.io/jetstack/cert-manager-webhook:v0.15.0 download succeeded in 16 seconds
sysinv 2022-09-21 18:56:25.531 129181 INFO sysinv.conductor.kube_app [-] All docker images for application cert-manager were successfully downloaded in 22 seconds

We see upload errors at around 18:56:25.

2022-09-21T18:56:25.369 controller-0 registry[157387]: info time="2022-09-21T18:56:25.369848645Z" level=warning msg="error authorizing context: invalid token" go.version=go1.16.12 http.request.host="registry.local:9001" http.request.id=1dd7979e-0a22-4c61-9cb5-bcacbfd2cf78 http.request.method=POST http.request.remoteaddr="[fd01:179::2]:37146" http.request.uri="/v2/quay.io/external_storage/rbd-provisioner/blobs/uploads/" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/5.10.112-200.49.tis.rt.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" vars.name="quay.io/external_storage/rbd-provisioner"
2022-09-21T18:56:25.370 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.370167513Z" level=error msg="Upload failed: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 registry[157387]: info time="2022-09-21T18:56:25.371063823Z" level=warning msg="error authorizing context: invalid token" go.version=go1.16.12 http.request.host="registry.local:9001" http.request.id=0cdba8ee-ab96-48be-b768-63e3d32fb56c http.request.method=POST http.request.remoteaddr="[fd01:179::2]:37148" http.request.uri="/v2/quay.io/jetstack/cert-manager-controller/blobs/uploads/" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/5.10.112-200.49.tis.rt.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" vars.name="quay.io/jetstack/cert-manager-controller"
2022-09-21T18:56:25.371 controller-0 registry[157387]: info time="2022-09-21T18:56:25.371222364Z" level=warning msg="error authorizing context: invalid token" go.version=go1.16.12 http.request.host="registry.local:9001" http.request.id=60cf6feb-83fc-4db2-82b2-0a14d0ce980e http.request.method=POST http.request.remoteaddr="[fd01:179::2]:37152" http.request.uri="/v2/quay.io/jetstack/cert-manager-cainjector/blobs/uploads/" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/5.10.112-200.49.tis.rt.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" vars.name="quay.io/jetstack/cert-manager-cainjector"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371358034Z" level=error msg="Upload failed: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371428551Z" level=info msg="Attempting next endpoint for push after error: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371474346Z" level=error msg="Upload failed: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371517828Z" level=info msg="Attempting next endpoint for push after error: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 registry[157387]: info time="2022-09-21T18:56:25.371588654Z" level=warning msg="error authorizing context: invalid token" go.version=go1.16.12 http.request.host="registry.local:9001" http.request.id=df066ce6-f5f0-4bd3-a827-3b967abe5cca http.request.method=POST http.request.remoteaddr="[fd01:179::2]:37144" http.request.uri="/v2/quay.io/external_storage/rbd-provisioner/blobs/uploads/" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/5.10.112-200.49.tis.rt.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" vars.name="quay.io/external_storage/rbd-provisioner"
2022-09-21T18:56:25.371 controller-0 registry[157387]: info time="2022-09-21T18:56:25.371593579Z" level=warning msg="error authorizing context: invalid token" go.version=go1.16.12 http.request.host="registry.local:9001" http.request.id=fcf6d6d3-57fe-49da-bd9a-9548c29e2695 http.request.method=POST http.request.remoteaddr="[fd01:179::2]:37150" http.request.uri="/v2/quay.io/jetstack/cert-manager-webhook/blobs/uploads/" http.request.useragent="docker/18.09.6 go/go1.10.8 git-commit/481bc77 kernel/5.10.112-200.49.tis.rt.el7.x86_64 os/linux arch/amd64 UpstreamClient(Docker-Client/18.09.6 \(linux\))" vars.name="quay.io/jetstack/cert-manager-webhook"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371771671Z" level=error msg="Upload failed: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371783692Z" level=error msg="Upload failed: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371824866Z" level=info msg="Attempting next endpoint for push after error: unauthorized: authentication required"
2022-09-21T18:56:25.371 controller-0 dockerd[98916]: info time="2022-09-21T18:56:25.371836458Z" level=info msg="Attempting next endpoint for push after error: unauthorized: authentication required"

Nothing else in the logs indicates whether there will be further issues with uploading to the local registry.

I believe these images are wrongly reported as pushed even though the token had expired. I believe the token is reported as expired because the docker registry was restarted mid-operation and forgot about the in-use tokens.
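
To make the timing argument concrete, here is a minimal sketch that checks which image transfers overlap the registry restart. The timestamps are copied from the log excerpts above; treating "download started" to "download succeeded" as one transfer window and the two sm-restart-safe calls as one restart window is my simplification, and the helper names are hypothetical.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse(ts):
    """Parse a sysinv-style timestamp such as '2022-09-21 18:56:09.500'."""
    return datetime.strptime(ts, FMT)


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the closed intervals [start_a, end_a] and [start_b, end_b] intersect."""
    return start_a <= end_b and start_b <= end_a


# Restart window: from the first sm-restart-safe call to the completion
# of the registry-token-server restart (puppet log timestamps).
restart = (parse("2022-09-21 18:56:19.978"), parse("2022-09-21 18:56:22.618"))

# Transfer windows: "download started" to "download succeeded" per image.
transfers = {
    "cert-manager-webhook": ("2022-09-21 18:56:09.500", "2022-09-21 18:56:25.530"),
    "cert-manager-acmesolver": ("2022-09-21 18:56:09.501", "2022-09-21 18:56:18.986"),
    "cert-manager-cainjector": ("2022-09-21 18:56:09.503", "2022-09-21 18:56:25.494"),
    "cert-manager-controller": ("2022-09-21 18:56:09.533", "2022-09-21 18:56:25.401"),
}

affected = sorted(
    name
    for name, (start, end) in transfers.items()
    if overlaps(parse(start), parse(end), *restart)
)
print(affected)
```

Under these assumptions only cert-manager-acmesolver finishes before the restart begins; the other three transfers span it, which matches the images named in the "invalid token" registry errors.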

The scenario identified here is that a push with an expired token is reported as a successful push (the docker registry was restarted by a puppet manifest apply).

Next steps: identify how to avoid running the puppet manifest that restarts the docker registry so soon after sysinv startup, or delay the docker image push for restore-requested apps even further, or fix the docker module (preferred), or write a better API over the docker module (preferred), or something else.
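
As a rough illustration of the "better API over the docker module" option, the sketch below wraps a push progress stream and turns error entries into exceptions. It assumes the push is driven through the Docker SDK for Python with `stream=True, decode=True`, where failures arrive as dict entries carrying an `error` key rather than as raised exceptions; whether sysinv calls the SDK this way is an assumption, and `PushError` and `check_push_stream` are hypothetical names.

```python
class PushError(Exception):
    """Raised when a push progress stream contains an error entry."""


def check_push_stream(events):
    """Consume decoded push progress events and raise on the first error.

    `events` is any iterable of dicts, for example what the Docker SDK
    yields from client.images.push(repo, tag=tag, stream=True, decode=True).
    Returns the last event seen when no error entry is present.
    """
    last = None
    for event in events:
        if "error" in event:
            # Prefer the detailed message when the daemon supplies one.
            detail = event.get("errorDetail", {}).get("message", event["error"])
            raise PushError(detail)
        last = event
    return last


if __name__ == "__main__":
    # Simulated stream shaped like the failure in the dockerd log above.
    simulated = [
        {"status": "Preparing", "id": "abc123"},
        {
            "errorDetail": {"message": "unauthorized: authentication required"},
            "error": "unauthorized: authentication required",
        },
    ]
    try:
        check_push_stream(simulated)
    except PushError as exc:
        print("push failed:", exc)
```

With a wrapper like this, the caller would log "download succeeded" only after the stream has been consumed without an error entry, so a failed push could not be reported as a success.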

Narrowed down the docker restart. It is triggered by this manifest: https://opendev.org/starlingx/config/blame/commit/01e23af74c939ccadc579ccd93f55cd93c17380f/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L13466 in the general area of https://opendev.org/starlingx/config/blame/commit/01e23af74c939ccadc579ccd93f55cd93c17380f/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L13271

with the config_uuid 2a518e2b-cb46-4f53-898a-9f4c3efb892b in the logs.

sysinv 2022-09-21 18:55:03.750 129587 INFO sysinv.api.controllers.v1.certificate [-] certificate is not valid before 2022-09-20 21:47:02 nor after 2022-12-19 21:47:02
sysinv 2022-09-21 18:55:03.751 129587 INFO sysinv.api.controllers.v1.certificate [-] certificate is not valid before 2020-06-18 12:02:37 nor after 2030-06-16 12:02:37
sysinv 2022-09-21 18:55:03.755 129181 INFO sysinv.conductor.manager [-] config_certificate mode=docker_registry
sysinv 2022-09-21 18:55:03.757 129181 INFO sysinv.conductor.manager [-] config_certificate signature=docker_registry_281848957388213202288208458859027579807
sysinv 2022-09-21 18:55:03.757 129181 INFO sysinv.conductor.manager [-] config_certificate signature=docker_registry_93370745964883815546615037000514751154
sysinv 2022-09-21 18:55:03.757 129181 INFO sysinv.conductor.manager [-] Docker registry certificate install
sysinv 2022-09-21 18:55:03.758 129181 INFO sysinv.conductor.manager [-] config_certificate signature=docker_registry_281848957388213202288208458859027579807
sysinv 2022-09-21 18:55:03.758 129181 INFO sysinv.conductor.manager [-] config_certificate signature=docker_registry_93370745964883815546615037000514751154
sysinv 2022-09-21 18:55:03.762 129181 INFO sysinv.conductor.manager [-] _config_update_hosts personalities=['controller'] host_uuids=None reboot=False config_uuid=7af3af0d-368f-4dc0-8bf6-17c375275dbd tb= File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 13159, in config_certificate
...
sysinv 2022-09-21 18:55:03.805 129181 INFO sysinv.conductor.manager [-] _config_update_hosts personalities=['controller'] host_uuids=None reboot=False config_uuid=2a518e2b-cb46-4f53-898a-9f4c3efb892b tb= File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 13195, in config_certificate
    config_uuid = self._config_update_hosts(context, personalities)

Changed in starlingx:
assignee: nobody → Dan Voiculeasa (dvoicule)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/877724
Committed: https://opendev.org/starlingx/config/commit/8f02a3cf7bc61956f7245ce02ed0c280ca07a75c
Submitter: "Zuul (22348)"
Branch: master

commit 8f02a3cf7bc61956f7245ce02ed0c280ca07a75c
Author: Dan Voiculeasa <email address hidden>
Date: Fri Mar 17 02:32:46 2023 +0200

    Defer certificate install during app downloading images

    It is observed that when the docker registry is in use(eg. app
    download images) if it is restarted, it will wrongly report some
    images as being successfully downloaded, when they are not. No error
    is thrown to the docker API client used, thus the error is silently
    hidden.
    By docker registry in use we mean an image push to the registry is in
    progress.
    Because the failed push is hidden, the error will be propagated and
    the components needing the images will fail.

    This behavior was observed during a particular case: upgrade of the
    system. It is observed that the cause for docker registry restart is a
    manifest that is run [1].

    Defer the logic for installing the certificate (files and manifest).
    Implement file deferral, which is needed.
    Consider the condition for deferral to be the presence of apps that
    will have the images downloaded by the framework part of
    restore/upgrade procedure.

    Note: outside of the scope of this work, seems deferrals will be
    forgotten and not attempted after a sysinv-conductor restart.

    Tests:
    PASS: Deploy AIO-DX SystemController DC,
          Deploy AIO-SX Subcloud DC,
          Deploy AIO-SX
    PASS: Observe the new log entries for both deferred and instant
          config type config_update_file filter_mapping ...
          config type config_apply_runtime_manifest filter_mapping ...
          config type ... False (wait)
          config type ... True (continue)
    PASS: Applied a docker certificate and observed the manifest and
          files updated instantly, no app in 'restore-requested' or
          'applying' state
    PASS: Changed one app state to 'restore-requested' and 'applying',
          also alternating between them. Applied a docker certificate
          and observed the manifest and files are deferred until the app
          is moved out of these 2 states.
          Observed the manifest applied after the wait is indeed the one
          restarting the docker registry

    [1]: https://opendev.org/starlingx/config/src/commit/c937f46ecee2802473d786ab8c0addddb9039abc/sysinv/sysinv/sysinv/sysinv/conductor/manager.py#L13449-L13453
    Closes-Bug: 2013800
    Signed-off-by: Dan Voiculeasa <email address hidden>
    Change-Id: Ie0e5d6cee625335431d73114d28edade4cf6663c
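
The deferral condition described in the commit message can be sketched roughly as follows. This is not the merged sysinv code: the state names come from the test notes above, while `must_defer`, `DeferredConfigQueue`, and the in-memory queue are hypothetical names chosen for illustration.

```python
from collections import deque

# App states during which the framework may still be pushing images.
BLOCKING_STATES = {"restore-requested", "applying"}


def must_defer(app_states):
    """Return True while any app is in a state that implies image downloads."""
    return any(state in BLOCKING_STATES for state in app_states)


class DeferredConfigQueue:
    """Hold config requests (files, manifests) until no app blocks them."""

    def __init__(self):
        self._pending = deque()

    def submit(self, config, app_states, apply):
        """Apply `config` now, or queue it. Returns True if applied immediately."""
        if must_defer(app_states):
            self._pending.append(config)
            return False
        apply(config)
        return True

    def audit(self, app_states, apply):
        """Periodic check: flush queued configs in order once nothing blocks."""
        if must_defer(app_states):
            return 0
        applied = 0
        while self._pending:
            apply(self._pending.popleft())
            applied += 1
        return applied


if __name__ == "__main__":
    done = []
    queue = DeferredConfigQueue()
    queue.submit("docker-registry-cert", ["applied", "restore-requested"], done.append)
    queue.audit(["applied", "applying"], done.append)  # still blocked
    queue.audit(["applied", "applied"], done.append)   # flushed here
    print(done)
```

Because the queue in this sketch lives only in process memory, pending entries are lost if the process restarts, which is the same limitation the commit message notes for deferrals after a sysinv-conductor restart.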

Changed in starlingx:
status: In Progress → Fix Released
Frank Miller (sensfan22)
tags: added: stx.9.0 stx.config stx.security
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low