Comment 5 for bug 1848721

Bob Church (rchurch) wrote:

platform-integ-apps fails to apply because the replica count for the rbd-provisioner pods is zero. With zero replicas, the Armada manifest apply times out: no pods are launched, but the apply keeps waiting for notification that they are up. This happens because VIM services go enabled about 4 seconds after the application overrides are generated (00:49:06.294 vs. 00:49:10.474 in the log below), and the replica count is derived from the number of enabled controllers with vim_services.
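The failure mode can be illustrated with a minimal sketch, assuming a simplified replica rule of "count controllers whose VIM services are enabled". The `Host` record and `rbd_provisioner_replicas()` helper are hypothetical stand-ins, not the actual sysinv objects or helm plugin code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical, simplified host record; the real sysinv host object differs.
    hostname: str
    personality: str            # e.g. "controller" or "worker"
    vim_services_enabled: bool  # True once VIM reports services-enabled


def rbd_provisioner_replicas(hosts: List[Host]) -> int:
    """Assumed replica rule: one replica per controller with VIM services enabled."""
    return sum(
        1 for h in hosts
        if h.personality == "controller" and h.vim_services_enabled
    )


# Overrides generated before services-enabled arrives: zero replicas.
before = [Host("controller-0", "controller", False)]
# The same host a few seconds later, after notify_availability=services-enabled.
after = [Host("controller-0", "controller", True)]

print(rbd_provisioner_replicas(before))  # 0 -> no pods, apply waits until timeout
print(rbd_provisioner_replicas(after))   # 1
```

Whichever state the overrides happen to capture is baked into the manifest, which is why the outcome depends on timing.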

2019-10-18 00:35:28.293 95972 INFO sysinv.api.controllers.v1.host [-] controller-0 Action unlock perform notify_mtce
2019-10-18 00:48:05.275 113546 INFO sysinv.conductor.manager [-] Platform managed application platform-integ-apps: Uploading...
2019-10-18 00:48:08.556 113546 INFO sysinv.conductor.kube_app [-] Generating application overrides...
2019-10-18 00:48:08.987 113546 INFO sysinv.conductor.kube_app [-] Application platform-integ-apps (1.0-8) upload completed.
2019-10-18 00:49:05.671 113546 INFO sysinv.conductor.manager [-] Platform managed application platform-integ-apps: Applying...
2019-10-18 00:49:05.990 113546 INFO sysinv.conductor.kube_app [-] Application platform-integ-apps (1.0-8) apply started.
2019-10-18 00:49:06.294 113546 INFO sysinv.conductor.kube_app [-] Generating application overrides...
2019-10-18 00:49:10.474 114087 INFO sysinv.api.controllers.v1.host [-] controller-0 notify_availability=services-enabled
2019-10-18 01:20:29.138 113546 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/platform-integ-apps/1.0-8/platform-integ-apps-manifest.yaml. See /var/log/armada/platform-integ-apps-apply.log for details.
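The timing window can be checked directly from the excerpt. A small sketch that parses the leading date and time fields of four of the lines above; the timestamp format is assumed from the excerpt, and message text is shortened where it is not needed.

```python
from datetime import datetime

# Lines from the excerpt above (message text shortened where not needed).
OVERRIDES = "2019-10-18 00:49:06.294 113546 INFO sysinv.conductor.kube_app [-] Generating application overrides..."
ENABLED = "2019-10-18 00:49:10.474 114087 INFO sysinv.api.controllers.v1.host [-] controller-0 notify_availability=services-enabled"
APPLY_START = "2019-10-18 00:49:05.990 113546 INFO sysinv.conductor.kube_app [-] apply started."
APPLY_FAIL = "2019-10-18 01:20:29.138 113546 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest"


def log_time(line: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS.mmm' prefix of a sysinv log line."""
    date, clock = line.split()[:2]
    return datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f")


gap = (log_time(ENABLED) - log_time(OVERRIDES)).total_seconds()
stuck = (log_time(APPLY_FAIL) - log_time(APPLY_START)).total_seconds()
print(f"services-enabled {gap:.3f} s after overrides")  # 4.180 s
print(f"apply failed after {stuck / 60:.1f} min")       # 31.4 min
```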

If you are seeing success 50% of the time, then this is a race condition. The fix is to update the apply criteria in _met_app_apply_prerequisites() to match the logic that determines the number of replicas, so that the apply only proceeds when at least one replica is guaranteed.
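A minimal sketch of what "aligning the apply criteria with the replica logic" could look like, assuming the same simplified replica rule as above. `met_app_apply_prerequisites()` here is a hypothetical stand-in for sysinv's `_met_app_apply_prerequisites()`, and `Host` / `rbd_provisioner_replicas()` are illustrative, not the real code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical, simplified host record.
    hostname: str
    personality: str
    vim_services_enabled: bool


def rbd_provisioner_replicas(hosts: List[Host]) -> int:
    """Assumed replica rule shared by the override generator and the gate."""
    return sum(
        1 for h in hosts
        if h.personality == "controller" and h.vim_services_enabled
    )


def met_app_apply_prerequisites(app_name: str, hosts: List[Host]) -> bool:
    """Hypothetical gate: only apply when the overrides will yield >= 1 replica."""
    if app_name == "platform-integ-apps":
        # Reuse the replica calculation itself rather than a looser check
        # (e.g. "controller is unlocked"), so the gate and overrides cannot disagree.
        return rbd_provisioner_replicas(hosts) >= 1
    return True


hosts = [Host("controller-0", "controller", False)]
print(met_app_apply_prerequisites("platform-integ-apps", hosts))  # False: defer apply
hosts[0].vim_services_enabled = True
print(met_app_apply_prerequisites("platform-integ-apps", hosts))  # True: safe to apply
```

Sharing one function between the gate and the override generation is the point of the design: if the replica rule changes later, the apply prerequisite changes with it.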