An assessment of what happened in the lab when platform-integ-apps and oidc-auth-apps failed to upload.

At 09:12:29.137 platform-integ-apps failed to upload:

2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1928, in perform_app_upload
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application platform-integ-apps (1.0-8) failed: Failed to validate application manifest.
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app

At 09:12:29.978 oidc-auth-apps failed to upload:

2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1928, in perform_app_upload
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-0) failed: Failed to validate application manifest.
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app

Around 2020-05-21T09:12:04 the cert-manager pod container started logging these errors in a loop:

2020-05-21T09:12:04.317805242Z stderr F E0521 09:12:04.317715 1 dynamic_source.go:87] "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000
2020-05-21T09:12:05.308809217Z stderr F I0521 09:12:05.308645 1 dynamic_source.go:171] "msg"="Generating new ECDSA private key"
2020-05-21T09:12:05.315177628Z stderr F I0521 09:12:05.315036 1 dynamic_source.go:186] "msg"="Signing new serving certificate"

Around 2020-05-21T09:12:09Z things started to go wrong with one of the cert-manager application pods:

2020-05-21T09:12:09Z cm-cert-manager-webhook-7d5c897795-tstjz Pod Readiness probe failed: HTTP probe failed with statuscode: 500 Unhealthy Warning
2020-05-21T09:12:10Z calico-kube-controllers-5cd4695574-mtspd Pod Container image "registry.local:9001/quay.io/calico/kube-controllers:v3.12.0" already present on machine Pulled Normal
2020-05-21T09:12:10Z coredns-78d9fd7cb9-q5nxw Pod Readiness probe failed: HTTP probe failed with statuscode: 503 Unhealthy Warning
2020-05-21T09:12:10Z calico-kube-controllers-5cd4695574-mtspd Pod Started container calico-kube-controllers Started Normal
2020-05-21T09:12:10Z calico-kube-controllers-5cd4695574-mtspd Pod Created container calico-kube-controllers Created Normal
2020-05-21T09:12:12Z calico-kube-controllers-5cd4695574-mtspd Pod Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory Unhealthy Warning
2020-05-21T09:12:21Z calico-kube-controllers-5cd4695574-mtspd Pod Back-off restarting failed container BackOff Warning
2020-05-21T09:12:24Z kube-scheduler Lease controller-0_df95482f-1ac0-474a-a491-c503622f57d1 became leader LeaderElection Normal
2020-05-21T09:12:24Z coredns-78d9fd7cb9-lsv8g Pod 0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules. FailedScheduling Warning
2020-05-21T09:12:24Z kube-scheduler Endpoints controller-0_df95482f-1ac0-474a-a491-c503622f57d1 became leader LeaderElection Normal
2020-05-21T09:12:32Z cm-cert-manager-cainjector-56b68989b5-8xrw6 Pod Back-off restarting failed container BackOff Warning
2020-05-21T09:12:33Z cert-manager-controller ConfigMap cm-cert-manager-7b8b94bf9f-v5cmx-external-cert-manager-controller became leader LeaderElection Normal
2020-05-21T09:12:34Z platform-deployment-manager-0 Pod Back-off restarting failed container BackOff Warning

There were some issues with disk capacity around 2020-05-21T09:38:34 (these events come after the application upload failures, so my guess is that they cannot be the cause):

2020-05-21T09:38:34Z coredns-78d9fd7cb9-lsv8g Pod Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "07b16d823331e1ac4b326b65c01737d4eb2ce258835c2a5d74b4899664d78fc6": Multus: [kube-system/coredns-78d9fd7cb9-lsv8g]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/ FailedCreatePodSandBox Warning
2020-05-21T09:38:37Z ic-nginx-ingress-controller-hbs8r Pod Successfully pulled image "registry.local:9001/quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0" Pulled Normal
2020-05-21T09:38:39Z ic-nginx-ingress-controller-hbs8r Pod Created container nginx-ingress-controller Created Normal
2020-05-21T09:38:39Z ic-nginx-ingress-controller-hbs8r Pod Started container nginx-ingress-controller Started Normal
2020-05-21T09:38:41Z ic-nginx-ingress-controller ConfigMap ConfigMap kube-system/ic-nginx-ingress-controller CREATE Normal
2020-05-21T09:38:48Z calico-node-dpfzj Pod Successfully pulled image "registry.local:9001/quay.io/calico/node:v3.12.0" Pulled Normal
2020-05-21T09:38:55Z controller-1 Node Starting kubelet. Starting Normal
2020-05-21T09:38:55Z controller-1 Node invalid capacity 0 on image filesystem InvalidDiskCapacity Warning
2020-05-21T09:38:55Z controller-1 Node invalid capacity 0 on image filesystem ImageGCFailed Warning

My only conclusion so far is that the cert-manager malfunction caused the application uploads to fail.
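
The timeline reasoning above can be sketched as a small script. This is only an illustration, not a tool used in the lab: the names EVENTS, parse and split_by_failure are hypothetical, the labels are my shorthand for the log excerpts, and it assumes that the sysinv timestamps (which carry no timezone) and the Kubernetes timestamps (UTC) come from the same clock. Timestamps are truncated to whole seconds.

```python
from datetime import datetime

# Timestamps copied from the excerpts above, truncated to seconds.
# The labels are shorthand summaries, not literal log text.
EVENTS = [
    ("2020-05-21T09:12:04", "cert-manager: failed to generate initial serving certificate"),
    ("2020-05-21T09:12:09", "cm-cert-manager-webhook: readiness probe failed (500)"),
    ("2020-05-21T09:12:29", "sysinv: platform-integ-apps upload failed"),
    ("2020-05-21T09:12:29", "sysinv: oidc-auth-apps upload failed"),
    ("2020-05-21T09:38:34", "coredns: failed to create pod sandbox"),
    ("2020-05-21T09:38:55", "controller-1: InvalidDiskCapacity"),
]


def parse(ts):
    """Parse a second-resolution timestamp (assumption: one shared clock)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S")


def split_by_failure(events, failure_marker="upload failed"):
    """Return (before, after): labels of events strictly before and strictly
    after the first event whose label contains failure_marker."""
    ordered = sorted(events, key=lambda e: parse(e[0]))
    first_failure = min(
        parse(ts) for ts, label in ordered if failure_marker in label
    )
    before = [label for ts, label in ordered if parse(ts) < first_failure]
    after = [label for ts, label in ordered if parse(ts) > first_failure]
    return before, after


if __name__ == "__main__":
    before, after = split_by_failure(EVENTS)
    print("Before the first upload failure (possible causes):")
    for label in before:
        print("  " + label)
    print("After the first upload failure (cannot be the cause):")
    for label in after:
        print("  " + label)
```

Ordering alone only rules events out: anything logged after the first upload failure cannot have caused it, while anything logged before it remains a candidate rather than a proven cause.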