An assessment of what happened in the lab when platform-integ-apps and oidc-auth-apps failed to upload
At 09:12:29.137 platform-integ-apps failed to upload
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1928, in perform_app_upload
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application platform-integ-apps (1.0-8) failed: Failed to validate application manifest.
2020-05-21 09:12:29.137 104305 ERROR sysinv.conductor.kube_app
At 09:12:29.978 oidc-auth-apps failed to upload
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1928, in perform_app_upload
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-0) failed: Failed to validate application manifest.
2020-05-21 09:12:29.978 104305 ERROR sysinv.conductor.kube_app
Around 2020-05-21T09:12:04
The cert-manager pod container started throwing these errors in a loop:
2020-05-21T09:12:04.317805242Z stderr F E0521 09:12:04.317715 1 dynamic_source.go:87] "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000
2020-05-21T09:12:05.308809217Z stderr F I0521 09:12:05.308645 1 dynamic_source.go:171] "msg"="Generating new ECDSA private key"
2020-05-21T09:12:05.315177628Z stderr F I0521 09:12:05.315036 1 dynamic_source.go:186] "msg"="Signing new serving certificate"
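For readers unfamiliar with the "failed to find any PEM data in certificate input" message: it means the certificate bytes handed to the TLS code contained no "-----BEGIN ...-----" / "-----END ...-----" block at all. The sketch below only approximates that condition in Python. The function name has_pem_block and the regular expression are my own illustration, not cert-manager code, and the logs above do not show where the CA keypair was read from or why it was empty.

```python
import re

# Rough approximation of "does this input contain any PEM block?".
# Hypothetical helper for illustration only; this is not how cert-manager
# or the Go TLS library implements the check.
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----\s.*?-----END \1-----",
    re.DOTALL,
)


def has_pem_block(data: str) -> bool:
    """Return True if at least one BEGIN/END pair with matching labels is found."""
    return PEM_BLOCK.search(data) is not None


if __name__ == "__main__":
    # An empty or non-PEM input corresponds to the error seen in the log.
    print(has_pem_block(""))
    print(has_pem_block("-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----\n"))
```

An empty or placeholder CA certificate would fail this check in the same way the log line reports, but that is only one possible explanation.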
Around 2020-05-21T09:12:09Z
Things started to go wrong with one of the cert-manager application pods:
2020-05-21T09:12:09Z cm-cert-manager-webhook-7d5c897795-tstjz Pod Readiness probe failed: HTTP probe failed with statuscode: 500 Unhealthy Warning
2020-05-21T09:12:10Z calico-kube-controllers-5cd4695574-mtspd Pod Container image "registry.local:9001/quay.io/calico/kube-controllers:v3.12.0" already present on machine Pulled Normal
2020-05-21T09:12:10Z coredns-78d9fd7cb9-q5nxw Pod Readiness probe failed: HTTP probe failed with statuscode: 503 Unhealthy Warning
2020-05-21T09:12:10Z calico-kube-controllers-5cd4695574-mtspd Pod Started container calico-kube-controllers Started Normal
2020-05-21T09:12:10Z calico-kube-controllers-5cd4695574-mtspd Pod Created container calico-kube-controllers Created Normal
2020-05-21T09:12:12Z calico-kube-controllers-5cd4695574-mtspd Pod Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory Unhealthy Warning
2020-05-21T09:12:21Z calico-kube-controllers-5cd4695574-mtspd Pod Back-off restarting failed container BackOff Warning
2020-05-21T09:12:24Z kube-scheduler Lease controller-0_df95482f-1ac0-474a-a491-c503622f57d1 became leader LeaderElection Normal
2020-05-21T09:12:24Z coredns-78d9fd7cb9-lsv8g Pod 0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules. FailedScheduling Warning
2020-05-21T09:12:24Z kube-scheduler Endpoints controller-0_df95482f-1ac0-474a-a491-c503622f57d1 became leader LeaderElection Normal
2020-05-21T09:12:32Z cm-cert-manager-cainjector-56b68989b5-8xrw6 Pod Back-off restarting failed container BackOff Warning
2020-05-21T09:12:33Z cert-manager-controller ConfigMap cm-cert-manager-7b8b94bf9f-v5cmx-external-cert-manager-controller became leader LeaderElection Normal
2020-05-21T09:12:34Z platform-deployment-manager-0 Pod Back-off restarting failed container BackOff Warning
Some issues with disk capacity appeared around 2020-05-21T09:38:34. These timestamps are later than the application upload failures, so my guess is that they can't be the cause:
2020-05-21T09:38:34Z coredns-78d9fd7cb9-lsv8g Pod Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "07b16d823331e1ac4b326b65c01737d4eb2ce258835c2a5d74b4899664d78fc6": Multus: [kube-system/coredns-78d9fd7cb9-lsv8g]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/ FailedCreatePodSandBox Warning
2020-05-21T09:38:37Z ic-nginx-ingress-controller-hbs8r Pod Successfully pulled image "registry.local:9001/quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0" Pulled Normal
2020-05-21T09:38:39Z ic-nginx-ingress-controller-hbs8r Pod Created container nginx-ingress-controller Created Normal
2020-05-21T09:38:39Z ic-nginx-ingress-controller-hbs8r Pod Started container nginx-ingress-controller Started Normal
2020-05-21T09:38:41Z ic-nginx-ingress-controller ConfigMap ConfigMap kube-system/ic-nginx-ingress-controller CREATE Normal
2020-05-21T09:38:48Z calico-node-dpfzj Pod Successfully pulled image "registry.local:9001/quay.io/calico/node:v3.12.0" Pulled Normal
2020-05-21T09:38:55Z controller-1 Node Starting kubelet. Starting Normal
2020-05-21T09:38:55Z controller-1 Node invalid capacity 0 on image filesystem InvalidDiskCapacity Warning
2020-05-21T09:38:55Z controller-1 Node invalid capacity 0 on image filesystem ImageGCFailed Warning
My only conclusion so far is that the cert-manager malfunction is the cause of the failed application uploads.
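The ordering argument behind this conclusion can be checked mechanically. The sketch below sorts a handful of timestamps copied from the excerpts above. The short event labels are my own shorthand, and it assumes the sysinv log and the Kubernetes events use the same clock and time zone, which the logs themselves do not confirm. An earlier timestamp is compatible with causation but does not prove it.

```python
from datetime import datetime

# Timestamps copied from the log excerpts; labels are illustrative shorthand.
# Assumption: sysinv (no zone suffix) and Kubernetes ("Z") logs share one clock.
EVENTS = [
    ("platform-integ-apps upload failed", "2020-05-21T09:12:29.137"),
    ("oidc-auth-apps upload failed", "2020-05-21T09:12:29.978"),
    ("cert-manager PEM error", "2020-05-21T09:12:04.317"),
    ("cert-manager webhook probe failed", "2020-05-21T09:12:09"),
    ("coredns sandbox creation failed", "2020-05-21T09:38:34"),
    ("controller-1 InvalidDiskCapacity", "2020-05-21T09:38:55"),
]


def parse(ts):
    """Parse a timestamp with or without fractional seconds."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in ts else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(ts, fmt)


def timeline(events):
    """Return (time, label) pairs sorted chronologically."""
    return sorted((parse(ts), label) for label, ts in events)


def precedes(events, earlier, later):
    """True if the event labelled `earlier` happened before `later`."""
    times = {label: parse(ts) for label, ts in events}
    return times[earlier] < times[later]


if __name__ == "__main__":
    for when, label in timeline(EVENTS):
        print(when.isoformat(), label)
```

Under these assumptions the cert-manager errors come first, the two upload failures follow about 25 seconds later, and the disk capacity warnings arrive roughly 26 minutes after that.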