I managed to reproduce this issue on a VBox AIO-SX setup, but the error I got is different. It should be noted that the error in the original LP and the one Wendy encountered are also different from each other (and from mine). It seems that in some cases the K8s services start later than usual, and the apply/reapply checks in sysinv don't account for that. I received a name resolution error because the coredns pod started only after the re-apply of platform-integ-apps was triggered. My theory is that, depending on which pods are running or not running at the moment the platform-integ-apps apply is triggered, we get different error messages. This issue is also very hard to reproduce; on the VBox SX I am able to hit it roughly once every 10-15 lock/unlock cycles. (A minimal sketch of the kind of readiness check sysinv could do before triggering the apply is included right after the sysinv log below.)

Logs from when I reproduced the issue:

sysinv.log

sysinv 2019-11-13 11:38:08.422 98276 INFO sysinv.conductor.kube_app [-] Starting Armada service...
sysinv 2019-11-13 11:38:08.424 98276 INFO sysinv.conductor.kube_app [-] kube_config=/opt/platform/armada/19.09/admin.conf, manifests_dir=/opt/platform/armada/19.09, overrides_dir=/opt/platform/helm/19.09, logs_dir=/var/log/armada.
sysinv 2019-11-13 11:38:08.824 98276 INFO sysinv.conductor.kube_app [-] Armada service started!
sysinv 2019-11-13 11:38:08.825 98276 INFO sysinv.conductor.kube_app [-] Armada apply command = /bin/bash -c 'set -o pipefail; armada apply --enable-chart-cleanup --debug /manifests/platform-integ-apps/1.0-8/platform-integ-apps-manifest.yaml --values /overrides/platform-integ-apps/1.0-8/kube-system-rbd-provisioner.yaml --values /overrides/platform-integ-apps/1.0-8/kube-system-ceph-pools-audit.yaml --values /overrides/platform-integ-apps/1.0-8/helm-toolkit-helm-toolkit.yaml --tiller-host tiller-deploy.kube-system.svc.cluster.local | tee /logs/platform-integ-apps-apply_2019-11-13-11-38-08.log'
sysinv 2019-11-13 11:38:09.420 98276 INFO sysinv.conductor.kube_app [-] Starting progress monitoring thread for app platform-integ-apps
sysinv 2019-11-13 11:38:16.833 98276 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/platform-integ-apps/1.0-8/platform-integ-apps-manifest.yaml. See /var/log/armada/platform-integ-apps-apply_2019-11-13-11-38-08.log for details.
sysinv 2019-11-13 11:38:16.835 98276 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app platform-integ-apps
sysinv 2019-11-13 11:38:17.017 98276 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.
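For illustration only, here is a minimal sketch (not the actual sysinv code) of the kind of readiness gate that could run before the armada apply is triggered. It assumes the kubernetes Python client is available and that coredns and tiller carry their usual kube-system labels (k8s-app=kube-dns, and app=helm plus name=tiller); those selectors are assumptions, not something taken from the logs:

# Hypothetical pre-apply readiness check (sketch only, not the sysinv implementation).
import time

from kubernetes import client, config


def pods_ready(core_v1, namespace, label_selector):
    """Return True if at least one pod matching the selector is Running and all
    of its containers report ready."""
    pods = core_v1.list_namespaced_pod(namespace, label_selector=label_selector).items
    for pod in pods:
        if pod.status.phase != "Running":
            continue
        statuses = pod.status.container_statuses or []
        if statuses and all(cs.ready for cs in statuses):
            return True
    return False


def wait_for_kube_services(kubeconfig="/opt/platform/armada/19.09/admin.conf",
                           timeout=300, interval=5):
    """Block until coredns and tiller look ready, or raise after the timeout."""
    config.load_kube_config(config_file=kubeconfig)
    core_v1 = client.CoreV1Api()
    # Label selectors are assumptions based on the default kubeadm/helm labels.
    selectors = ["k8s-app=kube-dns",       # coredns
                 "app=helm,name=tiller"]   # tiller-deploy
    deadline = time.time() + timeout
    while time.time() < deadline:
        if all(pods_ready(core_v1, "kube-system", s) for s in selectors):
            return
        time.sleep(interval)
    raise RuntimeError("kube-system services not ready; deferring armada apply")

With a gate like this, the apply would be deferred or retried instead of failing immediately with whichever error the first missing service happens to produce.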
Pod states during this time (note coredns marked as Completed):

Wed Nov 13 11:38:08 UTC 2019
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-7f985db75c-d5xml   0/1     Error              9          21h
calico-node-zld2m                          0/1     CrashLoopBackOff   20         21h
ceph-pools-audit-1573644000-8v4w8          0/1     Completed          0          18m
ceph-pools-audit-1573644300-wc2st          0/1     Completed          0          10m
ceph-pools-audit-1573644600-9lqv7          0/1     Completed          0          8m1s
coredns-6889846b6b-5nmng                   0/1     Completed          9          21h
kube-apiserver-controller-0                1/1     Running            10         21h
kube-controller-manager-controller-0       1/1     Running            19         21h
kube-multus-ds-amd64-76gj8                 1/1     Running            10         21h
kube-proxy-kzjmt                           1/1     Running            10         21h
kube-scheduler-controller-0                1/1     Running            19         21h
kube-sriov-cni-ds-amd64-kkhpx              1/1     Running            10         21h
rbd-provisioner-7484d49cf6-wgwhh           0/1     Error              7          20h
storage-init-rbd-provisioner-84qfw         0/1     Completed          0          20h
tiller-deploy-d6b59fcb-cgv4p               1/1     Running            7          123m

Wed Nov 13 11:38:10 UTC 2019
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-7f985db75c-d5xml   0/1     Error              9          21h
calico-node-zld2m                          0/1     CrashLoopBackOff   20         21h
ceph-pools-audit-1573644000-8v4w8          0/1     Completed          0          18m
ceph-pools-audit-1573644300-wc2st          0/1     Completed          0          10m
ceph-pools-audit-1573644600-9lqv7          0/1     Completed          0          8m3s
coredns-6889846b6b-5nmng                   0/1     Completed          9          21h
kube-apiserver-controller-0                1/1     Running            10         21h
kube-controller-manager-controller-0       1/1     Running            19         21h
kube-multus-ds-amd64-76gj8                 1/1     Running            10         21h
kube-proxy-kzjmt                           1/1     Running            10         21h
kube-scheduler-controller-0                1/1     Running            19         21h
kube-sriov-cni-ds-amd64-kkhpx              1/1     Running            10         21h
rbd-provisioner-7484d49cf6-wgwhh           0/1     Error              7          20h
storage-init-rbd-provisioner-84qfw         0/1     Completed          0          20h
tiller-deploy-d6b59fcb-cgv4p               1/1     Running            7          123m

Wed Nov 13 11:38:12 UTC 2019
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-7f985db75c-d5xml   0/1     Error              9          21h
calico-node-zld2m                          0/1     CrashLoopBackOff   20         21h
ceph-pools-audit-1573644000-8v4w8          0/1     Completed          0          18m
ceph-pools-audit-1573644300-wc2st          0/1     Completed          0          10m
ceph-pools-audit-1573644600-9lqv7          0/1     Completed          0          8m6s
coredns-6889846b6b-5nmng                   0/1     Completed          9          21h
kube-apiserver-controller-0                1/1     Running            10         21h
kube-controller-manager-controller-0       1/1     Running            19         21h
kube-multus-ds-amd64-76gj8                 1/1     Running            10         21h
kube-proxy-kzjmt                           1/1     Running            10         21h
kube-scheduler-controller-0                1/1     Running            19         21h
kube-sriov-cni-ds-amd64-kkhpx              1/1     Running            10         21h
rbd-provisioner-7484d49cf6-wgwhh           0/1     Error              7          20h
storage-init-rbd-provisioner-84qfw         0/1     Completed          0          20h
tiller-deploy-d6b59fcb-cgv4p               1/1     Running            7          123m

Wed Nov 13 11:38:15 UTC 2019
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-7f985db75c-d5xml   0/1     Error              9          21h
calico-node-zld2m                          0/1     CrashLoopBackOff   20         21h
ceph-pools-audit-1573644000-8v4w8          0/1     Completed          0          18m
ceph-pools-audit-1573644300-wc2st          0/1     Completed          0          10m
ceph-pools-audit-1573644600-9lqv7          0/1     Completed          0          8m8s
coredns-6889846b6b-5nmng                   0/1     Completed          9          21h
kube-apiserver-controller-0                1/1     Running            10         21h
kube-controller-manager-controller-0       1/1     Running            19         21h
kube-multus-ds-amd64-76gj8                 1/1     Running            10         21h
kube-proxy-kzjmt                           1/1     Running            10         21h
kube-scheduler-controller-0                1/1     Running            19         21h
kube-sriov-cni-ds-amd64-kkhpx              1/1     Running            10         21h
rbd-provisioner-7484d49cf6-wgwhh           0/1     Error              7          20h
storage-init-rbd-provisioner-84qfw         0/1     Completed          0          20h
tiller-deploy-d6b59fcb-cgv4p               1/1     Running            7          123m

Wed Nov 13 11:38:17 UTC 2019
NAME                                       READY   STATUS             RESTARTS   AGE
calico-kube-controllers-7f985db75c-d5xml   0/1     Error              9          21h
calico-node-zld2m                          0/1     CrashLoopBackOff   20         21h
ceph-pools-audit-1573644000-8v4w8          0/1     Completed          0          18m
ceph-pools-audit-1573644300-wc2st          0/1     Completed          0          10m
ceph-pools-audit-1573644600-9lqv7          0/1     Completed          0          8m10s
coredns-6889846b6b-5nmng                   0/1     Completed          9          21h
kube-apiserver-controller-0                1/1     Running            10         21h
kube-controller-manager-controller-0       1/1     Running            19         21h
kube-multus-ds-amd64-76gj8                 1/1     Running            10         21h
kube-proxy-kzjmt                           1/1     Running            10         21h
kube-scheduler-controller-0                1/1     Running            19         21h
kube-sriov-cni-ds-amd64-kkhpx              1/1     Running            10         21h
rbd-provisioner-7484d49cf6-wgwhh           0/1     Error              7          20h
storage-init-rbd-provisioner-84qfw         0/1     Completed          0          20h
tiller-deploy-d6b59fcb-cgv4p               1/1     Running            7          123m

Note: "kubectl -n kube-system get pods" didn't start working until Wed Nov 13 11:37:54 UTC 2019.

Error in armada log:

2019-11-13 11:38:10.770 16 DEBUG armada.handlers.tiller [-] Tiller ListReleases() with timeout=300, request=limit: 32 status_codes: UNKNOWN status_codes: DEPLOYED status_codes: DELETED status_codes: DELETING status_codes: FAILED status_codes: PENDING_INSTALL status_codes: PENDING_UPGRADE status_codes: PENDING_ROLLBACK get_results /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:215
2019-11-13 11:38:16.646 16 INFO armada.handlers.lock [-] Releasing lock
2019-11-13 11:38:16.654 16 ERROR armada.cli [-] Caught unexpected exception: grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "Name resolution failure"
	debug_error_string = "{"created":"@1573645095.806259192","description":"Failed to create subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2721,"referenced_errors":[{"created":"@1573645095.806254835","description":"Name resolution failure","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3026,"grpc_status":14}]}"
>
2019-11-13 11:38:16.654 16 ERROR armada.cli Traceback (most recent call last):
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/cli/__init__.py", line 38, in safe_invoke
2019-11-13 11:38:16.654 16 ERROR armada.cli     self.invoke()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 213, in invoke
2019-11-13 11:38:16.654 16 ERROR armada.cli     resp = self.handle(documents, tiller)
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/handlers/lock.py", line 81, in func_wrapper
2019-11-13 11:38:16.654 16 ERROR armada.cli     return future.result()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
2019-11-13 11:38:16.654 16 ERROR armada.cli     return self.__get_result()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2019-11-13 11:38:16.654 16 ERROR armada.cli     raise self._exception
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
2019-11-13 11:38:16.654 16 ERROR armada.cli     result = self.fn(*self.args, **self.kwargs)
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 256, in handle
2019-11-13 11:38:16.654 16 ERROR armada.cli     return armada.sync()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 189, in sync
2019-11-13 11:38:16.654 16 ERROR armada.cli     known_releases = self.tiller.list_releases()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 252, in list_releases
2019-11-13 11:38:16.654 16 ERROR armada.cli     releases = get_results()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py", line 220, in get_results
2019-11-13 11:38:16.654 16 ERROR armada.cli     for message in response:
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 364, in __next__
2019-11-13 11:38:16.654 16 ERROR armada.cli     return self._next()
2019-11-13 11:38:16.654 16 ERROR armada.cli   File "/usr/local/lib/python3.6/dist-packages/grpc/_channel.py", line 358, in _next
2019-11-13 11:38:16.654 16 ERROR armada.cli     raise self
2019-11-13 11:38:16.654 16 ERROR armada.cli grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with:
2019-11-13 11:38:16.654 16 ERROR armada.cli 	status = StatusCode.UNAVAILABLE
2019-11-13 11:38:16.654 16 ERROR armada.cli 	details = "Name resolution failure"
2019-11-13 11:38:16.654 16 ERROR armada.cli 	debug_error_string = "{"created":"@1573645095.806259192","description":"Failed to create subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2721,"referenced_errors":[{"created":"@1573645095.806254835","description":"Name resolution failure","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3026,"grpc_status":14}]}"
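The failing call is armada resolving tiller-deploy.kube-system.svc.cluster.local through the cluster DNS, i.e. the coredns pod shown as Completed above. As a quick illustration of the kind of guard that would catch this (a sketch only; it assumes the process doing the check resolves *.svc.cluster.local names via the cluster DNS, which may require running it from inside a pod):

# Hypothetical pre-check: don't launch the apply until the tiller service name resolves.
import socket
import time


def wait_for_name_resolution(host="tiller-deploy.kube-system.svc.cluster.local",
                             timeout=120, interval=5):
    """Return once the service name resolves, or raise after the timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            socket.getaddrinfo(host, 44134)  # 44134 is tiller's default gRPC port
            return
        except socket.gaierror:
            time.sleep(interval)
    raise RuntimeError("%s does not resolve yet; coredns is probably not ready" % host)

Until that name resolves, every apply/re-apply attempt will fail with the StatusCode.UNAVAILABLE / "Name resolution failure" shown above, so deferring or retrying the re-apply until coredns is ready looks like the right direction for a fix.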