Comment 2 for bug 1842780

Dmitrii Shcherbakov (dmitriis) wrote :

I used `juju run --unit kubernetes-master/0 'sleep 1000'` and `juju run --unit etcd/0 'sleep 1000'` to simulate hook executions (although their results do not seem to be logged in /var/log/juju/machine-lock.log).

http://paste.ubuntu.com/p/2NQykJkJDK/ (/var/log/juju/machine-lock.log)

Based on the juju_machine_lock output below, there is only one holder of the lock (unit-etcd-0), which is correct.

From a usability perspective, however, juju status shows three units on machine 0 (easyrsa/1, etcd/0 and kubernetes-master/0) as "executing" at the same time, which I think makes it harder to find out which agent is really executing something and holding the lock. Should the state be "executing" only when a lock is actually held?

I am asking because in large models it is difficult to spot units that endlessly execute something due to charm bugs and so prevent the model from making progress. If only one unit per machine were in that state in the Juju status output at any given time, it would be a little more straightforward.
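
As a rough illustration of the kind of check this currently requires, here is a minimal Python sketch that reads juju_machine_lock output and separates the agent that actually holds the lock from those that are only waiting. It assumes the text layout shown in the juju_machine_lock output in this comment; the function name `parse_machine_lock` and the `SAMPLE` excerpt are made up for the example and are not part of Juju.

```python
# Sketch only: assumes the juju_machine_lock text layout seen in this comment.
SAMPLE = """\
machine-0:
  holder: none
unit-easyrsa-1:
  holder: none
  waiting:
  - uniter (run relation-changed (3; kubernetes-master/0) hook), waiting 2m3s
unit-etcd-0:
  holder: uniter (run action 1ee189f9-5ac2-4090-8f12-4792e36458a8), holding 2m0s
unit-kubernetes-master-0:
  holder: none
  waiting:
  - uniter (run action b4625c02-e507-4a2e-8980-c6f74f9ae213), waiting 2m36s
"""


def parse_machine_lock(text):
    """Return (holders, waiters) dictionaries keyed by agent name."""
    holders = {}
    waiters = {}
    agent = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace() and line.endswith(":"):
            # An unindented "name:" line starts a new agent section.
            agent = line[:-1]
        elif agent is None:
            continue
        elif line.startswith("holder:"):
            value = line[len("holder:"):].strip()
            if value != "none":
                holders[agent] = value
        elif line.startswith("- "):
            # Entries listed under "waiting:" for the current agent.
            waiters.setdefault(agent, []).append(line[2:])
    return holders, waiters


if __name__ == "__main__":
    holders, waiters = parse_machine_lock(SAMPLE)
    for name, what in holders.items():
        print("HOLDING", name, what)
    for name, items in waiters.items():
        for what in items:
            print("WAITING", name, what)
```

On a real machine the input would come from the juju_machine_lock command rather than a hard-coded string.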

juju status
Model  Controller          Cloud/Region                   Version    SLA          Timestamp
k8s    canonistack-manual  canonistack/canonistack-bos01  2.7-beta1  unsupported  16:18:34Z

App                   Version  Status  Scale  Charm                 Store       Rev  OS      Notes
containerd                     active  3      containerd            jujucharms  20   ubuntu
easyrsa               3.0.1    active  1      easyrsa               jujucharms  270  ubuntu
etcd                  3.2.10   active  1      etcd                  jujucharms  449  ubuntu
flannel               0.10.0   active  3      flannel               jujucharms  438  ubuntu
kubernetes-master     1.15.3   active  1      kubernetes-master     jujucharms  724  ubuntu  exposed
kubernetes-worker     1.15.3   active  2      kubernetes-worker     jujucharms  571  ubuntu  exposed
openstack-integrator  rocky    active  1      openstack-integrator  jujucharms  26   ubuntu

Unit                     Workload  Agent      Machine  Public address  Ports           Message
easyrsa/1*               active    executing  0        10.48.132.145                   Certificate Authority connected.
etcd/0*                  active    executing  0        10.48.132.145   2379/tcp        (juju-run) Healthy with 1 known peer
kubernetes-master/0*     active    executing  0        10.48.132.145   6443/tcp        (juju-run) Kubernetes master running.
  containerd/1           active    idle                10.48.132.145                   Container runtime available.
  flannel/1              active    idle                10.48.132.145                   Flannel subnet 10.1.88.1/24
kubernetes-worker/0*     active    idle       1        10.48.130.251   80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*          active    idle                10.48.130.251                   Container runtime available.
  flannel/0*             active    idle                10.48.130.251                   Flannel subnet 10.1.96.1/24
kubernetes-worker/1      active    idle       2        10.48.131.75    80/tcp,443/tcp  Kubernetes worker running.
  containerd/2           active    idle                10.48.131.75                    Container runtime available.
  flannel/2              active    idle                10.48.131.75                    Flannel subnet 10.1.7.1/24
openstack-integrator/0*  active    idle       0        10.48.132.145                   ready

Machine  State    DNS            Inst id                               Series  AZ    Message
0        started  10.48.132.145  5c4bb239-f059-420f-90bb-e173f8e0bc1a  bionic  nova  ACTIVE
1        started  10.48.130.251  b7dfd58a-d8e3-4260-af7a-26d9435857c0  bionic  nova  ACTIVE
2        started  10.48.131.75   e8d8bd7c-0ef4-495a-9e6d-266a0dd2a088  bionic  nova  ACTIVE

 date ; juju_machine_lock ; tail -n30 -f /var/log/juju/machine-lock.log
Fri Sep 6 16:18:59 UTC 2019
machine-0:
  holder: none
unit-containerd-1:
  holder: none
  waiting:
  - uniter (run update-status hook), waiting 20m44s
unit-easyrsa-1:
  holder: none
  waiting:
  - uniter (run relation-changed (3; kubernetes-master/0) hook), waiting 2m3s
unit-etcd-0:
  holder: uniter (run action 1ee189f9-5ac2-4090-8f12-4792e36458a8), holding 2m0s
unit-flannel-1:
  holder: none
unit-kubernetes-master-0:
  holder: none
  waiting:
  - uniter (run action b4625c02-e507-4a2e-8980-c6f74f9ae213), waiting 2m36s
unit-openstack-integrator-0:
  holder: none
  waiting:
  - uniter (run update-status hook), waiting 7m47s
2019-09-06 15:45:12 unit-etcd-0: uniter (run update-status hook), waited 0s, held 9s
2019-09-06 15:46:21 unit-openstack-integrator-0: uniter (run update-status hook), waited 0s, held 8s
2019-09-06 15:46:41 unit-easyrsa-1: uniter (run relation-changed (5; kubernetes-worker/1) hook), waited 0s, held 8s
2019-09-06 15:46:47 unit-flannel-1: uniter (run update-status hook), waited 0s, held 2s
2019-09-06 15:47:01 unit-easyrsa-1: uniter (run relation-changed (5; kubernetes-worker/0) hook), waited 0s, held 11s
2019-09-06 15:47:28 unit-containerd-1: uniter (run update-status hook), waited 0s, held 2s
2019-09-06 15:48:18 unit-easyrsa-1: uniter (run update-status hook), waited 0s, held 5s
2019-09-06 15:48:56 unit-kubernetes-master-0: uniter (run update-status hook), waited 6s, held 38s
2019-09-06 15:49:08 unit-easyrsa-1: uniter (run relation-changed (3; kubernetes-master/0) hook), waited 0s, held 8s
2019-09-06 15:51:07 unit-etcd-0: uniter (run update-status hook), waited 0s, held 7s
2019-09-06 15:51:24 unit-openstack-integrator-0: uniter (run update-status hook), waited 0s, held 5s
2019-09-06 15:51:44 unit-easyrsa-1: uniter (run relation-changed (5; kubernetes-worker/1) hook), waited 0s, held 9s
2019-09-06 15:52:24 unit-containerd-1: uniter (run update-status hook), waited 0s, held 3s
2019-09-06 15:52:39 unit-flannel-1: uniter (run update-status hook), waited 0s, held 7s
2019-09-06 15:53:55 unit-easyrsa-1: uniter (run relation-changed (5; kubernetes-worker/0) hook), waited 0s, held 1m15s
2019-09-06 15:54:15 unit-easyrsa-1: uniter (run update-status hook), waited 0s, held 20s
2019-09-06 15:57:10 unit-kubernetes-master-0: uniter (run update-status hook), waited 41s, held 2m54s
2019-09-06 15:58:45 unit-etcd-0: uniter (run update-status hook), waited 1m32s, held 1m36s
2019-09-06 15:59:27 unit-openstack-integrator-0: uniter (run update-status hook), waited 2m17s, held 41s
2019-09-06 15:59:33 unit-easyrsa-1: uniter (run relation-changed (3; kubernetes-master/0) hook), waited 2m46s, held 6s
2019-09-06 15:59:37 unit-flannel-1: uniter (run update-status hook), waited 2m35s, held 4s
2019-09-06 16:06:12 unit-kubernetes-master-0: uniter (run update-status hook), waited 1m54s, held 6m35s
2019-09-06 16:06:23 unit-etcd-0: uniter (run update-status hook), waited 4m54s, held 10s
2019-09-06 16:10:16 unit-kubernetes-master-0: uniter (run update-status hook), waited 10s, held 3m53s
2019-09-06 16:11:13 unit-openstack-integrator-0: uniter (run update-status hook), waited 8m14s, held 57s
2019-09-06 16:16:23 unit-kubernetes-master-0: uniter (run action 1f8f2b8f-ed5c-4a07-8c55-07d146f95aaa), waited 57s, held 5m11s
2019-09-06 16:16:46 unit-etcd-0: uniter (run update-status hook), waited 9m54s, held 23s
2019-09-06 16:16:50 unit-flannel-1: uniter (run update-status hook), waited 14m59s, held 4s
2019-09-06 16:16:56 unit-easyrsa-1: uniter (run relation-changed (5; kubernetes-worker/0) hook), waited 17m17s, held 5s
2019-09-06 16:16:59 unit-flannel-1: uniter (run update-status hook), waited 6s, held 4s
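
To make the long waits and holds in the log above easier to spot, the following Python sketch parses machine-lock.log entries and converts the "waited" and "held" durations to seconds. It assumes the line layout seen in this comment and only handles durations made of h, m and s parts; the names `parse_log` and `to_seconds` are hypothetical helpers, not Juju tooling.

```python
import re

# Sketch only: matches lines shaped like the machine-lock.log entries above.
LINE_RE = re.compile(r"^(\S+ \S+) (\S+): (.+), waited (\S+), held (\S+)$")
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}

SAMPLE_LINES = [
    "2019-09-06 15:45:12 unit-etcd-0: uniter (run update-status hook), waited 0s, held 9s",
    "2019-09-06 16:06:12 unit-kubernetes-master-0: uniter (run update-status hook), waited 1m54s, held 6m35s",
    "2019-09-06 16:16:56 unit-easyrsa-1: uniter (run relation-changed (5; kubernetes-worker/0) hook), waited 17m17s, held 5s",
]


def to_seconds(duration):
    """Convert a duration such as '6m35s' to whole seconds."""
    parts = re.findall(r"(\d+)([hms])", duration)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


def parse_log(lines):
    """Return one dict per log line that matches the expected layout."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        timestamp, agent, what, waited, held = match.groups()
        entries.append({
            "time": timestamp,
            "agent": agent,
            "what": what,
            "waited": to_seconds(waited),
            "held": to_seconds(held),
        })
    return entries


if __name__ == "__main__":
    entries = parse_log(SAMPLE_LINES)
    longest_hold = max(entries, key=lambda e: e["held"])
    longest_wait = max(entries, key=lambda e: e["waited"])
    print("longest hold:", longest_hold["agent"], longest_hold["held"], "s")
    print("longest wait:", longest_wait["agent"], longest_wait["waited"], "s")
```

Sorting the parsed entries by "held" or "waited" gives a quick view of which unit kept the machine lock longest and which units were starved while it did.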