Multiple unit agents shown as "executing" on the same machine

Bug #1842780 reported by Dmitrii Shcherbakov
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Low         Unassigned

Bug Description

juju snap built from commit bff30a662f789f8ff0124ad042dc1311f443c9dc

I have seen this behavior on stable versions as well.

In summary, multiple unit agents can be shown as "executing" at the same time. I have not actually seen them doing work in parallel, which is expected, since hook execution is gated by a machine-level lock.

I wonder whether the agent status reporting is correct, though.
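
To illustrate the ordering I suspect, here is a minimal Go sketch (the names and structure are my assumptions, not Juju's actual uniter code): each unit agent reports "executing" before it acquires the machine-level lock, so every queued unit on the machine shows as executing even though the hooks still run one at a time.

package main

import (
	"fmt"
	"sync"
	"time"
)

// machineLock stands in for Juju's machine-level hook execution lock.
var machineLock sync.Mutex

// runHook mimics the suspected ordering: status is reported before the
// lock is held, so units queued behind the lock also show "executing".
func runHook(unit string, setStatus func(unit, status string)) {
	setStatus(unit, "executing") // reported before the lock is acquired
	machineLock.Lock()           // may block behind another unit's hook
	defer machineLock.Unlock()
	time.Sleep(100 * time.Millisecond) // stand-in for the hook itself
	setStatus(unit, "idle")
}

func main() {
	setStatus := func(unit, status string) { fmt.Println(unit, "->", status) }
	var wg sync.WaitGroup
	for _, u := range []string{"easyrsa/1", "etcd/0", "kubernetes-master/0"} {
		wg.Add(1)
		go func(u string) { defer wg.Done(); runHook(u, setStatus) }(u)
	}
	// All three units print "executing" almost immediately, while the
	// hook bodies still execute serially under machineLock.
	wg.Wait()
}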

ubuntu@jujuc:~$ juju status
Model  Controller          Cloud/Region                   Version    SLA          Timestamp
k8s    canonistack-manual  canonistack/canonistack-bos01  2.7-beta1  unsupported  03:12:32Z

App                   Version  Status   Scale  Charm                 Store       Rev  OS      Notes
containerd                     active       3  containerd            jujucharms   20  ubuntu
easyrsa               3.0.1    active       1  easyrsa               jujucharms  270  ubuntu
etcd                  3.2.10   active       1  etcd                  jujucharms  449  ubuntu
flannel               0.10.0   active       3  flannel               jujucharms  438  ubuntu
kubernetes-master     1.15.3   active       1  kubernetes-master     jujucharms  724  ubuntu  exposed
kubernetes-worker     1.15.3   waiting      2  kubernetes-worker     jujucharms  571  ubuntu  exposed
openstack-integrator  rocky    active       1  openstack-integrator  jujucharms   26  ubuntu

Unit                     Workload  Agent      Machine  Public address  Ports           Message
easyrsa/1*               active    executing  0        10.48.132.145                   Certificate Authority connected.
etcd/0*                  active    idle       0        10.48.132.145   2379/tcp        Healthy with 1 known peer
kubernetes-master/0*     active    executing  0        10.48.132.145   6443/tcp        Kubernetes master running.
  containerd/1           active    idle                10.48.132.145                   Container runtime available.
  flannel/1              active    idle                10.48.132.145                   Flannel subnet 10.1.88.1/24
kubernetes-worker/0*     waiting   idle       1        10.48.130.251   80/tcp,443/tcp  Waiting for cloud integration
  containerd/0*          active    idle                10.48.130.251                   Container runtime available.
  flannel/0*             active    idle                10.48.130.251                   Flannel subnet 10.1.96.1/24
kubernetes-worker/1      waiting   idle       2        10.48.131.75    80/tcp,443/tcp  Waiting for cloud integration
  containerd/2           active    idle                10.48.131.75                    Container runtime available.
  flannel/2              active    idle                10.48.131.75                    Flannel subnet 10.1.7.1/24
openstack-integrator/0*  active    executing  0        10.48.132.145                   ready

Machine  State    DNS            Inst id                               Series  AZ    Message
0        started  10.48.132.145  5c4bb239-f059-420f-90bb-e173f8e0bc1a  bionic  nova  ACTIVE
1        started  10.48.130.251  b7dfd58a-d8e3-4260-af7a-26d9435857c0  bionic  nova  ACTIVE
2        started  10.48.131.75   e8d8bd7c-0ef4-495a-9e6d-266a0dd2a088  bionic  nova  ACTIVE

Tim Penhey (thumper) wrote:

It may well be that the status is set to "executing" before the agent actually acquires the hook execution lock.

You can go onto the machine and look at /var/log/juju/machine-lock.log to see what actually happened.

Also, the juju_machine_lock command will show, at any time, who is waiting for the lock and what for.

Changed in juju:
status: New → Incomplete
Dmitrii Shcherbakov (dmitriis) wrote:

I used `juju run --unit kubernetes-master/0 'sleep 1000'` and `juju run --unit etcd/0 'sleep 1000'` to simulate hook executions (although those runs do not seem to get logged in /var/log/juju/machine-lock.log).

http://paste.ubuntu.com/p/2NQykJkJDK/ (/var/log/juju/machine-lock.log)

Based on the juju_machine_lock output below, there is only one holder of the lock, which is correct.

From a usability perspective, this behavior makes it harder to tell which agent is actually executing something and holding the lock. Should the state be "executing" only when the lock is actually held?

I am asking because, in large models, it is difficult to spot units that execute endlessly due to charm bugs and prevent a model from progressing. If only one unit per machine can be in that state in juju status output at any given time, spotting them is a little more straightforward.

juju status
Model  Controller          Cloud/Region                   Version    SLA          Timestamp
k8s    canonistack-manual  canonistack/canonistack-bos01  2.7-beta1  unsupported  16:18:34Z

App                   Version  Status  Scale  Charm                 Store       Rev  OS      Notes
containerd                     active      3  containerd            jujucharms   20  ubuntu
easyrsa               3.0.1    active      1  easyrsa               jujucharms  270  ubuntu
etcd                  3.2.10   active      1  etcd                  jujucharms  449  ubuntu
flannel               0.10.0   active      3  flannel               jujucharms  438  ubuntu
kubernetes-master     1.15.3   active      1  kubernetes-master     jujucharms  724  ubuntu  exposed
kubernetes-worker     1.15.3   active      2  kubernetes-worker     jujucharms  571  ubuntu  exposed
openstack-integrator  rocky    active      1  openstack-integrator  jujucharms   26  ubuntu

Unit                  Workload  Agent      Machine  Public address  Ports           Message
easyrsa/1*            active    executing  0        10.48.132.145                   Certificate Authority connected.
etcd/0*               active    executing  0        10.48.132.145   2379/tcp        (juju-run) Healthy with 1 known peer
kubernetes-master/0*  active    executing  0        10.48.132.145   6443/tcp        (juju-run) Kubernetes master running.
  containerd/1        active    idle                10.48.132.145                   Container runtime available.
  flannel/1           active    idle                10.48.132.145                   Flannel subnet 10.1.88.1/24
kubernetes-worker/0*  active    idle       1        10.48.130.251   80/tcp,443/tcp  Kubernetes worker running.
  containerd/0*       active    idle                10.48.130.251                   Container runtime available.
  flannel/0*          active    idle                10.48.130.251                   Flannel subnet 10.1.96.1/24
kubernetes-worker/1   active    idle       2        10.48.131.75    80/tcp,443/tcp  Kubernetes worker running.
  containerd/2        active    idle                10.48.131.75                    Container runtime available.
  flannel/2           active    ...

Changed in juju:
status: Incomplete → New
Tim Penhey (thumper) wrote:

I think it is reasonable to set the status to "executing" only once the agent actually grabs the lock.
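
In terms of the toy sketch in the bug description (again an assumption, not an actual patch), that is just a reordering: acquire the machine lock first, and only then report "executing", so at most one unit per machine shows as executing at any given time.

// runHookFixed reorders the toy runHook from the bug description:
// the lock is acquired before the status is reported, so "executing"
// is only shown while the hook actually runs.
func runHookFixed(unit string, setStatus func(unit, status string)) {
	machineLock.Lock() // wait for the machine-level hook lock first
	defer machineLock.Unlock()
	setStatus(unit, "executing") // now reflects work actually in progress
	time.Sleep(100 * time.Millisecond) // stand-in for the hook itself
	setStatus(unit, "idle")
}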

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
Canonical Juju QA Bot (juju-qa-bot) wrote:

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot