juju-exec hangs indefinitely when called during charm upgrade
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Medium | Ian Booth |
Bug Description
We have a machine charm deployed using Juju 3.1 on OpenStack. The charm uses a custom event, `reconcile-runners`, which is triggered by a systemd timer.
The command to fire the event is:
```
/usr/bin/bash -c '/usr/bin/juju-exec "{{unit}}" "JUJU_DISPATCH_
```
One day after a charm upgrade, we noticed that this event was no longer firing. The invocation started a day earlier was still considered active because its process was hanging:
```
ubuntu@
ubuntu 3925878 0.0 0.0 7012 2172 pts/0 S+ 10:28 0:00 \_ grep --color=auto reco
root 3791264 0.0 0.0 7764 3052 ? Ss Feb26 0:00 /usr/bin/bash -c /usr/bin/juju-exec "small/12" "JUJU_DISPATCH_
1740 ./dispatch"
root 3791265 0.0 0.1 893180 53004 ? Sl Feb26 0:06 \_ /usr/bin/juju-exec small/12 JUJU_DISPATCH_
```
This behaviour could be reproduced locally (using an LXD cloud) by refreshing the charm and then running
```
sudo /usr/bin/juju-exec "github-runner/96" "JUJU_DISPATCH_
```
in which case the command will also hang.
Assigned to Ian for further comment.
As discussed on Matrix, there's a time-out supplied with the command, but that applies to Juju's wait following actual dispatch.
From a scan over the code, it seems that during a charm upgrade there may be no listener on the exec socket, which means we never reach the point where that time-out applies. Is that plausible?
Note that this is using the exec-to-dispatch-out-of-band trick.
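Until the hang itself is addressed, one possible stopgap is to bound the client process from the outside, since the time-out passed to the command may never come into play. The following is a minimal sketch, not a confirmed fix: `run_with_deadline` is a hypothetical helper name, the 10-second grace period is an arbitrary choice, and it relies only on coreutils `timeout`. Whether terminating a hung `juju-exec` client leaves any state behind on the Juju side has not been verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Juju): bound the wall-clock time of a command.
# Relies on coreutils `timeout`, which exits with status 124 when the
# deadline is reached and the command had to be terminated.
run_with_deadline() {
    local deadline="$1"
    shift
    # Send SIGTERM at the deadline; follow up with SIGKILL 10s later
    # in case the process ignores SIGTERM. The grace period is an assumption.
    timeout --kill-after=10s "$deadline" "$@"
}
```

In the timer's command line, the existing `juju-exec` invocation would be passed as the arguments after the deadline, so that a hung client from one run cannot block the next run indefinitely. An exit status of 124 then indicates that the deadline was hit rather than that the hook failed.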