juju-exec hangs indefinitely when called during charm upgrade

Bug #2055184 reported by Christopher Bartz
This bug affects 1 person
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Ian Booth

Bug Description

We have a machine charm deployed using Juju 3.1 on OpenStack. It uses a custom event, `reconcile-runners`, which is triggered by a systemd timer.

The command to fire the event is:

```
/usr/bin/bash -c '/usr/bin/juju-exec "{{unit}}" "JUJU_DISPATCH_PATH={{event}} timeout {{timeout}} ./dispatch" || /usr/bin/juju-run "{{unit}}" "JUJU_DISPATCH_PATH={{event}} timeout {{timeout}} ./dispatch"'
```

One day after a charm upgrade, we noticed that the event was no longer firing. The invocation from the previous day was still considered active because its process was hanging:

```
ubuntu@juju-53f11a-runner:~$ ps afuwwxx | grep reco
ubuntu 3925878 0.0 0.0 7012 2172 pts/0 S+ 10:28 0:00 \_ grep --color=auto reco
root 3791264 0.0 0.0 7764 3052 ? Ss Feb26 0:00 /usr/bin/bash -c /usr/bin/juju-exec "small/12" "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch" || /usr/bin/juju-run "small/12" "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch"
root 3791265 0.0 0.1 893180 53004 ? Sl Feb26 0:06 \_ /usr/bin/juju-exec small/12 JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch
```

This behaviour could be reproduced locally (using an LXD cloud) by refreshing the charm and then running

```
sudo /usr/bin/juju-exec "github-runner/96" "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch"
```

in which case the command will also hang.

Joseph Phillips (manadart) wrote:

Assigned to Ian for further comment.

As discussed on Matrix, a time-out is supplied with the command, but it only applies to Juju's wait following the actual dispatch.

From a scan over the code, it seems that during a charm upgrade there may be no listener on the exec socket, which means we never even reach the point where that time-out is taken into account. Is that plausible?

Note that this is using the exec-to-dispatch-out-of-band trick.
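
If the inner `timeout` only bounds `./dispatch` once it is running, a possible stopgap is to bound the whole `juju-exec` call from the outside with coreutils `timeout`. The following is a minimal sketch under that assumption, not a confirmed fix: `run_bounded` is a hypothetical helper, the limits are illustrative, and whether a hung `juju-exec` exits cleanly on SIGTERM has not been verified here. A `sleep` stands in for the hung process so the sketch runs without a Juju unit.

```shell
#!/usr/bin/env bash
# Sketch: bound the entire juju-exec invocation, not only the dispatched hook.
# run_bounded is a hypothetical helper; the limits below are illustrative.

# run_bounded OUTER_SECS GRACE_SECS CMD [ARGS...]
# Sends SIGTERM to CMD after OUTER_SECS, then SIGKILL after a further
# GRACE_SECS if it is still alive. coreutils timeout exits with 124 when
# the limit was hit, or 137 when it had to fall back to SIGKILL.
run_bounded() {
    local outer="$1" grace="$2"
    shift 2
    timeout --kill-after="$grace" "$outer" "$@"
}

# Stand-in for a hung juju-exec: sleep outlives the 1-second limit.
rc=0
run_bounded 1 1 sleep 5 || rc=$?
echo "exit status: $rc"

# Intended use on a unit (commented out, needs a Juju machine):
# run_bounded 1800 30 /usr/bin/juju-exec "small/12" \
#     "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch"
```

An outer limit slightly above the inner one (1800 versus 1740 seconds in the commented example) leaves room for a normal dispatch to finish while still clearing a call that never reaches dispatch, so the next timer run is not blocked.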

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Ian Booth (wallyworld)