Canonical Juju

juju-exec hangs indefinitely when called during charm upgrade

Bug #2055184 reported by Christopher Bartz on 2024-02-27

This bug affects 1 person

| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| Canonical Juju | Triaged | Medium | Ian Booth | |

Bug Description

We have a machine charm deployed with Juju 3.1 on OpenStack. It uses a custom event, `reconcile-runners`, which is triggered by a systemd timer.

The command to fire the event is

```
/usr/bin/bash -c '/usr/bin/juju-exec "{{unit}}" "JUJU_DISPATCH_PATH={{event}} timeout {{timeout}} ./dispatch" || /usr/bin/juju-run "{{unit}}" "JUJU_DISPATCH_PATH={{event}} timeout {{timeout}} ./dispatch"'
```

One day after a charm upgrade, we noticed that this event was no longer firing. The invocation from the previous day was still considered active because its process was hanging:

```
ubuntu@juju-53f11a-runner:~$ ps afuwwxx | grep reco
ubuntu 3925878 0.0 0.0 7012 2172 pts/0 S+ 10:28 0:00 \_ grep --color=auto reco
root 3791264 0.0 0.0 7764 3052 ? Ss Feb26 0:00 /usr/bin/bash -c /usr/bin/juju-exec "small/12" "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch" || /usr/bin/juju-run "small/12" "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch"
root 3791265 0.0 0.1 893180 53004 ? Sl Feb26 0:06 \_ /usr/bin/juju-exec small/12 JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch
```
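
A hung invocation like the one above can be spotted by its elapsed time rather than by eye. The following is only a sketch: `stale_juju_exec` is a hypothetical helper name, the 1800-second threshold is an arbitrary choice, and it assumes a procps-ng `ps` that supports the `etimes` (elapsed seconds) column.

```shell
#!/usr/bin/env bash
# Hypothetical filter: reads "pid etimes args..." lines on stdin and prints
# the PIDs of processes whose command is juju-exec and whose elapsed time
# exceeds max_age seconds. The bash -c wrapper is deliberately not matched.
stale_juju_exec() {
    local max_age="$1"
    awk -v max="$max_age" '($2 + 0) > (max + 0) && $3 ~ /juju-exec$/ { print $1 }'
}

# Intended use on the affected machine (assumes procps-ng ps):
#   ps -eo pid=,etimes=,args= | stale_juju_exec 1800
```

The function only reports PIDs; whether it is safe to kill a hung `juju-exec` is a separate question that this sketch does not answer.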

This behaviour could be reproduced locally (on an LXD cloud) by refreshing the charm and then running

```
sudo /usr/bin/juju-exec "github-runner/96" "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch"
```

in which case the command will also hang.

Joseph Phillips (manadart) wrote on 2024-02-29:

Assigned to Ian for further comment.

As discussed on Matrix, there's a time-out supplied with the command, but that applies to Juju's wait following actual dispatch.

From a scan of the code, it seems that during a charm upgrade there may be no listener on the exec socket, which means we never even reach the point where that time-out applies. Is that plausible?

Note that this is using the exec-to-dispatch-out-of-band trick.
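
Since the inner `timeout` only starts counting once `./dispatch` is actually running, one possible mitigation on the caller's side is to bound the whole `juju-exec` call from the outside. This is a sketch, not a verified fix: `run_bounded` is a hypothetical helper, it assumes GNU coreutils `timeout`, the 1800-second outer limit is an arbitrary value chosen to exceed the inner 1740, and the effect of killing a blocked `juju-exec` on the unit agent has not been checked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bound the total wall-clock time of a command,
# including any time it spends blocked before its own inner timeout starts.
run_bounded() {
    local limit="$1"
    shift
    # GNU coreutils timeout: send TERM after $limit seconds and KILL 10
    # seconds later if the process is still alive. Exits with status 124
    # when the limit was hit and TERM was enough to stop the command.
    timeout --kill-after=10 "$limit" "$@"
}

# Intended use (unit name and inner values taken from the report):
#   run_bounded 1800 /usr/bin/juju-exec "small/12" \
#       "JUJU_DISPATCH_PATH=reconcile-runners timeout 1740 ./dispatch"
```

With this wrapper, a call that never reaches dispatch would end with a non-zero status instead of blocking the next timer run.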

Changed in juju:
status:	New → Triaged
importance:	Undecided → Medium
assignee:	nobody → Ian Booth (wallyworld)
