juju deploy is too slow

Bug #1987684 reported by Simon Déziel
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Triaged
High
Unassigned

Bug Description

Deploying a local .charm takes more than 1 minute:

$ time juju deploy --debug ./lxd-cloud-cell_ubuntu-22.04-amd64.charm cell-a --resource snap=lxd-cloud-cell_0+git.d27fa0c_amd64.snap -n 3 --to 31,34,35
12:49:08 INFO juju.cmd supercommand.go:56 running juju [2.9.33 95186b2e0c2dfa9fe7b0b815cfe2c939813f9302 gc go1.18.3]
12:49:08 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/20277/bin/juju", "deploy", "--debug", "./lxd-cloud-cell_ubuntu-22.04-amd64.charm", "cell-a", "--resource", "snap=lxd-cloud-cell_0+git.d27fa0c_amd64.snap", "-n", "3", "--to", "31,34,35"}
12:49:08 INFO juju.juju api.go:78 connecting to API addresses: [172.17.40.184:17070]
12:49:08 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://172.17.40.184:17070/api"
12:49:08 INFO juju.api apiclient.go:688 connection established to "wss://172.17.40.184:17070/api"
12:49:08 INFO juju.juju api.go:78 connecting to API addresses: [172.17.40.184:17070]
12:49:08 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://172.17.40.184:17070/model/6fec25fe-a09f-4764-8b80-c43f1a62f73c/api"
12:49:08 INFO juju.api apiclient.go:688 connection established to "wss://172.17.40.184:17070/model/6fec25fe-a09f-4764-8b80-c43f1a62f73c/api"
12:49:13 DEBUG juju.core.charm computedseries.go:27 series "jammy" for charm "lxd-cloud-cell" with format 2, Kubernetes false
12:49:13 INFO juju.cmd.juju.application.deployer series_selector.go:83 with the configured model default series "jammy"
12:49:13 DEBUG juju.core.charm computedseries.go:27 series "jammy" for charm "lxd-cloud-cell" with format 2, Kubernetes false
12:49:13 INFO cmd charm.go:384 Preparing to deploy local charm: "lxd-cloud-cell_ubuntu-22.04-amd64.charm"
12:50:17 INFO cmd charm.go:406 Located local charm "lxd-cloud-cell", revision 13
12:50:23 INFO cmd charm.go:236 Deploying "cell-a" from local charm "lxd-cloud-cell", revision 13 on jammy
12:50:25 DEBUG juju.api monitor.go:35 RPC connection died
12:50:25 DEBUG juju.api monitor.go:35 RPC connection died
12:50:25 INFO cmd supercommand.go:544 command finished

real 1m17.398s
user 0m3.883s
sys 0m6.742s

Both local files are small:

$ ll -h lxd-cloud-cell_ubuntu-22.04-amd64.charm lxd-cloud-cell_0+git.d27fa0c_amd64.snap
-rw-rw-r-- 1 sdeziel sdeziel 20M Aug 2 22:02 lxd-cloud-cell_0+git.d27fa0c_amd64.snap
-rw-r--r-- 1 sdeziel sdeziel 5.8M Aug 17 19:17 lxd-cloud-cell_ubuntu-22.04-amd64.charm

And the juju controller is a local LXD instance on the same machine so the network is rather fast:

$ lxc ls
+---------------+---------+----------------------+------+-----------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+---------------+---------+----------------------+------+-----------+-----------+
| juju-492fcd-0 | RUNNING | 172.17.40.184 (eth0) | | CONTAINER | 0 |
+---------------+---------+----------------------+------+-----------+-----------+

$ ping 172.17.40.184
PING 172.17.40.184 (172.17.40.184) 56(84) bytes of data.
64 bytes from 172.17.40.184: icmp_seq=1 ttl=64 time=0.089 ms
64 bytes from 172.17.40.184: icmp_seq=2 ttl=64 time=0.090 ms
^C
--- 172.17.40.184 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.089/0.089/0.090/0.000 ms

Juju probably doesn't use the LXD file transfer facility but here are some numbers for comparison:

$ time lxc file push lxd-cloud-cell_ubuntu-22.04-amd64.charm juju-492fcd-0/tmp/

real 0m2.910s
user 0m0.115s
sys 0m0.158s

$ time lxc file push lxd-cloud-cell_0+git.d27fa0c_amd64.snap juju-492fcd-0/tmp/

real 0m1.220s
user 0m0.121s
sys 0m0.211s

After calling `juju deploy`, there appears to be a ~1 minute "pause" which probably corresponds to:

12:49:13 INFO cmd charm.go:384 Preparing to deploy local charm: "lxd-cloud-cell_ubuntu-22.04-amd64.charm"
12:50:17 INFO cmd charm.go:406 Located local charm "lxd-cloud-cell", revision 13

Additional information:

$ lsb_release -rd
Description: Ubuntu 20.04.4 LTS
Release: 20.04
$ uname -a
Linux shell01 5.19.2+ #50 SMP Sat Aug 20 02:52:38 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
$ snap list juju lxd
Name Version Rev Tracking Publisher Notes
juju 2.9.33 20277 latest/stable canonical✓ classic
lxd 5.4-1ff8d34 23371 latest/stable canonical✓ -

Tags: lxd-cloud
Revision history for this message
Ian Booth (wallyworld) wrote :

juju does push the resource to the controller which stores it in mongo gridfs. I wonder if this is where the time is going. 20M doesn't seem overly large though. We'll need to investigate a bit.

Changed in juju:
milestone: none → 2.9.34
importance: Undecided → High
status: New → Triaged
Changed in juju:
milestone: 2.9.34 → 2.9.35
Changed in juju:
milestone: 2.9.35 → 2.9.36
Revision history for this message
Simon Déziel (sdeziel) wrote :

`deploy` is much slower now on Juju 2.9.34 even with a smaller charm not using any resource:

```
sdeziel@shell01:~$ ls -lh ./lxd_ubuntu-20.04-amd64.charm && time juju deploy --debug -m test ./lxd_ubuntu-20.04-amd64.charm
-rw-r--r-- 1 sdeziel sdeziel 11M Oct 5 21:00 ./lxd_ubuntu-20.04-amd64.charm

14:46:34 INFO juju.cmd supercommand.go:56 running juju [2.9.34 20b8212e62c5d122e63141825a7733bf66c4c5a9 gc go1.18.5]
14:46:34 DEBUG juju.cmd supercommand.go:57 args: []string{"/snap/juju/20511/bin/juju", "deploy", "--debug", "-m", "test", "./lxd_ubuntu-20.04-amd64.charm"}
14:46:34 INFO juju.juju api.go:78 connecting to API addresses: [172.17.40.60:17070]
14:46:34 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://172.17.40.60:17070/api"
14:46:34 INFO juju.api apiclient.go:688 connection established to "wss://172.17.40.60:17070/api"
14:46:34 INFO juju.juju api.go:78 connecting to API addresses: [172.17.40.60:17070]
14:46:34 DEBUG juju.api apiclient.go:1153 successfully dialed "wss://172.17.40.60:17070/model/bab6b8e8-3e8c-40e0-8eec-118ea6a15196/api"
14:46:34 INFO juju.api apiclient.go:688 connection established to "wss://172.17.40.60:17070/model/bab6b8e8-3e8c-40e0-8eec-118ea6a15196/api"
14:46:34 DEBUG juju.core.charm computedseries.go:27 series "focal" for charm "lxd" with format 2, Kubernetes false
14:46:34 DEBUG juju.core.charm computedseries.go:27 series "focal" for charm "lxd" with format 2, Kubernetes false
14:46:34 INFO cmd charm.go:384 Preparing to deploy local charm: "lxd_ubuntu-20.04-amd64.charm"
14:53:43 INFO cmd charm.go:406 Located local charm "lxd", revision 4
14:53:47 INFO cmd charm.go:236 Deploying "lxd" from local charm "lxd", revision 4 on focal
14:53:55 DEBUG juju.api monitor.go:35 RPC connection died
14:53:55 DEBUG juju.api monitor.go:35 RPC connection died
14:53:55 INFO cmd supercommand.go:544 command finished

real 7m22.082s
user 0m0.501s
sys 0m0.414s
```

In both cases, the Juju controller is a local LXD container so the network upload of the charm shouldn't be the source of the slowness.

*Important note*: the slowness is only observed on an arm64 controller deploying to amd64 machines. The same deployment from an amd64 controller is way faster (~1 minute).

Changed in juju:
milestone: 2.9.36 → 2.9.37
Changed in juju:
milestone: 2.9.37 → 2.9.38
Changed in juju:
milestone: 2.9.38 → 2.9.39
Revision history for this message
Simon Déziel (sdeziel) wrote :

Sorry to reiterate but this is really unbearable at times:

$ time juju deploy -m test ./lxd_ubuntu-22.04-amd64.charm --config lxd-listen-https=true
Located local charm "lxd", revision 1
Deploying "lxd" from local charm "lxd", revision 1 on ubuntu@22.04/stable

real 8m16.558s
user 0m0.502s
sys 0m0.442s

Fortunately, it's not always *that* slow (usually ~1 minute) but I don't know why it's fluctuating that much.

$ snap list juju
Name Version Rev Tracking Publisher Notes
juju 3.0.3-f060a17 21815 3.0/edge canonical✓ -

Changed in juju:
milestone: 2.9.39 → 2.9.40
Changed in juju:
milestone: 2.9.40 → 2.9.41
Changed in juju:
milestone: 2.9.41 → 2.9.42
Changed in juju:
milestone: 2.9.42 → 2.9.43
Changed in juju:
milestone: 2.9.43 → 2.9.44
Changed in juju:
milestone: 2.9.44 → 2.9.45
Changed in juju:
milestone: 2.9.45 → 2.9.46
Revision history for this message
Haw Loeung (hloeung) wrote :

Is this due to LXD/LXC container provisioning?

Is it faster if you have the `enable-os-refresh-update` and `enable-os-upgrade` model config options both set to false?

Perhaps check /var/log/cloud-init.log for clues on how long container provisioning is taking?

Revision history for this message
Ian Booth (wallyworld) wrote :

The next 2.9.46 candidate release will not include a fix for this bug and we don't plan on any more 2.9 releases. As such it is being removed from its 2.9 milestone.

If the bug is still important to you, let us know and we can consider it for inclusion on a 3.x milestone.

Changed in juju:
milestone: 2.9.46 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.