After one of the redeployments we found that the next one was very slow, without any apparent reason.
There were more than 10 deployments with fairly large models on this controller; these models were removed and re-created for new deployments.
The Juju controller (non-HA) is running in a qemu-kvm virtual machine, and its resource utilization (CPU, memory, disk) appears to be far below what the underlying host can provide. The VM is not rate-limited.
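For context, a minimal sketch of the kind of checks that back this claim, assuming the libvirt domain is named `juju-controller` (the domain name and sampling intervals are illustrative, not from the original report):

~~~
# On the KVM host: confirm no CPU or block-I/O throttling is applied to the guest.
virsh schedinfo juju-controller   # vcpu_quota of -1 => no CPU cap
virsh blkiotune juju-controller   # no per-device byte or IOPS limits

# Inside the controller VM: sample CPU, memory and disk utilization.
vmstat 5 3        # CPU idle/iowait and free memory
iostat -x 5 3     # per-device %util and await (requires the sysstat package)
free -m           # overall memory headroom
~~~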
juju status -m maas:controller
Model Controller Cloud/Region Version SLA Timestamp Notes
controller maas maas 2.4.3 unsupported 09:38:42-04:00 upgrade available: 2.4.4
Machine State DNS Inst id Series AZ Message
0 started 172.16.7.13 kyb86b bionic default Deployed
Model deployment in progress:
http://paste.ubuntu.com/p/6kfbpbbzP2/
It took about 30 minutes for the juju client to exit after a `juju deploy` command (which only enumerates the charms and units and adds machines to the model).
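If it helps to reproduce the timing, the client-side wait is easy to capture with `time`; a trivial sketch (the bundle path is illustrative):

~~~
# Measure how long the juju client blocks before returning control.
time juju deploy ./bundle.yaml
~~~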
mongostat reports (subjectively) significant per-second CRUD rates while resident memory utilization stays fairly static (a quick way to average this sample is sketched after the output below):
~~~
root@juju:/var/log/juju# mongostat --host 127.0.0.1 --port 37017 --authenticationDatabase admin --ssl --sslAllowInvalidCertificates --username "$user" --password "$password"
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
92 805 323 *0 0 12|0 3.5% 72.1% 0 3.17G 1.27G 0|0 1|0 282k 1.50m 553 juju PRI Oct 12 13:32:16.532
13 26 2 *0 0 2|0 3.5% 72.1% 0 3.17G 1.27G 0|0 1|0 7.99k 91.9k 553 juju PRI Oct 12 13:32:17.531
171 543 43 *0 1 5|0 3.6% 72.2% 0 3.17G 1.27G 0|0 1|1 164k 1.12m 553 juju PRI Oct 12 13:32:18.529
15 140 24 *0 0 2|0 3.6% 72.2% 0 3.17G 1.27G 0|0 1|0 36.1k 269k 553 juju PRI Oct 12 13:32:19.533
222 153 29 *0 0 7|0 4.0% 72.5% 0 3.17G 1.27G 0|0 1|0 86.0k 262k 553 juju PRI Oct 12 13:32:20.532
18 901 117 *0 1 4|0 4.0% 72.5% 0 3.17G 1.27G 0|0 1|0 218k 1.55m 553 juju PRI Oct 12 13:32:21.531
275 852 76 *0 0 4|0 4.3% 72.8% 0 3.17G 1.27G 0|0 1|0 324k 1.63m 553 juju PRI Oct 12 13:32:22.541
24 217 65 *0 0 2|0 4.3% 72.8% 0 3.17G 1.27G 0|0 1|0 62.3k 331k 553 juju PRI Oct 12 13:32:23.532
242 390 *0 *0 0 1|0 4.4% 72.9% 0 3.17G 1.27G 0|0 1|0 127k 717k 553 juju PRI Oct 12 13:32:24.535
2 1182 279 2 0 6|0 4.5% 73.0% 0 3.17G 1.27G 0|0 1|0 350k 2.05m 553 juju PRI Oct 12 13:32:25.532
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
112 59 4 *0 0 4|0 4.5% 73.0% 0 3.17G 1.27G 0|0 1|0 37.0k 137k 553 juju PRI Oct 12 13:32:26.532
5 403 21 *0 1 4|0 4.3% 73.0% 0 3.17G 1.27G 0|0 1|0 87.1k 763k 553 juju PRI Oct 12 13:32:27.532
77 189 22 *0 0 5|0 4.4% 73.1% 0 3.17G 1.27G 0|0 1|0 56.1k 286k 553 juju PRI Oct 12 13:32:28.531
1 805 76 *0 0 9|0 4.3% 73.1% 0 3.17G 1.27G 0|0 5|1 204k 1.56m 553 juju PRI Oct 12 13:32:29.577
56 348 172 *0 0 2|0 4.4% 73.1% 0 3.17G 1.27G 0|0 1|0 137k 589k 553 juju PRI Oct 12 13:32:30.533
62 141 35 1 0 14|0 4.5% 73.2% 0 3.17G 1.27G 0|0 1|0 58.6k 288k 554 juju PRI Oct 12 13:32:31.551
111 1408 251 *0 1 10|0 4.6% 73.3% 0 3.17G 1.27G 0|0 1|0 401k 2.40m 554 juju PRI Oct 12 13:32:32.530
102 368 89 *0 0 7|0 4.6% 73.3% 0 3.17G 1.27G 0|0 2|0 117k 530k 554 juju PRI Oct 12 13:32:33.536
34 250 37 *0 0 3|0 4.6% 73.3% 0 3.17G 1.27G 0|0 1|0 66.1k 406k 554 juju PRI Oct 12 13:32:34.532
217 1148 119 *0 0 3|0 4.7% 73.4% 0 3.17G 1.27G 0|0 3|0 351k 2.18m 554 juju PRI Oct 12 13:32:35.556
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
48 905 299 *0 0 9|0 4.8% 73.5% 0 3.17G 1.27G 0|0 1|0 284k 1.61m 554 juju PRI Oct 12 13:32:36.531
240 683 214 1 1 6|0 4.9% 73.6% 0 3.17G 1.27G 0|0 1|0 280k 1.40m 554 juju PRI Oct 12 13:32:37.529
3 388 54 *0 0 1|0 5.0% 73.6% 0 3.17G 1.27G 0|0 1|0 91.6k 693k 554 juju PRI Oct 12 13:32:38.532
60 106 55 *0 0 2|0 5.0% 73.6% 0 3.17G 1.27G 0|0 1|0 50.0k 264k 554 juju PRI Oct 12 13:32:39.531
4 707 210 *0 0 5|0 5.0% 73.7% 0 3.17G 1.27G 0|0 1|0 229k 1.44m 554 juju PRI Oct 12 13:32:40.531
59 967 189 *0 1 7|0 5.0% 73.7% 0 3.17G 1.27G 0|0 1|0 269k 1.67m 554 juju PRI Oct 12 13:32:41.528
167 710 160 *0 0 2|0 4.7% 73.3% 0 3.17G 1.27G 0|0 1|0 237k 1.16m 554 juju PRI Oct 12 13:32:42.531
51 941 153 *0 0 11|0 4.8% 73.3% 0 3.17G 1.27G 0|0 1|0 252k 1.65m 554 juju PRI Oct 12 13:32:43.531
128 503 71 *0 0 7|0 4.8% 73.4% 0 3.17G 1.27G 0|0 1|0 157k 919k 554 juju PRI Oct 12 13:32:44.533
263 519 78 *0 0 4|0 5.0% 73.6% 0 3.17G 1.27G 0|0 1|0 218k 964k 554 juju PRI Oct 12 13:32:45.528
insert query update delete getmore command dirty used flushes vsize res qrw arw net_in net_out conn set repl time
116 262 43 *0 0 12|0 5.2% 73.6% 0 3.17G 1.27G 0|0 1|0 88.9k 392k 554 juju PRI Oct 12 13:32:46.544
~~~
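To put a rough number on "significant", the insert/query/update columns above can be averaged; a minimal sketch, assuming the mongostat rows were saved to `mongostat.txt` (the filename is illustrative):

~~~
# Average per-second insert/query/update rates over all sample rows,
# skipping the repeated header lines; "*0" fields evaluate to 0 in awk.
awk '/^ *[0-9]/ { ins+=$1; qry+=$2; upd+=$3; n++ }
     END { if (n) printf "avg/s: insert=%.0f query=%.0f update=%.0f (%d samples)\n",
                         ins/n, qry/n, upd/n, n }' mongostat.txt
~~~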
Logs for the mongostat measurement period:
journalctl -u juju-db.service --since='2018-10-12 13:31:46' --until='2018-10-12 13:32:46' | pastebinit
http://paste.ubuntu.com/p/449m6CxFTf/
grep -P '(2018-10-12 13:31:(\d+))|(2018-10-12 13:32:(\d+))' debug-log-controller-0.log | pastebinit
http://paste.ubuntu.com/p/nP4kJvvVTw/
mongodump:
https://people.canonical.com/~dima/mongodump-1539507005.zst
Marking as ~field-critical because core functionality was critically impacted, with customer-facing implications.
Although the model was at v2.4.3, I do not see any bug fixes related to the described behavior in the 2.4.4 milestone: https://launchpad.net/juju/+milestone/2.4.4
We will preserve the environment for now to investigate.
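One way to start that investigation is to list per-collection document counts and sizes in the `juju` database, reusing the connection parameters from the mongostat invocation above; a rough sketch (which collections are worth drilling into is left open):

~~~
# Print name, document count and data size for every collection in the juju DB.
mongo --host 127.0.0.1 --port 37017 --authenticationDatabase admin \
      --ssl --sslAllowInvalidCertificates \
      --username "$user" --password "$password" juju \
      --eval 'db.getCollectionNames().forEach(function(c) {
                var s = db.getCollection(c).stats();
                print(c, "docs=" + s.count, "bytes=" + s.size);
              })'
~~~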