tripleo devtest overcloud startup takes too long
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Won't Fix | High | Ghe Rivero |
Bug Description
Discussions with Robert Collins indicate that I should expect overcloud startup (with the default devtest configuration and settings) on Ubuntu 13.10 on reasonable hardware to complete in about 5 minutes. I am running repeated builds on a dedicated system with the following configuration:
Processor: 2 x Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
Memory: 96G
Disk: 2TB
OS: Ubuntu 13.10
and seeing consistent overcloud startup times between 18 and 22 minutes.
devtest is configured to use a local pypi mirror, a local apt mirror and a local squid proxy.
Jon-Paul Sullivan (jonpaul-sullivan) wrote : | #1 |
Robert Collins (lifeless) wrote : | #2 |
Specifically I said that copying the data for 3 VMs + booting and init scripts shouldn't take this long. Fixing is a priority.
Changed in tripleo: | |
status: | New → Triaged |
importance: | Undecided → High |
Robert Collins (lifeless) wrote : | #3 |
Places I know we have lots of overhead:
a) - nova baremetal downloads and qemu-img converts each instance's image - no effective cache.
b) - we don't use virtio network adapters, and have observed excessive network fragmentation in the dd phase of a deploy.
c) - we transmit each instance's image over the network - no multicast
d) - our images have considerable fat due to repeated copies of the same files
We don't have good instrumentation showing how long each step (swift -> disk, disk -> raw, dd to target, reboot, cluster-init) takes, which would let us zero in on timings, so adding that would be a great step to take if the cause isn't obvious.
We have other places folk have looked at in the past:
e) we don't deploy multiple images concurrently
Taking these one at a time:
a) This is something we'll get fixed when we move to Ironic. Pushing on that would be a straightforward way to get a substantial performance boost in clusters with the same image deployed multiple times.
b) This gives an extreme performance hit, so fixing it should make a considerable difference. To fix it we need to change the VM definition to use virtio, and we need to test across our matrix of VM hosts (fedora, latest stable ubuntu) and VM guests (i.e. the images we build): fedora images on a fedora host, ubuntu images on a fedora host, fedora images on an ubuntu host, and ubuntu images on an ubuntu host. We have had issues with DHCP and other things failing (due to checksums being missing with virtio) that /may/ all be fixed upstream now; if not, we can add mangle rules.
c) https:/
d) Dan Prince's work on running all services from one virtualenv, plus deleting the git trees before we pack everything up, should remove a lot of fat. We should be able to go further and purge a huge number of transient packages (e.g. virtualenv shouldn't be needed after the installation is complete).
e) deploying multiple images concurrently has shown no performance improvement on bare metal or VMs - there's enough chokepoints that while concurrency *may* be useful (e.g. if we have more network bandwidth on the deploy host than the target + enough CPU capacity on the deploy host + cache room) we hit bottlenecks straight away today. Note that because we boot targets into the agent and they wait, we're only paying 'boot time' once across the whole cluster.
My recommendation is to start with instrumentation, follow up with b) if network transfer is an issue, and d) thereafter, for now.
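The virtio change in (b) amounts to editing the node VM definitions. A minimal sketch, using an illustrative XML fragment rather than the actual devtest libvirt templates (against real test VMs this would be a `virsh dumpxml` / edit / `virsh define` cycle):

```shell
# Hypothetical sketch: switch a libvirt NIC model from e1000 to virtio.
# The XML fragment below is illustrative, not a real devtest node definition.
xml="<interface type='network'><model type='e1000'/></interface>"
patched=$(printf '%s' "$xml" | sed "s/model type='e1000'/model type='virtio'/")
echo "$patched"
```

If the DHCP checksum problems mentioned above reappear, the usual workaround is an iptables CHECKSUM mangle rule (`-t mangle ... -j CHECKSUM --checksum-fill`) on the host.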
Robert Collins (lifeless) wrote : | #4 |
I've done some baseline tests today.
With a moderately tuned setup: wheels for building, local ubuntu and pypi mirror etc, -offline -u image builds, I have the following times:
seed = 7m real, 2m user, 51s sys
undercloud = 8m34s real, 2m4s user, 38s sys
overcloud = 17m real, 4m37s user, 1m31s sys - and of that, 3m was user image startup.
Note that those times cover the entire stages, including image builds.
Robert Collins (lifeless) wrote : | #5 |
Seed bringup with cached image:
real 3m33.652s
user 0m9.174s
sys 0m5.274s
Undercloud bringup with cached image:
real 5m32.363s
user 0m18.255s
sys 0m4.041s
Overcloud bringup with cached image:
real 12m0.132s
user 0m25.138s
sys 0m4.322s
(still including ~3m for user image startup).
Derek Higgins (derekh) wrote : | #6 |
For reference, here is the commit to switch to virtio net that we put on hold until we can get the entire supported matrix tested with it.
https:/
Now that we have a working fedora testenv I can test it out to ensure we are covered and then restore the commit.
stephen mulcahy (stephen-mulcahy) wrote : | #7 |
export LIBVIRT_
./devtest.sh --trash-my-machine
[2014-03-24 18:18:52] Total runtime: 6133 s
[2014-03-24 18:18:52] ramdisk : 632 s
[2014-03-24 18:18:52] seed : 1408 s
[2014-03-24 18:18:52] undercloud : 1615 s
[2014-03-24 18:18:52] overcloud : 2385 s
./devtest.sh --trash-my-machine -c
[2014-03-25 09:03:53] Total runtime: 1739 s
[2014-03-25 09:03:53] ramdisk : 0 s
[2014-03-25 09:03:53] seed : 265 s
[2014-03-25 09:03:53] undercloud : 444 s
[2014-03-25 09:03:53] overcloud : 1022 s
so even with a cached run and the virtio driver, overcloud startup is still taking 17 minutes
Robert Collins (lifeless) wrote : | #8 |
without virtio:
./devtest.sh --trash-my-machine
08:43 < lifeless> [2014-03-26 08:20:14] Total runtime: 2181 s
08:43 < lifeless> [2014-03-26 08:20:14] ramdisk : 61 s
08:43 < lifeless> [2014-03-26 08:20:14] seed : 412 s
08:43 < lifeless> [2014-03-26 08:20:14] undercloud : 545 s
08:43 < lifeless> [2014-03-26 08:20:14] overcloud : 1144 s
08:43 < lifeless> total devtest.sh runtime - 36m
./devtest.sh --trash-my-machine -c
[2014-03-26 16:17:15] Total runtime: 1396 s
[2014-03-26 16:17:15] ramdisk : 0 s
[2014-03-26 16:17:15] seed : 213 s
[2014-03-26 16:17:15] undercloud : 353 s
[2014-03-26 16:17:15] overcloud : 820 s
real 23m16.304s
user 0m57.494s
sys 0m16.543s
Robert Collins (lifeless) wrote : | #9 |
btw I'm building amd64 images.
Robert Collins (lifeless) wrote : | #10 |
Virtio:
[2014-03-26 22:40:58] Total runtime: 1010 s
[2014-03-26 22:40:58] ramdisk : 0 s
[2014-03-26 22:40:58] seed : 216 s
[2014-03-26 22:40:58] undercloud : 230 s
[2014-03-26 22:40:58] overcloud : 556 s
real 16m50.591s
user 0m53.818s
sys 0m15.669s
So - a huge difference.
Robert Collins (lifeless) wrote : | #11 |
(that was with -c)
Robert Collins (lifeless) wrote : | #12 |
To reproduce my timings:
$ cat my-rc
export TRIPLEO_
export DIB_COMMON_
export DIB_APT_
export PYPI_MIRROR_URL=http://
export DIB_NO_PYPI_PIP=1 # no pypi.python.org roundtrips
export PYPI_MIRROR_URL_1=http://
export NODE_ARCH=amd64 # same arch as my machine, so the wheels in the mirror work :)
I don't have an http_proxy setup in this environment.
Ghe Rivero (ghe.rivero) wrote : | #13 |
with virtio:
[2014-03-26 14:21:12] Total runtime: 2249 s
[2014-03-26 14:21:12] ramdisk : 74 s
[2014-03-26 14:21:12] seed : 484 s
[2014-03-26 14:21:12] undercloud : 542 s
[2014-03-26 14:21:12] overcloud : 1119 s
-c
[2014-03-26 15:40:04] Total runtime: 1141 s
[2014-03-26 15:40:04] ramdisk : 0 s
[2014-03-26 15:40:04] seed : 227 s
[2014-03-26 15:40:04] undercloud : 295 s
[2014-03-26 15:40:04] overcloud : 584 s
Without virtio, it times out. DD'ing the image makes kworker eat 100% CPU. Changing e1000 to virtio fixes it.
Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
16GB RAM
Differences from Robert's settings: added http_proxy, https_proxy, apt-conf and using wheels but not DIB_NO_PYPI_PIP.
export NODE_CPU=2 NODE_MEM=2048 NODE_DISK=30 NODE_ARCH=amd64
Ghe Rivero (ghe.rivero) wrote : | #14 |
with virtio:
[2014-03-26 15:34:07] Total runtime: 2968 s
[2014-03-26 15:34:07] ramdisk : 102 s
[2014-03-26 15:34:07] seed : 538 s
[2014-03-26 15:34:07] undercloud : 938 s
[2014-03-26 15:34:07] overcloud : 1361 s
-c
[2014-03-26 16:07:15] Total runtime: 1836 s
[2014-03-26 16:07:15] ramdisk : 0 s
[2014-03-26 16:07:15] seed : 254 s
[2014-03-26 16:07:15] undercloud : 673 s
[2014-03-26 16:07:15] overcloud : 900 s
Intel(R) Xeon(R) CPU X5650 @ 2.67GHz, 24 cores, 96G Ram
stephen mulcahy (stephen-mulcahy) wrote : | #15 |
See https:/
Robert Collins (lifeless) wrote : | #16 |
More stats
without cache:
[2014-03-26 08:20:14] Total runtime: 2181 s
[2014-03-26 08:20:14] ramdisk : 61 s
[2014-03-26 08:20:14] seed : 412 s
[2014-03-26 08:20:14] undercloud : 545 s
[2014-03-26 08:20:14] overcloud : 1144 s
total devtest.sh runtime - 36m
and
[2014-03-27 00:49:02] Total runtime: 1824 s
[2014-03-27 00:49:02] ramdisk : 74 s
[2014-03-27 00:49:02] seed : 424 s
[2014-03-27 00:49:02] undercloud : 411 s
[2014-03-27 00:49:02] overcloud : 888 s
real 30m23.865s
user 9m27.099s
sys 3m29.709s
with -c:
[2014-03-27 08:03:22] Total runtime: 1071 s
[2014-03-27 08:03:22] ramdisk : 0 s
[2014-03-27 08:03:22] seed : 218 s
[2014-03-27 08:03:22] undercloud : 251 s
[2014-03-27 08:03:22] overcloud : 586 s
real 17m51.527s
user 0m53.982s
sys 0m15.965s
Robert Collins (lifeless) wrote : | #17 |
(comment 16 was all with virtio)
Chris Jones (cmsj) wrote : | #18 |
Rob noted that the ramdisk build seems incredibly slow in Stephen's original numbers (632s, see comment #7)
Some data:
on the lab hardware, calling diskimage-builder directly, for amd64, with --offline:
* 2m40s http_proxy set, elements set to "ubuntu deploy apt-sources apt-conf" (note: second run of this dropped to 1m21s)
* 3m53s http_proxy set, elements set to "ubuntu deploy"
* 4m17s no http_proxy set, elements set to "ubuntu deploy"
* 12m16s http_proxy set, elements set to "ubuntu deploy", pre-cached cloud image removed first
calling devtest_ramdisk.sh with the best case options from above (i.e. http_proxy, apt-sources/
Stephen: is that 10 minute ramdisk build consistent? I'm wondering if some $NODE_DIST or $DIB_COMMON_
stephen mulcahy (stephen-mulcahy) wrote : | #19 |
Chris: I suspect it was a fresh image being downloaded. I have sent you pointers to an additional set of more consistent data. The upstream CI job runtimes are also similar enough to those numbers.
Robert Collins (lifeless) wrote : | #20 |
Ghe - ok so
[2014-03-26 16:07:15] undercloud : 673 s
[2014-03-26 16:07:15] overcloud : 900 s
vs
[2014-03-27 08:03:22] undercloud : 251 s
[2014-03-27 08:03:22] overcloud : 586 s
both virtio, both cached.
Since the undercloud is faster to bring up, how about we focus there and see where the time is going. Let's get instrumentation to identify:
- time to load the image into glance
- time till the VM is booted (e.g. pingable vs scripts completed)
- time till in-instance scripts have completed (heat signals CREATE_COMPLETE)
- time to fully configure via APIs
and surface that at the end of the devtest_undercloud script or something. I can then run the same thing and we can divide and conquer.
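The proposed instrumentation could take the shape below; the helper names are hypothetical, not existing devtest functions, and the phase is a stand-in:

```shell
# Hypothetical per-phase wall-clock timer for devtest_undercloud-style scripts.
phase_start() { _phase_t0=$(date +%s); }
phase_end()   { echo "$1 : $(( $(date +%s) - _phase_t0 )) s"; }

phase_start
true    # stand-in for "load the image into glance"
phase_end "glance-load"
```

Each of the four phases listed above would get its own start/end pair, with the summary printed at the end of the script in the same format as the existing devtest runtime report.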
Robert Collins (lifeless) wrote : | #21 |
Just for kicks, here's an ironic -c run with virtio
[2014-03-27 23:08:50] Total runtime: 1236 s
[2014-03-27 23:08:50] ramdisk : 0 s
[2014-03-27 23:08:50] seed : 246 s
[2014-03-27 23:08:50] undercloud : 334 s
[2014-03-27 23:08:50] overcloud : 654 s
stephen mulcahy (stephen-mulcahy) wrote : | #22 |
[2014-03-27 11:43:44] Run comment : tripz400: clean run, using local pypi and apt mirrors and proxy
[2014-03-27 11:43:44] Total runtime: 4302 s
[2014-03-27 11:43:44] ramdisk : 333 s
[2014-03-27 11:43:44] seed : 1100 s
[2014-03-27 11:43:44] undercloud : 798 s
[2014-03-27 11:43:44] overcloud : 1787 s
[2014-03-27 11:43:44] DIB_COMMON_
[2014-03-27 12:18:22] Run comment :: tripz400: cached run, using local pypi and apt mirrors and proxy
[2014-03-27 12:18:22] Total runtime: 1976 s
[2014-03-27 12:18:22] ramdisk : 0 s
[2014-03-27 12:18:22] seed : 263 s
[2014-03-27 12:18:22] undercloud : 541 s
[2014-03-27 12:18:22] overcloud : 1161 s
[2014-03-27 12:18:22] DIB_COMMON_
Settings are as follows,
export http_proxy="..."
export https_proxy="..."
export no_proxy=localhost, ...
# -u to avoid compression, offline to avoid network hits
export DIB_COMMON_
export DIB_APT_
export DIB_APT_
export PYPI_MIRROR_
# local wheel mirror
export PYPI_MIRROR_
# no pypi.python.org roundtrips
export DIB_NO_PYPI_PIP=1
# same arch as host machine, so the wheels in the mirror work, also bumping some related settings for 64-bit
export NODE_CPU=2 NODE_MEM=4096 NODE_DISK=40 NODE_ARCH=amd64
export NODE_DIST="ubuntu apt-conf apt-sources"
# for a faster virtual NIC (but see http://
export LIBVIRT_
stephen mulcahy (stephen-mulcahy) wrote : | #23 |
./devtest.sh --trash-my-machine
[2014-03-27 18:04:26] Run comment : tripz400: clean run, using local pypi and apt mirrors and proxy, performance settings and smaller NODE_ settings
[2014-03-27 18:04:26] Total runtime: 3677 s
[2014-03-27 18:04:26] ramdisk : 97 s
[2014-03-27 18:04:26] seed : 892 s
[2014-03-27 18:04:26] undercloud : 841 s
[2014-03-27 18:04:26] overcloud : 1819 s
[2014-03-27 18:04:26] DIB_COMMON_
./devtest.sh --trash-my-machine -c
[2014-03-27 18:49:26] Run comment : tripz400: clean run, using local pypi and apt mirrors and proxy, performance settings and smaller NODE_ settings
[2014-03-27 18:49:26] Total runtime: 2007 s
[2014-03-27 18:49:26] ramdisk : 0 s
[2014-03-27 18:49:26] seed : 266 s
[2014-03-27 18:49:26] undercloud : 553 s
[2014-03-27 18:49:26] overcloud : 1179 s
[2014-03-27 18:49:26] DIB_COMMON_
Settings are as follows,
export http_proxy="..."
export https_proxy="..."
export no_proxy=localhost, ...
# -u to avoid compression, offline to avoid network hits
export DIB_COMMON_
export DIB_APT_
export DIB_APT_
export PYPI_MIRROR_
# local wheel mirror
export PYPI_MIRROR_
# no pypi.python.org roundtrips
export DIB_NO_PYPI_PIP=1
# same arch as host machine, so the wheels in the mirror work, also bumping some related settings for 64-bit
export NODE_CPU=1 NODE_MEM=2048 NODE_DISK=30 NODE_ARCH=amd64
export NODE_DIST="ubuntu apt-conf apt-sources"
# for a faster virtual NIC (but see http://
export LIBVIRT_
Ghe Rivero (ghe.rivero) wrote : | #24 |
Full log with timestamps from the devtest host:
http://
[2014-03-27 13:02:17] Total runtime: 1723 s
[2014-03-27 13:02:17] ramdisk : 0 s
[2014-03-27 13:02:17] seed : 247 s
[2014-03-27 13:02:17] undercloud : 662 s
[2014-03-27 13:02:17] overcloud : 811 s
Ghe Rivero (ghe.rivero) wrote : | #25 |
Some raw stats (images built with -u)
boot seed - 140s
glance load-image undercloud - 45s
boot undercloud - 568s
glance load-image overcloud-control - 33s
glance load-image overcloud-compute - 24s
boot overcloud - 552s
wait-for user image - 129s
Ghe Rivero (ghe.rivero) wrote : | #26 |
Using cirros image for the end user test:
Total runtime: 1634 s
ramdisk : 0 s
seed : 249 s
undercloud : 682 s
overcloud : 693 s
Ghe Rivero (ghe.rivero) wrote : | #27 |
Some more stats all in the same spreadsheet:
https:/
Some data:
- -u is actually more of a time penalty than an improvement.
- Overcloud boottimes are penalized when using -c
- Cirros end-user image saves some time.
stephen mulcahy (stephen-mulcahy) wrote : | #28 |
Re-running #23 without -u
[2014-03-28 16:57:18] Total runtime: 3320 s
[2014-03-28 16:57:18] ramdisk : 96 s
[2014-03-28 16:57:18] seed : 565 s
[2014-03-28 16:57:18] undercloud : 866 s
[2014-03-28 16:57:18] overcloud : 1765 s
Ghe Rivero (ghe.rivero) wrote : | #29 |
Some improvements over the weekend:
- Small improvements when the bm nodes are configured with 4 CPUs instead of 2
- Small improvements when the vms are configured with cpu=host-passthrough
- Huge improvement in the undercloud boot time when the seed is configured with 4 CPUs and 8GB RAM
- Using tmpfs for the images directory (/var/lib/
Next steps:
- Use tmpfs again, but configuring the nodes with 4GB to avoid RAM exhaustion
- Configure using memory huge pages
- CPU pinning
- Try with raw partitions vs qcow2
- Disable barriers in ext4
- Caching modes none and unsafe
- I/O schedulers
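The caching-mode experiment from the list above amounts to toggling the `cache` attribute on the libvirt disk driver. A sketch, where the file path and cache value are examples rather than the actual devtest definitions (`cache='unsafe'` trades data integrity for speed and is only sane for throwaway test VMs):

```shell
# Illustrative libvirt disk element with an explicit cache mode.
cat > /tmp/disk-cache-example.xml <<'EOF'
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/baremetal_0.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
EOF
```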
Must look:
- Confirm that the overcloud boot time in a full deployment (not using -c) is really >50% better than under any other option, and why!
Best time: (ubuntu end user)
[2014-03-30 13:30:34] Total runtime: 2783 s
[2014-03-30 13:30:34] ramdisk : 105 s
[2014-03-30 13:30:34] seed : 544 s
[2014-03-30 13:30:34] undercloud : 684 s
[2014-03-30 13:30:34] overcloud : 1422 s
-c (cirros)
[2014-03-28 16:57:18] Total runtime: 1088 s
[2014-03-28 16:57:18] ramdisk : 0 s
[2014-03-28 16:57:18] seed : 248 s
[2014-03-28 16:57:18] undercloud : 277 s
[2014-03-28 16:57:18] overcloud : 555 s
NODE_CPU=4 NODE_MEM=8192 NODE_DISK=30 NODE_ARCH=amd64
seed=> NODE_CPU=4 NODE_MEM=8192 NODE_ARCH=amd64
<cpu mode='host-
Using cirros for end user image with -c
virtio
Ghe Rivero (ghe.rivero) wrote : | #30 |
Must look:
- Confirm that the overcloud boot time in a full deployment (not using -c) is really >50% better than under any other option, and why!
Discarded: the wait_for overcloud loop starts only after the end-user image is built.
stephen mulcahy (stephen-mulcahy) wrote : | #31 |
Re-running #23 without -u (devtest -c part)
[2014-03-31 09:00:16] Total runtime: 1966 s
[2014-03-31 09:00:16] ramdisk : 0 s
[2014-03-31 09:00:16] seed : 300 s
[2014-03-31 09:00:16] undercloud : 476 s
[2014-03-31 09:00:16] overcloud : 1181 s
omitting -u doesn't improve -c run performance
Robert Collins (lifeless) wrote : | #32 |
@ghe - a raw image rather than a raw partition should avoid the qcow2 overhead just as nicely.
Ghe Rivero (ghe.rivero) wrote : | #33 |
- Increasing the seed node specs only has an impact when the images were previously built compressed.
Using huge-pages:
-c cirros
[2014-03-28 16:57:18] Total runtime: 917 s
[2014-03-28 16:57:18] ramdisk : 0 s
[2014-03-28 16:57:18] seed : 253 s
[2014-03-28 16:57:18] undercloud : 273 s
[2014-03-28 16:57:18] overcloud : 381 s
Changed in tripleo: | |
assignee: | nobody → Ghe Rivero (ghe.rivero) |
status: | Triaged → In Progress |
Ghe Rivero (ghe.rivero) wrote : | #34 |
The use of raw images or optimizing the disk fs (mounting ext4 with a mix of noatime, nobarrier or barrier=0) does not have a global impact on the boot times (there are some gains booting the undercloud, but some losses with the overcloud).
Ghe Rivero (ghe.rivero) wrote : | #35 |
CPU pinning and a bigger hugepage size (1GB instead of 2MB) don't have a positive impact on the boot times.
In summary, big improvements can be achieved by:
- Using the virtio driver https:/
- Building the bm images compressed https:/
- Increasing the specs of the seed vm (only has an effect when the bm images are compressed) https:/
- Using a cirros image for the end-user test https:/
- Making use of hugepages (should we offer this in the t-i story? Document it. A big improvement, though not huge.)
- Using offline mode, mirrors, wheels, etc. (document it)
Small or partial improvements:
- Using tmpfs for the seed vm
- Using cpu=host-passthrough (need to test with newer CPUs)
- Increase bm vm specs
- Use of raw images for the seed
Future:
- Use of a backing file for the seed
- Profile the system during a full creation
- Everything is sequential (dib seed - boot seed - dib uc - boot uc - dib oc - boot oc); some phases could be parallelized
- Profile the boot process inside the image
- Robert's comments in https:/
- ??? more ideas ???
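The hugepages improvement noted above is normally enabled through libvirt's memoryBacking element, plus hugepages reserved on the host. A minimal illustrative fragment (file path and reservation size are examples):

```shell
# Illustrative libvirt memoryBacking fragment enabling hugepage-backed
# guest memory. Hugepages must be reserved on the host first, e.g.:
#   sysctl vm.nr_hugepages=2048
cat > /tmp/membacking-example.xml <<'EOF'
<memoryBacking>
  <hugepages/>
</memoryBacking>
EOF
```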
Changed in tripleo: | |
assignee: | Ghe Rivero (ghe.rivero) → James Polley (tchaypo) |
Changed in tripleo: | |
assignee: | James Polley (tchaypo) → Ghe Rivero (ghe.rivero) |
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to tripleo-incubator (master) | #36 |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 1d684d65992bf0c
Author: Ghe Rivero <email address hidden>
Date: Mon Mar 31 09:43:32 2014 +0200
Allow to set seed node cpus and memory
Increasing the number of cpus and the memory for the seed vm,
has impact in the time needed to boot the undercloud when the
images are built compressed (without the -u option).
Booting times of the undercloud are reduced about 35% (More data
in the bug comments)
This can be done with the SEED_CPU and SEED_MEM (MB) env.
variables. (Defaults to 1 CPU and 2048M RAM, as current ones)
Partial-Bug: #1295732
Change-Id: Iac68701408696a
Ben Nemec (bnemec) wrote : | #37 |
We're no longer using devtest
Changed in tripleo: | |
status: | In Progress → Won't Fix |
Looking at http://goodsquishy.com/downloads/tripleo-jobs.html - specifically the check-tripleo-overcloud-precise timings - they appear to match the timings seen herein.
Specifically, taking http://logs.openstack.org/52/76952/9/check-tripleo/check-tripleo-overcloud-precise/ed953b6/console.html gives the following:
2014-03-20 19:13:42.600 | Waiting for the overcloud stack to be ready
2014-03-20 19:33:42.240 | overcloud startup
So - that is a 20 minute gap waiting for overcloud startup.