time for zun to create container and reach Running state is 19 times as long as directly using docker

Bug #1809306 reported by Jacob Burckhardt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Zun
Fix Released
High
hongbin
kuryr-libnetwork
Fix Released
Undecided
hongbin

Bug Description

My queens installation of zun seems to take a long time to create a container and get to a running state. I ran two tests. One test is the python script at the end of this email.

I ran that three times and below are the number of seconds that it took:

17.9 seconds
15.1 seconds
17.2 seconds

The average is 16.7 seconds.

The second test used docker directly without using zun. For that test I ran:

time bash -c 'docker run --detach cirros sleep 999d; docker ps -a --format "{{.Image}} {{.Command}} {{.Status}}"'

The output included this line which shows that it was running:

cirros "sleep 999d" Up Less than a second

I ran that on my zun compute node which means my zun test and docker-alone test used the same virtual machine which means they both ran on the same hardware in order to make it a fair comparison.

I ran that three times and below are the number of seconds that it took:

0.85 seconds
0.92 seconds
0.88 seconds

The average is .88 seconds.

So zun takes 16.7 / .88 = 19 times as long as docker alone to create and reach running state.

Is my test a fair comparison? Perhaps not since zun does more than docker-alone. For example, zun needs time to decide which compute node to run a container on. It also might need time for other things like maybe transferring a docker image to the compute node. I'm willing to wait a reasonable amount of time for those important tasks, but I am still surprised that they would take an extra 15.8 seconds.

Although this test creates only one container, my end goal for using zun is to automatically create hundreds of containers which means this amount of overhead is a burden. Therefore, I would appreciate if you can tell me if there is anything I can do to make it faster or if there is something the zun developers can do to improve the performance of zun.

Thanks.

Below are the logs from one of my runs of zun which I got from this command:

journalctl -u docker -u kuryr-libnetwork -u zun-compute -n 20

The lines which occurred after the test start time are:

Dec 20 11:46:44 pdv-packstack-02-cmp-02.pdvcorp.com zun-compute[1955]: 2018-12-20 11:46:44.636 1955 INFO zun.image.driver [req-ea1699d6-0eba-49cc-a83b-b33243919db3 b47c6f17563544ed8a649dac3e7aa0c8 3d5b74606b3e4e32a20d1ef2a8d2f617 default - -] Loading container image driver 'docker'
Dec 20 11:46:45 pdv-packstack-02-cmp-02.pdvcorp.com zun-compute[1955]: 2018-12-20 11:46:45.176 1955 INFO zun.image.driver [req-ea1699d6-0eba-49cc-a83b-b33243919db3 b47c6f17563544ed8a649dac3e7aa0c8 3d5b74606b3e4e32a20d1ef2a8d2f617 default - -] Loading container image driver 'docker'
Dec 20 11:46:52 pdv-packstack-02-cmp-02.pdvcorp.com wait_for_identity_and_kuryr_readiness.py[922]: 2018-12-20 11:46:52.261 1369 INFO werkzeug [-] 127.0.0.1 - - [20/Dec/2018 11:46:52] "POST /IpamDriver.RequestAddress HTTP/1.1" 200 -
Dec 20 11:46:53 pdv-packstack-02-cmp-02.pdvcorp.com ovs-vsctl[16796]: ovs|00001|vsctl|INFO|Called as ovs-vsctl add-port br-int qvoce8b8aa4-9c -- set interface qvoce8b8aa4-9c external_ids:attached-mac=fa:16:3e:15:ad:23 external_ids:iface-id=ce8b8aa4-9c9f-4962-9dd9-7e80a6340cd1 external_ids:vm-id=2110e8cc858e676754eca058ca3053197f78377a9238874e84b4265f7e5e8450 external_ids:iface-status=active external_ids:owner=kuryr
Dec 20 11:46:56 pdv-packstack-02-cmp-02.pdvcorp.com wait_for_identity_and_kuryr_readiness.py[922]: 2018-12-20 11:46:56.922 1369 INFO werkzeug [-] 127.0.0.1 - - [20/Dec/2018 11:46:56] "POST /NetworkDriver.CreateEndpoint HTTP/1.1" 200 -
Dec 20 11:46:57 pdv-packstack-02-cmp-02.pdvcorp.com wait_for_identity_and_kuryr_readiness.py[922]: 2018-12-20 11:46:57.710 1369 INFO werkzeug [-] 127.0.0.1 - - [20/Dec/2018 11:46:57] "POST /NetworkDriver.Join HTTP/1.1" 200 -
Dec 20 11:46:57 pdv-packstack-02-cmp-02.pdvcorp.com wait_for_identity_and_kuryr_readiness.py[922]: 2018-12-20 11:46:57.910 1369 INFO werkzeug [-] 127.0.0.1 - - [20/Dec/2018 11:46:57] "POST /NetworkDriver.ProgramExternalConnectivity HTTP/1.1" 200 -
Dec 20 11:46:58 pdv-packstack-02-cmp-02.pdvcorp.com wait_for_identity_and_kuryr_readiness.py[922]: 2018-12-20 11:46:58.100 1369 INFO werkzeug [-] 127.0.0.1 - - [20/Dec/2018 11:46:58] "POST /NetworkDriver.EndpointOperInfo HTTP/1.1" 200 -
Dec 20 11:46:58 pdv-packstack-02-cmp-02.pdvcorp.com wait_for_identity_and_kuryr_readiness.py[922]: 2018-12-20 11:46:58.261 1369 INFO werkzeug [-] 127.0.0.1 - - [20/Dec/2018 11:46:58] "POST /NetworkDriver.EndpointOperInfo HTTP/1.1" 200 -
Dec 20 11:46:58 pdv-packstack-02-cmp-02.pdvcorp.com dockerd-current[1721]: time="2018-12-20T11:46:58-08:00" level=info msg="Firewalld running: false"
Dec 20 11:46:58 pdv-packstack-02-cmp-02.pdvcorp.com oci-umount[16936]: umounthook <debug>: prestart container_id:bd2a64404252 rootfs:/var/lib/docker/overlay2/53ebcfa1df234b4b1961284cfe01289fd5c6e53b756f23bf637d8bc02ff2789e/merged
Dec 20 11:46:58 pdv-packstack-02-cmp-02.pdvcorp.com zun-compute[1955]: 2018-12-20 11:46:58.838 1955 INFO zun.container.docker.driver [req-6b8b6b33-8db7-497a-86c2-33700744acb3 - - - - -] Status of container f799ab89-5a63-4fe7-89e6-9e305b16743e changed from Created to Running

Below is my script to time zun:

#!/usr/bin/env python

from keystoneauth1 import loading
from keystoneauth1 import session
from zunclient import client
import time

AUTH_URL='http://pdv-packstack-02:5000'
USERNAME='internal'
PASSWORD='foo'
VERSION='1.12' # master -> 1.latest, stable/rocky -> 1.26, stable/queens -> 1.12

zun = client.Client(VERSION, session=session.Session
                    (loading.get_plugin_loader('password') \
                     .load_from_options(auth_url=AUTH_URL,
                                        username=USERNAME,
                                        password=PASSWORD,
                                        user_domain_name='default')))

start_time = time.time()

print 'Requested creation of', \
    zun.containers.run(name='cirros0',
                       image_driver='docker',
                       image='cirros',
                       command=['sleep', '999d']).name

while zun.containers.get('cirros0').status != 'Running':
    pass

print 'seconds to start running =', time.time() - start_time

Revision history for this message
hongbin (hongbin034) wrote :

Short answer:

In Rocky, we landed an optimization for Kuryr-libnetwork: https://github.com/openstack/kuryr-libnetwork/commit/17db307e274035fbe65e99a29eec6638e6ec1e2d , which might significantly improve the performance. I can backport the fix to Queen so that you can give another try if you want.

Long answer:

Zun relies on Kuryr-libnetwork to provide the network connectivity. Kuryr-libnetwork is a Docker network plugin which is responsible for:

* Make API calls to Neutron to provision networking resources (i.e. security groups, ports).
* Do the port binding for the container. Generally speaking, it essentially creates a veth pair, plug one end to the Neutron-managed virtual switch and push the other end into the container.

If you are directly calling Docker API to create the containers, above steps are skipped so it is much faster.

We are trying to optimize Kuryr-libnetwork to cut the unnecessary API calls to Neutron. By looking forward, we are looking for an optimization in Neutron side: https://blueprints.launchpad.net/neutron/+spec/speed-up-neutron-bulk-creation so that we can batch-create hundreds of Neutron ports in a reasonable time. As a result, it will be much faster to create hundreds of containers.

At the meanwhile, I will look into other possible ways to optimize the performance further. This is the area we are working hard to improve.

Changed in zun:
assignee: nobody → hongbin (hongbin034)
Revision history for this message
Jacob Burckhardt (jburckhardt) wrote :

My supervisor said we probably won't keep using Queens much longer, so there is no need to backport. But we do appreciate that you offered.

I'm glad to hear about the future plans for even more improvements like the batch-creation of Neutron ports.

Thanks.

Revision history for this message
hongbin (hongbin034) wrote :

I ran your script in master branch and the result is as below:

9.46601200104 seconds
9.04494690895 seconds
9.10332393646 seconds

The average is about 9 seconds. The time is shorter than 16 seconds (possible due to this fix: https://github.com/openstack/kuryr-libnetwork/commit/17db307e274035fbe65e99a29eec6638e6ec1e2d ). I will look at how to further optimize the time in short term and long term.

Changed in zun:
status: New → Confirmed
importance: Undecided → High
hongbin (hongbin034)
Changed in kuryr-libnetwork:
assignee: nobody → hongbin (hongbin034)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kuryr-libnetwork (master)

Fix proposed to branch: master
Review: https://review.openstack.org/627054

Changed in kuryr-libnetwork:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/627057

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kuryr-libnetwork (master)

Reviewed: https://review.openstack.org/627054
Committed: https://git.openstack.org/cgit/openstack/kuryr-libnetwork/commit/?id=610fd5f02443f2b97e9d10557bc680868951dbb9
Submitter: Zuul
Branch: master

commit 610fd5f02443f2b97e9d10557bc680868951dbb9
Author: Hongbin Lu <email address hidden>
Date: Sat Dec 22 23:00:14 2018 +0000

    Remove unnecessary port update

    Kuryr update pre-created neutron port twice. The first one is in
    'ipam_request_address' and it updates the following attributes:
    * name
    * admin_state_up
    * mac_address

    The second port update is in 'network_driver_create_endpoint' and
    several attributes are written including name and mac_address.
    This commit remove the first port update to optimize the performance.
    The update of admin_state_up will be moved to the second port update.

    Change-Id: I743b2088366d910902775cabefa43be2865e37c5
    Partial-Bug: #1809306

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/627055
Committed: https://git.openstack.org/cgit/openstack/kuryr-libnetwork/commit/?id=dc3cf2cc93a36006f8c16450a3e11b146314258c
Submitter: Zuul
Branch: master

commit dc3cf2cc93a36006f8c16450a3e11b146314258c
Author: Hongbin Lu <email address hidden>
Date: Sat Dec 22 23:45:16 2018 +0000

    Tag resource if it was not tagged

    On adding a tag to a resource (port/subnet/subnetpool/network),
    if the resource already has the tag, skip the neutron API call to
    add_tag because the tags will be unchanged.
    This will avoid one or two Neutron API calls and optimize the
    performance a bit.

    Partial-Bug: #1809306
    Change-Id: I8a181d6af5228007d36fa8b217fec7f766dd37fd

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/627057
Committed: https://git.openstack.org/cgit/openstack/kuryr-libnetwork/commit/?id=b194b0732fd4c0ed7ecaf211de06952339b38e3e
Submitter: Zuul
Branch: master

commit b194b0732fd4c0ed7ecaf211de06952339b38e3e
Author: Hongbin Lu <email address hidden>
Date: Sun Dec 23 00:23:02 2018 +0000

    Untag resource if it was tagged

    On removing a tag to a resource (port/subnet/subnetpool/network),
    if the resource doesn't have the tag, skip the neutron API call to
    remove_tag because the tags will be unchanged.
    This will avoid one or two Neutron API calls and optimize the
    performance a bit.

    Partial-Bug: #1809306
    Change-Id: Iffa5fbf27c64f1aad0a9bb97da9c4e7c5015a196

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/627198
Committed: https://git.openstack.org/cgit/openstack/kuryr-libnetwork/commit/?id=5fb8275d1202d6ab8880e9fd021e22c59353ce4f
Submitter: Zuul
Branch: master

commit 5fb8275d1202d6ab8880e9fd021e22c59353ce4f
Author: Hongbin Lu <email address hidden>
Date: Mon Dec 24 23:35:29 2018 +0000

    Don't update existing port if no change

    This allows external service pre-create a neutron port with
    'binding:host_id' populated before passing the port to kuryr.
    This is important because setting 'binding:host_id' is slow.
    This blocks the docker daemon for serving other requests,
    which blocks the whole system.
    We want to have this field pre-populated so that the docker
    daemon won't be blocked by this API call.

    This commit also allows the 'device_owner' field to be
    customized if users choose to pre-populate the 'binding:host_id'
    field. In addition, this commit skips the API call to update the
    neutron port if the 'binding:host_id' is pre-populated
    and 'admin_state_up' and 'mac_address' fields are unchanged.

    Closes-Bug: #1809306
    Change-Id: I09c4a51410dffaec21ab0bb3db85df8c776c92e5

Changed in kuryr-libnetwork:
status: In Progress → Fix Released
Changed in zun:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to zun (master)

Reviewed: https://review.openstack.org/627199
Committed: https://git.openstack.org/cgit/openstack/zun/commit/?id=6023978ae7b9a7fdc960cb577ec3e1474d5677a8
Submitter: Zuul
Branch: master

commit 6023978ae7b9a7fdc960cb577ec3e1474d5677a8
Author: Hongbin Lu <email address hidden>
Date: Mon Dec 24 23:38:39 2018 +0000

    Set binding:host_id on creating/updating port

    This patch depends on a change on kuryr-libnetwork [1],
    which skip populating 'binding:host_id' if it is already
    populated. This commit leverages the kuryr's change and
    has the 'binding:host_id' filled.

    This effectively moves a neutron API call from kuryr
    to zun but it increases the performance since docker
    daemon are blocked if the slow API call is in kuryr side.

    [1] https://review.openstack.org/#/c/627198/

    Depends-On: https://review.openstack.org/#/c/627198/
    Partial-Bug: #1809306
    Change-Id: I67563ea069cafab16500bd8ba598743b7ae87fe6

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-libnetwork 3.0.0

This issue was fixed in the openstack/kuryr-libnetwork 3.0.0 release.

hongbin (hongbin034)
Changed in zun:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.