Zun

Race condition on creating docker network

Bug #1743498 reported by hongbin
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Zun
Fix Released
High
hongbin

Bug Description

Currently, we create docker networks lazily. At runtime, we check whether the network already exists; if not, we create it (see _get_or_create_docker_network in zun/container/docker/driver.py).

This has a race condition:

1. Request A checks for a network with name N but doesn't find one
2. Request B checks for a network with name N but doesn't find one
3. Request A creates a network with name N
4. Request B creates another network with name N

As a result, two duplicate networks are created, which causes failures in the system. Such failures have occasionally happened in the gate:

  http://logs.openstack.org/75/532775/2/check/zunclient-devstack-docker-sql/b75ae7e/logs/screen-zun-compute.txt.gz#_Jan_12_12_09_01_797249

To resolve the problem, there are several options. One option is to record each docker network in the database and impose a unique constraint on the name to avoid duplication. Other options may be available; choose the optimal solution if any.
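The unique-constraint option could be sketched as follows. This is only an illustration of the idea, not Zun's actual schema or code: the shared database is modeled with sqlite3, and the table and callback names are hypothetical.

```python
import sqlite3

# A database shared by all compute nodes (modeled here with sqlite3).
# The UNIQUE constraint on name is what prevents the race.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE network (name TEXT UNIQUE)")

def get_or_create_network(name, create_docker_network):
    """Insert the DB record first; only the request that wins the
    unique-constraint race goes on to create the docker network."""
    try:
        with conn:
            conn.execute("INSERT INTO network (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # Another request already recorded (and is creating) this network.
        return "existing"
    create_docker_network(name)
    return "created"
```

With this shape, two concurrent requests for the same name can no longer both create the network: whichever request inserts second hits the constraint and backs off.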

hongbin (hongbin034)
Changed in zun:
importance: Undecided → High
status: New → Triaged
Rajat Sharma (tajar29)
Changed in zun:
assignee: nobody → Rajat Sharma (tajar29)
Revision history for this message
KS Kim (kiseok7) wrote :

I think 'oslo_concurrency.lockutils' could help with this problem.
Using the lockutils.synchronized decorator, we can make sure the function is called by only one thread at a time.
(Other threads will wait until the lock is released.)

The code could look like this:

    from oslo_concurrency import lockutils

    @lockutils.synchronized('get_or_create_docker_network')
    def _get_or_create_docker_network(self, context, network_api,
                                      neutron_net_id):
        ...

Revision history for this message
hongbin (hongbin034) wrote :

Hi @Kiseok,

The approach you suggested will work if lockutils supports locking across multiple nodes. Docker networks span multiple compute nodes, so we need the locking to span all nodes as well. I am not sure whether lockutils satisfies this requirement. If it does not, perhaps lockutils could be enhanced to support it? If it does, this approach sounds good.

Revision history for this message
KS Kim (kiseok7) wrote :

@hongbin,

I missed the multi-compute-node case.
'oslo_concurrency.lockutils' uses a local lock file,
so this approach doesn't work across multiple compute nodes.
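For cross-node locking, OpenStack projects typically reach for a distributed lock manager such as tooz, or fall back on a lock table in the shared database, since an atomic INSERT against a primary key behaves like a mutex that all nodes see. A minimal sketch of the lock-table idea, with the shared database modeled by sqlite3 and all names illustrative:

```python
import sqlite3

# A lock table in a database reachable from every compute node
# (modeled with sqlite3). Acquiring the lock is an atomic INSERT;
# only one node's insert can succeed thanks to the PRIMARY KEY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lock (name TEXT PRIMARY KEY)")

def try_acquire(name):
    """Return True if this node won the lock, False otherwise."""
    try:
        with conn:
            conn.execute("INSERT INTO lock (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False

def release(name):
    with conn:
        conn.execute("DELETE FROM lock WHERE name = ?", (name,))
```

A real deployment would also need lock expiry (a node can die while holding the lock), which is exactly the kind of detail a DLM library handles for you.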

Rajat Sharma (tajar29)
Changed in zun:
assignee: Rajat Sharma (tajar29) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to zun (master)

Fix proposed to branch: master
Review: https://review.openstack.org/570693

Changed in zun:
assignee: nobody → hongbin (hongbin034)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to zun (master)

Reviewed: https://review.openstack.org/570693
Committed: https://git.openstack.org/cgit/openstack/zun/commit/?id=574bbc1acd56e8096e931248df33c96c0e50e069
Submitter: Zuul
Branch: master

commit 574bbc1acd56e8096e931248df33c96c0e50e069
Author: Hongbin Lu <email address hidden>
Date: Sun May 27 21:45:15 2018 +0000

    Prevent race condition on creating network

    We leverage the unique constraint on the network table to guarantee
    that at most one docker network can be created for one neutron
    network. In particular, this patch moves the creation of the network
    object from the API layer to the kuryr network driver, in which the
    docker network is created. This ensures that each call to create a
    docker network is preceded by the creation of a network object and
    therefore passes the unique constraint imposed by the DB layer.

    In addition, this patch also adds a destroy_network implementation
    in the objects and DB layers. We clean up the DB record if the
    network creation fails in docker.

    Change-Id: I9507e90452c5af4cf1a64a045a07a9e63200b2b4
    Closes-Bug: #1743498
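The pattern the commit describes, record first, create second, clean up on failure, could be sketched like this. Again this is illustrative only: the shared database is modeled with sqlite3, and the table and function names are hypothetical stand-ins for Zun's objects/DB layer and the kuryr driver.

```python
import sqlite3

# Shared DB with the unique constraint described in the commit
# (modeled with sqlite3; names are illustrative, not Zun's schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE network (neutron_net_id TEXT UNIQUE)")

def create_network(neutron_net_id, docker_create):
    """Record the network first (the unique constraint stops duplicates),
    then create the docker network; roll the record back on failure."""
    with conn:
        conn.execute("INSERT INTO network (neutron_net_id) VALUES (?)",
                     (neutron_net_id,))
    try:
        docker_create(neutron_net_id)
    except Exception:
        # destroy_network analogue: remove the DB record so a later
        # request can retry the docker network creation.
        with conn:
            conn.execute("DELETE FROM network WHERE neutron_net_id = ?",
                         (neutron_net_id,))
        raise
```

The cleanup step matters: without it, a failed docker call would leave a DB record behind that permanently blocks every retry via the unique constraint.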

Changed in zun:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/zun 3.0.0.0rc1

This issue was fixed in the openstack/zun 3.0.0.0rc1 release candidate.
