Race conditions in security group and rule callbacks

Bug #1607451 reported by Richard Theis
Affects: networking-ovn
Status: Won't Fix
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

This bug is loosely related to https://bugs.launchpad.net/networking-ovn/+bug/1605089.

The OVN ML2 driver registers callbacks for security group and security group rule changes. These callbacks have potential race conditions, described below and partially noted in the code by the following TODO:

    # TODO(russellb) It's possible for Neutron and OVN to get out of sync
    # here. If updating ACls fails somehow, we're out of sync until another
    # change causes another refresh attempt.

Race condition #1 between X and Y:
1) X: Create security group 'foo' and associated OVN address sets.
2) X: Delete OVN address sets as part of deleting the security group.
3) Y: Update port to use the security group.
4) X: Delete the security group from neutron DB.

Step 4 fails because the security group is now in use. The security group therefore remains in the neutron DB and is used by a port, but its OVN address sets no longer exist. I was able to reproduce this failure by adding a sleep after the OVN address set delete.

                 elif event == events.BEFORE_DELETE:
                     txn.add(self._nb_ovn.delete_address_set(
                             name=utils.ovn_addrset_name(sg['id'], ip_version)))
    +        if event == events.BEFORE_DELETE:
    +            import time
    +            time.sleep(15)

Handling the delete failure via an ABORT callback may help solve this race condition. A similar race condition may impact deleting a security group rule.
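Race condition #1 and the abort-style handling suggested above can be illustrated with a small self-contained model. This is only a sketch: `FakeNeutronDB`, `FakeOvnNB`, `delete_security_group` and `run` are hypothetical stand-ins, not networking-ovn or neutron code, and the interleaving with Y is forced through a hook instead of a real second thread.

```python
class SecurityGroupInUse(Exception):
    """Raised when a security group still has ports bound to it."""


class FakeNeutronDB:
    """Hypothetical stand-in for the neutron DB."""

    def __init__(self):
        self.security_groups = set()
        self.port_bindings = {}  # port name -> security group id

    def delete_security_group(self, sg_id):
        if sg_id in self.port_bindings.values():
            raise SecurityGroupInUse(sg_id)
        self.security_groups.discard(sg_id)


class FakeOvnNB:
    """Hypothetical stand-in for the OVN northbound DB."""

    def __init__(self):
        self.address_sets = set()  # (security group id, ip version)


def delete_security_group(db, ovn, sg_id, between_steps=None, rollback=False):
    """Delete the OVN address sets first, then the neutron DB row."""
    removed = {name for name in ovn.address_sets if name[0] == sg_id}
    ovn.address_sets -= removed          # step 2: X deletes address sets
    if between_steps:
        between_steps()                  # step 3: Y updates the port
    try:
        db.delete_security_group(sg_id)  # step 4: X deletes from the DB
    except SecurityGroupInUse:
        if rollback:
            ovn.address_sets |= removed  # abort handling: restore them
        return False
    return True


def run(rollback):
    db, ovn = FakeNeutronDB(), FakeOvnNB()
    db.security_groups.add("foo")        # step 1
    ovn.address_sets |= {("foo", "ip4"), ("foo", "ip6")}

    def y_updates_port():
        db.port_bindings["bar"] = "foo"

    deleted = delete_security_group(db, ovn, "foo", y_updates_port, rollback)
    return deleted, sorted(ovn.address_sets)
```

In this model, `run(rollback=False)` leaves the security group in use with no address sets, while `run(rollback=True)` restores them after the failed delete. Whether the real driver can restore state this simply depends on what the abort event carries, which this sketch does not model.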

Race condition #2 between X and Y:
1) X: Create security group 'foo' and associated OVN address sets.
2) Y: Create port 'bar' that uses the security group.
3) X: Add security group rule to the security group.
4) X: Build list of ACLs to add for the security group ports.
5) Y: Update port 'bar' to no longer use the security group.
6) X: Add ACLs for the new security group rule.

Step 6 adds ACLs to port 'bar' that are no longer correct, because the ACL list built in step 4 is stale by the time it is applied. I was able to reproduce this scenario by adding a sleep before the OVN ACL update.

    +import time
    +time.sleep(15)
     ovn.update_acls(list(lswitch_names),
                     iter(port_list),
                     acl_new_values_dict,

Note: A similar race condition likely exists if step 5 were changed to
a port create that uses the security group, a port delete, and/or a security group delete.
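The stale port list in race condition #2 can be shown with a toy model. The names below (`FakeAclTable`, `add_rule_stale`, `add_rule_rechecked`, `demo`) are hypothetical and are not part of networking-ovn. The second variant re-checks membership when the ACLs are applied, which is one possible mitigation and not the fix adopted by the project.

```python
class FakeAclTable:
    """Hypothetical stand-in for the ACLs stored in OVN."""

    def __init__(self):
        self.acls = {}  # port name -> list of ACL match strings


def add_rule_stale(sg_ports, ovn, rule, between_steps=None):
    """Snapshot the port list, then apply ACLs without re-checking."""
    port_list = list(sg_ports)      # step 4: X builds the list
    if between_steps:
        between_steps()             # step 5: Y removes the port
    for port in port_list:          # step 6: X adds ACLs
        ovn.acls.setdefault(port, []).append(rule)


def add_rule_rechecked(sg_ports, ovn, rule, between_steps=None):
    """Same flow, but skip ports that left the group in the meantime."""
    port_list = list(sg_ports)
    if between_steps:
        between_steps()
    for port in port_list:
        if port in sg_ports:        # re-validate at apply time
            ovn.acls.setdefault(port, []).append(rule)


def demo(add_rule):
    sg_ports = {"bar"}              # ports using security group 'foo'
    ovn = FakeAclTable()
    add_rule(sg_ports, ovn, "tcp.dst == 22",
             lambda: sg_ports.discard("bar"))
    return ovn.acls
```

Here `demo(add_rule_stale)` leaves an ACL on 'bar' even though the port no longer uses the group, and `demo(add_rule_rechecked)` leaves none. Re-checking only narrows the window; closing it would need the membership check and the ACL update to happen under a common lock or transaction.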

Race condition #3 between X and Y:
This is similar to race condition #2 but for deleting a security group rule.

The solution for this bug may resolve out-of-sync conditions which could occur if OVN is unable to complete the ACL updates for a security group rule create.
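One generic way to recover from out-of-sync state is to compare what neutron expects with what OVN holds and compute the difference. The following is a minimal sketch of that idea under the assumption that both sides can be reduced to a mapping of port name to a set of ACL strings; `plan_acl_sync` is a hypothetical helper, not the sync support referenced elsewhere in this report.

```python
def plan_acl_sync(desired, actual):
    """Return (to_add, to_remove) so that `actual` converges on `desired`.

    Both arguments map a port name to a set of ACL match strings.
    """
    to_add, to_remove = {}, {}
    for port in desired.keys() | actual.keys():
        want = set(desired.get(port, ()))
        have = set(actual.get(port, ()))
        if want - have:
            to_add[port] = want - have      # missing in OVN
        if have - want:
            to_remove[port] = have - want   # stale in OVN
    return to_add, to_remove
```

For example, with `desired = {"bar": {"a"}}` and `actual = {"bar": {"a", "stale"}}`, the plan removes `"stale"` from 'bar' and adds nothing.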

Richard Theis (rtheis)
description: updated
Changed in networking-ovn:
status: New → In Progress
description: updated
Richard Theis (rtheis)
Changed in networking-ovn:
status: In Progress → Confirmed
Richard Theis (rtheis)
description: updated
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/347507
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=64a0faff01ac5d06aa290783f988935492e452c9
Submitter: Jenkins
Branch: master

commit 64a0faff01ac5d06aa290783f988935492e452c9
Author: Richard Theis <email address hidden>
Date: Tue Jul 26 11:59:12 2016 -0500

    Fail address set update if doesn't exist on port create

    Fail an address set update on port create if the address set does not
    exist. This helps prevent OVN address sets from getting further
    out-of-sync since this failure will prevent the port from being
    created in the neutron DB.

    Port update was not changed because there are general race conditions
    that need to be handled by [1]. For now, port update should do as much
    as it can since a failure won't undo neutron DB changes.

    The functional tests needed to be temporarily updated while the
    OVN address set sync support is in progress [2].

    Other OVN address set and ACL race conditions will be handled by [3].

    [1] https://bugs.launchpad.net/bugs/1605089
    [2] https://review.openstack.org/#/c/341882
    [3] https://bugs.launchpad.net/bugs/1607451

    Change-Id: Ib79231ad347ab76812240de44a6cbf7464ea0c74
    Related-Bug: #1560817
    Related-Bug: #1607451

Changed in networking-ovn:
importance: Undecided → High
Lucas Alvares Gomes (lucasagomes) wrote :

Marking as WONTFIX since we no longer rely on address sets after port groups were introduced in core OVN and networking-ovn.

Changed in networking-ovn:
status: Confirmed → Won't Fix