Race conditions in security group and rule callbacks
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
networking-ovn | Won't Fix | High | Unassigned |
Bug Description
This bug is loosely related to https:/
The OVN ML2 driver registers callbacks for security group and security group rule changes. These callbacks have potential race conditions, highlighted below and partially noted in the code by the following TODO:
# TODO(russellb) It's possible for Neutron and OVN to get out of sync
# here. If updating ACls fails somehow, we're out of sync until another
# change causes another refresh attempt.
Race condition #1 between concurrent requests X and Y:
1) X: Create security group 'foo' and associated OVN address sets.
2) X: Delete OVN address sets as part of deleting the security group.
3) Y: Update port to use the security group.
4) X: Delete the security group from neutron DB.
Step 4 fails because the security group is now in use, but the OVN address sets deleted in step 2 are not restored, so the in-use security group is left with no OVN address sets. I was able to recreate this failure by adding a sleep after the OVN address set delete.
+ if event == events.
+ import time
+ time.sleep(15)
Handling the delete failure via an ABORT callback may help solve this race condition. A similar race condition may impact deleting a security group rule.
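The interleaving in race condition #1, and the effect of undoing the OVN change when the delete is aborted, can be sketched with plain in-memory stand-ins. This is only an illustration under stated assumptions: `FakeNeutronDB`, `address_set_names`, `delete_security_group`, and the `restore_on_abort` flag are hypothetical names, not networking-ovn or Neutron APIs, and the address set naming is made up.

```python
# Hypothetical in-memory stand-ins; none of these names come from networking-ovn.
class SecurityGroupInUse(Exception):
    """Raised when a security group that still has ports is deleted."""


class FakeNeutronDB:
    def __init__(self):
        self.security_groups = set()
        self.port_groups = {}  # port name -> set of security group names

    def delete_security_group(self, sg):
        if any(sg in groups for groups in self.port_groups.values()):
            raise SecurityGroupInUse(sg)
        self.security_groups.discard(sg)


def address_set_names(sg):
    # One address set per IP version (illustrative naming only).
    return {f"as_ip4_{sg}", f"as_ip6_{sg}"}


def delete_security_group(db, ovn_sets, sg, request_y, restore_on_abort):
    """Request X: delete the OVN address sets first, then the Neutron DB row."""
    ovn_sets.difference_update(address_set_names(sg))   # step 2
    request_y()                                         # step 3 runs in between
    try:
        db.delete_security_group(sg)                    # step 4
    except SecurityGroupInUse:
        if restore_on_abort:
            # Abort-style handling: undo the OVN side of the failed delete.
            ovn_sets.update(address_set_names(sg))
        raise


def run_race1(restore_on_abort):
    db, ovn_sets = FakeNeutronDB(), set()
    db.security_groups.add("foo")                       # step 1
    ovn_sets.update(address_set_names("foo"))

    def port_update():                                  # request Y
        db.port_groups["bar"] = {"foo"}

    try:
        delete_security_group(db, ovn_sets, "foo", port_update, restore_on_abort)
    except SecurityGroupInUse:
        pass
    # (group still in Neutron DB?, address sets still in OVN?)
    return "foo" in db.security_groups, address_set_names("foo") <= ovn_sets
```

Without the restore step the group survives in the Neutron DB while its address sets are gone; with it, both sides stay consistent after the failed delete. A real fix would also have to cope with the restore itself failing.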
Race condition #2 between X and Y:
1) X: Create security group 'foo' and associated OVN address sets.
2) Y: Create port 'bar' that uses the security group.
3) X: Add security group rule to the security group.
4) X: Build list of ACLs to add for the security group ports.
5) Y: Update port 'bar' to no longer use the security group.
6) X: Add ACLs for the new security group rule.
Step 6 adds ACLs for the new rule to port 'bar' even though the port no longer uses the security group, so the port ends up with incorrect ACLs. I was able to recreate this scenario by adding a sleep before the OVN ACL update.
+ import time
+ time.sleep(15)
ovn.
Note: A similar race condition likely exists if step 5 were instead a port create that uses the security group, a port delete, and/or a security group delete.
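Race condition #2 is a stale-snapshot problem: X computes the port list in step 4 and applies it in step 6 after Y has changed group membership. The sketch below reproduces that ordering deterministically with a thread for Y, and shows one possible mitigation, serializing both requests on a shared lock. All names (`FakeState`, `add_rule`, `remove_port_from_group`) are hypothetical, and an in-process lock is an assumption made for illustration; it would not cover multiple Neutron server processes and is not presented as the project's fix.

```python
import threading


class FakeState:
    """Hypothetical stand-in for Neutron security group membership and OVN ACLs."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sg_ports = {"foo": {"bar"}}   # security group -> member ports
        self.port_acls = {"bar": []}       # port -> list of (group, rule) ACLs


def remove_port_from_group(state, sg, port):
    """Request Y: drop the port from the group and remove that group's ACLs."""
    with state.lock:
        state.sg_ports[sg].discard(port)
        state.port_acls[port] = [a for a in state.port_acls[port] if a[0] != sg]


def add_rule(state, sg, rule, request_y, serialize):
    """Request X: add a rule's ACLs to every port in the group."""
    if serialize:
        state.lock.acquire()
    try:
        ports = set(state.sg_ports[sg])              # step 4: snapshot of ports
        request_y.start()                            # step 5 begins
        if not serialize:
            request_y.join()                         # Y finishes before step 6
        for port in ports:                           # step 6: uses the snapshot
            state.port_acls[port].append((sg, rule))
    finally:
        if serialize:
            state.lock.release()
    request_y.join()                                 # with the lock, Y runs after X


def run_race2(serialize):
    state = FakeState()
    y = threading.Thread(target=remove_port_from_group, args=(state, "foo", "bar"))
    add_rule(state, "foo", "allow tcp/22", y, serialize)
    return state.port_acls["bar"]
```

In the unserialized run, port 'bar' keeps an ACL for a group it no longer belongs to. In the serialized run, Y waits for X to finish and then cleans up the ACLs it finds, so the port ends with none.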
Race condition #3 between X and Y:
This is similar to race condition #2 but for deleting a security group rule.
The solution for this bug may resolve out-of-sync conditions which could occur if OVN is unable to complete the ACL updates for a security group rule create.
description: updated
Changed in networking-ovn:
status: New → In Progress
description: updated
Changed in networking-ovn:
status: In Progress → Confirmed
description: updated
Changed in networking-ovn:
importance: Undecided → High
Reviewed: https://review.openstack.org/347507
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=64a0faff01ac5d06aa290783f988935492e452c9
Submitter: Jenkins
Branch: master
commit 64a0faff01ac5d06aa290783f988935492e452c9
Author: Richard Theis <email address hidden>
Date: Tue Jul 26 11:59:12 2016 -0500
Fail address set update if doesn't exist on port create
Fail an address set update on port create if the address set does not
exist. This helps prevent OVN address sets from getting further
out-of-sync since this failure will prevent the port from being
created in the neutron DB.
Port update was not changed because there are general race conditions
that need to be handled by [1]. For now, port update should do as much
as it can since a failure won't undo neutron DB changes.
The functional tests needed to be temporarily updated while the
OVN address set sync support is in progress [2].
Other OVN address set and ACL race conditions will be handled by [3].
[1] https://bugs.launchpad.net/bugs/1605089
[2] https://review.openstack.org/#/c/341882
[3] https://bugs.launchpad.net/bugs/1607451
Change-Id: Ib79231ad347ab76812240de44a6cbf7464ea0c74
Related-Bug: #1560817
Related-Bug: #1607451
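The behavior described in the commit message above, strict on port create and best-effort on port update, can be illustrated with a small in-memory sketch. It is not taken from the networking-ovn source: `AddressSetMissing`, `add_addresses`, `create_port`, `update_port`, and the `if_exists` flag are hypothetical names chosen for this example.

```python
class AddressSetMissing(Exception):
    """Hypothetical error for an update against a missing address set."""


def add_addresses(address_sets, name, addresses, if_exists):
    """Add addresses to a named set; skip or fail when the set is missing."""
    if name not in address_sets:
        if if_exists:
            return False                 # tolerate the missing set
        raise AddressSetMissing(name)    # strict mode: report the mismatch
    address_sets[name].update(addresses)
    return True


def create_port(db_ports, address_sets, port, set_name, ip):
    # Strict: if this raises, the port is never recorded in the fake DB.
    add_addresses(address_sets, set_name, {ip}, if_exists=False)
    db_ports.add(port)


def update_port(db_ports, address_sets, port, set_name, ip):
    # Best effort: the DB change is already made and would not be undone.
    db_ports.add(port)
    return add_addresses(address_sets, set_name, {ip}, if_exists=True)


def demo():
    db_ports, address_sets = set(), {}   # no address sets exist in "OVN"
    try:
        create_port(db_ports, address_sets, "bar", "as_ip4_foo", "10.0.0.5")
        created = True
    except AddressSetMissing:
        created = False
    updated = update_port(db_ports, address_sets, "baz", "as_ip4_foo", "10.0.0.6")
    return created, "bar" in db_ports, updated, "baz" in db_ports
```

Here the create fails and leaves no port behind, while the update records the port and simply skips the missing address set.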