Race condition of post-commit ovn update

Bug #1605089 reported by Han Zhou
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
networking-ovn
Fix Released
Medium
Lucas Alvares Gomes

Bug Description

In ML2 mechanism driver, OVN updates are performed in post-commit, which is after neutron DB transaction. With multi-workers/nodes, the order of OVN updates are not guaranteed, which means the order of OVN updates can be different from that of DB transaction commit when there are racings between workers/nodes. This will result in inconsistent data between OVN and Neutron DB.

I had a simple test with below patch to make the problem easy to be reproduced (test port_update here).

diff --git a/networking_ovn/ml2/mech_driver.py b/networking_ovn/ml2/mech_driver.py
index e3e5050..277f487 100644
--- a/networking_ovn/ml2/mech_driver.py
+++ b/networking_ovn/ml2/mech_driver.py
@@ -11,7 +11,8 @@
 # License for the specific language governing permissions and limitations
 # under the License.
 #
-
+from time import sleep
+from random import randint
 import collections

 from neutron_lib.api import validators
@@ -568,6 +569,7 @@ class OVNMechanismDriver(driver_api.MechanismDriver):
         self._update_port_in_ovn(original_port, port, ovn_port_info)

     def _update_port_in_ovn(self, original_port, port, ovn_port_info):
+ time.sleep(random.randint(1, 3))
         external_ids = {
             ovn_const.OVN_PORT_NAME_EXT_ID_KEY: port['name']}
         admin_context = n_context.get_admin_context()

With this change I can easily get inconsistency between Neutron and OVN. Below is the example test for updating IPv4 address for same port at same time by 2 clients.

$ neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.8 & neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.10 &
[1] 29415
[2] 29416
$ Updated port: 182165b7-a634-430a-bbf0-f96783bca112

[2]+ Done neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.10
$ Updated port: 182165b7-a634-430a-bbf0-f96783bca112

[1]+ Done neutron port-update 182165b7-a634-430a-bbf0-f96783bca112 --fixed-ips type=dict list=true ip_address=10.0.0.8
$ neutron port-show 182165b7-a634-430a-bbf0-f96783bca112
+-----------------------+-------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------+
...
| fixed_ips | {"subnet_id": "d34a3b2f-cb0e- |
| | 417c-8125-5dfeb2d191a9", "ip_address": |
| | "10.0.0.8"} |
| | {"subnet_id": "1be4ecac- |
| | 2df5-4b18-a9b0-9ae1e191e721", |
| | "ip_address": |
| | "fd2a:66ba:3d0d:0:f816:3eff:fe8d:65"} |
| id | 182165b7-a634-430a-bbf0-f96783bca112 |
...
+-----------------------+-------------------------------------------+

$ ovn-nbctl lsp-get-addresses 182165b7-a634-430a-bbf0-f96783bca112
fa:16:3e:8d:00:65 10.0.0.10 fd2a:66ba:3d0d:0:f816:3eff:fe8d:65

We can see that in Neutron side the final result is IP "10.0.0.8", but in OVN north DB the final result is "10.0.0.10" for this port.

This race condition exists for most update operations that requires change to OVN side. It seems to be a general problem of neutron ML2 plugin (although the same problem existed on the old networking-ovn monolithic plugin).

Han Zhou (zhouhan)
description: updated
Changed in networking-ovn:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Richard Theis (rtheis) wrote :

I think this could impact create and delete operations as well since those are done in postcommit.

Revision history for this message
Numan Siddique (numansiddique) wrote :

@Han, @Richard - Few days before I sent an email to the ML with the subject - " Syncing neutron DB and OVN DB" [1]. Can async approach solves these kind of problems ? Do you have any comments/thoughts on this ?

[1] - http://lists.openstack.org/pipermail/openstack-dev/2016-July/099143.html

Revision history for this message
Amitabha Biswas (azbiswas) wrote :

Hi Numan, I replied to your proposal separately. If we were to use a journal thread to send CRUD to OVSDB, then either the sequence of queued operations would have to be correct or the queued operations must point to the correct neutron object before performing the CRUD operation. If the proposal can ensure that, then this particular problem should be solvable.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Numan, I just read your email and it is really good information. It basically implements Neutron API as async call, and queuing the request within DB transaction, and the ordering is preserved by the journal thread "lock" that is implemented with state PROCESSING plus DB transaction "with_for_update", with the help of some validation functions for dependency checking (e.g. same object cannot be updated by 2 journal threads at the same time).

The dependency problem mentioned by @Amitabha is valid but I think it can be handled, and from ODL ml2 code it is handled by the validation checks (e.g. if a network creation is pending then the port creation depending on it will wait).

However, I didn't figure out how errors are handled with this model. For example, a port is created in Neutron but ODL controller failed to create it although the journal thread successfully sent the request to ODL. And I didn't see how the port states (UP & DOWN) are handled (so does it mean it will just be UP from the beginning?) It would be great if anyone can help answer this question.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Richard, I think create and delete may not be a real issue because:

Create - before it complete and returns the client can't get the handle (uuid) thus would not initiate a conflicting operation for that object.

Delete - if the order is messed up with other operations, we would simply get an error from OVN IDL complaining that "object not exist", but won't result in inconsistent state between neutron and ovn.

Revision history for this message
Richard Theis (rtheis) wrote :

@Han, yes you are correct. Create and delete within ML2 post-commit should be OK. However, I think we have potential sync issues with the security group callbacks done post-commit. A post-commit failure could leave OVN and Neutron DB out-of-sync. Russell already has a TODO within the code for this. It seems like this also falls under this bug.

    def _process_sg_rule_notification(
            self, resource, event, trigger, **kwargs):
        sg_id = None
        sg_rule = None
        is_add_acl = True

        admin_context = n_context.get_admin_context()
        if resource == resources.SECURITY_GROUP:
            sg_id = kwargs.get('security_group_id')
        elif resource == resources.SECURITY_GROUP_RULE:
            if event == events.AFTER_CREATE:
                sg_rule = kwargs.get('security_group_rule')
                sg_id = sg_rule['security_group_id']
            elif event == events.BEFORE_DELETE:
                sg_rule = self._plugin.get_security_group_rule(
                    admin_context, kwargs.get('security_group_rule_id'))
                sg_id = sg_rule['security_group_id']
                is_add_acl = False

        # TODO(russellb) It's possible for Neutron and OVN to get out of sync
        # here. If updating ACls fails somehow, we're out of sync until another
        # change causes another refresh attempt.
        ovn_acl.update_acls_for_security_group(self._plugin,
                                               admin_context,
                                               self._nb_ovn,
                                               sg_id,
                                               rule=sg_rule,
                                               is_add_acl=is_add_acl)

Revision history for this message
Han Zhou (zhouhan) wrote :

@Richard, the kind of out-of-sync that can be solved by failure-rollback is a different problem. For this bug I just want to address the out-of-sync caused by race condition between multiple workers/nodes. Interestingly, if we can go with the async journal mechanism it may solve both problems.

Revision history for this message
Russell Bryant (russellb) wrote :

This discussion seems to have forked between this bug and mailing list. Can we just use the ML?

Revision history for this message
Richard Theis (rtheis) wrote :

@Russell, I will use the ML. Sorry about forking this discussion.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Russell, when reporting the race issue I didn't know the ODL journal discussion. Numan linked them because the journal mechanism could be a solution to the problem here. We'd discuss in ML for the journal mechanism. But if there are any other idea to solve this bug it is welcome to be proposed here.

Revision history for this message
Russell Bryant (russellb) wrote :

It seems like we should be able to turn this into a failure-rollback scenario, as well. If we're updating a resource, we know the old and new values from the Neutron DB point of view. We can use ovsdb verify() calls to ensure that the old values we expect are still there in the OVN database for an update to be successful. Does that make sense?

Revision history for this message
Han Zhou (zhouhan) wrote :

@Russell, the neutron DB change is in precommit, and the OVN change is in postcommit, so how can we get the old value in postcommit? We can save the old value in another table during precommit, but that would be more complex than the journal approach.

Revision history for this message
Russell Bryant (russellb) wrote :

In the postcommit callback, we have access to both the current and old versions of the data.

The race condition you demonstrate in the report is for updating a port. In update_port_postcommit() in the OVN mech driver we have:

     def update_port_postcommit(self, context):
         port = context.current
         original_port = context.original
         self.update_port(port, original_port)

"original_port" is the old values for the port. "port" is the new values.

My suggestion is:

1) Go through and make sure that "original_port" matches the current data we have in our copy of the OVN northbound database. If they don't match, we've caught a race condition where another worker has already completed a 2nd Neutron database transaction, the OVN db transaction, and this worker's copy of the OVN database received that update.

2) If #1 looks good, use the OVSDB verify() capability to ensure that our OVN transaction does not succeed if the current values in the OVN database do not match what we had in our local copy.

For your example in this report, let's assume the port started with an address of 10.0.0.1.

Worker 1)
a) Update neutron DB from 10.0.0.1 to 10.0.0.10
b) Update OVN db to 10.0.0.10

Worker 2)
x) Update neutron DB from 10.0.0.10 to 10.0.0.8
y) Update OVN db to 10.0.0.8

The order was a -> x -> y -> b.

In my proposed scheme, I think (b) would have failed and Neutron and OVN would remain in sync. Analyzing original_port vs port would indicate that there is a change to fixed IPs. Original_port would indicate that the old value was 10.0.0.1. If our local copy was already updated to indicate that the port has 10.0.0.8, we would not proceed with updating the OVN db. If our local copy had 10.0.0.1, we would use verify() to ensure that the value was still 10.0.0.1 when the transaction is committed.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Russell, forget my earlier comment about how to get the old value in postcommit :(

However, this doesn't solve the problem. In your example, y) would have failed, because worker 2) would try verify() with original_port value 10.0.0.10, but at that moment the local copy of OVN had 10.0.0.1 because b) had not happened yet.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

There is a simple pattern that solves the multi-update race at the cost of a duplicated DB read.

Have the OVN writer transmit the port to OVN backend. After that, read the latest state from the local DB, if it has changed, retransmit the port to the OVN backend.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Kevin, that duplicated DB read may solve the problem in the example here, but in some cases it is not enough by simply overwriting the old value. For example, an update from A to B triggers removal of A from a network device (by some mech-driver) and then add B. In the race scenario A does not exist on the device yet, so it could trigger some unnecessary validation error. I agree that adding more logic may solve that problem again, but I think it is more clean to have the task-flow mechanism.

Revision history for this message
Han Zhou (zhouhan) wrote :

In addition, for the duplicated DB read patter to work, the re-read and re-apply must happen as a loop until no new changes are detected, otherwise, race between the re-applies of multiple workers can still result in same problem.

For this bug in OVN (where there is no such thing like validation error in network devices that I imagined), this pattern may be enough to solve the race problems, but it might still be tricky in some corner cases such as ACL update handling.

Revision history for this message
Richard Theis (rtheis) wrote :

FYI: I've opened https://bugs.launchpad.net/networking-ovn/+bug/1607451 to track the race conditions in the security group and rule callbacks.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

No, it does not need to re-read an re-apply. There will always be the final process that got something committed to get the latest state to the server. It only needs to be done once because the worker is just ensuring *it* didn't send stale data.

Also, your requests to a backend need to consider idempotency, even if taskflow is adopted (because a task can die mid-sync). Each port update should not make assumptions about the current state of the port on the backend.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Kevin, think about this: 2 workers (w1 and w2) are checking neutron DB for same object after applying changes to OVN in postcommit. And at that moment there are 2 new updates happened: the first update change value to v1 before any of them recheck, and the second update change value to v2 after w1 recheck but before w2 recheck. So w1 will see v1 and w2 will see v2. Since there is no lock, w2 can apply change to OVN before w1, and w1 will overwrite it from v2 to v1. This is exactly same race problem. It won't be solved unless keep retrying to confirm there is no more change made to the object after its re-applying.

I agree with you about idempotency.

Han Zhou (zhouhan)
Changed in networking-ovn:
assignee: nobody → Han Zhou (zhouhan)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/349121

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/349121
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=1b0db6ea0e5b663f1feb0ba03734d33bbbeeda9b
Submitter: Jenkins
Branch: master

commit 1b0db6ea0e5b663f1feb0ba03734d33bbbeeda9b
Author: Han Zhou <email address hidden>
Date: Fri Jul 29 16:03:34 2016 -0800

    Avoid error when removing addresses from address set

    There are race conditions [1] that can trigger address removal from
    address set, current code will result in a python exception:

       ValueError: list.remove(x): x not in list.

    Even worse, if address set is left inconsistent with neutron DB
    because of [1], e.g. an IP assigned to a port in neutron but not yet
    reflected in OVN address set, then future updates of IPs to that
    port will always fail on OVN side due to the exception above.

    [1] https://bugs.launchpad.net/networking-ovn/+bug/1605089

    Related-Bug: 1605089

    Change-Id: Id002594a17f410aea3f6ada4ddabc418d64ca63d

Revision history for this message
Kevin Benton (kevinbenton) wrote :

@Han

Whatever triggered the update while the two workers were reading data will also trigger a backend update and re-read itself.

However, there is a problem where you can have a worker pause after it reads from the DB on the verify step, then another worker comes in, does an update, verifies the DB and returns. Then the first worker updates the backend. To solve that problem everyone would need to re-read until it converges.

However, a journal does not even help the above problem unless you serialize all updates and have some sort of election system so there is only one worker that can operate on a given object at a time.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Kevin,

> serialize all updates and have some sort of election system so there is only one worker that can operate on a given object at a time.

Exactly. And this is what I saw in the journal thread code of ODL plugin. I mentioned this in my earlier comment:

> the ordering is preserved by the journal thread "lock" that is implemented with state PROCESSING plus DB transaction "with_for_update", with the help of some validation functions for dependency checking (e.g. same object cannot be updated by 2 journal threads at the same time).

Revision history for this message
Richard Theis (rtheis) wrote :

I've opened https://bugs.launchpad.net/networking-ovn/+bug/1611852 for a similar bug that impacts the create and delete port flow.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

@Han,

'with_for_update' doesn't block on a mysql galera deployment across mysql nodes, you can't use that as a safe way to ensure multiple servers aren't writing to a backend.

Revision history for this message
Han Zhou (zhouhan) wrote :

@Kevin, let me explain more about the locking mechanism. For every object in the journal it has a state. Each thread trying to sync the object to backend must do an atomic check-and-update operation to update the state to "processing" ensure it is not yet in "processing" by other threads. 'with_for_update' is NOT to block the thread, but used to implement this atomic operation for updating the state field. If Galera implement the isolation levels correctly, then this should work. It means in Galera deployment, if there are 2 nodes trying to acquire the lock, i.e. updating the state of an object to "processing" in race, the later one doing commit will just fail and should give up handling that object.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/358447

Changed in networking-ovn:
status: Confirmed → In Progress
Revision history for this message
Kevin Benton (kevinbenton) wrote :

@Han, do you have a timeout mechanism or something to detect dead workers then?

Revision history for this message
Han Zhou (zhouhan) wrote :

@Kevin, yes, there is timeout mechanism to avoid being blocked by a record that is in processing by a dead worker.

BTW, in the patch I am working on I chose a less intrusive approach avoiding background workers, by just on-the-fly ordering enforcement in the same sync API call context. The task queue is still implemented in DB in the way we discussed above.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/358447
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

I'm gonna give this bug a go

Changed in networking-ovn:
assignee: Han Zhou (zhouhan) → Lucas Alvares Gomes (lucasagomes)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/462525

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/464221

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/464222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/462525

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/464221
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=9d8b07a7b5d5dbe9a76573d611117beede6875f9
Submitter: Jenkins
Branch: master

commit 9d8b07a7b5d5dbe9a76573d611117beede6875f9
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed May 10 14:51:10 2017 +0100

    Add OVNClient for Ports and L3 resources

    Right now we have code doing CRUD operations on ONV resources spread
    around different places (ml2/mech_driver.py, l3/l3_ovn.py), this makes
    things very complicated to implement something like journaling (see
    bug: #1605089) where all resources, L2 and L3, are created from a
    central place (the journal thread, which keeps things in order).

    This patch is adding a new class called OVNClient which centralizes all
    the CRUD operations for the OVN resources in one place.

    For this first iteraction, the Port resource from the mechanism driver
    and the L3 resources (floating ip, routers and routers ports) are being
    moved to this new class, others will come in follow up patches.

    Related-Bug: #1605089
    Change-Id: Ia453a14cdc2ec393037fb813caff8fae23acd7c3

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/461103
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=1e612f9c1ef8d863c3f9421f2914f366406688ec
Submitter: Jenkins
Branch: master

commit 1e612f9c1ef8d863c3f9421f2914f366406688ec
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Apr 27 11:34:22 2017 +0100

    Add base migration scripts

    This patch contains the skeleton for migration scripts which will later
    be used as base for introducing new database tables for networking-ovn.

    Partial-Bug: #1605089
    Change-Id: Ie5b62bdb40e46b1856ac67a9519dbbf417c05784

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/471317

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/471317
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=fdd576101e1f131eebb8aad2297932adc052d36e
Submitter: Jenkins
Branch: master

commit fdd576101e1f131eebb8aad2297932adc052d36e
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri Jun 2 18:17:20 2017 +0100

    Add network and subnet resources to OVNClient

    This patch is adding the network and subnet resources from the ml2
    driver into OVNClient so later in this series of patches they can make
    use of journaling.

    Related-Bug: #1605089
    Change-Id: I8477c246628711111c16f801c3dfad601eae9ba1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/475776

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/475777

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/468449
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=0cb2c4063e07402ab6dbadf2a27120990d55b268
Submitter: Jenkins
Branch: master

commit 0cb2c4063e07402ab6dbadf2a27120990d55b268
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri May 26 09:56:17 2017 -0400

    Add database migration tests

    Simple patch adding database migration tests for OVN.

    Partial-Bug: #1605089
    Change-Id: I70e6fb1d2dc1324c4f7d3c7640d99d102c7985e3

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/475776
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=2b476b7ada34f1305d34b6f141a1e6fcec30c660
Submitter: Jenkins
Branch: master

commit 2b476b7ada34f1305d34b6f141a1e6fcec30c660
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Jun 20 11:03:52 2017 +0100

    Add security groups and security group rules to OVNClient

    This patch is adding the security groups and security group rules
    resources to the OVNClient so later we can use it for journaling.

    Partial-Bug: #1605089
    Change-Id: I452819c95173ca9a653a7557fdd7eeac6e4047d2

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/490611

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/464222

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/475777
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=ded2e5cd08e5e11b142bbddc9ad3b3d75d1a8d03
Submitter: Jenkins
Branch: master

commit ded2e5cd08e5e11b142bbddc9ad3b3d75d1a8d03
Author: Lucas Alvares Gomes <email address hidden>
Date: Tue Jun 20 11:42:38 2017 +0100

    Add if_exist parameter to SetLRouterPortInLSwitchPortCommand

    Like in other commands this patch is adding a "if_exist" parameter to
    SetLRouterPortInLSwitchPortCommand. This will later be used by the
    journal recovery module.

    Partial-Bug: #1605089
    Change-Id: I093bc5982a71bd2d893a41a53194be3384b44088

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/490611

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/503703

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/503704

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/511268

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/511268
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=85c5f871a56bdaeaa3b8005b69765dea3f54ab59
Submitter: Zuul
Branch: master

commit 85c5f871a56bdaeaa3b8005b69765dea3f54ab59
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Oct 11 15:36:06 2017 +0100

    Simplify the L3 {create, update}_router() methods

    This patch is moving some of the logic from the {create,
    update}_router() methods of the L3 driver into the OVNClient. In
    summary, two things is being done in this patch:

    * Remove the "networks" parameter from the {create, update}_router()
      method in OVNClient. Now the list of network is fetched within these
      methods.

    * Remove the "delta" parameter from the update_router() method in
      OVNClient. We are already passing the new and old version of the
      router to that method, we know the differences we don't need to have
      an extra parameter for it.

    This patch is related to the work to that is being done at bug #1605089
    which uses the OVNClient to recovery the resources that are out of sync
    and by having OVNClient to handle the bulky work does simplify the
    synchronization work.

    Related-Bug: #1605089
    Change-Id: I8f0afb841e042f75e5062cca81717a3ecad17015

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/503703

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/503704

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/517918

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/520631

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/528746

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/529344
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=97f55707f3635bc88c7e01ffc8fc4b7fcb18b2cc
Submitter: Zuul
Branch: master

commit 97f55707f3635bc88c7e01ffc8fc4b7fcb18b2cc
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Dec 20 14:08:48 2017 +0000

    Refactor security groups

    This patch does two things:

    1. Delete the security group at the AFTER_DELETE notification level

    Prior to this patch we were using the BEFORE_DELETE notification level
    which can be problematic because if the deletion fails in Neutron we
    would have deleted it from OVN already.

    2. Remove security groups updates

    The only thing that the security group updates did was to update the
    external_ids of the Address_Set with its current name.

    That name wasn't used for anything in networking-ovn other than knowing
    whether the security group was created by networking-ovn or not.

    So, instead of having the name saved in the external_ids this patch
    changes it to the security group ID which is the canonical identifier of
    the resource and won't change.

    Partial-Bug: #1605089
    Change-Id: I5a8b64639d38e11050dae1008b14d48de14ecf94

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/527680
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=dd4d457411010878be0ec37accf106f5d62fe4a6
Submitter: Zuul
Branch: master

commit dd4d457411010878be0ec37accf106f5d62fe4a6
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri Dec 8 16:12:33 2017 +0000

    Refactor Floating IPs related methods

    This patch is bringing logic for handling floating ips out of the L3
    driver and into OVNClient.

    The code now uses the ``external_ids`` column in the NAT table from
    OVNDB to bind the FIP from Neutron DB with it (before we used a
    combination of external + logical ips to identify those entries).

    The values added in the ``external_ids`` are:

    * neutron:fip_id = The ID of the FIP in neutron

    * neutron:fip_port_id: This is used in the update_floatingips() method
      to compare and see whether the port_id of the FIP has changed in the
      update or not.

    * neutron:router_name: The name of the router which the FIP is associated
      with.

    The ovn_db_sync.py script was updated to make use the updated methods
    in OVNClient to create and delete floating IPs, that way we avoid the
    code duplication that existed between this script and the L3 driver.

    Partial-Bug: #1605089
    Change-Id: I21e9728c93392de4c579ea00c34059ad6812f682

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/530229

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/517049
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=290c3d83c5d27ed986d8c2c2f118a74f314bf963
Submitter: Zuul
Branch: master

commit 290c3d83c5d27ed986d8c2c2f118a74f314bf963
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Nov 1 16:24:16 2017 +0000

    Check for correctness when updating networks

    This patch is saving the revision number from the Neutron network object
    into OVNDB and using it for checking for correctness when updating the
    OVN resource.

    The "original_network" parameter has been removed from the
    update_network() methods. Since the stragey here is to drop patches with
    older revision numbers we won't be creating a delta between two neutron
    versions anymore. Instead, we will always make sure that OVN is sync'd
    with what is latest in Neutron.

    Partial-Bug: #1605089
    Change-Id: Ie817a559bd708f6601d7004187f0f088e8b09b47

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/517918
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=0d37e8bc9804072abed6253a50c32bf119d5a9af
Submitter: Zuul
Branch: master

commit 0d37e8bc9804072abed6253a50c32bf119d5a9af
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Nov 2 13:28:13 2017 +0000

    Optimize inconsistency detection (Part 1)

    Create the database model and migration scripts for saving the revision
    number of the OVN objects in the Neutron database.

    Update the methods related to the network resource to use the new table.

    Next patches in the series will use this new table to detect when the
    resource is out of sync and fix it.

    Related-Bug: #1605089
    Change-Id: Id14e9e8b88797acec11b72117c6985c49ea713c9

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/520631
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=247ed26470edf6ec3a7210ad427339e9d05164af
Submitter: Zuul
Branch: master

commit 247ed26470edf6ec3a7210ad427339e9d05164af
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Nov 9 16:09:57 2017 +0000

    Check for ports correctness

    This patch is updating networking-ovn to check for correctness when
    creating, updating or deleting ports.

    It does:

    * Create the deltas using the information stored in OVNNB instead of
    original_port parameter

    * Save the revision number from Neutron in the ovn_revision_number table
    and in OVNDB

    * Expand the maintenance periodic task to detect and fix inconsistencies
    for ports

    Note that DNS records still can get out of sync but should be fixed by
    https://bugs.launchpad.net/networking-ovn/+bug/1739257

    Partial-Bug: #1605089
    Change-Id: I1b2366743d76e93c8a2b19c06bcba10a3a29c7f6

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/532183
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=849659a26ba70981636d8d1a0303a35f97620aca
Submitter: Zuul
Branch: master

commit 849659a26ba70981636d8d1a0303a35f97620aca
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Jan 3 10:12:21 2018 +0000

    Refactor Routers

    This patch is refactoring the routers resource to adhere with the
    database sync specification.

    The update_router() method from OVNClient has been modified to not
    depend on the "original_router" parameter anymore. A layer to make it
    backward compatible with existing routers and OVS version < 2.8.2 was
    added as well.

    Partial-Bug: #1605089
    Change-Id: I1744c91004fc36e0b97ebc51230f876eaaa3bd7f

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/531033
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=c2e6038fa17cb2cbfb095320827a59eb681112e8
Submitter: Zuul
Branch: master

commit c2e6038fa17cb2cbfb095320827a59eb681112e8
Author: Daniel Alvarez <email address hidden>
Date: Thu Jan 4 00:22:30 2018 +0100

    Check for sg_rules correctness

    This patch is updating networking-ovn to check for correctness when
    creating or deleting security group rules.

    This patch also allows to insert duplicate entries in ACL table in
    case that two indentical rules belong to different SGs. Each acl will
    reference to its own SG rule in the external_ids column so that we
    can ensure consistency across Neutron and OVN objects. The main
    drawback is that duplicated acls will make ovn-northd insert duplicate
    lflows in SB database which, in turn, makes ovn-controller drop the
    flows when it's processing the logical flows and log INFO messages. To
    overcome this, I have sent a patch [0] to core OVN so that
    ovn-controller logs those traces as DBG instead and reduce noise.
    Please see the references in the commit message at [0] and the
    discussion around this.

    Partial-Bug: #1605089

    [0] https://github.com/openvswitch/ovs/commit/5905b28f1abff1a295ea37654d887154b0dc2bc0

    Change-Id: Ie2659ecb84193d58d35ced6b8fb0b89fc03cf6e7
    Signed-off-by: Daniel Alvarez <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/532576
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=4946547a3ea552bf5848adadd6896e7fa9b715e2
Submitter: Zuul
Branch: master

commit 4946547a3ea552bf5848adadd6896e7fa9b715e2
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Jan 10 15:46:17 2018 +0000

    Check for routers correctness

    This patch is updating networking-ovn to check for correctness when
    creating, updating or deleting routers.

    Partial-Bug: #1605089
    Change-Id: Ida7e6100071a1e484927ca9638702e438918937a

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/535771

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/528746
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=8fd9411b8a787c3a2d638937228a138eeec96f9e
Submitter: Zuul
Branch: master

commit 8fd9411b8a787c3a2d638937228a138eeec96f9e
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Dec 14 13:20:06 2017 +0000

    Check for floating ips correctness

    This patch is updating networking-ovn to check for correctness when
    creating, updating or deleting floating ips.

    Partial-Bug: #1605089
    Change-Id: I377007c955809b8d56af93e24f0914e446f56bb2

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/530229
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=82a770933215b1dbeeea90f2bbe41a6abc97cba3
Submitter: Zuul
Branch: master

commit 82a770933215b1dbeeea90f2bbe41a6abc97cba3
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Dec 27 10:09:25 2017 +0000

    Check for subnets correctness

    This patch is updating networking-ovn to check for correctness when
    creating, updating or deleting subnets.

    Change-Id: I619efda775471e51bf70a63b987abf719aae995f
    Partial-Bug: #1605089

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/535771
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=3fed2a9eeebca621013bebaeb8a41d6a7e5c67d9
Submitter: Zuul
Branch: master

commit 3fed2a9eeebca621013bebaeb8a41d6a7e5c67d9
Author: Lucas Alvares Gomes <email address hidden>
Date: Fri Jan 19 13:04:08 2018 +0000

    Maintenance task: Ordering resources by type

    When fixing the creation or update of a resource, it's better to sort it
    by its type starting with the root resources and leaving the leaf ones
    at the end. For example, in case a network and a port failed to be
    created, if the maintenance thread tries to create the port prior to
    creating the network it will fail.

    The same could happens to deletion, it's preferable to delete the leaf
    resources prior to deleting the root ones to avoid any conflict.

    This patch is adding such sorting mechanism to the maintenance.py database
    module that is used by the maintenance task when listing the resources
    that needs to be fixed.

    Change-Id: Ibc8312bf70b221626e389ea68bfa4c69f6a2cb67
    Partial-Bug: #1605089

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/536407
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=dc84c23a868e81d5d0df30c34de8b3904f1945e6
Submitter: Zuul
Branch: master

commit dc84c23a868e81d5d0df30c34de8b3904f1945e6
Author: Lucas Alvares Gomes <email address hidden>
Date: Mon Jan 22 12:45:41 2018 +0000

    Maintenance: Avoid code duplication

    This patch is refactoring the common/maintenance.py module to avoid code
    duplication when fixing resources.

    Related-Bug: #1605089
    Change-Id: Iacf73e4c8a5b5effad890888244745c98527991d

Changed in networking-ovn:
milestone: none → 2015.1.1
milestone: 2015.1.1 → none
Revision history for this message
Lucas Alvares Gomes (lucasagomes) wrote :

Closing this since the Port Groups and the database synchronization work are already completed.

Changed in networking-ovn:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.