Comment 13 for bug 1605089

Russell Bryant (russellb) wrote :

In the postcommit callback, we have access to both the current and old versions of the data.

The race condition you demonstrate in the report is for updating a port. In update_port_postcommit() in the OVN mech driver we have:

     def update_port_postcommit(self, context):
         port = context.current
         original_port = context.original
         self.update_port(port, original_port)

"original_port" holds the old values for the port. "port" holds the new values.

My suggestion is:

1) Check that "original_port" matches the data currently in our copy of the OVN northbound database. If they don't match, we've caught a race condition: another worker has already completed a second Neutron database transaction and the corresponding OVN db transaction, and this worker's copy of the OVN database has received that update.

2) If #1 looks good, use the OVSDB verify() capability to ensure that our OVN transaction does not succeed if the current values in the OVN database do not match what we had in our local copy.
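
To make the verify() idea in step 2 concrete, here is a toy model of the semantics. It is only a sketch: ToyServer, ToyTransaction and VerifyFailed are hypothetical stand-ins written for this illustration, not the real OVSDB IDL or ovsdbapp API, and how the real library reports a failed verification is not modelled here.

```python
# Toy model only: these classes are hypothetical stand-ins for illustration,
# not the ovs.db.idl or ovsdbapp API.

class VerifyFailed(Exception):
    """Raised when a verified column changed before commit."""


class ToyServer:
    """Authoritative database: maps (row, column) -> value."""

    def __init__(self, data):
        self.data = dict(data)


class ToyTransaction:
    def __init__(self, server, local_copy):
        self.server = server
        self.local_copy = local_copy  # this worker's possibly stale replica
        self.expected = {}            # prerequisites recorded by verify()
        self.writes = {}

    def verify(self, row, column):
        # Record the value our decision was based on.
        self.expected[(row, column)] = self.local_copy[(row, column)]

    def write(self, row, column, value):
        self.writes[(row, column)] = value

    def commit(self):
        # All or nothing: any changed prerequisite aborts the transaction.
        for key, value in self.expected.items():
            if self.server.data[key] != value:
                raise VerifyFailed(key)
        self.server.data.update(self.writes)


key = ("port1", "addresses")
server = ToyServer({key: "10.0.0.1"})
local_copy = dict(server.data)      # replica taken before the other write
server.data[key] = "10.0.0.8"       # another worker commits first

txn = ToyTransaction(server, local_copy)
txn.verify(*key)
txn.write(*key, "10.0.0.10")
try:
    txn.commit()
    outcome = "committed"
except VerifyFailed:
    outcome = "aborted"
```

In this model the stale writer ends with outcome "aborted" and the server keeps the other worker's value, which is the behaviour step 2 relies on.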

For your example in this report, let's assume the port started with an address of 10.0.0.1.

Worker 1)
a) Update neutron DB from 10.0.0.1 to 10.0.0.10
b) Update OVN db to 10.0.0.10

Worker 2)
x) Update neutron DB from 10.0.0.10 to 10.0.0.8
y) Update OVN db to 10.0.0.8

The order was a -> x -> y -> b.

In my proposed scheme, I think (b) would have failed and Neutron and OVN would remain in sync. Comparing original_port with port would show that the fixed IPs changed, and original_port would show that the old value was 10.0.0.1. If our local copy had already been updated to show that the port has 10.0.0.8, we would not proceed with updating the OVN db. If our local copy still had 10.0.0.1, we would use verify() to ensure that the value was still 10.0.0.1 when the transaction is committed.
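
The two outcomes for step (b) described above can be traced with a small sketch. It takes as given the premise in the text that (y) has already written 10.0.0.8 to OVN. The helper step_b and the dict-based "server" are hypothetical, and the verify() check is modelled as a plain compare at commit time rather than a real OVSDB call.

```python
def step_b(original_ip, new_ip, local_copy_ip, server):
    """Worker 1's OVN update under the proposed scheme (hypothetical helper)."""
    # Step 1: the old Neutron value must match our local copy of the OVN db.
    if local_copy_ip != original_ip:
        return "skipped"
    # Step 2: verify()-style check, modelled as a compare at commit time.
    if server["ip"] != local_copy_ip:
        return "aborted"
    server["ip"] = new_ip
    return "committed"


# Premise from the text: (y) has already written 10.0.0.8 to the OVN db.
server = {"ip": "10.0.0.8"}

# Case 1: worker 1's local copy has already received the 10.0.0.8 update.
case_seen = step_b("10.0.0.1", "10.0.0.10", "10.0.0.8", server)

# Case 2: worker 1's local copy is stale and still shows 10.0.0.1.
case_stale = step_b("10.0.0.1", "10.0.0.10", "10.0.0.1", server)

# Baseline with no concurrent writer: the update goes through.
quiet_server = {"ip": "10.0.0.1"}
case_quiet = step_b("10.0.0.1", "10.0.0.10", "10.0.0.1", quiet_server)
```

In both racing cases the OVN value stays at 10.0.0.8, matching the Neutron DB after (x); only the uncontended baseline commits 10.0.0.10.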