Comment 4 for bug 1499647

Assaf Muller (amuller) wrote : Re: L3 HA: extra L3HARouterAgentPortBinding created for routers

You're right that during the create_router flow we add bindings with l3_agent=None, and only populate that field later during router scheduling. As stated earlier, the other place that adds bindings (with l3_agent already populated) is the router RPC sync. What I think is happening here is that the RPC sync call adds a binding (with l3_agent already populated), and then the create_router flow adds one binding too many. In that case the unique constraint addition is not needed: you can't have (right?) two racing RPC calls coming from the same agent. The only issue we have to solve is making sure that create_router adds the appropriate number of bindings; it cannot assume that no bindings exist yet.
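To make the "one too many bindings" scenario concrete, here is a minimal in-memory sketch. All names (FakeDB, HABinding, rpc_sync_add_binding, create_router_bindings, max_agents) are hypothetical and are not Neutron's actual code or API; max_agents simply stands in for however many bindings the router should end up with. The naive version assumes no bindings exist, while the fixed version only tops up the missing ones. Note that this alone does not make the count-then-insert step atomic across processes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HABinding:
    router_id: str
    l3_agent: Optional[str] = None  # None until the scheduler assigns an agent


@dataclass
class FakeDB:
    """Hypothetical stand-in for the bindings table."""
    bindings: List[HABinding] = field(default_factory=list)

    def bindings_for(self, router_id: str) -> List[HABinding]:
        return [b for b in self.bindings if b.router_id == router_id]


def rpc_sync_add_binding(db: FakeDB, router_id: str, agent: str) -> None:
    """RPC sync path: adds a binding with l3_agent already populated."""
    if not any(b.l3_agent == agent for b in db.bindings_for(router_id)):
        db.bindings.append(HABinding(router_id, agent))


def naive_create_router_bindings(db: FakeDB, router_id: str, max_agents: int) -> int:
    """Assumes no bindings exist yet, so it overshoots after an RPC sync."""
    for _ in range(max_agents):
        db.bindings.append(HABinding(router_id))
    return max_agents


def create_router_bindings(db: FakeDB, router_id: str, max_agents: int) -> int:
    """Tops up to max_agents, counting bindings the RPC sync may have added."""
    missing = max(0, max_agents - len(db.bindings_for(router_id)))
    for _ in range(missing):
        db.bindings.append(HABinding(router_id))
    return missing
```

For example, after `rpc_sync_add_binding(db, "r1", "agent-a")`, calling `create_router_bindings(db, "r1", 3)` adds only 2 unscheduled bindings, leaving 3 in total, whereas the naive version would leave 4.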

Having said all that, I'm not sure this solution is even correct. What if the create_router flow adds the base router DB object and then the RPC sync call comes in? At that point the HA router doesn't exist yet, and the router's VRID is not set either. The RPC call will add a binding, and when it tries to create an HA port it will fail because the HA network doesn't exist yet. Alternatively, create_router may have created the HA network in time but not yet set the VRID. In that case the RPC call will most likely succeed, but the agent will fail to configure the router because the VRID field is empty. This is ugly!
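The interleavings described above can be enumerated with a small simulation. This is a hypothetical model, not Neutron code: create_router_steps is a generator that pauses after each separately committed step, and rpc_sync (with binding bookkeeping omitted) inspects whatever state is visible at that moment.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class Router:
    id: str
    ha_network: Optional[str] = None  # filled in by a later create_router step
    vrid: Optional[int] = None        # filled in by a later create_router step


def create_router_steps(router: Router) -> Iterator[str]:
    """Non-atomic create_router: pauses after each step becomes visible."""
    yield "base router row committed"
    router.ha_network = "ha-net"  # hypothetical network name
    yield "HA network created"
    router.vrid = 1
    yield "VRID allocated"


def rpc_sync(router: Router) -> str:
    """RPC sync arriving mid-create; only the HA port / VRID checks are modeled."""
    if router.ha_network is None:
        return "error: HA port creation fails, no HA network"
    if router.vrid is None:
        return "port created, but agent cannot configure router: empty VRID"
    return "ok"


def explore_interleavings(router_id: str = "r1") -> List[Tuple[str, str]]:
    """Run one RPC sync after each create_router step and record the outcome."""
    router = Router(router_id)
    return [(step, rpc_sync(router)) for step in create_router_steps(router)]
```

Iterating over `explore_interleavings()` shows one outcome per step: the sync fails outright after the first step, succeeds but leaves an unconfigurable router after the second, and only works once the VRID is set.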

The simpler and more robust solution is to make the HA router's create_router method atomic: put everything except the notification in a single transaction. The problem is that we use the core plugin to create ports and networks, and those calls can involve HTTP and RPC calls that cannot simply be wrapped in a DB transaction.
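A rough sketch of the atomic approach and its catch, using Python's built-in sqlite3 as a stand-in for the Neutron DB. The table layout and the helper core_plugin_create_network are hypothetical. The `with conn:` block commits on success and rolls back if an exception escapes, and the notification is sent only after commit. The external_calls list models an HTTP/RPC side effect made by the core plugin: the rollback removes the DB rows but cannot undo it.

```python
import sqlite3
from typing import List, Tuple

external_calls: List[Tuple[str, str]] = []  # stand-in for HTTP/RPC side effects
notifications: List[str] = []               # stand-in for agent notifications


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE routers (id TEXT PRIMARY KEY, vrid INTEGER)")
    conn.execute("CREATE TABLE networks (router_id TEXT)")
    return conn


def core_plugin_create_network(conn: sqlite3.Connection, router_id: str) -> None:
    """Hypothetical core-plugin call: a DB write plus an external side effect."""
    conn.execute("INSERT INTO networks (router_id) VALUES (?)", (router_id,))
    external_calls.append(("create_network", router_id))  # not undone by rollback


def create_ha_router_atomic(conn: sqlite3.Connection, router_id: str,
                            fail_before_vrid: bool = False) -> bool:
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO routers (id, vrid) VALUES (?, NULL)",
                         (router_id,))
            core_plugin_create_network(conn, router_id)
            if fail_before_vrid:
                raise RuntimeError("VRID allocation failed")
            conn.execute("UPDATE routers SET vrid = 1 WHERE id = ?", (router_id,))
    except RuntimeError:
        return False
    notifications.append(router_id)  # notification sent only after commit
    return True
```

If the VRID step fails, no router or network rows survive, yet `external_calls` still records the network creation: exactly the kind of partial state a DB transaction alone cannot clean up.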

I'm not sure what the right solution is here. Thoughts?