Comment 4 for bug 1552680

Kevin Benton (kevinbenton) wrote :

Copying the comment I left on the patch here for the RFE discussion:

Many of the races we have been dealing with are not necessarily caused by multiple schedulers but are instead caused by concurrent creates, updates, and deletes of routers. For locking to be an effective strategy, we would need to take the lock in every operation that can mutate the contended state (i.e. basically every router operation).

A lock only guarantees that everyone else attempting to acquire the same lock will not be executing at the same time. If just one operation doesn't acquire the lock before mutating state, every other operation is now at risk of breaking in a difficult-to-debug way (because the thing that breaks won't actually be the thing with the bug).
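
To make that concrete, here is a minimal sketch in plain Python threading. It is not Neutron code: the names (router, router_lock, locked_add_port, unlocked_add_port) are hypothetical, and the events exist only to force the unlucky interleaving so the outcome is repeatable. One operation takes the lock correctly, the other forgets to, and an update is silently lost even though the locking code itself has no bug.

```python
import threading

# Hypothetical shared state and the lock that is supposed to guard it.
router = {"ports": 0}
router_lock = threading.Lock()

# Events used only to force a specific interleaving for the demo.
read_done = threading.Event()
write_done = threading.Event()


def locked_add_port():
    """Well-behaved operation: holds the lock for its read-modify-write."""
    with router_lock:
        seen = router["ports"]
        read_done.set()              # let the other thread run now
        write_done.wait(timeout=5)   # wait until it has mutated the state
        router["ports"] = seen + 1   # writes back a stale value


def unlocked_add_port():
    """Buggy operation: mutates the same state without taking the lock."""
    read_done.wait(timeout=5)
    router["ports"] += 1             # nothing stops this; the lock is advisory
    write_done.set()


threads = [
    threading.Thread(target=locked_add_port),
    threading.Thread(target=unlocked_add_port),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two ports were added, but only one is recorded.
lost_update = router["ports"] != 2
print(router["ports"], lost_update)
```

Holding the lock did nothing to protect the state here, because the lock only excludes code that also asks for it.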

In other words, using locks changes our strategy from being defensive to making assumptions about everything that might change an object (e.g. other service plugins, core plugins, ML2 drivers, extensions, etc.). This isn't something we can adopt lightly. It needs to be a fundamental shift in the way we allow everything (even out-of-tree things) to modify state.