neutron

Bug #1552680
Comment #5

John Schwarz (jschwarz) wrote on 2016-03-11:

I'm not interested in locking every router operation, since it's counterproductive in more than one way (in addition to what you wrote, doing so will eventually become a severe bottleneck that we'd like to avoid). I am interested in locking specific codepaths that we know to be racy, such as the schedule() function. Since not every router create/update/delete calls schedule(), this mitigates the bottleneck concern a bit while keeping the lowest common denominator (i.e. where we create HA networks and ports and where we bind a router to an agent) safe.
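
To illustrate the kind of selective locking I mean, here is a minimal sketch, not actual Neutron code. RouterScheduler, named_lock() and rename() are hypothetical names, and an in-process threading.Lock stands in for a real distributed lock, so this only protects threads within one process:

```python
import threading
from collections import defaultdict

# Stand-in for a distributed lock manager: one in-process lock per name.
# (Assumption: a real deployment would need a lock shared across servers.)
_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def named_lock(name):
    """Return the lock registered under `name`, creating it on first use."""
    with _locks_guard:
        return _locks[name]


class RouterScheduler:
    """Hypothetical scheduler; only the racy path takes a lock."""

    def __init__(self):
        self.ha_networks = {}  # tenant_id -> HA network id
        self.bindings = {}     # router_id -> agent_id
        self.names = {}        # router_id -> display name

    def schedule(self, tenant_id, router_id, agent_id):
        # Racy path: check-then-create of the HA network, then the binding.
        # The whole sequence runs under one lock, scoped per tenant.
        with named_lock("ha-schedule-%s" % tenant_id):
            if tenant_id not in self.ha_networks:
                self.ha_networks[tenant_id] = "ha-net-%s" % tenant_id
            # The first caller wins the binding; later callers see it.
            return self.bindings.setdefault(router_id, agent_id)

    def rename(self, router_id, name):
        # Does not go through schedule(), so it never waits on the lock.
        self.names[router_id] = name
```

Operations like rename() are unaffected, which is why the lock should not turn into a global bottleneck.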

Sure, a lock only guarantees safety if it is used properly, and starting to use locks will necessarily mean changing our thinking to make sure that everything that is racy and difficult to solve without locks is locked (i.e. not all of Neutron). But consider the following:
a. We probably all agree that locking huge chunks of codepaths is a bad idea, so we can simply avoid it in favor of the above.
b. What we are doing now (i.e. solving races with retries, etc.) makes the code a lot more complicated and is subject to the same problems: a part of the code that changes behavior in a racy way we are not aware of is practically the same as locking the problematic code and forgetting one operation, since we are not aware of that one either.

With that in mind, I strongly believe that while both approaches fail for "forgotten paths", the paths we do remember will be a lot safer with a DLM (distributed lock manager) in place, since it *should* block the entire logic instead of just retrying specific things inside it.
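
The difference between the two approaches can be sketched roughly as follows. This is an illustration under stated assumptions, not Neutron code: FakeDB, DuplicateEntry and both ensure_* functions are hypothetical, and threading.Lock again stands in for a DLM lock:

```python
import threading


class DuplicateEntry(Exception):
    """Simulates a unique-constraint violation from the database."""


class FakeDB:
    """Toy table where each single statement is atomic, but sequences are not."""

    def __init__(self):
        self._rows = {}
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            return self._rows.get(key)

    def insert(self, key, value):
        with self._guard:
            if key in self._rows:
                raise DuplicateEntry(key)
            self._rows[key] = value


def ensure_ha_network_with_retry(db, tenant_id, attempts=3):
    # Retry approach: every step that can lose a race needs its own handling.
    for _ in range(attempts):
        existing = db.get(tenant_id)
        if existing is not None:
            return existing
        network = "ha-net-%s" % tenant_id
        try:
            db.insert(tenant_id, network)
            return network
        except DuplicateEntry:
            continue  # another worker won the race; loop and re-read
    raise RuntimeError("could not ensure HA network for %s" % tenant_id)


ha_lock = threading.Lock()  # stand-in for a DLM lock


def ensure_ha_network_with_lock(db, tenant_id):
    # Lock approach: the whole check-then-create sequence is blocked at once,
    # so no per-step conflict handling is needed inside it.
    with ha_lock:
        existing = db.get(tenant_id)
        if existing is None:
            existing = "ha-net-%s" % tenant_id
            db.insert(tenant_id, existing)
        return existing
```

Both versions end up with a single HA network per tenant, but the retry version has to anticipate each conflict individually, and that grows with every extra step in the logic.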