Comment 16 for bug 1007027

Mike Bayer (zzzeek) wrote :

I'm not sure which approach we want to take here, though I lean towards leaving the "ping" in. It is not foolproof, but until we agree on an approach that wraps all API methods in a retry decorator, it does reduce the chance of an error when the DB has been restarted and a new API method starts up before the application has had the chance to hit the database.

Let me point out a critical detail that is not mentioned here or in the linked blog post: this is a ping per *transaction*, not per execution. When using Session.begin(), the connection is checked out from the pool at that point, the ping happens, and that's it.
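To make the per-transaction point concrete, here is a minimal sketch using only the standard library. It is not the oslo.db or SQLAlchemy implementation: `PingingPool` and its `begin()` method are hypothetical names, and sqlite3 stands in for a real database server. The only thing it is meant to show is that the "SELECT 1" is issued at checkout, so several statements inside one transaction cost one ping.

```python
import sqlite3
from contextlib import contextmanager


class PingingPool:
    """Toy single-connection 'pool' (hypothetical) that pings once per checkout."""

    def __init__(self, path=":memory:"):
        self._path = path
        self._conn = sqlite3.connect(path)
        self.ping_count = 0

    def _ping(self):
        # One cheap round trip; replace the connection if it turns out to be dead.
        self.ping_count += 1
        try:
            self._conn.execute("SELECT 1").fetchone()
        except sqlite3.Error:
            self._conn = sqlite3.connect(self._path)

    @contextmanager
    def begin(self):
        # The checkout happens here, so the ping happens here, once.
        self._ping()
        try:
            yield self._conn
            self._conn.commit()
        except Exception:
            self._conn.rollback()
            raise


pool = PingingPool()
with pool.begin() as conn:
    # Four statements, one transaction, one ping.
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.execute("INSERT INTO t VALUES (2)")
    rows = conn.execute("SELECT x FROM t ORDER BY x").fetchall()

print(pool.ping_count, rows)
```

The cost of the ping therefore scales with the number of transactions an API call opens, not with the number of statements it runs.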

Now with Nova and others, there is a major architectural issue: in many cases they call get_session() on a per-function-call basis, and because of the way EngineFacade is designed right now, some API calls end up running multiple transactions on multiple simultaneous connections. This is the reason we had this error: https://bugs.launchpad.net/oslo.db/+bug/1367354. So in my view, fixing the "SELECT 1" performance degradation *is* something that will require code changes to Nova, but I'd like to propose a rework of EngineFacade first, one that provides a consistent pattern which hopefully can apply to all consuming projects. Neutron in particular seems to have a very different pattern, and I'm going to try to come up with something there as well.
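A rough illustration of the per-function-call problem, again with the standard library only. The `get_session()` here is a hypothetical stand-in that hands out a fresh sqlite3 connection on every call, and `create_instance()` and `get_quota()` are made-up function names; the real Nova and EngineFacade code is more involved.

```python
import sqlite3

opened = []  # every connection handed out, tracked for illustration only


def get_session():
    # Hypothetical stand-in: each call hands out a brand-new connection.
    conn = sqlite3.connect(":memory:")
    opened.append(conn)
    return conn


def get_quota():
    session = get_session()  # second connection, second transaction
    try:
        return session.execute("SELECT 10").fetchone()[0]
    finally:
        session.close()


def create_instance():
    session = get_session()  # first connection
    try:
        session.execute("SELECT 1").fetchone()
        # The helper runs while the first connection is still checked out.
        return get_quota()
    finally:
        session.close()


quota = create_instance()
print(len(opened))  # connections used by a single API call
```

With a ping at each checkout, an API call structured like this pays for the ping once per helper rather than once per call.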

The pattern I'm thinking of is probably along the lines of a decorator, so in theory the retry-method-on-disconnect could happen there as well (it will be able to track nesting of calls as well). IMHO a "SELECT 1" that is done only per-transaction, within a pattern where the scope of transactions is transparently managed via function-level declarations, is not a major performance issue, though if we agree upon transparent retry for all API methods then we can remove it. Transparent retry is not without issues, though: the API method may also do something that is not atomically part of the transaction, like writing a file or starting a job, and a retry would repeat it. I'm not sure to what degree we have to worry about that.
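One possible shape for such a decorator, as a sketch only. Everything here is hypothetical: `transactional`, `DisconnectError`, and the thread-local depth counter are illustrative names, not a proposed oslo.db API, and no real database is involved. It shows the two ideas from the paragraph above: nested calls join the outermost scope, and only the outermost call retries on disconnect.

```python
import functools
import threading


class DisconnectError(Exception):
    """Hypothetical stand-in for a driver's 'connection lost' error."""


_state = threading.local()


def transactional(retries=1):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            depth = getattr(_state, "depth", 0)
            if depth > 0:
                # Nested call: join the enclosing scope, no retry here.
                _state.depth = depth + 1
                try:
                    return fn(*args, **kwargs)
                finally:
                    _state.depth = depth
            # Outermost call: owns the transaction scope and the retry loop.
            attempt = 0
            while True:
                _state.depth = 1
                try:
                    return fn(*args, **kwargs)
                except DisconnectError:
                    attempt += 1
                    if attempt > retries:
                        raise
                finally:
                    _state.depth = 0
        return wrapper
    return decorate


calls = {"outer": 0, "inner": 0}
side_effects = []


@transactional()
def inner():
    calls["inner"] += 1
    if calls["inner"] == 1:
        raise DisconnectError("server restarted")  # simulated failure
    return "ok"


@transactional()
def outer():
    calls["outer"] += 1
    side_effects.append("file written")  # not part of the transaction
    return inner()


result = outer()
print(result, calls, side_effects)
```

In this run the whole of `outer()` executes twice, so the non-transactional side effect is recorded twice. That is the caveat with transparent retry: anything the method does outside the transaction is repeated unless it is made idempotent or moved out of the retried scope.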