Comment 2 for bug 2013187

Paul Goins (vultaire) wrote:

I've been looking at this a bit.

In one particular environment where we're hitting this, I see we're doing a substring match on a table containing 5,714 records, checking against 196,834 different deleted instance UUIDs, for a worst case of 5,714 × 196,834 = 1,124,709,476 substring comparisons. That is insane.

I'm hoping to spend some time on my side experimenting (since I happen to like these types of SQL issues), but there are a few things we can try in the queries from the gist:

* We can reduce the load by running in a loop against subsets of the deleted instance UUIDs, e.g. 1,000 at a time. That cuts the cost of any single query down by orders of magnitude, and seems like the lowest-risk path to making the scripts feasible to run regularly for cleanup purposes. (A rough sketch of this batching follows after this list.)

* We can consider dropping particularly old instance records from the nova instances table. If we create and destroy instances all the time, that table grows without bound and makes the problem progressively worse unless we trim old records. That said, I'm not sure what else references these stale instance UUIDs - the Gnocchi DB is one case where it's obvious we should leverage the deleted instance UUIDs to identify unneeded data, but there may be other cases. I'm not enough of an expert here to assert whether this is a good path to go down.
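
To illustrate the batching idea from the first bullet, here's a rough Python sketch. This is not the actual gist code: the "resource" table and "original_resource_id" column are my assumptions about the Gnocchi schema, and "cursor" is any DB-API cursor using %s-style placeholders (e.g. PyMySQL).

    import itertools

    BATCH_SIZE = 1000

    def chunks(seq, size):
        # Yield successive lists of at most `size` items from seq.
        it = iter(seq)
        while True:
            chunk = list(itertools.islice(it, size))
            if not chunk:
                return
            yield chunk

    def find_stale_resources(cursor, deleted_uuids):
        # Match Gnocchi resources whose original_resource_id embeds a
        # deleted instance UUID, BATCH_SIZE UUIDs per query instead of
        # all ~196k at once.
        stale = []
        for batch in chunks(deleted_uuids, BATCH_SIZE):
            where = " OR ".join(
                "original_resource_id LIKE %s" for _ in batch)
            params = ["%{}%".format(u) for u in batch]
            cursor.execute(
                "SELECT id, original_resource_id"
                " FROM resource WHERE " + where,
                params)
            stale.extend(cursor.fetchall())
        return stale

Each query still does substring matching, but the per-query pattern count is bounded, so the DB can finish each one in reasonable time and the loop can be throttled or resumed.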
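
On the second bullet: if trimming old instance records turns out to be acceptable, we probably shouldn't hand-roll deletes; Nova ships tooling for this. Something along these lines (standard nova-manage subcommands, though I haven't verified behaviour on this deployment, and the date is just an example) first moves soft-deleted rows into the shadow tables and then purges them:

    nova-manage db archive_deleted_rows --before "2023-01-01" --until-complete
    nova-manage db purge --before "2023-01-01"

Whether that's safe here comes back to the same open question about what else still references those UUIDs.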