Comment 15 for bug 742916

Revision history for this message
Gary Poster (gary) wrote :

Thank you, Robert. That was more than I was hoping for!

The recursive solution is cool, but AFAICT it does not meaningfully solve the bug. It is 25.5 seconds from a cold cache on staging. The original bug description is concerned about a 23.2 second query on production, also from a cold cache.

This really is an interesting bug. For other timeouts, we are usually more interested in warm cache problems. This bug points out that we will not have zero OOPS until cold cache lookups are fast too. That's a much higher standard than we've had to deal with so far (and in fact a higher standard than I've ever used before, in any project, perhaps sadly). Another bug I just looked at, bug 787294, is "fixed" because a warm cache lookup is now under our thresholds. If we had applied a cold cache constraint to that query, there's no way it would have met the desired characteristics--and solving it would have needed much more thought and work, I suspect.

For this particular bug, it does look like we need a schema change to fix it.

I'll keep this bug in my back pocket for now, and maybe return to it later if no-one else has tackled it--and meanwhile I'll raise the point about hot cache and cold cache OOPSes on the list.