Comment 3 for bug 640065

Robert Collins (lifeless) wrote : Re: [Bug 640065] Re: appserver deployment must not interrupt live requests

On Mon, Sep 27, 2010 at 9:34 PM, Tom Haddon <email address hidden> wrote:
> I'm a little confused here - how on earth can we be having connections
> that take so long? It seems like this is largely recorded as non-SQL
> time, so what is it?

Two places:

Firstly, the 'legitimate' ones: proxied librarian files are handed off
to the appserver event loop and served (fairly) efficiently, if you
ignore the whole copy-to-a-tempfile thing. They are dependent on
client performance, and can be huge (openoffice debs, for instance).
Deploying the restricted-librarian-to-public will fix that. It's
pending QA as a high-sev RT.

Secondly, the timeout code for requests works by raising an error when
something checks 'is there time remaining'. Anything that does not
check will not time out. We don't [yet] try to inject errors into the
thread mid-request, we let the threads cooperate.

Currently the following things check for timeouts:
 - google webservice lookups (at the start)
 - DB requests
 - possibly a couple of other things
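The cooperative scheme above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Launchpad's actual timeout code: `set_request_deadline`, `check_timeout`, `RequestTimeoutError`, `run_db_query` and `busy_python_loop` are hypothetical names, and keeping the per-request deadline in a thread-local is an assumed storage scheme. The point it shows is that only code which calls the checkpoint can ever time out.

```python
import threading
import time


class RequestTimeoutError(Exception):
    """Hypothetical error raised when a request exceeds its time budget."""


# Each request thread keeps its own deadline (assumed storage scheme).
_request_state = threading.local()


def set_request_deadline(budget_seconds):
    """Start the clock for the request running on the current thread."""
    _request_state.deadline = time.monotonic() + budget_seconds


def check_timeout():
    """Cooperative checkpoint: raises only if called after the deadline."""
    deadline = getattr(_request_state, "deadline", None)
    if deadline is not None and time.monotonic() > deadline:
        raise RequestTimeoutError("request exceeded its time budget")


def run_db_query(sql):
    # Hypothetical DB call: like the real checkers, check before doing work.
    check_timeout()
    return "rows for %r" % (sql,)


def busy_python_loop(n):
    # No checkpoint anywhere in here, so this never times out by itself.
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    set_request_deadline(0.05)
    print(run_db_query("SELECT 1"))   # still within budget
    busy_python_loop(10)
    time.sleep(0.1)                   # the uncooperative part overruns
    try:
        run_db_query("SELECT 2")
    except RequestTimeoutError as e:
        print("timed out at the next checkpoint:", e)
```

The overrun is only noticed when the next `run_db_query` reaches `check_timeout()`; if the request never makes another checked call, it simply keeps running.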

I plan to put many more things into that set, but doing so will
instantly transform those pages into hard errors, so I am biding my
time.

Note too that a page which does a 2 second query and spends 60 seconds
in a bad python loop simply won't time out, so we're not going to
permanently fix this until we can inject an exception into the thread
(and have moved the restricted librarian stuff off).
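For the inject-an-exception approach, CPython does expose a C-level hook, `PyThreadState_SetAsyncExc`, reachable through `ctypes.pythonapi`. The sketch below is only an assumption about how it could be used, not something the appserver does today: `inject_exception`, `bad_python_loop`, `run_demo` and `RequestTimeoutError` are hypothetical names, and it is CPython-specific. The pending exception is raised only when the target thread next executes Python bytecode, so a thread blocked inside C code (a socket read, a database call) will not see it until that call returns.

```python
import ctypes
import threading
import time


class RequestTimeoutError(Exception):
    """Hypothetical error injected into an overrunning request thread."""


def inject_exception(thread, exc_type):
    """Ask CPython to raise exc_type asynchronously in `thread` (CPython only)."""
    tid = ctypes.c_ulong(thread.ident)
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        tid, ctypes.py_object(exc_type))
    if modified == 0:
        raise ValueError("no running thread with id %r" % (thread.ident,))
    if modified > 1:
        # Should not happen; revert by clearing the pending exception (NULL).
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc affected several threads")


def bad_python_loop(started, result):
    """A request stuck in pure Python that never checks for a timeout."""
    try:
        started.set()
        n = 0
        while True:
            n += 1
    except RequestTimeoutError:
        result["timed_out"] = True


def run_demo():
    started = threading.Event()
    result = {}
    # daemon=True so a failed injection cannot keep the process alive.
    worker = threading.Thread(target=bad_python_loop,
                              args=(started, result), daemon=True)
    worker.start()
    started.wait()
    time.sleep(0.05)  # let the loop spin for a while
    inject_exception(worker, RequestTimeoutError)
    worker.join(timeout=5)
    return result.get("timed_out", False) and not worker.is_alive()


if __name__ == "__main__":
    print("worker interrupted:", run_demo())
```

This is a blunt instrument: the exception can land at any bytecode boundary, including inside cleanup or lock-handling code, which is part of why cooperative checks remain the safer default.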

Oh, and then there is AJAX long-poll in the future ;) - so we're
looking at needing long migration times irrespective of bugs.

-Rob