Evergreen version: 2.0 (earlier versions are presumably affected as well)
oilsExtendIDL(), which is called when the cstore, pcrud, and rstore services initialize, can hang silently if it cannot select from a table because that table is locked or tied up by another process's stuck transaction. If this happens while an application server or brick is starting, every service except cstore, pcrud, and rstore comes up, and the sysadmin gets no indication that cstore failed to initialize or, more importantly, why.
To avoid this problem, I suggest having oilsExtendIDL() set a session statement timeout to a reasonable value (say, 5 seconds) and then either abort the initialization of cstore/pcrud/rstore if an IDL query times out or, at minimum, complain loudly enough that a sysadmin has a better chance of realizing she needs to check the database server for stuck transactions. Because every current caller of oilsExtendIDL() closes the database connection immediately after the IDL has been scanned, a session statement timeout set there will not affect any other queries.
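A minimal C sketch of the proposed flow, under loose assumptions: it does not use libdbi. The query runner is a hypothetical callback (`run_sql_fn`) standing in for whatever call the real code makes, and `scan_idl_classes()`, `stub_run_sql()`, and the zero-row probe query `SELECT * FROM <table> WHERE 1=0` are illustrative names and SQL, not the actual contents of oilsExtendIDL(). It shows the timeout being set once for the session with PostgreSQL's `SET statement_timeout` (the value is in milliseconds when no unit is given) and the scan stopping at the first failed query so the caller can abort initialization.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical query callback standing in for the real database call.
 * Returns 0 on success, nonzero on any failure (including a timeout). */
typedef int (*run_sql_fn)(void *conn, const char *sql);

/* Hypothetical IDL scan: sets a session-wide statement timeout, then probes
 * each table. Returns 0 if every probe succeeds, -1 on the first failure so
 * the caller can abort cstore/pcrud/rstore initialization. */
static int scan_idl_classes(void *conn, run_sql_fn run_sql,
                            const char *const *tables, size_t n_tables,
                            int timeout_ms)
{
    char sql[512];

    assert(run_sql != NULL);

    /* Session-level setting; acceptable here because the connection is
     * closed as soon as the scan finishes. */
    snprintf(sql, sizeof sql, "SET statement_timeout = %d", timeout_ms);
    if (run_sql(conn, sql) != 0) {
        fprintf(stderr, "IDL scan: could not set statement_timeout\n");
        return -1;
    }

    for (size_t i = 0; i < n_tables; i++) {
        /* Zero-row probe (assumed): enough to read column metadata, but it
         * still needs a lock on the table, so it can block. */
        snprintf(sql, sizeof sql, "SELECT * FROM %s WHERE 1=0", tables[i]);
        if (run_sql(conn, sql) != 0) {
            fprintf(stderr,
                    "IDL scan: query on %s failed or timed out after %d ms; "
                    "check the database server for stuck transactions\n",
                    tables[i], timeout_ms);
            return -1;
        }
    }
    return 0;
}

/* Demonstration stub only: pretends any query mentioning "locked_tbl"
 * fails, as if statement_timeout had cancelled it. */
static int stub_run_sql(void *conn, const char *sql)
{
    (void)conn;
    return strstr(sql, "locked_tbl") != NULL ? 1 : 0;
}
```

Because PostgreSQL's statement_timeout also counts time spent waiting for a lock, a probe against a table held by a stuck transaction fails after about five seconds instead of blocking forever.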
This seems like a reasonable plan. It will require some investigation into how libdbi handles statements that time out. When a timeout does occur, we should log the failure loudly and stop: the cstore-ish backend should exit immediately after logging it. Otherwise the backend will run with incomplete metadata about the database, which will cause failures later when something tries to manipulate the database object whose query timed out.
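One possible shape for the failure handling is sketched below, assuming the backend can obtain the server's error text (for example through libdbi's `dbi_conn_error()`). PostgreSQL reports a statement timeout with SQLSTATE 57014 and the message "canceling statement due to statement timeout"; the helper names `is_statement_timeout()` and `idl_failure_exit_status()` are hypothetical. Every failure yields a nonzero status, so the backend exits rather than running with partial metadata; a timeout simply gets a more specific log line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* PostgreSQL's (English) message when statement_timeout cancels a query,
 * reported with SQLSTATE 57014 (query_canceled). */
#define PG_TIMEOUT_MSG "canceling statement due to statement timeout"

/* Returns 1 if a driver error message looks like a statement timeout. */
static int is_statement_timeout(const char *errmsg)
{
    return errmsg != NULL && strstr(errmsg, PG_TIMEOUT_MSG) != NULL;
}

/* Hypothetical helper: logs the IDL failure loudly and returns the exit
 * status the backend should use. Never returns 0, because serving requests
 * with incomplete IDL metadata is never safe. */
static int idl_failure_exit_status(const char *service, const char *table,
                                   const char *errmsg)
{
    assert(service != NULL && table != NULL);

    if (is_statement_timeout(errmsg)) {
        fprintf(stderr,
                "%s: IDL scan of %s timed out; another transaction may be "
                "holding a lock on it. Exiting.\n", service, table);
        return 2;
    }
    fprintf(stderr, "%s: IDL scan of %s failed: %s. Exiting.\n",
            service, table, errmsg != NULL ? errmsg : "(no error message)");
    return 1;
}
```

The caller would pass the message it got from the driver, then call `exit()` with the returned status. Matching on message text is fragile because PostgreSQL translates messages according to `lc_messages`; checking SQLSTATE 57014 directly would be more robust if the driver exposes it.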