Clark Kent can over-consume system resources

Bug #1435494 reported by Galen Charlton
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Wishlist
Unassigned
2.6
Fix Released
Undecided
Unassigned
2.7
Fix Released
Undecided
Unassigned

Bug Description

Clark Kent can sometimes consume more RAM, swap space, or CPU than is reasonable or productive. For example:

- a badly constructed query with multiple Cartesian joins may never terminate, potentially tying up a Clark child process, pegging a CPU on the database server, and/or causing significant scratch disk usage on the database server keeping a snapshot alive.
- a query that returns a very large number of rows can cause a Clark child to bloat, and in extreme cases cause a OOM on the server running Clark.
- a report that asks for a chart of an unreasonably large number of rows can peg a CPU on the Clark server as GD::Graph attempts to compute sub-pixel graph elements.

In each of these cases, a request report may never finish.

A forthcoming patch series will allow limits to be placed on Clark's resource usage, settable via opensrf.xml and command-line switches.

Evergreen master

Galen Charlton (gmc)
Changed in evergreen:
importance: Undecided → Wishlist
tags: added: pullrequest reports
Revision history for this message
Galen Charlton (gmc) wrote :

A patch can be found at the tip of the user/gmcharlt/lp1435494-clip-the-cape-of-clark branch in the working/Evergreen repository:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1435494-clip-the-cape-of-clark

Revision history for this message
Ben Shum (bshum) wrote :

We took the opportunity to start testing this today. A note on the new settings in opensrf.xml...

For resultset_limit, it's currently empty and that can result in some errors when running reports without a defined default.

For us, we saw reports error out with errors like:

DBD::Pg::st execute failed: ERROR: syntax error at or near "0"
LINE 43: ) limited_to_HASH(0x2a974f8)_hits LIMIT HASH(0x2a974f8)
                           ^ at /openils/bin/clark-kent.pl line 243.

Galen hypothesized in IRC that clark might be reading the empty <resultset_limit></resultset_limit> as undefined and that's causing the hash to get passed in there and break things.

For us, setting <resultset_limit>1000000</resultset_limit> in opensrf.xml (1 million rows!) allowed us to proceed with report running and continue further tests.

More to follow once we see how it runs for a bit and ask it to do some unreasonable reports beyond the limits we've specified so far.

Ben Shum (bshum)
Changed in evergreen:
status: New → Triaged
milestone: none → 2.next
Revision history for this message
Jason Stephenson (jstephenson) wrote :

Maybe there should be a reasonable limit default limit for resultset_limit, so people don't encounter that message, or so that they don't have reports failing until they notice it?

I don't know what a reasonable limit is, maybe 1000000. I'm just throwing this out there as a suggestion.

Revision history for this message
Galen Charlton (gmc) wrote :

I'm not opposed to setting a default resultset_limit - I'd suggest 1,048,576 to match the row limit for XSLX spreadsheets.

The reason why the original patch didn't set a default resultset limit is that unlike the other limits, the user has no way of knowing (for sure) that their report ran into it.

I'll make a follow-up patch so that we'll at least not run into the failure that Ben saw, and I'll await comment on whether to set a default resultset_limit.

Revision history for this message
Thomas Berezansky (tsbere) wrote :

I will suggest 1,048,575 instead, so that when we add a header row we get 1,048,576 total rows.

Revision history for this message
Galen Charlton (gmc) wrote :

I've pushed three more patches to the branch to address the feedback to date and to add release notes.

Revision history for this message
Ben Shum (bshum) wrote :

Been running this for over a week without problems. Pushing along to master for great new feature for 2.next (likely 2.9!)

Thanks Galen!!

Changed in evergreen:
status: Triaged → Fix Committed
Revision history for this message
Galen Charlton (gmc) wrote :

Any objections to backporting to rel_2_8 for inclusion in 2.8.1?

Revision history for this message
Ben Shum (bshum) wrote :

Based on feedback in IRC, no objection and going as far as backporting to rel_2_7 too.

Cheers, thanks Galen.

Changed in evergreen:
milestone: 2.next → 2.8.1
Revision history for this message
Ben Shum (bshum) wrote :

And rel_2_6 too now.

Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.