Evergreen

Clark Kent can over-consume system resources

Bug #1435494 reported by Galen Charlton on 2015-03-23

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Evergreen	Fix Released	Wishlist	Unassigned	Evergreen 2.8.1
2.6	Fix Released	Undecided	Unassigned	Evergreen 2.6.8
2.7	Fix Released	Undecided	Unassigned	Evergreen 2.7.5

Bug Description

Clark Kent can sometimes consume more RAM, swap space, or CPU than is reasonable or productive. For example:

- a badly constructed query with multiple Cartesian joins may never terminate, potentially tying up a Clark child process, pegging a CPU on the database server, and/or causing significant scratch disk usage on the database server keeping a snapshot alive.
- a query that returns a very large number of rows can cause a Clark child to bloat, and in extreme cases cause a OOM on the server running Clark.
- a report that asks for a chart of an unreasonably large number of rows can peg a CPU on the Clark server as GD::Graph attempts to compute sub-pixel graph elements.

In each of these cases, a request report may never finish.

A forthcoming patch series will allow limits to be placed on Clark's resource usage, settable via opensrf.xml and command-line switches.

Evergreen master

Tags:

Galen Charlton (gmc) on 2015-03-23

Changed in evergreen:
importance:	Undecided → Wishlist
tags:	added: pullrequest reports

Revision history for this message

Galen Charlton (gmc) wrote on 2015-03-23:

A patch can be found at the tip of the user/gmcharlt/lp1435494-clip-the-cape-of-clark branch in the working/Evergreen repository:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/gmcharlt/lp1435494-clip-the-cape-of-clark

Revision history for this message

Ben Shum (bshum) wrote on 2015-03-30:

We took the opportunity to start testing this today. A note on the new settings in opensrf.xml...

For resultset_limit, it's currently empty and that can result in some errors when running reports without a defined default.

For us, we saw reports error out with errors like:

DBD::Pg::st execute failed: ERROR: syntax error at or near "0"
LINE 43: ) limited_to_HASH(0x2a974f8)_hits LIMIT HASH(0x2a974f8)
^ at /openils/bin/clark-kent.pl line 243.

Galen hypothesized in IRC that clark might be reading the empty <resultset_limit></resultset_limit> as undefined and that's causing the hash to get passed in there and break things.

For us, setting <resultset_limit>1000000</resultset_limit> in opensrf.xml (1 million rows!) allowed us to proceed with report running and continue further tests.

More to follow once we see how it runs for a bit and ask it to do some unreasonable reports beyond the limits we've specified so far.

Ben Shum (bshum) on 2015-03-30

Changed in evergreen:
status:	New → Triaged
milestone:	none → 2.next

Revision history for this message

Jason Stephenson (jstephenson) wrote on 2015-03-30:

Maybe there should be a reasonable limit default limit for resultset_limit, so people don't encounter that message, or so that they don't have reports failing until they notice it?

I don't know what a reasonable limit is, maybe 1000000. I'm just throwing this out there as a suggestion.

Revision history for this message

Galen Charlton (gmc) wrote on 2015-03-30:

I'm not opposed to setting a default resultset_limit - I'd suggest 1,048,576 to match the row limit for XSLX spreadsheets.

The reason why the original patch didn't set a default resultset limit is that unlike the other limits, the user has no way of knowing (for sure) that their report ran into it.

I'll make a follow-up patch so that we'll at least not run into the failure that Ben saw, and I'll await comment on whether to set a default resultset_limit.

Revision history for this message

Thomas Berezansky (tsbere) wrote on 2015-03-30:

I will suggest 1,048,575 instead, so that when we add a header row we get 1,048,576 total rows.

Revision history for this message

Galen Charlton (gmc) wrote on 2015-04-09:

I've pushed three more patches to the branch to address the feedback to date and to add release notes.

Revision history for this message

Ben Shum (bshum) wrote on 2015-04-10:

Been running this for over a week without problems. Pushing along to master for great new feature for 2.next (likely 2.9!)

Thanks Galen!!

Changed in evergreen:
status:	Triaged → Fix Committed

Revision history for this message

Galen Charlton (gmc) wrote on 2015-04-10:

Any objections to backporting to rel_2_8 for inclusion in 2.8.1?

Revision history for this message

Ben Shum (bshum) wrote on 2015-04-10:

Based on feedback in IRC, no objection and going as far as backporting to rel_2_7 too.

Cheers, thanks Galen.

Changed in evergreen:
milestone:	2.next → 2.8.1

Revision history for this message

Ben Shum (bshum) wrote on 2015-04-10:

#10

And rel_2_6 too now.

Evergreen Bug Maintenance (bugmaster) on 2015-05-16

Changed in evergreen:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.