Batch fine generator benefits from low-hanging optimization

Bug #1705728 reported by Bill Erickson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
New
Wishlist
Unassigned

Bug Description

Evergreen 2.12 / Wishlist

The batch fine generator could be smarter about caching org unit settings. It makes 3 to 5 org unit setting lookup calls per transaction. On my concerto-based server, I see 1830 org setting lookups for 394 circs. On a large data set, this can have a significant impact on performance. With effective caching, this number can be dropped to 3 to 5 org setting lookups per circ lib instead of per transaction.

To leverage the caching, the fine generator has to collect and process parallel batches within the API (a la hold targeter v2) instead of having the caller collect the batches and run a single API call per transaction.

Patch en route. Running this patch in a test cluster with a sizable data set shaved 25%-30% off the batch generator run time and, perhaps more importantly, avoids several million duplicate API/database calls.

Note the proposed changes have no impact on single-transaction fine generation.

Tags: performance
Revision history for this message
Bill Erickson (berick) wrote :

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1705728-fine-gen-cache-parallel

From the commit:

* Fine generator caches org unit setting values per instance. Once cached, the number of cstore calls per transaction is reduced by 3 to 5 calls, depending on context / settings.

* Fine generator disconnects from cstore after processing each transaction giving cstores a chance to recycle and avoid memory gobbling on huge batches.

* Fine generator now collects parallel batches of transactions directly within the server-side generator API instead of requiring the caller to collect transactions up front for individual processing. This lets us take advantage of the org setting caching.

* Fine generator script improvements:
** Arguments are passed via GetOpt, with support for legacy-style opensrf config and lockfile name passing (with warnings).
** supports a --parallel option to override the value from opensrf.xml.
** Sets OSRF_LOG_CLIENT for log tracing.

==

Unclear if release notes or tests are required, since there are no functionality changes to the generator itself and all fine generator script changes are backwards compatible. Testing essentially boils down to "does the fine generator still work, a little faster".

For further data, running the patches on my concerto-based VM shaves 48% off the run time.

tags: added: pullrequest
Changed in evergreen:
milestone: none → 3.0-alpha
assignee: Bill Erickson (berick) → nobody
Bill Erickson (berick)
tags: removed: pullrequest
Revision history for this message
Bill Erickson (berick) wrote :

Removing pullrequest for now. After running the patch for several days on a large data set, the storage drones, which are not recycling as quickly now, are slowly consuming enough memory to have a noticeable impact on the system.

Will revisit.

Changed in evergreen:
milestone: 3.0-alpha → none
Revision history for this message
Mike Rylander (mrylander) wrote :

Bill, I think perhaps a smaller overall change would be:
 * Continue to use MultiSession
 * Generate a session token in the client script, perhaps based on it's pid, and send that along with each circ id
 * Have the storage backend use that session token as a prefix to a memcache key for grabbing circ lib org settings out of the cache
 * If not there, grab them from the db and populate the memcache slot for other backends.

That has the benefit of reducing the cost of the settings by a ton, without complicating the server logic with having to know about which slot it's in and and how many there are of them. It also has the benefit of reducing failure-to-process to just one circ, rather than up to $circ_count/$parallel if the server side had a problem.

Revision history for this message
Mike Rylander (mrylander) wrote :

I'm going to put something together that looks like what I describe, and we can choose which way to go after looking at both. I think it'll be pretty quick to put together.

Revision history for this message
Mike Rylander (mrylander) wrote :
Revision history for this message
Bill Erickson (berick) wrote :

My last comment lead me to open bug #1706147 which would solve the memory issue.

Thanks for the input, Mike.

I was also concerned about one explosion bringing the whole thing down, but since each circ is processed in an eval block, it seemed pretty durable.

I opted not to use MultiSession because it does not support overriding the recv timeout. (TODO: open LP bug for this). I wasn't keen on requiring an OpenSRF change for this, but in light of bug #1706147, it may be heading down that path anyway.

Replacing DB calls with memcache calls would be a great improvement and I like the idea of a client-generated cache token. If bug #1706147 (or similar) proves to be impractical then that would be my preferred path as well. If we're going to get our hands dirty, though, I still think removing all of that extra traffic is a better reward for the effort.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

I am looking at options to improve our fine generator performance. It presently runs for 45 minutes every hour. We run it every hour because we have hourly fines and staff like to be able to tell patrons how much they owe in fines without having to check things in or do any math in their heads.

I would greatly appreciate an update on this bug, i.e. which branch should I look at, if any, to possibly add these optimizations.

Thanks, Bill & Mike for the discussion.

Jason

Michele Morgan (mmorgan)
tags: added: performance
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.