Hold targeter taking a long time to run
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Evergreen | New | Undecided | Unassigned | |
Bug Description
Evergreen master / 2.4
Since our upgrade, we've noticed that the hold_targeter script is taking unusually long to run. The symptoms started immediately: cron began sending email notifications saying that it couldn't start hold_targeter again because the previous run was still in progress. On the first day, runs collided with each other throughout the whole day (our interval is every 15 minutes). On the second day it quieted down a little, and by the third day it was noticeably quieter but still complaining periodically.
My assumption is that because the holds take longer to process, the work slowly spreads out over more hours. That leaves fewer holds to process per interval, which eventually leads to fewer warnings about the hold targeter already running. Instead of running hardest from 1 am through 5 am, it now runs from 1 am through 10 am or later, or something akin to that. As a result of this shift in when holds are processed, I think our libraries that run their hold pull lists in the morning are seeing more fluctuation than normal as they view holds, print holds, and check in holds. While they are printing, and by the time they reach the last step, the pull list may have changed to include more titles, different titles, or fewer titles, which disrupts staff workflows.
Another symptom we've noticed is that our PostgreSQL logs regularly contain entries like:
automatic vacuum of table "evergreen.
This seems to indicate that the hold targeter process is taking longer than before and is colliding with the autovacuum processes. It shows up every couple of hours for us, too.
I'm just reporting a potential performance issue and hoping to gather feedback from other master or 2.4 sites on whether they have also noticed the hold targeter running longer than before or impacting other areas.
Evergreen Indiana Consortium is on 2.2, so this may or may not be related. Our utility server handles the hold targeter, and it recently crashed during off hours. We restarted the processes during core library hours and noticed that, because of how the select is performed (holds not checked in the last 24 hours or longer), hold targeting was then being performed during core library hours. The processing appears to be spreading out over time, as Ben describes in his original bug report, because of the 15-minute cron interval and the length of time it actually takes to process the 12K holds that we have. However, the spread is currently only across core library hours, and none of the processing has actually moved outside of those hours.
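
For anyone who wants to reason about the spreading effect, here is a rough toy model in Python. It is not Evergreen's targeter logic, only a sketch under stated assumptions: every hold becomes due again 24 hours after it was last processed, cron fires every 15 minutes, a new run is skipped while the previous one is still working, and each hold takes a fixed amount of time. The function name `simulate` and the per-hold cost of 1 second are made up for illustration; the 12K hold count and the 15-minute interval are the figures mentioned in this thread.

```python
def simulate(num_holds=12000, secs_per_hold=1, interval=900, days=3, retarget=86400):
    """Toy model: return one (skipped_runs, runs_with_work) tuple per day.

    All numbers are hypothetical except the 15-minute interval and 12K holds
    quoted in the thread. Times are in whole seconds.
    """
    due = [0] * num_holds          # every hold starts out due at t=0
    stats = [[0, 0] for _ in range(days)]
    busy_until = 0                 # time at which the current run finishes
    for t in range(0, days * 86400, interval):
        day = t // 86400
        if t < busy_until:
            # Previous run still going: cron would report "already running".
            stats[day][0] += 1
            continue
        clock = t
        handled = 0
        for i, d in enumerate(due):
            if d <= t:             # hold was due when this run started
                clock += secs_per_hold
                due[i] = clock + retarget   # due again 24h after processing
                handled += 1
        busy_until = clock
        if handled:
            stats[day][1] += 1
    return [tuple(s) for s in stats]


if __name__ == "__main__":
    for day, (skipped, worked) in enumerate(simulate()):
        print(f"day {day}: skipped runs={skipped}, runs that processed holds={worked}")
```

In this model, a single long run on the first day blocks the cron slots that follow it. Because each hold is re-stamped at the moment it was actually processed, on later days the holds come due a few at a time, so the work is split across many consecutive intervals instead of landing in one batch. Real behavior will be noisier, since per-hold processing time is not constant.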