Serials: avoid infinite loops on the server side at holdings summarization time

Bug #1177038 reported by Lebbeous Fogle-Weekley
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Undecided
Unassigned
2.2
Fix Released
Undecided
Unassigned
2.3
Fix Released
Undecided
Unassigned
2.4
Fix Released
Undecided
Unassigned

Bug Description

For Evergreen 2.2+

This is a more holistic/comprehensive (yes I've said that before; sorry) fix targeting bugs in the same general neighborhood as bug 1124248.

    Serials: MFHD::get_compressed_holdings() can reach infinite loop

    Even controlled serials holdings involve the internal creation of MFHD
    fields, upon which caculations are performed for such purposes as the
    display of holdings summaries in the OPAC.

    There are too many ways that incorrect MFHD (or MFHD that our code just
    can't yet handle) can lead our MFHD routines to crash. We can't address
    all these possibilities in a single bug fix. But we can avoid this
    infinite loop.

    A subroutine within open-ils.serial (_summarize_contents()) relies on
    MFHD::get_compressed_holdings(). When the latter went into an infinite
    loop the result would be an open-il.serial drone process consuming CPU
    time indefinitely and, depending on the data that provoked the loop,
    potentially writing repeating messages to stderr indefinitely.

    End users will still see the item receiving fail in these cases, and be
    obliged to work around the issue as before until more robust
    holdings summarization code can be written, but at least the overall
    condition of the running Evergreen system won't be affected, and there
    will be better information in the error logs.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/senator/serials-loops-please-stop

I have built this branch as two commits to assist with testing. With just the first commit applied to a running Evergreen system, you can perform a test (see attachments coming shortly with a comment on this bug report) that will reproduce the infinite loop situation, and you can see an open-ils.serial drone process consuming CPU time endlessly with the top or ps commands. After applying the second commit, the same test will not lead to an infinite loop and an error will be returned reasonably quickly.

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

It would seem Launchpad only allows one attachment at a time.

The attached SQL script included with this comment stages the necessary SQL for the test case described in the main description section of this bug report.

Run this via psql (on a test, not production system, or you're operating at your own risk; well, you are in all cases anyway).

When you've finished with testing, this data can be cleaned up with these SQL statements:

DELETE FROM biblio.record_entry WHERE id = 4000000;
DELETE FROM serial.subscription WHERE id = 4000000; -- sic

Those two should be enough to cascade to everything introduced in the SQL script.

Next attachment contains a srfsh script

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

To run this test, save the attachment somewhere from which you can run srfsh.

Replace AUTHTOKENHERE with a currently valid authtoken that has the RECEIVE_SERIAL permission (or just a general superuser login).

Run:
  srfsh < serials-loop-test-case.srfsh

With the first but not the second commit from the branch linked in the bug report description, this test should cause an open-ils.serial drone process to get stuck in an infinite loop (end it with a regular invocation of the kill command).

With the second commit, this test should result fairly quickly in an error message, and there should be no looping process.

Revision history for this message
Remington Steed (rjs7) wrote :

I have tested this and it works as expected. I have signed off, and Dan Wells pushed it to master. Special thanks Lebbeous for the extra work you did to make this easy to test!

Revision history for this message
Dan Wells (dbw2) wrote :

Backported and pushed all the way through 2.2. Thanks again, Lebbeous.

Changed in evergreen:
status: New → Fix Committed
Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.