Serial summarization should be more flexible

Bug #1180039 reported by Dan Wells on 2013-05-14
This bug affects 2 people
Affects Status Importance Assigned to Milestone

Bug Description

Our current use of "get_compressed_holdings()" from the MFHD module requires data which perfectly matches the pattern given. When this is not the case, it generates warnings to highlight the problems, but the end result returned can be unpredictable.

This branch adds a new method, get_combined_holdings(), as a less strict alternative to deal with these issues. It does so by only looking at the ends of any given ranges and (as the name implies) combining where they meet or overlap.

While testing for this feature, an edge case was discovered in get_compressed_holdings(). Basically, it would fail if you tried to compress both an open-ended holding and a closed (or single) holding with the same starting point. A fix for that bug is included in a separate commit. I can open a second bug if needed, but it seems simpler all around to keep them together.

In addition, the MFHD test suite has been expanded for both the new method and this edge case.

Branch here:;a=shortlog;h=refs/heads/user/dbwells/serials_combined_holdings


I'm excited to test this soon, and I didn't want it to go without comment any longer.

Independently, Anoop Atre may have an opportunity to test this as well, so you can expect feedback from one or both of us soonish (wish I could be more specific).

None of this is to discourage testing by anyone else who may be interested, of course.

Ben Shum (bshum) on 2013-05-19
Changed in evergreen:
status: New → Confirmed
Dan Wells (dbw2) on 2013-06-12
Changed in evergreen:
milestone: 2.5.0-m1 → 2.5.0-m2

There is one way this new branch is less flexible toward what is arguably* wrong data.

When issuances are associated with a pattern code that defines, for example, subfields $a, $b, $i, and $j, if in the holding codes we occasionally find an unwanted $k or we're missing the $b or the $j, this routine dies with "Can't use an undefined value as an ARRAY reference at [...] line 636"

The old code seemed to ignore some abberrations of this nature. I'm not sure that was any more correct than the current behavior. Maybe we need to identify these cases with a specific error message?

I can provide example data if you don't see what I mean or if you can't reproduce it readily.

* I say "arguably" because with unexpected extra issues, who knows what holding codes may be appropriate?

Dan Wells (dbw2) wrote :

Lebbeous, thank you very much for testing. I think we can find a way to work around such cases without too much trouble.

As for whether this type of data is wrong, I'd feel pretty confident arguing that it is. If a holding doesn't conform to the caption in a truly meaningful and sensible way, it really needs a new caption. Take as many as you need; they're free! ;)

We already know that isn't always going to happen, so I think we can get away with marking the missing subfields as unknown/missing (the asterisk in MFHD), then throwing away the extras and seeing what happens. I'll see what I can do. Thanks again.

Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
tags: removed: pullrequest
Dan Wells (dbw2) wrote :

Added several new commits to robustify the MFHD code in general, and to hopefully solve the cases Lebbeous was hitting. Rebased and force-pushed to same location.

tags: added: pullrequest
Changed in evergreen:
assignee: Dan Wells (dbw2) → nobody
Dan Wells (dbw2) on 2013-07-15
Changed in evergreen:
milestone: 2.5.0-m2 → 2.5.0-alpha1

Hi Dan,

This code seems to work well. I found things working sensibly in my tests, and I see that your unit tests all pass. Sure, there were some warnings in the output, but I think they're expected, e.g.

> No values found for existing caption subfield 'j', returning '*' (unknown value indicator) at
> /usr/local/share/perl/5.10.1/Test/ line 1478

which makes sense to me.

I'll sign this off, but first I wondered if you had and preferences about how this will be squashed (or if you'll just squash it however you want)?


Dan Wells (dbw2) on 2013-08-05
Changed in evergreen:
importance: Low → Medium
Changed in evergreen:
status: Confirmed → In Progress
assignee: nobody → Lebbeous Fogle-Weekley (lebbeous)

I talked to Dan out-of-band about the squashing, and decided to go with a final set of four commits. Thanks for these enhancements Dan!

Now in master, rel_2_4, and rel_2_3.

Changed in evergreen:
status: In Progress → Fix Committed
assignee: Lebbeous Fogle-Weekley (lebbeous) → nobody

We are also experiencing these types of errors in some serials. Lebbeous and Steve corrected one title for us, but recently this is happening for 2 other titles.

Hi Gislaine,

When your site gets an upgrade to the latest 2.3 or 2.4 series release, you'll be running newer code from Dan that should obviate the need for explicit corrections of that type.

This will be a boon to many institutions who have collected serial holdings which, over time, have come to include many irregular editions. Thank you for helping to test parts of the new code in earlier phases (alluded to in comments #1 and #2 on this bug report), and thank you, Dan, once again.


Ben Shum (bshum) on 2013-11-11
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers