MVF ingest uses default values inappropriately

Bug #1322285 reported by Jason Stephenson
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Undecided
Unassigned
2.6
Fix Released
Undecided
Unassigned

Bug Description

Evergreen version 2.6.0
OpenSRF version 2.3.0
PostgreSQL version 9.1.13

MVF ingest appears to be using the default values from config.marc21_ff_pos_map inappropriately when indexing records.

Our example is that everyone of our fiction books with the fiction indicator in the 008/33 has two lit_form entries of 0 and 1 (nonfiction and fiction). The default value for lit_form is 0 in config.marc21_ff_pos_map. Our belief is that the 0 is being applied for the missing 006 value, and the correct value comes from the 008. In other words, the default value appears to be used when a field is missing, and not just when no value is determined from the MARC record.

The upshot of this is that when limiting searches to fiction or nonfiction these books show up.

Thomas Berezansky has an idea for fixing this that involves moving the default values from config.marc21_ff_pos_map to a separate table. I'll leave it up to him to provide more details in the comments below.

Revision history for this message
Thomas Berezansky (tsbere) wrote :

I will admit to not having looked at how things are being done in the actual code at this point, but I think that dropping the default_val column from the marc21_ff_pos_map table and making a new table for the defaults would help prevent a conflict between rows in marc21_ff_pos_map. We would then only pull those defaults when no rows in marc21_ff_pos_map returned anything for a given fixed field.

Alternatively, instead of defining these defaults there perhaps defaults on the actual index definitions that use the fixed field information would be better? Then you could possibly have different defaults by index that uses them (for example, you may want a composite that only triggers when a specific value is actually set, but for searching directly you want to provide a default value).

Revision history for this message
Mike Rylander (mrylander) wrote :

Please do not start down the road of dropping columns from configuration tables. That data is used elsewhere. If you plan to look into this, I implore you to actually look at the code before proposing breaking things by removing or moving configuration data.

Revision history for this message
Steve Callender (stevecallender) wrote :

I just want to add in that I also have seen this exact issue. It causes issues when doing a lit_form search as the 0's and 1's are lumped together due to double entries basically making a lit_form search useless if you just want to focus on fiction or non-fiction.

Revision history for this message
Mike Rylander (mrylander) wrote :
tags: added: pullrequest
Revision history for this message
Jason Stephenson (jstephenson) wrote :

I am attaching a sample record that gets ingested both as fiction and non-fiction. It happens to all fiction records.

Revision history for this message
Mike Rylander (mrylander) wrote :

I've force-pushed an update that seems to work. Also added IF EXISTS to the drops, and updated the (entirely unused) biblio-schema versions of the functions. They supply the default by, um, default to preserve their behavior.

Revision history for this message
Jason Stephenson (jstephenson) wrote :

Looking at the changes, and they look good. I'm going to try it out on our training system, too, before giving it my full stamp of approval.

no longer affects: evergreen/2.7
Changed in evergreen:
milestone: none → 2.7.0-beta1
Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
Revision history for this message
Dan Scott (denials) wrote :

Umm. This doesn't appear to be in any of the upgrade scripts in rel_2_7. I would have expected to see it in 2.6-2.7.0 given that the release target for this was 2.7.0-beta1, but the 0887 update is not there and instead the 0864 previous version rolls on.

If one applies the 2.6.2-2.6.3 upgrade, they'll get that fix; but if they then apply the 2.6-2.7.0 upgrade the fix will be rolled back. And the 2.6.2-2.6.3 upgrade script doesn't live in rel_2_7.

Changing back to "Confirmed" until it is properly addressed in rel_2_7 and master.

Changed in evergreen:
status: Fix Released → Confirmed
Revision history for this message
Ben Shum (bshum) wrote :

Hi Dan,

I'm not sure why the 2.6-2.7.0 version upgrade script seems to think that it starts from where the 2.6.2-2.6.3 script left off. Since it's not present in the branch for tags/rel_2_7_0, I would not have expected that. But honestly I no longer have the VM I used to create the 2.7.0 tarball and cannot verify what variable I passed for the make_release script for the immediate preceding version.

I don't see where the fixes in 2.6.2-2.6.3 would be rolled back by the 2.6-2.7.0 upgrade if one were to run the 2.6.2-2.6.3 first, then the 2.6-2.7.0 later. From a glance, it doesn't seem that the 2.6-2.7.0 upgrade touches any of those missing numbered scripts or functions? But maybe I'm misunderstanding...

If it does work to run 2.6.2-2.6.3 first, then perhaps we should forward port that version upgrade script to the rel_2_7 branch (so that the chain is properly aligned again), and then we should change the 2.6-2.7.0 upgrade script to be exactly 2.6.3-2.7.0, since that's where it seems to want to leave off.

Version upgrades are so messy :(

Thanks for testing this out and finding the problem Dan. Let's work on a resolution soon.

Changed in evergreen:
assignee: nobody → Ben Shum (bshum)
milestone: 2.7.0-beta1 → 2.7.2
Revision history for this message
Ben Shum (bshum) wrote :

Reassigning milestone to next 2.7 maintenance release for now, since the 2.7.0 clearly missed this :(

Revision history for this message
Ben Shum (bshum) wrote :

Okay, pushed some changes to master and rel_2_7 to deal with this. Marking back to fix committed. Thanks Dan!

Changed in evergreen:
status: Confirmed → Fix Committed
assignee: Ben Shum (bshum) → nobody
Galen Charlton (gmc)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.