uncontrolled attribute values that consist only of spaces are normalized away

Bug #1415234 reported by Galen Charlton
This bug affects 1 person
Affects: Evergreen
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This is the same as bug 1414112, but for attributes and attribute values that are not backed by config.coded_value_map rows.

One example of this is the biography flag (008/34 for BKS): at present, if the 008/34 in a bib contains a blank, no "biog" key is added to the set of record attributes. This means that creating a search filter on non-biographies, i.e., "biog( )", is impossible.

This can readily be corrected as follows in the metabib.reingest_record_attributes() stored procedure:

@@ -133,7 +133,7 @@ BEGIN
             -- Create unknown uncontrolled values and find the IDs of the values
             IF ccvm_row.id IS NULL THEN
                 FOR tmp_val IN SELECT value FROM UNNEST(norm_attr_value) x(value) LOOP
-                    IF tmp_val IS NOT NULL AND BTRIM(tmp_val) <> '' THEN
+                    IF tmp_val IS NOT NULL AND tmp_val <> '' THEN
                         BEGIN -- use subtransaction to isolate unique constraint violations
                             INSERT INTO metabib.uncontrolled_record_attr_value ( attr, value ) VALUE
                         EXCEPTION WHEN unique_violation THEN END;
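
To make the effect of that one-line change concrete, here is a minimal Python model of the two conditions. It is only a sketch of the predicate logic, not Evergreen code: the function names old_keeps and new_keeps are made up for illustration, and str.strip(" ") stands in for BTRIM with its default of trimming spaces.

```python
def old_keeps(value):
    """Model of: tmp_val IS NOT NULL AND BTRIM(tmp_val) <> ''"""
    return value is not None and value.strip(" ") != ""


def new_keeps(value):
    """Model of: tmp_val IS NOT NULL AND tmp_val <> ''"""
    return value is not None and value != ""


# Compare the two conditions on a blank, an empty string, a normal value, and NULL.
for sample in [" ", "", "a", None]:
    print(repr(sample), old_keeps(sample), new_keeps(sample))
```

The only input on which the two conditions disagree is a value made up entirely of spaces: the current condition discards it, while the patched condition keeps it.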

However, that would result in *all* such fixed fields that can contain only blanks showing up as attributes:

"biog"=>" ", "conf"=>" ", "ctry"=>" ", "fest"=>" ", "gpub"=>" ", "ills"=>" ", "indx"=>" ", "mrec"=>" ", "date1"=>"2015", "date2"=>" ", "audience"=>" ", "cat_form"=>"a", "language"=>"eng", "bib_level"=>"m", "enc_level"=>"K", "item_lang"=>"eng", "item_type"=>"a", "vr_format"=>"s", "pub_status"=>"s", "icon_format"=>"book", "control_type"=>" ", "search_format"=>"book", "mr_hold_format"=>"book"

It is not necessarily clear to me that this is desirable for attributes like date2. For some attributes, like "conf" and "fest", a space is not actually a valid value.

Consequently, some additional eyes on this are desired.


Tags: cataloging
Galen Charlton (gmc)
tags: added: cataloging
Mike Rylander (mrylander) wrote :

For the example of "biog", I would argue (pretty strenuously) that we're simply lacking seed data for ccvm, since there are prescribed values. Conf and Fest would fall into the same category as Biog, IMO.

I agree with the implication that it's not desirable for date2, or for that matter any uncontrolled attribute, to be tracked when the value is all-spaces.

Galen Charlton (gmc) wrote :

I agree that providing ccvm seed data for all of the fixed fields that have controlled vocabularies is a good idea.
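
The distinction drawn in the comments above can be pictured with a small Python sketch. Everything in it is an assumption for illustration: the CONTROLLED table is a hypothetical stand-in for config.coded_value_map seed data (the codes listed follow the MARC 21 definitions of 008/34, 008/29 and 008/30 for books), should_record is a made-up helper, and dropping unlisted controlled values is a simplification rather than a description of what metabib.reingest_record_attributes() does.

```python
# Hypothetical stand-in for ccvm seed data: attribute -> allowed codes.
CONTROLLED = {
    "biog": {" ", "a", "b", "c", "d"},  # blank means "no biographical material"
    "conf": {"0", "1"},                 # blank is not a valid code
    "fest": {"0", "1"},                 # blank is not a valid code
}


def should_record(attr, value):
    """Decide whether an attribute value would be stored under this sketch."""
    if value is None:
        return False
    if attr in CONTROLLED:
        # Controlled attribute: a blank is kept only if it is a listed code.
        return value in CONTROLLED[attr]
    # Uncontrolled attribute: all-space values are not tracked.
    return value.strip(" ") != ""


for attr, value in [("biog", " "), ("conf", " "), ("date1", "2015"), ("date2", "    ")]:
    print(attr, repr(value), should_record(attr, value))
```

Under these assumptions a blank "biog" becomes searchable because the blank is a prescribed value, while a blank "conf", "fest" or "date2" still produces no attribute.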
