uncontrolled attribute values that consist only of spaces are normalized away

Bug #1415234 reported by Galen Charlton
This bug affects 1 person

Bug Description

This is the same as bug 1414112, but for attributes and attribute values that are not backed by config.coded_value_map rows.

One example of this is the biography flag (008/34 for BKS): at present, if the 008/34 in a bib record contains a blank, no "biog" key is added to the set of record attributes. This means that creating a search filter on non-biographies, i.e., "biog( )", is impossible.

This can readily be corrected as follows in the metabib.reingest_record_attributes() stored procedure:

@@ -133,7 +133,7 @@ BEGIN
             -- Create unknown uncontrolled values and find the IDs of the values
             IF ccvm_row.id IS NULL THEN
                 FOR tmp_val IN SELECT value FROM UNNEST(norm_attr_value) x(value) LOOP
-                    IF tmp_val IS NOT NULL AND BTRIM(tmp_val) <> '' THEN
+                    IF tmp_val IS NOT NULL AND tmp_val <> '' THEN
                         BEGIN -- use subtransaction to isolate unique constraint violations
                             INSERT INTO metabib.uncontrolled_record_attr_value ( attr, value ) VALUE
                         EXCEPTION WHEN unique_violation THEN END;
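
The effect of the changed condition can be checked in isolation. This is only a minimal sketch, not part of the patch; the ::text casts are an assumption made so that the literals are compared as plain text values:

```sql
-- Compare the old and new conditions for a value that is a single blank.
-- The ::text casts are an assumption so the literals compare as text.
SELECT BTRIM(' '::text) <> '' AS passes_old_check,  -- false: BTRIM strips the blank
       ' '::text <> ''        AS passes_new_check;  -- true: the blank is kept
```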

However, that would result in *all* such fixed fields that can contain only blanks showing up as attributes:

"biog"=>" ", "conf"=>" ", "ctry"=>" ", "fest"=>" ", "gpub"=>" ", "ills"=>" ", "indx"=>" ", "mrec"=>" ", "date1"=>"2015", "date2"=>" ", "audience"=>" ", "cat_form"=>"a", "language"=>"eng", "bib_level"=>"m", "enc_level"=>"K", "item_lang"=>"eng", "item_type"=>"a", "vr_format"=>"s", "pub_status"=>"s", "icon_format"=>"book", "control_type"=>" ", "search_format"=>"book", "mr_hold_format"=>"book"

It is not necessarily clear to me that this is desirable for attributes like date2. For some attributes, like "conf" and "fest", a space is not actually a valid value.

Consequently, some additional eyes on this are desired.


Tags: cataloging
Mike Rylander (mrylander) wrote :

For the example of "biog", I would argue (pretty strenuously) that we're simply lacking seed data for ccvm, since there are prescribed values. Conf and Fest would fall into the same category as Biog, IMO.

I agree with the implication that it's not desirable for date2, or for that matter any uncontrolled attribute, to be tracked when the value is all-spaces.

Galen Charlton (gmc) wrote :

I agree that providing ccvm seed data for all of the fixed fields that have controlled vocabularies is a good idea.
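
As a rough illustration of what such seed data might look like for "biog", here is a hypothetical sketch. The column list (ctype, code, value) is an assumption about config.coded_value_map and should be checked against the actual schema; the labels follow the MARC 21 definitions for 008/34 in books:

```sql
-- Hypothetical seed rows for the "biog" attribute; the column list
-- (ctype, code, value) is an assumption about config.coded_value_map.
INSERT INTO config.coded_value_map (ctype, code, value) VALUES
    ('biog', ' ', 'No biographical material'),
    ('biog', 'a', 'Autobiography'),
    ('biog', 'b', 'Individual biography'),
    ('biog', 'c', 'Collective biography'),
    ('biog', 'd', 'Contains biographical information');
```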
