Authority entries in 5xx field should not be listed as a main entry in browse list

Bug #1307629 reported by Kathy Lussier on 2014-04-14
This bug affects 2 people
Affects: Evergreen
Importance: Medium
Assigned to: Unassigned

Bug Description

Evergreen version: 2.5.3

This bug is possibly related to https://bugs.launchpad.net/evergreen/+bug/1307603.

If I look up "Scott, Bronwyn", I get four entries (see screenshot at http://www.screencast.com/t/SFQZ18bn):

1. The first entry is a stray record with an incorrect heading.
2. I'm not quite sure where the second heading comes from. When I figure it out, I may be filing yet another bug report.
3. The third heading comes from the authority record for Bronwyn Scott. http://www.screencast.com/t/CIhMTM6ojafA. This heading appropriately shows the "See Also" heading directing the user to Nikki Poppen.
4. The fourth heading comes from the 500 field of Nikki Poppen's authority record. http://www.screencast.com/t/JWWfzcygfuN

Including the headings from the 5xx fields of authority records leads to duplicate headings, since a 5xx heading points to an authorized record that already has its own heading in the list. We have seen the same behavior with subject authorities.

Also, the browse list should never display the contents of subfield 0. However, we have only noticed this problem in 5xx entries and believe the problem will go away once those entries are removed from the list.
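A minimal sketch of the behavior this report asks for, written in Python rather than Evergreen's own code: assume each authority record is reduced to a hypothetical dict holding its 1xx heading ("main") and its 5xx headings ("see_also"). Only 1xx headings become browse entries; 5xx headings are attached as references. The record layout and the function name are illustrative assumptions.

```python
from collections import defaultdict


def build_browse_list(authorities):
    """Map each authorized 1xx heading to the 5xx headings it refers to.

    `authorities` is a list of hypothetical dicts with keys "main"
    (the 1xx heading) and "see_also" (a list of 5xx headings).
    """
    entries = defaultdict(list)
    for auth in authorities:
        # Only the authorized 1xx heading becomes a main browse entry;
        # 5xx headings are kept as "See Also" references, not as entries.
        entries[auth["main"]].extend(auth["see_also"])
    return dict(entries)


records = [
    {"main": "Scott, Bronwyn, 1967-", "see_also": ["Poppen, Nikki, 1967-"]},
    {"main": "Poppen, Nikki, 1967-", "see_also": ["Scott, Bronwyn, 1967-"]},
]
browse = build_browse_list(records)
for heading in sorted(browse):
    print(heading, "-> see also:", ", ".join(browse[heading]))
```

With both records loaded, each name appears once, and the 500 field in Nikki Poppen's record surfaces only as a reference under her own heading.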

Mike Rylander (mrylander) wrote :

Here's what I believe you're seeing. Addressing it may require configuration changes plus a browse reingest of both bibs and authorities.

 1) Incomplete heading in a bib, lacking the birth year and not linked to an authority entry, exactly as you suggest.

 2) Mis-normalized bib entry not linked via $0 to an authority. It matches the see-also of an authority (for Poppen), though, so we follow that string match through to the main entry record for Bronwyn, which also contains a see-also for Poppen. If properly normalized, this would fold into the entry below it.

 3) Correctly normalized bib heading that is linked via $0 to Bronwyn (three such bibs; this is supported by the (3) next to the Bronwyn "see" in the preceding entry).

 4) Mis-normalized see-also authority heading whose containing authority record's main entry is in use by, and linked via $0 from, 6 bib records. If correctly normalized, this would be folded into the entry above it.
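To illustrate what "folding" means in the explanation above, here is a hedged Python sketch: headings are grouped by a normalized key, and the count per key plays the role of the "(3)" and "(6)" bib counts. The normalizer (lowercase, punctuation to spaces, collapsed whitespace) is an assumption for illustration, not Evergreen's actual normalizer chain.

```python
import re
from collections import Counter


def normalize(heading):
    """Hypothetical normalizer: lowercase, punctuation -> space, collapse spaces.

    Hyphens are kept so open dates such as "1967-" survive.
    """
    text = re.sub(r"[^\w\s-]", " ", heading.lower())
    return " ".join(text.split())


def fold(headings, key):
    """Count how many headings land on each browse entry under `key`."""
    return Counter(key(h) for h in headings)


bib_headings = ["Scott, Bronwyn, 1967-"] * 3 + [
    "SCOTT, Bronwyn,  1967-",  # same name, different case and spacing
    "Scott, Bronwyn.",         # incomplete heading: no birth year
]
print(fold(bib_headings, normalize))  # 2 entries: the variant folds in
print(fold(bib_headings, str.lower))  # 3 entries: the variant stays separate
```

The incomplete heading never folds, as in entry 1); the case and spacing variant folds only when the key normalizes it the same way as the other headings.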

If any auth-auth linking was performed by hand early in the testing process, or the auth-auth linker was used on this data prior to the feature's final version, or certain flags (such as those disabling auth reingest) were used at some point, the data could easily get into this state. There may be other ways, as well.

For this test server (and possibly for production sites having issues similar to this), I think the best thing to do would be to clear out all browse-related data and all simple heading data, then do a browse-only reingest of bibs followed by something along the lines of:

DO $$
DECLARE
  auth    authority.record_entry%ROWTYPE;
  ashs    authority.simple_heading%ROWTYPE;
  mbe_row metabib.browse_entry%ROWTYPE;
  mbe_id  BIGINT;
  ash_id  BIGINT;
BEGIN

  -- Start from an empty simple heading table.
  DELETE FROM authority.simple_heading;

  FOR auth IN SELECT * FROM authority.record_entry WHERE NOT deleted LOOP

    -- Regenerate the simple headings for this authority record.
    FOR ashs IN SELECT * FROM authority.simple_heading_set(auth.marc) LOOP

      INSERT INTO authority.simple_heading (record, atag, value, sort_value)
          VALUES (ashs.record, ashs.atag, ashs.value, ashs.sort_value);
      ash_id := CURRVAL('authority.simple_heading_id_seq'::REGCLASS);

      -- Reuse a browse entry with the same value and sort value...
      SELECT INTO mbe_row * FROM metabib.browse_entry
          WHERE value = ashs.value AND sort_value = ashs.sort_value;

      IF FOUND THEN
        mbe_id := mbe_row.id;
      ELSE
        -- ...or create one if none exists yet.
        INSERT INTO metabib.browse_entry (value, sort_value)
            VALUES (ashs.value, ashs.sort_value);
        mbe_id := CURRVAL('metabib.browse_entry_id_seq'::REGCLASS);
      END IF;

      -- Link the browse entry to the new simple heading.
      INSERT INTO metabib.browse_entry_simple_heading_map (entry, simple_heading)
          VALUES (mbe_id, ash_id);

    END LOOP;

  END LOOP;

END;
$$;

That's untested, but it's cut from the authority ingest code, so it should work fine. All of that should be performed on a quiescent server, especially with respect to bib or authority updates.
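The core of the DO block is a find-or-create step: reuse a browse entry whose value and sort value both match the heading, otherwise insert one, then record the mapping. A hypothetical in-memory Python model of that step follows; the class, method names and sample sort value are illustrative, not Evergreen's.

```python
from dataclasses import dataclass, field


@dataclass
class BrowseIndex:
    """Toy stand-in for metabib.browse_entry plus
    metabib.browse_entry_simple_heading_map (hypothetical model)."""

    entries: dict = field(default_factory=dict)      # (value, sort_value) -> entry id
    heading_map: list = field(default_factory=list)  # (entry id, simple heading id)

    def find_or_create(self, value, sort_value):
        key = (value, sort_value)
        if key not in self.entries:
            # Corresponds to the ELSE branch: insert a new browse entry.
            self.entries[key] = len(self.entries) + 1
        return self.entries[key]

    def map_heading(self, heading_id, value, sort_value):
        # Corresponds to the final INSERT into the simple heading map.
        entry_id = self.find_or_create(value, sort_value)
        self.heading_map.append((entry_id, heading_id))
        return entry_id


idx = BrowseIndex()
first = idx.map_heading(1, "Scott, Bronwyn, 1967-", "scott bronwyn 1967")
second = idx.map_heading(2, "Scott, Bronwyn, 1967-", "scott bronwyn 1967")
print(first == second, len(idx.entries))  # True 1
```

Because the match is exact on both columns, a heading whose value differs by even one character (for example, a missing comma) gets its own browse entry, which is why bib-side and authority-side normalization have to agree.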

Tim Spindler (tspindler-cwmars) wrote :

I have been testing Mike's code on a test server with production data, a snapshot from a 2.5 system whose data was originally loaded under 2.2. It has not fixed the issue highlighted in the original bug.

Tim Spindler (tspindler-cwmars) wrote :

It is clear that some of the issues are related to our data. This was after the authorities and bibs were reingested in the 2.5 upgrade. One thing I don't understand is why subfield 0 appears at all; no matter how bad our data is, it should never show up.

http://bark.cwmars.org/eg/opac/browse?blimit=25&qtype=subject&bterm=christianity+and+literature&locg=1

Specifically, this entry appears:

      christianity and literature cwmars 32948

I'm sure we have some records with "Christianity and Literature" that have an errant period, which prevented the bibliographic subject heading from linking to the authority.
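One plausible way a string like "christianity and literature cwmars 32948" can arise is that the $0 value (assumed here to be "(CWMARS)32948") is concatenated with the heading subfields before normalization. The Python sketch below, with a hypothetical subfield list and normalizer, shows the difference that skipping $0 makes; it is not Evergreen's extraction code.

```python
import re


def heading_text(subfields, skip=("0",)):
    """Join subfield values, leaving out control subfields such as $0."""
    return " ".join(value for code, value in subfields if code not in skip)


def normalize(text):
    """Hypothetical normalizer: punctuation -> space, lowercase, collapse spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).lower().split())


# Hypothetical 650 with an assumed $0 authority link, as (code, value) pairs.
field = [("a", "Christianity and literature"), ("0", "(CWMARS)32948")]
print(normalize(heading_text(field, skip=())))  # christianity and literature cwmars 32948
print(normalize(heading_text(field)))           # christianity and literature
```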

1) run Mike's script

(the same DO block as in Mike's comment above)

2) remove orphaned browse data

DELETE FROM authority.simple_heading;

DELETE FROM metabib.browse_entry
WHERE id IN (
    SELECT e.id
    FROM metabib.browse_entry e
    LEFT JOIN metabib.browse_entry_def_map b ON (b.entry = e.id)
    LEFT JOIN metabib.browse_entry_simple_heading_map a ON (a.entry = e.id)
    WHERE b.id IS NULL AND a.id IS NULL
);

3) reingest the authorities with this

\t
\o reingest.auth.sql
SELECT 'UPDATE config.internal_flag SET enabled = ''t'' WHERE name = ''ingest.reingest.force_on_same_marc'';';
SELECT 'UPDATE authority.record_entry SET id = id WHERE id = ' || id || ';' FROM authority.record_entry WHERE NOT deleted;
SELECT 'UPDATE config.internal_flag SET enabled = ''' || (SELECT enabled FROM config.internal_flag WHERE name = 'ingest.reingest.force_on_same_marc') || ''' WHERE name = ''ingest.reingest.force_on_same_marc'';';
\o

Ran the file “reingest.auth.sql”
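The orphan cleanup in step 2 is an anti-join: a browse entry is orphaned when neither metabib.browse_entry_def_map nor metabib.browse_entry_simple_heading_map references it. A Python sketch of that set logic with hypothetical id lists, next to what gets selected if the second join also uses `ON (b.entry = e.id)`:

```python
def orphaned_entries(entry_ids, def_map_entries, heading_map_entries):
    """Entries referenced by neither map: the intended anti-join,
    with each LEFT JOIN matching on its own `entry` column."""
    linked = set(def_map_entries) | set(heading_map_entries)
    return sorted(e for e in entry_ids if e not in linked)


def orphaned_if_second_join_uses_b(entry_ids, def_map_entries, heading_map_entries):
    """If the second join's condition is `b.entry = e.id`, it never matches
    when b is NULL, so the heading map cannot protect an entry that lacks a
    def_map row; heading_map_entries is deliberately ignored here."""
    linked = set(def_map_entries)
    return sorted(e for e in entry_ids if e not in linked)


entries = [1, 2, 3, 4]
def_map = [1]      # entry 1 is used by a bib
heading_map = [3]  # entry 3 is used only by an authority heading
print(orphaned_entries(entries, def_map, heading_map))                # [2, 4]
print(orphaned_if_second_join_uses_b(entries, def_map, heading_map))  # [2, 3, 4]
```

In the second variant, authority-only entries such as 3 are deleted along with the true orphans.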

Tim Spindler (tspindler-cwmars) wrote :

I should add some detail. First I ran this:

(the same DO block as in Mike's comment above)

and checked the headings. The problem recurred. Then I did this:

1) remove orphaned browse data

DELETE FROM authority.simple_heading;

DELETE FROM metabib.browse_entry
WHERE id IN (
    SELECT e.id
    FROM metabib.browse_entry e
    LEFT JOIN metabib.browse_entry_def_map b ON (b.entry = e.id)
    LEFT JOIN metabib.browse_entry_simple_heading_map a ON (a.entry = e.id)
    WHERE b.id IS NULL AND a.id IS NULL
);

2) reingest the authorities with this

\t
\o reingest.auth.sql
SELECT 'UPDATE config.internal_flag SET enabled = ''t'' WHERE name = ''ingest.reingest.force_on_same_marc'';';
SELECT 'UPDATE authority.record_entry SET id = id WHERE id = ' || id || ';' FROM authority.record_entry WHERE NOT deleted;
SELECT 'UPDATE config.internal_flag SET enabled = ''' || (SELECT enabled FROM config.internal_flag WHERE name = 'ingest.reingest.force_on_same_marc') || ''' WHERE name = ''ingest.reingest.force_on_same_marc'';';
\o

Ran the file “reingest.auth.sql”

and the problem recurred.
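The psql steps write one statement per line to reingest.auth.sql: turn the ingest.reingest.force_on_same_marc flag on, issue a no-op UPDATE for each non-deleted authority record (which, as the flag's name suggests, is meant to force a reingest even though the MARC is unchanged), then put the flag back to the value it had when the script was generated. A Python sketch of the same generation; the record ids are sample values taken from authorities quoted in this thread, and the saved flag value is passed in as a string and restored verbatim.

```python
FLAG = "ingest.reingest.force_on_same_marc"


def reingest_script(record_ids, saved_flag_value):
    """Return the statements the psql output-redirection step writes out.

    saved_flag_value is the flag's `enabled` value as printed at generation
    time; it is written back unchanged by the last statement.
    """
    lines = [f"UPDATE config.internal_flag SET enabled = 't' WHERE name = '{FLAG}';"]
    # One no-op UPDATE per non-deleted authority record.
    lines += [
        f"UPDATE authority.record_entry SET id = id WHERE id = {rid};"
        for rid in record_ids
    ]
    lines.append(
        f"UPDATE config.internal_flag SET enabled = '{saved_flag_value}' WHERE name = '{FLAG}';"
    )
    return lines


for line in reingest_script([57782, 34259], "f"):
    print(line)
```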

Tim Spindler (tspindler-cwmars) wrote :

I have been looking more closely into this, and it does appear to be occurring in two different situations.

One is the mis-normalized data Mike describes, which lacks the $0 link.

The other occurs when the heading has subdivisions, for example:

650 0. ‡aBirds ‡xVocalization ‡zNorth America.
650 0. ‡aBirds ‡xVocalization ‡zCanada, Western.
650 0. ‡aBirds ‡xVocalization ‡zUnited States.

Tim Spindler (tspindler-cwmars) wrote :

I have tried several things to fix the display. Here is the latest attempt, for one entry: the topical heading Birds Vocalization. I have attached the authority and related bib records involved.

This is the undesired display, in which subfield 0 appears:

---------------------------------
birds vocalization (no link on this heading)

    Note: Here are entered works on the process of sound production by birds. Works on the calls or songs produced by birds are entered under Birdsongs. Musical compositions having birds as their subject are entered under Birds--Songs and music.
    See Also From Tracing -- Topical Term Talking birds (1)
    SEE Heading -- Topical Term Birds Vocalization (7)

birds vocalization cwmars 98017 (no link on this entry)

    See Also From Tracing -- Topical Term Birds Vocalization (7)
    SEE Heading -- Topical Term Talking birds (1)
---------------------------------

<li class="browse-result">
                            <span class="browse-result-value">
                                 <!-- only authority links -->
                                    birds vocalization

                            </span>

                            <ul class="browse-result-authority-headings">

            <div class="browse-public-general-note">
                <span class="browse-public-general-note-label">
                    Note:
                </span>
                <span class="browse-public-general-note-body">
                Here are entered works on the process of sound production by birds. Works on the calls or songs produced by birds are entered under Birdsongs. Musical compositions having birds as their subject are entered under Birds--Songs and music.
                </span>
            </div>

                                                <li><span class="browse-result-authority-field-name">See Also From Tracing -- Topical Term</span>
                                                <a href="/eg/opac/browse?blimit=25;qtype=subject;bterm=Talking%20birds;locg=1">Talking birds</a>
                                                <span class="browse-result-authority-bib-links">(1)</span>
                                                </li>

                                                <li><span class="browse-result-authority-field-name">SEE Heading -- Topical Term</span>
                                                <a href="/eg/opac/browse?blimit=25;qtype=subject;bterm=Birds%20Vocalization;locg=1">Birds Vocalization</a>
                                                <span class="browse-result-authority-bib-links">(7)</span>
                                                </li>

                            </ul>
                        </li>

                        <li class="br...


Kathy Lussier (klussier) wrote :

Mike - I know in your comment above that you said the problem was caused by the state the data was in prior to 2.5, but since Tim has removed all those entries and reingested the bibs, shouldn't these headings be normalizing properly now? In the example Tim used above, *all* of the "birds vocalization" headings are mis-normalized, so there really is no way to get to the record. One would think that reingesting would put the entries into a properly normalized state.

Mike Rylander (mrylander) wrote :

Tim, the next thing we need to look at is the contents of your config.metabib_field table: now that the authority-side headings have been addressed, we'll have to investigate the bib-side headings. It's likely that we'll need to look at the normalizations being applied to each field as well, but those are stored across several tables and are a little trickier to pull out, so let's start with just the definitions. If there are non-stock definitions for linkable bib fields, they're probably lacking some authority-important parts. Your mention of subdivisions, in particular, makes me lean in this direction.

Stepping back, there are two sources of browse headings: bibs and authorities. They each have separate extraction and normalization configuration tables, and both need to be set up so that their outputs line up. That means making sure that the authority, browse and browse_sort XPath fields are set properly, that the joiners match, and that the set of bib-side normalizers generates the same output from the bib side as the simpler authority-side heading generation routine does.
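A small Python illustration of the joiner point, using the $a and $x of the "Academic libraries" 150 quoted later in this thread, where one entry displays as "Academic libraries -- Relations with faculty and curriculum" and another as "Academic libraries Relations with faculty and curriculum". Which side uses which joiner is an assumption made for illustration; the point is only that differing joiners yield two browse keys for one heading.

```python
def build_heading(subfields, joiner):
    """Join heading subfield values (e.g. $a, $x) with `joiner`."""
    return joiner.join(value for _code, value in subfields)


def normalize(text):
    """Hypothetical normalizer: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


subfields = [("a", "Academic libraries"), ("x", "Relations with faculty and curriculum")]

# Assumed configuration: authority side joins with " -- ", bib side with " ".
auth_key = normalize(build_heading(subfields, " -- "))
bib_key = normalize(build_heading(subfields, " "))
print(auth_key == bib_key)  # False: two browse entries

# With matching joiners the keys line up and the entries fold together.
aligned_auth_key = normalize(build_heading(subfields, " "))
print(aligned_auth_key == bib_key)  # True
```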

Obviously something in your configuration is misaligned, but we'll find the culprit.

Tim Spindler (tspindler-cwmars) wrote :

Attached are the contents of the config.metabib_field table.

Tim Spindler (tspindler-cwmars) wrote :

It would be good for others to test this, but ESI did the following to fix some of the issues:

1. Truncate metabib.browse_entry table
2. Reingest authorities

This removed the references that included subfield 0 and collapsed the headings, removing some of the repetition. Most of the remaining repetition occurs because of authorities that have not been linked, due to typos or some errant character in the bib record. However, we are still seeing this with Bronwyn Scott:

Scott, Bronwyn 1967- (4) (linked search)

    See Also From Tracing -- Personal Name Poppen, Nikki, 1967- (6)

Scott, Bronwyn, 1967-

    See Also From Tracing -- Personal Name Scott, Bronwyn, 1967- (4)
    SEE Heading -- Personal Name Poppen, Nikki, 1967- (6)
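Since much of the repetition comes from bib headings that failed to link because of a typo or stray character (such as the errant period mentioned earlier for "Christianity and literature"), one rough way to find candidates is to group distinct headings by an aggressive key. A Python sketch with sample strings; the key is an assumption, ignores non-ASCII letters, and will also group headings that legitimately differ only in punctuation, so results need manual review.

```python
import re
from collections import defaultdict


def loose_key(heading):
    """Aggressive comparison key: lowercase ASCII letters and digits only."""
    return re.sub(r"[^0-9a-z]", "", heading.lower())


def near_duplicates(headings):
    """Group distinct headings that differ only in case, spacing or punctuation."""
    groups = defaultdict(set)
    for heading in headings:
        groups[loose_key(heading)].add(heading)
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = [
    "Christianity and literature",
    "Christianity and literature.",  # errant trailing period
    "Birds Vocalization",
]
print(near_duplicates(sample))
```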

Tim Spindler (tspindler-cwmars) wrote :

When two authorities are linked to each other via the 550, they still show up as two entries after the fix was applied.
----------------------------------------------------------------------------------------------------------------------------

DISPLAY

Academic libraries -- Relations with faculty and curriculum

    See Also From Tracing -- Topical Term Academic libraries Relations with faculty and curriculum (26)
    SEE Heading -- Topical Term Academic librarians Faculty status (1)

Academic libraries Relations with faculty and curriculum (20) (link to search results)

ASSOCIATED AUTHORITIES

=LDR 00490cz a2200157n 4500
=001 57782
=003 CWMARSL
=005 20031204064727.0
=008 031020i|\anannbabn\\\\\\\\\\|a\ana\\\\\\
=010 \\$ash 85076594
=035 \\$a(CWMARS)57782
=035 \\$a(DLC)sh 85076594
=040 \\$aDLC$cDLC$dDLC
=150 \\$aAcademic libraries$xRelations with faculty and curriculum
=450 \\$aLibrary-faculty communication
=550 \\$wg$aUniversities and colleges$xCurricula
=550 \\$wg$aUniversities and colleges$xFaculty
=550 \\$aAcademic librarians$xFaculty status$0(CWMARS)34259
=901 \\$c57782$tauthority

=LDR 00388cz a2200133n 4500
=001 34259
=003 CWMARSL
=005 20031204064601.0
=008 031020i|\anannbabn\\\\\\\\\\|a\ana\\\\\\
=010 \\$ash 85028310
=035 \\$a(CWMARS)34259
=035 \\$a(DLC)sh 85028310
=040 \\$aDLC$cDLC$dDLC
=150 \\$aAcademic librarians$xFaculty status
=450 \\$aFaculty status of academic librarians
=550 \\$aAcademic libraries$xRelations with faculty and curriculum$0(CWMARS)57782
=901 \\$c34259$tauthority

----------------------------------------------------------------------------------------------------------------------------

DISPLAY

Scott, Bronwyn 1967- (4) (link to search results)

    See Also From Tracing -- Personal Name Poppen, Nikki, 1967- (6)

Scott, Bronwyn, 1967-

    See Also From Tracing -- Personal Name Scott, Bronwyn, 1967- (4)
    SEE Heading -- Personal Name Poppen, Nikki, 1967- (6)

ASSOCIATED AUTHORITIES

=LDR 00577nz a2200157n 4500
=001 977862
=003 CWMARSL
=005 20080412071018.0
=008 080404n|\acannaabn\\\\\\\\\\|a\aaa\\\\\c
=010 \\$ano2008055395
=035 \\$a(CWMARS)977862
=035 \\$a(DLC)no2008055395
=035 \\$a(OCoLC)oca07732398
=040 \\$aOC$beng$cOC
=100 1\$aScott, Bronwyn,$d1967-
=500 1\$aPoppen, Nikki,$d1967-$0(CWMARS)761088
=670 \\$aPickpocket countess, 2008:$bt.p. (Bronwyn Scott) t.p. verso (c2008 by Nikki Poppen)
=670 \\$aEmail from author, Apr. 4, 2008$b(Nikki Poppen-Eagan; writes as Nikki Poppen for Avalon, under pseudonym Bronwyn Scott for Harlequin; b. 1967)
=901 \\$c977862$tauthority

=LDR 00818cz a2200205n 4500
=001 761088
=003 CWMARSL
=005 20080716153616.0
=008 060214n|\acannaabn\\\\\\\\\\|a\aaa\\\\\\
=010 \\$an 2006010915$zn 96023204
=035 \\$a(CWMARS)761088
=035 \\$a(DLC)n 2006010915
=035 \\$a(OCoLC)oca06855402
=040 \\$aDLC$beng$cDLC$dDLC$dOC$dDLC
=053 \0$aPS3616.O657
=100 1\$aPoppen, Nikki,$d1967-
=400 1\$aPoppen-Eagen, Nikki,$d1967-
=400 1\$aEagen, Nikki Poppen-,$d1967-
=500 1\$aScott, Bronwyn,$d1967-$0(CWMARS)977862
=670 \\$aPoppen, Nikki. Dowager's wager, c2006:$bECIP t.p. (Nikki Poppen)
=670 \\$aEmail from pub., Feb. 13, 2006$b(b. Nov. 27, 1967)
=670 \\$aEmail from author, Apr. 4, 2008$b(Nikki...

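Both examples in this comment are reciprocal links: 57782 and 34259 point at each other through 550 $0, and 977862 and 761088 through 500 $0. A Python sketch for spotting such pairs, assuming the $0 values have already been extracted from each record; the input format and function names are hypothetical.

```python
import re


def control_number(subfield_0):
    """Extract the record number from a $0 value such as "(CWMARS)34259"."""
    match = re.fullmatch(r"\((\w+)\)(\d+)", subfield_0)
    return int(match.group(2)) if match else None


def reciprocal_pairs(links):
    """links maps an authority id to the ids named in its 5xx $0 subfields."""
    pairs = set()
    for src, targets in links.items():
        for dst in targets:
            if src in links.get(dst, []):
                pairs.add(tuple(sorted((src, dst))))
    return sorted(pairs)


links = {
    57782: [control_number("(CWMARS)34259")],    # 550 in "Academic libraries..."
    34259: [control_number("(CWMARS)57782")],    # 550 in "Academic librarians..."
    977862: [control_number("(CWMARS)761088")],  # 500 in Scott, Bronwyn
    761088: [control_number("(CWMARS)977862")],  # 500 in Poppen, Nikki
}
print(reciprocal_pairs(links))  # [(34259, 57782), (761088, 977862)]
```

Each pair makes a natural regression check: after a fix, each record's 1xx should appear once in browse, with the other heading shown as a See Also reference.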

Kathy Lussier (klussier) wrote :

Also marking this one as a duplicate of 1638299. The problem here was that bib normalization was removing the second comma from the author entry when ingesting the bib record, but the same didn't happen on the authority end of things, making it seem like they were two different entries. Using the MADS-based normalization for authority records should fix this.
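A minimal Python sketch of the mismatch described here: a hypothetical bib-side rule drops the comma before the date while the authority side keeps it, so one name yields two browse keys. The regular expression is an illustration, not Evergreen's normalizer.

```python
import re


def bib_side(heading):
    """Hypothetical bib-side rule: drop a comma that directly precedes a date."""
    return re.sub(r",\s*(?=\d)", " ", heading).strip()


def auth_side(heading):
    """Hypothetical authority-side rule: leave punctuation alone."""
    return heading.strip()


heading = "Scott, Bronwyn, 1967-"
print(bib_side(heading))   # Scott, Bronwyn 1967-
print(auth_side(heading))  # Scott, Bronwyn, 1967-
print(bib_side(heading) == auth_side(heading))  # False: two browse entries
# Applying the same rule on both sides makes the keys agree.
print(bib_side(heading) == bib_side(auth_side(heading)))  # True
```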
