biblio.extract_located_uris does not find existing, active asset.uris when use_restriction is NULL

Bug #797307 reported by Dan Scott
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
High
Dan Scott

Bug Description

* Evergreen 2.0.6
* PostgreSQL 9.0.4
* Debian Squeeze (database server)

The current logic in biblio.extract_located_uris() tries to reuse an existing asset.uri entry if it finds a match with the following code:

                -- look for a matching uri
                SELECT id INTO uri_id FROM asset.uri WHERE label = uri_label AND href = uri_href AND use_restriction = uri_use AND active;
                IF NOT FOUND THEN -- create one
                    INSERT INTO asset.uri (label, href, use_restriction) VALUES (uri_label, uri_href, uri_use);
                    IF uri_use IS NULL THEN
                        SELECT id INTO uri_id FROM asset.uri WHERE label = uri_label AND href = uri_href AND use_restriction IS NULL AND active;
                    ELSE
                        SELECT id INTO uri_id FROM asset.uri WHERE label = uri_label AND href = uri_href AND use_restriction = uri_use AND active;
                    END IF;
                END IF;

However, if there is no public note, then use_restriction / uri_use are NULL and the first SELECT always returns 0 rows, resulting in a new asset.uri entry being created. We need to handle the case where uri_use is NULL properly and avoid creating a new row every time the record is ingested.

Here's an example 856 that triggers the problem:

856 4 0 ‡uhttp://dsp-psd.pwgsc.gc.ca/Collection-R/LoPBdP/PRB-e/PRB0220-e.pdf ‡yTo view Windsor's electronic resource click here. ‡9OWA

Tags: pullrequest
Revision history for this message
Dan Scott (denials) wrote :

Fix in progress at user/dbs/lp797304_lp797307 (see related asset.uri bug 797304).

Changed in evergreen:
assignee: nobody → Dan Scott (denials)
importance: Undecided → High
status: New → In Progress
Revision history for this message
Dan Scott (denials) wrote :

One associated issue is that asset.uri currently allows duplicate rows for everything except asset.uri.id, so SELECT id INTO uri_id FROM asset.uri ... may return multiple matching rows for legacy data prior to this fix.

This suggests that the cleanup code should include the addition of a unique constraint, which requires a pruning of duplicate rows, which may end up being rather painful if asset.uri_callnumber_map points to duplicate rows.

Something like:

-- Get rid of asset.uri entries that are effectively useless
DELETE FROM asset.uri WHERE id NOT IN (SELECT uri FROM asset.uri_call_number_map);

-- Then inside a function:
-- Get a list of all asset.uri entries where everything but id matches
-- Then update asset.uri_callnumber_map to point at the lowest id for each of those matches
-- Then delete all but the lowest id for each of those matches

-- Then add our unique constraint as a failsafe
ALTER TABLE asset.uri ADD CONSTRAINT unique_uris UNIQUE (href, label, use_restriction, active);

Revision history for this message
Dan Scott (denials) wrote :

As a short term approach, I added an ORDER BY and LIMIT clause to the SELECT statements that would otherwise potentially return multiple rows where only one was expected.

In addition, I have added an unwrapped upgrade script that contains the updated biblio.extract_located_uris() definition as well as a stored procedure for reingesting records in which the associated asset.uri row has href = label.

I believe this is appropriate for backporting through 2.0; cleaning up the duplicate asset.uri entries would be nice but is not essential for the short term. Review / sign-off / commit would be greatly appreciated:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/lp797304_lp797307

tags: added: pullrequest
Revision history for this message
Mike Rylander (mrylander) wrote :

Signed off and merged into master through 2.0

Changed in evergreen:
status: In Progress → Fix Committed
Ben Shum (bshum)
Changed in evergreen:
milestone: none → 2.0.7
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.