build/tools/relator_map no longer fetching relator codes

Bug #1666987 reported by Galen Charlton
This bug affects 2 people
Affects: Evergreen
Status: Fix Released
Importance: Low
Assigned to: Unassigned

Bug Description

As of today, build/tools/relator_map no longer works because the LC website appears to be blocking requests from LWP/Simple user agents. This could be worked around by changing the user agent (curl, for example, is not blocked), but it may also be worth checking whether we would now be better off grabbing the list of relators from id.loc.gov.

Evergreen master

Tags: pullrequest
Galen Charlton (gmc) wrote :

Note that I've verified for the purpose of building 2.12-beta that there have been no relator code changes since the last time Evergreen's copy was updated in 2015.

Changed in evergreen:
importance: Undecided → Low
milestone: none → 2.next
Dan Wells (dbw2) wrote :

I noticed this also during the 2.11 cycle, and reached the same conclusions, so marking as confirmed.

Changed in evergreen:
status: New → Confirmed
Chris Sharp (chrissharp123) wrote :

Taking a look at this. My initial plan is to fix the current approach: download the HTML file and use a parser like HTML::TableExtract to grab the codes.

Changed in evergreen:
assignee: nobody → Chris Sharp (chrissharp123)
Dan Scott (denials) wrote :

To get this working in the short term, I pushed a two-line fix that adds an "Evergreen/3.1" user agent to the script. The branch is at http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/lp1666987_build_relator_map_ua

Giving it a local run suggests that no terms have changed since 2015. But it would be nice to have a working script for the build process.
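For illustration only, the user-agent workaround described above amounts to something like the following. This is a minimal Python sketch, not the actual Perl change in the branch; the helper names `build_request` and `fetch` are hypothetical, and the URL is whatever page the script needs to download.

```python
import urllib.request


def build_request(url: str, user_agent: str = "Evergreen/3.1") -> urllib.request.Request:
    """Return a request that identifies itself with a custom User-Agent.

    Hypothetical helper: the real build/tools/relator_map is a Perl script.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str = "Evergreen/3.1", timeout: float = 30.0) -> str:
    """Fetch url and return the body as text (requires network access)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

The only point is that the request no longer announces itself with the library's default user agent, which is what appears to trigger the block.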

Longer term, we should evolve towards supporting the full URIs in relators.tt2 anyway, as well as the short relator codes, so I would eventually recommend reimplementing this to use the linked open data access method as suggested by Galen... something like:

1. Grab the complete list of relator codes from id.loc.gov (using cURL here as a handy way of dereferencing the URIs with the appropriate Accept: header, but this should be easy enough to accomplish with LWP::Simple, as id.loc.gov does not use the Cloudflare blocking that the main loc.gov site does):

curl -L -H 'Accept: application/json' http://id.loc.gov/vocabulary/relators/

2. Iterate over the members of "http://www.w3.org/2004/02/skos/core#hasTopConcept", which will give you:

            {
                "@id": "http://id.loc.gov/vocabulary/relators/abr"
            },
            {
                "@id": "http://id.loc.gov/vocabulary/relators/act"
            },
            ...

3. Dereference each of the relator URIs to extract the corresponding label and abbreviated code:

curl -L -H 'Accept: application/json' http://id.loc.gov/vocabulary/relators/abr

        "http://www.w3.org/2004/02/skos/core#prefLabel": [
            {
                "@language": "en",
                "@value": "Abridger"
            }
        ],
        "http://www.w3.org/2004/02/skos/core#notation": [
            {
                "@value": "abr"
            }
        ],

4. Generate our relators.tt2 as before, but make it smart enough to equate the short relator codes with the full URIs (perhaps in the authors.tt2 code)

But in the short term let's just test and merge http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/lp1666987_build_relator_map_ua and move forward.
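As a rough sketch of steps 2 through 4 of the longer-term approach, the parsing side might look like the following in Python. It works on already-parsed JSON and makes no network calls. It assumes each response is either a JSON array of node objects or an object with an "@graph" array, which matches the fragments quoted above but is not verified against the full id.loc.gov output. All function names are hypothetical, and the format of relators.tt2 is deliberately left out.

```python
import json

SKOS = "http://www.w3.org/2004/02/skos/core#"
RELATOR_BASE = "http://id.loc.gov/vocabulary/relators/"


def nodes(doc):
    """Normalize a parsed JSON-LD document into a list of node objects.

    Assumption: the document is a list of nodes, or a dict with "@graph".
    """
    if isinstance(doc, dict):
        return doc.get("@graph", [doc])
    return list(doc)


def top_concept_uris(doc):
    """Step 2: collect the @id of every skos:hasTopConcept member."""
    uris = []
    for node in nodes(doc):
        for member in node.get(SKOS + "hasTopConcept", []):
            uris.append(member["@id"])
    return uris


def label_and_code(doc, uri, language="en"):
    """Step 3: return (prefLabel, notation) for the node identified by uri."""
    for node in nodes(doc):
        if node.get("@id") != uri:
            continue
        labels = node.get(SKOS + "prefLabel", [])
        label = next(
            (item["@value"] for item in labels if item.get("@language") == language),
            None,
        )
        notations = node.get(SKOS + "notation", [])
        code = notations[0]["@value"] if notations else None
        return label, code
    return None, None


def uri_for_code(code):
    """Step 4 helper: equate a short relator code with its full URI."""
    return RELATOR_BASE + code


if __name__ == "__main__":
    # Sample built from the fragment quoted in the comment above.
    concept = json.loads(
        '[{"@id": "http://id.loc.gov/vocabulary/relators/abr",'
        ' "http://www.w3.org/2004/02/skos/core#prefLabel":'
        ' [{"@language": "en", "@value": "Abridger"}],'
        ' "http://www.w3.org/2004/02/skos/core#notation": [{"@value": "abr"}]}]'
    )
    print(label_and_code(concept, uri_for_code("abr")))
```

Keeping the fetching separate from the parsing means the parsing can be checked against saved responses without touching the LC servers.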

Changed in evergreen:
assignee: Chris Sharp (chrissharp123) → nobody
tags: added: pullrequest
Changed in evergreen:
milestone: 3.next → 3.1.2
Changed in evergreen:
milestone: 3.1.2 → 3.1.3
Changed in evergreen:
milestone: 3.1.3 → 3.1.4
Changed in evergreen:
milestone: 3.1.4 → 3.1.5
Changed in evergreen:
milestone: 3.1.5 → 3.1.6
Changed in evergreen:
milestone: 3.1.6 → 3.2.1
Changed in evergreen:
milestone: 3.2.1 → 3.2.2
Jason Stephenson (jstephenson) wrote :

Seeing as we typically only do this during the .0 release, should this be targeted at 3.1 and 3.2 and not 3.next? I understand that some could run this script on their own if they're doing an installation, but doing so is not part of the typical install steps, nor is it documented anywhere other than the release steps as far as I know.

Ben Shum (bshum) wrote :

I agree with Jason's comment and recommend that we only target this towards the next future major series. It's not typically checked during maintenance releases as far as I know.

no longer affects: evergreen/3.1
Changed in evergreen:
milestone: 3.2.2 → 3.3-beta1
Changed in evergreen:
milestone: 3.3-beta1 → 3.3-rc
Changed in evergreen:
milestone: 3.3-rc → 3.3.1
Dan Wells (dbw2) wrote :

Just dealt with this again... for the last time!

Works for me, so pushed to master. Thanks, Dan!

Changed in evergreen:
milestone: 3.3.1 → 3.4-beta1
status: Confirmed → Fix Committed
Galen Charlton (gmc)
Changed in evergreen:
status: Fix Committed → Fix Released