Add authority records support to marc stream importer (Connexion)

Bug #1384740 reported by Bill Erickson
This bug affects 2 people
Affects: Evergreen
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Feature request.

Evergreen's marc_stream_importer.pl script should have the ability to import authority records. For reference, this script is typically (I think) used to import MARC bib records from OCLC Connexion into Evergreen, though any service that writes MARC to a socket could theoretically work. It can also be used to import local MARC files. We should expand its current abilities to include authority records.

Yamil (ysuarez)
tags: added: authority
Bill Erickson (berick) wrote :

Pushed a heavily modified stream importer and small test script:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1384740-stream-importer-auth

This script has accumulated quite a bit of cruft, so the authority additions come with a lot of cleanup. If we need to add some of the removed features back, we can, but I didn't want to keep anything unless I thought it was necessary. Also, note that some of the authority import options are not yet functional, pending the completion of bug #1171984, but basic authority queuing and importing do work.

-----

From the commit:

Add support for importing authority records to marc_stream_importer and clean up some cruft along the way. A single instance of the script can import either type of record. The record leader is inspected to determine if a batch is an authority batch or a bib batch.
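
The commit message does not say which leader position is inspected. In MARC 21, leader position 06 (type of record) is 'z' for authority data, so a minimal sketch, assuming that is the byte being checked, could look like the following. It is written in Python for illustration only (the real script is Perl), and the function name classify_marc_record is hypothetical.

```python
def classify_marc_record(raw: bytes) -> str:
    """Return 'authority' or 'bib' based on leader/06 of a raw MARC record.

    Assumption: leader position 06 ('z' = authority data in MARC 21) is the
    byte used to tell the two record types apart. The commit message does
    not name the position, so this is a guess at the approach.
    """
    if len(raw) < 24:
        # A MARC leader is always the first 24 bytes of the record.
        raise ValueError("record is shorter than a 24-byte MARC leader")
    record_type = chr(raw[6])
    return "authority" if record_type == "z" else "bib"
```

Under this assumption, a batch would be routed to the authority queue or the bib queue depending on the returned value.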

New options:

--auth-merge-profile
--auth-queue
--auth-source

Changed options:

--import-no-match
--auto-overlay-exact
--auto-overlay-1match
--auto-overlay-best-match

These have been expanded to include authority records. Applied with no value or a value of '1' means the option applies to bib imports, a value of 2 applies to authority records, and a value of 3 applies to both.

Cleanup:

--import-by-queue is no longer supported. This option serves no particular purpose and is a bad idea when re-using the same queue over and over as most people do, because queue bloat will increase run times.

--noqueue (AKA "direct import") is no longer supported. All imports go through Vandelay now.

----

I'll put these into release notes if we agree --import-by-queue and --noqueue can be deprecated.

Galen Charlton (gmc) wrote :

Noting that there's a bit of overlap going on between the patch for bug 741788 and the cleanup done here; some (simple) merge conflict resolution will be needed one way or the other.

Galen Charlton (gmc) wrote :

> --import-no-match
> --auto-overlay-exact
> --auto-overlay-1match
> --auto-overlay-best-match
>
> These have been expanded to include authority records.
> Applied with no value or a value of '1' means the option
> applies to bib imports, a value of 2 applies to authority
> records, and a value of 3 applies to both.

Not a fan of the magic numbers; I'd prefer either splitting the switches (e.g., --import-no-match / --auth-import-no-match or --bib-import-no-match / --auth-import-no-match) or spelling out the options (e.g., --import-no-match biblio, --auto-overlay-exact authority, --auto-overlay-1match both, etc.)

> --import-by-queue is no longer supported. This option serves no particular
> purpose and is a bad idea when re-using the same queue over and over as
> most people do, because queue bloat will increase run times.

+1

> --noqueue (AKA "direct import") is no longer supported. All imports go
> through Vandelay now.

+1

Bill Erickson (berick) wrote : Re: [Bug 1384740] Re: Add authority records support to marc stream importer (Connexion)

On Wed, Apr 22, 2015 at 1:48 PM, Galen Charlton <email address hidden> wrote:

> > --import-no-match
> > --auto-overlay-exact
> > --auto-overlay-1match
> > --auto-overlay-best-match
> >
> > These have been expanded to include authority records.
> > Applied with no value or a value of '1' means the option
> > applies to bib imports, a value of 2 applies to authority
> > records, and a value of 3 applies to both.
>
> Not a fan of the magic numbers; I'd prefer either splitting the switches
> (e.g., --import-no-match / --auth-import-no-match or
> --bib-import-no-match / --auth-import-no-match) or spelling out the
> options (e.g., --import-no-match biblio, --auto-overlay-exact authority,
> --auto-overlay-1match both, etc.)

Fair enough. I'll look at creating specific options soon.

Thanks for the feedback, Galen!

-b

Bill Erickson (berick) wrote :

Magic number options replaced by explicit options:

    --bib-import-no-match
    --bib-auto-overlay-exact
    --bib-auto-overlay-1match
    --bib-auto-overlay-best-match
    --auth-import-no-match
    --auth-auto-overlay-exact
    --auth-auto-overlay-1match
    --auth-auto-overlay-best-match

When any of the below are used, they map to the bib versions and print a deprecation warning to the terminal:

    --import-no-match
    --auto-overlay-exact
    --auto-overlay-1match
    --auto-overlay-best-match
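
The mapping of old option names onto the bib versions can be sketched roughly as follows. This is an illustration in Python, not the Perl code from the branch: the helper names are hypothetical, the wording of the warning is invented, and treating every option as a boolean flag is an assumption.

```python
import argparse
import sys

OPTION_NAMES = (
    "import-no-match",
    "auto-overlay-exact",
    "auto-overlay-1match",
    "auto-overlay-best-match",
)

# Old unprefixed name -> bib-specific replacement.
DEPRECATED = {f"--{name}": f"--bib-{name}" for name in OPTION_NAMES}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for kind in ("bib", "auth"):
        for name in OPTION_NAMES:
            # Assumption: each option is a simple on/off flag.
            parser.add_argument(f"--{kind}-{name}", action="store_true")
    return parser


def parse_args(argv):
    """Rewrite deprecated names to their bib equivalents, warning on stderr."""
    rewritten = []
    for arg in argv:
        if arg in DEPRECATED:
            print(f"{arg} is deprecated; use {DEPRECATED[arg]}", file=sys.stderr)
            arg = DEPRECATED[arg]
        rewritten.append(arg)
    return build_parser().parse_args(rewritten)
```

For example, parse_args(["--import-no-match"]) sets bib_import_no_match and leaves auth_import_no_match unset, so existing invocations keep their bib-only behavior.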

Also rebased to master.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/berick/lp1384740-stream-importer-auth-rebase

tags: added: pullrequest
Changed in evergreen:
milestone: none → 2.next
Bill Erickson (berick)
Changed in evergreen:
milestone: 2.next → 2.9-alpha
Bill Erickson (berick) wrote :

We're using this script, minus the authority bits, in production now, in part because the script refactoring includes some stability repairs I forgot to mention elsewhere in the ticket:

1. Open a new XMPP connection with each forked child during child init. Previously, a single XMPP connection was shared by all forked children, which led to crossed streams and chaos.

2. Use an auth nonce during login (via oils_header.pl) to avoid colliding logins.
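
The first repair follows a general pattern for forking servers: connect after the fork, not before it. A minimal sketch of that pattern in Python (not the script's Perl code), using a hypothetical FakeConnection class in place of a real XMPP client and assuming a Unix-like system where os.fork is available:

```python
import os


class FakeConnection:
    """Hypothetical stand-in for an XMPP client connection."""

    def __init__(self):
        # Record which process opened the connection.
        self.owner_pid = os.getpid()


def child_init() -> FakeConnection:
    # Runs inside the child, so the connection belongs to the child alone
    # and is never shared with the parent or with sibling processes.
    return FakeConnection()


def spawn_children(count: int):
    """Fork `count` children; return (child_pid, connection_owner_pid) pairs."""
    results = []
    for _ in range(count):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: open a fresh connection and report its owner to the parent.
            try:
                os.close(read_fd)
                conn = child_init()
                os.write(write_fd, str(conn.owner_pid).encode())
            finally:
                os._exit(0)
        # Parent: read the child's report, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            reported = int(reader.read())
        os.waitpid(pid, 0)
        results.append((pid, reported))
    return results
```

Each pair returned by spawn_children should show the connection owned by the child's own PID, which is the property the shared-connection setup lacked.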

Changed in evergreen:
milestone: 2.9-alpha → 2.9-beta
Kathy Lussier (klussier)
tags: added: needsreleasenote
Bill Erickson (berick) wrote :

Rebased to current master and pushed a commit for release notes.

tags: removed: needsreleasenote
Changed in evergreen:
milestone: 2.9-beta → 2.next
Galen Charlton (gmc)
Changed in evergreen:
assignee: nobody → Galen Charlton (gmc)
status: New → Confirmed
Galen Charlton (gmc) wrote :

Pushed to master, along with a couple minor follow-ups. Thanks, Bill!

Changed in evergreen:
status: Confirmed → Fix Committed
assignee: Galen Charlton (gmc) → nobody
Changed in evergreen:
milestone: 2.next → 2.10-beta
Changed in evergreen:
status: Fix Committed → Fix Released