Searching: OCLC Number in Z39.50 with junk characters (2.5.2)

Bug #1353036 reported by Don Butterworth
This bug affects 1 person
Affects: Evergreen
Status: New
Importance: Wishlist
Assigned to: Unassigned

Bug Description

At some point in its history, OCLC began adding alphabetic prefixes to its OCLC Control Number. At another point, it began padding the number with leading zeroes. This, of course, creates keyword searching problems. Recently I have even run across OCLC numbers that end with a backslash! In a previous ILS there was a parameter that allowed these unwanted characters and zeroes to be eliminated on import. In another previous ILS the prefixes and zeroes were somehow ignored when searching for the OCLC number.

OCLC number searches in Evergreen require all characters to match exactly. This causes a problem when doing a Z39.50 Import search. Typing in just the OCLC number (TCN) retrieves the correct record from OCLC, but does not retrieve the record from the Local Catalog. Typing in the OCLC number with the prefix retrieves the record from the Local Catalog but not from OCLC. Not good. It also causes a problem for anyone trying to do an OCLC number search since they would have to guess at whether the number includes a prefix.

Evergreen needs the capability to conduct a keyword search for only the OCLC number, while ignoring the junk prefixes, zeroes and backslashes, or to provide a way to strip the nonessential characters when bib records are imported.
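A minimal sketch of the kind of normalization being requested, written in Python. The function name `normalize_oclc` and the exact set of "junk" it removes (an optional `(OCoLC)` qualifier, an `ocm`/`ocn`/`on` prefix, leading zeroes, trailing backslashes) are assumptions drawn from this report, not something Evergreen currently provides.

```python
import re
from typing import Optional

# Hypothetical normalizer. The junk it removes is an assumption based on this
# report: an optional "(OCoLC)" qualifier, an ocm/ocn/on prefix, zero padding,
# and any trailing backslashes.
_OCLC_RE = re.compile(r"^(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?0*(\d+)\\*$", re.IGNORECASE)


def normalize_oclc(value: str) -> Optional[str]:
    """Return the bare OCLC number, or None if the value doesn't look like one."""
    match = _OCLC_RE.match(value.strip())
    return match.group(1) if match else None


print(normalize_oclc("(OCoLC)ocm00012345"))  # 12345
print(normalize_oclc("61162064\\"))          # 61162064
print(normalize_oclc("ABC123"))              # None
```

Returning `None` for values that don't fit the pattern, rather than stripping everything non-numeric, keeps non-OCLC identifiers from being silently mangled.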

Tags: search z3950
Elaine Hardy (ehardy)
tags: added: search wishlist
Dan Scott (denials) wrote :

The regular catalogue search includes the identifier|oclcnum: search filter that searches the 035 field for values matching (OCoLC)#########, e.g. https://laurentian.concat.ca/eg/opac/results?query=identifier|oclcnum%3A61162064
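As a small illustration, the link above can be generated for any OCLC number. This sketch copies the host and the `query=identifier|oclcnum:` layout from the example URL; the function name is hypothetical, and another Evergreen OPAC would substitute its own hostname.

```python
from urllib.parse import urlencode

# Base URL copied from the example link above; other sites would use their own host.
OPAC_RESULTS = "https://laurentian.concat.ca/eg/opac/results"


def oclcnum_search_url(oclc_number: str, base: str = OPAC_RESULTS) -> str:
    """Build an OPAC results URL that uses the identifier|oclcnum: search filter."""
    # safe="|" leaves the pipe literal, as in the example; ':' becomes %3A.
    return base + "?" + urlencode({"query": f"identifier|oclcnum:{oclc_number}"}, safe="|")


print(oclcnum_search_url("61162064"))
# https://laurentian.concat.ca/eg/opac/results?query=identifier|oclcnum%3A61162064
```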

summary changed from "Searching: OCLC Number with junk characters (2.5.2)" to "Searching: OCLC Number in Z39.50 with junk characters (2.5.2)"
Elaine Hardy (ehardy) wrote :

OCLC's documentation of the format changes to their TCNs is here: https://help.oclc.org/Metadata_Services/WorldShare_Collection_Manager/Choose_your_Collection_Manager_workflow/Data_sync_collections/Prepare_your_data/30035_field_and_OCLC_control_numbers

They do not add a backslash or any other character to their TCNs, and PINES has not seen one in the records we import via the Z39.50 interface with WorldCat.

The prefix is important in distinguishing the OCLC-based TCN from other TCNs that might exist in our database. We would not support its removal. The initial zeros fill out the OCLC number to 8 places (see the chart linked above). We would not want them removed from the TCN either.
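The prefix-and-padding rule can be written down as a formatting function, which shows that the stored form is derivable from the bare number. This sketch encodes the rule as generally understood from OCLC's chart (`ocm` plus zero-padding to 8 digits, `ocn` for 9 digits, `on` for 10 or more); verify it against the linked page before relying on it. The name `oclc_001` is hypothetical.

```python
def oclc_001(number: int) -> str:
    """Format a bare OCLC number the way OCLC-supplied 001 fields typically look.

    Assumed rule (check against OCLC's chart): up to 8 digits -> "ocm" plus
    zero-padding to 8 places; 9 digits -> "ocn"; 10 or more digits -> "on".
    """
    if number < 1:
        raise ValueError("OCLC numbers are positive integers")
    if number <= 99_999_999:
        return f"ocm{number:08d}"
    if number <= 999_999_999:
        return f"ocn{number}"
    return f"on{number}"


print(oclc_001(12345))       # ocm00012345
print(oclc_001(123456789))   # ocn123456789
print(oclc_001(1234567890))  # on1234567890
```

Because the mapping is deterministic, a search layer could in principle build the prefixed form from a typed number instead of altering the stored TCN.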

We make sure to add the appropriate prefix when searching for OCLC TCNs in our database and to leave it out when searching in OCLC. While this sometimes means editing the initial search, the prefixes within the local database are too important to give up.

We noticed in 3.2 that a new 035 containing the incoming 001 with its prefix (ocm, ocn, on) is no longer added to the bib record. The incoming record still has the 035 added by OCLC with the prefix (OCoLC), but the 001 is simply changed to the database/record ID and is not duplicated as a new 035. This does not seem to have a negative impact on TCN searches or on preventing duplicate imports.

Don Butterworth (don-butterworth) wrote:

From my perspective, it shouldn't be necessary to strip prefixes from the
OCLC number.

In one of our legacy systems, there was a filtering parameter that
identified what prefix data should be ignored. I vaguely recall it as a
grid that looked something like:

0
00
000
0000
00000
000000
ocn
ocm
on

That may not be the best approach for Evergreen to use. I'm just a
cataloger and don't know much about what goes on behind the curtain. But it
seemed simple and we never had a problem doing OCLC number searches.
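The legacy "ignore grid" described above can be approximated as a table-driven filter. This is a sketch only: the grid entries are the ones Don recalls rather than any standard, and the function name is hypothetical.

```python
# Hypothetical ignore grid, modelled on the legacy-system parameter recalled above.
IGNORE_GRID = ["0", "00", "000", "0000", "00000", "000000", "ocn", "ocm", "on"]


def strip_ignored_prefixes(tcn: str, grid=IGNORE_GRID) -> str:
    """Repeatedly remove the longest grid entry that begins the TCN."""
    # Longest entries first, so "000" is removed in one step rather than "0" three times.
    entries = sorted(grid, key=len, reverse=True)
    changed = True
    while changed:
        changed = False
        for entry in entries:
            # Compare case-insensitively, and never strip the last remaining character.
            if tcn.lower().startswith(entry) and len(tcn) > len(entry):
                tcn = tcn[len(entry):]
                changed = True
                break
    return tcn


print(strip_ignored_prefixes("ocm00012345"))  # 12345
```

One caveat of a blind grid: it would also strip a leading "on" or zeroes from a TCN that did not come from OCLC.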

Because OCLC numbers can't be searched without their prefixes, Evergreen
can't pull up both the local record and the OCLC record when we use the TCN
search in the *Import Record from Z39.50* environment, which forces us to
search manually to avoid duplication. I'm surprised this issue hasn't
already been addressed. It seems like an easy win.

(In reply to Elaine Hardy, Wed, Jan 8, 2020 at 10:10 AM.)

Elaine Hardy (ehardy) wrote :

We never use the Z39.50 interface to search our internal catalog. There is a bug that allows for the creation of an exact duplicate of the existing record if it is accidentally imported so PINES catalogers are instructed to never have the local catalog search box checked in the Z39.50 interface.

All initial searches for titles are done from advanced or numeric (ISBN) search. At that point, PINES catalogers would not have an OCLC number. Only after they find the matching record, either in the PINES database or in the Z39.50 search, would they have an OCLC number.

Don Butterworth (don-butterworth) wrote:

Looks like we do things a little differently. It is not uncommon for one of
our catalogers to copy the OCLC number directly from OCLC Connexion and
enter it into the Z39.50 TCN field. As you are well aware, there is no prefix
on the Connexion record. We want that number search to identify whether the
bib has already been entered into the database by another member of the
consortium, which it will not do because of the prefix problem. Further, when
it finds a duplicate, we may want to overlay the old record with the new one.
This is an option provided on the screen, but if you can't call them both up
in the results grid, it's a catch-22.

(In reply to Elaine Hardy, Wed, Jan 8, 2020 at 11:05 AM.)
--
Don Butterworth
Director of Strategic Collections Services /
Faculty Associate
B.L. Fisher Library
Asbury Theologica...


Elaine Hardy (ehardy) wrote :

I have a better understanding of what you are doing now. I still would not use the Z39.50 interface this way because of the potential for duplicating a record in your database.

The problem isn't that the OCLC based TCN in your local catalog contains the nonnumerical characters, but that you need to filter them out in a Z39.50 search.

It sounds, then, like you have a wishlist request:

In the Z39.50 interface, when the local catalog is checked, both with and without choosing a Z39.50 source, TCN searches within the local catalog should filter out all nonnumerical characters.
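A sketch of the local-catalog comparison this request implies, with hypothetical function names. Note one step the wording leaves implicit: dropping non-numeric characters alone is not enough, because the padded zeroes are numeric ("ocm00012345" becomes "00012345", which still differs from a typed "12345"), so this version also ignores leading zeroes.

```python
import re


def digits_only(tcn: str) -> str:
    """Drop every non-digit character, as the wishlist wording describes."""
    return re.sub(r"\D", "", tcn)


def local_tcn_matches(query: str, stored_tcn: str) -> bool:
    """Compare a typed TCN with a stored one after filtering out non-digits.

    Leading zeroes are also ignored, so a typed "12345" matches "ocm00012345".
    A query with no digits at all never matches.
    """
    wanted = digits_only(query).lstrip("0")
    return bool(wanted) and wanted == digits_only(stored_tcn).lstrip("0")


print(local_tcn_matches("12345", "ocm00012345"))  # True
```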

The problem I see with this is for those libraries that do not have OCLC access and are searching other Z39.50 sources. TCNs in those sources may have nonnumerical characters that are retained within Evergreen, meaning they would not want to filter out those characters if searching local and nonlocal catalogs.

Since I am not a developer and don't know the under-the-hood parts, I don't know whether development could focus specifically on OCLC searching without a lot of structural changes to the interface, so that OCLC searching combined with local catalog searching is treated differently from other Z39.50 searches. Filtering for just OCLC TCNs here may be really complex.

Don Butterworth (don-butterworth) wrote:

Yes, Elaine, you are right about this wishlist request.

*In the Z39.50 interface, when the local catalog is checked, both with and
without choosing a Z39.50 source, TCN searches within the local catalog
should filter out all non-numerical characters*.

But, I would go even farther. I would also like for the "number only" OCLC
number to be available as a keyword in the general keyword index.
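A sketch of how "number only" keyword terms could be derived from a record's 035 values. Where Evergreen would actually index such terms is not covered in this thread, so this only shows the derivation; the function name and the sample 035 values are illustrative.

```python
import re
from typing import Iterable, Set

# Hypothetical extractor: match 035 $a values such as "(OCoLC)ocm00012345" and
# keep only the bare number. 035s from other sources are ignored.
_OCLC_035 = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)", re.IGNORECASE)


def number_only_keywords(field_035_values: Iterable[str]) -> Set[str]:
    """Return the bare OCLC numbers found in a record's 035 values."""
    terms = set()
    for value in field_035_values:
        match = _OCLC_035.match(value.strip())
        if match:
            terms.add(match.group(1))
    return terms


print(sorted(number_only_keywords(["(OCoLC)ocm00012345", "(OCoLC)61162064"])))
# ['12345', '61162064']
```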

That is the number supplied by Connexion, that is what is searched in
Z39.50, that is what is searched using FirstSearch, and that is what is
searched using public WorldCat. Currently, when a user goes to WorldCat
and finds a book that we hold, and clicks the hot link "Asbury
Theological Seminary", the search result they get is "Sorry, no entries
were found for your search." Why? Because the "number only" OCLC number
isn't always in the keyword index. Why not use ISBN instead? Because a
significant number of our holdings do not have ISBNs.

Getting back to the *Import Record from Z39.50* screen. I do see how it's
possible that there might be a supplier of bibliographic records that
includes alpha characters in their 001 fields, but I am not aware of any.
Is there such a supplier? I would suppose that the vast majority of
Evergreen users are "OCLC only" and that making it possible to search both
OCLC and the Local Catalog using the OCLC number would save a lot of
unnecessary searching.

In our case we are an OCLC-only database. All our records are either OCLC
or Evergreen-native records. At one time we actually had a programmer strip
off the alphabetic prefixes (and the padded zeros after them) just so this
feature would work. So I can speak from experience: thanks to the Local
Catalog/OCLC TCN search, Evergreen has found duplicate records that would
otherwise have been inadvertently added to the system.

I'm not an IT mechanic either, and it may be what I am suggesting is a big
hairy deal. On the other hand, maybe it is only one line of code that says
"ignore alpha characters, and alpha characters followed by zeros in the 001
field".

Any IT wizards want to weigh in?

Don

(In reply to Elaine Hardy, Wed, Jan 8, 2020 at 12:35 PM.)

Elaine Hardy (ehardy) wrote :

Yes, that is how OCLC's internal TCNs work, regardless of how you access WorldCat. However, OCLC adds the prefixes and the initial zeros to their TCNs when a record is imported from WorldCat, either through the Z39.50 gateway or through batch loads from WorldCat. They are not added by Evergreen. As far as I know, they always have been. Every ILS I have ever worked in has used the prefixes and has never filtered them out in a search. The 035 retains the original format of the TCN.

Most Evergreen libraries are not OCLC libraries. Any library (including LC) that allows Z39.50 access to its database can be a target source, and many Evergreen libraries have multiple sources. Although most libraries use the record/database ID assigned by Evergreen as their internal TCN, an LCCN can also serve as an Evergreen library's TCN. LCCNs, at least older ones, can have letters and backslashes.
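This is why a blanket digits-only filter is risky: an LCCN-style TCN such as "sn85001234" (an illustrative value) would collapse to "85001234" and could collide with an unrelated OCLC number. A more conservative sketch, with a hypothetical function name and an assumed pattern, normalizes only values that carry an OCLC-style prefix and passes everything else through unchanged.

```python
import re

# Assumed pattern: "(OCoLC)" with an optional ocm/ocn/on, or a bare ocm/ocn/on,
# followed by digits. Anything else is left alone.
_OCLC_SHAPED = re.compile(r"^(?:\(OCoLC\)(?:ocm|ocn|on)?|ocm|ocn|on)0*(\d+)$", re.IGNORECASE)


def normalize_if_oclc(tcn: str) -> str:
    """Strip OCLC prefixes and padding only when the TCN looks like an OCLC number."""
    match = _OCLC_SHAPED.match(tcn.strip())
    return match.group(1) if match else tcn


print(normalize_if_oclc("ocm00012345"))  # 12345
print(normalize_if_oclc("sn85001234"))   # sn85001234
```

Even this narrower rule would still rewrite a non-OCLC TCN that happens to begin with "on" followed by digits, so some ambiguity remains.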

PINES' only source of bib records is OCLC. Unfortunately, we still have records from member libraries' pre-PINES days that are not OCLC records and have various prefixes and formats.

The TCN search in the Z39.50 interface was originally intended for searching OCLC only, since PINES just uses them as our bibliographic utility. I know this because I requested that it be included when it was not in an early design and had to explain why we needed it. Early designs of the interface did not include being able to overlay from within the interface and did not search the local catalog.

Given that the record TCN is either the Evergreen-assigned ID or the source-supplied TCN per the MARC 035/001 switcheroo, I don't know that the TCN search is intended to search the local catalog and the Z39.50 source simultaneously, since chances are they won't be the same number, especially given any prefixes the source adds, as OCLC does. If you want to search both WorldCat and your local catalog at the same time, I suggest using another search or a combination of searches. Unfortunately, the locally displayed TCN is the Evergreen-supplied database/record ID. However, you can look at the local MARC record and get the OCLC-based TCN from the 901 field. You can then mark the local record as the overlay target, pick the matching OCLC record by that TCN, and overlay. Given that there are reasons why your local record might not be the one identified by your staff member's search of WorldCat (there are far too many duplicates in OCLC now, and merged records in OCLC might have a different TCN than yours), searching on other fields would let you find a local record that a TCN search might miss.

Changed in evergreen:
importance: Undecided → Wishlist
tags: added: z3950
removed: wishlist