Toronto needs two language tags available in biblios metaform

Bug #347736 reported by Eric Ostlund
Affects | Status | Importance | Assigned to | Milestone
Scribe2 | New | Undecided | Unassigned |

Bug Description

(from <email address hidden> - Toronto book loader)

We have been working on a bilingual (English and French) collection for a
government sponsor and experimented with adding a second language code. It
is very important that the Canadian government stuff works in both
languages.

We did a bit of research on valid bilingual codes and found that using
engfre is allowable in a MARC record.
http://130.15.161.74/techserv/cat/Sect05/c05bp5.html

We tried a test by adding engfre at the Biblio/Metaform stage and, while it
derived properly, the accents are not correct in the French sections of
the txt file.
http://www.archive.org/details/bilingualtest00ontauoft

So, we have been running an xml task to add the second language code
before the book is scanned and derived. The OCR works well on both
languages.
http://www.archive.org/details/waterwellsinonta00torouoft

My question (and my hope) is that we can add the second code on the
metaform page during loading so that we don't have to run an extra task on
the books. Is there a way to enter a second language code and have it
work? We have also tried eng;fre entered in the language field on the
metaform and it did not OCR properly.

Is there any way that the code engfre can OCR in both languages?

Hank Bromley (hank-archive) wrote :

The best solution here is for your sponsor to correct their MARC records to indicate the second language. It is true, as stated in the bug report, that multiple language codes are allowed in MARC, and our code will extract them from any of the 3-4 standard places where they might be found in the MARC record, but both of the test books referred to in the bug report have MARC records that specify only English.

If those records can't be corrected for whatever reason, and the second language code is to be added manually to meta.xml, it has to be added as a second <language> element, not appended to the first element. In other words, "engfre" is correct in the MARC, but would be converted to:

<language>eng</language>
<language>fre</language>

in meta.xml. Currently the metaform page in the biblio tool offers no way to specify additional elements, so it's not possible to manually add languages there; perhaps the biblio tool could be modified to allow adding elements, as the Metadata Editor in the Item Manager and the QA page in the metamgr now do.
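
As a rough illustration of the conversion described above, the sketch below splits a packed code such as "engfre" into three-letter codes and writes one <language> element per code. It is not the code used by the biblio tool or the deriver: the function names, the <metadata> root element, and the sample title are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET


def split_language_codes(packed):
    """Split a packed string such as "engfre" into three-letter codes."""
    if not packed or len(packed) % 3 != 0:
        raise ValueError(f"expected a multiple of three letters: {packed!r}")
    return [packed[i:i + 3] for i in range(0, len(packed), 3)]


def set_languages(meta_xml, packed):
    """Return the XML text with one <language> element per code."""
    root = ET.fromstring(meta_xml)
    # Remove existing <language> elements so codes are not duplicated.
    for old in root.findall("language"):
        root.remove(old)
    # Append a separate element for each three-letter code.
    for code in split_language_codes(packed):
        ET.SubElement(root, "language").text = code
    return ET.tostring(root, encoding="unicode")


# Hypothetical, minimal stand-in for a meta.xml file.
sample = (
    "<metadata><title>Bilingual test</title>"
    "<language>engfre</language></metadata>"
)
print(set_languages(sample, "engfre"))
```

A value such as "eng;fre" is rejected by this sketch because its length is not a multiple of three, so a malformed entry fails loudly instead of being written into the file.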

It looks as though this experimentation has been going on for a month or so. Did anyone consider just emailing me during that time?

Eric Ostlund (erico-archive) wrote :

I think it would be hard to get sponsors to change their MARC records in cases like this, especially since we'll only discover problems with their MARC data when we get the book and our loaders do a catalog lookup.

Yes, it is true that <language>engfre</language> doesn't work, as their experiment demonstrated.
And adding a second language code is exactly what they are doing.

It is also true that the biblio tool offers no way to specify additional elements. That is what this request is about.

Gabe (gabe-archive) wrote :

Eric is correct.

Amending MARC records is not a solution, as the scanning is halfway complete at this point. Any future collections coming into the Toronto Scanning center from government institutions such as Library and Archives Canada, the Ontario Legislative Library, and Provincial or Federal donors will more often than not be in a bilingual format.

"So, we have been running an xml task to add the second language code
before the book is scanned and derived" - however, when dealing with over a thousand items - this is extremely time consuming."

Yes it is.

Unfortunately, asking donors to change MARC records, if they are not serials (like the M.O.E's), is something that most likely will not be accomplished before the books are given to us - especially if the Ontario Council of University Libraries will be sponsoring the texts (what this means is that the scans have to happen before a certain end date, as stipulated between the donor and OCUL). There is no lead time to accomplish mass MARC changes on the donor's end.

"It looks as though this experimentation has been going on for a month or so. Did anyone consider just emailing me during that time?"

I will check with Andrea and Katie - but I believe concern over this was raised a little while ago.

In the end, if this can't be done, it can't be done :)

Gabe

Katie Lawson (katie-archive) wrote :

Just to add my two cents, as one of the people dealing with these collections and loading the books, having the ability to add the second code to the metaform would be the best outcome for us.

It would keep our efficiency high and reduce the risk of books slipping through the cracks (i.e., forgetting to add the second code after the fact, so that the book gets derived and then needs to be altered and re-derived).

Thanks, Katie

siznax (siznax) wrote : Re: [Bug 347736] Re: Toronto needs two language tags available in biblios metaform

this should be very easy to do in the biblio tool,
and in a general way, i.e. allowing any number of
language fields.

and it shouldn't be difficult to make it possible
to add any number of generic fields, using the
same code (yay, reuse!) that we have in the item
manager, metamgr, and the curation tool.

please ask Dan about it.

/<email address hidden>

Hank Bromley (hank-archive) wrote :

Sorry if I sounded uncooperative earlier - not intentional.

Dan and I have discussed it and agree that it'd be a good thing to enable adding fields to the biblio metaform, as the other tools do. Schedule permitting, Dan will address this - we'll bring it up with Robert and Brewster at Monday's books meeting, where priorities are set.

If it's difficult for libraries to update their MARC records quickly, may I suggest that the topic be included in the initial discussions with new agencies, when setting up the initial arrangements? Perhaps with some more lead time they'd be more likely to correct their records, so that what they're verbally asking us to do is consistent with what their records are instructing us to do. Even with the improved biblio tool, having to add the second language manually will still be prone to error.

Gabe (gabe-archive) wrote :

No...not all Hank.

Your point is quite valid. I think, however, that asking a library to amend their records so that we can scan the items is a long shot - worth a shot nonetheless :)

However, one of the by-products of our scanning of UofT items is to bring to the attention of their staff that many of their records are in fact incorrect - to this day, as far as we know, those records have never been amended by UofT. Records from 20+ years ago seem to fall into the "we did it once, that's enough" category.

In this instance, for the Ministry of the Environment scans, it was known from the outset that we would not get records from them - their system and records are a complete mess. Thus, in order to get the funding in place from the M.O.E and the Ontario Council of Universities and move forward with the scans, we had to query other catalogues to get the records. For 1000+ items, this more than anything is why "add the second code on the metaform page during loading so that we don't have to run an extra task on the books" is so important to Toronto.

Cheers

Gabe

Gabe (gabe-archive) wrote :

I mean - not at all :)
