Abbyy not working with '-lang Japanese' or '-lang ChineseTraditional'?

Bug #175173 reported by Hank Bromley
Affects: Deriver
Status: Fix Committed
Importance: Undecided
Assigned to: Hank Bromley
Milestone: (none)

Bug Description

AbbyyXML failed in http://www-tracey.us.archive.org/log_show.php?task_id=21814305 .

When called manually (with '-lang Japanese'), so we can see error messages, we get this on the first page:

Info : Processing page 1
Error : ABBYY message is: Requested functionality is not supported in this version of ABBYY FineReader Engine
Error : Cannot OCR and export to file "0001.xml" [-415]
Error : Failure during OCR processing.

When called identically but without specifying a language, it completes the first page (though the results are, of course, garbage).

This is the first Japanese book we've done since we started specifying languages for ocr.
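For anyone triaging similar failures, here is a minimal sketch of pulling the error code out of output shaped like the lines quoted above. The pattern is inferred only from those four lines ("Level : message", optionally ending in a bracketed code), and the names `parse_log` and `first_error_code` are hypothetical helpers, not part of the deriver or of Abbyy.

```python
import re

# Shape inferred from the quoted output only: "<Level> : <message>",
# optionally followed by a numeric code in square brackets, e.g. "[-415]".
LINE_RE = re.compile(
    r"^(?P<level>\w+)\s*:\s*(?P<message>.*?)(?:\s*\[(?P<code>-?\d+)\])?$"
)


def parse_log(text):
    """Return one dict per recognisable log line; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        code = match.group("code")
        entries.append({
            "level": match.group("level"),
            "message": match.group("message"),
            "code": int(code) if code is not None else None,
        })
    return entries


def first_error_code(text):
    """Return the first numeric code attached to an Error line, or None."""
    for entry in parse_log(text):
        if entry["level"] == "Error" and entry["code"] is not None:
            return entry["code"]
    return None
```

Fed the four lines quoted above, `first_error_code` returns the code from the "Cannot OCR and export" line, which is the one worth grepping for across task logs.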

Tags: ocr
Hank Bromley (hank-archive) wrote :

Exact same failure mode with '-lang ChineseTraditional'. Log at http://www.us.archive.org/log_show.php?task_id=22101740 (the first Chinese book scanned since we started specifying languages for ocr). As with Japanese, tried calling manually to see error message, and got the same text reported above for Japanese.

pdf_comp claims to support both Chinese and Japanese (according to '-l' output). Not clear it does.

Hank Bromley (hank-archive) wrote :

Klaus at LuraTech confirmed that their license from Abbyy excludes the "CJK" languages (Chinese, Japanese, Korean). I've checked in a new version of our language list that omits them. (As a result, ocr will proceed on books in such languages, but using the English dictionary and charset, so the results will be poor.)
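The effect of dropping those languages from the list can be sketched roughly as follows. This is not the actual deriver code: `UNLICENSED` and `lang_args` are hypothetical names, only "Japanese" and "ChineseTraditional" are `-lang` values known from this report, and the other spellings (including "German" in the usage note) are guesses for illustration.

```python
# Language names excluded by the license, per this report (CJK).
# Only "Japanese" and "ChineseTraditional" are confirmed -lang values here;
# "ChineseSimplified" and "Korean" are assumed spellings.
UNLICENSED = {"Japanese", "ChineseTraditional", "ChineseSimplified", "Korean"}


def lang_args(requested):
    """Build the language arguments for the OCR call.

    Returns an empty list when no language is requested or the requested
    language is not licensed, so OCR runs with its default language instead
    of failing on the first page.
    """
    if requested is None or requested in UNLICENSED:
        return []
    return ["-lang", requested]
```

With this, `lang_args("Japanese")` yields no arguments at all, while a licensed language such as `lang_args("German")` yields `["-lang", "German"]`. The trade-off is the one noted above: the book gets through ocr, but the text is poor.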

I'm leaving this bug open for now, as we have a few books that redrowed during ocr (using languages other than CJK), showing the same error (return value 255). I won't know whether that's a related problem until I run those books manually so I can see the Abbyy error messages.

Changed in deriver:
assignee: nobody → hank-archive
status: New → Fix Committed