no language codes for ImagePDF books
Bug #187108 reported by Hank Bromley
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Deriver | New | Undecided | Unassigned | |
Bug Description
All the ImagePDF books are being OCR'd as English because we have no bibliographic metadata for them. Is it possible to build acquisition of bibliographic metadata into the processing pathway? Or, short of full bibliographic data, can we introduce some sort of human-assisted way to add language info to the metadata? Otherwise, our OCR output (and the resulting text layer in the PDFs) is going to be junk for all non-English books.
For instance, here's a German one that was done recently:
ftp://ia360603.
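As a rough sketch of what the human-assisted option might look like: rather than silently defaulting to English, the pipeline could hold any item with no language in its metadata until someone supplies one. Everything here is hypothetical (the `language` metadata key, the function names, the in-memory review queue); none of it is taken from the actual deriver code.

```python
from typing import Optional

# Hypothetical names throughout: neither the metadata layout nor these
# function names come from the actual deriver code.


def pick_ocr_language(metadata: dict) -> Optional[str]:
    """Return the language recorded in the item's metadata, or None if absent."""
    language = metadata.get("language")
    if isinstance(language, str) and language.strip():
        return language.strip().lower()
    return None


def plan_ocr(item_id: str, metadata: dict, review_queue: list) -> Optional[str]:
    """Decide the OCR language, queueing the item for a human if it is unknown."""
    language = pick_ocr_language(metadata)
    if language is None:
        # No silent fallback to English: hold the item until a language is supplied.
        review_queue.append(item_id)
    return language
```

An item that comes back as `None` would be skipped by OCR for now and picked up again once a person has filled in the language.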