Deriver

no language codes for ImagePDF books

Bug #187108 reported by Hank Bromley on 2008-01-29

Affects   Status   Importance   Assigned to   Milestone
Deriver   New      Undecided    Unassigned

Bug Description

All the ImagePDF books are being OCR'd as English because we have no bibliographic metadata for them. Is it possible to build acquisition of bibliographic metadata into the processing pathway? Or, short of full bibliographic data, can we introduce some sort of human-assisted addition of language info to the metadata? Otherwise, our OCR output (and the resulting text layer in the PDFs) is going to be junk for all non-English books.

For instance, here's a German one that was done recently:

ftp://ia360603.us.archive.org/0/items/mitteilungen01gescgoog/mitteilungen01gescgoog_djvu.txt
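To make the second suggestion concrete, here is a minimal sketch of how a language check might sit in front of the OCR step. Everything in it is an assumption for illustration: the function name, the "language" metadata key, and the mapping table are hypothetical and are not taken from the Deriver code. The idea is only that an item with no usable language value gets flagged for a human instead of silently falling back to English.

```python
from typing import Optional, Tuple

# Hypothetical mapping from metadata language values to OCR language codes.
# The keys and codes are illustrative, not the Deriver's actual tables.
OCR_LANGUAGE_CODES = {
    "english": "eng", "eng": "eng", "en": "eng",
    "german": "deu", "ger": "deu", "deu": "deu", "de": "deu",
    "french": "fra", "fre": "fra", "fra": "fra", "fr": "fra",
}


def choose_ocr_language(metadata: dict) -> Tuple[Optional[str], bool]:
    """Return (ocr_language_code, needs_human_review) for one item.

    `metadata` is assumed to be a plain dict of item metadata with an
    optional "language" entry (a hypothetical key name).
    """
    raw = metadata.get("language")
    if isinstance(raw, str):
        code = OCR_LANGUAGE_CODES.get(raw.strip().lower())
        if code is not None:
            return code, False
    # No usable language value: do not default to English,
    # flag the item so a person can supply the language.
    return None, True


if __name__ == "__main__":
    print(choose_ocr_language({"language": "German"}))
    print(choose_ocr_language({}))
```

With something like this in the pathway, items that come back flagged could be held in a review queue until someone fills in the language, rather than being OCR'd with the wrong language model.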
