Unwrap problem - some Central European chars missing
Bug #822744 reported by helour
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
calibre | Fix Released | Undecided | John Schember |
Bug Description
I have found a problem with the "automatic" and heuristic unwrap in the conversion of Central European (CE) documents: some CE characters are missing. Please look at the second character table here: http://
Many thanks.
Changed in calibre:
assignee: nobody → John Schember (user-none)
status: New → Triaged
I have found the incomplete character tables (where some CE characters are missing) in these files: preprocess.py (PDFTOHTML list), unsmarten.py, utils.py.
Probably only the first one is critical for automatic line unwrapping of PDF documents.
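
To illustrate the kind of failure described here, below is a minimal sketch. It is not calibre's actual code: the two letter tables, the `make_unwrapper` helper, and the regular expression are all hypothetical, and the extended table is only a sample of CE letters, not a complete list. It shows how an unwrap rule driven by a hand-written letter table leaves a line break in place when the line ends in a letter that the table does not contain.

```python
import re
import unicodedata

# Hypothetical, deliberately incomplete table: Western European letters only.
INCOMPLETE = "a-zàáâãäåæçèéêëìíîïñòóôõöøùúûüý"
# Same table plus a sample of Central European letters (assumed, not exhaustive).
EXTENDED = INCOMPLETE + "ąćčďęěłľĺńňőřŕśšťůűźżž"


def make_unwrapper(letters):
    """Build a function that joins hard-wrapped lines based on a letter table."""
    letters = unicodedata.normalize("NFC", letters)
    # Join a line break only when the previous character is a listed letter
    # (or a comma) and the next character is a listed letter.
    pattern = re.compile(r"(?<=[%s,])\n(?=[%s])" % (letters, letters))

    def unwrap(text):
        # Normalise to precomposed form so accented letters match the table.
        return pattern.sub(" ", unicodedata.normalize("NFC", text))

    return unwrap


unwrap_incomplete = make_unwrapper(INCOMPLETE)
unwrap_extended = make_unwrapper(EXTENDED)

# The first line ends in "ň", which only the extended table contains.
sample = "Příliš žluťoučký kůň\núpěl ďábelské ódy"

print(repr(unwrap_incomplete(sample)))  # line break is kept
print(repr(unwrap_extended(sample)))    # line break becomes a space
```

With the incomplete table the sample text stays wrapped, while the extended table joins the two lines. A table-free alternative would be a Unicode-aware letter class such as `[^\W\d_]`, but that also matches upper-case letters, so it changes the heuristic rather than just extending the list.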