pdftohtml does not output correct Chinese characters
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
poppler (Ubuntu) | New | Undecided | Unassigned |
Bug Description
➜ example git:(master) ✗ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
➜ pdf2xml-viewer git:(master) ✗ pdftohtml
pdftohtml version 0.41.0
Copyright 2005-2016 The Poppler Developers - http://
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC
poppler-data is already the newest version (0.4.7-7).
➜ example git:(master) ✗ pdffonts test.pdf
name                 type      encoding  emb sub uni object ID
-------------------  --------  --------  --- --- --- ---------
OCVNVZ+KaiTi_GB2312  TrueType  WinAnsi   yes yes yes     19  0
JSRZNG+SimSun        TrueType  WinAnsi   yes yes yes      8  0
➜ example git:(master) ✗ pdftohtml -c -hidden -enc UTF-8 -xml test.pdf test-utf8.xml
Page-1
I could not get the correct Chinese characters in the output.
The test file is here:
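To put a number on "incorrect characters" when reporting this, one option is to count how many characters inside the `<text>` elements of the generated XML fall in the CJK Unified Ideographs block. Below is a minimal Python sketch, assuming the XML written by the command above (e.g. `test-utf8.xml`) is well-formed. The script name `count_cjk.py` and the functions `is_cjk`, `cjk_stats` and `report` are hypothetical helpers, not part of poppler.

```python
from __future__ import annotations

import sys
import xml.etree.ElementTree as ET


def is_cjk(ch: str) -> bool:
    # CJK Unified Ideographs block only (U+4E00..U+9FFF); CJK punctuation and
    # extension blocks are deliberately not counted in this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def cjk_stats(data: str | bytes) -> tuple[int, int]:
    """Return (cjk_count, non_space_count) over the text of all <text> elements."""
    root = ET.fromstring(data)
    cjk = total = 0
    for node in root.iter("text"):
        # itertext() also walks nested <b>/<i> children that pdftohtml may emit
        for ch in "".join(node.itertext()):
            if ch.isspace():
                continue
            total += 1
            if is_cjk(ch):
                cjk += 1
    return cjk, total


def report(path: str) -> None:
    # Read raw bytes so the parser honours the file's own encoding declaration
    with open(path, "rb") as f:
        cjk, total = cjk_stats(f.read())
    print(f"{path}: {cjk} of {total} non-space characters are CJK ideographs")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} OUTPUT.xml", file=sys.stderr)
    else:
        report(sys.argv[1])
```

Run it as `python3 count_cjk.py test-utf8.xml`. A count near zero for a page that visibly contains Chinese text would suggest the glyphs are being mapped to the wrong Unicode characters rather than, say, the output file being written in the wrong encoding.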
link: https:/
password: ai5u