pdftohtml does not output correct Chinese characters

Bug #1678470 reported by wang haisheng
This bug affects 1 person
Affects           Status  Importance  Assigned to  Milestone
poppler (Ubuntu)  New     Undecided   Unassigned

Bug Description

➜ example git:(master) ✗ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
➜ pdf2xml-viewer git:(master) ✗ pdftohtml
pdftohtml version 0.41.0
Copyright 2005-2016 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1999-2003 Gueorgui Ovtcharov and Rainer Dorsch
Copyright 1996-2011 Glyph & Cog, LLC

poppler-data is already the newest version (0.4.7-7).

➜ example git:(master) ✗ pdffonts test.pdf
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
OCVNVZ+KaiTi_GB2312                  TrueType          WinAnsi          yes yes yes     19  0
JSRZNG+SimSun                        TrueType          WinAnsi          yes yes yes      8  0

➜ example git:(master) ✗ pdftohtml -c -hidden -enc UTF-8 -xml test.pdf test-utf8.xml
Page-1

The Chinese characters in the output (test-utf8.xml) are not correct.

The test file is here:
link: https://pan.baidu.com/s/1dFiSrDn
password: ai5u
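One way to make "the Chinese characters are not correct" measurable is to count how many of the extracted characters in test-utf8.xml are actually CJK ideographs. The sketch below is a hypothetical helper (it is not part of poppler); it assumes the `-xml` output stores page text in `<text>` elements, possibly with `<b>`, `<i>` or `<a>` children, and it counts only the CJK Unified Ideographs block (U+4E00 to U+9FFF) as Chinese.

```python
import sys
import xml.etree.ElementTree as ET


def is_cjk(ch: str) -> bool:
    # Assumption: only the CJK Unified Ideographs block (U+4E00..U+9FFF)
    # counts; extension blocks and CJK punctuation are not counted.
    return "\u4e00" <= ch <= "\u9fff"


def cjk_ratio(xml_data) -> float:
    """Return the fraction of non-whitespace characters inside <text>
    elements of `pdftohtml -xml` output that are CJK ideographs."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    chars = [
        ch
        for node in root.iter("text")
        # itertext() also collects text nested in <b>, <i> or <a> children
        for ch in "".join(node.itertext())
        if not ch.isspace()
    ]
    if not chars:
        return 0.0
    return sum(is_cjk(ch) for ch in chars) / len(chars)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_cjk.py test-utf8.xml
    with open(sys.argv[1], "rb") as f:
        print(f"CJK ratio: {cjk_ratio(f.read()):.2f}")
```

For a page that is visibly Chinese in a PDF viewer, a ratio near 0 means the extracted text is garbled or missing; the same number can be compared across poppler versions or `-enc` settings.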
