Unable to open previously imported pdf file

Bug #369861 reported by Danza
This bug affects 2 people
Affects: Inkscape
Status: Fix Released
Importance: Medium
Assigned to: Jon A. Cruz

Bug Description

How to repeat the error:

- Download the attached pdf presentation

- Open Inkscape and import a slide from pdf file

- Save the file in svg format

- Close Inkscape

- Open Inkscape again

- Try to open the previously saved file

An error like this should appear:
-------------begin error----------------
/home/francesco/drawing-1.svg:488: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^
/home/francesco/drawing-1.svg:522: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^
/home/francesco/drawing-1.svg:541: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^
/home/francesco/drawing-1.svg:585: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^

** (inkscape:5276): WARNING **: SVGView: error loading document '/home/francesco/drawing-1.svg'

/home/francesco/drawing-1.svg:488: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^
/home/francesco/drawing-1.svg:522: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^
/home/francesco/drawing-1.svg:541: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                               ^
/home/francesco/drawing-1.svg:585: parser error : PCDATA invalid Char value 2
roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
------------- end error----------------

Inkscape version 0.46-r5 compiled on Gentoo with these USE flags enabled:
perl spell

and these USE flags disabled:
-debug -dia -doc -gnome -inkjar -jabber -lcms -mmx -postscript -wmf

Maybe my Inkscape installation is missing some font?

Danza (f-occhipinti) wrote :
jazzynico (jazzynico) wrote :

Not reproduced on Ubuntu 9.04, Inkscape 0.46-5ubuntu4 and SVN build 21274.

floid (jkanowitz) wrote :

Don't have time this AM to try the test case, but I'm seeing a similar situation where an SVGZ saved in some unknown version of Inkscape (file metadata does say 0.46), using an imported PDF, contains U+0001 characters that the parser doesn't like with a fresh install of Inkscape 0.46-5ubuntu4 on a fresh install of Ubuntu 9.10:

file.svg:493: parser error : PCDATA invalid Char value 1
         id="tspan2654"></tspan></text>
                        ^
file.svg:635: parser error : PCDATA invalid Char value 1
         id="tspan2736"></tspan></text>
                        ^
file.svg:971: parser error : PCDATA invalid Char value 1
         id="tspan2934"></tspan></text>
                        ^
This behavior seems to be somewhat unpredictable, since if I recall correctly I have sometimes generated SVGs from the same PDF with the same version of the software that don't suffer from it. Of course, going through and stripping the illegal characters is a [tedious] workaround.
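The stripping workaround mentioned above can be automated. What follows is only a minimal sketch in Python, not something taken from Inkscape: the function name `strip_illegal_xml_chars` is hypothetical, and the character ranges are those of the XML 1.0 `Char` production (U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF).

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production.
# Allowed: U+0009, U+000A, U+000D, U+0020-U+D7FF, U+E000-U+FFFD,
# U+10000-U+10FFFF. Everything else (e.g. U+0001, U+0002, U+001F) is removed.
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Return `text` with every character that XML 1.0 forbids removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Example with the kind of content shown in the parser errors:
broken = 'id="tspan2654">\x01</tspan></text>'
print(strip_illegal_xml_chars(broken))  # id="tspan2654"></tspan></text>
```

This only removes the offending characters so the file parses again; whatever glyph the control character stood for is lost.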

...

For what it's worth, here is a snippet of the SVG from a "malformed" file, where ^A represents the illegal character, suggesting this is just being passed through from the imported document in violation of some schema that's caught when reloading:

<text
       transform="matrix(1,0,0,-1,93.600528,484.98)"
       id="text2652"><tspan
         style="font-size:7.98000002px;font-variant:normal;font-weight:normal;font-stretch:normal;writing-mode:lr-tb;fill:#000000;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular"
         x="0"
         y="0"
         id="tspan2654">^A</tspan></text>

Importing the same source PDF into 0.46-5ubuntu4 and saving SVG generates the following, stripping/ignoring the character, at least today:

      <text
         id="text2652"
         transform="matrix(1,0,0,-1,93.600528,484.98)">
        <tspan
           id="tspan2654"
           y="0"
           x="0"
           style="font-size:7.98000002px;font-variant:normal;font-weight:normal;font-stretch:normal;writing-mode:lr-tb;fill:#000000;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular"></tspan>
      </text>

I'll attach the source PDF in case anyone cares to examine its use/abuse of Wingdings/control characters in the first place.

dopelover (dopelover) wrote :

I also can't reproduce it on Ubuntu 9.04 (x86), Inkscape r21439.

su_v (suv-lp) wrote :

Mac OS X 10.5.7, XQuartz 2.3.3.2, Inkscape 0.46+devel r21420

page 1 of <http://launchpadlibrarian.net/27206739/articles_of_org_-_domestic_llc.pdf> imports without error, but fails to load when reopened after saving as svg:

   /Volumes/blue/img/Inkscape/test/bug/369861-articles_of_org_-_domestic_llc-page1.svg:641: parser error : Input is not proper UTF-8, indicate encoding !
   Bytes: 0xFF 0xBF 0xBF 0xBF
   roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                                  ^

page 2 of <http://launchpadlibrarian.net/27206739/articles_of_org_-_domestic_llc.pdf> imports and loads after saving without error.

page 1 of <http://launchpadlibrarian.net/26179507/DMSC10-Data-Complexity.pdf> imports and loads after saving without error.

page 9 of <http://launchpadlibrarian.net/26179507/DMSC10-Data-Complexity.pdf> imports without error, but fails to load when reopened after saving as svg:

  /Volumes/blue/img/Inkscape/test/bug/369861-DMSC10-Data-Complexity-page9.svg:950: parser error : Input is not proper UTF-8, indicate encoding !
  Bytes: 0xFF 0xBF 0xBF 0xBF
  roke:none;font-family:Wingdings;-inkscape-font-specification:Wingdings-Regular">
                                                                                 ^

(in case space is not preserved when posted: the marker '^' points to EOL '>')
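For anyone trying to find the offending bytes in such a file without going through the XML parser, a small check along these lines may help. It is a sketch in Python; the helper name `first_invalid_utf8` is hypothetical. It relies on the fact that the byte 0xFF can never occur in well-formed UTF-8.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for the first invalid UTF-8
    sequence in `data`, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]
    return None


# The byte sequence reported by the parser above, embedded in some markup:
sample = b'Wingdings-Regular">\xff\xbf\xbf\xbf</tspan>'
print(first_invalid_utf8(sample))
print(first_invalid_utf8(b"plain ascii"))  # None
```

To check a saved file, read it in binary mode and pass the bytes to the function.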

tags: added: pdf
cbowditch (cbowditch) wrote :

It looks like Inkscape has trouble with non-ASCII code points when importing a PDF. I have attached a very simple PDF that contains a bullet point in Wingdings font, which is code point x9F, and a single line of text in Helvetica. (Excuse the FoxIt splash: I had to remove confidential information from the original, and a simple document is also easier to debug.) Inkscape imports this PDF, and when the SVG is saved in UTF-8 encoding the x9F code point becomes xEF829F, but if you decode this according to UTF-8 rules you get xF09F. I would expect a UTF-8 representation of x9F to be 2 bytes long.

The previous problems reported on this bug all seem to be related and point to the fact that Inkscape cannot handle non-ASCII code points when importing PDF.
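The byte arithmetic in this comment can be checked by hand. The sketch below (Python, with a hypothetical helper name `decode_utf8_3byte`) applies the 3-byte UTF-8 layout `1110xxxx 10yyyyyy 10zzzzzz` to the bytes EF 82 9F. Worked through, they come out as U+F09F, a code point in the Private Use Area, whereas U+009F itself encodes as the two bytes C2 9F. Whether Inkscape or the PDF library deliberately maps the Wingdings code into that range is not confirmed anywhere in this thread.

```python
def decode_utf8_3byte(b0: int, b1: int, b2: int) -> int:
    """Decode one 3-byte UTF-8 sequence (1110xxxx 10yyyyyy 10zzzzzz)."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


code_point = decode_utf8_3byte(0xEF, 0x82, 0x9F)
print(f"U+{code_point:04X}")  # U+F09F

# Cross-check against Python's own decoder.
assert bytes([0xEF, 0x82, 0x9F]).decode("utf-8") == chr(code_point)

# The 2-byte encoding one would expect for U+009F itself:
print("\u009f".encode("utf-8").hex())  # c29f
```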

su_v (suv-lp)
tags: added: importing
Florian Staudacher (florian-staudacher) wrote :

I also have a PDF that, after importing and saving as SVG (both Inkscape and Plain SVG), won't re-open. (Won't attach it, since it's a little too big.) The error message I get on the command line is the following:

...
parser error : PCDATA invalid Char value 31
...

I am using Inkscape 0.47 (on Kubuntu karmic).
From my humble point of view it seems that Inkscape should only save characters it is able to handle and process (or simply ignore them when trying to open the document).

su_v (suv-lp)
tags: added: text
su_v (suv-lp)
tags: added: encoding
Khaled Hosny (khaledhosny) wrote :

It seems like Inkscape is failing to handle 4-byte characters and is truncating them to the first 3 bytes, resulting in invalid characters.

Khaled Hosny (khaledhosny) wrote :

Here is a very minimal PDF file. The resulting SVG should contain U+1D465 (𝑥).
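To illustrate why a truncated 4-byte character would make the SVG unreadable, here is a small Python sketch. It only demonstrates the hypothesis stated above (truncation to 3 bytes); it is not a trace of what Inkscape actually does.

```python
ch = "\U0001D465"            # MATHEMATICAL ITALIC SMALL X
encoded = ch.encode("utf-8")
print(encoded.hex(" "))      # f0 9d 91 a5

# Drop the last byte, as in the hypothesis above.
truncated = encoded[:3]

try:
    truncated.decode("utf-8")
    truncated_is_valid = True
except UnicodeDecodeError:
    truncated_is_valid = False

print(truncated_is_valid)    # False
```

A lead byte of F0 promises three continuation bytes, so any strict UTF-8 decoder rejects the shortened sequence.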

Khaled Hosny (khaledhosny) wrote :

Here is a test file with a 3-byte character that is converted correctly.

Khaled Hosny (khaledhosny) wrote :

It turned out that UnicodeMap::mapUnicode does not handle surrogate pairs (characters outside Unicode's BMP) and was returning invalid UTF-8, which broke my files. Also, it does not handle invalid Unicode characters and just returns invalid UTF-8, which broke the files using the Wingdings font (the original bug report).

Attached is a preliminary patch that uses g_utf16_to_utf8() to do the conversion. This ensures the returned UTF-8 is always valid (it returns NULL otherwise), which fixes the surrogate pairs issue and makes sure the resulting SVG contains valid characters. The caveat is that glyphs with no proper Unicode mapping (unencoded glyphs) will simply be omitted. That could be fixed by converting glyphs to outlines in such cases, but I think that would be another issue.
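The behaviour described for the patch (convert UTF-16 to UTF-8, and give back nothing rather than invalid output) can be sketched in Python as follows. This is not the patch itself and does not call GLib; the function name `utf16_units_to_utf8` is hypothetical, and Python's strict `utf-16-le` decoder stands in for g_utf16_to_utf8().

```python
import struct
from typing import Optional, Sequence


def utf16_units_to_utf8(units: Sequence[int]) -> Optional[bytes]:
    """Convert a sequence of UTF-16 code units to UTF-8 bytes.

    Returns None when the input is not well-formed UTF-16
    (for example a lone surrogate), instead of emitting invalid UTF-8.
    """
    try:
        raw = struct.pack(f"<{len(units)}H", *units)  # little-endian 16-bit units
        return raw.decode("utf-16-le").encode("utf-8")
    except (struct.error, UnicodeDecodeError):
        return None


# A surrogate pair for U+1D465 converts to one 4-byte UTF-8 sequence.
print(utf16_units_to_utf8([0xD835, 0xDC65]))  # b'\xf0\x9d\x91\xa5'

# A lone high surrogate is rejected rather than passed through.
print(utf16_units_to_utf8([0xD835]))          # None
```

Note that this sketch checks only UTF-16 well-formedness. Control characters such as U+0001 are well-formed Unicode and would pass through, so they would still need the XML-level filtering discussed earlier in the thread.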

su_v (suv-lp) wrote :

Testing patch with Inkscape 0.48+devel r9685 on OS X 10.5.8 (poppler 0.12.4):

1) additional warnings when compiling (GCC 4.2.1)

CXX svg-builder.o
(…)
extension/internal/pdfinput/svg-builder.cpp: In member function ‘void Inkscape::Extension::Internal::SvgBuilder::addChar(GfxState*, double, double, double, double, double, double, CharCode, int, Unicode*, int)’:
extension/internal/pdfinput/svg-builder.cpp:1362: warning: NULL used in arithmetic
(…)

2) Opening the file 'test1.pdf' fails with Inkscape 0.47 and 0.48.0 r9641 with the same backtrace and console message as in bug #605872 comment #8 (Pango:ERROR:break.c:1034:pango_default_break: code should not be reached). 'test2.pdf' can be opened and saved as SVG without failure in both Inkscape versions.

3) With the patched Inkscape 0.48+devel r9685 both PDF files open without crash, can be saved as Inkscape SVG and reopened in Inkscape 9685 without error messages.

Could it be that your patch addresses a different problem (bug #605872), since your example file(s) seem to trigger a different failure in Inkscape than the files with which this report (bug #369861) was originally filed? Or is it the same problem, which fails differently across Inkscape versions and platforms?

Khaled Hosny (khaledhosny) wrote :

1) The attached new patch should suppress the warning

2) I was converting the files through command line, so I didn't see this message but I got broken SVG, and the 'test2.pdf' is supposed to be OK.

3) That is right, my main problem was the same as #605872 (I think I just found the wrong bug report), but even so the two bugs, AFAIK, are manifestations of the same underlying issue. As said in my previous comment, the current UTF-16 to UTF-8 conversion is buggy: it does not handle surrogate pairs correctly (mine and #605872, and possibly this one), nor does it validate the resulting UTF-8, which in either case means broken SVG that can't be reopened.

su_v (suv-lp) wrote :

Testing patch with Inkscape 0.48+devel r9686 on OS X 10.5.8 (poppler 0.12.4):

- Using the test case from bug #605872 (comment #6), Inkscape 0.48+devel r9686+patch2 opens the PDF file without crash, and after saving it as Inkscape SVG, the file can be reopened without error messages and renders correctly in Inkscape AFAICT.
- Command-line invocation to convert the PDF file not yet tested.

See also this message on inkscape-devel which has a list of potentially related reports I had compiled earlier (see item 3b):
<http://article.gmane.org/gmane.comp.graphics.inkscape.devel/34564>

@JazzyNico - can you confirm this on linux/win32?

jazzynico (jazzynico) wrote :

Finally reproduced on Ubuntu 10.04, Inkscape 0.47 and trunk, with slide 4.

I confirm the patch corrects this bug and bug #605872 issues (tested with GUI and CLI).
Thanks, Khaled!

Changed in inkscape:
importance: Undecided → Medium
status: New → In Progress
Khaled Hosny (khaledhosny) wrote :

I looked quickly at the issues listed in <http://article.gmane.org/gmane.comp.graphics.inkscape.devel/34564>, and I don't think any is related to this particular bug.

su_v (suv-lp) wrote :

> and I don't think any is related to this particular bug.

Most if not all of them fail to load Inkscape SVG files (saved from imported PDF files or AI generated SVG files) due to bad character data with 'parser error : Input is not proper UTF-8' or 'parser error : PCDATA invalid Char value' console messages, which is what this report (369861) was originally about. What is the difference then?

Khaled Hosny (khaledhosny) wrote : Re: [Bug 369861] Re: Unable to open previously imported pdf file

On Thu, Aug 05, 2010 at 07:31:59AM -0000, ~suv wrote:
> > and I don't think any is related to this particular bug.
>
> Most if not all of them fail to load Inkscape SVG files (saved from
> imported PDF files or AI generated SVG files) due to bad character data
> with 'parser error : Input is not proper UTF-8' or 'parser error :
> PCDATA invalid Char value' console messages, which is what this report
> (369861) was originally about. What is the difference then?

AFAICT, though all of them result from the fact that Inkscape can write
invalid UTF-8 to SVG files, none seems to result from direct PDF import,
which this bug (and patch) are about. I think Inkscape should validate
its text output before writing SVG files, but this is outside the scope
of this bug.

su_v (suv-lp) wrote :

Assigning to Jon A. Cruz who is "reviewing the patch to get it tweaked up and ready".

muhammad Gad (mamidogad) wrote :

How can I get the tool?

On Thu, Aug 5, 2010 at 10:54 AM, ~suv <email address hidden> wrote:

> Assigning to Jon A. Cruz who is "reviewing the patch to get it tweaked
> up and ready".

--
Muhammad Gad
Production Developer
Hindawi Publishing Corporation

Changed in inkscape:
assignee: nobody → Jon A. Cruz (jon-joncruz)
Jon A. Cruz (jon-joncruz) wrote :

Fix committed as of revision 9812

Changed in inkscape:
status: In Progress → Fix Committed
su_v (suv-lp) wrote :
su_v (suv-lp) wrote :

@JonCruz - would you consider backporting the fix to the 0.48.x branch?

su_v (suv-lp) wrote :
su_v (suv-lp) wrote :

OTOH in these PDF files text is omitted that was incorrectly imported in previous Inkscape versions, but the files now open in Inkscape without error message after saving them as SVG (see comment #11):

<https://bugs.launchpad.net/inkscape/+bug/394472/+attachment/618802/+files/problematic_file.pdf>
<https://bugs.launchpad.net/inkscape/+bug/405838/+attachment/642596/+files/Bode_Analyzer_Automation_Objects_Overview.pdf>

su_v (suv-lp)
Changed in inkscape:
milestone: none → 0.48.1
jazzynico (jazzynico)
Changed in inkscape:
status: Fix Committed → Fix Released