SVG from imported PDF cannot be reopened, characters error

Bug #291416 reported by mahfiaz
This bug affects 5 people
Affects: Inkscape
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have a group of PDF files I need to convert to SVG. For some of them, I open the PDF file in Inkscape and save it as SVG, but when I try to open the saved SVG again in Inkscape I receive the following error: "Failed to load the requested file."

mahfiaz (mahfiaz) wrote :

On the terminal it threw this error:
/home/mattias/Töölaud/Kodutöö 2/joonis 2.3.svg:337: parser error : PCDATA invalid Char value 18
-family:Arial;-inkscape-font-specification:Arial">¥¥¥¥¥¥¥¥¥¥

From this we can see that the file contains invalid characters. I have attached the file.
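
The "invalid Char value 18" in the message is a decimal code point, i.e. the control character U+0012, which XML 1.0 does not allow in character data. A minimal Python sketch for locating such characters in a saved SVG, assuming the file can be decoded as UTF-8; the function name `find_invalid_xml_chars` is hypothetical and not part of Inkscape:

```python
import re

# Everything outside the XML 1.0 "Char" production is rejected by XML parsers.
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (line_number, code_point) pairs for characters not allowed in XML 1.0."""
    hits = []
    # split("\n") is used instead of splitlines(), because splitlines() also
    # breaks on some of the control characters we are looking for.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHAR.finditer(line):
            hits.append((line_number, ord(match.group())))
    return hits


sample = 'a\n<tspan id="t1">ok\x12</tspan>\n'
print(find_invalid_xml_chars(sample))  # prints [(2, 18)]
```

To check a real file, read it first, for example `find_invalid_xml_chars(open(path, encoding="utf-8").read())`, where `path` stands for the SVG in question.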

mahfiaz (mahfiaz) wrote :

The related question gives some further information.

Changed in inkscape:
importance: Undecided → High
status: New → Confirmed
mahfiaz (mahfiaz) wrote :

User dr21702 helped with a bug report. He/she found out that this happens when embedding of images is switched on. https://bugs.launchpad.net/inkscape/+bug/297701

The following is the very same node from both files. The first was imported with images embedded:

<text
       transform="matrix(0.864326,0,0,-1,549.463,99.497)"
       id="text29528"><tspan
         style="fill:#b93292;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:Wingdings;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:5.4373;writing-mode:lr;-inkscape-font-specification:Wingdings-Regular"
         x="0"
         y="0"
         id="tspan29530"></tspan></text>

This is the same node imported with images linked:
<text
       transform="matrix(0.86432604,0,0,-1,549.463,99.497)"
       id="text29780"><tspan
         style="fill:#b93292;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:Wingdings;font-variant:normal;font-weight:normal;font-stretch:normal;font-size:5.4373;writing-mode:lr;-inkscape-font-specification:Wingdings-Regular"
         x="0"
         y="0"
         id="tspan29782"></tspan></text>

arne (arnehu) wrote :

Hi!

I also converted a PDF to SVG and worked many hours on that SVG file. Directly after saving the PDF as SVG there was no problem opening the file. However, now I cannot open it.

I'm not 100% sure if it's exactly the same problem as above and if it has to do with characters, but there is a lot of text in my image...

IS IT POSSIBLE TO RESCUE THAT FILE?

I don't need the objects converted from the PDF; I only need all the stuff I drew myself. As this took many, many hours, I wouldn't like to do it all over again...

arne (arnehu) wrote :

All right, I don't know if my problem really has to do with converting a PDF. I finally found out that the following pieces of code made my file unusable; after removing them both, everything was OK:

Pablo Trabajos (pajarico) wrote :

arne, the last attachment wasn't correctly uploaded. Please re-attach.

tags: added: pdf text
tags: added: svg
atreju (atreju-tauschinsky) wrote :

I see the same, or a very similar problem. For me this frequently occurs when I open a pdf created in matlab. Saving this file as svg results in an unreadable svg file, as described above.

The error I get when running inkscape in a terminal is:
floqstuckquad.svg:22584: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x90 0x3C 0x2F 0x74
                         id="tspan18016">�</tspan></text>
                                         ^
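
The byte 0x90 reported in this message is a UTF-8 continuation byte with no lead byte before it, so it cannot be decoded on its own. A minimal Python sketch for listing every undecodable byte run in a file, with its byte offset and line number; the function name `find_invalid_utf8` is hypothetical:

```python
def find_invalid_utf8(data):
    """Return (byte_offset, line_number, offending_bytes) for each undecodable run."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            line_number = data.count(b"\n", 0, start) + 1
            problems.append((start, line_number, data[start:end]))
            pos = end  # continue scanning after the bad bytes
    return problems


sample = b'<text>\n<tspan id="tspan18016">\x90</tspan></text>\n'
for offset, line_number, bad in find_invalid_utf8(sample):
    print(offset, line_number, bad.hex())
```

For a real file, pass the raw bytes, for example `find_invalid_utf8(open(path, "rb").read())`, where `path` stands for the saved SVG.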
As an additional remark: if I use pdf2svg to convert the PDF to SVG in the first place and then open the resulting SVG in Inkscape, everything works fine. It also seems to result in a cleaner XML structure (fewer unnecessarily nested groups and groups of one object).

I'd be happy to provide original files etc. if that would be helpful. If there's anything else I can do to track this down, let me know...

thanks,

atreju

Wolf (drechsel) wrote :

I've got a similar problem with inkscape 0.47 AQUA, devel-r9101 and inkscape 0.47 X11. After opening a quite complex pdf and saving as svg, it cannot be reopened.

This is the console error message:

/Users/bub/Desktop/4Bestandsplan Unterer Marktplatz M250.svg:14140: parser error : Excessive depth in document: 256 use XML_PARSE_HUGE option
                                   transform="translate(-0.24,-21.3)"><path
                                                                      ^
/Users/bub/Desktop/4Bestandsplan Unterer Marktplatz M250.svg:14140: parser error : Extra content at the end of the document
                                   transform="translate(-0.24,-21.3)"><path
                                                                      ^

The attached zip contains the original pdf and the created svg.

Yours, Wolf

Gabriel M. Beddingfield (gabrbedd) wrote :

I have the same problem as Wolf. The 'embed images' switch has no impact in my case. The PDF->SVG conversion was done on Windows with 0.47. The SVG fails to open. On Windows it says "Failed to load the requested file __" when I try to open it. On Linux, when I try to open it with 0.46 (Apr 7, 2008), I get the same console messages as Wolf.

Note that these saved SVGs render fine with Firefox 3.6.

Gabriel M. Beddingfield (gabrbedd) wrote :

Just checked my SVG file with xmllint (v20631), and got the identical error message on the console:

/home/gabriel/e_10273.svg:13442: parser error : Excessive depth in document: change xmlParserMaxDepth = 1024

su_v (suv-lp)
tags: added: encoding
Paolo Ariano (ariano-paolo) wrote :

1. I import a .pdf file. The preview is perfect, but once imported the text contains a lot of strange characters like "%$.
2. I clean up the imported .pdf in Inkscape, delete the mismatched text, and finally save it as .svg.
3. If I try to open the saved .svg, Inkscape can't open it.
4. I open the .svg with a text editor (vi, gedit) and find the error. Here is an excerpt:
<tspan
               id="tspan5028"
               sodipodi:role="line"
               y="0"
               x="0 4.9372802 9.8745604 14.81184 19.749121 24.686399"
               style="font-size:8.88000011px;font-variant:normal;font-weight:normal;writing-mode:lr-tb;fill:#000000;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:Arial;-inkscape-font-specification:ArialMT"></tspan>

5. I manually delete the invalid character before the </tspan>, and likewise for the other occurrences, then save the .svg file with the text editor.
6. Now Inkscape can open the manually cleaned .svg file. I edit it, save, and reopen; all is OK.
7. So this is specific to imported .pdf files; it does not happen if I create a file and save it as .svg.
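
The manual cleanup in steps 5 and 6 can also be scripted. A minimal Python sketch, assuming the offending characters are control characters outside the XML 1.0 character range (U+0012 is used here because the original report mentions "Char value 18"; the actual character in this file is not known). The function names are hypothetical, and the script should be run on a copy of the file:

```python
import xml.etree.ElementTree as ET


def is_xml_char(ch):
    """True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml_chars(text):
    """Return (cleaned_text, number_of_removed_characters)."""
    cleaned = "".join(ch for ch in text if is_xml_char(ch))
    return cleaned, len(text) - len(cleaned)


# Illustrative input: a tspan that contains two stray control characters.
broken = (
    '<svg xmlns="http://www.w3.org/2000/svg"><text>'
    '<tspan id="tspan5028">\x12\x12</tspan></text></svg>'
)
cleaned, removed = strip_invalid_xml_chars(broken)
print(removed)            # prints 2
ET.fromstring(cleaned)    # parses without raising ParseError
```

This only removes the characters that stop the parser. The text they were meant to represent is lost, so the affected labels still have to be retyped in Inkscape.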

Gabriel M. Beddingfield (gabrbedd) wrote :

Paolo, I think that's a different issue than what's being discussed here.

The problem here is that when a PDF is imported, the XML structure is severely nested. The tree goes something like 1,200 or 2,000 elements deep. This bumps up against an arbitrary limit inside the XML parser, which only allows structures 1,024 elements deep.

If you examine the resulting XML after a PDF import, there is no need for the structure to be so deep. I'm guessing there's probably a simple fix in the PDF import logic. Like there's a loop that's adding subelements instead of sibling elements.
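
To check whether a given SVG runs into this, the nesting depth can be measured without the parser that enforces the limit. A minimal Python sketch using the standard library's expat bindings, which stream the document instead of building a tree; the function name `max_element_depth` is hypothetical, and the sample document is synthetic:

```python
import xml.parsers.expat


def max_element_depth(xml_bytes):
    """Return the deepest element nesting level (the root element counts as 1)."""
    depth = 0
    deepest = 0

    def start(name, attrs):
        nonlocal depth, deepest
        depth += 1
        deepest = max(deepest, depth)

    def end(name):
        nonlocal depth
        depth -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return deepest


# Synthetic example: 1,200 nested groups around a single path.
nested = b"<g>" * 1200 + b"<path/>" + b"</g>" * 1200
print(max_element_depth(nested))  # prints 1201
```

A result above the limit quoted in the console messages (256 or 1,024, depending on the parser version) would point to this nesting problem rather than to an encoding error.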

su_v (suv-lp) wrote :

@Gabriel - your problem seems to be bug #297070 “depth of xmlParserMaxDepth insufficient”. This report (bug #291416) is about the encoding error ('parser error : PCDATA invalid Char value') when reopening an SVG file saved from an imported PDF file.

Gabriel M. Beddingfield (gabrbedd) wrote :

@~suv - Yes, I was incorrect. Sorry.

It looks like my other comments are also for that bug.

bbyak (buliabyak) wrote :

There should be no more "Excessive depth in document" errors as of rev 9775

su_v (suv-lp) wrote :

The error happens when writing the SVG file after loading the PDF file, not when loading the broken SVG, so an original PDF is needed for testing. The only PDF file attached to this report saves as SVG and reopens without issues in Inkscape 0.48+devel r9812 on OS X 10.5.8 (SVG file attached).

The same applies to the file 'ChannelGuide.pdf' attached in the duplicates (bug #297701, #297702).

See Bug #369861 “Unable to open previously imported pdf file” (patch was committed in r9812).

theAdib (theadib) wrote :

I am running Ubuntu 10.10 with a current Inkscape devel build (rev 10013).
I can load the PDF from Archiv.zip, export it to SVG, and open it again.

Could someone summarize if this defect still exists?

Adib.

krash (cesar-alvarez)
description: updated
su_v (suv-lp) wrote :

@theAdib - see my comment before yours: there are no original PDF files attached here except the one from Wolf, and that one no longer produced the same error after the commit of the patch which was attached in bug #369861:

Please read Khaled Hosny's comments there (comments #8, #11, ...) on what the patch does and does not address:
«… preliminary patch that uses g_utf16_to_utf8() to do the conversion, this ensures the returned UTF-8 is always valid (it returns NULL otherwise), this fixes surrogate pairs issue and makes sure the resulting SVG contains valid characters, with caveat that glyphs with no proper Unicode (unencoded glyphs) will be just omitted, …»
<https://bugs.launchpad.net/inkscape/+bug/369861/comments/11>

AFAIU, after revision 9812 (which committed the patch from K. Hosny), a character that was incorrectly read from the PDF file is silently dropped and not written to the SVG file, thus avoiding the encoding error on reopening the SVG file.
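
The behaviour described in the quoted comment (convert strictly, get nothing back for invalid input, and omit that glyph) can be illustrated with a Python analogue. This is not the Inkscape patch and does not call GLib; `utf16_units_to_utf8` and `glyph_text` are hypothetical names, and the input values are made up for the example:

```python
import struct


def utf16_units_to_utf8(units):
    """Convert UTF-16 code units to UTF-8 bytes, or return None if they are not valid UTF-16."""
    raw = struct.pack("<%dH" % len(units), *units)
    try:
        # Strict decoding rejects unpaired surrogates.
        return raw.decode("utf-16-le").encode("utf-8")
    except UnicodeDecodeError:
        return None


def glyph_text(glyphs):
    """Join the converted glyph strings, silently omitting any that fail conversion."""
    parts = []
    for units in glyphs:
        converted = utf16_units_to_utf8(units)
        if converted is not None:
            parts.append(converted)
    return b"".join(parts)


# "A", a valid surrogate pair (U+1D400), a lone high surrogate, and "B".
glyphs = [[0x0041], [0xD835, 0xDC00], [0xD835], [0x0042]]
print(glyph_text(glyphs))  # prints b'A\xf0\x9d\x90\x80B'
```

The lone surrogate is dropped, so the output is always valid UTF-8, at the cost of losing the glyph it stood for.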

mahfiaz (mahfiaz) wrote :

I found the old file, which the original bug report was about, tried to reproduce the bug and everything worked fine in r9654. Although, I am not 100% sure if the file is exactly the same (it may have been changed).

Changed in inkscape:
status: Confirmed → Fix Released
Eothred (ylevinsen) wrote :

I'd just like to add that I had similar issues with version 0.48.2, but the problems are gone using the latest version from bazaar. When I imported my pdf (which was exported from some autocad 2d drawing), and saved as svg, I could not open the svg again. It did not help to unhook the "embed images" during import as I've seen some suggest on the forums.

Thanks for your help developers!

As a sidenote: The Autocad thing is utterly hopeless at giving me a useable figure for publications, so I have to import it to Inkscape and fix it there. Kind of funny how expensive Autocad is and then after paying I still need a FOSS application to get something useful! :D

Muxa (muxa-p) wrote :

I am using the precompiled Windows distribution of Inkscape 0.48.2 r9819.
I just import a page of a PDF, cut some elements, delete the raster graphics, and save it as Inkscape SVG.
But I can't open it the next time.
