Trouble if chm contains path with spaces

Bug #894193 reported by Reto Knaak on 2011-11-23
This bug affects 1 person
Affects            Status   Importance   Assigned to   Milestone
chm2pdf (Debian)   New      Unknown
chm2pdf (Ubuntu)            Undecided    Unassigned

Bug Description

Some links are broken, images are missing from the PDF, etc. if the CHM contains paths and file names with spaces.
In the attachment, I have made a nasty CHM file full of spaces to show the errors.

Reto Knaak (reto-knaak) wrote :

Applying this patch, the CHM in the previous comment is converted without missing links and images.

Only the last page is missing, but this also happens without the patch, so it is a separate bug (possibly in htmldoc?).

The attachment "Fixing path with spaces patch" of this bug report has been identified as being a patch. The ubuntu-reviewers team has been subscribed to the bug report so that they can review the patch. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are member of the ubuntu-sponsors please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Julian Taylor (jtaylor) wrote :

Thanks for the patch; I have forwarded it to the Debian bug.

Changed in chm2pdf (Debian):
status: Unknown → New
Reto Knaak (reto-knaak) wrote :

The posted patch #2 fixes the error only if the filename itself contains no spaces.
With filenames containing escaped spaces, it does not work.

Reto Knaak (reto-knaak) wrote :

I think now I have a fully working patch:

This line
  page=re.sub(r'(?i)"'+iurl,'"'+img_filename,page)
could leave a mix of escaped and unescaped spaces in the paths written to the HTML page
(e.g. /tmp/tmp33GfZf/Name\ with\ space/doc space/image path/velocity space.gif).

The line now becomes
  page=re.sub(r'(?i)"'+iurl,'"'+re.sub('\\\\ ', ' ', img_filename),page)
so that only unescaped paths end up in the HTML page.
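
A minimal sketch of the idea behind that change, written for Python 3 and using made-up values for iurl, img_filename and page (the real values come from chm2pdf and are not shown in this report). The helper name unescape_spaces, the use of re.escape and the function replacement are additions for illustration only and are not part of the patch:

```python
import re

# Hypothetical sample values; in chm2pdf they come from the extracted CHM.
iurl = "velocity space.gif"
img_filename = r"/tmp/example/Name\ with\ space/doc space/velocity space.gif"
page = '<img src="velocity space.gif">'


def unescape_spaces(path):
    """Turn every backslash-space pair into a plain space."""
    return re.sub(r"\\ ", " ", path)


# re.escape keeps characters such as "." in iurl from acting as regex
# metacharacters, and a function replacement keeps re.sub from interpreting
# backslashes in the new path. Both are assumptions of this sketch.
fixed = re.sub(
    r'(?i)"' + re.escape(iurl),
    lambda match: '"' + unescape_spaces(img_filename),
    page,
)
print(fixed)
```

With these sample values the path written into the page contains plain spaces only, which is the state the patch is aiming for.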

Reto Knaak (reto-knaak) on 2011-11-27
Changed in chm2pdf (Ubuntu):
status: New → Confirmed
Reto Knaak (reto-knaak) wrote :

In my patch there is still one place I don't like too much:
  page = re.sub('%20',' ',page)

If there is a %20 in the text itself, it will also be replaced with a normal space, and this is not correct.
So the regex should be changed to replace %20 only where it occurs inside a src="..." or href="...", but I have no solution for this right now.

Any suggestions?
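
One possible approach, sketched here with the standard library only: match each src="..." or href="..." attribute and replace %20 inside the attribute value alone. The names ATTR_RE and unescape_link_spaces are hypothetical, and the sketch assumes double-quoted attribute values; single-quoted or unquoted values would be left untouched.

```python
import re

# Captures the attribute prefix (e.g. 'href="') and the quoted value.
# Assumption: attribute values are always double-quoted.
ATTR_RE = re.compile(r'(?i)(\b(?:src|href)\s*=\s*")([^"]*)"')


def unescape_link_spaces(page):
    """Replace %20 with a space inside src/href values only."""
    def fix(match):
        prefix, value = match.group(1), match.group(2)
        return prefix + value.replace("%20", " ") + '"'
    return ATTR_RE.sub(fix, page)


sample = '<p>100%20off</p><a href="doc%20space/a%20b.html">x</a>'
print(unescape_link_spaces(sample))
```

A %20 in the body text is left as it is, because the pattern only ever matches inside the two attributes.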

Max Grishkin (grishkin) wrote :

BeautifulSoup makes it possible to replace %20 in links only, see http://bazaar.launchpad.net/~grishkin/chm2pdf/chm2pdf_branch/revision/16. But BeautifulSoup is only a recommended package for chm2pdf, so nothing will be substituted if it is not installed.
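
A rough sketch of how that could look, assuming the current bs4 package rather than whatever BeautifulSoup version the linked revision uses; it is not a copy of that revision. The function name unescape_link_spaces_bs is made up for illustration. If the import fails, the page is returned unchanged, which mirrors the "nothing will be substituted" behaviour described above.

```python
try:
    from bs4 import BeautifulSoup  # optional third-party package
except ImportError:
    BeautifulSoup = None


def unescape_link_spaces_bs(page):
    """Replace %20 in src/href attributes; no-op without BeautifulSoup."""
    if BeautifulSoup is None:
        return page  # nothing is substituted if the package is missing
    soup = BeautifulSoup(page, "html.parser")
    for tag in soup.find_all(True):  # True matches every tag
        for attr in ("src", "href"):
            value = tag.get(attr)
            if isinstance(value, str):
                tag[attr] = value.replace("%20", " ")
    return str(soup)
```

Note that serialising the parsed page again may change its formatting slightly compared with the input.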
