ebook-convert bug when converting to and from Word (DOCX)

Bug #1829246 reported by klaus schallhorn
This bug affects 1 person

Bug Description

I'm working on parsing HTML created by Word and other Word-compatible word processors. To get a large body of Word documents, I converted, among other things, EPUBs I have here to DOCX, and then converted the generated DOCX files back into HTML by various routes.

When I use ebook-convert to convert an EPUB to DOCX and back, links whose clickable text contains a SPAN are processed incorrectly. Instead of a single HTML link I get three, all pointing to the URI of the original link. This is easier shown than described:

Original copy:

[p]This demonstrates what kind of [a href="http://www.example.com"]oddities linktext [span class="stdspamp"]&[/span] ampersands[/a] can produce.[/p]

After using ebook-convert to convert the EPUB _to_ DOCX, and then the resulting DOCX back _from_ Word format, with the options below, I get:

[p id="calibre_link-2" class="block_2"][span class="text_1"]This demonstrates what kind of [/span]
[a href="http://www.example.com" class="text_2"]oddities linktext [/a]
[a href="http://www.example.com" class="text_3"]&[/a]
[a href="http://www.example.com" class="text_2"] ampersands[/a][span class="text_1"] can produce.[/span][/p]

(I've inserted linefeeds for clarity only.)
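Since the goal is to parse a large body of converted files, it may help to flag every place where this split happens. The sketch below is a hypothetical helper, not part of calibre. It assumes the HTML has already been extracted from the HTMLZ archive and uses real angle-bracket tags rather than the square brackets used above. It relies only on Python's standard `html.parser` module and treats two anchors as adjacent when nothing but whitespace separates them; the names `SplitLinkFinder` and `find_split_links` are made up for illustration.

```python
from html.parser import HTMLParser


class SplitLinkFinder(HTMLParser):
    """Group consecutive <a> elements that share an href.

    Hypothetical diagnostic helper (not part of calibre): two anchors count
    as consecutive when only whitespace separates them.
    """

    def __init__(self):
        super().__init__()
        self.runs = []           # each run: list of (href, link_text)
        self._href = None        # href of the <a> currently open, if any
        self._text = []          # text fragments inside the open <a>
        self._can_join = False   # True right after </a>, until other content

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self._href is None:
            href = dict(attrs).get("href") or ""
            # Extend the previous run only if this anchor directly follows
            # one with the same target; otherwise start a new run.
            if not (self._can_join and self.runs and self.runs[-1][0][0] == href):
                self.runs.append([])
            self._href, self._text = href, []
        elif self._href is None:
            self._can_join = False   # some other element sits between links

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.runs[-1].append((self._href, "".join(self._text)))
            self._href = None
            self._can_join = True
        elif self._href is None:
            self._can_join = False

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)   # spans inside the link keep their text
        elif data.strip():
            self._can_join = False    # real text between anchors


def find_split_links(html):
    """Return runs of two or more adjacent anchors pointing at the same URL."""
    finder = SplitLinkFinder()
    finder.feed(html)
    finder.close()
    return [run for run in finder.runs if len(run) > 1]


if __name__ == "__main__":
    sample = ('<a href="http://www.example.com" class="text_2">oddities linktext </a>'
              '<a href="http://www.example.com" class="text_3">&amp;</a>'
              '<a href="http://www.example.com" class="text_2"> ampersands</a>')
    for run in find_split_links(sample):
        print(run[0][0], [text for _, text in run])
```

On the round-tripped output above this should report one run of three anchors, while the original paragraph, where the SPAN sits inside a single link, reports nothing.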

Options I used with ebook-convert for the conversion to DOCX:

ebook-convert INFILE Outfile.docx --docx-no-toc --unsmarten-punctuation --preserve-cover-aspect-ratio

And for DOCX to HTMLZ:

ebook-convert INFILE.docx Outfile.htmlz --docx-inline-subsup

The link is exported correctly when I export the calibre-generated DOCX to HTML with Word, LibreOffice or other compatible programs, so the error seems to occur in the DOCX to HTMLZ conversion.

I have attached the original EPUB, and the DOCX and HTMLZ files created by ebook-convert.
