textboxes duplicated after import from docx

Bug #1789238 reported by Charlie Simon on 2018-08-27
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

In MS Word, I use the Microsoft-recommended method of keeping images and captions together: create a text box and insert the image and its caption inside it. Text can flow around the image/caption, and almost everything works both in Word and in export to PDF.
On import of the .docx file into calibre 3.29 on Windows 10 and conversion to EPUB, every image/caption pair is duplicated. On investigation, all text box content is duplicated. Apart from the duplication, which I can delete manually, the content is handled properly. The file is a book manuscript of over 100 MB, so I will provide it only if needed.

Sample (textbox content starts at block_58):

planted into your brain and which would take the place of any individual neuron and have an identical function.</span></p>
 <p class="block_59"> </p>
 <p class="block_58">Consider replacing neurons with artificial equivalents built from electronic components such as transistors and diodes—perhaps with a microprocessor. Neuron Image by Quasar Jarosz [CC BY-SA 3.0</p>
 <p class="block_59"> </p>
 <p class="block_58">Consider replacing neurons with artificial equivalents built from electronic components such as transistors and diodes—perhaps with a microprocessor. Neuron Image by Quasar Jarosz [CC BY-SA 3.0</p>
 <p class="block_36">Through our hypothetical, completely painless microsurgical techniques, we will remove a single neuron from your brain, measure its characteristics, and replace it with one of our artificial neurons which has been adjusted to fit perfectly. As we
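
The duplication in the sample above can be located mechanically in the converted output. The following is a minimal sketch, not part of calibre: it assumes the converted XHTML has been extracted from the EPUB as a string, and the helper names `paragraph_texts` and `repeated_paragraphs` are made up for illustration. It flags any non-empty paragraph whose text repeats one of the few paragraphs just before it, which matches the block_58 / block_59 / block_58 pattern shown.

```python
import re
from html import unescape

# Hypothetical helpers for illustration; a regex is enough for the flat
# <p>...</p> markup seen in the sample, but it is not a full HTML parser.
PARA_RE = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def paragraph_texts(html):
    """Return the visible text of each <p> element, whitespace-normalised."""
    texts = []
    for inner in PARA_RE.findall(html):
        text = unescape(TAG_RE.sub("", inner))
        texts.append(" ".join(text.split()))
    return texts


def repeated_paragraphs(html, window=2):
    """Return (index, text) for each non-empty paragraph whose text also
    appears among the `window` paragraphs immediately before it."""
    texts = paragraph_texts(html)
    hits = []
    for i, text in enumerate(texts):
        if text and text in texts[max(0, i - window):i]:
            hits.append((i, text))
    return hits


if __name__ == "__main__":
    sample = (
        '<p class="block_59"> </p>\n'
        '<p class="block_58">Caption text</p>\n'
        '<p class="block_59"> </p>\n'
        '<p class="block_58">Caption text</p>\n'
        '<p class="block_36">Next paragraph</p>\n'
    )
    for index, text in repeated_paragraphs(sample):
        print(index, text)
```

The `window` of 2 is chosen because the duplicates in the sample are separated by a single empty paragraph; a larger window would also catch legitimately repeated text further apart.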

Attach a DOCX file (create a small extract from the full file) demonstrating/reproducing the problem to this bug report. You can do that by clicking the "Add attachment or patch" link at the bottom of the bug's page. If the file you are attaching is copyrighted, mark the bug as private. You can do this by clicking the tiny yellow icon next to "This report contains Public information" in the top right area of the bug's page.

 status incomplete

Changed in calibre:
status: New → Incomplete
Charlie Simon (charlessimon) wrote :
  • See comments. (386.1 KiB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)

The docx file contains two image insertions.
The first uncovers a different but perhaps related bug; if it is unrelated, let me know and I can create another bug report. It was made with "Insert Picture" followed by adding a caption. The caption shows properly but the image does not.
The second reproduces the original bug. The text box contains an image and a caption. In the EPUB file, it appears twice.
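
For anyone wanting to inspect such a file, here is a rough diagnostic sketch. It rests on an assumption this report does not confirm: that Word stores the text box content twice in `word/document.xml`, once under `mc:Choice` (DrawingML) and once under `mc:Fallback` (VML), so that an importer reading both branches would emit the content twice. The function name `count_textbox_copies` is hypothetical; only the Python standard library is used.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs for WordprocessingML and markup compatibility.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC = "http://schemas.openxmlformats.org/markup-compatibility/2006"


def count_textbox_copies(docx):
    """Count w:txbxContent elements by the branch they sit under.

    `docx` is a path or a binary file-like object. The result maps
    "choice", "fallback" and "other" to the number of text box bodies
    found under mc:Choice, under mc:Fallback, or under neither.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    counts = {"choice": 0, "fallback": 0, "other": 0}

    def walk(elem, branch):
        # Track the innermost mc:Choice / mc:Fallback ancestor.
        if elem.tag == f"{{{MC}}}Choice":
            branch = "choice"
        elif elem.tag == f"{{{MC}}}Fallback":
            branch = "fallback"
        if elem.tag == f"{{{W}}}txbxContent":
            counts[branch] += 1
        for child in elem:
            walk(child, branch)

    walk(root, "other")
    return counts
```

If the "choice" and "fallback" counts are equal and non-zero, the file is consistent with the assumption above, though that alone does not establish what calibre actually does with the two copies.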

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: Incomplete → Fix Released