DOCX Conversion - glitch

Bug #1552972 reported by Better Red on 2016-03-03
This bug affects 1 person

Bug Description

I've noticed some odd spans in the output of DOCX->EPUB Conversions - <span id="id_GoBack">. They don't have any effect on my viewing of the EPUB, but I only use the calibre viewer and the Simple Epub Extension for Chrome.

I am pretty sure they occur where I've corrected the DOCX, possibly only at the most recent correction. My guess is that they have something to do with Word's undo. They could date from Word 2010; I upgraded from 2007 not so long ago.


Attach a DOCX file demonstrating/reproducing the problem to this bug report. You can do that by clicking the "Add attachment or patch" link at the bottom of the bug's page. If the file you are attaching is copyrighted, mark the bug as private. You can do this by clicking the tiny lock icon next to "This report contains Public information" in the top right area of the bug's page.

 status incomplete

Changed in calibre:
status: New → Incomplete
Better Red (urbanetiger) wrote :

Here you go

I rubbed out (backspace) the second 'u' in consequuntur towards end of first para.

Attached zip is from conversion debug + docx + resultant epub

Kovid Goyal (kovid) wrote :

Perfectly normal. Word has inserted a bookmark at that location, named
_GoBack. Bookmarks in the docx become anchors in the html. Why word does
that, you'd have to ask Microsoft.
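
For anyone who wants to check which bookmarks Word has written into a file, here is a minimal Python sketch. It is not calibre's code: it only assumes that a DOCX is a zip archive whose main part is word/document.xml and that each bookmark is a w:bookmarkStart element carrying a w:name attribute. The function names list_bookmarks and hidden_bookmarks are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_bookmarks(docx_file):
    """Return the bookmark names found in the main document part.

    docx_file may be a path or a binary file-like object.
    Headers, footers and notes live in other parts and are not scanned.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    names = []
    for el in root.iter(f"{{{W_NS}}}bookmarkStart"):
        name = el.get(f"{{{W_NS}}}name")
        if name is not None:
            names.append(name)
    return names


def hidden_bookmarks(names):
    """Bookmarks whose name starts with '_' (such as _GoBack)."""
    return [n for n in names if n.startswith("_")]
```

Calling list_bookmarks on the attached DOCX should show _GoBack alongside any bookmarks the author created deliberately.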

 status invalid

Changed in calibre:
status: Incomplete → Invalid
Better Red (urbanetiger) wrote :

Apparently, because its name starts with an '_', it's a 'very well hidden' bookmark, and it was introduced with Office 2010. See "_GoBack hidden bookmark in Open XML while processing office 2010 word document".

I'm not interested in carrying any bookmarks through to the epub as anchors -- what purpose do they serve?

It would be nice if DOCX Input had a 'Discard bookmarks' option; failing that, I'll see if Modify can remove them. I don't want to remove them in the DOCX as they have meaning in that context.

But something is not quite right - see attachment. The most recently added real 'bookmark/anchor' seems to be mislabeled; presumably Word is writing wrong information in the DOCX.

Kovid Goyal (kovid) wrote :

Bookmarks are what hyperlinks point to. If you remove them hyperlinks in
the document will stop working.

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: Invalid → Fix Released
Better Red (urbanetiger) wrote :

Thanks for that update re the GoBack

Traditionally, "bookmarks" in Word are used as aides-mémoire within the editing process, e.g. TODOs. They're not regarded as links in the sense of a bibliography reference, a foot/endnote or an index entry, all of which appear in the printed copy. I can't see any way to print bookmarks.

Looks like someone at MS said 'oh look we can do bookmarks as links' and didn't think through the consequences.

To make those spans meaningful, one would need to create a Bookmark List that used them. I am NOT suggesting conversion should offer to do that. But unless the user is interested in doing it, they are a space-wasting distraction when editing the code.

I'll see what can be done to get the Modify PI to remove them - it has a Strip spans option.
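
As a rough illustration of what such a clean-up could look like outside any plugin, here is a Python sketch that drops empty id-only spans unless some hyperlink points at them, so links that depend on a bookmark keep working. It is not the Modify plugin's or calibre's code. It assumes the anchors appear as `<span id="..."></span>` or `<span id="..."/>` with no other attributes, and the function name strip_unreferenced_anchors is hypothetical.

```python
import re

# Empty span carrying only an id, written either as <span id="x"></span> or <span id="x"/>
EMPTY_SPAN = re.compile(r'<span\s+id="([^"]+)"\s*(?:/>|>\s*</span>)')
# Fragment part of any href, e.g. href="#x" or href="chapter2.html#x"
HREF_FRAGMENT = re.compile(r'href="[^"#]*#([^"]+)"')


def strip_unreferenced_anchors(html, other_sources=()):
    """Remove empty id-only spans that no hyperlink targets.

    other_sources: text of the other files in the book, since a link
    to an anchor may live in a different file than the anchor itself.
    """
    referenced = set(HREF_FRAGMENT.findall(html))
    for text in other_sources:
        referenced.update(HREF_FRAGMENT.findall(text))

    def keep_or_drop(match):
        # Keep the span only if something links to its id
        return match.group(0) if match.group(1) in referenced else ""

    return EMPTY_SPAN.sub(keep_or_drop, html)
```

Regular expressions are a shortcut here; a real tool would more safely parse the XHTML, and ids are matched by fragment only, without checking which file the link names.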


Better Red (urbanetiger) wrote :

FYI - see and following 2 posts

DD's Toolbag feature will do me until something better shows up
