Duplicated ToC entries in PDF to any format conversion

Bug #1738385 reported by MitraX
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

DESCRIPTION:

Converting any PDF book to any format (e.g. epub, mobi or azw3) always produces a table of contents in which links are duplicated many times, often at the wrong levels, no matter which ToC options you select.

ENVIRONMENT:

Calibre version: 3.14 (and all versions before)
OS: Any

REPRODUCIBILITY: Always

STEPS TO REPRODUCE:

1) Add a PDF book to Calibre

2) Select the book and start conversion by clicking Convert book > Convert individually

3) Select output format in the field located in the upper right corner (e.g. EPUB or AZW3)

4) Click the Table of Contents in the left column

5) Configure options for ToC in the main window, e.g.:

- Uncheck "Allow duplicate links when creating the Table of Contents"

- Check "Do not add detected chapters to the Table of Contents"

- Check "Manually fine-tune the ToC after conversion is completed"

6) Click the OK button to start the conversion process
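
For reproducing outside the GUI, the steps above can be approximated with calibre's `ebook-convert` command-line tool. The sketch below only builds the command line; it assumes that the GUI option "Do not add detected chapters to the Table of Contents" corresponds to `--no-chapters-in-toc` and that leaving out `--duplicate-links-in-toc` matches the unchecked "Allow duplicate links" box. The function name and the output file name are hypothetical, and the "Manually fine-tune the ToC" step is not covered here.

```python
from typing import List


def build_convert_command(src: str, dst: str) -> List[str]:
    """Build an ebook-convert command approximating the GUI settings in step 5.

    Assumption: the flag names below match the GUI checkboxes as described
    in the lead-in; verify them against `ebook-convert --help`.
    """
    return [
        "ebook-convert",
        src,
        dst,
        # GUI: "Do not add detected chapters to the Table of Contents" (checked)
        "--no-chapters-in-toc",
        # GUI: "Allow duplicate links ..." is unchecked, so the
        # --duplicate-links-in-toc flag is deliberately NOT passed.
    ]


# Example: the output name "thinkjava.epub" is only an illustration.
command = build_convert_command("thinkjava.pdf", "thinkjava.epub")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where calibre is installed.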

EXPECTED BEHAVIOUR:

The book should contain a ToC with no duplicated links and with every entry at its correct level or sublevel, e.g.:

- Chapter 1
-- Subchapter 1
--- Subsubchapter 1
--- Subsubchapter 2
-- Subchapter 2
- Chapter 2
...

ACTUAL BEHAVIOUR:

The table of contents is created with duplicated links; the same link appears many times at different levels of the ToC, e.g.:

- Chapter 1
-- Subchapter 1
--- Subsubchapter 1
- Subchapter 1 --> duplicated, not valid level
-- Subsubchapter 1 --> duplicated, not valid level
-- Subsubchapter 2
- Subsubchapter 1 --> duplicated, not valid level
- Subsubchapter 2 --> duplicated, not valid level
- Subchapter 2 --> duplicated, not valid level
- Chapter 2

NOTE:

It seems there's some kind of loop that passes through the whole ToC every time it enters a new subchapter.
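
The suspicion in the note can be illustrated with a small, purely hypothetical model. This is not calibre's code, and the actual cause is not stated in this report; every name below (`Node`, `walk`, `correct_toc`, `buggy_toc`) is invented for the illustration. The sketch shows how starting a fresh walk at every node of an outline, instead of only at the top-level nodes, yields duplicates at wrong levels much like those listed above.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Node:
    """One outline entry with its nested sub-entries (hypothetical model)."""
    title: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) for a node and all of its descendants."""
    yield depth, node.title
    for child in node.children:
        yield from walk(child, depth + 1)


def correct_toc(roots: Iterable[Node]) -> List[Tuple[int, str]]:
    """Start a walk only at the top-level entries."""
    return [entry for root in roots for entry in walk(root)]


def all_nodes(roots: Iterable[Node]) -> Iterator[Node]:
    """Yield every node in the outline, in document order."""
    for root in roots:
        yield root
        yield from all_nodes(root.children)


def buggy_toc(roots: Iterable[Node]) -> List[Tuple[int, str]]:
    """Start a new walk at EVERY node, so each subtree is emitted again
    as if its root were a top-level entry."""
    return [entry for node in all_nodes(roots) for entry in walk(node)]


outline = [
    Node("Chapter 1", [
        Node("Subchapter 1", [Node("Subsubchapter 1"), Node("Subsubchapter 2")]),
        Node("Subchapter 2"),
    ]),
    Node("Chapter 2"),
]

for depth, title in buggy_toc(outline):
    print("-" * (depth + 1), title)
```

Comparing the output of `buggy_toc(outline)` with `correct_toc(outline)` shows each subchapter repeated once per ancestor, each time one level too high.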

Kovid Goyal (kovid) wrote : Re: calibre bug 1738385

Attach a PDF file demonstrating/reproducing the problem to this bug report. You can do that by clicking the "Add attachment or patch" link at the bottom of the bug's page. If the file you are attaching is copyrighted, mark the bug as private. You can do this by clicking the tiny yellow icon next to "This report contains Public information" in the top right area of the bug's page.

 status incomplete

Changed in calibre:
status: New → Incomplete
MitraX (mitrax-f) wrote :

Well, you can use any PDF book and the result will always be the same.

However, I've attached the first one I found on the Internet; it is called "Think Java", published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, and it's available at http://greenteapress.com/wp/think-java/

Here is another PDF file, the official Kindle Paperwhite User's Guide, whose entire ToC is flattened if you convert it to e.g. epub format: https://s3-us-west-2.amazonaws.com/customerdocumentation/EC/Kindle_User_Guide_EN-US.pdf

Kovid Goyal (kovid) wrote :
  • t.epub (539.0 KiB, application/octet-stream)

I tried converting the Kindle guide to epub and I can see no duplicate entries in the ToC. Every entry in the ToC corresponds to a link in the PDF file and none of them have the same text/destination. The links are fairly useless, since in the PDF file they only point to pages, not actual locations, so in the epub they don't always end up pointing to exactly the right place, but that is a fundamental limitation of converting PDF.

See attached epub.
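
The kind of check described in this comment (no two ToC entries sharing both text and destination) can be sketched for an EPUB 2 style `toc.ncx`. This is a minimal sketch, not how calibre itself validates a ToC; it assumes the converted book contains an NCX file in the standard DAISY namespace, and the function names are hypothetical. Extracting the NCX from the EPUB archive is left out because its member name can differ between books.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Tuple, Union

# Standard namespace of NCX documents (EPUB 2 table of contents).
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def toc_entries(ncx_xml: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return (label text, destination) for every navPoint, in document order."""
    root = ET.fromstring(ncx_xml)
    entries = []
    for point in root.iterfind(".//ncx:navPoint", NCX_NS):
        # Only the navPoint's own label and content are read, not those
        # of nested navPoints, because both paths match direct children.
        label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = point.find("ncx:content", NCX_NS)
        src = content.get("src", "") if content is not None else ""
        entries.append((label.strip(), src))
    return entries


def duplicate_entries(ncx_xml: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Return every (text, destination) pair that occurs more than once."""
    counts = Counter(toc_entries(ncx_xml))
    return [entry for entry, n in counts.items() if n > 1]
```

An empty list from `duplicate_entries` means no entry shares both its text and its destination with another one, regardless of nesting level.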

Changed in calibre:
status: Incomplete → Invalid
MitraX (mitrax-f) wrote :

That was the second example, which I mentioned for the flattened ToC issue. Your explanation for this particular book makes sense, so everything is clear here.

However, please try the first book, which I added as an attachment: "thinkjava.pdf". After conversion to e.g. azw3, it will contain duplicated entries as I described in the example.

MitraX (mitrax-f)
Changed in calibre:
status: Invalid → New
Kovid Goyal (kovid) wrote : Fixed in master

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released