Converting MOBI to EPUB fails with "SplitError: Could not find reasonable point at which to split"

Bug #1427694 reported by drunken monkey
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
calibre   Invalid   Undecided    Unassigned

Bug Description

Normally, converting .mobi to .epub works great. However, there's one book where it just won't work, throwing this error:

  1% Converting input to HTML...
  InputFormatPlugin: MOBI Input running
  on (…)
  Parsing all content...
  Forcing index.html into XHTML namespace
  34% Running transforms on ebook...
  Merging user specified metadata...
  Detecting structure...
  Flattening CSS and remapping font sizes...
  Source base font size is 12.00000pt
  Removing fake margins...
  Cleaning up manifest...
  Trimming unused files from manifest...
  Trimming u'images/00006.jpg' from manifest
  Trimming u'images/00004.jpg' from manifest
  Creating EPUB Output...
  67% Running EPUB Output plugin
  Splitting markup on page breaks and flow limits, if any...
    Looking for large trees in index.html...
    Found large tree #42
  Traceback (most recent call last):
    File "/usr/bin/ebook-convert", line 20, in <module>
      sys.exit(main())
    File "/usr/lib/calibre/calibre/ebooks/conversion/cli.py", line 360, in main
      plumber.run()
    File "/usr/lib/calibre/calibre/ebooks/conversion/plumber.py", line 1198, in run
      self.opts, self.log)
    File "/usr/lib/calibre/calibre/ebooks/conversion/plugins/epub_output.py", line 198, in convert
      split(self.oeb, self.opts)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 60, in __call__
      self.split_item(item)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 70, in split_item
      self.max_flow_size, self.oeb, self.opts)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 214, in __init__
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 347, in split_to_size
      self.split_to_size(tree)
    File "/usr/lib/calibre/calibre/ebooks/oeb/transforms/split.py", line 340, in split_to_size
      raise SplitError(self.item.href, root)
  calibre.ebooks.oeb.transforms.split.SplitError: Could not find reasonable point at which to split: index.html Sub-tree size: 280 KB
It, of course, doesn't produce an .epub file. Also, I checked, and the original .mobi doesn't have any long paragraphs that might cause this.

I don't think I'm allowed to attach the book I'm trying to convert, but it's this one:
http://www.worldcat.org/title/reapers-gale/oclc/191935167 (ISBN: 0553813161, UUID: 2e34e378-ef5a-4696-9d50-e275a33c52df)

Versions:
  calibre: 2.20.0-1 (But happened with earlier versions, too.)
  Qt: 5.4.1-2
  pyqt5: 5.4.1-1
OS: Linux 3.18.6-1-ARCH (x86_64)

Kovid Goyal (kovid) wrote : Re: calibre bug 1427694

That error indicates the input file contains some large block of
unstructured text. You can work around it by increasing the split size
in the EPUB output section of the conversion dialog, but be aware that
the resulting EPUB file might not work on some older e-ink devices.

 status invalid

Changed in calibre:
status: New → Invalid
drunken monkey (remus) wrote :

Thanks for your response!

Just a bit of additional information for anyone else stumbling across this problem:
What puzzled me was that I thought this splitting would occur at paragraph boundaries, and since the book in question didn't have any long paragraphs, I didn't know what could cause these "large trees".
However, as far as I can tell, splitting actually occurs at page breaks. That makes sense, because otherwise the reader would have to open the other HTML files anyway to determine the correct positioning. The book in question simply had one extremely long chapter.
So I did the following:

- Based on the above suggestion, I converted with larger split size first:
    ebook-convert book.mobi book.epub --flow-size 1000000
- I then unzipped the .epub to look at the contents:
    unzip book.epub
- Listing the contents, I saw one .html file that was indeed 280 KB in size.
- I edited that file, moving its first few paragraphs over to the previous HTML file, which brought it under the 260 KB threshold. (Of course, I also made sure that the previous file stayed under that threshold as well. Otherwise, I'd have had to use the next file instead, or juggle things around more.)
- I then re-zipped the book which now – apart from a missing page break and one surplus one in the middle of the long chapter – should work fine everywhere. Command (assuming the whole directory contents are the extracted book, nothing else – especially not book.mobi or book.epub):
    zip book.epub *
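
The size check in the steps above can also be done without unpacking the archive. Here is a minimal Python sketch, not anything calibre ships: the function name `oversized_html` is made up for illustration, the 260 KB default is simply the threshold mentioned above, and it compares uncompressed sizes on the assumption that this is what the limit refers to.

```python
import zipfile

def oversized_html(epub_path, limit_kb=260):
    """Return (name, size_in_kb) for each HTML entry larger than limit_kb.

    Hypothetical helper for illustration; not part of calibre.
    """
    hits = []
    with zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            # Only look at content documents, not images, CSS or metadata.
            if info.filename.lower().endswith((".html", ".xhtml", ".htm")):
                size_kb = info.file_size / 1024  # uncompressed size
                if size_kb > limit_kb:
                    hits.append((info.filename, round(size_kb, 1)))
    return hits
```

Calling `oversized_html("book.epub")` on the converted file should list the long chapter, which is then the one to edit.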

Kovid Goyal (kovid) wrote :

FYI, calibre includes an ebook editor that can edit EPUB files
directly, which you should use rather than zip/unzip as the EPUB file
has special internal conventions that zip/unzip will not respect.
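
One such convention, from the EPUB container format, is that the `mimetype` file should be the first entry in the archive and stored uncompressed. A plain `zip book.epub *` does not guarantee this, and without `-r` it also leaves out the contents of subdirectories such as META-INF. If you do repack by hand, a minimal Python sketch could look like the following. The function name `pack_epub` is illustrative and not part of calibre. It assumes `src_dir` contains a `mimetype` file and that `epub_path` lies outside `src_dir`.

```python
import os
import zipfile

def pack_epub(src_dir, epub_path):
    """Zip src_dir into epub_path with 'mimetype' first and uncompressed.

    Hypothetical helper for illustration; epub_path must not be inside src_dir.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

The editor remains the safer route, since it takes care of these details for you.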

Also note that splitting does happen on many different types
of "suitable" locations in the HTML, not just page breaks. However, it
is not always possible to find suitable locations automatically.
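
To illustrate why a split can fail, here is a toy model in Python. It is emphatically not calibre's actual algorithm, and the names `NoSplitPoint` and `split_to_size` are made up for this sketch. It assumes the document is a flat list of text blocks and that the only "suitable" split locations are the boundaries between blocks, so a single block larger than the limit cannot be split at all.

```python
class NoSplitPoint(ValueError):
    """Raised when one block alone exceeds the size limit (toy model only)."""

def split_to_size(blocks, limit):
    """Greedily group consecutive blocks into chunks of at most `limit` bytes."""
    chunks, current, size = [], [], 0
    for block in blocks:
        n = len(block.encode("utf-8"))
        if n > limit:
            # No boundary inside this block, so there is nowhere to split.
            raise NoSplitPoint(f"block of {n} bytes exceeds limit of {limit}")
        if current and size + n > limit:
            # Close the current chunk at this block boundary.
            chunks.append(current)
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        chunks.append(current)
    return chunks
```

In this model, raising the limit makes the error go away, which mirrors the workaround of increasing the split size.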
