calibredb --add with ignoring errors option

Bug #1711272 reported by xiatian
This bug affects 1 person
Affects: calibre
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Hi, I'm using the command 'calibredb add /root/test/* --with-library /root/library -r -d' to add books to my library. I have thousands of books that need to be processed, so it will take a long time. However, the process was killed with the error 'Error: Did no succeed opening JPX Stream as JP2, trying as J2K. Killed'. Is there an option to ignore errors so that the process is not killed? I want all my books to be traversed and added to the calibre library.

Kovid Goyal (kovid) wrote : Re: calibre bug 1711272

The error you mention comes from a corrupted PDF file. As far as I know,
all such errors are already ignored, they simply cause reading metadata
from the PDF file in question to fail. Indeed, reading PDF metadata
happens via a separate worker process, so even if it is killed, it does
not matter. So if calibredb add finished, it most likely added all your
books. Check the exit code of the calibredb process.
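One way to act on the advice about the exit code is to run the command from a small script and interpret the return code. The sketch below assumes a POSIX system, where Python's `subprocess` module reports a child that was killed by a signal as a negative return code. The helper names `describe_exit` and `run_and_report` are made up for illustration.

```python
import signal
import subprocess
from typing import List


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable summary."""
    if returncode == 0:
        return "finished successfully"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with error code {returncode}"


def run_and_report(cmd: List[str]) -> str:
    """Run a command (no shell involved) and summarize how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


# Example call, using the paths from the report above:
# run_and_report(["calibredb", "add", "/root/test",
#                 "--with-library", "/root/library", "-r", "-d"])
```

A result of "killed by SIGKILL" would be consistent with the bare "Killed" message in the report, which is what a shell prints when a process receives that signal.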

 status invalid

Changed in calibre:
status: New → Invalid
xiatian (521xiatian) wrote :

I'm afraid the process was killed before the whole task finished. I have 100GB of ebook files in a directory and I tried to add them with calibredb to a calibre library directory. When the process was killed, the size of the library was only 30GB. Is there an argument that shows all the output? I want to locate which book caused the process to fail.

Kovid Goyal (kovid) wrote :

I can certainly add a verbose flag to have it output the names of the
books it is adding, but I think you will be better served by simply
splitting up your collection into 5-6 chunks and adding those
separately. Basically calibredb add is not really designed for adding
huge collections at once, since that is a relatively uncommon operation.
It's likely it is being killed because of an out of memory condition, so
splitting it up will fix it.
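The chunking suggestion can be sketched as follows. This is a minimal illustration, not a calibre feature: the helper names `chunked` and `build_commands` are hypothetical, and the flags are the ones used in the original report (`--with-library`, `-r`, `-d`). It only builds the command lines; each one would then be run separately so that memory is released between batches.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], n_chunks: int) -> Iterator[List[str]]:
    """Yield up to n_chunks contiguous slices of roughly equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    # Ceiling division; max() keeps the step valid when items is empty.
    size = max(1, -(-len(items) // n_chunks))
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_commands(paths: Sequence[str], library: str,
                   n_chunks: int = 6) -> List[List[str]]:
    """Build one `calibredb add` argument list per chunk of paths."""
    return [
        ["calibredb", "add", *chunk, "--with-library", library, "-r", "-d"]
        for chunk in chunked(paths, n_chunks)
    ]
```

Passing each argument list to `subprocess.run` one after another keeps every batch in its own process, which is the point of splitting the collection.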

xiatian (521xiatian) wrote :

I tried to split it up and I found another error and then it stopped:

Traceback (most recent call last):
  File "/usr/bin/calibredb", line 19, in <module>
    sys.exit(main())
  File "/usr/lib/calibre/calibre/library/cli.py", line 1229, in main
    return command(args[2:], dbpath)
  File "/usr/lib/calibre/calibre/library/cli.py", line 360, in command_add
    tags, opts.series, opts.series_index)
  File "/usr/lib/calibre/calibre/library/cli.py", line 255, in do_add
    dir_dups.extend(db.recursive_import(dir, single_book_per_directory=one_book_per_directory))
  File "/usr/lib/calibre/calibre/library/database2.py", line 3597, in recursive_import
    self.import_book_directory_multiple(dirpath[0], callback=callback)
  File "/usr/lib/calibre/calibre/library/database2.py", line 3569, in import_book_directory_multiple
    self.import_book(mi, formats)
  File "/usr/lib/calibre/calibre/library/database2.py", line 3321, in import_book
    self.set_metadata(id, mi, ignore_errors=True, commit=True)
  File "/usr/lib/calibre/calibre/library/database2.py", line 2094, in set_metadata
    self.set_path(id, index_is_id=True)
  File "/usr/lib/calibre/calibre/library/database2.py", line 607, in set_path
    path = self.construct_path_name(id)
  File "/usr/lib/calibre/calibre/library/database2.py", line 566, in construct_path_name
    while author[-1] in (' ', '.'):
IndexError: string index out of range

Did the process finish?

Kovid Goyal (kovid) wrote :

You appear to be using a truly ancient and completely unsupported version of calibre, presumably the one supplied by your Linux distribution. Uninstall it and install the official calibre binaries from https://calibre-ebook.com/download_linux and you should be fine.
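For readers wondering about the `IndexError` in the traceback above: the last frame shows the loop condition `while author[-1] in (' ', '.')`, which indexes into an empty string if the author name is empty or consists only of spaces and dots. The sketch below reproduces that failure mode and shows a guarded variant. The loop body, the function names, and the "Unknown" fallback are assumptions for illustration, not calibre's actual code.

```python
def strip_trailing_unsafe(author: str) -> str:
    """Mimic the loop condition from the traceback (loop body is assumed)."""
    while author[-1] in (' ', '.'):
        author = author[:-1]  # eventually leaves '' for input such as '...'
    return author


def strip_trailing_safe(author: str, fallback: str = "Unknown") -> str:
    """Strip trailing spaces and dots; use a fallback if nothing is left."""
    cleaned = author.rstrip(" .")
    return cleaned or fallback
```

With an author string such as `"..."`, the first function raises `IndexError: string index out of range`, while the second returns the fallback value.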

xiatian (521xiatian) wrote :

Hi,
I have updated calibre to the latest released binary and added the option "--ignore *.pdf", but I still got a PDF error and the process was killed again.

"""
Traceback (most recent call last):
  File "site-packages/calibre/customize/ui.py", line 417, in get_file_type_metadata
  File "site-packages/calibre/customize/builtins.py", line 342, in get_metadata
  File "site-packages/calibre/ebooks/metadata/pdf.py", line 108, in get_metadata
RuntimeError: Failed to run pdfinfo
Killed

"""

Kovid Goyal (kovid) wrote :

You can't use --ignore *.pdf because your shell will expand the *. Instead, use --ignore \*.pdf

And split up your add into batches as I said above.
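The quoting point can be checked without touching the file system. The sketch below uses Python's standard `shlex` module, which follows POSIX shell quoting rules but performs no filename expansion, to show that the escaped form reaches the program as the literal pattern. The command line is the one from this thread; nothing is executed.

```python
import shlex

# Tokenize the escaped command line the way a POSIX shell would quote-process it.
escaped = shlex.split(r"calibredb add /root/test -r --ignore \*.pdf")
print(escaped[-1])  # the backslash is consumed; the literal pattern survives

# Going the other way, shlex.quote produces a shell-safe spelling of the pattern.
quoted = shlex.quote("*.pdf")
print(quoted)
```

When the command is launched from a script with an argument list (for example via `subprocess.run([...])` without `shell=True`), no shell is involved and the pattern can be passed as plain `*.pdf`.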

xiatian (521xiatian) wrote :

Sorry to bother you again. I have ignored PDF files and split up the collection. Unfortunately, I got another error after a few minutes of processing:
"ImportError: libXcomposite.so.1: cannot open shared object file: No such file or directory
Killed
"
I'm using the latest released binary, so I don't know why I keep running into so many problems.

Kovid Goyal (kovid) wrote :

You are missing libXcomposite on your machine. Install it and the error
will go away.
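A quick way to check for missing shared libraries before starting a long import is Python's standard `ctypes.util.find_library`. The sketch below is only a convenience check under the assumption that the system's usual library lookup (for example `ldconfig` on Linux) is available; the helper name `locate_libraries` is made up for illustration.

```python
import ctypes.util
from typing import Dict, Iterable, Optional


def locate_libraries(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each library name (without the 'lib' prefix) to what the system finds.

    A value of None means the library could not be located.
    """
    return {name: ctypes.util.find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in locate_libraries(["Xcomposite"]).items():
        print(f"lib{name}: {found or 'NOT FOUND'}")
```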

xiatian (521xiatian) wrote :

I installed libXcomposite and that problem went away. Then I tried to add one chunk, a directory of about 30G that contains sub-directories. It was killed again after running for a few seconds, without any error output, just the word "Killed". How should I track down the error? Is 30G still too big, or can too many sub-directories kill calibre?

Kovid Goyal (kovid) wrote :

How many books you can add at once depends on how much RAM you have
available.

xiatian (521xiatian) wrote :

OK, I will try to get a dedicated server with 8G or 16G of RAM and see what happens. Thank you so much for your kind help! I will update you on any progress in the near future. I appreciate your kindness.

Kovid Goyal (kovid) wrote :

There's no need. Do the import on your home desktop, then simply copy
the calibre library folder to your server.

xiatian (521xiatian) wrote :

I switched to Debian 7. calibre needs glibc 2.14, so I installed glibc 2.14, but then there was another error:
"WARNING: Failed to set default libc locale, using en_US.UTF-8
Traceback (most recent call last):
  File "site.py", line 72, in main
  File "site.py", line 18, in set_default_encoding
  File "locale.py", line 581, in setlocale
Error: unsupported locale setting"

I can't see anything wrong with my locale settings. Could you help me?
Thank you so much.

Kovid Goyal (kovid) wrote :

It's a warning, you can ignore it. Or set your LANG environment variable
correctly to a locale that is actually installed on your system.

Kovid Goyal (kovid) wrote :

Probably when you upgraded glibc you did not regenerate the locale
files.
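Whether a given locale name is actually usable on a machine can be tested with Python's standard `locale` module, which raises `locale.Error` with the same "unsupported locale setting" message seen in the traceback above. The sketch below is a minimal check; the helper name `locale_is_available` is hypothetical, and it only probes `LC_CTYPE` to keep the save-and-restore step simple.

```python
import locale


def locale_is_available(name: str) -> bool:
    """Return True if the C library accepts `name` as an LC_CTYPE locale."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "en_US.UTF-8"):
        print(candidate, locale_is_available(candidate))
```

If the value of `LANG` fails this check, either pick a locale that passes or regenerate the locale files as suggested above.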

xiatian (521xiatian) wrote :

Hi, I have changed my server OS from Debian 7 to Ubuntu 14 and ignored all PDF files. Everything works fine now. Thank you. I still hope there could be an option to show the full output of the calibredb add command so that I can monitor the whole process. If there is an error, I could then locate which book caused it. That would be nice.
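Until such an option exists, a workaround is to drive calibredb from a script that adds one path at a time and logs each one, so the failing item is the last name printed. This is only a sketch under stated assumptions: the function `add_one_by_one` is hypothetical, the flags are the ones used earlier in this thread, and the `run` parameter exists so the loop can be exercised without calibre installed. Starting one process per path is slower than a single bulk add.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def add_one_by_one(paths: Sequence[str], library: str,
                   run: Callable = subprocess.run) -> List[Tuple[str, int]]:
    """Add each path in its own calibredb process and collect the failures.

    Returns a list of (path, return code) pairs for every non-zero exit.
    """
    failures: List[Tuple[str, int]] = []
    for path in paths:
        cmd = ["calibredb", "add", path, "--with-library", library, "-r", "-d"]
        print(f"adding {path}", flush=True)
        code = run(cmd).returncode
        if code != 0:
            # A negative code means the child was killed by a signal (POSIX).
            print(f"  FAILED ({code}): {path}", flush=True)
            failures.append((path, code))
    return failures
```

Because each path gets its own process, one crash or kill no longer stops the remaining items from being processed.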
