Copy to library takes 30 times longer than import

Bug #1593027 reported by William Harr on 2016-06-16
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Calibre 2.58 on Ubuntu 14.04.
I have a large library (>450k books). When I import books, I first import them into an empty library so I can edit metadata quickly. This typically includes swapping the Author and Title fields, correcting Authors that contain commas, setting the language to English, removing dates from Author fields, etc. When finished, I can either import the books from the small library's directory or copy the books to the large library. The import loses some edits because it re-reads the metadata from the files themselves, so the bad data comes back. Correcting it again in the large library then takes much longer, as each individual edit there takes a minute or so. The library copy works better but takes MUCH longer: multiple days to copy 4k files into the large library. I think the library copy should be faster, not much slower, as it should just be copying the metadata from one database to the other.

Copying to library has to first serialize metadata, then import it into
the destination library. Importing directly makes use of the already
serialized metadata in the OPF file, so copying will always be a little
slower than direct import. However, most of the performance difference
comes from the adding books process having an optimized implementation
for finding duplicates. I'll port that over to copy to library someday,
but in the meantime you could just turn off checking for dupes when
copying to library in Preferences->Adding Books.
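
To illustrate why duplicate checking can dominate the run time at this scale, here is a minimal sketch. It is not calibre's actual code: the function names, the (title, authors) tuple layout, and the matching rule are all assumptions made for the example. It only contrasts a per-book scan of the whole library with a lookup table built once.

```python
def normalize(title, authors):
    """Hypothetical comparison key: case-folded title plus sorted, case-folded authors."""
    return (title.strip().casefold(),
            tuple(sorted(a.strip().casefold() for a in authors)))


def find_duplicates_naive(incoming, library):
    """Scan the whole library for every incoming book.

    Cost grows with len(incoming) * len(library).
    """
    dupes = []
    for book in incoming:
        key = normalize(*book)
        for existing in library:
            if normalize(*existing) == key:
                dupes.append(book)
                break
    return dupes


def find_duplicates_indexed(incoming, library):
    """Build a set of keys once, then do one hash lookup per incoming book.

    Cost grows with len(incoming) + len(library).
    """
    index = {normalize(*book) for book in library}
    return [book for book in incoming if normalize(*book) in index]
```

With 4k incoming books and a 450k-book library, the first approach makes up to 4,000 x 450,000 = 1.8 billion comparisons, while the second builds 450k keys once and then performs 4k lookups. Both return the same list.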

And note that importing directly from the source library folders will
not lose any metadata provided that you run Library maintenance->Library
metadata backup status and wait for the backups to be completed (this
causes the aforementioned OPF files to be written out).
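
As a rough way to confirm that the backups have finished before importing from the small library's folders, a sketch like the following could list book folders that do not yet contain an OPF file. It assumes the usual calibre layout of one folder per author, one subfolder per book, and a backup file named metadata.opf in each book folder. The function name is made up for this example and is not part of calibre.

```python
from pathlib import Path


def folders_missing_opf(library_root):
    """Return book folders under library_root that have no metadata.opf.

    Assumes the layout <library>/<author>/<book>/metadata.opf.
    """
    missing = []
    for book_dir in sorted(Path(library_root).glob("*/*")):
        # Skip hidden top-level folders; they do not hold books.
        if book_dir.parent.name.startswith("."):
            continue
        if book_dir.is_dir() and not (book_dir / "metadata.opf").is_file():
            missing.append(book_dir)
    return missing
```

An empty result suggests every book folder has its OPF file written out. Any folders it returns are ones where a direct import would still fall back to the metadata inside the book files.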

Thanks for the tips and the rapid response. I'll test both approaches.

(In reply to Kovid Goyal's comment of Wed, Jun 15, 2016 at 9:58 PM.)

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released

Wow, no dust on you!

(In reply to Kovid Goyal's comment of Thu, Jun 16, 2016 at 5:20 AM.)
