Copy to library takes 30 times longer than import

Bug #1593027 reported by William Harr
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Calibre 2.58 on Ubuntu 14.04.
I have a large library (>450k books). When I import books, I import them into an empty library so I can edit metadata quickly. This typically includes switching Author and Title fields, correcting Authors with commas, setting language to English, eliminating dates from Author fields, etc. When finished, I can either import the books from the small library directory, or copy the books to the large library. The import loses some edits because it re-reads data from the files themselves. The library copy works better, but takes MUCH longer: multiple days to copy 4k files into the large library. I think the library copy should be faster, not much slower, as it should just be copying the metadata from one database to the other. The import re-reads the files and gets the bad data again, and correcting it afterwards takes much longer, since updating the large library takes a minute or so for each individual edit.

Kovid Goyal (kovid) wrote : Re: calibre bug 1593027

Copying to library has to first serialize metadata, then import it into
the destination library. Importing directly makes use of the already
serialized metadata in the OPF file -- so copying will always be a little
slower than direct import. However, most of the performance difference comes from
the adding books process having an optimized implementation for finding
duplicates -- I'll port that over to copy to library someday, but in the
meantime you could just turn off checking for dupes when copying to
library in Preferences->Adding Books.
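A rough illustration of why duplicate detection can dominate at this scale: checking each incoming book against every book already in a 450k-book library costs on the order of (incoming × library) comparisons, while building a lookup index of the library once makes each check cheap. This is a hypothetical sketch, not calibre's code; the `Book` type, the `normalize` rules and the title-plus-authors duplicate key are all assumptions chosen for illustration.

```python
# Hypothetical sketch, not calibre's implementation: contrasts a per-book scan
# with a precomputed index when checking incoming books for duplicates.
import unicodedata
from typing import NamedTuple


class Book(NamedTuple):
    title: str
    authors: tuple[str, ...]


def normalize(text: str) -> str:
    # Assumed normalization: strip accents, case-fold, turn punctuation into
    # spaces and collapse whitespace, so "The Hobbit" matches "the hobbit!".
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() else " " for c in no_accents.casefold())
    return " ".join(kept.split())


def key(book: Book) -> tuple[str, frozenset[str]]:
    # Assumed duplicate key: normalized title plus the set of normalized authors.
    return normalize(book.title), frozenset(normalize(a) for a in book.authors)


def find_dupes_naive(incoming: list[Book], library: list[Book]) -> list[Book]:
    # Re-scans the whole library for every incoming book (and re-normalizes
    # library entries each time): up to len(incoming) * len(library) comparisons.
    return [b for b in incoming if any(key(b) == key(x) for x in library)]


def build_index(library: list[Book]) -> set[tuple[str, frozenset[str]]]:
    # One pass over the destination library, done once up front.
    return {key(book) for book in library}


def find_dupes_indexed(incoming: list[Book], index: set) -> list[Book]:
    # Each incoming book costs a single average-O(1) set lookup.
    return [b for b in incoming if key(b) in index]
```

With 4k incoming books and a 450k-book library, the naive version can perform up to 1.8 billion key comparisons, while the indexed version does one 450k-entry pass plus 4k set lookups. Turning duplicate checking off, as suggested above, skips this work entirely.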

And note that importing directly from the source library folders will
not lose any metadata provided that you run Library maintenance->Library
metadata backup status and wait for the backups to be completed (this
causes the aforementioned OPF files to be written out).
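Before importing from the source library folders, one way to spot books whose OPF file has not been written yet is to walk the folder tree. This is a hypothetical helper, not part of calibre; it assumes the usual calibre layout of `<library>/<Author>/<Title (id)>/` with a `metadata.opf` in each book folder, and it only checks that the file exists, so the Library maintenance backup check mentioned above remains the authoritative signal that pending edits have been flushed.

```python
# Hypothetical helper, not part of calibre: lists book folders that have no
# metadata.opf yet. Assumes the layout <library>/<Author>/<Title (id)>/.
import re
import sys
from pathlib import Path

# Assumption: calibre book folder names end with " (<numeric book id>)".
BOOK_DIR = re.compile(r" \(\d+\)$")


def book_folders(library_root: Path):
    # Author folders sit directly under the library root; skip plain files
    # (e.g. metadata.db) and hidden entries such as trash or notes folders.
    for author_dir in sorted(library_root.iterdir()):
        if not author_dir.is_dir() or author_dir.name.startswith("."):
            continue
        for book_dir in sorted(author_dir.iterdir()):
            if book_dir.is_dir() and BOOK_DIR.search(book_dir.name):
                yield book_dir


def missing_opf(library_root: Path) -> list[Path]:
    # A folder counts as backed up here only if metadata.opf exists; this
    # does not check whether the file reflects the latest edits.
    return [d for d in book_folders(library_root)
            if not (d / "metadata.opf").is_file()]


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_opf(Path(sys.argv[1]))
    for folder in missing:
        print(f"no metadata.opf: {folder}")
    print(f"{len(missing)} book folder(s) without metadata.opf")
```

Run it against the small staging library (for example `python check_opf.py /path/to/small/library`, with a hypothetical script name) and import only once it reports zero missing folders.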

William Harr (william-h-harr) wrote : Re: [Bug 1593027] Re: calibre bug 1593027

Thanks for the tips and the rapid response. I'll test both approaches.

On Wed, Jun 15, 2016 at 9:58 PM, Kovid Goyal <email address hidden>
wrote:

> [...]

Kovid Goyal (kovid) wrote : Fixed in master

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released

William Harr (william-h-harr) wrote : Re: [Bug 1593027] Fixed in master

Wow, no dust on you!

On Thu, Jun 16, 2016 at 5:20 AM, Kovid Goyal <email address hidden>
wrote:

> [...]
