Adding ISBN custom column significantly affects performance

Bug #757445 reported by kiwidude
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
calibre
Fix Released
Undecided
Unassigned

Bug Description

I added an ISBN custom column using the "built-in one click" defaults on Preferences->Adding columns on my test library which has 40,000 books.

It takes the time for switching to that library from 15 seconds to just on a minute (4x slower).

Just wondered if there was any optimisation that could be done to speed that up, I've dropped the column again now.

Related branches

Revision history for this message
Charles Haley (cbhaley) wrote :

Using my 20,000 book test library and measuring startup time:
- No custom columns: 6.2 seconds
- An ISBN custom column: 6.2 seconds.

No difference.

Using the same library and measuring time to change library:
- No custom columns: 12 seconds
- An ISBN column: 12 seconds

Again, no difference. However, the doubling of time to change libraries compared to startup was troubling. After analysis, I found that tags_view.recount() was being called multiple times.

I fixed that, and the change library time is now 5.3 seconds, with or without an ISBN custom column. Fix submitted.

Bottom line: I now see no difference in startup or change library times, whether or not an ISBN column exists.

I have no explanation for why you are seeing a factor of four change. Are you doing anything with the ISBN custom column when you change libraries, such as sorting, searching, or restricting on it?

Revision history for this message
Kovid Goyal (kovid) wrote : Re: calibre bug 757445

Changing the component for this bug.

 assignee cbhaley
 status triaged

Changed in calibre:
assignee: nobody → Charles Haley (cbhaley)
status: New → Triaged
Revision history for this message
kiwidude (kiwidude) wrote :

Thanks for looking into this Charles.

It is sorting that is the cuase of this. I was originally going to mention it, but then since I tried changing my sort to another column I sitll had the problem I didn't Of course Calibre stores a number of secondary sorts and if I click enough columns to ensure that ISBN isn't among them the opening time goes back to "normal" (which btw is improved to roughly 10-12 seconds thanks to your fix).

I was able to replicate this because I had deleted by custom column. The first time I try to apply sorting on it, takes in excess of 45 seconds to do the sort. So when switching libraries I guess that as it remembers the last 3 or whatever sort columns that is getting added onto the time.

Revision history for this message
Charles Haley (cbhaley) wrote :

Code has been merged to trunk.

At this point, in my tests the time to open/change a large library is not affected by the existence of an ISBN column.

I will wait until I hear from you before closing this ticket. If you are still seeing huge differences, then will I need more information about what else might be happening during a change library operation. For example, it could be affected by library_changed() methods in plugins you use.

Changed in calibre:
assignee: Charles Haley (cbhaley) → nobody
status: Triaged → Fix Committed
Kovid Goyal (kovid)
Changed in calibre:
status: Fix Committed → Fix Released
Revision history for this message
Charles Haley (cbhaley) wrote :

Evaluating composite columns is delayed until the value is needed. This greatly improves responsiveness when opening libraries. However, sorting requires that the values for all records be evaluated immediately. When one column is evaluated for a record is evaluated, all composites in that record are evaluated at the same time to avoid multiple calls to get_metadata.

In your case, evaluating the composite columns for a record seems to take around a millisecond. That time includes the call to get_metadata, plus the call to the formatter for each composite column in the record. For my test library, the time is around 0.5 milliseconds, but this difference could be because a) I might have a faster processor, b) I might have more memory, or c) you might have multiple composite columns.

Note that subsequent sorts will used the previously computed value, so the large penalty is paid only once.

I did make one more optimization. Composite columns cannot use mi.user_categories, so there is no point in computing them in this instance. I changed the code not to compute the user_categories field in this case. I don't know if this will help you.

Changes submitted.

Revision history for this message
kiwidude (kiwidude) wrote :

Ok, thx Charles, I will tyr them out when merged to see if it makes a difference. This is on my main PC which is a Core i7 3.2GHz with 6GB ram wunning Win 7 64-bit. So it's not super recent but not exactly a slouch :-)

As for other columns, this database has only two simple yes/no custom columns in it, no composite columns outside of isbn.

It is taking bang on 45 seconds to sort 39,590 odd books of which 11,200 or so have ISBN values.

I initially noticed this because of the addition to the QUality Check plugin where I added the ability to display duplicate isbn records. This gave me 150 matches. Having it take 45 seconds to sort 150 matches had me worried initially - I understand now that actually it was loading and sorting 40,000 rows underneath.

Revision history for this message
Charles Haley (cbhaley) wrote :

Changes are merged. Give it a try.

Revision history for this message
kiwidude (kiwidude) wrote :

About 10-11 seconds to sort that column now - much better, thanks Charles.

Revision history for this message
Kovid Goyal (kovid) wrote :

@kiwidude: It might help charles if you could post th emetadata.db somewhere
nad PM him a link. Ofcourse then he will know what books you like to read :)

Revision history for this message
Charles Haley (cbhaley) wrote :

Re the database: it would be nice to have to test with. Real data is always better than constructed data.

However, I think we are at the end with this bug/problem, so there isn't any hurry, or really any strong need.

Revision history for this message
kiwidude (kiwidude) wrote :

Haha, that is true. Except he wouldn't know the books I *really* like to read because they are in my "proper" library having been top of the list to migrate. And this library is a mess now anyways after four months of plugin experiments on it.

If Charles feels it would help further I can try to find somewhere to upload it (around 45MB). However I know from previously he had a big test library of his own so if the numbers I see now are roughly similar there should be no need. As it is he has chopped the switch library time (with a sorted ISBN) from 1 minute down to 21 seconds. Without that column being sorted, the reduction is from 15 seconds down to 10 seconds. So it's a big improvement over what it was.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.