sqlite library needlessly modified

Bug #1422058 reported by anarcat
This bug affects 1 person
Affects    Status      Importance   Assigned to   Milestone
calibre    Won't Fix   Undecided    Unassigned

Bug Description

I am trying to track my calibre library on multiple computers. In particular, I am using git-annex to track those files from multiple locations, which generally works great:

http://git-annex.branchable.com/tips/Git_annex_and_Calibre/

The problem I am seeing, however, is that the metadata.db file changes even though I made no change to the library. To reproduce, I just add the metadata.db file to git-annex, start calibre, stop it, and the file has been modified!

Here's a diff of the database dump:

--- /home/anarcat/calibre.orig 2015-02-14 23:45:35.929316232 -0500
+++ /home/anarcat/calibre 2015-02-14 23:46:22.189315034 -0500
@@ -3980,8 +3980,8 @@
   ]
 ]');
 INSERT INTO "preferences" VALUES(1498,'news_to_be_synced','[]');
-INSERT INTO "preferences" VALUES(1513,'tag_browser_hidden_categories','[]');
-INSERT INTO "preferences" VALUES(1515,'library_view books view state','{
+INSERT INTO "preferences" VALUES(1517,'tag_browser_hidden_categories','[]');
+INSERT INTO "preferences" VALUES(1519,'library_view books view state','{
   "column_alignment": {
     "timestamp": "center",
     "pubdate": "center",
@@ -4043,7 +4043,7 @@
   ],
   "last_modified_injected": true
 }');
-INSERT INTO "preferences" VALUES(1516,'field_metadata','{
+INSERT INTO "preferences" VALUES(1520,'field_metadata','{
   "rating": {
     "is_category": true,
     "is_csp": false,

I was expecting to see some timestamp inserted, but those look like primary key changes! Those changes look completely unnecessary and add needless noise to the synchronisation process. For example, if you hosted your calibre library on (say) Dropbox, you would waste bandwidth. On laptops, this also leads to higher power usage as disks spin up, etc.

This is calibre 2.5.0+dfsg-1 on Debian Jessie.

anarcat (anarcat) wrote :

Note that this also happens with the 2.19.0+dfsg-1 release from sid, running on Jessie.

Kovid Goyal (kovid) wrote : Re: calibre bug 1422058

That's not a primary key; it is the result of an INSERT OR REPLACE with
a unique key and a value that has not changed.
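
A minimal sketch of the behaviour described above, using Python's built-in `sqlite3` module. The `preferences` schema here (an `id INTEGER PRIMARY KEY` plus a `UNIQUE` `key` column) is an assumption for illustration, not copied from calibre. On a `UNIQUE` conflict, `INSERT OR REPLACE` deletes the existing row and inserts a new one, so the row's `id` moves even though the stored value is identical, which matches the renumbering seen in the dump diff.

```python
import sqlite3

# Assumed schema for illustration: a surrogate integer id plus a unique key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE preferences ("
    "id INTEGER PRIMARY KEY, key TEXT NOT NULL UNIQUE, val TEXT NOT NULL)"
)


def set_pref(key, val):
    # On a UNIQUE(key) conflict, REPLACE deletes the old row and inserts a
    # new one, so the row receives a fresh id even when val is unchanged.
    conn.execute(
        "INSERT OR REPLACE INTO preferences (key, val) VALUES (?, ?)", (key, val)
    )


def pref_row(key):
    return conn.execute(
        "SELECT id, val FROM preferences WHERE key = ?", (key,)
    ).fetchone()


set_pref("tag_browser_hidden_categories", "[]")
set_pref("news_to_be_synced", "[]")
before = pref_row("tag_browser_hidden_categories")
set_pref("tag_browser_hidden_categories", "[]")  # write the same value again
after = pref_row("tag_browser_hidden_categories")
print(before, after)  # (1, '[]') (3, '[]')
```

The value never changes; only the row's position and `id` do, which is exactly what a text dump of the table exposes.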

It is pointless trying to prevent the GUI from making any changes to the
db, as there are simply too many things that can cause changes.
Something as simple as resizing a column in the GUI can cause changes in
the db. Or connecting a device. Or using the Tag Browser. Or running a
search.

Sure, one could (with a fair bit of effort) make starting and then
stopping the GUI cause no database changes, but that is a premature
optimization.

And any sync program worth its salt should be performing binary diffs,
not transmitting entire changed files. Oh, and before you complain about
the CPU cost of a binary diff, that is negligible compared to the CPU
cost of starting calibre.
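
To illustrate why a block-level transfer keeps this cheap, here is a simplified binary diff: split both versions of a file into fixed-size blocks, hash each block, and report only the blocks that changed. The 4096-byte block size is an assumption chosen to match SQLite's default page size; SQLite stores a database as fixed-size pages, so a small update usually rewrites only a few of them. Real tools such as rsync also use a rolling checksum so that data shifted by an insertion can still be matched, which this sketch does not attempt.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: matches SQLite's default page size


def block_hashes(data: bytes, size: int = BLOCK_SIZE) -> list:
    """SHA-256 digest of each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + size]).digest()
            for i in range(0, len(data), size)]


def changed_blocks(old: bytes, new: bytes, size: int = BLOCK_SIZE) -> list:
    """Indices of blocks in `new` that differ from, or are missing in, `old`.

    Only these blocks (plus the new total length, to handle truncation)
    would need to be sent to bring `old` up to date.
    """
    old_hashes = block_hashes(old, size)
    return [i for i, h in enumerate(block_hashes(new, size))
            if i >= len(old_hashes) or old_hashes[i] != h]


old = bytes(10 * BLOCK_SIZE)            # ten zero-filled blocks
new = bytearray(old)
new[5 * BLOCK_SIZE + 17] = 0x42         # change one byte inside block 5
print(changed_blocks(old, bytes(new)))  # [5]
```

Changing one byte of a 40 KiB file marks a single 4 KiB block as dirty, so only that block has to cross the network.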

Incidentally, I don't know what you are using to diff the db, but the
biggest change from simply starting and stopping calibre will come from
writing a large JSON value called field_metadata into the db preferences
table. This is done to facilitate interoperation with third party tools
like calibre2opds.

 status wontfix

Changed in calibre:
status: New → Won't Fix
Kovid Goyal (kovid) wrote :
anarcat (anarcat) wrote :

Thanks for the fix.

To diff the database, I used the following commands:

sqlite3 -cmd .dump metadata.db > ~/calibre.orig
calibre
sqlite3 -cmd .dump metadata.db > ~/calibre
diff -u ~/calibre.orig ~/calibre
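
The same comparison can be scripted with Python's standard library: `Connection.iterdump()` yields the SQL dump statements and `difflib.unified_diff` produces `diff -u`-style output. Opening the file through a `mode=ro` URI is a precaution so that taking a snapshot cannot itself write to `metadata.db`. The function names are illustrative, and the dump text may differ slightly in form from the `sqlite3` shell's `.dump`.

```python
import difflib
import sqlite3
from pathlib import Path


def dump_lines(db_path):
    """Return the SQL dump of a SQLite file as lines, similar to `sqlite3 .dump`."""
    # Open read-only through a URI so that taking a snapshot cannot modify the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        text = "\n".join(conn.iterdump()) + "\n"
    finally:
        conn.close()
    return text.splitlines(keepends=True)


def diff_dumps(before, after, old_name="calibre.orig", new_name="calibre"):
    """Return a unified diff of two dumps, like `diff -u`."""
    return "".join(difflib.unified_diff(before, after, old_name, new_name))
```

For the reproduction above, one would take `before = dump_lines("metadata.db")`, start and quit calibre, then print `diff_dumps(before, dump_lines("metadata.db"))`.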

It would be great to have the state from things like resizing a column, running searches or browsing tags stored in another database, to ease such syncs and avoid conflicts. It is true that the field_metadata structure was modified, but I'll note that the 'tag_browser_hidden_categories' and 'library_view books view state' preferences were also needlessly reset.

I wasn't worried about the cost of diffs. Indeed, git-annex and git should cope fine with such changes, although git may take a while to do a binary diff over the changes (it repacks only after a few loose objects are found in the repository). Git-annex will only keep the last few copies in its history, and my backup system does deduplication using rolling checksums, so no problem there.
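
The backup tool's algorithm isn't named here, so as one example of the idea, this is a sketch of an rsync-style weak rolling checksum: a window's checksum can be updated in constant time as it slides by one byte, which lets a known block be found at any offset even after the data before it has shifted. Because the weak checksum can collide, a candidate match is confirmed by comparing bytes. `find_block` is a hypothetical helper for illustration.

```python
MOD = 1 << 16  # rsync-style weak checksums work modulo 2**16


def weak_checksum(block: bytes) -> tuple:
    """Return (a, b): a = sum of bytes, b = position-weighted sum, both mod 2**16."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, size: int) -> tuple:
    """Slide the window one byte (drop out_byte, append in_byte) in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, block: bytes) -> int:
    """Hypothetical helper: first offset of `block` in `data`, or -1."""
    size = len(block)
    if size == 0 or size > len(data):
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:size])
    for k in range(len(data) - size + 1):
        if k > 0:
            a, b = roll(a, b, data[k - 1], data[k + size - 1], size)
        # The weak checksum can collide, so confirm a candidate byte for byte.
        if (a, b) == target and data[k:k + size] == block:
            return k
    return -1


# The block is still found after two bytes are inserted in front of it.
print(find_block(b"XXhello world", b"world"))  # 8
```

Deduplicating backup tools often use a rolling hash differently (to choose chunk boundaries from content rather than to search for a known block), but the constant-time slide is the shared ingredient.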

No, my concern was violating "POLA" (the principle of least astonishment): I would have expected Calibre not to write anything if I didn't add, modify or delete books in the database. Now I understand that there are more things in that database than I expected, but I have a hard time reconciling this with the settings sitting in ~/.config/calibre. This can cause catastrophic failures if calibre is started and closed on two different machines sharing an asynchronously synchronised database (say, via Dropbox or git-annex), as those tools won't be able to merge the changes in the binary database.
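
One way to get the behaviour asked for here is to compare before writing: read the stored value, skip the write when nothing changed, and use a plain `UPDATE` otherwise so the row keeps its `id`. This is a hypothetical sketch, not calibre's code; the `preferences` schema is assumed, and values would need a deterministic serialization (for JSON, something like `json.dumps(obj, sort_keys=True)`) for the comparison to be meaningful.

```python
import sqlite3


def set_pref_if_changed(conn, key, val):
    """Hypothetical helper: write a preference only if its value differs.

    Returns True when a write was issued, False when the stored value
    already matched and the row was left untouched.
    """
    row = conn.execute(
        "SELECT val FROM preferences WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and row[0] == val:
        return False  # unchanged: no write, so the row keeps its id
    if row is None:
        conn.execute(
            "INSERT INTO preferences (key, val) VALUES (?, ?)", (key, val)
        )
    else:
        # UPDATE modifies the row in place, so its id stays the same.
        conn.execute(
            "UPDATE preferences SET val = ? WHERE key = ?", (val, key)
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE preferences ("
    "id INTEGER PRIMARY KEY, key TEXT NOT NULL UNIQUE, val TEXT NOT NULL)"
)
print(set_pref_if_changed(conn, "news_to_be_synced", "[]"))  # True
print(set_pref_if_changed(conn, "news_to_be_synced", "[]"))  # False
```

With this pattern, starting and stopping the GUI would leave a preference row alone unless its value actually changed, though any other writes calibre makes would still modify the file.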

As for the reference to Knuth's famous quote: having used Calibre for a few years, and considering Calibre itself has been in existence for 8 years, this hardly seems "premature". The full quote is "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." Thank you for considering this part of the 3%. :)
