Duplicates found when adding books -> Check on author missing

Bug #869506 reported by northguy
This bug affects 14 people
Affects: calibre
Status: Fix Released
Importance: Wishlist
Assigned to: Kovid Goyal

Bug Description

When adding books by dragging them from Explorer into Calibre, Calibre checks for duplicates (which is a good thing). The bad thing is that the check is performed on title only. I happened to be adding a book that had the same title as an existing book but was written by a different author. It would be nice if Calibre also checked the author, to see whether a book with a similar title was written by a different author.

Kovid Goyal (kovid) wrote : Re: calibre bug 869506

The problem is the same book often has the author name spelled differently, so
comparing on authors as well is not a good idea in general. I suggest you use
the duplicate finder calibre plugin to manage your duplicates.

 status wontfix
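
To illustrate the point about author spellings, here is a minimal sketch. It is not calibre code; the `author_key` helper and the sample records are made up for illustration. It shows an exact (title, author) comparison missing a duplicate that a crude normalization catches.

```python
import re

def author_key(name):
    """Reduce an author string to an order-independent tuple of lowercase tokens.

    Hypothetical helper for illustration only; not part of calibre.
    """
    if "," in name:
        # Treat "Last, First" as "First Last".
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    # \w+ splits on spaces and punctuation, so "J.R.R." becomes j, r, r.
    return tuple(sorted(re.findall(r"\w+", name.lower())))

existing = ("The Hobbit", "J. R. R. Tolkien")
incoming = ("The Hobbit", "Tolkien, J.R.R.")

# Exact comparison treats the two records as different books.
exact_match = existing == incoming

# Comparing normalized keys treats them as the same book.
normalized_match = (
    existing[0].lower() == incoming[0].lower()
    and author_key(existing[1]) == author_key(incoming[1])
)
```

Even this normalization is defeated by a genuine misspelling such as "Tolkein", which is presumably why matching on author is hard to get right in general.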

Changed in calibre:
status: New → Won't Fix
northguy (northguy) wrote :

(attachment only, no comment text)

northguy (northguy) wrote :

I understand you don't want to change the automatic handling of titles and authors. But currently, the Show details screen that appears after adding a duplicate title is a modal dialog that shows only the title. That is too little information to decide whether to add the book to your database.

It would be great if the details screen showed a bit more detail on which to base my decision to add or not to add. A suggestion is included in the attachment in reply #2.

Kovid Goyal (kovid) wrote :

Ah, you want to change the matching info displayed. I agree that would be a good thing, so I'll re-open this ticket. It is, however, rather a lot of work (because of the details of the adding process) and I am currently traveling, so it will be a while before I can look at it.

Changed in calibre:
status: Won't Fix → New
importance: Undecided → Wishlist
assignee: nobody → Kovid Goyal (kovid)
Keith Davies (keith-davies) wrote :

I'd like to suggest author and publisher be shown[1], and perhaps file size (or file size of matching format, if there is more than one format), though the latter might be a bit tricky if there are matching files on input (such as loading EPUB and PDF of a document, when there is EPUB and PDF in the system already).

[1] I'm loading a large number of similarly-titled documents, where the primary difference is in the publisher rather than title. I don't necessarily have author name.

Monk (monk-gmx) wrote :

I would like to add to this wish. I would love to see a dialogue that gives me the following choices on found duplicates:

- Open both ebooks to have a look at the content and formatting
- Merge option, i.e. when I have this book in mobi but am trying to add an epub, merge them into one entry
- Replace, i.e. replace the existing entry with the new one (overwriting the existing format with the importing one)
- Skip, i.e. do not import the book
- Import, i.e. willingly create duplicate

Thanks for considering and kind regards.
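
The merge, replace, skip and import choices listed above could be modelled roughly as follows. This is only a sketch against a made-up in-memory library (a dict of id to record); `DuplicateAction`, `resolve_duplicate` and the record layout are assumptions for illustration, not calibre's API. The "open both ebooks" choice is a viewer action and is left out.

```python
import copy
from enum import Enum

class DuplicateAction(Enum):
    MERGE = "merge"      # add the new formats to the existing entry
    REPLACE = "replace"  # replace the existing entry with the new one
    SKIP = "skip"        # do not import the book
    IMPORT = "import"    # knowingly create a duplicate entry

def resolve_duplicate(library, existing_id, incoming, action):
    """Apply one choice; return the id now holding the book, or None if skipped.

    `library` maps id -> {"title", "authors", "formats": {fmt: path}}.
    This layout is hypothetical.
    """
    if action is DuplicateAction.SKIP:
        return None
    if action is DuplicateAction.MERGE:
        formats = library[existing_id]["formats"]
        for fmt, path in incoming["formats"].items():
            # Keep a format that is already present; only add missing ones.
            formats.setdefault(fmt, path)
        return existing_id
    if action is DuplicateAction.REPLACE:
        library[existing_id] = copy.deepcopy(incoming)
        return existing_id
    if action is DuplicateAction.IMPORT:
        new_id = max(library, default=0) + 1
        library[new_id] = copy.deepcopy(incoming)
        return new_id
    raise ValueError(f"unknown action: {action!r}")

library = {
    1: {"title": "Dune", "authors": ["Frank Herbert"],
        "formats": {"mobi": "dune.mobi"}},
}
incoming = {"title": "Dune", "authors": ["Frank Herbert"],
            "formats": {"epub": "dune.epub"}}

# Merging adds the epub next to the existing mobi in entry 1.
merged_id = resolve_duplicate(library, 1, incoming, DuplicateAction.MERGE)
```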

Monk (monk-gmx) wrote :

Additionally I think this would be most useful when also added to the "Copy to Library" function.

Kind regards,

Monk

Monk (monk-gmx)
Changed in calibre:
status: New → Confirmed
status: Confirmed → New
dotancohen (dotancohen) wrote :

I ran into this issue today. In my case I was adding an additional format of a book: I had a particular book in CHM and I was adding a PDF. The problem is that I was actually adding several books at once, and the dialogue only informed me that at least one duplicate title exists, but not which files are affected or whether the file is a different file despite the identical title.

I am very much in favor of the dialogue suggested by Patrick, but I would like to see additional information including at a minimum file size and ideally format as well.

Thank you Kovid for your dedication to Calibre users!

James Hale (j1aas) wrote :

I would settle for a simple "skip duplicates" button. That is, skip the identified duplicates but add all the rest of the titles.
I realize this doesn't have the sophistication of the other requests, but it does address my main bugbear with adding the collections I get.

Alex Ott (alexott) wrote :

Another useful option for comparison is language: I added books with English titles to my collection, in both English and German, and the last one added overwrote the previous one. For example, 1984 by Orwell, Airport, Hotel by Hailey, etc.

Paul Fiera (paulfiera)
description: updated
drMerry (invullen) wrote :

This bug is also related to
Bug #788183

Pat Ferate (pferate) wrote :

I've made some changes to the code to implement something similar to the image that Patrick posted in #2. My new code is only within Adder.process_duplicates() in calibre/gui2/add.py, so it is only seen if calibre already sees the book as a duplicate. It checks if the title is already in your library, and if so, displays all entries with title and author.
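
The title lookup described here might look something like the sketch below. It is not the actual patch to calibre/gui2/add.py; `find_title_matches` and the dict-based library are hypothetical stand-ins for illustration.

```python
def find_title_matches(library, title):
    """Return (id, title, authors) for every entry whose title matches.

    `library` maps id -> {"title": str, "authors": [str, ...]}; this layout
    is an assumption. Titles are compared case-insensitively.
    """
    wanted = title.strip().lower()
    return [
        (book_id, entry["title"], entry["authors"])
        for book_id, entry in sorted(library.items())
        if entry["title"].strip().lower() == wanted
    ]

library = {
    1: {"title": "Airport", "authors": ["Arthur Hailey"]},
    2: {"title": "Hotel", "authors": ["Arthur Hailey"]},
    3: {"title": "airport", "authors": ["Another Author"]},
}

# Both entries titled "Airport" are reported, each with its own author.
matches = find_title_matches(library, "Airport")
```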

Since I didn't need to touch any other part of the code yet, I haven't looked at how calibre handles new formats of books with the same title yet, but I will soon.

One additional change that I'm contemplating is splitting each duplicate into its own dialog box, so that you can add some or skip others. Although the flexibility would be nice, it can be overwhelming if there are a lot of duplicates. Maybe I can try implementing some sort of check box field in the question dialog, so that you can choose which to add and at the same time, see them all.

Kovid, how would you like contributed code submitted? Attach as a patch on here? Request a merge? I'm still learning bzr and its differences with git and svn.

Kovid Goyal (kovid) wrote :

I'm not fussy. Different ways to submit code in order of decreasing preference:

1) Launchpad merge request
2) Patch generated by bzr send -o
3) Patch generated by diff
4) The changed file(s)

As for separate dialogs: don't do that, it's not scalable for many duplicates, as you pointed out. Instead, create a single dialog with a list where clicking on each item in the list shows you info about the dups for that entry in calibre.

Pat Ferate (pferate) wrote :

I've updated the code again to implement the list of checkboxes. I'm attaching a screenshot of how I currently have it set up; I'd like some feedback (text, format, etc.) before I submit another merge request for this update. I'll have the code in my branch if anybody wants to take a look at it beforehand.

I'm thinking that I might also add the formats along with the title and author on each line.
    <title> by <author> (epub, pdf, mobi)
    <title> by <author> (epub, mobi)
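
A line in that shape could be built with something like the following sketch. `format_entry` is a hypothetical helper; joining multiple authors with " & " and sorting the formats alphabetically are my own assumptions, not something specified in this thread.

```python
def format_entry(title, authors, formats):
    """Build '<title> by <author> (fmt, fmt)' for one library entry."""
    line = f"{title} by {' & '.join(authors)}"
    if formats:
        # Lowercase and sort so the same set of formats always reads the same.
        names = ", ".join(sorted(fmt.lower() for fmt in formats))
        line += f" ({names})"
    return line

example = format_entry("Dune", ["Frank Herbert"], ["MOBI", "epub"])
# example == "Dune by Frank Herbert (epub, mobi)"
```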

I'm not sure about adding the publisher, as Keith posted in #5, because I think it may look too busy; although it won't be too difficult as long as it's stored in the metadata. What does everyone think?

I was looking at Monk's suggestions (#6 & #7), and it seems like that functionality could work easily when going through each entry individually, using different buttons. With keeping this in one dialog box, I would imagine using a pull-down select widget for each entry; but it may become too busy when you scale it up. What does everybody think?

Thanks!

Tari Wreford (tari-wreford) wrote :

  I for one like what I see for the duplicates, especially all the extra information of author and format. I'm adding a lot of books right now, and trying to figure out which one in a big batch is a problem duplicate can be a pain. I came looking to add an enhancement request and found this already being addressed. Thank you!
  I like the idea of choosing to add or not to add. It makes the issue of duplicates a lot easier to deal with.
  I would suggest some feedback to your methodology, to make it more user friendly.
  Just have the one window with the information you are providing, and just allow a user to scroll up and down the list.
  You already have check boxes for each duplicate. Instead of Yes/No, just use OK. This way, the check marks determine what is going to happen. OK, means that the check marks are acted upon.
  The hide details button is ok. Sometimes you don't want to be bothered with details.
  Add either one multi-purpose button, or 2 different buttons, to check all, and un-check all. If you are adding a large number of books at one time, and have a lot of duplicates, you can choose to add everything, or not add everything with a single click.
  The last thing, just to make it 'calibre flexible' is in the preferences, have a default action to have all duplicates checked off to add or unchecked to not add, so that when your duplicate window shows up all you have to do is review, then click ok.
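
The check-mark behaviour suggested above (one OK button, check all / un-check all, and a preference for the default state) can be sketched independently of any GUI toolkit. `DuplicateSelection` and its `checked_by_default` parameter are hypothetical names, not calibre code.

```python
class DuplicateSelection:
    """Tracks which duplicates are ticked; OK acts on the ticked ones."""

    def __init__(self, labels, checked_by_default=False):
        # checked_by_default stands in for the suggested preference.
        self.labels = list(labels)
        self.checked = [checked_by_default] * len(self.labels)

    def toggle(self, index):
        self.checked[index] = not self.checked[index]

    def check_all(self):
        self.checked = [True] * len(self.labels)

    def uncheck_all(self):
        self.checked = [False] * len(self.labels)

    def accepted(self):
        """Labels to add when the user clicks OK."""
        return [label for label, on in zip(self.labels, self.checked) if on]

selection = DuplicateSelection(["Airport", "Hotel", "1984"],
                               checked_by_default=True)
selection.uncheck_all()
selection.toggle(1)  # tick only "Hotel"
to_add = selection.accepted()
```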

Just my 2 cents. I don't know how difficult this would be...

Thanks again everyone for taking the time to work on this wonderful product!

Manfred Band (ma-band) wrote :

Calibre 0.9.20 - 0.9.23

Double cover when converting EPUB to EPUB!

Reason for converting: changing the page structure (for example, adding borders, etc.)

EPUB conversion with a real cover:

(in Sigil: cover page + cover, semantics = Cover)
Result after conversion: correct.

Problem:

EPUB conversion with a pseudo-cover:

(in Sigil: cover page + cover, semantics not set)
Result after conversion: double cover!
(a real cover and a pseudo-cover)

Calibre generates a real cover from the pseudo-cover, one that corresponds to the Sigil Cover semantics, but the pseudo-cover is not deleted. Why?

At this point, the pseudo-cover can be deleted!

Even if the pseudo-cover was not a cover but a picture or graphic, it is still available as a cover and can optionally be edited in Sigil.

If instead the first image in the ebook structure is deleted:

The pseudo-cover will be deleted.
If that picture is not a cover but a normal picture, it will be missing from the ebook.

Kovid Goyal (kovid)
Changed in calibre:
status: New → Fix Released
Aargonian (aargonian) wrote :

Regarding the point that it is a bad idea to ignore a possible duplicate because the author's name was spelled differently (making Calibre think the book is not a duplicate when it is): can we not add an option in the Preferences that would enable/disable checking of the author name when adding possible duplicates? For instance, in my case, I've been merging two folders full of over 1000 books (generated by a custom program of mine). The problem is that both folders are 99% identical, but I made a slight change between the versions of my program that introduces a very slight (but important) alteration to the structure of the books that a simple merge won't fix. I know that the author names are consistent between both folders, and I'd rather have Calibre ignore duplicates that have the same author /and/ title, rather than just the same title. As it is, merging has been a pain, and the duplicate finder only does me so much good, since I'm still filtering for my custom tag (Program Version: X) to check which version of the book I have in calibre, then manually deleting them.

The situation is a little more complicated than that, but the general idea is that it would be much easier if the option to simply enable/disable author checking in the preferences was possible.
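
As a rough sketch of what such a preference could mean in practice (assuming a hypothetical `check_author` flag and a plain list of records, not calibre's real preferences or database):

```python
def is_duplicate(incoming, library, check_author=False):
    """Return True if `incoming` matches an entry already in `library`.

    With check_author=False only the title is compared; with
    check_author=True the set of authors must match as well. Records are
    dicts with "title" and "authors" keys (a made-up layout).
    """
    def key(book):
        title = book["title"].strip().lower()
        if not check_author:
            return (title,)
        authors = frozenset(a.strip().lower() for a in book["authors"])
        return (title, authors)

    wanted = key(incoming)
    return any(key(existing) == wanted for existing in library)

library = [{"title": "Airport", "authors": ["Arthur Hailey"]}]
same_title_other_author = {"title": "Airport", "authors": ["Someone Else"]}

title_only = is_duplicate(same_title_other_author, library)
with_author = is_duplicate(same_title_other_author, library, check_author=True)
```

Here `title_only` flags the book as a duplicate while `with_author` does not, which is the behaviour being asked for when author names are known to be consistent.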

Forgive me if I have misunderstood something or missed something.
