Embed Metadata plugin tool creating errors

Bug #1677383 reported by Jackie Stockdale on 2017-03-29

Bug Description

I recently started to use the 'Embed Metadata' plugin tool. It's not something I've felt any great need for in the past but my experiments with EPUB2 to EPUB3 format-shifting made me realise that it's much easier to do if calibre has made the first pass at sanitising the original OPF file.

However, I discovered that a small number of my EPUB2s had zero Check Book errors before Embed Metadata but a level 3 error immediately afterwards: 'The OPF has no unique identifier'.

I tracked down the problem to a combination of 2 things:

1. The original content of the OPF main <dc:identifier>, i.e. the one whose 'id' attrib matches the <package> 'unique-identifier' attrib.

2. My calibre library's 'identifiers' field was empty. This field is not one I've previously paid much attention to.

Both conditions had to be met to trigger the error. For the problematic books, if I added an ISBN value to the library before running Embed Metadata, then no new errors were created.
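
For anyone wanting to inspect their own files, the "main" identifier described in point 1 can be located roughly as follows. This is a minimal sketch using Python's standard xml.etree.ElementTree, not calibre's own code. The function name find_unique_identifier and the SAMPLE_OPF document are made up for illustration; only the identifier line is taken from this report.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical minimal OPF used only for illustration.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:title>Example</dc:title>
    <dc:identifier id="uid" opf:scheme="ISBN">9780091799878</dc:identifier>
  </metadata>
</package>"""


def find_unique_identifier(opf_xml):
    """Return the <dc:identifier> whose 'id' matches the <package>
    'unique-identifier' attribute, or None if there is no such element."""
    root = ET.fromstring(opf_xml)
    uid = root.get("unique-identifier")
    if uid is None:
        return None
    for elem in root.iter("{%s}identifier" % DC_NS):
        if elem.get("id") == uid:
            return elem
    return None


if __name__ == "__main__":
    elem = find_unique_identifier(SAMPLE_OPF)
    print(None if elem is None else elem.text)
```

If the function returns None after Embed Metadata has run, the file is in the state that Check Book reports as having no unique identifier.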

I've sorted out my own library but I've attached a couple of problem EPUBs (scrambled) in case you want to investigate further.

Those books are invalid: the <dc:identifier> element has no text. In
other words the actual value of the identifier is missing. As such
calibre is perfectly within its rights to delete the element.

I should probably make Check Book check for not only the existence of
the identifier element, but also whether it is empty or not.
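
The kind of check described above could look something like the following. It is only a sketch under stated assumptions, not the actual Check Book implementation: the function name classify_unique_identifier, its three return values, and OPF_TEMPLATE are all hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical minimal OPF; %s is replaced by the identifier line under test.
OPF_TEMPLATE = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" '
    'unique-identifier="uid">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Example</dc:title>%s'
    '</metadata></package>'
)


def classify_unique_identifier(opf_xml):
    """Return 'missing', 'empty' or 'ok' for the package's unique identifier."""
    root = ET.fromstring(opf_xml)
    uid = root.get("unique-identifier")
    for elem in root.iter("{%s}identifier" % DC_NS):
        if uid is not None and elem.get("id") == uid:
            # An element that exists but holds only whitespace carries no value.
            if (elem.text or "").strip():
                return "ok"
            return "empty"
    return "missing"


if __name__ == "__main__":
    for line in ('', '<dc:identifier id="uid"/>',
                 '<dc:identifier id="uid">9780091799878</dc:identifier>'):
        print(classify_unique_identifier(OPF_TEMPLATE % line))
```

Separating 'empty' from 'missing' lets the two situations be reported differently, which matches the distinction drawn above between the element existing and the element having a value.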

 status invalid

Changed in calibre:
status: New → Invalid

Are you sure you checked both test epubs? I included 2 samples for the reason that one had text and the other didn't.

The main <dc:identifier> for test2_scrambled.epub is:
<dc:identifier id="uid" opf:scheme="ISBN">9780091799878</dc:identifier>

It may be invalid for other reasons but not because it has no text.

If there is no calibre library ISBN for the book (which was the case in my library at the time I first encountered the problem) then Embed Metadata still results in the 'OPF has no unique identifier' error.

Kovid Goyal (kovid) wrote :

No, I did not look at the second; I did not realize it was different from the first.

Kovid Goyal (kovid) wrote :

In the case of your second book, the package identifier is an ISBN and
you presumably deleted the ISBN from the identifiers in calibre. So
calibre has to delete it from the OPF when applying the metadata to the
file. I suppose I could add some code to replace it with a randomly
generated UUID in this case to keep Check Book happy.
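
A rough illustration of that idea, using only the Python standard library, is given below. It is a sketch and not the code that went into calibre: the function name ensure_unique_identifier, the fallback id "uuid_id", the "urn:uuid:" text format, and SAMPLE_OPF are all assumptions made for the example.

```python
import uuid
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical OPF in the broken state: the package points at id "uid",
# but the matching <dc:identifier> has been removed.
SAMPLE_OPF = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" '
    'unique-identifier="uid">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Example</dc:title>'
    '</metadata></package>'
)


def ensure_unique_identifier(root):
    """Make sure the parsed <package> element has a non-empty unique
    identifier, generating a random UUID if needed. Returns the element."""
    uid = root.get("unique-identifier")
    if uid is None:
        uid = "uuid_id"  # assumed fallback id, chosen for this sketch
        root.set("unique-identifier", uid)
    target = None
    for elem in root.iter("{%s}identifier" % DC_NS):
        if elem.get("id") == uid:
            target = elem
            break
    if target is None:
        metadata = root.find("{%s}metadata" % OPF_NS)
        if metadata is None:
            metadata = ET.SubElement(root, "{%s}metadata" % OPF_NS)
        target = ET.SubElement(metadata, "{%s}identifier" % DC_NS, {"id": uid})
    if not (target.text or "").strip():
        # Only fill in a value when none is present; existing values are kept.
        target.text = "urn:uuid:%s" % uuid.uuid4()
    return target


if __name__ == "__main__":
    package = ET.fromstring(SAMPLE_OPF)
    print(ensure_unique_identifier(package).text)
```

Running the function a second time on the same tree leaves the generated value unchanged, so an existing identifier is never overwritten.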

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: Invalid → Fix Released

Thank you.

For the record, I've never deleted any ISBNs from my library. I think the problems arose as described below, though I couldn't begin to guess how common these circumstances are.

The books in question never had an ISBN in the library because they were all created as 'empty books' with minimal metadata back in 2009 when I only owned paper books. When I eventually did get some of them as ebooks I added them as a new format to an old library record rather than doing the full import, so calibre never got the chance to auto-extract the publisher's metadata, including ISBN.
