Embed Metadata for PDF consistently adds 0.5MB to book size

Bug #1341549 reported by Jon Aske on 2014-07-14
This bug affects 1 person
Affects | Status       | Importance | Assigned to | Milestone
calibre | Fix Released | Undecided  | Unassigned  |

Bug Description

I have noticed that the much-awaited Embed Metadata button for PDFs (thank you, Kovid!) quite consistently (though not always) adds 0.5MB to book size, far more than would be expected (a 1.1MB book becomes a 1.6MB book, for example).

Interestingly, this 0.5MB can then be made to disappear by resaving the PDF with Adobe Acrobat Professional, for example.

Not sure what's going on, but it does seem like a bug and not a design feature, though I am not sure exactly what is being embedded or how.

I cannot reproduce this. Steps I tried:

1) Download a minimal PDF file:
http://brendanzagaeski.appspot.com/minimal.pdf (739 bytes)
2) Add it to calibre and add some metadata
3) Use the embed metadata tool
4) The PDF file size becomes 17KB (an increase of ~16KB, nowhere near
500KB)

Embedding metadata embeds *all* the metadata for that book present in
calibre: all the standard fields, comments and all custom columns.
Assuming you don't have a book-sized comments field, the PDF file size
will typically increase by 10-20KB.
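To illustrate why embedding *all* fields drives the size increase, here is a minimal Python sketch that serializes a metadata dictionary into a bare-bones XMP packet and reports its size. It is not calibre's code: the `build_xmp` helper, the simplified Dublin Core layout (real XMP wraps fields such as `dc:title` in `rdf:Alt`/`rdf:Seq` containers) and the custom-column namespace URI are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",                               # XMP meta wrapper
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",            # Dublin Core
    # Hypothetical namespace for custom columns (not calibre's real one).
    "custom": "http://example.com/hypothetical-custom-columns/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def build_xmp(fields: dict[str, str], custom: dict[str, str]) -> bytes:
    """Serialize metadata into a simplified XMP packet and return its bytes."""
    meta = ET.Element(f"{{{NS['x']}}}xmpmeta")
    rdf = ET.SubElement(meta, f"{{{NS['rdf']}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{NS['rdf']}}}Description", {f"{{{NS['rdf']}}}about": ""}
    )
    # Every standard field and every custom column becomes its own element,
    # so the packet grows with each non-empty field (comments especially).
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{NS['dc']}}}{name}").text = value
    for name, value in custom.items():
        ET.SubElement(desc, f"{{{NS['custom']}}}{name}").text = value
    body = ET.tostring(meta, encoding="utf-8")
    # Standard xpacket wrapper; the begin attribute holds a UTF-8 BOM.
    header = b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    trailer = b'<?xpacket end="w"?>'
    return header + body + trailer


if __name__ == "__main__":
    base = {"title": "Example", "creator": "Jane Doe"}
    small = build_xmp(base, {})
    big = build_xmp({**base, "description": "x" * 10_000}, {"shelf": "Fiction"})
    print(len(small), len(big))
```

A long comments field dominates the result, which matches the caveat about a "book-sized comments field"; short standard fields add comparatively little.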

 status invalid

Changed in calibre:
status: New → Invalid

This happens to me almost invariably. I suggest you try it with a
different PDF. I can send you one if you want.


I see an increase of between 10 and 20KB across my entire corpus of 300
test PDF files.

I know this is not a high priority for some, but let me press on, if I may;
for some of us, this is important. I have a feeling this swelling of the
PDF file after embedding metadata may have to do with saving the book's
COVER image into the file in some form, probably uncompressed. Could it
be? It's the only thing I can think of. I can send you a couple of PDFs
where, after downloading metadata (including covers) from the usual places
(Google, Amazon) and embedding it, the PDF file swells by about 0.5MB. If I
then open the file in Adobe Acrobat and do a Save As Reduced-Size PDF, it
returns to its normal size. So some kind of compression done by Acrobat
seems to undo the swelling of the PDF file.
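One way to test the cover hypothesis, rather than guess, is to inspect the metadata streams themselves. The Python sketch below is a naive heuristic, not a real PDF parser: it scans raw PDF bytes for `/Type /Metadata` stream objects and reports each one's stored length and whether its dictionary carries a `/Filter` (i.e. the stream is compressed). The function name `metadata_streams` is hypothetical, and the regex approach can miss or misread objects in unusual files (encrypted documents, for example).

```python
import re

# Naive pattern for indirect objects: "<num> <gen> obj ... endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)


def metadata_streams(pdf: bytes) -> list[tuple[int, int, bool]]:
    """Return (object number, stored stream length, is_compressed) for each
    /Type /Metadata stream found by a simple text scan of the file."""
    found = []
    for m in OBJ_RE.finditer(pdf):
        body = m.group(3)
        # Split the object into its dictionary and its stream data.
        head, sep, rest = body.partition(b"stream")
        if not sep or not re.search(rb"/Type\s*/Metadata", head):
            continue
        data = rest.split(b"endstream", 1)[0].strip(b"\r\n")
        compressed = b"/Filter" in head
        found.append((int(m.group(1)), len(data), compressed))
    return found


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /Metadata 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Metadata /Subtype /XML /Length 5 >>\n"
        b"stream\nhello\nendstream\nendobj\n"
    )
    print(metadata_streams(sample))  # [(2, 5, False)]
```

Running it on a file before and after embedding would indicate whether the extra bytes sit in an uncompressed metadata stream or somewhere else entirely.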


No, covers are not saved in PDF files. Your PDF files likely already have
large binary blobs in their XMP metadata, which probably exist in
compressed streams. calibre does not write compressed XMP metadata
streams to PDF, and I have no interest in changing that; patches are
welcome.

Kovid Goyal (kovid) wrote:

On second thought, I will implement compression for the XMP metadata
stream, since it is an easy change. It will be in the next release. I
don't know if that will fix your issue, since you haven't attached a
sample PDF, but most likely it will.
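Compressing the XMP stream amounts to passing the packet through zlib and declaring `/Filter /FlateDecode` in the stream dictionary, with `/Length` giving the byte count of the *encoded* data. The Python sketch below shows that technique in isolation; it is an assumption about the general approach, not the change actually committed to calibre, and `metadata_stream_object` is a hypothetical helper.

```python
import zlib


def metadata_stream_object(obj_num: int, xmp: bytes, compress: bool) -> bytes:
    """Serialize an XMP packet as a PDF metadata stream object, optionally
    Flate-compressed (zlib format, which is what /FlateDecode expects)."""
    data = zlib.compress(xmp, 9) if compress else xmp
    filt = b" /Filter /FlateDecode" if compress else b""
    # /Length must describe the bytes actually stored, i.e. after encoding.
    header = (
        b"%d 0 obj\n<< /Type /Metadata /Subtype /XML%s /Length %d >>\nstream\n"
        % (obj_num, filt, len(data))
    )
    return header + data + b"\nendstream\nendobj\n"


if __name__ == "__main__":
    xmp = b"<rdf:li>repeated value</rdf:li>" * 500
    plain = metadata_stream_object(7, xmp, compress=False)
    packed = metadata_stream_object(7, xmp, compress=True)
    print(len(plain), len(packed))
```

Because RDF/XML markup is highly repetitive, Flate usually shrinks it substantially, though the savings for any particular file will vary.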

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: Invalid → Fix Released