Comment 8 for bug 1440304

iostrym (armandooooo) wrote : RE: [Bug 1440304] Re: [Enhancement] configure metadata import when importing pdf file in calibre

Hi,

From exiftool, I get the following information for the problem PDF file:

---- XMP-xmp ----
Create Date : 2014:04:22 00:43:38+02:00
Modify Date : 2015:04:06 23:24:42+02:00
Creator Tool : PFU ScanSnap Manager 6.2.22 #SV600
Metadata Date : 2014:04:22 00:53:01+02:00
Caption Writer : ATT

The "Metadata Date" is 2014-04-22T00:53:01+02:00 (as you said), but the "Modify Date" is 2015:04:06 23:24:42+02:00. And the Modify Date from the XMP metadata is the same as the Info dict Modify Date.

Anyway, both metadata sources should be fine, because both contain correct information. I don't understand why the PDF Info dict should be a problem, since Adobe Reader and PDF-XChange are both able to read the Info dict correctly.

And if you compare the XMP and Info dict Modify Dates in the two documents, they are always identical, because the Info dict and the XMP metadata are written at the same time by the same program (PDF-XChange):

OK PDF:
---- PDF ----
Modify Date : 2015:04:06 23:56:21+02:00
---- XMP-xmp ----
Modify Date : 2015:04:06 23:56:21+02:00

KO PDF (the problem file):
---- PDF ----
Modify Date : 2015:04:06 23:24:42+02:00
---- XMP-xmp ----
Modify Date : 2015:04:06 23:24:42+02:00

So which metadata is taken into account when the dates are the same? And why should either the XMP metadata or the Info dict give an incorrect result in calibre?
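
To make the question concrete, here is a minimal sketch of the comparison as I understand it from the quoted reply below (Info dict ModDate versus XMP MetadataDate). The function names are hypothetical and this is not calibre's actual code. In particular, the rule that a tie goes to XMP is only my assumption.

```python
from datetime import datetime


def parse_exiftool_date(text):
    """Parse a date as printed by exiftool, e.g. '2015:04:06 23:24:42+02:00'."""
    # %z accepts a colon-separated UTC offset on Python 3.7 and later.
    return datetime.strptime(text, "%Y:%m:%d %H:%M:%S%z")


def preferred_source(info_mod_date, xmp_date):
    """Return 'info' if the Info dict is strictly newer than the XMP date, else 'xmp'.

    Assumption: equal dates fall back to XMP. The real tie-breaking rule
    in calibre is exactly what is being asked about in this comment.
    """
    return "info" if info_mod_date > xmp_date else "xmp"


# Values taken from the exiftool output of the problem PDF.
info_mod = parse_exiftool_date("2015:04:06 23:24:42+02:00")
xmp_metadata = parse_exiftool_date("2014:04:22 00:53:01+02:00")
xmp_modify = parse_exiftool_date("2015:04:06 23:24:42+02:00")

# ModDate vs. MetadataDate: the Info dict looks newer.
print(preferred_source(info_mod, xmp_metadata))  # info
# ModDate vs. XMP Modify Date: the two are equal, so the assumed tie rule applies.
print(preferred_source(info_mod, xmp_modify))  # xmp
```

With MetadataDate as the XMP reference, the Info dict wins for this file. With the XMP Modify Date as the reference, the two sources would be considered equally recent.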

Best regards,

> Date: Tue, 7 Apr 2015 06:29:42 +0000
> From: <email address hidden>
> To: <email address hidden>
> Subject: [Bug 1440304] Re: [Enhancement] configure metadata import when importing pdf file in calibre
>
> To be precise, calibre compares the ModDate from the PDF Info dictionary
> to the MetadataDate in the XMP block. In your problem PDF, the ModDate
> is Mon Apr 6 23:24:42 2015 and the MetadataDate is
> 2014-04-22T00:53:01+02:00
>
> so calibre will use the information from the Info block rather than the
> XMP, since the Info block is marked as being newer.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1440304
>
> Title:
> [Enhancement] configure metadata import when importing pdf file in
> calibre
>
> Status in calibre: e-book management:
> Won't Fix
>
> Bug description:
> In calibre 2.23 on Windows 7 64-bit, when importing a PDF, calibre reads some common metadata from the PDF file and imports it into the calibre metadata.
> For example, title, author and tags are imported. The subject metadata is also put in the comments.
>
> By testing I saw that :
>
> - the first line of the subject is put in the calibre tags (the PDF subject can span several lines in some PDF editors)
> - the full subject (including the other lines) is put in the calibre comments
> - tags must be separated by commas.
>
> But maybe this import feature is described somewhere ?
>
> Something great would be for example
> - to configure the "separator" used between tags, because some PDF editors don't support commas and require ";"
> - to be able to disable the import of the first line of the subject as tags
> - to be able to customize which calibre metadata is written using which pdf metadata :
> ie : published date is first line of subject
> isbn is second line of subject
> others lines of subject are comment
>
> I would be happy to help if I were shown where this is done in the code...
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/calibre/+bug/1440304/+subscriptions