calibre

Metadata wrong for books added through the web server

Bug #1992244 reported by Florian Bach on 2022-10-08

This bug affects 2 people

Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

This bug is similar to https://bugs.launchpad.net/calibre/+bug/1945889, which you fixed previously.
That bug was about wrong metadata for books added through the auto-add feature. The same problem still seems to exist when adding books through the Calibre web server.

For context, I've created a Calibre plugin that can turn Adobe ACSM files into EPUB or PDF files upon import.

When importing the file normally, everything works fine.
Since 5.29, importing a file through the auto-add folder also works fine (see bug report linked above).
However, when adding an ACSM file through Calibre's web server, the same bug still occurs.

The book is imported successfully, but with the wrong metadata: the ACSM file's name is used as the title and author instead of the metadata from the EPUB.

I assume the fix you applied in the linked bug for auto-added files now also needs to be applied to files imported through the Calibre web UI: Calibre should re-read the book metadata after the import is done and all FileTypePlugins have run.
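The ordering requested above can be sketched in Python. This is a minimal illustration, not Calibre's actual code: `import_book`, the `plugins` list and `read_metadata` are hypothetical stand-ins for Calibre's FileTypePlugin pipeline and metadata reader. The only point is that metadata is read from whatever file the last plugin returns, not from the original ACSM.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins, NOT calibre APIs: a plugin takes a file path and
# returns the (possibly new) path of the processed file; a reader returns
# metadata for a file.
PluginHook = Callable[[Path], Path]
MetadataReader = Callable[[Path], Dict[str, str]]


def import_book(source: Path, plugins: List[PluginHook],
                read_metadata: MetadataReader) -> Tuple[Path, Dict[str, str]]:
    """Run every import-time plugin first, then read metadata from the result."""
    current = source
    for plugin in plugins:
        # A plugin may swap the file entirely, e.g. book.acsm -> book.epub.
        current = plugin(current)
    # Reading metadata only after the loop means it comes from the converted
    # file rather than from the ACSM, whose only usable "title" is its name.
    return current, read_metadata(current)
```

For example, with a fake plugin that maps `.acsm` to `.epub`, the metadata reader is called on `book.epub` rather than `book.acsm`.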

If you need further clarification of the bug, please let me know. It'd be great if this similar bug could be fixed as well.

For context, I'm running Calibre 6.4 on Linux, but another user confirmed the same bug still exists in Calibre 6.6.1 on macOS.

Kovid Goyal (kovid) wrote on 2022-10-09: Fixed in master

#1

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

status fixreleased

Changed in calibre:
status:	New → Fix Released

Florian Bach (leseratte10) wrote on 2022-10-14:

#2

Thanks for the bugfix, but the behaviour is still broken, just in a different way. Unfortunately, I couldn't test the fix before the new version was released today: I tried to compile Calibre myself on Linux but was unsuccessful.

It seems as if Calibre is now running the FileTypePlugins twice, sometimes even three times.

Looking at the log file, Calibre first executes the ACSM Input plugin as intended, turning the ACSM file into an EPUB file. But after that, Calibre tries to run the ACSM FileTypePlugin again on a nonexistent file:

"ACSM Input v0.1.0: Trying to parse file mnqfpahl_import_plugin.acsm"
"ACSM Input v0.1.0: Hey, that didn't work: ACSM not found or invalid"

After that, the book is still added to the Calibre book database, but Calibre thinks it's an ACSM file: the sidebar shows "Formats: ACSM" and Calibre refuses to open it. If I open the folder containing the file, it is named with the correct title and author, but with the extension "acsm" instead of "epub". If I rename it to "epub", I get a valid EPUB file with DRM.
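Since renaming the stored file to "epub" yields a working book, its content is an EPUB even though Calibre recorded the format as ACSM. As a rough sketch (not Calibre's actual format detection), the real format can be guessed from the file's bytes instead of its extension, similar to what the plugin's "That's a ZIP file -> EPUB" log line suggests. `sniff_format` is a hypothetical helper; it relies only on PDF files starting with `%PDF-` and EPUB containers being ZIP archives with a `mimetype` entry of `application/epub+zip`.

```python
import zipfile
from pathlib import Path


def sniff_format(path: Path) -> str:
    """Guess a book's real format from its content, ignoring the extension."""
    with open(path, 'rb') as f:
        head = f.read(5)
    if head.startswith(b'%PDF-'):
        return 'pdf'
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # EPUB containers carry a 'mimetype' entry naming the format.
            try:
                mimetype = zf.read('mimetype').strip()
            except KeyError:
                return 'zip'
        return 'epub' if mimetype == b'application/epub+zip' else 'zip'
    # Anything else (e.g. the original ACSM, which is XML) is left unknown.
    return 'unknown'
```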

I hope my explanation of the error was clear; the issue seems to be a bit tricky. The errors do not occur when adding the ACSM through the Calibre GUI or through the auto-add code, only through the web UI.

If necessary, I can post exact instructions for reproducing the bug yourself; just let me know what you need. Here's an excerpt from the debug log showing that the plugin runs multiple times instead of once:

```
ACSM Input v0.1.0: Trying to parse file URLLink - 2022-10-14T124313.013.acsm
ACSM Input v0.1.0: Try to fulfill ...
... (tons of debug logs from my plugin removed)
ACSM Input v0.1.0: Downloading book ...
ACSM Input v0.1.0: Loading book from http://contentserver.adobe.com/media/xxxx.epub
Download took 1676 ms (HTTP 200)
That's a ZIP file -> EPUB
ACSM Input v0.1.0: File successfully fulfilled ...
ACSM Input v0.1.0: Trying to parse file URLLink - 2022-10-14T124313.013.acsm
ACSM Input v0.1.0: Try to fulfill ...

ACSM Input v0.1.0: Downloading book ...
ACSM Input v0.1.0: Loading book from http://contentserver.adobe.com/media/xxxx.epub
Download took 1824 ms (HTTP 200)
That's a ZIP file -> EPUB
ACSM Input v0.1.0: File successfully fulfilled ...
ACSM Input v0.1.0: Trying to parse file ujix2pdn_import_plugin.acsm
ACSM Input v0.1.0: Try to fulfill ...
ACSM Input v0.1.0: Hey, that didn't work:
ACSM not found or invalid
Received server change event: BooksAdded(book_ids=3065) for /media/some/path/that/i/censored
```

Kovid Goyal (kovid) wrote on 2022-10-14:

#3

That should take care of it:
https://github.com/kovidgoyal/calibre/commit/a32750b4be814fa7b3991941eb7110023f91b522

And you don't need to build calibre to run it from source; see
https://manual.calibre-ebook.com/develop.html

Florian Bach (leseratte10) wrote on 2022-10-14 (last edit on 2022-10-14):

#4

Thanks a lot, I didn't know that that was possible. That's way easier than compiling Calibre.

The issue with the wrong file extension is now fixed, thanks.
There's just one small thing remaining:

When I use the GUI to import an ACSM file for a book that is already in my library, Calibre first runs all the FileTypePlugins, then asks me "Do you want to add this duplicate anyway?". I click "Yes" or "No" depending on what I want, and if I click "Yes", the book is added.

When I do the same through the web UI, it first runs all the FileTypePlugins once, then asks me "Do you want to add the duplicate?". When I click "Yes", it runs all the FileTypePlugins again on the same source file.

This isn't a problem in general (running a FileTypePlugin twice in a row on the same input file should produce the same resulting file), but I don't really like it, because my plugin then contacts the Adobe servers twice in a row and downloads the book a second time unnecessarily.

Is it possible to make the Calibre web server import the copy of the book that has already been processed by the FileTypePlugins, instead of running the source ACSM file through all the plugins again when "Add anyway" is clicked?

Kovid Goyal (kovid) wrote on 2022-10-14:

#5

No, I'm afraid that's not easily fixable. It would require the server to maintain a cache of files between requests, which brings disk usage and cache expiry issues that I don't really feel like dealing with.

You can deal with it in your plugin: cache the result of the last n ACSM calls and reuse it if an identical ACSM is passed again.
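The suggestion above can be sketched as a small in-memory LRU cache keyed by a hash of the ACSM file's bytes. This is an illustration under assumptions, not part of calibre or the ACSM Input plugin: `FulfillmentCache`, `get_or_fulfill` and the `fulfill` callback (standing in for the plugin's Adobe download) are hypothetical, and a module-level instance would only help if the plugin module stays loaded between the two web-server requests.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class FulfillmentCache:
    """Remember the output of the last `max_entries` ACSM fulfillments.

    Keys are SHA-256 digests of the ACSM bytes, so an identical ACSM file
    submitted again (e.g. after "Add anyway" in the web UI) reuses the
    earlier result instead of downloading the book a second time.
    """

    def __init__(self, max_entries: int = 5):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # digest -> fulfilled book bytes

    def get_or_fulfill(self, acsm_bytes: bytes,
                       fulfill: Callable[[bytes], bytes]) -> bytes:
        key = hashlib.sha256(acsm_bytes).hexdigest()
        if key in self._entries:
            # Cache hit: mark as most recently used and skip the download.
            self._entries.move_to_end(key)
            return self._entries[key]
        result = fulfill(acsm_bytes)
        self._entries[key] = result
        if len(self._entries) > self.max_entries:
            # Evict the least recently used entry to keep the cache bounded.
            self._entries.popitem(last=False)
        return result
```

Holding whole books in memory keeps the sketch short; a real plugin would more likely cache paths to files on disk, which brings back the disk usage and expiry questions mentioned above, just on a smaller scale.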

Florian Bach (leseratte10) wrote on 2022-10-14:

#6

Okay, I will try to do that instead. Thanks for the two bugfixes!
