E-book viewer modifies the epub file upon closing

Bug #858957 reported by Li Fanxi
This bug affects 19 people
Affects: calibre
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

calibre version: 0.8.20
OS: Linux (Debian)

How to reproduce:
- Open E-book viewer
- Check the MD5SUM of an epub file
- Open the epub file in the E-book viewer
- Close the E-book viewer
- Check the MD5SUM of the epub file again
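
The before-and-after check in the steps above can be scripted. The following is a minimal Python sketch using only the standard library. The helper names `file_digest` and `changed_since` and the file name `book.epub` are illustrative assumptions, not part of calibre.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_since(path, earlier_digest, algorithm="md5"):
    """Return True if the file's current digest differs from one recorded earlier."""
    return file_digest(path, algorithm) != earlier_digest
```

To use it, record `before = file_digest("book.epub")`, open and close the book in the viewer, then call `changed_since("book.epub", before)`. A result of `True` corresponds to the behaviour reported here.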

Expected:
The MD5SUM value does not change. E-book viewer should not modify the e-book files.

Actual:
The MD5SUM value changes: the e-book file has been modified. As a result, if the e-book files are managed by a synchronization service such as Dropbox, the epub file has to be uploaded and synchronized to every other location each time it is read.

Kovid Goyal (kovid) wrote : Re: calibre bug 858957

The ebook viewer stores bookmarks in the epub file. This is by design. You
can turn it off by unsetting the "Remember last page when closing" option in
the viewer preferences.

 status wontfix

Changed in calibre:
status: New → Won't Fix
Li Fanxi (lifanxi) wrote :

So there will be no way to have the "Remember last page when closing" function along with the ability to not change the epub file when closing?

I am wondering why mobi or txt files don't have a similar problem. Why can they remember the last reading position without changing the file? I didn't dive into the code, but I guess there is a place outside the ebook files where this information is recorded. Why not use the same mechanism for epub files?

Kovid Goyal (kovid) wrote :

Because there is no way to put arbitrary information into MOBI and txt files.

Li Fanxi (lifanxi) wrote :

Yes.

My question is why not use the same mechanism for epub files: store the bookmark information outside the epub files, as is done for mobi or txt files. Then we would no longer need to modify the epub files to keep the bookmark.

Kovid Goyal (kovid) wrote :

Because storing bookmarks inside epub files is convenient. It allows you to
transmit the epub file and preserve the bookmarks.

danmb (danmbox) wrote :

This is actually a pretty surprising default. What do you do, silently ignore errors on read-only mounts? Perhaps you should warn the user the first time you modify an epub (once per installation, not per-book)

avdd (avdd) wrote :

> This is by design

The design is wrong.

Kovid Goyal (kovid) wrote :

ROFL

Eli Schwartz (eschwartz) wrote : Re: [Bug 858957] Re: E-book viewer modifies the epub file upon closing

I am not sure why you think there is a problem here, calibre by default
stores an extra copy of the bookmark data inside the ebook file if the
ebook format supports that, and there is an option to disable that if
you value unmodified files more than you value being able to store your
reading location and bookmarks together with the ebook.

Since you apparently feel so strongly about this that you revived a
6-year-old bug just to tell us the design is somehow wrong, I can only
assume you know of a better design.

Please tell what the better design is, other than simply removing this
*optional* feature.

Krzysztof Jurewicz (krzysztof-jurewicz) wrote :

Imagine that someone reads a detective novel in which the identity of the murderer is revealed on the last page. He then closes the ebook and sends it to a friend, recommending it. The friend then opens the book and finds out who the murderer is.

ebook-viewer should by default, as the name suggests, be used only to view ebooks, not to modify them; the checksum should be unchanged. If ebook-viewer wants to modify an ebook, it should be explicitly approved by the user (perhaps once per ebook or per installation, as danmbox suggests). Or the program name should be changed to something less misleading.

Eli Schwartz (eschwartz) wrote :

So your rationale for disabling this feature by default is that when
saving the reading position of your book in order that you can continue
where you left off, you may accidentally spoil the book for your
mysterious friend who finds himself continuing where you left off?

That's a pretty weak objection. It is a fringe case to begin with, and
made yet more unlikely by the fact that it is really easy to just... go
back to the start of the book, unless an even more unlikely coincidence
happens and the exact saved location has an actual spoiler which is
unlikely in my experience.

Claiming that an "ebook-viewer" has a misleading name if it also
modifies the book, which it doesn't as it merely stores additional
information inside the book like a number of other document formats, is
just silly. There is no rule of etiquette or expectation that a "viewer"
is fundamentally incompatible with modifying the file. You are making an
arbitrary differentiation that exists only in your own mind and was
devised solely for the purpose of defending your claim right here, right
now. But if you truly believed that, then users should not even be
allowed to *opt in* to such an "untruthful" program behavior.

Krzysztof Jurewicz (krzysztof-jurewicz) wrote :

In my opinion, modifying a user’s files without his consent (either explicit or implicit) is bad manners and it will break things in ways hard to predict. It may create unnecessary copies of ebooks if they are distributed via IPFS, leading to fragmentation. It will create unnecessary entries if ebooks are distributed through the Dat protocol. If ebooks are signed, it will break signatures. It will add bloat to Dropbox/rsync/whatever synchronization, as mentioned in the original report. In my case, I’ve noticed that a MOBI file in one of my repositories had been regenerated without the source file being modified (because EPUB had been used as an intermediary), which had been confusing.

The above issues may not always be catastrophic and may not be frequent, but many people will expect that a viewer by default doesn’t modify their files and will be surprised by different behaviour. Actually, it’s hard for me to find an example of another interactive viewer or even an editor that acts similarly.

Paul Bryan (pbryan) wrote :

FWIW, I support the feature request that Calibre not write to an ebook by default (i.e. default "safe"). This automatic writing feature adversely affected me, modifying most of my ebook collection without my knowledge. Not what I was expecting.

Eli Schwartz (eschwartz) wrote :

> It may create unnecessary copies of ebooks if they are
> distributed via IPFS, leading to fragmentation. It will create
> unnecessary entries if ebooks are distributed through the Dat protocol. If
> ebooks are signed, it will break signatures. It will add bloat to
> Dropbox/rsync/whatever synchronization, as mentioned in the original
> report.

These are all things that cause no harm and the user having noticed them
can then check what changed and disable the feature. Essentially, you
are arguing that IPFS/Dat distributors are a more common case that
should be catered to, than users who want to continue reading their
calibre books on another device.

> In my case, I’ve noticed that a MOBI file in one of my
> repositories had been regenerated without the source file being modified
> (because EPUB had been used as an intermediary), which had been
> confusing.

I have no idea what this means, are you saying that you did a
MOBI-to-MOBI conversion or something? In what way does this relate to
what was mentioned here -- MOBIs cannot be edited the way EPUB can and
they therefore don't have these bookmarks anyway.

> The above issues may not always be catastrophic and may not be frequent,
> but many people will expect that a viewer by default doesn’t modify
> their files and will be surprised by different behaviour. Actually, it’s
> hard for me to find an example of another interactive viewer or even an
> editor that acts similarly.

Unfortunately, there is no standard for ebook annotations, if you can
convince the IDPF to be useful for once and publish one, calibre will be
delighted to migrate to that instead.

As for other viewers, PDF also does embedded annotations/bookmarks.

Eli Schwartz (eschwartz) wrote :

> [...] modifying most of my ebook collection without my knowledge.

This is scare-tactic language used to distract attention away from the
fact that your books are not, in fact, modified. All that happened was
that some additional metadata was bolted onto the side.

If you are worried that calibre opening the file in write mode will
somehow corrupt your files beyond all repair, there is no earthly reason
to think calibre will do that unless you have pre-existing issues.

Claiming that file modification adversely affects file synchronization
is at least an intelligible protest, but I have no idea what your issue
is, largely because you haven't actually mentioned it.

Paul Bryan (pbryan) wrote :

> This is scare-tactic language used to distract attention away from the
> fact that your books are not, in fact, modified. All that happened was
> that some additional metadata was bolted onto the side.

It's a fact. It altered my ebook files, and it did so without my knowledge or consent.

Krzysztof Jurewicz (krzysztof-jurewicz) wrote :

> I have no idea what this means, are you saying that you did a
> MOBI-to-MOBI conversion or something? In what way does this relate to
> what was mentioned here -- MOBIs cannot be edited the way EPUB can and
> they therefore don't have these bookmarks anyway.

I have a Markdown file and a Makefile which is used to generate other formats, in particular:

• EPUB (using Pandoc);
• MOBI (from the generated EPUB, using ebook-convert).

If I open the EPUB using ebook-viewer’s default settings and run “make” again, the MOBI file is regenerated.
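
The regeneration follows from how make decides what is out of date: a target is rebuilt when it is missing or older than one of its prerequisites, judged by file modification time. The Python sketch below mimics only that basic rule. The function name `needs_rebuild` and the file names in the usage note are hypothetical, and real make has further rules not modelled here.

```python
import os


def needs_rebuild(target, prerequisites):
    """Mimic make's basic rule: rebuild when the target is missing or older
    than any prerequisite, judged purely by modification time."""
    if not os.path.exists(target):
        return True
    target_mtime = os.stat(target).st_mtime_ns
    # Content is never compared; a newer timestamp alone marks the target stale.
    return any(os.stat(p).st_mtime_ns > target_mtime for p in prerequisites)
```

With this rule, `needs_rebuild("book.mobi", ["book.epub"])` becomes true as soon as anything rewrites `book.epub` after the MOBI was built, even when the Markdown source is untouched.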

> As for other viewers, PDF also does embedded annotations/bookmarks.

It may, but I doubt that PDF viewers save annotations/bookmarks without getting user’s approval.

Edward J. Shornock (ed-shornock) wrote :

>> As for other viewers, PDF also does embedded annotations/bookmarks.

> It may, but I doubt that PDF viewers save annotations/bookmarks without getting user’s approval.

They certainly don't modify files without saving/prompting. Plus, annotations and bookmarks don't happen automatically. A user has to do *something* to add a bookmark or annotation. Okular (most often used with KDE) will save annotations and bookmarks without prompting but it never modifies the original files.

ebook-viewer on the other hand modifies a file by just opening it and closing it.

I found this unexpected (to me) behaviour doing sanity checks on my file system. "WTH are these files changed? I've only viewed them…"

I can see how it's a useful feature but I think it really should be opt-in. Maybe there could be a one-time prompt when a book is first opened.

Carsten Fuchs (carsten78) wrote :

I too find this feature surprising: Having opened a file for reading only doesn't suggest it will be modified on disk.
Modifying the epub file however changes its file size, timestamp and checksums. This in turn doesn't agree well with file integrity monitoring and backup systems.

Could you therefore please reconsider the implementation of this feature?

It seems to me that a very good alternative implementation would be to store metadata in a separate file. This would be compatible with all book/file formats and make any modifications explicit.
It also would not require the puzzled user to undertake research to figure out why the epub file has changed and how to turn it off.

Jan (zorglf) wrote :

So this is why my automated backups keep multiple copies of the same book all over!

I was utterly confused when I checked why my backups had grown so much although I had changed only a few files, and I found out that I have a dozen copies of the same 14MB epub.

I second that this option must either be opt-in or at least require the user to accept a huge red blinking warning.

Álvaro GR (emuagr) wrote :

This isn't proper behaviour for an e-book READER/VIEWER. I'm generating epubs and use this tool to check the result, I don't need any modifications in the source file that will be distributed later.

Mel B. (bighype) wrote :

I noticed this bug years ago but since I don't use ebook-viewer that often, I didn't care much to track it. I used calibre to move ebooks from my computer to my devices and for that I didn't use an ebook-viewer.

However, I recently started cataloging my books and backing them up and I've noticed that my backup program would back up an epub after opening it with ebook-viewer. Every time it's opened, it would be changed. I think most people would think that these files are idempotent.

Imagine if your video player modified the file every time you played it. Or if your editor of choice modified the file even when you didn't explicitly save it. This is no different.

Video players such as mpv, for example, store the playback position in a special directory. Many other programs do the same thing.

I don't see how modifying files is better than just saving a one byte value in some file inside of the users' conf folder.
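
As a rough illustration of that idea, the Python sketch below keeps reading positions in a JSON file outside the book. It is not how calibre stores anything: the function names, the store location and the position format are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def book_key(book_path):
    """Identify a book by the SHA-256 of its contents, so the key survives renames."""
    digest = hashlib.sha256()
    with open(book_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_position(store_path, book_path, position):
    """Record a reading position in a JSON store; the book file is only read."""
    store_path = Path(store_path)
    data = json.loads(store_path.read_text()) if store_path.exists() else {}
    data[book_key(book_path)] = position
    store_path.parent.mkdir(parents=True, exist_ok=True)
    store_path.write_text(json.dumps(data, indent=2))


def load_position(store_path, book_path, default=None):
    """Look up the stored position for a book, or return the default."""
    store_path = Path(store_path)
    if not store_path.exists():
        return default
    return json.loads(store_path.read_text()).get(book_key(book_path), default)
```

Keying on a content hash only works because the book itself is never rewritten. The trade-off raised earlier in the thread still applies: a position stored this way does not travel with the epub when the file is copied elsewhere.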

ellie (et1234567) wrote :

Lots of people made tickets about this, it's sad to see there is no real movement on this. Couldn't there at least be a dialog asking for this to be enabled at first launch? Why does it need to be a silent default? This really messes with cloud-based backups with dropbox or syncthing too...

u922796 (u922796-deactivatedaccount) wrote :

This is surprising and undesirable default behavior. People who read EPUBs expect their EPUB viewer to read the file, and not silently modify the file.

Why does this have to be a silent default?

Rohin Koshi (gothicserpent) wrote :

I sometimes download repository torrents of old books (that are in the public domain and legal to distribute).

When I use the calibre viewer to open the books, the TOC / last page is changed, thereby creating a "missing files" error whenever I view any book.

My solution to this was to make the epub files themselves read-only in Windows Explorer in bulk. This creates the issue of "preparing book for first read" every single time I open any epub file with this viewer, which has caused me to almost consider only using other programs for viewing epub files. I STRONGLY agree that any modification of data without clear user consent is not proper.
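
The same bulk read-only workaround can be scripted instead of done by hand. This is a small Python sketch under the assumption that the books live under one directory tree; `make_read_only` is a hypothetical helper, and on Windows `chmod` only toggles the read-only attribute.

```python
import stat
from pathlib import Path


def make_read_only(root, pattern="*.epub"):
    """Clear the write permission bits on every matching file under root."""
    changed = []
    for path in Path(root).rglob(pattern):
        mode = path.stat().st_mode
        # Drop user, group and other write bits; leave read/execute bits alone.
        path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
        changed.append(path)
    return changed
```

This only stops the files from being rewritten; it does not avoid the repeated "preparing book for first read" step described above.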

This option should be disabled by default, and given an option to enable, perhaps in the readme.

Please disable this setting by default!

Eli Schwartz (eschwartz) wrote :

> thereby creating a "missing files" error whenever I view any book

What does this even mean? Why would the book go missing?

Francisco Pombal (pombal.francisco) wrote :

A viewer is used for viewing files. Viewing a file shouldn't change it. Having a viewer changing files by default is a nightmare for data integrity checking, collection management and file-sharing purposes (e.g. changed files will fail BitTorrent's hash checking).

I'm fine with this option existing for those who want it, but it should definitely be disabled by default, by the principle of least surprise - I don't expect a file's SHA256 sum to change after I "view" it.

>> thereby creating a "missing files" error whenever I view any book

> What does this even mean? Why would the book go missing?

I think the poster above means that their torrent client complains that the file (or some of its pieces) is missing or invalid after "viewing" it in the Calibre viewer.

Francisco Pombal (pombal.francisco) wrote :

> Because storing bookmarks inside epub files is convenient. It allows you to
> transmit the epub file and preserve the bookmarks.

That convenience shouldn't come at the cost of altering files by default.

The bookmarks should be stored outside of the epub file. Some kind of bookmark database perhaps, with a standardized format and syncing protocol. This would make it easy to sync bookmarks across viewers/devices (or, less ambitiously, simply different Calibre instances).

Sami Boukortt (sboukortt) wrote :

It’s also a potential privacy concern. You may not realize that by sharing your copy of an e-book, you are also sharing your bookmarks. The convenience argument assumes that sharing the bookmarks is what you want to do. With little data for either side (other than the number of votes on this bug report), I could just as easily speculate that wanting to share them in the file itself is as much of a fringe use-case as not wanting the file to change.

In any case, modifying the file without prompting (which happens even with no changes to the bookmarks, I might add) is indeed not what I expected from a viewing tool. I don’t know of any other tool that does that. So, I would say that:

> There is no rule of etiquette or expectation that a "viewer" is fundamentally incompatible with modifying the file.

is not true.

As for:

> Please tell what the better design is, other than simply removing this *optional* feature.

Why not keep the feature if some users desire it, but not enabled by default?
