saveDatabaseInterval > 0 can result in regular heavy drive access

Bug #1332467 reported by Martin Spacek
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
qpdfview
Fix Released
Low
Adam Reichold

Bug Description

I noticed that with qpdfview left running with say 5 pdfs open in 5 tabs, every 30 sec or so there's a burst of drive activity that lasts about 1 second. I eventually realized this has to do with saving per-file information to the database (though I'm not too sure where exactly that is). I have restoreBookmarks=false and restoreTabs=true. Disabling restorePerFileSettings reduced the amount of drive activity somewhat, as did closing some PDFs. Finally, I added saveDatabaseInterval to my qpdfview.conf under [mainWindow] and set it to 0. That seems to have gotten rid of the drive activity altogether. I'm not sure how to accurately track drive activity per-process in Linux.

After re-enabling restorePerFileSettings, even with saveDatabaseInterval=0, per-file view information still seems to be saved, presumably on close. This seems completely sufficient to me. It's a bit unnerving seeing so much drive activity during idle, especially if running off battery. I'd suggest either greatly increasing the default saveDatabaseInterval, say to 10 min, or just turning it off altogether. If qpdfview crashes, then you lose any changes to your latest tabs and views and stuff. Not a big deal. Or perhaps I'm misunderstanding something?

Revision history for this message
Adam Reichold (adamreichold) wrote : Re: [Bug 1332467] [NEW] saveDatabaseInterval > 0 can result in regular heavy drive access

Hello,

Am 20.06.2014 10:55, schrieb Martin Spacek:
> Public bug reported:
>
> I noticed that with qpdfview left running with say 5 pdfs open in 5
> tabs, every 30 sec or so there's a burst of drive activity that lasts
> about 1 second. I eventually realized this has to do with saving per-
> file information to the database (though I'm not too sure where exactly
> that is). I have restoreBookmarks=false and restoreTabs=true. Disabling
> restorePerFileSettings reduced the amount of drive activity somewhat, as
> did closing some PDFs. Finally, I added saveDatabaseInterval to my
> qpdfview.conf under [mainWindow] and set it to 0. That seems to have
> gotten rid of the drive activity altogether. I'm not sure how to
> accurately track drive activity per-process in Linux.

This is expected as relevant (i.e. whatever "restore*" settings the user
has enabled) information is synchronized with the database regularly to
improve reliability. The default value of the timer is 30 seconds. The
database path depends on the XDG environment settings but it is usually
stored in "~/.local/share/qpdfview/qpdfview/database" (Qt 4) or
"~/.local/share/data/qpdfview/qpdfview/database" (Qt 5). It is a plain
SQLite database which you can inspect using for example the "sqlite"
command-line tool.

> After re-enabling restorePerFileSettings, even with
> saveDatabaseInterval=0, per-file view information still seems to be
> saved, presumably on close. This seems completely sufficient to me. It's
> a bit unnerving seeing so much drive activity during idle, especially if
> running off battery. I'd suggest either greatly increasing the default
> saveDatabaseInterval, say to 10 min, or just turning it off altogether.
> If qpdfview crashes, then you lose any changes to your latest tabs and
> views and stuff. Not a big deal. Or perhaps I'm misunderstanding
> something?

Depends on what the definition of "a big deal" is. :-) In any case,
writing to that database should be cheap enough as its total size is
usually just a few kilobytes, so I don't think I want to disable that
function by default but the default timeout of 30 seconds is maybe a bit
too much. How about changing the default value to 5 minutes as compromise?

Best regarsds, Adam.

> ** Affects: qpdfview
> Importance: Undecided
> Status: New
>

Changed in qpdfview:
status: New → Triaged
importance: Undecided → Low
Revision history for this message
Martin Spacek (mspacek) wrote :

Hi Adam,

5 min sounds good to me. Yup, the database file is only about 10 KB on my system. If the database is so small and cheap to read/write, why does it take ~1 sec of solid disk activity (according to my drive LED) to write the relevant info for 5 tabs?

I have a reasonably fast i7 quad core laptop with 32GB RAM, so I normally have very little (spinning disk) hard drive access.

Changed in qpdfview:
milestone: none → 0.4.11
assignee: nobody → Adam Reichold (adamreichold)
status: Triaged → Fix Committed
Revision history for this message
Adam Reichold (adamreichold) wrote :

Hello,

I changed the default to five minutes. Concerning the amount of drive activity, if you are concerned about it, we should try to measure this directly together with the operating system because I wouldn't trust the activitiy of that LED to have a granularity of much less than seconds. You could try using "iotop" from [1] or something similar.

Best regards, Adam.

[1] http://sebastien.godard.pagesperso-orange.fr/

Revision history for this message
Boris (borisxm) wrote :

For me the problem are even worse. I'm normally have from 80 to 100 opened documents and after recent update qpdfview completely freezes during heavy disk activity which takes about 12-13 seconds every 30 seconds. It has been confirmed, that it writes to database-journal file.

The size of database itself is 257024 bytes. Setting the above mentioned saveDatabaseInterval variable to 43200000 seems to be a solution, but closing of the viewer still takes these 12 seconds (this hasn't changed from the previous release). It is unclear, why sqlite takes that much time for such a small database. Simple copying to the .bak file should be an instant event.

Revision history for this message
Adam Reichold (adamreichold) wrote : Re: [Bug 1332467] Re: saveDatabaseInterval > 0 can result in regular heavy drive access

Hello Boris,

Am 26.06.2014 13:11, schrieb Boris:
> For me the problem are even worse. I'm normally have from 80 to 100
> opened documents and after recent update qpdfview completely freezes
> during heavy disk activity which takes about 12-13 seconds every 30
> seconds. It has been confirmed, that it writes to database-journal file.

thanks for sharing your observations. I am really surprised that people
actually notice the function being there as with around 20 tabs open
here and the interval being set to 5 seconds, I can almost not notice
the short moment when the database is saved and I get an average write
rate of 150 kB/s (again with a 5 second interval, not 30 seconds or 5
minutes).

> The size of database itself is 257024 bytes. Setting the above mentioned
> saveDatabaseInterval variable to 43200000 seems to be a solution, but
> closing of the viewer still takes these 12 seconds (this hasn't changed
> from the previous release). It is unclear, why sqlite takes that much
> time for such a small database. Simple copying to the .bak file should
> be an instant event.
>

There is obviously something going on in the background that I have not
though oft for which I am sorry. But I am completely in the dark as to
why SQLite needs such a long time to write out the changes. The one
thing I can think of is that qpdfview's update strategy is probably not
optimal: We delete all rows of the current table of open tabs and then
insert all open tabs as new rows within a single transaction. But then
again, it is a table of 100 rows in your case which should not take 10
seconds to write out.

Can you explain in more detail (or just provide a link to some other
resource) what you mean with "the .bak file"? Also note that you can
disable the periodic saving completely by setting a zero or negative value.

Best regards, Adam.

Revision history for this message
Adam Reichold (adamreichold) wrote :

Of course, maybe this means we'll have to consider deactivating this by
default but since it is a reliability feature I would rather fix it than
disable it... :-\

Am 26.06.2014 13:11, schrieb Boris:
> For me the problem are even worse. I'm normally have from 80 to 100
> opened documents and after recent update qpdfview completely freezes
> during heavy disk activity which takes about 12-13 seconds every 30
> seconds. It has been confirmed, that it writes to database-journal file.
>
> The size of database itself is 257024 bytes. Setting the above mentioned
> saveDatabaseInterval variable to 43200000 seems to be a solution, but
> closing of the viewer still takes these 12 seconds (this hasn't changed
> from the previous release). It is unclear, why sqlite takes that much
> time for such a small database. Simple copying to the .bak file should
> be an instant event.
>

Revision history for this message
Adam Reichold (adamreichold) wrote :

Hello again,

Am 26.06.2014 13:11, schrieb Boris:
> For me the problem are even worse. I'm normally have from 80 to
> 100 opened documents and after recent update qpdfview completely
> freezes during heavy disk activity which takes about 12-13 seconds
> every 30 seconds. It has been confirmed, that it writes to
> database-journal file.
>
> The size of database itself is 257024 bytes. Setting the above
> mentioned saveDatabaseInterval variable to 43200000 seems to be a
> solution, but closing of the viewer still takes these 12 seconds
> (this hasn't changed from the previous release). It is unclear, why
> sqlite takes that much time for such a small database. Simple
> copying to the .bak file should be an instant event.
>

it's really puzzling because from looking for general advice on
improving SQLite performace like [1], we already do everything that
yields large gains and our tables are still to small to provide a
measurable load...

Would be interested in building and running a version of the code that
uses an instrumented Database::saveTabs to find out if some part of
the transaction is especially slow?

Best regards, Adam.

Revision history for this message
Boris (borisxm) wrote :

I've did a simple strace and found that there a lot of open/close/write syscalls:

open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
etc.

In between them there are lots of write() calls with sizes up to 1024 bytes. May be tons of these open/close pairs slows down operation because OS might be instructed to flush data on close. Your case can be different due to mount options, FS type. Mine is ext4 mounted with noatime.

> what you mean with "the .bak file"?

Simple backup copy of the database file.

Revision history for this message
Adam Reichold (adamreichold) wrote :

Hello again,

Am 26.06.2014 16:20, schrieb Boris:
> I've did a simple strace and found that there a lot of open/close/write
> syscalls:
>
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> open("/home/user/.local/share/data/qpdfview/qpdfview", O_RDONLY|O_CLOEXEC) = 199
> open("/home/user/.local/share/data/qpdfview/qpdfview/database-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 198
> etc.
>
> In between them there are lots of write() calls with sizes up to 1024
> bytes. May be tons of these open/close pairs slows down operation
> because OS might be instructed to flush data on close. Your case can be
> different due to mount options, FS type. Mine is ext4 mounted with
> noatime.

It looks like the SQLite spends a lot of time opening its journal file
for writing. Do you feel comfortable with building from source using the
attached patch which requests an in-memory journal from SQLite?

>> what you mean with "the .bak file"?
>
> Simple backup copy of the database file.
>

I did infer, but I don't SQLite works that way. I think it rather uses
said journal to record changes and then commit them to the database file
in one go.

Best regards, Adam.

Revision history for this message
Adam Reichold (adamreichold) wrote :

Hello again,

Am 26.06.2014 18:45, schrieb Boris Popov:
> On 26.06.2014 22:18, Adam Reichold wrote:
>
>> Since "journal_mode = MEMORY" is already outside of the safe defaults, adding "synchronous =
>> OFF" shouldn't make things worse from a reliability point of view. Tarball attached. Does
>> improve performance measurably?
>
>
> Yes, now it completes within a 80-90 mS and almost unnoticeable! Still, it is a good idea to
> avoid unnecessary writes because they may cause unwanted disk spin ups. AFAIR, Mozilla had the
> same problem with sqlite and a lot of users complained about these constant writes.
>

Ok, so I'll commit the journal and synchronization changes to trunk in
addition to already having changed the default interval to five minutes.

I am still not convinced the function should be turned off by default
since users who are conscious about those writes can turn it off by
themselves and if is terminated for whatever reason, it better to have
written the state to persistent storage than to loose bookmarks or tabs.

However I do think that we could expose the setting in the settings
dialog so that it becomes easier to change it.

What are your opinions? Boris? Martin? Benjamin?

Best regards, Adam.

Revision history for this message
Boris (borisxm) wrote :

Under "unnecessary writes" I mean that if viewer is idle, there is no point to write data every five minutes - schedule writes right after changes occur and we'll have both of two worlds. Also I'm fine with this setting exposed via UI.

Revision history for this message
Martin Spacek (mspacek) wrote :

With a dozen tabs open, it used to take about 3 sec to close qpdfview. I really don't know anything about SQL, but that last commit has dropped close time to zilch! Great!

When in doubt, I think it's always best to expose a setting and give the user flexibility (pretty much the opposite of the GNOME way of thinking).

Revision history for this message
Martin Spacek (mspacek) wrote :

"Under "unnecessary writes" I mean that if viewer is idle, there is no point to write data every five minutes - schedule writes right after changes occur and we'll have both of two worlds. Also I'm fine with this setting exposed via UI."

Sounds good, but (and I'm just speculating) I suspect that qpdfview saves your position in every file, so every time you scroll or zoom or whatever, that changes an entry in the database. So it might have to be a combination of what the timer is doing and whether any changes to be saved occurred at all. If something has changed, and the 5 min timer signal has fired, then write to database.

Revision history for this message
Adam Reichold (adamreichold) wrote :

Am 26.06.2014 19:22, schrieb Boris:
> Under "unnecessary writes" I mean that if viewer is idle, there is
> no point to write data every five minutes - schedule writes right
> after changes occur and we'll have both of two worlds. Also I'm
> fine with this setting exposed via UI.

This would of course be optimal w.r.t. to amount of writing done, but
it also implies that we have to track every action that results in a
change of the persistent data and do this in relation to the users
settings. It sounds rather suboptimal from a workload and maintenance
burden perspective. As long as the contributor situation is as it is,
we will probably have to go for simple solutions...

Revision history for this message
Adam Reichold (adamreichold) wrote :

Am 26.06.2014 19:30, schrieb Martin Spacek:
> With a dozen tabs open, it used to take about 3 sec to close qpdfview. I
> really don't know anything about SQL, but that last commit has dropped
> close time to zilch! Great!

The change is really just reducing the amount of paranoia SQLite invests
into it file system accesses, i.e. how much it tries to reduce the
probability of the database being corrupted if qpdfview crashes while
updating the database. (But of course the time to update the database
does not change the fact, that some spinning disk might have to wake up
to make that short write...)

> When in doubt, I think it's always best to expose a setting and give the
> user flexibility (pretty much the opposite of the GNOME way of
> thinking).
>

Already two people for, so I'll add since its simple and we already are
running for world's most bloated settings dialog anyway. ;-)

Revision history for this message
Adam Reichold (adamreichold) wrote :

Am 26.06.2014 19:37, schrieb Martin Spacek:
> "Under "unnecessary writes" I mean that if viewer is idle, there is no
> point to write data every five minutes - schedule writes right after
> changes occur and we'll have both of two worlds. Also I'm fine with this
> setting exposed via UI."
>
> Sounds good, but (and I'm just speculating) I suspect that qpdfview
> saves your position in every file, so every time you scroll or zoom or
> whatever, that changes an entry in the database. So it might have to be
> a combination of what the timer is doing and whether any changes to be
> saved occurred at all. If something has changed, and the 5 min timer
> signal has fired, then write to database.

How much data is really saved depends on the exact settings, e.g.
whether per-file settings are on or off, but in principle we store a lot
of each views state like current page and scale factor and so on.

Implementing a combined approach is simple but tedious, we just make the
periodic time a one-shot timer that if it isn't already running, is
started once a relevant attribute changes. This means we will write out
state at most once every five minutes (or whatever was configured) but
only if there actually is reason to do so.
The real problem from where I am standing is to add a line like
"markTabsTableDirty();" to each and every signal handler that tracks a
relevant attribute...

Revision history for this message
Martin Spacek (mspacek) wrote :

Testing the latest from trunk, it seems the viewport position on restored tabs tends to be about half a page higher than it was when qpdfview was last closed, i.e., on restore I'm looking at the spot on the page about half a page higher than before. However if I close and restart again, there's no creeping up the page. I'm now looking at the exact same spot as before. Has this always been the case?

The other thing I've noticed is that on restore, qpdfview always shows the last tab in the tab bar, even if it wasn't the last one viewed. So, say I have 3 tabs, I'm reading the 2nd tab in the list, I close qpdfview, restart, now I'm looking at the 3rd tab. Should I open that up as separate bug?

Revision history for this message
Adam Reichold (adamreichold) wrote :

Hello Martin,

both are the intended behaviour as...

Am 28.06.2014 21:30, schrieb Martin Spacek:
> Testing the latest from trunk, it seems the viewport position on
> restored tabs tends to be about half a page higher than it was when
> qpdfview was last closed, i.e., on restore I'm looking at the spot on
> the page about half a page higher than before. However if I close and
> restart again, there's no creeping up the page. I'm now looking at the
> exact same spot as before. Has this always been the case?

...we only store the current page, not the view port position. Hence
after a restart, you will end up with a view port as if you just jumped
to that page, i.e. at the top-left of it.

> The other thing I've noticed is that on restore, qpdfview always shows
> the last tab in the tab bar, even if it wasn't the last one viewed. So,
> say I have 3 tabs, I'm reading the 2nd tab in the list, I close
> qpdfview, restart, now I'm looking at the 3rd tab. Should I open that up
> as separate bug?

...we just do not store which tab is current and hence can't restore it.

Both observations are definitely separate from the storage interval and
if you want to track them, that this bug report is probably not the
place. However, I'd definitely prefer a merge request over another wish
list bug. ;-) (However, since this would almost surely involve another
database schema change, I would rather wait after 0.4.11 with this,
since I'd really like to whole tiling thing to settle down first.)

Best regards, Adam.

Changed in qpdfview:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.