Improve insertion times

Bug #639737 reported by Seif Lotfy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Zeitgeist Framework
Fix Released
Wishlist
Seif Lotfy

Bug Description

We insert pretty slowly with an average of 0.15 seconds for one event on my core i5 2.5 GHz beast.

RainCT had some optimization possibilities:
1) PRAGMA synchronous=OFF
2) PRAGMA journal_mode=OFF

The Chat:
------------------------------------------------------------
<kamstrup> I think we are - but I can't recall... in case of failed transactions - but I don't even know if we use transactions these days...
<seif> <RainCT> try synchronous=OFF
<seif> <RainCT> but it can corrupt your database if your phone dies while ZG is inserting
<seif> <RainCT> and journal_mode=MEMORY
<seif> <RainCT> or OFF since we don't use rollback anyway
<seif> so maybe journal_mode = OFF is a good start?
<kamstrup> okay, he's probably right...
<kamstrup> 'grep -Ri rollback _zeitgeist/' is your friend :-)
<kamstrup> apparently we are not using rollback...

More info can be found here: http://www.sqlite.org/pragma.html
------------------------------------------------------------

In order to get a better picture of what's going on, can you please try to get some more information, like:
1) How many events are in your database?
2) What's the insertion time for one event into an empty db?
3) Out of this 0.15 secs, how many time is spend in our python code, and what's the time of the actual sql action?
4) How much faster is adding 10 events at once compared to adding them one at a time?
5) You think 0.15 secs is slow for inserting one event, what time do you expect, and why?

Related branches

Seif Lotfy (seif)
summary: - Slow insertion times on maemo
+ Slow insertion times
description: updated
summary: - Slow insertion times
+ Improve insertion times
description: updated
Revision history for this message
Seif Lotfy (seif) wrote :

Here is a diff... https://code.edge.launchpad.net/~zeitgeist/zeitgeist/sqlite-insert-improvements/+merge/35701
Here I only set the journal_mode to off since we dont use rollback thus we can not be harmed AFAIK. It improved my average speed to 0.075 seconds which is better.

I also benchmarked the synchronous = OFF for get 0.001 average but a lot of risks come with it. So I did not commit this change.

Seif Lotfy (seif)
Changed in zeitgeist:
assignee: nobody → Seif Lotfy (seif)
status: New → Triaged
importance: Undecided → Wishlist
description: updated
Revision history for this message
Markus Korn (thekorn) wrote :

Seif, before making inserts faster I would like to understand why they are so slow right now. The last step should be to actually add some PRAGMA optimizations, which have, as you already mentioned some downsides.

In order to get a better picture of what's going on, can you please try to get some more information, like:
How many events are in your database?
What's the insertion time for one event into an empty db?
Out of this 0.15 secs, how many time is spend in our python code, and what's the time of the actual sql action?
How much faster is adding 10 events at once compared to adding them one at a time?
You think 0.15 secs is slow for inserting one event, what time do you expect, and why?

Revision history for this message
Seif Lotfy (seif) wrote :

Markus,

Table has exactly 5701 events and need 0.15 seconds per one event for insertion
An empty table needs 0.1 seconds here. The time spent in python is too little to be mentioned. Sadly its the commit that takes most of the time.
Inserting 10 events in my 5701 events database takes 1.0 seconds
0.15 is very slow for inserting one item IMHO, I would expect 0.01 seconds at minimum, Especially if we intend to deploy on ARM powered devices these times are not acceptable. You could feel a tiny lag at 0.6 second on the Maemo.

description: updated
Revision history for this message
Seif Lotfy (seif) wrote :

I think however our lower boundary is 0.05 seconds since we do commits after insert events is called, which is a disk write. Now we could avoid that by first returning the values to the user then commiting... It will save noticeable time.

Also:
I tried setting journal_mode to WAL
Speed improvement from 1.5 to 0.5 - 1.0
Better...

The WAL journaling mode uses a write-ahead log instead of a rollback journal to implement transactions. The WAL journaling mode is persistent; after being set it stays in effect across multiple database connections and after closing and reopening the database. A database in WAL journaling mode can only be accessed by SQLite version 3.7.0 or later.

More info can be found here: http://www.sqlite.org/wal.html

Revision history for this message
Seif Lotfy (seif) wrote :

After reading more into the synchronous mode I found the mode NORMAL

When synchronous is NORMAL (1), the SQLite database engine will still pause at the most critical moments, but less often than in FULL mode. There is a very small (though non-zero) chance that a power failure at just the wrong time could corrupt the database in NORMAL mode. But in practice, you are more likely to suffer a catastrophic disk failure or some other unrecoverable hardware fault.

This on gives me
DEBUG:zeitgeist.engine:Inserted 1 events in 0.001519s

Revision history for this message
Mikkel Kamstrup Erlandsen (kamstrup) wrote :

Everyone be sure to read the details here http://www.sqlite.org/pragma.html so you can make an informed opinion. Here's what I gather from a quick read. There may be more pragmas we can use, I haven't trawled the entire document:

 * journal_mode: I don't think OFF is a good choice since we do use transactions - although not rollback. I am thinking DELETE, TRUNCATE, or PERSIST which ever performs the best on our target platform.

 * locking_mode: I think we can safely go with EXCLUSIVE since ZG is really the only process ever that should access the DB. It comes at the cost of you having to kill zg-daemon if you want to manually insoect the db - but I think we can live with that

 * synchronous: I think it's ok to use NORMAL. Especially if we write a backup tool which I think we need to do anyway (streaming events as json into a gzip stream should be fast and light).

Generally: We should do some *serious* dog fooding and see how likely we are to mess up our DBs before we release this into the wild... kill -9, hard poweroffs, and what have we!

Revision history for this message
Seif Lotfy (seif) wrote : Re: [Bug 639737] Re: Improve insertion times
Download full text (3.1 KiB)

What about journal_mode = WAL
I see it being a good candidate too...

On Thu, Sep 16, 2010 at 11:28 PM, Mikkel Kamstrup Erlandsen <
<email address hidden>> wrote:

> Everyone be sure to read the details here
> http://www.sqlite.org/pragma.html so you can make an informed opinion.
> Here's what I gather from a quick read. There may be more pragmas we can
> use, I haven't trawled the entire document:
>
> * journal_mode: I don't think OFF is a good choice since we do use
> transactions - although not rollback. I am thinking DELETE, TRUNCATE, or
> PERSIST which ever performs the best on our target platform.
>
> * locking_mode: I think we can safely go with EXCLUSIVE since ZG is
> really the only process ever that should access the DB. It comes at the
> cost of you having to kill zg-daemon if you want to manually insoect the
> db - but I think we can live with that
>
> * synchronous: I think it's ok to use NORMAL. Especially if we write a
> backup tool which I think we need to do anyway (streaming events as json
> into a gzip stream should be fast and light).
>
> Generally: We should do some *serious* dog fooding and see how likely we
> are to mess up our DBs before we release this into the wild... kill -9,
> hard poweroffs, and what have we!
>
> --
> Improve insertion times
> https://bugs.launchpad.net/bugs/639737
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in Zeitgeist Framework: Triaged
>
> Bug description:
> We insert pretty slowly with an average of 0.15 seconds for one event on my
> core i5 2.5 GHz beast.
>
> RainCT had some optimization possibilities:
> 1) PRAGMA synchronous=OFF
> 2) PRAGMA journal_mode=OFF
>
> The Chat:
> ------------------------------------------------------------
> <kamstrup> I think we are - but I can't recall... in case of failed
> transactions - but I don't even know if we use transactions these days...
> <seif> <RainCT> try synchronous=OFF
> <seif> <RainCT> but it can corrupt your database if your phone dies while
> ZG is inserting
> <seif> <RainCT> and journal_mode=MEMORY
> <seif> <RainCT> or OFF since we don't use rollback anyway
> <seif> so maybe journal_mode = OFF is a good start?
> <kamstrup> okay, he's probably right...
> <kamstrup> 'grep -Ri rollback _zeitgeist/' is your friend :-)
> <kamstrup> apparently we are not using rollback...
>
> More info can be found here: http://www.sqlite.org/pragma.html
> ------------------------------------------------------------
>
> In order to get a better picture of what's going on, can you please try to
> get some more information, like:
> 1) How many events are in your database?
> 2) What's the insertion time for one event into an empty db?
> 3) Out of this 0.15 secs, how many time is spend in our python code, and
> what's the time of the actual sql action?
> 4) How much faster is adding 10 events at once compared to adding them one
> at a time?
> 5) You think 0.15 secs is slow for inserting one event, what time do you
> expect, and why?
>
> To unsubscribe from this bug, go to:
> https://bugs.launchpad.net/zeitgeist/+bug/639737/+subscribe
>

--
This is me doing some advertisement for my blog http://seilo.geekyog...

Read more...

Revision history for this message
Seif Lotfy (seif) wrote :

After reading through again and again I think we should go with:

* locking_mode = EXCLUSIVE ==> look at mikkels explanation
* synchronous = NORMAL ==> again this seems very reasonable since according to the sqlite guys the chances are near 0 that something wrong could happen there.
* journal_mode = WAL ==> please read http://www.sqlite.org/wal.html it shows the advantages and disadvantages where I think we are not really targeted by the disadvantages.

Revision history for this message
Seif Lotfy (seif) wrote :
Download full text (4.0 KiB)

Here a more detailed benchmarking:

1) NO CHANGES: ~ 0.12 s
------------------------
DEBUG:zeitgeist.engine:Inserted 1 events in 0.107999s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.155844s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.128242s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.161610s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.106275s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.098261s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.112802s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.095572s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.112714s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.151126s

2) PRAGMA locking_mode = EXCLUSIVE: ~ 0.104 s
-----------------------------------------------------
DEBUG:zeitgeist.engine:Inserted 1 events in 0.102811s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.111825s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.099751s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.112921s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.104319s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.116963s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.101241s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.098185s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.101261s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.098256s

3) PRAGMA synchronous =NORMAL: ~ 0.098 s
-----------------------------------------------
DEBUG:zeitgeist.engine:Inserted 1 events in 0.089371s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.092805s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.075517s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.102996s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.080430s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.122163s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.088753s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.151342s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.099504s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.082245s

4) PRAGMA locking_mode = EXCLUSIVE && PRAGMA synchronous =NORMAL: ~ 0.912 s
------------------------------------------------------------------------
DEBUG:zeitgeist.engine:Inserted 1 events in 0.083371s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.076682s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.075443s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.091788s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.076546s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.099368s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.088096s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.091075s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.088003s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.146164s

5) PRAGMA journal_mode = WAL ~ 0.071 s
------------------------------------------
DEBUG:zeitgeist.engine:Inserted 1 events in 0.072993s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.068750s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.067910s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.066470s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.075882s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.071495s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.114053s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.055614s
DEBUG:zeitgeist.engine:Inserted 1 events in 0.059923s
DEBUG:zeitgeist.engine:Inserted 1 events in ...

Read more...

Revision history for this message
Seif Lotfy (seif) wrote :

BTW the tests were done on empty DBs
I wonder how they affect the query times

Seif Lotfy (seif)
Changed in zeitgeist:
status: Triaged → In Progress
Changed in zeitgeist:
milestone: none → 0.6
Revision history for this message
Seif Lotfy (seif) wrote :

OK I worked out the synchronous = NORMAL pragma for a week now and actually broke my netbook. What I learned is you have a better chance of messing up your partition than using synchronous = NORMAL. The deal breaker is the improvement is not noticeable unless we use journal = WAL
However we are facing the issue that WAL and is not supported in older sqlite. What do we do?

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.6 → 0.7
Revision history for this message
Seif Lotfy (seif) wrote :

OK I think the only way we can actually get the insertion time improved ATM is by using WAL for journaling. However it requires sqlite 3.7 and once its set to WAL sqlite 3.6 wont be able to access it.

This breaks compatibility with older zeitgeist DBs:
However I think this should be doable if we set zeitgeist 0.7 to depend on sqlite 3.7 and upgrade the db scheme to 4
What do u think?

Seif Lotfy (seif)
Changed in zeitgeist:
milestone: 0.7.0 → none
Seif Lotfy (seif)
Changed in zeitgeist:
milestone: none → 0.8.0
Seif Lotfy (seif)
Changed in zeitgeist:
status: In Progress → Fix Committed
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.