Fetching svn revision info seems to be very slow

Bug #882388 reported by Philip Peitsch on 2011-10-27
This bug affects 2 people
Affects: Bazaar Subversion Plugin

Bug Description

I've just installed a clean copy of bzr 2.4.1 on my Windows box and checked out a network bzr clone of a Subversion repository. I then switched my local copy to be a checkout of the original svn source (so I can commit directly back to svn). At this point it wants to fetch the svn revision info, and it is processing around 300 entries every 10 minutes, which, with 23000 entries left, is going to take me about 12 hours...

I don't ever recall it being this slow on bzr 2.2.3 (the previous version I used)
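
For reference, the 12-hour estimate in the description can be checked with a small sketch of the arithmetic. It uses only the figures quoted above and assumes the rate stays constant; the function name is purely illustrative.

```python
def remaining_hours(entries_left, entries_per_batch, minutes_per_batch):
    """Estimate the hours left, assuming a constant processing rate."""
    entries_per_minute = float(entries_per_batch) / minutes_per_batch
    return entries_left / entries_per_minute / 60.0


# 23000 entries left at roughly 300 entries every 10 minutes
print("%.1f hours" % remaining_hours(23000, 300, 10))
```

This comes out at just under 13 hours, in line with the rough figure in the report.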

summary: - featching svn revision info seems to be very slow
+ tching svn revision info seems to be very slow
summary: - tching svn revision info seems to be very slow
+ Fetching svn revision info seems to be very slow
Philip Peitsch (philip-peitsch) wrote :

To add further detail, the structure is as follows:

There is an existing svn trunk that is live (/project/trunk). Due to memory limits etc., standard practice is to copy a bzr branch made on Linux (bzr branch svn+http://.../project/trunk) onto the Windows computer, then reconfigure the branch to be a bound branch (bzr reconfigure --checkout svn+http://.../project/trunk). After this, running bzr up will fetch the svn repository info.

Philip Peitsch (philip-peitsch) wrote :

And another question. Once this process completes on one machine, can I potentially share the cache data with other Windows machines (e.g., by copying the svn-cache directory) to prevent them from having to spend all that time refetching things?

Philip Peitsch (philip-peitsch) wrote :

Observing my PC, there is very little CPU or memory usage by the bazaar process, but there seems to be an astronomical number of writes to disk. I wonder if this is related to sqlite's fsync behaviour (remembering that Firefox had abysmal performance around their version 3 release due to something similar).

Philip Peitsch (philip-peitsch) wrote :

Here's the result of this command: bzr up --lsprof-file callgrind.out

Note, it hasn't finished updating, but appears to have stopped writing to the callgrind.out file already...?

Philip Peitsch (philip-peitsch) wrote :

Same thing again, except this time with the following additions to the bazaar.conf file (suggested by wgz on IRC):

dirstate.fdatasync = false
repository.fdatasync = false

Philip Peitsch (philip-peitsch) wrote :

More callgrinds. Fsync = false, command was bzr branch -r50 svn+http://.../project/trunk trunk1 --lsprof-file nofsync.callgrind.out

Philip Peitsch (philip-peitsch) wrote :

More callgrinds. Fsync = true, command was bzr branch -r50 svn+http://.../project/trunk trunk2 --lsprof-file fsync.callgrind.out

Jelmer Vernooij (jelmer) wrote :

Which version of bzr-svn is this?

Philip Peitsch (philip-peitsch) wrote :

So I tried moving the svn-cache file onto a ramdisk just to see, and this significantly speeds it up again. For this, I manually hacked up the __init__.py for the cache module (C:\Program Files\Bazaar\plugins\svn\cache\__init__.py), and adjusted it to look on a ramdisk and write to there. The results are significantly less disk usage, and processing at a rate of a few thousand a minute.

So I think this is rather confirmed to be an sqlite/fsync piece of fun!

Philip Peitsch (philip-peitsch) wrote :

bzr-svn 1.1.0. I've just installed the Windows 2.4.1 installer version of Bazaar.

Jelmer Vernooij (jelmer) wrote :

I don't have a way of reading callgrind files here. Does disabling fsync speed things up?

Philip Peitsch (philip-peitsch) wrote :

Nope. Disabling the dirstate and repository fsyncs seemed to do nothing. However, disabling sqlite's fsync behaviour ("PRAGMA synchronous = 0", as documented at http://www.sqlite.org/pragma.html) did rapidly speed up the retrieval (it's doing roughly 50 per second at the moment).
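
To illustrate the pragma mentioned above, here is a minimal standalone sketch using Python's sqlite3 module. It is not the bzr-svn cache code: the table name, the batch sizes and the timed_inserts helper are all made up for the example. It only shows where "PRAGMA synchronous = 0" would be issued on a connection and gives a rough way to compare timings on a given disk.

```python
import os
import sqlite3
import tempfile
import time


def timed_inserts(db_path, synchronous_off, batches=50, rows_per_batch=20):
    """Insert rows in many small committed batches; return (seconds, row_count)."""
    conn = sqlite3.connect(db_path)
    try:
        if synchronous_off:
            # Tell SQLite not to wait for writes to be flushed to disk.
            # Faster, but a crash or power loss can corrupt the database.
            conn.execute("PRAGMA synchronous = 0")
        # Hypothetical table, not the real bzr-svn cache schema.
        conn.execute("CREATE TABLE revinfo (revnum INTEGER, info TEXT)")
        start = time.time()
        for b in range(batches):
            rows = [(b * rows_per_batch + i, "info") for i in range(rows_per_batch)]
            conn.executemany("INSERT INTO revinfo VALUES (?, ?)", rows)
            conn.commit()  # with syncing enabled, each commit waits for the disk
        elapsed = time.time() - start
        count = conn.execute("SELECT COUNT(*) FROM revinfo").fetchone()[0]
        return elapsed, count
    finally:
        conn.close()


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    for off in (False, True):
        path = os.path.join(tmpdir, "sync_off.db" if off else "sync_on.db")
        seconds, count = timed_inserts(path, off)
        print("synchronous off=%s: %d rows in %.3fs" % (off, count, seconds))
```

How big the difference is depends heavily on the disk and OS, so the timings are only indicative. The trade-off is durability: with synchronous off, a cache written this way may need to be rebuilt after a crash.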

Jelmer Vernooij (jelmer) wrote :

That seems a bit odd, bzr-svn hasn't changed how much it uses the cache, especially for writes.

Philip Peitsch (philip-peitsch) wrote :

It may be that the last time I did this operation was 1.5 years ago, and things have grown enough that I notice this now... not sure! We do have roughly 20,000 more commits this time around (up from 10,000).

Committing also hits this slowdown during the "determining rev no" part on the first commit. The speed-up isn't quite as dramatic for that, but is definitely still noticeable.

Martin Packman (gz) wrote :

From the profile output (the latter two are pickles), basically the issue is this:

   CallCount    Recursive    Total(ms)   Inline(ms) module:lineno(function)
         234            0     37.0572     37.0572   <method 'executemany' of 'sqlite3.Connection' objects>
         123            0      2.2640      2.2640   <method 'execute' of 'sqlite3.Connection' objects>
         106            0      0.1915      0.1915   <method 'write' of 'file' objects>
           4            0      0.1139      0.1132   <method 'get_latest_revnum' of '_ra.RemoteAccess' objects>
         +48            0      0.0007      0.0002   +bzrlib.plugins.svn.transport:202(update)
           1            0     39.4097      0.0388   <method 'get_log' of '_ra.RemoteAccess' objects>
        +117            0     39.3639      0.0019   +bzrlib.plugins.svn.logwalker:205(__call__)
         +56            0      0.0070      0.0003   +bzrlib.plugins.svn.transport:202(update)
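
As a quick reading of the table above, this sketch computes how much of the get_log call is taken up by executemany. It uses only the two figures from the profile, and it assumes the executemany calls happen underneath the logwalker callback, which the table suggests but does not show directly.

```python
# Figures copied from the profile rows above (same units as in the table).
get_log_total = 39.4097       # Total column for get_log
executemany_inline = 37.0572  # Inline column for executemany

share = executemany_inline / get_log_total
print("executemany share of get_log: %.0f%%" % (share * 100))
```

Under that assumption, roughly 94% of the time is spent inside sqlite's executemany rather than in talking to the svn server.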

Jelmer Vernooij (jelmer) on 2011-11-11
tags: added: performance
Jelmer Vernooij (jelmer) on 2012-01-23
tags: added: sqlite
Changed in bzr-svn:
status: New → Triaged
importance: Undecided → Medium