Slow performance for many operations on the gcc code import branches

Bug #593560 reported by Loïc Minier
This bug affects 2 people
Affects  Status     Importance  Assigned to
Bazaar   Confirmed  High        Unassigned
Breezy   Triaged    Medium      Unassigned

Bug Description

Hi there

For Linaro, we'd like to do our development on top of the FSF gcc SVN imports (the lp:gcc/4.4 and lp:gcc/4.5 branches), which are recent imports using format 2a. While grabbing these and pushing them to lp:gcc-linaro/4.4 and lp:gcc-linaro/4.5, we saw quite bad performance, with very slow local branching and even slow "bzr status".

Would you help us set up bzr for better performance, or look into the issues slowing it down?

If you think our history / tree sizes are so big that performance won't be stellar for a while, what options do we have? Could we work from partial history, e.g. the last year only, or from tarball imports?

These are my notes on performance (some are network bound):
bzr branch lp:~vila/gcc/4.4
11mn58s

bzr checkout 4.4 co
3mn17s

bzr checkout --lightweight 4.4 colw
3mn24s

bzr revert -r tag:gcc_4_4_4_release
7mn55s

bzr status
25s

bzr branch -r tag:gcc_4_4_4_release 4.4 4.4.4
9mn47s

# (network)
bzr branch -r tag:gcc_4_4_4_release lp:~vila/gcc/4.4 lp:~gcc-linaro-dev/gcc-linaro/4.4
3h21mn37s

# from a bzr init-repo with the 4.4 revs
bzr branch lp:~vila/gcc/4.5
10mn45s

bzr revert -r tag:gcc_4_5_0_release
7mn44s

# (network)
bzr push -r tag:gcc_4_5_0_release lp:~gcc-linaro-dev/gcc-linaro/4.5
2h28mn41s

Thanks!

PS: adding "udd" tag as vila suggested

Revision history for this message
Vincent Ladeuil (vila) wrote :

Quick comments:
- using different projects for gcc-linaro and gcc means pushing will push the entire history without
  taking advantage of the gcc stacked-on branch (presumably lp:gcc)
- on my machine, checkout times are worse with an SSD than with regular HDs, the massive IO
  and an old SSD being the main reasons
- branch and checkout have a '--hardlink' parameter which helps

Changed in bzr:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 593560] Re: Slow performance for many operations on the gcc code import branches

the hardlink parameter shouldn't change anything, and in my tests is
slower than extracting content from a repo.

It also doesn't explain 25s in status.

Can you please confirm:
 - you are using bzr 2.2bsomething
 - your bzr version
 - what plugins you have installed
 - attach a callgrind (bzr --lsprof-file foo.callgrind slow-operation) for:
   - bzr checkout --lightweight 4.4 colw
   - bzr revert -r tag:gcc_4_4_4_release
   - bzr status
   - bzr branch -r tag:gcc_4_4_4_release 4.4 4.4.4

Until we have local perf pinned down, looking at network will be a distraction.

Thanks,
Rob

Revision history for this message
Vincent Ladeuil (vila) wrote :

Also, providing the relevant parts of .bzr.log for each problematic command will help diagnosis.

Revision history for this message
Loïc Minier (lool) wrote :

Ah sorry, I checked the bzr version to include here but forgot to copy-paste it:
2.1.2
(I'm running maverick)

plugins:
- cia in ~/.bazaar/plugins
- system-wide I have these bzr-related packages which hold misc plugins: % dpkg -l bzr\*|grep ^ii|awk '{print $2, $3}':
bzr 2.1.2-1
bzr-builddeb 2.4.2
bzr-dbus 0.1~bzr39-2
bzr-gtk 0.98.0-1ubuntu1
bzr-rebase 0.6.0-1
bzr-rewrite 0.6.0-1
bzrtools 2.1.0-2
% ls /usr/share/pyshared/bzrlib/plugins
builddeb bzrtools cia dbus etckeeper gtk __init__.py launchpad netrc_credential_store news_merge rewrite

Will provide .bzr.log + callgrind trace for each slow op.

Revision history for this message
Loïc Minier (lool) wrote :

Important note: I'm using ecryptfs on my whole $HOME; it caused issues in the past with things such as sparse files, but in my experience overall performance was comparable to what it was before I started using it.

Revision history for this message
Loïc Minier (lool) wrote :

I wanted to try out perf to see whether it gives any insight into what's going on:
% sudo perf top
              samples   pcnt  function                    DSO
              _______  _____  __________________________  ____________

              1048.00  21.0%  PyEval_EvalFrameEx          python2.6
               373.00   7.5%  RotatingTree_Get            _lsprof.so
               365.00   7.3%  memcpy                      libc-2.12.so
               305.00   6.1%  read_hpet                   [kernel]
               187.00   3.8%  PyObject_Hash               python2.6
               175.00   3.5%  PyObject_Malloc             python2.6
               127.00   2.5%  PyObject_GenericGetAttr     python2.6
               125.00   2.5%  _PyString_Eq                python2.6
               118.00   2.4%  _int_malloc                 libc-2.12.so
               106.00   2.1%  PyTuple_New                 python2.6
                95.00   1.9%  PyString_FromStringAndSize  python2.6
                89.00   1.8%  PyObject_Free               python2.6
                77.00   1.5%  __memchr                    libc-2.12.so

(obviously not the level you care about, but still interesting to see what kind of functions it's hitting in Python)

Revision history for this message
John A Meinel (jameinel) wrote :

Small note on your "bzr co --lightweight" times (colw.lsprof): _create_files seems to be the bulk of the time. Taking that profile and filtering out the unimportant functions:

Sorry that Malone will mess up the formatting, but this is the best I can do for now.

           1 0 173.9197 3.6865 bzrlib.transform:2370(_create_files)
      +62598 0 65.5857 1.3641 +bzrlib.transform:1175(create_file)
...
      +62598 0 67.3141 0.3384 +bzrlib.workingtree:444(get_file)
...
       +3801 0 27.7786 0.0733 +bzrlib.osutils:1676(_walkdirs_utf8)
...
          +1 0 2.4488 0.0000 +bzrlib.tree:98(iter_changes)

And focusing just on the "get_file" path, which terminates here:
       62598 0 61.5976 60.5614 bzrlib.workingtree:462(get_file_byname)
      +62598 0 1.0362 0.2359 +bzrlib.workingtree:393(abspath)

Note that get_file_byname is:
    def get_file_byname(self, filename, filtered=True):
        path = self.abspath(filename)
        f = file(path, 'rb')
        if filtered and self.supports_content_filtering():
            ...
        else:
            return f

The key bit to notice is that this is effectively just file() or open().

My guess is that your encrypted filesystem actually has a lot more overhead
than you realize. Probably because the filenames themselves are encrypted, it
has to do a fair amount of work to figure out the path <=> encrypted path
mapping.

It seems to cost right about 1ms for every file we want to open (for reading
*or* writing), which probably isn't a huge overhead for operations that only
touch a couple of files, but if you end up reading/writing/statting the entire
64k working tree, that 1ms adds up fast: 62,598 opens at ~1ms each is roughly
the 61.6s shown above for get_file_byname.

That would also explain 'bzr status' performance, I think, since you have the
overhead of determining the paths over and over again.

I certainly could be wrong, but as near as I can tell, the time is spent in
'open()' (aka file()), which is normally pretty cheap.
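
(If you want to sanity-check the per-open() cost outside of bzr, a minimal
sketch along these lines would measure it; the '4.4' path is just a
placeholder for the working tree:)

  import os, time

  def time_opens(root):
      # Rough check of the per-file open() cost, independent of bzr.
      count = 0
      start = time.time()
      for dirpath, dirnames, filenames in os.walk(root):
          if '.bzr' in dirnames:
              dirnames.remove('.bzr')  # stay inside the working tree proper
          for name in filenames:
              f = open(os.path.join(dirpath, name), 'rb')
              f.close()
              count += 1
      elapsed = time.time() - start
      print("%d files opened in %.1fs (%.2f ms per open)"
            % (count, elapsed, 1000.0 * elapsed / max(count, 1)))

  time_opens('4.4')  # placeholder: path to the gcc working tree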

Revision history for this message
John A Meinel (jameinel) wrote :

I'll also try to expand on what Robert mentioned. At present, if you do something like "bzr co --lightweight" and the source location has a working tree, it tries to read the content from that working tree, rather than extracting the content from the repository.

This *used* to be faster (pre-2a formats); I'm guessing it no longer holds true. If we removed that logic, you would save ~90s from your checkout time: 30s in 'iter_changes' and 60s from reading the old content. I don't know what the specific cost for extraction would be, but at least on your encrypted filesystem, it seems that opening lots of files is much more expensive than doing more work from a single larger file.
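
Put differently, the shape of the logic is roughly this (an illustrative
sketch only, not bzrlib's actual code; the names and signature are made up):

  import os

  def build_file(path, accelerator_tree, repo_extract):
      # Illustrative sketch: if a source working tree is available, read the
      # bytes straight from it -- one open() per file, which is cheap on most
      # filesystems but ~1ms each on this ecryptfs setup.
      if accelerator_tree is not None:
          source = os.path.join(accelerator_tree, path)
          if os.path.exists(source):
              with open(source, 'rb') as f:
                  return f.read()
      # Otherwise extract the text from the (single, larger) pack repository.
      return repo_extract(path)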

Could you try doing:

 time bzr co source test1
 cd test1
 bzr remove-tree .
 cd ..
 time bzr co --lightweight --lsprof-file colw-nosourcewt.lsprof test1 test2

And see what that gives us?

Revision history for this message
Loïc Minier (lool) wrote :

time bzr co 4.4 test1
bzr co 4.4 test1 1484,59s user 69,65s system 93% cpu 27:38,51 total

(bzr remove-tree)

time bzr co --lightweight --lsprof-file colw-nosourcewt.lsprof test1 test2
Profile data written to "colw-nosourcewt.lsprof".
bzr co --lightweight --lsprof-file colw-nosourcewt.lsprof test1 test2 172,49s user 33,52s system 76% cpu 4:28,71 total

I also did some tests on a lucid server running bzr 2.1.1 with a regular hard disk, a faster CPU, and no ecryptfs; the results were much better (about 3 times better):
# (network)
bzr branch lp:gcc-linaro/4.4
59mn56s

# seems odd
bzr co 4.4 co
19mn24s

bzr co --lightweight 4.4 colw
1mn28s

bzr status
9s

bzr revert -r tag:gcc_4_4_4_release
1mn10s

# seems odd
bzr branch -r tag:gcc_4_4_4_release 4.4 4.4.4
19mn48s

(Memory usage was quite scary during this process, with bzr using as much as 11% of memory in top on an 8 GiB machine.)

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 593560] Re: Slow performance for many operations on the gcc code import branches

Loïc Minier wrote:
> time bzr co 4.4 test1
> bzr co 4.4 test1 1484,59s user 69,65s system 93% cpu 27:38,51 total
>
> (bzr remove-tree)
>
> time bzr co --lightweight --lsprof-file colw-nosourcewt.lsprof test1 test2
> Profile data written to "colw-nosourcewt.lsprof".
> bzr co --lightweight --lsprof-file colw-nosourcewt.lsprof test1 test2 172,49s user 33,52s system 76% cpu 4:28,71 total
>

I'm having a hard time parsing your 'time' output, but it at least looks
like:

bzr co 4.4 test1 ... 27:38,51
bzr co --lightweight ... test1 test2 4:28,71

Which says it is about 6-7x faster to do the checkout without having a
source working tree... (And that is even while running --lsprof.)

Certainly it looks like we may want a way to tell bzr *not* to look at
source working trees when building a new WT.

John
=:->

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 593560] Re: Slow performance for many operations on the gcc code import branches

I think we should just remove accelerator tree support except when
hardlinking; it's almost certainly slower except in two particular
cases: where the source tree is cache hot or hardlinks are being used.

-Rob

Revision history for this message
Martin Pool (mbp) wrote :

On 16 June 2010 09:58, Robert Collins <email address hidden> wrote:
> I think we should just remove accelerator tree support except when
> hardlinking; it's almost certainly slower except in two particular
> cases: where the source tree is cache hot or hardlinks are being used.

That makes sense. Perhaps we could make it an option defaulting to
off so that we can still compare it in the future.

--
Martin

Revision history for this message
Andrew Bennetts (spiv) wrote :

For the benefit of people reading the comments on this bug, branch and checkout no longer use "accelerator" trees by default (as of lp:bzr r5358, which will be in bzr 2.3b1). See bug 607298.

So, judging by comment #18 of this bug, using bzr trunk should give noticeably better results.

Revision history for this message
Loïc Minier (lool) wrote :

Good news, thanks!

Revision history for this message
Andrew Bennetts (spiv) wrote :

The stat time in this bug report is weirdly high: on my laptop it takes under a second to stat the gcc-linaro 4.4 tree even with lp:bzr/2.1 and a bunch of plugins enabled. The lsprof data you provide suggests that it is spending a lot of time reading files, and is rewriting the dirstate, which implies the tree (or at least the stat metadata) has changed.

A look around suggests that ecryptfs does in fact have a bug that would cause this: bug 613873 "stat mtime on encrypted fs doesn't match between calls". I suppose bzr could offer an "only check mtimes to 1s resolution" option as a workaround for users in your situation?

(This in turn would contribute to making the bzr 2.1 behaviour of using the source tree as an "accelerator" for operations like checkout behave poorly, because that implies doing a status check first.)
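
(A quick way to check whether the filesystem reports stable mtimes, and
whether truncating them to 1s resolution would mask the problem; a minimal
sketch, and the path is just a placeholder:)

  import os, time

  def mtime_drift(path, checks=5, delay=0.2):
      # Stat the same, unchanged file several times and count how often the
      # reported mtime differs, both at full resolution and truncated to 1s.
      first = os.stat(path).st_mtime
      drift_full = drift_1s = 0
      for _ in range(checks):
          time.sleep(delay)
          now = os.stat(path).st_mtime
          if now != first:
              drift_full += 1
          if int(now) != int(first):
              drift_1s += 1
      return drift_full, drift_1s

  print(mtime_drift('gcc/version.c'))  # placeholder: any file in the tree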

Revision history for this message
Andrew Bennetts (spiv) wrote :

vila pointed out to me on IRC that we already truncate mtimes (and ctimes) to 1s resolution. (That reminded me that we introduced that behaviour to guard against exactly this sort of bug in the kernel.)

So that probably rules out bug 613873 as being involved. But something still smells fishy. Can you try running "bzr status" twice in a row, and tell us if the second time is still 25s? Basically for making sense of "bzr status" timings it helps to know how many files have changed since the last bzr operation, and whether the tree and repo would still be in the OS cache or not. It would be good to get a baseline value for hot cache + nothing changed, which should incur no physical disk IO.

As a data point, if I touch every file in my gcc-linaro tree (find . | xargs touch), 'bzr st' takes < 9s to run with a hot cache (it has to revalidate the file contents and rewrite the dirstate).

Revision history for this message
Loïc Minier (lool) wrote :

In comment #17 I made some tests on a lucid (bzr 2.1.1) install with a non-ecryptfs home dir and rotating storage.

I just tried again on this lucid install, which I think is a more common use case for working on this branch than the ecryptfs setup, and it took 1mn28s to "bzr status" the first time and indeed about 1s the second time.

After "for i in *; do find $i | xargs touch; done", bzr status takes 12s.

Revision history for this message
Loïc Minier (lool) wrote :

I tested again on ecryptfs on SSD: the second bzr status takes 6 seconds; after a touch on all files (which takes 2 seconds), bzr status takes 28s.

Revision history for this message
Loïc Minier (lool) wrote :

On the same system with SSD, but without ecryptfs, the second run of a bzr status takes 2 seconds and a run after touching all files takes 15s.

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
Changed in brz:
status: New → Triaged
importance: Undecided → Medium
tags: added: performance revert
removed: check-for-breezy
tags: removed: revert