bzr-search slows down "bzr up"

Bug #512216 reported by Eli Zaretskii
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
bzr search plugin
Incomplete
Undecided
Unassigned

Bug Description

Installing bzr-search makes "bzr update" take a very-very-VERY long time due to updating the index. Below are two excerpts from .bzr.log. As you see, the indexing took 500 seconds in the first case, and 1500 seconds in the second, for just a handful of files. My machine is a 3GHz Windows XP box, and I'm running Bazaar 2.0.3 installed with the standalone Windows installer downloaded from the Bazaar Downloads page.

If this is the normal speed of updating the index, then I'd like to arrange for the index updates to be run by a scheduled job some time during the wee hours. Is there any way (except by hacking the sources) to tell the bzr-search not to run the update automatically whenever the branch is changed?

112.547 fetch up to rev {<email address hidden>}
[ 3652] 2010-01-24 22:42:15.299 INFO: M ChangeLog
[ 3652] 2010-01-24 22:42:15.299 INFO: M admin/notes/bugtracker
[ 3652] 2010-01-24 22:42:15.299 INFO: M configure
[ 3652] 2010-01-24 22:42:15.299 INFO: M configure.in
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/ChangeLog
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/emacs-lisp/advice.el
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/emacs-lisp/assoc.el
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/net/tramp-smb.el
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/progmodes/ada-mode.el
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/url/ChangeLog
[ 3652] 2010-01-24 22:42:15.299 INFO: M lisp/url/url-util.el
[ 3652] 2010-01-24 22:42:15.299 INFO: M src/ChangeLog
[ 3652] 2010-01-24 22:42:15.299 INFO: M src/filelock.c
[ 3652] 2010-01-24 22:42:17.049 INFO: All changes applied successfully.
[ 3652] 2010-01-24 22:42:17.220 INFO: Updated to revision 99409.
4154.859 return code 0

20.031 fetch up to rev {<email address hidden>}
[ 228] 2010-01-25 09:45:30.204 INFO: M .bzrignore
[ 228] 2010-01-25 09:45:30.204 INFO: M admin/notes/bugtracker
[ 228] 2010-01-25 09:45:30.204 INFO: M doc/emacs/ChangeLog
[ 228] 2010-01-25 09:45:30.204 INFO: M doc/emacs/programs.texi
[ 228] 2010-01-25 09:45:30.204 INFO: M doc/misc/ChangeLog
[ 228] 2010-01-25 09:45:30.204 INFO: M doc/misc/cc-mode.texi
[ 228] 2010-01-25 09:45:30.204 INFO: M etc/NEWS
[ 228] 2010-01-25 09:45:30.204 INFO: M lisp/ChangeLog
[ 228] 2010-01-25 09:45:30.204 INFO: M lisp/progmodes/python.el
[ 228] 2010-01-25 09:45:30.204 INFO: M lisp/term/xterm.el
[ 228] 2010-01-25 09:45:30.204 INFO: M lisp/vc-bzr.el
[ 228] 2010-01-25 09:45:30.204 INFO: M lisp/vc-git.el
[ 228] 2010-01-25 09:45:30.204 INFO: M src/ChangeLog
[ 228] 2010-01-25 09:45:30.204 INFO: M src/image.c
[ 228] 2010-01-25 09:45:30.204 INFO: M src/keymap.c
[ 228] 2010-01-25 09:45:31.845 INFO: All changes applied successfully.
[ 228] 2010-01-25 09:45:32.002 INFO: Updated to revision 99382.
1774.984 return code 0

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 512216] [NEW] bzr-search slows down "bzr up"

Could you get a lsprof file for this?

e.g.

$ bzr --lsprof-file foo.callgrind up

and attach the callgrind file - this will let me see where the
performance hit is coming in.

-Rob

Revision history for this message
Eli Zaretskii (eliz) wrote :

Attached below. My net connection dropped during that command (would you believe that?), and bzr errored out, but the index update was working for about 20 minutes up until that moment, so I hope the callgrind file has enough information for you. FWIW, most of the time is spent in this phase:

  Indexing...:Indexing commits 2/4

One of the cores is almost 100% busy during all this time, and there's also significant (but not overwhelming) disk I/O.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 512216] Re: bzr-search slows down "bzr up"

Thanks.

Some quick results:
the search.index.combine function is 91% of the time.
it calls:
- copy_posting_lists once (90%)
  which calls:
  - iter_all_entries 1.3M times (20%)
  - add_index_to_pack 155K times (63%)
    which calls:
    - finish 153K times (59%)
      which calls
      - write nodes 153K times (56%)
        which calls NamedTemporaryFile 152K times (21% here and 22% from
        init below)
      - btree init 153K times (22%)

NamedTemporaryFile calls _mkstemp_inner 300K times as a result of this
call pattern (42%)

and _mkstemp_inner calls nt.open the same number of times (38%)

So a big chunk of the time (40%) is simply temporary file management on
your platform.

We may be able to help this by changing bzrlib a bit. Can you perhaps
run your %TEMPDIR% on a ram disk.

-Rob

Revision history for this message
Robert Collins (lifeless) wrote :

John Meinel points out that you are probably using 2.0. 2.1 has an
optimisation to make writing BTree indices out faster when they are
small.

-Rob

Revision history for this message
Eli Zaretskii (eliz) wrote :

Yes, this is Bazaar 2.0.3, as mentioned in the original report. I will try with 2.1 when it is released. But since you say the optimization is for small indices, I wonder if it will help with Emacs (we have 99K revisions in the repository). Anyway, I promise to try that and get back with a new profile data.

Regarding the profile, a few comments:
1. I don't have a RAM disk installed. I never needed it for Windows, and finding and installing one sounds like a PITA (after googling a bit). Also, how much RAM would I need to accomodate the temporary files? The *.pack files in the directory created by bzr-search (.bzr/bzr-search/indices) are between 9 and 45MB each and together take up 440MB, if that matters.
2. Even if I install a RAM disk, and let's optimistically assume that it slashes all of the 40% precent of the indexing time -- I'm still left with 10 minutes for updating a tree with half a dozen files.
3. I actually very much doubt that a significant amount of the 40% will get slashed by using a RAM disk. That's because heavy disk I/O usually manifests itself by CPU load going down to 50% or 70%, because the CPU is waiting for the disk. Also, the disk LED is constantly on under heavy disk traffic. By contrast, all the 20 min it took to update the index in my case, the CPU meter was steady at 90%-100%, and the disk LED was flickering. Which means most of the I/O was with the disk cache anyway, and the speedup with RAM disk probably won't be spectacular.

Again, I don't mind the long indexing time, I just don't want it to slow down my merges from mainline. So I'd very much appreciate an option to set in my bazaar.conf to disable the automatic indexing that happens through hooking the branch changes. Then I could schedule the index updates to run when I'm not working on this machine. I don't mind the indedx being slightly behind the tree, since the power of indexed search is normally not needed for recent changes that are still in my fading memory.

Thanks.

Revision history for this message
John A Meinel (jameinel) wrote :

1) 2.1.0rc1 is 'released' and you can try it already (and all of the bX releases are also perfectly stable for running)
2) bzr-search generates lots-and-lots of tiny indexes based on various rules. Robert showed me a callgrind which had 150k indexes. If your total search index size is < 585MB then on average all of those indexes are going to be smaller than a page, which is addressed by the 2.1 change.

I can see where you may want to turn of auto-indexing, which can certainly be a feature request, but I think you'll find it shouldn't cause the same level of interruption in 2.1.

I suppose the other factor is that bzr-search is deciding that it needs to 'compact' its indices (combine existing ones into a whole new set). You may want to disable that specifically instead.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hi Eli,

Any luck with this on 2.1?

Changed in bzr-search:
status: New → Incomplete
Revision history for this message
Eli Zaretskii (eliz) wrote :

Yes, I did try with 2.1 (and all versions after that, up to 2.3.3 that I'm currently using). The "index" command got quite a lot faster, but it is still too slow to have it part of each "update". I run a job once a week to update the index, and it takes anywhere between 2.5 min and 5 min, depending on how much activity went on in Emacs development during the previous week.

To be acceptable for running as part of "update", the indexing should take a few seconds at most.

Revision history for this message
Robert Collins (lifeless) wrote :

On Mon, Aug 15, 2011 at 7:46 AM, Eli Zaretskii <email address hidden> wrote:
> Yes, I did try with 2.1 (and all versions after that, up to 2.3.3 that
> I'm currently using).  The "index" command got quite a lot faster, but
> it is still too slow to have it part of each "update".  I run a job once
> a week to update the index, and it takes anywhere between 2.5 min and 5
> min, depending on how much activity went on in Emacs development during
> the previous week.
>
> To be acceptable for running as part of "update", the indexing should
> take a few seconds at most.

Hmm, it should be that fast :) - so, if it runs once a week, it has to
index all the weeks worth of changes.

Do you know how many revisions its updating when it takes 5 minutes?
Thats 300 seconds, so if we said '3 seconds' as a limit, 100 revisions
would be at that scale if you were pulling all the time, 50 revisions
would say we need (at minimum) a 50% reduction in time-to-index and so
forth.

Revision history for this message
Eli Zaretskii (eliz) wrote :

For example, today's weekly run took 394 seconds for 36 revisions. The one before that took 160 seconds for 48 revisions. Yet another one took 197 seconds for 134 revisions. I can show more statistics if that helps.

Also, there's some obvious overhead involved, because I just ran "bzr index" after committing a single revision since the last "index", and it took 181 seconds! Running "bzr index" again takes 0.7 seconds to discover that it has nothing to do, so the overhead is not related to the bzr startup time.

Perhaps you could add some logging to the "index" command to show where it spends most of the time, then I could report back in a few weeks what I see in the log.

Thanks.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.