code like find_branches tries to open every file as bzrdir

Bug #781068 reported by Alexander Belchenko
This bug report is a duplicate of:  Bug #197597: branches command slow. Edit Remove
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Medium
Unassigned

Bug Description

(Initially posted to bzr ML but got no answer/feedback on this.)

I think this is genuine bug in bzrdir search algorithm or maybe in
find_branches code, but: the latter tries to open every directory and
*every* plain file as bzrdir object. I suppose the original intent was
open_bzrdir always search directories upwards, but find_branches have
to search downwards but it never take care about skipping plain files.

This can be seen with bzr-svn plugin installed because it reports
about every failed attempt to open location as svn tree, see for
example attachment for recent bug
https://bugs.launchpad.net/bugs/778318/+attachment/2116045/+files/.bzr.log

Is it really bug as I think or bzrdir name does not implies it should
be dir only and should support files too? Do you think it should skip
files and check directories only? Dp ypu think current behavior have
some performance penalties because it doing too much useless work?

Tags: performance
Revision history for this message
Jelmer Vernooij (jelmer) wrote : Re: [Bug 781068] [NEW] code like find_branches tries to open every file as bzrdir

On Wed, 2011-05-11 at 10:55 +0000, Alexander Belchenko wrote:
> Public bug reported:
>
> (Initially posted to bzr ML but got no answer/feedback on this.)
>
> I think this is genuine bug in bzrdir search algorithm or maybe in
> find_branches code, but: the latter tries to open every directory and
> *every* plain file as bzrdir object. I suppose the original intent was
> open_bzrdir always search directories upwards, but find_branches have
> to search downwards but it never take care about skipping plain files.
>
> This can be seen with bzr-svn plugin installed because it reports
> about every failed attempt to open location as svn tree, see for
> example attachment for recent bug
> https://bugs.launchpad.net/bugs/778318/+attachment/2116045/+files/.bzr.log
>
> Is it really bug as I think or bzrdir name does not implies it should
> be dir only and should support files too? Do you think it should skip
> files and check directories only? Dp ypu think current behavior have
> some performance penalties because it doing too much useless work?
I think the find_branches() interface in general needs to be fixed - see
also bug 672016.

Cheers,

Jelmer

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Please note that there isn't anything in bzr core that actually uses find_branches() as far as I know.

I think we should just redo it.

Changed in bzr:
status: New → Confirmed
importance: Undecided → Medium
importance: Medium → Low
Revision history for this message
John A Meinel (jameinel) wrote :

This is ~ok on a remote location, where determining if a path from list_dir() is a directory or a file is as expensive as just trying to open it. (at least in theory the one is a stat round-trip, the other is an open(.bzr/branch-format) round-trip).

In practice, it could be quite a bit more expensive.

Even further, locally it is extra expensive. The CPU overhead of Branch.open probably dominates. (arguably Branch.open() is way to slow already, but this exacerbates the problem.)

Changed in bzr:
importance: Low → Medium
tags: added: performance
Revision history for this message
Matthew Fuller (fullermd) wrote :

(n.b.: I don't think, as bug 197597 is described, this can really be considered a dupe...)

There IS in fact something in core that uses find_branches: check does. And grep says that upgrade does too.

And this behavior makes that incredibly slow. Consider this case, with every bit of data it touches already in a warm cache:

% time dbzr branches . >> /dev/null
4.634u 1.166s 0:05.81 99.6% 1545+1382k 0+0io 0pf+0w

6 seconds! There are 7 branches there (actually, 6, because 1 of the 'branches' is just a symlink to another). Almost a second per branch! A full 'check' of this repo with ~2500 revs and all the branches takes 28 seconds, so more than a quarter of the time is just finding the branches.

If I ktrace 'branches', and see what files it looks at, how many times does it look for a branch-format file?

% kdump < ktrace.out | grep NAMI | grep branch-format | wc -l
   17897

Whaaaat? Well, not only does it check for it under every file in the WT's, but it also hops through an [unversioned and ignored] symlink that points into another big tree elsewhere in the filesystem, that isn't even versioned, much less relevant along with branches. I guess it's a good thing only ONE branch has a symlink off to another big tree (and that it's not the branch that's linked to by the symlink 'branch'), or it would go through all those files again a few more times.

At the least, it shouldn't follow through symlinks. They can't be relevant to find_branches anyway (and I suspect can lead to highly non-POLA behavior). This also causes a related blowup in another case I've had where several levels of out-of-tree symlink following down leads to a self-ref that eventually blows up with "Too many levels of symbolic links" (11 seconds into 'branches' in a dir with 4 branches in it). That would limit the pain to tree size * numbers of trees (which isn't that low a limit, really, but it's a step).

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.