Comment 4 for bug 781068

Matthew Fuller (fullermd) wrote:

(n.b.: As bug 197597 is described, I don't think this can really be considered a dupe of it...)

There IS in fact something in core that uses find_branches: check does. And grep says that upgrade does too.

And this behavior makes that incredibly slow. Consider this case, with every bit of data it touches already in a warm cache:

% time dbzr branches . >> /dev/null
4.634u 1.166s 0:05.81 99.6% 1545+1382k 0+0io 0pf+0w

6 seconds! There are 7 branches there (actually 6, since one of the 'branches' is just a symlink to another). Almost a second per branch! A full 'check' of this repo, with ~2500 revs and all the branches, takes 28 seconds, so more than a fifth of the time is spent just finding the branches.

If I ktrace 'branches' and see what files it looks at, how many times does it look for a branch-format file?

% kdump < ktrace.out | grep NAMI | grep branch-format | wc -l
   17897

Whaaaat? Well, not only does it check for it under every file in the WTs, but it also hops through an [unversioned and ignored] symlink that points into another big tree elsewhere in the filesystem, one that isn't even versioned, much less relevant to finding branches. I guess it's a good thing only ONE branch has a symlink off to another big tree (and that it isn't the branch the symlinked 'branch' points to), or it would go through all those files again a few more times.
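To make the cost pattern concrete, here is a simplified Python model of a search that probes for a branch-format file under every entry it sees, files included, and recurses into directories. It is a sketch under stated assumptions, not bzrlib's actual find_branches: treating ".bzr/branch-format" as the branch marker is an assumption, and probe_walk, MARKER and demo are hypothetical names. The demo tree has one real branch, a symlinked 'branch' pointing at it, and a symlink from inside the branch out to an unrelated tree.

```python
import os
import tempfile

# Assumption: a branch is recognised by a ".bzr/branch-format" file in its
# directory. This is a simplified model, not bzrlib's real find_branches.
MARKER = os.path.join(".bzr", "branch-format")


def probe_walk(root, follow_symlinks=True):
    """Probe for MARKER under every entry (file or directory) below root.

    Returns (sorted branch paths, number of probe lookups made).
    """
    branches = []
    probes = 0
    stack = [root]
    while stack:
        current = stack.pop()
        if not follow_symlinks and os.path.islink(current):
            continue  # skip symlinks entirely: no probe, no descent
        probes += 1  # one lookup of <current>/.bzr/branch-format
        if os.path.isfile(os.path.join(current, MARKER)):
            branches.append(current)
        if not os.path.isdir(current):
            continue  # a plain file: the probe above was wasted
        for name in sorted(os.listdir(current)):
            if name != ".bzr":
                stack.append(os.path.join(current, name))
    return sorted(branches), probes


def demo():
    """One branch ('trunk'), a symlink 'alias' pointing at it, and a symlink
    inside trunk pointing out to an unrelated 5-dir, 20-file tree."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "repo")
        trunk = os.path.join(root, "trunk")
        external = os.path.join(tmp, "external")
        os.makedirs(os.path.join(trunk, ".bzr"))
        open(os.path.join(trunk, MARKER), "w").close()
        for i in range(5):
            sub = os.path.join(external, "d%d" % i)
            os.makedirs(sub)
            for j in range(4):
                open(os.path.join(sub, "f%d" % j), "w").close()
        os.symlink(external, os.path.join(trunk, "ext"))
        os.symlink(trunk, os.path.join(root, "alias"))

        def relative(result):
            paths, probes = result
            return [os.path.relpath(p, root) for p in paths], probes

        return (relative(probe_walk(root, follow_symlinks=True)),
                relative(probe_walk(root, follow_symlinks=False)))


if __name__ == "__main__":
    followed, not_followed = demo()
    print("following symlinks:", followed)
    print("not following:", not_followed)
```

In this model, following the links reports the symlinked 'alias' as a second branch (the 7-vs-6 miscount) and makes 55 probes, while skipping symlinks finds only 'trunk' with 2 probes. The 53 extra lookups all land in symlinked paths that have nothing to do with branches.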

At the least, it shouldn't follow symlinks. They can't be relevant to find_branches anyway (and I suspect following them can lead to highly non-POLA behavior). The same behavior also causes a related blowup in another case I've hit, where following several levels of out-of-tree symlinks leads to a self-reference that eventually fails with "Too many levels of symbolic links" (11 seconds into 'branches', in a dir with 4 branches in it). Not following symlinks would limit the pain to tree size * number of trees (which isn't that low a limit, really, but it's a step).
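A minimal sketch of the suggested fix, under the same simplified assumptions: the ".bzr/branch-format" marker is assumed, and find_branches_no_symlinks and demo are hypothetical names (this is not a patch against bzrlib). It relies on Python's os.walk, whose default followlinks=False still lists symlinked directories in dirnames but never descends into them, so both an out-of-tree symlink and a link back to an ancestor cost only their appearance in the parent's listing.

```python
import os
import tempfile

# Assumption: a branch is a directory containing ".bzr/branch-format".
MARKER = os.path.join(".bzr", "branch-format")


def find_branches_no_symlinks(root):
    """Return paths (relative to root) of directories containing MARKER.

    Never walks through a symlink, so symlink loops cannot recurse and
    symlinked 'branches' are not reported a second time.
    """
    found = []
    for dirpath, dirnames, _filenames in os.walk(root, followlinks=False):
        if os.path.isfile(os.path.join(dirpath, MARKER)):
            found.append(os.path.relpath(dirpath, root))
        # Top-down walk: pruning dirnames stops descent into control dirs.
        if ".bzr" in dirnames:
            dirnames.remove(".bzr")
    return sorted(found)


def demo():
    """Two real branches, a symlinked 'alias' branch, and a self-ref loop."""
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "repo")
        for name in ("trunk", "feature"):
            os.makedirs(os.path.join(root, name, ".bzr"))
            open(os.path.join(root, name, MARKER), "w").close()
        # A link back to the repository root: a walker that followed it would
        # keep descending until path resolution fails (e.g. with ELOOP).
        os.symlink(root, os.path.join(root, "trunk", "loop"))
        # A 'branch' that is only a symlink to another branch.
        os.symlink(os.path.join(root, "trunk"), os.path.join(root, "alias"))
        return find_branches_no_symlinks(root)


if __name__ == "__main__":
    print(demo())
```

Here the result is just 'feature' and 'trunk': 'alias' is not counted as a third branch, and the 'loop' link is never entered. It still probes every real directory in the working trees, so this bounds the cost by tree size rather than removing it.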