status is slow at printing newly added files

Bug #758164 reported by Martin Pool
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Medium
Unassigned

Bug Description

sabdfl reports:

---
cd emacs
bzr init
bzr add
time bzr status
time bzr status
time bzr commit
time bzr status
time bzr status

In my case, I see a marked difference between the first two "status"
times, when there are a lot of added but uncommitted files, and the
second two "status" times, when the files are all committed.
---

Revision history for this message
Martin Pool (mbp) wrote :

I can't reproduce this at the moment: on a machine with magnetic disks and with the caches flushed I get

[1] mbp@grace% time bzr init && time bzr add -q && time bzr st |wc && time bzr st |wc
Created a repository tree (format: 2a)
Using shared repository: /home/mbp/testdata/launchpad/
bzr init 0.45s user 0.12s system 14% cpu 3.908 total
bzr add -q 1.17s user 0.18s system 61% cpu 2.180 total
   6891 6891 367471
bzr st 0.40s user 0.05s system 83% cpu 0.536 total
wc 0.02s user 0.01s system 5% cpu 0.536 total
   6891 6891 367471
bzr st 0.37s user 0.05s system 97% cpu 0.429 total
wc 0.01s user 0.02s system 6% cpu 0.429 total

strace shows us chdiring into subdirectories, reading the directory and stating the files and that's it.

(Possibly we could just use the metadata returned by getdents and not even stat them, but that shouldn't cause a drastic change in time, and istr we discussed this and found there was something we needed missing from that.)

Changed in bzr:
assignee: canonical-bazaar (canonical-bazaar) → Martin Pool (mbp)
status: Confirmed → Incomplete
Revision history for this message
Martin Pool (mbp) wrote :

Actually, the previous comment is inaccurate: I don't reproduce the previous problem of the very first status being slow, but it is definitely slower with uncommitted adds than after commit.

Although the first two status runs are not doing more IO, they are quite noticeably slower, and the difference seems to be mostly in user time

bzr init 0.15s user 0.01s system 97% cpu 0.165 total
bzr add -q 0.73s user 0.08s system 98% cpu 0.822 total
bzr st > /dev/null 0.37s user 0.04s system 99% cpu 0.414 total
bzr st > /dev/null 0.34s user 0.05s system 98% cpu 0.396 total
bzr ci -m 'foo' -q 13.21s user 1.37s system 17% cpu 1:22.00 total
bzr st 0.16s user 0.05s system 96% cpu 0.217 total
bzr st 0.18s user 0.01s system 97% cpu 0.195 total

Martin Pool (mbp)
Changed in bzr:
status: Incomplete → Confirmed
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 758164] Re: first status after add is slow

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 04/12/2011 01:13 AM, Martin Pool wrote:
> ** Changed in: bzr
> Status: Incomplete => Confirmed
>

Wouldn't we be *printing* all of those uncommitted additions. That seems
like a fair amount of time to me.

Note that there are a *lot* of things we do for files that show up
different versus the base. For example, we decode the UTF-8 path name
into a Unicode string. We take a lot of care to try to filter things
that are unchanged as quickly as possible. We don't take as much care to
do a minimum amount of work once we've determined an object is
interesting (different from basis). Mostly because there is usually only
1 commit that has anything like 90% of the files interesting, and the
other 100,000 commits have at most 10% of them interesting.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2kHDoACgkQJdeBCYSNAAPxFwCgoe6xU46/BBt8jopcstJx2xaY
Jf8AoLsmu/KwW2r3zfLPoCawAK67SqBW
=BcF+
-----END PGP SIGNATURE-----

Revision history for this message
Martin Pool (mbp) wrote : Re: first status after add is slow

I think you're right that this is basically the work of printing them, and just speeding up the general changes->output would help.

Revision history for this message
Martin Pool (mbp) wrote :

random as-yet-uninvestigated idea: some of the add and/or status code goes through the python logging layers, which may be substantially slower than other possibilities like writing them straight through the ui object to stdout.

Martin Pool (mbp)
summary: - first status after add is slow
+ status is slow at printing newly added files
Revision history for this message
Martin Pool (mbp) wrote :

I profiled this just to see if there was anything very obviously wrong.

status is using the older TreeDelta class in bzrlib.delta and may possibly be able to be faster by using something closer to iter_changes.

It also spends something like 33% of its time in report_delta printing it, even when sending the output to /dev/null, about 17% + 8% in bzrlib.ui, and about 11% of its time in the write codec.

Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 758164] Re: status is slow at printing newly added files

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 4/18/2011 3:40 AM, Martin Pool wrote:
> I profiled this just to see if there was anything very obviously wrong.
>
> status is using the older TreeDelta class in bzrlib.delta and may
> possibly be able to be faster by using something closer to iter_changes.
>
> It also spends something like 33% of its time in report_delta printing
> it, even when sending the output to /dev/null, about 17% + 8% in
> bzrlib.ui, and about 11% of its time in the write codec.

TreeDelta now just wraps iter_changes() and aggregates the results. So
while it adds a bit of sorted() time, it shouldn't be slower at
computing the actual diff.

I think the primary slowdown is just printing.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk2sAL8ACgkQJdeBCYSNAAOKHgCghk1PA/UWayYY6Nkq23GwB9+u
Zz4AoIptPPx4KpDpfR99AUE88zmyEeKp
=A04U
-----END PGP SIGNATURE-----

Martin Pool (mbp)
Changed in bzr:
assignee: Martin Pool (mbp) → nobody
importance: High → Medium
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.