bzr check is slow

Bug #834754 reported by Alexander Belchenko
Affects: Bazaar
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Running `bzr check` on the bzr.dev repository using bzr 2.4b5:

C:\work\Bazaar\bzr-2a\bzr.dev>timeit bzr check
Checking working tree at 'C:/work/Bazaar/bzr-2a/bzr.dev'.
Checking branch at 'file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/2.2/'.
Checking branch at 'file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/2.3/'.
Checking branch at 'file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/2.3/bugfix-660174/'.
Checking branch at 'file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/trunk/'.
Checking repository at 'file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/'.
checked repository file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/ format RepositoryFormat2a()
 37785 revisions
  1950 file-ids
     2 ghost revisions
     1 inconsistent parents
checked branch file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/2.3/bugfix-660174/ format Branch format 7
checked branch file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/trunk/ format Branch format 7
checked branch file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/2.3/ format Branch format 7
checked branch file:///C:/work/Bazaar/bzr-2a/bzr.dev/.bzr/branches/2.2/ format Branch format 7

time: 12850.577

It took 3 hours and 35 minutes to finish.

It spent almost 3 hours in the phase:
checking file graphs:text-index:Finding text references xxx/37785
During that phase bzr read about 6.5 GB of data from disk, while the repo itself has only 58 MB in packs and 13 MB in indices.

Then bzr spent about 35 minutes in the phase:
checking file graphs:text-index:Calculating text parents xxx/105693
During that phase bzr read about 80 MB of data from disk.
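
The figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the `timeit` value is in seconds and that 1 GB is taken as 1000 MB, neither of which the report states explicitly.

```python
# Figures taken from the report; units are assumptions (seconds, 1 GB = 1000 MB).
elapsed_seconds = 12850.577
hours, remainder = divmod(elapsed_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

repo_mb = 58 + 13          # packs + indices
read_mb = 6.5 * 1000       # data read during "Finding text references"
amplification = read_mb / repo_mb

print(f"{int(hours)} h {int(minutes)} min {seconds:.1f} s")
print(f"read amplification: about {amplification:.0f}x the on-disk repository size")
```

Under those assumptions the first phase reads the equivalent of the whole repository many times over, which suggests repeated reads of the same data rather than a single pass.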

I've filed this bug report because of the harsh feedback from http://lists.sourcegear.com/pipermail/veracity-users/2011-August/000276.html

I would agree with the point that if bzr check is very slow then people won't use it on their own repositories.

Maybe by default `bzr check` should work much faster and maybe only check the MD5 sum of pack files (Bug #676014), checksum of index files and pack-names database. I hope btree-index files (index files and pack-names) have some checksum inside(?). If the first reason to run check regularly is to have faster alert about filesystem inconsistency, then maybe such fast check should be enough for the pitfalls explained in Adrian's mail? It won't help against malicious attack, of course, but at least common problems with filesystems, including network shares could be checked much faster and easier?
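
A minimal sketch of what such a fast check could look like for the pack files alone. It assumes that each pack file is named after the MD5 hex digest of its content (`<md5>.pack`); the function names `md5_of_file` and `fast_check` are hypothetical and are not part of bzrlib. Index files and pack-names are not covered here, since the report leaves open whether they carry a checksum.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fast_check(pack_dir):
    """Return the names of *.pack files whose content MD5 differs from their name.

    Assumption: a pack file is named '<md5 of content>.pack'.
    """
    mismatched = []
    for pack in sorted(Path(pack_dir).glob("*.pack")):
        if md5_of_file(pack) != pack.stem:
            mismatched.append(pack.name)
    return mismatched
```

Such a check reads every pack byte exactly once, so its cost grows with the repository size on disk rather than with the number of revisions and files.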

Jelmer Vernooij (jelmer) wrote : Re: [Bug 834754] [NEW] bzr check is slow

On 26/08/11 16:11, Alexander Belchenko wrote:
> Maybe by default `bzr check` should work much faster and maybe only
> check the MD5 sum of pack files (Bug #676014), checksum of index files
> and pack-names database. I hope btree-index files (index files and pack-
> names) have some checksum inside(?). If the first reason to run check
> regularly is to have faster alert about filesystem inconsistency, then
> maybe such fast check should be enough for the pitfalls explained in
> Adrian's mail? It won't help against malicious attack, of course, but at
> least common problems with filesystems, including network shares could
> be checked much faster and easier?
I think limiting what is checked by default would be a reasonable thing
to do. We can then have a --full option or something like that that
checks more.

Cheers,

Jelmer

John A Meinel (jameinel) wrote :

On 8/26/2011 5:14 PM, Jelmer Vernooij wrote:
> On 26/08/11 16:11, Alexander Belchenko wrote:
>> Maybe by default `bzr check` should work much faster and maybe
>> only check the MD5 sum of pack files (Bug #676014), checksum of
>> index files and pack-names database. I hope btree-index files
>> (index files and pack- names) have some checksum inside(?). If
>> the first reason to run check regularly is to have faster alert
>> about filesystem inconsistency, then maybe such fast check should
>> be enough for the pitfalls explained in Adrian's mail? It won't
>> help against malicious attack, of course, but at least common
>> problems with filesystems, including network shares could be
>> checked much faster and easier?
> I think limiting what is checked by default would be a reasonable
> thing to do. We can then have a --full option or something like
> that that checks more.
>
> Cheers,
>
> Jelmer
>

I'm pretty sure this is a duplicate, but I'm not sure the original bug #.

John
=:->


Jelmer Vernooij (jelmer) wrote :

bug 425463 is related, at least.

Alexander Belchenko (bialix) wrote :

27.08.2011 15:34, John A Meinel wrote:
> I'm pretty sure this is a duplicate, but I'm not sure the original bug
> #.

I haven't found a direct duplicate while searching by the tag "check".

John A Meinel (jameinel) wrote :

The bulk of the time is spent checking that the file-id graphs match what the inventories say they should be. I think the current code does it file-by-file, which means that we re-read each inventory for each revision that touched the file. An alternative way to do that check would be to go inventory-by-inventory and generate all file graphs concurrently.
Anyway, it would certainly be possible to just *not* do that check, or to have a "bzr check --fast" vs "bzr check --full".
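
The difference between the two strategies can be illustrated with a toy model. This is a sketch only: `INVENTORIES`, `load_inventory`, and both `histories_*` functions are hypothetical stand-ins, not bzrlib code, and the "graph" is reduced to a per-file list of revisions. The point is just to count how often an inventory is read under each approach.

```python
from collections import defaultdict

# Toy model (hypothetical): revision id -> {file id: revision that last changed it}.
INVENTORIES = {
    "r1": {"a": "r1", "b": "r1"},
    "r2": {"a": "r2", "b": "r1"},
    "r3": {"a": "r3", "b": "r3", "c": "r3"},
}


def load_inventory(rev_id, counter):
    """Stand-in for an expensive inventory read; counts every call."""
    counter["reads"] += 1
    return INVENTORIES[rev_id]


def touched_index():
    """file id -> revisions that changed it (stands in for the text index)."""
    index = defaultdict(list)
    for rev_id, inventory in INVENTORIES.items():
        for file_id, last_changed in inventory.items():
            if last_changed == rev_id:
                index[file_id].append(rev_id)
    return index


def histories_file_by_file(counter):
    """Re-read an inventory for every (file, revision) pair."""
    histories = {}
    for file_id, rev_ids in touched_index().items():
        histories[file_id] = []
        for rev_id in rev_ids:
            inventory = load_inventory(rev_id, counter)
            histories[file_id].append(inventory[file_id])
    return histories


def histories_inventory_by_inventory(counter):
    """Read each inventory once and extend all file histories together."""
    histories = defaultdict(list)
    for rev_id in INVENTORIES:
        inventory = load_inventory(rev_id, counter)
        for file_id, last_changed in inventory.items():
            if last_changed == rev_id:
                histories[file_id].append(last_changed)
    return dict(histories)


if __name__ == "__main__":
    per_file, per_inventory = {"reads": 0}, {"reads": 0}
    assert histories_file_by_file(per_file) == histories_inventory_by_inventory(per_inventory)
    print(per_file["reads"], "reads file-by-file,", per_inventory["reads"], "inventory-by-inventory")
```

In this model the file-by-file walk performs one read per changed file per revision, while the inventory-by-inventory walk performs one read per revision, at the cost of keeping all per-file histories in memory at once.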

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed

Matthew Fuller (fullermd) wrote :

> I've filed this bug report because of the harsh feedback from
> http://lists.sourcegear.com/pipermail/veracity-users/2011-August/000276.html

The question is, at a high level, are we way slower due to checking
more, or just checking way less efficiently?

Picking a git repo I have locally, 'git fsck' takes under a second.
Well, almost two seconds if I run it --verbose so I can see all the
things it's checking. bzr-git it over into a bzr branch, and 'check'
takes a bit over 28 seconds (it's also something over 40% bigger
packed up).

To take repacking as a comparison, we're only ~2x slower; git takes a
bit under 2 seconds, we take just under 4.

( source repo: git://github.com/miyagawa/Plack ~4 meg in git )

Jelmer Vernooij (jelmer) wrote :

On 08/31/2011 12:00 PM, Matthew Fuller wrote:
>> I've filed this bug report because of the harsh feedback from
>> http://lists.sourcegear.com/pipermail/veracity-users/2011-August/000276.html
> The question is, at a high level, are we way slower due to checking
> more, or just checking way less efficiently?
>
> Picking a git repo I have locally, 'git fsck' takes under a second.
> Well, almost two seconds if I run it --verbose so I can see all the
> things it's checking. bzr-git it over into a bzr branch, and 'check'
> takes a bit over 28 seconds (it's also something over 40% bigger
> packed up).
>
> To take repacking as a comparison, we're only ~2x slower; git takes a
> bit under 2 seconds, we take just under 4.
>
>
> ( source repo: git://github.com/miyagawa/Plack ~4 meg in git )
>
"bzr check" is the rough equivalent of "git fsck --full --verbose", how
long does that take?

Cheers,

Jelmer

Matthew Fuller (fullermd) wrote :

> "bzr check" is the rough equivalent of "git fsck --full --verbose",
> how long does that take?

--full is the default. It takes .56 seconds without --verbose, 1.59
with (the extra being just to scroll 22,295 lines in xterm).

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy