fetching from pack to 2a format is slow

Bug #407834 reported by Michael B. Trausch
This bug affects 4 people
Affects: Bazaar
Status: Confirmed
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I am currently attempting (for the third time, though through no fault of bzr itself that I had to interrupt and re-start twice; I had a forced reboot due to video corruption the first time, and the second time, my child killed the process by closing the terminal) to clone the lp:mysql-server/6.0 branch. The first attempt ran for two or three hours before I had to reboot, and I don't have a time for the second attempt. I'm currently on my third attempt.

I'm not sure why it's taking so long. This does seem just more than a bit on the side of unreasonable, however.

Michael B. Trausch (mtrausch) wrote :

If it helps any, at no point during either the first or the current branch run has my connection been saturated with bzr data transfer, and bzr is using nearly 100% of a single core. (It seems, just from that informal observation, that the operation may be CPU-bound; perhaps there would be a way to make it use all available power to speed up the operation?)

Martin Pool (mbp) wrote :

Please give the precise command you're running, and 'bzr info' in the destination directory; please also re-run it with the -Dhpss flag added and post the trace from .bzr.log.

summary: - Branching MySQL takes *forever*
+ Branching MySQL too slow
Changed in bzr:
status: New → Incomplete
Michael B. Trausch (mtrausch) wrote : Re: Branching MySQL too slow

Command line (literally, just a straightforward branch):

 bzr branch lp:mysql-server/6.0 mysql-6.0

I am unsure what you mean, "bzr info" in the target directory, as the target directory isn't created until the branch begins. The branch is going to a shared repository, and the info for that is thus:

 Shared repository with trees (format: 2a)
 Location:
   shared repository: .

It had only been running for 45 minutes when I received the request for more info, so I've restarted it with the debug options, going to a clean .bzr.log file. I should have a result in... several hours. Well, maybe. I don't remember how many revisions there were to fetch when I started trying to branch 24 hours ago, but the count seems to go down with every failed run; it was 53014 just before I filed this bug, and now that I've restarted with debug options, it is 52514. That run took 41 minutes, 49 seconds (wall-clock time), with 32:28 user CPU time and 9 seconds system time.
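
As a rough check on the timings above, the ratio of CPU time to wall-clock time shows how much of the run was spent computing. The sketch below is not part of the original report: the helper name is made up for illustration, and the throughput estimate assumes that the drop from 53014 to 52514 revisions happened entirely within that 41:49 run, which the report does not confirm.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to plain seconds."""
    return minutes * 60 + seconds

wall = to_seconds(41, 49)   # wall-clock time of the interrupted run
user = to_seconds(32, 28)   # user CPU time
system = 9                  # system CPU time, in seconds

# Share of the elapsed time during which the process was on the CPU.
cpu_fraction = (user + system) / wall
print(f"CPU busy for {cpu_fraction:.0%} of the run")

# Assumption: all of these revisions were fetched during this one run.
revisions_done = 53014 - 52514
seconds_per_revision = wall / revisions_done
remaining_hours = 52514 * seconds_per_revision / 3600
print(f"Roughly {remaining_hours:.0f} hours left at that rate")
```

If the assumption holds, the estimate is in the same range as the multi-day conversions reported later in this thread, but it is only an extrapolation from one short run.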

Robert Collins (lifeless) wrote : Re: [Bug 407834] Re: Branching MySQL too slow

This is:
 - adding rich root parent detail
 - and converting the data compression format used

Furthermore, due to another bug (look at the 2.0 targeted bugs) the
conversion method is doing millions of round trips to Launchpad.

So - we know it's slow, and we're working on fixing it.

I suggest either getting mysql to mass upgrade, or not using a 2a
repository with mysql until we've gotten the upgrade bugs targeted
against 2.0 fixed.

Cheers,
Rob

Martin Pool (mbp) wrote :

Did you get a warning that the transfer would be slow? If not, you
probably should, and it's probably a bug that you don't.

Michael B. Trausch (mtrausch) wrote :

On Mon, 3 Aug 2009, Martin Pool wrote:

> Did you get a warning that the transfer would be slow? If not, you
> probably should, and it's probably a bug that you don't.

No, I didn't. A certainly would be useful.

  --- Mike


Michael B. Trausch (mtrausch) wrote :

On Mon, 3 Aug 2009, Michael B. Trausch wrote:

> On Mon, 3 Aug 2009, Martin Pool wrote:
>
>> Did you get a warning that the transfer would be slow? If not, you
>> probably should, and it's probably a bug that you don't.
>
> No, I didn't. A certainly would be useful.

I could have sworn I typed "A warning certainly would be useful." I think
it's time to stop sitting at the computer...

Martin Pool (mbp)
summary: - Branching MySQL too slow
+ fetching from pack to 2a format is slow
Martin Pool (mbp) wrote :

So is the essence of this bug just that there should be a warning, or should/can we do something to make it faster?

Robert Collins (lifeless) wrote : Re: [Bug 407834] Re: fetching from pack to 2a format is slow

I think this one is, at most, that a warning might be useful.

We don't warn for svn conversions though, and they are just about as
painful :)

-Rob

Michael B. Trausch (mtrausch) wrote :

On Tue, 4 Aug 2009, Martin Pool wrote:

> So is the essence of this bug just that there should be a warning, or
> should/can we do something to make it faster?

Hrm. That I don't know; I'm not intimately familiar with bzr's internals.
I *can* say that I didn't expect to not finish it on my system; it's not
exactly a low-end system. Well, it wasn't when I bought it:

mbt@zest:~/Projects/UNIX/OpenSource/AllTray/alltray$ cat /proc/cpuinfo|grep MH
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
cpu MHz : 2200.000
mbt@zest:~/Projects/UNIX/OpenSource/AllTray/alltray$ cat /proc/meminfo |grep MemTotal
MemTotal: 5868964 kB

I can note at least the following:

  * Branching it without landing it into a shared repository took only 30
minutes, which was fine given the large size of the project. Others might
complain about that. I believe that the Linux kernel is of comparable
size and depth, though I could be wrong. In any event, git seems to clone
that in around 10 minutes (and does saturate my connection in doing so;
branching MySQL never does).

  * When it was doing the conversion locally, it ran for several hours
(without completing; while that's not bzr's fault, I don't know if I'll
ever be able to complete it without replacing hardware or letting my
system stay in an unusable-at-the-console state for hours) using only one
core, and seemed to be CPU-bound as opposed to I/O bound.

  * The local conversion didn't appear to run (much) faster than
over-the-network, though I didn't formally measure that in any way. That
said, it wasn't network-bound in waiting over the network, and my drives
were barely active (this was done on an LVM logical volume that is striped
over two SATA 3.0 Gbps hard disks).

So, if the question is, "should it be faster," my (rather uneducated,
end-userish) answer is "yes, of course." But I am understanding of
technical limitations, and if the conversion process is happening as fast
as it can, then I can't really complain that much. See my post to the ML
for some other ideas; I am willing to collect data if it would help at
least give other users a very rough idea of how long operations like this
may take. At the absolute least, I _do_ think that bzr should be able to
see when a branch is very large and a conversion is going to happen, and
give _some_ indication that the user may be waiting for a (very) long
time, even on powerhouse systems.

  --- Mike

Michael B. Trausch (mtrausch) wrote :

On Tue, 4 Aug 2009, Robert Collins wrote:

> I think this one is, at most, that a warning might be useful.
>
> We don't warn for svn conversions though, and they are just about as
> painful :)

Not IME... All my Subversion branching and the like has been very painless
and never run longer than 90 minutes, at most (and if that), IIRC.

  --- Mike

Robert Collins (lifeless) wrote :

On Tue, 2009-08-04 at 00:44 +0000, Michael B. Trausch wrote:
>
>
> Hrm. That I don't know; I'm not intimately familiar with bzr's
> internals.
> I *can* say that I didn't expect to not finish it on my system; it's
> not
> exactly a low-end system. Well, it wasn't when I bought it:

There are other extant bugs about the performance of the operation you
attempted. It's not that it's not important; it's that this is overlapping
those bugs.

-Rob

Robert Collins (lifeless) wrote :

On Tue, 2009-08-04 at 00:55 +0000, Michael B. Trausch wrote:
> On Tue, 4 Aug 2009, Robert Collins wrote:
>
> > I think this one is, at most, that a warning might be useful.
> >
> > We don't warn for svn conversions though, and they are just about as
> > painful :)
>
> Not IME... All my Subversion branching and the like has been very painless
> and never run longer than 90 minutes, at most (and if that), IIRC.

And once we fix the performance issues you ran into because you were
doing the conversion over the network, I expect pack-0.92->2a to be
similar in time to a svn conversion.

-Rob

Michael B. Trausch (mtrausch) wrote :

Wouldn't that mean that a local, non-network conversion would be quicker than a network conversion?

Robert Collins (lifeless) wrote :

On Tue, 2009-08-04 at 02:24 +0000, Michael B. Trausch wrote:
> Wouldn't that mean that a local, non-network conversion would be quicker
> than a network conversion?

Yes ;)

-Rob

Michael B. Trausch (mtrausch) wrote :

On Tue, 4 Aug 2009, Robert Collins wrote:

> On Tue, 2009-08-04 at 02:24 +0000, Michael B. Trausch wrote:
>> Wouldn't that mean that a local, non-network conversion would be quicker
>> than a network conversion?
>
> Yes ;)

I'll have to try it again, but as I noted earlier, the speed (at least as
per the indicators that bzr gives) was roughly the same even going from a
local branch to a new 2a branch.

  --- Mike

Martin Pool (mbp) wrote :

2009/8/4 Michael B. Trausch <email address hidden>:
> On Tue, 4 Aug 2009, Robert Collins wrote:
>
>> On Tue, 2009-08-04 at 02:24 +0000, Michael B. Trausch wrote:
>>> Wouldn't that mean that a local, non-network conversion would be quicker
>>> than a network conversion?
>>
>> Yes ;)
>
> I'll have to try it again, but as I noted earlier, the speed (at least as
> per the indicators that bzr gives) was roughly the same even going from a
> local branch to a new 2a branch.

fwiw I converted mysql on the weekend and it took a bit under 20 hours.

--
Martin <http://launchpad.net/~mbp/>

Michael B. Trausch (mtrausch) wrote :

On Tue, 4 Aug 2009, Martin Pool wrote:

> 2009/8/4 Michael B. Trausch <email address hidden>:
>> On Tue, 4 Aug 2009, Robert Collins wrote:
>>
>>> On Tue, 2009-08-04 at 02:24 +0000, Michael B. Trausch wrote:
>>>> Wouldn't that mean that a local, non-network conversion would be quicker
>>>> than a network conversion?
>>>
>>> Yes ;)
>>
>> I'll have to try it again, but as I noted earlier, the speed (at least as
>> per the indicators that bzr gives) was roughly the same even going from a
>> local branch to a new 2a branch.
>
> fwiw I converted mysql on the weekend and it took a bit under 20 hours.

... and that would explain why I couldn't do it. I had to reboot _way_
before that, probably 5 or 6 hours in...

  --- Mike

Kristian Nielsen (knielsen) wrote :

It seems this is the same problem that I am having.

$ bzr info .
Shared repository with trees (format: 2a)
$ bzr info /home/knielsen/devel/repo/mariadb-5.1
Repository branch (format: 1.14 or 1.9)
$ time bzr branch --no-tree /home/knielsen/devel/repo/mariadb-5.1

(/home/knielsen/devel/repo/mariadb-5.1 is a local branch of lp:maria, which
shares most of the history with mysql-6.0)
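
The two `bzr info` outputs above show the format mismatch (a 1.14/1.9 source, a 2a target) that triggers a conversion on fetch. A minimal sketch of how such a mismatch could be detected and turned into the kind of warning requested earlier in this thread follows. It only parses the text that `bzr info` printed in this report; the function names are hypothetical, and this is not how bzr itself decides whether to convert.

```python
import re

def repository_format(info_output):
    """Return the format named on the first line of `bzr info` output, or None."""
    text = info_output.strip()
    first_line = text.splitlines()[0] if text else ""
    match = re.search(r"\(format: ([^)]+)\)", first_line)
    return match.group(1) if match else None

def conversion_expected(source_info, target_info):
    """True when the target is 2a and the source reports some other format."""
    source = repository_format(source_info)
    target = repository_format(target_info)
    return target == "2a" and source is not None and source != "2a"

# First lines of the `bzr info` output quoted in this comment.
source_info = "Repository branch (format: 1.14 or 1.9)"
target_info = "Shared repository with trees (format: 2a)"

if conversion_expected(source_info, target_info):
    print("warning: fetching into a 2a repository will convert every "
          "revision; this may take many hours")
```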

This has now run for > 24 hours with the CPU pegged at 100%. Process uses
2Gbyte of resident memory.

Note that both of these branches are _local_. So this problem appears to be
distinct from bug 385826.

The machine it is running on is a Core 2 Duo 2.4GHz with 4 Gb RAM and an Intel
SSD disk.

This does appear to be excessive resource consumption even for a conversion
operation. I was trying to follow procedures found here:

    http://doc.bazaar-vcs.org/latest/en/upgrade-guide/index.html#migrating-branches-on-launchpad

(I also tried bzr upgrade, but that died after running for 21 hours, so seems
to have similar performance issues. The failure seems not related to bzr).

If the branch command eventually finishes, I will test if branching between
two repositories both of format 2a also suffers from this problem.

[Do bzr's regression tests include testing on the mysql repositories? If not, it
might be an idea, as they often seem to provoke various issues.]

Martin Pool (mbp) wrote :

Our regression tests as such run on small reproduction recipes for bugs, so that the tests run quickly. We do do some performance and acceptance tests on mysql. Suggestions for more are welcome...

Kristian Nielsen (knielsen) wrote :

The branching of lp:maria succeeded. Took 34 hours at 98% cpu usage and using just over 2Gbyte of memory.

Branching from the converted format 2a shared repo into another format 2a shared repo seems to still take lots of resources: 12 minutes at 100% cpu and using >1Gbyte of memory. But nowhere near the format conversion resource usage.

Doing a branch/checkout inside the converted format 2a repo is significantly faster and uses significantly less memory than was the case with the earlier formats. So that's good. The repo size is also significantly (3-4 times) smaller.

Ok, so I think at the root of these problems are these two issues:

1. When branching into a format 2a shared repo from an earlier format branch, bzr has to convert the revisions. This is quite painful for a repo of the size of MySQL/MariaDB (for a say 5x bigger repo it would hardly be feasible).

2. The basic usage of just installing bzr 2 and running `bzr init-repo . ; bzr branch lp:maria` will by default initiate such conversion, which will cause problems until we convert main repos to 2a.

It would obviously be really nice if the format conversion could be improved an order of magnitude in terms of cpu and memory usage. But I guess that might be too much to hope for.

Failing that, I guess we need to document this issue clearly, and then as soon as feasible convert all of our repositories to the new 2a format once and for all to avoid this problem in the future and reap the efficiency benefits of the new format.

Once we convert, it might also make sense to make tarballs of a pre-initialised shared repo available for machines with too little memory to do the initial branching themselves.

John A Meinel (jameinel) wrote :


...

> Once we convert, it might also make sense to make tarballs of a
> pre-initialised shared repo available for machines with too little
> memory to do the initial branching themselves.
>

bzr 2.1.0b2 cuts memory consumption roughly in half for my test cases
(Launchpad's code base). For LP's codebase it was ~1GB => 500MB for
branching outside the shared repository. I haven't specifically tweaked
any of the conversion code paths. I would expect them to be "better" but
probably not 2:1 better.

Converting from 1.9 => 2a is not going to get a lot better. More than
half the time is generally spent extracting the data from the 1.9 format
(I believe the times I saw were 2/3rds extraction, 1/3rd insertion into
the new format).
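
The 2/3 extraction, 1/3 insertion split puts a ceiling on what speeding up either phase alone can achieve. Below is a small sketch of that arithmetic, in the style of Amdahl's law. The function name is made up for illustration, and the split is a recollection rather than a fresh measurement, so treat the numbers as rough.

```python
def overall_speedup(fraction, phase_speedup):
    """Overall speedup when a phase taking `fraction` of the total time
    becomes `phase_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / phase_speedup)

extraction = 2.0 / 3.0   # share of time spent reading the 1.9 format
insertion = 1.0 / 3.0    # share of time spent writing the 2a format

# Best case if insertion cost nothing at all.
best_from_insertion = overall_speedup(insertion, float("inf"))
# Best case if extraction cost nothing at all.
best_from_extraction = overall_speedup(extraction, float("inf"))

print(f"insertion free:  at most {best_from_insertion:.1f}x overall")
print(f"extraction free: at most {best_from_extraction:.1f}x overall")
```

Under these assumptions, an order-of-magnitude improvement would need both phases to become much faster, not just one.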

There is one patch I know that can make the conversion significantly faster:

=== modified file 'bzrlib/xml8.py'
--- bzrlib/xml8.py 2009-07-07 04:32:13 +0000
+++ bzrlib/xml8.py 2009-11-16 16:49:46 +0000
@@ -433,9 +433,9 @@
                 pass
             else:
                 # Only copying directory entries drops us 2.85s => 2.35s
-                # if cached_ie.kind == 'directory':
-                #     return cached_ie.copy()
-                # return cached_ie
+                if cached_ie.kind == 'directory':
+                    return cached_ie.copy()
+                return cached_ie
                 return cached_ie.copy()

         kind = elt.tag

Note that the code is already there, just commented out. The reason for
this is that it is "unsafe" for some code paths, but it works great for
the conversion case. (If someone mutates the object that you have stored
in the cache... bad things happen. The conversion code *doesn't* though.)

That also might help memory consumption a little bit, but I wouldn't
expect too much there.
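
To illustrate why handing out a cached object without copying it is "unsafe" when a caller mutates it, here is a minimal, self-contained sketch. The `Entry` class, the `cache` dictionary and the two lookup functions are hypothetical stand-ins, not bzrlib code; they only mirror the copy versus no-copy choice in the patch above.

```python
import copy

class Entry:
    """Hypothetical stand-in for a cached inventory entry."""
    def __init__(self, kind, name):
        self.kind = kind
        self.name = name

    def copy(self):
        return copy.copy(self)

cache = {"file-id-1": Entry("file", "README")}

def lookup_copied(file_id):
    # Slower: every caller gets a private copy of the cached entry.
    return cache[file_id].copy()

def lookup_shared(file_id):
    # Faster: hands out the cached object itself.
    return cache[file_id]

safe = lookup_copied("file-id-1")
safe.name = "renamed-by-caller"
assert cache["file-id-1"].name == "README"   # cache is untouched

shared = lookup_shared("file-id-1")
shared.name = "renamed-by-caller"
print(cache["file-id-1"].name)               # the cached entry changed too
```

A caller that never mutates what it receives, as is said of the conversion code, can use the shared variant and skip the copying cost.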

John
=:->

Martin Pool (mbp)
Changed in bzr:
importance: Undecided → Medium
status: Incomplete → Confirmed
tags: added: 2a fetch packs
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy