Bug #321389 “bzr visualise: sort by date” : Bugs : Bazaar GTK+ Frontends

Revision history for this message

Vincent Ladeuil (vila) wrote on 2009-07-03:

#1

Since the graph most of the time reflects the chronology of the merges and,
when no merges exist in a branch, the simplified graph *is* in reverse chronological order
(so the more recent revisions are shown first),I don't understand what you're talking about.

Can you explain with more details or even show us a public branch where the current approach
is not presenting the graph in a reverse chronological order ?

Changed in bzr-gtk:
status:	New → Incomplete

Revision history for this message

Vincent Ladeuil (vila) wrote on 2009-07-03:

#2

Or do you mean not reverse indeed ? Like in bug #260295 ?

Revision history for this message

Maarten Bosmans (mkbosmans) wrote on 2009-07-03:

#3

Thanks for triaging this bug.

What I meant is for the commits of different branches to be interleaved, according to the commit date.
In a feature branch that is being worked on for a couple of months for example, I can then see how the commits of the branch relate in time to the mainline commits.

This is applicable for both the case of a merged branch, as for different branches that aren't merged yet. So instead of

      B2
       |
      B1
       |
  A3 |
   | |
  A2 |
   | |
  A1 |
   \ /
     * parent commit

I'd like to see something like

  A3
   |
   | B2
   | |
  A2 |
   | |
   | B1
   | |
  A1 |
   \ /
     * parent commit

Revision history for this message

Vincent Ladeuil (vila) wrote on 2009-07-04:

#4

Oh thanks !

Indeed, its' different from bug #260295.

I certainly agree it would be a nice option or even made the default presentation.

Changed in bzr-gtk:
status:	Incomplete → Confirmed

Revision history for this message

Maarten Bosmans (mkbosmans) wrote on 2009-07-04:

#5

> I certainly agree it would be a nice option or even made the default presentation.

The current behaviour of grouping all the commits of a merged branch directly under the merge commit is not really what is expected, IMHO. So may be it should be the default, but at least it should be possible to get this representation when clicking on the date column header.

Can you say whether it's easy to implement this? I can give a shot, but I'll have to brush up my Python, so if you're familiar with the bzr-gtk codebase and would like to make a patch, that would be great.

Gary van der Merwe (garyvdm) on 2009-07-05

Changed in qbzr:
importance:	Undecided → Low
status:	New → Confirmed

Revision history for this message

Gary van der Merwe (garyvdm) wrote on 2009-07-10:

#6

viz, and qlog, use merge_sort, which in turn uses topo_sort. This sorts the graph topologically, so that all parents come before their children. You need to sort topologically, and then by date. If you just sort by date, it may not be sorted topologically (which might happen if someone committed with their computer date set wrong, which I have done before.) If it is not sorted topologically, there will be problems drawing the graph.

Revision history for this message

Maarten Bosmans (mkbosmans) wrote on 2009-07-11:

#7

shell script to generate testcase Edit (386 bytes, text/x-sh)

These commands generate a branch viztest to show the behaviour.

The resulting graph of bzr viz of the viztest branch is:
* Merge
* A2
* A1
* B2
* B1
* Initial commit

While the expected result would be:
* Merge
* B2
* A2
* B1
* A1
* Initial commit

Revision history for this message

Maarten Bosmans (mkbosmans) wrote on 2009-07-11:

#8

Preliminary patch for bzr-gtk Edit (1.1 KiB, text/plain)

It wasn't easy to grok the merge_sort code, so as a first try I just reordered the result from merge_sort in a separate pass in linegraph.py. This has the disadvantage that the code isn't generic, but bzr-gtk specific.

This patch can be viewed as a proof-of-concept. It is nice to see for example how the commits on diverging branches of a large project relate in time. In general however, I think the generated tree is a bit more messy to look at, so may be this sort mode shouldn't be the default.

The two main problems with this patch are:

* The revisions aren't really sorted by date, but by revid, which happens to have the format commiter-datestr-hash, so sorting on that gives approximately the expected result, at least for my single-commiter project. As far as I can see the linegraph hasn't got enough information to extract the revision timestamp from the list of revids. So may be the code has to move even higher up, where this information is available.

* The topological order isn't preserved for commit's with incorrect timestamps. There several possible ways of fixing this, but I'll wait with that until we have decided the correct place for this code

Gary, do you still think merge_sort is the right place for this code? If so, how do you propose the revision timestamp gets exposed to this function? May be the function call should have a reference to the repository, so it can look up the timestamp, or better yet the sort function could have a comparison function to sort by after the topological sort as an argument.

The next two weeks I'm on holiday, so after that I'll work on this again. But if anybody could try to fix this in the mean time, that would be great too.

Revision history for this message

Gary van der Merwe (garyvdm) wrote on 2009-07-12: Re: [Bug 321389] Re: bzr visualise: sort by date

#9

On Sat, Jul 11, 2009 at 6:07 PM, Maarten Bosmans<email address hidden> wrote:
> It wasn't easy to grok the merge_sort code, so as a first try I just
> reordered the result from merge_sort in a separate pass in linegraph.py.
> This has the disadvantage that the code isn't generic, but bzr-gtk
> specific.
>
> This patch can be viewed as a proof-of-concept. It is nice to see for
> example how the commits on diverging branches of a large project relate
> in time. In general however, I think the generated tree is a bit more
> messy to look at, so may be this sort mode shouldn't be the default.
>
> The two main problems with this patch are:
>
> * The revisions aren't really sorted by date, but by revid, which
> happens to have the format commiter-datestr-hash, so sorting on that
> gives approximately the expected result, at least for my single-commiter
> project. As far as I can see the linegraph hasn't got enough information
> to extract the revision timestamp from the list of revids. So may be the
> code has to move even higher up, where this information is available.
>
> * The topological order isn't preserved for commit's with incorrect
> timestamps. There several possible ways of fixing this, but I'll wait
> with that until we have decided the correct place for this code
>
> Gary, do you still think merge_sort is the right place for this code? If
> so, how do you propose the revision timestamp gets exposed to this
> function? May be the function call should have a reference to the
> repository, so it can look up the timestamp, or better yet the sort
> function could have a comparison function to sort by after the
> topological sort as an argument.

To get the real timestamp, you have to load the revision, which is
really slow. At the moment, we only load the revision when it is
displayed on screen. So I think there are 2 options:

* Use a regular expression to extract date str from the revid.
Pro: Will be fast because it won't have to load all the revisions.
Con: Won't work if the revid is not a bzr revid, e.g. on a bzr-svn branch.

* Load all the revisions. Because this is slow, I would say you need
to make topo sort the default, and only do this if the user
specifically asks for it.
Pro: Will work for any type of branch.
Con: Slooooow.

I don't really understand the merge_sort/topo_sort to be able to tell
you if it would make sense to do it as a part of that merge_sort code,
or if we should just do a second pass. The I think the best person to
ask about this would be John Meinel.

For qlog, we would need to do 2 passes, because we use information
gathered from the order, and the merge depth do work out what merges
what.

On Sat, Jul 11, 2009 at 6:07 PM, Maarten Bosmans<mkbosmans@gmail.com> wrote:
> It wasn't easy to grok the merge_sort code, so as a first try I just
> reordered the result from merge_sort in a separate pass in linegraph.py.
> This has the disadvantage that the code isn't generic, but bzr-gtk
> specific.
>
> This patch can be viewed as a proof-of-concept. It is nice to see for
> example how the commits on diverging branches of a large project relate
> in time. In general however, I think the generated tree is a bit more
> messy to look at, so may be this sort mode shouldn't be the default.
>
> The two main problems with this patch are:
>
> * The revisions aren't really sorted by date, but by revid, which
> happens to have the format commiter-datestr-hash, so sorting on that
> gives approximately the expected result, at least for my single-commiter
> project. As far as I can see the linegraph hasn't got enough information
> to extract the revision timestamp from the list of revids. So may be the
> code has to move even higher up, where this information is available.
>
> * The topological order isn't preserved for commit's with incorrect
> timestamps. There several possible ways of fixing this, but I'll wait
> with that until we have decided the correct place for this code
>
> Gary, do you still think merge_sort is the right place for this code? If
> so, how do you propose the revision timestamp gets exposed to this
> function? May be the function call should have a reference to the
> repository, so it can look up the timestamp, or better yet the sort
> function could have a comparison function to sort by after the
> topological sort as an argument.

To get the real timestamp, you have to load the revision, which is
really slow. At the moment, we only load the revision when it is
displayed on screen. So I think there are 2 options:

* Use a regular expression to extract date str from the revid.
  Pro: Will be fast because it won't have to load all the revisions.
  Con: Won't work if the revid is not a bzr revid, e.g. on a bzr-svn branch.

* Load all the revisions. Because this is slow, I would say you need
to make topo sort the default, and only do this if the user
specifically asks for it.
Pro: Will work for any type of branch.
Con: Slooooow.

I don't really understand the merge_sort/topo_sort to be able to tell
you if it would make sense to do it as a part of that merge_sort code,
or if we should just do a second pass. The I think the best person to
ask about this would be John Meinel.

For qlog, we would need to do 2 passes, because we use information
gathered from the order, and the merge depth do work out what merges
what.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-07-12:

#10

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
>
> To get the real timestamp, you have to load the revision, which is
> really slow. At the moment, we only load the revision when it is
> displayed on screen. So I think there are 2 options:
>
> * Use a regular expression to extract date str from the revid.
> Pro: Will be fast because it won't have to load all the revisions.
> Con: Won't work if the revid is not a bzr revid, e.g. on a bzr-svn branch.
>
> * Load all the revisions. Because this is slow, I would say you need
> to make topo sort the default, and only do this if the user
> specifically asks for it.
> Pro: Will work for any type of branch.
> Con: Slooooow.
>

I'll mention that this has gotten better in --2a formats.

2.0s time "get_ancestry(b.last_revision())"
11.5s time "get_revisions(ancestry)" (1.9)
4.9s time "get_revisions(ancestry)" (2a)

So you spend at least 2.0s just to get the revision graph, and then some
amount more to merge_sort them.

It costs an extra 3.0s to extract all of the revision texts (down from
9.5s). I have an experimental decoder which improves things a bit more:

4.4s time "get_revisions(ancestry)" (2a)

(The bulk of the improvement is just improvements in the time to extract
texts from the repository in 2a versus 1.9, and a small improvement in
revision extraction.)

> I don't really understand the merge_sort/topo_sort to be able to tell
> you if it would make sense to do it as a part of that merge_sort code,
> or if we should just do a second pass. The I think the best person to
> ask about this would be John Meinel.

I haven't timed the 'merge_sort' code here, though I'm sure there is
some room for improvement. Anyway, the main difference is:

  topo_sort All parents are emitted before a child is emitted
  merge_sort All unique parents are emitted immediately before a
                child

So if you have the graph: (time goes down)
X
|\
A B
| |
C D
| |
E F
|/
G

Topo sort is allowed to emit that as:
X A B C D E F G
or
X A C E B D F G
or
X A C B D E F G
etc

(There is no exact ordering between A & B in topo sort). Merge sort will
*always* yield:
X A C E B D F G

(B D F are ancestors of G that were not ancestors of E, so they must
appear between E and G)

>
> For qlog, we would need to do 2 passes, because we use information
> gathered from the order, and the merge depth do work out what merges
> what.
>

I'll also note that if you switched to a date sorted order, and you make
the assumption that parent revisions are always older (not guaranteed,
but likely do to potential clock skew), then you could have a huge win
by not having to load the whole graph before you display it. Though
you'd give up stuff like the merge sorting depth, etc.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpaLsUACgkQJdeBCYSNAAPrOgCgh5HynJN70CyGp0FMrHaCwx/0
vb0AnR3jkquqRqYIZ4V/F4Chy+qD4OOP
=7Q50
-----END PGP SIGNATURE-----

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...
> 
> To get the real timestamp, you have to load the revision, which is
> really slow. At the moment, we only load the revision when it is
> displayed on screen. So I think there are 2 options:
> 
> * Use a regular expression to extract date str from the revid.
>   Pro: Will be fast because it won't have to load all the revisions.
>   Con: Won't work if the revid is not a bzr revid, e.g. on a bzr-svn branch.
> 
> * Load all the revisions. Because this is slow, I would say you need
> to make topo sort the default, and only do this if the user
> specifically asks for it.
> Pro: Will work for any type of branch.
> Con: Slooooow.
>

I'll mention that this has gotten better in --2a formats.

2.0s time "get_ancestry(b.last_revision())"
11.5s time "get_revisions(ancestry)" (1.9)
 4.9s time "get_revisions(ancestry)" (2a)

So you spend at least 2.0s just to get the revision graph, and then some
amount more to merge_sort them.

It costs an extra 3.0s to extract all of the revision texts (down from
9.5s). I have an experimental decoder which improves things a bit more:

4.4s time "get_revisions(ancestry)" (2a)

(The bulk of the improvement is just improvements in the time to extract
texts from the repository in 2a versus 1.9, and a small improvement in
revision extraction.)

> I don't really understand the merge_sort/topo_sort to be able to tell
> you if it would make sense to do it as a part of that merge_sort code,
> or if we should just do a second pass. The I think the best person to
> ask about this would be John Meinel.

I haven't timed the 'merge_sort' code here, though I'm sure there is
some room for improvement. Anyway, the main difference is:

topo_sort 	All parents are emitted before a child is emitted
  merge_sort	All unique parents are emitted immediately before a
                child

So if you have the graph: (time goes down)
 X
 |\
 A B
 | |
 C D
 | |
 E F
 |/
 G

Topo sort is allowed to emit that as:
 X A B C D E F G
or
 X A C E B D F G
or
 X A C B D E F G
etc

(There is no exact ordering between A & B in topo sort). Merge sort will
*always* yield:
 X A C E B D F G

(B D F are ancestors of G that were not ancestors of E, so they must
appear between E and G)

> 
> For qlog, we would need to do 2 passes, because we use information
> gathered from the order, and the merge depth do work out what merges
> what.
>

I'll also note that if you switched to a date sorted order, and you make
the assumption that parent revisions are always older (not guaranteed,
but likely do to potential clock skew), then you could have a huge win
by not having to load the whole graph before you display it. Though
you'd give up stuff like the merge sorting depth, etc.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpaLsUACgkQJdeBCYSNAAPrOgCgh5HynJN70CyGp0FMrHaCwx/0
vb0AnR3jkquqRqYIZ4V/F4Chy+qD4OOP
=7Q50
-----END PGP SIGNATURE-----

Jelmer Vernooij (jelmer) on 2011-02-18

Changed in bzr-gtk:
importance:	Undecided → Medium
importance:	Medium → Low

Alexander Belchenko (bialix) on 2011-08-05

Changed in qbzr:
importance:	Low → Wishlist

Affects		Status	Importance	Assigned to	Milestone
	Bazaar GTK+ Frontends	Confirmed	Low	Unassigned
	QBzr	Confirmed	Wishlist	Unassigned

Bazaar GTK+ Frontends

bzr visualise: sort by date

Bug Description

Other bug subscribers

Patches

Bug attachments

Remote bug watches