Migrate to Git

Bug #1846887 reported by NecLimDul
This bug affects 1 person
Affects: MadGraph5_aMC@NLO
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

This isn't really a bug but an offer of some code and help.

I've been trying to help someone with this program, but I was struggling to parse through the different branches and waiting hours for branch checkouts to download. So I wrote a script to migrate all the branches and revisions to git, where I can more easily review branches and see what is going on in the project.

Obviously I don't know the project's goals in this regard, but I wanted to offer the solution up in case it is a pain point for the project as well, or something that is being considered.

The script is posted here:
https://gitlab.com/neclimdul/madgraph_migrate

The migrated code (and some of my tinkering) is here:
https://gitlab.com/neclimdul/madgraph

It's, I think, a pretty straightforward Python script. It isn't really polished or anything, just hacked on until it worked and pushed up. It currently migrates everything that comes up in Launchpad's branches list, but it's just a Python script, so it could easily be modified to filter out branches that aren't wanted.
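As a rough illustration of that kind of filtering (a minimal sketch, not the code in the migration script), the branch list could be passed through a small exclude filter before migrating. The function name `filter_branches` and the branch names in the usage note are hypothetical:

```python
from fnmatch import fnmatchcase


def filter_branches(branches, exclude_patterns):
    """Return the branches that match none of the shell-style exclude patterns.

    Both arguments are plain lists of strings. How the branch list is
    obtained from Launchpad is outside this sketch.
    """
    return [
        branch
        for branch in branches
        # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
        if not any(fnmatchcase(branch, pattern) for pattern in exclude_patterns)
    ]
```

For example, `filter_branches(["lp:~team/project/trunk", "lp:~alice/project/experiment"], ["lp:~alice/*"])` would keep only the first (made-up) branch.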

I'm also happy to help with running the migration, setting up a project in GitLab/GitHub, etc. Just let me know.

Olivier Mattelaer (olivier-mattelaer) wrote :

Hi,

This is a project that is quite interesting, but for me it is quite complicated to do.
And for the moment, I need to focus on the Python 3 conversion before looking at how to do the bzr to git conversion.

The issue when doing such a conversion is obviously to keep all the branch history in place and to be sure that we can still efficiently merge branches that started to diverge more than 5 years ago.

What worries me in your code is first its README: "Currently broken by relying on git bzr plugin which runs into buts with bugs in bzr and fast-export. Something you posted to github but was thrown back at bzr."

Then do you have an example where you load various bzr branches into a single git repo?
I think the real issue here is to be sure that we can load all the branches in a consistent way and that we can efficiently merge them afterwards. This validation is actually what would take a lot of time: we need to be convinced that we have a tool that allows a clean transition. If you can do such careful validation, that would be awesome.
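One necessary (though far from sufficient) check for clean merges is that the migrated branches still share history in git. The following is a minimal sketch of such a check, assuming the commit graph has been dumped as text in the format produced by `git rev-list --parents --all` (one commit per line, followed by its parents). The function names and the `tips` mapping of branch name to tip commit are hypothetical:

```python
def parse_parents(rev_list_output):
    """Parse lines of 'commit parent1 parent2 ...' into {commit: [parents]}."""
    graph = {}
    for line in rev_list_output.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            graph[fields[0]] = fields[1:]
    return graph


def ancestors(graph, tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def unrelated_branches(graph, tips):
    """Return sorted pairs of branch names whose histories share no commit."""
    reach = {name: ancestors(graph, tip) for name, tip in tips.items()}
    names = sorted(reach)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if reach[a].isdisjoint(reach[b])
    ]
```

Any pair reported here would have no common ancestor for git to merge from, which would suggest the branches were imported inconsistently. An empty result does not by itself prove that merges will be clean.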

Cheers,

Olivier

NecLimDul (neclimdul) wrote :

It is very complicated! I'm glad I can share some experience and code. :-D And good to hear python3 is a focus! I was definitely interested in looking into that progress.

" Currently broken by relying on git bzr plugin which runs into buts with bugs in bzr and fast-export. Something you posted to github but was thrown back at bzr."

Whoops... I've been working on this in my free time, and I left that note to myself at one point when I ran into a large problem and had to walk away for a bit. The script works very differently now and the comment is no longer accurate, so I've removed it from the README.

"Then do you have an example where you load various bzr branches into a single git repo?"
Yes, that's actually what I have here: https://gitlab.com/neclimdul/madgraph. You can use the "Branches" link in the left sidebar to view and even compare all the bzr branches that Launchpad listed and that I combined into the repo.

Agreed, validation is a _very_ big deal in this sort of thing. I don't have enough knowledge of the project to know which key things the team would need validated. Knowing that you're interested, though, I'll see if I can think of some ways to provide some of that validation.

Changed in mg5amcnlo:
importance: Undecided → Wishlist
NecLimDul (neclimdul) wrote :

Hope things are going well! I've been poking at this in my free time and have made a lot of changes to the migration script I posted before, so I wanted to provide an update.

I mostly focused on testing the migration, but the refactors came with some "features":
- It's better at pulling incremental changes, so you don't need to rebuild everything all the time (pulling all the bzr branches and even the git import take a while).
- I can migrate and test specific branches in isolation now.
- There is now a test script that does some comparisons between bzr and git.

One change failed: originally I tried comparing the entire history of the branches. It turns out that even though git can format the individual commits so they are _very_ close to bzr's format, git organizes its history slightly differently, and this doesn't seem feasible. By this I mean that in the flat linear view of commits, commits from merges seem to "appear" in a different order even when logically everything is the same. The code is still in the repo if anyone wants to take a look, but I couldn't get output that seemed useful to me.
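One possible way around the ordering problem (a sketch of an alternative, not what the script in the repo does) would be to compare the two logs as multisets of commits rather than as ordered lists. This assumes each log has already been reduced to `(author, timestamp, message)` string tuples; that shape and the function names are assumptions made for illustration:

```python
from collections import Counter


def normalize(entry):
    """Strip incidental whitespace so equivalent commits compare equal."""
    author, timestamp, message = entry
    return (author.strip(), timestamp.strip(), " ".join(message.split()))


def compare_histories(bzr_log, git_log):
    """Compare two commit logs while ignoring the order of the entries."""
    bzr_entries = [normalize(entry) for entry in bzr_log]
    git_entries = [normalize(entry) for entry in git_log]
    bzr_counts, git_counts = Counter(bzr_entries), Counter(git_entries)
    return {
        # True only if the flat linear views agree entry by entry.
        "same_order": bzr_entries == git_entries,
        # Counter subtraction keeps only the positive differences.
        "missing_in_git": sorted((bzr_counts - git_counts).elements()),
        "extra_in_git": sorted((git_counts - bzr_counts).elements()),
    }
```

With this approach, merge commits that merely "appear" at a different position would show up as `same_order: False` with both difference lists empty, while a commit that was genuinely lost would be listed under `missing_in_git`.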

The test script does compare the source code after the migration and this seems to work great. Attached are the diffs between the git and bzr branches after the migration. Key takeaways:
- The maddump branch didn't work. :( I didn't see any errors but I'll keep looking at this.
- PY8meetsMG5aMC has one directory containing a link that shouldn't be there. ¯\_(ツ)_/¯
- Every other branch is identical!
- It's not in the diffs because I suppressed it, but there are a lot of empty directories in some bzr branches and git can't store empty directories. Where it makes sense I would suggest adding a .gitkeep or .gitignore file to hold the directories after the migration. This is the standard solution when an empty directory is needed as a placeholder or to document a structure.
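For illustration, a source comparison of this kind could look roughly like the sketch below. It is not the actual test script: it assumes both branches are checked out as plain directories, and the names `snapshot`, `compare_trees`, and `add_gitkeep` are hypothetical. Empty directories are reported separately instead of as differences, and symlinks are recorded by their target rather than followed.

```python
import hashlib
import os

VCS_DIRS = {".git", ".bzr"}  # version-control metadata, never compared


def _digest(path):
    """Return a content hash for a file, or the link target for a symlink."""
    if os.path.islink(path):
        return "symlink:" + os.readlink(path)
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def snapshot(root):
    """Return ({relative path: digest}, [relative paths of empty directories])."""
    files, empty_dirs = {}, []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d not in VCS_DIRS]
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir != "." and not dirnames and not filenames:
            empty_dirs.append(rel_dir)
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):  # symlinked directory: record, do not follow
                files[os.path.relpath(path, root)] = _digest(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            files[os.path.relpath(path, root)] = _digest(path)
    return files, sorted(empty_dirs)


def compare_trees(bzr_root, git_root):
    """Summarize file-level differences between the two checkouts."""
    bzr_files, bzr_empty = snapshot(bzr_root)
    git_files, _ = snapshot(git_root)
    shared = set(bzr_files) & set(git_files)
    return {
        "only_in_bzr": sorted(set(bzr_files) - set(git_files)),
        "only_in_git": sorted(set(git_files) - set(bzr_files)),
        "changed": sorted(p for p in shared if bzr_files[p] != git_files[p]),
        "empty_dirs_in_bzr": bzr_empty,
    }


def add_gitkeep(git_root, empty_dirs):
    """Recreate each empty directory in the git checkout with a .gitkeep file."""
    for rel_dir in empty_dirs:
        target = os.path.join(git_root, rel_dir)
        os.makedirs(target, exist_ok=True)
        open(os.path.join(target, ".gitkeep"), "a").close()
```

The `empty_dirs_in_bzr` list from `compare_trees` can be passed straight to `add_gitkeep` for the directories the project decides to keep.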

Changed in mg5amcnlo:
status: New → Fix Released