Abderrahim Kitouni | 29 May 17:05 2009

GSoC: hg and git interoperability (status report)

Hi all,
It's been a week since GSoC officially started, so here is a (not so) little
update about what I've done so far.

I've rewritten the script I sent earlier as a Mercurial extension. It can
convert pretty much everything (though it still has some bugs), and should be
robust enough not to crash (except on encoding issues).

Last week, I added support for pulling from git. It worked for a simple linear
repository, but for more complex repositories, there are some problems when
converting back to git (as I don't keep original git objects around).

This week, trying to fix some bugs, I added a command to verify that a
repository is correctly converted (by converting back and verifying that the
hashes match).

I also noticed that hg strips some whitespace from the changeset description. To
work around this, I'm storing the original description in an extra field. Is
there another way?
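
Roughly, the workaround looks like the sketch below. This is an illustration,
not the extension's code: the key name "git-message" is hypothetical, and
approx_hg_normalize is only my approximation of the stripping hg does (trailing
whitespace on each line, plus leading and trailing empty lines).

```python
def approx_hg_normalize(message):
    """Approximate the whitespace stripping hg applies to a description.

    Assumption: trailing whitespace is removed from every line and
    leading/trailing newlines are dropped.
    """
    return "\n".join(line.rstrip() for line in message.splitlines()).strip("\n")


def pack_description(message, normalize=approx_hg_normalize):
    """Return (description, extra) to store on the hg side.

    The original git message is kept in extra only when normalizing
    would alter it, so untouched messages carry no extra data.
    """
    stored = normalize(message)
    extra = {}
    if stored != message:
        extra["git-message"] = message  # hypothetical key name
    return stored, extra


def unpack_description(stored, extra):
    """Recover the exact git message when converting back to git."""
    return extra.get("git-message", stored)
```

Storing the original only when it differs keeps the extra data small, and the
exact message is still available when converting back to git.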

I'd like to build a test suite for this. If you'd like to help, try cloning
your favorite project and verify that the conversion is OK. As of now, it
doesn't have a "UI"; all you can do is:

hg git-clone git://host/path [dest]
hg git-pull git://host/path
hg git verify

You can get the code from here: http://bitbucket.org/abderrahim/hg-git/
(you'll need dulwich as well: http://samba.org/~jelmer/dulwich)
and report bugs at http://bitbucket.org/abderrahim/hg-git/issues/

I've recently noticed that the homepage of dulwich is outdated, so part of my
project is no longer relevant. I'm thinking about implementing the server side
of the protocol instead (i.e. serving hg repositories over the git protocol).

Some time ago, another project with the same goal was started:
http://hg-git.github.com/. I may be able to reuse some of their code, but I'm
taking a different approach from theirs, so I'm not sure what I can reuse (I can
definitely reuse improvements to dulwich).

I think there should be only one "canonical" source (either git or hg), and
information not relevant to this source is discarded. So for the time being,
I'll only focus on git concepts and how to preserve them (I also want to have
stable hashes across clones), with the exception of octopus merges that will
come later. So commits made in hg may have different hashes after they make the
roundtrip (but this shouldn't affect pulling from git afterwards).

Later on, I plan to have it work the other way around, and at this point I'd
expect that cloning a git repo over the git protocol will produce the same

That's all, I hope this makes sense. Any feedback is very welcome.