Robin H. Johnson | 27 Oct 02:10 2010

meeting followup: commit signing

So beyond the meeting, I spoke to spearce again, and came up with a more
detailed plan.

1. We will implement our own reflog to track who pushes commits. It will
   be done by the server-side script making a commit into a submodule.

2. Careful selection of what to sign should work with the following:
   # git diff-tree --no-commit-id -r --raw $commitid ;
   # git cat-file commit $commitid |egrep -v '^(tree|parent|committer)'
   Need a slightly better parser to trim those 3 lines from the latter
   (the egrep also drops any message line that starts with one of those
   words).
   Feed that data into gpg --detach-sign.
   Once we have the signature, we can either append it to the commit
   message (it would have to be trimmed off again during verification),
   or store it as a git note (need to verify that notes don't get
   trampled, i.e. overwritten).
   This SHOULD be safe across all actions: rewind, merge, cherry-pick.
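
A rough sketch of the "slightly better parser" from item 2, under the assumption that the header block of `git cat-file commit` output ends at the first blank line. The function names are hypothetical and not from the discussion. This is an illustration of the idea, not the script that will be deployed.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical.

# Drop the tree/parent/committer headers from `git cat-file commit` output
# read on stdin. Only the header block (everything before the first blank
# line) is filtered, so message lines that start with "tree" survive.
filter_commit_headers() {
    awk '
        in_body                     { print; next }  # message: pass through
        /^$/                        { in_body = 1; print; next }
        /^(tree|parent|committer) / { next }         # headers that change on rebase
                                    { print }        # author and any other header
    '
}

# Remove an ASCII-armored signature block from a commit message on stdin.
# Assumes the signature was appended as a standard armored block.
strip_appended_signature() {
    sed '/^-----BEGIN PGP SIGNATURE-----$/,/^-----END PGP SIGNATURE-----$/d'
}

# Emit the data to be signed for the commit given as $1.
signing_payload() {
    git diff-tree --no-commit-id -r --raw "$1"
    git cat-file commit "$1" | filter_commit_headers
}

# Possible usage (not run here):
#   signing_payload "$commitid" | gpg --armor --detach-sign > "$commitid.asc"
#   signing_payload "$commitid" | gpg --verify "$commitid.asc" -
```

If the signature is appended to the commit message instead of kept in a note, the payload would also need to pass through something like `strip_appended_signature` before verification, otherwise the signed data would include its own signature.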

Log of the discussion attached.

Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail     : robbat2@...
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85
**** BEGIN LOGGING AT Tue Oct 26 11:39:37 2010

Oct 26 11:39:50 robbat2|na	if you've got a moment, wanted to pick your brain more about the signing issue, and an idea
[spearce has address ~spearce <at> nat/google/x-idtnzvqqrspgyiyg]
Oct 26 11:43:57 spearce	eh?
Oct 26 11:45:21 robbat2|na	going back to what to sign from the gentoo meeting at the summit
Oct 26 11:45:30 robbat2|na	git show itself John said may change slightly
Oct 26 11:45:42 robbat2|na	but isn't the commit itself just the tree object+blob objects
Oct 26 11:45:52 robbat2|na	which in themselves are invariant once committed
Oct 26 11:47:00 spearce	the commit itself is the SHA-1(tree, parent(s), author, committer, message). Where tree is itself a transitive SHA-1 of all of the file contents (as it's a SHA-1 of the SHA-1s of the files).
Oct 26 11:47:34 robbat2|na	and what we care about is signing the author+committer+message + new blobs
Oct 26 11:49:23 spearce	yes.  so sign the output of `git diff --raw parent tree` ?
Oct 26 11:50:03 spearce	that's awkward, but it lets you cherry-pick the commit onto a different base assuming the files that commit changes weren't modified by anyone else.
Oct 26 11:50:08 robbat2|na	i was thinking ' git show --raw $commitid'
Oct 26 11:50:15 robbat2|na	which has the git diff --raw parent tree line on the bottom
Oct 26 11:51:17 spearce	right, i see.  but i would sign the raw underlying data to prevent formatting changes from changing the signature and breaking old commits
Oct 26 11:51:55 robbat2|na	yup, raw is better
Oct 26 11:52:04 spearce	so more like `git diff-tree -r --raw parent tree`
Oct 26 11:52:25 spearce	and a filtered version of `git cat-file commit commit`
Oct 26 11:52:51 robbat2|na	dropping the parent line right?
Oct 26 11:53:54 spearce	tree, parent, committer
Oct 26 11:53:58 spearce	so keep author and the message text
Oct 26 11:54:40 robbat2|na	and track the committer separately with our parallel reflog
Oct 26 11:54:49 robbat2|na	that you suggested
Oct 26 11:56:50 robbat2|na	(ignoring that we need a safer parser to exclude, something like this)
Oct 26 11:56:53 robbat2|na	# git diff-tree -r --raw $commitid ; git cat-file commit $commitid |egrep -v '^(tree|parent|committer)'
Oct 26 11:57:11 robbat2|na	with output of:
Oct 26 11:57:14 robbat2|na	769e957a036ad2a0da0f2c4612251c8e39fe58d8
Oct 26 11:57:14 robbat2|na	:100644 100644 0a34f78da211659febd0385b4a0ac31750991ea0 4bf5a162205eb07f94f8df36a37ca4a30eb07f1f M	gitosis.conf
Oct 26 11:57:14 robbat2|na	author Robin H. Johnson <robbat2@...> 1280258366 +0000
Oct 26 11:57:14 robbat2|na	Bump.
Oct 26 11:57:16 robbat2|na	</eof>
Oct 26 11:57:25 robbat2|na	and the whitespace line it ate
Oct 26 11:58:22 spearce	yea
**** BEGIN LOGGING AT Tue Oct 26 12:00:21 2010

Oct 26 12:00:21 robbat2|na	ok, so now just to figure out where to store the signature for that data
Oct 26 12:00:40 spearce	two ideas:
Oct 26 12:00:41 robbat2|na	and the notes overwrite issue
Oct 26 12:00:47 spearce	1)  put it at the end of the commit message
Oct 26 12:01:01 spearce	2) take the SHA-1 of that data above, and store it as a detached signature in a notes branch
Oct 26 12:01:40 robbat2|na	putting it on the end of the commit message will change the output of git diff-tree -r --raw
Oct 26 12:01:45 robbat2|na	because the first line is the commitid
Oct 26 12:04:59 robbat2|na	adding --no-commit-id to the diff-tree maybe
Oct 26 12:05:59 spearce	oh, yea.  you can't include that stupid $commitid line in the output of diff-tree in your signature.
Oct 26 12:06:05 spearce	otherwise it would bust when you cherry-pick that change
Oct 26 12:06:23 robbat2|na	oh, and the issue re repo, is that it's going to be too painful in overhead. most developers have the entire tree, so w/ repo that would mean 24k repos on their box, and at least 3.4GiB burnt in inodes
Oct 26 12:07:15 robbat2|na	ok, i'll play with this a bit more, and see if I can break it at all
Oct 26 12:08:11 spearce	so are you guys going with one giant repository?
Oct 26 12:08:31 robbat2|na	yup
Oct 26 12:08:43 robbat2|na	that overhead cost is why we excluded other competitors in what to move to from CVS
Oct 26 12:09:18 robbat2|na	we are going to use graft however, to get the pack size stuff down probably
Oct 26 12:10:06 robbat2|na	atomic commits to the repo are critical as well, w/ repo, somebody updating a submodule and not its parent could lead to bad breakage
Oct 26 12:26:23 robbat2|na	any objections to me publishing this discussion in our SCM conversion notes?
Oct 26 12:30:04 spearce	nope
Oct 26 12:30:13 robbat2|na	thanks :-)