
Add instructions for migrating branches from one git repository to another.
Needs Review, Public

Authored by jlebar on Oct 18 2018, 4:24 PM.

Details

Reviewers
jyknight
greened
Summary

Useful for migrating from one monorepo to another, or from a multirepo
to a monorepo.

Event Timeline

jlebar created this revision.Oct 18 2018, 4:24 PM

If we go with this document moving forward, I would first purge it so that it reflects the current situation. Otherwise we could also just archive it and start a new one with the actual action plan.
I feel adding more stuff here will be a bit confusing moving forward.

fedor.sergeev added inline comments.
llvm/docs/Proposals/GitHubMove.rst
874–875

this can be done shorter and faster:
git log --oneline --merges origin/master..mybranch
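As a rough illustration of what that one-liner reports, here is a minimal sketch run against a throwaway repository. The identity, file names and commit messages are invented for the demo, and origin/master is faked with a local ref rather than a real remote.

```shell
#!/bin/sh
set -e
# Throwaway repository; identity, file names and messages are hypothetical.
repo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL=dev@example.com
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

echo base > base.txt && git add base.txt && git commit -q -m "upstream: base"
git checkout -q -b mybranch
echo mine > mine.txt && git add mine.txt && git commit -q -m "downstream: my change"
git checkout -q master
echo more > more.txt && git add more.txt && git commit -q -m "upstream: more"
# Stand-in for the remote-tracking ref that a real clone would have.
git update-ref refs/remotes/origin/master master

git checkout -q mybranch
git merge -q -m "Merge master into mybranch" master

# Lists only merge commits reachable from mybranch but not from origin/master.
merges=$(git log --oneline --merges origin/master..mybranch)
merge_count=$(printf '%s\n' "$merges" | grep -c . || true)
echo "merge commits on mybranch: $merge_count"
```

An empty result would suggest the branch is a plain linear series of commits on top of upstream.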

906

Why not rebase --onto instead of format-patch/am?
I've never used git-am for a bulk set of patches, but my feeling is that rebase would be a bit more "controllable" if something goes wrong with the patches...
(It also has the benefit of not requiring you to put patches somewhere in /tmp :) )

fedor.sergeev added inline comments.Oct 23 2018, 2:48 AM
llvm/docs/Proposals/GitHubMove.rst
906

Ignore the above; it won't work with the --directory=llvm hacks.

fedor.sergeev added inline comments.Oct 23 2018, 6:29 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

I'm afraid this won't work. --since sets a lower bound on the commits shown by git-log, but git-log starts from the top (HEAD), so in most cases it will be equivalent to git log -n1 --author <xxx>, which is absolutely not what's needed here.

Perhaps doing both --since/--until can do the trick. And if we find the times of commits reliable enough then we can automate it:

merge_base=$(git merge-base origin/master my-branch)
merge_time=$(git log -n1 --pretty=%at "$merge_base")
merge_author=$(git log -n1 --pretty="%an <%ae>" "$merge_base")
git log -n1 --since="@$merge_time" --until="@$merge_time" --author="$merge_author"
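To make the idea concrete, here is a minimal sketch that builds two throwaway repositories and looks up, in the new one, the commit matching the old merge base by timestamp and author. The repository names old and new, the identity and the timestamps are all invented. One assumption worth flagging: --since and --until filter on the committer date while %at prints the author date, so the sketch deliberately sets both to the same value.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL=dev@example.com

# commit_at REPO EPOCH SUBJECT: commit with identical author and committer dates.
commit_at() {
  ( cd "$1" && echo "$3" >> log.txt && git add log.txt &&
    GIT_AUTHOR_DATE="$2 +0000" GIT_COMMITTER_DATE="$2 +0000" git commit -q -m "$3" )
}

for r in old new; do
  git init -q "$work/$r"
  git -C "$work/$r" symbolic-ref HEAD refs/heads/master
done

# Old repository: two upstream commits, then a local branch.
commit_at "$work/old" 1540000000 "upstream: A"
commit_at "$work/old" 1540000100 "upstream: B"
git -C "$work/old" update-ref refs/remotes/origin/master master
git -C "$work/old" checkout -q -b my-branch
commit_at "$work/old" 1540000150 "local: my change"

# New repository: same upstream commits on top of an extra root, so hashes differ.
commit_at "$work/new" 1539999900 "monorepo: initial import"
commit_at "$work/new" 1540000000 "upstream: A"
commit_at "$work/new" 1540000100 "upstream: B"
commit_at "$work/new" 1540000200 "upstream: C"

cd "$work/old"
merge_base=$(git merge-base origin/master my-branch)
merge_time=$(git log -n1 --pretty=%at "$merge_base")
merge_author=$(git log -n1 --pretty="%an <%ae>" "$merge_base")

# Search the new repository for a commit with the same time and author.
cd "$work/new"
new_base=$(git log -n1 --pretty=%H --since="@$merge_time" --until="@$merge_time" \
  --author="$merge_author" master)
found=$(git log -n1 --pretty=%s "$new_base")
echo "old $merge_base -> new $new_base ($found)"
```

Two commits by the same author within the same second would make the match ambiguous, which is one reason to keep a human in the loop.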

jlebar updated this revision to Diff 170806.Oct 23 2018, 8:23 PM
jlebar marked 4 inline comments as done.

Address fedor.sergeev's comments.

Thanks a lot for proofreading my totally untested code. :)

I feel adding more stuff here will be a bit confusing moving forward.

I agree, but my goal is to avoid bikeshedding on this as much as possible, and I'm sort of afraid as soon as I create a new page I'm going to get to enjoy that... At some point someone is going to have to write the canonical documentation, and they can move this there. And until that canonical documentation exists, does it hurt too much to have this here?

llvm/docs/Proposals/GitHubMove.rst
874–875

Hm...I *think* you're right. I was worried about weird edge cases, but now I can't articulate any. Done, thanks.

906

These steps have two separate repositories; git rebase only operates within one repository.

I did it this way because if I suggested that pulling both histories into one repository was a good way to do it, people would complain that they can't switch to the new monorepo without making their .git directory larger. :)
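For readers who have not used this workflow, the two-repository approach can be sketched roughly as follows. Everything here is a stand-in: old plays a per-project repository with files at the top level, new plays a monorepo with the same files under llvm/, and origin/master is faked with a local ref.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL=dev@example.com

# Old per-project repository: files live at the top level.
git init -q "$work/old"
cd "$work/old"
git symbolic-ref HEAD refs/heads/master
echo hello > README.txt && git add README.txt && git commit -q -m "upstream: add README"
git update-ref refs/remotes/origin/master master   # stand-in for a real remote
git checkout -q -b my-branch
echo mine >> README.txt && git commit -q -a -m "local: extend README"

# Export the local commits as patch files outside either repository.
git format-patch -q -o "$work/patches" origin/master..my-branch

# New monorepo: the same file lives under llvm/.
git init -q "$work/new"
cd "$work/new"
git symbolic-ref HEAD refs/heads/master
mkdir llvm
echo hello > llvm/README.txt && git add llvm && git commit -q -m "monorepo: add llvm/README"
git checkout -q -b my-branch

# --directory=llvm prepends llvm/ to every path in the patches.
git am -q --directory=llvm "$work"/patches/*.patch
subject=$(git log -n1 --pretty=%s)
echo "replayed: $subject"
```

The replayed commits get new hashes, and neither repository ever needs to hold the other's history.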

934–935

Added --until. I decided not to script it because this is not 100% sound, and so I'd rather encourage a human to be in the loop.

danilaml added inline comments.
llvm/docs/Proposals/GitHubMove.rst
934–935

How would that deal with multiple merges (not necessarily only with origin/master) with many conflicts? Would that essentially require to manually redo all merges?

fedor.sergeev added inline comments.Oct 24 2018, 5:06 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

Then you still need to add %an <%ae> to the merge-base description format, e.g.:

git log -n1 --pretty="format:Date-Time: %aD %at%nAuthor: %an <%ae>%nDescription:%s" $(git merge-base origin/master my-branch)

so users have something to copy-paste from.
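A small sketch of what that format string prints, using a throwaway repository. The identity, dates and commit subjects are invented, and origin/master is a local stand-in ref.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL=dev@example.com
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

echo a > f.txt && git add f.txt
GIT_AUTHOR_DATE="1540000100 +0000" GIT_COMMITTER_DATE="1540000100 +0000" \
  git commit -q -m "upstream: B"
git update-ref refs/remotes/origin/master master   # stand-in for a real remote
git checkout -q -b my-branch
echo b >> f.txt && git commit -q -a -m "local: my change"

# Print date, epoch time, author and subject of the merge base, one per line.
desc=$(git log -n1 \
  --pretty="format:Date-Time: %aD %at%nAuthor: %an <%ae>%nDescription:%s" \
  "$(git merge-base origin/master my-branch)")
printf '%s\n' "$desc"
```

The Author line can then be pasted into --author and the epoch value into --since and --until.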

fedor.sergeev added inline comments.Oct 24 2018, 5:13 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

How would that deal with multiple merges (not necessarily only with origin/master)

This is not trying to recreate all the merges.
It is just a single *additional* merge (done right below this thread of comments).
This merge will essentially join two disjoint histories (having a lot of duplication there for all the upstream
commits that happened to be in your own monorepo).

Recreating merges without duplicating upstream commits requires a substantially more complicated history rewrite, à la git filter-branch, and is not for the faint of heart...

fedor.sergeev added inline comments.Oct 24 2018, 5:33 AM
llvm/docs/Proposals/GitHubMove.rst
927–930

Perhaps this comment should be rewritten into a bit less git-talk form.
For those who are not git-savvy it might be rather unclear that we intend to look
for the latest upstream commit that was already merged into our old monorepo.
Mostly - try to avoid using "merge base" term.

I'm bad at writing good English prose, so I won't be giving any wording suggestions :)

danilaml added inline comments.Oct 24 2018, 6:17 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

Ah, I think I understand. So it's basically nothing more than "find the commit in new final repo corresponding to the latest merged commit from some other llvm git mirror and merge with that"? I.e. if you know you are at HEAD you could just do the merge?

Then I don't find this all that helpful, since IMHO the main problem comes from the actual transition to the new repo and everything it involves, not from finding the point of such a transition. I.e. stuff like rewriting history to use final repo commits instead (if you don't want to live with duplicated commits), or dealing with the merge conflicts again and again (since I don't think git would understand that they were already resolved by a previous non-final merge). The painful stuff.

fedor.sergeev added inline comments.Oct 24 2018, 6:45 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

it's basically nothing more than "find the commit in new final repo"

Exactly.

the main problem comes from the actual transition to the new repo and everything it involves, not finding the point of such transition

It depends on what you want to achieve. If you do not care about past history other than being able to find your own downstream commits, then doing a single merge should be fine.

rewriting history to use final repo commits instead

Yes, rewrites are considerably more complicated.

dealing with the merge conflicts again and again

And here - no, you will not need to perform *any* merge conflict resolution besides the first one described here,
which is, btw, very likely to succeed without conflicts unless you were doing something really weird with your merges. As soon as you are done with the merge, all the commits that go up from the merge point (either upstream or downstream) will conflict only on their contents, not on something that happened before the merge point.

fedor.sergeev added inline comments.Oct 24 2018, 6:50 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

The last statement was a bit of an overstatement, but the point is that your conflicts from now on will not be any different from any other merge conflicts you had before.

danilaml added inline comments.Oct 24 2018, 7:19 AM
llvm/docs/Proposals/GitHubMove.rst
934–935

I was talking about existing merge conflicts. I.e. suppose you've just completed a big merge from one of the llvm mirrors and resolved many non-trivial conflicts. Now, even if you didn't introduce any new changes and decided to merge the llvm git prototype right after that (at the same commit), you'd have to go through the whole tedious process again, i.e. you'd have to resolve all the conflicts again, potentially introducing new bugs. You could just as well do something like merge -s ours and hope for the best (that wouldn't solve duplicated commits though).

Btw, git will refuse to merge with the prototype branch unless you pass --allow-unrelated-histories.
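That refusal is easy to reproduce. The sketch below assumes Git 2.9 or later (the release that introduced the check and the flag) and uses two invented throwaway repositories with no shared history; the ref name monorepo/master is only a label chosen for the fetch.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL=dev@example.com

# Two repositories whose histories share no commits.
git init -q "$work/old"
git -C "$work/old" symbolic-ref HEAD refs/heads/master
( cd "$work/old" && echo a > a.txt && git add a.txt &&
  git commit -q -m "downstream history" )

git init -q "$work/new"
git -C "$work/new" symbolic-ref HEAD refs/heads/master
( cd "$work/new" && mkdir llvm && echo b > llvm/b.txt && git add llvm &&
  git commit -q -m "monorepo history" )

cd "$work/old"
git fetch -q "$work/new" master:refs/remotes/monorepo/master

# A plain merge is expected to be refused because there is no common ancestor.
if git merge -q --no-edit monorepo/master 2>/dev/null; then
  plain=accepted
else
  plain=refused
fi

# With the flag, git joins the two disjoint histories in one merge commit.
git merge -q --allow-unrelated-histories -m "Merge monorepo master" monorepo/master
set -- $(git log -n1 --pretty=%P)
parents=$#
echo "plain merge: $plain; parents of new HEAD: $parents"
```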

bogner added inline comments.Oct 24 2018, 12:03 PM
llvm/docs/Proposals/GitHubMove.rst
866–868

I guess the use case here is for local/in-progress branches, rather than any sort of real downstream consumer. Maybe we should point that out directly rather than having to infer it from the structure of the repos? I guess this one could be called "Local branches from one of the prototype monorepos".

Also it's a bit odd that this is first, before the "multirepo to monorepo" local branches case below ("Local branches from the official LLVM git mirrors"). I expect most developers have that case, since the monorepo prototypes are pretty new and aren't documented as the official way to do things.

872

It's not clear what origin/master is meant to be here, since you don't say what origin is at all. Having the sections based on use cases rather than structure of the repos as I suggest above would probably help.

933–935

I guess the second command here should point at monorepo/master so it doesn't find the exact same commit as git merge-base did. In any case, this is an extremely tedious way to deal with this, especially when you have multiple branches.

938–943

I expect this is the common case for anyone with any significant out of tree changes. We probably need to pay a bit more attention to it.

greened added inline comments.Oct 29 2018, 6:14 PM
llvm/docs/Proposals/GitHubMove.rst
938–943

Agreed. It is definitely the situation for us. I suspect the vast majority of cases will be people using the existing project repositories, with their own long-lived branches that have periodic merges from master. Essentially, downstream looks like a set of "downstream master" branches that periodically merge from the "upstream master" branches to sync up.

In our situation we have multiple such "downstream master" branches, one for each project repository we're using (llvm, compiler-rt, etc.). It's not at all clear to me how we will transition those multiple branches to a single "downstream master" branch on the monorepo.

In the worst case we'll replay all of our downstream commits on top of the monorepo master, but I am hoping there is a better way.

jlebar added inline comments.Oct 29 2018, 6:22 PM
llvm/docs/Proposals/GitHubMove.rst
938–943

I hear that you and others are in this position.

Do you want to rewrite history so it's like you always had a monorepo? That's tantamount to "replaying all of our downstream commits on top of the monorepo master". It's going to be painful, but it sounds like you have a rough idea of how to do it? I'm not sure what a better way would be, in part because I'm not sure what you're trying to accomplish.

Anyway writing such a tool was not what I meant to volunteer for when I volunteered to write this patch. Do you want me to abandon it?

fedor.sergeev added inline comments.Oct 29 2018, 11:00 PM
llvm/docs/Proposals/GitHubMove.rst
938–943

I believe this patch is still very useful, as it at least enumerates the options if not providing full solutions.
Writing a generic automated tool for the most complicated case is a very challenging task, and inability to solve it should not stop us from providing the best guidance we can.

greened added inline comments.Oct 30 2018, 5:09 PM
llvm/docs/Proposals/GitHubMove.rst
938–943

I have to experiment a bit, but it's possible that some unpushed changes to git-subtree that I have might help. If that turns out to be the case, I'd make it a high priority to make those changes available (probably as a fork on GitHub) and we could point people there if they want to try it.

I know that writing such tools is difficult. I did something similar in git-subtree and it took a very long time to get everything working.

The document is certainly useful. You shouldn't abandon it!

bogner added inline comments.Oct 31 2018, 9:25 AM
llvm/docs/Proposals/GitHubMove.rst
938–943

I've sent some thoughts about an approach that makes this a lot easier to deal with to llvmdev. See http://lists.llvm.org/pipermail/llvm-dev/2018-October/127334.html

bjope added a subscriber: bjope.Nov 6 2018, 12:33 AM

Let's submit this soon, even if it will be further edited.

llvm/docs/Proposals/GitHubMove.rst
860–861

Probably worth explicitly mentioning that this is intended for when you're OK with changing your commit hashes as part of the migration.

881

Don't think you need merge-base here. git log origin/master..my-branch should be sufficient?

883–885

I found it easier to output to a single file, using git format-patch --stdout origin/master..my-branch > foo.patch. (but that doesn't really matter, maybe not worth mentioning)

886–889

Easier to just go by svn revision number of the upstream commit, probably?

I'd just go directly:

git checkout -b my-branch 'origin/master^{/llvm-svn=1234\W}' (\W, aka end-of-word, is only necessary if your revision number is fewer digits than current.)
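A sketch of why the trailing \W matters, using made-up revision numbers in a throwaway repository. It assumes git's regex engine accepts \W, which is a GNU extension rather than part of POSIX, and that each commit message carries an llvm-svn=N line as in the comment above; origin/master is a local stand-in ref.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
export GIT_AUTHOR_NAME="Demo Dev" GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME="Demo Dev" GIT_COMMITTER_EMAIL=dev@example.com
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master

# Two upstream commits; the second revision number starts with the first.
echo a > f.txt && git add f.txt && git commit -q -m "change A" -m "llvm-svn=1234"
echo b >> f.txt && git commit -q -a -m "change B" -m "llvm-svn=12345"
git update-ref refs/remotes/origin/master master   # stand-in for a real remote

# rev^{/regex} picks the youngest commit reachable from rev whose message matches.
exact=$(git log -n1 --pretty=%s 'origin/master^{/llvm-svn=1234\W}')
loose=$(git log -n1 --pretty=%s 'origin/master^{/llvm-svn=1234}')
echo "with \\W: $exact; without: $loose"

# Start the new branch at the commit found by the exact match.
git checkout -q -b my-branch 'origin/master^{/llvm-svn=1234\W}'
branch_subject=$(git log -n1 --pretty=%s)
```

Without the \W, the pattern also matches the longer revision number, so the search stops at the newer commit.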

938

Can now also suggest the migrate-downstream-fork.py tool I wrote about on the list. (Could just point to the mailing list archive initially, before adding better instructions).

greened added inline comments.Nov 13 2018, 9:17 AM
llvm/docs/Proposals/GitHubMove.rst
938

Yes, we should absolutely mention migrate-downstream-fork.py. It worked great for our downstream forks. Here's the link to James' post about it:

http://lists.llvm.org/pipermail/llvm-dev/2018-November/127496.html

I also added a tool to zip downstream forks based on a submodule update history:

http://lists.llvm.org/pipermail/llvm-dev/2018-November/127704.html

I think both of these tools will be useful for folks living downstream.

rnk added a subscriber: rnk.Nov 14 2018, 3:39 PM

I ended up finding these instructions accidentally when searching for other monorepo related stuff, and I followed these instructions, and they worked. Thanks for writing them up!

What's the status of this? Since the monorepo prototype seems very likely to be blessed soon, I was thinking of writing up some recipes for common downstream migrations. This document would seem to be the right place to put them but it seems from all the comments that this is very much in flux.

Would it make more sense to create a separate "downstream migration process" document? This one has a lot of content about the whats and whys of the migration while I think most users will want to cut to the chase and know what they should do to start using the monorepo.

jlebar added a comment.Jan 9 2019, 1:17 PM

What's the status of this?

Mostly I'm just kind of a little apprehensive about taking another stab at this patch since getting it into a state that everyone is happy with seems pretty challenging.

As @bogner mentioned, the last section (Multirepo to Monorepo, With Merges) is almost certainly the most common downstream situation. I may write up some recipes under this header and move it to a more prominent position in a separate patch. The discussion of the other situations is orthogonal to the kinds of things I found I needed to do downstream.

I created some instructions in D56550 which should cover the most common cases.