Index: docs/Proposals/GitHubMove.rst
===================================================================
--- docs/Proposals/GitHubMove.rst
+++ docs/Proposals/GitHubMove.rst
@@ -853,6 +853,371 @@
less like to encounter a build failure where a commit change an API in LLVM and another later one "fixes" the build in clang.

+Moving Local Branches to the Monorepo
=====================================

Suppose you have been developing against the existing LLVM git
mirrors. You have one or more git branches that you want to migrate
to the "final monorepo".

The simplest way to migrate such branches is with the
``migrate-downstream-fork.py`` tool at
https://github.com/jyknight/llvm-git-migration.

Basic migration
---------------

Basic instructions for ``migrate-downstream-fork.py`` are in the
Python script itself. The recipe below expands them into a more
general procedure::

  # Make a repository which will become your final local mirror of the
  # monorepo.
  mkdir my-monorepo
  git -C my-monorepo init

  # Add a remote to the monorepo.
  git -C my-monorepo remote add upstream/monorepo https://github.com/llvm-git-prototype/llvm.git

  # Add remotes for each git mirror you use, from upstream as well as
  # your local mirror. All projects are listed here, but you need only
  # import those for which you have local branches.
  my_projects=( clang
                clang-tools-extra
                compiler-rt
                debuginfo-tests
                libcxx
                libcxxabi
                libunwind
                lld
                lldb
                llvm
                openmp
                polly )
  for p in ${my_projects[@]}; do
    git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git
    git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git
  done

  # Pull in all the commits.
  git -C my-monorepo fetch --all

  # Run migrate-downstream-fork to rewrite local branches on top of
  # the upstream monorepo.
+ ( + cd my-monorepo + migrate-downstream-fork.py \ + refs/remotes/local \ + refs/tags \ + --new-repo-prefix=refs/remotes/upstream/monorepo \ + --old-repo-prefix=refs/remotes/upstream/split \ + --revmap-out=monorepo-map.txt + ) + + # Octopus-merge the resulting local split histories to unify them. + + # Assumes local work on local split mirrors is on master (and + # upstream is presumably represented by some other branch like + # upstream/master). + my_local_branch="master" + + git -C my-monorepo branch --no-track local/octopus/master \ + $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \ + refs/remotes/local/split/llvm/${my_local_branch}) + git -C my-monorepo checkout local/octopus/${my_local_branch} + + subproject_branches=() + for p in ${my_projects[@]}; do + subproject_branch=${p}/local/monorepo/${my_local_branch} + git -C my-monorepo branch ${subproject_branch} \ + refs/remotes/local/split/${p}/${my_local_branch} + if [[ "${p}" != "llvm" ]]; then + subproject_branches+=( ${subproject_branch} ) + fi + done + + git -C my-monorepo merge ${subproject_branches[@]} + + for p in ${my_projects[@]}; do + subproject_branch=${p}/local/monorepo/${my_local_branch} + git -C my-monorepo branch -d ${subproject_branch} + done + + # Create local branches for upstream monorepo branches. + for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \ + refs/remotes/upstream/monorepo); do + upstream_branch=${ref#refs/remotes/upstream/monorepo/} + git -C my-monorepo branch upstream/${upstream_branch} ${ref} + done + +The above gets you to a state like the following:: + + U1 - U2 - U3 <- upstream/master + \ \ \ + \ \ - Llld1 - Llld2 - + \ \ \ + \ - Lclang1 - Lclang2-- Lmerge <- local/octopus/master + \ / + - Lllvm1 - Lllvm2----- + +Each branched component has its branch rewritten on top of the +monorepo and all components are unified by a giant octopus merge. 

If additional active local branches need to be preserved, the above
operations following the assignment to ``my_local_branch`` should be
done for each branch. Ref paths will need to be updated to map the
local branch to the corresponding upstream branch. If local branches
have no corresponding upstream branch, then the creation of
``local/octopus/`` need not use ``git merge-base`` to
pinpoint its root commit; it may simply be branched from the
appropriate component branch (say, ``llvm/local_release_X``).

Zipping local history
---------------------

The octopus merge is suboptimal in many cases, because walking back
through the history of one component leaves the other components fixed
at a state that likely makes things unbuildable.

Some downstream users track the order in which commits were made to
subprojects with some kind of "umbrella" project that imports the
project git mirrors as submodules, similar to the multirepo umbrella
proposed above. Such an umbrella repository looks something like
this::

  UM1 ---- UM2 ---- UM3 ---- UM4 ---- UM5 ---- UM6 ---- UM7 <- master
   |        |                 |        |        |        |
  Lllvm1   Llld1             Lclang1  Lclang2  Lllvm2   Llld2

The vertical bars represent submodule updates to a particular commit
in the project mirror. ``UM3`` in this case is a commit of some local
umbrella repository state that is not a submodule update, perhaps a
``README`` or project build script update.

Ideally we would like our local monorepo branch to look something like
this::

  U1 - U2 - U3 <- upstream/master
   \    \    \
    \    -----\---------------
     \         \              \
      - Lllvm1 -- Llld1 - Lclang1 - Lclang2 - Lllvm2 - Llld2 <- local/zip/master

Note that the merge from ``U2`` to ``Lclang1`` is redundant but
harmless. The ``UM3`` commit has been eliminated entirely because its
contents presumably don't apply to the monorepo (more below).
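
The zipped order is driven by which submodules each umbrella commit
updates. As an illustration only (this is not how
``zip-downstream-fork.py`` is implemented), the sketch below walks an
umbrella branch oldest-first and prints the gitlink paths that each
commit changes. Commits that change no gitlink, like ``UM3``, are the
ones that would be dropped. The helper name ``list_submodule_updates``
is hypothetical. Instead of real submodules, the toy umbrella records
made-up gitlink SHAs directly in the index::

  #!/usr/bin/env bash
  # Illustrative sketch only; helper and demo names are hypothetical.
  set -euo pipefail

  # For each first-parent commit of branch $2 in repository $1, oldest
  # first, print "<subject>: <updated submodule paths>".
  list_submodule_updates() {
    local repo=$1 branch=$2 commit paths
    for commit in $(git -C "$repo" rev-list --reverse --first-parent "$branch"); do
      # Mode 160000 marks a gitlink (submodule pointer); --root lets the
      # first commit be compared against the empty tree.
      paths=$(git -C "$repo" diff-tree --root --no-commit-id -r "$commit" |
                awk '$2 == "160000" { print $6 }' | tr '\n' ' ')
      paths=${paths% }
      printf '%s: %s\n' "$(git -C "$repo" log -1 --format=%s "$commit")" \
             "${paths:-(no submodule update)}"
    done
  }

  # Build a toy umbrella whose gitlinks point at made-up SHAs.
  demo=$(mktemp -d)
  git -C "$demo" init -q
  git -C "$demo" config user.name "Demo User"
  git -C "$demo" config user.email "demo@example.com"

  # umbrella_commit <subject> [<path> <sha>]: update a gitlink if given,
  # otherwise make an ordinary file change.
  umbrella_commit() {
    if [ $# -eq 3 ]; then
      git -C "$demo" update-index --add --cacheinfo "160000,$3,$2"
    else
      echo "$1" >> "$demo/README"
      git -C "$demo" add README
    fi
    git -C "$demo" commit -q -m "$1"
  }
  umbrella_commit UM1 llvm 1111111111111111111111111111111111111111
  umbrella_commit UM2 lld  2222222222222222222222222222222222222222
  umbrella_commit UM3
  umbrella_commit UM4 llvm 3333333333333333333333333333333333333333

  list_submodule_updates "$demo" HEAD

For the toy umbrella this prints ``UM1: llvm``, ``UM2: lld``,
``UM3: (no submodule update)`` and ``UM4: llvm``, one per line.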

The tool ``zip-downstream-fork.py`` at
https://github.com/jyknight/llvm-git-migration/pull/2 can be used to
convert the umbrella history into the desired history. Given the
above run of ``migrate-downstream-fork.py``, a recipe to create the
zipped history is below::

  # Import any non-LLVM repositories the umbrella references.
  git -C my-monorepo remote add localrepo https://my.local.mirror.org/localrepo.git
  git -C my-monorepo fetch localrepo

  # Import umbrella history.
  git -C my-monorepo remote add umbrella https://my.local.mirror.org/umbrella.git
  git -C my-monorepo fetch umbrella

  (
      cd my-monorepo
      zip-downstream-fork.py \
        refs/remotes/umbrella \
        --new-repo-prefix=refs/remotes/upstream/monorepo \
        --revmap-in=monorepo-map.txt \
        --revmap-out=zip-map.txt \
        --skipped-pick-first
  )

  # Create the zip branch (assuming umbrella master is wanted).
  git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master

``zip-downstream-fork.py`` will skip any commits that don't update
submodules. It cannot know in general whether the trees of those
commits might conflict with trees in the monorepo (for example, a
top-level ``README`` update), so such commits are simply discarded.
Enhancements to move such commits under a subdirectory should be
straightforward to make.

The ``--skipped-pick-first`` option tells ``zip-downstream-fork.py``
what to do when a merge commit is skipped. Some commit has to
substitute for a skipped commit in the history so that children of the
skipped commit have a parent to reference. For commits with a single
parent, this is straightforward: the parent commit substitutes for the
skipped commit. For merges, ``--skipped-pick-first`` says that the
first parent of the merge substitutes for the skipped merge. Leaving
``--skipped-pick-first`` off the command line will cause
``zip-downstream-fork.py`` to drop into an interactive session in which
the user tells it what to do.
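
The substitution rule can be modeled in a few lines. The sketch below
is not code from ``zip-downstream-fork.py``. It is a hypothetical
model that assumes bash 4 or later for associative arrays. In the
model, a skipped commit is replaced by its first parent (the only
parent for a non-merge, the ``--skipped-pick-first`` choice for a
merge), and this repeats while the replacement is itself skipped::

  #!/usr/bin/env bash
  # Hypothetical model of parent substitution for skipped commits.
  set -euo pipefail

  # Parent lists (first parent first) for a made-up umbrella history.
  declare -A parents=(
    [UM1]=""
    [UM2]="UM1"
    [UM3]="UM2"      # single-parent commit that will be skipped
    [X1]="UM1"
    [UM4]="UM3 X1"   # merge commit that will also be skipped
    [UM5]="UM4"
  )
  declare -A skipped=( [UM3]=1 [UM4]=1 )

  # Print the commit standing in for $1: $1 itself if kept, otherwise
  # the stand-in of its first parent.
  substitute() {
    local c=$1
    while [[ -n ${skipped[$c]:-} ]]; do
      read -r c _ <<< "${parents[$c]}"
    done
    printf '%s\n' "$c"
  }

  # Print the rewritten parent list of a kept commit.
  rewritten_parents() {
    local p out=()
    for p in ${parents[$1]}; do
      out+=( "$(substitute "$p")" )
    done
    printf '%s\n' "${out[*]}"
  }

  rewritten_parents UM5   # prints: UM2

In this model, the parent of ``UM5`` is ``UM4``, a skipped merge, so
its first parent ``UM3`` is tried next. ``UM3`` is skipped too, so
``UM2`` becomes the new parent of ``UM5``. The side branch through
``X1`` is then no longer reachable from ``UM5``.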

Note that if the umbrella has submodules pointing to non-LLVM
repositories, ``zip-downstream-fork.py`` needs to know about them to
be able to rewrite commits. That is why the first step of the recipe
above is to fetch commits from such repositories.

Importing local repositories
----------------------------

You may have additional repositories that integrate with the LLVM
ecosystem, essentially extending it with new tools. If such
repositories are tightly coupled with LLVM, it may make sense to
import them into your local mirror of the monorepo.

If such repositories participated in the umbrella repository used
during the zipping process above, they will automatically be added to
the monorepo. For downstream repositories that don't participate in
an umbrella setup, the ``import-downstream-repo.py`` tool at
https://github.com/jyknight/llvm-git-migration/pull/6 can help with
getting them into the monorepo. A recipe follows::

  # Import repo history.
  git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git
  git -C my-monorepo fetch myrepo

  my_local_tags=( refs/tags/release
                  refs/tags/hotfix )

  (
      cd my-monorepo
      import-downstream-repo.py \
        refs/remotes/myrepo \
        ${my_local_tags[@]} \
        --new-repo-prefix=refs/remotes/upstream/monorepo \
        --subdir=myrepo \
        --tags-prefix=myrepo
  )

  # Preserve release branches.
  for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \
                   refs/remotes/myrepo/release); do
    branch=${ref#refs/remotes/myrepo/}
    git -C my-monorepo branch --no-track myrepo/${branch} ${ref}
  done

  # Preserve master.
  git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master

  # Merge master.
  git -C my-monorepo checkout local/zip/master  # Or local/octopus/master
  git -C my-monorepo merge myrepo/master

You may want to merge other corresponding branches, for example
``myrepo`` release branches if they were in lockstep with LLVM project
releases.
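
Annotated and lightweight tags are treated differently during the
import, so it can help to know beforehand which kind a repository such
as ``myrepo`` contains. ``git for-each-ref`` reports each tag's object
type: ``tag`` for an annotated tag, and the target's type (usually
``commit``) for a lightweight one. The sketch below demonstrates this
on a throwaway repository. The tag names are hypothetical::

  #!/usr/bin/env bash
  # Classify tags as annotated or lightweight in a scratch repository.
  set -euo pipefail

  demo=$(mktemp -d)
  git -C "$demo" init -q
  git -C "$demo" config user.name "Demo User"
  git -C "$demo" config user.email "demo@example.com"
  git -C "$demo" commit -q --allow-empty -m R1

  git -C "$demo" tag -a -m "annotated" tag1   # annotated: a tag object
  git -C "$demo" tag tag2                      # lightweight: points at the commit

  # %(objecttype) is "tag" for annotated tags and the target's type
  # (usually "commit") for lightweight ones.
  git -C "$demo" for-each-ref --format='%(objecttype) %(refname:short)' refs/tags |
    while read -r type name; do
      if [ "$type" = tag ]; then
        echo "annotated:   $name"
      else
        echo "lightweight: $name"
      fi
    done

Against a real repository, the same ``for-each-ref`` pipeline can be
run in a clone of ``myrepo`` before the import.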

``--tags-prefix`` tells ``import-downstream-repo.py`` to rename
annotated tags with the given prefix. Due to limitations with
``fast_filter_branch.py``, unannotated tags cannot be renamed
(``fast_filter_branch.py`` considers them branches, not tags). Since
the upstream monorepo had its tags rewritten with an "llvm.org-"
prefix, name conflicts should not be an issue. ``--tags-prefix`` can
be used to more clearly indicate which tags correspond to various
imported repositories.

Given this repository history::

  R1 - R2 - R3 <- master
       ^
       |
      tag1

The above recipe results in a history like this::

  U1 - U2 - U3 <- upstream/master
   \    \    \
    \    -----\---------------
     \         \              \
      - Lllvm1 -- Llld1 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - M1 <- local/zip/master
                                                              /
                                                 R1 - R2 - R3 <- myrepo/master
                                                      ^
                                                      |
                                                 myrepo-tag1

Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs
from ``myrepo``. There are some options to
``import-downstream-repo.py`` that attempt to rewrite ``myrepo`` trees
on top of monorepo trees, but they are untested and assumed to be
broken. For most cases the above should be sufficient. If you require
commits from ``myrepo`` to be interleaved with commits on local project
branches (for example, interleaved with ``Lllvm1``, ``Lllvm2``, etc.
above) and ``myrepo`` doesn't appear in an umbrella repository, a new
tool will need to be developed. Creating such a tool would involve:

1. Modifying ``fast_filter_branch.py`` to optionally take a
   revlist directly rather than generating it itself.

2. Creating a tool to generate an interleaved ordering of local
   commits based on some criteria (``zip-downstream-fork.py`` uses the
   umbrella history as its criterion).

3. Generating such an ordering and feeding it to
   ``fast_filter_branch.py`` as a revlist.

Some care will also likely need to be taken to handle merge commits,
to ensure the parents of such commits migrate correctly.
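
As a rough illustration of step 2, the sketch below produces one
oldest-first ordering of local commits drawn from several branches,
using committer date as the criterion. The criterion, the helper name
``interleave_by_date`` and the toy branch names are assumptions made
for illustration. Merge commits are left out entirely, and feeding the
list to ``fast_filter_branch.py`` (step 3) is not shown::

  #!/usr/bin/env bash
  # Sketch: interleave local commits from several branches by committer date.
  set -euo pipefail

  # interleave_by_date <repo> <base> <branch>...
  # Print, oldest first, each non-merge commit reachable from a branch but
  # not from <base>; a commit shared by several branches is listed once.
  interleave_by_date() {
    local repo=$1 base=$2 branch
    shift 2
    for branch in "$@"; do
      git -C "$repo" log --no-merges --format='%ct %H' "$base..$branch"
    done | sort -n -s -k1,1 | awk '!seen[$2]++ { print $2 }'
  }

  # Toy repository: two branches forked from "base" with fixed dates.
  demo=$(mktemp -d)
  git -C "$demo" init -q
  git -C "$demo" config user.name "Demo User"
  git -C "$demo" config user.email "demo@example.com"
  git -C "$demo" commit -q --allow-empty -m base
  git -C "$demo" branch -M base
  git -C "$demo" branch llvm base
  git -C "$demo" branch myrepo base

  # commit_at <branch> <unix-time> <subject>: empty commit at a fixed time.
  commit_at() {
    git -C "$demo" checkout -q "$1"
    GIT_AUTHOR_DATE="@$2 +0000" GIT_COMMITTER_DATE="@$2 +0000" \
      git -C "$demo" commit -q --allow-empty -m "$3"
  }
  commit_at llvm   1700000100 Lllvm1
  commit_at myrepo 1700000200 R1
  commit_at llvm   1700000300 Lllvm2

  for c in $(interleave_by_date "$demo" base llvm myrepo); do
    git -C "$demo" log -1 --format=%s "$c"
  done

With the toy history this prints ``Lllvm1``, ``R1`` and ``Lllvm2`` in
that order. Committer dates are only a heuristic. Rebased or
cherry-picked commits receive new committer dates, so a real tool may
need a different criterion.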

Scrubbing the Local Monorepo
----------------------------

Once all of the migrating, zipping and importing is done, it's time to
clean up. The Python tools use ``git fast-import``, which leaves a lot
of cruft around, and we want to shrink our new monorepo mirror as much
as possible. Here is one way to do it::

  git -C my-monorepo checkout master

  # Delete branches we no longer need. Do this for any other branches
  # you merged above.
  git -C my-monorepo branch -D local/zip/master || true
  git -C my-monorepo branch -D local/octopus/master || true

  # Remove remotes.
  git -C my-monorepo remote remove upstream/monorepo

  for p in ${my_projects[@]}; do
    git -C my-monorepo remote remove upstream/split/${p}
    git -C my-monorepo remote remove local/split/${p}
  done

  git -C my-monorepo remote remove localrepo
  git -C my-monorepo remote remove umbrella
  git -C my-monorepo remote remove myrepo

  # Add anything else here you don't need. refs/tags/release is
  # listed below assuming tags have been rewritten with a local prefix.
  # If not, remove it from this list.
  refs_to_clean=(
    refs/original
    refs/remotes
    refs/tags/backups
    refs/tags/release
  )

  git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} |
    xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d

  git -C my-monorepo reflog expire --all --expire=now

  # fast_filter_branch.py might have gc running in the background, so
  # retry until no other gc holds the repository lock.
  while ! git -C my-monorepo \
            -c gc.reflogExpire=0 \
            -c gc.reflogExpireUnreachable=0 \
            -c gc.rerereresolved=0 \
            -c gc.rerereunresolved=0 \
            -c gc.pruneExpire=now \
            gc --prune=now; do
    continue
  done

  # Takes a LOOOONG time!
  git -C my-monorepo repack -A -d -f --depth=250 --window=250

  git -C my-monorepo prune-packed
  git -C my-monorepo prune

You should now have a trim monorepo. Upload it to your git server and
happy hacking!

References
==========