Index: llvm/trunk/docs/Proposals/GitHubMove.rst =================================================================== --- llvm/trunk/docs/Proposals/GitHubMove.rst +++ llvm/trunk/docs/Proposals/GitHubMove.rst @@ -852,6 +852,540 @@ less like to encounter a build failure where a commit change an API in LLVM and another later one "fixes" the build in clang. +Moving Local Branches to the Monorepo +===================================== + +Suppose you have been developing against the existing LLVM git +mirrors. You have one or more git branches that you want to migrate +to the "final monorepo". + +The simplest way to migrate such branches is with the +``migrate-downstream-fork.py`` tool at +https://github.com/jyknight/llvm-git-migration. + +Basic migration +--------------- + +Basic instructions for ``migrate-downstream-fork.py`` are in the +Python script and are expanded on below to a more general recipe:: + + # Make a repository which will become your final local mirror of the + # monorepo. + mkdir my-monorepo + git -C my-monorepo init + + # Add a remote to the monorepo. + git -C my-monorepo remote add upstream/monorepo https://github.com/llvm/llvm-project.git + + # Add remotes for each git mirror you use, from upstream as well as + # your local mirror. All projects are listed here but you need only + # import those for which you have local branches. + my_projects=( clang + clang-tools-extra + compiler-rt + debuginfo-tests + libcxx + libcxxabi + libunwind + lld + lldb + llvm + openmp + polly ) + for p in ${my_projects[@]}; do + git -C my-monorepo remote add upstream/split/${p} https://github.com/llvm-mirror/${p}.git + git -C my-monorepo remote add local/split/${p} https://my.local.mirror.org/${p}.git + done + + # Pull in all the commits. + git -C my-monorepo fetch --all + + # Run migrate-downstream-fork to rewrite local branches on top of + # the upstream monorepo. + ( + cd my-monorepo + migrate-downstream-fork.py \ + refs/remotes/local \ + refs/tags \ + --new-repo-prefix=refs/remotes/upstream/monorepo \ + --old-repo-prefix=refs/remotes/upstream/split \ + --source-kind=split \ + --revmap-out=monorepo-map.txt + ) + + # Octopus-merge the resulting local split histories to unify them. + + # Assumes local work on local split mirrors is on master (and + # upstream is presumably represented by some other branch like + # upstream/master). + my_local_branch="master" + + git -C my-monorepo branch --no-track local/octopus/master \ + $(git -C my-monorepo merge-base refs/remotes/upstream/monorepo/master \ + refs/remotes/local/split/llvm/${my_local_branch}) + git -C my-monorepo checkout local/octopus/${my_local_branch} + + subproject_branches=() + for p in ${my_projects[@]}; do + subproject_branch=${p}/local/monorepo/${my_local_branch} + git -C my-monorepo branch ${subproject_branch} \ + refs/remotes/local/split/${p}/${my_local_branch} + if [[ "${p}" != "llvm" ]]; then + subproject_branches+=( ${subproject_branch} ) + fi + done + + git -C my-monorepo merge ${subproject_branches[@]} + + for p in ${my_projects[@]}; do + subproject_branch=${p}/local/monorepo/${my_local_branch} + git -C my-monorepo branch -d ${subproject_branch} + done + + # Create local branches for upstream monorepo branches. + for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \ + refs/remotes/upstream/monorepo); do + upstream_branch=${ref#refs/remotes/upstream/monorepo/} + git -C my-monorepo branch upstream/${upstream_branch} ${ref} + done + +The above gets you to a state like the following:: + + U1 - U2 - U3 <- upstream/master + \ \ \ + \ \ - Llld1 - Llld2 - + \ \ \ + \ - Lclang1 - Lclang2-- Lmerge <- local/octopus/master + \ / + - Lllvm1 - Lllvm2----- + +Each branched component has its branch rewritten on top of the +monorepo and all components are unified by a giant octopus merge. + +If additional active local branches need to be preserved, the above +operations following the assignment to ``my_local_branch`` should be +done for each branch. Ref paths will need to be updated to map the +local branch to the corresponding upstream branch. If local branches +have no corresponding upstream branch, then the creation of +``local/octopus/`` need not use ``git-merge-base`` to +pinpont its root commit; it may simply be branched from the +appropriate component branch (say, ``llvm/local_release_X``). + +Zipping local history +--------------------- + +The octopus merge is suboptimal for many cases, because walking back +through the history of one component leaves the other components fixed +at a history that likely makes things unbuildable. + +Some downstream users track the order commits were made to subprojects +with some kind of "umbrella" project that imports the project git +mirrors as submodules, similar to the multirepo umbrella proposed +above. Such an umbrella repository looks something like this:: + + UM1 ---- UM2 -- UM3 -- UM4 ---- UM5 ---- UM6 ---- UM7 ---- UM8 <- master + | | | | | | | + Lllvm1 Llld1 Lclang1 Lclang2 Lllvm2 Llld2 Lmyproj1 + +The vertical bars represent submodule updates to a particular local +commit in the project mirror. ``UM3`` in this case is a commit of +some local umbrella repository state that is not a submodule update, +perhaps a ``README`` or project build script update. Commit ``UM8`` +updates a submodule of local project ``myproj``. + +The tool ``zip-downstream-fork.py`` at +https://github.com/greened/llvm-git-migration/tree/zip can be used to +convert the umbrella history into a monorepo-based history with +commits in the order implied by submodule updates:: + + U1 - U2 - U3 <- upstream/master + \ \ \ + \ -----\--------------- local/zip--. + \ \ \ | + - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 <-' + + +The ``U*`` commits represent upstream commits to the monorepo master +branch. Each submodule update in the local ``UM*`` commits brought in +a subproject tree at some local commit. The trees in the ``L*1`` +commits represent merges from upstream. These result in edges from +the ``U*`` commits to their corresponding rewritten ``L*1`` commits. +The ``L*2`` commits did not do any merges from upstream. + +Note that the merge from ``U2`` to ``Lclang1`` appears redundant, but +if, say, ``U3`` changed some files in upstream clang, the ``Lclang1`` +commit appearing after the ``Llld1`` commit would actually represent a +clang tree *earlier* in the upstream clang history. We want the +``local/zip`` branch to accurately represent the state of our umbrella +history and so the edge ``U2 -> Lclang1`` is a visual reminder of what +clang's tree actually looks like in ``Lclang1``. + +Even so, the edge ``U3 -> Llld1`` could be problematic for future +merges from upstream. git will think that we've already merged from +``U3``, and we have, except for the state of the clang tree. One +possible migitation strategy is to manually diff clang between ``U2`` +and ``U3`` and apply those updates to ``local/zip``. Another, +possibly simpler strategy is to freeze local work on downstream +branches and merge all submodules from the latest upstream before +running ``zip-downstream-fork.py``. If downstream merged each project +from upstream in lockstep without any intervening local commits, then +things should be fine without any special action. We anticipate this +to be the common case. + +The tree for ``Lclang1`` outside of clang will represent the state of +things at ``U3`` since all of the upstream projects not participating +in the umbrella history should be in a state respecting the commit +``U3``. The trees for llvm and lld should correctly represent commits +``Lllvm1`` and ``Llld1``, respectively. + +Commit ``UM3`` changed files not related to submodules and we need +somewhere to put them. It is not safe in general to put them in the +monorepo root directory because they may conflict with files in the +monorepo. Let's assume we want them in a directory ``local`` in the +monorepo. + +**Example 1: Umbrella looks like the monorepo** + +For this example, we'll assume that each subproject appears in its own +top-level directory in the umbrella, just as they do in the monorepo . +Let's also assume that we want the files in directory ``myproj`` to +appear in ``local/myproj``. + +Given the above run of ``migrate-downstream-fork.py``, a recipe to +create the zipped history is below:: + + # Import any non-LLVM repositories the umbrella references. + git -C my-monorepo remote add localrepo \ + https://my.local.mirror.org/localrepo.git + git fetch localrepo + + subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc + libcxx libcxxabi libunwind lld lldb llgo llvm openmp + parallel-libs polly pstl ) + + # Import histories for upstream split projects (this was probably + # already done for the ``migrate-downstream-fork.py`` run). + for project in ${subprojects[@]}; do + git remote add upstream/split/${project} \ + https://github.com/llvm-mirror/${subproject}.git + git fetch umbrella/split/${project} + done + + # Import histories for downstream split projects (this was probably + # already done for the ``migrate-downstream-fork.py`` run). + for project in ${subprojects[@]}; do + git remote add local/split/${project} \ + https://my.local.mirror.org/${subproject}.git + git fetch local/split/${project} + done + + # Import umbrella history. + git -C my-monorepo remote add umbrella \ + https://my.local.mirror.org/umbrella.git + git fetch umbrella + + # Put myproj in local/myproj + echo "myproj local/myproj" > my-monorepo/submodule-map.txt + + # Rewrite history + ( + cd my-monorepo + zip-downstream-fork.py \ + refs/remotes/umbrella \ + --new-repo-prefix=refs/remotes/upstream/monorepo \ + --old-repo-prefix=refs/remotes/upstream/split \ + --revmap-in=monorepo-map.txt \ + --revmap-out=zip-map.txt \ + --subdir=local \ + --submodule-map=submodule-map.txt \ + --update-tags + ) + + # Create the zip branch (assuming umbrella master is wanted). + git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master + +Note that if the umbrella has submodules to non-LLVM repositories, +``zip-downstream-fork.py`` needs to know about them to be able to +rewrite commits. That is why the first step above is to fetch commits +from such repositories. + +With ``--update-tags`` the tool will migrate annotated tags pointing +to submodule commits that were inlined into the zipped history. If +the umbrella pulled in an upstream commit that happened to have a tag +pointing to it, that tag will be migrated, which is almost certainly +not what is wanted. The tag can always be moved back to its original +commit after rewriting, or the ``--update-tags`` option may be +discarded and any local tags would then be migrated manually. + +**Example 2: Nested sources layout** + +The tool handles nested submodules (e.g. llvm is a submodule in +umbrella and clang is a submodule in llvm). The file +``submodule-map.txt`` is a list of pairs, one per line. The first +pair item describes the path to a submodule in the umbrella +repository. The second pair item secribes the path where trees for +that submodule should be written in the zipped history. + +Let's say your umbrella repository is actually the llvm repository and +it has submodules in the "nested sources" layout (clang in +tools/clang, etc.). Let's also say ``projects/myproj`` is a submodule +pointing to some downstream repository. The submodule map file should +look like this (we still want myproj mapped the same way as +previously):: + + tools/clang clang + tools/clang/tools/extra clang-tools-extra + projects/compiler-rt compiler-rt + projects/debuginfo-tests debuginfo-tests + projects/libclc libclc + projects/libcxx libcxx + projects/libcxxabi libcxxabi + projects/libunwind libunwind + tools/lld lld + tools/lldb lldb + projects/openmp openmp + tools/polly polly + projects/myproj local/myproj + +If a submodule path does not appear in the map, the tools assumes it +should be placed in the same place in the monorepo. That means if you +use the "nested sources" layout in your umrella, you *must* provide +map entries for all of the projects in your umbrella (except llvm). +Otherwise trees from submodule updates will appear underneath llvm in +the zippped history. + +Because llvm is itself the umbrella, we use --subdir to write its +content into ``llvm`` in the zippped history:: + + # Import any non-LLVM repositories the umbrella references. + git -C my-monorepo remote add localrepo \ + https://my.local.mirror.org/localrepo.git + git fetch localrepo + + subprojects=( clang clang-tools-extra compiler-rt debuginfo-tests libclc + libcxx libcxxabi libunwind lld lldb llgo llvm openmp + parallel-libs polly pstl ) + + # Import histories for upstream split projects (this was probably + # already done for the ``migrate-downstream-fork.py`` run). + for project in ${subprojects[@]}; do + git remote add upstream/split/${project} \ + https://github.com/llvm-mirror/${subproject}.git + git fetch umbrella/split/${project} + done + + # Import histories for downstream split projects (this was probably + # already done for the ``migrate-downstream-fork.py`` run). + for project in ${subprojects[@]}; do + git remote add local/split/${project} \ + https://my.local.mirror.org/${subproject}.git + git fetch local/split/${project} + done + + # Import umbrella history. We want this under a different refspec + # so zip-downstream-fork.py knows what it is. + git -C my-monorepo remote add umbrella \ + https://my.local.mirror.org/llvm.git + git fetch umbrella + + # Create the submodule map. + echo "tools/clang clang" > my-monorepo/submodule-map.txt + echo "tools/clang/tools/extra clang-tools-extra" >> my-monorepo/submodule-map.txt + echo "projects/compiler-rt compiler-rt" >> my-monorepo/submodule-map.txt + echo "projects/debuginfo-tests debuginfo-tests" >> my-monorepo/submodule-map.txt + echo "projects/libclc libclc" >> my-monorepo/submodule-map.txt + echo "projects/libcxx libcxx" >> my-monorepo/submodule-map.txt + echo "projects/libcxxabi libcxxabi" >> my-monorepo/submodule-map.txt + echo "projects/libunwind libunwind" >> my-monorepo/submodule-map.txt + echo "tools/lld lld" >> my-monorepo/submodule-map.txt + echo "tools/lldb lldb" >> my-monorepo/submodule-map.txt + echo "projects/openmp openmp" >> my-monorepo/submodule-map.txt + echo "tools/polly polly" >> my-monorepo/submodule-map.txt + echo "projects/myproj local/myproj" >> my-monorepo/submodule-map.txt + + # Rewrite history + ( + cd my-monorepo + zip-downstream-fork.py \ + refs/remotes/umbrella \ + --new-repo-prefix=refs/remotes/upstream/monorepo \ + --old-repo-prefix=refs/remotes/upstream/split \ + --revmap-in=monorepo-map.txt \ + --revmap-out=zip-map.txt \ + --subdir=llvm \ + --submodule-map=submodule-map.txt \ + --update-tags + ) + + # Create the zip branch (assuming umbrella master is wanted). + git -C my-monorepo branch --no-track local/zip/master refs/remotes/umbrella/master + + +Comments at the top of ``zip-downstream-fork.py`` describe in more +detail how the tool works and various implications of its operation. + +Importing local repositories +---------------------------- + +You may have additional repositories that integrate with the LLVM +ecosystem, essentially extending it with new tools. If such +repositories are tightly coupled with LLVM, it may make sense to +import them into your local mirror of the monorepo. + +If such repositores participated in the umbrella repository used +during the zipping process above, they will automatically be added to +the monorepo. For downstream repositories that don't participate in +an umbrella setup, the ``import-downstream-repo.py`` tool at +https://github.com/greened/llvm-git-migration/tree/import can help with +getting them into the monorepo. A recipe follows:: + + # Import downstream repo history into the monorepo. + git -C my-monorepo remote add myrepo https://my.local.mirror.org/myrepo.git + git fetch myrepo + + my_local_tags=( refs/tags/release + refs/tags/hotfix ) + + ( + cd my-monorepo + import-downstream-repo.py \ + refs/remotes/myrepo \ + ${my_local_tags[@]} \ + --new-repo-prefix=refs/remotes/upstream/monorepo \ + --subdir=myrepo \ + --tag-prefix="myrepo-" + ) + + # Preserve release braches. + for ref in $(git -C my-monorepo for-each-ref --format="%(refname)" \ + refs/remotes/myrepo/release); do + branch=${ref#refs/remotes/myrepo/} + git -C my-monorepo branch --no-track myrepo/${branch} ${ref} + done + + # Preserve master. + git -C my-monorepo branch --no-track myrepo/master refs/remotes/myrepo/master + + # Merge master. + git -C my-monorepo checkout local/zip/master # Or local/octopus/master + git -C my-monorepo merge myrepo/master + +You may want to merge other corresponding branches, for example +``myrepo`` release branches if they were in lockstep with LLVM project +releases. + +``--tag-prefix`` tells ``import-downstream-repo.py`` to rename +annotated tags with the given prefix. Due to limitations with +``fast_filter_branch.py``, unannotated tags cannot be renamed +(``fast_filter_branch.py`` considers them branches, not tags). Since +the upstream monorepo had its tags rewritten with an "llvmorg-" +prefix, name conflicts should not be an issue. ``--tag-prefix`` can +be used to more clearly indicate which tags correspond to various +imported repositories. + +Given this repository history:: + + R1 - R2 - R3 <- master + ^ + | + release/1 + +The above recipe results in a history like this:: + + U1 - U2 - U3 <- upstream/master + \ \ \ + \ -----\--------------- local/zip--. + \ \ \ | + - Lllvm1 - Llld1 - UM3 - Lclang1 - Lclang2 - Lllvm2 - Llld2 - Lmyproj1 - M1 <-' + / + R1 - R2 - R3 <-. + ^ | + | | + myrepo-release/1 | + | + myrepo/master--' + +Commits ``R1``, ``R2`` and ``R3`` have trees that *only* contain blobs +from ``myrepo``. If you require commits from ``myrepo`` to be +interleaved with commits on local project branches (for example, +interleaved with ``llvm1``, ``llvm2``, etc. above) and myrepo doesn't +appear in an umbrella repository, a new tool will need to be +developed. Creating such a tool would involve: + +1. Modifying ``fast_filter_branch.py`` to optionally take a + revlist directly rather than generating it itself + +2. Creating a tool to generate an interleaved ordering of local + commits based on some criteria (``zip-downstream-fork.py`` uses the + umbrella history as its criterion) + +3. Generating such an ordering and feeding it to + ``fast_filter_branch.py`` as a revlist + +Some care will also likely need to be taken to handle merge commits, +to ensure the parents of such commits migrate correctly. + +Scrubbing the Local Monorepo +---------------------------- + +Once all of the migrating, zipping and importing is done, it's time to +clean up. The python tools use ``git-fast-import`` which leaves a lot +of cruft around and we want to shrink our new monorepo mirror as much +as possible. Here is one way to do it:: + + git -C my-monorepo checkout master + + # Delete branches we no longer need. Do this for any other branches + # you merged above. + git -C my-monorepo branch -D local/zip/master || true + git -C my-monorepo branch -D local/octopus/master || true + + # Remove remotes. + git -C my-monorepo remote remove upstream/monorepo + + for p in ${my_projects[@]}; do + git -C my-monorepo remote remove upstream/split/${p} + git -C my-monorepo remote remove local/split/${p} + done + + git -C my-monorepo remote remove localrepo + git -C my-monorepo remote remove umbrella + git -C my-monorepo remote remove myrepo + + # Add anything else here you don't need. refs/tags/release is + # listed below assuming tags have been rewritten with a local prefix. + # If not, remove it from this list. + refs_to_clean=( + refs/original + refs/remotes + refs/tags/backups + refs/tags/release + ) + + git -C my-monorepo for-each-ref --format="%(refname)" ${refs_to_clean[@]} | + xargs -n1 --no-run-if-empty git -C my-monorepo update-ref -d + + git -C my-monorepo reflog expire --all --expire=now + + # fast_filter_branch.py might have gc running in the background. + while ! git -C my-monorepo \ + -c gc.reflogExpire=0 \ + -c gc.reflogExpireUnreachable=0 \ + -c gc.rerereresolved=0 \ + -c gc.rerereunresolved=0 \ + -c gc.pruneExpire=now \ + gc --prune=now; do + continue + done + + # Takes a LOOOONG time! + git -C my-monorepo repack -A -d -f --depth=250 --window=250 + + git -C my-monorepo prune-packed + git -C my-monorepo prune + +You should now have a trim monorepo. Upload it to your git server and +happy hacking! References ==========