Index: llvm/trunk/docs/Proposals/GitHubMove.rst =================================================================== --- llvm/trunk/docs/Proposals/GitHubMove.rst +++ llvm/trunk/docs/Proposals/GitHubMove.rst @@ -0,0 +1,868 @@ +============================== +Moving LLVM Projects to GitHub +============================== + +.. contents:: Table of Contents + :depth: 4 + :local: + +Introduction +============ + +This is a proposal to move our current revision control system from our own +hosted Subversion to GitHub. Below are the financial and technical arguments as +to why we are proposing such a move and how people (and validation +infrastructure) will continue to work with a Git-based LLVM. + +There will be a survey pointing at this document which we'll use to gauge the +community's reaction and, if we collectively decide to move, the time-frame. Be +sure to make your view count. + +Additionally, we will discuss this during a BoF at the next US LLVM Developer +meeting (http://llvm.org/devmtg/2016-11/). + +What This Proposal is *Not* About +================================= + +Changing the development policy. + +This proposal relates only to moving the hosting of our source-code repository +from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing +using GitHub's issue tracker, pull-requests, or code-review. + +Contributors will continue to earn commit access on demand under the Developer +Policy, except that a GitHub account will be required instead of SVN +username/password-hash. + +Why Git, and Why GitHub? +======================== + +Why Move At All? +---------------- + +This discussion began because we currently host our own Subversion server +and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and +provides limited support, but there is only so much it can do. + +Volunteers are not sysadmins themselves, but compiler engineers that happen +to know a thing or two about hosting servers.
We also don't have 24/7 support, +and we sometimes wake up to see that continuous integration is broken because +the SVN server is either down or unresponsive. + +We should take advantage of one of the services out there (GitHub, GitLab, +and BitBucket, among others) that offer better service (24/7 stability, disk +space, Git server, code browsing, forking facilities, etc) for free. + +Why Git? +-------- + +Many new coders nowadays start with Git, and a lot of people have never used +SVN, CVS, or anything else. Websites like GitHub have changed the landscape +of open source contributions, reducing the cost of first contribution and +fostering collaboration. + +Git is also the version control many LLVM developers use. Despite the +sources being stored in a SVN server, these developers are already using Git +through the Git-SVN integration. + +Git allows you to: + +* Commit, squash, merge, and fork locally without touching the remote server. +* Maintain local branches, enabling multiple threads of development. +* Collaborate on these branches (e.g. through your own fork of llvm on GitHub). +* Inspect the repository history (blame, log, bisect) without Internet access. +* Maintain remote forks and branches on Git hosting services and + integrate back to the main repository. + +In addition, because Git seems to be replacing many OSS projects' version +control systems, there are many tools that are built over Git. +Future tooling may support Git first (if not only). + +Why GitHub? +----------- + +GitHub, like GitLab and BitBucket, provides free code hosting for open source +projects. Any of these could replace the code-hosting infrastructure that we +have today. + +These services also have a dedicated team to monitor, migrate, improve and +distribute the contents of the repositories depending on region and load. 
+ +GitHub has one important advantage over GitLab and +BitBucket: it offers read-write **SVN** access to the repository +(https://github.com/blog/626-announcing-svn-support). +This would enable people to continue working post-migration as though our code +were still canonically in an SVN repository. + +In addition, there are already multiple LLVM mirrors on GitHub, indicating that +part of our community has already settled there. + +On Managing Revision Numbers with Git +------------------------------------- + +The current SVN repository hosts all the LLVM sub-projects alongside each other. +A single revision number (e.g. r123456) thus identifies a consistent version of +all LLVM sub-projects. + +Git does not use a sequential integer revision number but instead uses a hash to +identify each commit. (Linus mentioned that the lack of such a revision number +is "the only real design mistake" in Git [TorvaldRevNum]_.) + +The loss of a sequential integer revision number has been a sticking point in +past discussions about Git: + +- "The 'branch' I most care about is mainline, and losing the ability to say + 'fixed in r1234' (with some sort of monotonically increasing number) would + be a tragic loss." [LattnerRevNum]_ +- "I like those results sorted by time and the chronology should be obvious, but + timestamps are incredibly cumbersome and make it difficult to verify that a + given checkout matches a given set of results." [TrickRevNum]_ +- "There is still the major regression with unreadable version numbers. + Given the amount of Bugzilla traffic with 'Fixed in...', that's a + non-trivial issue." [JSonnRevNum]_ +- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_. + +However, Git can emulate this increasing revision number: +`git rev-list --count `. This identifier is unique only within a +single branch, but this means the tuple `(num, branch-name)` uniquely identifies +a commit.
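As a rough illustration of the `(num, branch-name)` idea above, the sketch below builds a throwaway repository, counts the commits reachable from the current branch tip, and joins the branch name and the count into one identifier. The temporary repository, the `demo` identity, and the `rev_id` variable are assumptions made for the example and are not part of the proposal.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the example does not depend on a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the branch name for the demo

# Three empty commits stand in for real history (hypothetical content).
for i in 1 2 3; do
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false \
        commit -q --allow-empty -m "commit $i"
done

# Count the commits reachable from the branch tip and pair the count
# with the branch name to form a human-readable identifier.
branch=$(git rev-parse --abbrev-ref HEAD)
count=$(git rev-list --count HEAD)
rev_id="$branch-$count"
echo "$rev_id"   # prints: master-3
```

The count only behaves like an SVN-style revision number while the history stays linear, which is one reason the proposal pairs it with a ban on merge commits.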
+ +We can thus use this revision number to ensure that e.g. `clang -v` reports a +user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing +the objections raised above with respect to this aspect of Git. + +What About Branches and Merges? +------------------------------- + +In contrast to SVN, Git makes branching easy. Git's commit history is +represented as a DAG, a departure from SVN's linear history. However, we propose +to mandate making merge commits illegal in our canonical Git repository. + +Unfortunately, GitHub does not support server side hooks to enforce such a +policy. We must rely on the community to avoid pushing merge commits. + +GitHub offers a feature called `Status Checks`: a branch protected by +`status checks` requires commits to be whitelisted before the push can happen. +We could supply a pre-push hook on the client side that would run and check the +history, before whitelisting the commit being pushed [statuschecks]_. +However this solution would be somewhat fragile (how do you update a script +installed on every developer machine?) and prevents SVN access to the +repository. + +What About Commit Emails? +------------------------- + +We will need a new bot to send emails for each commit. This proposal leaves the +email format unchanged besides the commit URL. + +Straw Man Migration Plan +======================== + +Step #1 : Before The Move +------------------------- + +1. Update docs to mention the move, so people are aware of what is going on. +2. Set up a read-only version of the GitHub project, mirroring our current SVN + repository. +3. Add the required bots to implement the commit emails, as well as the + umbrella repository update (if the multirepo is selected) or the read-only + Git views for the sub-projects (if the monorepo is selected). + +Step #2 : Git Move +------------------ + +4. Update the buildbots to pick up updates and commits from the GitHub + repository. 
Not all bots have to migrate at this point, but it'll help + provide infrastructure testing. +5. Update Phabricator to pick up commits from the GitHub repository. +6. LNT and llvmlab have to be updated: they rely on unique monotonically + increasing integer across branch [MatthewsRevNum]_. +7. Instruct downstream integrators to pick up commits from the GitHub + repository. +8. Review and prepare an update for the LLVM documentation. + +Until this point nothing has changed for developers, it will just +boil down to a lot of work for buildbot and other infrastructure +owners. + +The migration will pause here until all dependencies have cleared, and all +problems have been solved. + +Step #3: Write Access Move +-------------------------- + +9. Collect developers' GitHub account information, and add them to the project. +10. Switch the SVN repository to read-only and allow pushes to the GitHub repository. +11. Update the documentation. +12. Mirror Git to SVN. + +Step #4 : Post Move +------------------- + +13. Archive the SVN repository. +14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to + point to GitHub instead. + +One or Multiple Repositories? +============================= + +There are two major variants for how to structure our Git repository: The +"multirepo" and the "monorepo". + +Multirepo Variant +----------------- + +This variant recommends moving each LLVM sub-project to a separate Git +repository. This mimics the existing official read-only Git repositories +(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical +repositories for each sub-project. + +This will allow the individual sub-projects to remain distinct: a +developer interested only in compiler-rt can checkout only this repository, +build it, and work in isolation of the other sub-projects. + +A key need is to be able to check out multiple projects (i.e. lldb+clang+llvm or +clang+llvm+libcxx for example) at a specific revision. 
+ +A tuple of revisions (one entry per repository) accurately describes the state +across the sub-projects. +For example, a given version of clang would be +**. + +Umbrella Repository +^^^^^^^^^^^^^^^^^^^ + +To make this more convenient, a separate *umbrella* repository will be +provided. This repository will be used for the sole purpose of understanding +the sequence in which commits were pushed to the different repositories and to +provide a single revision number. + +This umbrella repository will be read-only and continuously updated +to record the above tuple. The proposed form to record this is to use Git +[submodules]_, possibly along with a set of scripts to help check out a +specific revision of the LLVM distribution. + +A regular LLVM developer does not need to interact with the umbrella repository +-- the individual repositories can be checked out independently -- but you would +need to use the umbrella repository to bisect multiple sub-projects at the same +time, or to check-out old revisions of LLVM with another sub-project at a +consistent state. + +This umbrella repository will be updated automatically by a bot (running on +notice from a webhook on every push, and periodically) on a per commit basis: a +single commit in the umbrella repository would match a single commit in a +sub-project. + +Living Downstream +^^^^^^^^^^^^^^^^^ + +Downstream SVN users can use the read/write SVN bridges with the following +caveats: + + * Be prepared for a one-time change to the upstream revision numbers. + * The upstream sub-project revision numbers will no longer be in sync. + +Downstream Git users can continue without any major changes, with the minor +change of upstreaming using `git push` instead of `git svn dcommit`. + +Git users also have the option of adopting an umbrella repository downstream. +The tooling for the upstream umbrella can easily be reused for downstream needs, +incorporating extra sub-projects and branching in parallel with sub-project +branches. 
+ +Multirepo Preview +^^^^^^^^^^^^^^^^^ + +As a preview (disclaimer: this is a rough prototype, not polished and not +representative of the final solution), you can look at the following: + + * Repository: https://github.com/llvm-beanz/llvm-submodules + * Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/ + +Concerns +^^^^^^^^ + + * Because GitHub does not allow server-side hooks, and because there is no + "push timestamp" in Git, the umbrella repository sequence isn't totally + exact: commits from different repositories pushed around the same time can + appear in different orders. However, we don't expect it to be the common case + or to cause serious issues in practice. + * You can't have a single cross-project commit that would update both LLVM and + other sub-projects (something that can be achieved now). It would be possible + to establish a protocol whereby users add a special token to their commit + messages that causes the umbrella repo's updater bot to group all of them + into a single revision. + * Another option is to group commits that were pushed closely enough together + in the umbrella repository. This has the advantage of allowing cross-project + commits, and is less sensitive to mis-ordering commits. However, this has the + potential to group unrelated commits together, especially if the bot goes + down and needs to catch up. + * This variant relies on heavier tooling. But the current prototype shows that + it is not out-of-reach. + * Submodules don't have a good reputation / are complicating the command line. + However, in the proposed setup, a regular developer will seldom interact with + submodules directly, and certainly never update them. + * Refactoring across projects is not friendly: taking some functions from clang + to make it part of a utility in libSupport wouldn't carry the history of the + code in the llvm repo, preventing recursively applying `git blame` for + instance.
However, this is not very different from how most people are + interacting with the repository today, by splitting such a change into multiple + commits. + +Workflows +^^^^^^^^^ + + * :ref:`Checkout/Clone a Single Project, without Commit Access `. + * :ref:`Checkout/Clone a Single Project, with Commit Access `. + * :ref:`Checkout/Clone Multiple Projects, with Commit Access `. + * :ref:`Commit an API Change in LLVM and Update the Sub-projects `. + * :ref:`Branching/Stashing/Updating for Local Development or Experiments `. + * :ref:`Bisecting `. + +Monorepo Variant +---------------- + +This variant recommends moving all LLVM sub-projects to a single Git repository, +similar to https://github.com/llvm-project/llvm-project. +This would mimic an export of the current SVN repository, with each sub-project +having its own top-level directory. +Not all sub-projects are used for building toolchains. In practice, www/ +and test-suite/ will probably stay out of the monorepo. + +Putting all sub-projects in a single checkout makes cross-project refactoring +naturally simple: + + * New sub-projects can be trivially split out for better reuse and/or layering + (e.g., to allow libSupport and/or LIT to be used by runtimes without adding a + dependency on LLVM). + * Changing an API in LLVM and upgrading the sub-projects will always be done in + a single commit, designing away a common source of temporary build breakage. + * Moving code across sub-projects (during refactoring for instance) in a single + commit enables accurate `git blame` when tracking code change history. + * Tooling based on `git grep` works natively across sub-projects, making it + easier to find refactoring opportunities across projects (for example reusing a + data structure initially in LLDB by moving it into libSupport). + * Having all the sources present encourages maintaining the other sub-projects + when changing API.
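To make the `git grep` and per-directory history points concrete, here is a minimal sketch on a toy repository laid out like a monorepo. The directory names mirror real sub-projects, but the files, the `llvm_helper` symbol, and the `g` wrapper function are hypothetical and exist only for this example.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Wrapper that supplies a throwaway identity for the demo commits.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false "$@"
}

# Two top-level sub-project directories, as in the monorepo layout.
mkdir llvm libcxx
echo "int llvm_helper(void);" > llvm/helper.h
g add llvm && g commit -q -m "llvm: add helper"
echo "int libcxx_only(void);" > libcxx/only.h
g add libcxx && g commit -q -m "libcxx: add header"
echo "int llvm_helper(void); /* used by libcxx */" > libcxx/uses_helper.h
g add libcxx && g commit -q -m "libcxx: use helper"

# History restricted to a single sub-project directory.
libcxx_commits=$(git rev-list --count HEAD -- libcxx)
echo "commits touching libcxx: $libcxx_commits"

# One search covers every sub-project in the checkout.
git grep -l llvm_helper
```

In this toy history two of the three commits touch `libcxx`, and the search reports a match in both `llvm/` and `libcxx/` without any per-repository scripting.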
+ +Finally, the monorepo maintains the property of the existing SVN repository that +the sub-projects move synchronously, and a single revision number (or commit +hash) identifies the state of the development across all projects. + +.. _build_single_project: + +Building a single sub-project +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Nobody will be forced to build unnecessary projects. The exact structure +is TBD, but making it trivial to configure builds for a single sub-project +(or a subset of sub-projects) is a hard requirement. + +As an example, it could look like the following:: + + mkdir build && cd build + # Configure only LLVM (default) + cmake path/to/monorepo + # Configure LLVM and lld + cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld + # Configure LLVM and clang + cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=clang + +.. _git-svn-mirror: + +Read/write sub-project mirrors +------------------------------ + +With the Monorepo, the existing single-subproject mirrors (e.g. +http://llvm.org/git/compiler-rt.git) with git-svn read-write access would +continue to be maintained: developers would continue to be able to use the +existing single-subproject git repositories as they do today, with *no changes +to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to +work identically to how it works today. The monorepo can be set-up such that the +SVN revision number matches the SVN revision in the GitHub SVN-bridge. + +Living Downstream +^^^^^^^^^^^^^^^^^ + +Downstream SVN users can use the read/write SVN bridge. The SVN revision +number can be preserved in the monorepo, minimizing the impact. + +Downstream Git users can continue without any major changes, by using the +git-svn mirrors on top of the SVN bridge. + +Git users can also work upstream with monorepo even if their downstream +fork has split repositories. They can apply patches in the appropriate +subdirectories of the monorepo using, e.g., `git am --directory=...`, or +plain `diff` and `patch`. 
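The `git am --directory=...` route mentioned above can be sketched as follows. Both repositories are throwaway stand-ins created for the example: `clang-split` plays the role of a downstream split repository and `monorepo` the role of the combined one. The file names, commit messages, and the `g` wrapper are assumptions, not part of the proposal.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Wrapper that supplies a throwaway identity for the demo commits.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false "$@"
}

# Stand-in for a downstream split repository (hypothetical content).
mkdir "$work/clang-split"
cd "$work/clang-split"
git init -q .
echo "base" > README.txt
g add README.txt && g commit -q -m "Initial import"
echo "downstream change" >> README.txt
g commit -q -a -m "Downstream tweak"

# Export the downstream commit as a patch file.
git format-patch -q -1 -o "$work/patches" HEAD

# Stand-in for the monorepo, holding the same file under clang/.
mkdir -p "$work/monorepo/clang"
cd "$work/monorepo"
git init -q .
echo "base" > clang/README.txt
g add clang/README.txt && g commit -q -m "Initial monorepo import"

# Apply the patch, prefixing every path in it with clang/.
g am -q --directory=clang "$work"/patches/*.patch
tail -n 1 clang/README.txt   # prints: downstream change
```

The applied commit keeps its original author and message but gets a new hash, since its parent and paths differ from the downstream original.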
+ +Alternatively, Git users can migrate their own fork to the monorepo. As a +demonstration, we've migrated the "CHERI" fork to the monorepo in two ways: + + * Using a script that rewrites history (including merges) so that it looks + like the fork always lived in the monorepo [LebarCHERI]_. The upside of + this is when you check out an old revision, you get a copy of all llvm + sub-projects at a consistent revision. (For instance, if it's a clang + fork, when you check out an old revision you'll get a consistent version + of llvm proper.) The downside is that this changes the fork's commit + hashes. + + * Merging the fork into the monorepo [AminiCHERI]_. This preserves the + fork's commit hashes, but when you check out an old commit you only get + the one sub-project. + +Monorepo Preview +^^^^^^^^^^^^^^^^^ + +As a preview (disclaimer: this is a rough prototype, not polished and not +representative of the final solution), you can look at the following: + + * Full Repository: https://github.com/joker-eph/llvm-project + * Single sub-project view with *SVN write access* to the full repo: + https://github.com/joker-eph/compiler-rt + +Concerns +^^^^^^^^ + + * Using the monolithic repository may add overhead for those contributing to a + standalone sub-project, particularly on runtimes like libcxx and compiler-rt + that don't rely on LLVM; currently, a fresh clone of libcxx is only 15MB (vs. + 1GB for the monorepo), and the commit rate of LLVM may cause more frequent + `git push` collisions when upstreaming. Affected contributors can continue to + use the SVN bridge or the single-subproject Git mirrors with git-svn for + read-write. + * Using the monolithic repository may add overhead for those *integrating* a + standalone sub-project, even if they aren't contributing to it, due to the + same disk space concern as the point above. The availability of the + sub-project Git mirror addresses this, even without SVN access.
+ * Preservation of the existing read/write SVN-based workflows relies on the + GitHub SVN bridge, which is an extra dependency. Maintaining this locks us + into GitHub and could restrict future workflow changes. + +Workflows +^^^^^^^^^ + + * :ref:`Checkout/Clone a Single Project, without Commit Access `. + * :ref:`Checkout/Clone a Single Project, with Commit Access `. + * :ref:`Checkout/Clone Multiple Projects, with Commit Access `. + * :ref:`Commit an API Change in LLVM and Update the Sub-projects `. + * :ref:`Branching/Stashing/Updating for Local Development or Experiments `. + * :ref:`Bisecting `. + +Multi/Mono Hybrid Variant +------------------------- + +This variant recommends moving only the LLVM sub-projects that are *rev-locked* +to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo +proposal for the rest. While neither variant recommends combining sub-projects +like www/ and test-suite/ (which are completely standalone), this goes further +and keeps sub-projects like libcxx and compiler-rt in their own distinct +repositories. + +Concerns +^^^^^^^^ + + * This has most disadvantages of multirepo and monorepo, without bringing many + of the advantages. + * Downstream have to upgrade to the monorepo structure, but only partially. So + they will keep the infrastructure to integrate the other separate + sub-projects. + * All projects that use LIT for testing are effectively rev-locked to LLVM. + Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang. + It's not clear where to draw the lines. + + +Workflow Before/After +===================== + +This section goes through a few examples of workflows, intended to illustrate +how end-users or developers would interact with the repository for +various use-cases. + +.. _workflow-checkout-commit: + +Checkout/Clone a Single Project, without Commit Access +------------------------------------------------------ + +Except the URL, nothing changes. 
The possibilities today are:: + + svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm + # or with Git + git clone http://llvm.org/git/llvm.git + +After the move to GitHub, you would do either:: + + git clone https://github.com/llvm-project/llvm.git + # or using the GitHub svn native bridge + svn co https://github.com/llvm-project/llvm/trunk + +The above works for both the monorepo and the multirepo, as we'll maintain the +existing read-only views of the individual sub-projects. + +Checkout/Clone a Single Project, with Commit Access +--------------------------------------------------- + +Currently +^^^^^^^^^ + +:: + + # direct SVN checkout + svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm + # or using the read-only Git view, with git-svn + git clone http://llvm.org/git/llvm.git + cd llvm + git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username= + git config svn-remote.svn.fetch :refs/remotes/origin/master + git svn rebase -l # -l avoids fetching ahead of the git mirror. + +Commits are performed using `svn commit` or with the sequence `git commit` and +`git svn dcommit`. + +.. _workflow-multicheckout-nocommit: + +Multirepo Variant +^^^^^^^^^^^^^^^^^ + +With the multirepo variant, nothing changes but the URL, and commits can be +performed using `svn commit` or `git commit` and `git push`:: + + git clone https://github.com/llvm/llvm.git llvm + # or using the GitHub svn native bridge + svn co https://github.com/llvm/llvm/trunk/ llvm + +.. _workflow-monocheckout-nocommit: + +Monorepo Variant +^^^^^^^^^^^^^^^^ + +With the monorepo variant, there are a few options, depending on your +constraints. First, you could just clone the full repository:: + + git clone https://github.com/llvm/llvm-projects.git llvm + # or using the GitHub svn native bridge + svn co https://github.com/llvm/llvm-projects/trunk/ llvm + +At this point you have every sub-project (llvm, clang, lld, lldb, ...), which +:ref:`doesn't imply you have to build all of them `. 
You +can still build only compiler-rt for instance. In this way it's not different +from someone who would check out all the projects with SVN today. + +You can commit as normal using `git commit` and `git push` or `svn commit`, and +read the history for a single project (`git log libcxx` for example). + +Secondly, there are a few options to avoid checking out all the sources. + +**Using the GitHub SVN bridge** + +The GitHub SVN native bridge allows you to check out a subdirectory directly:: + + svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=... + +This checks out only compiler-rt and provides commit access using "svn commit", +in the same way as it would do today. + +**Using a Subproject Git Mirror** + +You can use *git-svn* and one of the sub-project mirrors:: + + # Clone from the single read-only Git repo + git clone http://llvm.org/git/llvm.git + cd llvm + # Configure the SVN remote and initialize the svn metadata + git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=... + git config svn-remote.svn.fetch :refs/remotes/origin/master + git svn rebase -l + +In this case the repository contains only a single sub-project, and commits can +be made using `git svn dcommit`, again exactly as we do today. + +**Using Sparse Checkouts** + +You can hide the other directories using a Git sparse checkout:: + + git config core.sparseCheckout true + echo /compiler-rt > .git/info/sparse-checkout + git read-tree -mu HEAD + +The data for all sub-projects is still in your `.git` directory, but in your +checkout, you only see `compiler-rt`. +Before you push, you'll need to fetch and rebase (`git pull --rebase`) as +usual. + +Note that when you fetch you'll likely pull in changes to sub-projects you don't +care about. If you are using a sparse checkout, the files from other projects +won't appear on your disk. The only effect is that your commit hash changes.
+ +You can check whether the changes in the last fetch are relevant to your commit +by running:: + + git log origin/master@{1}..origin/master -- libcxx + +This command can be hidden in a script so that `git llvmpush` would perform all +these steps, fail only if such a dependent change exists, and show immediately +the change that prevented the push. An immediate repeat of the command would +(almost) certainly result in a successful push. +Note that today with SVN or git-svn, this step is not possible since the +"rebase" implicitly happens while committing (unless a conflict occurs). + +Checkout/Clone Multiple Projects, with Commit Access +---------------------------------------------------- + +Let's look how to assemble llvm+clang+libcxx at a given revision. + +Currently +^^^^^^^^^ + +:: + + svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION + cd llvm/tools + svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION + cd ../projects + svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION + +Or using git-svn:: + + git clone http://llvm.org/git/llvm.git + cd llvm/ + git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username= + git config svn-remote.svn.fetch :refs/remotes/origin/master + git svn rebase -l + git checkout `git svn find-rev -B r258109` + cd tools + git clone http://llvm.org/git/clang.git + cd clang/ + git svn init https://llvm.org/svn/llvm-project/clang/trunk --username= + git config svn-remote.svn.fetch :refs/remotes/origin/master + git svn rebase -l + git checkout `git svn find-rev -B r258109` + cd ../../projects/ + git clone http://llvm.org/git/libcxx.git + cd libcxx + git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username= + git config svn-remote.svn.fetch :refs/remotes/origin/master + git svn rebase -l + git checkout `git svn find-rev -B r258109` + +Note that the list would be longer with more sub-projects. + +.. 
_workflow-multicheckout-multicommit: + +Multirepo Variant +^^^^^^^^^^^^^^^^^ + +With the multirepo variant, the umbrella repository will be used. This is +where the mapping from a single revision number to the individual repositories' +revisions is stored:: + + git clone https://github.com/llvm-beanz/llvm-submodules + cd llvm-submodules + git checkout $REVISION + git submodule init + git submodule update clang llvm libcxx + # the list of sub-projects is optional, `git submodule update` would get them all. + +At this point the clang, llvm, and libcxx individual repositories are cloned +and stored alongside each other. There are CMake flags to describe the directory +structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`, +etc. + +Another option is to check out repositories based on the commit timestamp:: + + git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master` + +.. _workflow-monocheckout-multicommit: + +Monorepo Variant +^^^^^^^^^^^^^^^^ + +The repository natively contains the source for every sub-project at the right +revision, which makes this straightforward:: + + git clone https://github.com/llvm/llvm-projects.git llvm-projects + cd llvm-projects + git checkout $REVISION + +As before, at this point clang, llvm, and libcxx are stored in directories +alongside each other. + +.. _workflow-cross-repo-commit: + +Commit an API Change in LLVM and Update the Sub-projects +-------------------------------------------------------- + +Today this is possible, even though not common (at least not documented) for +subversion users and for git-svn users. For example, few Git users try to update +LLD or Clang in the same commit as they change an LLVM API. + +The multirepo variant does not address this: one would have to commit and push +separately in every individual repository.
It would be possible to establish a +protocol whereby users add a special token to their commit messages that causes +the umbrella repo's updater bot to group all of them into a single revision. + +The monorepo variant handles this natively. + +Branching/Stashing/Updating for Local Development or Experiments +---------------------------------------------------------------- + +Currently +^^^^^^^^^ + +SVN does not allow this use case, but developers that are currently using +git-svn can do it. Let's look in practice what it means when dealing with +multiple sub-projects. + +To update the repository to tip of trunk:: + + git pull + cd tools/clang + git pull + cd ../../projects/libcxx + git pull + +To create a new branch:: + + git checkout -b MyBranch + cd tools/clang + git checkout -b MyBranch + cd ../../projects/libcxx + git checkout -b MyBranch + +To switch branches:: + + git checkout AnotherBranch + cd tools/clang + git checkout AnotherBranch + cd ../../projects/libcxx + git checkout AnotherBranch + +.. _workflow-multi-branching: + +Multirepo Variant +^^^^^^^^^^^^^^^^^ + +The multirepo works the same as the current Git workflow: every command needs +to be applied to each of the individual repositories. +However, the umbrella repository makes this easy using `git submodule foreach` +to replicate a command on all the individual repositories (or submodules +in this case): + +To create a new branch:: + + git submodule foreach git checkout -b MyBranch + +To switch branches:: + + git submodule foreach git checkout AnotherBranch + +.. _workflow-mono-branching: + +Monorepo Variant +^^^^^^^^^^^^^^^^ + +Regular Git commands are sufficient, because everything is in a single +repository: + +To update the repository to tip of trunk:: + + git pull + +To create a new branch:: + + git checkout -b MyBranch + +To switch branches:: + + git checkout AnotherBranch + +Bisecting +--------- + +Assuming a developer is looking for a bug in clang (or lld, or lldb, ...). 
+ +Currently +^^^^^^^^^ + +SVN does not have builtin bisection support, but the single revision across +sub-projects makes it possible to script around. + +Using the existing Git read-only view of the repositories, it is possible to use +the native Git bisection script over the llvm repository, and use some scripting +to synchronize the clang repository to match the llvm revision. + +.. _workflow-multi-bisecting: + +Multirepo Variant +^^^^^^^^^^^^^^^^^ + +With the multi-repositories variant, the cross-repository synchronization is +achieved using the umbrella repository. This repository contains only +submodules for the other sub-projects. The native Git bisection can be used on +the umbrella repository directly. A subtlety is that the bisect script itself +needs to make sure the submodules are updated accordingly. + +For example, to find which commit introduces a regression where clang-3.9 +crashes but clang-3.8 passes, one should be able to simply do:: + + git bisect start release_39 release_38 + git bisect run ./bisect_script.sh + +With the `bisect_script.sh` script being:: + + #!/bin/sh + cd $UMBRELLA_DIRECTORY + git submodule update llvm clang libcxx #.... + cd $BUILD_DIR + + ninja clang || exit 125 # an exit code of 125 asks "git bisect" + # to "skip" the current commit + + ./bin/clang some_crash_test.cpp + +When the `git bisect run` command returns, the umbrella repository is set to +the state where the regression is introduced. The commit diff in the umbrella +indicates which submodule was updated, and the last commit in this sub-project +is the one that the bisect found. + +.. _workflow-mono-bisecting: + +Monorepo Variant +^^^^^^^^^^^^^^^^ + +Bisecting on the monorepo is straightforward, and very similar to the above, +except that the bisection script does not need to include the +`git submodule update` step.
+ +The same example, finding which commit introduces a regression where clang-3.9 +crashes but clang-3.8 passes, will look like:: + + git bisect start release_39 release_38 + git bisect run ./bisect_script.sh + +With the `bisect_script.sh` script being:: + + #!/bin/sh + cd $BUILD_DIR + + ninja clang || exit 125 # an exit code of 125 asks "git bisect" + # to "skip" the current commit + + ./bin/clang some_crash_test.cpp + +Also, since the monorepo handles commits that update multiple projects, you're +less likely to encounter a build failure where a commit changes an API in LLVM +and a later one "fixes" the build in clang. + + +References +========== + +.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html +.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html +.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html +.. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html +.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html +.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules +.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/ +.. [LebarCHERI] Port *CHERI* to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html +..
[AminiCHERI] Port *CHERI* to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html Index: llvm/trunk/docs/Proposals/GitHubSubMod.rst =================================================================== --- llvm/trunk/docs/Proposals/GitHubSubMod.rst +++ llvm/trunk/docs/Proposals/GitHubSubMod.rst @@ -1,273 +0,0 @@ -=============================================== -Moving LLVM Projects to GitHub with Sub-Modules -=============================================== - -Introduction -============ - -This is a proposal to move our current revision control system from our own -hosted Subversion to GitHub. Below are the financial and technical arguments as -to why we need such a move and how will people (and validation infrastructure) -continue to work with a Git-based LLVM. - -There will be a survey pointing at this document when we'll know the community's -reaction and, if we collectively decide to move, the time-frames. Be sure to make -your views count. - -Essentially, the proposal is divided in the following parts: - -* Outline of the reasons to move to Git and GitHub -* Description on what the work flow will look like (compared to SVN) -* Remaining issues and potential problems -* The proposed migration plan - -Why Git, and Why GitHub? -======================== - -Why move at all? ----------------- - -The strongest reason for the move, and why this discussion started in the first -place, is that we currently host our own Subversion server and Git mirror in a -voluntary basis. The LLVM Foundation sponsors the server and provides limited -support, but there is only so much it can do. - -The volunteers are not Sysadmins themselves, but compiler engineers that happen -to know a thing or two about hosting servers. We also don't have 24/7 support, -and we sometimes wake up to see that continuous integration is broken because -the SVN server is either down or unresponsive. 
- -With time and money, the foundation and volunteers could improve our services, -implement more functionality and provide around the clock support, so that we -can have a first class infrastructure with which to work. But the cost is not -small, both in money and time invested. - -On the other hand, there are multiple services out there (GitHub, GitLab, -BitBucket among others) that offer that same service (24/7 stability, disk space, -Git server, code browsing, forking facilities, etc) for the very affordable price -of *free*. - -Why Git? --------- - -Most new coders nowadays start with Git. A lot of them have never used SVN, CVS -or anything else. Websites like GitHub have changed the landscape of open source -contributions, reducing the cost of first contribution and fostering -collaboration. - -Git is also the version control most LLVM developers use. Despite the sources -being stored in an SVN server, most people develop using the Git-SVN integration, -and that shows that Git is not only more powerful than SVN, but people have -resorted to using a bridge because its features are now indispensable to their -internal and external workflows. - -In essence, Git allows you to: - -* Commit, squash, merge, fork locally without any penalty to the server -* Add as many branches as necessary to allow for multiple threads of development -* Collaborate with peers directly, even without access to the Internet -* Have multiple trees without multiplying disk space. - -In addition, because Git seems to be replacing every project's version control -system, there are many more tools that can use Git's enhanced feature set, so -new tooling is much more likely to support Git first (if not only), than any -other version control system. - -Why GitHub? ------------ - -GitHub, like GitLab and BitBucket, provide free code hosting for open source -projects. 
Essentially, they will completely replace *all* the infrastructure that -we have today that serves code repository, mirroring, user control, etc. - -They also have a dedicated team to monitor, migrate, improve and distribute the -contents of the repositories depending on region and load. A level of quality -that we'd never have without spending money that would be better spent elsewhere, -for example development meetings, sponsoring disadvantaged people to work on -compilers and foster diversity and equality in our community. - -GitHub has the added benefit that we already have a presence there. Many -developers use it already, and the mirror from our current repository is already -set up. - -Furthermore, GitHub has an *SVN view* (https://github.com/blog/626-announcing-svn-support) -where people that still have/want to use SVN infrastructure and tooling can -slowly migrate or even stay working as if it was an SVN repository (including -read-write access). - -So, any of the three solutions solve the cost and maintenance problem, but GitHub -has two additional features that would be beneficial to the migration plan as -well as the community already settled there. - - -What will the new workflow look like -==================================== - -In order to move version control, we need to make sure that we get all the -benefits with the least amount of problems. That's why the migration plan will -be slow, one step at a time, and we'll try to make it look as close as possible -to the current style without impacting the new features we want. - -Each LLVM project will continue to be hosted as separate GitHub repository -under a single GitHub organisation. Users can continue to choose to use either -SVN or Git to access the repositories to suit their current workflow. - -In addition, we'll create a repository that will mimic our current *linear -history* repository. 
The most accepted proposal, then, was to have an umbrella -project that will contain *sub-modules* (https://git-scm.com/book/en/v2/Git-Tools-Submodules) -of all the LLVM projects and nothing else. - -This repository can be checked out on its own, in order to have *all* LLVM -projects in a single check-out, as many people have suggested, but it can also -only hold the references to the other projects, and be used for the sole purpose -of understanding the *sequence* in which commits were added by using the -``git rev-list --count hash`` or ``git describe hash`` commands. - -One example of such a repository is Takumi's llvm-project-submodule -(https://github.com/chapuni/llvm-project-submodule), which when checked out, -will have the references to all sub-modules but not check them out, so one will -need to *init* the module manually. This will allow the *exact* same behaviour -as checking out individual SVN repositories, as it will keep the correct linear -history. - -There is no need to additional tags, flags and properties, or external -services controlling the history, since both SVN and *git rev-list* can already -do that on their own. - -We will need additional server hooks to avoid non-fast-forwards commits (ex. -merges, forced pushes, etc) in order to keep the linearity of the history. - -The three types hooks to be implemented are: - -* Status Checks: By placing status checks on a protected branch, we can guarantee - that the history is kept linear and sane at all times, on all repositories. - See: https://help.github.com/articles/about-required-status-checks/ -* Umbrella updates: By using GitHub web hooks, we can update a small web-service - inside LLVM's own infrastructure to update the umbrella project remotely. The - maintenance of this service will be lower than the current SVN maintenance and - the scope of its failures will be less severe. 
- See: https://developer.github.com/webhooks/ -* Commits email update: By adding an email web hook, we can make every push show - in the lists, allowing us to retain history and do post-commit reviews. - See: https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/ - -Access will be transferred one-to-one to GitHub accounts for everyone that already -has commit access to our current repository. Those who don't have accounts will -have to create one in order to continue contributing to the project. In the -future, people only need to provide their GitHub accounts to be granted access. - -In a nutshell: - -* The projects' repositories will remain identical, with a new address (GitHub). -* They'll continue to have SVN access (Read-Write), but will also gain Git RW access. -* The linear history can still be accessed in the (RO) submodule meta project. -* Individual projects' history will be local (ie. not interlaced with the other - projects, as the current SVN repos are), and we need the umbrella project - (using submodules) to have the same view as we had in SVN. - -Additionally, each repository will have the following server hooks: - -* Pre-commit hooks to stop people from applying non-fast-forward merges -* Webhook to update the umbrella project (via buildbot or web services) -* Email hook to each commits list (llvm-commit, cfe-commit, etc) - -Essentially, we're adding Git RW access in addition to the already existing -structure, with all the additional benefits of it being in GitHub. - -Example of a working version: - -* Repository: https://github.com/llvm-beanz/llvm-submodules -* Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/ - -What will *not* be changed --------------------------- - -This is a change of version control system, not the whole infrastructure. There -are plans to replace our current tools (review, bugs, documents), but they're -all orthogonal to this proposal. 
- -We'll also be keeping the buildbots (and migrating them to use Git) as well as -LNT, and any other system that currently provides value upstream. - -Any discussion regarding those tools are out of scope in this proposal. - -Remaining questions and problems -================================ - -1. How much the SVN view emulates and how much it'll break tools/CI? - -For this one, we'll need people that will have problems in that area to tell -us what's wrong and how to help them fix it. - -We also recommend people and companies to migrate to Git, for its many other -additional benefits. - -2. Which tools will need changing? - -LNT may break, since it relies on SVN's history. We can continue to -use LNT with the SVN-View, but it would be best to move it to Git once and for -all. - -The LLVMLab bisect tool will also be affected and will need adjusting. As with -LNT, it should be fine to use GitHub's SVN view, but changing it to work on Git -will be required in the long term. - -Phabricator will also need to change its configuration to point at the GitHub -repositories, but since it already works with Git, this will be a trivial change. - -Migration Plan -============== - -If we decide to move, we'll have to set a date for the process to begin. - -As usual, we should be announcing big changes in one release to happen in the -next one. But since this won't impact external users (if they rely on our source -release tarballs), we don't necessarily have to. - -We will have to make sure all the *problems* reported are solved before the -final push. But we can start all non-binding processes (like mirroring to GitHub -and testing the SVN interface in it) before any hard decision. - -Here's a proposed plan: - -STEP #1 : Pre Move - -0. Update docs to mention the move, so people are aware the it's going on. -1. Register an official GitHub project with the LLVM foundation. -2. 
Setup another (read-only) mirror of llvm.org/git at this GitHub project, - adding all necessary hooks to avoid broken history (merge, dates, pushes), as - well as a webhook to update the umbrella project (see below). -3. Make sure we have an llvm-project (with submodules) setup in the official - account, with all necessary hooks (history, update, merges). -4. Make sure bisecting with llvm-project works. -5. Make sure no one has any other blocker. - -STEP #2 : Git Move - -6. Update the buildbots to pick up updates and commits from the official git - repository. -7. Update Phabricator to pick up commits from the official git repository. -8. Tell people living downstream to pick up commits from the official git - repository. -9. Give things time to settle. We could play some games like disabling the SVN - repository for a few hours on purpose so that people can test that their - infrastructure has really become independent of the SVN repository. - -Until this point nothing has changed for developers, it will just -boil down to a lot of work for buildbot and other infrastructure -owners. - -Once all dependencies are cleared, and all problems have been solved: - -STEP #3: Write Access Move - -10. Collect peoples GitHub account information, adding them to the project. -11. Switch SVN repository to read-only and allow pushes to the GitHub repository. -12. Mirror Git to SVN. - -STEP #4 : Post Move - -13. Archive the SVN repository, if GitHub's SVN is good enough. -14. Review and update *all* LLVM documentation. -15. Review website links pointing to viewvc/klaus/phab etc. to point to GitHub - instead. Index: llvm/trunk/docs/index.rst =================================================================== --- llvm/trunk/docs/index.rst +++ llvm/trunk/docs/index.rst @@ -510,13 +510,13 @@ :hidden: CodeOfConduct - Proposals/GitHubSubMod + Proposals/GitHubMove :doc:`CodeOfConduct` Proposal to adopt a code of conduct on the LLVM social spaces (lists, events, IRC, etc). 
-:doc:`Proposals/GitHubSubMod` +:doc:`Proposals/GitHubMove` Proposal to move from SVN/Git to GitHub.