==============================
Moving LLVM Projects to GitHub
==============================

Introduction
============

This is a proposal to move our current revision control system from our own
hosted Subversion to GitHub. Below are the financial and technical arguments as
to why we need such a move, and how people (and validation infrastructure) will
continue to work with a Git-based LLVM.

There will be a survey pointing at this document, from which we'll learn the
community's reaction and, if we collectively decide to move, agree on the time
frames. Be sure to make your views count.

Essentially, the proposal is divided into the following parts:

* Outline of the reasons to move to Git and GitHub
* Description of what the workflow will look like (compared to SVN)
* Remaining issues and potential problems
* The proposed migration plan

Why Git, and Why GitHub?
========================

Why move at all?
----------------

The strongest reason for the move, and why this discussion started in the first
place, is that we currently host our own Subversion server and Git mirror on a
voluntary basis. The LLVM Foundation sponsors the server and provides limited
support, but there is only so much it can do.

The volunteers are not sysadmins themselves, but compiler engineers who happen
to know a thing or two about hosting servers. We also don't have 24/7 support,
and we sometimes wake up to find that continuous integration is broken because
the SVN server is either down or unresponsive.

With time and money, the foundation and volunteers could improve our services,
implement more functionality and provide around-the-clock support, so that we
would have a first-class infrastructure to work with. But the cost, in both
money and time invested, is not small.

On the other hand, there are multiple services out there (GitHub, GitLab,
Bitbucket among others) that offer that same service (24/7 stability, disk
space, Git server, code browsing, forking facilities, etc.) for the very
affordable price of *free*.

Why Git?
--------

Most new coders nowadays start with Git. A lot of them have never used SVN, CVS
or anything else. Websites like GitHub have changed the landscape of open source
contributions, reducing the cost of first contribution and fostering
collaboration.

Git is also the version control system most LLVM developers use. Even though
the sources are stored on an SVN server, most people develop using the git-svn
bridge. That shows not only that Git is more powerful than SVN, but also that
its features have become so indispensable to people's internal and external
workflows that they have resorted to using a bridge to get them.

In essence, Git allows you to:

* Commit, squash, merge and fork locally without any penalty to the server
* Add as many branches as necessary to allow for multiple threads of development
* Collaborate with peers directly, even without access to the Internet
* Have multiple trees without multiplying disk space

In addition, because Git seems to be replacing every project's version control
system, many more tools can use Git's enhanced feature set, and new tooling is
much more likely to support Git first (if not exclusively) than any other
version control system.

Why GitHub?
-----------

GitHub, like GitLab and Bitbucket, provides free code hosting for open source
projects. Any of them would completely replace *all* the infrastructure we have
today that serves code repositories, mirroring, user control, etc.

They also have dedicated teams to monitor, migrate, improve and distribute the
contents of the repositories depending on region and load. That is a level of
quality we would never reach without spending money that is better spent
elsewhere: for example, on development meetings, or on sponsoring disadvantaged
people to work on compilers and fostering diversity and equality in our
community.

GitHub has the added benefit that we already have a presence there. Many
developers use it already, and the mirror of our current repository is already
set up.

Furthermore, GitHub has an *SVN view*
(https://github.com/blog/626-announcing-svn-support) where people who still
have to (or want to) use SVN infrastructure and tooling can migrate gradually,
or even keep working as if it were an SVN repository (including read-write
access).

So, any of the three solutions solves the cost and maintenance problem, but
GitHub has two additional advantages that would benefit the migration plan: the
SVN view, and the community already settled there.


What will the new workflow look like?
=====================================

In order to move version control, we need to make sure that we get all the
benefits with the fewest problems. That's why the migration plan will be slow,
one step at a time, and we'll try to keep the workflow as close as possible to
the current style without compromising the new features we want.

Each LLVM project will continue to be hosted as a separate GitHub repository
under a single GitHub organisation. Users can continue to choose either SVN or
Git to access the repositories, to suit their current workflow.

In addition, we'll create a repository that will mimic our current *linear
history* repository. The most widely accepted proposal so far is an umbrella
project that contains *submodules*
(https://git-scm.com/book/en/v2/Git-Tools-Submodules) of all the LLVM projects
and nothing else.
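
As an illustration only, the following Python toy model (not Git's actual storage format) shows why an umbrella project containing nothing but submodule pointers is enough to recover a linear history. It assumes that every push to any project repository triggers exactly one umbrella commit recording a snapshot of all pointers; the ``Umbrella`` class and its methods are hypothetical names. Each commit's position in that single chain then plays the role of an SVN revision number, which on a strictly linear history is what ``git rev-list --count`` reports.

```python
from dataclasses import dataclass, field


@dataclass
class Umbrella:
    """Toy model (hypothetical) of an umbrella repository whose only
    content is submodule pointers, i.e. project name -> commit hash."""

    # One entry per umbrella commit, oldest first; each entry is a full
    # snapshot of every project's pointer at that point in time.
    history: list = field(default_factory=list)

    def record_push(self, project: str, new_hash: str) -> int:
        """Record one push to one project (as the webhook would) and
        return the new linear revision number."""
        snapshot = dict(self.history[-1]) if self.history else {}
        snapshot[project] = new_hash
        self.history.append(snapshot)
        # The history is a single chain, so the number of commits
        # reachable from the tip is just its length (1-based, like rN).
        return len(self.history)

    def checkout(self, revision: int) -> dict:
        """Pointers of every project at a revision: the analogue of
        checking out the whole SVN tree at rN."""
        if not 1 <= revision <= len(self.history):
            raise IndexError(f"no revision {revision}")
        return dict(self.history[revision - 1])


umbrella = Umbrella()
umbrella.record_push("llvm", "a1")
umbrella.record_push("clang", "b1")
print(umbrella.record_push("llvm", "a2"))  # 3
print(umbrella.checkout(2))                # {'llvm': 'a1', 'clang': 'b1'}
```

Because each umbrella commit pins every project at once, a single number identifies a consistent state across all projects, just as an SVN revision does today.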

This repository can be checked out on its own, in order to have *all* LLVM
projects in a single checkout, as many people have suggested. It can also hold
only the references to the other projects, and be used for the sole purpose of
recovering the *sequence* in which commits were added, using the
``git rev-list --count hash`` or ``git describe hash`` commands.

One example of such a repository is Takumi's llvm-project-submodule
(https://github.com/chapuni/llvm-project-submodule), which, when cloned, has the
references to all submodules but does not check them out, so one needs to
*init* the wanted modules manually. This allows the *exact* same behaviour as
checking out individual SVN repositories, while keeping the correct linear
history.

There is no need for additional tags, flags and properties, or for external
services controlling the history, since both SVN and ``git rev-list`` can
already do that on their own.

We will need additional server hooks to reject anything that would break the
linearity of the history (e.g. merge commits and forced pushes).

Access will be transferred one-to-one to GitHub accounts for everyone who
already has commit access to our current repository. Those who don't have a
GitHub account will have to create one in order to continue contributing to the
project. In the future, people will only need to provide their GitHub account
to be granted access.

In a nutshell:

* The projects' repositories will remain identical, with a new address (GitHub).
* They'll continue to have SVN read-write access, and will also gain Git
  read-write access.
* The linear history can still be accessed in the (read-only) submodule meta
  project.
* Individual projects' history will be local (i.e. not interlaced with the other
  projects, as the current SVN repos are), and we need the umbrella project
  (using submodules) to have the same view as we had in SVN.
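
The server hooks mentioned above could be sketched as a standard Git ``pre-receive`` hook, assuming a Git server that lets us install one (whether the chosen hosting service allows this is a separate question). Git feeds the hook one ``<old> <new> <ref>`` line per updated ref and rejects the whole push if the hook exits non-zero. The Python sketch below rejects forced pushes (the old tip is not an ancestor of the new one) and merge commits; refusing branch creation and deletion outright is an assumption made to keep it short, and all function names are hypothetical.

```python
import subprocess
import sys


def is_null(sha: str) -> bool:
    # Git passes an all-zero object name when a ref is created or deleted.
    return set(sha) == {"0"}


def git_is_ancestor(old: str, new: str) -> bool:
    # `git merge-base --is-ancestor A B` exits with 0 iff A is an ancestor of B.
    result = subprocess.run(["git", "merge-base", "--is-ancestor", old, new])
    return result.returncode == 0


def git_merge_commits(old: str, new: str) -> list:
    # Commits reachable from `new` but not from `old` that have 2+ parents.
    result = subprocess.run(
        ["git", "rev-list", "--min-parents=2", f"{old}..{new}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()


def check_update(old, new, ref, is_ancestor=git_is_ancestor,
                 merge_commits=git_merge_commits):
    """Return an error message if this ref update breaks linear history."""
    if is_null(old) or is_null(new):
        return f"{ref}: creating or deleting branches is not allowed"
    if not is_ancestor(old, new):
        return f"{ref}: non-fast-forward update (forced push) rejected"
    if merge_commits(old, new):
        return f"{ref}: merge commits are not allowed, please rebase"
    return None


def main(lines) -> int:
    """Check every '<old> <new> <ref>' line; return the hook's exit status."""
    errors = []
    for line in lines:
        if not line.strip():
            continue
        old, new, ref = line.split()
        error = check_update(old, new, ref)
        if error:
            errors.append(error)
    for error in errors:
        print(error, file=sys.stderr)
    return 1 if errors else 0

# Installed as the repository's hooks/pre-receive, the script would end with:
#     sys.exit(main(sys.stdin))
```

Keeping the policy in ``check_update``, separate from the two ``git`` calls, lets it be tested without a repository.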

Additionally, each repository will have the following server hooks:

* Server-side (pre-receive) hooks to reject merge commits and non-fast-forward
  pushes
* A webhook to update the umbrella project (via buildbot or web services)
* An email hook to each commits list (llvm-commits, cfe-commits, etc.)

Essentially, we're adding Git read-write access on top of the already existing
structure, with all the additional benefits of being hosted on GitHub.

What will *not* be changed
--------------------------

This is a change of version control system, not of the whole infrastructure.
There are plans to replace our current tools (review, bugs, documents), but
they're all orthogonal to this proposal.

We'll also be keeping the buildbots (and migrating them to use Git) as well as
LNT, and any other system that currently provides value upstream.

Any discussion regarding those tools is out of scope for this proposal.

Remaining questions and problems
================================

1. How much of SVN does the SVN view emulate, and how much will it break
   tools/CI?

For this one, we'll need people who run into problems in that area to tell us
what's wrong and how we can help them fix it.

We also recommend that people and companies migrate to Git, for its many other
benefits.

2. Which tools will need changing?

LNT may break, since it relies on SVN's history. We can continue to use LNT
with the SVN view, but it would be best to move it to Git once and for all.

The LLVMLab bisect tool will also be affected and will need adjusting. As with
LNT, it should be fine to use GitHub's SVN view, but changing it to work on Git
will be required in the long term.

Phabricator will also need to change its configuration to point at the GitHub
repositories, but since it already works with Git, this will be a trivial
change.

Migration Plan
==============

If we decide to move, we'll have to set a date for the process to begin.

As usual, we should announce big changes in one release to take effect in the
next one. But since this won't impact external users who rely on our source
release tarballs, we don't necessarily have to.

We will have to make sure all the *problems* reported are solved before the
final push. But we can start all non-binding processes (like mirroring to GitHub
and testing the SVN interface in it) before any hard decision.

Here's a proposed plan:

STEP #1: Pre-Move

0. Update docs to mention the move, so people are aware that it's going on.
1. Register an official GitHub project with the LLVM Foundation.
2. Set up another (read-only) mirror of llvm.org/git at this GitHub project,
   adding all necessary hooks to avoid broken history (merges, dates, pushes),
   as well as a webhook to update the umbrella project (see below).
3. Make sure we have an llvm-project (with submodules) set up in the official
   account, with all necessary hooks (history, update, merges).
4. Make sure bisecting with llvm-project works.
5. Make sure no one has any other blocker.

STEP #2: Git Move

6. Update the buildbots to pick up updates and commits from the official Git
   repository.
7. Update Phabricator to pick up commits from the official Git repository.
8. Tell people living downstream to pick up commits from the official Git
   repository.
9. Give things time to settle. We could play some games like disabling the SVN
   repository for a few hours on purpose so that people can test that their
   infrastructure has really become independent of the SVN repository.

Up to this point nothing has changed for developers; it just boils down to a
lot of work for buildbot and other infrastructure owners.

Once all dependencies are cleared, and all problems have been solved:

STEP #3: Write Access Move

10. Collect people's GitHub account information, adding them to the project.
11. Switch the SVN repository to read-only and allow pushes to the GitHub
    repository.
12. Mirror Git to SVN.

STEP #4: Post-Move

13. Archive the SVN repository, if GitHub's SVN view is good enough.
14. Review and update *all* LLVM documentation.
15. Review website links pointing to viewvc/klaus/phab etc. to point to GitHub
    instead.
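
Step 4 matters because bisecting tools need a linear, numbered sequence of revisions. Assuming the umbrella history provides consecutive revision numbers, bisection reduces to a binary search over that range, which is essentially what ``git bisect`` does when the history is linear. The following is a minimal Python sketch in which ``is_bad(rev)`` is a hypothetical callback that would build and test the umbrella project at revision ``rev``:

```python
def first_bad_revision(good: int, bad: int, is_bad) -> int:
    """Binary search for the first bad revision in (good, bad].

    Assumes consecutive revision numbers (as on the umbrella's linear
    history), that `good` is known good and `bad` is known bad, and that
    a breakage, once introduced, persists in every later revision.
    """
    if good >= bad:
        raise ValueError("need good < bad")
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid    # culprit is at mid or earlier
        else:
            good = mid   # culprit is after mid
    return bad


calls = []

def is_bad(rev: int) -> bool:
    # Stand-in for "build and run the tests at this revision".
    calls.append(rev)
    return rev >= 1234   # pretend the regression landed in revision 1234


print(first_bad_revision(1000, 2000, is_bad))  # 1234
print(len(calls))                              # 10
```

Each test halves the remaining range, so about ten builds suffice to find the culprit among a thousand revisions.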