This is an archive of the discontinued LLVM Phabricator instance.

Differential D24167

Moving to GitHub - Unified Proposal
ClosedPublic

Authored by mehdi_amini on Sep 1 2016, 4:19 PM.

Details

Reviewers

dexonsmith

Commits

rG647deb8f1a2d: Moving to GitHub - Unified Proposal
rL284077: Moving to GitHub - Unified Proposal

Summary

This document describes the proposal to move to GitHub, and includes the two proposals side-by-side with a comparison between the two.
It also goes through various workflow examples, presenting the current set of commands followed by the ones involved in each of the two proposals.

It is intended to supersede the previous "submodule proposal" document entirely, and drive the design of a survey addressed to the community.

Event Timeline

rengolin added inline comments. Sep 2 2016, 1:26 AM

docs/Proposals/GitHubMove.rst
161	IMO, we don't need to keep the same format, but that's a good point. Though, it would be better to outline the two options in one quick phrase than leave the other implied.

mehdi_amini marked 4 inline comments as done. Sep 2 2016, 1:46 AM

mehdi_amini added inline comments.

docs/Proposals/GitHubMove.rst

298

I added "with the sequence" following your comment to make it more clear.

425

I'm not sure I follow: AFAIK recursive is for nested submodules, which is not part of the proposal. So to be clear I expect --recursive to be a no-op. I can be wrong, but I'll need some more explanation if I missed something obvious here.

If your point is about cloning *all* the sub-projects rather than just a selected list, then --recursive is not the right option; running git submodule update without any other flag will do it. I'll spell it out.

618

I'm sorry, I don't follow. You mention a change in the flow for commits. Here is what's mentioned in the section I referred to; can you clarify where the inaccuracy is?

Workflow today:

# direct SVN checkout
svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
# or using the read-only Git view, with git-svn
git clone http://llvm.org/git/llvm.git
cd llvm
git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
git config svn-remote.svn.fetch :refs/remotes/origin/master
git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Workflow after (copy/paste):

A second option is to use svn via the GitHub svn native bridge::

  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...

This checks out only compiler-rt and provides commit access using "svn commit",
in the same way as it would do today.

Finally, you could use *git-svn* and one of the sub-project mirrors::

  # Clone from the single read-only Git repo
  git clone http://llvm.org/git/llvm.git
  cd llvm
  # Configure the SVN remote and initialize the svn metadata
  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l


In this case the repository contains only a single sub-project, and commits can
be made using `git svn dcommit`, again **exactly as we do today**.

philip.pfaffe added a subscriber: philip.pfaffe. Sep 2 2016, 2:20 AM

kparzysz added a subscriber: kparzysz. Sep 2 2016, 9:46 AM

kparzysz added inline comments.

docs/Proposals/GitHubMove.rst
199	and preserve the history.

In D24167#532327, @jlebar wrote:

Wait a second. We're choosing between two proposals. The three of us here are among the experts.

Assuming that we're somehow experts on workflows above other contributors seems a bit presumptuous to me. Keep in mind the difference in the proposals isn't Git, so being a Git expert (which I'm certainly not) isn't really relevant. I'm not an expert on other people's workflows, so I would prefer if we approach this from a perspective of providing information and allowing people to form their own opinions.

We absolutely should be comparing the two proposals explicitly, to draw users' attentions to the differences that we think are important. Because we actually know something that others don't!

Really? No offense, but this comes off as incredibly condescending. You're asserting that you know better than everyone else. I'm not saying we shouldn't help people compare the proposals. I'm saying we shouldn't draw conclusions for them. Our community is filled with a lot of really smart people and I strongly believe they are capable of forming their own informed decisions.

I don't want a one-sided fight, where only the monorepo or multirepo cadre gets to have its say. But if you believe that debate leads to better outcomes, then absolutely we should compare and advocate, which is just another way of explaining why, in our view, one proposal is better than the other.

Debate is great. The LLVM.org documentation on the proposal isn't the place for it. Debate it at the social, debate it in your office, debate it on the lists and on IRC, debate it on your livejournal (I think that's still a thing right?). I believe the proposals should be neutral. I believe the documentation on LLVM.org should be position agnostic geared toward informing people without trying to directly influence them.

Can we just have these as separate sections in one document? That's almost what we have here.

Four documents is a lot to ask people to read and understand. At least with one, they can skip around, etc... And it will also be easier to review and edit.

I disagree. There are a number of advantages to multiple focused documents, and one of them is that they will be smaller and easier to review. Also shorter documents are generally easier to digest because you can pick one up and read it in a few minutes, and they provide break points.

I think we could agree on a section that lays out the monorepo and multirepo proposals in a dry way, explaining what each looks like. Then we could have the "why monorepo" and "why multirepo" sections. Finally we could have the workflow comparison.

I would have no objection to this assuming that the two "why" sections and workflow comparisons are also neutrally written and avoid direct references to the other proposals.

In D24167#532395, @jlebar wrote:

Could we just add a similar section advocating for the multirepo and be done here, move on with our lives?

No. I don't believe advocacy documents of this nature have any place on llvm.org. I know it isn't your intention, but your document as written is little better than propaganda. It is filled with half-truths, assertions that aren't backed by fact, and slanted language designed to influence the reader to your opinion. To provide some examples:

SVN users would be more disrupted in the multirepo approach.

I disagree. I think that the mono-repo is more disruptive. But, that is my opinion, not backed by fact. You present this as a fact, and it is clearly a subjective opinion.

Because all of our subprojects would live in a single repository, we could move code between them easily and without losing history.

This can actually be done with a multi-repo approach too using sub-tree merging. While it may not be as easy, it doesn't lose history.

With the multirepo, moving clang-tools-extra into clang would be much more complicated, and might end up losing history.

Same as above. You don't need to lose history to do this. Yes, it is more complicated, but it is a one-time cost.

Look, at the end of the day I care *way* more about how our community arrives at a decision than about the specifics of the decision. I believe that our community is filled with intelligent people who are capable of forming their own opinions, and that those opinions are equally or more valuable than my own. As a result I believe that the information we provide to the community for consideration of these proposals should be the best possible effort at being non-biased and impartial.

You may disagree with some or all of that, which is certainly your prerogative. What I'm trying to push for here is a framework for how to construct documents for these proposals that strive to be non-biased. Is it possible to write non-biased documents that refer to each other? Sure. Is it possible to write biased documents that *don't* directly refer to each other? Sure. My point is that it is *easier* to write non-biased documents if they don't directly relate themselves against their opposition.

-Chris

probinson added inline comments. Sep 2 2016, 11:03 AM

docs/Proposals/GitHubMove.rst
521	Very nicely succinct. One typo: scripts -> script

Chris, I am really happy to work with you to make sure that you're happy that the parts of this document that are explicitly not advocating a position come across as dry and factual. I agree that parts of this document don't come across that way, and, where we're not explicitly advocating for a position, they should be changed. This was actually my explicit feedback to Mehdi when I reviewed his document, and also when he sent me this review yesterday.

But I admit to being flummoxed by what I read as rage here against the idea that we would allow advocates to explain their reasoned positions in a document posted to llvm.org. (Do you burn your California voter's guide for this reason?) Indeed, I am even more confused because the rage seems to be only aimed at the parts of this document that compare the monorepo and multirepo, but not at the parts that compare git and svn while clearly advocating for one side.

I also am really confused by the idea that somehow explaining why I think X is true prevents someone else from making up their own mind. It seems to me that *not* explaining my arguments would actually prevent people from making an informed decision.

But you're clearly very upset by this, and I don't think it is worth arguing further. Frankly I'm already feeling fight-or-flight here, and if the rhetoric escalates further, I'm afraid I won't feel welcome in this discussion at all.

Would you be amenable to a compromise, Chris? How about we link to advocacy statements from this document? Would that be acceptable to you?

Justin,

I apologize if my frustration comes off as rage. I believe that healthy debate is valuable and should be welcomed, but I also believe that debate has a place, and that this isn't debate. While I don't burn my CA voter's guide, it at least is a bit more honest about the fact that it publishes paid opinions (which I generally don't read). If you would prefer to follow the format of the CA voter's guide I'd feel way more comfortable with this. The idea of providing a dedicated space for arguments and rebuttals that are clearly labeled as the opinions of a person or group of people is not objectionable to me. I object to the intermingling of opinion and fact that blurs the lines between the two.

Comparing the proposals isn't the problem. The problem is that this document draws conclusions from the comparisons. All of that is fixable, and I spoke with Mehdi before he posted this review and voiced my concerns and willingness to help improve the document.

The problem is also not you thinking something is true, the problem is the document doesn't say "@jlebar thinks the mono repo is better because ...", it says "the mono repo is better because ...". These documents will be consumed by many people who are not following the review or dev list threads on the topic, which means that separating the opinion and fact will be very difficult for most of the audience.

I apologize for any comments I've made that have made you feel unwelcome in this conversation. It is not my intent. The discussions around moving to Git have caused a lot of passionate discourse which has already been way too unwelcoming to many members of our community. In the future I will strive to keep my responses less impassioned.

-Chris

In D24167#533142, @beanz wrote:

In D24167#532327, @jlebar wrote:

Wait a second. We're choosing between two proposals. The three of us here are among the experts.

Assuming that we're somehow experts on workflows above other contributors seems a bit presumptuous to me. Keep in mind the difference in the proposals isn't Git, so being a Git expert (which I'm certainly not) isn't really relevant. I'm not an expert on other people's workflows, so I would prefer if we approach this from a perspective of providing information and allowing people to form their own opinions.

I think this is more about addressing people's needs: we can't know everyone else's workflow, but we (well, not me) can certainly provide the answers to "how do I do this in git". In that sense being a git expert definitely helps.

We absolutely should be comparing the two proposals explicitly, to draw users' attentions to the differences that we think are important. Because we actually know something that others don't!

Really? No offense, but this comes off as incredibly condescending. You're asserting that you know better than everyone else. I'm not saying we shouldn't help people compare the proposals. I'm saying we shouldn't draw conclusions for them. Our community is filled with a lot of really smart people and I strongly believe they are capable of forming their own informed decisions.

I am not an expert in either git or svn, nor do I want to become one for the sole purpose of making an informed decision here. I cannot predict all possible consequences of either approach that could affect my work, and I do welcome a summary of the differences with their potential impact. Moreover, it may very well be that the differences in the proposed solutions will not be significant to me and my workflow. In that case, I would give preference to the solution that helps the rest of the "core" developers.

Thanks a lot for the apology, Chris.

If you would prefer to follow the format of the CA voter's guide I'd feel way more comfortable with this. The idea of providing a dedicated space for arguments and rebuttals that are clearly labeled as the opinions of a person or group of people is not objectionable to me. I object to the intermingling of opinion and fact that blurs the lines between the two.

I am 100% onboard with this. In fact, it's what I was trying to get at in the first place when I wrote:

I think we could agree on a section that lays out the monorepo and multirepo proposals in a dry way, explaining what each looks like. Then we could have the "why monorepo" and "why multirepo" sections. Finally we could have the workflow comparison.

I also don't like the half-advocacy position taken in parts of this document, particularly "one or many repositories?" When I revised Mehdi's proposal, I structured it more like you suggested, being explicit when we're advocating for one side or the other.

If you're amenable, I'm actually kind of interested in doing a postmortem (offline) to figure out how I could have communicated that better. Would have saved us both some grief, I think. In any case I'm really sorry for getting you worked up over something we agree on. And I greatly appreciate your apology. I too will try to stay a bit more calm. :)

Hi folks,

Deviating a bit from the conflict, I'd like to show how all of us agree on the same things basically...

In D24167#533142, @beanz wrote:

I believe the proposals should be neutral. I believe the documentation on LLVM.org should be position agnostic geared toward informing people without trying to directly influence them.

I agree with you, and I think that's what Justin was going for. I also see Mehdi's text as an attempt to do that, but it's hard to do it (I certainly can't) while we have an opinion (and we all have some).

So, the way I see it, we're arguing over specifics. There's no need.

I can see two ways this will go down without exploding again:

We "sanitise" the text from our biases the best we can and expect people to understand it. I'm not saying the intention was to be biased, just that humans bias. By having more points of view (this review is a good start), we "clean it up" a bit. I.e. we just continue doing this review as is, until everyone is happy "enough". (The quotes are important, as those words can have multiple meanings, and I mean the best of them.) This will have the cost of re-reviewing what was agreed before (the sub-mod proposal), but we can make it simpler by having a clear table of workflows (as I proposed earlier) and a very short list of pros and cons (as was proposed on the list).

We do the original plan, to have two completely separate sections (in the same document). The sub-mod section copied (+ some workflow) from the other text, the mono-repo as a filter from this text. It'd still be good to have a table at the end, though. The additional cost is redoing a lot of what Mehdi has done, but the benefit is that people who spent time discussing the other proposal won't have to do the whole thing again here.

I don't mind either way, but would be good to pick one and stick with it.

I would have no objection to this assuming that the two "why" sections and workflow comparisons are also neutrally written and avoid direct references to the other proposals.

I agree this has to be avoided, no matter how we organise the document. Justin's point that "this is a comparison" is very fair, but IIGIR, Chris' point was about "biased comparison" (e.g. "A is more complicated than B").

Workflows are all different, and people find different things complicated. Let's just lay out the independent facts about each one, even if the text becomes a bit brittle, and let people do their own comparisons.

I don't believe advocacy documents of this nature have any place on llvm.org.

So, removing all feelings around the phrase and the moment, I think there's a lot of meaning in this statement.

The OSS LLVM community has prided itself on not pushing an agenda on anyone. The individual contributions (company or personal) do have agendas, and they're synced upstream, and the discussions are massively technical in nature. There is obviously a power play, as in any other community (OSS or not), but here, we have always valued technical arguments over everything else.

The GitHub problem is a technical one, but also a personal one. There are technical problems, with technical solutions, but they are deeply rooted in how companies and individuals develop, validate and deliver their products. It's very easy to let our biases of "this is much easier than that" unintentionally blind us, and we have to be careful.

The first rule of thumb is to not assume, for any reason, that we're experts. Most people in our community could have been doing the job of investigating this issue, but they're not. That's not to say that whatever opinion *we* reach will be the best, just that whatever we present has to be clear, concise and technically accurate, so *they* can make their own decisions.

We're not here to decide what's best, but to digest and present the information in the simplest form possible, so everyone can decide what's best for themselves.

SVN users would be more disrupted in the multirepo approach.

I disagree. I think that the mono-repo is more disruptive. But, that is my opinion, not backed by fact. You present this as a fact, and it is clearly a subjective opinion.

This is a very good example why we can't use biased comparison like "A would be more disruptive than B". This is a personal opinion, not an invariant fact to all members of our community. We should not have any of that on this document, or people will not take it seriously and we won't have the effect we want.

I'm not pointing fingers or trying to start a fight, I'm just making clear that this discussion cannot happen in the text.

We need simplicity and clarity. Despite our best efforts, this text is not there yet, and it's no one's fault. I think Mehdi, Justin and Chris have done a remarkable job at collating and discussing all the facts, and I think a lot of people in the community *really* appreciate it. I certainly do.

But now it's time to stop discussing what the *best* workflow looks like, and start clearing up the proposal.

In my personal view, Mehdi's description and workflow examples are great. We just need to validate the sub-module part with the previous proposal, and work on the formatting.

cheers,
-renato

docs/Proposals/GitHubMove.rst
618	This is how it would work on a multi-repo, but this section is talking about the mono-repo. IIGIR, on a mono-repo, developers of a single component will have to commit back on the mono-repo, which will then be propagated to the individual (read-only) repos, no?

Address most minor comments

mehdi_amini added inline comments. Sep 3 2016, 2:09 PM

docs/Proposals/GitHubMove.rst
425	I added a comment mentioning that the list is optional. Let me know if I misunderstood something about --recursive above.
618	"This is how it would work on a multi-repo": I'm not totally sure what "This" refers to. Assuming it is about my previous paste, then no, it describes the monorepo. "IIGIR, on a mono-repo, developers of a single component will have to commit back on the mono-repo, which will then be propagated to the individual (read-only) repos, no?": Right, and this is the same thing as what a git-svn developer does today: git clone the individual repo; configure git svn to point to the SVN repo (the one from the monorepo in the future); commit through SVN; the commits are propagated to the individual repo.

Lots of inline comments.

docs/Proposals/GitHubMove.rst
64	The language here is also misleading. Maybe change to something like: Many new coders nowadays start with Git, and a lot of people have never used SVN, CVS, or anything else.
69	I would remove the "(most?)" bit here because it doesn't really add any value. We have no data to support an assertion of "most", and it could be misleading to suggest it.
80	Can we also add this as a point: Maintain remote forks and branches on Git hosting services and easily integrate back to the main repository. In particular for people that maintain out-of-tree code or forks, the ability to seamlessly merge between repositories is a big win for Git.
138	Can we also add something about the more traditional Git approaches to this? Maybe something like: Additionally, there are simple Git commands that can also be used to determine the order of commits. For example, the question "is a bug fixed in <hash-a> also fixed in a compiler built at <hash-b>?" can be answered with the command `git rev-list <hash-a>..<hash-b> --count`. If this prints a number greater than 0, the fix is contained in <hash-b>. Additionally, if we were to use Git tags similarly to how we use SVN tags today, you would be able to identify which releases contained a fix by running `git describe --contains <hash>`.
193	This is a completely subjective statement, and should not be present.
203	With Git, history could be preserved even across repositories. Git subtree merges support this, and while it isn't as simple, it is a one-time cost.
206	As in my other comment, losing history is not an issue.
209	Actually, there were also concerns about the increased burden for contributors not just downstream users. In general I think this entire section is designed to point out supporting arguments for the mono-repo with no recognition of the merits of the multi-repo proposal.
231	This is very slanted wording. From a user perspective the multi-repo solution to this problem is not much more complicated than the mono-repo solution.
422	Nit: "enters the dance" implies complexity.
453	Not sure I agree this is easy for svn users. To my knowledge llvm.org doesn't even document how to checkout the SVN repositories in a way to make this possible.
519	Additionally users of the umbrella repo can use `git submodule foreach` to have single command workflows that nearly match the mono-repo proposal.
571	This is inaccurate. Even though my rough prototype of the git umbrella repo doesn't make each submodule update a single commit, that was the stated plan for how the umbrella would be updated. That means each umbrella repo commit would represent a single commit to a single subproject, so your bisection granularity is comparable.
582	Better to say "both proposals will allow you to continue to use SVN". The wording here makes it seem like only the mono-repo has GitHub's SVN support, even though that is later contradicted.
621	This is a subjective statement that I don't believe is factually accurate. We could easily teach the build system to check out subprojects so that building a full toolchain could be `git clone ... && configure && build` regardless of the repository layout.
628	I'm confused by this. The sub-project mirrors are read-only, so the workflow is either to check out the full mono-repo or to use Git-SVN. That doesn't sound unchanged.
636	It is worth noting (as I did when I sent this out) that this was a very rough prototype, and it doesn't solve all the problems that we would expect a more permanent solution to solve. For example, the submodule update is periodic, not on a push-based notification, and the scripting around it doesn't do a single commit per update, which was the intended solution.
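
As a rough illustration of the reasoning behind the `git rev-list <hash-a>..<hash-b> --count` suggestion in comment 138, here is a minimal sketch in Python. It is a toy model, not Git itself: the `HISTORY` graph, the commit names, and the helper functions `reachable`, `rev_list_count` and `contains` are all invented for this example, and the history is assumed to be linear, as both proposals intend.

```python
from typing import Dict, List, Set

# Toy commit graph: each commit maps to its list of parents.
# The commit names are made up for illustration.
HISTORY: Dict[str, List[str]] = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],  # plays the role of <hash-a>: the commit that fixes the bug
    "c4": ["c3"],
    "c5": ["c4"],  # plays the role of <hash-b>: the commit the compiler was built from
}


def reachable(graph: Dict[str, List[str]], tip: str) -> Set[str]:
    """All commits reachable from `tip` by following parents, including `tip`."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph[commit])
    return seen


def rev_list_count(graph: Dict[str, List[str]], a: str, b: str) -> int:
    """Model of `a..b --count`: commits reachable from b but not from a."""
    return len(reachable(graph, b) - reachable(graph, a))


def contains(graph: Dict[str, List[str]], fix: str, build: str) -> bool:
    """True if `fix` is `build` itself or one of its ancestors."""
    return fix in reachable(graph, build)


print(rev_list_count(HISTORY, "c3", "c5"))  # fix precedes the build
print(contains(HISTORY, "c3", "c5"))
print(rev_list_count(HISTORY, "c5", "c3"))  # build precedes the fix
```

In this model the count is positive when the fix strictly precedes the build and zero when the build precedes the fix. When both hashes name the same commit the count is also zero even though the fix is present, so a direct ancestry test such as `git merge-base --is-ancestor <hash-a> <hash-b>` may be a useful companion to the count.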

emaste added a subscriber: emaste. Sep 9 2016, 12:49 PM

emaste added inline comments. Sep 9 2016, 12:58 PM

docs/Proposals/GitHubMove.rst
11	A little point, but I think we should say "why we're proposing such a move" or similar. "why we need such a move" in the first paragraph of the document implies the decision is already made, and might discourage those against change from even responding.

Replace "why we need to move" with "why we are proposing to move".

Address beanz' comments.

mehdi_amini added inline comments. Sep 9 2016, 4:07 PM

docs/Proposals/GitHubMove.rst
139	I'm not against mentioning this somewhere, but the "traditional" Git approach of hashes does not address at all the concerns mentioned right above.
194	Rewrote, but I suspect we'll need some other rounds. Suggestions welcome.
204	Can you provide an example where the history of a single file's contents can be preserved without pulling in the entire source repository? I'd like to try it and see how git log/git blame deals with that.
210	You're welcome to suggest merits of the multi-repo proposal to balance.
232	Please provide a replacement for this sentence.
454	Do you have an alternative to suggest?
572	If you have a way to guarantee it, I'm willing to hear about it. Right now, I don't believe it is possible without implementing it on the git hosting itself.
583	I did a minor rewording (we're on a different support level here between the two solutions, which need to be conveyed somehow).
622	Removed the paragraph
629	We're talking about libcxx in the monorepo proposal? Assuming yes, can you give an example of a workflow that would be changed compared to today?
637	(Already addressed above)

I'll work up some suggested alternate phrases this weekend or early next week, but I have some responses inline.

docs/Proposals/GitHubMove.rst
205	Google is our friend -> http://stackoverflow.com/questions/1365541/how-to-move-files-from-one-git-repo-to-another-not-a-clone-preserving-history
211	I don't think that our proposals should be constructed as convoluted arguments between contributing authors. Adding pro multi-repo statements will only make this more difficult to grok. I actually think there is very little in this section that shouldn't be part of an "arguments/rebuttals" section.
573	You can absolutely guarantee the same granularity. You can't guarantee the same ordering, but generally speaking that is significantly less important than granularity. To get the same granularity you allow the script that updates submodules to produce more than one commit to the submodule repo at a time. If there are multiple, you can sort them by committer date. While committer date isn't a great thing to use, since our proposals both depend on maintaining a linear history it should be good enough for the common cases, because committer date gets reset on rebase.
630	Ah. I think the confusing phrasing is that monorepo is being used in two contexts. Maybe rephrase this to something like: With this variant of the monorepo proposal developers who only work on excluded sub-projects will continue to use the single-project repositories. The workflow is still changed from today, because today we're using SVN.
638	I'd like to see that mentioned here as well. This document is quite large and people may jump around reading it. It is worth having the note directly next to the link.
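
The granularity argument in comment 573 can be sketched as follows. This is a hypothetical illustration in Python, not the actual update script: the `PushedCommit` record, the `plan_umbrella_commits` function, the commit message format and the sample data are all assumptions made for the example. It takes a batch of commits pushed to several sub-project repositories, sorts them by committer timestamp, and plans one umbrella commit per sub-project commit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PushedCommit:
    """A commit seen by the (hypothetical) umbrella update job."""
    project: str         # sub-project repository, e.g. "llvm" or "clang"
    sha: str             # commit hash in that repository (made up here)
    committer_time: int  # committer timestamp, seconds since the epoch


def plan_umbrella_commits(batch: List[PushedCommit]) -> List[str]:
    """Return one umbrella commit message per sub-project commit.

    Commits are ordered by committer timestamp; ties are broken by project
    name and hash so the result is deterministic. The ordering is only
    approximate, since clocks on different machines may disagree.
    """
    ordered = sorted(batch, key=lambda c: (c.committer_time, c.project, c.sha))
    return [f"Update {c.project} submodule to {c.sha}" for c in ordered]


batch = [
    PushedCommit("clang", "c2", 1473465600),
    PushedCommit("llvm", "a1", 1473465500),
    PushedCommit("clang", "c1", 1473465550),
]
for message in plan_umbrella_commits(batch):
    print(message)
```

The sketch keeps a 1:1 mapping between sub-project commits and umbrella commits, which is the granularity point. It does not address the ordering concern: a real job would at least need to keep each sub-project's commits in parent-to-child order even when timestamps are skewed.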

mehdi_amini marked 4 inline comments as done. Sep 9 2016, 4:58 PM

mehdi_amini added inline comments.

docs/Proposals/GitHubMove.rst
205	I don't see `git subtree` at work on this link, just `filter-branch` + `git mv` + merge. That flow tracks the history of a file, not its content AFAIK (i.e. if a function was moved from another file into the current one, the history of when/why this function was added/modified won't be included). Also, what would be the effect of moving a file from one repo to another, and later back to the original repo?
211	There is nothing convoluted here. Adding fact-based pro multi-repo statements will make it easier to understand. I disagree, I think most of this section should stay here. So we'll have to go into the specifics, piece by piece.
573	"but generally speaking that is significantly less important than granularity." No, sorry, I can't agree; this is critical: correctness goes before usability. It seems to me that you're willing to trade correctness to bring a guarantee of usability here. I'm willing to believe that "in practice" the granularity should be small enough; it just has to be worded carefully. Right now it is a parenthesis at the end: `(it is possible that one commit in the umbrella repository includes multiple commits in the sub-projects)`. We can reword this as `(it is possible that one commit in the umbrella repository includes multiple commits in the sub-projects, though it should be occasional in practice)`. (One may bikeshed on what exactly "occasional" is, but we don't have any data to bikeshed efficiently anyway.)
630	Sorry, the sentence is really about the monorepo: leaving libcxx within the monorepo should not be a regression compared to today.

mehdi_amini marked 22 inline comments as done. Sep 26 2016, 2:29 PM

mehdi_amini added inline comments.

docs/Proposals/GitHubMove.rst
213	Can you clarify what you're referring to exactly and/or suggest some editing?
233	(Tried to make it more explicit that complexity is handled by the infrastructure)

Address the remaining outstanding comments (except maybe 1 or 2)

Minor fix.

mehdi_amini updated this revision to Diff 73090. Sep 30 2016, 10:25 AM

Any other comments? Otherwise we should move forward.

Lots of comments inline, and one meta-comment.

Looking at the details of the mono-repo proposal "use the GitHub SVN" interface is the answer to a lot of workflows. How would the Git-SVN workflows be impacted by moving to a PR-based workflow? I assume it works fine, you just create SVN branches and commit to them then make the PR via the web UI. Is that correct?

I know we're not actually considering a PR-based workflow, but it is something to consider.

docs/Proposals/GitHubMove.rst
223	What about the concerns about active community members having this burden?
250	Remove "(with some granularity)". The multi-repo proposal can have the same 1:1 mapping of commits in per-project repos to umbrella commits that the mono-repo would have. When the update job runs with a list of more than one commit we can sort them by committer timestamp (which is updated after rebase). It will provide a roughly linear timeline for the commits to be sorted across the repositories. It won't be perfect, but it should be good enough for sorting commits in close proximity because the pushed commits will either be rebased (which updates the committer timestamp) or they will be merge commits which will have a committer timestamp generated when the merge commit was generated.
254	I would say 'continuously' rather than periodically here. You describe in more detail below how the notifications would be configured and 'periodically' isn't a full picture.
269	s/interacts/would interact/
335	You've lost me here. Checking out all the projects in SVN today involves multiple svn co commands. Unless there is some magic in SVN I'm unaware of. If there is such magic we should document it somewhere on LLVM.org (maybe on the getting started page?) and link to it here.
351	Can you please add actual size numbers for each project and the mono-repo? Just saying '2x' isn't super meaningful without knowing the size of 1x.
388	the emphasis on 'exactly as we do today' is unnecessary.
446	Alternatively since our intention is to enforce a linear history in the repositories doing a checkout by timestamp using the format below should also work in the majority of cases. git checkout 'master@{...}'
462	Again, I don't follow how this is easy. There is no documentation on LLVM.org explaining how to do this and my limited knowledge of SVN leaves me with no idea how to do it.
467	I would phrase as "It would be possible...", because it most certainly is possible.
472	Please remove "and makes this use case ...", it is a value judgement.
585	If we go with the multi-repo approach we can ensure that each umbrella repo commit will be only one submodule update. This is relatively straightforward tooling to add. The only situation where we could potentially allow multiple updates in a single umbrella commit would be if we wanted to do cross-repository correlating of revlocked changes.
590	The granularity is not finer.
602	Better to say both proposals allow you to continue using SVN the same way, but that each solution will have minor impacts. In the monorepo there will be a one-time change in revision numbers, and in the multi-repo each project will have its own revision numbers out of sync from each other.
608	s/any of the proposal/both of the proposals/
612	Reword from the second sentence on. You're making a value assessment. A better phrasing might be: If your fork touches multiple LLVM projects, migrating your fork into the mono repo would enable you to make commits that touch multiple projects at the same time the same way LLVM contributors would be able to do so.
623	I would phrase the downside as "rewriting the fork's history and changing its commit hashes", because that is what happens.
631	This is a little unclear to me. Do you mean applying the patches via "git apply" from a patch file? Might be worth clarification about how that would work.
642	This makes it sound like the git mirrors are read-write. Might be worth adding a "via Git-SVN" comment to clarify.

beanz added inline comments.Sep 30 2016, 2:30 PM

docs/Proposals/GitHubMove.rst
208	How does the mono-repo do this? It might make it easier, but since it is likely that even with a mono-repo most people won't build all projects I don't think it actually encourages updates across all sub-projects.
215	You still haven't addressed the feedback here. Saying the multi-repo would lose history is still inaccurate. For starters, you're not actually deleting the history from the repository you're moving code from. Also with a multi-repo you can easily preserve the file history by using git filter-branch. Using filter-branch will not follow history across renames that are outside the filter, but will follow them within the filter. For example, if you were to use filter-branch on lib/Support to break it out into its own repository, filter-branch would preserve history of files under lib/Support that are renamed as long as they remain under lib/Support. It would not preserve the history of a file being renamed and moved under lib/Support. Even with that the history before that point is traceable because the history would still exist in the old repository, so you are not losing history, you just aren't moving it with the file.

Address review.

docs/Proposals/GitHubMove.rst
208	I was thinking about the fact that if I change the API `createTargetMachineFromTriple()`, and `git grep` to find the uses, then all the uses in sub-projects will show up.
215	Fair enough: replaced "losing history" with "the history of the refactored code won't be available from the new place".
223	Can you clarify what you're referring to exactly? (No regression compared to now I believe)
250	It seems to me that at the beginning the idea was that the submodules would be updated every few minutes, so that we'd be able to have rev-locked commits pushed to multiple projects at the same time and have them appear as a single umbrella update (with somehow a heuristic like "update the submodules when there hasn't been a push for 2 min"). Apparently your idea is rather that we should update it with single commits, but what's the story for rev-locked? How would the tooling not have a race condition? Example:
	1. I commit to LLVM
	2. I commit to Clang
	3. the script runs, pull LLVM, no change
	4. I push to LLVM
	5. I push to Clang
	6. the script pulls Clang, see my commit
	7. the script is done with pulling and update the submodule with the clang change, before the LLVM change, even though the commit date would be reversed.
	I don't see a principled solution to implement the umbrella without server-side (i.e. native git hook) support. Sure you can craft it, and it'll work fine most of the time, but that does not make it bulletproof.
335	I was referring to:
	svn co http://llvm.org/svn/llvm-project/ --depth=immediates
	cd llvm-project/
	svn up llvm/trunk clang/trunk libcxx/trunk
	You can then have a build with only LLVM configured like:
	mkdir ../build-llvm && cd ../build-llvm
	cmake ../llvm-project/llvm/trunk
	And a build dir with llvm+clang:
	mkdir ../build-clang && cd ../build-clang
	cmake ../llvm-project/llvm/trunk -DLLVM_EXTERNAL_CLANG_DIR=../llvm-project/clang/trunk/
	So that a single `svn up $projects` in the source directory updates all the sources and you can still build a subset of the projects from these sources. This is also how I'd synchronize if I was integrating downstream from SVN.
446	This applies to both proposals right? Where do you want me to add this?
462	(Copy/pasted commands above)
462	Copy/pasted above (I'm not sure I really want to document it on llvm.org now).
472	I don't believe so, but if you insist...
571	(I'm waiting for the story to support this above)
585	(I'm waiting for the story to support this above)
590	(I'm waiting for the story to support this above)
602	"The same way" implies "a single SVN revision number" to me. One could even say "a single SVN checkout" (cf. the commands I copy/pasted above). I don't see how it'd work with the multi-repo? How would someone downstream integrating from SVN be able to correlate revisions across repositories?
623	The paragraph starts with " Using a script that rewrites history" and end with "changes the fork's commit hashes", it seems to me that this makes explicit that the downside of rewriting history is that the hashes change. (I'm not sure how "rewriting history" is a downside by itself otherwise)

beanz added inline comments.Sep 30 2016, 4:57 PM

docs/Proposals/GitHubMove.rst
208	That is 'making easier' not 'encouraging'. Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.
215	In your example of moving clang-tools-extra there would be no need for loss of history at all. There is no need for filter-branch. You can literally reformat clang-tools-extra to be under tools/extra/ and merge the whole tree into the clang master branch. The only point where you would lose any history at all is if you were trimming one part of a repository into another repository, and even in that situation you can minimize the losses pretty well using filter-branch and index scripts. It is complicated but possible.
223	Ah. I misread. I see what you are saying. This is fine.
250	The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran. It will then sort them by committer timestamp order, and commit one at a time to the umbrella repo as submodule updates. We can set up the automation to run based on GitHub WebHooks, and periodically in case a WebHook gets dropped. There is no race condition that I see. If we need to support revlocked changes (and I'm not convinced this is the case since they are by far a minority of commits), we can support them via annotations on the commit messages. We can teach the automation to look for markers in the commit message denoting that it is revlocked to other changes, and we can have it group revlocked changes together. There is no need for server-side hooks, and this solution would work as well as any mirroring system. I don't believe there is any need for this solution to be bulletproof, but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.
335	I can't imagine that is a common workflow. It certainly isn't the documented recommended workflow on llvm.org, so I'm not sure there is value in bringing it into the discussion.
351	Can you add per-project sizes?
446	I think it is worth noting under the multi-repo proposal something along the lines of: Because we will be maintaining a linear history you can perform a timestamp based checkout of each project repository with the following command: git checkout 'master@{...}' Additionally you can use the umbrella repository... If you want to also add the timestamp checkout to the mono-repo proposal, that makes sense too. I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.
462	Fine if you don't want to document it, but I certainly would not describe that as "easy". Especially because if you ever mix up and type "svn up" in the root it starts updating everything. I think this is an incredibly fragile workflow, which is probably why it is also incredibly uncommon.
571	See above.
585	Again, above.
602	Maybe rather than "the same way" "with similar workflows to today"?
623	Fine.
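
To make the ordering step in the 250 thread concrete, here is a minimal sketch of sorting commits from two repositories by committer timestamp. It is not the actual automation referenced in this review: the repository names, dates, and the helper functions `new_repo` and `commit_at` are hypothetical, and the sketch does not try to settle the race-condition question raised above.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: create a throwaway repository with a local identity.
new_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  git -C "$1" config user.email demo@example.com
  git -C "$1" config user.name Demo
}

# Hypothetical helper: $1 = repo, $2 = author/committer date, $3 = subject.
commit_at() {
  GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" \
    git -C "$1" commit -q --allow-empty -m "$3"
}

new_repo llvm
new_repo clang
commit_at llvm  "2016-10-03 10:00:00 +0000" "llvm: A"
commit_at clang "2016-10-03 10:01:00 +0000" "clang: B"
commit_at llvm  "2016-10-03 10:02:00 +0000" "llvm: C"

# Emit "<committer-timestamp> <repo> <subject>" for every commit in every
# repository, sort numerically on the timestamp, then keep only the subject.
order=$(
  for r in llvm clang; do
    git -C "$r" log --format="%ct $r %s" master
  done | sort -n | cut -d' ' -f3-
)
echo "$order"
```

The interleaved order (`llvm: A`, `clang: B`, `llvm: C`) is the sequence in which an umbrella repository would record one submodule update per commit under this scheme.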

mehdi_amini marked 2 inline comments as done.Sep 30 2016, 5:36 PM

mehdi_amini added inline comments.

docs/Proposals/GitHubMove.rst
208	> That is 'making easier' not 'encouraging'.
	"All the source is there by default" + "making it easier" => why I wrote "encouraging".
	> Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.
	Not sure why "enforcement" comes into play here?
215	So do you have anything concrete that could be added here, be practical (something we'd be willing to encourage in the future), be understandable by any dev, and not take > 20 lines to describe?
250	> The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran.
	Atomically?
	> There is no race condition that I see.
	Did you read my sequence 1-7 that describes an example of race?
	> but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.
	Define "robust". The single-project mirrors have a very well deterministic algorithm to construct, and reconstruct them at will, you don't have one for the multi-repo. That's not "robust" to me.
351	That'd make a long list, how should it be presented?
446	Are you sure that this command does what you think it does? If I read the doc correctly, it is looking at your reflog, not the history. The right one should be something like:
	git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
	> I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.
	OK that wasn't clear to me the first time.
602	I'm still missing what would be similar for someone integrating multiple projects from SVN today (assuming such downstream integrator exists) with the multi-repo?

Add mention of the ability to check out the individual repos according to a timestamp
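
As an illustration of the timestamp-based checkout agreed on in the 446 thread, the following sketch builds a throwaway repository with three dated commits and checks out the last one made before a given time. The repository, the commit subjects, the dates, and the `commit_at` helper are hypothetical; only the `git rev-list -n 1 --before=...` form comes from the discussion.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

# Hypothetical helper: $1 = author/committer date, $2 = subject.
commit_at() {
  GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
    git commit -q --allow-empty -m "$2"
}
commit_at "2009-07-01 12:00:00 +0000" "first"
commit_at "2009-07-20 12:00:00 +0000" "second"
commit_at "2009-08-15 12:00:00 +0000" "third"

# Resolve the newest commit on master whose committer date is before the
# given time. This walks the history, unlike 'master@{...}', which consults
# the local reflog.
rev=$(git rev-list -n 1 --before="2009-07-27 13:37" master)

# Check that commit out (detached HEAD) and show which one it is.
git checkout -q "$rev"
subject=$(git log -1 --format=%s)
echo "$subject"
```

With a linear history, the same command run in each project repository with the same timestamp yields a mutually consistent set of checkouts.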

Ping?

beanz added inline comments.Oct 3 2016, 10:05 AM

docs/Proposals/GitHubMove.rst
208	"All the source is there by default" This is what makes it easier. Your math is double counting it. I disagree with your wording here. I've told you I disagree. You can continue to disregard my feedback or you can fix it. The choice is yours.
215	You gave an example that is factually incorrect. I'm asking you to fix it. That is concrete. In my earlier comment I told you why your example was incorrect. You can remove the example, or come up with an alternative. That is your choice. What you cannot do, is use this factually inaccurate example.
250	I've updated my automation (https://github.com/llvm-beanz/llvm-submodules) to make one umbrella commit per commit to sub-project repository. This has a single commit granularity. That was the original point I was arguing. It works. It is done. Is it perfect? No. There are a number of situations where the order of the commits to the submodule can be impacted by the order and proximity of commits to the project repositories. That is irrelevant to the point I was making. I'm more than happy to debate with you about whether or not that matters, but that is a separate issue from what I was pointing out. Do we need to belabor this further, or will you update the document based on my feedback?
351	However you think it is best presented. A table would seem fitting. You could put it below and have a link down to it. I think that if you're bringing size into the discussion you need to provide sufficient data.
446	You are correct, you need to use `rev-list` to get the commit hash.
602	I strongly suspect that very few users are using a single SVN checkout that contains more than one sub-project. If you discount that workflow, the workflow for interfacing using the GitHub SVN bridge is very similar whether you are using one repo or many. Additionally, with the mono repo the combined SVN workflow is actually a lot better than with SVN today. It is way less fragile since you aren't doing sub-directory checkouts. This means you don't run the risk of inadvertently running `svn up` and pulling down way more than you wanted.

mehdi_amini added inline comments.Oct 3 2016, 11:43 AM

docs/Proposals/GitHubMove.rst
208	> "All the source is there by default" This is what makes it easier.
	Sorry, but I mentioned earlier `git grep` and you answered `That is 'making easier'`. All the source being present by default is more than making it easier.
	> I disagree with your wording here. I've told you I disagree.
	I strongly disagree with your disagreement here.
215	The current spelling (Friday, 3:51pm) is: "With the multirepo, moving clang-tools-extra into clang would be more complicated than a simple `git mv` command, and the history of the refactored code won't be available from the new place." I can change the example to: "Refactoring some functions from clang to make it a utility in one of the llvm/lib/Support files to share it across sub-projects wouldn't carry the history of the code in the llvm repo." That said, I asked you on 9/9 (over 3 weeks ago) "Can you provide an example where the history of a single file contents can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that." You haven't been able to provide me with this. So you can claim whatever you want about "factual inaccuracy", you still failed to provide counter facts to support your claim.
250	You're moving goal posts. Your previous message said that there is no race, while now you're eluding it with "There are a number of situations where...". Also you're changing the definition of the multi-repo as I was foreseeing it. I think it is worse, and if we were to adopt the multi-repo proposal, I would be totally against this. Now, just to please you, because again I don't think it does any good to this proposal, I'll re-formulate making clear that:
	- updates in the multi-repo are single-commit based.
	- commits can be in different orders.
	- it does not handle cross-project commits.
602	> If you discount that workflow, the workflow for interfacing using the GitHub SVN bridge is very similar whether you are using one repo or many.
	"Very similar" is subjective, to me it can't be similar as long as there is no longer a single revision number.
	> Additionally, with the mono repo the combined SVN workflow is actually a lot better than with SVN today. It is way less fragile since you aren't doing sub-directory checkouts. This means you don't run the risk of inadvertently running svn up and pulling down way more than you wanted.
	I don't understand what you mean here.

kparzysz added inline comments.Oct 3 2016, 12:41 PM

docs/Proposals/GitHubMove.rst
356	Even with sparse checkout? Am I going to see new files in projects that were not originally included in the sparse checkout?
367	A conflicting change would have to affect the same file. This is regardless of whether it's monorepo or multirepo. Am I missing something here? Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.

jlebar added inline comments.Oct 3 2016, 1:41 PM

docs/Proposals/GitHubMove.rst
356	What do you mean by "see"? In order to push a commit without `-f`, the commit's parent commit must be the current remote head. The commits in git are unaffected by sparse checkout. So, if you have a commit you want to push, you will need to rebase it atop current remote HEAD -- you'll have to do this rebase even if you're using sparse checkouts and all of the changes between your current base revision and current remote HEAD are to subprojects that you don't have checked out. If you don't like this, you can continue to use the single-subproject mirrors exactly as you currently do (with git-svn and everything), by changing the configs as explained elsewhere in this document. But I've been using a monorepo (http://github.com/llvm-project/llvm-project) for months now. I've pushed maybe 30 commits using my custom script (https://github.com/jlebar/llvm-repo-tools) and this necessity to rebase hasn't once been an annoyance for me.
367	> Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.
	That is what git-svn will do, yes. But that's not pure git's behavior.

mehdi_amini added inline comments.Oct 3 2016, 1:52 PM

docs/Proposals/GitHubMove.rst
356	> Even with sparse checkout? Am I going to see new files in projects that were not originally included in the sparse checkout?
	If you mean are you seeing them when typing `ls` in your terminal, then no you don't. I can add "unless you're using a sparse checkout" to make it more clear.
367	> A conflicting change would have to affect the same file. This is regardless of whether it's monorepo or multirepo. Am I missing something here?
	The point was that when you run `git pull --rebase`, you have new changes, and even without an explicit "diff conflict" your changes that you're about to push may use an API that has changed upstream. Note today this is not addressed: SVN will blindly accept the push and break the build.
	> Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.
	As Justin mentions, this is not true with `git push` AFAIK. You have to `pull` (merge or rebase) before being able to push.
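
The behavior described in the 356 and 367 threads can be reproduced locally. The sketch below assumes a hypothetical bare repository standing in for the hosted one and two hypothetical contributors, Alice and Bob, who touch different sub-project directories. It shows that plain `git push` refuses a commit whose parent is not the current remote head, even when no file conflicts, and that `git pull --rebase` followed by `git push` succeeds.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Bare "upstream" standing in for the hosted repository (hypothetical layout).
git init -q --bare upstream.git
git --git-dir=upstream.git symbolic-ref HEAD refs/heads/master

# Alice seeds upstream with one commit touching two sub-projects.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.email alice@example.com
git config user.name Alice
mkdir llvm clang
echo a > llvm/a.txt
echo b > clang/b.txt
git add .
git commit -q -m "initial"
git remote add origin "$work/upstream.git"
git push -q origin master
cd ..

# Bob clones, changes clang/ only, and pushes first.
git clone -q "$work/upstream.git" bob
cd bob
git config user.email bob@example.com
git config user.name Bob
echo b2 >> clang/b.txt
git commit -q -am "bob: clang change"
git push -q origin master
cd ../alice

# Alice changes llvm/ only. Her commit's parent is no longer the remote
# head, so a plain push is refused even though no file conflicts.
echo a2 >> llvm/a.txt
git commit -q -am "alice: llvm change"
if git push -q origin master 2>/dev/null; then
  first_push=accepted
else
  first_push=rejected
fi
echo "first push: $first_push"

# Rebasing onto the current remote head turns the push into a fast-forward.
git pull -q --rebase origin master
git push -q origin master
git log --format=%s
```

Between the `git pull --rebase` and the final `git push`, Alice is free to inspect what changed upstream and decide whether a rebuild is warranted.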

After this round of feedback I'm removing myself from this discussion.

docs/Proposals/GitHubMove.rst
208	You asked for feedback. If you want to disregard it that is your decision.
215	"Can you provide an example where the history of a single file contents can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that." git-filter-branch can preserve the history of a single file. It does not follow renames, however if you know a file was renamed, you can use git-filter-branch's --tree-filter or --index-filter flags to perform more complicated slicing of the repository to preserve that history. If you're unfamiliar with the types of things you can do with filter branch, this article gives a good overview (https://devsector.wordpress.com/2014/10/05/advanced-git-branch-filtering/).
250	From the beginning I said:
	> It won't be perfect, but it should be good enough for sorting commits in close proximity...
	If you want to debate that statement we can do so, but I would prefer not to in this thread.
	> Also you're changing the definition of the multi-repo as I was foreseeing it. I think it is worse, and if we were to adopt the multi-repo proposal, I would be totally against this.
	You don't get to dictate how the proposal in opposition to your preferred approach is written. I think you've been pretty clear about being against the multi-repo proposal, so I don't see how your opinion factors in to the final document, which shouldn't be opinion based.
602	Saying the workflows are "similar" is not a subjective wording. Today someone who writes:
	`svn co http://llvm.org/svn/llvm-project/llvm/trunk`
	Under the mono-repo could write something like:
	`svn co http://github.com/llvm/llvm-project/master/llvm`
	Under the multi-repo could write something like:
	`svn co http://github.com/llvm/llvm/master/`
	The workflow of `svn co` -> `svn add` -> `svn commit` is similar in all cases.

beanz removed a subscriber: beanz.Oct 3 2016, 2:02 PM

kparzysz added inline comments.Oct 3 2016, 2:02 PM

docs/Proposals/GitHubMove.rst
356	> What do you mean by "see"?
	I'm referring to this (and the rest of this paragraph): "However when you fetch you'll likely pull in changes to sub-projects you don't care about." The intent wasn't clear---I wasn't aware of the requirement about parent commit (I use SVN for upstreaming changes). But that brings another question: what is the anticipated frequency of commits to the monorepo? My concern is that the "rebuild and retest" approach may take long enough to require another rebase...

remove reference to size.
change the model for multi-repos: single commit based, losing cross-project commits.
clarify that sparse checkouts don't see changes from other projects

mehdi_amini added inline comments.Oct 3 2016, 2:16 PM

docs/Proposals/GitHubMove.rst
356	When you commit to SVN, you add a "patch" on top of the existing codebase. Unless there is a conflict your patch will be committed. It does not mean it will build, since someone else may just have changed an API you're using in your patch. The new monorepo won't be different from SVN on this aspect: you have the same frequency of commits, and you can run `git pull && git push` which is roughly equivalent to `git svn dcommit` today. The thing is that between the `git pull` and the `git push`, you can also inspect what changed since your last build/check, and decide if you need to rebuild or not.

mehdi_amini added inline comments.Oct 3 2016, 2:18 PM

docs/Proposals/GitHubMove.rst
356	Maybe we should just remove all this paragraph, it is confusing...

Remove confusing paragraph.

jlebar added inline comments.Oct 3 2016, 2:21 PM

docs/Proposals/GitHubMove.rst
356	> My concern is that the "rebuild and retest" approach may take long enough to require another rebase...
	This isn't a function of the monorepo. You choose when to rebuild/retest, and that's orthogonal to the repository structure. If "rebuild/retest only when there were changes to files I changed" is what you want to do, you can still do that. You can ask that question of git before pushing. Or you could ask "have any of the projects I care about changed?" Or you could ask a different question. And you could ask those questions of the monorepo, or the multirepo (although it might be a bit more work in the multirepo -- I say "might" so beanz doesn't jump on me). In this sense it's safer than SVN, which assumes that you only care about retesting if there were modifications to files you also changed.
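
One way to "ask that question of git" is to restrict a diff against the fetched remote head to the directories you build. The sketch below is only an illustration under assumed names: the `upstream` and `mine` repositories and the `llvm/` and `clang/` layout are hypothetical stand-ins for a monorepo and a local clone.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with two sub-project directories.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
git config user.email up@example.com
git config user.name Upstream
mkdir llvm clang
echo a > llvm/a.txt
echo b > clang/b.txt
git add .
git commit -q -m "initial"
cd ..

git clone -q "$work/upstream" mine

# Someone else lands a clang-only change upstream.
cd upstream
echo b2 >> clang/b.txt
git commit -q -am "clang change"
cd ../mine

# Fetch without touching the working tree, then list the paths that changed
# upstream since the common ancestor, restricted to one directory at a time.
git fetch -q origin
llvm_changes=$(git diff --name-only HEAD...origin/master -- llvm/)
clang_changes=$(git diff --name-only HEAD...origin/master -- clang/)

if [ -z "$llvm_changes" ]; then
  echo "llvm/ untouched upstream; an llvm-only build may not need a rebuild"
fi
echo "clang changes: $clang_changes"
```

The three-dot form compares against the merge base, so local commits that have not been pushed yet do not show up as upstream changes.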

jlebar added inline comments.Oct 3 2016, 2:23 PM

docs/Proposals/GitHubMove.rst
356	I really think you want this paragraph, btw. This is a very common question -- it's been asked many times before. "I don't want the monorepo because it will mean I have to rebuild/retest a lot more than I do today." False, but we need to explain why.

mehdi_amini marked 45 inline comments as done.Oct 3 2016, 2:37 PM

mehdi_amini added inline comments.

docs/Proposals/GitHubMove.rst
250	> You don't get to dictate how ...
	Sorry, you're mischaracterizing my position and what I wrote, I don't appreciate this.
356	OK, I'll try to rephrase it then. The main point is that `git pull && git push` is not different from today's SVN.

mehdi_amini marked an inline comment as done.Oct 3 2016, 2:38 PM

Restore the paragraph.

mehdi_amini added a reviewer: dexonsmith.Oct 5 2016, 9:36 PM

dtzWill added a subscriber: dtzWill.Oct 6 2016, 7:43 AM

Address Duncan's inline comments.

New layout attempt

I believe what Duncan is asking for is basically the same thing I (and others) have also been asking for: An explicit "not dryly-factual" section where the experts explain their various positions.

I am saddened that we won't have these sections in the document -- I think not having them does a disservice to the readers who, like you, want this material. (We've had at least one other person comment in this thread, and we've had nobody say they don't want it.) But the disagreement seems to be based on Mehdi having fundamentally different conceptions of discourse than I certainly have, and I'm not prepared to litigate the philosophy of argument just to get a section added to this document.

On the other hand, in light of the amount of abuse it seems that whoever drives this process will receive no matter what they say, I'm certainly not willing to switch places with Mehdi, and I think he deserves a heaping ton of credit for frankly superhuman positivity here (in addition to credit for doing the work itself). So if he continues to oppose this idea, I respect his decision. In that case, I think we're just going to have to write it up separately, and hope that it gets the visibility it deserves. Maybe we'll be able to get a link in this document, although if that turns into a fight, like so much else here has, I hope I'll have the self-control to turn and run in the other direction.

Try another layout: add first a description of the multirepo, then one for the monorepo, then the interleaved comparison.

ioeric added a subscriber: ioeric.Oct 12 2016, 2:30 AM

ioeric added inline comments.Oct 12 2016, 4:19 AM

docs/Proposals/GitHubMove.rst
181	I am wondering where we are in the process now. Specifically, when would we get to this step (2.5)? Phabricator is seeing frequent connection errors from the svn server (might be due to the increased number of svn connections after the recent phabricator upgrade):
	svn: OPTIONS of 'http://llvm.org/svn-robots/llvm-project': could not connect to server (http://llvm.org)
	This blocks syncing svn commits from time to time. I'd expect GitHub to be more stable.

Address Duncan's feedback

(Remove duplicated section)

mehdi_amini updated this revision to Diff 74445.Oct 12 2016, 2:55 PM

Split the bullet about the overhead of the monorepo for users that care only about a single subproject.

Closed by commit rL284077: Moving to GitHub - Unified Proposal (authored by mehdi_amini). Oct 12 2016, 4:54 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
docs/
Proposals/
GitHubMove.rst	879 lines
GitHubSubMod.rst
index.rst	4 lines

Diff 74413

docs/Proposals/GitHubMove.rst

This file was added.

				==============================
				Moving LLVM Projects to GitHub
				==============================

				Introduction
				============

				This is a proposal to move our current revision control system from our own
				hosted Subversion to GitHub. Below are the financial and technical arguments as
				to why we are proposing such a move and how people (and validation
				infrastructure) will continue to work with a Git-based LLVM.
				emaste: A little point, but I think we should say "why we're proposing such a move" or similar. "why we need such a move" in the first paragraph of the document implies the decision is already made, and might discourage those against change from even responding.

				There will be a survey pointing at this document which we'll use to gauge the
				community's reaction and, if we collectively decide to move, the time-frame. Be
				probinson: s/gague/gauge/
				sure to make your view count.

				Additionally, we will discuss this during a BoF at the next US LLVM Developer
				meeting (http://llvm.org/devmtg/2016-11/).

				This proposal is divided into the following parts:

				* Outline of the reasons to move to Git and GitHub
				* Description of the options
				* What examples of some workflows will look like (compared to currently)
				* The proposed migration plan

				What This Proposal is Not About
				=================================

				Changing the development policy.

				This proposal relates only to moving the hosting of our source-code repository
				from SVN hosted on our own servers to Git hosted on GitHub. We are not proposing
				using GitHub's issue tracker, pull-requests, or code-review.

				Contributors will continue to earn commit access on demand under the Developer
				Policy, except that a GitHub account will be required instead of an SVN
				username/password-hash.
				probinson: contributors -> contributor 'in the same condition' -> 'under the same conditions'

				Why Git, and Why GitHub?
				========================

				Why Move At All?
				----------------

				This discussion began because we currently host our own Subversion server
				and Git mirror on a voluntary basis. The LLVM Foundation sponsors the server and
				provides limited support, but there is only so much it can do.

				Volunteers are not sysadmins themselves, but compiler engineers that happen
				to know a thing or two about hosting servers. We also don't have 24/7 support,
				and we sometimes wake up to see that continuous integration is broken because
				the SVN server is either down or unresponsive.

				We should take advantage of one of the services out there (GitHub, GitLab,
				BitBucket among others) that offer better service (24/7 stability, disk
				space, Git server, code browsing, forking facilities, etc.) for free.

				Why Git?
				--------

				Many new coders nowadays start with Git, and a lot of people have never used
				SVN, CVS, or anything else. Websites like GitHub have changed the landscape
				of open source contributions, reducing the cost of first contribution and
				Eugene.Zelenko: Please remove space before dot.
				beanz: The language here is also misleading. Maybe change to something like: > Many new coders nowadays start with Git, and a lot of people have never used SVN, CVS, or anything else.
				fostering collaboration.

				Git is also the version control system many LLVM developers use. Despite the
				sources being stored in an SVN server, these developers are already using Git
				through the Git-SVN integration.
				beanz: I would remove the "(most?)" bit here because it doesn't really add any value. We have no data to support an assertion of "most", and it could be misleading to suggest it.

				Git allows you to:

				* Commit, squash, merge, and fork locally without touching the remote server.
				* Maintain local branches, enabling multiple threads of development.
				* Collaborate on these branches (e.g. through your own fork of llvm on GitHub).
				* Inspect the repository history (blame, log, bisect) without Internet access.
				* Maintain remote forks and branches on Git hosting services and
				integrate back to the main repository.

				In addition, because Git seems to be replacing many OSS projects' version
				beanz: Can we also add this as a point: "Maintain remote forks and branches on Git hosting services and easily integrate back to the main repository." In particular for people that maintain out-of-tree code or forks, the ability to seamlessly merge between repositories is a big win for Git.
				control systems, there are many tools that are built over Git.
				Future tooling may support Git first (if not only).

				Why GitHub?
				-----------

				GitHub, like GitLab and BitBucket, provides free code hosting for open source
				projects. Any of these could replace the code-hosting infrastructure that we
				have today.

				These services also have a dedicated team to monitor, migrate, improve and
				distribute the contents of the repositories depending on region and load.

				GitHub has one important advantage over GitLab and
				BitBucket: it offers read-write SVN access to the repository
				(https://github.com/blog/626-announcing-svn-support).
				This would enable people to continue working post-migration as though our code
				were still canonically in an SVN repository.

				In addition, there are already multiple LLVM mirrors on GitHub, indicating that
				part of our community has already settled there.

				On Managing Revision Numbers with Git
				-------------------------------------

				The current SVN repository hosts all the LLVM sub-projects alongside each other.
				A single revision number (e.g. r123456) thus identifies a consistent version of
				all LLVM sub-projects.

				Git does not use a sequential integer revision number but instead uses a hash to
				identify each commit. (Linus mentioned that the lack of such a revision number
				is "the only real design mistake" in Git [TorvaldRevNum]_.)

				The loss of a sequential integer revision number has been a sticking point in
				past discussions about Git:

				- "The 'branch' I most care about is mainline, and losing the ability to say
				'fixed in r1234' (with some sort of monotonically increasing number) would
				be a tragic loss." [LattnerRevNum]_
				- "I like those results sorted by time and the chronology should be obvious, but
				timestamps are incredibly cumbersome and make it difficult to verify that a
				given checkout matches a given set of results." [TrickRevNum]_
				- "There is still the major regression with unreadable version numbers.
				Given the amount of Bugzilla traffic with 'Fixed in...', that's a
				non-trivial issue." [JSonnRevNum]_
				- "Sequential IDs are important for LNT and llvmlab bisection tool." [MatthewsRevNum]_

				However, Git can emulate this increasing revision number:
				`git rev-list --count <commit-hash>`. This identifier is unique only within a
				single branch, but this means the tuple `(num, branch-name)` uniquely identifies
				a commit.

				We can thus use this revision number to ensure that e.g. `clang -v` reports a
				user-friendly revision number (e.g. `master-12345` or `4.0-5321`), addressing
				the objections raised above with respect to this aspect of Git.
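The counting idea can be illustrated on a toy history. The sketch below is not part of the proposal: it models commits as a parent map (the names `c1`, `r1`, etc. are made-up placeholders) and counts the commits reachable from a tip, which is what `git rev-list --count <commit-hash>` reports. It also shows why the number alone is not unique across branches, and why the branch name has to be part of the identifier.

```python
from typing import Dict, List

# Hypothetical toy history: each commit maps to its list of parent commits.
# The names (c1, c2, ...) are placeholders, not real LLVM hashes.
History = Dict[str, List[str]]


def rev_count(history: History, commit: str) -> int:
    """Count the commits reachable from `commit`, including itself.

    This models what `git rev-list --count <commit-hash>` reports.
    """
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(history[current])
    return len(seen)


def friendly_revision(history: History, branch: str, tip: str) -> str:
    """Build a (branch-name, num) identifier such as `master-3`."""
    return f"{branch}-{rev_count(history, tip)}"


# Linear mainline c1 <- c2 <- c3, plus a release branch forked at c2.
history: History = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "r1": ["c2"],
}

print(friendly_revision(history, "master", "c3"))  # master-3
print(friendly_revision(history, "4.0", "r1"))     # 4.0-3
```

Both tips get the number 3 here, so only the pair of branch name and number distinguishes them.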

				What About Branches and Merges?
				-------------------------------
				beanz: Can we also add something about the more traditional Git approaches to this? Maybe something like: "Additionally, there are simple Git commands that can also be used to determine the order of commits. For example to answer the question is a bug fixed in <hash-a> fixed in a compiler built at <hash-b> can be answered with the command `git rev-list <hash-a>..<hash-b> --count`. If this prints a number greater than 0, the fix is contained in <hash-b>. Additionally if we were to use Git tags similarly to how we use SVN tags today you would be able to identify which releases contained a fix by running `git describe --contains <hash>`."

				mehdi_amini: I'm not against mentioning this somewhere, but the "traditional" Git approach of hashes does not address at all the concerns mentioned right above.
				In contrast to SVN, Git makes branching easy. Git's commit history is represented
				as a DAG, a departure from SVN's linear history.

				However, we propose that merge commits be disallowed in our canonical Git
				repository.

				Unfortunately, GitHub does not support server-side hooks to enforce such a
				policy. We must rely on the community to avoid pushing merge commits.

				GitHub offers a feature called `Status Checks`: a branch protected by
				jlebar: What's the resolution here?
				rengolin: This is a good question. If it works at all, two things can happen: (1) SVN reports rev 123, I commit, get rev 124. Git rev-count gets 124. (2) SVN reports rev 123, I commit, get rev 125 because someone committed at the same time and git sorted the other commit first. If 2 happens, I don't think it'll be a big deal, so, we should be fine, as long as the SVN bridge can work with the linearity enforcement of restricted branches.
				`status checks` requires commits to be whitelisted before the push can happen.
				mehdi_amini: Right now it is one or the other, I asked the github support to consider adding the option to have SVN commits bypass the status-check, they are considering it (no promises).
				We could supply a pre-push hook on the client side that would run and check the
				history, before whitelisting the commit being pushed [statuschecks]_.
				However, this solution would be somewhat fragile (how do you update a script
				installed on every developer machine?) and would prevent SVN access to the
				repository.
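The history check such a client-side hook would perform is small: reject any outgoing commit that has more than one parent. The following is only a sketch on a toy model with hypothetical commit names. A real hook would obtain the outgoing commits from Git itself (for instance, `git rev-list --merges` lists merge commits), and the whitelisting step through status checks is not modeled here.

```python
from typing import Dict, List

# Toy model: each commit maps to its list of parents. All names are
# hypothetical placeholders rather than real hashes.
History = Dict[str, List[str]]


def find_merge_commits(history: History, outgoing: List[str]) -> List[str]:
    """Return the outgoing commits that have more than one parent."""
    return [commit for commit in outgoing if len(history[commit]) > 1]


def push_allowed(history: History, outgoing: List[str]) -> bool:
    """A push is acceptable only if it contains no merge commit."""
    return not find_merge_commits(history, outgoing)


history: History = {
    "base": [],
    "fix": ["base"],
    "feature": ["base"],
    "merge": ["fix", "feature"],
}

print(push_allowed(history, ["fix"]))                      # True
print(push_allowed(history, ["fix", "feature", "merge"]))  # False
```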

				What About Commit Emails?
				-------------------------

				We will need a new bot to send emails for each commit. This proposal leaves the
				rengolin: No need, GitHub has email hooks: https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/
				mehdi_amini: This does not line up with "We'll keep the exact same email format".
				email format unchanged besides the commit URL.
				rengolin: IMO, we don't need to keep the same format, but that's a good point. Though, it would be better to outline the two options in one quick phrase than leave the other implied.

				Straw Man Migration Plan
				========================

				STEP #1 : Before The Move

				1. Update docs to mention the move, so people are aware of what is going on.
				2. Set up a read-only version of the GitHub project, mirroring our current SVN
				repository.
				3. Add the required bots to implement the commit emails, as well as the
				umbrella repository update (if the multirepo is selected) or the read-only
				Git views for the sub-projects (if the monorepo is selected).

				STEP #2 : Git Move

				4. Update the buildbots to pick up updates and commits from the GitHub
				repository. Not all bots have to migrate at this point, but it'll help
				provide infrastructure testing.
				5. Update Phabricator to pick up commits from the GitHub repository.
				6. LNT and llvmlab have to be updated: they rely on unique monotonically
				ioeric: I am wondering where we are in the process now. Specifically, when would we get to this step (2.5)? Phabricator is seeing frequent connection errors from svn server (might due to the increased number of svn connections after the recent phabricator upgrade): "svn: OPTIONS of 'http://llvm.org/svn-robots/llvm-project': could not connect to server (http://llvm.org)". This blocks syncing svn commits from time to time. I'd expect Github to be more stable.
				increasing integer across branch [MatthewsRevNum]_.
				jlebar: Maybe we should call them "single-subproject mirrors" instead of "read-only repositories".
				7. Instruct downstream integrators to pick up commits from the GitHub
				jlebar: would continue to be maintained
				repository.
				jlebar: I think we need to explain what this means, because this is critical for understanding the monorepo. Developers will continue to be able to use the existing single-subproject git repositories as they do today, with no changes to workflow beyond a one-time git-svn config change. Everything (git fetch, git svn dcommit, etc.) would continue to work identically to how it works today.
				jlebar: Missing period at end of sentence.
				rengolin: Full stop.
				8. Review and prepare an update for the LLVM documentation.

				jlebar: This segue does not make sense in context.
				probinson: immediates ... technicals -> immediate ... technical
				Until this point nothing has changed for developers; it will just
				boil down to a lot of work for buildbot and other infrastructure
				owners.

				Once all dependencies are cleared, and all problems have been solved:

				STEP #3: Write Access Move
				beanz: This is a completely subjective statement, and should not be present.

				mehdi_amini: Rewrote, but I suspect we'll need some other rounds. Suggestion welcome.
				9. Collect developers' GitHub account information, and add them to the project.
				10. Switch the SVN repository to read-only and allow pushes to the GitHub repository.
				11. Update the documentation.
				12. Mirror Git to SVN.

				kparzysz: and preserve the history.
				STEP #4 : Post Move
				jlebar: losing
				rengolin: Better to avoid "much more" for the reasons we have discussed before. Either say how it's worse, or don't compare.

				13. Archive the SVN repository.
				14. Update links on the LLVM website pointing to viewvc/klaus/phab etc. to
				beanz: With git history could be preserved even across repositories. Git subtree merges support this, and while it isn't as simple, it is a one-time cost.
				point to GitHub instead.
				mehdi_amini: Can you provide an example where the history of a single file contents can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that.

				beanz: Google is our friend -> http://stackoverflow.com/questions/1365541/how-to-move-files-from-one-git-repo-to-another-not-a-clone-preserving-history
				mehdi_amini: I don't see `git subtree` at work on this link, just `filter-branch` + `git mv` + merge. That flow tracks the history of a file, not its content AFAIK (i.e. if a function was moved from another file into the current one, the history of when/why this function added/modified won't be included). Also, what would be the effect of moving a file from a repo to another, and later back to the original repo?
				One or Multiple Repositories?
				beanz: As in my other comment, losing history is not an issue.
				=============================

				beanz: How does the mono-repo do this? It might make it easier, but since it is likely that even with a mono-repo most people won't build all projects I don't think it actually encourages updates across all sub-projects.
				mehdi_amini: I was thinking about the fact that if I change the API `createTargetMachineFromTriple()`, and `git grep` to find the uses, then all the uses in sub-projects will show up.
				beanz: That is 'making easier' not 'encouraging'. Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.
				mehdi_amini:
				> That is 'making easier' not 'encouraging'.
				"All the source is there by default" + "making it easier" => why I wrote "encouraging".
				> Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.
				Not sure why "enforcement" comes into play here?
				beanz:
				> "All the source is there by default"
				This is what makes it easier. Your math is double counting it. I disagree with your wording here. I've told you I disagree. You can continue to disregard my feedback or you can fix it. The choice is yours.
				mehdi_amini:
				>> "All the source is there by default"
				> This is what makes it easier.
				Sorry, but I mentioned earlier `git grep` and you answered `That is 'making easier'`. All the source presents by default is more than making it easier.
				> I disagree with your wording here. I've told you I disagree.
				I strongly disagree with your disagreement here.
				beanz: You asked for feedback. If you want to disregard it that is your decision.
				There are two major variants for how to structure our Git repository: the
				beanz: Actually, there were also concerns about the increased burden for contributors not just downstream users. In general I think this entire section is designed to point out supporting arguments for the mono-repo with no recognition of the merits of the multi-repo proposal.
				"multirepo" and the "monorepo".
				mehdi_amini: You're welcome to suggest merits of the multi-repo proposal to balance.

				beanz: I don't think that our proposals should be constructed as convoluted arguments between contributing authors. Adding pro multi-repo statements will only make this more difficult to grok. I actually think there is very little in this section that shouldn't be part of an "arguments/rebuttals" section.
				mehdi_amini: 1) There is nothing convoluted here. 2) Adding fact-based pro multi-repo statement will make it easier to understand. 3) I disagree, I think most of this section should stay here. So we'll have to go in the specifics, piece by piece.
				Multirepo Variant
				rengolin: I think this won't address the fears of people that don't know enough to not panic. This is why getting the technical parts correct and accurate is so important (and I confess I didn't do enough due diligence on my part of the text either).
				-----------------
				mehdi_amini: Can you clarify what you're referring to exactly and/or suggest some editing?

				This variant recommends moving each LLVM sub-project to a separate Git
				beanz: You still haven't addressed the feedback here. Saying the multi-repo would lose history is still inaccurate. For starters, you're not actually deleting the history from the repository you're moving code from. Also with a multi-repo you can easily preserve the file history by using git filter-branch. Using filter-branch will not follow history across renames that are outside the filter, but will follow them within the filter. For example if you were to use filter branch on lib/Support to break it out into its own repository, filter branch would preserve history of files under lib/Support that are renamed as long as they remain under libSupport. It would not preserve the history of a file being renamed and moved under libSupport. Even with that the history before that point is traceable because the history would still exist in the old repository, so you are not losing history, you just aren't moving it with the file.
				mehdi_amini: Fair enough: replaced "losing history" with "the history of the refactored code won't be available from the new place".
				beanz: In your example of moving clang-tools-extra there would be no need for loss of history at all. There is no need for filter-branch. You can literally reformat clang-tools-extra to be under tools/extra/ and merge the whole tree into the clang master branch. The only point where you would lose any history at all is if you were trimming one part of a repository into another repository, and even in that situation you can minimize the losses pretty well using filter-branch and index scripts. It is complicated but possible.
				mehdi_amini: So do you have anything concrete that could be added here, be practical (something we'd be willing to encourage in the future), be understandable by any dev, and not take > 20 lines to describe?
				beanz: You gave an example that is factually incorrect. I'm asking you to fix it. That is concrete. In my earlier comment I told you why your example was incorrect. You can remove the example, or come up with an alternative. That is your choice. What you cannot do, is use this factually inaccurate example.
				mehdi_amini: The current spelling (Friday, 3:51pm) is: "With the multirepo, moving clang-tools-extra into clang would be more complicated than a simple `git mv` command, and the history of the refactored code won't be available from the new place." I can change the example to: "Refactoring some functions from clang to make it a utility in one of the llvm/lib/Support file to share it across sub-projects wouldn't carry the history of the code in the llvm repo." That said, I asked you on 9/9 (over 3 weeks ago) "Can you provide an example where the history of a single file contents can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that." You haven't been able to provide me with this. So you can claim whatever you want about "factual innacuracy", you still failed to provide counter facts to support your claim.
				beanz:
				> "Can you provide an example where the history of a single file contents can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that."
				git-filter-branch can preserve the history of a single file. It does not follow renames, however if you know a file was renamed, you can use git-filter-branch's --tree-filter or --index-filter flags to perform more complicated slicing of the repository to preserve that history. If you're unfamiliar with the types of things you can do with filter branch, this article gives a good overview (https://devsector.wordpress.com/2014/10/05/advanced-git-branch-filtering/).
				repository. This mimics the existing official read-only Git repositories
				(e.g., http://llvm.org/git/compiler-rt.git), and creates new canonical
				repositories for each sub-project.

				This will allow the individual subprojects to remain distinct: a
				probinson: maintains the property
				developer interested only in compiler-rt can check out only this repository,
				build it, and work in isolation from the other subprojects.

				beanz: What about the concerns about active community members having this burden?
				mehdi_amini: Can you clarify what you're referring to exactly? (No regression compared to now I believe)
				beanz: Ah. I misread. I see what you are saying. This is fine.
				A key need is to be able to check out multiple projects (e.g. lldb+clang+llvm or
				clang+llvm+libcxx) at a specific revision.

				rengolin: This is a nit, please don't take it personal... When I read "things are more involved", I had a negative feeling that "it's complicated". Down there, when explaining the "involved" way of checking out a singular repo (compiler-rt), instead, you say "there are a number of options", and I had a positive feeling of "choice". Even though they mean the same thing, it felt different. I don't particularly mind either way, but to avoid backlash, I'd try to be consistent and use the same (preferably neutral) phrases for all cases.
				mehdi_amini: They don't mean the same thing. Here it is more complicated. The other one exposes multiple options with various tradeoff.
				A tuple of revisions (one entry per repository) accurately describes the state
				across the sub-projects.
				For example, a given version of clang would be
				<LLVM-12345, clang-5432, libcxx-123, etc.>.

				beanz: This is very slanted wording. From a user perspective the multi-repo solution to this problem is not much more complicate than the mono-repo solution.
				Umbrella Repository
				mehdi_amini: Please provide a replacement for this sentence.
				^^^^^^^^^^^^^^^^^^^
				mehdi_amini: (Tried to make it more explicit that complexity is handled by the infrastructure)

				To make this more convenient, a separate umbrella repository will be
				provided. This repository will be used for the sole purpose of understanding
				the sequence in which commits were pushed to the different repositories and to
				provide a single revision number.

				This umbrella repository will be read-only and continuously updated
				to record the above tuple. The proposed form to record this is to use Git
				[submodules]_, possibly along with a set of scripts to help check out a
				specific revision of the LLVM distribution.

				A regular LLVM developer does not need to interact with the umbrella repository
				-- the individual repositories can be checked out independently -- but you would
				need to use the umbrella repository to bisect multiple sub-projects at the same
				time, or to check out old revisions of llvm plus another sub-project at a
				consistent version.
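As a rough illustration of how an umbrella history could yield a consistent tuple of sub-project revisions, the sketch below models the umbrella repository as an ordered log of updates. It assumes one umbrella commit per sub-project commit, which is only one of the update policies discussed in this review, and the revision labels (`llvm-a`, `clang-b`, etc.) are invented for the example. An actual umbrella would record this state with Git submodules rather than a Python list.

```python
from typing import Dict, List, Tuple

# Each umbrella commit records which sub-project moved and to which revision.
# Project names are real; the revision labels are made up for illustration.
UmbrellaLog = List[Tuple[str, str]]


def state_at(log: UmbrellaLog, umbrella_rev: int) -> Dict[str, str]:
    """Return the tuple of sub-project revisions recorded at `umbrella_rev`.

    Umbrella revisions are 1-based, mirroring a sequential revision number.
    """
    if not 1 <= umbrella_rev <= len(log):
        raise ValueError(f"unknown umbrella revision {umbrella_rev}")
    state: Dict[str, str] = {}
    for project, revision in log[:umbrella_rev]:
        state[project] = revision  # later updates override earlier ones
    return state


log: UmbrellaLog = [
    ("llvm", "llvm-a"),
    ("clang", "clang-a"),
    ("llvm", "llvm-b"),
    ("libcxx", "libcxx-a"),
    ("clang", "clang-b"),
]

print(state_at(log, 3))  # {'llvm': 'llvm-b', 'clang': 'clang-a'}
```

Because every umbrella revision maps to exactly one such state, a bisection can walk umbrella revisions instead of juggling one revision per sub-project.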

				rengolinUnsubmitted Done Reply Inline Actions Can you add a link of the monorepo as well? I think you had one, right? rengolin: Can you add a link of the monorepo as well? I think you had one, right?
				beanzUnsubmitted Done Reply Inline Actions Remove "(with some granularity)". The multi-repo proposal can have the same 1:1 mapping of commits in per-project repos to umbrella commits that the mono-repo would have. When the update job runs with a list of more than one commit we can sort them by committer timestamp (which is updated after rebase). It will provide a roughly linear timeline for the commits to be sorted across the repositories. It won't be perfect, but it should be good enough for sorting commits in close proximity because the pushed commits will either be rebased (which updates the committer timestamp) or they will be merge commits which will have a committer timestamp generated when the merge commit was generated. beanz: Remove "(with some granularity)". The multi-repo proposal can have the same 1:1 mapping of…
				mehdi_aminiAuthorUnsubmitted Done Reply Inline Actions It seems to me that at the beginning the idea was that the submodules would be updated every few minutes, so that we'd be able to have rev-locked commits pushed to multiple projects at the same time and have them appear a single umbrella update (with somehow a heuristic like "update the submodules when there hasn't been a push for 2 min"). Apparently your idea is rather than we should update it with single commits, but what's the story for rev-locked? How would the tooling not have a race condition? Example: I commit to LLVM I commit to Clang the script runs, pull LLVM, no change I push to LLVM I push to Clang the script pulls Clang, see my commit the script is done with pulling and update the submodule with the clang change, before the LLVM change, even though the commit date would be reversed. I don't see a principled solution to implement the umbrella without server-side (i.e. native git hook) support. Sure you can craft it, and it'll work fine most of the time, but that does not make it bulletproof. mehdi_amini: It seems to me that at the beginning the idea was that the submodules would be updated every…
				beanz (Done): The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran. It will then sort them by committer timestamp order, and commit one at a time to the umbrella repo as submodule updates. We can set up the automation to run based on GitHub WebHooks, and periodically in case a WebHook gets dropped. There is no race condition that I see. If we need to support revlocked changes (and I'm not convinced this is the case since they are by far a minority of commits), we can support them via annotations on the commit messages. We can teach the automation to look for markers in the commit message denoting that it is revlocked to other changes, and we can have it group revlocked changes together. There is no need for server-side hooks, and this solution would work as well as any mirroring system. I don't believe there is any need for this solution to be bulletproof, but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.
				mehdi_amini (Author, Done):
				> The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran.
				Atomically?
				> There is no race condition that I see.
				Did you read my sequence 1-7 that describes an example of a race?
				> but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.
				Define "robust". The single-project mirrors have a well-defined deterministic algorithm to construct, and reconstruct them at will; you don't have one for the multi-repo. That's not "robust" to me.
				beanz (Done): I've updated my automation (https://github.com/llvm-beanz/llvm-submodules) to make one umbrella commit per commit to sub-project repository. This has a single commit granularity. That was the original point I was arguing. It works. It is done. Is it perfect? No. There are a number of situations where the order of the commits to the submodule can be impacted by the order and proximity of commits to the project repositories. That is irrelevant to the point I was making. I'm more than happy to debate with you about whether or not that matters, but that is a separate issue from what I was pointing out. Do we need to belabor this further, or will you update the document based on my feedback?
				mehdi_amini (Author, Done): You're moving goal posts. Your previous message said that there is no race, while now you're eluding it with "There are a number of situations where...". Also you're changing the definition of the multi-repo as I was foreseeing it. I think it is worse, and if we were to adopt the multi-repo proposal, I would be totally against this. Now, just to please you, because again I don't think it does any good to this proposal, I'll re-formulate making clear that:
				- updates in the multi-repo are single-commit based.
				- commits can be in different orders.
				- it does not handle cross-project commits.
				beanz (Not Done): From the beginning I said:
				> It won't be perfect, but it should be good enough for sorting commits in close proximity...
				If you want to debate that statement we can do so, but I would prefer not to in this thread.
				> Also you're changing the definition of the multi-repo as I was foreseeing it. I think it is worse, and if we were to adopt the multi-repo proposal, I would be totally against this.
				You don't get to dictate how the proposal in opposition to your preferred approach is written. I think you've been pretty clear about being against the multi-repo proposal, so I don't see how your opinion factors in to the final document, which shouldn't be opinion based.
				mehdi_amini (Author, Not Done):
				> You don't get to dictate how ...
				Sorry, you're mischaracterizing my position and what I wrote, I don't appreciate this.
				This umbrella repository will be updated automatically by a bot (running on
				notice from a webhook on every push, and periodically) on a per commit basis: a
				single commit in the umbrella repository would match a single commit in a
				subproject.
				beanz (Done): I would say 'continuously' rather than periodically here. You describe in more detail below how the notifications would be configured and 'periodically' isn't a full picture.
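As a rough illustration of the per-commit ordering discussed in this thread, the sketch below sorts a combined list of sub-project commits by committer timestamp before they would be replayed into the umbrella repository. The function name `order_commits` and the three-column input format are assumptions made for this example; they are not taken from the prototype bot.

```shell
# Hypothetical helper: reads "committer_timestamp project hash" lines on
# stdin and prints them oldest first; equal timestamps keep input order.
order_commits() {
    sort -s -n -k1,1
}

# Gathering the input for one sub-project could look like this
# (LAST_SEEN is an assumed variable holding the last processed commit):
#   git -C llvm log --reverse --format='%ct llvm %H' "$LAST_SEEN"..origin/master
```

Because Git records no push timestamp, this ordering only reflects when commits were created locally, which is why commits pushed around the same time can still end up in a different order than they were pushed.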

				Living Downstream
				^^^^^^^^^^^^^^^^^

				Downstream SVN users can use the read/write SVN bridges with the following
				caveats:

				* Be prepared for a one-time change to the upstream revision numbers.
				* The upstream sub-project revision numbers will no longer be in sync.

				Downstream Git users can continue without any major changes, with the minor
				change of upstreaming using `git push` instead of `git svn dcommit`.

				Git users also have the option of adopting an umbrella repository downstream.
				The tooling for the upstream umbrella can easily be reused for downstream needs,
				beanz (Done): s/interacts/would interact/
				incorporating extra sub-projects and branching in parallel with sub-project
				branches.

				Multirepo Preview
				^^^^^^^^^^^^^^^^^

				As a preview (disclaimer: this is a rough prototype, not polished and not
				representative of the final solution), you can look at the following:
				rengolin (Done): Better to keep the same order svn/git. And don't need to specify that it's a bridge, since you mention above.
				mehdi_amini (Author, Done): The "canonical way" changes. The bridge isn't mentioned in this section, I'd rather have it clear (and it doesn't hurt).

				* Repository: https://github.com/llvm-beanz/llvm-submodules
				* Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/

				Concerns
				^^^^^^^^

				* Because GitHub does not allow server-side hooks, and because there is no
				"push timestamp" in Git, the umbrella repository sequence isn't totally
				exact: commits from different repositories pushed around the same time can
				appear in different orders. However, we don't expect it to be the common case
				or to cause serious issues in practice.
				* You can't have a single cross-project commit that would update both LLVM and
				other subprojects (something that can be achieved now). It would be possible
				to establish a protocol whereby users add a special token to their commit
				messages that causes the umbrella repo's updater bot to group all of them
				into a single revision.
				* Another option is to group commits that were pushed closely enough together
				in the umbrella repository. This has the advantages of allowing cross-project
				commits, and is less sensitive to mis-ordering commits. However, this has the
				rengolin (Done): Can you commit via git directly?
				mehdi_amini (Author, Done): How do you use git svn?
				potential to group unrelated commits together, especially if the bot goes
				rengolin (Done): D'oh, "with the sequence...". Ignore me.
				mehdi_amini (Author, Done): I added "with the sequence" following your comment to make it more clear.
				down and needs to catch up.
				* This variant relies on heavier tooling. But the current prototype shows that
				it is not out-of-reach.
				* Submodules don't have a good reputation and complicate the command line.
				However, in the proposed setup, a regular developer will seldom interact with
				submodules directly, and certainly never update them.
				* Refactoring across projects is not friendly: taking some functions from clang
				to make it part of a utility in libSupport wouldn't carry the history of the
				code in the llvm repo, preventing recursively applying `git blame` for
				instance.
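
The commit-message token protocol mentioned in the list above could be sketched as follows. This is only an illustration under stated assumptions: the four-column input format, the `-` placeholder for commits without a token, and the function name `group_revlocked` are all hypothetical, and placing a group at the position of its first member is a simplification.

```shell
# Hypothetical helper. Input: "timestamp project hash token" lines, already
# sorted by timestamp; token is "-" when the commit message has no marker.
# Output: one line per umbrella revision, listing "project@hash" entries.
group_revlocked() {
    awk '
        {
            entry = $2 "@" $3
            if ($4 != "-" && ($4 in slot)) {   # later member of a known group
                rev[slot[$4]] = rev[slot[$4]] " " entry
                next
            }
            rev[++n] = entry                   # start a new umbrella revision
            if ($4 != "-") slot[$4] = n
        }
        END { for (i = 1; i <= n; i++) print rev[i] }
    '
}
```

Commits sharing a token collapse into one umbrella revision, while every other commit keeps its own revision.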


				Workflows
				^^^^^^^^^

				* :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
				* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-multicheckout-nocommit>`.
				* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-multicheckout-multicommit>`.
				* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
				* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-multi-branching>`.
				* :ref:`Bisecting <workflow-multi-bisecting>`.

				Monorepo Variant
				----------------

				rengolin (Done): Is this a flat tree (like today) or the checked-out tree (tools/clang, etc)?
				This variant recommends moving all LLVM sub-projects to a single Git repository,
				similar to https://github.com/llvm-project/llvm-project.
				This would mimic an export of the current SVN repository, with each sub-project
				having its own top-level directory.
				Not all sub-projects are used for building toolchains. In practice, www/
				and test-suite/ will probably stay out of the monorepo.

				Putting all sub-projects in a single checkout makes cross-project refactoring
				naturally simple:

				* New sub-projects can be trivially split out for better reuse and/or layering
				(e.g., to allow libSupport and/or LIT to be used by runtimes without adding a
				beanz (Done): You've lost me here. Checking out all the projects in SVN today involves multiple svn co commands. Unless there is some magic in SVN I'm unaware of. If there is such magic we should document it somewhere on LLVM.org (maybe on the getting started page?) and link to it here.
				mehdi_amini (Author, Not Done): I was referring to:
				    svn co http://llvm.org/svn/llvm-project/ --depth=immediates
				    cd llvm-project/
				    svn up llvm/trunk clang/trunk libcxx/trunk
				You can then have a build with only LLVM configured like:
				    mkdir ../build-llvm && cd ../build-llvm
				    cmake ../llvm-project/llvm/trunk
				And a build dir with llvm+clang:
				    mkdir ../build-clang && cd ../build-clang
				    cmake ../llvm-project/llvm/trunk -DLLVM_EXTERNAL_CLANG_DIR=../llvm-project/clang/trunk/
				So that a single `svn up $projects` in the source directory updates all the sources and you can still build a subset of the projects from these sources. This is also how I'd synchronize if I was integrating downstream from SVN.
				beanz (Done): I can't imagine that is a common workflow. It certainly isn't the documented recommended workflow on llvm.org, so I'm not sure there is value in bringing it into the discussion.
				dependency on LLVM).
				* Changing an API in LLVM and upgrading the sub-projects will always be done in
				a single commit, designing away a common source of temporary build breakage.
				* Moving code across sub-projects (during refactoring for instance) in a single
				commit enables accurate `git blame` when tracking code change history.
				* Tooling based on `git grep` works natively across sub-projects, making it
				easier to find refactoring opportunities across projects (for example reusing a
				data structure initially in LLDB by moving it into libSupport).
				* Having all the sources present encourages maintaining the other
				sub-projects when changing an API.

				Finally, the monorepo maintains the property of the existing SVN repository that
				the sub-projects move synchronously, and a single revision number (or commit
				hash) identifies the state of the development across all projects.

				rengolin (Done): successful?
				probinson (Done): successed -> successful
				Building a single sub-project
				beanz (Done): Can you please add actual size numbers for each project and the mono-repo? Just saying '2x' isn't super meaningful without knowing the size of 1x.
				beanz (Done): Can you add per-project sizes?
				mehdi_amini (Author, Done): That'd make a long list, how should it be presented?
				beanz (Done): However you think it is best presented. A table would seem fitting. You could put it below and have a link down to it. I think that if you're bringing size into the discussion you need to provide sufficient data.
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Nobody will be forced to build unnecessary projects. The exact structure
				is TBD, but making it trivial to configure builds for a single sub-project
				(or a subset of sub-projects) is a hard requirement.
				kparzysz (Done): Even with sparse checkout? Am I going to see new files in projects that were not originally included in the sparse checkout?
				jlebar (Done): What do you mean by "see"? In order to push a commit without `-f`, the commit's parent commit must be the current remote head. The commits in git are unaffected by sparse checkout. So, if you have a commit you want to push, you will need to rebase it atop current remote HEAD -- you'll have to do this rebase even if you're using sparse checkouts and all of the changes between your current base revision and current remote HEAD are to subprojects that you don't have checked out. If you don't like this, you can continue to use the single-subproject mirrors exactly as you currently do (with git-svn and everything), by changing the configs as explained elsewhere in this document. But I've been using a monorepo (http://github.com/llvm-project/llvm-project) for months now. I've pushed maybe 30 commits using my custom script (https://github.com/jlebar/llvm-repo-tools) and this necessity to rebase hasn't once been an annoyance for me.
				kparzysz (Done):
				> What do you mean by "see"?
				I'm referring to this (and the rest of this paragraph): "However when you fetch you'll likely pull in changes to sub-projects you don't care about." The intent wasn't clear---I wasn't aware of the requirement about parent commit (I use SVN for upstreaming changes). But that brings another question: what is the anticipated frequency of commits to the monorepo? My concern is that the "rebuild and retest" approach may take long enough to require another rebase...
				mehdi_amini (Author, Done): When you commit to SVN, you add a "patch" on top of the existing codebase. Unless there is a conflict your patch will be committed. It does not mean it will build, since someone else may just have changed an API you're using in your patch. The new monorepo won't be different from SVN on this aspect: you have the same frequency of commits, and you can run `git pull && git push` which is roughly equivalent to `git svn dcommit` today. The thing is that between the `git pull` and the `git push`, you can also inspect what changed since your last build/check, and decide if you need to rebuild or not.
				mehdi_amini (Author, Done): Maybe we should just remove all this paragraph, it is confusing...
				jlebar (Done): I really think you want this paragraph, btw. This is a very common question -- it's been asked many times before. "I don't want the monorepo because it will mean I have to rebuild/retest a lot more than I do today." False, but we need to explain why.
				mehdi_amini (Author, Not Done): OK, I'll try to rephrase it then. The main point is that `git pull && git push` is not different from today's SVN.
				jlebar (Done):
				> My concern is that the "rebuild and retest" approach may take long enough to require another rebase...
				This isn't a function of the monorepo. You choose when to rebuild/retest, and that's orthogonal to the repository structure. If "rebuild/retest only when there were changes to files I changed" is what you want to do, you can still do that. You can ask that question of git before pushing. Or you could ask "have any of the projects I care about changed?" Or you could ask a different question. And you could ask those questions of the monorepo, or the multirepo (although it might be a bit more work in the multirepo -- I say "might" so beanz doesn't jump on me). In this sense it's safer than SVN, which assumes that you only care about retesting if there were modifications to files you also changed.
				mehdi_amini (Author, Done):
				> Even with sparse checkout? Am I going to see new files in projects that were not originally included in the sparse checkout?
				If you mean are you seeing them when typing `ls` in your terminal, then no you don't. I can add "unless you're using a sparse checkout" to make it more clear.

				As an example, it could look like the following::

				mkdir build && cd build
				# Configure only LLVM (default)
				cmake path/to/monorepo
				# Configure LLVM and lld
				cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lld
				# Configure Error: lldb project requires also clang
				cmake path/to/monorepo -DLLVM_ENABLE_PROJECTS=lldb

				kparzysz (Done): A conflicting change would have to affect the same file. This is regardless of whether it's monorepo or multirepo. Am I missing something here? Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.
				jlebar (Done):
				> Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.
				That is what git-svn will do, yes. But that's not pure git's behavior.
				mehdi_amini (Author, Done):
				> A conflicting change would have to affect the same file. This is regardless of whether it's monorepo or multirepo. Am I missing something here?
				The point was that when you run `git pull --rebase`, you have new changes, and even without an explicit "diff conflict" the changes that you're about to push may use an API that has changed upstream. Note that today this is not addressed: SVN will blindly accept the push and break the build.
				> Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.
				As Justin mentions, this is not true with `git push` AFAIK. You have to `pull` (merge or rebase) before being able to push.
				.. _git-svn-mirror:

				Read/write sub-project mirrors
				------------------------------

				With the Monorepo, the existing single-subproject mirrors (e.g.
				http://llvm.org/git/compiler-rt.git) with git-svn read-write access would
				continue to be maintained: developers would continue to be able to use the
				existing single-subproject git repositories as they do today, with *no changes
				to workflow*. Everything (git fetch, git svn dcommit, etc.) could continue to
				work identically to how it works today. The monorepo can be set up such that the
				SVN revision number matches the SVN revision in the GitHub SVN-bridge.

				Living Downstream
				^^^^^^^^^^^^^^^^^

				Downstream SVN users can use the read/write SVN bridge. The SVN revision
				number can be preserved in the monorepo, minimizing the impact.

				Downstream Git users can continue without any major changes, by using the
				git-svn mirrors on top of the SVN bridge.
				beanz (Done): the emphasis on 'exactly as we do today' is unnecessary.

				Git users can also work upstream with monorepo even if their downstream
				fork has split repositories. They can apply patches in the appropriate
				subdirectories of the monorepo using, e.g., `git am --directory=...`, or
				plain `diff` and `patch`.
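
As a minimal sketch of the `git am --directory=...` approach mentioned above, the helper below exports the latest commit from a split sub-project clone and replays it inside the matching sub-directory of a monorepo clone. The function name `apply_to_monorepo` and its argument order are assumptions for illustration only.

```shell
# Hypothetical helper: replay the most recent commit of a split sub-project
# clone into a sub-directory of a monorepo clone.
# Arguments: <split-repo> <monorepo> <subdir>
apply_to_monorepo() {
    git -C "$1" format-patch -1 --stdout |
        git -C "$2" am --directory="$3"
}
```

For example, `apply_to_monorepo ~/src/clang ~/src/llvm-project clang` would carry one downstream clang commit over, keeping its author and message; both paths here are placeholders.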

				Alternatively, Git users can migrate their own fork to the monorepo. As a
				demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:

				* Using a script that rewrites history (including merges) so that it looks
				like the fork always lived in the monorepo [LebarCHERI]_. The upside of
				this is when you check out an old revision, you get a copy of all llvm
				sub-projects at a consistent revision. (For instance, if it's a clang
				fork, when you check out an old revision you'll get a consistent version
				of llvm proper.) The downside is that this changes the fork's commit
				hashes.

				* Merging the fork into the monorepo [AminiCHERI]_. This preserves the
				fork's commit hashes, but when you check out an old commit you only get
				the one sub-project.

				Monorepo Preview
				^^^^^^^^^^^^^^^^^

				As a preview (disclaimer: this is a rough prototype, not polished and not
				representative of the final solution), you can look at the following:

				* Full Repository: https://github.com/joker-eph/llvm-project
				* Single subproject view with SVN write access to the full repo:
				https://github.com/joker-eph/compiler-rt

				Concerns
				^^^^^^^^

				beanz (Done): Nit: "enters the dance" implies complexity.
				* Some concerns have been raised that having a single repository would be an
				overhead for those that have interest in only a single repository. This is
				rengolin (Done): Maybe mention --recursive, too?
				mehdi_amini (Author, Done): What's the point?
				addressed by keeping the single-subproject Git mirrors for each project just
				rengolin (Done): "Update" would take one per project and is more cumbersome when you don't know beforehand which or how many projects you'll build (we have that problem). Conceptually the same, but recursive gives a better "impression" of simplicity. It's about the bias issue that Chris was talking about, even if totally unintended.
				mehdi_amini (Author, Done): I'm not sure I follow: AFAIK recursive is for nested submodules, which is not part of the proposal. So to be clear I expect `--recursive` to be a no-op. I can be wrong, but I'll need some more explanation if I missed something obvious here. If your point is about cloning all the sub-projects and not only just a selected list, then `--recursive` is not the right option, just doing `git submodule update` without any other flag will do it. I'll spell it out.
				mehdi_amini (Author, Done): I added a comment mentioning that the list is optional. Let me know if I misunderstood something about --recursive above.
				as we do today. For contributors who need read/write access, the
				:ref:`GitHub SVN bridge <git-svn-mirror>` allows contributing to a single
				sub-project the same way as today.
				* Preservation of the existing read/write SVN-based workflows relies on the
				GitHub SVN bridge, which is an extra dependency. Maintaining this locks us
				into GitHub and could restrict future workflow changes.

				Workflows
				^^^^^^^^^

				* :ref:`Checkout/Clone a Single Project, without Commit Access <workflow-checkout-commit>`.
				* :ref:`Checkout/Clone a Single Project, with Commit Access <workflow-monocheckout-nocommit>`.
				* :ref:`Checkout/Clone Multiple Projects, with Commit Access <workflow-monocheckout-multicommit>`.
				* :ref:`Commit an API Change in LLVM and Update the Sub-projects <workflow-cross-repo-commit>`.
				* :ref:`Branching/Stashing/Updating for Local Development or Experiments <workflow-mono-branching>`.
				* :ref:`Bisecting <workflow-mono-bisecting>`.

				Multi/Mono Hybrid Variant
				-------------------------

				This variant recommends moving only the LLVM sub-projects that are rev-locked
				beanz (Done): Alternatively since our intention is to enforce a linear history in the repositories doing a checkout by timestamp using the format below should also work in the majority of cases.
				    git checkout 'master@{...}'
				mehdi_amini (Author, Done): This applies to both proposals right? Where do you want me to add this?
				beanz (Done): I think it is worth noting under the multi-repo proposal something along the lines of:
				> Because we will be maintaining a linear history you can perform a timestamp based checkout of each project repository with the following command:
				>     git checkout 'master@{...}'
				> Additionally you can use the umbrella repository...
				If you want to also add the timestamp checkout to the mono-repo proposal, that makes sense too. I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.
				mehdi_amini (Author, Done): Are you sure that this command does what you think it does? If I read the doc correctly, it is looking at your reflog, not the history. The right one should be something like:
				    git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
				> I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.
				OK that wasn't clear to me the first time.
				beanz (Done): You are correct, you need to use `rev-list` to get the commit hash.
				to LLVM into a monorepo (clang, lld, lldb, ...), following the multirepo
				proposal for the rest. While neither variant recommends combining sub-projects
				like www/ and test-suite/ (which are completely standalone), this goes further
				and keeps sub-projects like libcxx and compiler-rt in their own distinct
				repositories.

				Concerns
				beanz (Not Done): Not sure I agree this is easy for svn users. To my knowledge llvm.org doesn't even document how to checkout the SVN repositories in a way to make this possible.
				^^^^^^^^
				mehdi_amini (Author, Not Done): Do you have an alternative to suggest?

				probinson (Done): single repository -> monorepo
				* This has most of the disadvantages of multirepo and monorepo, without bringing
				many of the advantages.
				* Downstream users have to upgrade to the monorepo structure, but only
				partially, so they will still need the infrastructure to integrate the other
				separate sub-projects.
				* All projects that use LIT for testing are effectively rev-locked to LLVM.
				Furthermore, some runtimes (like compiler-rt) are rev-locked with Clang.
				beanz (Done): Again, I don't follow how this is easy. There is no documentation on LLVM.org explaining how to do this and my limited knowledge of SVN leaves me with no idea how to do it.
				mehdi_amini (Author, Done): (Copy/pasted commands above)
				mehdi_amini (Author, Done): Copy/pasted above (I'm not sure I really want to document it on llvm.org now).
				beanz (Done): Fine if you don't want to document it, but I certainly would not describe that as "easy". Especially because if you ever mix up and type "svn up" in the root it starts updating everything. I think this is an incredibly fragile workflow, which is probably why it is also incredibly uncommon.
				It's not clear where to draw the lines.


				Workflow Before/After
				=====================
				beanz (Done): I would phrase as "It would be possible...", because it most certainly is possible.

				This section goes through a few examples of workflows, intended to illustrate
				how end-users or developers would interact with the repository for
				various use-cases.

				beanz (Done): Please remove "and makes this use case ...", it is a value judgement.
				mehdi_amini (Author, Done): I don't believe so, but if you insist...
				.. _workflow-checkout-commit:

				Checkout/Clone a Single Project, without Commit Access
				------------------------------------------------------

				Except for the URL, nothing changes. The possibilities today are::

				svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
				# or with Git
				git clone http://llvm.org/git/llvm.git

				After the move to GitHub, you would do either::

				git clone https://github.com/llvm-project/llvm.git
				# or using the GitHub svn native bridge
				svn co https://github.com/llvm-project/llvm/trunk

				probinson (Done): These checkouts should not have -b on them.
				The above works for both the monorepo and the multirepo, as we'll maintain the
				existing read-only views of the individual sub-projects.

				Checkout/Clone a Single Project, with Commit Access
				---------------------------------------------------

				Currently
				::

				# direct SVN checkout
				svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
				# or using the read-only Git view, with git-svn
				git clone http://llvm.org/git/llvm.git
				cd llvm
				git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				git svn rebase -l # -l avoids fetching ahead of the git mirror.

				Commits are performed using `svn commit` or with the sequence `git commit` and
				`git svn dcommit`.

				.. _workflow-multicheckout-nocommit:

				Multirepo Variant

				With the multirepo variant, nothing changes but the URL, and commits can be
				performed using `svn commit` or `git commit` and `git push`::

				git clone https://github.com/llvm/llvm.git llvm
				# or using the GitHub svn native bridge
				beanz: Additionally users of the umbrella repo can use `git submodule foreach` to have single command workflows that nearly match the mono-repo proposal.
				svn co https://github.com/llvm/llvm/trunk/ llvm
				probinson: SVN bisection is not built-in but it is easy to do manually (or scripted) because you can do `svn update -r $REVISION` to an arbitrary revision. Because revisions are integers, do `(BAD - GOOD)/2` to pick the next revision. So, it is not materially harder than bisecting on the multirepo.
				mehdi_amini: Thanks Paul, I tried to clarify, can you double-check?

				probinson: Very nicely succinct. One typo: scripts -> script
				.. _workflow-monocheckout-nocommit:

				Monorepo Variant

				With the monorepo, there are multiple possibilities to achieve this. First,
				you could just clone the full repository::

				git clone https://github.com/llvm/llvm-projects.git llvm
				# or using the GitHub svn native bridge
				svn co https://github.com/llvm/llvm-projects/trunk/ llvm

				At this point you have every sub-project (llvm, clang, lld, lldb, ...), which
				doesn't imply you have to build all of them. You can still build only
				compiler-rt for instance. In this way it's not different from someone who would
				check out all the projects with SVN today.

				You can commit as normal using `git commit` and `git push` or `svn commit`, and
				read the history for a single project (`git log libcxx` for example).

				There are a few options to avoid checking out all the sources.

				First, you could hide the other directories using a Git sparse checkout::

				git config core.sparseCheckout true
				echo /compiler-rt > .git/info/sparse-checkout
				git read-tree -mu HEAD

				The data for all sub-projects is still in your `.git` directory, but in your
				checkout, you only see `compiler-rt`.
				Before you push, you'll need to fetch and rebase (`git pull --rebase`) as
				usual.

				probinson: sub-projects -> sub-project
				Note that when you fetch you'll likely pull in changes to sub-projects you don't
				care about. If you are using sparse checkout, the files from other projects
				won't appear on your disk. The only effect is that your commit hash changes.

				You can check whether the changes in the last fetch are relevant to your commit
				by running::

				git log origin/master@{1}..origin/master -- libcxx
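
				The check above can be wrapped in a small helper. The following is only a sketch: the function name `relevant_changes` and the throwaway repositories are hypothetical, and it is not the `git llvmpush` script itself, whose contents the proposal does not specify. It shows that the range `origin/master@{1}..origin/master` lists exactly the commits brought in by the latest fetch, filtered by path.

```shell
# Print the commits brought in by the most recent fetch that touch path $2
# in repository $1. Empty output means nothing relevant changed.
relevant_changes() {
    git -C "$1" log --oneline 'origin/master@{1}..origin/master' -- "$2"
}

# --- Demo on throwaway repositories (all names are hypothetical) ---
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/master
git -C "$tmp/upstream" config user.name "Demo"
git -C "$tmp/upstream" config user.email "demo@example.com"
mkdir "$tmp/upstream/llvm" "$tmp/upstream/libcxx"
echo 1 > "$tmp/upstream/llvm/a.txt"
git -C "$tmp/upstream" add -A
git -C "$tmp/upstream" commit -q -m "Initial llvm commit"
git clone -q "$tmp/upstream" "$tmp/local"

# A first upstream change followed by a fetch ...
echo 2 > "$tmp/upstream/llvm/a.txt"
git -C "$tmp/upstream" commit -q -am "Change llvm"
git -C "$tmp/local" fetch -q origin

# ... then a second change touching only libcxx, and another fetch.
echo 1 > "$tmp/upstream/libcxx/b.txt"
git -C "$tmp/upstream" add -A
git -C "$tmp/upstream" commit -q -m "Change libcxx"
git -C "$tmp/local" fetch -q origin

relevant_changes "$tmp/local" libcxx   # lists the "Change libcxx" commit
relevant_changes "$tmp/local" llvm     # prints nothing
```

				A push wrapper could retry automatically when the helper prints nothing for the sub-projects touched by the local commits, and stop with the listed commits otherwise.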

				This command can be hidden in a script so that `git llvmpush` would perform all
				these steps, fail only if such a dependent change exists, and show immediately
				the change that prevented the push. An immediate repeat of the command would
				(almost) certainly result in a successful push.
				Note that today with SVN or git-svn, this step is not possible since the
				"rebase" implicitly happens while committing (unless a conflict occurs).

				A second option is to use svn via the GitHub svn native bridge::

				beanz: This is inaccurate. Even though my rough prototype of the git umbrella repo doesn't have each submodule update being a single commit that was the stated plan for how the umbrella would be updated. That means each umbrella repo commit would represent a single commit to a single subproject, so your bisection granularity is comparable.
				mehdi_amini: (I'm waiting for the story to support this above)
				beanz: See above.
				svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...
				mehdi_amini: If you have a way to guarantee it, I'm willing to hear about it. Right now, I don't believe it is possible without implementing it on the git hosting itself.

				beanz: You can absolutely guarantee the same granularity. You can't guarantee the same ordering, but generally speaking that is significantly less important than granularity. To get the same granularity you allow the script that updates submodules to produce more than one commit to the submodule repo at a time. If there are multiple you can sort them by committer date. While committer date isn't a great thing to use since our proposals both depend on maintaining a linear history it should be good enough for the common cases because committer date gets reset on rebase.
				mehdi_amini: > but generally speaking that is significantly less important than granularity.
				No sorry, I can't agree, this is critical: correctness goes before usability. It seems to me that you're willing to trade correctness to bring a guarantee of usability here. I'm willing to believe that "in practice" the granularity should be small enough, it just has to be worded carefully. Right now it is a parenthesis at the end: `(it is possible that one commit in the umbrella repository includes multiple commits in the sub-projects)` , we can reword this `(it is possible that one commit in the umbrella repository includes multiple commits in the sub-projects, though it should be occasional in practice)` (One may bikeshed on what exactly "occasional" is though, but we don't have any data to bikeshed efficiently anyway).
				This checks out only compiler-rt and provides commit access using "svn commit",
				rengolin: This last phrase is odd... It's not clear what "this" is, but I think you mean "a single repo in build structure". In a multi-repo, people will continue to checkout independent projects and commit directly to them, there's no difference for them.
				probinson: Wouldn't the multirepo still have SVN views on each subproject? Seems like the SVN views would basically be the same for either multirepo or monorepo.
				mehdi_amini: Tried to clarify: the multirepo breaks the cross-project synchronization with SVN.
				in the same way as it would do today.
				rengolin: Looks better, thanks!

				Finally, you could use git-svn and one of the sub-project mirrors::

				# Clone from the single read-only Git repo
				git clone http://llvm.org/git/llvm.git
				cd llvm
				# Configure the SVN remote and initialize the svn metadata
				beanz: Better to say "both proposals will allow you to continue to use SVN". The wording here makes it seem like only the mono-repo has GitHub's SVN support, even though that is later contradicted.
				git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
				mehdi_amini: I did a minor rewording (we're on a different support level here between the two solutions, which need to be conveyed somehow).
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				git svn rebase -l
				rengolin: "CHERI"
				beanz: If we go with the multi-repo approach we can ensure that each umbrella repo commit will be only one submodule update. This is relatively straight forward tooling to add. The only situation where we could potentially allow multiple updates in a single umbrella commit would be if we wanted to do cross-repository correlating of revlocked changes.
				mehdi_amini: (I'm waiting for the story to support this above)
				beanz: Again, above.

				In this case the repository contains only a single sub-project, and commits can
				be made using `git svn dcommit`, again exactly as we do today.

				Checkout/Clone Multiple Projects, with Commit Access
				beanz: The granularity is not finer.
				mehdi_amini: (I'm waiting for the story to support this above)
				----------------------------------------------------

				Let's look at how to assemble llvm+clang+libcxx at a given revision.

				Currently
				::

				svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm -r $REVISION
				cd llvm/tools
				svn co http://llvm.org/svn/llvm-project/clang/trunk clang -r $REVISION
				rengolin: ... or you can apply directly on the multi-repo solution. It's good to repeat to make clear that both solutions are covered.
				cd ../projects
				svn co http://llvm.org/svn/llvm-project/libcxx/trunk libcxx -r $REVISION
				beanz: Better to say both proposals allow you to continue using SVN the same way, but that each solution will have minor impacts. In the monorepo there will be a one-time change in revision numbers, and in the multi-repo each project will have its own revision numbers out of sync from each other.
				mehdi_amini: "The same way" implies "a single SVN revision number to me". One could even say "a single SVN checkout" (cf the command I copy/pasted above). I don't see how it'd work with the multi-repo? How would someone downstream integrating from SVN be able to correlate revision across repositories?
				beanz: Maybe rather than "the same way" "with similar workflows to today"?
				mehdi_amini: I'm still missing what would be similar for someone integrating multiple projects from SVN today (assuming such downstream integrator exists) with the multi-repo?
				beanz: I strongly suspect that very few users are using a single SVN checkout that contains more than one sub-project. If you discount that workflow, the workflow for interfacing using the GitHub SVN bridge is very similar whether you are using one repo or many. Additionally, with the mono repo the combined SVN workflow is actually a lot better than with SVN today. It is way less fragile since you aren't doing sub-directory checkouts. This means you don't run the risk of inadvertently running `svn up` and pulling down way more than you wanted.
				mehdi_amini: > If you discount that workflow, the workflow for interfacing using the GitHub SVN bridge is very similar whether you are using one repo or many.
				"Very similar" is subjective, to me it can't be similar as long as there is no longer a single revision number.
				> Additionally, with the mono repo the combined SVN workflow is actually a lot better than with SVN today. It is way less fragile since you aren't doing sub-directory checkouts. This means you don't run the risk of inadvertently running svn up and pulling down way more than you wanted.
				I don't understand what you mean here.
				beanz: Saying the workflows is "similar" is not a subjective wording.
				Today someone who writes: `svn co svn co http://llvm.org/svn/llvm-project/llvm/trunk`
				Under the mono-repo could write something like: `svn co http://github.com/llvm/llvm-project/master/llvm`
				Under the multi-repo could write something like: `svn co http://github.com/llvm/llvm/master/`
				The workflow of `svn co` -> `svn add` -> `svn commit` is similar in all cases.

				Or using git-svn::

				git clone http://llvm.org/git/llvm.git
				cd llvm/
				git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
				rengolin: I'd add a small paragraph explaining the problems that will come from having "two worlds", neither here, nor there. If it's too complicated, than lets not even propose that, as it'll end up as a third proposal.
				probinson: repository -> repositories.
				beanz: s/any of the proposal/both of the proposals/
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				git svn rebase -l
				git checkout `git svn find-rev -B r258109`
				cd tools
				beanz: Reword from the second sentence on. You're making a value assessment. A better phrasing might be: If your fork touches multiple LLVM projects, migrating your fork into the mono repo would enable you to make commits that touch multiple projects at the same time the same way LLVM contributors would be able to do so.
				git clone http://llvm.org/git/clang.git
				cd clang/
				git svn init https://llvm.org/svn/llvm-project/clang/trunk --username=<username>
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				git svn rebase -l
				rengolin: Can they checkout the read-only libc++ and commit without checking out the entire monorepo?
				mehdi_amini: This is covered in the section `Checkout/Clone a Single Project, with Commit Access` (or I don't understand the question)
				git checkout `git svn find-rev -B r258109`
				rengolin: That work flow example shows a changed flow for commits, so the statement that "their workflow is unchanged" is not accurate. The parentheses comment helps, but doesn't address the issue completely. A better way would be "the workflow is as described in [link] pointing above.
				mehdi_amini: I'm sorry I don't follow. You mention a changed in the flow for commit. Here is what's mentioned in the section I referred to, can you clarify where is the inaccuracy?
				Workflow today:
				# direct SVN checkout
				svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
				# or using the read-only Git view, with git-svn
				git clone http://llvm.org/git/llvm.git
				cd llvm
				git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				git svn rebase -l # -l avoids fetching ahead of the git mirror.
				Workflow after (copy/paste):
				A second option is to use svn via the GitHub svn native bridge::
				svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...
				This checks out only compiler-rt and provides commit access using "svn commit", in the same way as it would do today.
				Finally, you could use git-svn and one of the sub-project mirrors::
				# Clone from the single read-only Git repo
				git clone http://llvm.org/git/llvm.git
				cd llvm
				# Configure the SVN remote and initialize the svn metadata
				git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				git svn rebase -l
				In this case the repository contains only a single sub-project, and commits can be made using `git svn dcommit`, again exactly as we do today.
				rengolin: This is how it would work on a multi-repo, but this section is talking about the mono-repo. IIGIR, on a mono-repo, developers of a single component will have to commit back on the mono-repo, which will then be propagated to the individual (read-only) repos, no?
				mehdi_amini: > This is how it would work on a multi-repo
				I'm not totally sure what is "This" referring to? Assuming it is about my previous paste, then no it describes the monorepo.
				> IIGIR, on a mono-repo, developers of a single component will have to commit back on the mono-repo, which will then be propagated to the individual (read-only) repos, no?
				Right, and this is the same thing as what a git-svn developer do today:
				- git clone the individual repo
				- configure git svn to point to the SVN repo (the one from the monorepo in the future).
				- commit through SVN
				- the commits are propagated to the individual repo.
				cd ../../projects/
				git clone http://llvm.org/git/libcxx.git
				cd libcxx
				beanz: This is a subjective statement that I don't believe is factually accurate. We could easily teach the build system to checkout subprojects so that building a full toolchain could be `git clone ... && configure && build` regardless of the repository layout.
				git svn init https://llvm.org/svn/llvm-project/libcxx/trunk --username=<username>
				mehdi_amini: Removed the paragraph
				git config svn-remote.svn.fetch :refs/remotes/origin/master
				beanz: I would phrase the downside as "rewriting the fork's history and changing its commit hashes", because that is what happens.
				mehdi_amini: The paragraph starts with " Using a script that rewrites history" and end with "changes the fork's commit hashes", it seems to me that this makes explicit that the downside of rewriting history is that the hashes change. (I'm not sure how "rewriting history" is a downside by itself otherwise)
				beanz: Fine.
				git svn rebase -l
				git checkout `git svn find-rev -B r258109`

				Note that the list would be longer with more sub-projects.
				rengolin: This is confusing... I thought you were going to list all of them, mono and multi.
				mehdi_amini: This is the intention, should be updated later.

				rengolin: Ok
				beanz: I'm confused by this. The sub-project mirrors are read-only, so the workflow is either checkout the full mono-repo or use Git-SVN. That doesn't sound unchanged.
				.. _workflow-multicheckout-multicommit:
				mehdi_amini: We're talking about libcxx in the monorepo proposal? Assuming yes, can you give an example of workflow that would be changed compared to today?

				beanz: Ah. I think the confusing phrasing is that monorepo is being used in two contexts. Maybe rephrase this to something like: With this variant of the monorepo proposal developers who only work on excluded sub-projects will continue to use the single-project repositories. The workflow is still changed from today, because today we're using SVN.
				mehdi_amini: Sorry, the sentence is really about the monorepo: leaving libcxx within the monorepo should not be a regression compared to today.
				Multirepo Variant
				beanz: This is a little unclear to me. Do you mean applying the patches via "git apply" from a patch file? Might be worth clarification about how that would work.

				With the multirepo variant, the umbrella repository will be used. This is
				where the mapping from a single revision number to the individual
				repositories' revisions is stored::

				beanz: It is worth noting (as I did when I sent this out) that this was a very rough prototype, and it doesn't solve all the problems that we would expect a more permanent solution to solve. For example, the submodule update is periodic, not on a push-based notification, and the scripting around it doesn't do a single commit per update, which was the intended solution.
				git clone https://github.com/llvm-beanz/llvm-submodules
				mehdi_amini: (Already addressed above)
				cd llvm-submodules
				beanz: I'd like to see that mentioned here as well. This document is quite large and people may jump around reading it. It is worth having the note directly next to the link.
				git checkout $REVISION
				git submodule init
				git submodule update clang llvm libcxx
				# the list of sub-projects is optional; a bare `git submodule update` would get them all.
				beanz: This makes it sound like the git mirrors are read-write. Might be worth adding a "via Git-SVN" comment to clarify.

				At this point the clang, llvm, and libcxx individual repositories are cloned
				and stored alongside each other. There are CMake flags to describe the directory
				structure; alternatively, you can just symlink `clang` to `llvm/tools/clang`,
				etc.

				Another option is to check out repositories based on the commit timestamp::

				git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
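
				To see how the timestamp lookup behaves, here is a minimal sketch on a throwaway repository. The repository, dates, and commit messages are invented for illustration, and `TZ=UTC` is set only to make the cutoff unambiguous. `git rev-list -n 1 --before=...` returns the newest commit whose committer date is not later than the cutoff.

```shell
# Throwaway repository with one empty commit per day (hypothetical data).
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Demo"
git -C "$repo" config user.email "demo@example.com"
for day in 25 26 27 28; do
    GIT_AUTHOR_DATE="2009-07-$day 12:00:00 +0000" \
    GIT_COMMITTER_DATE="2009-07-$day 12:00:00 +0000" \
        git -C "$repo" commit -q --allow-empty -m "change from July $day"
done

# Newest commit at or before the cutoff, interpreted in UTC.
picked=$(TZ=UTC git -C "$repo" rev-list -n 1 --before="2009-07-27 13:37" HEAD)
git -C "$repo" log -1 --format=%s "$picked"
```

				With separate repositories the same lookup has to be repeated in each one, and the result is only as consistent as the commit timestamps are.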

				.. _workflow-monocheckout-multicommit:

				Monorepo Variant

				The repository natively contains the source for every sub-project at the right
				revision, which makes this straightforward::

				git clone https://github.com/llvm/llvm-projects.git llvm-projects
				cd llvm-projects
				git checkout $REVISION

				As before, at this point clang, llvm, and libcxx are stored in directories
				alongside each other.

				.. _workflow-cross-repo-commit:

				Commit an API Change in LLVM and Update the Sub-projects
				--------------------------------------------------------

				Today this is possible for Subversion and git-svn users, although it is
				uncommon (or at least undocumented). Few Git users try to e.g. update LLD or
				Clang in the same commit as they change an LLVM API.

				The multirepo variant does not address this: one would have to commit and push
				separately in every individual repository. It would be possible to establish a
				protocol whereby users add a special token to their commit messages that causes
				the umbrella repo's updater bot to group all of them into a single revision.

				The monorepo variant handles this natively.

				Branching/Stashing/Updating for Local Development or Experiments
				----------------------------------------------------------------

				Currently

				SVN does not allow this use case, but developers who are currently using
				git-svn can do it. Let's look at what this means in practice when dealing
				with multiple sub-projects.

				To update the repository to tip of trunk::

				git pull
				cd tools/clang
				git pull
				cd ../../projects/libcxx
				git pull

				To create a new branch::

				git checkout -b MyBranch
				cd tools/clang
				git checkout -b MyBranch
				cd ../../projects/libcxx
				git checkout -b MyBranch

				To switch branches::

				git checkout AnotherBranch
				cd tools/clang
				git checkout AnotherBranch
				cd ../../projects/libcxx
				git checkout AnotherBranch
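
				The repetition in the three sequences above can be scripted. The sketch below assumes the directory layout used in this section (`tools/clang` and `projects/libcxx` nested inside the llvm checkout); the helper name `for_each_checkout` is hypothetical, and the demo runs on empty throwaway repositories rather than real checkouts.

```shell
# Run the same git command in the llvm checkout and in each nested
# sub-project checkout. The list of directories is an assumption.
for_each_checkout() {
    base=$1
    shift
    for dir in . tools/clang projects/libcxx; do
        git -C "$base/$dir" "$@" || return 1
    done
}

# Demo: three independent repositories laid out like the current checkout.
root=$(mktemp -d)
for dir in . tools/clang projects/libcxx; do
    mkdir -p "$root/$dir"
    git init -q "$root/$dir"
    git -C "$root/$dir" -c user.name=Demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "initial commit"
done

for_each_checkout "$root" checkout -q -b MyBranch
for_each_checkout "$root" symbolic-ref --short HEAD   # prints MyBranch three times
```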

				.. _workflow-multi-branching:

				Multirepo Variant

				The multirepo works the same as the current Git workflow: every command needs
				to be applied to each of the individual repositories.
				However, the umbrella repository makes this easy using `git submodule foreach`
				to replicate a command on all the individual repositories (or submodules
				in this case):

				To create a new branch::

				git submodule foreach git checkout -b MyBranch

				To switch branches::

				git submodule foreach git checkout AnotherBranch

				.. _workflow-mono-branching:

				Monorepo Variant

				Regular Git commands are sufficient, because everything is in a single
				repository:

				To update the repository to tip of trunk::

				git pull

				To create a new branch::

				git checkout -b MyBranch

				To switch branches::

				git checkout AnotherBranch

				Bisecting
				---------

				Assume a developer is looking for a bug in clang (or lld, or lldb, ...).

				Currently

				SVN does not have built-in bisection support, but the single revision number
				shared across sub-projects makes it possible to script one.
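
				As an illustration of such scripting, the sketch below performs a binary search over integer revision numbers. The predicate `is_bad` is hypothetical: a real version would run `svn update -r` on each checkout, rebuild, and run the failing test, whereas here the first bad revision is simply simulated.

```shell
# Hypothetical predicate: succeeds when revision $1 shows the bug.
# A real version would update every checkout to revision $1, rebuild,
# and run the failing test. Here the first bad revision is simulated.
FIRST_BAD=258109
is_bad() { [ "$1" -ge "$FIRST_BAD" ]; }

# Binary search: $1 is a known-good revision, $2 a known-bad one.
# Prints the first revision for which is_bad succeeds.
svn_bisect() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( good + (bad - good) / 2 ))
        if is_bad "$mid"; then
            bad=$mid
        else
            good=$mid
        fi
    done
    echo "$bad"
}

svn_bisect 250000 260000
```

				Unlike `git bisect`, this sketch has no notion of skipping revisions that fail to build; that would have to be added by hand.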

				Using the existing Git read-only view of the repositories, it is possible to use
				the native Git bisection script over the llvm repository, and use some scripting
				to synchronize the clang repository to match the llvm revision.

				.. _workflow-multi-bisecting:

				Multirepo Variant

				With the multi-repositories variant, the cross-repository synchronization is
				achieved using the umbrella repository. This repository contains only
				submodules for the other sub-projects. The native Git bisection can be used on
				the umbrella repository directly. A subtlety is that the bisect script itself
				needs to make sure the submodules are updated accordingly.

				For example, to find which commit introduces a regression where clang-3.9
				crashes but clang-3.8 passes, one should be able to simply do::

				git bisect start release_39 release_38
				git bisect run ./bisect_script.sh

				With the `bisect_script.sh` script being::

				#!/bin/sh
				cd $UMBRELLA_DIRECTORY
				git submodule update llvm clang libcxx #....
				cd $BUILD_DIR

				ninja clang || exit 125 # an exit code of 125 asks "git bisect"
				# to "skip" the current commit

				./bin/clang some_crash_test.cpp

				When the `git bisect run` command returns, the umbrella repository is set to
				the state where the regression is introduced. The commit diff in the umbrella
				indicates which submodule was updated, and the last commit in that sub-project
				is the one that the bisect found.

				.. _workflow-mono-bisecting:

				Monorepo Variant

				Bisecting on the monorepo is straightforward, and very similar to the above,
				except that the bisection script does not need to include the
				`git submodule update` step.

				The same example, finding which commit introduces a regression where clang-3.9
				crashes but clang-3.8 passes, will look like::

				git bisect start release_39 release_38
				git bisect run ./bisect_script.sh

				With the `bisect_script.sh` script being::

				#!/bin/sh
				cd $BUILD_DIR

				ninja clang || exit 125 # an exit code of 125 asks "git bisect"
				# to "skip" the current commit

				./bin/clang some_crash_test.cpp

				Also, since the monorepo handles commits that span multiple projects, you're
				less likely to encounter a build failure where one commit changes an API in
				LLVM and a later one "fixes" the build in clang.
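
				The `git bisect run` mechanics can be tried on a throwaway repository. In the sketch below, the repository and the `BUG` marker file are invented stand-ins for a real build and crash test. The test command exits with 0 for a good commit and 1 for a bad one; as in the scripts above, 125 would mean "skip".

```shell
# Throwaway repository standing in for the monorepo (hypothetical data).
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo"
git -C "$demo" config user.email "demo@example.com"

# Ten commits; the sixth introduces the "regression" (a marker file).
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "$i" > "$demo/version.txt"
    if [ "$i" -eq 6 ]; then touch "$demo/BUG"; fi
    git -C "$demo" add -A
    git -C "$demo" commit -q -m "commit $i"
done

# Bad revision first, good revision second, as in the examples above.
git -C "$demo" bisect start HEAD HEAD~9 >/dev/null
# Exit status 0 marks a commit good, 1 marks it bad.
git -C "$demo" bisect run sh -c 'test ! -e BUG' >/dev/null

first_bad=$(git -C "$demo" log -1 --format=%s refs/bisect/bad)
git -C "$demo" bisect reset >/dev/null 2>&1
echo "$first_bad"
```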

				Living Downstream
				-----------------

				Depending on which of the multirepo or the monorepo variant gets accepted,
				and depending on the integration scheme, downstream projects may be differently
				impacted and have different options.

				* If you were pulling from the SVN repo before the switch to Git, the monorepo
				will allow you to continue to use SVN the same way. The main caveat is that
				you'll need to be prepared for a one-time change to the revision numbers.
				The multirepo variant still offers an SVN access to each individual
				sub-project, but the SVN revision for each sub-project won't be synchronized.

				* If you were pulling from one of the existing read-only Git repos, this also
				will continue to work as before as they will continue to exist in both of the
				variants.

				Under the monorepo variant, you have a third option: migrating your fork to
				the monorepo. If your fork touches multiple LLVM projects, migrating your fork
				into the monorepo would enable you to make commits that touch multiple projects
				at once, the same way LLVM contributors would be able to do so.

				As a demonstration, we've migrated the "CHERI" fork to the monorepo in two ways:

				* Using a script that rewrites history (including merges) so that it looks like
				the fork always lived in the monorepo [LebarCHERI]_. The upside of this is
				when you check out an old revision, you get a copy of all llvm sub-projects at
				a consistent revision. (For instance, if it's a clang fork, when you check
				out an old revision you'll get a consistent version of llvm proper.) The
				downside is that this changes the fork's commit hashes.

				* Merging the fork into the monorepo [AminiCHERI]_. This preserves the fork's
				commit hashes, but when you check out an old commit you only get the one
				sub-project.

				If you keep a split-repository solution downstream, upstreaming patches to
				the monorepo is always possible (upstreaming to the split repositories is
				straightforward): you can apply the patches in the appropriate subdirectory of
				the monorepo (using either `git am --directory=...` or plain `diff` and `patch`).
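
The `git am --directory=...` route can be sketched as follows. This is an illustration under stated assumptions, not a prescribed workflow: both repositories are throwaway stand-ins, `tool.c` is a made-up file, and `clang` is used as the subdirectory only because it is one of the sub-project names.

```shell
#!/bin/sh
# Throwaway demo; repository layout and file names are hypothetical.
set -e
work=$(mktemp -d)

# Downstream split repository holding a single sub-project.
git init -q "$work/split"
cd "$work/split"
git config user.name "Demo"
git config user.email "demo@example.invalid"
echo 'int main(void) { return 0; }' > tool.c
git add tool.c
git commit -q -m "Add tool"
echo '/* downstream change */' >> tool.c
git commit -q -a -m "Downstream change to tool"

# Export the last commit as a mailbox-style patch file.
patch=$(git format-patch -1 -o "$work/patches")

# Monorepo stand-in where the same sub-project lives under clang/.
git init -q "$work/mono"
cd "$work/mono"
git config user.name "Demo"
git config user.email "demo@example.invalid"
mkdir clang
echo 'int main(void) { return 0; }' > clang/tool.c
git add clang/tool.c
git commit -q -m "Import tool under clang/"

# Apply the patch, prefixing every path in it with clang/.
git am -q --directory=clang "$patch"
applied=$(git log -1 --format=%s)
echo "applied: $applied"
```

Because `git am` keeps the author and commit message from the patch, the downstream change arrives in the monorepo as an ordinary commit that touches only the chosen subdirectory.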

				References
				==========

				.. [LattnerRevNum] Chris Lattner, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041739.html
				.. [TrickRevNum] Andrew Trick, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041721.html
				.. [JSonnRevNum] Joerg Sonnenberger, http://lists.llvm.org/pipermail/llvm-dev/2011-July/041688.html
				.. [TorvaldRevNum] Linus Torvalds, http://git.661346.n2.nabble.com/Git-commit-generation-numbers-td6584414.html
				.. [MatthewsRevNum] Chris Matthews, http://lists.llvm.org/pipermail/cfe-dev/2016-July/049886.html
				.. [submodules] Git submodules, https://git-scm.com/book/en/v2/Git-Tools-Submodules
				.. [statuschecks] GitHub status-checks, https://help.github.com/articles/about-required-status-checks/
				.. [LebarCHERI] Port CHERI to a single repository rewriting history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102787.html
				.. [AminiCHERI] Port CHERI to a single repository preserving history, http://lists.llvm.org/pipermail/llvm-dev/2016-July/102804.html

docs/Proposals/GitHubSubMod.rst

This file was deleted.

	===============================================
	Moving LLVM Projects to GitHub with Sub-Modules
	===============================================

	Introduction
	============

	This is a proposal to move our current revision control system from our own
	hosted Subversion to GitHub. Below are the financial and technical arguments as
	to why we need such a move and how will people (and validation infrastructure)
	continue to work with a Git-based LLVM.

	There will be a survey pointing at this document, from which we'll learn the
	community's reaction and, if we collectively decide to move, the time-frames. Be
	sure to make your views count.

	Essentially, the proposal is divided in the following parts:

	* Outline of the reasons to move to Git and GitHub
	* Description on what the work flow will look like (compared to SVN)
	* Remaining issues and potential problems
	* The proposed migration plan

	Why Git, and Why GitHub?
	========================

	Why move at all?
	----------------

	The strongest reason for the move, and why this discussion started in the first
	place, is that we currently host our own Subversion server and Git mirror on a
	voluntary basis. The LLVM Foundation sponsors the server and provides limited
	support, but there is only so much it can do.

	The volunteers are not Sysadmins themselves, but compiler engineers that happen
	to know a thing or two about hosting servers. We also don't have 24/7 support,
	and we sometimes wake up to see that continuous integration is broken because
	the SVN server is either down or unresponsive.

	With time and money, the foundation and volunteers could improve our services,
	implement more functionality and provide around the clock support, so that we
	can have a first class infrastructure with which to work. But the cost is not
	small, both in money and time invested.

	On the other hand, there are multiple services out there (GitHub, GitLab,
	BitBucket among others) that offer that same service (24/7 stability, disk space,
	Git server, code browsing, forking facilities, etc) for the very affordable price
	of free.

	Why Git?
	--------

	Most new coders nowadays start with Git. A lot of them have never used SVN, CVS
	or anything else. Websites like GitHub have changed the landscape of open source
	contributions, reducing the cost of first contribution and fostering
	collaboration.

	Git is also the version control most LLVM developers use. Despite the sources
	being stored in an SVN server, most people develop using the Git-SVN integration,
	and that shows that Git is not only more powerful than SVN, but people have
	resorted to using a bridge because its features are now indispensable to their
	internal and external workflows.

	In essence, Git allows you to:

	* Commit, squash, merge, fork locally without any penalty to the server
	* Add as many branches as necessary to allow for multiple threads of development
	* Collaborate with peers directly, even without access to the Internet
	* Have multiple trees without multiplying disk space.

	In addition, because Git seems to be replacing every project's version control
	system, there are many more tools that can use Git's enhanced feature set, so
	new tooling is much more likely to support Git first (if not only), than any
	other version control system.

	Why GitHub?
	-----------

	GitHub, like GitLab and BitBucket, provides free code hosting for open source
	projects. Essentially, they will completely replace all the infrastructure that
	we have today that serves code repository, mirroring, user control, etc.

	They also have a dedicated team to monitor, migrate, improve and distribute the
	contents of the repositories depending on region and load. A level of quality
	that we'd never have without spending money that would be better spent elsewhere,
	for example development meetings, sponsoring disadvantaged people to work on
	compilers and foster diversity and equality in our community.

	GitHub has the added benefit that we already have a presence there. Many
	developers use it already, and the mirror from our current repository is already
	set up.

	Furthermore, GitHub has an SVN view (https://github.com/blog/626-announcing-svn-support)
	where people that still have/want to use SVN infrastructure and tooling can
	slowly migrate or even stay working as if it was an SVN repository (including
	read-write access).

	So, any of the three solutions solve the cost and maintenance problem, but GitHub
	has two additional features that would be beneficial to the migration plan as
	well as the community already settled there.


	What will the new workflow look like
	====================================

	In order to move version control, we need to make sure that we get all the
	benefits with the least amount of problems. That's why the migration plan will
	be slow, one step at a time, and we'll try to make it look as close as possible
	to the current style without impacting the new features we want.

	Each LLVM project will continue to be hosted as a separate GitHub repository
	under a single GitHub organisation. Users can continue to choose to use either
	SVN or Git to access the repositories to suit their current workflow.

	In addition, we'll create a repository that will mimic our current *linear
	history* repository. The most accepted proposal, then, was to have an umbrella
	project that will contain sub-modules (https://git-scm.com/book/en/v2/Git-Tools-Submodules)
	of all the LLVM projects and nothing else.

	This repository can be checked out on its own, in order to have all LLVM
	projects in a single check-out, as many people have suggested, but it can also
	only hold the references to the other projects, and be used for the sole purpose
	of understanding the sequence in which commits were added by using the
	``git rev-list --count hash`` or ``git describe hash`` commands.
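
As a small, hedged illustration of the `git rev-list --count` idea, the sketch below creates a throwaway linear history and derives a sequence number for two of its commits. The repository and its contents are made up; in the proposal the same command would be run inside the umbrella repository.

```shell
#!/bin/sh
# Throwaway demo repository with a strictly linear history.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.invalid"

for i in 1 2 3; do
    echo "$i" > file
    git add file
    git commit -q -m "commit $i"
done

# Number of commits reachable from a commit = its position in the history.
count=$(git rev-list --count HEAD)
first=$(git rev-list --count HEAD~2)
echo "HEAD is revision $count, HEAD~2 is revision $first"
```

The count only behaves like an SVN revision number while the history stays linear, which is why the proposal also asks for hooks that reject non-fast-forward pushes.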

	One example of such a repository is Takumi's llvm-project-submodule
	(https://github.com/chapuni/llvm-project-submodule), which when checked out,
	will have the references to all sub-modules but not check them out, so one will
	need to init the module manually. This will allow the exact same behaviour
	as checking out individual SVN repositories, as it will keep the correct linear
	history.

	There is no need for additional tags, flags and properties, or external
	services controlling the history, since both SVN and git rev-list can already
	do that on their own.

	We will need additional server hooks to avoid non-fast-forward commits (ex.
	merges, forced pushes, etc) in order to keep the linearity of the history.

	The three types of hooks to be implemented are:

	* Status Checks: By placing status checks on a protected branch, we can guarantee
	that the history is kept linear and sane at all times, on all repositories.
	See: https://help.github.com/articles/about-required-status-checks/
	* Umbrella updates: By using GitHub web hooks, we can update a small web-service
	inside LLVM's own infrastructure to update the umbrella project remotely. The
	maintenance of this service will be lower than the current SVN maintenance and
	the scope of its failures will be less severe.
	See: https://developer.github.com/webhooks/
	* Commits email update: By adding an email web hook, we can make every push show
	in the lists, allowing us to retain history and do post-commit reviews.
	See: https://help.github.com/articles/managing-notifications-for-pushes-to-a-repository/

	Access will be transferred one-to-one to GitHub accounts for everyone that already
	has commit access to our current repository. Those who don't have accounts will
	have to create one in order to continue contributing to the project. In the
	future, people only need to provide their GitHub accounts to be granted access.

	In a nutshell:

	* The projects' repositories will remain identical, with a new address (GitHub).
	* They'll continue to have SVN access (Read-Write), but will also gain Git RW access.
	* The linear history can still be accessed in the (RO) submodule meta project.
	* Individual projects' history will be local (ie. not interlaced with the other
	projects, as the current SVN repos are), and we need the umbrella project
	(using submodules) to have the same view as we had in SVN.

	Additionally, each repository will have the following server hooks:

	* Pre-commit hooks to stop people from applying non-fast-forward merges
	* Webhook to update the umbrella project (via buildbot or web services)
	* Email hook to each commits list (llvm-commit, cfe-commit, etc)

	Essentially, we're adding Git RW access in addition to the already existing
	structure, with all the additional benefits of it being in GitHub.

	Example of a working version:

	* Repository: https://github.com/llvm-beanz/llvm-submodules
	* Update bot: http://beanz-bot.com:8180/jenkins/job/submodule-update/

	What will not be changed
	--------------------------

	This is a change of version control system, not the whole infrastructure. There
	are plans to replace our current tools (review, bugs, documents), but they're
	all orthogonal to this proposal.

	We'll also be keeping the buildbots (and migrating them to use Git) as well as
	LNT, and any other system that currently provides value upstream.

	Any discussion regarding those tools is out of scope for this proposal.

	Remaining questions and problems
	================================

	1. How well does the SVN view emulate SVN, and how much will it break tools/CI?

	For this one, we'll need people that will have problems in that area to tell
	us what's wrong and how to help them fix it.

	We also recommend that people and companies migrate to Git, for its many other
	additional benefits.

	2. Which tools will need changing?

	LNT may break, since it relies on SVN's history. We can continue to
	use LNT with the SVN-View, but it would be best to move it to Git once and for
	all.

	The LLVMLab bisect tool will also be affected and will need adjusting. As with
	LNT, it should be fine to use GitHub's SVN view, but changing it to work on Git
	will be required in the long term.

	Phabricator will also need to change its configuration to point at the GitHub
	repositories, but since it already works with Git, this will be a trivial change.

	Migration Plan
	==============

	If we decide to move, we'll have to set a date for the process to begin.

	As usual, we should be announcing big changes in one release to happen in the
	next one. But since this won't impact external users (if they rely on our source
	release tarballs), we don't necessarily have to.

	We will have to make sure all the problems reported are solved before the
	final push. But we can start all non-binding processes (like mirroring to GitHub
	and testing the SVN interface in it) before any hard decision.

	Here's a proposed plan:

	STEP #1 : Pre Move

	0. Update docs to mention the move, so people are aware that it's going on.
	1. Register an official GitHub project with the LLVM foundation.
	2. Setup another (read-only) mirror of llvm.org/git at this GitHub project,
	adding all necessary hooks to avoid broken history (merge, dates, pushes), as
	well as a webhook to update the umbrella project (see below).
	3. Make sure we have an llvm-project (with submodules) setup in the official
	account, with all necessary hooks (history, update, merges).
	4. Make sure bisecting with llvm-project works.
	5. Make sure no one has any other blocker.

	STEP #2 : Git Move

	6. Update the buildbots to pick up updates and commits from the official git
	repository.
	7. Update Phabricator to pick up commits from the official git repository.
	8. Tell people living downstream to pick up commits from the official git
	repository.
	9. Give things time to settle. We could play some games like disabling the SVN
	repository for a few hours on purpose so that people can test that their
	infrastructure has really become independent of the SVN repository.

	Until this point nothing has changed for developers, it will just
	boil down to a lot of work for buildbot and other infrastructure
	owners.

	Once all dependencies are cleared, and all problems have been solved:

	STEP #3: Write Access Move

	10. Collect people's GitHub account information, adding them to the project.
	11. Switch SVN repository to read-only and allow pushes to the GitHub repository.
	12. Mirror Git to SVN.

	STEP #4 : Post Move

	13. Archive the SVN repository, if GitHub's SVN is good enough.
	14. Review and update all LLVM documentation.
	15. Review website links pointing to viewvc/klaus/phab etc. to point to GitHub
	instead.

docs/index.rst

Context not available.
	:hidden:	:hidden:

	CodeOfConduct	CodeOfConduct
	Proposals/GitHubSubMod	Proposals/GitHubMove

	:doc:`CodeOfConduct`	:doc:`CodeOfConduct`
	Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,	Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
	IRC, etc).	IRC, etc).

	:doc:`Proposals/GitHubSubMod`	:doc:`Proposals/GitHubMove`
	Proposal to move from SVN/Git to GitHub.	Proposal to move from SVN/Git to GitHub.


Context not available.

Revision Contents

Diff 74413
