This is an archive of the discontinued LLVM Phabricator instance.

Moving to GitHub - Unified Proposal
ClosedPublic

Authored by mehdi_amini on Sep 1 2016, 4:19 PM.

Details

Summary

This document describes the proposal to move to GitHub, and includes the two proposals side-by-side with a comparison between the two.
It also goes through various workflow examples, presenting the current set of commands followed by the ones involved in each of the two proposals.

It is intended to supersede the previous "submodule proposal" document entirely, and drive the design of a survey addressed to the community.

Diff Detail

Repository
rL LLVM

Event Timeline

rengolin added inline comments.Sep 2 2016, 1:26 AM
docs/Proposals/GitHubMove.rst
161

IMO, we don't need to keep the same format, but that's a good point.

Though, it would be better to outline the two options in one quick phrase than leave the other implied.

mehdi_amini marked 4 inline comments as done.Sep 2 2016, 1:46 AM
mehdi_amini added inline comments.
docs/Proposals/GitHubMove.rst
298

I added "with the sequence" following your comment to make it more clear.

425

I'm not sure I follow: AFAIK recursive is for nested submodules, which is not part of the proposal. So to be clear I expect --recursive to be a no-op. I can be wrong, but I'll need some more explanation if I missed something obvious here.

If your point is about cloning *all* the sub-projects and not just a selected list, then --recursive is not the right option; just doing git submodule update without any other flag will do it. I'll spell it out.
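
To illustrate the distinction above, here is a minimal sketch using throwaway local repositories. The repository names (llvm, clang, umbrella), the temporary paths, and the protocol.file.allow override (which recent Git versions require for local-path submodules) are assumptions made for the demo, not part of the proposal. Because the submodules of a fresh clone are not yet initialized, the sketch passes --init; --recursive is not used, since the stand-in sub-projects contain no nested submodules.

```shell
set -e
# Throwaway identity and scratch directory (demo assumptions).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
root=$(mktemp -d)

# Two stand-in sub-project repositories, each with one file.
for p in llvm clang; do
  git init -q "$root/$p"
  echo "$p" > "$root/$p/README"
  git -C "$root/$p" add README
  git -C "$root/$p" commit -q -m "Initial $p commit"
done

# A stand-in umbrella repository that references them as submodules.
git init -q "$root/umbrella"
cd "$root/umbrella"
for p in llvm clang; do
  git -c protocol.file.allow=always submodule --quiet add "$root/$p" "$p"
done
git commit -q -m "Add sub-projects as submodules"

# Fresh clone: the submodule directories exist but are empty.
git clone -q "$root/umbrella" "$root/work"
cd "$root/work"

# Selected list: only llvm is fetched and checked out.
git -c protocol.file.allow=always submodule --quiet update --init llvm
if [ -f llvm/README ]; then llvm_after_selected=present; else llvm_after_selected=absent; fi
if [ -f clang/README ]; then clang_after_selected=present; else clang_after_selected=absent; fi

# No path list: every submodule is fetched and checked out.
git -c protocol.file.allow=always submodule --quiet update --init
if [ -f clang/README ]; then clang_after_all=present; else clang_after_all=absent; fi

echo "selected: llvm=$llvm_after_selected clang=$clang_after_selected; all: clang=$clang_after_all"
```

In this sketch the path list is the only thing that changes between the two update commands.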

618

I'm sorry, I don't follow. You mention a change in the commit flow. Here is what's mentioned in the section I referred to; can you clarify where the inaccuracy is?

Workflow today:

# direct SVN checkout
svn co https://user@llvm.org/svn/llvm-project/llvm/trunk llvm
# or using the read-only Git view, with git-svn
git clone http://llvm.org/git/llvm.git
cd llvm
git svn init https://llvm.org/svn/llvm-project/llvm/trunk --username=<username>
git config svn-remote.svn.fetch :refs/remotes/origin/master
git svn rebase -l  # -l avoids fetching ahead of the git mirror.

Workflow after (copy/paste):

A second option is to use svn via the GitHub svn native bridge::

  svn co https://github.com/llvm/llvm-projects/trunk/compiler-rt compiler-rt --username=...

This checks out only compiler-rt and provides commit access using "svn commit",
in the same way as it would do today.

Finally, you could use *git-svn* and one of the sub-project mirrors::

  # Clone from the single read-only Git repo
  git clone http://llvm.org/git/llvm.git
  cd llvm
  # Configure the SVN remote and initialize the svn metadata
  git svn init https://github.com/joker-eph/llvm-project/trunk/llvm --username=...
  git config svn-remote.svn.fetch :refs/remotes/origin/master
  git svn rebase -l


In this case the repository contains only a single sub-project, and commits can
be made using `git svn dcommit`, again **exactly as we do today**.
kparzysz added inline comments.
docs/Proposals/GitHubMove.rst
199

and preserve the history.

beanz added a comment.Sep 2 2016, 10:52 AM

Wait a second. We're choosing between two proposals. The three of us here are among the experts.

Assuming that we're somehow experts on workflows above other contributors seems a bit presumptuous to me. Keep in mind the difference in the proposals isn't Git, so being a Git expert (which I'm certainly not) isn't really relevant. I'm not an expert on other people's workflows, so I would prefer if we approach this from a perspective of providing information and allowing people to form their own opinions.

We absolutely should be comparing the two proposals explicitly, to draw users' attentions to the differences that we think are important. Because we actually know something that others don't!

Really? No offense, but this comes off as incredibly condescending. You're asserting that you know better than everyone else. I'm not saying we shouldn't help people compare the proposals. I'm saying we shouldn't draw conclusions for them. Our community is filled with a lot of really smart people and I strongly believe they are capable of forming their own informed decisions.

I don't want a one-sided fight, where only the monorepo or multirepo cadre gets to have its say. But if you believe that debate leads to better outcomes, then absolutely we should compare and advocate, which is just another way of explaining why, in our view, one proposal is better than the other.

Debate is great. The LLVM.org documentation on the proposal isn't the place for it. Debate it at the social, debate it in your office, debate it on the lists and on IRC, debate it on your livejournal (I think that's still a thing right?). I believe the proposals should be neutral. I believe the documentation on LLVM.org should be position agnostic geared toward informing people without trying to directly influence them.

Can we just have these as separate sections in one document? That's almost what we have here.

Four documents is a lot to ask people to read and understand. At least with one, they can skip around, etc... And it will also be easier to review and edit.

I disagree. There are a number of advantages to multiple focused documents, and one of them is that they will be smaller and easier to review. Also shorter documents are generally easier to digest because you can pick one up and read it in a few minutes, and they provide break points.

I think we could agree on a section that lays out the monorepo and multirepo proposals in a dry way, explaining what each looks like. Then we could have the "why monorepo" and "why multirepo" sections. Finally we could have the workflow comparison.

I would have no objection to this assuming that the two "why" sections and workflow comparisons are also neutrally written and avoid direct references to the other proposals.

Could we just add a similar section advocating for the multirepo and be done here, move on with our lives?

No. I don't believe advocacy documents of this nature have any place on llvm.org. I know it isn't your intention, but your document as written is little better than propaganda. It is filled with half-truths, assertions that aren't backed by fact, and slanted language designed to influence the reader to your opinion. To provide some examples:

SVN users would be more disrupted in the multirepo approach.

I disagree. I think that the mono-repo is more disruptive. But, that is my opinion, not backed by fact. You present this as a fact, and it is clearly a subjective opinion.

Because all of our subprojects would live in a single repository, we could move code between them easily and without losing history.

This can actually be done with a multi-repo approach too using sub-tree merging. While it may not be as easy, it doesn't lose history.

With the multirepo, moving clang-tools-extra into clang would be much more complicated, and might end up losing history.

Same as above. You don't need to lose history to do this. Yes, it is more complicated, but it is a one-time cost.

Look, at the end of the day I care *way* more about how our community arrives at a decision than about the specifics of the decision. I believe that our community is filled with intelligent people who are capable of forming their own opinions, and that those opinions are equally or more valuable than my own. As a result I believe that the information we provide to the community for consideration of these proposals should be the best possible effort at being non-biased and impartial.

You may disagree with some or all of that, which is certainly your prerogative. What I'm trying to push for here is a framework for how to construct documents for these proposals that strive to be non-biased. Is it possible to write non-biased documents that refer to each other? Sure. Is it possible to write biased documents that *don't* directly refer to each other? Sure. My point is that it is *easier* to write non-biased documents if they don't directly relate themselves against their opposition.

-Chris

probinson added inline comments.Sep 2 2016, 11:03 AM
docs/Proposals/GitHubMove.rst
521

Very nicely succinct. One typo: scripts -> script

Chris, I am really happy to work with you to make sure that you're happy that the parts of this document that are explicitly not advocating a position come across as dry and factual. I agree that parts of this document don't come across that way, and, where we're not explicitly advocating for a position, they should be changed. This was actually my explicit feedback to Mehdi when I reviewed his document, and also when he sent me this review yesterday.

But I admit to being flummoxed by what I read as rage here against the idea that we would allow advocates to explain their reasoned positions in a document posted to llvm.org. (Do you burn your California voter's guide for this reason?) Indeed, I am even more confused because the rage seems to be only aimed at the parts of this document that compare the monorepo and multirepo, but not at the parts that compare git and svn while clearly advocating for one side.

I also am really confused by the idea that somehow explaining why I think X is true prevents someone else from making up their own mind. It seems to me that *not* explaining my arguments would actually prevent people from making an informed decision.

But you're clearly very upset by this, and I don't think it is worth arguing further. Frankly I'm already feeling fight-or-flight here, and if the rhetoric escalates further, I'm afraid I won't feel welcome in this discussion at all.

Would you be amenable to a compromise, Chris? How about we link to advocacy statements from this document? Would that be acceptable to you?

beanz added a comment.Sep 2 2016, 11:50 AM

Justin,

I apologize if my frustration comes off as rage. I believe that healthy debate is valuable and should be welcomed, but I also believe that debate has a place, and that this isn't debate. While I don't burn my CA voter's guide, it at least is a bit more honest about the fact that it publishes paid opinions (which I generally don't read). If you would prefer to follow the format of the CA voter's guide I'd feel way more comfortable with this. The idea of providing a dedicated space for arguments and rebuttals that are clearly labeled as the opinions of a person or group of people is not objectionable to me. I object to the intermingling of opinion and fact that blurs the lines between the two.

Comparing the proposals isn't the problem. The problem is that this document draws conclusions from the comparisons. All of that is fixable, and I spoke with Mehdi before he posted this review and voiced my concerns and willingness to help improve the document.

The problem is also not you thinking something is true, the problem is the document doesn't say "@jlebar thinks the mono repo is better because ...", it says "the mono repo is better because ...". These documents will be consumed by many people who are not following the review or dev list threads on the topic, which means that separating the opinion and fact will be very difficult for most of the audience.

I apologize for any comments I've made that have made you feel unwelcome in this conversation. It is not my intent. The discussions around moving to Git have caused a lot of passionate discourse which has already been way too unwelcoming to many members of our community. In the future I will strive to keep my responses less impassioned.

-Chris

Wait a second. We're choosing between two proposals. The three of us here are among the experts.

Assuming that we're somehow experts on workflows above other contributors seems a bit presumptuous to me. Keep in mind the difference in the proposals isn't Git, so being a Git expert (which I'm certainly not) isn't really relevant. I'm not an expert on other people's workflows, so I would prefer if we approach this from a perspective of providing information and allowing people to form their own opinions.

I think this is more about addressing people's needs: we can't know everyone else's workflow, but we (well, not me) can certainly provide the answers to "how do I do this in git". In that sense being a git expert definitely helps.

We absolutely should be comparing the two proposals explicitly, to draw users' attentions to the differences that we think are important. Because we actually know something that others don't!

Really? No offense, but this comes off as incredibly condescending. You're asserting that you know better than everyone else. I'm not saying we shouldn't help people compare the proposals. I'm saying we shouldn't draw conclusions for them. Our community is filled with a lot of really smart people and I strongly believe they are capable of forming their own informed decisions.

I am not an expert in either git or svn, nor do I want to become one for the sole purpose of making an informed decision here. I cannot predict all possible consequences of either approach that could affect my work, and I do welcome a summary of the differences with their potential impact. Moreover, it may very well be that the differences in the proposed solutions will not be significant to me and my workflow. In that case, I would give preference to the solution that helps the rest of the "core" developers.

jlebar added a comment.Sep 2 2016, 2:00 PM

Thanks a lot for the apology, Chris.

If you would prefer to follow the format of the CA voter's guide I'd feel way more comfortable with this. The idea of providing a dedicated space for arguments and rebuttals that are clearly labeled as the opinions of a person or group of people is not objectionable to me. I object to the intermingling of opinion and fact that blurs the lines between the two.

I am 100% onboard with this. In fact, it's what I was trying to get at in the first place when I wrote:

I think we could agree on a section that lays out the monorepo and multirepo proposals in a dry way, explaining what each looks like. Then we could have the "why monorepo" and "why multirepo" sections. Finally we could have the workflow comparison.

I also don't like the half-advocacy position taken in parts of this document, particularly "one or many repositories?" When I revised Mehdi's proposal, I structured it more like you suggested, being explicit when we're advocating for one side or the other.

If you're amenable, I'm actually kind of interested in doing a postmortem (offline) to figure out how I could have communicated that better. Would have saved us both some grief, I think. In any case I'm really sorry for getting you worked up over something we agree on. And I greatly appreciate your apology. I too will try to stay a bit more calm. :)

Hi folks,

Deviating a bit from the conflict, I'd like to show how all of us agree on the same things basically...

I believe the proposals should be neutral. I believe the documentation on LLVM.org should be position agnostic geared toward informing people without trying to directly influence them.

I agree with you, and I think that's what Justin was going for. I also see Mehdi's text as an attempt to do that, but it's hard to do it (I certainly can't) while we have an opinion (and we all have some).

So, the way I see it, we're arguing over specifics. There's no need.

I can see two ways this will go down without exploding again:

  1. We "sanitise" the text from our biases the best we can and expect people to understand it. I'm not saying the intention was to be biased, just that humans have biases. By having more points of view (this review is a good start), we "clean it up" a bit. I.e. we just continue doing this review as is, until everyone is happy "enough". (the quotes are important, as those words can have multiple meanings, and I mean the best of them). This will have the cost of re-reviewing what was agreed before (the sub-mod proposal), but we can make it simpler by having a clear table of workflows (as I proposed earlier) and a very short list of pros and cons (as was proposed on the list).
  2. We do the original plan, to have two completely separate sections (in the same document). The sub-mod section copied (+ some workflow) from the other text, the mono-repo as a filter from this text. It'd still be good to have a table at the end, though. The additional cost is redoing a lot of what Mehdi has done, but the benefit is that people who spent time discussing the other proposal won't have to do the whole thing again here.

I don't mind either way, but it would be good to pick one and stick with it.

I would have no objection to this assuming that the two "why" sections and workflow comparisons are also neutrally written and avoid direct references to the other proposals.

I agree this has to be avoided, no matter how we organise the document. Justin's point that "this is a comparison" is very fair, but IIGIR, Chris' point was about "biased comparison" (ex. "A is more complicated than B").

Workflows are all different, and people find different things complicated. Let's just lay out the independent facts about each one, even if the text becomes a bit brittle, and let people do their own comparisons.

I don't believe advocacy documents of this nature have any place on llvm.org.

So, removing all feelings around the phrase and the moment, I think there's a lot of meaning in this statement.

The OSS LLVM community has prided itself on not pushing an agenda on anyone. The individual contributions (company or personal) do have agendas, and they're synced upstream, and the discussions are massively technical in nature. There is obviously a power play, as in any other community (OSS or not), but here, we have always valued technical arguments over everything else.

The GitHub problem is a technical one, but also a personal one. It has technical problems, with technical solutions, but they are deeply rooted in how companies and individuals develop, validate and deliver their products. It's very easy to let our biases of "this is much easier than that" unintentionally blind us, and we have to be careful.

The first rule of thumb is to not assume, for any reason, that we're experts. Most people in our community could have been doing the job of investigating this issue, but they're not. That's not to say that whatever opinion *we* reach will be the best, just that whatever we present has to be clear, concise and technically accurate, so *they* can take their own decisions.

We're not here to decide what's best, but to digest and present the information in the simplest form possible, so everyone can decide what's best for themselves.

SVN users would be more disrupted in the multirepo approach.

I disagree. I think that the mono-repo is more disruptive. But, that is my opinion, not backed by fact. You present this as a fact, and it is clearly a subjective opinion.

This is a very good example of why we can't use biased comparisons like "A would be more disruptive than B". This is a personal opinion, not an invariant fact to all members of our community. We should not have any of that in this document, or people will not take it seriously and we won't have the effect we want.

I'm not pointing fingers or trying to start a fight, I'm just making clear that this discussion cannot happen in the text.

We need simplicity and clarity. Despite our best efforts, this text is not there yet, and it's no one's fault. I think Mehdi, Justin and Chris have done a remarkable job at collating and discussing all the facts, and I think a lot of people in the community *really* appreciate it. I certainly do.

But now it's time to stop discussing what the *best* workflow looks like, and start clearing up the proposal.

In my personal view, Mehdi's description and workflow examples are great. We just need to validate the sub-module part with the previous proposal, and work on the formatting.

cheers,
-renato

docs/Proposals/GitHubMove.rst
618

This is how it would work on a multi-repo, but this section is talking about the mono-repo.

IIGIR, on a mono-repo, developers of a single component will have to commit back on the mono-repo, which will then be propagated to the individual (read-only) repos, no?

mehdi_amini updated this revision to Diff 70272.Sep 3 2016, 2:07 PM
mehdi_amini marked 15 inline comments as done.

Address most minor comments

mehdi_amini added inline comments.Sep 3 2016, 2:09 PM
docs/Proposals/GitHubMove.rst
425

I added a comment mentioning that the list is optional. Let me know if I misunderstood something about --recursive above.

618

This is how it would work on a multi-repo

I'm not totally sure what "This" is referring to.
Assuming it is about my previous paste, then no it describes the monorepo.

IIGIR, on a mono-repo, developers of a single component will have to commit back on the mono-repo, which will then be propagated to the individual (read-only) repos, no?

Right, and this is the same thing as what a git-svn developer does today:

  1. git clone the individual repo
  2. configure git svn to point to the SVN repo (the one from the monorepo in the future).
  3. commit through SVN
  4. the commits are propagated to the individual repo.
beanz added a comment.Sep 9 2016, 11:39 AM

Lots of inline comments.

docs/Proposals/GitHubMove.rst
64

The language here is also misleading. Maybe change to something like:

Many new coders nowadays start with Git, and a lot of people have never used SVN, CVS, or anything else.

69

I would remove the "(most?)" bit here because it doesn't really add any value. We have no data to support an assertion of "most", and it could be misleading to suggest it.

80

Can we also add this as a point:

  • Maintain remote forks and branches on Git hosting services and easily integrate back to the main repository.

In particular for people that maintain out-of-tree code or forks, the ability to seamlessly merge between repositories is a big win for Git.

138

Can we also add something about the more traditional Git approaches to this? Maybe something like:

Additionally, there are simple Git commands that can also be used to determine the order of commits. For example, the question "is a bug fixed in <hash-a> also fixed in a compiler built at <hash-b>?" can be answered with the command git rev-list <hash-a>..<hash-b> --count. If this prints a number greater than 0, the fix is contained in <hash-b>. Additionally, if we were to use Git tags similarly to how we use SVN tags today, you would be able to identify which releases contained a fix by running git describe --contains <hash>.
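
A minimal sketch of these commands on a throwaway linear history; the commit messages and the release_1 tag name are made up for the demo. Note that git rev-list <hash-a>..<hash-b> --count counts the commits reachable from <hash-b> but not from <hash-a>, so a positive count implies containment only when history is linear. The sketch therefore also shows git merge-base --is-ancestor, which tests containment directly.

```shell
set -e
# Throwaway identity and scratch repository (demo assumptions).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q

# A three-commit linear history: the fix, an unrelated change, the build point.
git commit -q --allow-empty -m "Fix the bug"
fix=$(git rev-parse HEAD)
git commit -q --allow-empty -m "Unrelated change"
git commit -q --allow-empty -m "Commit the compiler was built at"
build=$(git rev-parse HEAD)
git tag release_1   # lightweight tag standing in for an SVN-style release tag

# Number of commits reachable from $build but not from $fix.
ahead=$(git rev-list "$fix".."$build" --count)

# Direct containment test: exit status 0 means $fix is an ancestor of $build.
if git merge-base --is-ancestor "$fix" "$build"; then contained=yes; else contained=no; fi

# Name the fix relative to a tag that contains it.
containing=$(git describe --contains "$fix")

echo "ahead=$ahead contained=$contained containing=$containing"
```

Here the count and the ancestry test agree because the history is linear; on a history with diverging branches only the ancestry test would be reliable.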

193

This is a completely subjective statement, and should not be present.

203

With Git, history could be preserved even across repositories. Git subtree merges support this, and while it isn't as simple, it is a one-time cost.
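
For reference, a sketch of the subtree-merge recipe from Git's own documentation, run on throwaway repositories. The names clang and tools and the tools-extra/ prefix are placeholders, and --allow-unrelated-histories assumes Git 2.9 or newer. The sketch shows only that the imported commits remain reachable after the merge; how well git log or git blame follow content across the move is a separate question that it does not answer.

```shell
set -e
# Throwaway identity and scratch directory (demo assumptions).
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
root=$(mktemp -d)

# Stand-in destination repository with one commit.
git init -q "$root/clang"
cd "$root/clang"
echo "clang" > README
git add README
git commit -q -m "clang: initial commit"

# Stand-in source repository with two commits of its own history.
git init -q "$root/tools"
cd "$root/tools"
echo "v1" > tool.txt
git add tool.txt
git commit -q -m "tools: add tool"
echo "v2" > tool.txt
git commit -q -am "tools: update tool"

# Import the source history under a subdirectory of the destination.
cd "$root/clang"
git fetch -q "$root/tools" HEAD
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=tools-extra/ -u FETCH_HEAD
git commit -q -m "Merge tools history under tools-extra/"

# The destination now reaches its own commit, both imported commits, and the merge.
total=$(git rev-list --count HEAD)
imported=$(git log --format=%s | grep -c '^tools:')
echo "total=$total imported=$imported"
```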

206

As in my other comment, losing history is not an issue.

209

Actually, there were also concerns about the increased burden for *contributors* not just downstream users.

In general I think this entire section is designed to point out supporting arguments for the mono-repo with no recognition of the merits of the multi-repo proposal.

231

This is very slanted wording. From a user perspective the multi-repo solution to this problem is not much more complicated than the mono-repo solution.

422

Nit: "enters the dance" implies complexity.

453

Not sure I agree this is easy for svn users. To my knowledge llvm.org doesn't even document how to checkout the SVN repositories in a way to make this possible.

519

Additionally users of the umbrella repo can use git submodule foreach to have single command workflows that nearly match the mono-repo proposal.

571

This is inaccurate. Even though my rough prototype of the git umbrella repo doesn't have each submodule update being a single commit, that was the stated plan for how the umbrella would be updated. That means each umbrella repo commit would represent a single commit to a single subproject, so your bisection granularity is comparable.

582

Better to say "both proposals will allow you to continue to use SVN". The wording here makes it seem like only the mono-repo has GitHub's SVN support, even though that is later contradicted.

621

This is a subjective statement that I don't believe is factually accurate. We could easily teach the build system to checkout subprojects so that building a full toolchain could be git clone ... && configure && build regardless of the repository layout.

628

I'm confused by this. The sub-project mirrors are read-only, so the workflow is either checkout the full mono-repo or use Git-SVN. That doesn't sound unchanged.

636

It is worth noting (as I did when I sent this out) that this was a very rough prototype, and it doesn't solve all the problems that we would expect a more permanent solution to solve. For example, the submodule update is periodic, not on a push-based notification, and the scripting around it doesn't do a single commit per update, which was the intended solution.

emaste added a subscriber: emaste.Sep 9 2016, 12:49 PM
emaste added inline comments.Sep 9 2016, 12:58 PM
docs/Proposals/GitHubMove.rst
11

A little point, but I think we should say "why we're proposing such a move" or similar.

"why we need such a move" in the first paragraph of the document implies the decision is already made, and might discourage those against change from even responding.

mehdi_amini updated this revision to Diff 70918.Sep 9 2016, 3:22 PM

Replace "why we need to move" with "why we are proposing to move".

mehdi_amini marked an inline comment as done.Sep 9 2016, 3:22 PM
mehdi_amini updated this revision to Diff 70924.Sep 9 2016, 4:04 PM
mehdi_amini marked 9 inline comments as done.

Address beanz' comments.

mehdi_amini added inline comments.Sep 9 2016, 4:07 PM
docs/Proposals/GitHubMove.rst
139

I'm not against mentioning this somewhere, but the "traditional" Git approach of hashes does not address at all the concerns mentioned right above.

194

Rewrote, but I suspect we'll need some other rounds. Suggestions welcome.

204

Can you provide an example where the history of a single file's *contents* can be preserved without pulling in the entire source repository?
I'd like to try it and see how git log/git blame deals with that.

210

You're welcome to suggest merits of the multi-repo proposal to balance.

232

Please provide a replacement for this sentence.

454

Do you have an alternative to suggest?

572

If you have a way to *guarantee* it, I'm willing to hear about it. Right now, I don't believe it is possible without implementing it on the git hosting itself.

583

I did a minor rewording (we're on a different support level here between the two solutions, which needs to be conveyed somehow).

622

Removed the paragraph

629

We're talking about libcxx in the monorepo proposal?
Assuming yes, can you give an example of a workflow that would be changed compared to today?

637

(Already addressed above)

beanz added a comment.Sep 9 2016, 4:28 PM

I'll work up some suggested alternate phrases this weekend or early next week, but I have some responses inline.

docs/Proposals/GitHubMove.rst
205
211

I don't think that our proposals should be constructed as convoluted arguments between contributing authors. Adding pro multi-repo statements will only make this more difficult to grok.

I actually think there is very little in this section that shouldn't be part of an "arguments/rebuttals" section.

573

You can absolutely *guarantee* the same granularity. You can't guarantee the same ordering, but generally speaking that is significantly less important than granularity.

To get the same granularity you allow the script that updates submodules to produce more than one commit to the submodule repo at a time. If there are multiple, you can sort them by committer date. While committer date isn't a great thing to use, since our proposals both depend on maintaining a linear history it should be good enough for the common cases, because committer date gets reset on rebase.

630

Ah. I think the confusing phrasing is that monorepo is being used in two contexts. Maybe rephrase this to something like:

With this variant of the monorepo proposal developers who only work on excluded sub-projects will continue to use the single-project repositories.

The workflow is still changed from today, because today we're using SVN.

638

I'd like to see that mentioned here as well. This document is quite large and people may jump around reading it. It is worth having the note directly next to the link.

mehdi_amini marked 4 inline comments as done.Sep 9 2016, 4:58 PM
mehdi_amini added inline comments.
docs/Proposals/GitHubMove.rst
205

I don't see git subtree at work on this link, just filter-branch + git mv + merge.

That flow tracks the history of a file, not its content AFAIK (i.e. if a function was moved from another file into the current one, the history of when/why this function was added/modified won't be included).

Also, what would be the effect of moving a file from a repo to another, and later back to the original repo?

211
  1. There is nothing convoluted here.
  2. Adding fact-based pro multi-repo statements will make it easier to understand.
  3. I disagree, I think most of this section should stay here. So we'll have to go in the specifics, piece by piece.
573

but generally speaking that is significantly less important than granularity.

No sorry, I can't agree, this is critical: correctness goes before usability. It seems to me that you're willing to trade correctness to bring a guarantee of usability here.

I'm willing to believe that "in practice" the granularity should be small enough; it just has to be worded carefully. Right now it is a parenthesis at the end:
"(it is possible that one commit in the umbrella repository includes multiple commits in the sub-projects)". We can reword this as "(it is possible that one commit in the umbrella repository includes multiple commits in the sub-projects, though it should be occasional in practice)".
(One may bikeshed on what exactly "occasional" is though, but we don't have any data to bikeshed efficiently anyway).

630

Sorry, the sentence is really about the monorepo: leaving libcxx within the monorepo should not be a regression compared to today.

mehdi_amini marked 22 inline comments as done.Sep 26 2016, 2:29 PM
mehdi_amini added inline comments.
docs/Proposals/GitHubMove.rst
213

Can you clarify what you're referring to exactly and/or suggest some editing?

233

(Tried to make it more explicit that complexity is handled by the infrastructure)

mehdi_amini marked an inline comment as done.

Address the remaining outstanding comments (except maybe 1 or 2)

Any other comments? Otherwise we should move forward.

beanz added a comment.Sep 30 2016, 2:30 PM

Lots of comments inline, and one meta-comment.

Looking at the details of the mono-repo proposal, "use the GitHub SVN interface" is the answer to a lot of workflows. How would the Git-SVN workflows be impacted by moving to a PR-based workflow? I assume it works fine: you just create SVN branches and commit to them, then make the PR via the web UI. Is that correct?

I know we're not actually considering a PR-based workflow, but it is something to consider.

docs/Proposals/GitHubMove.rst
223

What about the concerns about active community members having this burden?

250

Remove "(with some granularity)". The multi-repo proposal can have the same 1:1 mapping of commits in per-project repos to umbrella commits that the mono-repo would have.

When the update job runs with a list of more than one commit we can sort them by committer timestamp (which is updated after rebase). It will provide a roughly linear timeline for the commits to be sorted across the repositories. It won't be perfect, but it should be good enough for sorting commits in close proximity because the pushed commits will either be rebased (which updates the committer timestamp) or they will be merge commits which will have a committer timestamp generated when the merge commit was generated.

254

I would say 'continuously' rather than periodically here. You describe in more detail below how the notifications would be configured and 'periodically' isn't a full picture.

269

s/interacts/would interact/

335

You've lost me here. Checking out all the projects in SVN today involves multiple svn co commands. Unless there is some magic in SVN I'm unaware of. If there is such magic we should document it somewhere on LLVM.org (maybe on the getting started page?) and link to it here.

351

Can you please add actual size numbers for each project and the mono-repo?

Just saying '2x' isn't super meaningful without knowing the size of 1x.

388

the emphasis on 'exactly as we do today' is unnecessary.

446

Alternatively, since our intention is to enforce a linear history in the repositories, doing a checkout by timestamp using the format below should also work in the majority of cases.

git checkout 'master@{...}'

462

Again, I don't follow how this is easy. There is no documentation on LLVM.org explaining how to do this and my limited knowledge of SVN leaves me with no idea how to do it.

467

I would phrase as "It would be possible...", because it most certainly is possible.

472

Please remove "and makes this use case ...", it is a value judgement.

585

If we go with the multi-repo approach we can ensure that each umbrella repo commit will be only one submodule update. This is relatively straightforward tooling to add. The only situation where we could potentially allow multiple updates in a single umbrella commit would be if we wanted to do cross-repository correlating of revlocked changes.

590

The granularity is not finer.

602

Better to say both proposals allow you to continue using SVN the same way, but that each solution will have minor impacts. In the monorepo there will be a one-time change in revision numbers, and in the multi-repo each project will have its own revision numbers out of sync from each other.

608

s/any of the proposal/both of the proposals/

612

Reword from the second sentence on. You're making a value assessment. A better phrasing might be:

If your fork touches multiple LLVM projects, migrating your fork into the mono repo would enable you to make commits that touch multiple projects at the same time the same way LLVM contributors would be able to do so.

623

I would phrase the downside as "rewriting the fork's history and changing its commit hashes", because that is what happens.

631

This is a little unclear to me. Do you mean applying the patches via "git apply" from a patch file? Might be worth clarification about how that would work.

642

This makes it sound like the git mirrors are read-write. Might be worth adding a "via Git-SVN" comment to clarify.

beanz added inline comments.Sep 30 2016, 2:30 PM
docs/Proposals/GitHubMove.rst
208

How does the mono-repo do this? It might make it easier, but since it is likely that even with a mono-repo most people won't build all projects I don't think it actually encourages updates across all sub-projects.

215

You still haven't addressed the feedback here. Saying the multi-repo would lose history is still inaccurate.

For starters, you're not actually deleting the history from the repository you're moving code from. Also with a multi-repo you can easily preserve the file history by using git filter-branch. Using filter-branch will not follow history across renames that are outside the filter, but will follow them within the filter.

For example, if you were to use filter-branch on lib/Support to break it out into its own repository, filter-branch would preserve the history of files under lib/Support that are renamed as long as they remain under lib/Support. It would not preserve the history of a file being renamed and moved under lib/Support. Even with that, the history before that point *is* traceable because the history would still exist in the old repository, so you are not losing history, you just aren't moving it with the file.

mehdi_amini marked 27 inline comments as done.

Address review.

docs/Proposals/GitHubMove.rst
208

I was thinking about the fact that if I change the API createTargetMachineFromTriple(), and git grep to find the uses, then all the uses in sub-projects will show up.

215

Fair enough: replaced "losing history" with "the history of the refactored code won't be available from the new place".

223

Can you clarify what you're referring to exactly? (No regression compared to now I believe)

250

It seems to me that at the beginning the idea was that the submodules would be updated every few minutes, *so that* we'd be able to have rev-locked commits pushed to multiple projects at the same time and have them appear as a single umbrella update (with some heuristic like "update the submodules when there hasn't been a push for 2 min").

Apparently your idea is rather that we should update it with single commits, but what's the story for rev-locked?
How would the tooling not have a race condition? Example:

  1. I commit to LLVM
  2. I commit to Clang
  3. the script runs, pulls LLVM, no change
  4. I push to LLVM
  5. I push to Clang
  6. the script pulls Clang, sees my commit
  7. the script is done with pulling and updates the submodule with the Clang change, *before* the LLVM change, even though the commit dates would be reversed.
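
The sequence above can be replayed with a small Python model. It is only a sketch: the `Repo` class, the commit dates and the messages are hypothetical stand-ins, no real git operation is involved, and it assumes a script that pulls each repository once per run and records whatever it saw.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical stand-in for a remote project repository."""
    name: str
    pushed: list = field(default_factory=list)  # (commit_date, message)

    def push(self, commit_date, message):
        self.pushed.append((commit_date, message))


def simulate_race():
    llvm, clang = Repo("llvm"), Repo("clang")
    umbrella = []  # submodule updates, in the order the script records them

    # Steps 1-2: local commits, LLVM first (date 100), then Clang (date 101).
    llvm_commit = (100, "llvm: change API")
    clang_commit = (101, "clang: adapt to API change")

    # Step 3: the script pulls LLVM and sees nothing new.
    seen_llvm = list(llvm.pushed)

    # Steps 4-5: both pushes land while the script is between its two pulls.
    llvm.push(*llvm_commit)
    clang.push(*clang_commit)

    # Step 6: the script pulls Clang and sees the new commit.
    seen_clang = list(clang.pushed)

    # Step 7: this run records only what it saw, sorted by commit date.
    for _, message in sorted(seen_llvm + seen_clang):
        umbrella.append(message)

    # The next run finally picks up the LLVM commit.
    for _, message in sorted(llvm.pushed):
        umbrella.append(message)
    return umbrella


umbrella_order = simulate_race()
print(umbrella_order)
```

In this model the Clang update is recorded ahead of the LLVM one although its commit date is later; sorting by date inside a single run does not help because the two commits are observed by different runs.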

I don't see a principled solution to implement the umbrella without server-side (i.e. native git hook) support. Sure you can craft it, and it'll work fine most of the time, but that does not make it bulletproof.

335

I was referring to:

svn co http://llvm.org/svn/llvm-project/ --depth=immediates
cd llvm-project/
svn up llvm/trunk clang/trunk libcxx/trunk

You can then have a build with only LLVM configured like:

mkdir ../build-llvm && cd ../build-llvm
cmake ../llvm-project/llvm/trunk

And a build dir with llvm+clang:

mkdir ../build-clang && cd ../build-clang
cmake ../llvm-project/llvm/trunk -DLLVM_EXTERNAL_CLANG_DIR=../llvm-project/clang/trunk/

So that a single svn up $projects in the source directory updates all the sources and you can still build a subset of the projects from these sources.

This is also how I'd synchronize if I was integrating downstream from SVN.

446

This applies to both proposals right? Where do you want me to add this?

462

(Copy/pasted commands above)

462

Copy/pasted above (I'm not sure I really want to document it on llvm.org now).

472

I don't believe so, but if you insist...

571

(I'm waiting for the story to support this above)

585

(I'm waiting for the story to support this above)

590

(I'm waiting for the story to support this above)

602

"The same way" implies "a single SVN revision number to me". One could even say "a single SVN checkout" (cf the command I copy/pasted above).
I don't see how it'd work with the multi-repo?
How would someone downstream integrating from SVN be able to correlate revision across repositories?

623

The paragraph *starts* with "Using a script that rewrites history" and ends with "changes the fork's commit hashes"; it seems to me that this makes explicit that the downside of rewriting history is that the hashes change.
(I'm not sure how "rewriting history" is a downside by itself otherwise)

beanz added inline comments.Sep 30 2016, 4:57 PM
docs/Proposals/GitHubMove.rst
208

That is 'making easier' not 'encouraging'. Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.

215

In your example of moving clang-tools-extra there would be *no* need for loss of history at all. There is no need for filter-branch. You can literally reformat clang-tools-extra to be under tools/extra/ and merge the whole tree into the clang master branch.

The only point where you would lose any history at all is if you were trimming one part of a repository into another repository, and even in that situation you can minimize the losses pretty well using filter-branch and index scripts. It is complicated but possible.

223

Ah. I misread. I see what you are saying. This is fine.

250

The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran. It will then sort them by committer timestamp order, and commit one at a time to the umbrella repo as submodule updates.
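
As a rough illustration of that step, here is a Python sketch. It assumes the new commits of each repository have already been collected as `(committer_timestamp, sha)` pairs; the function name, the input shape and the sample hashes are hypothetical, and this is not the actual automation.

```python
def umbrella_updates(new_commits_by_repo):
    """Order all newly seen commits by committer timestamp.

    new_commits_by_repo maps a repository name to a list of
    (committer_timestamp, sha) pairs collected since the last run.
    Each returned entry stands for one umbrella commit that updates
    exactly one submodule.
    """
    pending = [
        (timestamp, repo, sha)
        for repo, commits in new_commits_by_repo.items()
        for timestamp, sha in commits
    ]
    # sort() is stable, so commits with equal timestamps keep collection order.
    pending.sort(key=lambda entry: entry[0])
    return [f"update {repo} to {sha}" for _, repo, sha in pending]


updates = umbrella_updates({
    "llvm": [(100, "a1"), (130, "a2")],
    "clang": [(110, "b1")],
})
print(updates)
```

The ordering is only as good as the timestamps and only covers commits seen in the same run, which matches the "roughly linear" caveat made earlier in the thread.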

We can set up the automation to run based on GitHub WebHooks, and periodically in case a WebHook gets dropped.

There is no race condition that I see.

If we need to support revlocked changes, (and I'm not convinced this is the case since they are by far a minority of commits) we can support them via annotations on the commit messages. We can teach the automation to look for markers in the commit message denoting that it is revlocked to other changes, and we can have it group revlocked changes together.
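
A possible shape for such grouping, sketched in Python. The `Revlock-Id:` trailer is purely an assumption made for illustration (no marker format has been agreed on), and the sketch assumes every commit of a revlocked set is seen in the same run.

```python
import re

# Hypothetical marker: a "Revlock-Id: <token>" line in the commit message.
REVLOCK = re.compile(r"^Revlock-Id:\s*(\S+)\s*$", re.MULTILINE)


def group_revlocked(commits):
    """Group commits that share a revlock token.

    commits is a list of (committer_timestamp, repo, sha, message) tuples,
    already sorted by timestamp. Each returned group is a list of
    (repo, sha) pairs meant to land in a single umbrella commit.
    """
    groups = []
    group_by_token = {}
    for _, repo, sha, message in commits:
        match = REVLOCK.search(message)
        if match and match.group(1) in group_by_token:
            # Join the group opened by the first commit carrying this token.
            group_by_token[match.group(1)].append((repo, sha))
            continue
        group = [(repo, sha)]
        groups.append(group)
        if match:
            group_by_token[match.group(1)] = group
    return groups


groups = group_revlocked([
    (100, "llvm", "a1", "Change API\n\nRevlock-Id: X1"),
    (105, "lld", "c1", "Unrelated fix"),
    (110, "clang", "b1", "Adapt to API change\n\nRevlock-Id: X1"),
])
print(groups)
```

Note the design choice: a group is placed where its first commit appears, so the later Clang commit moves ahead of the unrelated one.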

There is no need for server-side hooks, and this solution would work as well as any mirroring system. I don't believe there is any need for this solution to be bulletproof, but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.

335

I can't imagine that is a common workflow. It certainly isn't the documented recommended workflow on llvm.org, so I'm not sure there is value in bringing it into the discussion.

351

Can you add per-project sizes?

446

I think it is worth noting under the multi-repo proposal something along the lines of:

Because we will be maintaining a linear history you can perform a timestamp based checkout of each project repository with the following command:

git checkout 'master@{...}'

Additionally you can use the umbrella repository...

If you want to also add the timestamp checkout to the mono-repo proposal, that makes sense too. I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.

462

Fine if you don't want to document it, but I certainly would not describe that as "easy". Especially because if you ever mix up and type "svn up" in the root it starts updating *everything*. I think this is an incredibly fragile workflow, which is probably why it is also incredibly uncommon.

571

See above.

585

Again, above.

602

Maybe rather than "the same way" "with similar workflows to today"?

623

Fine.

mehdi_amini marked 2 inline comments as done.Sep 30 2016, 5:36 PM
mehdi_amini added inline comments.
docs/Proposals/GitHubMove.rst
208

That is 'making easier' not 'encouraging'.

"All the source is there by default" *+* "making it easier" => why I wrote "encouraging".

Personally I fall to 'grep' way before I fall to 'git grep' for things like this, and I don't think the monorepo has any enforcement of this.

Not sure why "enforcement" comes into play here?

215

So do you have anything *concrete* that could be added here, be practical (something we'd be willing to encourage in the future), be understandable by any dev, *and* not take > 20 lines to describe?

250

The automation will run. It will collect a list of commits that have been pushed to each repository since the last time the script ran.

Atomically?

There is no race condition that I see.

Did you read my sequence 1-7 that describes an example of race?

but I see no reason why it cannot be as robust as the single-project mirrors that the mono-repo proposal includes.

Define "robust". The single-project mirrors have a well-defined, *deterministic* algorithm to construct them, and reconstruct them at will; you don't have one for the multi-repo. That's not "robust" to me.

351

That'd make a long list, how should it be presented?

446

Are you sure that this command does what you think it does?
If I read the doc correctly, it is looking at your *reflog*, not the history.

The right one should be something like git checkout `git rev-list -n 1 --before="2009-07-27 13:37" master`
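
To make the difference concrete: the `master@{<date>}` form depends on the local reflog, whereas a lookup by commit date only needs the history itself. Here is a Python model of the history-based lookup, assuming a linear history given as `(committer_timestamp, sha)` pairs; the function and data are hypothetical and do not call git.

```python
def last_commit_before(history, cutoff):
    """Return the newest commit at or before the cutoff timestamp.

    history is a linear list of (committer_timestamp, sha) pairs, oldest
    first. The walk starts at the tip and goes backwards, so the first
    hit is the most recent commit that is old enough. Returns None when
    every commit is newer than the cutoff.
    """
    for timestamp, sha in reversed(history):
        if timestamp <= cutoff:
            return sha
    return None


history = [(10, "a"), (20, "b"), (30, "c")]
print(last_commit_before(history, 25))
```

Running the same lookup with the same cutoff against each project repository gives a consistent set of checkouts, which is the property the linear-history requirement is meant to provide.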

I just think it is worth noting under the multi-repo proposal that timestamp based checkouts are expected to work due to the linear history requirement, which means you don't need the submodule repo.

OK that wasn't clear to me the first time.

602

I'm still missing what would be similar for someone integrating multiple projects from SVN today (assuming such downstream integrator exists) with the multi-repo?

Add mention of the ability to check out the individual repos according to a timestamp

mehdi_amini marked an inline comment as done.Oct 2 2016, 11:17 AM

Ping?

beanz added inline comments.Oct 3 2016, 10:05 AM
docs/Proposals/GitHubMove.rst
208

"All the source is there by default"

This is what makes it easier. Your math is double counting it. I disagree with your wording here. I've told you I disagree. You can continue to disregard my feedback or you can fix it. The choice is yours.

215

You gave an example that is factually incorrect. I'm asking you to fix it. That is concrete. In my earlier comment I told you why your example was incorrect. You can remove the example, or come up with an alternative. That is your choice. What you cannot do, is use this factually inaccurate example.

250

I've updated my automation (https://github.com/llvm-beanz/llvm-submodules) to make one umbrella commit per commit to sub-project repository. This has a single commit granularity. That was the original point I was arguing. It works. It is done.

Is it perfect? No. There are a number of situations where the order of the commits to the submodule can be impacted by the order and proximity of commits to the project repositories. That is irrelevant to the point I was making. I'm more than happy to debate with you about whether or not that matters, but that is a separate issue from what I was pointing out.

Do we need to belabor this further, or will you update the document based on my feedback?

351

However you think it is best presented. A table would seem fitting. You could put it below and have a link down to it. I think that if you're bringing size into the discussion you need to provide sufficient data.

446

You are correct, you need to use rev-list to get the commit hash.

602

I strongly suspect that very few users are using a single SVN checkout that contains more than one sub-project. If you discount that workflow, the workflow for interfacing using the GitHub SVN bridge is very similar whether you are using one repo or many.

Additionally, with the mono repo the combined SVN workflow is actually a lot better than with SVN today. It is way less fragile since you aren't doing sub-directory checkouts. This means you don't run the risk of inadvertently running svn up and pulling down way more than you wanted.

mehdi_amini added inline comments.Oct 3 2016, 11:43 AM
docs/Proposals/GitHubMove.rst
208

"All the source is there by default"

This is what makes it easier.

Sorry, but I mentioned earlier git grep and you answered That is 'making easier'.
Having all the source present by default is more than making it easier.

I disagree with your wording here. I've told you I disagree.

I strongly disagree with your disagreement here.

215

The current spelling (Friday, 3:51pm) is: "With the multirepo, moving clang-tools-extra into clang would be more complicated than a simple git mv command, and the history of the refactored code won't be available from the new place."
I can change the example to: "Refactoring some functions from clang to make it a utility in one of the llvm/lib/Support file to share it across sub-projects wouldn't carry the history of the code in the llvm repo."

That said, I asked you on 9/9 (over 3 weeks ago) "Can you provide an example where the history of a single file *contents* can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that."
You haven't been able to provide me with this. So you can claim whatever you want about "factual inaccuracy", you still failed to provide counter facts to support your claim.

250

You're moving goal posts. Your previous message said that there is no race, while now you're eluding it with "There are a number of situations where...".

Also you're changing the definition of the multi-repo as I was foreseeing it. I think it is worse, and if we were to adopt the multi-repo proposal, I would be totally against this.

Now, *just to please you*, because again I don't think it does any good to this proposal, I'll re-formulate making clear that:

  1. updates in the multi-repo are single-commit based.
  2. commits can be in different orders.
  3. it does not handle cross-project commits.

602

If you discount that workflow, the workflow for interfacing using the GitHub SVN bridge is very similar whether you are using one repo or many.

"Very similar" is subjective, to me it can't be similar as long as there is no longer a single revision number.

Additionally, with the mono repo the combined SVN workflow is actually a lot better than with SVN today. It is way less fragile since you aren't doing sub-directory checkouts. This means you don't run the risk of inadvertently running svn up and pulling down way more than you wanted.

I don't understand what you mean here.

kparzysz added inline comments.Oct 3 2016, 12:41 PM
docs/Proposals/GitHubMove.rst
356

Even with sparse checkout? Am I going to see new files in projects that were not originally included in the sparse checkout?

367

A conflicting change would have to affect the same file. This is regardless of whether it's monorepo or multirepo. Am I missing something here?

Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.

jlebar added inline comments.Oct 3 2016, 1:41 PM
docs/Proposals/GitHubMove.rst
356

What do you mean by "see"?

In order to push a commit without -f, the commit's parent commit must be the current remote head. The commits in git are unaffected by sparse checkout. So, if you have a commit you want to push, you will need to rebase it atop current remote HEAD -- you'll have to do this rebase even if you're using sparse checkouts and all of the changes between your current base revision and current remote HEAD are to subprojects that you don't have checked out.

If you don't like this, you can continue to use the single-subproject mirrors exactly as you currently do (with git-svn and everything), by changing the configs as explained elsewhere in this document. But I've been using a monorepo (http://github.com/llvm-project/llvm-project) for months now. I've pushed maybe 30 commits using my custom script (https://github.com/jlebar/llvm-repo-tools) and this necessity to rebase hasn't once been an annoyance for me.

367

Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system
will just add the change on top of the current ToT, even if they have not been fetched to the local
repo.

That is what git-svn will do, yes. But that's not pure git's behavior.
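
The rule for a plain push described above can be modelled in a few lines of Python. This is a sketch with a hypothetical commit graph, not git itself: a push without -f is accepted only when the current remote head is an ancestor of the commit being pushed.

```python
def is_ancestor(parents, ancestor, commit):
    """Follow parent links from commit and report whether ancestor is reached."""
    while commit is not None:
        if commit == ancestor:
            return True
        commit = parents.get(commit)
    return False


def try_push(parents, remote_head, local_head):
    """Return the new remote head, or raise if the push is not a fast-forward."""
    if not is_ancestor(parents, remote_head, local_head):
        raise ValueError("rejected: rebase onto the remote head first")
    return local_head


# Hypothetical history: A <- B <- C is upstream and the remote head is C.
# "stale" was committed on top of B, "rebased" on top of C.
parents = {"A": None, "B": "A", "C": "B", "stale": "B", "rebased": "C"}

print(try_push(parents, "C", "rebased"))
```

Pushing `stale` raises in this model even if C only touched a sub-project that is absent from a sparse checkout, because the check looks at commits, not at files.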

mehdi_amini added inline comments.Oct 3 2016, 1:52 PM
docs/Proposals/GitHubMove.rst
356

Even with sparse checkout? Am I going to see new files in projects that were not originally included in the sparse checkout?

If you mean are you seeing them when typing ls in your terminal, then no you don't. I can add "unless you're using a sparse checkout" to make it more clear.

367

A conflicting change would have to affect the same file. This is regardless of whether it's monorepo or multirepo. Am I missing something here?

The point was that when you run git pull --rebase, you have new changes, and even without an explicit "diff conflict" the changes that you're about to push may use an API that has changed upstream. Note that today this is not addressed: SVN will blindly accept the push and break the build.

Rebasing is always a good practice, but it's not strictly required. If there are no conflicts, the system will just add the change on top of the current ToT, even if they have not been fetched to the local repo.

As Justin mentions, this is not true with git push AFAIK. You have to pull (merge or rebase) before being able to push.

beanz added a comment.Oct 3 2016, 2:02 PM

After this round of feedback I'm removing myself from this discussion.

docs/Proposals/GitHubMove.rst
208

You asked for feedback. If you want to disregard it that is your decision.

215

"Can you provide an example where the history of a single file *contents* can be preserved without pulling all the source repository entirely? I'd like to try it and see how git log/git blame deals with that."

git-filter-branch can preserve the history of a single file. It does not follow renames, however if you know a file was renamed, you can use git-filter-branch's --tree-filter or --index-filter flags to perform more complicated slicing of the repository to preserve that history. If you're unfamiliar with the types of things you can do with filter branch, this article gives a good overview (https://devsector.wordpress.com/2014/10/05/advanced-git-branch-filtering/).

250

From the beginning I said:

It won't be perfect, but it should be good enough for sorting commits in close proximity...

If you want to debate that statement we can do so, but I would prefer not to in this thread.

Also you're changing the definition of the multi-repo as I was foreseeing it. I think it is worse, and if we were to adopt the multi-repo proposal, I would be totally against this.

You don't get to dictate how the proposal in opposition to your preferred approach is written. I think you've been pretty clear about being against the multi-repo proposal, so I don't see how your opinion factors in to the final document, which shouldn't be opinion based.

602

Saying the workflows are "similar" is not a subjective wording.

Today someone who writes:
svn co http://llvm.org/svn/llvm-project/llvm/trunk
Under the mono-repo could write something like:
svn co http://github.com/llvm/llvm-project/master/llvm
Under the multi-repo could write something like:
svn co http://github.com/llvm/llvm/master/

The *workflow* of svn co -> svn add -> svn commit is *similar* in all cases.

beanz removed a subscriber: beanz.Oct 3 2016, 2:02 PM
kparzysz added inline comments.Oct 3 2016, 2:02 PM
docs/Proposals/GitHubMove.rst
356

What do you mean by "see"?

I'm referring to this (and the rest of this paragraph):
"However when you fetch you'll likely pull in changes to sub-projects you don't care about."

The intent wasn't clear---I wasn't aware of the requirement about parent commit (I use SVN for upstreaming changes). But that brings another question: what is the anticipated frequency of commits to the monorepo? My concern is that the "rebuild and retest" approach may take long enough to require another rebase...

mehdi_amini updated this revision to Diff 73339.Oct 3 2016, 2:12 PM
  • remove reference to size.
  • change the model for multi-repos: single commit based, losing cross-project commits.
  • clarify that sparse checkouts don't see changes from other projects
mehdi_amini added inline comments.Oct 3 2016, 2:16 PM
docs/Proposals/GitHubMove.rst
356

When you commit to SVN, you add a "patch" on top of the existing codebase. Unless there is a conflict your patch will be committed.
It does not mean it will build, since someone else may just have changed an API you're using in your patch.

The new monorepo won't be different from SVN on this aspect: you have the same frequency of commits, and you can run git pull && git push which is roughly equivalent to git svn dcommit today.
The thing is that between the git pull and the git push, you can also inspect what changed since your last build/check, and decide if you need to rebuild or not.

mehdi_amini added inline comments.Oct 3 2016, 2:18 PM
docs/Proposals/GitHubMove.rst
356

Maybe we should just remove all this paragraph, it is confusing...

mehdi_amini updated this revision to Diff 73342.Oct 3 2016, 2:20 PM

Remove confusing paragraph.

jlebar added inline comments.Oct 3 2016, 2:21 PM
docs/Proposals/GitHubMove.rst
356

My concern is that the "rebuild and retest" approach may take long enough to require another rebase...

This isn't a function of the monorepo. You choose when to rebuild/retest, and that's orthogonal to the repository structure. If "rebuild/retest only when there were changes to files I changed" is what you want to do, you can still do that. You can ask that question of git before pushing. Or you could ask "have any of the projects I care about changed?" Or you could ask a different question. And you could ask those questions of the monorepo, or the multirepo (although it might be a bit more work in the multirepo -- I say "might" so beanz doesn't jump on me).

In this sense it's safer than SVN, which assumes that you only care about retesting if there were modifications to files you also changed.
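
One of those questions ("have any of the projects I care about changed?") can be sketched in Python. The list of changed paths is assumed to come from a comparison between the old base and the new remote head (for example the output of `git diff --name-only`), and the function name and sample paths are hypothetical.

```python
def needs_retest(changed_paths, projects_i_care_about):
    """Report whether any upstream change touches a project of interest.

    changed_paths are repository-relative paths such as
    "llvm/lib/Support/Path.cpp"; the first path component is taken to be
    the sub-project directory in the monorepo.
    """
    return any(
        path.split("/", 1)[0] in projects_i_care_about
        for path in changed_paths
    )


mine = {"llvm", "clang"}
unrelated = ["lldb/source/Core/Debugger.cpp", "libcxx/include/vector"]
related = unrelated + ["llvm/lib/Support/Path.cpp"]

print(needs_retest(unrelated, mine), needs_retest(related, mine))
```

A stricter policy (retest on any change) or a looser one (retest only when the same files changed) is a one-line change to the predicate, which is the point being made: the policy is the developer's choice, not the repository's.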

jlebar added inline comments.Oct 3 2016, 2:23 PM
docs/Proposals/GitHubMove.rst
356

I really think you want this paragraph, btw. This is a very common question -- it's been asked many times before. "I don't want the monorepo because it will mean I have to rebuild/retest a *lot* more than I do today." False, but we need to explain why.

mehdi_amini marked 45 inline comments as done.Oct 3 2016, 2:37 PM
mehdi_amini added inline comments.
docs/Proposals/GitHubMove.rst
250

You don't get to dictate how ...

Sorry, you are mischaracterizing my position and what I wrote; I don't appreciate this.

356

OK, I'll try to rephrase it then.
The main point is that git pull && git push is no different from committing to SVN today.

mehdi_amini marked an inline comment as done.Oct 3 2016, 2:38 PM
mehdi_amini updated this revision to Diff 73349.Oct 3 2016, 2:55 PM
mehdi_amini marked 13 inline comments as done.

Restore the paragraph.

dtzWill added a subscriber: dtzWill.Oct 6 2016, 7:43 AM
mehdi_amini updated this revision to Diff 73858.Oct 6 2016, 3:13 PM

Address Duncan's inline comments.

New layout attempt

I believe what Duncan is asking for is basically the same thing I (and others) have also been asking for: An explicit "not dryly-factual" section where the experts explain their various positions.

I am saddened that we won't have these sections in the document -- I think not having them does a disservice to the readers who, like you, want this material. (We've had at least one other person comment in this thread, and we've had nobody say they don't want it.) But the disagreement seems to be based on Mehdi having fundamentally different conceptions of discourse than I certainly have, and I'm not prepared to litigate the philosophy of argument just to get a section added to this document.

On the other hand, in light of the amount of abuse it seems that whoever drives this process will receive no matter what they say, I'm certainly not willing to switch places with Mehdi, and I think he deserves a heaping ton of credit for frankly superhuman positivity here (in addition to credit for doing the work itself). So if he continues to oppose this idea, I respect his decision. In that case, I think we're just going to have to write it up separately, and hope that it gets the visibility it deserves. Maybe we'll be able to get a link in this document, although if that turns into a fight, like so much else here has, I hope I'll have the self-control to turn and run in the other direction.

Try another layout: add first a description of the multirepo, then one for the monorepo, then the interleaved comparison.

ioeric added a subscriber: ioeric.Oct 12 2016, 2:30 AM
ioeric added inline comments.Oct 12 2016, 4:19 AM
docs/Proposals/GitHubMove.rst
181

I am wondering where we are in the process now. Specifically, when would we get to this step (2.5)?

Phabricator is seeing frequent connection errors from the svn server (this might be due to the increased number of svn connections after the recent Phabricator upgrade):

svn: OPTIONS of 'http://llvm.org/svn-robots/llvm-project': could not connect to server (http://llvm.org)

This blocks syncing svn commits from time to time. I'd expect Github to be more stable.

Address Duncan's feedback

(Remove duplicated section)

Split the bullet about the overhead of the monorepo for users that care only about a single subproject.

This revision was automatically updated to reflect the committed changes.