This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Move pseudoparser from clang to clang-tools-extra
ClosedPublic

Authored by sammccall on Mar 8 2022, 10:32 AM.

Details

Summary

This should make clearer that:

  • it's not part of clang proper
  • there's no expectation to update it along with clang (beyond green tests)
  • clang should not depend on it

This is intended to be expose a library, so unlike other tools has a split
between include/ and lib/.

The main renames are:

clang/lib/Tooling/Syntax/Pseudo/*           => clang-tools-extra/pseudo/lib/*
clang/include/clang/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/include/clang-pseudo/*
clang/tools/clang/pseudo/*                  => clang-tools-extra/pseudo/tool/*
clang/test/Syntax/*                         => clang-tools-extra/pseudo/test/*
clang/unittests/Tooling/Syntax/Pseudo/*     => clang-tools-extra/pseudo/unittests/*
#include "clang/Tooling/Syntax/Pseudo/*"    => #include "clang-pseudo/*"
namespace clang::syntax::pseudo             => namespace clang::pseudo
check-clang                                 => check-clang-pseudo
clangToolingSyntaxPseudo                    => clangPseudo

The clang-pseudo and ClangPseudoTests binaries are not renamed.

See discussion around:
https://discourse.llvm.org/t/rfc-a-c-pseudo-parser-for-tooling/59217/50

Diff Detail

Event Timeline

sammccall created this revision.Mar 8 2022, 10:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2022, 10:32 AM
Herald added a subscriber: mgorny. · View Herald Transcript
sammccall requested review of this revision.Mar 8 2022, 10:32 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 8 2022, 10:32 AM

@hokein I talked about making some headers private, but realized that change really is unrelated, and would be confusing to mix in here.

This is awkward to review. I think the main things to check:

  • whether the intended renames (see patch description) are correct
  • the CMakeLists and lit.* configs: these are largely new (albeit cobbled together from other configs)

Going through the source file changes line-by-line is unlikely to be enlightening. Most likely errors is if I forgot to change something, which would be annoying but can be fixed forward.

Thanks, the movement looks roughly good.

We should be aware of that (when landing this patch):

  • the upstream llvm gn build files need to be adjust (IIUC, there is no expectation from us to update the gn files)
  • we'd better prepare a patch for the internal integration (make the integration life easier)
clang-tools-extra/pseudo/CMakeLists.txt
2

Why not use CMAKE_CURRENT_SOURCE_DIR? It seems a more natural fit.

clang-tools-extra/pseudo/include/clang-pseudo/DirectiveMap.h
31

clang-pseudo is somewhat nice, it is probably ok, but

  • using hyphen in the include path is not a canonical pattern in LLVM code-style;
  • we have a command-line tool which calls clang-pseudo, clang-pseudo makes me think this is a header from the command-line tool rather from the *library*.

there are other options, but they don't seem significantly better than the current one :(

  • clang/Pseudo/Token.h: this matches the canonical pattern of using headers from clang libraries, but it causes confusion (which we want to avoid in this patch);
  • clangPseudo/Token.h: this feels slightly better, as we build a library called clangPseudo;

I know we want to explicitly express that this is intended to be a library, another aggressive option (which mostly follows the existing pattern in clang-tools-extra) would be

clang-tools-extra/clangPseudo/Token.{h, cpp}
clang-tools-extra/clangPseudo/internal/Forest.h
clang-tools-extra/clangPseudo/tool/

internal private headers are put into the sub internal directory, headers in non internal directories are treat as public headers implicitly.

sammccall added inline comments.Mar 9 2022, 1:43 AM
clang-tools-extra/pseudo/CMakeLists.txt
2

I don't understand what you want me to change.
This line adds the "include" directory, both in the source tree and the genfiles tree.

Do you want the former to be explicitly qualified with CMAKE_CURRENT_SOURCE_DIR rather than using the relative path? That seems unnecessary and uncommon

If it's not obvious what this line does, would it be better to split it into two calls with one arg each?

clang-tools-extra/pseudo/include/clang-pseudo/DirectiveMap.h
31

Thanks for going through the alternatives.
I think clang-pseudo is the best option, and clangPseudo is the most viable alternative.
clangPseudo looks wrong to me though, and I think clang-pseudo follows precedent better, both inside LLVM and more generally.

using hyphen in the include path is not a canonical pattern in LLVM code-style;

Let's split include path into parts: #include "mount-point/ProjectRelativeSubdir/Header.h"

Mount points are usually single-word lowercase in LLVM.
The only real precedent for multi-words is clang-c/llvm-c/mlir-c that use hyphens. This isn't terribly strong, but it's something.

Project-relative subdirs follow the internal structure of the project, which is usually UpperCamelCase and occasionally lower-hyphen-case.

AFAICS there are zero examples of lowerCamelCase either in mount points or in project-relative subdirs.
(I believe this is rare in libraries outside llvm codebase too, which may be why it looks wrong to me).

clangPseudo/Token.h: this feels slightly better, as we build a library called clangPseudo;

Yes, matching the library name rather than the binary name feels more consistent.
I don't think what CMake calls the library is a particularly significant string to anchor on (particularly for out-of-tree users, but even for llvm devs). Confusion with the clang-pseudo tool is a risk, on the other hand *association* with the tool is a good thing.

The obvious precedent here is that clang is both a binary and an include mount point. I don't think this has caused confusion.

clang/Pseudo/Token.h: this matches the canonical pattern of using headers from clang libraries, but it causes confusion (which we want to avoid in this patch)

Agree, this isn't a good idea if we want to send a signal that this is a sibling to clang, but not part of it.

clang-tools-extra/clangPseudo/Token.{h, cpp}

I did consider this option, and very much wanted it to work. I think it's a bad idea though:

It places public headers next to the implementation and internal headers elsewhere, which is the opposite of the convention in other libraries.
It can easily lead to problems: the distinction between public vs private interface is subtle and inadvertently making too many headers public won't break the build and is easily overlooked in review.

hokein accepted this revision.Mar 9 2022, 2:44 AM
hokein added inline comments.
clang-tools-extra/pseudo/CMakeLists.txt
2

sorry, I was not clear in the comment.

If it's not obvious what this line does, would it be better to split it into two calls with one arg each?

Yeah. I interpreted the line include_directories(include ${CMAKE_CURRENT_BINARY_DIR}/include) as "add the ${CMAKE_CURRENT_BINARY_DIR}/include to the include-search directory", and didn't realize it covers the <source_dir>/include.

Split it into two or add a comment would make it clearer.

clang-tools-extra/pseudo/include/clang-pseudo/DirectiveMap.h
31

That's fair enough. I'm convinced clang-pseudo is the best option, thanks!

This revision is now accepted and ready to land.Mar 9 2022, 2:44 AM

Thank you for working on this! I have a few thoughts on the renaming, but otherwise strongly support the direction here.

clang/lib/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/lib/*

The usual naming conventions in clang-tools-extra is to use the tool name as the folder it goes in. Based on that, should the folders be clang-tools-extra/clang-pseudo/ instead of clang-tools-extra/pseudo/?

clang/include/clang/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/include/clang-pseudo/*
clang/tools/clang/pseudo/* => clang-tools-extra/pseudo/tool/*
clang/test/Syntax/* => clang-tools-extra/pseudo/test/*

The convention are for clang-tools-extra tests to live in clang-tools-extra/test/<tool> (clangd is using the style you propose here when it probably should have followed the existing conventions).

clang/unittests/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/unittests/*
#include "clang/Tooling/Syntax/Pseudo/*" => #include "clang-pseudo/*"
namespace clang::syntax::pseudo => namespace clang::pseudo
check-clang => check-clang-pseudo
clangToolingSyntaxPseudo => clangPseudo

Thank you for working on this! I have a few thoughts on the renaming, but otherwise strongly support the direction here.

clang/lib/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/lib/*

The usual naming conventions in clang-tools-extra is to use the tool name as the folder it goes in. Based on that, should the folders be clang-tools-extra/clang-pseudo/ instead of clang-tools-extra/pseudo/?

I don't think it's a good naming convention, and am not convinced consistency here is important enough to propagate a bad idea further.
We're in a directory called "clang-tools-extra", so the clang- prefix is redundant in the path (not in the binary name).
It seems like a small thing, but it adds up: the clang-tools-extra directory is ugly, so are lib and include/clang-pseudo, so is having both llvm and llvm-project because of the monorepo.
We have a file named llvm/llvm-project/llvm/include/llvm/Support/InitLLVM.h! People working with this code type these paths often.

(If it helps, I'd be happy to prepare changes to drop the clang- prefixes from any of the directories where they seem obviously redundant - everything except clang-doc and clangd. But I think the much larger burden of establishing consensus needs to fall on people who think that consistency is worth enforcing here).

clang/include/clang/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/include/clang-pseudo/*
clang/tools/clang/pseudo/* => clang-tools-extra/pseudo/tool/*
clang/test/Syntax/* => clang-tools-extra/pseudo/test/*

The convention are for clang-tools-extra tests to live in clang-tools-extra/test/<tool> (clangd is using the style you propose here when it probably should have followed the existing conventions).

We used this layout for several years, and it was painful to navigate and maintain, so it was deliberately changed in b804eef09052cf40e79aa2ed8a23f2f39e2dda1b at which point all the pain went away.

Again, I'd be willing to work on improving this, but I don't want to spend weeks arguing with people about it. My experience is that these conversations are exhausting, negative because people are grumpy about dealing with churn, and in the end there's nobody empowered to say "OK, this is a good idea". See also the idea of using a separate bugtracker, website infra etc for clangd, where all the feedback was extremely negative, it was very difficult to decide to proceed anyway, and these turned out to be large improvements and none of the warnings came true.

Thank you for working on this! I have a few thoughts on the renaming, but otherwise strongly support the direction here.

clang/lib/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/lib/*

The usual naming conventions in clang-tools-extra is to use the tool name as the folder it goes in. Based on that, should the folders be clang-tools-extra/clang-pseudo/ instead of clang-tools-extra/pseudo/?

I don't think it's a good naming convention, and am not convinced consistency here is important enough to propagate a bad idea further.

It's the convention we currently use and I don't think we should deviate from it as a one-off; that only adds confusion, especially once others go to add another project and try to use existing projects as a pattern.

We're in a directory called "clang-tools-extra", so the clang- prefix is redundant in the path (not in the binary name).
It seems like a small thing, but it adds up: the clang-tools-extra directory is ugly, so are lib and include/clang-pseudo, so is having both llvm and llvm-project because of the monorepo.
We have a file named llvm/llvm-project/llvm/include/llvm/Support/InitLLVM.h! People working with this code type these paths often.

I don't disagree with these points, FWIW. :-)

(If it helps, I'd be happy to prepare changes to drop the clang- prefixes from any of the directories where they seem obviously redundant - everything except clang-doc and clangd. But I think the much larger burden of establishing consensus needs to fall on people who think that consistency is worth enforcing here).

The status quo is that we try to be consistent unless there's a compelling reason not to, so I disagree; the onus is on you as to why this is a special snowflake that deserves to be inconsistent with everything else in the project. My suggestion to move forward is: rename to be consistent in this patch so we can land it, then put up an RFC to rename all of the directories in clang-tools-extra at once so that we stay consistent (there's a wee bit extra churn from that, but it means we don't hold up this patch debating directory names). I'd support such an RFC, but I wouldn't support introducing inconsistency here; the rationale for deviation isn't nearly compelling enough to me. Also, because it's an NFC change, I think the RFC is effectively "does someone have a blocking concern that I need to address first, otherwise the NFC change will go in now". It's not so much an RFC as a heads up that the change is coming in case downstreams need to react to it early.

clang/include/clang/Tooling/Syntax/Pseudo/* => clang-tools-extra/pseudo/include/clang-pseudo/*
clang/tools/clang/pseudo/* => clang-tools-extra/pseudo/tool/*
clang/test/Syntax/* => clang-tools-extra/pseudo/test/*

The convention are for clang-tools-extra tests to live in clang-tools-extra/test/<tool> (clangd is using the style you propose here when it probably should have followed the existing conventions).

We used this layout for several years, and it was painful to navigate and maintain, so it was deliberately changed in b804eef09052cf40e79aa2ed8a23f2f39e2dda1b at which point all the pain went away.

And introduced new pain for others when you did so -- I went looking for the clangd tests and it took me a while to figure out that project deviated.

Again, I'd be willing to work on improving this, but I don't want to spend weeks arguing with people about it. My experience is that these conversations are exhausting, negative because people are grumpy about dealing with churn, and in the end there's nobody empowered to say "OK, this is a good idea". See also the idea of using a separate bugtracker, website infra etc for clangd, where all the feedback was extremely negative, it was very difficult to decide to proceed anyway, and these turned out to be large improvements and none of the warnings came true.

Understandable. FWIW, I'd also be fine switching things around so that the test directory is with the tool for all tools (I'd see this as the same kind of NFC change as above). What I'm not fine with is one tool deciding they're special and don't need to be consistent with the rest of the project. These sorts of inconsistencies rarely cause pain for the people who introduce them, but they do cause pain for others who are working on the project as a whole instead of just one tool. As an example of the pain caused by clangd -- not everyone runs tests using a check target. Some folks use python to execute tests directly, and they expect that python llvm-lit.sv -sv clang-tools-extra\tests will run all of the tests for clang-tools-extra. I've broken clangd tests by not running the tests when I thought I had. Thankfully, postcommit CI found the issues pretty quickly, but it still meant a bot stampede for me to deal with when I shouldn't have been able to catch the issue far earlier and with less disruption.

So I absolutely sympathize with the points you raise and I'd be happy to support efforts to improve the situation, but I don't think this patch goes about it the right way.

I understand where you're coming from. But I think agreeing to move the code was premature if it means either:

  • following all the precedents in clang-tools-extra, or
  • taking on the political burden of getting these changed for all projects.

Based on previous interactions, I think we differ on the relative value we place on consistency vs local quality vs burden on development, and our ability to make consensus-based changes on a timeline that feels acceptable.
(This isn't a criticism, I admire your patience and dedication to consistency, I just don't share it).
If consistency is to be the sine qua non, then I think we should probably leave the code where it is until someone's willing to do the work you described. clang/Tooling/Syntax is the most consistent location for the code, and the one proposed in the original RFC last year.

The status quo is that we try to be consistent unless there's a compelling reason not to, so I disagree; the onus is on you as to why this is a special snowflake that deserves to be inconsistent with everything else in the project.

If I understand, you're saying that LLVM in general, or clang-tools-extra in particular values consistency over quality.
Is this a documented policy (where?), consensus (among who?), or historical practice (which would beg the question somewhat).
This is a genuine question - I suspect that you're right, but it's hard to know how to challenge this if it's unclear where it comes from.

I understand where you're coming from. But I think agreeing to move the code was premature if it means either:

  • following all the precedents in clang-tools-extra, or
  • taking on the political burden of getting these changed for all projects.

I agreed based on the understanding that the new code would follow the existing conventions, so perhaps the agreement was a bit premature. But, I'm honestly pretty surprised at this significant of resistance to following the same patterns used by (basically) everything else in the project. We're talking about a directory name and where we put tests; what are the technical challenges here? All of the arguments you've presented seem to be stylistic in nature (not that those kinds of issues aren't important to consider), so I feel like I must be missing something.

Based on previous interactions, I think we differ on the relative value we place on consistency vs local quality vs burden on development, and our ability to make consensus-based changes on a timeline that feels acceptable.
(This isn't a criticism, I admire your patience and dedication to consistency, I just don't share it).
If consistency is to be the sine qua non, then I think we should probably leave the code where it is until someone's willing to do the work you described. clang/Tooling/Syntax is the most consistent location for the code, and the one proposed in the original RFC last year.

If you don't feel like updating the projects to do a consistent rename, that's totally fine; I didn't intend to imply that was expected of you or required for this to proceed.

The status quo is that we try to be consistent unless there's a compelling reason not to, so I disagree; the onus is on you as to why this is a special snowflake that deserves to be inconsistent with everything else in the project.

If I understand, you're saying that LLVM in general, or clang-tools-extra in particular values consistency over quality.

You've not convinced me there's a quality issue here, just a difference in stylistic preference. The directory layout and naming structure has worked well enough thus far (excepting perhaps the test directory for clangd, but I'll be frank: I don't get why that change was needed and it's made my life worse since it was made, so I regret not paying closer attention to that review).

If there's a quality concern here that I'm not yet seeing, I'd like to know more about it. I definitely don't want to prefer consistency over quality.

Is this a documented policy (where?), consensus (among who?), or historical practice (which would beg the question somewhat).
This is a genuine question - I suspect that you're right, but it's hard to know how to challenge this if it's unclear where it comes from.

Not clearly documented anywhere (but sort of is documented; it's how we handle everything else in the coding style guidelines; stick with the conventions used locally as best you can). I would call it a mixture of historical practice and general common practice in the industry. Consistency in interfaces is generally easier on newcomers to a project because it reduces the amount of mental burden to understand the project's architecture. I think part of the issue here is that clang-tools-extra is a dumping ground for projects, so it isn't itself a "project" (in the same sense of LLVM, Clang, lldb, etc). I think that adds some natural tension here because the needs for a random group of projects is hard to get consistency out of as we add more and more projects. It's probably good that we're having this discussion, but I'd prefer if the finer points on policy didn't hold up your patch either.

I was originally concerned about the pseudo parser existing in the tree at all because of maintenance burdens, but those concerns were mitigated by moving the project here to clang-tools-extra. I don't think the directory name being pseudo instead of clang-pseudo will add significant maintenance burden; it'll just be the only project we can't figure out the executable name of from the base directory name (across the entire monorepo, perhaps? I can't find other examples with a quick look). That's.. silly, but not the end of the world or worth holding this patch up over if you insist on not having the clang- prefix that the executable has. But the test directory being in a novel location really does add maintenance burden in the form of making it trivially easy to think you've run tests when you haven't actually done so. This is has already caused me pain with clangd and I don't want to see that being used as precedent to make things worse here.

Can you help me understand the quality concerns you mentioned?

I agreed based on the understanding that the new code would follow the existing conventions, so perhaps the agreement was a bit premature. But, I'm honestly pretty surprised at this significant of resistance to following the same patterns used by (basically) everything else in the project.

I'm not sure I buy the premise ("everything else in the project"). clang-tools-extra isn't really a project: it doesn't have coherent code, goals, or developers.
Each subproject in llvm-project except those under clang-tools-extra has a root directory containing tests etc for the subproject, and makes good use of this!

We're talking about a directory name and where we put tests; what are the technical challenges here? All of the arguments you've presented seem to be stylistic in nature (not that those kinds of issues aren't important to consider), so I feel like I must be missing something.

Right, directory name is a stylistic concern either way AFAICT (redundancy and consistency).

Location of tests is more complicated, and more important. Experience from clangd:

  • A root directory for build system config is important. IIRC the straw that broke the camel's back for rearranging clangd was having a root CMakeLists file to turn it off in one obvious place.
  • The most common development tools (editors, shells, grep, ...) are much harder to use if your working set is several directories rather than one.
  • Linking strangers to the code leaves the tests essentially invisible.
  • It clearly defines the boundaries of the project. Before the move, I don't think most contributors could have answered the question "which directories are part of clangd" with confidence. (I couldn't!)
  • On multiple occasions contributors failed to find one of (unit|lit) tests, because once you're looking in nonobvious places you stop when you find anything. So they put tests in the wrong place.
  • The clang-tools-extra layout doesn't provide test targets for subprojects, which is a dealbreaker; it seems people resort to guessing at llvm-lit invocations instead. We were maintaining check-clangd in clang-tools-extra/test/CMakeLists.txt as a special snowflake, which is presumably also objectionable (it's 35 lines of CMake).

If consistency is to be the sine qua non, then I think we should probably leave the code where it is until someone's willing to do the work you described. clang/Tooling/Syntax is the most consistent location for the code, and the one proposed in the original RFC last year.

If you don't feel like updating the projects to do a consistent rename, that's totally fine; I didn't intend to imply that was expected of you or required for this to proceed.

When you say it's fine, do you mean that it's acceptable to land this patch, or to abandon it, or to land your preferred version that I'm not OK with? :-)

I think we are now is:

  • there was an RFC
  • several months later, code started landing, and there was new discussion (on the RFC thread) including a proposal to move
  • we agreed that a move is OK with us, but are stuck on the details

So if it makes a difference, it seems you're _requesting_ a change here, not approving one.

The directory layout and naming structure has worked well enough thus far

If people finding build/test setup weird and alienating, I think they're more likely to not contribute than to offer clear feedback. I'm not sure how we can tell it works well.
Anecdotally, I do know that I was put off from contributing to clang-tidy by how hard it was to maintain and debug tests, I've lost significant time when I've needed to do this, and I've seen others struggle with it too.
But while this has come up many times in conversation, I've never seen it written down. Does open-source have a taboo about whining about stuff being hard? :-)

I think part of the issue here is that clang-tools-extra is a dumping ground for projects, so it isn't itself a "project" (in the same sense of LLVM, Clang, lldb, etc).

I do agree with this perspective, clang-tools-extra is "just a directory".
This is why it didn't seems like we were doing something wild with clangd: we were just treating it like a "real" subproject under the llvm-project umbrella.
(I'd be happy with moving clangd or clang-pseudo directly under llvm-project if that addresses the concerns, as long as I don't have to chase impossible consensus-via-RFC).

I was originally concerned about the pseudo parser existing in the tree at all because of maintenance burdens, but those concerns were mitigated by moving the project here to clang-tools-extra. I don't think the directory name being pseudo instead of clang-pseudo will add significant maintenance burden; it'll just be the only project we can't figure out the executable name

The executable we're talking about here is essentially a driver for tests, so this is similar to not being able to find that the name of clang/Index or tools/libclang is c-index-test.

But the test directory being in a novel location really does add maintenance burden in the form of making it trivially easy to think you've run tests when you haven't actually done so.

I'm sorry that we broke this workflow, churn is painful. I think it got fixed though!
It's also a fragile workflow which doesn't update dependencies, which is why we don't use it and hadn't tested it.
Are you sure this isn't a symptom of relying on low-level tools because check-clang-tools is too coarse-grained?

This is has already caused me pain with clangd and I don't want to see that being used as precedent to make things worse here.

This feels like a double standard - the precedents you've cited have already caused me pain, too.

aaron.ballman accepted this revision.Mar 9 2022, 1:00 PM

I agreed based on the understanding that the new code would follow the existing conventions, so perhaps the agreement was a bit premature. But, I'm honestly pretty surprised at this significant of resistance to following the same patterns used by (basically) everything else in the project.

I'm not sure I buy the premise ("everything else in the project"). clang-tools-extra isn't really a project: it doesn't have coherent code, goals, or developers.
Each subproject in llvm-project except those under clang-tools-extra has a root directory containing tests etc for the subproject, and makes good use of this!

Yeah, I think we're converging on this point.

We're talking about a directory name and where we put tests; what are the technical challenges here? All of the arguments you've presented seem to be stylistic in nature (not that those kinds of issues aren't important to consider), so I feel like I must be missing something.

Right, directory name is a stylistic concern either way AFAICT (redundancy and consistency).

Agreed.

Location of tests is more complicated, and more important. Experience from clangd:

  • A root directory for build system config is important. IIRC the straw that broke the camel's back for rearranging clangd was having a root CMakeLists file to turn it off in one obvious place.
  • The most common development tools (editors, shells, grep, ...) are much harder to use if your working set is several directories rather than one.
  • Linking strangers to the code leaves the tests essentially invisible.
  • It clearly defines the boundaries of the project. Before the move, I don't think most contributors could have answered the question "which directories are part of clangd" with confidence. (I couldn't!)
  • On multiple occasions contributors failed to find one of (unit|lit) tests, because once you're looking in nonobvious places you stop when you find anything. So they put tests in the wrong place.
  • The clang-tools-extra layout doesn't provide test targets for subprojects, which is a dealbreaker; it seems people resort to guessing at llvm-lit invocations instead. We were maintaining check-clangd in clang-tools-extra/test/CMakeLists.txt as a special snowflake, which is presumably also objectionable (it's 35 lines of CMake).

Thank you for the details here, this helps me understand the concerns a lot better.

If consistency is to be the sine qua non, then I think we should probably leave the code where it is until someone's willing to do the work you described. clang/Tooling/Syntax is the most consistent location for the code, and the one proposed in the original RFC last year.

If you don't feel like updating the projects to do a consistent rename, that's totally fine; I didn't intend to imply that was expected of you or required for this to proceed.

When you say it's fine, do you mean that it's acceptable to land this patch, or to abandon it, or to land your preferred version that I'm not OK with? :-)

Sorry for the confusion, I mostly meant the last point, but without any additional work on your end after the patch lands.

I think we are now is:

  • there was an RFC
  • several months later, code started landing, and there was new discussion (on the RFC thread) including a proposal to move
  • we agreed that a move is OK with us, but are stuck on the details

So if it makes a difference, it seems you're _requesting_ a change here, not approving one.

A bit of both, really. I requested the move, you're doing the move, I didn't approve yet because the move did things I wasn't expecting.

The directory layout and naming structure has worked well enough thus far

If people finding build/test setup weird and alienating, I think they're more likely to not contribute than to offer clear feedback. I'm not sure how we can tell it works well.
Anecdotally, I do know that I was put off from contributing to clang-tidy by how hard it was to maintain and debug tests, I've lost significant time when I've needed to do this, and I've seen others struggle with it too.
But while this has come up many times in conversation, I've never seen it written down. Does open-source have a taboo about whining about stuff being hard? :-)

My measurement for how well it works is how long it's existed without anyone proposing to change it, but that's not really "measuring" so much as "guessing".

I think part of the issue here is that clang-tools-extra is a dumping ground for projects, so it isn't itself a "project" (in the same sense of LLVM, Clang, lldb, etc).

I do agree with this perspective, clang-tools-extra is "just a directory".
This is why it didn't seems like we were doing something wild with clangd: we were just treating it like a "real" subproject under the llvm-project umbrella.
(I'd be happy with moving clangd or clang-pseudo directly under llvm-project if that addresses the concerns, as long as I don't have to chase impossible consensus-via-RFC).

Yeah, I've been thinking about this a bit more and it's causing me to come around to the idea that the initial structure for clang-tools-extra was wrong (especially coupled with some of your points above).

I was originally concerned about the pseudo parser existing in the tree at all because of maintenance burdens, but those concerns were mitigated by moving the project here to clang-tools-extra. I don't think the directory name being pseudo instead of clang-pseudo will add significant maintenance burden; it'll just be the only project we can't figure out the executable name

The executable we're talking about here is essentially a driver for tests, so this is similar to not being able to find that the name of clang/Index or tools/libclang is c-index-test.

... well shoot, I didn't know that... I thought clang-pseudo was a tool users would run to drive the pseudo parser as a REPL tool (kind of like clang-query) as opposed to a project that exists just to drive tests.

But the test directory being in a novel location really does add maintenance burden in the form of making it trivially easy to think you've run tests when you haven't actually done so.

I'm sorry that we broke this workflow, churn is painful. I think it got fixed though!
It's also a fragile workflow which doesn't update dependencies, which is why we don't use it and hadn't tested it.
Are you sure this isn't a symptom of relying on low-level tools because check-clang-tools is too coarse-grained?

Maybe it is? I've been running llvm-lit manually for so many years, it's hard for me to tell, tbh.

This is has already caused me pain with clangd and I don't want to see that being used as precedent to make things worse here.

This feels like a double standard - the precedents you've cited have already caused me pain, too.

Fair point, we're all in pain. :-D Okay, I think you've convinced me of a few things:

  1. The name of the tool directory doesn't matter to me any more now that I know this is a test harness and not an executable users are going to be looking for the source for all that often.
  2. The layout of clang-tools-extra in terms of test directories is problematic and it'd sure be nice if someday someone moved the test directories under the individual tools instead of using a top-level test directory.
  3. Because of #2 and because clangd already started down the path of using a second-level test directory, I'm now okay with the proposed layout in this patch, so giving it my LGTM.

Thank you for bearing with me while we figured this out together!

Thanks Aaron, and my apologies for being easily frustrated.

  1. The layout of clang-tools-extra in terms of test directories is problematic and it'd sure be nice if someday someone moved the test directories under the individual tools instead of using a top-level test directory.

I'd really like to enable this.
I think the blocker is mostly the amount of boilerplate needed to set up test directories.
These required files are mostly boilerplate:

  • test/CMakeLists.txt (deps are meaningful, rest is boilerplate)
  • test/lit.site.cfg.py.in
  • test/lit.cfg.py
  • test/Unit/lit.site.cfg.py.in (this patch puts these under unittest/ instead, but either way)
  • test/Unit/lit.cfg.py

Creating 10 more copies of each of those seems a little irresponsible, especially for tools that are "lightly maintained". So it's hard to say with a straight face that restructuring them all is a good idea, and maybe this motivated the original clang-tools-extra structure.
But if those files were generated/eliminated somehow, restructuring would be very simple patches that didn't add more cruft to maintain.

It's not obvious to me exactly how to do this (lit.* files affect both discovery and configuration) but I'll dig into this more tomorrow.

Thanks Aaron, and my apologies for being easily frustrated.

No worries at all! I always appreciate our discussions, even when we don't initially agree.

  1. The layout of clang-tools-extra in terms of test directories is problematic and it'd sure be nice if someday someone moved the test directories under the individual tools instead of using a top-level test directory.

I'd really like to enable this.
I think the blocker is mostly the amount of boilerplate needed to set up test directories.
These required files are mostly boilerplate:

  • test/CMakeLists.txt (deps are meaningful, rest is boilerplate)
  • test/lit.site.cfg.py.in
  • test/lit.cfg.py
  • test/Unit/lit.site.cfg.py.in (this patch puts these under unittest/ instead, but either way)
  • test/Unit/lit.cfg.py

Creating 10 more copies of each of those seems a little irresponsible, especially for tools that are "lightly maintained". So it's hard to say with a straight face that restructuring them all is a good idea, and maybe this motivated the original clang-tools-extra structure.
But if those files were generated/eliminated somehow, restructuring would be very simple patches that didn't add more cruft to maintain.

It's not obvious to me exactly how to do this (lit.* files affect both discovery and configuration) but I'll dig into this more tomorrow.

That's fair. It could be that we decide that clang-tools-extra is the dumping ground for lightly maintained one-off tools where it makes sense to share docs and tests, and other things that are expected to be actively maintained long-term (clang-tidy, clandg) should live at the top level of the monorepo as stand-alone tools. That may add more tension for where to add new projects though, so it may not be ideal either.

nridge added a subscriber: nridge.Mar 11 2022, 2:13 PM
sammccall updated this revision to Diff 415619.Mar 15 2022, 4:04 PM
sammccall marked 4 inline comments as done.

Rebase and restore conventional test/Unit suite

This revision was landed with ongoing or failed builds.Mar 15 2022, 4:14 PM
This revision was automatically updated to reflect the committed changes.

Hi Sam
i think this breaks our amdgpu buildbot
https://lab.llvm.org/buildbot/#/builders/193/builds/8513

any suggestions?

Sorry, this broke the world and I wasn't watching it carefully enough.
Reverted in 049f4e4eab19c6e468e029232e94ca71245b0f56, trying to work out
why the tests passed locally.

Ah I see what happened - there was one incorrect (and unneeded) dependency left behind, which caused tests to fail if you *didn't* have clang-tools-extra enabled.
Relanding...

thakis added a subscriber: thakis.Mar 15 2022, 5:55 PM

Is there a reason why this can't be part of clang-tools-extra/test and check-clang-tools?

Why is this in a directory called "pseudo" instead of "clang-pseudo"? This makes everything fairly self-inconsistent (pseudo/tool vs clang-pseudo in there, but then also check-clang-pseudo) – I found this pretty confusing.

Reading CMAKE_CURRENT_SOURCE_DIR in the lit.cfg is fairly unusual. clang's unit tests don't do this, for example.

New lit.cfg input files should use the path() abstraction so that relative paths are in the generated lit.cfg output files (which helps for running the tests on a different computer than building them). See a16ba6fea2e554fae465dcaaca1d687d8e83a62e for an example.

Is there a reason why this can't be part of clang-tools-extra/test

This was discussed pretty extensively in the review above.
Other projects across LLVM use proj/{lib,include,unittest,test} and it works well.
clang-tools-extra isn't a project but rather a set of separate projects (few common developers, few changes across them). Mixing the directories of different projects doesn't work well.

I want to make it possible to convert more/all of the directories to separate-project structure, so I'm working on reducing the amount of CMake/lit.cfg boilerplate needed.

and check-clang-tools?

Yeah, we should add this. The narrower check-clang-pseudo target is essential for development though, and creating merged test targets seems to be a nontrivial amount of CMake goop, so I had put it in the cmake/lit cleanup pile, hoping to get to it soon.

So I can understand better, what is this useful for?
My mental model is: buildbots run check-all, people working on project run check-$project, and check-clang-tools is mostly useful if you made a clang change and want to test its rdeps.

Why is this in a directory called "pseudo" instead of "clang-pseudo"? This makes everything fairly self-inconsistent (pseudo/tool vs clang-pseudo in there, but then also check-clang-pseudo) – I found this pretty confusing.

It's in a directory with "clang" in the name, the "clang-" prefix only seems necessary in flat namespaces (in bin/, cmake targets etc).
Unlike the other directories here, this isn't really a command-line tool, but rather a library. The command-line tool is mostly a demo for lit testing. Maybe tool/ isn't the right name for that.

(I think it would be useful to drop the clang- prefixes off the other directories here too, but I'm not going to push too hard for that)

Reading CMAKE_CURRENT_SOURCE_DIR in the lit.cfg is fairly unusual. clang's unit tests don't do this, for example.

New lit.cfg input files should use the path() abstraction so that relative paths are in the generated lit.cfg output files (which helps for running the tests on a different computer than building them). See a16ba6fea2e554fae465dcaaca1d687d8e83a62e for an example.

Thanks. I'd seen that but thought clang-tools-extra didn't have this requirement/wasn't being tested in this way. I'm a bit reluctant to add these without a way to tell if it's doing anything, is there a buildbot to verify this or a script to run?

Is there a reason why this can't be part of clang-tools-extra/test

This was discussed pretty extensively in the review above.
Other projects across LLVM use proj/{lib,include,unittest,test} and it works well.
clang-tools-extra isn't a project but rather a set of separate projects (few common developers, few changes across them). Mixing the directories of different projects doesn't work well.

I want to make it possible to convert more/all of the directories to separate-project structure, so I'm working on reducing the amount of CMake/lit.cfg boilerplate needed.

It seems less work and less toil to make this one new thing look like all the other clang-tools-extra projects than to change all other clang-tools-extra projects to look like this new thing.

Thanks. I'd seen that but thought clang-tools-extra didn't have this requirement/wasn't being tested in this way. I'm a bit reluctant to add these without a way to tell if it's doing anything, is there a buildbot to verify this or a script to run?

Open the generated lit.site.cfg and verify that paths in there are relative. Check that check-foo still passes.

bjope added a subscriber: bjope.Apr 4 2022, 2:51 AM

Have had problems with failing stage 2 builds since this patch:

FAILED: tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o 
/repo//install/stage2/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/extra/pseudo/unittests -I/repo//llvm-project/clang-tools-extra/pseudo/unittests -I/repo//llvm-project/clang/include -Itools/clang/include -Iinclude -I/repo//llvm-project/llvm/include -I/repo//llvm-project/clang-tools-extra/pseudo/include -Itools/clang/tools/extra/pseudo/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG    -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o -MF tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o.d -o tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o -c /repo//llvm-project/clang-tools-extra/pseudo/unittests/DirectiveMapTest.cpp
/repo//llvm-project/clang-tools-extra/pseudo/unittests/DirectiveMapTest.cpp:16:10: fatal error: 'gmock/gmock.h' file not found
#include "gmock/gmock.h"
         ^~~~~~~~~~~~~~~
1 error generated.

Is there some dependency to googletest missing somewhere?

(Unless someone not just knows about these things, then I guess I need to debug our build setup a bit more to give more details about it.)

Have had problems with failing stage 2 builds since this patch:

FAILED: tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o 
/repo//install/stage2/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/extra/pseudo/unittests -I/repo//llvm-project/clang-tools-extra/pseudo/unittests -I/repo//llvm-project/clang/include -Itools/clang/include -Iinclude -I/repo//llvm-project/llvm/include -I/repo//llvm-project/clang-tools-extra/pseudo/include -Itools/clang/tools/extra/pseudo/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG    -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o -MF tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o.d -o tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o -c /repo//llvm-project/clang-tools-extra/pseudo/unittests/DirectiveMapTest.cpp
/repo//llvm-project/clang-tools-extra/pseudo/unittests/DirectiveMapTest.cpp:16:10: fatal error: 'gmock/gmock.h' file not found
#include "gmock/gmock.h"
         ^~~~~~~~~~~~~~~
1 error generated.

Is there some dependency to googletest missing somewhere?

(Unless someone not just knows about these things, then I guess I need to debug our build setup a bit more to give more details about it.)

The -I is supposed to be added by t

Have had problems with failing stage 2 builds since this patch:

FAILED: tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o 
/repo//install/stage2/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/tools/extra/pseudo/unittests -I/repo//llvm-project/clang-tools-extra/pseudo/unittests -I/repo//llvm-project/clang/include -Itools/clang/include -Iinclude -I/repo//llvm-project/llvm/include -I/repo//llvm-project/clang-tools-extra/pseudo/include -Itools/clang/tools/extra/pseudo/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -flto=thin -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG    -Wno-variadic-macros -Wno-gnu-zero-variadic-macro-arguments -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o -MF tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o.d -o tools/clang/tools/extra/pseudo/unittests/CMakeFiles/ClangPseudoTests.dir/DirectiveMapTest.cpp.o -c /repo//llvm-project/clang-tools-extra/pseudo/unittests/DirectiveMapTest.cpp
/repo//llvm-project/clang-tools-extra/pseudo/unittests/DirectiveMapTest.cpp:16:10: fatal error: 'gmock/gmock.h' file not found
#include "gmock/gmock.h"
         ^~~~~~~~~~~~~~~
1 error generated.

Is there some dependency to googletest missing somewhere?

(Unless someone not just knows about these things, then I guess I need to debug our build setup a bit more to give more details about it.)

The way this is meant to work is:

  • pseudo/unittests/CMakeLists.txt calls add_unittest
  • add_unittest (in AddLLVM.cmake) adds a dependency on llvm_gtest
  • llvm_gtest is defined in llvm/utils/unittest/CMakeLists.txt, and specifies target_include_directories(llvm_gtest PUBLIC ...) to add the directories

As far as I can tell this is how other things in clang and clang-tools-extra work

Maybe your configuration is just not building unittests and we're not respecting that setting? I'll try making that conditional...

Have had problems with failing stage 2 builds since this patch:

Should be fixed in 72ae6cc3a689869dcc15cfa8d0abcdd35a35a484, please let me know if not.

bjope added a comment.Apr 4 2022, 6:44 AM

Have had problems with failing stage 2 builds since this patch:

Should be fixed in 72ae6cc3a689869dcc15cfa8d0abcdd35a35a484, please let me know if not.

Yes, you are right. Fantastic finding without me even providing the cmake command.
But it was true that for the failing stage we had disabled tests using -DLLVM_INCLUDE_TESTS=OFF which result in CLANG_INCLUDE_TESTS being OFF as well. So your fix to respect the CLANG_INCLUDE_TESTS setting does indeed solve the problem. Thanks alot for the help!

Have had problems with failing stage 2 builds since this patch:

Should be fixed in 72ae6cc3a689869dcc15cfa8d0abcdd35a35a484, please let me know if not.

Yes, you are right. Fantastic finding without me even providing the cmake command.
But it was true that for the failing stage we had disabled tests using -DLLVM_INCLUDE_TESTS=OFF which result in CLANG_INCLUDE_TESTS being OFF as well. So your fix to respect the CLANG_INCLUDE_TESTS setting does indeed solve the problem. Thanks alot for the help!

No worries, sorry for the break!

clang/lib/Tooling/Syntax/Pseudo/README.md