This is an archive of the discontinued LLVM Phabricator instance.

[clang][ASTMatchers] Add a few type-related Matchers
AbandonedPublic

Authored by danix800 on Aug 25 2023, 11:54 AM.

Details

Summary

Add a few type-related Matchers

  • bitIntType
  • constantMatrixType
  • dependentAddressSpaceType
  • dependentBitIntType
  • dependentSizedMatrixType
  • dependentVectorType
  • hasTypeForDecl
  • objcTypeParamDecl
  • objcTypeParamType
  • pipeType
  • vectorType

Diff Detail

Event Timeline

danix800 created this revision.Aug 25 2023, 11:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2023, 11:54 AM
Herald added a subscriber: ChuanqiXu. · View Herald Transcript
danix800 requested review of this revision.Aug 25 2023, 11:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 25 2023, 11:54 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
PiotrZSL accepted this revision.Aug 25 2023, 1:28 PM

LGTM.
Nothing fancy, looks straight forward.
You may want to wait for an Aaron Ballman opinion.

clang/docs/LibASTMatchersReference.html
2454

Maybe: "Matches a fixed int type of a specified bitwidth."

This revision is now accepted and ready to land.Aug 25 2023, 1:28 PM

LGTM.
Nothing fancy, looks straight forward.
You may want to wait for an Aaron Ballman opinion.

Thanks for the reviewing.

Ping @aaron.ballman, could you please take a look at this revision?

danix800 updated this revision to Diff 553697.Aug 25 2023, 8:50 PM

Update docs for bitIntType & dependentBitIntType.

danix800 marked an inline comment as done.Aug 25 2023, 8:51 PM
danix800 updated this revision to Diff 553779.Aug 26 2023, 8:44 PM

Update doc: fix improper naming in objc testcase

Are these matchers going to be used in-tree (by clang-tidy, or something else)? We typically do not add new AST matches until there's a need for them because the AST matchers have a pretty big impact on build times of Clang itself.

clang/include/clang/ASTMatchers/ASTMatchers.h
3971

This could use some example code that demonstrates usage.

3974

I think this needs to go through ASTContext::getTypeDeclType() for correctness, as the ASTContext is what manages the type cache and performs extra work: https://github.com/llvm/llvm-project/blob/45eb6026d979e306a12ea9e223852b28c3c9edbf/clang/include/clang/AST/ASTContext.h#L1561

7061

How about: Matches a '_BitInt' or '_ExtInt' type.

7068

Drop the i (it only matches the type, not the whole declaration).

7078

Seems like some type/declaration confusion here as well -- it matches the underlying type used by the typedef declaration of X. Same applies below.

danix800 added a comment.EditedAug 28 2023, 5:29 PM

Are these matchers going to be used in-tree (by clang-tidy, or something else)? We typically do not add new AST matches until there's a need for them because the AST matchers have a pretty big impact on build times of Clang itself.

They are used in ASTImporter testcases as shown in https://reviews.llvm.org/D158948. Though this might not be a strong reason to
add these matchers and bring on too much bad impact.

ASTImporter is more urgent since we still lack support for some of the AST nodes so I considered adding them batchly and started with
type-related nodes. I thought that matchers're OK to use in unittests as actually they are used a lot there, but I wasn't aware of the impact
on build of Clang.

If not acceptable, I'm OK with it. We can still test importing with other ways. Matchers are not mandatory.

EDIT: If acceptable, tools like clang-query can benefit from these matchers too, easy for
debugging I guess. Please let me known whether I should proceed on this revision or not. Thanks for reviewing anyway!

Are these matchers going to be used in-tree (by clang-tidy, or something else)? We typically do not add new AST matches until there's a need for them because the AST matchers have a pretty big impact on build times of Clang itself.

They are used in ASTImporter testcases as shown in https://reviews.llvm.org/D158948. Though this might not be a strong reason to
add these matchers and bring on too much bad impact.

ASTImporter is more urgent since we still lack support for some of the AST nodes so I considered adding them batchly and started with
type-related nodes. I thought that matchers're OK to use in unittests as actually they are used a lot there, but I wasn't aware of the impact
on build of Clang.

If not acceptable, I'm OK with it. We can still test importing with other ways. Matchers are not mandatory.

EDIT: If acceptable, tools like clang-query can benefit from these matchers too, easy for
debugging I guess. Please let me known whether I should proceed on this revision or not. Thanks for reviewing anyway!

Pulling in a few more folks for extra opinions in case I'm off-base, but I think a good approach for right now is to add these matches to the unit test file directly in the AST importer patch and to hold off on landing this one. Then you still get the test coverage you need in the unit tests, but the extra compilation time is limited to just one .cpp file and not everything including ASTMatchers.h. As we find a need for the matchers, we can lift them from the unit test into the public matchers. It's not that there's no utility to these (there is!), it's more just the balancing act between compile times and utility.

Are these matchers going to be used in-tree (by clang-tidy, or something else)? We typically do not add new AST matches until there's a need for them because the AST matchers have a pretty big impact on build times of Clang itself.

They are used in ASTImporter testcases as shown in https://reviews.llvm.org/D158948. Though this might not be a strong reason to
add these matchers and bring on too much bad impact.

ASTImporter is more urgent since we still lack support for some of the AST nodes so I considered adding them batchly and started with
type-related nodes. I thought that matchers're OK to use in unittests as actually they are used a lot there, but I wasn't aware of the impact
on build of Clang.

If not acceptable, I'm OK with it. We can still test importing with other ways. Matchers are not mandatory.

EDIT: If acceptable, tools like clang-query can benefit from these matchers too, easy for
debugging I guess. Please let me known whether I should proceed on this revision or not. Thanks for reviewing anyway!

Pulling in a few more folks for extra opinions in case I'm off-base, but I think a good approach for right now is to add these matches to the unit test file directly in the AST importer patch and to hold off on landing this one. Then you still get the test coverage you need in the unit tests, but the extra compilation time is limited to just one .cpp file and not everything including ASTMatchers.h. As we find a need for the matchers, we can lift them from the unit test into the public matchers. It's not that there's no utility to these (there is!), it's more just the balancing act between compile times and utility.

This is a sad state of affairs :-(
Not having "obvious" matchers for some types/properties in the AST is a real problem inhibiting use of matchers IME. People will use fragile hacks or give up entirely rather than write custom matchers, and missing base matchers for certain types is perceived as a bug.
To have someone recognize a gap & contribute matchers, and not accept the patch because matchers are too slow to compile, is pretty painful.
But the compile time issue is real!

A few ideas:

  • could we move these (and possibly other rarely-used matchers) to ObscureMatchers.h or something? It's definitely ugly, but maybe not as bad as missing them entirely.
  • seems like we should extract an ASTMatchersBase.h or so to separate "matcher framework" from "all the slow matchers" so headers that expose matcher-based APIs don't have to transitively expose everything, reducing the cost of such changes
  • (long shot) maybe we can work out which code patterns are particularly terrible for parse times and change them

Are these matchers going to be used in-tree (by clang-tidy, or something else)? We typically do not add new AST matches until there's a need for them because the AST matchers have a pretty big impact on build times of Clang itself.

They are used in ASTImporter testcases as shown in https://reviews.llvm.org/D158948. Though this might not be a strong reason to
add these matchers and bring on too much bad impact.

ASTImporter is more urgent since we still lack support for some of the AST nodes so I considered adding them batchly and started with
type-related nodes. I thought that matchers're OK to use in unittests as actually they are used a lot there, but I wasn't aware of the impact
on build of Clang.

If not acceptable, I'm OK with it. We can still test importing with other ways. Matchers are not mandatory.

EDIT: If acceptable, tools like clang-query can benefit from these matchers too, easy for
debugging I guess. Please let me known whether I should proceed on this revision or not. Thanks for reviewing anyway!

Pulling in a few more folks for extra opinions in case I'm off-base, but I think a good approach for right now is to add these matches to the unit test file directly in the AST importer patch and to hold off on landing this one. Then you still get the test coverage you need in the unit tests, but the extra compilation time is limited to just one .cpp file and not everything including ASTMatchers.h. As we find a need for the matchers, we can lift them from the unit test into the public matchers. It's not that there's no utility to these (there is!), it's more just the balancing act between compile times and utility.

This is a sad state of affairs :-(

Agreed, but it's also not a new state of affairs; we've held this position for as long as we've had AST matchers. (For reference, ASTMatchers.h creates so many symbols we needed to enable /bigobj just to be able to generate an object file that could represent it -- you use that option when you have > 65k addressable sections in an object file!)

Not having "obvious" matchers for some types/properties in the AST is a real problem inhibiting use of matchers IME. People will use fragile hacks or give up entirely rather than write custom matchers, and missing base matchers for certain types is perceived as a bug.
To have someone recognize a gap & contribute matchers, and not accept the patch because matchers are too slow to compile, is pretty painful.
But the compile time issue is real!

Agreed. :-(

A few ideas:

  • could we move these (and possibly other rarely-used matchers) to ObscureMatchers.h or something? It's definitely ugly, but maybe not as bad as missing them entirely.
  • seems like we should extract an ASTMatchersBase.h or so to separate "matcher framework" from "all the slow matchers" so headers that expose matcher-based APIs don't have to transitively expose everything, reducing the cost of such changes
  • (long shot) maybe we can work out which code patterns are particularly terrible for parse times and change them

The problem is the pile of templates that is used to generate the DSL and used for dynamic matchers by clang-query. IIRC, we looked into reducing the overhead and eventually punted on it, some background can be found here: https://reviews.llvm.org/D84038 when we finally decided to enable /bigobj everywhere. I know we did a pass to split ASTMatchers.h up so we now have ASTMatchersInternal.h, so I think we might have done about as much extraction as we can (though it'd be worth seeing if there's more we can do now that we've got access to C++17 in source; I think we were still on C++11 when we last did this).

Ideally, I would love to table generate more of the AST matchers from source such as https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/TypeNodes.td and https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/DeclNodes.td so that we automatically get all of the type and declaration matchers (at least the top-level matchers) without any extra maintenance overhead to keep them in sync. But I've always had the impression this idea was dead in the water because of overhead. I suspect moving things around into multiple headers won't help all that much (and it will likely cause confusion as to what header file to put any given AST matcher into) because Clang itself will still need to include all of those headers (and thus will still hit all the compile time and object size problems).

Thanks for the history, I wasn't aware this was such a big deal for such a long time!

A few ideas:

  • could we move these (and possibly other rarely-used matchers) to ObscureMatchers.h or something? It's definitely ugly, but maybe not as bad as missing them entirely.
  • seems like we should extract an ASTMatchersBase.h or so to separate "matcher framework" from "all the slow matchers" so headers that expose matcher-based APIs don't have to transitively expose everything, reducing the cost of such changes
  • (long shot) maybe we can work out which code patterns are particularly terrible for parse times and change them

The problem is the pile of templates that is used to generate the DSL and used for dynamic matchers by clang-query. IIRC, we looked into reducing the overhead and eventually punted on it, some background can be found here: https://reviews.llvm.org/D84038 when we finally decided to enable /bigobj everywhere.

Thanks, I'll take a look and see if C++17 gives us new tools, but seems likely it's just too complicated.
Clearly the answer here is to force everyone to use the dynamic matchers and string literals. (Only half kidding - if we didn't try to express all constraints at compile time this would all be so much easier...)

I know we did a pass to split ASTMatchers.h up so we now have ASTMatchersInternal.h, so I think we might have done about as much extraction as we can

The thing I think is possible is to have a header that defines DeclMatcher, BoundNodes, anything etc without including all the concrete matchers. This is the header that ASTMatchFinder.h should include.

(At the moment we have some insane include chains like Analysis/FlowSensitive/DataflowAnalysis.h => MatchSwitch.h (for TransferState) => ASTMatchers.h` (for anything etc). Thus the main header of an ~unrelated library exports all matcher types).

Ideally, I would love to table generate more of the AST matchers ... Clang itself will still need to include all of those headers (and thus will still hit all the compile time and object size problems).

That would be great. For compile times, I'd expect we could include fewer matchers into fewer TUs - very little of clang should be using most of the matchers, and the sum of TU compile times is usually somewhat more important than the max.
I have less understanding of the object size issues. My understanding is that the current matchers are written ~entirely as templates in headers, the API surface is that way to achieve the decidedly un-C++-like DSL, but the implementation just because it's convenient. Fundamentally it doesn't seem like basic node-type matchers should have to instantiate many templates!

I did some testing on my machine (host compiler clang-14), and I'm not sure the specifics justify the level of caution here:

ASTMatchers.h parses in 6.5s (with clang-check)
If I comment out the code + internal ASTMatcher includes, then just the deps (mostly AST headers) parses in 4.9s.
1.6s for matchers impl is slow but not build-breakingly so. I don't think we should be scared of the marginal parse time increase from this change. (ASTMatchers is not in all that many TUs anyhow - no "core" headers re-export it).

The bug referenced is about object size. The connection with matchers seems to be that matcher tests had their section limit increased, before this applied to the whole project.
However this is specifically for tests that test most of the matchers, you only pay for what you use.
I verified that a cpp file that includes ASTMatchers.h and does nothing else generates a trivial-sized object file (<1kb). So again, I don't think we need to fear this change: if it's used in many places, it's justified, and if it's not used, it's not expensive.

(If compilation of files like NodeMatchersTest.cpp are significant build bottlenecks, we'd need to address that, but *those* would be fine to split up in unprincipled ways, so I'm not too concerned)

I did some testing on my machine (host compiler clang-14), and I'm not sure the specifics justify the level of caution here:

Thank you for doing some research on this!

ASTMatchers.h parses in 6.5s (with clang-check)
If I comment out the code + internal ASTMatcher includes, then just the deps (mostly AST headers) parses in 4.9s.
1.6s for matchers impl is slow but not build-breakingly so. I don't think we should be scared of the marginal parse time increase from this change. (ASTMatchers is not in all that many TUs anyhow - no "core" headers re-export it).

Out of curiosity, are you testing on Windows with MSVC? My understanding on build time performance impacts was that it's debug builds with MSVC that get hit especially hard. (I can test that setup but I could use some help figuring out how you tested compile times so I can do a similar test instead of something totally different from what you tried.)

It would be fantastic if the build time issues were not a problem!

The bug referenced is about object size. The connection with matchers seems to be that matcher tests had their section limit increased, before this applied to the whole project.
However this is specifically for tests that test most of the matchers, you only pay for what you use.
I verified that a cpp file that includes ASTMatchers.h and does nothing else generates a trivial-sized object file (<1kb). So again, I don't think we need to fear this change: if it's used in many places, it's justified, and if it's not used, it's not expensive.

(If compilation of files like NodeMatchersTest.cpp are significant build bottlenecks, we'd need to address that, but *those* would be fine to split up in unprincipled ways, so I'm not too concerned)

FWIW, we had to increase the limits on various files until we finally broke down and did it project-wide. So it wasn't just limited to testing files having the issue; I think the big problem was templated symbols from instantiations. But as you say, if it's pay-for-what-you-use, that's pretty reasonable.

CC @PiotrZSL @LegalizeAdulthood @carlosgalvezp as folks who work on clang-tidy, which uses AST matchers all over the place. They may also have some good insights or preferences here.

Out of curiosity, are you testing on Windows with MSVC? My understanding on build time performance impacts was that it's debug builds with MSVC that get hit especially hard. (I can test that setup but I could use some help figuring out how you tested compile times so I can do a similar test instead of something totally different from what you tried.)

I'm not :-( I don't have a windows machine, and the VM I used to have has bitrotted beyond repair (not sure how useful benchmarks from there would be anyway).

This is with clang on linux, just running time clang-check-14 include/clang/ASTMatchers/ASTMatchers.h a few times.
The "deps only" test was #if 0 ing out all the code in ASTMatcher.h, and commenting out the ASTMatchers/* includes, and repeating the same.
For object files I just added a dummy ASTMatchers.cpp with only #include "clang/ASTMatchers/ASTMatchers.h".

I guess the equivalent of clang-check would be /Zs, not sure the best way to feed it the right flags...

danix800 added a comment.EditedAug 29 2023, 6:25 PM

Out of curiosity, are you testing on Windows with MSVC? My understanding on build time performance impacts was that it's debug builds with MSVC that get hit especially hard. (I can test that setup but I could use some help figuring out how you tested compile times so I can do a similar test instead of something totally different from what you tried.)

I'm not :-( I don't have a windows machine, and the VM I used to have has bitrotted beyond repair (not sure how useful benchmarks from there would be anyway).

This is with clang on linux, just running time clang-check-14 include/clang/ASTMatchers/ASTMatchers.h a few times.
The "deps only" test was #if 0 ing out all the code in ASTMatcher.h, and commenting out the ASTMatchers/* includes, and repeating the same.
For object files I just added a dummy ASTMatchers.cpp with only #include "clang/ASTMatchers/ASTMatchers.h".

I guess the equivalent of clang-check would be /Zs, not sure the best way to feed it the right flags...

Thanks for the testing. I can do the windows part with MSVC cl.exe.

  1. ASTMatchers.h: ~3.5s
  2. "Deps only": ~3.2s
  3. ASTMatchers.cpp obj: 8K
$ uname -a
MINGW64_NT-10.0-20348 WIN-LH267487PAC 3.3.3-341.x86_64 2022-01-17 11:45 UTC x86_64 Msys

$ echo $SHELL
/usr/bin/bash

$ cat test.ps1
& "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64\cl.exe" /permissive- /MP /we"4238" /GS /Zc:preprocessor /W4 /wd"4141" /wd"4146" /wd"4244" /wd"4267" /wd"4291" /wd"4351" /wd"4456" /wd"4457" /wd"4458" /wd"4459" /wd"4503" /wd"4624" /wd"4722" /wd"4100" /wd"4127" /wd"4512" /wd"4505" /wd"4610" /wd"4510" /wd"4702" /wd"4245" /wd"4706" /wd"4310" /wd"4701" /wd"4703" /wd"4389" /wd"4611" /wd"4805" /wd"4204" /wd"4577" /wd"4091" /wd"4592" /wd"4319" /wd"4709" /wd"5105" /wd"4324" /Zc:wchar_t /I"C:\Users\Administrator\source\repos\llvm-project\build-main-vs2022\tools\clang\lib\ASTMatchers" /I"C:\Users\Administrator\source\repos\llvm-project\clang\lib\ASTMatchers" /I"C:\Users\Administrator\source\repos\llvm-project\clang\include" /I"C:\Users\Administrator\source\repos\llvm-project\build-main-vs2022\tools\clang\include" /I"C:\Users\Administrator\source\repos\llvm-project\build-main-vs2022\include" /I"C:\Users\Administrator\source\repos\llvm-project\llvm\include" /Gm- /O2 /Ob2 /Zc:inline /fp:precise /U"NDEBUG" /D "_UNICODE" /D "UNICODE" /D "WIN32" /D "_WINDOWS" /D "_GLIBCXX_ASSERTIONS" /D "_LIBCPP_ENABLE_ASSERTIONS" /D "_CRT_SECURE_NO_DEPRECATE" /D "_CRT_SECURE_NO_WARNINGS" /D "_CRT_NONSTDC_NO_DEPRECATE" /D "_CRT_NONSTDC_NO_WARNINGS" /D "_SCL_SECURE_NO_DEPRECATE" /D "_SCL_SECURE_NO_WARNINGS" /D "__STDC_CONSTANT_MACROS" /D "__STDC_FORMAT_MACROS" /D "__STDC_LIMIT_MACROS" /D CMAKE_INTDIR=\"Release\" /errorReport:prompt /WX- /Zc:forScope /GR /Gd /Oi /MD /std:c++17 /EHsc /nologo /Fp"obj.clangASTMatchers.dir\Release\obj.clangASTMatchers.pch" /diagnostics:column /Zc:__cplusplus /bigobj -w14062 /Gw /I"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include" /I"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\atlmfc\include" /I"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" /I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt" /I"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\UnitTest\include" /I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\um" /I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared" /I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\winrt" /I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\cppwinrt" /c ..\clang\include\clang\ASTMatchers\ASTMatchers.h /TP

 $ time powershell ./test.ps1
ASTMatchers.h

real    0m3.143s
user    0m0.015s
sys     0m0.000s

EDIT: Obj size is incorrect, sorry!

# incorrect
$ du -h ASTMatchers.obj
8.0K    ASTMatchers.obj

$ du -b ASTMatchers.obj
5283    ASTMatchers.obj

About 5.3k.

From a mine perspective when it comes to clang-tidy:

  • I personally do not care about Objective-C specific matchers - I do not plan to use them.
  • I personally avoid using "Type" based matchers if I can (due to mess with elaborated/wrapped types), it's easier to create custom local matcher for QualType and do there anything is needed. QualType/Type class got very helpful API.
  • There is nothing more annoying than not having a main matcher for specific Stmt, Decl, Expr, TypeLoc (places that actually got SourceLocation) and some basic matchers for traversal into parents/childs.
  • There are lot of local matchers defined in Clang-tidy, those would be a best candidates to push upwards.
  • I'm missing some matchers related to AST traversal: noneOf, equalOrDescendant, hasFirstAncestor(PositiveMatcher, NegativeMatcher), isBeforeInTransationUnit, isInSameScopeAsBoundNode, ...

As for matchers in this change, except maybe hasTypeForDecl, I do not think that I will be using them.

From a mine perspective when it comes to clang-tidy:

  • I personally do not care about Objective-C specific matchers - I do not plan to use them.
  • I personally avoid using "Type" based matchers if I can (due to mess with elaborated/wrapped types), it's easier to create custom local matcher for QualType and do there anything is needed. QualType/Type class got very helpful API.
  • There is nothing more annoying than not having a main matcher for specific Stmt, Decl, Expr, TypeLoc (places that actually got SourceLocation) and some basic matchers for traversal into parents/childs.
  • There are lot of local matchers defined in Clang-tidy, those would be a best candidates to push upwards.
  • I'm missing some matchers related to AST traversal: noneOf, equalOrDescendant, hasFirstAncestor(PositiveMatcher, NegativeMatcher), isBeforeInTransationUnit, isInSameScopeAsBoundNode, ...

As for matchers in this change, except maybe hasTypeForDecl, I do not think that I will be using them.

All right, I'll abandon this revison. Thanks for reviewing.

danix800 abandoned this revision.Aug 29 2023, 11:21 PM