Page MenuHomePhabricator

[AST] Add generator for source location introspection
ClosedPublic

Authored by steveire on Dec 12 2020, 10:24 AM.

Details

Summary

Generate a json file containing descriptions of AST classes and their
public accessors which return SourceLocation or SourceRange.

Use the JSON file to generate a C++ API and implementation for accessing
the source locations and method names for accessing them for a given AST
node.

This new API can be used to implement 'srcloc' output in clang-query:

http://ce.steveire.com/z/m_kTIo

In this first version of this feature, only the accessors for Stmt
classes are generated, not Decls, TypeLocs etc. Those can be added
after this change is reviewed, as this change is mostly about
infrastructure of these code generators.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
njames93 added inline comments.Feb 27 2021, 3:06 PM
clang/include/clang/Tooling/NodeIntrospection.h
55

Should this (and potential a few others) be moved to an implementation file.

64

Is this more readable, IDK, but it sure as hell is more fun :)

66–67
clang/lib/Tooling/DumpTool/APIData.h
22

While there is a proposal in place, right now we should ensure we aren't deviating from the current system in patches. Unfortunately readability-identifier-naming has been disabled on clang directory due to excessive violations. May I suggest re-enabling it locally and then running clang-tidy-diff.py over the patch. should square most things up.

clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
23–26

Why is this a variable, a templated function should do the same thing.
I imagine its something like this.

46

This seems dangerous, the MatchASTConsumer doesn't own its MatchFinder, so this is going to leak.

clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.h
2

Can you add a modeline here, same goes for other headers.

15

Any reason for empty lines between includes?

clang/lib/Tooling/DumpTool/ClangSrcLocDump.cpp
81
87–89

Another point, is there any use case of this outside of clang-query. If not would it not be wise to move this infrastructure to clang-tools-extra/clang-query?

steveire updated this revision to Diff 326987.Feb 28 2021, 11:07 AM
steveire marked 7 inline comments as done.

Update

steveire marked 3 inline comments as done.Feb 28 2021, 11:07 AM
steveire added inline comments.
clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
23–26

That seems far more noisy. I've left it as a lambda and moved it to the point of use.

steveire marked 2 inline comments as done.Feb 28 2021, 11:10 AM
steveire added inline comments.
clang/include/clang/Tooling/NodeIntrospection.h
55

The implementation file is generated by the python script. Rather than hiding this in the python script, I think it's better to leave it here.

steveire updated this revision to Diff 326997.Feb 28 2021, 2:13 PM
steveire marked an inline comment as done.

Update

This is almost ready but a few more points need addressing.
Running clang-format over the inc file is pointless and just extends compilation time while adding an unnecessary dependency on clang-format.
The inc file should likely live in the include build directory, All tablegen files seem to live in there. You could either move the CMake code that generates it into the include directory, or alter the directory, this should do that if you want and its safer than replace as it would only change the last /lib/ detected.

# Replace the last lib component of the current binary directory with include
string(FIND ${CMAKE_CURRENT_BINARY_DIR} "/lib/" PATH_LIB_START REVERSE)
if(PATH_LIB_START EQUAL -1)
  message(FATAL_ERROR "Couldn't find lib component in binary directory")
endif()
math(EXPR PATH_LIB_END "${PATH_LIB_START}+5")
string(SUBSTRING ${CMAKE_CURRENT_BINARY_DIR} 0 ${PATH_LIB_START} PATH_HEAD)
string(SUBSTRING ${CMAKE_CURRENT_BINARY_DIR} ${PATH_LIB_END} -1 PATH_TAIL)
string(CONCAT BINARY_INCLUDE_DIR ${PATH_HEAD} "/include/" ${PATH_TAIL})

After moving it to the Include output folder, In the cpp file you would need #include "clang/Tooling/NodeIntrospection.inc".
This would also remove a lot of those commands in lib/tooling/CMakeLists.txt.
Tablegen has a command line option --write-if-changed It may be wise to also include that in your generator script instead of using copy-if-different in the aforementioned CMakeLists.txt.

clang/include/clang/Tooling/NodeIntrospection.h
17–18

These should be quoted includes

clang/lib/Tooling/CMakeLists.txt
29

It may be wise to use the COMMENT argument to let the users know that it's building the ASTNodeAPI.json.

56

Likewise a comment to say building NodeIntrospection.inc.

128

This shouldn't appear in the source list.

clang/lib/Tooling/DumpTool/APIData.h
11

Header guard should be LLVM_CLANG_LIB_TOOLING_DUMPTOOL_APIDATA_H

clang/lib/Tooling/NodeIntrospection.cpp
14–16

Quoted includes and the NodeIntrospection.h include is the MainFileInclude so should appear first.

steveire marked 9 inline comments as done.Mar 10 2021, 2:35 PM
steveire added inline comments.
clang/lib/Tooling/CMakeLists.txt
128

We need to tell CMake the dependency.

steveire marked an inline comment as done.Mar 10 2021, 2:36 PM
njames93 accepted this revision.Mar 10 2021, 2:37 PM

nit: A few reformat hints

This revision is now accepted and ready to land.Mar 10 2021, 2:37 PM
steveire marked an inline comment as done.Mar 10 2021, 2:37 PM
This revision was landed with ongoing or failed builds.Mar 10 2021, 2:39 PM
This revision was automatically updated to reflect the committed changes.

@thakis Presumably you'll have to update the GN build now.

A few more high-level questions:

  • What's the point of the intermediary json file? Why not generate the final c++ directly? (As far as I can tell, this wasn't discussed during the review yet)
  • Do we need to generate code for this at all? Could this be done via xmacros or tablegen?

Having a bespoke custom python -> json -> python -> c++ pipeline here seems like it's fairly different from how the rest of clang does things, and it seems like it duplicates some of the existing tooling we have here.

(Having said that, I'm no code owner here -- @rsmith is. Maybe he has an opinion.)

Lower-level: Did you see all the comments on https://reviews.llvm.org/rGd627a27d264b47eda3f15f086ff419dfe053ebf7 ? This relanded with them unaddressed. Please address them in a follow-up. (Sorry for leaving the comments on the commit instead of the review!)

clang/lib/Tooling/CMakeLists.txt
31

Putting this in the root of the build dir seems a bit untidy. I think CMAKE_CURRENT_BINARY_DIR is what we usually use for generated files.

74

...like used here. Why generated/? Everything in CMAKE_CURRENT_BINARY_DIR is generated. (Compare to find . -name '*.inc')

Also, why not make the python script write the file only if changed instead of making a copy here? (like llvm-tblgen does)

Hello. Does this work when the default target triple isn't native? This seems to be trying to compile clang sources with the just built clang - something that I don't think is always possible. I'm seeing errors like fatal error: 'cstddef' file not found, and failing to link the new IntrospectionTests, with undefined references to NodeIntrospection::GetLocations, as a full toolchain is not setup. I don't believe the buildbots are working right now, so it's difficult to see if any other systems have similar problems.

Also some of the files here have been added with a University of Illinois Open Source License. They should presumably be using the newer Apache license now.

A few more high-level questions:

  • What's the point of the intermediary json file? Why not generate the final c++ directly? (As far as I can tell, this wasn't discussed during the review yet)

It came up in review earlier: https://reviews.llvm.org/D93164#2456181

  • Do we need to generate code for this at all? Could this be done via xmacros or tablegen?

Can you say more? Would this require generating the declarations in include/clang/AST? It sounds like a large maintenance burden, but maybe I'm missing something.

Having a bespoke custom python -> json -> python -> c++ pipeline here seems like it's fairly different from how the rest of clang does things, and it seems like it duplicates some of the existing tooling we have here.

(Having said that, I'm no code owner here -- @rsmith is. Maybe he has an opinion.)

Lower-level: Did you see all the comments on https://reviews.llvm.org/rGd627a27d264b47eda3f15f086ff419dfe053ebf7 ? This relanded with them unaddressed. Please address them in a follow-up. (Sorry for leaving the comments on the commit instead of the review!)

Done, thanks!

Hello. Does this work when the default target triple isn't native? This seems to be trying to compile clang sources with the just built clang - something that I don't think is always possible. I'm seeing errors like fatal error: 'cstddef' file not found, and failing to link the new IntrospectionTests, with undefined references to NodeIntrospection::GetLocations, as a full toolchain is not setup. I don't believe the buildbots are working right now, so it's difficult to see if any other systems have similar problems.

Can you report if running cmake . -DCLANG_TOOLING_BUILD_AST_INTROSPECTION=OFF allows you to complete the build?

Is there some way I can check that the default target triple isn't native in the cmake code? Then I can set that option automatically.

Also some of the files here have been added with a University of Illinois Open Source License. They should presumably be using the newer Apache license now.

Fixed, thanks!

This revision is now accepted and ready to land.Mar 14 2021, 9:09 AM

Was the problem there just the shebang line?

@nikic How do I run a test build on that machine? Or can you diagnose the problem instead?

nikic added a comment.Mar 14 2021, 9:47 AM

@steveire When running the command manually I get:

/root/llvm-compile-time-tracker/llvm-project/clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py --json-input-path /root/llvm-compile-time-tracker/llvm-project-build/ASTNodeAPI.json --output-file generated/NodeIntrospection.inc --empty-implementation 0
-bash: /root/llvm-compile-time-tracker/llvm-project/clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py: /usr/bin/python: bad interpreter: No such file or directory

So probably this doesn't work on any system that only has Python 3.

As the .py file is invoked directly, I assume that ${PYTHON_EXECUTABLE} is empty. Maybe you were looking for ${Python3_EXECUTABLE}?

arichardson added inline comments.
clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py
2

Maybe this should be #!/use/bin/env python (or python3) instead?

It may be wise to alter this so there is no need for the python script.
How about altering the dump tool to support outputting both json files and the cpp code needed for node introspection.
Maybe have arguments to the tool --json-output-path and --introspection-output-path.
If both are specified write both, if none specified error out.

Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2021, 3:36 PM

ast-dump-tool is still somewhere in lib/ instead of in tools/ in the reland as far as I can tell.

I am seeing a spew of errors after 19740652c4c4329e2b9e77f96e5e31c360b4e8bb (what appears to be the latest version of this patch):

$ cmake \
-G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=$(command -v clang) \
-DCMAKE_CXX_COMPILER=$(command -v clang++) \
-DLLVM_CCACHE_BUILD=ON \
-DLLVM_ENABLE_PROJECTS=clang \
../llvm
...

$ ninja run-ast-api-dump-tool
...
In file included from /home/nathan/src/llvm-project/build/tools/clang/lib/Tooling/ASTTU.cpp:2:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/AST.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/ASTContext.h:19:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/CanonicalType.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/Type.h:20:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/DependenceFlags.h:11:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/Basic/BitmaskEnum.h:18:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:16:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/Support/MathExtras.h:21:
/usr/include/c++/10.2.0/cstdint:52:11: error: no member named 'int_fast8_t' in the global namespace
  using ::int_fast8_t;
        ~~^
/usr/include/c++/10.2.0/cstdint:53:11: error: no member named 'int_fast16_t' in the global namespace; did you mean '__int_least16_t'?
  using ::int_fast16_t;
        ~~^
/usr/include/bits/types.h:54:19: note: '__int_least16_t' declared here
typedef __int16_t __int_least16_t;
                  ^
In file included from /home/nathan/src/llvm-project/build/tools/clang/lib/Tooling/ASTTU.cpp:2:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/AST.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/ASTContext.h:19:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/CanonicalType.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/Type.h:20:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/DependenceFlags.h:11:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/Basic/BitmaskEnum.h:18:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:16:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/Support/MathExtras.h:21:
/usr/include/c++/10.2.0/cstdint:54:11: error: no member named 'int_fast32_t' in the global namespace; did you mean '__int_least32_t'?
  using ::int_fast32_t;
        ~~^
/usr/include/bits/types.h:56:19: note: '__int_least32_t' declared here
typedef __int32_t __int_least32_t;
                  ^
In file included from /home/nathan/src/llvm-project/build/tools/clang/lib/Tooling/ASTTU.cpp:2:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/AST.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/ASTContext.h:19:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/CanonicalType.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/Type.h:20:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/DependenceFlags.h:11:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/Basic/BitmaskEnum.h:18:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:16:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/Support/MathExtras.h:21:
/usr/include/c++/10.2.0/cstdint:55:11: error: no member named 'int_fast64_t' in the global namespace; did you mean '__int_least64_t'?
  using ::int_fast64_t;
        ~~^
/usr/include/bits/types.h:58:19: note: '__int_least64_t' declared here
typedef __int64_t __int_least64_t;
                  ^
...

I am seeing a spew of errors after 19740652c4c4329e2b9e77f96e5e31c360b4e8bb (what appears to be the latest version of this patch):

Hi @nathanchance Does it make your build fail? I have pushed a fix. Can you update and try again?

I am seeing a spew of errors after 19740652c4c4329e2b9e77f96e5e31c360b4e8bb (what appears to be the latest version of this patch):

Hi @nathanchance Does it make your build fail? I have pushed a fix. Can you update and try again?

Does not look like my build ever errored out (exit code was 0 even without the fix you pushed).

It does look like the errors are hidden now though (mostly but that is fine enough for me):

[1373/1373] ASTNodeAPI.json
20 errors generated.

Thanks for the quick response!

This change breaks cross-compilation now, as it tries running an executable built for the target system:

FAILED: tools/clang/lib/Tooling/ASTNodeAPI.json 
cd /home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling && /home/mgorny/llvm-project/build.arm64/bin/clang-ast-dump --skip-processing=0 --astheader=/home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling/ASTTU.cpp -I /home/mgorny/llvm-project/build.arm64/lib/clang/13.0.0/include -I /home/mgorny/llvm-project/llvm/../clang/include -I /home/mgorny/llvm-project/build.arm64/tools/clang/include -I /home/mgorny/llvm-project/build.arm64/include -I /home/mgorny/llvm-project/llvm/include -I /sysroot/arm64/usr/include/c++/v1 -I /usr/lib/clang/11.0.1/include -I /sysroot/arm64/usr/include --json-output-path /home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling/ASTNodeAPI.json
/bin/sh: /home/mgorny/llvm-project/build.arm64/bin/clang-ast-dump: Exec format error

I guess you can look at TableGens how to correctly generate and use host executables.