This is an archive of the discontinued LLVM Phabricator instance.

[AST] Add generator for source location introspection
ClosedPublic

Authored by steveire on Dec 12 2020, 10:24 AM.

Details

Summary

Generate a json file containing descriptions of AST classes and their
public accessors which return SourceLocation or SourceRange.

Use the JSON file to generate a C++ API and implementation for accessing
the source locations and method names for accessing them for a given AST
node.
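
As a rough illustration of the second stage (JSON in, C++ text out), here is a minimal sketch. The JSON layout (a "classEntries" map holding "sourceLocations" and "sourceRanges" lists), the emitted function names, and the SourceLocationMap/SourceRangeMap types are assumptions made for this example; they are not taken from the patch itself.

```python
import json

# Hypothetical input: class name -> accessor names, grouped by return type.
EXAMPLE_JSON = """
{
  "classEntries": {
    "CallExpr": {
      "sourceLocations": ["getBeginLoc", "getEndLoc", "getRParenLoc"],
      "sourceRanges": ["getSourceRange"]
    }
  }
}
"""


def generate_cxx(api_data):
    """Return C++ source text with one collector function per AST class."""
    lines = []
    for class_name in sorted(api_data["classEntries"]):
        entry = api_data["classEntries"][class_name]
        lines.append(
            "static void GetLocations{0}(clang::{0} const &Object,".format(class_name))
        lines.append("    SourceLocationMap &Locs, SourceRangeMap &Rngs) {")
        # Record each accessor's result together with the accessor's name.
        for accessor in entry.get("sourceLocations", []):
            lines.append('  Locs.insert({{Object.{0}(), "{0}"}});'.format(accessor))
        for accessor in entry.get("sourceRanges", []):
            lines.append('  Rngs.insert({{Object.{0}(), "{0}"}});'.format(accessor))
        lines.append("}")
    return "\n".join(lines) + "\n"


generated = generate_cxx(json.loads(EXAMPLE_JSON))
print(generated)
```

Keeping the JSON as a separate artifact means the same description could feed generators for other targets, not only this C++ emitter.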

This new API can be used to implement 'srcloc' output in clang-query:

http://ce.steveire.com/z/m_kTIo

In this first version of this feature, only the accessors for Stmt
classes are generated, not Decls, TypeLocs etc. Those can be added
after this change is reviewed, as this change is mostly about
infrastructure of these code generators.

Event Timeline

steveire created this revision. Dec 12 2020, 10:24 AM
steveire requested review of this revision. Dec 12 2020, 10:24 AM
Herald added a project: Restricted Project.

Do I understand correctly that the workflow is to use the new dumping tool to generate the needed JSON file that then gets used as input to generate_cxx_src_locs.py which creates NodeLocationIntrospection.cpp/.h that then gets used by clang-query (eventually)? So there are two levels of translation involved to get the final source code? If so, do you know what the performance/overhead for this looks like compared to a typical build? I'm trying to get an idea for whether this will have negative impacts on the build bots such that we may want to add an LLVM cmake configure option to control whether this work happens or not.

clang/lib/Tooling/DumpTool/APIData.h
2

Looks like a copy pasta error.

11

Might as well fix this lint warning.

22

per the usual naming rules.

26

Are these extensions going to add new members for those? If so, perhaps Locs and Rngs should have more descriptive names initially?

clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
41

TIL what "clade" means, thank you for that new word. :-D

72–75

Similar question here about whether we should use less generic names or not.

126
128
135
136–137
155–156
158
clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.h
11

This one as well.

18

I don't think this is being used, but you should include what you use (StringRef, unique_ptr)

30

Should this ctor be marked explicit?

32–35
clang/lib/Tooling/DumpTool/ClangSrcLocDump.cpp
44

Hmmm, do we want such a long name for this option? I was guessing that -o isn't appropriate because there may be multiple files output and so this isn't naming the output file but the output directory, but json-output-path is a mouthful for setting the output location. Or do you think users won't be using that option often enough for the length to matter?

71
74
106

Do you think we may need the ability for the user to pass other options to this driver invocation for things like -fms-compatibility or whatnot?

109
113
clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py
77–87

Would this be a more clear equivalent?

clang/unittests/Introspection/IntrospectionTest.cpp
15

Might as well fix this lint warning.

41
65

Do I understand correctly that the workflow is to use the new dumping tool to generate the needed JSON file that then gets used as input to generate_cxx_src_locs.py which creates NodeLocationIntrospection.cpp/.h that then gets used by clang-query (eventually)?

Yes, that's right. I've added the patch for the latter now at D93325.

So there are two levels of translation involved to get the final source code?

The reason for separating the generation of JSON from the generation of C++ (using the JSON) is that the JSON can also be used to generate bindings for other languages. In https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching-refactoring-tools-eurollvm-and-accu/ I demonstrated that by generating and using JavaScript bindings, but I've also proven the concept with Python3 bindings and added a clang-tidy module allowing clang-tidy checks to be written in Python. That resolves https://bugs.llvm.org//show_bug.cgi?id=32739 at least partially (there may be things which can still only be done in C++).

If so, do you know what the performance/overhead for this looks like compared to a typical build? I'm trying to get an idea for whether this will have negative impacts on the build bots such that we may want to add an LLVM cmake configure option to control whether this work happens or not.

In this patch, the generation is already disabled for Debug builds because the JSON generation is slow in Debug builds (I think we discussed that in Belfast). The reason it is slow is that compiling AST/AST.h with a debug build seems to be slow. The slowness is not caused by the AST matching; the time seems to be spent parsing.

With a release build, generating the JSON takes 3 seconds on my laptop. Plenty of compilations in the LLVM build take longer. Generating the C++ files with generate_cxx_src_locs.py takes 0.04 seconds.

I'm not opposed to adding a condition in cmake for it, in case someone wants to build it even in Debug mode.

clang/lib/Tooling/DumpTool/APIData.h
22

I changed it, but are you referring to this proposal? https://llvm.org/docs/Proposals/VariableNames.html

26

Just for an idea, in a follow-up I have

TypeSourceInfos
TypeLocs
TemplateArgumentLocs
DeclNames
NestedNames
ConstCharStarMethods
StringRefMethods
StdStringMethods

We can review those names in the follow-up.

clang/lib/Tooling/DumpTool/ClangSrcLocDump.cpp
44

Yes, this tool will only be called from the buildsystem. The intention is also that it doesn't generate any other files either. The JSON file should have all the content needed for clang-query srcloc, bindings defined and used in the llvm repo etc and any other use we find for this. If we want to enable third parties to do similar things, we would install the JSON file, not this tool.

106

I don't think the user needs to have that ability, no. If there are any platform-specific options needed, they would be added directly here.

109

error: ‘Compilation’ was not declared in this scope; did you mean ‘clang::driver::Compilation’? Is this really necessary?

113

error: ‘JobList’ does not name a type. I don't get a suggestion from the compiler. Is this really necessary?

clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py
77–87
tools/clang/include/clang/Tooling/NodeLocationIntrospection.h: In member function ‘bool clang::tooling::internal::RangeLessThan::operator()(const std::pair<clang::SourceRange, std::__cxx11::basic_string<char> >&, const std::pair<clang::SourceRange, std::__cxx11::basic_string<char> >&) const’:
tools/clang/include/clang/Tooling/NodeLocationIntrospection.h:31:21: error: cannot bind non-const lvalue reference of type ‘clang::SourceLocation&’ to an rvalue of type ‘clang::SourceLocation’
   31 |     return std::tie(SourceLocation(LHS.first.getBegin()), LHS.first.getEnd(),
      |                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/9/bits/unique_ptr.h:37,
                 from /usr/include/c++/9/memory:80,
                 from llvm-project/llvm/include/llvm/Support/Casting.h:20,
                 from llvm-project/clang/include/clang/Basic/LLVM.h:21,
                 from llvm-project/clang/include/clang/Basic/DiagnosticIDs.h:17,
                 from llvm-project/clang/include/clang/Basic/Diagnostic.h:17,
                 from llvm-project/clang/include/clang/AST/NestedNameSpecifier.h:18,
                 from llvm-project/clang/include/clang/AST/Type.h:21,
                 from llvm-project/clang/include/clang/AST/CanonicalType.h:17,
                 from llvm-project/clang/include/clang/AST/ASTContext.h:19,
                 from llvm-project/clang/include/clang/AST/AST.h:17,
                 from tools/clang/lib/Tooling/generated/NodeLocationIntrospection.cpp:10:
/usr/include/c++/9/tuple:1611:19: note:   initializing argument 1 of ‘constexpr std::tuple<_Elements& ...> std::tie(_Elements& ...) [with _Elements = {clang::SourceLocation, clang::SourceLocation, const std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >}]’
 1611 |     tie(_Elements&... __args) noexcept
      |         ~~~~~~~~~~^~~~~~~~~~

I wasn't able to make it work.

clang/unittests/Introspection/IntrospectionTest.cpp
15

Funny, I thought clang-format would fix these things.

steveire updated this revision to Diff 312024. Dec 15 2020, 2:08 PM
steveire marked 19 inline comments as done.

Update

steveire marked 3 inline comments as done. Dec 15 2020, 2:08 PM
steveire added inline comments.
clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
41

It's a biology word, but I think it works well!

steveire marked an inline comment as done. Dec 15 2020, 2:40 PM
steveire added a subscriber: thakis.

@thakis FYI - I think the GN build would need to be adapted to this.

Mostly mechanical changes requested here.

@thakis FYI - I think the GN build would need to be adapted to this.

FWIW the GN build has a bot that can typically update its gn files.

clang/include/clang/Tooling/NodeIntrospection.h
41

Is there a strong use case to return by value here?

55

Should this (and potentially a few others) be moved to an implementation file?

56

It's probably more effective to use a SmallVector to reduce the need for allocations here.

61

This is wasteful, just operate on rbegin/rend later and eliminate this call.

64

Is this more readable, IDK, but it sure as hell is more fun :)

66–67
clang/lib/Tooling/DumpTool/APIData.h
22

While there is a proposal in place, right now we should ensure we aren't deviating from the current system in patches. Unfortunately readability-identifier-naming has been disabled on the clang directory due to excessive violations. May I suggest re-enabling it locally and then running clang-tidy-diff.py over the patch? That should square most things up.

clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
23–26

Why is this a variable? A templated function should do the same thing.
I imagine it's something like this.

46

This seems dangerous, the MatchASTConsumer doesn't own its MatchFinder, so this is going to leak.

clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.h
2

Can you add a modeline here, same goes for other headers.

15

Any reason for empty lines between includes?

clang/lib/Tooling/DumpTool/ClangSrcLocDump.cpp
81
87–89

Another point: is there any use case for this outside of clang-query? If not, would it not be wise to move this infrastructure to clang-tools-extra/clang-query?

steveire updated this revision to Diff 326985. Feb 28 2021, 11:00 AM
steveire updated this revision to Diff 326987. Feb 28 2021, 11:07 AM
steveire marked 7 inline comments as done.

Update

steveire marked 3 inline comments as done. Feb 28 2021, 11:07 AM
steveire added inline comments.
clang/lib/Tooling/DumpTool/ASTSrcLocProcessor.cpp
23–26

That seems far more noisy. I've left it as a lambda and moved it to the point of use.

steveire marked 2 inline comments as done. Feb 28 2021, 11:10 AM
steveire added inline comments.
clang/include/clang/Tooling/NodeIntrospection.h
55

The implementation file is generated by the python script. Rather than hiding this in the python script, I think it's better to leave it here.

steveire updated this revision to Diff 326997. Feb 28 2021, 2:13 PM
steveire marked an inline comment as done.

Update

This is almost ready, but a few more points need addressing.
Running clang-format over the inc file is pointless; it just extends compilation time while adding an unnecessary dependency on clang-format.
The inc file should likely live in the include build directory, where all tablegen files seem to live. You could either move the CMake code that generates it into the include directory, or alter the directory. The following should do that if you want, and it's safer than a replace as it only changes the last /lib/ detected.

# Replace the last lib component of the current binary directory with include
string(FIND ${CMAKE_CURRENT_BINARY_DIR} "/lib/" PATH_LIB_START REVERSE)
if(PATH_LIB_START EQUAL -1)
  message(FATAL_ERROR "Couldn't find lib component in binary directory")
endif()
math(EXPR PATH_LIB_END "${PATH_LIB_START}+5")
string(SUBSTRING ${CMAKE_CURRENT_BINARY_DIR} 0 ${PATH_LIB_START} PATH_HEAD)
string(SUBSTRING ${CMAKE_CURRENT_BINARY_DIR} ${PATH_LIB_END} -1 PATH_TAIL)
string(CONCAT BINARY_INCLUDE_DIR ${PATH_HEAD} "/include/" ${PATH_TAIL})

After moving it to the include output folder, you would need #include "clang/Tooling/NodeIntrospection.inc" in the cpp file.
This would also remove a lot of those commands in lib/Tooling/CMakeLists.txt.
Tablegen has a command line option, --write-if-changed. It may be wise to also include that in your generator script instead of using copy-if-different in the aforementioned CMakeLists.txt.
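
A write-if-changed step in the generator script could look roughly like the sketch below. The helper name write_if_changed and the file contents are made up for illustration; this is not code from generate_cxx_src_locs.py.

```python
import os
import tempfile


def write_if_changed(path, content):
    """Write content to path only when it differs from what is on disk.

    Returns True if the file was (re)written and False if it was left
    untouched, in which case its modification time is preserved and the
    build system has no reason to rebuild dependents.
    """
    try:
        with open(path, "r", encoding="utf-8") as existing:
            if existing.read() == content:
                return False
    except FileNotFoundError:
        pass  # First run: there is nothing to compare against.
    with open(path, "w", encoding="utf-8") as out:
        out.write(content)
    return True


# Usage: a second call with identical content does not touch the file.
out_dir = tempfile.mkdtemp()
inc_path = os.path.join(out_dir, "NodeIntrospection.inc")
first = write_if_changed(inc_path, "// generated\n")
second = write_if_changed(inc_path, "// generated\n")
third = write_if_changed(inc_path, "// regenerated\n")
print(first, second, third)
```

With this in the script, the generator can write straight to its final location and the separate copy step in CMake is no longer needed to avoid spurious rebuilds.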

clang/include/clang/Tooling/NodeIntrospection.h
17–18

These should be quoted includes

clang/lib/Tooling/CMakeLists.txt
29

It may be wise to use the COMMENT argument to let the users know that it's building the ASTNodeAPI.json.

56

Likewise a comment to say building NodeIntrospection.inc.

99

This shouldn't appear in the source list.

clang/lib/Tooling/DumpTool/APIData.h
11

Header guard should be LLVM_CLANG_LIB_TOOLING_DUMPTOOL_APIDATA_H

clang/lib/Tooling/NodeIntrospection.cpp
14–16

Quoted includes and the NodeIntrospection.h include is the MainFileInclude so should appear first.

steveire marked 9 inline comments as done. Mar 10 2021, 2:35 PM
steveire added inline comments.
clang/lib/Tooling/CMakeLists.txt
99

We need to tell CMake the dependency.

steveire marked an inline comment as done. Mar 10 2021, 2:36 PM
njames93 accepted this revision. Mar 10 2021, 2:37 PM

nit: A few reformat hints

This revision is now accepted and ready to land. Mar 10 2021, 2:37 PM
steveire marked an inline comment as done. Mar 10 2021, 2:37 PM
This revision was landed with ongoing or failed builds. Mar 10 2021, 2:39 PM
This revision was automatically updated to reflect the committed changes.

@thakis Presumably you'll have to update the GN build now.

thakis added a subscriber: rsmith. Mar 14 2021, 6:44 AM

A few more high-level questions:

  • What's the point of the intermediary json file? Why not generate the final c++ directly? (As far as I can tell, this wasn't discussed during the review yet)
  • Do we need to generate code for this at all? Could this be done via xmacros or tablegen?

Having a bespoke custom python -> json -> python -> c++ pipeline here seems like it's fairly different from how the rest of clang does things, and it seems like it duplicates some of the existing tooling we have here.

(Having said that, I'm no code owner here -- @rsmith is. Maybe he has an opinion.)

Lower-level: Did you see all the comments on https://reviews.llvm.org/rGd627a27d264b47eda3f15f086ff419dfe053ebf7 ? This relanded with them unaddressed. Please address them in a follow-up. (Sorry for leaving the comments on the commit instead of the review!)

clang/lib/Tooling/CMakeLists.txt
31

Putting this in the root of the build dir seems a bit untidy. I think CMAKE_CURRENT_BINARY_DIR is what we usually use for generated files.

74

...like used here. Why generated/? Everything in CMAKE_CURRENT_BINARY_DIR is generated. (Compare to find . -name '*.inc')

Also, why not make the python script write the file only if changed instead of making a copy here? (like llvm-tblgen does)

Hello. Does this work when the default target triple isn't native? This seems to be trying to compile clang sources with the just built clang - something that I don't think is always possible. I'm seeing errors like fatal error: 'cstddef' file not found, and failing to link the new IntrospectionTests, with undefined references to NodeIntrospection::GetLocations, as a full toolchain is not setup. I don't believe the buildbots are working right now, so it's difficult to see if any other systems have similar problems.

Also some of the files here have been added with a University of Illinois Open Source License. They should presumably be using the newer Apache license now.

A few more high-level questions:

  • What's the point of the intermediary json file? Why not generate the final c++ directly? (As far as I can tell, this wasn't discussed during the review yet)

It came up in review earlier: https://reviews.llvm.org/D93164#2456181

  • Do we need to generate code for this at all? Could this be done via xmacros or tablegen?

Can you say more? Would this require generating the declarations in include/clang/AST? It sounds like a large maintenance burden, but maybe I'm missing something.

Having a bespoke custom python -> json -> python -> c++ pipeline here seems like it's fairly different from how the rest of clang does things, and it seems like it duplicates some of the existing tooling we have here.

(Having said that, I'm no code owner here -- @rsmith is. Maybe he has an opinion.)

Lower-level: Did you see all the comments on https://reviews.llvm.org/rGd627a27d264b47eda3f15f086ff419dfe053ebf7 ? This relanded with them unaddressed. Please address them in a follow-up. (Sorry for leaving the comments on the commit instead of the review!)

Done, thanks!

Hello. Does this work when the default target triple isn't native? This seems to be trying to compile clang sources with the just built clang - something that I don't think is always possible. I'm seeing errors like fatal error: 'cstddef' file not found, and failing to link the new IntrospectionTests, with undefined references to NodeIntrospection::GetLocations, as a full toolchain is not setup. I don't believe the buildbots are working right now, so it's difficult to see if any other systems have similar problems.

Can you report if running cmake . -DCLANG_TOOLING_BUILD_AST_INTROSPECTION=OFF allows you to complete the build?

Is there some way I can check that the default target triple isn't native in the cmake code? Then I can set that option automatically.

Also some of the files here have been added with a University of Illinois Open Source License. They should presumably be using the newer Apache license now.

Fixed, thanks!

This revision is now accepted and ready to land. Mar 14 2021, 9:09 AM

Was the problem there just the shebang line?

@nikic How do I run a test build on that machine? Or can you diagnose the problem instead?

nikic added a comment. Mar 14 2021, 9:47 AM

@steveire When running the command manually I get:

/root/llvm-compile-time-tracker/llvm-project/clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py --json-input-path /root/llvm-compile-time-tracker/llvm-project-build/ASTNodeAPI.json --output-file generated/NodeIntrospection.inc --empty-implementation 0
-bash: /root/llvm-compile-time-tracker/llvm-project/clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py: /usr/bin/python: bad interpreter: No such file or directory

So probably this doesn't work on any system that only has Python 3.

As the .py file is invoked directly, I assume that ${PYTHON_EXECUTABLE} is empty. Maybe you were looking for ${Python3_EXECUTABLE}?

arichardson added inline comments.
clang/lib/Tooling/DumpTool/generate_cxx_src_locs.py
1

Maybe this should be #!/usr/bin/env python (or python3) instead?

It may be wise to alter this so there is no need for the Python script.
How about altering the dump tool to support outputting both the JSON file and the C++ code needed for node introspection?
Maybe have arguments to the tool: --json-output-path and --introspection-output-path.
If both are specified, write both; if neither is specified, error out.
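
The option handling proposed above can be modelled in a few lines. The real dump tool is C++, so this Python sketch only illustrates the "at least one output" rule; the two option names come from the comment, while select_outputs and the program name are invented for the example.

```python
import argparse


def select_outputs(argv):
    """Return the list of (kind, path) outputs requested on the command line."""
    parser = argparse.ArgumentParser(prog="dump-tool-sketch")
    parser.add_argument("--json-output-path")
    parser.add_argument("--introspection-output-path")
    args = parser.parse_args(argv)

    outputs = []
    if args.json_output_path is not None:
        outputs.append(("json", args.json_output_path))
    if args.introspection_output_path is not None:
        outputs.append(("introspection", args.introspection_output_path))
    if not outputs:
        # parser.error() prints a usage message and exits with status 2.
        parser.error(
            "expected --json-output-path and/or --introspection-output-path")
    return outputs


# Both options given: both outputs are requested.
print(select_outputs(["--json-output-path", "ASTNodeAPI.json",
                      "--introspection-output-path", "NodeIntrospection.inc"]))
```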

Herald added a project: Restricted Project. Mar 14 2021, 3:36 PM

ast-dump-tool is still somewhere in lib/ instead of in tools/ in the reland as far as I can tell.

I am seeing a spew of errors after 19740652c4c4329e2b9e77f96e5e31c360b4e8bb (what appears to be the latest version of this patch):

$ cmake \
-G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=$(command -v clang) \
-DCMAKE_CXX_COMPILER=$(command -v clang++) \
-DLLVM_CCACHE_BUILD=ON \
-DLLVM_ENABLE_PROJECTS=clang \
../llvm
...

$ ninja run-ast-api-dump-tool
...
In file included from /home/nathan/src/llvm-project/build/tools/clang/lib/Tooling/ASTTU.cpp:2:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/AST.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/ASTContext.h:19:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/CanonicalType.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/Type.h:20:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/DependenceFlags.h:11:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/Basic/BitmaskEnum.h:18:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:16:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/Support/MathExtras.h:21:
/usr/include/c++/10.2.0/cstdint:52:11: error: no member named 'int_fast8_t' in the global namespace
  using ::int_fast8_t;
        ~~^
/usr/include/c++/10.2.0/cstdint:53:11: error: no member named 'int_fast16_t' in the global namespace; did you mean '__int_least16_t'?
  using ::int_fast16_t;
        ~~^
/usr/include/bits/types.h:54:19: note: '__int_least16_t' declared here
typedef __int16_t __int_least16_t;
                  ^
In file included from /home/nathan/src/llvm-project/build/tools/clang/lib/Tooling/ASTTU.cpp:2:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/AST.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/ASTContext.h:19:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/CanonicalType.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/Type.h:20:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/DependenceFlags.h:11:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/Basic/BitmaskEnum.h:18:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:16:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/Support/MathExtras.h:21:
/usr/include/c++/10.2.0/cstdint:54:11: error: no member named 'int_fast32_t' in the global namespace; did you mean '__int_least32_t'?
  using ::int_fast32_t;
        ~~^
/usr/include/bits/types.h:56:19: note: '__int_least32_t' declared here
typedef __int32_t __int_least32_t;
                  ^
In file included from /home/nathan/src/llvm-project/build/tools/clang/lib/Tooling/ASTTU.cpp:2:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/AST.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/ASTContext.h:19:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/CanonicalType.h:17:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/Type.h:20:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/AST/DependenceFlags.h:11:
In file included from /home/nathan/src/llvm-project/llvm/../clang/include/clang/Basic/BitmaskEnum.h:18:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/ADT/BitmaskEnum.h:16:
In file included from /home/nathan/src/llvm-project/llvm/include/llvm/Support/MathExtras.h:21:
/usr/include/c++/10.2.0/cstdint:55:11: error: no member named 'int_fast64_t' in the global namespace; did you mean '__int_least64_t'?
  using ::int_fast64_t;
        ~~^
/usr/include/bits/types.h:58:19: note: '__int_least64_t' declared here
typedef __int64_t __int_least64_t;
                  ^
...

I am seeing a spew of errors after 19740652c4c4329e2b9e77f96e5e31c360b4e8bb (what appears to be the latest version of this patch):

Hi @nathanchance Does it make your build fail? I have pushed a fix. Can you update and try again?

Does not look like my build ever errored out (exit code was 0 even without the fix you pushed).

It does look like the errors are hidden now though (mostly but that is fine enough for me):

[1373/1373] ASTNodeAPI.json
20 errors generated.

Thanks for the quick response!

This change breaks cross-compilation now, as it tries running an executable built for the target system:

FAILED: tools/clang/lib/Tooling/ASTNodeAPI.json 
cd /home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling && /home/mgorny/llvm-project/build.arm64/bin/clang-ast-dump --skip-processing=0 --astheader=/home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling/ASTTU.cpp -I /home/mgorny/llvm-project/build.arm64/lib/clang/13.0.0/include -I /home/mgorny/llvm-project/llvm/../clang/include -I /home/mgorny/llvm-project/build.arm64/tools/clang/include -I /home/mgorny/llvm-project/build.arm64/include -I /home/mgorny/llvm-project/llvm/include -I /sysroot/arm64/usr/include/c++/v1 -I /usr/lib/clang/11.0.1/include -I /sysroot/arm64/usr/include --json-output-path /home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling/ASTNodeAPI.json
/bin/sh: /home/mgorny/llvm-project/build.arm64/bin/clang-ast-dump: Exec format error

I guess you can look at how TableGen correctly generates and uses host executables.

I am now running into this error now that clang13 has released. I cannot figure a way around this. I do not see any cmake options similar to the tablegens that allow you to specify the binary for the build system. Am I missing something or is clang13 just broken for cross compilation?

smeenai added inline comments.
clang/lib/Tooling/CMakeLists.txt
29–30

I'm looking at this commit in the context of https://bugs.llvm.org/show_bug.cgi?id=52106. Why do we not run the generator when targeting Windows or Apple platforms? Note that WIN32 and APPLE reflect the target platform, not the platform you're building on.

steveire added inline comments. Nov 8 2021, 3:55 AM
clang/lib/Tooling/CMakeLists.txt
29–30

I was not able to make the change pass the LLVM CI system on those platforms.

Another confirmation that this change has broken cross compilation. I have tried setting both -DCLANG_TOOLING_BUILD_AST_INTROSPECTION=OFF and -DCMAKE_CROSSCOMPILING=ON separately and in combination. This means that LLVM cannot be cross compiled for AArch64, which is a pretty serious problem!

lancethepants added a comment. Edited Jan 18 2022, 1:42 PM

So I did find a solution, maybe somewhere else in the forums or on IRC; I can't remember for sure.

-DCMAKE_SYSTEM_NAME="Linux"
Add this as well to your cmake invocation. For some reason this tells CMake that we are cross-compiling. I have no idea why -DCMAKE_CROSSCOMPILING=ON is insufficient. I'm compiling on Linux for Linux, so I didn't think I'd need it. Kind of unintuitive, but that should hopefully get you going.

This change breaks cross-compilation now, as it tries running an executable built for the target system:

FAILED: tools/clang/lib/Tooling/ASTNodeAPI.json 
cd /home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling && /home/mgorny/llvm-project/build.arm64/bin/clang-ast-dump --skip-processing=0 --astheader=/home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling/ASTTU.cpp -I /home/mgorny/llvm-project/build.arm64/lib/clang/13.0.0/include -I /home/mgorny/llvm-project/llvm/../clang/include -I /home/mgorny/llvm-project/build.arm64/tools/clang/include -I /home/mgorny/llvm-project/build.arm64/include -I /home/mgorny/llvm-project/llvm/include -I /sysroot/arm64/usr/include/c++/v1 -I /usr/lib/clang/11.0.1/include -I /sysroot/arm64/usr/include --json-output-path /home/mgorny/llvm-project/build.arm64/tools/clang/lib/Tooling/ASTNodeAPI.json
/bin/sh: /home/mgorny/llvm-project/build.arm64/bin/clang-ast-dump: Exec format error

I believe the relevant bit of CMake logic is https://gitlab.kitware.com/cmake/cmake/-/blob/a2e42a577b08dc65ba801e18e5c8be163df83455/Modules/CMakeDetermineSystem.cmake#L136-158. If CMAKE_SYSTEM_NAME isn't set, it'll set CMAKE_CROSSCOMPILING to FALSE internally.
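
The behaviour described above can be modelled with a small shell sketch. This is only a simplified illustration of the branch in CMakeDetermineSystem.cmake as I read it, not CMake's actual code; the function name effective_crosscompiling and its two arguments are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Simplified model (an assumption, not CMake's real implementation) of how
# CMake settles on the value of CMAKE_CROSSCOMPILING.
#   $1: CMAKE_SYSTEM_NAME as passed by the user ("" if unset)
#   $2: CMAKE_CROSSCOMPILING as passed by the user ("" if unset)
effective_crosscompiling() {
  system_name=$1
  requested=$2
  if [ -n "$system_name" ]; then
    # System name preset via -D or a toolchain file: keep an explicit
    # CMAKE_CROSSCOMPILING value, otherwise assume a cross build.
    if [ -n "$requested" ]; then
      echo "$requested"
    else
      echo TRUE
    fi
  else
    # No system name: CMake assumes a native build and sets the value
    # to FALSE itself, which hides whatever was requested.
    echo FALSE
  fi
}

effective_crosscompiling "" ON      # only -DCMAKE_CROSSCOMPILING=ON
effective_crosscompiling Linux ""   # only -DCMAKE_SYSTEM_NAME=Linux
effective_crosscompiling Linux ON   # both options
```

Under this model, passing -DCMAKE_CROSSCOMPILING=ON on its own has no effect, which would match the reports in this thread that only adding -DCMAKE_SYSTEM_NAME made a difference.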

@lancethepants @smeenai thanks for the pointers! Unfortunately, it still doesn't work for me after passing -DCMAKE_SYSTEM_NAME="Linux". Passing that option did change the CMake output, so it's definitely recognized at least (the messages about which sanitizer tests will run now call the platform "Linux" as opposed to referring to the platform as default). I am also compiling on Linux for Linux; I'm just trying to build an AArch64 toolchain which I can copy to an SD card for testing on an AArch64 board. @lancethepants could you check which CMake version you were using when you got this to work? I'm using CMake 3.22.1 and trying to build llvmorg-13.0.0.

The full invocation I'm using is the following:

cmake -G Ninja \
  -DLLVM_TABLEGEN=${TOP}/${CLANG_TBLGEN_DIR}/bin/llvm-tblgen \
  -DCLANG_TABLEGEN=${TOP}/${CLANG_TBLGEN_DIR}/bin/clang-tblgen \
  -DCMAKE_SYSTEM_NAME="Linux" \
  -DCMAKE_CROSSCOMPILING=ON \
  -DCLANG_TOOLING_BUILD_AST_INTROSPECTION=OFF \
  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
  -DCMAKE_BUILD_TYPE=MinSizeRel \
  -DCMAKE_INSTALL_PREFIX=${TOP}/${ROOT_DIR} \
  -DLLVM_DEFAULT_TARGET_TRIPLE=aarch64-linux-gnu \
  -DLLVM_NATIVE_ARCH="X86" \
  -DLLVM_TARGET_ARCH="AArch64" \
  -DLLVM_TARGETS_TO_BUILD="AArch64" \
  -DLLVM_ENABLE_PROJECTS="clang;compiler-rt;lld;" \
  -DLLVM_ENABLE_TERMINFO=OFF \
  -DLLVM_ENABLE_BINDINGS=OFF \
  -DLLVM_BUILD_EXAMPLES=OFF \
  -DLLVM_BUILD_TESTS=OFF \
  -DLLVM_BUILD_BENCHMARKS=OFF \
  ${TOP}/${SRC_DIR}/llvm/

Are you still encountering the same issue?
This is my cmake invocation.
https://github.com/lancethepants/tomatoware/blob/master/scripts/buildroot.sh#L808-L843

I typically update cmake to the most recent version whenever I get around to updating my project. The latest I've used is 3.21.4. Looks like you're using the very newest, so I doubt the difference between my version and yours is the cause.