Page MenuHomePhabricator

[analyzer] On-demand parsing capability for CTU

Authored by gamesh411 on Mar 5 2020, 1:31 AM.



Add an option to enable on-demand parsing of needed ASTs during CTU analysis.
Two options are introduced. CTUOnDemandParsing enables the feature, and
CTUOnDemandParsingDatabase specifies the path to a compilation database, which
has all the necessary information to generate the ASTs.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
thakis added inline comments.Apr 27 2020, 3:38 AM
13 ↗(On Diff #260238)

We've been very careful to make clang (the compiler binary) only depend on clangToolingCore and not on the much bigger clangTooling library. Since clang depends on the CrossTU library, this breaks that. Can you reorganize things so that we don't need a dependency from clang-the-compiler-binary on clangTooling?

Looks like this also breaks tests on Windows and Mac, eg

It probably makes sense to revert while you investigate?

Sorry, this change broke a couple bots that are in the official CI, at least and I'm reverting this change -- feel free to reland after fixing the issue.

gamesh411 reopened this revision.EditedApr 27 2020, 6:34 AM

Non-linux buildbots were utterly broken :( . Trying to fix them...

Thanks for reverting this for me :)
Also I will investigate the possibility to not depend on Tooling.

This revision is now accepted and ready to land.Apr 27 2020, 6:34 AM
gamesh411 updated this revision to Diff 260308.Apr 27 2020, 6:39 AM

Remove platform constraint from test file

gamesh411 updated this revision to Diff 260310.Apr 27 2020, 6:44 AM

[NFC] Fix arcanist double commit revisioning

gamesh411 marked an inline comment as done.Apr 27 2020, 7:18 AM
gamesh411 added inline comments.
13 ↗(On Diff #260238)

I have gone, and investigated where the main functionality of on-demand AST loading comes from. (Only after wishfully thinking changing the dependency from clangTooling to clangToolingCore would work, which of course didnt't )

It seems that this functionality used the major components of clangTooling, namely JSONCompilationDatabase::loadFromFile, ClangTool ctor/dtor, ClangTool::buildAST.
In order to circumvent the dependency and the reimplementation of handling compilation databases, a call to an external tool could be used, or the clangTooling library would need to be refactored in a way which is not a very tasteful refactoring. IMHO. I would like to know which of the aforementioned way is deemed to be best.

Remove arch target from another invocation

gamesh411 updated this revision to Diff 263294.May 11 2020, 3:34 PM

Implement a solution without a dependency on clangTooling

balazske added inline comments.May 12 2020, 2:16 AM

Is here a lambda necessary? I think it must not be over-used. In this case we can construct the file path as the first step and then use it to load the file, without using lambda call.


Is here a special reason to use auto && type (and not auto & or std::string &)? Probably this code is faulty: The CmdPart is moved out and then destructed, before the content of it (where the c_str points to) is used?

Please check the summary of the patch, it seems to contain old information as well.


This is obsolete information.


Is the explicit needed here?


"valid compilation database" is this line still what we wish to say? A YAML is used, so most likely we need a "valid invocation list" or something.


Maybe it would be worthwhile to add a dummy example like main.cpp: g++ hello-world.cpp which has the proper format into this help.


The idea here was to not depend on libtooling but these lines are still here!


<> -> "" and order


still using terms "compilation database", is this intended?


@balazske This is a generic lambda where auto&& is the universal reference. It is equivalent to writing template <typename T> f(T&& param). It obeys the usual point of instantiation and reference cascading rules, if a const std::string& is given as T, const std::string& && will become a single &.


Are the headers from which these types come from included?


Possibly unrelated change?


Does the LIT syntax allow for better organising these commands? This is now a bit hard to read

Something like this would help:

// RUN: rm -rf %t && mkdir -p %t
// RUN: cp "%s" "%t/ctu-on-demand-parsing.c" && cp "%S/Inputs/ctu-other.c" "%t/ctu-other.c"
// [empty]
// [comment]
// RUN: [json magic]
// RUN: [yaml magic]
// [empty]
// RUN: extdef map
// [empty]
// RUN: cc1
whisperity requested changes to this revision.May 12 2020, 4:38 AM

(Maybe this will make Phab not show "✅ Accepted"...)

This revision now requires changes to proceed.May 12 2020, 4:38 AM

Hi Endre, just checked the latest update. Overall looks good to me, but found some nits.


Should we rename this member to ....PathTo....InvocationList... ?


Could we avoid this transfer if InvocationList was storing const char * values instead of std::strings? Why can't we store char*s in InvocationList?


typo: filed -> field


Perhaps we could document here that the compile_commands.json is needed ONLY for %clang_extdef_map.


I know this is just a nit, but this is very hard to read. Do you think we could break this up (and other long lines) into multiple lines but within the same RUN directive?

gamesh411 updated this revision to Diff 266473.May 27 2020, 3:37 AM
gamesh411 marked 29 inline comments as done.

Update functional changes, documentation update incoming

gamesh411 marked 2 inline comments as done.May 27 2020, 4:18 AM

The remaining documentation and test changes are also underway.


I will implement this separation by suffixes in the next change.


This constructor started out with having a single param only. I agree, that explicit is no longer necessary.


Renamed to InvocationListFilePath.


Most usages of the CI require a non-const ref (this is due to lacking API support for getting analyzer options), and I found it more convenient to just take a non-const ref, instead of const_casting it away later. Not sure myself, but I think I will roll with this now (and I should definitely check the API of CompilerInstance).




I have solved the problem of supporting multiple architectures by offloading the responsibility to the user (to the tool driving the analysis). The invocation list file is now supposed to contain only invocations with the same arch/target and this should also be the same as the arch/target of the main TU analyzed. If there is still ambiguity after this separation, this error should nevertheless be emitted ( IMHO ).


Refactored to use path concatenation.


We certainly could, I think there won't be null-characters in paths so storing char* s would not lose generality. However, I think the memory management done by std::string is worth it, as it is clear where the character-data lives in memory.


Results of my research: In theory, the STL could do something fancy here if it wanted, but it is not the case in reality. If you want move-operations with algorithms (the ones not explicitly mentioning move in their names) you have to be explicit about it: move-iterators-fluent-cpp.


Included the llvm/Support/YAMLParser.h


Funny enough, this is very relevant. Dumping the AST does not errors out on this construction, however, the parser in on-demand mode does. This asm syntax is however supported by both. To document this, I have added a TODO as well.

gamesh411 marked an inline comment as done.May 27 2020, 5:33 AM
whisperity resigned from this revision.May 28 2020, 1:01 AM
whisperity added inline comments.

Duped include in L34.

This revision is now accepted and ready to land.May 28 2020, 1:01 AM
gamesh411 updated this revision to Diff 267143.May 29 2020, 1:43 AM
gamesh411 marked 6 inline comments as done.

Fix documentation and commit message

gamesh411 marked 3 inline comments as done.May 29 2020, 1:51 AM
martong accepted this revision.May 29 2020, 3:10 AM


gamesh411 updated this revision to Diff 269439.Jun 9 2020, 1:37 AM

Extend index file format
Update documentation

gamesh411 updated this revision to Diff 269527.Jun 9 2020, 6:36 AM

Fix test case, and reorder warning

whisperity added inline comments.Jun 9 2020, 7:29 AM

These two lines could be formatted together.


Just for good measure, I would like a test case on the "invocation list is ambiguous" error. (So in case we in the future figure out how to disambiguate, we can remove that test case!)

gamesh411 updated this revision to Diff 269615.Jun 9 2020, 11:26 AM

use consumeError in test

gamesh411 updated this revision to Diff 269639.Jun 9 2020, 12:56 PM

add ambiguous invocation list test case

gamesh411 marked 2 inline comments as done.Jun 9 2020, 12:58 PM
gamesh411 updated this revision to Diff 269677.Jun 9 2020, 2:22 PM

add ambiguity checking during invocation list parsing

This revision was automatically updated to reflect the committed changes.

This breaks check-clang on mac:

Please take a look, and revert for now if it takes a while to fix.

This breaks check-clang on mac:

Please take a look, and revert for now if it takes a while to fix.

Thanks for reporting this, i think the output is valuable for debugging. I have restricted the test case to linux for now.