Add an option to enable on-demand parsing of needed ASTs during CTU analysis.
Two options are introduced. CTUOnDemandParsing enables the feature, and
CTUOnDemandParsingDatabase specifies the path to a compilation database, which
has all the necessary information to generate the ASTs.
Details
- Reviewers
martong balazske Szelethus xazax.hun whisperity - Commits
- rG5cc18516c483: [analyzer] On-demand parsing capability for CTU
rG97e07d0c352c: [analyzer] On-demand parsing capability for CTU
rG020815fafd15: [analyzer] On-demand parsing capability for CTU
rG811c0c9eb462: [analyzer] On-demand parsing capability for CTU
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
402 | CI.getPCHContainerOperations() is probably not needed, because we are not going to exercise the ASTReader, so this could be a nullptr (default param value). |
clang/lib/StaticAnalyzer/Core/CallEvent.cpp | ||
---|---|---|
577 | left here some debugging |
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
221 | class would look better here? (The descendants are class too.) | |
224 | Probably it is good to mention in the documentation that the function is used with a string read from a file and the type of the file determines the meaning of the Identifier here (the calling code does have no direct knowledge about it). | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
449 | matches | |
515 | Members of ASTOnDemandLoader would be better in a single group in the source file. |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
358 | Could we check somehow if CTUDir is indeed an absolute path? If not then probably we should bail out with an error. Though I am not sure if there is an easy and portable way in llvm:: to check for beeing an absolute path. | |
364 | There is no variable named as OnDemandParsingCommands, perhaps you made a rename but forgot to rename in the comments? | |
373 | There is no variable named as ASTSourcePath, perhaps you made a rename but forgot to rename in the comments? | |
402 | If CI.getPCHContainerOperations() is not needed then we can remove the CI member of ASTOnDemandLoader. | |
432 | Perhaps we should make sure that Identifier is indeed an absolute path and if not then we should emit an error. If a user do not use CodeChecker to generate the ExternalDefMapping it may contain relative paths. See e.g.: https://clang.llvm.org/docs/analyzer/user-docs/CrossTranslationUnit.html#manual-ctu-analysis | |
446 | Could we have a lit test for this case? I.e having a compile_commands.json with two ambiguous entries for the same file and then we should expect the compiler diag? | |
clang/test/Analysis/ctu-main.c | ||
53 | Is this hunk related? | |
clang/test/Analysis/ctu-on-demand-parsing.c | ||
3 | Why do we need sed here? I don't see any \ in the compile commands json. Is it because of Windows builds? Could you please explain in a comment maybe in the previous line. |
Refactored the test
Copying the compile_commands.json in case of test for C files
results in the extdef mapping tool finding the correct compiler
flags for the C to be imported.
I have investigated the issue, and the compiler flags were looked up using the heuristic implemented in tooling.
This heuristic looks for the suitable compilation database in an upward ascending fashion inside the directory tree startin from the input source file.
By copying both the source file and the compilation database to the test directory this heuristic does the right thing now. (up until now the found compile_commands.json was the one used for llvm-project itself, and picked up a c++ specific compilation).
This issue is solved, however the ASTImporter still cannot import the inline definition of the struct, and emits an error.
Right now I am debugging the master branch that uses the AST-dumps wheter I also encounter this error inside the ASTImporter (just to see if I am operating on sane assumptions).
You are right, the ctudir2 prefix is not needed as the %t substitution is test-file-level unique.
I've been looking into this and the ASTImporter indeed does not (cannot) import the definition of structInProto(). But that's just fine, that is why we have a branching for the call expression (TRUE/FALSE) below in ctu-main.c . All other functions whose definition were imported gives only the TRUE branch.
clang_analyzer_eval(structInProto(&d) == 0); // expected-warning{{TRUE}} expected-warning{{FALSE}}
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
358 | Actually this does not need to be absolute. The naming of the variable should be changed to PrefixedPath or similar. I will do that for now. | |
358 | It not necessarily an absolute path, only prefixed by CTUDir. There is no clear policy as to where to use strictly absolute paths and where to accept relative paths. IMHO this should be considered, and I am most certainly open for a discussion. In the meantime, I will just rename the variable to PrefixedPath. | |
402 | As for the automatic lifetime of Tool, I am inclined to say this is not the case. I have run my clang-build with sanitizers (undefined and memory), and no bad access or UB was reported. If someone has more domain knowledge about tooling and the lifetime issues that could arise, I would be thrilled to be enlightened. | |
402 | As for the need to pass PCHContainerOperaitons: | |
432 | The policy of absolute and relative paths should be discussed. See my previous answer concerning AbsPath variable above. BTW llvm::sys::path::is_absolute is a solution for asserting this as I see it. | |
clang/test/Analysis/ctu-main.c | ||
53 | Seems unrelated, not sure how it got included. Fixing it. | |
clang/test/Analysis/ctu-on-demand-parsing.c | ||
3 | I have done this by example. Some CodeGen lit tests use sed the same way I did here. Needed for Windows compatibility maybe? (note that %t could substitute into a path containing backslashes on Windows) However, on Windows platforms, the availability of sed is not guaranteed either. The backslashes are used as escape chars in the JSON, hence the need to double-escape them. No comments are provided on those test also... I will add something tho (and also in the cpp test). |
clang/test/Analysis/ctu-on-demand-parsing.c | ||
---|---|---|
6 | Needed because the in-expression definition of a struct (which we cannot import) also warrants a compiler warning, and this comes up 2 times even (once when generating the extdef-mapping, second time when we analyze the file). Its easier to handle this warning in the compilation database, as that removes both occurrences of the warning. | |
7 | Cannot remove as of yet, because they have .ast suffixed entries. In order to remove them the whole logic of what should be inside the index (externalDefMap.txt) for on-demand and non-on-demand cases should be discussed. |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
446 | Good idea, I will create a standalone test case for this! |
Thanks Endre for the docs! I checked the CI job also, seemed okay, so, I think we are ready! Nice work! Let's do the commit!
clang/docs/analyzer/user-docs/CrossTranslationUnit.rst | ||
---|---|---|
26 | Maybe just simply write This tool uses a compilation command database ... |
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
56 | The two enum values refer to compilation database and compile command database. I'd prefer to use the same wording in both values. | |
227 | I am not sure if this is good design. Looking at the code this does not look bad. But it might be a code smell. | |
285 | Why is this no longer const? | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
120 | Unfortunately, this is a very common case for a large number of projects that supports multiple targets (e.g. 32 and 64 bits). Is there a plan to mitigate this problem? |
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
56 | +1 | |
227 | The way how we process the extDefMapping file is identical in both cases. That's an index, keyed with the USRs of functions and then we get back a value. And the way how we use that value is different. In the PCH case that holds the path for the .ast file, in the ODM case that is the name of the source file which we must find in the compile db. So, I think the process of getting the AST for a USR requires the polymorphic behavior from the loaders. We discussed other alternatives with Endre. We were thinking that maybe the extDefMapping file should be identical in both cases. But then we would need to add the .ast postfixes for the entries in the PCH case. And we cannot just do that, because we may not know if what is the correct postfix. The user may have generated .pch files instead. Also, we don't want to compel any Clang user to use CodeChecker (CC will always create .ast files). CTU should be running fine by manually executing the independent steps. | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
120 | I don't think we could do that. We need to merge ASTs of those TUs that are linkable. Otherwise the merge will be unsuccessful because of certain mismatches in Decls (structural eq will fail). |
Ok, looks good to me.
The minor nit regarding the naming is easy to fix before commit. The design question I had is not a blocker, my suggested alternative can be implemented later (if desired) in a backward-compatible way from the user's point of view.
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
227 | Let me rephrase my concerns a bit. Do we really need a polymorphic ASTLoader to be present for the whole analysis? Wouldn't it make more sense to always do the same thing, i.e. if we are given a pch file load it, if we are given a source file, parse it? This way we would not be restricted to on-demand or two pass ctu analysis, but we could do any combination of the two. |
clang/docs/analyzer/user-docs/CrossTranslationUnit.rst | ||
---|---|---|
21 | I think it's PCH-based with a -. | |
25 | symbols' USR names | |
26 |
| |
26 | And perhaps add some cross-references to what a compilation database is, etc. These things are also documented within Clang's doc tree. | |
31 | Instead of a prefix, create an overview of PCH-based analysis, and add this and the next one as a 3rd-level heading? | |
110–113 | Are these flags documented somewhere? | |
207–208 | Same here for comments from above. | |
311–315 | These lines could be removed, I think, they don't add anything to the presentation. | |
316–317 | Due to the lack of colours, perhaps you could use ls -F which suffixes each directory with a /: reports/. | |
367 | as | |
368 | expect it to work only ... (to emphasise the limitation) | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
120 | More clever logging of link commands and creating separate CDBs for each target or each binary could help this cause, but that needs infrastructural capabilities from both the build system and the analysis driver (i.e. CodeChecker). |
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
227 | Well yeah, we could do that, it is a good idea, thanks! We will consider this in the future. I like in this idea that the command line options to Clang would be simplified. But then we must be transparent and show/log the user which method we are using. |
clang/docs/analyzer/user-docs/CrossTranslationUnit.rst | ||
---|---|---|
353 | What's this line? Is this added by Phab, or is this a weird merge conflict / parsing error of the patch? |
@whisperity Thanks for the review :3
clang/docs/analyzer/user-docs/CrossTranslationUnit.rst | ||
---|---|---|
110–113 | As with ctu-dir, these flags are only documented in analyzer options. However, it would indeed be beneficial to list CTU related flags here in the user documentation. I will create a new revision for that specifically. | |
311–315 | Good point, i remove it from here and from the above example as well. |
I am landing this. Thanks for the reviews @martong @balazske @xazax.hun @whisperity !
clang/lib/CrossTU/CMakeLists.txt | ||
---|---|---|
13 | We've been very careful to make clang (the compiler binary) only depend on clangToolingCore and not on the much bigger clangTooling library. Since clang depends on the CrossTU library, this breaks that. Can you reorganize things so that we don't need a dependency from clang-the-compiler-binary on clangTooling? |
Looks like this also breaks tests on Windows and Mac, eg http://45.33.8.238/win/13902/step_7.txt
It probably makes sense to revert while you investigate?
Sorry, this change broke a couple bots that are in the official CI, at least http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/7566 and http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/66618. I'm reverting this change -- feel free to reland after fixing the issue.
Non-linux buildbots were utterly broken :( . Trying to fix them...
Thanks for reverting this for me :)
Also I will investigate the possibility to not depend on Tooling.
clang/lib/CrossTU/CMakeLists.txt | ||
---|---|---|
13 | I have gone, and investigated where the main functionality of on-demand AST loading comes from. (Only after wishfully thinking changing the dependency from clangTooling to clangToolingCore would work, which of course didnt't ) It seems that this functionality used the major components of clangTooling, namely JSONCompilationDatabase::loadFromFile, ClangTool ctor/dtor, ClangTool::buildAST. |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
358–377 | Is here a lambda necessary? I think it must not be over-used. In this case we can construct the file path as the first step and then use it to load the file, without using lambda call. | |
581 | Is here a special reason to use auto && type (and not auto & or std::string &)? Probably this code is faulty: The CmdPart is moved out and then destructed, before the content of it (where the c_str points to) is used? |
Please check the summary of the patch, it seems to contain old information as well.
clang/docs/analyzer/user-docs/CrossTranslationUnit.rst | ||
---|---|---|
210–212 | This is obsolete information. | |
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
234 | Is the explicit needed here? | |
clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def | ||
388–389 | "valid compilation database" is this line still what we wish to say? A YAML is used, so most likely we need a "valid invocation list" or something. | |
393–398 | Maybe it would be worthwhile to add a dummy example like main.cpp: g++ hello-world.cpp which has the proper format into this help. | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
21–25 | The idea here was to not depend on libtooling but these lines are still here! | |
32 | <> -> "" and order | |
119–120 | still using terms "compilation database", is this intended? | |
581 | @balazske This is a generic lambda where auto&& is the universal reference. It is equivalent to writing template <typename T> f(T&& param). It obeys the usual point of instantiation and reference cascading rules, if a const std::string& is given as T, const std::string& && will become a single &. | |
607 | Are the headers from which these types come from included? | |
clang/test/Analysis/Inputs/ctu-other.c | ||
34–38 | Possibly unrelated change? | |
clang/test/Analysis/ctu-on-demand-parsing.c | ||
2–16 | Does the LIT syntax allow for better organising these commands? This is now a bit hard to read Something like this would help: // RUN: rm -rf %t && mkdir -p %t // RUN: cp "%s" "%t/ctu-on-demand-parsing.c" && cp "%S/Inputs/ctu-other.c" "%t/ctu-other.c" // [empty] // [comment] // RUN: [json magic] // RUN: [yaml magic] // [empty] // RUN: extdef map // [empty] // RUN: cc1 |
Hi Endre, just checked the latest update. Overall looks good to me, but found some nits.
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
260 | Should we rename this member to ....PathTo....InvocationList... ? | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
579 | Could we avoid this transfer if InvocationList was storing const char * values instead of std::strings? Why can't we store char*s in InvocationList? | |
591 | typo: filed -> field | |
clang/test/Analysis/ctu-on-demand-parsing.c | ||
7 | Perhaps we could document here that the compile_commands.json is needed ONLY for %clang_extdef_map. | |
clang/test/Analysis/ctu-on-demand-parsing.cpp | ||
10 | I know this is just a nit, but this is very hard to read. Do you think we could break this up (and other long lines) into multiple lines but within the same RUN directive? |
The remaining documentation and test changes are also underway.
clang/include/clang/CrossTU/CrossTranslationUnit.h | ||
---|---|---|
227 | I will implement this separation by suffixes in the next change. | |
234 | This constructor started out with having a single param only. I agree, that explicit is no longer necessary. | |
260 | Renamed to InvocationListFilePath. | |
285 | Most usages of the CI require a non-const ref (this is due to lacking API support for getting analyzer options), and I found it more convenient to just take a non-const ref, instead of const_casting it away later. Not sure myself, but I think I will roll with this now (and I should definitely check the API of CompilerInstance). | |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
119–120 | renamed | |
120 | I have solved the problem of supporting multiple architectures by offloading the responsibility to the user (to the tool driving the analysis). The invocation list file is now supposed to contain only invocations with the same arch/target and this should also be the same as the arch/target of the main TU analyzed. If there is still ambiguity after this separation, this error should nevertheless be emitted ( IMHO ). | |
358–377 | Refactored to use path concatenation. | |
579 | We certainly could, I think there won't be null-characters in paths so storing char* s would not lose generality. However, I think the memory management done by std::string is worth it, as it is clear where the character-data lives in memory. | |
581 | Results of my research: In theory, the STL could do something fancy here if it wanted, but it is not the case in reality. If you want move-operations with algorithms (the ones not explicitly mentioning move in their names) you have to be explicit about it: move-iterators-fluent-cpp. | |
607 | Included the llvm/Support/YAMLParser.h | |
clang/test/Analysis/Inputs/ctu-other.c | ||
34–38 | Funny enough, this is very relevant. Dumping the AST does not errors out on this construction, however, the parser in on-demand mode does. This asm syntax is however supported by both. To document this, I have added a TODO as well. |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
26 | Duped include in L34. |
clang/lib/CrossTU/CrossTranslationUnit.cpp | ||
---|---|---|
121–122 | These two lines could be formatted together. | |
clang/unittests/CrossTU/CrossTranslationUnitTest.cpp | ||
176–177 | Just for good measure, I would like a test case on the "invocation list is ambiguous" error. (So in case we in the future figure out how to disambiguate, we can remove that test case!) |
This breaks check-clang on mac: http://45.33.8.238/mac/15258/step_7.txt
Please take a look, and revert for now if it takes a while to fix.
Thanks for reporting this, i think the output is valuable for debugging. I have restricted the test case to linux for now.
I think it's PCH-based with a -.