This is an archive of the discontinued LLVM Phabricator instance.

Differential D30691

[analyzer] Support for naive cross translational unit analysis
AbandonedPublic

Authored by xazax.hun on Mar 7 2017, 4:47 AM.

Tokens

"Party Time" token, awarded by whisperity.

Details

Reviewers

dcoughlin
a.sidorin
zaks.anna
NoQ
danielmarjamaki
rsmith
• rizsotto
dkrupp
george.karpenkov
ilya-biryukov

Commits

rGeb0584bee413: [analyzer] Support for naive cross translation unit analysis
rC326323: [analyzer] Support for naive cross translation unit analysis
rL326323: [analyzer] Support for naive cross translation unit analysis

Summary

This patch adds support for naive cross translational unit analysis.

The aim of this patch is to be minimal to enable incremental development of the feature on the top of the tree. This patch should be an NFC in case CTUDir is not provided by the user.

When CTUDir is provided:

If a function definition is not available, a textual index is consulted to check whether the definition is available in another TU.
The AST dump of the other TU will be loaded and the function definition will be inlined.
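The lookup step above can be sketched roughly as follows. This is only an illustration in Python, not the patch's C++ implementation. It assumes the index holds one "<lookup name> <AST file>" pair per line (the format mentioned later in this review); the helper names parse_fn_map and find_ast_for and the sample entries are hypothetical.

```python
import os


def parse_fn_map(lines):
    """Parse index lines of the assumed form '<lookup name> <AST file path>'."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so paths containing spaces survive.
        name, ast_path = line.split(' ', 1)
        index[name] = ast_path
    return index


def find_ast_for(index, name, ctu_dir):
    """Return the AST dump path for `name`, or None if no other TU defines it."""
    ast_path = index.get(name)
    if ast_path is None:
        return None
    return os.path.join(ctu_dir, ast_path)


# Hypothetical index content, for illustration only.
example_lines = [
    'c:@F@foo# ast/other.c.ast',
    'c:@F@bar# ast/chain.c.ast',
]
index = parse_fn_map(example_lines)
foo_ast = find_ast_for(index, 'c:@F@foo#', 'ctu-dir')
missing_ast = find_ast_for(index, 'c:@F@unknown#', 'ctu-dir')
```

A caller would load the returned AST dump and import the definition from it; when the lookup returns None, the call stays opaque, as it does without CTUDir.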

One of the main limitations is that the coverage pattern of the analysis will change when this feature is enabled. For this reason, in the future, it might be better to include some heuristics to prefer examining shorter execution paths to the longer ones. Until then this feature is not recommended to be turned on by users unless they already fixed the important issues with this feature turned off.

We will cover more detailed analysis of this patch soon in our EuroLLVM talk: http://llvm.org/devmtg/2017-03//2017/02/20/accepted-sessions.html#7
We will talk about how this works and the memory usage, analysis time, coverage pattern change, limitations of the ASTImporter, how the length of the bug paths changed and a lot more.

Feel free to skip the review after the talk, but we wanted to make the code available in an easy to review format before the conference.

Note that the initial prototype was done by A. Sidorin et al.: http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html

Contributions to the measurements and the new version of the code: Peter Szecsi, Zoltan Gera, Daniel Krupp, Kareem Khazem.

Event Timeline

In D30691#731617, @zaks.anna wrote:

I agree that scan-build or scan-build-py integration is an important issue to resolve here. What I envision is that users will just call scan-build and pass -whole-project as an option to it. Everything else will happen automagically:)

We contacted Laszlo and we have a pull request into scan-build that is under review. He is very helpful and supports the idea of scan-build-py supporting CTU analysis.

I do not quite understand why AST serialization is needed at all. Can we instead recompile the translation units on demand into a separate ASTContext and then ASTImport?

We did a prototype implementation of on-demand reparsing. On the C projects we tested, the runtime increased by 10-30% compared to dumping the ASTs. Note that parsing C is relatively fast; I would expect a much bigger delta for C++ projects. Unfortunately, we weren't able to test that setting due to the ASTImporter limitations.

include/clang/AST/Mangle.h
59 ↗	(On Diff #94814)	Note that the newest version of this patch does not use name mangling, it uses USRs instead. This turned out to be a perfectly viable alternative and we did not see any behavioral changes on the project we tested after the transition.

Thanks for the reviews so far.
I think we have addressed all major concerns regarding this patch:

-(Anna) Scan-build-py integration of the functionality is nearly finished (see https://github.com/rizsotto/scan-build/issues/83) (the --ctu switch performs both analysis phases at once). I think this could go in a different patch, but until then we could keep the ctu-build.py and ctu-analyze.py scripts. Do you agree?

-(Devin,NoQ) In externalFnMap.txt (which records which function definition is located where), Unified Symbol Resolution (USR) is used to identify functions instead of mangled names, which seems to work equally well for C and C++.

-(Anna) Dumping the ASTs to the disk. We tried a version where ASTs are not dumped in the 1st phase, but are recreated each time a function definition is needed from an external TU. It works fine, but the analysis time went up by 20-30% on open source C projects. Is it OK to add this functionality in a next patch? Or should we add it as an optional feature right now?

Do you see anything else that would prevent this patch from getting in?

Hello Daniel & Gabor,
Thank you very much for your work. This patch looks good to me but I think such a change should also be approved by maintainers.

-(Anna) Scan-build-py integration of the functionality is nearly finished (see https://github.com/rizsotto/scan-build/issues/83) (the --ctu switch performs both analysis phases at once). I think this could go in a different patch, but until then we could keep the ctu-build.py and ctu-analyze.py scripts. Do you agree?

It's important to bring this patch into the LLVM repo so that it becomes part of the clang/llvm project and is used. The whole point of adding CTU integration to scan-build-py is to make sure that there is a single tool that all/most users could use; adding the patch to a fork does not accomplish that goal. Also, I am not a fan of developing on downstream branches and that is against the LLVM Developer policy due to all the reasons described here: http://www.llvm.org/docs/DeveloperPolicy.html#incremental-development. This development style leads to fragmentation of the community and the project. Unfortunately, we often see cases where large patches developed out of tree never make it in as a result of not following this policy and it would be great to avoid this in the future.

I think this could go in a different patch, but until then we could keep the ctu-build.py and ctu-analyze.py scripts. Do you agree?

It would be best to just add the scan-build-py support to the tree, especially since the new scripts are not tested.

-(Anna) Dumping the ASTs to the disk. We tried a version where ASTs are not dumped in the 1st phase, but are recreated each time a function definition is needed from an external TU. It works fine, but the analysis time went up by 20-30% on open source C projects.

I am curious which optimization strategies you considered. An idea that @NoQ came up with was to serialize only the most used translation units. Another idea is to choose the TUs that a particular file has most dependencies on and only inline functions from those.

What mode would you prefer? Would you pay the 20%-30% in speed but reduce the huge disk consumption? That might be a good option, especially, if you have not exhausted the ideas on how to optimize.

Is it OK to add this functionality in a next patch? Or should we add it as an optional feature right now?

This depends on what the plan for going forward is. Specifically, if we do not need the serialization mode, you could remove that from this patch and add the new mode. If you think the serialization mode is essential going forward, we could have the other mode in a separate patch. (It would be useful to split out the serialization mode work into a separate patch so that we could revert it later on if we see that the other mode is better.)

I see some changes to the compiler proper, such as ASTImporter, ASTContext, SourceManager. We should split those changes into a separate patch and ask for review from people working on those components. You can point back to this patch, which would contain the analyzer related changes, and state that the other patch is blocking this work.

Thanks.

It would be best to just add the scan-build-py support to the tree, especially since the new scripts are not tested.

OK. We will update this patch with the scan-build-py changes and remove the ctu-build.py and ctu-analyze.py scripts.

I am curious which optimization strategies you considered. An idea that @NoQ came up with was to serialize only the most used translation units. Another idea is to choose the TUs that a particular file has most dependencies on and only inline functions from those.

Both of these strategies could work in practice, but we did not try them. We implemented the two extremes: serialize all TUs, or don't serialize any of the TUs. Both of them could be useful in practice as is (depending on whether the user is CPU, memory, or disk space bound). I think we could try the optimizations suggested above as an incremental improvement (and keep this initial patch as simple as possible).

What mode would you prefer? Would you pay the 20%-30% in speed but reduce the huge disk consumption? That might be a good option, especially, if you have not exhausted the ideas on how to optimize.

In this initial version I think we should keep the serializing mode. We just measured the heap consumption of the non-serializing mode, and it seems to be ~50% larger, probably because the serializing mode only loads from disk those AST fragments that are imported. But I can imagine that some users still want to use the non-serializing version, which does not use extra disk space. So we will add the non-serializing mode in a next patch, as an Analyzer option first (which we can turn into the default behaviour later on). OK?

I see some changes to the compiler proper, such as ASTImporter, ASTContext, SourceManager. We should split those changes into a separate patch and ask for review from people working on those components. You can point back to this patch, which would contain the analyzer related changes, and state that the other patch is blocking this work.

All right, we will do that.

So is it OK to proceed like this?

Some of the CTU related analyzer independent parts are being factored out.
The review is ongoing here: https://reviews.llvm.org/D34512

Another small and independent part is under review here: https://reviews.llvm.org/D34506

Regarding serializing vs not serializing and now vs later.

1. I think we eventually need to provide a reasonable default approach presented to the user. This approach shouldn't be hurting the user dramatically in any sense. Because serializing hurts the user's disk space dramatically, and not-serializing may be slower and a bit more memory-intensive but isn't too bad in all senses, out of these two options not-serializing is definitely preferable as a default approach.

2. Later we should definitely consider the alternative approaches that serialize only some ASTs, with the hope that one of them would turn out to be a better default approach.

3. From 2. it follows that for now it's better to keep both approaches around, as we believe that the ideal approach may be a combination of the two. Therefore it doesn't really matter in what order they land.

tl;dr: I propose to land serialization-based approach first, then land non-serialization-based approach later and make it default, then consider taking the best of the two and making it a new default.

Richard (added as reviewer) usually owns decisions around clang itself. Writing an email to cfe-dev with the numbers and waiting to see whether others have concerns would probably also be good.

Patch scan-build instead of using custom scripts
Rebase patch based on the proposed LibTooling CTU code

xazax.hun added a reviewer: • rizsotto.Jul 11 2017, 7:57 AM

xazax.hun added a parent revision: D34512: Add preliminary Cross Translation Unit support library.Jul 12 2017, 7:44 AM

whisperity added a subscriber: • rizsotto.Jul 24 2017, 7:48 AM

whisperity mentioned this in D34512: Add preliminary Cross Translation Unit support library.Aug 9 2017, 9:21 AM

Alexander_Droste added a subscriber: Alexander_Droste.Aug 12 2017, 2:22 AM

danielmarjamaki added inline comments.Aug 31 2017, 3:47 AM

tools/scan-build-py/libscanbuild/analyze.py
166	I believe you can write: for line in open(filename, 'r'):
173	this 'with' seems redundant. I suggest an assignment and then less indentation will be needed below
243	not a big deal but I would use early exits in this function
tools/scan-build-py/libscanbuild/clang.py
177	I am guessing that you can use cmd.find() instead of the loop

The Python code here still uses mangled name in their wording. Does this mean this patch is yet to be updated with the USR management in the parent patch?

tools/scan-build-py/libscanbuild/analyze.py
166	Do we want to rely on the interpreter implementation for when the file is closed? If `for line in open(filename, 'r'): something()` is used, the file handle will be closed based on garbage collection rules. The handle is disposed of right after the iteration in the stock CPython implementation, but that is nonetheless implementation-specific behaviour, whereas using `with` will explicitly close the file handle on the spot, no matter what.
173	I don't understand what you want to assign to what.
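To make the point about `with` concrete, here is a small self-contained sketch. The helper name read_lines and the temporary file are made up for the demonstration and are not part of the patch.

```python
import os
import tempfile


def read_lines(filename):
    """Read all lines; the `with` block closes the handle when it exits."""
    with open(filename, 'r') as handle:
        lines = [line.rstrip('\n') for line in handle]
    # The handle is closed at this point on every Python implementation,
    # and it would also be closed if the loop had raised an exception.
    return lines, handle.closed


# Create a throwaway input file for the demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as out_file:
    out_file.write('first\nsecond\n')

lines, was_closed = read_lines(path)
os.remove(path)
```

With a bare `for line in open(filename)` there is no name left to call close() on, so the moment of closing is up to the interpreter.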

danielmarjamaki accepted this revision.Sep 1 2017, 1:07 AM

danielmarjamaki added inline comments.

tools/scan-build-py/libscanbuild/analyze.py
166	ok I did not know that. feel free to ignore my comment.
173	I did not consider the garbage collection. I assumed that out_file would always be closed when it went out of scope, and then this would require less indentation:
    out_file = open(extern_fns_map_file, 'w')
    for mangled_name, ast_file in mangled_ast_pairs:
        out_file.write('%s %s\n' % (mangled_name, ast_file))
243	with "not a big deal" I mean; feel free to ignore my comment if you want to have it this way.

This revision is now accepted and ready to land.Sep 1 2017, 1:07 AM

While testing this I stumbled upon a crash with the following test case:

inc.h

#define BASE ((int*)0)
void foo();

main.c:

#include "inc.h"
void moo()
{
    int a = BASE[0];
    foo();
}

other.c

#include "inc.h"
void foo()
{
    int a = BASE[0];
}

Note that I used a custom checker that did not stop on the path like the DerefChecker would here. I did not know how to reproduce it with official checkers, but the issue should be understandable without reproduction.

With the given test a checker may produce two results for the null dereference in moo() and foo(). When analyzing main.c they will both be found and therefore sorted with PathDiagnostic.cpp "compareCrossTUSourceLocs".

If either of the FullSourceLocs is a MacroID, the call SM.getFileEntryForID(XL.getFileID()) will return a null pointer. The null pointer will crash the program when attempting to call ->getName() on it.

My solution was to add the following lines before the .getFileID() calls:

XL = XL.getExpansionLoc();
YL = YL.getExpansionLoc();

lib/StaticAnalyzer/Core/PathDiagnostic.cpp
391	see comment

In a similar case, static inline functions are an issue.

inc.h

void foo();
static inline void wow()
{
    int a = *(int*)0;
}

main.c

#include "inc.h"
void moo()
{
    foo();
    wow();
}

other.c

#include "inc.h"
void foo()
{
    wow();
}

The inline function is inlined into each calling AST as different AST objects. This causes the PathDiagnostics to be distinct, while pointing to the exact same source location.

When the compareCrossTUSourceLocs function tries to compare the FullSourceLocs, it cannot find a difference and FlushDiagnostics will assert on an erroneous compare function.

For my purposes I replaced the return statement of the compareCrossTUSourceLocs function with:

return XL.getFileID() < YL.getFileID();

A more correct fix would create only one unique diagnostic for both cases.
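The idea of reporting only one diagnostic per source location can be illustrated with a short sketch. It is written in Python purely for illustration; the analyzer's diagnostics are C++ PathDiagnostic objects, and the dictionary fields, the function name unique_by_location, and the line and column values below are all hypothetical.

```python
def unique_by_location(diagnostics):
    """Keep the first diagnostic seen for each (file, line, column, message)."""
    seen = set()
    unique = []
    for diag in diagnostics:
        key = (diag['file'], diag['line'], diag['column'], diag['message'])
        if key in seen:
            # Same location and message reached through another TU: drop it.
            continue
        seen.add(key)
        unique.append(diag)
    return unique


# Two reports for the same spot in inc.h, found via different TUs.
reports = [
    {'file': 'inc.h', 'line': 4, 'column': 13,
     'message': 'null dereference', 'found_via': 'main.c'},
    {'file': 'inc.h', 'line': 4, 'column': 13,
     'message': 'null dereference', 'found_via': 'other.c'},
]
deduplicated = unique_by_location(reports)
```

Keeping the first occurrence makes the result depend on input order, so a real fix would also need a stable rule for which path to keep.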

nikhgupt added a subscriber: nikhgupt.Sep 22 2017, 9:31 AM

Fixed an issue with source locations
Updated to latest trunk

Herald added a subscriber: baloghadamsoftware. · View Herald TranscriptSep 25 2017, 2:35 AM

In D30691#878711, @r.stahl wrote:

If either of the FullSourceLocs is a MacroID, the call SM.getFileEntryForID(XL.getFileID()) will return a null pointer. The null pointer will crash the program when attempting to call ->getName() on it.

Thank you for the report and the detailed analysis!
I did not want to get the expansion location since the original code did not query that either, so for now I just handle the case when I can not query a file entry. This case can occur even when no macros are in the code, for example when a PCH is referring to a nonexistent file.

In D30691#878830, @r.stahl wrote:
For my purposes I replaced the return statement of the compareCrossTUSourceLocs function with:
return XL.getFileID() < YL.getFileID();
A more correct fix would create only one unique diagnostic for both cases.

Thank you for the report, I could reproduce this after modifying the null dereference checker. My fear of using file IDs is that I don't know whether they are stable. So in subsequent runs, different paths might be chosen by the analyzer and this could be confusing to the user. I will think about a workaround that is both stable and solves this assertion.

I see two possible ways to do the proper fix. One is to check explicitly for this case when the same function appears in multiple translation units. A better approach would be to have the ASTImporter handle this case. I think the proper fix is better addressed in a separate patch.

Please fix the incompatibility between analyze-build and lib/CrossTU in the format of the externalFnMap.txt mapping file.

tools/scan-build-py/libscanbuild/analyze.py
563	There is an incompatibility between this scan-build (analyze-build, actually) implementation and the new lib/CrossTU library. CrossTranslationUnitContext::loadExternalAST(StringRef LookupName, StringRef CrossTUDir, StringRef IndexName) expects externalFnMap.txt to be in "functionName astFilename" format; however, we currently generate "functionName@arch astFilename" lines here. One possible fix would be to create one externalFnMap.txt index per arch (<collect-dir>/ast/x86_64/externalFnMap.txt, <collect-dir>/ast/ppc64/externalFnMap.txt, etc.) and call clang analyze with the architecture-specific map directory, e.g. ctu-dir=<collect-dir>/ast/x86_64. This would then work if the "to-be-analyzed" source code is cross-compiled for multiple architectures. It would also be useful to add a test case to check that the map file and ctu-dir content generated by analyze-build is compatible.
613	Maybe we could use the full target-triple for distinguishing the AST binaries, not only the architecture part. The sys part for example is probably important too and a "win32" AST may not be compatible with a "linux" AST.
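A rough sketch of the per-architecture split proposed in the comment on line 563 could look like this. It assumes input lines of the form "functionName@arch astFilename"; the function names split_map_by_arch and map_path and the sample entries are hypothetical, and a full target triple could take the place of the architecture as suggested in the comment on line 613.

```python
from collections import defaultdict
import os


def split_map_by_arch(lines):
    """Group 'name@arch astfile' lines into {arch: ['name astfile', ...]}."""
    per_arch = defaultdict(list)
    for line in lines:
        tagged_name, ast_file = line.strip().split(' ', 1)
        # rpartition splits at the last '@', so '@' characters that are
        # part of the name itself (USRs contain them) are preserved.
        name, _, arch = tagged_name.rpartition('@')
        per_arch[arch].append('%s %s' % (name, ast_file))
    return dict(per_arch)


def map_path(collect_dir, arch):
    """Location of the index for one architecture under the collect dir."""
    return os.path.join(collect_dir, 'ast', arch, 'externalFnMap.txt')


# Hypothetical combined index, for illustration only.
combined = [
    'c:@F@foo#@x86_64 ast/x86_64/other.c.ast',
    'c:@F@foo#@ppc64 ast/ppc64/other.c.ast',
    'c:@F@bar#@x86_64 ast/x86_64/chain.c.ast',
]
by_arch = split_map_by_arch(combined)
```

Each per-architecture list would then be written to map_path(collect_dir, arch), and the analyzer would be pointed at the matching directory.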

This revision now requires changes to proceed.Oct 15 2017, 7:28 AM

Update the scan-build part to work correctly with the accepted version of libCrossTU

zaks.anna added a reviewer: george.karpenkov.Oct 23 2017, 11:56 AM

nikhgupt added a subscriber: tobiasvk.Oct 23 2017, 1:17 PM

Rebased on ToT

george.karpenkov requested changes to this revision.Nov 27 2017, 2:34 PM

george.karpenkov added inline comments.

tools/scan-build-py/libscanbuild/analyze.py
45	What would happen when multiple analyzer runs are launched concurrently in the same directory? Would they race on this file?
65	`run_analyzer_with_ctu` is an unfortunate function name, since it may or may not launch CTU depending on the passed arguments. Could we just call it `run_analyzer`?
104	Actually I would prefer a separate NFC PR just moving this function. This PR is already too complicated as it is.
121	Extensive `hasattr` usage is often a codesmell, and it hinders understanding. Firstly, shouldn't `args.ctu_phases.dir` be always available, judging from the code in `arguments.py`? Secondly, in what cases `args.ctu_phases` is not available when we are already going down the CTU code path? Shouldn't we throw an error at that point instead of creating a default configuration? (default-configurations-instead-of-crashing is also another way to introduce very subtle and hard-to-catch error modes)
129	Could you also specify what `func_map_lines` is (file? list? of strings?) in the docstring? There's specific Sphinx syntax for that. Otherwise it is hard to follow.
139	Could be improved with `defaultdict`: `mangled_to_asts = defaultdict(set)` ... `mangled_to_asts[mangled_name].add(ast_file)`
145	Firstly, no need to modify the set in order to get the first element, just use `next(iter(ast_files))`. Secondly, when exactly do conflicting names happen? Is it a bug? Shouldn't the tool log such cases?
146	Overall, instead of creating a dictionary with multiple elements, and then converting to a list, it's much simpler to only add an element to `mangled_to_asts` when it is not already mapped to something (probably logging a message otherwise), and then just return `mangled_to_asts.items()`
161	Firstly, is `glob.glob` actually random, as the docstring is saying? If yes, can we make it not to be random (e.g. with `sorted`), as dealing with randomized input makes tracking down bugs so much harder?
169	If `write_global_map` is a closure, please use the fact that it would capture `ctudir` from the outer scope, and remove it from the argument list. That would make it more obvious that it does not change between invocations.
190	Having an analysis tool remove files is scary, what if (maybe not in this revision, but in a future iteration) a bug is introduced, and the tool removes user code instead? Why not just create a temporary directory with `tempfile.mkdtemp`, put all temporary files there, and then simply iterate through them? Then you would be able to get rid of the constant `CPU_TEMP_FNMAP_FOLDER` entirely, and OS would be responsible for cleanup.
244	Similarly to the comment above, I would prefer if analysis tool would not remove files (and I assume those are not huge ones?) Can we just use temporary directories?
255	Same as the comment above about removing folders. Also it seems like there should be a way to remove redundancy in `if collect / remove tree` block repeated twice.
285	Using JSON-serialization-over-environment-variables is very unorthodox, and would give very bizarre error messages if the user would try to customize settings (I assume that is the intent, right?) I think that separating those options into separate ENV variables (if they _have_ to be customizable) would be much better.
319	Again, is it possible to avoid JSON-over-environment-variables?
557	`mangled_name, path = fn_src_txt.split(" ", 1)` ?
584	`try/except/pass` is almost always bad. When can the error occur? Why are we ignoring it?
590	The above can be written more succinctly as: `ast_command = [opts['clang'], ...] + args + ['-w', ...]`
601	Similarly here, `funcmap_command` can be generated in one line using `+`
611	Again, why is this error ignored?
632	In which case is this branch hit? Isn't improperly formed input argument indicative of an internal error at this stage?
718	This blank line should not be in this PR.
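The suggestions on lines 139 to 146 (use `defaultdict`, read the single element with `next(iter(...))`, and log conflicting names) can be combined in a sketch like the one below. It is not the code from the patch; the function name unique_mappings and the sample pairs are hypothetical.

```python
from collections import defaultdict
import logging


def unique_mappings(pairs):
    """Return sorted (name, ast_file) pairs for names found in exactly one AST file."""
    name_to_asts = defaultdict(set)
    for name, ast_file in pairs:
        name_to_asts[name].add(ast_file)

    result = []
    for name, ast_files in sorted(name_to_asts.items()):
        if len(ast_files) == 1:
            # Read the single element without modifying the set.
            result.append((name, next(iter(ast_files))))
        else:
            # Ambiguous definition: leave it out, but say so.
            logging.warning('skipping %s: defined in %d AST files',
                            name, len(ast_files))
    return result


pairs = [
    ('foo', 'a.ast'),
    ('bar', 'b.ast'),
    ('bar', 'c.ast'),   # conflicting definition of bar
    ('foo', 'a.ast'),   # exact duplicate, harmless
]
mappings = unique_mappings(pairs)
```

All occurrences have to be counted before deciding, so the set-per-name structure is kept here rather than adding a name only when it is first seen.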
tools/scan-build-py/libscanbuild/clang.py
168	I might be missing something here, but why is the ability to call `--version` indicative of CTU support? At worst, this can lead to obscuring real bugs: imagine if the user has `args.clang` pointed to broken/non-existent binary, then `is_ctu_capable` would simply return `False` (hiding the original error!), which would show a completely misleading error message. Just checking `func_map_cmd` seems better, but even in this case we should probably log any errors occurring on `-version` call (such messages would really aid debugging)
177	Seconded, would prefer this rewritten using `separator = cmd.find('-triple')`
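As a sketch of the loop-free lookup suggested for line 177: if `cmd` is a list of argument strings (an assumption here), lists offer `index()` rather than `find()`, and `index()` raises ValueError when the item is missing. The function name triple_arch_from_args and the sample command are hypothetical.

```python
def triple_arch_from_args(cmd):
    """Return the architecture part of the value following '-triple', or ''."""
    try:
        separator = cmd.index('-triple')
    except ValueError:
        # No '-triple' flag in the command at all.
        return ''
    if separator + 1 >= len(cmd):
        # '-triple' was the last argument, so there is no value to read.
        return ''
    return cmd[separator + 1].split('-')[0]


example_cmd = ['clang', '-cc1', '-triple', 'x86_64-unknown-linux-gnu', '-emit-obj']
arch = triple_arch_from_args(example_cmd)
```

If `cmd` were a single string instead, `str.find` would return a character offset, which would need different handling.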
tools/scan-build-py/libscanbuild/report.py
270	I understand the intent here, but it seems it should be handled at a different level: would it be hard to change Clang to only write the report file at the very end, when no crash should be encountered? Or make parsers not choke on empty fields?
tools/scan-build-py/tests/unit/test_analyze.py
338	Probably more tests are required for almost 400 lines of functional Python code in this PR. Would it be hard to have a full LIT-style integration test? E.g. have a dummy script emulating Clang with a dummy directory structure, which would show how all pieces are meant to fit together?

This revision now requires changes to proceed.Nov 27 2017, 2:34 PM

Thanks George for the review. I will start working on the code right away. I've tried to answer the simpler cases.

tools/scan-build-py/libscanbuild/analyze.py
45	Yes and no. The 1st, collect part of CTU creates this file by aggregating all data from the build system, while the 2nd part which does the analysis itself only reads it. Multiple analysis can use this file simultaneously without problem. However, multiple collect phases launched on a system does not make sense. In this case, the later one would write over the previous results with the same data.
65	We have another run_analyzer method, but still, it is a good idea. I will come up with something better.
104	Yes. The only reason was for the move to make it testable. However, we need more tests as you wrote down below.
121	It definitely needs more comments, so thanks, I will put them here. There are two separate possibilities here for this code part: (1) The user does not want CTU at all. In this case, no CTU phases are switched on (collect and analyze), so no CTU code will run. This is why dir has no importance in this case. (2) CTU capabilities were not even detected, so the help and available arguments about CTU are also missing. In this case we create a dummy config telling that everything is switched off. The reason for using hasattr was that not having CTU capabilities is not an exceptional condition, rather a matter of configuration. I felt that handling this with an exception would be misleading.
129	Thanks, I'll do it.
145	Nice catch, thanks. For your second question: Unfortunately, it is not a bug, rather a misfeature which we can't handle yet. There can be tricky build-systems where a single file with different configurations is built multiple times, so the same function signature will be present in multiple link units. The other option is when two separate compile units have an exact same function signature, but they are never linked together, so it is not a problem in the build itself. In this case, the cross compilation unit functionality cannot tell exactly which compilation unit to turn to for the function, because there are multiple valid choices. In this case, we deliberately leave such functions out to avoid potential problems. It deserves a comment I think.
161	"random" here means we don't care. So you are right. The glob.glob doc says the order is arbitrary, but I couldn't find out whether it is deterministic or not. Sorting it will probably do no harm.
190	Yes, you are right. We are essentially using a temp dir. Because of the size we first had to put it next to the project (not on tmp drive for instance) and for debugging purposes we gave a name to it. Still it can be done with mkdtemp as well.
244	Unlike above, here we do remove non-temporary data intentionally. The user asks here to do the recollection of CTU data for a fresh start. Because there is no "clean" functionality in the analyzer interface itself, this seemed to be the easiest-on-user solution to save him/her an extra effort or a new command.
285	To be honest, I could never make a decision over this. The previous version of scan-build-py used environment variables extensively for everything, which ended up in a huge mess and env contamination. There was an effort to reduce this to only a few, well-designated ones. Still, there is the need to add new data to the environment.
584	I think this code is redundant with the if above.
632	Another part of scan-build-py, analyze_cc, uses a namedtuple-to-JSON format to communicate. However, the names do not come back from JSON, so this code restores them. This is the case when someone uses the whole toolset with compiler wrapping. All the environment variable hassle is also happening because of this. So these env vars are not for user modification (as you've suggested earlier).
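The loss of field names mentioned in the reply on line 632 is easy to reproduce. In the sketch below, the tuple name CtuConfig and its fields are assumptions chosen for illustration, not necessarily the ones used in the patch.

```python
import json
from collections import namedtuple

# Hypothetical configuration tuple; field names are illustrative.
CtuConfig = namedtuple('CtuConfig', ['collect', 'analyze', 'dir', 'func_map_cmd'])

config = CtuConfig(collect=True, analyze=False,
                   dir='ctu-dir', func_map_cmd='clang-func-mapping')

# A namedtuple is a tuple subclass, so json encodes it as a plain array
# and the field names are dropped.
encoded = json.dumps(config)

# Decoding gives back an ordinary list ...
decoded = json.loads(encoded)

# ... so the receiving side has to rebuild the named tuple itself.
restored = CtuConfig(*decoded)
```

This is why the receiving code has to handle both shapes, or, as suggested later in the thread, settle on plain tuples throughout.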
tools/scan-build-py/libscanbuild/clang.py
168	The original idea was that clang can give information about CTU support itself. However, it never happened because the analyzer is so deep down in the system. So I am open to remove the clang binary check here. However, clang binary is needed anyway, so the whole toolset will still throw an error later on not having a clang binary.

The code modifications are coming soon (after doing some extensive testing) for the scan-build part.

tools/scan-build-py/tests/unit/test_analyze.py
338	You are right. The testing infra in scan-build-py is not right anyway (uses nosetests). However, this should be a new patch as you've mentioned earlier.

gerazo added inline comments.Dec 6 2017, 7:17 AM

tools/scan-build-py/libscanbuild/analyze.py
146	The reason for the previous approach is that we need to count the number of occurrences of the different mappings and only let those pass through which don't have multiple variations.
190	Finally, I came to the conclusion that mkdtemp would not be better than the current solution. For other threads to find the directory we created, it needs a designated name. Suffixing it with a generated name would further complicate things, as we must not allow multiple concurrent runs here. The current solution is more robust from this point of view.
243	I've checked it through. The only place for an early exit now would be before the else. The 1st and 2nd ifs are in fact non-orthogonal.
255	The previous call for data removal happens because the user asked for a collect run, so we clean data to do a recollection. This second one happens because the user asked for a full recollection and analysis run all in one. So here we destroy the temp data for the user's convenience. This happens after, not before like previously. The default behavior is to do this when the user uses the tool the easy way (collect and analyze all in one), and we intentionally keep collection data if the user only asks for a collect or an analyze run. So with this, the user can use a collect run's results for multiple analyze runs. This is written in the command line help. I will definitely put comments here to explain.
319	There is an other thing against changing this. Currently the interface here using env variables is used by intercept-build, analyze-build and scan-build tool as well. In order to drop json, we need to change those tools too. It would be a separate patch definitely.
584	Here the folders are created on demand. Because these are created in parallel by multiple processes, there is a small chance that another process already created the folder between the isdir check and the makedirs call. This is why the pass is needed to make it always run correctly. I will add a comment.
590	After several iterations of the code, I find it easier to version control such multiline constructs. If someone changes a data source, it is clear which one (which line) was modified. The succinct notation does not allow clean VCS annotations.
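For the race described in the reply on line 584, a narrower alternative to `try/except/pass` is to ignore only the "already exists" error. The sketch below is one possible shape and not the code from the patch; the helper name ensure_dir and the directory layout are hypothetical.

```python
import errno
import os
import shutil
import tempfile


def ensure_dir(path):
    """Create `path` if needed, tolerating another process creating it first."""
    try:
        os.makedirs(path)
    except OSError as error:
        # Swallow only "already exists as a directory"; re-raise the rest
        # (permission problems, a regular file in the way, and so on).
        if error.errno != errno.EEXIST or not os.path.isdir(path):
            raise


# Demonstration in a throwaway directory.
base = tempfile.mkdtemp()
target = os.path.join(base, 'ast', 'x86_64')
ensure_dir(target)
ensure_dir(target)  # second call is a no-op instead of an error
created = os.path.isdir(target)
shutil.rmtree(base)
```

On Python 3, `os.makedirs(path, exist_ok=True)` covers the same case in one call.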

george.karpenkov added inline comments.Dec 6 2017, 2:03 PM

tools/scan-build-py/libscanbuild/analyze.py
45	"However, multiple collect phases launched on a system does not make sense." Why not? What about a system with multiple users, where code is in the shared directory? Or even simpler: I've launched a background process, forgot about it, and then launched it again? "In this case, the later one would write over the previous results with the same data." That is probably fine; I am more worried about racing, where process B would be reading a partially overwritten file (not completely sure whether it is possible).
104	Sure, but that separate PR can also include tests.
121	Right, but instead of doing (1) and (2) can we have a separate (maybe hidden from user) param called e.g. `ctu_enabled` which would explicitly communicate that?
145	"In this case, we deliberately leave such functions out to avoid potential problems." We probably want to log it, don't we?
146	ah, OK!
190	OK
244	OK!
319	OK I didn't know that the JSON interface was used by other tools. In that case, ignore my comment.
590	OK. Though you could still split the addition across multiple lines with `\`
632	OK so `opts['ctu']` is a tuple or a named tuple depending on how this function is entered? BTW could you point me to the `analyze_cc` entry point? For the purpose of having more uniform code with fewer cases to care about, do you think we could just use ordinary tuples instead of constructing a named one, since we have to deconstruct an ordinary tuple in any case?
tools/scan-build-py/libscanbuild/arguments.py
376	BTW can we also explicitly add `dest='ctu_dir'` here, as otherwise I was initially very confused as to where the variable is set.
tools/scan-build-py/libscanbuild/clang.py
168	"so the whole toolset will still throw an error later on not having a clang binary." Of course, but I think that would be easier to debug, and the error would mean that Clang is not available, not that CTU is not working.
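
The `dest='ctu_dir'` request for arguments.py can be illustrated with a small sketch. The flag name, default and help text below are assumptions made for the example, not a copy of the patch. `argparse` would derive `ctu_dir` from `--ctu-dir` on its own; spelling it out only makes the attribute easy to find when searching the code.

```python
import argparse


def create_parser():
    """Build a tiny parser; the flag and help text are illustrative only."""
    parser = argparse.ArgumentParser(prog='analyze-build-sketch')
    parser.add_argument(
        '--ctu-dir',
        metavar='<ctu-dir>',
        dest='ctu_dir',  # explicit, so a search for 'ctu_dir' lands here
        default='ctu-dir',
        help='directory for the collected CTU data (hypothetical wording)')
    return parser


args = create_parser().parse_args(['--ctu-dir', '/tmp/ctu'])
defaults = create_parser().parse_args([])
```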

@gerazo addressed some review comments in scan-build-py.

Herald added a subscriber: rnkovacs. Dec 7 2017, 9:54 AM

Python part looks good to me. I don't know whether @dcoughlin or @NoQ would want to insert additional comments on C++ parts.

gerazo added inline comments. Dec 11 2017, 5:30 AM

tools/scan-build-py/libscanbuild/analyze.py
45	I see your point. In order to create the multiple user scenario you've mentioned, those users need to give the exact same output folders for their jobs. Our original bet was that our users are not doing this, as the scan-build tool itself cannot be used this way either. Still, the case of a forgotten background process is something to consider. I will plan some kind of locking mechanism to avoid situations like this.
632	Using a NamedTuple improves the readability of the code a lot, with fewer comments. It is unfortunate that serializing it is not solved by Python. I think moving this code to the entry point would make the whole thing much nicer. The entry point is at analyze_compiler_wrapper.
tools/scan-build-py/libscanbuild/arguments.py
376	Yes, of course.
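
The named tuple discussion for line 632 can be sketched as follows, assuming the value travels as JSON (the thread mentions a JSON-based env-variable interface). The field names of `CtuConfig` and the helper names are assumptions for illustration. The point is that a named tuple serializes as a plain JSON array, so the field names are lost and the receiving side has to rebuild the named tuple once, ideally at the entry point.

```python
import collections
import json

# Hypothetical field set, chosen only for this example.
CtuConfig = collections.namedtuple('CtuConfig', ['collect', 'analyze', 'dir'])


def encode_ctu(config):
    """A named tuple is a tuple subclass, so JSON stores it as an array."""
    return json.dumps(config)


def decode_ctu(text):
    """Rebuild the named tuple from the positional JSON array."""
    return CtuConfig(*json.loads(text))


original = CtuConfig(collect=True, analyze=False, dir='ctu-dir')
wire = encode_ctu(original)
restored = decode_ctu(wire)
```

After the round trip, `restored.dir` is readable by name again, while `json.loads(wire)` alone yields only an ordinary list.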

Further improvements to python script part.

george.karpenkov added inline comments. Dec 12 2017, 11:28 AM

tools/scan-build-py/libscanbuild/report.py
257	Minor nitpicking: type comments are semi-standardized with Sphinx-style auto-generated documentation, and should be a part of the docstring.

I've tried using the patch, and I got blocked at the following: CTU options are only exposed when one goes through analyze-build frontend, which requires compile_commands.json to be present. I've used libear to generate compile_commands.json, but the generated JSON does not contain the command field, which causes @require before run to die (also, due to the passing style this error was unnecessarily difficult to debug).
So could you write some short documentation somewhere on how all the pieces fit together? What entry point should be used, what should people do who don't have a build system-generated compile_commands.json, etc.

This revision now requires changes to proceed. Dec 13 2017, 5:26 PM

Some comments on the C++ inline.

include/clang/AST/ASTContext.h
82 ↗	(On Diff #126543)	Is this forward declaration needed?
include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
63	I don't think CrossTranslationUnitContext belongs in ExprEngine (it doesn't really have much to do with transfer functions and graph construction). Can you move it to AnalysisDeclContextManager instead? Also, when you move it to AnalysisManager can you make it a pointer that is NULL when naive cross-translation support is not enabled? This will make it more clear that the cross-translation unit support will not always be available.
lib/StaticAnalyzer/Core/AnalyzerOptions.cpp
400	Can you also add an analyzer option that is something like 'enable-naive-cross-translation-unit-analysis' and defaults to false? I'd like to avoid using the presence of 'ctu-dir' as an indication that the analyzer should use the naive CTU analysis. This way, if we add a less naive CTU analysis, we'll be able to use the CTUDir for analysis artifacts (such as summaries) for the less naive CTU analysis as well.
lib/StaticAnalyzer/Core/CallEvent.cpp
379	This downcast is an indication that the CTUContext is living in the wrong class.
386	Can this logic be moved to AnalysisDeclContext->getBody()? CallEvent::getRuntimeDefinition() is really all about modeling function dispatch at run time. It seems odd to have the cross-translation-unit loading (which is about compiler book-keeping rather than semantics) here.
396	I don't think it makes sense to diagnose index errors here. Doing it during analysis means that, for example, the parse error could be emitted or not emitted depending on whether the analyzer thinks a particular call site is reached. It would be better to validate/parse the index before starting analysis rather than during analysis itself.
lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
199	There is no need for AnalysisConsumer to take both a CompilerInstance and a Preprocessor. You can just call `getPreprocessor()` to get the preprocessor from the CompilerInstance.

In D30691#954740, @george.karpenkov wrote:

I've tried using the patch, and I got blocked at the following: CTU options are only exposed when one goes through analyze-build frontend, which requires compile_commands.json to be present. I've used libear to generate compile_commands.json, but the generated JSON does not contain the command field, which causes @require before run to die (also, due to the passing style this error was unnecessarily difficult to debug).
So could you write some short documentation somewhere on how all the pieces fit together? What entry point should be used, what should people do who don't have a build system-generated compile_commands.json, etc.

Basically this patch does not change the way scan-build-py works. You can either create a compile_commands.json using intercept-build and then use analyze-build to process the result, or you can do the two together using scan-build. We only offer the CTU functionality in analyze-build, because CTU needs multiple runs anyway, so capturing and analyzing build actions on the spot is not possible with CTU. The general usage of scan-build-py is described in clang/tools/scan-build-py/README.md.

xazax.hun marked 6 inline comments as done. Dec 19 2017, 5:29 AM

xazax.hun added inline comments.

lib/StaticAnalyzer/Core/CallEvent.cpp
386	I just tried that and unfortunately, that introduces cyclic dependencies. CrossTU depends on Frontend. Frontend depends on Sema. Sema depends on Analysis. Making Analysis depending on CrossTU introduces the cycle.
396	While I agree, right now this validation is not part of the analyzer but the responsibility of the "driver" script, for example CodeChecker. It is useful to have these diagnostics to catch bugs in the driver.

Address some review comments
Rebased on ToT

xazax.hun added inline comments. Dec 19 2017, 7:11 AM

lib/StaticAnalyzer/Core/AnalyzerOptions.cpp
407	This option might not be the most readable but this way experimental is a prefix. Do you prefer this way or something like `enable-experimental-naive-ctu-analysis`?

Had a fresh look on the C++ part, it is super clean now, i'm very impressed :)

lib/StaticAnalyzer/Core/CallEvent.cpp
373–394	Humbly suggests to refactor whatever we need here into `SubEngine`'s virtual method(s). `getAnalysisManager()` is already there, so i guess we only need to expose `getCrossTranslationUnitContext()`.
385	I think `CallEvent` is the last place where i'd prefer to hardcode this filename. Maybe hardcode it in `CrossTranslationUnitContext` or get from `AnalyzerOptions`? (the former, i guess, because i doubt anybody would ever want to change it).
lib/StaticAnalyzer/Core/PathDiagnostic.cpp
418–424	It seems to me that `XDL` and `YDL` are exactly the same as `XL` and `YL` we've seen at the beginning of the function. ...we still have only one `SourceManager`, right?

xazax.hun added inline comments. Jan 11 2018, 1:12 PM

lib/StaticAnalyzer/Core/PathDiagnostic.cpp
418–424	Is this true? One is the location associated with the PathDiagnostic; the other is the location of the Decl associated with the issue. I do not have a deep understanding of this part of the code, but I am not sure these are guaranteed to be the same.

NoQ added inline comments. Jan 11 2018, 1:25 PM

lib/StaticAnalyzer/Core/PathDiagnostic.cpp
418–424	Whoops, you're totally right, never mind. Comments would probably have helped me understand that faster.

Fixed review comments

xazax.hun added inline comments. Jan 11 2018, 1:32 PM

lib/StaticAnalyzer/Core/PathDiagnostic.cpp
418–424	Sure, comments might help, but I want to keep the changes in this patch focused and minimal, so I prefer to do such modifications in a separate patch.

nikhgupt added inline comments. Jan 25 2018, 3:43 PM

lib/StaticAnalyzer/Core/PathDiagnostic.cpp
395	getName could yield incorrect results if two files in the project have the same name. This might break the assert for PathDiagnostics 'total ordering' and 'uniqueness'. Maybe replacing FileEntry's getName with FullSourceLoc's getFileID could resolve this.
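
The ordering concern can be illustrated with a toy model. This is not clang's code: `Loc` and both comparison functions are hypothetical, and the file ID is modeled as a plain integer. It only shows that comparing by file name cannot order two distinct locations in different files that share a name, while comparing by a per-file ID can.

```python
import collections

# Toy stand-in for a source location; all fields are assumptions.
Loc = collections.namedtuple('Loc', ['file_id', 'file_name', 'line', 'column'])


def less_by_name(x, y):
    """Compare by file name first, then by position."""
    return (x.file_name, x.line, x.column) < (y.file_name, y.line, y.column)


def less_by_id(x, y):
    """Compare by a per-file ID first, then by position."""
    return (x.file_id, x.line, x.column) < (y.file_id, y.line, y.column)


# Two different files with the same name and the same position.
first = Loc(file_id=1, file_name='util.cpp', line=10, column=3)
second = Loc(file_id=2, file_name='util.cpp', line=10, column=3)

name_orders_them = less_by_name(first, second) or less_by_name(second, first)
id_orders_them = less_by_id(first, second) or less_by_id(second, first)
```

Here `name_orders_them` is false although `first != second`, which is the situation in which a total-ordering or uniqueness assertion would fail.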

Herald added a subscriber: hintonda. Jan 25 2018, 3:43 PM

xazax.hun marked 71 inline comments as done. Feb 9 2018, 4:19 AM

Python code looks OK to me. I have one last request: could we have some short documentation on how the whole thing is supposed to work in integration, preferably on an available open-source project any reader could check out?
I am asking because I have actually tried and failed to launch this CTU patch on a project I was analyzing.

hintonda removed a subscriber: hintonda. Feb 10 2018, 3:18 AM

gerazo added inline comments. Feb 12 2018, 6:38 AM

tools/scan-build-py/libscanbuild/analyze.py
718	Scheduled to be done.
tools/scan-build-py/libscanbuild/report.py
270	After careful investigation, I have to say it would be very hard to tell whether we've done this right in the current setup. Unlike the rest of clang, ASTImporter.cpp is not a strong part of the system. It has a lot of ongoing fixes and probably more problems (missing CXX functionality) on the way. This is the part which makes CTU less reliable than clang in general. In order to elegantly survive a problem in this unit, we need to leave the other things intact and fix ASTImporter instead (this is an ongoing effort). Until then, this fix seems good enough.

Rebased to current ToT
Fixed a problem that the scan-build-py used an old version of the ctu configuration option
Added a short guide how to use CTU

In D30691#1003514, @george.karpenkov wrote:

Python code looks OK to me. I have one last request: could we have some short documentation on how the whole thing is supposed to work in integration, preferably on an available open-source project any reader could check out?
I am asking because I have actually tried and failed to launch this CTU patch on a project I was analyzing.

We added the documentation. Could you recheck? Thanks in advance!

lib/StaticAnalyzer/Core/PathDiagnostic.cpp
395	Thank you, this is a known problem that we plan to address in a follow-up patch.

Thanks Gabor, this looks good to me. Please commit!

george.karpenkov accepted this revision. Feb 27 2018, 10:04 PM

This revision was not accepted when it landed; it landed in state Needs Review. Feb 28 2018, 5:25 AM

Closed by commit rL326323: [analyzer] Support for naive cross translation unit analysis (authored by xazax).

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Feb 28 2018, 5:25 AM

This revision was not accepted when it landed; it landed in state Needs Review. Feb 28 2018, 5:25 AM

Closed by commit rC326323: [analyzer] Support for naive cross translation unit analysis (authored by xazax).

This revision was automatically updated to reflect the committed changes.

whisperity awarded a token. Feb 28 2018, 5:40 AM

The changes were reverted: http://llvm.org/viewvc/llvm-project?rev=326432&view=rev
Gabor, could you take a look?

@ilya-biryukov

Could you please provide some more details where the cyclic dependency is? I cannot reproduce the problem and usually cmake fails when there is a cyclic dependency and you are building shared libraries.

Resubmitted in https://reviews.llvm.org/rL326439

Phabricator did not display the mailing list conversation. The point is, the circular dependency does not exist in the upstream version of clang. The reason is that CMake does not track the header includes as dependencies. There might be some layering violations with the header includes but those are independent of this patch and need to be fixed separately.

r.stahl mentioned this in D45564: [analyzer] Fix null deref in AnyFunctionCall::getRuntimeDefinition. Apr 12 2018, 5:14 AM

r.stahl mentioned this in D48474: [analyzer][ctu] fix unsortable diagnostics. Jun 22 2018, 1:24 AM

Just noticed: getRuntimeDefinition() has a lot of overrides in CallEvent sub-classes, and there are paths that don't defer to AnyFunctionCall::getRuntimeDefinition(), e.g., CXXInstanceCall::getRuntimeDefinition() => if (MD->isVirtual()) .... Which means that for some calls we aren't even trying to make a CTU lookup.

Herald added a subscriber: mikhail.ramalho. Jul 16 2018, 2:57 PM

Szelethus added a subscriber: Szelethus. Jul 17 2018, 8:39 AM

Which means that for some calls we aren't even trying to make a CTU lookup.

Thanks @NoQ, we will take a look at it!

@NoQ , @dkrupp

CallEvent::getRuntimeDefinition is overridden in

AnyFunctionCall
CXXInstanceCall
CXXMemberCall
CXXDestructorCall
ObjCMethodCall

AnyFunctionCall handles the CTU logic.
CXXInstanceCall calls into AnyFunctionCall if the function is not virtual.
If the function is virtual then we try to find the Decl of it via the dynamic type info.
At this point, we could get the definition of that function via the CTU logic, indeed.
This is something we should implement in the future.

However, it seems like this is the only case (not considering ObjC).
CXXMemberCall calls back to AnyFunctionCall or to CXXInstanceCall.
CXXDestructorCall does the same.

Just added a new entry to our roadmap: https://github.com/Ericsson/clang/issues/435

Aha, cool, so it's probably just virtual functions.

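Revision Contents

Path	Size
include/clang/StaticAnalyzer/Core/AnalyzerOptions.h	21 lines
include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h	17 lines
include/clang/StaticAnalyzer/Core/PathSensitive/SubEngine.h	7 lines
lib/StaticAnalyzer/Core/AnalyzerOptions.cpp	23 lines
lib/StaticAnalyzer/Core/CMakeLists.txt	1 line
lib/StaticAnalyzer/Core/CallEvent.cpp	23 lines
lib/StaticAnalyzer/Core/ExprEngine.cpp	32 lines
lib/StaticAnalyzer/Core/PathDiagnostic.cpp	19 lines
lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp	15 lines
lib/StaticAnalyzer/Frontend/CMakeLists.txt	1 line
test/Analysis/Inputs/ctu-chain.cpp	20 lines
test/Analysis/Inputs/ctu-other.cpp	67 lines
test/Analysis/Inputs/externalFnMap.txt	13 lines
test/Analysis/ctu-main.cpp	58 lines
tools/scan-build-py/libscanbuild/__init__.py	3 lines
tools/scan-build-py/libscanbuild/analyze.py	279 lines
tools/scan-build-py/libscanbuild/arguments.py	76 lines
tools/scan-build-py/libscanbuild/clang.py	27 lines
tools/scan-build-py/libscanbuild/report.py	22 lines
tools/scan-build-py/tests/unit/test_analyze.py	84 lines
tools/scan-build-py/tests/unit/test_clang.py	12 lines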

Diff 129509

include/clang/StaticAnalyzer/Core/AnalyzerOptions.h

Show First 20 Lines • Show All 278 Lines • ▼ Show 20 Lines	private:
Optional<bool> WidenLoops;		Optional<bool> WidenLoops;

/// \sa shouldUnrollLoops		/// \sa shouldUnrollLoops
Optional<bool> UnrollLoops;		Optional<bool> UnrollLoops;

/// \sa shouldDisplayNotesAsEvents		/// \sa shouldDisplayNotesAsEvents
Optional<bool> DisplayNotesAsEvents;		Optional<bool> DisplayNotesAsEvents;

		/// \sa getCTUDir
		Optional<StringRef> CTUDir;

		/// \sa getCTUIndexName
		Optional<StringRef> CTUIndexName;

		/// \sa naiveCTUEnabled
		Optional<bool> NaiveCTU;


/// A helper function that retrieves option for a given full-qualified		/// A helper function that retrieves option for a given full-qualified
/// checker name.		/// checker name.
/// Options for checkers can be specified via 'analyzer-config' command-line		/// Options for checkers can be specified via 'analyzer-config' command-line
/// option.		/// option.
/// Example:		/// Example:
/// @code-analyzer-config unix.Malloc:OptionName=CheckerOptionValue @endcode		/// @code-analyzer-config unix.Malloc:OptionName=CheckerOptionValue @endcode
/// or @code-analyzer-config unix:OptionName=GroupOptionValue @endcode		/// or @code-analyzer-config unix:OptionName=GroupOptionValue @endcode
/// for groups of checkers.		/// for groups of checkers.
▲ Show 20 Lines • Show All 285 Lines • ▼ Show 20 Lines	public:
/// Returns true if the bug reporter should transparently treat extra note		/// Returns true if the bug reporter should transparently treat extra note
/// diagnostic pieces as event diagnostic pieces. Useful when the diagnostic		/// diagnostic pieces as event diagnostic pieces. Useful when the diagnostic
/// consumer doesn't support the extra note pieces.		/// consumer doesn't support the extra note pieces.
///		///
/// This is controlled by the 'extra-notes-as-events' option, which defaults		/// This is controlled by the 'extra-notes-as-events' option, which defaults
/// to false when unset.		/// to false when unset.
bool shouldDisplayNotesAsEvents();		bool shouldDisplayNotesAsEvents();

		/// Returns the directory containing the CTU related files.
		StringRef getCTUDir();

		/// Returns the name of the file containing the CTU index of functions.
		StringRef getCTUIndexName();

		/// Returns true when naive cross translation unit analysis is enabled.
		/// This is an experimental feature to inline functions from another
		/// translation units.
		bool naiveCTUEnabled();

public:		public:
AnalyzerOptions() :		AnalyzerOptions() :
AnalysisStoreOpt(RegionStoreModel),		AnalysisStoreOpt(RegionStoreModel),
AnalysisConstraintsOpt(RangeConstraintsModel),		AnalysisConstraintsOpt(RangeConstraintsModel),
AnalysisDiagOpt(PD_HTML),		AnalysisDiagOpt(PD_HTML),
AnalysisPurgeOpt(PurgeStmt),		AnalysisPurgeOpt(PurgeStmt),
DisableAllChecks(0),		DisableAllChecks(0),
ShowCheckerHelp(0),		ShowCheckerHelp(0),
Show All 25 Lines

include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h

Show All 32 Lines
class CXXConstructExpr;		class CXXConstructExpr;
class CXXDeleteExpr;		class CXXDeleteExpr;
class CXXNewExpr;		class CXXNewExpr;
class CXXTemporaryObjectExpr;		class CXXTemporaryObjectExpr;
class CXXThisExpr;		class CXXThisExpr;
class MaterializeTemporaryExpr;		class MaterializeTemporaryExpr;
class ObjCAtSynchronizedStmt;		class ObjCAtSynchronizedStmt;
class ObjCForCollectionStmt;		class ObjCForCollectionStmt;

		namespace cross_tu {
		class CrossTranslationUnitContext;
		}

namespace ento {		namespace ento {

class AnalysisManager;		class AnalysisManager;
class CallEvent;		class CallEvent;
class CXXConstructorCall;		class CXXConstructorCall;

class ExprEngine : public SubEngine {		class ExprEngine : public SubEngine {
public:		public:
/// The modes of inlining, which override the default analysis-wide settings.		/// The modes of inlining, which override the default analysis-wide settings.
enum InliningModes {		enum InliningModes {
/// Follow the default settings for inlining callees.		/// Follow the default settings for inlining callees.
Inline_Regular = 0,		Inline_Regular = 0,
/// Do minimal inlining of callees.		/// Do minimal inlining of callees.
Inline_Minimal = 0x1		Inline_Minimal = 0x1
};		};

private:		private:
		cross_tu::CrossTranslationUnitContext &CTU;

AnalysisManager &AMgr;		AnalysisManager &AMgr;

AnalysisDeclContextManager &AnalysisDeclContexts;		AnalysisDeclContextManager &AnalysisDeclContexts;

CoreEngine Engine;		CoreEngine Engine;

/// G - the simulation graph.		/// G - the simulation graph.
ExplodedGraph& G;		ExplodedGraph& G;
Show All 25 Lines	private:
/// The functions which have been analyzed through inlining. This is owned by		/// The functions which have been analyzed through inlining. This is owned by
/// AnalysisConsumer. It can be null.		/// AnalysisConsumer. It can be null.
SetOfConstDecls *VisitedCallees;		SetOfConstDecls *VisitedCallees;

/// The flag, which specifies the mode of inlining for the engine.		/// The flag, which specifies the mode of inlining for the engine.
InliningModes HowToInline;		InliningModes HowToInline;

public:		public:
ExprEngine(AnalysisManager &mgr, bool gcEnabled,		ExprEngine(cross_tu::CrossTranslationUnitContext &CTU, AnalysisManager &mgr,
SetOfConstDecls *VisitedCalleesIn,		bool gcEnabled, SetOfConstDecls *VisitedCalleesIn,
FunctionSummariesTy *FS,		FunctionSummariesTy *FS, InliningModes HowToInlineIn);
InliningModes HowToInlineIn);

~ExprEngine() override;		~ExprEngine() override;

/// Returns true if there is still simulation state on the worklist.		/// Returns true if there is still simulation state on the worklist.
bool ExecuteWorkList(const LocationContext *L, unsigned Steps = 150000) {		bool ExecuteWorkList(const LocationContext *L, unsigned Steps = 150000) {
return Engine.ExecuteWorkList(L, Steps, nullptr);		return Engine.ExecuteWorkList(L, Steps, nullptr);
}		}

Show All 15 Lines	public:
CheckerManager &getCheckerManager() const {		CheckerManager &getCheckerManager() const {
return *AMgr.getCheckerManager();		return *AMgr.getCheckerManager();
}		}

SValBuilder &getSValBuilder() { return svalBuilder; }		SValBuilder &getSValBuilder() { return svalBuilder; }

BugReporter& getBugReporter() { return BR; }		BugReporter& getBugReporter() { return BR; }

		cross_tu::CrossTranslationUnitContext *getCrossTranslationUnitContext() {
		return &CTU;
		}

const NodeBuilderContext &getBuilderContext() {		const NodeBuilderContext &getBuilderContext() {
assert(currBldrCtx);		assert(currBldrCtx);
return *currBldrCtx;		return *currBldrCtx;
}		}

bool isObjCGCEnabled() { return ObjCGCEnabled; }		bool isObjCGCEnabled() { return ObjCGCEnabled; }

const Stmt *getStmt() const;		const Stmt *getStmt() const;
▲ Show 20 Lines • Show All 533 Lines • Show Last 20 Lines

include/clang/StaticAnalyzer/Core/PathSensitive/SubEngine.h

Show All 18 Lines

namespace clang {		namespace clang {

class CFGBlock;		class CFGBlock;
class CFGElement;		class CFGElement;
class LocationContext;		class LocationContext;
class Stmt;		class Stmt;

		namespace cross_tu {
		class CrossTranslationUnitContext;
		}

namespace ento {		namespace ento {

struct NodeBuilderContext;		struct NodeBuilderContext;
class AnalysisManager;		class AnalysisManager;
class ExplodedNodeSet;		class ExplodedNodeSet;
class ExplodedNode;		class ExplodedNode;
class ProgramState;		class ProgramState;
class ProgramStateManager;		class ProgramStateManager;
Show All 9 Lines	class SubEngine {
virtual void anchor();		virtual void anchor();
public:		public:
virtual ~SubEngine() {}		virtual ~SubEngine() {}

virtual ProgramStateRef getInitialState(const LocationContext *InitLoc) = 0;		virtual ProgramStateRef getInitialState(const LocationContext *InitLoc) = 0;

virtual AnalysisManager &getAnalysisManager() = 0;		virtual AnalysisManager &getAnalysisManager() = 0;

		virtual cross_tu::CrossTranslationUnitContext *
		getCrossTranslationUnitContext() = 0;

virtual ProgramStateManager &getStateManager() = 0;		virtual ProgramStateManager &getStateManager() = 0;

/// Called by CoreEngine. Used to generate new successor		/// Called by CoreEngine. Used to generate new successor
/// nodes by processing the 'effects' of a block-level statement.		/// nodes by processing the 'effects' of a block-level statement.
virtual void processCFGElement(const CFGElement E, ExplodedNode* Pred,		virtual void processCFGElement(const CFGElement E, ExplodedNode* Pred,
unsigned StmtIdx, NodeBuilderContext *Ctx)=0;		unsigned StmtIdx, NodeBuilderContext *Ctx)=0;

/// Called by CoreEngine when it starts processing a CFGBlock. The		/// Called by CoreEngine when it starts processing a CFGBlock. The
▲ Show 20 Lines • Show All 110 Lines • Show Last 20 Lines

lib/StaticAnalyzer/Core/AnalyzerOptions.cpp

	Show First 20 Lines • Show All 386 Lines • ▼ Show 20 Lines
	}			}

	bool AnalyzerOptions::shouldDisplayNotesAsEvents() {			bool AnalyzerOptions::shouldDisplayNotesAsEvents() {
	if (!DisplayNotesAsEvents.hasValue())			if (!DisplayNotesAsEvents.hasValue())
	DisplayNotesAsEvents =			DisplayNotesAsEvents =
	getBooleanOption("notes-as-events", /*Default=*/false);			getBooleanOption("notes-as-events", /*Default=*/false);
	return DisplayNotesAsEvents.getValue();			return DisplayNotesAsEvents.getValue();
	}			}

				StringRef AnalyzerOptions::getCTUDir() {
				if (!CTUDir.hasValue()) {
				CTUDir = getOptionAsString("ctu-dir", "");
				if (!llvm::sys::fs::is_directory(*CTUDir))
				CTUDir = "";
				}
				return CTUDir.getValue();
				}

				bool AnalyzerOptions::naiveCTUEnabled() {
				if (!NaiveCTU.hasValue()) {
				NaiveCTU = getBooleanOption("experimental-enable-naive-ctu-analysis",
				/*Default=*/false);
				}
				return NaiveCTU.getValue();
				}

				StringRef AnalyzerOptions::getCTUIndexName() {
				if (!CTUIndexName.hasValue())
				CTUIndexName = getOptionAsString("ctu-index-name", "externalFnMap.txt");
				return CTUIndexName.getValue();
				}

lib/StaticAnalyzer/Core/CMakeLists.txt

Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	add_clang_library(clangStaticAnalyzerCore
SymbolManager.cpp		SymbolManager.cpp
Z3ConstraintManager.cpp		Z3ConstraintManager.cpp

LINK_LIBS		LINK_LIBS
clangAST		clangAST
clangASTMatchers		clangASTMatchers
clangAnalysis		clangAnalysis
clangBasic		clangBasic
		clangCrossTU
clangLex		clangLex
clangRewrite		clangRewrite
${Z3_LINK_FILES}		${Z3_LINK_FILES}
)		)

if(CLANG_ANALYZER_WITH_Z3)		if(CLANG_ANALYZER_WITH_Z3)
target_include_directories(clangStaticAnalyzerCore SYSTEM		target_include_directories(clangStaticAnalyzerCore SYSTEM
PRIVATE		PRIVATE
${Z3_INCLUDE_DIR}		${Z3_INCLUDE_DIR}
)		)
endif()		endif()

lib/StaticAnalyzer/Core/CallEvent.cpp

Show All 10 Lines
/// sensitive instances of different kinds of function and method calls		/// sensitive instances of different kinds of function and method calls
/// (C, C++, and Objective-C).		/// (C, C++, and Objective-C).
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/AST/ParentMap.h"		#include "clang/AST/ParentMap.h"
#include "clang/Analysis/ProgramPoint.h"		#include "clang/Analysis/ProgramPoint.h"
		#include "clang/CrossTU/CrossTranslationUnit.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicTypeMap.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicTypeMap.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "static-analyzer-call-event"		#define DEBUG_TYPE "static-analyzer-call-event"
▲ Show 20 Lines • Show All 337 Lines • ▼ Show 20 Lines	DEBUG({
<< "\n";		<< "\n";
});		});
if (Body) {		if (Body) {
const Decl* Decl = AD->getDecl();		const Decl* Decl = AD->getDecl();
return RuntimeDefinition(Decl);		return RuntimeDefinition(Decl);
}		}
}		}

		SubEngine *Engine = getState()->getStateManager().getOwningEngine();
		AnalyzerOptions &Opts = Engine->getAnalysisManager().options;

		// Try to get CTU definition only if CTUDir is provided.
		if (!Opts.naiveCTUEnabled())
		return RuntimeDefinition();

		cross_tu::CrossTranslationUnitContext &CTUCtx =
		*Engine->getCrossTranslationUnitContext();
		llvm::Expected<const FunctionDecl *> CTUDeclOrError =
		CTUCtx.getCrossTUDefinition(FD, Opts.getCTUDir(), Opts.getCTUIndexName());

		if (!CTUDeclOrError) {
		handleAllErrors(CTUDeclOrError.takeError(),
		[&](const cross_tu::IndexError &IE) {
		CTUCtx.emitCrossTUDiagnostics(IE);
		});
return RuntimeDefinition();		return RuntimeDefinition();
}		}

		return RuntimeDefinition(*CTUDeclOrError);
		}

void AnyFunctionCall::getInitialStackFrameContents(		void AnyFunctionCall::getInitialStackFrameContents(
const StackFrameContext *CalleeCtx,		const StackFrameContext *CalleeCtx,
BindingsTy &Bindings) const {		BindingsTy &Bindings) const {
const FunctionDecl *D = cast<FunctionDecl>(CalleeCtx->getDecl());		const FunctionDecl *D = cast<FunctionDecl>(CalleeCtx->getDecl());
SValBuilder &SVB = getState()->getStateManager().getSValBuilder();		SValBuilder &SVB = getState()->getStateManager().getSValBuilder();
addParameterValuesToBindings(CalleeCtx, Bindings, SVB, *this,		addParameterValuesToBindings(CalleeCtx, Bindings, SVB, *this,
D->parameters());		D->parameters());
}		}


lib/StaticAnalyzer/Core/ExprEngine.cpp

#include "clang/AST/CharUnits.h"		#include "clang/AST/CharUnits.h"
#include "clang/AST/ParentMap.h"		#include "clang/AST/ParentMap.h"
#include "clang/Analysis/CFGStmtMap.h"		#include "clang/Analysis/CFGStmtMap.h"
#include "clang/AST/StmtCXX.h"		#include "clang/AST/StmtCXX.h"
#include "clang/AST/StmtObjC.h"		#include "clang/AST/StmtObjC.h"
#include "clang/Basic/Builtins.h"		#include "clang/Basic/Builtins.h"
#include "clang/Basic/PrettyStackTrace.h"		#include "clang/Basic/PrettyStackTrace.h"
#include "clang/Basic/SourceManager.h"		#include "clang/Basic/SourceManager.h"
		#include "clang/CrossTU/CrossTranslationUnit.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"		#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"		#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/LoopUnrolling.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/LoopUnrolling.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Support/SaveAndRestore.h"		#include "llvm/Support/SaveAndRestore.h"
[... 30 lines not shown ...]	REGISTER_TRAIT_WITH_PROGRAMSTATE(InitializedTemporariesSet,
llvm::ImmutableSet<CXXBindTemporaryContext>)		llvm::ImmutableSet<CXXBindTemporaryContext>)

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Engine construction and deletion.		// Engine construction and deletion.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static const char* TagProviderName = "ExprEngine";		static const char* TagProviderName = "ExprEngine";

ExprEngine::ExprEngine(AnalysisManager &mgr, bool gcEnabled,		ExprEngine::ExprEngine(cross_tu::CrossTranslationUnitContext &CTU,
		AnalysisManager &mgr, bool gcEnabled,
SetOfConstDecls *VisitedCalleesIn,		SetOfConstDecls *VisitedCalleesIn,
FunctionSummariesTy *FS,		FunctionSummariesTy *FS, InliningModes HowToInlineIn)
InliningModes HowToInlineIn)		: CTU(CTU), AMgr(mgr),
: AMgr(mgr),
AnalysisDeclContexts(mgr.getAnalysisDeclContextManager()),		AnalysisDeclContexts(mgr.getAnalysisDeclContextManager()),
Engine(*this, FS),		Engine(*this, FS), G(Engine.getGraph()),
G(Engine.getGraph()),
StateMgr(getContext(), mgr.getStoreManagerCreator(),		StateMgr(getContext(), mgr.getStoreManagerCreator(),
mgr.getConstraintManagerCreator(), G.getAllocator(),		mgr.getConstraintManagerCreator(), G.getAllocator(), this),
this),
SymMgr(StateMgr.getSymbolManager()),		SymMgr(StateMgr.getSymbolManager()),
svalBuilder(StateMgr.getSValBuilder()),		svalBuilder(StateMgr.getSValBuilder()), currStmtIdx(0),
currStmtIdx(0), currBldrCtx(nullptr),		currBldrCtx(nullptr), ObjCNoRet(mgr.getASTContext()),
ObjCNoRet(mgr.getASTContext()),
ObjCGCEnabled(gcEnabled), BR(mgr, *this),		ObjCGCEnabled(gcEnabled), BR(mgr, *this),
VisitedCallees(VisitedCalleesIn),		VisitedCallees(VisitedCalleesIn), HowToInline(HowToInlineIn) {
HowToInline(HowToInlineIn)
{
unsigned TrimInterval = mgr.options.getGraphTrimInterval();		unsigned TrimInterval = mgr.options.getGraphTrimInterval();
if (TrimInterval != 0) {		if (TrimInterval != 0) {
// Enable eager node reclaimation when constructing the ExplodedGraph.		// Enable eager node reclaimation when constructing the ExplodedGraph.
G.enableNodeReclamation(TrimInterval);		G.enableNodeReclamation(TrimInterval);
}		}
}		}

ExprEngine::~ExprEngine() {		ExprEngine::~ExprEngine() {

lib/StaticAnalyzer/Core/PathDiagnostic.cpp

[... 375 lines not shown ...]	for ( ; X_I != X_end && Y_I != Y_end; ++X_I, ++Y_I) {
Optional<bool> b = comparePiece(X_I, Y_I);		Optional<bool> b = comparePiece(X_I, Y_I);
if (b.hasValue())		if (b.hasValue())
return b.getValue();		return b.getValue();
}		}

return None;		return None;
}		}

		static bool compareCrossTUSourceLocs(FullSourceLoc XL, FullSourceLoc YL) {
		std::pair<FileID, unsigned> XOffs = XL.getDecomposedLoc();
		std::pair<FileID, unsigned> YOffs = YL.getDecomposedLoc();
		const SourceManager &SM = XL.getManager();
		std::pair<bool, bool> InSameTU = SM.isInTheSameTranslationUnit(XOffs, YOffs);
		if (InSameTU.first)
		return XL.isBeforeInTranslationUnitThan(YL);
		const FileEntry *XFE = SM.getFileEntryForID(XL.getFileID());
		r.stahl (Not Done): see comment
		const FileEntry *YFE = SM.getFileEntryForID(YL.getFileID());
		if (!XFE || !YFE)
		return XFE && !YFE;
		return XFE->getName() < YFE->getName();
		nikhgupt (Not Done): getName could yield incorrect results if two files in the project have the same name. This might break the assert for PathDiagnostics 'total ordering' and 'uniqueness'. Maybe replacing FileEntry's getName with FullSourceLoc's getFileID could resolve this.
		xazax.hun (Author, Not Done): Thank you, this is a known problem that we plan to address in a follow-up patch.
		}

static bool compare(const PathDiagnostic &X, const PathDiagnostic &Y) {		static bool compare(const PathDiagnostic &X, const PathDiagnostic &Y) {
FullSourceLoc XL = X.getLocation().asLocation();		FullSourceLoc XL = X.getLocation().asLocation();
FullSourceLoc YL = Y.getLocation().asLocation();		FullSourceLoc YL = Y.getLocation().asLocation();
if (XL != YL)		if (XL != YL)
return XL.isBeforeInTranslationUnitThan(YL);		return compareCrossTUSourceLocs(XL, YL);
if (X.getBugType() != Y.getBugType())		if (X.getBugType() != Y.getBugType())
return X.getBugType() < Y.getBugType();		return X.getBugType() < Y.getBugType();
if (X.getCategory() != Y.getCategory())		if (X.getCategory() != Y.getCategory())
return X.getCategory() < Y.getCategory();		return X.getCategory() < Y.getCategory();
if (X.getVerboseDescription() != Y.getVerboseDescription())		if (X.getVerboseDescription() != Y.getVerboseDescription())
return X.getVerboseDescription() < Y.getVerboseDescription();		return X.getVerboseDescription() < Y.getVerboseDescription();
if (X.getShortDescription() != Y.getShortDescription())		if (X.getShortDescription() != Y.getShortDescription())
return X.getShortDescription() < Y.getShortDescription();		return X.getShortDescription() < Y.getShortDescription();
if (X.getDeclWithIssue() != Y.getDeclWithIssue()) {		if (X.getDeclWithIssue() != Y.getDeclWithIssue()) {
const Decl *XD = X.getDeclWithIssue();		const Decl *XD = X.getDeclWithIssue();
if (!XD)		if (!XD)
return true;		return true;
const Decl *YD = Y.getDeclWithIssue();		const Decl *YD = Y.getDeclWithIssue();
if (!YD)		if (!YD)
return false;		return false;
SourceLocation XDL = XD->getLocation();		SourceLocation XDL = XD->getLocation();
SourceLocation YDL = YD->getLocation();		SourceLocation YDL = YD->getLocation();
if (XDL != YDL) {		if (XDL != YDL) {
const SourceManager &SM = XL.getManager();		const SourceManager &SM = XL.getManager();
return SM.isBeforeInTranslationUnit(XDL, YDL);		return compareCrossTUSourceLocs(FullSourceLoc(XDL, SM),
		FullSourceLoc(YDL, SM));
}		}
		NoQ (Done): It seems to me that `XDL` and `YDL` are exactly the same as `XL` and `YL` we've seen at the beginning of the function. ...we still have only one `SourceManager`, right?
		xazax.hun (Author, Done): Is this true? One is the location associated with the PathDiagnostic, the other is the location of the Decl associated with the issue. I do not have a deep understanding of this part of the code, but I am not sure these are guaranteed to be the same.
		NoQ (Done): Whoops, you're totally right, never mind. Comments would probably have helped me understand that faster.
		xazax.hun (Author, Not Done): Sure, comments might help, but I want to keep the changes in this patch focused and minimal, so I prefer to do such modifications in a separate patch.
}		}
PathDiagnostic::meta_iterator XI = X.meta_begin(), XE = X.meta_end();		PathDiagnostic::meta_iterator XI = X.meta_begin(), XE = X.meta_end();
PathDiagnostic::meta_iterator YI = Y.meta_begin(), YE = Y.meta_end();		PathDiagnostic::meta_iterator YI = Y.meta_begin(), YE = Y.meta_end();
if (XE - XI != YE - YI)		if (XE - XI != YE - YI)
return (XE - XI) < (YE - YI);		return (XE - XI) < (YE - YI);
for ( ; XI != XE ; ++XI, ++YI) {		for ( ; XI != XE ; ++XI, ++YI) {
if (XI != YI)		if (XI != YI)
return (XI) < (YI);		return (XI) < (YI);
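nikhgupt's concern about `getName` can be reproduced in a small model. The sketch below is not the patch's code: `Loc`, `less_by_name`, and `less_with_tiebreak` are hypothetical names, the missing-`FileEntry` branch is omitted, and `file_key` stands in for any identifier that is unique per file (the thread suggests a `FileID`, which is not evaluated here).

```python
from collections import namedtuple

# Hypothetical stand-in for a source location: the TU it was loaded from,
# a unique key for its file, the file name string, and an offset in the TU.
Loc = namedtuple("Loc", ["tu", "file_key", "name", "offset"])


def less_by_name(x, y):
    """Model of the fallback in the diff: across TUs, order by file name only."""
    if x.tu == y.tu:
        return x.offset < y.offset
    return x.name < y.name


def less_with_tiebreak(x, y):
    """Same ordering, but break name ties with the unique file key."""
    if x.tu == y.tu:
        return x.offset < y.offset
    return (x.name, x.file_key) < (y.name, y.file_key)


# Two distinct files that happen to share a name string.
a = Loc(tu="main", file_key=1, name="util.cpp", offset=10)
b = Loc(tu="other", file_key=2, name="util.cpp", offset=10)

# Neither location sorts before the other although they differ, so a
# comparator built on names alone cannot tell these two reports apart.
ambiguous = not less_by_name(a, b) and not less_by_name(b, a)
```

In this model `ambiguous` is true for `a` and `b`, which is the situation that would violate a uniqueness assumption; with the tie-break exactly one of the two comparisons holds.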

lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp

#include "clang/AST/DeclCXX.h"		#include "clang/AST/DeclCXX.h"
#include "clang/AST/DeclObjC.h"		#include "clang/AST/DeclObjC.h"
#include "clang/AST/RecursiveASTVisitor.h"		#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Analysis/Analyses/LiveVariables.h"		#include "clang/Analysis/Analyses/LiveVariables.h"
#include "clang/Analysis/CFG.h"		#include "clang/Analysis/CFG.h"
#include "clang/Analysis/CallGraph.h"		#include "clang/Analysis/CallGraph.h"
#include "clang/Analysis/CodeInjector.h"		#include "clang/Analysis/CodeInjector.h"
#include "clang/Basic/SourceManager.h"		#include "clang/Basic/SourceManager.h"
		#include "clang/CrossTU/CrossTranslationUnit.h"
#include "clang/Frontend/CompilerInstance.h"		#include "clang/Frontend/CompilerInstance.h"
#include "clang/Lex/Preprocessor.h"		#include "clang/Lex/Preprocessor.h"
#include "clang/StaticAnalyzer/Checkers/LocalCheckers.h"		#include "clang/StaticAnalyzer/Checkers/LocalCheckers.h"
#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"		#include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"		#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
#include "clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h"		#include "clang/StaticAnalyzer/Core/BugReporter/PathDiagnostic.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"		#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathDiagnosticConsumers.h"		#include "clang/StaticAnalyzer/Core/PathDiagnosticConsumers.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
#include "clang/StaticAnalyzer/Frontend/CheckerRegistration.h"		#include "clang/StaticAnalyzer/Frontend/CheckerRegistration.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/Program.h"		#include "llvm/Support/Program.h"
#include "llvm/Support/Timer.h"		#include "llvm/Support/Timer.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <memory>		#include <memory>
#include <queue>		#include <queue>
#include <utility>		#include <utility>

using namespace clang;		using namespace clang;
		danielmarjamaki (Done): I believe sys/file.h and unistd.h are POSIX includes.
using namespace ento;		using namespace ento;

#define DEBUG_TYPE "AnalysisConsumer"		#define DEBUG_TYPE "AnalysisConsumer"

static std::unique_ptr<ExplodedNode::Auditor> CreateUbiViz();		static std::unique_ptr<ExplodedNode::Auditor> CreateUbiViz();

STATISTIC(NumFunctionTopLevel, "The # of functions at top level.");		STATISTIC(NumFunctionTopLevel, "The # of functions at top level.");
STATISTIC(NumFunctionsAnalyzed,		STATISTIC(NumFunctionsAnalyzed,
[... 107 lines not shown ...]

public:		public:
ASTContext *Ctx;		ASTContext *Ctx;
const Preprocessor &PP;		const Preprocessor &PP;
const std::string OutDir;		const std::string OutDir;
AnalyzerOptionsRef Opts;		AnalyzerOptionsRef Opts;
ArrayRef<std::string> Plugins;		ArrayRef<std::string> Plugins;
CodeInjector *Injector;		CodeInjector *Injector;
		cross_tu::CrossTranslationUnitContext CTU;

/// \brief Stores the declarations from the local translation unit.		/// \brief Stores the declarations from the local translation unit.
/// Note, we pre-compute the local declarations at parse time as an		/// Note, we pre-compute the local declarations at parse time as an
/// optimization to make sure we do not deserialize everything from disk.		/// optimization to make sure we do not deserialize everything from disk.
/// The local declaration to all declarations ratio might be very small when		/// The local declaration to all declarations ratio might be very small when
/// working with a PCH file.		/// working with a PCH file.
SetOfDecls LocalTUDecls;		SetOfDecls LocalTUDecls;

// Set of PathDiagnosticConsumers. Owned by AnalysisManager.		// Set of PathDiagnosticConsumers. Owned by AnalysisManager.
PathDiagnosticConsumers PathConsumers;		PathDiagnosticConsumers PathConsumers;

StoreManagerCreator CreateStoreMgr;		StoreManagerCreator CreateStoreMgr;
ConstraintManagerCreator CreateConstraintMgr;		ConstraintManagerCreator CreateConstraintMgr;

std::unique_ptr<CheckerManager> checkerMgr;		std::unique_ptr<CheckerManager> checkerMgr;
std::unique_ptr<AnalysisManager> Mgr;		std::unique_ptr<AnalysisManager> Mgr;

/// Time the analyzes time of each translation unit.		/// Time the analyzes time of each translation unit.
static llvm::Timer* TUTotalTimer;		static llvm::Timer* TUTotalTimer;

/// The information about analyzed functions shared throughout the		/// The information about analyzed functions shared throughout the
/// translation unit.		/// translation unit.
FunctionSummariesTy FunctionSummaries;		FunctionSummariesTy FunctionSummaries;

AnalysisConsumer(const Preprocessor &pp, const std::string &outdir,		AnalysisConsumer(CompilerInstance &CI, const std::string &outdir,
AnalyzerOptionsRef opts, ArrayRef<std::string> plugins,		AnalyzerOptionsRef opts, ArrayRef<std::string> plugins,
CodeInjector *injector)		CodeInjector *injector)
		dcoughlin (Done): There is no need for AnalysisConsumer to take both a CompilerInstance and a Preprocessor. You can just call `getPreprocessor()` to get the preprocessor from the CompilerInstance.
: RecVisitorMode(0), RecVisitorBR(nullptr), Ctx(nullptr), PP(pp),		: RecVisitorMode(0), RecVisitorBR(nullptr), Ctx(nullptr),
OutDir(outdir), Opts(std::move(opts)), Plugins(plugins),		PP(CI.getPreprocessor()), OutDir(outdir), Opts(std::move(opts)),
Injector(injector) {		Plugins(plugins), Injector(injector), CTU(CI) {
DigestAnalyzerOptions();		DigestAnalyzerOptions();
if (Opts->PrintStats) {		if (Opts->PrintStats) {
llvm::EnableStatistics(false);		llvm::EnableStatistics(false);
TUTotalTimer = new llvm::Timer("time", "Analyzer Total Time");		TUTotalTimer = new llvm::Timer("time", "Analyzer Total Time");
}		}
}		}

~AnalysisConsumer() override {		~AnalysisConsumer() override {
[... 207 lines not shown ...]
}		}

static bool shouldSkipFunction(const Decl *D,		static bool shouldSkipFunction(const Decl *D,
const SetOfConstDecls &Visited,		const SetOfConstDecls &Visited,
const SetOfConstDecls &VisitedAsTopLevel) {		const SetOfConstDecls &VisitedAsTopLevel) {
if (VisitedAsTopLevel.count(D))		if (VisitedAsTopLevel.count(D))
return true;		return true;

// We want to re-analyse the functions as top level in the following cases:		// We want to re-analyse the functions as top level in the following cases:
		danielmarjamaki (Done): This is POSIX file I/O.
		a.sidorin (Done): Does the C++ API allow file-level synchronization? We need it to support multiple processes writing to the same file.
// - The 'init' methods should be reanalyzed because		// - The 'init' methods should be reanalyzed because
// ObjCNonNilReturnValueChecker assumes that '[super init]' never returns		// ObjCNonNilReturnValueChecker assumes that '[super init]' never returns
// 'nil' and unless we analyze the 'init' functions as top level, we will		// 'nil' and unless we analyze the 'init' functions as top level, we will
// not catch errors within defensive code.		// not catch errors within defensive code.
// - We want to reanalyze all ObjC methods as top level to report Retain		// - We want to reanalyze all ObjC methods as top level to report Retain
// Count naming convention errors more aggressively.		// Count naming convention errors more aggressively.
if (isa<ObjCMethodDecl>(D))		if (isa<ObjCMethodDecl>(D))
return false;		return false;
[... 266 lines not shown ...]	void AnalysisConsumer::ActionExprEngine(Decl *D, bool ObjCGCEnabled,
// FIXME: Inter-procedural analysis will need to handle invalid CFGs.		// FIXME: Inter-procedural analysis will need to handle invalid CFGs.
if (!Mgr->getCFG(D))		if (!Mgr->getCFG(D))
return;		return;

// See if the LiveVariables analysis scales.		// See if the LiveVariables analysis scales.
if (!Mgr->getAnalysisDeclContext(D)->getAnalysis<RelaxedLiveVariables>())		if (!Mgr->getAnalysisDeclContext(D)->getAnalysis<RelaxedLiveVariables>())
return;		return;

ExprEngine Eng(*Mgr, ObjCGCEnabled, VisitedCallees, &FunctionSummaries,IMode);		ExprEngine Eng(CTU, *Mgr, ObjCGCEnabled, VisitedCallees, &FunctionSummaries,
		IMode);

// Set the graph auditor.		// Set the graph auditor.
std::unique_ptr<ExplodedNode::Auditor> Auditor;		std::unique_ptr<ExplodedNode::Auditor> Auditor;
if (Mgr->options.visualizeExplodedGraphWithUbiGraph) {		if (Mgr->options.visualizeExplodedGraphWithUbiGraph) {
Auditor = CreateUbiViz();		Auditor = CreateUbiViz();
ExplodedNode::SetAuditor(Auditor.get());		ExplodedNode::SetAuditor(Auditor.get());
}		}

[... 41 lines not shown ...]
ento::CreateAnalysisConsumer(CompilerInstance &CI) {		ento::CreateAnalysisConsumer(CompilerInstance &CI) {
// Disable the effects of '-Werror' when using the AnalysisConsumer.		// Disable the effects of '-Werror' when using the AnalysisConsumer.
CI.getPreprocessor().getDiagnostics().setWarningsAsErrors(false);		CI.getPreprocessor().getDiagnostics().setWarningsAsErrors(false);

AnalyzerOptionsRef analyzerOpts = CI.getAnalyzerOpts();		AnalyzerOptionsRef analyzerOpts = CI.getAnalyzerOpts();
bool hasModelPath = analyzerOpts->Config.count("model-path") > 0;		bool hasModelPath = analyzerOpts->Config.count("model-path") > 0;

return llvm::make_unique<AnalysisConsumer>(		return llvm::make_unique<AnalysisConsumer>(
CI.getPreprocessor(), CI.getFrontendOpts().OutputFile, analyzerOpts,		CI, CI.getFrontendOpts().OutputFile, analyzerOpts,
CI.getFrontendOpts().Plugins,		CI.getFrontendOpts().Plugins,
hasModelPath ? new ModelInjector(CI) : nullptr);		hasModelPath ? new ModelInjector(CI) : nullptr);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Ubigraph Visualization. FIXME: Move to separate file.		// Ubigraph Visualization. FIXME: Move to separate file.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//


lib/StaticAnalyzer/Frontend/CMakeLists.txt

[... 9 lines not shown ...]	add_clang_library(clangStaticAnalyzerFrontend
ModelConsumer.cpp		ModelConsumer.cpp
FrontendActions.cpp		FrontendActions.cpp
ModelInjector.cpp		ModelInjector.cpp

LINK_LIBS		LINK_LIBS
clangAST		clangAST
clangAnalysis		clangAnalysis
clangBasic		clangBasic
		clangCrossTU
clangFrontend		clangFrontend
clangLex		clangLex
clangStaticAnalyzerCheckers		clangStaticAnalyzerCheckers
clangStaticAnalyzerCore		clangStaticAnalyzerCore
)		)

test/Analysis/Inputs/ctu-chain.cpp

This file was added.

				int h_chain(int x) {
				return x * 2;
				}

				namespace chns {
				int chf3(int x);

				int chf2(int x) {
				return chf3(x);
				}

				class chcls {
				public:
				int chf4(int x);
				};

				int chcls::chf4(int x) {
				return x * 3;
				}
				}

test/Analysis/Inputs/ctu-other.cpp

This file was added.

				int callback_to_main(int x);
				int f(int x) {
				return x - 1;
				}

				int g(int x) {
				return callback_to_main(x) + 1;
				}

				int h_chain(int);

				int h(int x) {
				return 2 * h_chain(x);
				}

				namespace myns {
				int fns(int x) {
				return x + 7;
				}

				namespace embed_ns {
				int fens(int x) {
				return x - 3;
				}
				}

				class embed_cls {
				public:
				int fecl(int x) {
				return x - 7;
				}
				};
				}

				class mycls {
				public:
				int fcl(int x) {
				return x + 5;
				}
				static int fscl(int x) {
				return x + 6;
				}

				class embed_cls2 {
				public:
				int fecl2(int x) {
				return x - 11;
				}
				};
				};

				namespace chns {
				int chf2(int x);

				class chcls {
				public:
				int chf4(int x);
				};

				int chf3(int x) {
				return chcls().chf4(x);
				}

				int chf1(int x) {
				return chf2(x);
				}
				}

test/Analysis/Inputs/externalFnMap.txt

This file was added.

				c:@N@chns@F@chf1#I# ctu-other.cpp.ast
				c:@N@myns@N@embed_ns@F@fens#I# ctu-other.cpp.ast
				NoQ (Done): These tests use a pre-made external function map. Are you willing to add tests for generating function maps? That'd be useful, in my opinion, because it'd actually tell people how to run the whole thing.
				xazax.hun (Author, Done): Good idea! We will add a test for that.
				c:@F@g#I# ctu-other.cpp.ast
				c:@S@mycls@F@fscl#I#S ctu-other.cpp.ast
				c:@S@mycls@F@fcl#I# ctu-other.cpp.ast
				c:@N@myns@S@embed_cls@F@fecl#I# ctu-other.cpp.ast
				c:@S@mycls@S@embed_cls2@F@fecl2#I# ctu-other.cpp.ast
				c:@F@f#I# ctu-other.cpp.ast
				c:@N@myns@F@fns#I# ctu-other.cpp.ast
				c:@F@h#I# ctu-other.cpp.ast
				c:@F@h_chain#I# ctu-chain.cpp.ast
				c:@N@chns@S@chcls@F@chf4#I# ctu-chain.cpp.ast
				c:@N@chns@F@chf2#I# ctu-chain.cpp.ast
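Each line of the map above pairs a function identifier (a USR string) with the AST file that holds its definition. As a rough illustration of the up-front validation discussed in the CallEvent.cpp thread, here is a minimal sketch of a parser for this layout. It is not the parser used by the analyzer; `parse_function_map` is a hypothetical name, and splitting at the first space and rejecting duplicates are assumptions of this sketch.

```python
def parse_function_map(text):
    """Parse '<USR> <AST file>' lines into a dict, failing on malformed input."""
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # Assumption: the identifier contains no spaces, so the first
        # space separates it from the AST file name.
        usr, sep, ast_file = line.partition(" ")
        if not sep or not usr or not ast_file:
            raise ValueError("line %d: expected '<USR> <file>'" % lineno)
        if usr in mapping:
            raise ValueError("line %d: duplicate entry for %s" % (lineno, usr))
        mapping[usr] = ast_file
    return mapping


SAMPLE = """\
c:@F@f#I# ctu-other.cpp.ast
c:@F@h_chain#I# ctu-chain.cpp.ast
"""

index = parse_function_map(SAMPLE)
```

Running the whole file through such a check before analysis starts would report a broken map once, independent of which call sites the analyzer happens to reach.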

test/Analysis/ctu-main.cpp

This file was added.

				// RUN: mkdir -p %T/ctudir
				// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -emit-pch -o %T/ctudir/ctu-other.cpp.ast %S/Inputs/ctu-other.cpp
				// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -emit-pch -o %T/ctudir/ctu-chain.cpp.ast %S/Inputs/ctu-chain.cpp
				// RUN: cp %S/Inputs/externalFnMap.txt %T/ctudir/
				// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -fsyntax-only -analyze -analyzer-checker=core,debug.ExprInspection -analyzer-config experimental-enable-naive-ctu-analysis=true -analyzer-config ctu-dir=%T/ctudir -verify %s

				void clang_analyzer_eval(int);

				int f(int);
				int g(int);
				int h(int);

				int callback_to_main(int x) { return x + 1; }

				namespace myns {
				int fns(int x);

				namespace embed_ns {
				int fens(int x);
				}

				class embed_cls {
				public:
				int fecl(int x);
				};
				}

				class mycls {
				public:
				int fcl(int x);
				static int fscl(int x);

				class embed_cls2 {
				public:
				int fecl2(int x);
				};
				};

				namespace chns {
				int chf1(int x);
				}

				int main() {
				clang_analyzer_eval(f(3) == 2); // expected-warning{{TRUE}}
				clang_analyzer_eval(f(4) == 3); // expected-warning{{TRUE}}
				clang_analyzer_eval(f(5) == 3); // expected-warning{{FALSE}}
				clang_analyzer_eval(g(4) == 6); // expected-warning{{TRUE}}
				clang_analyzer_eval(h(2) == 8); // expected-warning{{TRUE}}

				clang_analyzer_eval(myns::fns(2) == 9); // expected-warning{{TRUE}}
				clang_analyzer_eval(myns::embed_ns::fens(2) == -1); // expected-warning{{TRUE}}
				clang_analyzer_eval(mycls().fcl(1) == 6); // expected-warning{{TRUE}}
				clang_analyzer_eval(mycls::fscl(1) == 7); // expected-warning{{TRUE}}
				clang_analyzer_eval(myns::embed_cls().fecl(1) == -6); // expected-warning{{TRUE}}
				clang_analyzer_eval(mycls::embed_cls2().fecl2(0) == -11); // expected-warning{{TRUE}}

				clang_analyzer_eval(chns::chf1(4) == 12); // expected-warning{{TRUE}}
				}
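The expected values in `main` follow from the definitions in `ctu-other.cpp` and `ctu-chain.cpp`. The sketch below mirrors those functions in Python purely to make the arithmetic explicit; it has nothing to do with how the analyzer evaluates them, and the namespace and class qualifiers are dropped for brevity.

```python
def callback_to_main(x):  # defined in ctu-main.cpp
    return x + 1


def f(x):  # ctu-other.cpp
    return x - 1


def g(x):  # ctu-other.cpp, calls back into the main TU
    return callback_to_main(x) + 1


def h_chain(x):  # ctu-chain.cpp
    return x * 2


def h(x):  # ctu-other.cpp, calls into ctu-chain.cpp
    return 2 * h_chain(x)


def chf4(x):  # chns::chcls::chf4 in ctu-chain.cpp
    return x * 3


def chf3(x):  # chns::chf3 in ctu-other.cpp
    return chf4(x)


def chf2(x):  # chns::chf2 in ctu-chain.cpp
    return chf3(x)


def chf1(x):  # chns::chf1 in ctu-other.cpp
    return chf2(x)
```

For example, `chf1(4)` bounces between the two input files (chf1, chf2, chf3, chf4) before returning 4 * 3 = 12, and `g(4)` is (4 + 1) + 1 = 6.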

tools/scan-build-py/libscanbuild/__init__.py

	import shlex			import shlex
	import subprocess			import subprocess
	import sys			import sys

	ENVIRONMENT_KEY = 'INTERCEPT_BUILD'			ENVIRONMENT_KEY = 'INTERCEPT_BUILD'

	Execution = collections.namedtuple('Execution', ['pid', 'cwd', 'cmd'])			Execution = collections.namedtuple('Execution', ['pid', 'cwd', 'cmd'])

				CtuConfig = collections.namedtuple('CtuConfig', ['collect', 'analyze', 'dir',
				'func_map_cmd'])
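To show how a tuple like `CtuConfig` might be consumed, here is a small sketch. The field names come from the diff; the `phases` helper and the example values (`'ctu-dir'`, `'clang-func-mapping'`) are hypothetical and are not taken from the patch.

```python
import collections

# Same shape as the tuple added in the diff.
CtuConfig = collections.namedtuple(
    'CtuConfig', ['collect', 'analyze', 'dir', 'func_map_cmd'])


def phases(config):
    """List the phases a configuration asks for, in execution order."""
    return [name for name in ('collect', 'analyze') if getattr(config, name)]


# Hypothetical example values.
both = CtuConfig(collect=True, analyze=True, dir='ctu-dir',
                 func_map_cmd='clang-func-mapping')
only_analyze = both._replace(collect=False)
```

Because a namedtuple is immutable, `_replace` returns a modified copy, which keeps a shared configuration from being changed by accident.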


	def duplicate_check(method):			def duplicate_check(method):
	""" Predicate to detect duplicated entries.			""" Predicate to detect duplicated entries.

	Unique hash method can be use to detect duplicates. Entries are			Unique hash method can be use to detect duplicates. Entries are
	represented as dictionaries, which has no default hash method.			represented as dictionaries, which has no default hash method.
	This implementation uses a set datatype to store the unique hash values.			This implementation uses a set datatype to store the unique hash values.


tools/scan-build-py/libscanbuild/analyze.py

import json		import json
import logging		import logging
import multiprocessing		import multiprocessing
import tempfile		import tempfile
import functools		import functools
import subprocess		import subprocess
import contextlib		import contextlib
import datetime		import datetime
		import shutil
		import glob
		from collections import defaultdict

from libscanbuild import command_entry_point, compiler_wrapper, \		from libscanbuild import command_entry_point, compiler_wrapper, \
wrapper_environment, run_build, run_command		wrapper_environment, run_build, run_command, CtuConfig
from libscanbuild.arguments import parse_args_for_scan_build, \		from libscanbuild.arguments import parse_args_for_scan_build, \
parse_args_for_analyze_build		parse_args_for_analyze_build
from libscanbuild.intercept import capture		from libscanbuild.intercept import capture
from libscanbuild.report import document		from libscanbuild.report import document
from libscanbuild.compilation import split_command, classify_source, \		from libscanbuild.compilation import split_command, classify_source, \
compiler_language		compiler_language
from libscanbuild.clang import get_version, get_arguments		from libscanbuild.clang import get_version, get_arguments, get_triple_arch
from libscanbuild.shell import decode		from libscanbuild.shell import decode

__all__ = ['scan_build', 'analyze_build', 'analyze_compiler_wrapper']		__all__ = ['scan_build', 'analyze_build', 'analyze_compiler_wrapper']

COMPILER_WRAPPER_CC = 'analyze-cc'		COMPILER_WRAPPER_CC = 'analyze-cc'
COMPILER_WRAPPER_CXX = 'analyze-c++'		COMPILER_WRAPPER_CXX = 'analyze-c++'

		CTU_FUNCTION_MAP_FILENAME = 'externalFnMap.txt'
		george.karpenkov (Done): What would happen when multiple analyzer runs are launched concurrently in the same directory? Would they race on this file?
		gerazo (Done): Yes and no. The first, collect part of CTU creates this file by aggregating all data from the build system, while the second part, which does the analysis itself, only reads it. Multiple analyses can use this file simultaneously without problems. However, multiple collect phases launched on a system do not make sense. In this case, the later one would write over the previous results with the same data.
		george.karpenkov (Done): "However, multiple collect phases launched on a system does not make sense": Why not? What about a system with multiple users, where the code is in a shared directory? Or even simpler: I've launched a background process, forgot about it, and then launched it again? "In this case, the later one would write over the previous results with the same data": That is probably fine; I am more worried about racing, where process B would be reading a partially overwritten file (not completely sure whether it is possible).
		gerazo (Done): I see your point. In order to create the multiple-user scenario you've mentioned, those users need to give the exact same output folders for their jobs. Our original bet was that our users are not doing this, as the scan-build tool itself also cannot be used this way. Still, the process left running is something to consider. I will plan some kind of locking mechanism to avoid situations like this.
		CTU_TEMP_FNMAP_FOLDER = 'tmpExternalFnMaps'
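One common way to keep a reader from ever seeing a partially written map, as raised in the thread on `CTU_FUNCTION_MAP_FILENAME`, is to write to a temporary file and rename it into place. This is only a sketch of that general technique, not the locking mechanism the authors planned; `write_atomically` is a hypothetical name, and the sketch assumes Python 3.3 or later for `os.replace`.

```python
import os
import tempfile


def write_atomically(path, lines):
    """Write lines to path so that readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory so that the final
    # rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix='.fnmap-')
    try:
        with os.fdopen(fd, 'w') as handle:
            for line in lines:
                handle.write(line + '\n')
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This does not stop two collect phases from overwriting each other's results; it only addresses the partial-read case. Serializing the writers would still need a lock of some kind.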


@command_entry_point		@command_entry_point
def scan_build():		def scan_build():
""" Entry point for scan-build command. """		""" Entry point for scan-build command. """

args = parse_args_for_scan_build()		args = parse_args_for_scan_build()
# will re-assign the report directory as new output		# will re-assign the report directory as new output
with report_directory(args.output, args.keep_empty) as args.output:		with report_directory(args.output, args.keep_empty) as args.output:
# Run against a build command. there are cases, when analyzer run		# Run against a build command. there are cases, when analyzer run
# is not required. But we need to set up everything for the		# is not required. But we need to set up everything for the
# wrappers, because 'configure' needs to capture the CC/CXX values		# wrappers, because 'configure' needs to capture the CC/CXX values
# for the Makefile.		# for the Makefile.
if args.intercept_first:		if args.intercept_first:
# Run build command with intercept module.		# Run build command with intercept module.
exit_code = capture(args)		exit_code = capture(args)
# Run the analyzer against the captured commands.		# Run the analyzer against the captured commands.
if need_analyzer(args.build):		if need_analyzer(args.build):
run_analyzer_parallel(args)		govern_analyzer_runs(args)
		george.karpenkov: `run_analyzer_with_ctu` is an unfortunate function name, since it may or may not launch CTU depending on the passed arguments. Could we just call it `run_analyzer`?
		gerazo: We have another `run_analyzer` method, but still, it is a good idea. I will come up with something better.
else:		else:
# Run build command and analyzer with compiler wrappers.		# Run build command and analyzer with compiler wrappers.
environment = setup_environment(args)		environment = setup_environment(args)
exit_code = run_build(args.build, env=environment)		exit_code = run_build(args.build, env=environment)
# Cover report generation and bug counting.		# Cover report generation and bug counting.
number_of_bugs = document(args)		number_of_bugs = document(args)
# Set exit status as it was requested.		# Set exit status as it was requested.
return number_of_bugs if args.status_bugs else exit_code		return number_of_bugs if args.status_bugs else exit_code


@command_entry_point		@command_entry_point
def analyze_build():		def analyze_build():
""" Entry point for analyze-build command. """		""" Entry point for analyze-build command. """

args = parse_args_for_analyze_build()		args = parse_args_for_analyze_build()
# will re-assign the report directory as new output		# will re-assign the report directory as new output
with report_directory(args.output, args.keep_empty) as args.output:		with report_directory(args.output, args.keep_empty) as args.output:
# Run the analyzer against a compilation db.		# Run the analyzer against a compilation db.
run_analyzer_parallel(args)		govern_analyzer_runs(args)
# Cover report generation and bug counting.		# Cover report generation and bug counting.
number_of_bugs = document(args)		number_of_bugs = document(args)
# Set exit status as it was requested.		# Set exit status as it was requested.
return number_of_bugs if args.status_bugs else 0		return number_of_bugs if args.status_bugs else 0


def need_analyzer(args):		def need_analyzer(args):
""" Check the intent of the build command.		""" Check the intent of the build command.

When static analyzer run against project configure step, it should be		When static analyzer run against project configure step, it should be
silent and no need to run the analyzer or generate report.		silent and no need to run the analyzer or generate report.

To run `scan-build` against the configure step might be neccessary,		To run `scan-build` against the configure step might be neccessary,
when compiler wrappers are used. That's the moment when build setup		when compiler wrappers are used. That's the moment when build setup
check the compiler and capture the location for the build process. """		check the compiler and capture the location for the build process. """

return len(args) and not re.search('configure|autogen', args[0])		return len(args) and not re.search('configure|autogen', args[0])


		def prefix_with(constant, pieces):
		george.karpenkov: Actually, I would prefer a separate NFC PR just moving this function. This PR is already too complicated as it is.
		gerazo: Yes. The only reason for the move was to make it testable. However, we need more tests, as you wrote below.
		george.karpenkov: Sure, but that separate PR can also include tests.
		""" From a sequence create another sequence where every second element
		is from the original sequence and the odd elements are the prefix.

		eg.: prefix_with(0, [1,2,3]) creates [0, 1, 0, 2, 0, 3] """

		return [elem for piece in pieces for elem in [constant, piece]]
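For illustration, here is how `prefix_with` behaves on a list of analyzer flags. The function body is copied from the diff. The choice of `-Xclang` as the prefix and the two flags are assumptions made for this example only.

```python
def prefix_with(constant, pieces):
    """Interleave `constant` before every element of `pieces`."""
    return [elem for piece in pieces for elem in [constant, piece]]


# Illustrative input: each analyzer flag gets its own '-Xclang' prefix.
flags = prefix_with('-Xclang', ['-analyzer-store=region', '-analyzer-stats'])
print(flags)
# ['-Xclang', '-analyzer-store=region', '-Xclang', '-analyzer-stats']
```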


		def get_ctu_config_from_args(args):
		""" CTU configuration is created from the chosen phases and dir. """

		return (
		CtuConfig(collect=args.ctu_phases.collect,
		analyze=args.ctu_phases.analyze,
		dir=args.ctu_dir,
		func_map_cmd=args.func_map_cmd)
		if hasattr(args, 'ctu_phases') and hasattr(args.ctu_phases, 'dir')
		george.karpenkov: Extensive `hasattr` usage is often a code smell, and it hinders understanding. Firstly, shouldn't `args.ctu_phases.dir` always be available, judging from the code in `arguments.py`? Secondly, in what cases is `args.ctu_phases` not available when we are already going down the CTU code path? Shouldn't we throw an error at that point instead of creating a default configuration? (Default configurations instead of crashing are another way to introduce very subtle and hard-to-catch error modes.)
		gerazo: It definitely needs more comments, so thanks, I will put them here. There are two separate possibilities for this code part:
		1. The user does not want CTU at all. In this case, no CTU phases are switched on (collect and analyze), so no CTU code will run. This is why dir has no importance in this case.
		2. CTU capabilities were not even detected, so the help and the available arguments about CTU are also missing. In this case we create a dummy config telling that everything is switched off.
		The reason for using `hasattr` was that not having CTU capabilities is not an exceptional condition, but rather a matter of configuration. I felt that handling this with an exception would be misleading.
		george.karpenkov: Right, but instead of doing (1) and (2), can we have a separate (maybe hidden from the user) param called e.g. `ctu_enabled` which would explicitly communicate that?
		else CtuConfig(collect=False, analyze=False, dir='', func_map_cmd=''))
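The `ctu_enabled` idea from the thread could look roughly like the sketch below. It assumes that `CtuConfig` is a namedtuple with the four fields used in the diff, and that the argument parser always sets a `ctu_enabled` attribute. That attribute is the reviewer's suggestion and does not exist in the patch.

```python
from argparse import Namespace
from collections import namedtuple

# Assumed definition, inferred from the keyword arguments used in the diff.
CtuConfig = namedtuple('CtuConfig',
                       ['collect', 'analyze', 'dir', 'func_map_cmd'])

CTU_DISABLED = CtuConfig(collect=False, analyze=False,
                         dir='', func_map_cmd='')


def get_ctu_config_from_args(args):
    """Build the CTU configuration from an explicit on/off switch."""
    # `ctu_enabled` is hypothetical: the parser would set it to False when
    # CTU capabilities are missing or no CTU phase was requested.
    if not args.ctu_enabled:
        return CTU_DISABLED
    # From here on a missing attribute is a real error and raises.
    return CtuConfig(collect=args.ctu_phases.collect,
                     analyze=args.ctu_phases.analyze,
                     dir=args.ctu_dir,
                     func_map_cmd=args.func_map_cmd)
```

With this shape, the "CTU is off" case is stated once, and an inconsistent argument set fails with `AttributeError` instead of silently falling back to a default.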


		def get_ctu_config_from_json(ctu_conf_json):
		""" CTU configuration is created from the chosen phases and dir. """

		ctu_config = json.loads(ctu_conf_json)
		# Recover namedtuple from json when coming from analyze-cc or analyze-c++
		george.karpenkov: Could you also specify what `func_map_lines` is (file? list? of strings?) in the docstring? There's specific Sphinx syntax for that. Otherwise it is hard to follow.
		gerazo: Thanks, I'll do it.
		return CtuConfig(collect=ctu_config[0],
		analyze=ctu_config[1],
		dir=ctu_config[2],
		func_map_cmd=ctu_config[3])
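The index-based recovery above works because `json.dumps` serializes a namedtuple as a plain JSON array, so the field names are lost in transit. A small round-trip sketch, assuming `CtuConfig` is a namedtuple with these four fields; the sample values are made up:

```python
import json
from collections import namedtuple

# Assumed definition, inferred from the keyword arguments used in the diff.
CtuConfig = namedtuple('CtuConfig',
                       ['collect', 'analyze', 'dir', 'func_map_cmd'])


def get_ctu_config_from_json(ctu_conf_json):
    """Rebuild the namedtuple from its positional JSON form."""
    collect, analyze, ctu_dir, func_map_cmd = json.loads(ctu_conf_json)
    return CtuConfig(collect=collect, analyze=analyze,
                     dir=ctu_dir, func_map_cmd=func_map_cmd)


original = CtuConfig(collect=True, analyze=False,
                     dir='ctu-dir', func_map_cmd='func-map-tool')
encoded = json.dumps(original)
print(encoded)  # [true, false, "ctu-dir", "func-map-tool"]
decoded = get_ctu_config_from_json(encoded)
```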


		def create_global_ctu_function_map(func_map_lines):
		""" Takes iterator of individual function maps and creates a global map
		keeping only unique names. We leave conflicting names out of CTU.

		george.karpenkov: Could be improved with `defaultdict`: `mangled_to_asts = defaultdict(set)` ... `mangled_to_asts[mangled_name].add(ast_file)`
		:param func_map_lines: Contains the id of a function (mangled name) and
		the originating source (the corresponding AST file) name.
		:type func_map_lines: Iterator of str.
		:returns: Mangled name - AST file pairs.
		:rtype: List of (str, str) tuples.
		"""
		george.karpenkov: Firstly, no need to modify the set in order to get the first element, just use `next(iter(ast_files))`. Secondly, when exactly do conflicting names happen? Is it a bug? Shouldn't the tool log such cases?
		gerazo: Nice catch, thanks. For your second question: unfortunately, it is not a bug, rather a misfeature which we can't handle yet. There can be tricky build systems where a single file is built multiple times with different configurations, so the same function signature will be present in multiple link units. The other option is when two separate compilation units have exactly the same function signature but are never linked together, so it is not a problem in the build itself. In this case, the cross compilation unit functionality cannot tell exactly which compilation unit to turn to for the function, because there are multiple valid choices. We deliberately leave such functions out to avoid potential problems. It deserves a comment, I think.
		george.karpenkov:
		> In this case, we deliberately leave such functions out to avoid potential problems
		We probably want to log it, don't we?

		george.karpenkov: Overall, instead of creating a dictionary with multiple elements and then converting it to a list, it's much simpler to only add an element to `mangled_to_asts` when it is not already mapped to something (probably logging a message otherwise), and then just return `mangled_to_asts.items()`.
		gerazo: The reason for the previous approach is that we need to count the occurrences of different mappings and only let through those which don't have multiple variations.
		george.karpenkov: Ah, OK!
		mangled_to_asts = defaultdict(set)

		for line in func_map_lines:
		mangled_name, ast_file = line.strip().split(' ', 1)
		mangled_to_asts[mangled_name].add(ast_file)

		mangled_ast_pairs = []

		for mangled_name, ast_files in mangled_to_asts.items():
		if len(ast_files) == 1:
		mangled_ast_pairs.append((mangled_name, next(iter(ast_files))))

		return mangled_ast_pairs
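The reviewers asked whether dropped conflicts should be logged, and the function as shown drops them silently. A minimal sketch of a logging variant follows. The warning text and the sorted iteration order are additions for this example, not part of the patch.

```python
import logging
from collections import defaultdict


def create_global_ctu_function_map(func_map_lines):
    """Keep names that map to exactly one AST file; log the rest."""
    mangled_to_asts = defaultdict(set)
    for line in func_map_lines:
        mangled_name, ast_file = line.strip().split(' ', 1)
        mangled_to_asts[mangled_name].add(ast_file)

    mangled_ast_pairs = []
    # Sorting is an addition here so the output order is reproducible.
    for mangled_name, ast_files in sorted(mangled_to_asts.items()):
        if len(ast_files) == 1:
            mangled_ast_pairs.append((mangled_name, next(iter(ast_files))))
        else:
            # Hypothetical log message: ambiguous names are left out of CTU.
            logging.warning('%s is defined in several AST files, skipped: %s',
                            mangled_name, ', '.join(sorted(ast_files)))
    return mangled_ast_pairs
```

Repeated identical lines are harmless because the set collapses them. Only names that map to two or more different AST files are skipped.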


		george.karpenkov: Firstly, is `glob.glob` actually random, as the docstring is saying? If yes, can we make it non-random (e.g. with `sorted`), as dealing with randomized input makes tracking down bugs so much harder?
		gerazo: "Random" here means we don't care, so you are right. The `glob.glob` documentation says the order is arbitrary, but I couldn't find whether it is deterministic or not. Sorting it will probably do no harm.
		def merge_ctu_func_maps(ctudir):
		""" Merge individual function maps into a global one.

		As the collect phase runs parallel on multiple threads, all compilation
		units are separately mapped into a temporary file in CTU_TEMP_FNMAP_FOLDER.
		danielmarjamaki: I believe you can write: `for line in open(filename, 'r'):`
		whisperity: Do we want to rely on the interpreter implementation for when the file is closed? If `for line in open(filename, 'r'): something()` is used, the file handle will be closed based on garbage collection rules. Having the handle disposed of right after the iteration holds for the stock CPython implementation, but it is nonetheless implementation-specific behavior. Using `with`, on the other hand, will explicitly close the file handle on the spot, no matter what.
		danielmarjamaki: OK, I did not know that. Feel free to ignore my comment.
		These function maps contain the mangled names of functions and the source
		(AST generated from the source) which had them.
		These files should be merged at the end into a global map file:
		george.karpenkov: If `write_global_map` is a closure, please use the fact that it would capture `ctudir` from the outer scope, and remove it from the argument list. That would make it more obvious that it does not change between invocations.
		CTU_FUNCTION_MAP_FILENAME."""

		def generate_func_map_lines(fnmap_dir):
		""" Iterate over all lines of input files in a determined order. """
		danielmarjamaki: This `with` seems redundant. I suggest an assignment, and then less indentation will be needed below.
		whisperity: I don't seem to understand what you want to assign to what.
		danielmarjamaki: I did not consider the garbage collection. I assumed that `out_file` would always be closed when it went out of scope, and then this would require less indentation: `out_file = open(extern_fns_map_file, 'w')` followed by `for mangled_name, ast_file in mangled_ast_pairs: out_file.write('%s %s\n' % (mangled_name, ast_file))`

		files = glob.glob(os.path.join(fnmap_dir, '*'))
		files.sort()
		for filename in files:
		with open(filename, 'r') as in_file:
		for line in in_file:
		yield line
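The two points settled in the threads above, a sorted file order and an explicit `with`, can be seen together in a standalone version of the generator. This is a sketch that mirrors the diff, using `sorted` in place of the in-place `sort`:

```python
import glob
import os


def generate_func_map_lines(fnmap_dir):
    """Yield every line of every file in `fnmap_dir`, in sorted file order."""
    # glob.glob returns paths in arbitrary order, so sort for reproducibility.
    for filename in sorted(glob.glob(os.path.join(fnmap_dir, '*'))):
        # `with` closes each file as soon as its lines are consumed,
        # independent of the interpreter's garbage collection.
        with open(filename, 'r') as in_file:
            for line in in_file:
                yield line
```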

		def write_global_map(arch, mangled_ast_pairs):
		""" Write (mangled function name, ast file) pairs into final file. """

		extern_fns_map_file = os.path.join(ctudir, arch,
		CTU_FUNCTION_MAP_FILENAME)
		with open(extern_fns_map_file, 'w') as out_file:
		for mangled_name, ast_file in mangled_ast_pairs:
		out_file.write('%s %s\n' % (mangled_name, ast_file))

		george.karpenkov: Having an analysis tool remove files is scary. What if (maybe not in this revision, but in a future iteration) a bug is introduced, and the tool removes user code instead? Why not just create a temporary directory with `tempfile.mkdtemp`, put all temporary files there, and then simply iterate through them? Then you would be able to get rid of the constant `CTU_TEMP_FNMAP_FOLDER` entirely, and the OS would be responsible for cleanup.
		gerazo: Yes, you are right. We are essentially using a temp dir. Because of the size, we first had to put it next to the project (not on a tmp drive, for instance), and for debugging purposes we gave it a name. Still, it can be done with `mkdtemp` as well.
		gerazo: Finally, I came to the conclusion that `mkdtemp` would not be better than the current solution. For other threads to find the directory we created, we need a designated name. Suffixing it with a generated name would further complicate things, as we must not allow multiple concurrent runs here. The current solution is more robust from this point of view.
		george.karpenkov: OK
		triple_arches = glob.glob(os.path.join(ctudir, '*'))
		for triple_path in triple_arches:
		if os.path.isdir(triple_path):
		triple_arch = os.path.basename(triple_path)
		fnmap_dir = os.path.join(ctudir, triple_arch,
		CTU_TEMP_FNMAP_FOLDER)

		func_map_lines = generate_func_map_lines(fnmap_dir)
		mangled_ast_pairs = create_global_ctu_function_map(func_map_lines)
		write_global_map(triple_arch, mangled_ast_pairs)

		# Remove all temporary files
		shutil.rmtree(fnmap_dir, ignore_errors=True)


def run_analyzer_parallel(args):		def run_analyzer_parallel(args):
""" Runs the analyzer against the given compilation database. """		""" Runs the analyzer against the given compilation database. """

def exclude(filename):		def exclude(filename):
""" Return true when any excluded directory prefix the filename. """		""" Return true when any excluded directory prefix the filename. """
return any(re.match(r'^' + directory, filename)		return any(re.match(r'^' + directory, filename)
for directory in args.excludes)		for directory in args.excludes)

consts = {		consts = {
'clang': args.clang,		'clang': args.clang,
'output_dir': args.output,		'output_dir': args.output,
'output_format': args.output_format,		'output_format': args.output_format,
'output_failures': args.output_failures,		'output_failures': args.output_failures,
'direct_args': analyzer_params(args),		'direct_args': analyzer_params(args),
'force_debug': args.force_debug		'force_debug': args.force_debug,
		'ctu': get_ctu_config_from_args(args)
}		}

logging.debug('run analyzer against compilation database')		logging.debug('run analyzer against compilation database')
with open(args.cdb, 'r') as handle:		with open(args.cdb, 'r') as handle:
generator = (dict(cmd, **consts)		generator = (dict(cmd, **consts)
for cmd in json.load(handle) if not exclude(cmd['file']))		for cmd in json.load(handle) if not exclude(cmd['file']))
# when verbose output requested execute sequentially		# when verbose output requested execute sequentially
pool = multiprocessing.Pool(1 if args.verbose > 2 else None)		pool = multiprocessing.Pool(1 if args.verbose > 2 else None)
for current in pool.imap_unordered(run, generator):		for current in pool.imap_unordered(run, generator):
if current is not None:		if current is not None:
# display error message from the static analyzer		# display error message from the static analyzer
for line in current['error_output']:		for line in current['error_output']:
logging.info(line.rstrip())		logging.info(line.rstrip())
pool.close()		pool.close()
pool.join()		pool.join()


		def govern_analyzer_runs(args):
		""" Governs multiple runs in CTU mode or runs once in normal mode. """

		ctu_config = get_ctu_config_from_args(args)
		# If we do a CTU collect (1st phase) we remove all previous collection
		danielmarjamaki: Not a big deal, but I would use early exits in this function.
		danielmarjamaki: By "not a big deal" I mean: feel free to ignore my comment if you want to have it this way.
		gerazo: I've checked it through. The only place for an early exit now would be before the else. The first and second ifs are in fact non-orthogonal.
		# data first.
		george.karpenkov: Similarly to the comment above, I would prefer that the analysis tool not remove files (and I assume those are not huge ones?). Can we just use temporary directories?
		gerazo: Unlike above, here we do remove non-temporary data intentionally. The user asks here for a recollection of CTU data for a fresh start. Because there is no "clean" functionality in the analyzer interface itself, this seemed to be the easiest solution for users, saving them an extra effort or a new command.
		george.karpenkov: OK!
		if ctu_config.collect:
		shutil.rmtree(ctu_config.dir, ignore_errors=True)

		# If the user asked for a collect (1st) and analyze (2nd) phase, we do an
		# all-in-one run where we deliberately remove collection data before and
		# also after the run. If the user asks only for a single phase data is
		# left so multiple analyze runs can use the same data gathered by a single
		# collection run.
		if ctu_config.collect and ctu_config.analyze:
		# CTU strings are coming from args.ctu_dir and func_map_cmd,
		# so we can leave it empty
		george.karpenkov: Same as the comment above about removing folders. Also, it seems like there should be a way to remove the redundancy in the `if collect / remove tree` block, which is repeated twice.
		gerazo: The previous call for data removal happens because the user asked for a collect run, so we clean the data to do a recollection. This second one happens because the user asked for a full recollection and analysis run all in one, so here we destroy the temp data for the user's convenience. This happens after the run, not before it as previously. The default behavior is to do this when the user uses the tool the easy way (collect and analyze all in one), and we intentionally keep the collection data if the user only asks for a collect or an analyze run. With this, the user can use one collect run's results for multiple analyze runs. This is written in the command line help. I will definitely put comments here to explain.
		args.ctu_phases = CtuConfig(collect=True, analyze=False,
		dir='', func_map_cmd='')
		run_analyzer_parallel(args)
		merge_ctu_func_maps(ctu_config.dir)
		args.ctu_phases = CtuConfig(collect=False, analyze=True,
		dir='', func_map_cmd='')
		run_analyzer_parallel(args)
		shutil.rmtree(ctu_config.dir, ignore_errors=True)
		else:
		# Single runs (collect or analyze) are launched from here.
		run_analyzer_parallel(args)
		if ctu_config.collect:
		merge_ctu_func_maps(ctu_config.dir)
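The branching in `govern_analyzer_runs` is easier to follow when written out as a plan of steps. The helper below is purely illustrative and not part of the patch. It returns labels instead of performing any work, where `clean` stands for the `shutil.rmtree` calls and `merge` for `merge_ctu_func_maps`.

```python
def plan_ctu_steps(collect, analyze):
    """Return the ordered steps govern_analyzer_runs would take."""
    steps = []
    if collect:
        # A collect run always starts from a clean CTU directory.
        steps.append('clean')
    if collect and analyze:
        # All-in-one run: collection data is removed again afterwards.
        steps += ['collect', 'merge', 'analyze', 'clean']
    else:
        # Single phase (or plain non-CTU) run; collected data is kept.
        steps.append('run')
        if collect:
            steps.append('merge')
    return steps


print(plan_ctu_steps(collect=True, analyze=True))
# ['clean', 'collect', 'merge', 'analyze', 'clean']
print(plan_ctu_steps(collect=True, analyze=False))
# ['clean', 'run', 'merge']
```

The second `clean` only appears in the all-in-one case, which matches the explanation in the thread: a standalone collect run leaves its data behind for later analyze runs.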


def setup_environment(args):		def setup_environment(args):
""" Set up environment for build command to interpose compiler wrapper. """		""" Set up environment for build command to interpose compiler wrapper. """

environment = dict(os.environ)		environment = dict(os.environ)
environment.update(wrapper_environment(args))		environment.update(wrapper_environment(args))
environment.update({		environment.update({
'CC': COMPILER_WRAPPER_CC,		'CC': COMPILER_WRAPPER_CC,
'CXX': COMPILER_WRAPPER_CXX,		'CXX': COMPILER_WRAPPER_CXX,
'ANALYZE_BUILD_CLANG': args.clang if need_analyzer(args.build) else '',		'ANALYZE_BUILD_CLANG': args.clang if need_analyzer(args.build) else '',
'ANALYZE_BUILD_REPORT_DIR': args.output,		'ANALYZE_BUILD_REPORT_DIR': args.output,
'ANALYZE_BUILD_REPORT_FORMAT': args.output_format,		'ANALYZE_BUILD_REPORT_FORMAT': args.output_format,
'ANALYZE_BUILD_REPORT_FAILURES': 'yes' if args.output_failures else '',		'ANALYZE_BUILD_REPORT_FAILURES': 'yes' if args.output_failures else '',
'ANALYZE_BUILD_PARAMETERS': ' '.join(analyzer_params(args)),		'ANALYZE_BUILD_PARAMETERS': ' '.join(analyzer_params(args)),
'ANALYZE_BUILD_FORCE_DEBUG': 'yes' if args.force_debug else ''		'ANALYZE_BUILD_FORCE_DEBUG': 'yes' if args.force_debug else '',
		'ANALYZE_BUILD_CTU': json.dumps(get_ctu_config_from_args(args))
		george.karpenkov: Using JSON serialization over environment variables is very unorthodox, and would give very bizarre error messages if the user tried to customize the settings (I assume that is the intent, right?). I think that splitting those options into separate ENV variables (if they _have_ to be customizable) would be much better.
		gerazo: To be honest, I could never make a decision on this. The previous version of scan-build-py used environment variables extensively for everything, which ended up in a huge mess and environment contamination. There was an effort to reduce this to only a few well-designated ones. Still, there is a need to add new data to the environment.
})		})
return environment		return environment
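For comparison, the alternative raised in the review (one environment variable per option rather than a JSON blob) might look like the sketch below. The four `ANALYZE_BUILD_CTU_*` variable names are invented for this example. The `'yes'` / empty-string convention follows the existing `ANALYZE_BUILD_FORCE_DEBUG` entry, and `CtuConfig` is again assumed to be a four-field namedtuple.

```python
from collections import namedtuple

# Assumed definition, inferred from the keyword arguments used in the diff.
CtuConfig = namedtuple('CtuConfig',
                       ['collect', 'analyze', 'dir', 'func_map_cmd'])


def ctu_config_to_env(config):
    """Encode the config as separate (hypothetical) environment variables."""
    return {
        'ANALYZE_BUILD_CTU_COLLECT': 'yes' if config.collect else '',
        'ANALYZE_BUILD_CTU_ANALYZE': 'yes' if config.analyze else '',
        'ANALYZE_BUILD_CTU_DIR': config.dir,
        'ANALYZE_BUILD_CTU_FUNC_MAP_CMD': config.func_map_cmd,
    }


def ctu_config_from_env(environ):
    """Decode the config; unset variables mean 'switched off'."""
    return CtuConfig(
        collect=bool(environ.get('ANALYZE_BUILD_CTU_COLLECT')),
        analyze=bool(environ.get('ANALYZE_BUILD_CTU_ANALYZE')),
        dir=environ.get('ANALYZE_BUILD_CTU_DIR', ''),
        func_map_cmd=environ.get('ANALYZE_BUILD_CTU_FUNC_MAP_CMD', ''))
```

The trade-off discussed in the thread still applies: this adds four variables to the environment where the JSON form adds one.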


@command_entry_point		@command_entry_point
def analyze_compiler_wrapper():		def analyze_compiler_wrapper():
""" Entry point for `analyze-cc` and `analyze-c++` compiler wrappers. """		""" Entry point for `analyze-cc` and `analyze-c++` compiler wrappers. """

Show All 16 Lines	parameters = {
'clang': os.getenv('ANALYZE_BUILD_CLANG'),		'clang': os.getenv('ANALYZE_BUILD_CLANG'),
'output_dir': os.getenv('ANALYZE_BUILD_REPORT_DIR'),		'output_dir': os.getenv('ANALYZE_BUILD_REPORT_DIR'),
'output_format': os.getenv('ANALYZE_BUILD_REPORT_FORMAT'),		'output_format': os.getenv('ANALYZE_BUILD_REPORT_FORMAT'),
'output_failures': os.getenv('ANALYZE_BUILD_REPORT_FAILURES'),		'output_failures': os.getenv('ANALYZE_BUILD_REPORT_FAILURES'),
'direct_args': os.getenv('ANALYZE_BUILD_PARAMETERS',		'direct_args': os.getenv('ANALYZE_BUILD_PARAMETERS',
'').split(' '),		'').split(' '),
'force_debug': os.getenv('ANALYZE_BUILD_FORCE_DEBUG'),		'force_debug': os.getenv('ANALYZE_BUILD_FORCE_DEBUG'),
'directory': execution.cwd,		'directory': execution.cwd,
'command': [execution.cmd[0], '-c'] + compilation.flags		'command': [execution.cmd[0], '-c'] + compilation.flags,
		'ctu': get_ctu_config_from_json(os.getenv('ANALYZE_BUILD_CTU'))
		george.karpenkov: Again, is it possible to avoid JSON-over-environment-variables?
		gerazo: There is another thing against changing this. Currently, the interface here using env variables is used by the intercept-build, analyze-build and scan-build tools as well. In order to drop JSON, we need to change those tools too. It would definitely be a separate patch.
		george.karpenkov: OK, I didn't know that the JSON interface was used by other tools. In that case, ignore my comment.
}		}
# call static analyzer against the compilation		# call static analyzer against the compilation
for source in compilation.files:		for source in compilation.files:
parameters.update({'file': source})		parameters.update({'file': source})
logging.debug('analyzer parameters %s', parameters)		logging.debug('analyzer parameters %s', parameters)
current = run(parameters)		current = run(parameters)
# display error message from the static analyzer		# display error message from the static analyzer
if current is not None:		if current is not None:
Show All 33 Lines	finally:
if not keep:		if not keep:
os.rmdir(name)		os.rmdir(name)


def analyzer_params(args):		def analyzer_params(args):
""" A group of command line arguments can mapped to command		""" A group of command line arguments can mapped to command
line arguments of the analyzer. This method generates those. """		line arguments of the analyzer. This method generates those. """

def prefix_with(constant, pieces):
""" From a sequence create another sequence where every second element
is from the original sequence and the odd elements are the prefix.

eg.: prefix_with(0, [1,2,3]) creates [0, 1, 0, 2, 0, 3] """

return [elem for piece in pieces for elem in [constant, piece]]

result = []		result = []

if args.store_model:		if args.store_model:
result.append('-analyzer-store={0}'.format(args.store_model))		result.append('-analyzer-store={0}'.format(args.store_model))
if args.constraints_model:		if args.constraints_model:
result.append('-analyzer-constraints={0}'.format(		result.append('-analyzer-constraints={0}'.format(
args.constraints_model))		args.constraints_model))
if args.internal_stats:		if args.internal_stats:
(47 lines of unchanged context not shown)

@require(['command', # entry from compilation database		@require(['command', # entry from compilation database
'directory', # entry from compilation database		'directory', # entry from compilation database
'file', # entry from compilation database		'file', # entry from compilation database
'clang', # clang executable name (and path)		'clang', # clang executable name (and path)
'direct_args', # arguments from command line		'direct_args', # arguments from command line
'force_debug', # kill non debug macros		'force_debug', # kill non debug macros
'output_dir', # where generated report files shall go		'output_dir', # where generated report files shall go
'output_format', # it's 'plist' or 'html' or both		'output_format', # it's 'plist', 'html', both or plist-multi-file
'output_failures']) # generate crash reports or not		'output_failures', # generate crash reports or not
		'ctu']) # ctu control options
def run(opts):		def run(opts):
""" Entry point to run (or not) static analyzer against a single entry		""" Entry point to run (or not) static analyzer against a single entry
of the compilation database.		of the compilation database.

This complex task is decomposed into smaller methods which are calling		This complex task is decomposed into smaller methods which are calling
each other in chain. If the analyzis is not possibe the given method		each other in chain. If the analyzis is not possibe the given method
just return and break the chain.		just return and break the chain.

▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	@require(['clang', 'directory', 'flags', 'direct_args', 'file', 'output_dir',
'output_format'])		'output_format'])
def run_analyzer(opts, continuation=report_failure):		def run_analyzer(opts, continuation=report_failure):
""" It assembles the analysis command line and executes it. Capture the		""" It assembles the analysis command line and executes it. Capture the
output of the analysis and returns with it. If failure reports are		output of the analysis and returns with it. If failure reports are
requested, it calls the continuation to generate it. """		requested, it calls the continuation to generate it. """

def target():		def target():
""" Creates output file name for reports. """		""" Creates output file name for reports. """
if opts['output_format'] in {'plist', 'plist-html'}:		if opts['output_format'] in {
		'plist',
		'plist-html',
		'plist-multi-file'}:
(handle, name) = tempfile.mkstemp(prefix='report-',		(handle, name) = tempfile.mkstemp(prefix='report-',
suffix='.plist',		suffix='.plist',
dir=opts['output_dir'])		dir=opts['output_dir'])
os.close(handle)		os.close(handle)
return name		return name
return opts['output_dir']		return opts['output_dir']

try:		try:
cwd = opts['directory']		cwd = opts['directory']
cmd = get_arguments([opts['clang'], '--analyze'] +		cmd = get_arguments([opts['clang'], '--analyze'] +
opts['direct_args'] + opts['flags'] +		opts['direct_args'] + opts['flags'] +
[opts['file'], '-o', target()],		[opts['file'], '-o', target()],
cwd)		cwd)
output = run_command(cmd, cwd=cwd)		output = run_command(cmd, cwd=cwd)
return {'error_output': output, 'exit_code': 0}		return {'error_output': output, 'exit_code': 0}
except subprocess.CalledProcessError as ex:		except subprocess.CalledProcessError as ex:
result = {'error_output': ex.output, 'exit_code': ex.returncode}		result = {'error_output': ex.output, 'exit_code': ex.returncode}
if opts.get('output_failures', False):		if opts.get('output_failures', False):
opts.update(result)		opts.update(result)
continuation(opts)		continuation(opts)
return result		return result


		def func_map_list_src_to_ast(func_src_list):
		""" Turns textual function map list with source files into a
		function map list with ast files. """

		func_ast_list = []
		for fn_src_txt in func_src_list:
		mangled_name, path = fn_src_txt.split(" ", 1)
		# Normalize path on windows as well
		path = os.path.splitdrive(path)[1]
		george.karpenkov (Done): `mangled_name, path = fn_src_txt.split(" ", 1)` ?
		# Make relative path out of absolute
		path = path[1:] if path[0] == os.sep else path
		ast_path = os.path.join("ast", path + ".ast")
		func_ast_list.append(mangled_name + " " + ast_path)
		return func_ast_list
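
The conversion above can be tried in isolation. The following is a minimal sketch of the per-line transformation, assuming each input line has the form "mangled-name absolute-source-path"; the helper name `src_line_to_ast_line` is hypothetical and not part of the patch.

```python
import os


def src_line_to_ast_line(line):
    """Map 'name /abs/path/file.cpp' to 'name ast/abs/path/file.cpp.ast'.

    Hypothetical helper that mirrors the loop body of
    func_map_list_src_to_ast for a single line.
    """
    # Split only once: the path itself may contain spaces.
    mangled_name, path = line.split(" ", 1)
    # Drop a drive letter, if any (a no-op on POSIX systems).
    path = os.path.splitdrive(path)[1]
    # Make the path relative so it can be joined under 'ast'.
    if path.startswith(os.sep):
        path = path[1:]
    return mangled_name + " " + os.path.join("ast", path + ".ast")
```

Using `startswith` rather than indexing `path[0]` also avoids an `IndexError` on an empty path, which is a small deviation from the code under review.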

		dkrupp (Done): There is an incompatibility between this scan-build (actually analyze-build) implementation and the new lib/CrossTU library. `CrossTranslationUnitContext::loadExternalAST(StringRef LookupName, StringRef CrossTUDir, StringRef IndexName)` expects externalFnMap.txt to be in "functionName astFilename" format, but here we currently generate "functionName@arch astFilename" lines. One possible fix would be to create one externalFnMap.txt index per arch:
		<collect-dir>/ast/x86_64/externalFnMap.txt
		<collect-dir>/ast/ppc64/externalFnMap.txt
		etc.
		and to call clang analyze with the architecture-specific map directory, e.g. ctu-dir=<collect-dir>/ast/x86_64. This would then work if the to-be-analyzed source code is cross-compiled for multiple architectures. It would also be useful to add a test case to check that the map file and ctu-dir content generated by analyze-build are compatible.
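
The per-architecture split proposed in this comment could look roughly like the sketch below. It assumes index lines of the form "functionName@arch astFilename" as described by the reviewer; `split_index_by_arch` is a hypothetical name, and writing each group to its own externalFnMap.txt is left out.

```python
from collections import defaultdict


def split_index_by_arch(lines):
    """Group 'name@arch path' lines into {arch: ['name path', ...]}.

    Hypothetical helper; the '@arch' suffix format is taken from the
    review comment, not from the code shown in this diff.
    """
    per_arch = defaultdict(list)
    for line in lines:
        tagged_name, ast_path = line.split(" ", 1)
        # Split on the last '@' so that the arch suffix is removed even
        # if the name itself happens to contain '@'.
        name, arch = tagged_name.rsplit("@", 1)
        per_arch[arch].append(name + " " + ast_path)
    return dict(per_arch)
```

Each resulting list is already in the "functionName astFilename" format that the comment says the library expects.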

		@require(['clang', 'directory', 'flags', 'direct_args', 'file', 'ctu'])
		def ctu_collect_phase(opts):
		""" Preprocess source by generating all data needed by CTU analysis. """

		def generate_ast(triple_arch):
		""" Generates ASTs for the current compilation command. """

		args = opts['direct_args'] + opts['flags']
		ast_joined_path = os.path.join(opts['ctu'].dir, triple_arch, 'ast',
		os.path.realpath(opts['file'])[1:] +
		'.ast')
		ast_path = os.path.abspath(ast_joined_path)
		ast_dir = os.path.dirname(ast_path)
		if not os.path.isdir(ast_dir):
		try:
		os.makedirs(ast_dir)
		except OSError:
		# In case an other process already created it.
		pass
		ast_command = [opts['clang'], '-emit-ast']
		george.karpenkov (Done): `try/except/pass` is almost always bad. When can the error occur? Why are we ignoring it?
		gerazo (Done): I think this code is redundant with the if above.
		gerazo (Done): Here the folders are created on demand. Because they are created in parallel by multiple processes, there is a small chance that another process has already created the folder between the isdir check and the makedirs call. This is why the pass is needed to make it always run correctly. I will add a comment.
		ast_command.extend(args)
		ast_command.append('-w')
		ast_command.append(opts['file'])
		ast_command.append('-o')
		ast_command.append(ast_path)
		logging.debug("Generating AST using '%s'", ast_command)
		george.karpenkov (Done): The above can be written more succinctly as: `ast_command = [opts['clang'], ...] + args + ['-w', ...]`
		gerazo (Done): After several iterations of the code, I find it easier to version control such multiline constructs. If someone changes a data source, it is clear which one (which line) was modified. The succinct notation does not allow clean VCS annotations.
		george.karpenkov (Done): OK. Though you could still split the addition across multiple lines with `\`
		run_command(ast_command, cwd=opts['directory'])

		def map_functions(triple_arch):
		""" Generate function map file for the current source. """

		args = opts['direct_args'] + opts['flags']
		funcmap_command = [opts['ctu'].func_map_cmd]
		funcmap_command.append(opts['file'])
		funcmap_command.append('--')
		funcmap_command.extend(args)
		logging.debug("Generating function map using '%s'", funcmap_command)
		george.karpenkov (Done): Similarly here, `funcmap_command` can be generated in one line using `+`
		func_src_list = run_command(funcmap_command, cwd=opts['directory'])
		func_ast_list = func_map_list_src_to_ast(func_src_list)
		extern_fns_map_folder = os.path.join(opts['ctu'].dir, triple_arch,
		CTU_TEMP_FNMAP_FOLDER)
		if not os.path.isdir(extern_fns_map_folder):
		try:
		os.makedirs(extern_fns_map_folder)
		except OSError:
		# In case an other process already created it.
		pass
		george.karpenkov (Done): Again, why is this error ignored?
		if func_ast_list:
		with tempfile.NamedTemporaryFile(mode='w',
		dkrupp (Done): Maybe we could use the full target triple for distinguishing the AST binaries, not only the architecture part. The sys part, for example, is probably important too, and a "win32" AST may not be compatible with a "linux" AST.
		dir=extern_fns_map_folder,
		delete=False) as out_file:
		out_file.write("\n".join(func_ast_list) + "\n")

		cwd = opts['directory']
		cmd = [opts['clang'], '--analyze'] + opts['direct_args'] + opts['flags'] \
		+ [opts['file']]
		triple_arch = get_triple_arch(cmd, cwd)
		generate_ast(triple_arch)
		map_functions(triple_arch)
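
The directory-creation race discussed in the inline comments can be tolerated without silencing unrelated failures such as permission errors. The sketch below is one possible approach, not the patch's code; `ensure_dir` is a hypothetical helper.

```python
import errno
import os


def ensure_dir(path):
    """Create 'path' if needed, tolerating a concurrent creator.

    Hypothetical helper: only the 'already exists as a directory'
    case is ignored, every other OSError is re-raised.
    """
    try:
        os.makedirs(path)
    except OSError as ex:
        # Another process may have created the directory between our
        # check and the makedirs call; that case is harmless.
        if ex.errno != errno.EEXIST or not os.path.isdir(path):
            raise
```

On Python 3.2 and later, `os.makedirs(path, exist_ok=True)` gives similar behavior; the explicit `errno` check is shown because it also runs on Python 2.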


		@require(['ctu'])
		def dispatch_ctu(opts, continuation=run_analyzer):
		""" Execute only one phase of 2 phases of CTU if needed. """

		ctu_config = opts['ctu']

		if ctu_config.collect or ctu_config.analyze:
		george.karpenkov (Done): In which case is this branch hit? Isn't an improperly formed input argument indicative of an internal error at this stage?
		gerazo (Done): Another part of scan-build-py, analyze_cc, uses a namedtuple serialized to JSON to communicate. However, the field names do not come back from JSON, so this code helps with that. This is the case when someone uses the whole toolset with compiler wrapping. All the environment variable hassle is also happening because of this. So these env vars are not for user modification (as you've suggested earlier).
		george.karpenkov (Done): OK so `opts['ctu']` is a tuple or a named tuple depending on how this function is entered? BTW could you point me to the `analyze_cc` entry point? For the purpose of having more uniform code with fewer cases to care about, do you think we could just use ordinary tuples instead of constructing a named one, since we have to deconstruct an ordinary tuple in any case?
		gerazo (Done): Using a NamedTuple improves readability of the code a lot with fewer comments. It is unfortunate that serializing it is not solved by Python. I think moving this code to the entry point would make the whole thing much nicer. The entry point is at analyze_compiler_wrapper
		assert ctu_config.collect != ctu_config.analyze
		if ctu_config.collect:
		return ctu_collect_phase(opts)
		if ctu_config.analyze:
		cwd = opts['directory']
		cmd = [opts['clang'], '--analyze'] + opts['direct_args'] \
		+ opts['flags'] + [opts['file']]
		triarch = get_triple_arch(cmd, cwd)
		ctu_options = ['ctu-dir=' + os.path.join(ctu_config.dir, triarch),
		'reanalyze-ctu-visited=true']
		analyzer_options = prefix_with('-analyzer-config', ctu_options)
		direct_options = prefix_with('-Xanalyzer', analyzer_options)
		opts['direct_args'].extend(direct_options)

		return continuation(opts)
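
The serialization issue raised in the inline discussion can be reproduced with the standard library alone: `json` encodes a namedtuple as a plain array, so the field names have to be restored on the receiving side. A minimal sketch, assuming the four `CtuConfig` fields used elsewhere in this diff; the `encode_ctu` and `decode_ctu` helpers are hypothetical.

```python
import json
from collections import namedtuple

# Re-declared here for illustration; the field names follow the
# CtuConfig(...) calls in arguments.py.
CtuConfig = namedtuple('CtuConfig',
                       ['collect', 'analyze', 'dir', 'func_map_cmd'])


def encode_ctu(config):
    """Serialize the config; JSON keeps only the values, as an array."""
    return json.dumps(config)


def decode_ctu(text):
    """Rebuild the named tuple from the positional JSON array."""
    return CtuConfig(*json.loads(text))
```

Doing this conversion once at the entry point, as suggested in the thread, would let the rest of the code rely on attribute access only.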


@require(['flags', 'force_debug'])		@require(['flags', 'force_debug'])
def filter_debug_flags(opts, continuation=run_analyzer):		def filter_debug_flags(opts, continuation=dispatch_ctu):
""" Filter out nondebug macros when requested. """		""" Filter out nondebug macros when requested. """

if opts.pop('force_debug'):		if opts.pop('force_debug'):
# lazy implementation just append an undefine macro at the end		# lazy implementation just append an undefine macro at the end
opts.update({'flags': opts['flags'] + ['-UNDEBUG']})		opts.update({'flags': opts['flags'] + ['-UNDEBUG']})

return continuation(opts)		return continuation(opts)

▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	if received_list:
return continuation(opts)		return continuation(opts)
else:		else:
logging.debug('skip analysis, found not supported arch')		logging.debug('skip analysis, found not supported arch')
return None		return None
else:		else:
logging.debug('analysis, on default arch')		logging.debug('analysis, on default arch')
return continuation(opts)		return continuation(opts)


		george.karpenkov (Not Done): This blank line should not be in this PR.
		gerazo (Not Done): Scheduled to be done.
# To have good results from static analyzer certain compiler options shall be		# To have good results from static analyzer certain compiler options shall be
# omitted. The compiler flag filtering only affects the static analyzer run.		# omitted. The compiler flag filtering only affects the static analyzer run.
#		#
# Keys are the option name, value number of options to skip		# Keys are the option name, value number of options to skip
IGNORED_FLAGS = {		IGNORED_FLAGS = {
'-c': 0, # compile option will be overwritten		'-c': 0, # compile option will be overwritten
'-fsyntax-only': 0, # static analyzer option will be overwritten		'-fsyntax-only': 0, # static analyzer option will be overwritten
'-o': 1, # will set up own output file		'-o': 1, # will set up own output file
▲ Show 20 Lines • Show All 55 Lines • Show Last 20 Lines

tools/scan-build-py/libscanbuild/arguments.py

Show All 12 Lines
Validations are mostly calling specific help methods, or mangling values.		Validations are mostly calling specific help methods, or mangling values.
"""		"""

import os		import os
import sys		import sys
import argparse		import argparse
import logging		import logging
import tempfile		import tempfile
from libscanbuild import reconfigure_logging		from libscanbuild import reconfigure_logging, CtuConfig
from libscanbuild.clang import get_checkers		from libscanbuild.clang import get_checkers, is_ctu_capable

__all__ = ['parse_args_for_intercept_build', 'parse_args_for_analyze_build',		__all__ = ['parse_args_for_intercept_build', 'parse_args_for_analyze_build',
'parse_args_for_scan_build']		'parse_args_for_scan_build']


def parse_args_for_intercept_build():		def parse_args_for_intercept_build():
""" Parse and validate command-line arguments for intercept-build. """		""" Parse and validate command-line arguments for intercept-build. """

▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	def normalize_args_for_analyze(args, from_build_command):
# expecting some argument to be present. so, instead of query the args		# expecting some argument to be present. so, instead of query the args
# object about the presence of the flag, we fake it here. to make those		# object about the presence of the flag, we fake it here. to make those
# methods more readable. (it's an arguable choice, took it only for those		# methods more readable. (it's an arguable choice, took it only for those
# which have good default value.)		# which have good default value.)
if from_build_command:		if from_build_command:
# add cdb parameter invisibly to make report module working.		# add cdb parameter invisibly to make report module working.
args.cdb = 'compile_commands.json'		args.cdb = 'compile_commands.json'

		# Make ctu_dir an abspath as it is needed inside clang
		if not from_build_command and hasattr(args, 'ctu_phases') \
		and hasattr(args.ctu_phases, 'dir'):
		args.ctu_dir = os.path.abspath(args.ctu_dir)


def validate_args_for_analyze(parser, args, from_build_command):		def validate_args_for_analyze(parser, args, from_build_command):
""" Command line parsing is done by the argparse module, but semantic		""" Command line parsing is done by the argparse module, but semantic
validation still needs to be done. This method is doing it for		validation still needs to be done. This method is doing it for
analyze-build and scan-build commands.		analyze-build and scan-build commands.

:param parser: The command line parser object.		:param parser: The command line parser object.
:param args: Parsed argument object.		:param args: Parsed argument object.
:param from_build_command: Boolean value tells is the command suppose		:param from_build_command: Boolean value tells is the command suppose
to run the analyzer against a build command or a compilation db.		to run the analyzer against a build command or a compilation db.
:return: No return value, but this call might throw when validation		:return: No return value, but this call might throw when validation
fails. """		fails. """

if args.help_checkers_verbose:		if args.help_checkers_verbose:
print_checkers(get_checkers(args.clang, args.plugins))		print_checkers(get_checkers(args.clang, args.plugins))
parser.exit(status=0)		parser.exit(status=0)
elif args.help_checkers:		elif args.help_checkers:
print_active_checkers(get_checkers(args.clang, args.plugins))		print_active_checkers(get_checkers(args.clang, args.plugins))
parser.exit(status=0)		parser.exit(status=0)
elif from_build_command and not args.build:		elif from_build_command and not args.build:
parser.error(message='missing build command')		parser.error(message='missing build command')
elif not from_build_command and not os.path.exists(args.cdb):		elif not from_build_command and not os.path.exists(args.cdb):
parser.error(message='compilation database is missing')		parser.error(message='compilation database is missing')

		# If the user wants CTU mode
		if not from_build_command and hasattr(args, 'ctu_phases') \
		and hasattr(args.ctu_phases, 'dir'):
		# If CTU analyze_only, the input directory should exist
		if args.ctu_phases.analyze and not args.ctu_phases.collect \
		and not os.path.exists(args.ctu_dir):
		parser.error(message='missing CTU directory')
		# Check CTU capability via checking clang-func-mapping
		if not is_ctu_capable(args.func_map_cmd):
		parser.error(message="""This version of clang does not support CTU
		functionality or clang-func-mapping command not found.""")


def create_intercept_parser():		def create_intercept_parser():
""" Creates a parser for command-line arguments to 'intercept'. """		""" Creates a parser for command-line arguments to 'intercept'. """

parser = create_default_parser()		parser = create_default_parser()
parser_add_cdb(parser)		parser_add_cdb(parser)

parser_add_prefer_wrapper(parser)		parser_add_prefer_wrapper(parser)
▲ Show 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	def create_analyze_parser(from_build_command):
format_group.add_argument(		format_group.add_argument(
'--plist-html',		'--plist-html',
'-plist-html',		'-plist-html',
dest='output_format',		dest='output_format',
const='plist-html',		const='plist-html',
default='html',		default='html',
action='store_const',		action='store_const',
help="""Cause the results as a set of .html and .plist files.""")		help="""Cause the results as a set of .html and .plist files.""")
# TODO: implement '-view '		format_group.add_argument(
		'--plist-multi-file',
		'-plist-multi-file',
		dest='output_format',
		const='plist-multi-file',
		default='html',
		action='store_const',
		help="""Cause the results as a set of .plist files with extra
		information on related files.""")

advanced = parser.add_argument_group('advanced options')		advanced = parser.add_argument_group('advanced options')
advanced.add_argument(		advanced.add_argument(
'--use-analyzer',		'--use-analyzer',
metavar='<path>',		metavar='<path>',
dest='clang',		dest='clang',
default='clang',		default='clang',
help="""'%(prog)s' uses the 'clang' executable relative to itself for		help="""'%(prog)s' uses the 'clang' executable relative to itself for
▲ Show 20 Lines • Show All 98 Lines • ▼ Show 20 Lines	def create_analyze_parser(from_build_command):
plugins.add_argument(		plugins.add_argument(
'--help-checkers-verbose',		'--help-checkers-verbose',
action='store_true',		action='store_true',
help="""Print all available checkers and mark the enabled ones.""")		help="""Print all available checkers and mark the enabled ones.""")

if from_build_command:		if from_build_command:
parser.add_argument(		parser.add_argument(
dest='build', nargs=argparse.REMAINDER, help="""Command to run.""")		dest='build', nargs=argparse.REMAINDER, help="""Command to run.""")
		else:
		ctu = parser.add_argument_group('cross translation unit analysis')
		ctu_mutex_group = ctu.add_mutually_exclusive_group()
		ctu_mutex_group.add_argument(
		'--ctu',
		action='store_const',
		const=CtuConfig(collect=True, analyze=True,
		dir='', func_map_cmd=''),
		dest='ctu_phases',
		help="""Perform cross translation unit (ctu) analysis (both collect
		and analyze phases) using default <ctu-dir> for temporary output.
		At the end of the analysis, the temporary directory is removed.""")
		ctu.add_argument(
		'--ctu-dir',
		metavar='<ctu-dir>',
		dest='ctu_dir',
		george.karpenkov (Done): BTW can we also explicitly add `dest='ctu_dir'` here, as otherwise I was initially very confused as to where the variable is set.
		gerazo (Done): Yes, of course.
		default='ctu-dir',
		help="""Defines the temporary directory used between ctu
		phases.""")
		ctu_mutex_group.add_argument(
		'--ctu-collect-only',
		action='store_const',
		const=CtuConfig(collect=True, analyze=False,
		dir='', func_map_cmd=''),
		dest='ctu_phases',
		help="""Perform only the collect phase of ctu.
		Keep <ctu-dir> for further use.""")
		ctu_mutex_group.add_argument(
		'--ctu-analyze-only',
		action='store_const',
		const=CtuConfig(collect=False, analyze=True,
		dir='', func_map_cmd=''),
		dest='ctu_phases',
		help="""Perform only the analyze phase of ctu. <ctu-dir> should be
		present and will not be removed after analysis.""")
		ctu.add_argument(
		'--use-func-map-cmd',
		metavar='<path>',
		dest='func_map_cmd',
		default='clang-func-mapping',
		help="""'%(prog)s' uses the 'clang-func-mapping' executable
		relative to itself for generating function maps for static
		analysis. One can override this behavior with this option by using
		the 'clang-func-mapping' packaged with Xcode (on OS X) or from the
		PATH.""")
return parser		return parser
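
The option layout above relies on several `store_const` flags sharing one `dest` inside a mutually exclusive group. The sketch below shows that pattern in a stripped-down parser; the `Phases` tuple and `build_demo_parser` are hypothetical stand-ins, not the patch's `CtuConfig` or `create_analyze_parser`.

```python
import argparse
from collections import namedtuple

# Hypothetical two-field stand-in for the phase flags of CtuConfig.
Phases = namedtuple('Phases', ['collect', 'analyze'])


def build_demo_parser():
    """Build a small parser with three mutually exclusive CTU flags."""
    parser = argparse.ArgumentParser(prog='demo')
    group = parser.add_mutually_exclusive_group()
    # All three flags write to the same destination; argparse rejects
    # command lines that use more than one of them.
    group.add_argument('--ctu', dest='ctu_phases', action='store_const',
                       const=Phases(collect=True, analyze=True))
    group.add_argument('--ctu-collect-only', dest='ctu_phases',
                       action='store_const',
                       const=Phases(collect=True, analyze=False))
    group.add_argument('--ctu-analyze-only', dest='ctu_phases',
                       action='store_const',
                       const=Phases(collect=False, analyze=True))
    parser.add_argument('--ctu-dir', dest='ctu_dir', default='ctu-dir')
    return parser
```

When none of the flags is given, `ctu_phases` is `None` in this sketch, so callers need to handle the "CTU not requested" case explicitly.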


def create_default_parser():		def create_default_parser():
""" Creates command line parser for all build wrapper commands. """		""" Creates command line parser for all build wrapper commands. """

parser = argparse.ArgumentParser(		parser = argparse.ArgumentParser(
formatter_class=argparse.ArgumentDefaultsHelpFormatter)		formatter_class=argparse.ArgumentDefaultsHelpFormatter)
▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines

tools/scan-build-py/libscanbuild/clang.py

# -*- coding: utf-8 -*-		# -*- coding: utf-8 -*-
# The LLVM Compiler Infrastructure		# The LLVM Compiler Infrastructure
#		#
# This file is distributed under the University of Illinois Open Source		# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.		# License. See LICENSE.TXT for details.
""" This module is responsible for the Clang executable.		""" This module is responsible for the Clang executable.

Since Clang command line interface is so rich, but this project is using only		Since Clang command line interface is so rich, but this project is using only
a subset of that, it makes sense to create a function specific wrapper. """		a subset of that, it makes sense to create a function specific wrapper. """

		import subprocess
import re		import re
from libscanbuild import run_command		from libscanbuild import run_command
from libscanbuild.shell import decode		from libscanbuild.shell import decode

__all__ = ['get_version', 'get_arguments', 'get_checkers']		__all__ = ['get_version', 'get_arguments', 'get_checkers', 'is_ctu_capable',
		'get_triple_arch']

# regex for activated checker		# regex for activated checker
ACTIVE_CHECKER_PATTERN = re.compile(r'^-analyzer-checker=(.*)$')		ACTIVE_CHECKER_PATTERN = re.compile(r'^-analyzer-checker=(.*)$')


def get_version(clang):		def get_version(clang):
""" Returns the compiler version as string.		""" Returns the compiler version as string.

▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	def get_checkers(clang, plugins):
checkers = {		checkers = {
name: (description, is_active_checker(name))		name: (description, is_active_checker(name))
for name, description in parse_checkers(lines)		for name, description in parse_checkers(lines)
}		}
if not checkers:		if not checkers:
raise Exception('Could not query Clang for available checkers.')		raise Exception('Could not query Clang for available checkers.')

return checkers		return checkers


		def is_ctu_capable(func_map_cmd):
		""" Detects if the current (or given) clang and function mapping
		executables are CTU compatible. """

		try:
		run_command([func_map_cmd, '-version'])
		except (OSError, subprocess.CalledProcessError):
		return False
		return True

		george.karpenkov (Done): I might be missing something here, but why is the ability to call `--version` indicative of CTU support? At worst, this can lead to obscuring real bugs: imagine if the user has `args.clang` pointed to a broken/non-existent binary, then `is_ctu_capable` would simply return `False` (hiding the original error!), which would show a completely misleading error message. Just checking `func_map_cmd` seems better, but even in this case we should probably log any errors occurring on the `-version` call (such messages would really aid debugging)
		gerazo (Done): The original idea was that clang can give information about CTU support itself. However, it never happened because the analyzer is so deep down in the system. So I am open to removing the clang binary check here. However, the clang binary is needed anyway, so the whole toolset will still throw an error later on not having a clang binary.
		george.karpenkov (Done):
		> so the whole toolset will still throw an error later on not having a clang binary.
		Of course, but I think that would be easier to debug, and the error would mean that Clang is not available, not that CTU is not working.
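
The logging suggested in this thread could be added along the following lines. This is a sketch under assumptions, not the patch's code: it calls `subprocess` directly instead of the project's `run_command`, and `tool_responds` is a hypothetical name.

```python
import logging
import subprocess


def tool_responds(command):
    """Return True if '<command> -version' runs and exits successfully.

    Hypothetical variant of is_ctu_capable that records why the probe
    failed instead of discarding the exception.
    """
    try:
        subprocess.check_output([command, '-version'],
                                stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError) as ex:
        # OSError covers a missing or non-executable binary;
        # CalledProcessError covers a non-zero exit status.
        logging.debug("probe '%s -version' failed: %s", command, ex)
        return False
    return True
```

With debug logging enabled, a missing binary and a tool that rejects `-version` then leave different traces, which is the distinction the reviewer asks for.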

		def get_triple_arch(command, cwd):
		"""Returns the architecture part of the target triple for the given
		compilation command. """

		cmd = get_arguments(command, cwd)
		try:
		separator = cmd.index("-triple")
		return cmd[separator + 1]
		danielmarjamaki (Done): I am guessing that you can use cmd.find() instead of the loop
		george.karpenkov (Done): Seconded, would prefer this rewritten using `separator = cmd.find('-triple')`
		except (IndexError, ValueError):
		return ""
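
The lookup in `get_triple_arch` can be exercised on a plain argument list. Python lists have no `find` method (that is a `str` method), so `index` with a `ValueError` handler is the list equivalent of what the reviewers suggest. A minimal sketch; `token_after` is a hypothetical helper and the argument lists in the test are made up.

```python
def token_after(args, flag):
    """Return the element following 'flag' in 'args', or '' if absent.

    Hypothetical helper mirroring the try/except in get_triple_arch.
    """
    try:
        # list.index raises ValueError when the flag is missing;
        # the subscript raises IndexError when the flag is last.
        return args[args.index(flag) + 1]
    except (ValueError, IndexError):
        return ""
```

Returning an empty string keeps callers such as `os.path.join(ctu_dir, triple)` working, at the cost of not distinguishing "no triple" from a real value.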

tools/scan-build-py/libscanbuild/report.py

# -*- coding: utf-8 -*-		# -*- coding: utf-8 -*-
# The LLVM Compiler Infrastructure		# The LLVM Compiler Infrastructure
#		#
# This file is distributed under the University of Illinois Open Source		# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.		# License. See LICENSE.TXT for details.
""" This module is responsible to generate 'index.html' for the report.		""" This module is responsible to generate 'index.html' for the report.

The input for this step is the output directory, where individual reports		The input for this step is the output directory, where individual reports
could be found. It parses those reports and generates 'index.html'. """		could be found. It parses those reports and generates 'index.html'. """

import re		import re
import os		import os
import os.path		import os.path
import sys		import sys
import shutil		import shutil
import itertools
import plistlib		import plistlib
import glob		import glob
import json		import json
import logging		import logging
import datetime		import datetime
from libscanbuild import duplicate_check		from libscanbuild import duplicate_check
from libscanbuild.clang import get_version		from libscanbuild.clang import get_version

▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	def read_crashes(output_dir):
""" Generate a unique sequence of crashes from given output directory. """		""" Generate a unique sequence of crashes from given output directory. """

return (parse_crash(filename)		return (parse_crash(filename)
for filename in glob.iglob(os.path.join(output_dir, 'failures',		for filename in glob.iglob(os.path.join(output_dir, 'failures',
'*.info.txt')))		'*.info.txt')))


def read_bugs(output_dir, html):		def read_bugs(output_dir, html):
		# type: (str, bool) -> Generator[Dict[str, Any], None, None]
		george.karpenkov (Done): Minor nitpicking: type comments are semi-standardized with Sphinx-style auto-generated documentation, and should be a part of the docstring.
""" Generate a unique sequence of bugs from given output directory.		""" Generate a unique sequence of bugs from given output directory.

Duplicates can be in a project if the same module was compiled multiple		Duplicates can be in a project if the same module was compiled multiple
times with different compiler options. These would be better to show in		times with different compiler options. These would be better to show in
the final report (cover) only once. """		the final report (cover) only once. """

parser = parse_bug_html if html else parse_bug_plist		def empty(file_name):
pattern = '*.html' if html else '*.plist'		return os.stat(file_name).st_size == 0

duplicate = duplicate_check(		duplicate = duplicate_check(
lambda bug: '{bug_line}.{bug_path_length}:{bug_file}'.format(**bug))		lambda bug: '{bug_line}.{bug_path_length}:{bug_file}'.format(**bug))

bugs = itertools.chain.from_iterable(		# get the right parser for the job.
		george.karpenkov (Not Done): I understand the intent here, but it seems it should be handled at a different level: would it be hard to change Clang to only write the report file at the very end, when no crash should be encountered? Or make parsers not choke on empty fields?
		gerazo (Not Done): After careful investigation, I have to say, it would be very hard to tell that we've done this right in the current setup. Unlike the rest of clang, ASTImporter.cpp is not a strong part of the system. It has a lot of ongoing fixes and probably more problems (missing CXX functionality) on the way. This is the part which makes CTU less reliable than clang generally. In order to elegantly survive a problem of this unit, we need to leave the other things intact and fix ASTImporter instead (this is an ongoing effort). Until then, this fix seems good enough.
# parser creates a bug generator not the bug itself		parser = parse_bug_html if html else parse_bug_plist
parser(filename)		# get the input files, which are not empty.
for filename in glob.iglob(os.path.join(output_dir, pattern)))		pattern = os.path.join(output_dir, '*.html' if html else '*.plist')
		bug_files = (file for file in glob.iglob(pattern) if not empty(file))
return (bug for bug in bugs if not duplicate(bug))
		for bug_file in bug_files:
		for bug in parser(bug_file):
		if not duplicate(bug):
		yield bug
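
The new `read_bugs` body combines two steps: skipping zero-length report files and dropping duplicate bugs while streaming. The sketch below separates the two so each can be tested alone; `non_empty_files` and `unique` are hypothetical helpers and do not reproduce the project's `duplicate_check`.

```python
import glob
import os


def non_empty_files(directory, suffix):
    """Yield files in 'directory' ending in 'suffix' with size > 0."""
    pattern = os.path.join(directory, '*' + suffix)
    for name in sorted(glob.glob(pattern)):
        # A crashed analyzer run may leave an empty report behind.
        if os.stat(name).st_size > 0:
            yield name


def unique(items, key):
    """Yield items whose key has not been seen before, in order."""
    seen = set()
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            yield item
```

Both helpers are generators, so, as in the code under review, nothing is parsed until the caller iterates over the result.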


def parse_bug_plist(filename):		def parse_bug_plist(filename):
""" Returns the generator of bugs from a single .plist file. """		""" Returns the generator of bugs from a single .plist file. """

content = plistlib.readPlist(filename)		content = plistlib.readPlist(filename)
files = content.get('files')		files = content.get('files')
for bug in content.get('diagnostics', []):		for bug in content.get('diagnostics', []):
▲ Show 20 Lines • Show All 219 Lines • Show Last 20 Lines

tools/scan-build-py/tests/unit/test_analyze.py

# -*- coding: utf-8 -*-		# -*- coding: utf-8 -*-
# The LLVM Compiler Infrastructure		# The LLVM Compiler Infrastructure
#		#
# This file is distributed under the University of Illinois Open Source		# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.		# License. See LICENSE.TXT for details.

import libear
import libscanbuild.analyze as sut
import unittest		import unittest
import re		import re
import os		import os
import os.path		import os.path
		import libear
		import libscanbuild.analyze as sut


class ReportDirectoryTest(unittest.TestCase):		class ReportDirectoryTest(unittest.TestCase):

# Test that successive report directory names ascend in lexicographic		# Test that successive report directory names ascend in lexicographic
# order. This is required so that report directories from two runs of		# order. This is required so that report directories from two runs of
# scan-build can be easily matched up to compare results.		# scan-build can be easily matched up to compare results.
def test_directory_name_comparison(self):		def test_directory_name_comparison(self):
...
	def test_method_with_expecteds(self):
self.assertRaises(KeyError, method_with_expecteds, dict())		self.assertRaises(KeyError, method_with_expecteds, dict())
self.assertRaises(KeyError, method_with_expecteds, {})		self.assertRaises(KeyError, method_with_expecteds, {})
self.assertRaises(KeyError, method_with_expecteds, {'this': 2})		self.assertRaises(KeyError, method_with_expecteds, {'this': 2})
self.assertRaises(KeyError, method_with_expecteds, {'that': 3})		self.assertRaises(KeyError, method_with_expecteds, {'that': 3})
self.assertEqual(method_with_expecteds({'this': 0, 'that': 3}), 0)		self.assertEqual(method_with_expecteds({'this': 0, 'that': 3}), 0)

def test_method_exception_not_caught(self):		def test_method_exception_not_caught(self):
self.assertRaises(Exception, method_exception_from_inside, dict())		self.assertRaises(Exception, method_exception_from_inside, dict())


		class PrefixWithTest(unittest.TestCase):
		george.karpenkov (inline comment, Done): Probably more tests are required for almost 400 lines of functional Python code in this PR. Would it be hard to have a full LIT-style integration test? E.g. have a dummy script emulating Clang with a dummy directory structure, which would show how all pieces are meant to fit together?
		gerazo (inline comment, Done): You are right. The testing infra in scan-build-py is not right anyway (uses nosetests). However, this should be a new patch as you've mentioned earlier.

		def test_gives_empty_on_empty(self):
		res = sut.prefix_with(0, [])
		self.assertFalse(res)

		def test_interleaves_prefix(self):
		res = sut.prefix_with(0, [1, 2, 3])
		self.assertListEqual([0, 1, 0, 2, 0, 3], res)
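
Note on PrefixWithTest: the two assertions above pin down the expected behaviour of prefix_with. The constant is placed before every element, and an empty input yields an empty result. A minimal sketch consistent with those assertions follows. The name prefix_with_sketch is hypothetical, and this is not necessarily the implementation in analyze.py.

```python
def prefix_with_sketch(constant, pieces):
    """Return a list with `constant` inserted before each element of `pieces`.

    Hypothetical stand-in written only to satisfy the assertions in
    PrefixWithTest; the real sut.prefix_with may be written differently.
    """
    result = []
    for piece in pieces:
        # Each element is preceded by the same constant.
        result.extend([constant, piece])
    return result


print(prefix_with_sketch(0, [1, 2, 3]))  # prints [0, 1, 0, 2, 0, 3]
```

A helper of this shape is handy when every item of an argument list has to be preceded by the same marker value.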


		class MergeCtuMapTest(unittest.TestCase):

		def test_no_map_gives_empty(self):
		pairs = sut.create_global_ctu_function_map([])
		self.assertFalse(pairs)

		def test_multiple_maps_merged(self):
		concat_map = ['c:@F@fun1#I# ast/fun1.c.ast',
		'c:@F@fun2#I# ast/fun2.c.ast',
		'c:@F@fun3#I# ast/fun3.c.ast']
		pairs = sut.create_global_ctu_function_map(concat_map)
		self.assertTrue(('c:@F@fun1#I#', 'ast/fun1.c.ast') in pairs)
		self.assertTrue(('c:@F@fun2#I#', 'ast/fun2.c.ast') in pairs)
		self.assertTrue(('c:@F@fun3#I#', 'ast/fun3.c.ast') in pairs)
		self.assertEqual(3, len(pairs))

		def test_not_unique_func_left_out(self):
		concat_map = ['c:@F@fun1#I# ast/fun1.c.ast',
		'c:@F@fun2#I# ast/fun2.c.ast',
		'c:@F@fun1#I# ast/fun7.c.ast']
		pairs = sut.create_global_ctu_function_map(concat_map)
		self.assertFalse(('c:@F@fun1#I#', 'ast/fun1.c.ast') in pairs)
		self.assertFalse(('c:@F@fun1#I#', 'ast/fun7.c.ast') in pairs)
		self.assertTrue(('c:@F@fun2#I#', 'ast/fun2.c.ast') in pairs)
		self.assertEqual(1, len(pairs))

		def test_duplicates_are_kept(self):
		concat_map = ['c:@F@fun1#I# ast/fun1.c.ast',
		'c:@F@fun2#I# ast/fun2.c.ast',
		'c:@F@fun1#I# ast/fun1.c.ast']
		pairs = sut.create_global_ctu_function_map(concat_map)
		self.assertTrue(('c:@F@fun1#I#', 'ast/fun1.c.ast') in pairs)
		self.assertTrue(('c:@F@fun2#I#', 'ast/fun2.c.ast') in pairs)
		self.assertEqual(2, len(pairs))

		def test_space_handled_in_source(self):
		concat_map = ['c:@F@fun1#I# ast/f un.c.ast']
		pairs = sut.create_global_ctu_function_map(concat_map)
		self.assertTrue(('c:@F@fun1#I#', 'ast/f un.c.ast') in pairs)
		self.assertEqual(1, len(pairs))
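
Note on MergeCtuMapTest: taken together, the tests above describe a merge rule. Each input line is "mangled-name, one space, AST path"; a name that maps to two different AST files is dropped entirely, an exact duplicate line is kept once, and spaces inside the path are preserved. The sketch below is one way to satisfy those assertions. The function name is hypothetical, and the patch's own create_global_ctu_function_map may differ in detail.

```python
from collections import defaultdict


def create_global_ctu_function_map_sketch(func_map_lines):
    """Merge 'mangled_name ast_path' lines into unique (name, path) pairs.

    Hypothetical sketch derived from the assertions in MergeCtuMapTest.
    """
    name_to_asts = defaultdict(set)
    for line in func_map_lines:
        # Split on the first space only, so paths may contain spaces.
        mangled_name, ast_file = line.strip().split(' ', 1)
        name_to_asts[mangled_name].add(ast_file)

    pairs = []
    for mangled_name, ast_files in name_to_asts.items():
        # Keep a function only if exactly one AST file defines it;
        # conflicting definitions are ambiguous and are left out.
        if len(ast_files) == 1:
            pairs.append((mangled_name, next(iter(ast_files))))
    return pairs
```

Using a set per name is what makes identical duplicate lines collapse into one entry while still detecting genuinely conflicting definitions.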


		class FuncMapSrcToAstTest(unittest.TestCase):

		def test_empty_gives_empty(self):
		fun_ast_lst = sut.func_map_list_src_to_ast([])
		self.assertFalse(fun_ast_lst)

		def test_sources_to_asts(self):
		fun_src_lst = ['c:@F@f1#I# ' + os.path.join(os.sep + 'path', 'f1.c'),
		'c:@F@f2#I# ' + os.path.join(os.sep + 'path', 'f2.c')]
		fun_ast_lst = sut.func_map_list_src_to_ast(fun_src_lst)
		self.assertTrue('c:@F@f1#I# ' +
		os.path.join('ast', 'path', 'f1.c.ast')
		in fun_ast_lst)
		self.assertTrue('c:@F@f2#I# ' +
		os.path.join('ast', 'path', 'f2.c.ast')
		in fun_ast_lst)
		self.assertEqual(2, len(fun_ast_lst))

		def test_spaces_handled(self):
		fun_src_lst = ['c:@F@f1#I# ' + os.path.join(os.sep + 'path', 'f 1.c')]
		fun_ast_lst = sut.func_map_list_src_to_ast(fun_src_lst)
		self.assertTrue('c:@F@f1#I# ' +
		os.path.join('ast', 'path', 'f 1.c.ast')
		in fun_ast_lst)
		self.assertEqual(1, len(fun_ast_lst))
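
Note on FuncMapSrcToAstTest: the assertions above imply that each "mangled-name source-path" line is rewritten so that the absolute source path becomes a relative path under an "ast" directory with an ".ast" suffix appended. A minimal sketch that passes the same checks is shown here. The function name is hypothetical, and the drive-letter handling is an assumption that the tests do not exercise.

```python
import os


def func_map_list_src_to_ast_sketch(func_src_list):
    """Turn 'mangled_name /abs/src.c' lines into 'mangled_name ast/abs/src.c.ast'.

    Hypothetical sketch derived from the assertions in FuncMapSrcToAstTest.
    """
    func_ast_list = []
    for line in func_src_list:
        # Split on the first space only, so source paths may contain spaces.
        mangled_name, src_path = line.split(' ', 1)
        # Assumption: strip any drive prefix and leading separators so that
        # os.path.join does not discard the 'ast' component.
        relative = os.path.splitdrive(src_path)[1].lstrip(os.sep)
        ast_path = os.path.join('ast', relative + '.ast')
        func_ast_list.append(mangled_name + ' ' + ast_path)
    return func_ast_list
```

Making the path relative first matters because os.path.join restarts from any absolute component it is given.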

tools/scan-build-py/tests/unit/test_clang.py

...
	def test_parse_checkers(self):
' checker.one Checker One description',		' checker.one Checker One description',
' checker.two',		' checker.two',
' Checker Two description']		' Checker Two description']
result = dict(sut.parse_checkers(lines))		result = dict(sut.parse_checkers(lines))
self.assertTrue('checker.one' in result)		self.assertTrue('checker.one' in result)
self.assertEqual('Checker One description', result.get('checker.one'))		self.assertEqual('Checker One description', result.get('checker.one'))
self.assertTrue('checker.two' in result)		self.assertTrue('checker.two' in result)
self.assertEqual('Checker Two description', result.get('checker.two'))		self.assertEqual('Checker Two description', result.get('checker.two'))


		class ClangIsCtuCapableTest(unittest.TestCase):
		def test_ctu_not_found(self):
		is_ctu = sut.is_ctu_capable('not-found-clang-func-mapping')
		self.assertFalse(is_ctu)


		class ClangGetTripleArchTest(unittest.TestCase):
		def test_arch_is_not_empty(self):
		arch = sut.get_triple_arch(['clang', '-E', '-'], '.')
		self.assertTrue(len(arch) > 0)