This is an archive of the discontinued LLVM Phabricator instance.

[analysis][analyzer] Introduce the skeleton of a reaching definitions calculator
Needs Review · Public

Authored by Szelethus on Mar 17 2020, 8:13 AM.

Details

Summary

The following revision adds the basic infrastructure for a reaching definitions algorithm for C++.

Short description & motivation

This is a dataflow algorithm designed to find the set of definitions (for example, assignments) of variables in a given CFGBlock. To demonstrate, consider the following code snippet:

int flag;

void foo();

void f() {
  int *x = nullptr;
  flag = 1;

  foo();
  if (flag)
    x = new int;

  foo();
  if (flag)
    *x = 5;
}

The CFG would look like this:


                   -> [B3] ->    -> [B1] ->
                  /          \  /          \
[B5 (ENTRY)] -> [B4] ------> [B2] ---> [B0 (EXIT)]

Should foo() change flag's value to false on the first call and to true on the second, a null pointer dereference would occur. Using a reaching definitions calculator, we can determine that the set of ingoing definitions to B1 is {(flag, [B2]), (x, [B3]), (x, [B4])}. This set hints that x has a reaching definition that would not have caused a null pointer dereference.

The algorithm

A reaching definition for a given instruction is an earlier instruction whose target variable can reach (be assigned to) the given one without an intervening assignment. The similarly named reaching definitions analysis is a data-flow analysis that statically determines which definitions may reach a given point in the code [1].

The sets of ingoing and outgoing reaching definitions are calculated for each basic block with set operations on GEN and KILL sets. This could be made more fine-grained, so that the RD sets would be a property of a statement rather than of an entire CFGBlock, but I didn't want to delay publishing this patch any further, and I believe such a change wouldn't hurt the infrastructure much.
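For reference, below is a minimal, self-contained sketch of the textbook GEN/KILL worklist iteration from [1]. The Block and Definition types and the solveReachingDefinitions name are illustrative placeholders, not the classes introduced by this patch:

#include <queue>
#include <set>
#include <vector>

using Definition = int;              // stand-in for a (Variable, CFG element) pair
using DefSet = std::set<Definition>;

struct Block {
  DefSet Gen, Kill;                  // computed once, before the iteration
  std::vector<Block *> Preds, Succs;
  DefSet In, Out;                    // refined until a fixed point is reached
};

void solveReachingDefinitions(const std::vector<Block *> &Blocks) {
  std::queue<Block *> Worklist;
  for (Block *B : Blocks)
    Worklist.push(B);

  while (!Worklist.empty()) {
    Block *B = Worklist.front();
    Worklist.pop();

    // IN[B] = union of OUT[P] over all predecessors P of B.
    B->In.clear();
    for (const Block *P : B->Preds)
      B->In.insert(P->Out.begin(), P->Out.end());

    // OUT[B] = GEN[B] united with (IN[B] minus KILL[B]).
    DefSet Out = B->Gen;
    for (Definition D : B->In)
      if (!B->Kill.count(D))
        Out.insert(D);

    // Only the successors of a changed block can be affected, so re-enqueue
    // them instead of sweeping over every block again.
    if (Out != B->Out) {
      B->Out = std::move(Out);
      for (Block *S : B->Succs)
        Worklist.push(S);
    }
  }
}

Re-enqueueing only the successors of a block whose OUT set changed reaches the same fixed point as repeatedly re-scanning all blocks, just with less redundant work.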

Implementation

As the formal definition suggests, the algorithm was conceived for instructions, which is why even terms such as "variable" or "definition" can be hard to pin down for C++. For this reason, the patch spends a lot of LOC on documentation and reasoning. While the algorithm itself is simple (and is implemented quite literally from [1]), the problematic part is the generation of the GEN and KILL sets. I tried to introduce an infrastructure that can tolerate a lot of the things I have inevitably forgotten (or left for followup patches) with the use of easy-to-add AST matchers.

Immediate questions to address

We already have 2 dataflow algorithms implemented in the Analysis library: UninitializedObject and LiveVariables. Reaching definitions doesn't sound all that dissimilar. Why are we adding hundreds of LOC here? Can't we reuse some of it?

UninitializedObject and LiveVariables are practically the same algorithm with minimal differences. Despite this, they take up ~750 and ~1050 LOC respectively. Both of those algorithms could be expressed with the use of GEN and KILL sets, yet neither of them is, and they duplicate a lot of logic. It isn't terribly obvious, however, how their logic could be (or really, should be) merged.

Shockingly, we have no GEN and KILL set implementations in Clang. I think that is the main addition of this patch, even if it unfortunately duplicates some logic. The two previously mentioned analyses could, however, serve as an inspiration.

UninitializedObject and LiveVariables use ASTVisitors rather than ASTMatchers. The latter is more expensive. Why did we go with matchers?

Matchers are more expressive. For instance, how would you find s.a.x.z with visitors? It's doable, but it requires keeping track of a lot of state and would make the implementation ugly. I don't have complete confidence in my decision here, so I welcome alternative suggestions or counterarguments.
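For illustration, here is a rough sketch of how an assignment whose left-hand side is a member chain rooted in a variable, such as s.a.x.z = 0, could be found with matchers. This is not the matcher used in the patch, and the binding names ("def", "lhs", "base") are made up:

#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang;
using namespace clang::ast_matchers;

// Matches `s.a.x.z = 0` and binds the assignment, the outermost MemberExpr,
// and the VarDecl at the root of the member chain.
static const auto MemberChainAssignment =
    binaryOperator(
        hasOperatorName("="),
        hasLHS(memberExpr(hasDescendant(
                   declRefExpr(to(varDecl().bind("base")))))
                   .bind("lhs")))
        .bind("def");

struct DefinitionCollector : MatchFinder::MatchCallback {
  void run(const MatchFinder::MatchResult &Result) override {
    // Walking the nested MemberExprs from "lhs" down to "base" yields the
    // field chain (a, x, z) of the written variable s.
    const auto *Base = Result.Nodes.getNodeAs<VarDecl>("base");
    const auto *LHS = Result.Nodes.getNodeAs<MemberExpr>("lhs");
    (void)Base;
    (void)LHS;
  }
};

With a visitor, recognizing the assignment and then manually walking and recording each nested MemberExpr is exactly the kind of state-keeping mentioned above.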

What are the main things to get done?

In order:

  • Finalize the infrastructure for GEN set generation.
  • Make it possible to calculate RD sets for statements, not only blocks.
  • Improve the interface of ReachingDefinitionsCalculator. What would we like to query? Most probably the set of ingoing RDs for a variable at a given statement.
  • Performance: use immutable data structures and a better CFG traversal strategy.

Further reading

My GSoC project: https://szelethus.github.io/gsoc2019/ (this has a lot of pointers to related discussions in the 'Reaching Definitions Analysis' section)

[cfe-dev] Dataflow analyses in Clang -- why do we not have GEN and KILL sets? http://lists.llvm.org/pipermail/cfe-dev/2020-March/064893.html

[analyzer][WIP] Implement a primitive reaching definitions analysis D64991

References

My main source was Wikipedia:
[1] https://en.wikipedia.org/wiki/Reaching_definition

I read the following articles, but they didn't give me the information I needed:

Tonella, Paolo, et al. "Variable precision reaching definitions analysis for software maintenance." Proceedings. First Euromicro Conference on Software Maintenance and Reengineering. IEEE, 1997.

Collard, Jean-François, and Jens Knoop. "A comparative study of reaching-definitions analyses." (1998).


Event Timeline

Szelethus created this revision. Mar 17 2020, 8:13 AM
whisperity added inline comments. Mar 17 2020, 10:30 AM
clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
79

Shouldn't this be called DefinitionLess if this is the "natural" comparator for Definition? Also, is this the convention in LLVM, instead of providing an explicit specialisation for std::less?

131

I do not feel this is visually separated enough; the fact that there is a free-floating comment (seemingly, and bogusly, not attached to any declaration) breaks the reading of the code.

How about this:

class X {
  // Fields that should not change after construction
  private:
    Foo bar;
    Bla blah;

  // Fields that change at every step
  private:
    Blah bla;

  public:
    Yadda blebba;
};
140

You only seem to be using this variable in one place. Are you sure it is worth saving as a field? Also, is it valid in the first place for this->Context not to be the same as this->D->getASTContext()?

176–179

Is this the right approach? A public method makes the user itch to call it. If GenSetMatcherCallback is a superclass of every possible implementation, I think adding that class as a friend here works, and you could make the methods private, or protected.

236–242

The immutability might not matter(?), but the performance aspects of the red-black tree behind the STL map could. K is a pointer, so why not use DenseMap? How large do you expect these containers to be when they peak?

clang/lib/Analysis/ReachingDefinitions.cpp
96

I know this is the first patch, but it might be worth mentioning here that this thing does not match user-defined operators.

162

You mean explicit destructor calls here, right?

275

Instead of simply blockID, could you harmonise this output with the CFG dump and say Bx instead of just X?

clang/test/Analysis/dump-definitions.cpp
17–18

What is an element? How do they get their numbers? What does the 3 mean here? I get that basic block 1 (the body of the function) writes ptr... but I don't understand this further from looking at the expected output.

Added some more reviewers who might be interested.

I think it is crucial to make the intentions clear: how do you define definition and variable?
Other than assignments we might include pre/postfix increment/decrement, non-const by-ref function arguments and so on as definitions.
Also, it would be great to have some proposal upfront how indirect writes should behave.

Basically, this algorithm is not very useful on its own, but it can be used as a building block for other kinds of analyses. So it would make sense to look for potential users and use cases and see whether they have the same requirements, or whether you need to introduce configurations.
Trying to rewrite some existing algorithms in terms of these sets and seeing if the tests pass might be a good experiment, if it is not too much work. (In case we can validate the performance, we could even replace the original implementations if this makes things easier.)

I think having a reaching definition set per basic block makes perfect sense, as it should be relatively easy to calculate the reaching definitions of an instruction given the set and the basic block. So maybe it is more efficient to calculate the reaching definition sets on demand for an instruction rather than computing them for every single instruction.

Regarding ASTMatchers, I think they might be great for prototyping for now, but once you want to make this more robust, covering every corner case in the matcher expression will be at least as hard as writing a visitor, if not harder. But we will see :)

clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
40

I wonder if Variable will be the right notion long term. Do we want to be able to represent heap locations or do we exclude them on purpose? Reasoning about the heap is quite challenging so both answers might be reasonable. But in case we try to tackle the more general problem, we basically need a more general term like MemoryLocation.

60

Is the inheritance justified here? Is a definition a variable? Maybe having a variable member would better express the relationship.

98

As far as I understand, this is the analysis state for you, while GenSet is used for the transfer functions (I might be mistaken). I think it might be better to organize the code the following way:

/// Analysis state
everything state related

/// Transfer functions
everything transfer function related

/// Iteration/Traversal
the main loop of the algorithm
222

Hmm, I am not really familiar with the specifics, and maybe this is optimized away, but I have always wondered: if we only need the address of a static variable, why don't we choose the smallest type, like a char?

clang/lib/Analysis/ReachingDefinitions.cpp
41

In the future this will be more complicated.

For example, if I assign to a struct, all of its members need to be killed (see the snippet below). As a result, you will not only need to represent memory regions but also the relationships between them. I wonder if the analyzer's MemRegion classes are any good or if they are too specific to the analyzer.
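As a tiny, hypothetical illustration of the point above (not from the patch's tests):

struct S { int a, b; };

void f(S s) {
  s.a = 1;  // defines s.a
  s = S{};  // defines s, and must also kill the earlier definition of s.a
}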

137

Note that memberExpr is not supported at this point; it is not derived from declRefExpr.

Szelethus marked 9 inline comments as done. Mar 18 2020, 8:54 AM

I think it is crucial to make the intentions clear: how do you define definition and variable?
Other than assignments we might include pre/postfix increment/decrement, non-const by-ref function arguments and so on as definitions.
Also, it would be great to have some proposal upfront how indirect writes should behave.

You're totally right. Part of what I wanted to achieve is to make the implementation flexible for things I may have forgotten, or will need to add for future standards/languages. As you can see from D64991, I did make a lot more progress than what is shown here, so I'll explain some of my long-term plans based on the experience I gathered:

I think it is crucial to make the intentions clear: how do you define [...] variable?

Variable could be practically anything that can be written, but due to the nature of what we can work with, we have to make severe restrictions.

  • We don't really have a good points-to-analysis. This discards pointees of any kind pretty much immediately. I don't have long term plans to add support for them.
  • Arrays are strongly related, and while there are studies on indexed variables (the last paper in my summary talks about such an approach), I think C++ complicates things to an uncomfortable level. However, I am confident that adding support for the cases where the index is known at compile time (possibly with the help of the RD algorithm itself) should be doable in the future by modifying the Variable class.

So this leaves the most obvious: VarDecls, and for record objects a (VarDecl, FieldChain) pair (for s.x.a, this would be (s, [x,a])). Setting the VarDecl part of the pair to nullptr could refer to the implicit this during the analysis of a C++ method.

The plan was to represent each field with a separate Variable object, so for this code

struct A {
  struct B {
    int x, y;
  };
  B b;
};

void foo() { A a; }

The following list of Variable objects would be created:

a
a.b
a.b.x
a.b.y

The reason behind this seemingly wasteful storage is that the eventual set operations would be difficult, if not impossible, to implement if a single Variable object were to hold the information about its fields. Mind that each of them could have a totally different definition associated with it. I hope that by employing immutable data structures these wouldn't be terribly expensive memory-wise.
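A rough sketch of the (VarDecl, FieldChain) representation described above; the names here are illustrative and not necessarily the ones used in the patch:

#include "clang/AST/Decl.h"
#include "llvm/ADT/SmallVector.h"

struct Variable {
  // Null would denote the implicit 'this' of a C++ method.
  const clang::VarDecl *Base = nullptr;
  // For `a` this is empty, for `a.b` it is {b}, for `a.b.x` it is {b, x}.
  llvm::SmallVector<const clang::FieldDecl *, 4> FieldChain;
};

Every prefix of a field chain gets its own Variable object, as in the list above, so that each of them can carry its own set of definitions.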

I think it is crucial to make the intentions clear: how do you define definition[...]?
Other than assignments we might include pre/postfix increment/decrement, non-const by-ref function arguments and so on as definitions.

A definition is a statement, more specifically a CFGStmt (an element of a CFGBlock), that either writes, or could potentially write, a variable. The proposed definition-finding stage should be very flexible for future additions.

Also, it would be great to have some proposal upfront how indirect writes should behave.

I wrote a mail about this during my GSoC project: http://lists.llvm.org/pipermail/cfe-dev/2019-July/062975.html.

Basically, this algorithm is not very useful on its own, but it can be used as a building block for other kinds of analyses. So it would make sense to look for potential users and use cases and see whether they have the same requirements, or whether you need to introduce configurations.

The Static Analyzer would be the number one user of the RD algorithm, as described in the summary, so I guess you were referring to the users of GEN/KILL sets? What do you mean by configurations?

Trying to rewrite some existing algorithms in terms of these sets and seeing if the tests pass might be a good experiment, if it is not too much work. (In case we can validate the performance, we could even replace the original implementations if this makes things easier.)

I'm not too sure how much work it would take (I suspect an unreasonable quantity, but I might be wrong), so I'll take a look. This is a great idea to gain a lot more knowledge about this topic, even if it eventually fails.

I think having a reaching definition set per basic block makes perfect sense, as it should be relatively easy to calculate the reaching definitions of an instruction given the set and the basic block. So maybe it is more efficient to calculate the reaching definition sets on demand for an instruction rather than computing them for every single instruction.

Regarding ASTMatchers, I think they might be great for prototyping for now, but once you want to make this more robust, covering every corner case in the matcher expression will be at least as hard as writing a visitor, if not harder. But we will see :)

Thank you for the detailed response! I won't update this patch for a while to leave time for others to respond, but I will try to work on the other algorithms a bit.

clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
40

I don't have long term plans to reason about pointees in general. Heap in particular is probably off limits for the foreseeable future.

60

Yup, you're right. This was a quick hack while developing.

176–179

Moving this to GenSetMatcherCallback would indeed be a great idea :)

236–242

Measurements on real-life codebases can never come soon enough, but I fear it'll be a while before I get them.

clang/lib/Analysis/ReachingDefinitions.cpp
41

As described in my comment, record variables would have numerous Variable objects, so this function wouldn't get much more complicated in the future (as seen in D64991).

137

Indeed, that is one beast of a matcher :) You can take a sneak peek at D64991.

162

Nope, implicit. I wouldn't worry much about explicit calls; I would handle them the same as I would any CallExpr. Implicit destructor calls could modify the global state, and they are not visibly present in the code.

275

What do you mean?

clang/test/Analysis/dump-definitions.cpp
17–18

A CFGBlock contains all the statements that are always executed sequentially; these are its elements. The elements are enumerated according to execution order. The 3 here means that the definition is the 3rd element in the 1st CFGBlock.

Variable could be practically anything that can be written, but due to the nature of what we can work with, we have to make severe restrictions.

  • We don't really have a good points-to-analysis. This discards pointees of any kind pretty much immediately. I don't have long term plans to add support for them.

I am a bit concerned about this. So if we have something like *x = 2, this definition will not appear in any of the reaching definition sets? An alternative would be to include it in all the sets that are allowed by the strict-aliasing rules. The reason I am a bit concerned is that I think we should be clear about what the user can expect from this algorithm. Will we calculate a superset of reaching definitions (over-approximating)? Will we calculate a subset (under-approximating)? Different users might have different requirements. Usually, dataflow analyses tend to over-approximate, but if we omit some elements from the reaching definitions, we will both over- and under-approximate at the same time. While this might make sense in certain use cases, it might be quite surprising for potential users of the algorithm. This is why I think it is important to set a clear high-level goal. One such goal could be that we always want to over-approximate, modulo bugs.

  • Arrays are strongly related, and while there are studies on indexed variables (the last paper in my summary talks about such an approach), I think C++ complicates things to an uncomfortable level. However, I am confident that adding support for the cases where the index is known at compile time (possibly with the help of the RD algorithm itself) should be doable in the future by modifying the Variable class.

I think initially handling arrays as if they contained only one element should be fine.

So this leaves the most obvious: VarDecls, and for record objects a (VarDecl, FieldChain) pair (for s.x.a, this would be (s, [x,a])). Setting the VarDecl part of the pair to nullptr could refer to the implicit this during the analysis of a C++ method.

Note that this is very similar to what the lifetime analysis is using. See the LifetimeContractVariable in https://reviews.llvm.org/D72810. But we have a more refined memory location representation that can also include the temporaries we use during the analysis.

The plan was to represent each field with a separate Variable object, so for this code

struct A {
  struct B {
    int x, y;
  };
  B b;
};

void foo() { A a; }

The following list of Variable objects would be created:

a
a.b
a.b.x
a.b.y

The reason behind this seemingly wasteful storage is that the eventual set operations would be difficult, if not impossible, to implement if a single Variable object were to hold the information about its fields. Mind that each of them could have a totally different definition associated with it. I hope that by employing immutable data structures these wouldn't be terribly expensive memory-wise.

I am not sure what you mean here. Basically, I think the question is how hierarchical you want your representation to be. I.e., do you want to have pointers between sub/super objects, or do you prefer doing lookups when you want to invalidate subobjects?

Also, it would be great to have some proposal upfront how indirect writes should behave.

I wrote a mail about this during my GSoC project: http://lists.llvm.org/pipermail/cfe-dev/2019-July/062975.html.

I think saying that *x = 2 will just be ignored is not sufficient. See my argument above about over- and under-approximation. I think it would be great to know what the consequences of this decision are. E.g. you could check how existing algorithms like uninitialized variables behave. For example, one possible mitigation of the problem with pointers would be to consider &var a definition.
My point is, we have several ways to deal with pointers without a precise points-to analysis, and ignoring them is just one of the options. Having more options considered, with pros and cons enumerated, would greatly increase our confidence in your chosen solution.
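To make the suggestion concrete, a small, hypothetical example of what treating &var (and non-const by-ref arguments) as conservative definitions would mean:

void byRef(int &r);

void example() {
  int x = 0;    // definition of x
  int *p = &x;  // taking the address: conservatively treated as a
                // (potential) definition of x, so the later indirect
                // write is over- rather than under-approximated
  *p = 2;       // indirect write, invisible to the analysis on its own
  byRef(x);     // non-const by-ref call: another potential definition of x
}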

The Static Analyzer would be the number one user of the RD algorithm, as described in the summary, so I guess you were referring to the users of GEN/KILL sets? What do you mean by configurations?

I mean both. As you mentioned GEN/KILL could be reused for other analyses. But reaching definitions can be a building block for other high-level checks like finding raw pointers that are owners. Consider the following example:

void foo(int *p) {
  ...
  delete p;
}

How do you know that the pointer p is an owner of the pointee? You know because it is deleted. So if you want to classify some raw pointers as owners, one possible way to do it is to check the reaching definitions of delete statements. Clang-tidy could try to convert some of those pointers to unique_ptrs. This is how this algorithm can be more generally useful as a building block for other problems unrelated to the static analyzer.

martong added inline comments. Apr 30 2020, 7:50 AM
clang/lib/Analysis/ReachingDefinitions.cpp
269

I understand that this is the worklist algorithm uplifted from Wikipedia. But how do we transmogrify the original algorithm [1] into this one? What is particularly interesting to me is that we continue with the successors from here instead of examining all the blocks over again.

[1] Dragon book, 2007

whisperity resigned from this revision. Jun 3 2021, 6:49 AM
whisperity added a subscriber: whisperity.

What about this patch? I'm removing my reviewer bit just so it doesn't appear in my list anymore, but if there are any updates, I'll keep myself as a subscriber. 🙂