This is an archive of the discontinued LLVM Phabricator instance.

Differential D130306

[clang][dataflow] Analyze calls to in-TU functions
ClosedPublic

Authored by samestep on Jul 21 2022, 2:34 PM.

Details

Reviewers

NoQ
ymandel
gribozavr2
sgatev
li.zhe.hua
xazax.hun

Commits

rG300fbf56f89a: [clang][dataflow] Analyze calls to in-TU functions
rGfa2b83d07eca: [clang][dataflow] Analyze calls to in-TU functions

Summary

This patch adds initial support for context-sensitive analysis of simple functions whose definition is available in the translation unit, guarded by the ContextSensitive flag in the new TransferOptions struct. When this option is true, the VisitCallExpr case in the builtin transfer function has a fallthrough case which checks for a direct callee with a body. In that case, it constructs a CFG from that callee body, uses the new pushCall method on the Environment to make an environment to analyze the callee, and then calls runDataflowAnalysis with a NoopAnalysis (disabling context-sensitive analysis on that sub-analysis, to avoid problems with recursion). After the sub-analysis completes, the Environment from its exit block is simply assigned back to the environment at the callsite.

The pushCall method (which currently only supports non-method functions with some restrictions) maps each parameter of the callee to the existing StorageLocation of the corresponding argument at the callsite.
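As a rough illustration of the parameter-to-argument mapping described above, here is a minimal sketch. It does not use the real Clang classes: `ToyEnvironment`, `ToyCall`, and the integer location IDs are hypothetical stand-ins for `Environment`, `CallExpr`, and `StorageLocation`, and the actual patch may differ in every detail.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a StorageLocation: just an integer ID.
using ToyLoc = int;

// Hypothetical stand-in for a call: the callee's parameter names plus the
// location already assigned to each argument at the callsite.
struct ToyCall {
  std::vector<std::string> Params;
  std::vector<ToyLoc> ArgLocs;
};

struct ToyEnvironment {
  std::map<std::string, ToyLoc> DeclToLoc; // declaration -> location
  std::map<ToyLoc, int> LocToVal;          // location -> modeled value

  // Builds the environment used to analyze the callee. Each parameter
  // reuses the location of its argument, so both names see the same value.
  ToyEnvironment pushCall(const ToyCall &Call) const {
    assert(Call.Params.size() == Call.ArgLocs.size() &&
           "arguments must map 1:1 to parameters");
    ToyEnvironment Callee;
    Callee.LocToVal = LocToVal; // values flow into the callee unchanged
    for (std::size_t I = 0; I < Call.Params.size(); ++I)
      Callee.DeclToLoc[Call.Params[I]] = Call.ArgLocs[I];
    return Callee;
  }
};
```

In this toy model the caller's own declarations are not visible in the callee environment; only the shared locations and their values carry over.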

This patch adds a few tests to check that this context-sensitive analysis works on simple functions. More sophisticated functionality will be added later; the most important next step is to explicitly model context in some fields of the DataflowAnalysisContext class, as mentioned in a FIXME comment in the pushCall implementation.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

samestep created this revision. Jul 21 2022, 2:34 PM

Herald added a reviewer: NoQ. Jul 21 2022, 2:34 PM

Herald added a project: Restricted Project.

Herald added subscribers: martong, tschuett, xazax.hun.

samestep requested review of this revision. Jul 21 2022, 2:34 PM

Herald added a project: Restricted Project. Jul 21 2022, 2:34 PM

Herald added a subscriber: cfe-commits.

samestep edited the summary of this revision. Jul 21 2022, 2:48 PM

samestep added reviewers: ymandel, gribozavr2, sgatev.

Harbormaster completed remote builds in B176862: Diff 446635. Jul 21 2022, 3:01 PM

There are many ways to introduce context sensitivity into the framework; this patch seems to take the "inline substitution" approach, the same approach the Clang Static Analyzer takes. While this approach is relatively easy to implement and has great precision, it also has some scalability problems. Did you also consider a summary-based approach? In general, I believe the inline substitution approach results in an easier to use interface for the users of the framework, but I am a bit concerned about the scalability problems.

Some other related questions:

Why call noop analysis? As far as I understand, this would only update the environment but not the lattice of the current analysis, i.e., if the analysis is computing some information like liveness, that information would not be context sensitive. Do I miss something?
Why limit the call depth to 1? The patch mentions recursive functions. In case of the Clang Static Analyzer, the call depth is 4. I think if we go with the inline substitution approach, we want this parameter to be tunable, because different analyses might have different sweet spots for the call stack depth.
The CSA also has other tunables, e.g., small functions are always inlined and large functions are never inlined.
Currently, it looks like the framework assumes functions that cannot be inlined are not doing anything. This is an unsound assumption, and I wonder if we should change that before we try to settle on the best values for the tunable parameters.
The current code might do a bit too much work. E.g. consider:

while (...) {
  inlinableCall();
}

As far as I understand, the current approach would start the analysis of inlinableCall from scratch in each iteration. I wonder if we actually want to preserve the state between the iterations, so we do not always reevaluate the call from scratch. Currently, it might not be a big deal as the fixed-point iteration part is disabled. But this could be a perf problem in the future, unless I miss something.

I think the environment currently is not really up to context-sensitive evaluation. Consider a recursive function:

void f(int a) {
  ++a;
  if (a < 10)
    f(a);
}

Here, when the recursive call is evaluated, it is ambiguous what a refers to. Is it the argument of the caller or the callee? To resolve this ambiguity, the calling context needs to be part of the keys in the environment.
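To make the ambiguity concrete, the following sketch keys a toy environment by both a frame index and a variable name. The names `FrameKey`, `FramedEnvironment`, and `modelF` are hypothetical and exist only for illustration; this is not the design that the follow-up patch adopted, merely one way to read "the calling context needs to be part of the keys".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical key: (call-stack depth, variable name). With only the name
// as the key, the caller's `a` and the callee's `a` would collide.
using FrameKey = std::pair<unsigned, std::string>;

struct FramedEnvironment {
  std::map<FrameKey, int> Values;

  void set(unsigned Frame, const std::string &Name, int V) {
    Values[FrameKey(Frame, Name)] = V;
  }
  int get(unsigned Frame, const std::string &Name) const {
    return Values.at(FrameKey(Frame, Name));
  }
};

// Mirrors `f` from the comment above: each recursive call gets its own
// frame, so every activation keeps a distinct binding for `a`.
inline void modelF(FramedEnvironment &Env, unsigned Frame, int A) {
  ++A;
  Env.set(Frame, "a", A);
  if (A < 10)
    modelF(Env, Frame + 1, A);
}
```

Starting `modelF` at frame 0 records one binding of `a` per activation instead of overwriting a single entry.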

Sam, this is a great start! I'm really excited to see that you have a core working so quickly.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
206
218	I wonder how this will work between caller and callee. Do we need separate global var state in the frame? If so, maybe mention that as well in the FIXME above.
235	I'm pretty sure we want `SkipPast::Reference`. That will ensure that the parameter and argument share the same underlying location. Otherwise, in the case of references, the parameter will point to the reference location object rather than just directly to the location.
clang/lib/Analysis/FlowSensitive/Transfer.cpp
517	here and below: s/TODO/FIXME.
524	This seems worth a FIXME or, at least, an explanation. It implies that with the current design, we can't support general-purpose analyses, which we should probably fix. Given our goal of supporting models that don't involve specialized lattices, I think this is a good compromise for the short term, but not a stable solution for the framework (hence FIXME sounds right).

Update parent patch

Harbormaster completed remote builds in B177007: Diff 446830. Jul 22 2022, 7:50 AM

In D130306#3670259, @xazax.hun wrote:

There are many ways to introduce context sensitivity into the framework; this patch seems to take the "inline substitution" approach, the same approach the Clang Static Analyzer takes. While this approach is relatively easy to implement and has great precision, it also has some scalability problems. Did you also consider a summary-based approach? In general, I believe the inline substitution approach results in an easier to use interface for the users of the framework, but I am a bit concerned about the scalability problems.

Good point, thanks! Yes, we considered a summary-based approach, but we decided not to use it because (as you mentioned) it would be much more difficult to implement, especially for callees with nontrivial CFGs, which would result in a nontrivial flow condition instead of just Values in the Environment.

Could you elaborate on what specific scalability problems you are concerned about? The main one that comes to mind for me is unpredictable cost due to the potential for arbitrary callee bodies to be present in the translation unit. While this particular patch doesn't address that concern, we definitely have plans to do so: I'm guessing that will take the form of providing the analysis an allowlist of symbols of which it is allowed to analyze the bodies, so it would treat any symbols not in that list as if their bodies are not available in the TU regardless.
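The allowlist idea mentioned above could look roughly like the sketch below. This is only a guess at the shape of such a policy: `CalleeInfo`, `InlineAllowlist`, and `shouldAnalyzeBody` are hypothetical names, and nothing like this is part of the patch under review.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical description of a callee as seen from the callsite.
struct CalleeInfo {
  std::string QualifiedName;
  bool HasBodyInTU;
};

// Hypothetical policy object; not part of the patch.
struct InlineAllowlist {
  std::set<std::string> Allowed;

  // A callee body is analyzed only if it is visible in the TU AND the
  // symbol is allowlisted. Anything else is treated as having no body.
  bool shouldAnalyzeBody(const CalleeInfo &Callee) const {
    return Callee.HasBodyInTU && Allowed.count(Callee.QualifiedName) != 0;
  }
};
```

With such a check in front of the sub-analysis, the cost of a run would be bounded by the bodies the analysis author chose, not by whatever happens to be defined in the translation unit.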

To clarify, the main reason we want context-sensitive analysis is to allow us to simplify our definitions of some models, such as the optional checker model. The goal is to provide the analysis a mock implementation of an optional type, and then use context-sensitive analysis (probably just one or two layers deep) to model the constructors and methods.

Some other related questions:

Why call noop analysis? As far as I understand, this would only update the environment but not the lattice of the current analysis, i.e., if the analysis is computing some information like liveness, that information would not be context sensitive. Do I miss something?

The alternative in this case would be to use the same analysis for the callee that is already being used for the caller? I agree that would be nicer to serve a broader category of use cases. I didn't do that in this patch for a couple reasons:

In the short term, that would require threading more information through to the builtin transfer function, while this patch was meant to just be a minimum viable product.
In the longer term, we probably don't need that for our specific goals (just modeling simple fields of mock classes) mentioned above.

However, if you have a suggestion for a way to construct an instance of the outer analysis here, that would definitely be useful.

Why limit the call depth to 1? The patch mentions recursive functions. In case of the Clang Static Analyzer, the call depth is 4. I think if we go with the inline substitution approach, we want this parameter to be tunable, because different analyses might have different sweet spots for the call stack depth.

There's no particular reason for this. We plan to support more call stack depth soon. This would probably make sense as a field in the TransferOptions struct.
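A tunable depth could be sketched as follows. The `MaxDepth` field, the `ToyFunction` call-graph model, and `countAnalyzedCallees` are assumptions made for this example; only the `ContextSensitive` flag is named in the patch summary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyTransferOptions {
  bool ContextSensitive = false;
  unsigned MaxDepth = 1; // hypothetical field: how many calls deep to descend
};

// Toy call graph: each function lists the indices of the functions it calls.
struct ToyFunction {
  std::vector<std::size_t> Callees;
};

// Returns how many callee bodies would be entered while analyzing `Fn`,
// descending at most `Opts.MaxDepth` calls below the starting function.
inline unsigned countAnalyzedCallees(const std::vector<ToyFunction> &Program,
                                     std::size_t Fn,
                                     const ToyTransferOptions &Opts,
                                     unsigned Depth = 0) {
  assert(Fn < Program.size() && "unknown function index");
  if (!Opts.ContextSensitive || Depth >= Opts.MaxDepth)
    return 0; // treat the call as opaque
  unsigned Count = 0;
  for (std::size_t Callee : Program[Fn].Callees)
    Count += 1 + countAnalyzedCallees(Program, Callee, Opts, Depth + 1);
  return Count;
}
```

Because the depth check comes first, a directly recursive function terminates after `MaxDepth` levels; a depth of 1 corresponds to the behavior described in this patch.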

The CSA also has other tunables, e.g., small functions are always inlined and large functions are never inlined.

See my response earlier about an allowlist for symbols to inline.

Currently, it looks like the framework assumes functions that cannot be inlined are not doing anything. This is an unsound assumption, and I wonder if we should change that before we try to settle on the best values for the tunable parameters.

Agreed, this assumption is unsound. However, the framework already makes many other unsound assumptions in a similar spirit, so this one doesn't immediately strike me as one that needs to change. I'll defer to people with more context.

The current code might do a bit too much work. E.g. consider:
while (...) {
  inlinableCall();
}
As far as I understand, the current approach would start the analysis of inlinableCall from scratch in each iteration. I wonder if we actually want to preserve the state between the iterations, so we do not always reevaluate the call from scratch. Currently, it might not be a big deal as the fixed-point iteration part is disabled. But this could be a perf problem in the future, unless I miss something.

Agreed, this is somewhat wasteful. Note that not everything is thrown away, because the same DataflowAnalysisContext is reused when analyzing the callee. Still, we would like to handle this in a smarter way in the future, as you mention. For now, though, while just building up functionality behind a feature flag, we don't plan to worry as much about this particular performance concern.
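One possible way to avoid re-analyzing the callee on every loop iteration is to memoize its exit state by entry state. The sketch below uses plain `int` states and a hypothetical `CalleeResultCache` class; the real framework would need to compare `Environment`s, and nothing here reflects a planned design.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical cache from a callee's entry state to its exit state.
class CalleeResultCache {
public:
  explicit CalleeResultCache(std::function<int(int)> AnalyzeFn)
      : Analyze(std::move(AnalyzeFn)) {}

  // Runs the callee analysis only the first time an entry state is seen.
  int exitStateFor(int EntryState) {
    auto It = Cache.find(EntryState);
    if (It != Cache.end())
      return It->second;
    ++Runs;
    int Exit = Analyze(EntryState);
    Cache.emplace(EntryState, Exit);
    return Exit;
  }

  unsigned runs() const { return Runs; }

private:
  std::function<int(int)> Analyze;
  std::map<int, int> Cache;
  unsigned Runs = 0;
};
```

This only helps when the entry state repeats across iterations, which is the case once the loop has reached a fixed point.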

I think the environment currently is not really up to context-sensitive evaluation. Consider a recursive function:
void f(int a) {
  ++a;
  if (a < 10)
    f(a);
}
Here, when the recursive call is evaluated, it is ambiguous what a refers to. Is it the argument of the caller or the callee? To resolve this ambiguity, the calling context needs to be part of the keys in the environment.

Yes, I mentioned this in the patch description and in a comment in the pushCall implementation: we plan to do this by modifying the set of fields in the DataflowAnalysisContext class. I plan to do this in my next patch, once this one is landed.

samestep added inline comments. Jul 22 2022, 7:53 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
206	OK, I'll change this; would you like for me to replace all the other `TODO`s with `FIXME`s, as well?
218	Could you clarify what you mean? Perhaps I just don't understand exactly what is meant by "global vars" here.
235	OK, thank you! I'll make that change.
clang/lib/Analysis/FlowSensitive/Transfer.cpp
517	Will do.
524	Good point, and @xazax.hun pointed this out as well. I'll add a `FIXME` here, at least.

ymandel added inline comments. Jul 22 2022, 8:10 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
206	Just those in this patch.
218	https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L131-L135
/// Initializes global storage values that are declared or referenced from
/// sub-statements of `S`.
// FIXME: Add support for resetting globals after function calls to enable
// the implementation of sound analyses.
Since this already mentions a need to reset after function calls, it seemed relevant here.

Address Yitzie's comments

samestep added inline comments. Jul 22 2022, 8:14 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
218	Hmm, OK. I pretty much just pattern-matched from the `Environment` constructor right above this method implementation. Would it be better for me to instead just remove this `initGlobalVars` call for now and replace it with a `FIXME` saying that we'll probably need to call this but it's unclear how exactly to do so?

Harbormaster completed remote builds in B177016: Diff 446842. Jul 22 2022, 8:43 AM

Fix typo in comment

ymandel added inline comments. Jul 22 2022, 8:53 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
218	SGTM. We can note this constraint on models' reference implementations. Specifically, that they cannot reference globals.

gribozavr2 added inline comments. Jul 22 2022, 8:57 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
229	The Clang AST includes argument expressions for defaulted arguments, so I believe there shouldn't be anything left to do here, it should just work.
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
3896

Harbormaster completed remote builds in B177023: Diff 446855. Jul 22 2022, 9:19 AM

ymandel added a reviewer: li.zhe.hua. Jul 22 2022, 9:25 AM

Don't allow globals

samestep added inline comments. Jul 22 2022, 11:34 AM

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
218	Done.
229	Oh nice! I'm updating this comment, thanks.
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
3896	Good idea, done.

Update comment about default parameters

Harbormaster completed remote builds in B177059: Diff 446915. Jul 22 2022, 12:36 PM

li.zhe.hua added inline comments. Jul 22 2022, 12:58 PM

clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h
64	Nit: `-Wmissing-field-initializers` is apparently enabled, and starts warning on this.
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
42–44	For the purposes of the test, there are really only 3 states:
- No built-in transfer
- Built-in transfer, no context-sensitive
- Built-in transfer, with context-sensitive
It may be more readable for tests to have a 3-state enum, that `runDataFlow` will then use to produce the corresponding `DataflowAnalysisOptions`. As is, a snippet like
{/*.ApplyBuiltinTransfer=*/true, /*.BuiltinTransferOptions=*/{/*.ContextSensitive=*/false}});
is rough to read. Good enum names with good comments would probably make this much better. WDYT?
67	Nit: `-Wmissing-field-initializers` is apparently enabled, and starts warning on this.
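The three-state enum suggested for lines 42–44 might be sketched like this. The structs are simplified stand-ins that reuse the field names quoted in the comment; the enum `BuiltinTransferMode` and the helper `toOptions` are hypothetical, and the real option types in the FlowSensitive headers may differ.

```cpp
#include <cassert>

// Simplified stand-ins for the option structs named in the comment.
struct ToyTransferOptions {
  bool ContextSensitive = false;
};
struct ToyDataflowAnalysisOptions {
  bool ApplyBuiltinTransfer = true;
  ToyTransferOptions BuiltinTransferOptions;
};

// Hypothetical test-only enum covering the three states.
enum class BuiltinTransferMode {
  Disabled,           // no built-in transfer
  ContextInsensitive, // built-in transfer, no context sensitivity
  ContextSensitive,   // built-in transfer with context sensitivity
};

// Expands the flat enum into the layered options the analysis expects.
inline ToyDataflowAnalysisOptions toOptions(BuiltinTransferMode Mode) {
  ToyDataflowAnalysisOptions Opts;
  Opts.ApplyBuiltinTransfer = Mode != BuiltinTransferMode::Disabled;
  Opts.BuiltinTransferOptions.ContextSensitive =
      Mode == BuiltinTransferMode::ContextSensitive;
  return Opts;
}
```

This keeps the layered structure in the framework itself while giving tests a single readable argument, which may be a middle ground between the two positions in this thread.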

samestep added inline comments. Jul 22 2022, 1:08 PM

clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h
64	Ah thanks, will fix.
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
42–44	I agree that there are only 3 states, but I also think that conceptually this really is a multi-layered thing; either we apply the built-in transfer or not, and then if we do apply the builtin transfer, there's some set of options we pass to it. Thus, it doesn't seem right to just collapse it into a flat enum; I'm not sure though.
67	Same here; thanks!

Appease -Wmissing-field-initializers
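For readers unfamiliar with the warning, the sketch below shows the general pattern. The structs are toy stand-ins that borrow the field names quoted earlier in this review, and the exact diagnostics depend on the compiler and its version.

```cpp
#include <cassert>

struct ToyTransferOptions {
  bool ContextSensitive;
};
struct ToyAnalysisOptions {
  bool ApplyBuiltinTransfer;
  ToyTransferOptions BuiltinTransferOptions;
};

// May trigger -Wmissing-field-initializers, because the second field is
// omitted and silently value-initialized:
//   ToyAnalysisOptions Opts = {true};

// Gives every field an explicit initializer, so nothing is left implicit.
inline ToyAnalysisOptions makeDefaultOptions() {
  return ToyAnalysisOptions{/*ApplyBuiltinTransfer=*/true,
                            /*BuiltinTransferOptions=*/
                            {/*ContextSensitive=*/false}};
}
```

Spelling out the nested initializer costs a few characters but also documents the default at the point of construction.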

Harbormaster completed remote builds in B177097: Diff 446958. Jul 22 2022, 1:39 PM

In D130306#3671852, @samestep wrote:

In D130306#3670259, @xazax.hun wrote:

Yes, we considered a summary-based approach, but we decided not to use it because (as you mentioned) it would be much more difficult to implement, especially for callees with nontrivial CFGs, which would result in a nontrivial flow condition instead of just Values in the Environment.

Thanks! I think this is a major design decision, and I'd love to see a discussion about the alternatives considered before jumping to an implementation. Specifically, I'd like to know if summaries are out of scope or something you would consider in the future. Knowing this is useful because this can influence how the code should be reviewed. E.g., if you plan to have multiple ways to do context sensitive analysis in the future, we should make sure that the current code will not lock us into the inline substitution approach. If summaries are not planned at all, this is not a concern.

Could you elaborate on what specific scalability problems you are concerned about?

The Clang Static Analyzer is using this approach and it was a long way to iron out all the corner cases where the performance was bad. It has many cuts including maximum number of visits for a basic block, maximum call stack depth, not inlining functions after a certain size threshold and so on. Basically, it took some time to get the right performance and precision. But overall, the inline substitution approach will never scale to large call stacks/long calling contexts. On the other hand, a summary-based approach can potentially find bugs across a really large number of function calls with reasonable costs. Mainly because the same function is not reanalyzed for every context.

To clarify, the main reason we want context-sensitive analysis is to allow us to simplify our definitions of some models, such as the optional checker model. The goal is to provide the analysis a mock implementation of an optional type, and then use context-sensitive analysis (probably just one or two layers deep) to model the constructors and methods.

Thanks, this is also useful context!

Some other related questions:

Why call noop analysis?

The alternative in this case would be to use the same analysis for the callee that is already being used for the caller? I agree that would be nicer to serve a broader category of use cases. I didn't do that in this patch for a couple reasons:

In the short term, that would require threading more information through to the builtin transfer function, while this patch was meant to just be a minimum viable product.

In the longer term, we probably don't need that for our specific goals (just modeling simple fields of mock classes) mentioned above.

I see. This was not clear to me from the description of the patch notes. It seems to me that the goal here is to simplify the modeling of certain types, not general context-sensitive analysis. I reviewed this patch with the wrong idea in mind. If that is the goal, the current approach makes sense. But I think the comments should make clear what the intended use case and the limitations of the current approach are.

Currently, it looks like the framework assumes functions that cannot be inlined are not doing anything. This is an unsound assumption, and I wonder if we should change that before we try to settle on the best values for the tunable parameters.

Agreed, this assumption is unsound. However, the framework already makes many other unsound assumptions in a similar spirit, so this one doesn't immediately strike me as one that needs to change. I'll defer to people with more context.

The main reason I wanted to call this out is that it increasingly seems that whenever a decision needs to be made, the framework is getting less and less sound. Basically, many design decisions here are very similar to what the Clang Static Analyzer is doing. Since Clang already has an analysis framework for unsound analysis, I wanted to avoid you reinventing the wheel and doing something similar to CSA instead of providing something new for the use cases that cannot be covered by the CSA.

I think the environment currently is not really up to context-sensitive evaluation. Consider a recursive function:

Yes, I mentioned this in the patch description and in a comment in the pushCall implementation: we plan to do this by modifying the set of fields in the DataflowAnalysisContext class. I plan to do this in my next patch, once this one is landed.

Oh, my bad! I somehow blinked and totally skipped that comment.

samestep edited the summary of this revision. Jul 25 2022, 7:28 AM

In D130306#3673120, @xazax.hun wrote:

Thanks! I think this is a major design decision, and I'd love to see a discussion about the alternatives considered before jumping to an implementation. Specifically, I'd like to know if summaries are out of scope or something you would consider in the future. Knowing this is useful because this can influence how the code should be reviewed. E.g., if you plan to have multiple ways to do context sensitive analysis in the future, we should make sure that the current code will not lock us into the inline substitution approach. If summaries are not planned at all, this is not a concern.

This is a good question. I shared with you our design doc, which may help clarify somewhat; please let me know if you have further concerns.

The Clang Static Analyzer is using this approach and it was a long way to iron out all the corner cases where the performance was bad. It has many cuts including maximum number of visits for a basic block, maximum call stack depth, not inlining functions after a certain size threshold and so on. Basically, it took some time to get the right performance and precision. But overall, the inline substitution approach will never scale to large call stacks/long calling contexts. On the other hand, a summary-based approach can potentially find bugs across a really large number of function calls with reasonable costs. Mainly because the same function is not reanalyzed for every context.

That makes a lot of sense. From what you're saying, it sounds like we'll avoid that in our plan by keeping contexts small due to only context-sensitively analyzing simple models that we write ourselves.

I see. This was not clear to me from the description of the patch notes. It seems to me that the goal here is to simplify the modeling of certain types, not general context-sensitive analysis. I reviewed this patch with the wrong idea in mind. If that is the goal, the current approach makes sense. But I think the comments should make clear what the intended use case and the limitations of the current approach are.

Fair point. Hopefully the intended use case will become more clear in the code itself as I follow up with further patches; if not then I can modify the comments to clarify that. Are there any specific things you'd like to see written comments in this patch itself before landing?

The main reason I wanted to call this out is that it increasingly seems that whenever a decision needs to be made, the framework is getting less and less sound. Basically, many design decisions here are very similar to what the Clang Static Analyzer is doing. Since Clang already has an analysis framework for unsound analysis, I wanted to avoid you reinventing the wheel and doing something similar to CSA instead of providing something new for the use cases that cannot be covered by the CSA.

This makes sense; in this case, though, the unsoundness is already present (this patch does nothing to change the way we analyze calls to functions for which we can't see the body), so if anything, unsoundness is reduced here. I'll let @ymandel respond to this in more detail, though.

Oh, my bad! I somehow blinked and totally skipped that comment.

No worries! This patch is a bit large so it's easy to miss. Thanks for taking the time to review in such detail!

In D130306#3676325, @samestep wrote:

The main reason I wanted to call this out is that it increasingly seems that whenever a decision needs to be made, the framework is getting less and less sound. Basically, many design decisions here are very similar to what the Clang Static Analyzer is doing. Since Clang already has an analysis framework for unsound analysis, I wanted to avoid you reinventing the wheel and doing something similar to CSA instead of providing something new for the use cases that cannot be covered by the CSA.

This makes sense; in this case, though, the unsoundness is already present (this patch does nothing to change the way we analyze calls to functions for which we can't see the body), so if anything, unsoundness is reduced here. I'll let @ymandel respond to this in more detail, though.

Gabor, I fully agree. We need to start paying down the debt on the unsoundness, reducing it where possible and otherwise giving users control over whether to incur it. However, as Sam wrote, we did not expect to be incurring any new unsoundness here.

Thanks! Knowing the context, I am much happier with the direction overall. Is the plan to analyze a mock of std::optional instead of the actual code in the STL? How will that mock be shipped? Would that be embedded in the binary?

In D130306#3676475, @ymandel wrote:

Gabor, I fully agree. We need to start paying down the debt on the unsoundness, reducing it where possible and otherwise giving users control over whether to incur it.

I'm glad that this is still on the roadmap. I am a bit worried about how hard it will be to make the current memory model sound. Generally, I saw some researchers arguing that the access path approach is a better fit for dataflow analysis. See this paper as an example. Admittedly, I do not fully agree with everything they claim: they focus on distributive problems in that paper, and I found that most actual problems that we want to solve are not distributive. But whatever model we end up using, to restore soundness we might need to introduce a way to summarize certain constructs in our memory model. There are some ideas in this survey paper.

However, as Sam wrote, we did not expect to be incurring any new unsoundness here.

I fully agree that this patch itself will not introduce new unsoundness (after the fixme mentioned in the description is resolved). My main concerns were:

The framework started to feel more similar to symbolic execution than abstract interpretation, see the answer to this question on the differences.
I was a bit worried that adding more features before fixing the soundness issues might make fixing those problems harder.
I was not sure the current approach would scale to general context-sensitive analysis. But looks like that is a non-goal for now. Doing this to model certain types of interest makes sense to me and is a good first step.

Overall, I am excited for context-sensitive analysis, and some of my concerns are addressed. Looking forward to the follow-up patches :)

In D130306#3676672, @xazax.hun wrote:

Thanks! Knowing the context, I am much happier with the direction overall. Is the plan to analyze a mock of std::optional instead of the actual code in the STL? How will that mock be shipped? Would that be embedded in the binary?

Glad to hear it! Yes, the current plan is to analyze a mock of std::optional instead of the actual type. One reason for this is that we would like to use the same mock to model multiple different optional types (e.g. absl::optional). Our current plan is to embed it directly in the binary.

Overall, I am excited for context-sensitive analysis, and some of my concerns are addressed. Looking forward to the follow-up patches :)

Thanks Gábor! I'll let @ymandel and others respond to your other points; also, thanks for the links, those resources look very helpful.

@xazax.hun Do you have anything else you'd like addressed/changed (either here or in the doc we shared with you) before I land this?

In D130306#3679294, @samestep wrote:

@xazax.hun Do you have anything else you'd like addressed/changed (either here or in the doc we shared with you) before I land this?

Nope, most of my concerns are unrelated to this patch. Sorry for hijacking the conversation with some offtopics. Feel free to land.

This revision is now accepted and ready to land. Jul 26 2022, 7:07 AM

Herald added a subscriber: rnkovacs. Jul 26 2022, 7:07 AM

ymandel accepted this revision. Jul 26 2022, 8:40 AM

ymandel added inline comments.

clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h
64	optional, but in my experience, being explicit can help readability/findability in certain places.

Be explicit when constructing TransferOptions

This revision was landed with ongoing or failed builds. Jul 26 2022, 10:27 AM

Closed by commit rGfa2b83d07eca: [clang][dataflow] Analyze calls to in-TU functions (authored by samestep).

This revision was automatically updated to reflect the committed changes.

samestep added a commit: rGfa2b83d07eca: [clang][dataflow] Analyze calls to in-TU functions.

samestep added a reverting change: rGcc9aa157a83a: Revert "[clang][dataflow] Analyze calls to in-TU functions". Jul 26 2022, 10:30 AM

samestep reopened this revision. Jul 26 2022, 10:37 AM

This revision is now accepted and ready to land. Jul 26 2022, 10:37 AM

Use different name for TransferOptions field

samestep edited the summary of this revision. Jul 26 2022, 10:42 AM

This revision was landed with ongoing or failed builds. Jul 26 2022, 10:54 AM

Closed by commit rG300fbf56f89a: [clang][dataflow] Analyze calls to in-TU functions (authored by samestep).

This revision was automatically updated to reflect the committed changes.

samestep added a commit: rG300fbf56f89a: [clang][dataflow] Analyze calls to in-TU functions.

Harbormaster completed remote builds in B177657: Diff 447758. Jul 26 2022, 11:32 AM

samestep mentioned this in D130593: [clang][dataflow] Separate context by frame. Jul 26 2022, 11:51 AM

A few variables cause warnings in -Asserts.

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
216	It is used only in assert.
222	ditto

In D130306#3680942, @chapuni wrote:

A few variables cause warnings in -Asserts.

Thanks for pointing this out! How should I address this? Should I just inline the definitions of those variables into the asserts themselves?

In D130306#3681291, @samestep wrote:

In D130306#3680942, @chapuni wrote:

A few variables cause warnings in -Asserts.

Thanks for pointing this out! How should I address this? Should I just inline the definitions of those variables into the asserts themselves?

Someone took care of it: https://github.com/llvm/llvm-project/commit/1f8ae9d7e7e4afcc4e76728b28e64941660ca3eb
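For context, the two usual ways to silence this kind of warning are sketched below. This is a generic illustration with a hypothetical `sumChecked` function; it is not taken from the linked commit and makes no claim about which option that commit chose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In a build with assertions disabled (NDEBUG defined), `assert(...)`
// expands to nothing, so a variable read only inside it looks unused.
inline int sumChecked(const std::vector<int> &Params,
                      const std::vector<int> &Args) {
  // Option 1: keep the variable and mark it as intentionally unused.
  const std::size_t NumArgs = Args.size();
  (void)NumArgs;
  assert(NumArgs == Params.size() && "arguments must map 1:1");

  // Option 2: inline the expression into the assert itself.
  assert(Args.size() == Params.size());

  int Sum = 0;
  for (int A : Args)
    Sum += A;
  return Sum;
}
```

Option 2 is the one proposed in the question above; option 1 is handy when the value is expensive to recompute or is used in several assertions.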

Revision Contents

Path                                                                    Size
clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h        15 lines
clang/include/clang/Analysis/FlowSensitive/Transfer.h                   9 lines
clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h 11 lines
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp                38 lines
clang/lib/Analysis/FlowSensitive/Transfer.cpp                           42 lines
clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp         14 lines
clang/unittests/Analysis/FlowSensitive/TransferTest.cpp                 114 lines

Diff 447761

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h

Show First 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	public:
///		///
/// If `DeclCtx` is a function, initializes the environment with symbolic		/// If `DeclCtx` is a function, initializes the environment with symbolic
/// representations of the function parameters.		/// representations of the function parameters.
///		///
/// If `DeclCtx` is a non-static member function, initializes the environment		/// If `DeclCtx` is a non-static member function, initializes the environment
/// with a symbolic representation of the `this` pointee.		/// with a symbolic representation of the `this` pointee.
Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);		Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);

		/// Creates and returns an environment to use for an inline analysis of the
		/// callee. Uses the storage location from each argument in the `Call` as the
		/// storage location for the corresponding parameter in the callee.
		///
		/// Requirements:
		///
		/// The callee of `Call` must be a `FunctionDecl` with a body.
		///
		/// The body of the callee must not reference globals.
		///
		/// The arguments of `Call` must map 1:1 to the callee's parameters.
		///
		/// Each argument of `Call` must already have a `StorageLocation`.
		Environment pushCall(const CallExpr *Call) const;

/// Returns true if and only if the environment is equivalent to `Other`, i.e		/// Returns true if and only if the environment is equivalent to `Other`, i.e
/// the two environments:		/// the two environments:
/// - have the same mappings from declarations to storage locations,		/// - have the same mappings from declarations to storage locations,
/// - have the same mappings from expressions to storage locations,		/// - have the same mappings from expressions to storage locations,
/// - have the same or equivalent (according to `Model`) values assigned to		/// - have the same or equivalent (according to `Model`) values assigned to
/// the same storage locations.		/// the same storage locations.
///		///
/// Requirements:		/// Requirements:
[234 lines not shown]

clang/include/clang/Analysis/FlowSensitive/Transfer.h

[14 lines not shown]
	#define LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_TRANSFER_H			#define LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_TRANSFER_H

	#include "clang/AST/Stmt.h"			#include "clang/AST/Stmt.h"
	#include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"			#include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"

	namespace clang {			namespace clang {
	namespace dataflow {			namespace dataflow {

				struct TransferOptions {
				/// Determines whether to analyze function bodies when present in the
				/// translation unit.
				bool ContextSensitive = false;
				};

	/// Maps statements to the environments of basic blocks that contain them.			/// Maps statements to the environments of basic blocks that contain them.
	class StmtToEnvMap {			class StmtToEnvMap {
	public:			public:
	virtual ~StmtToEnvMap() = default;			virtual ~StmtToEnvMap() = default;

	/// Returns the environment of the basic block that contains `S` or nullptr if			/// Returns the environment of the basic block that contains `S` or nullptr if
	/// there isn't one.			/// there isn't one.
	/// FIXME: Ensure that the result can't be null and return a const reference.			/// FIXME: Ensure that the result can't be null and return a const reference.
	virtual const Environment *getEnvironment(const Stmt &S) const = 0;			virtual const Environment *getEnvironment(const Stmt &S) const = 0;
	};			};

	/// Evaluates `S` and updates `Env` accordingly.			/// Evaluates `S` and updates `Env` accordingly.
	///			///
	/// Requirements:			/// Requirements:
	///			///
	/// `S` must not be `ParenExpr` or `ExprWithCleanups`.			/// `S` must not be `ParenExpr` or `ExprWithCleanups`.
	void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env);			void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env,
				TransferOptions Options);

	} // namespace dataflow			} // namespace dataflow
	} // namespace clang			} // namespace clang

	#endif // LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_TRANSFER_H			#endif // LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_TRANSFER_H

clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h

[17 lines not shown]

#include <vector> #include <vector>

#include "clang/AST/ASTContext.h" #include "clang/AST/ASTContext.h"

#include "clang/AST/Stmt.h" #include "clang/AST/Stmt.h"

#include "clang/Analysis/CFG.h" #include "clang/Analysis/CFG.h"

#include "clang/Analysis/FlowSensitive/ControlFlowContext.h" #include "clang/Analysis/FlowSensitive/ControlFlowContext.h"

#include "clang/Analysis/FlowSensitive/DataflowEnvironment.h" #include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"

#include "clang/Analysis/FlowSensitive/DataflowLattice.h" #include "clang/Analysis/FlowSensitive/DataflowLattice.h"

#include "clang/Analysis/FlowSensitive/Transfer.h"

#include "llvm/ADT/Any.h" #include "llvm/ADT/Any.h"

#include "llvm/ADT/Optional.h" #include "llvm/ADT/Optional.h"

#include "llvm/Support/Error.h" #include "llvm/Support/Error.h"

namespace clang { namespace clang {

namespace dataflow { namespace dataflow {

struct DataflowAnalysisOptions { struct DataflowAnalysisOptions {

/// Determines whether to apply the built-in transfer functions. /// Determines whether to apply the built-in transfer functions.

// FIXME: Remove this option once the framework supports composing analyses // FIXME: Remove this option once the framework supports composing analyses

// (at which point the built-in transfer functions can be simply a standalone // (at which point the built-in transfer functions can be simply a standalone

// analysis). // analysis).

bool ApplyBuiltinTransfer = true; bool ApplyBuiltinTransfer = true;

/// Only has an effect if `ApplyBuiltinTransfer` is true.

TransferOptions BuiltinTransferOptions;

}; };

/// Type-erased lattice element container. /// Type-erased lattice element container.

/// ///

/// Requirements: /// Requirements:

/// ///

/// The type of the object stored in the container must be a bounded /// The type of the object stored in the container must be a bounded

/// join-semilattice. /// join-semilattice.

struct TypeErasedLattice { struct TypeErasedLattice {

llvm::Any Value; llvm::Any Value;

}; };

/// Type-erased base class for dataflow analyses built on a single lattice type. /// Type-erased base class for dataflow analyses built on a single lattice type.

class TypeErasedDataflowAnalysis : public Environment::ValueModel { class TypeErasedDataflowAnalysis : public Environment::ValueModel {

DataflowAnalysisOptions Options; DataflowAnalysisOptions Options;

public: public:

TypeErasedDataflowAnalysis() : Options({}) {} TypeErasedDataflowAnalysis() : Options({}) {}

/// Deprecated. Use the `DataflowAnalysisOptions` constructor instead. /// Deprecated. Use the `DataflowAnalysisOptions` constructor instead.

TypeErasedDataflowAnalysis(bool ApplyBuiltinTransfer) TypeErasedDataflowAnalysis(bool ApplyBuiltinTransfer)

: Options({ApplyBuiltinTransfer}) {} : Options({ApplyBuiltinTransfer, TransferOptions{}}) {}

li.zhe.hua (inline comment, Not Done):

TypeErasedDataflowAnalysis(bool ApplyBuiltinTransfer)
- : Options({ApplyBuiltinTransfer}) {}
+ : Options({ApplyBuiltinTransfer, {}}) {}
TypeErasedDataflowAnalysis(DataflowAnalysisOptions Options)

Nit: `-Wmissing-field-initializers` is apparently enabled, and starts warning on this.

samestep (author, Done):

Ah thanks, will fix.

ymandel (inline comment, Not Done):

TypeErasedDataflowAnalysis(bool ApplyBuiltinTransfer)
- : Options({ApplyBuiltinTransfer, {}}) {}
+ : Options({ApplyBuiltinTransfer, TransferOptions{}}) {}
TypeErasedDataflowAnalysis(DataflowAnalysisOptions Options)

Optional, but in my experience, being explicit can help readability/findability in certain places.

TypeErasedDataflowAnalysis(DataflowAnalysisOptions Options) TypeErasedDataflowAnalysis(DataflowAnalysisOptions Options)

: Options(Options) {} : Options(Options) {}

virtual ~TypeErasedDataflowAnalysis() {} virtual ~TypeErasedDataflowAnalysis() {}

/// Returns the `ASTContext` that is used by the analysis. /// Returns the `ASTContext` that is used by the analysis.

virtual ASTContext &getASTContext() = 0; virtual ASTContext &getASTContext() = 0;

[16 lines not shown; context: public:]

/// Applies the analysis transfer function for a given statement and /// Applies the analysis transfer function for a given statement and

/// type-erased lattice element. /// type-erased lattice element.

virtual void transferTypeErased(const Stmt *, TypeErasedLattice &, virtual void transferTypeErased(const Stmt *, TypeErasedLattice &,

Environment &) = 0; Environment &) = 0;

/// Determines whether to apply the built-in transfer functions, which model /// Determines whether to apply the built-in transfer functions, which model

/// the heap and stack in the `Environment`. /// the heap and stack in the `Environment`.

bool applyBuiltinTransfer() const { return Options.ApplyBuiltinTransfer; } bool applyBuiltinTransfer() const { return Options.ApplyBuiltinTransfer; }

/// Returns the options to be passed to the built-in transfer functions.

TransferOptions builtinTransferOptions() const {

return Options.BuiltinTransferOptions;

}

}; };
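The initializer nit from the inline comments on the deprecated constructor above can be reproduced in isolation. This is a sketch using stand-in structs that mirror the shapes shown in the diff; they are not the real Clang headers, and `makeOptions` is a hypothetical helper. Per the reviewer's note, omitting the second field is what made `-Wmissing-field-initializers` warn; naming it explicitly follows the final suggestion in the thread.

```cpp
#include <cassert>

// Stand-ins mirroring the structs in the diff (assumption: same field order).
struct TransferOptions {
  bool ContextSensitive = false;
};

struct DataflowAnalysisOptions {
  bool ApplyBuiltinTransfer = true;
  TransferOptions BuiltinTransferOptions;
};

// Hypothetical helper: every field is initialized explicitly, and the type of
// the nested options is spelled out so a reader can search for it.
DataflowAnalysisOptions makeOptions(bool ApplyBuiltinTransfer) {
  DataflowAnalysisOptions Options{ApplyBuiltinTransfer, TransferOptions{}};
  assert(!Options.BuiltinTransferOptions.ContextSensitive);
  return Options;
}
```

Writing `{ApplyBuiltinTransfer, {}}` produces the same value; the explicit `TransferOptions{}` only makes the intent easier to find.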

/// Type-erased model of the program at a given program point. /// Type-erased model of the program at a given program point.

struct TypeErasedDataflowAnalysisState { struct TypeErasedDataflowAnalysisState {

/// Type-erased model of a program property. /// Type-erased model of a program property.

TypeErasedLattice Lattice; TypeErasedLattice Lattice;

/// Model of the state of the program (store and heap). /// Model of the state of the program (store and heap).

[42 lines not shown]

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp

[194 lines not shown; context: if (MethodDecl && !MethodDecl->isStatic()) {]

DACtx.setThisPointeeStorageLocation(ThisPointeeLoc); DACtx.setThisPointeeStorageLocation(ThisPointeeLoc);

if (Value *ThisPointeeVal = createValue(ThisPointeeType)) if (Value *ThisPointeeVal = createValue(ThisPointeeType))

setValue(ThisPointeeLoc, *ThisPointeeVal); setValue(ThisPointeeLoc, *ThisPointeeVal);

} }

Environment Environment::pushCall(const CallExpr *Call) const {

Environment Env(*this);

// FIXME: Currently this only works if the callee is never a method and the

ymandel (inline comment, Not Done):

Environment Env(*this);
- // TODO: Currently this only works if the callee is never a method and the
+ // FIXME: Currently this only works if the callee is never a method and the
// same callee is never analyzed from multiple separate callsites. To

samestep (author, Done):

OK, I'll change this; would you like for me to replace all the other `TODO`s with `FIXME`s, as well?

ymandel (Not Done):

Just those in this patch.

// same callee is never analyzed from multiple separate callsites. To

// generalize this, we'll need to store a "context" field (probably a stack of

// `const CallExpr *`s) in the `Environment`, and then change the

// `DataflowAnalysisContext` class to hold a map from contexts to "frames",

// where each frame stores its own version of what are currently the

// `DeclToLoc`, `ExprToLoc`, and `ThisPointeeLoc` fields.

const auto *FuncDecl = Call->getDirectCallee();

assert(FuncDecl != nullptr);

const auto *Body = FuncDecl->getBody();

chapuni (inline comment, Not Done):

It is used only in assert.

assert(Body != nullptr);

// FIXME: In order to allow the callee to reference globals, we probably need

ymandel (inline comment, Not Done):

I wonder how this will work between caller and callee. Do we need separate global var state in the frame? If so, maybe mention that as well in the FIXME above.

samestep (author, Done):

Could you clarify what you mean? Perhaps I just don't understand exactly what is meant by "global vars" here.

ymandel (Not Done):

https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp#L131-L135

/// Initializes global storage values that are declared or referenced from
/// sub-statements of `S`.
// FIXME: Add support for resetting globals after function calls to enable
// the implementation of sound analyses.

Since this already mentions a need to reset after function calls, seemed relevant here.

samestep (author, Done):

Hmm, OK. I pretty much just pattern-matched from the `Environment` constructor right above this method implementation. Would it be better for me to instead just remove this `initGlobalVars` call for now and replace it with a `FIXME` saying that we'll probably need to call this but it's unclear how exactly to do so?

ymandel (Not Done):

SGTM. We can note this constraint on models' reference implementations. Specifically, that they cannot reference globals.

samestep (author, Done):

Done.

// to call `initGlobalVars` here in some way.

auto ParamIt = FuncDecl->param_begin();

auto ParamEnd = FuncDecl->param_end();

chapuni (inline comment, Not Done):

ditto

auto ArgIt = Call->arg_begin();

auto ArgEnd = Call->arg_end();

// FIXME: Parameters don't always map to arguments 1:1; examples include

// overloaded operators implemented as member functions, and parameter packs.

for (; ArgIt != ArgEnd; ++ParamIt, ++ArgIt) {

assert(ParamIt != ParamEnd);

gribozavr2 (inline comment, Not Done):

The Clang AST includes argument expressions for defaulted arguments, so I believe there shouldn't be anything left to do here, it should just work.

samestep (author, Done):

Oh nice! I'm updating this comment, thanks.

const VarDecl *Param = *ParamIt;

const Expr *Arg = *ArgIt;

auto *ArgLoc = Env.getStorageLocation(*Arg, SkipPast::Reference);

assert(ArgLoc != nullptr);

Env.setStorageLocation(*Param, *ArgLoc);

ymandel (inline comment, Not Done):

I'm pretty sure we want `SkipPast::Reference`. That will ensure that the parameter and argument share the same underlying location. Otherwise, in the case of references, the parameter will point to the reference location object rather than just directly to the location.

samestep (author, Done):

OK, thank you! I'll make that change.

}

return Env;

}
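The core idea of `pushCall`, binding each parameter to the storage location of its argument so that writes in the callee are visible to the caller, can be sketched with a toy model. Everything below (`ToyEnv`, `Loc`, string-keyed declarations, `callerSeesWrite`) is a hypothetical simplification and not the Clang API; it only mirrors the 1:1 parameter/argument requirement and the shared-location behaviour discussed in the inline comments above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Loc = int; // toy storage location

struct ToyEnv {
  std::map<std::string, Loc> DeclToLoc; // declaration name -> location
  std::map<Loc, bool> LocToVal;         // location -> known boolean value

  // Toy analogue of pushCall: copy the caller environment, then give each
  // parameter the location of the matching argument.
  ToyEnv pushCall(const std::vector<std::string> &Params,
                  const std::vector<std::string> &Args) const {
    assert(Params.size() == Args.size()); // arguments must map 1:1
    ToyEnv Callee(*this);
    for (std::size_t I = 0; I < Args.size(); ++I) {
      auto It = DeclToLoc.find(Args[I]);
      assert(It != DeclToLoc.end()); // each argument needs a location
      Callee.DeclToLoc[Params[I]] = It->second;
    }
    return Callee;
  }
};

// Models `void SetBool(bool &Var) { Var = true; }` called as `SetBool(Foo)`.
bool callerSeesWrite() {
  ToyEnv Caller;
  Caller.DeclToLoc["Foo"] = 0;
  Caller.LocToVal[0] = false;
  ToyEnv Callee = Caller.pushCall({"Var"}, {"Foo"});
  Callee.LocToVal[Callee.DeclToLoc.at("Var")] = true; // callee body
  Caller = Callee; // exit environment assigned back at the callsite
  return Caller.LocToVal.at(Caller.DeclToLoc.at("Foo"));
}
```

Because `Var` and `Foo` share location `0`, the assignment inside the callee updates the value the caller later reads through `Foo`.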

bool Environment::equivalentTo(const Environment &Other, bool Environment::equivalentTo(const Environment &Other,

Environment::ValueModel &Model) const { Environment::ValueModel &Model) const {

assert(DACtx == Other.DACtx); assert(DACtx == Other.DACtx);

if (DeclToLoc != Other.DeclToLoc) if (DeclToLoc != Other.DeclToLoc)

return false; return false;

if (ExprToLoc != Other.ExprToLoc) if (ExprToLoc != Other.ExprToLoc)

[308 lines not shown]

clang/lib/Analysis/FlowSensitive/Transfer.cpp

[14 lines not shown]
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
#include "clang/AST/DeclBase.h"		#include "clang/AST/DeclBase.h"
#include "clang/AST/DeclCXX.h"		#include "clang/AST/DeclCXX.h"
#include "clang/AST/Expr.h"		#include "clang/AST/Expr.h"
#include "clang/AST/ExprCXX.h"		#include "clang/AST/ExprCXX.h"
#include "clang/AST/OperationKinds.h"		#include "clang/AST/OperationKinds.h"
#include "clang/AST/Stmt.h"		#include "clang/AST/Stmt.h"
#include "clang/AST/StmtVisitor.h"		#include "clang/AST/StmtVisitor.h"
		#include "clang/Analysis/FlowSensitive/ControlFlowContext.h"
#include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"		#include "clang/Analysis/FlowSensitive/DataflowEnvironment.h"
		#include "clang/Analysis/FlowSensitive/NoopAnalysis.h"
#include "clang/Analysis/FlowSensitive/Value.h"		#include "clang/Analysis/FlowSensitive/Value.h"
#include "clang/Basic/Builtins.h"		#include "clang/Basic/Builtins.h"
#include "clang/Basic/OperatorKinds.h"		#include "clang/Basic/OperatorKinds.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include <cassert>		#include <cassert>
#include <memory>		#include <memory>
#include <tuple>		#include <tuple>
[9 lines not shown; context: if (auto *RHSValue =]
dyn_cast_or_null<BoolValue>(Env.getValue(RHS, SkipPast::Reference)))		dyn_cast_or_null<BoolValue>(Env.getValue(RHS, SkipPast::Reference)))
return Env.makeIff(LHSValue, RHSValue);		return Env.makeIff(LHSValue, RHSValue);

return Env.makeAtomicBoolValue();		return Env.makeAtomicBoolValue();
}		}

class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {		class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
public:		public:
TransferVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env)		TransferVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env,
: StmtToEnv(StmtToEnv), Env(Env) {}		TransferOptions Options)
		: StmtToEnv(StmtToEnv), Env(Env), Options(Options) {}

void VisitBinaryOperator(const BinaryOperator *S) {		void VisitBinaryOperator(const BinaryOperator *S) {
const Expr *LHS = S->getLHS();		const Expr *LHS = S->getLHS();
assert(LHS != nullptr);		assert(LHS != nullptr);

const Expr *RHS = S->getRHS();		const Expr *RHS = S->getRHS();
assert(RHS != nullptr);		assert(RHS != nullptr);

[439 lines not shown; context: if (S->isCallToStdMove()) {]
assert(S->getNumArgs() > 0);		assert(S->getNumArgs() > 0);
assert(S->getArg(0) != nullptr);		assert(S->getArg(0) != nullptr);
// `__builtin_expect` returns by-value, so strip away any potential		// `__builtin_expect` returns by-value, so strip away any potential
// references in the argument.		// references in the argument.
auto ArgLoc = Env.getStorageLocation(S->getArg(0), SkipPast::Reference);		auto ArgLoc = Env.getStorageLocation(S->getArg(0), SkipPast::Reference);
if (ArgLoc == nullptr)		if (ArgLoc == nullptr)
return;		return;
Env.setStorageLocation(S, ArgLoc);		Env.setStorageLocation(S, ArgLoc);
		} else if (const FunctionDecl *F = S->getDirectCallee()) {
		// This case is for context-sensitive analysis, which we only do if we
		// have the callee body available in the translation unit.
		if (!Options.ContextSensitive || F->getBody() == nullptr)
		return;

		auto &ASTCtx = F->getASTContext();

		// FIXME: Cache these CFGs.
ymandel (inline comment, Not Done):

Here and below: s/TODO/FIXME.

samestep (author, Done):

Will do.
		auto CFCtx = ControlFlowContext::build(F, F->getBody(), &ASTCtx);
		// FIXME: Handle errors here and below.
		assert(CFCtx);
		auto ExitBlock = CFCtx->getCFG().getExit().getBlockID();

		auto CalleeEnv = Env.pushCall(S);

ymandel (inline comment, Not Done):

This seems worth a FIXME or, at least, an explanation. It implies that with the current design, we can't support general-purpose analyses, which we should probably fix. Given our goal of supporting models that don't involve specialized lattices, I think this is a good compromise for the short term, but not a stable solution for the framework (hence FIXME sounds right).

samestep (author, Done):

Good point, and @xazax.hun pointed this out as well. I'll add a `FIXME` here, at least.
		// FIXME: Use the same analysis as the caller for the callee.
		DataflowAnalysisOptions Options;
		auto Analysis = NoopAnalysis(ASTCtx, Options);

		auto BlockToOutputState =
		dataflow::runDataflowAnalysis(*CFCtx, Analysis, CalleeEnv);
		assert(BlockToOutputState);
		assert(ExitBlock < BlockToOutputState->size());

		auto ExitState = (*BlockToOutputState)[ExitBlock];
		assert(ExitState);

		Env = ExitState->Env;
}		}
}		}
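The guard in the new branch, combined with the default-constructed options handed to the sub-analysis, means callee bodies are followed only one level deep. The toy below sketches that consequence under simplifying assumptions: `ToyFunction`, `ToyTransferOptions`, `countAnalyzedBodies`, and `analyzedForSelfRecursion` are hypothetical names, and a "function" is reduced to a flag plus a list of direct callees.

```cpp
#include <cassert>
#include <vector>

struct ToyTransferOptions {
  bool ContextSensitive = false;
};

struct ToyFunction {
  bool HasBody = false;
  std::vector<const ToyFunction *> Callees; // direct callees, in call order
};

// Returns how many function bodies are analyzed when starting from `F`.
int countAnalyzedBodies(const ToyFunction &F, ToyTransferOptions Options) {
  int Analyzed = 1; // `F` itself
  for (const ToyFunction *Callee : F.Callees) {
    assert(Callee != nullptr);
    // Same shape as the guard in the diff: bail out unless the option is set
    // and the callee body is available.
    if (!Options.ContextSensitive || !Callee->HasBody)
      continue;
    // The sub-analysis gets default options, so calls inside the callee are
    // not followed. This bounds the depth at one, even for recursion.
    Analyzed += countAnalyzedBodies(*Callee, ToyTransferOptions{});
  }
  return Analyzed;
}

// A self-recursive function: the analysis still terminates.
int analyzedForSelfRecursion(bool ContextSensitive) {
  ToyFunction Rec;
  Rec.HasBody = true;
  Rec.Callees.push_back(&Rec);
  return countAnalyzedBodies(Rec, ToyTransferOptions{ContextSensitive});
}
```

With the option enabled, the recursive function is analyzed once as the target and once as its own callee; with it disabled, only the target is analyzed.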

void VisitMaterializeTemporaryExpr(const MaterializeTemporaryExpr *S) {		void VisitMaterializeTemporaryExpr(const MaterializeTemporaryExpr *S) {
const Expr *SubExpr = S->getSubExpr();		const Expr *SubExpr = S->getSubExpr();
assert(SubExpr != nullptr);		assert(SubExpr != nullptr);

auto SubExprLoc = Env.getStorageLocation(SubExpr, SkipPast::None);		auto SubExprLoc = Env.getStorageLocation(SubExpr, SkipPast::None);
[114 lines not shown; context: BoolValue &getLogicOperatorSubExprValue(const Expr &SubExpr) {]

// If the value of `SubExpr` is still unknown, we create a fresh symbolic		// If the value of `SubExpr` is still unknown, we create a fresh symbolic
// boolean value for it.		// boolean value for it.
return Env.makeAtomicBoolValue();		return Env.makeAtomicBoolValue();
}		}

const StmtToEnvMap &StmtToEnv;		const StmtToEnvMap &StmtToEnv;
Environment &Env;		Environment &Env;
		TransferOptions Options;
};		};

void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env) {		void transfer(const StmtToEnvMap &StmtToEnv, const Stmt &S, Environment &Env,
TransferVisitor(StmtToEnv, Env).Visit(&S);		TransferOptions Options) {
		TransferVisitor(StmtToEnv, Env, Options).Visit(&S);
}		}

} // namespace dataflow		} // namespace dataflow
} // namespace clang		} // namespace clang

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp

[68 lines not shown; context: static int blockIndexInPredecessor(const CFGBlock &Pred,]
return BlockPos - Pred.succ_begin();		return BlockPos - Pred.succ_begin();
}		}

/// Extends the flow condition of an environment based on a terminator		/// Extends the flow condition of an environment based on a terminator
/// statement.		/// statement.
class TerminatorVisitor : public ConstStmtVisitor<TerminatorVisitor> {		class TerminatorVisitor : public ConstStmtVisitor<TerminatorVisitor> {
public:		public:
TerminatorVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env,		TerminatorVisitor(const StmtToEnvMap &StmtToEnv, Environment &Env,
int BlockSuccIdx)		int BlockSuccIdx, TransferOptions TransferOpts)
: StmtToEnv(StmtToEnv), Env(Env), BlockSuccIdx(BlockSuccIdx) {}		: StmtToEnv(StmtToEnv), Env(Env), BlockSuccIdx(BlockSuccIdx),
		TransferOpts(TransferOpts) {}

void VisitIfStmt(const IfStmt *S) {		void VisitIfStmt(const IfStmt *S) {
auto *Cond = S->getCond();		auto *Cond = S->getCond();
assert(Cond != nullptr);		assert(Cond != nullptr);
extendFlowCondition(*Cond);		extendFlowCondition(*Cond);
}		}

void VisitWhileStmt(const WhileStmt *S) {		void VisitWhileStmt(const WhileStmt *S) {
[26 lines not shown; context: void VisitConditionalOperator(const ConditionalOperator *S) {]
assert(Cond != nullptr);		assert(Cond != nullptr);
extendFlowCondition(*Cond);		extendFlowCondition(*Cond);
}		}

private:		private:
void extendFlowCondition(const Expr &Cond) {		void extendFlowCondition(const Expr &Cond) {
// The terminator sub-expression might not be evaluated.		// The terminator sub-expression might not be evaluated.
if (Env.getStorageLocation(Cond, SkipPast::None) == nullptr)		if (Env.getStorageLocation(Cond, SkipPast::None) == nullptr)
transfer(StmtToEnv, Cond, Env);		transfer(StmtToEnv, Cond, Env, TransferOpts);

// FIXME: The flow condition must be an r-value, so `SkipPast::None` should		// FIXME: The flow condition must be an r-value, so `SkipPast::None` should
// suffice.		// suffice.
auto *Val =		auto *Val =
cast_or_null<BoolValue>(Env.getValue(Cond, SkipPast::Reference));		cast_or_null<BoolValue>(Env.getValue(Cond, SkipPast::Reference));
// Value merging depends on flow conditions from different environments		// Value merging depends on flow conditions from different environments
// being mutually exclusive -- that is, they cannot both be true in their		// being mutually exclusive -- that is, they cannot both be true in their
// entirety (even if they may share some clauses). So, we need some value		// entirety (even if they may share some clauses). So, we need some value
[15 lines not shown; context: if (BlockSuccIdx == 1)]
Val = &Env.makeNot(*Val);		Val = &Env.makeNot(*Val);

Env.addToFlowCondition(*Val);		Env.addToFlowCondition(*Val);
}		}

const StmtToEnvMap &StmtToEnv;		const StmtToEnvMap &StmtToEnv;
Environment &Env;		Environment &Env;
int BlockSuccIdx;		int BlockSuccIdx;
		TransferOptions TransferOpts;
};		};

/// Computes the input state for a given basic block by joining the output		/// Computes the input state for a given basic block by joining the output
/// states of its predecessors.		/// states of its predecessors.
///		///
/// Requirements:		/// Requirements:
///		///
/// All predecessors of `Block` except those with loop back edges must have		/// All predecessors of `Block` except those with loop back edges must have
[51 lines not shown; context: for (const CFGBlock *Pred : Preds) {]
if (!MaybePredState)		if (!MaybePredState)
continue;		continue;

TypeErasedDataflowAnalysisState PredState = MaybePredState.value();		TypeErasedDataflowAnalysisState PredState = MaybePredState.value();
if (ApplyBuiltinTransfer) {		if (ApplyBuiltinTransfer) {
if (const Stmt *PredTerminatorStmt = Pred->getTerminatorStmt()) {		if (const Stmt *PredTerminatorStmt = Pred->getTerminatorStmt()) {
const StmtToEnvMapImpl StmtToEnv(CFCtx, BlockStates);		const StmtToEnvMapImpl StmtToEnv(CFCtx, BlockStates);
TerminatorVisitor(StmtToEnv, PredState.Env,		TerminatorVisitor(StmtToEnv, PredState.Env,
blockIndexInPredecessor(*Pred, Block))		blockIndexInPredecessor(*Pred, Block),
		Analysis.builtinTransferOptions())
.Visit(PredTerminatorStmt);		.Visit(PredTerminatorStmt);
}		}
}		}

if (MaybeState) {		if (MaybeState) {
Analysis.joinTypeErased(MaybeState->Lattice, PredState.Lattice);		Analysis.joinTypeErased(MaybeState->Lattice, PredState.Lattice);
MaybeState->Env.join(PredState.Env, Analysis);		MaybeState->Env.join(PredState.Env, Analysis);
} else {		} else {
[19 lines not shown; context: static void transferCFGStmt(]
TypeErasedDataflowAnalysisState &State,		TypeErasedDataflowAnalysisState &State,
std::function<void(const CFGStmt &,		std::function<void(const CFGStmt &,
const TypeErasedDataflowAnalysisState &)>		const TypeErasedDataflowAnalysisState &)>
HandleTransferredStmt) {		HandleTransferredStmt) {
const Stmt *S = CfgStmt.getStmt();		const Stmt *S = CfgStmt.getStmt();
assert(S != nullptr);		assert(S != nullptr);

if (Analysis.applyBuiltinTransfer())		if (Analysis.applyBuiltinTransfer())
transfer(StmtToEnvMapImpl(CFCtx, BlockStates), *S, State.Env);		transfer(StmtToEnvMapImpl(CFCtx, BlockStates), *S, State.Env,
		Analysis.builtinTransferOptions());
Analysis.transferTypeErased(S, State.Lattice, State.Env);		Analysis.transferTypeErased(S, State.Lattice, State.Env);

if (HandleTransferredStmt != nullptr)		if (HandleTransferredStmt != nullptr)
HandleTransferredStmt(CfgStmt, State);		HandleTransferredStmt(CfgStmt, State);
}		}

/// Transfers `State` by evaluating `CfgInit`.		/// Transfers `State` by evaluating `CfgInit`.
static void transferCFGInitializer(const CFGInitializer &CfgInit,		static void transferCFGInitializer(const CFGInitializer &CfgInit,
[146 lines not shown]

clang/unittests/Analysis/FlowSensitive/TransferTest.cpp

[33 lines not shown]

using ::testing::ElementsAre; using ::testing::ElementsAre;

using ::testing::IsNull; using ::testing::IsNull;

using ::testing::NotNull; using ::testing::NotNull;

using ::testing::Pair; using ::testing::Pair;

using ::testing::SizeIs; using ::testing::SizeIs;

template <typename Matcher> template <typename Matcher>

void runDataflow(llvm::StringRef Code, Matcher Match, void runDataflow(llvm::StringRef Code, Matcher Match,

DataflowAnalysisOptions Options,

LangStandard::Kind Std = LangStandard::lang_cxx17, LangStandard::Kind Std = LangStandard::lang_cxx17,

bool ApplyBuiltinTransfer = true,

llvm::StringRef TargetFun = "target") { llvm::StringRef TargetFun = "target") {

li.zhe.hua (inline comment, Not Done):

For the purposes of the test, there's really only 3 states:

1. No built-in transfer
2. Built-in transfer, no context-sensitive
3. Built-in transfer, with context-sensitive

It may be more readable for tests to have a 3-state enum, that runDataFlow will then use to produce the corresponding DataflowAnalysisOptions. As is, a snippet like

{/*.ApplyBuiltinTransfer=*/true,
 /*.BuiltinTransferOptions=*/{/*.ContextSensitive=*/false}});

is rough to read. Good enum names with good comments would probably make this much better. WDYT?

samestep (author, Done):

I agree that there are only 3 states, but I also think that conceptually this really is a multi-layered thing; either we apply the built-in transfer or not, and then if we do apply the builtin transfer, there's some set of options we pass to it. Thus, it doesn't seem right to just collapse it into a flat enum; I'm not sure though.

ASSERT_THAT_ERROR( ASSERT_THAT_ERROR(

test::checkDataflow<NoopAnalysis>( test::checkDataflow<NoopAnalysis>(

Code, TargetFun, Code, TargetFun,

[ApplyBuiltinTransfer](ASTContext &C, Environment &) { [Options](ASTContext &C, Environment &) {

return NoopAnalysis(C, ApplyBuiltinTransfer); return NoopAnalysis(C, Options);

}, },

[&Match]( [&Match](

llvm::ArrayRef< llvm::ArrayRef<

std::pair<std::string, DataflowAnalysisState<NoopLattice>>> std::pair<std::string, DataflowAnalysisState<NoopLattice>>>

Results, Results,

ASTContext &ASTCtx) { Match(Results, ASTCtx); }, ASTContext &ASTCtx) { Match(Results, ASTCtx); },

{"-fsyntax-only", "-fno-delayed-template-parsing", {"-fsyntax-only", "-fno-delayed-template-parsing",

"-std=" + "-std=" + std::string(

std::string(

LangStandard::getLangStandardForKind(Std).getName())}), LangStandard::getLangStandardForKind(Std).getName())}),

llvm::Succeeded()); llvm::Succeeded());

} }
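The three-state suggestion made in the inline comment on `runDataflow` above was not adopted in this revision, but it can be sketched to make the trade-off concrete. The enum `TransferMode`, its enumerators, and `toOptions` are hypothetical names invented for this illustration, and the structs are stand-ins for the ones in the diff.

```cpp
#include <cassert>

// Stand-ins mirroring the structs in the diff (not the real Clang headers).
struct TransferOptions {
  bool ContextSensitive = false;
};

struct DataflowAnalysisOptions {
  bool ApplyBuiltinTransfer = true;
  TransferOptions BuiltinTransferOptions;
};

// Hypothetical flat enum covering the three states listed by the reviewer.
enum class TransferMode {
  NoBuiltin,               // built-in transfer functions are skipped
  Builtin,                 // built-in transfer, calls are not analyzed
  BuiltinContextSensitive, // built-in transfer, in-TU callees are analyzed
};

// Expands the flat enum into the nested options the framework expects.
DataflowAnalysisOptions toOptions(TransferMode Mode) {
  switch (Mode) {
  case TransferMode::NoBuiltin:
    return {false, TransferOptions{false}};
  case TransferMode::Builtin:
    return {true, TransferOptions{false}};
  case TransferMode::BuiltinContextSensitive:
    return {true, TransferOptions{true}};
  }
  assert(false && "unhandled TransferMode");
  return {};
}
```

The enum reads well at call sites, while the nested struct, as the author argues, keeps the layering explicit and scales better if more transfer options are added.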

template <typename Matcher>

void runDataflow(llvm::StringRef Code, Matcher Match,

LangStandard::Kind Std = LangStandard::lang_cxx17,

bool ApplyBuiltinTransfer = true,

llvm::StringRef TargetFun = "target") {

runDataflow(Code, Match, {ApplyBuiltinTransfer, {}}, Std, TargetFun);

li.zhe.hua (inline comment, Not Done):

llvm::StringRef TargetFun = "target") {
- runDataflow(Code, Match, {ApplyBuiltinTransfer}, Std, TargetFun);
+ runDataflow(Code, Match, {ApplyBuiltinTransfer, {}}, Std, TargetFun);
}

TEST(TransferTest, IntVarDeclNotTrackedWhenTransferDisabled) {

Nit: `-Wmissing-field-initializers` is apparently enabled, and starts warning on this.

samestep (author, Done):

Same here; thanks!

}

TEST(TransferTest, IntVarDeclNotTrackedWhenTransferDisabled) { TEST(TransferTest, IntVarDeclNotTrackedWhenTransferDisabled) {

std::string Code = R"( std::string Code = R"(

void target() { void target() {

int Foo; int Foo;

// [[p]] // [[p]]

} }

)"; )";

runDataflow( runDataflow(

[3,786 lines not shown; context: runDataflow(]

ASSERT_THAT(FooDecl, NotNull()); ASSERT_THAT(FooDecl, NotNull());

BoolValue &LoopBodyFooVal = BoolValue &LoopBodyFooVal =

*cast<BoolValue>(LoopBodyEnv.getValue(*FooDecl, SkipPast::None)); *cast<BoolValue>(LoopBodyEnv.getValue(*FooDecl, SkipPast::None));

EXPECT_FALSE(LoopBodyEnv.flowConditionImplies(LoopBodyFooVal)); EXPECT_FALSE(LoopBodyEnv.flowConditionImplies(LoopBodyFooVal));

}); });

} }

TEST(TransferTest, ContextSensitiveOptionDisabled) {

std::string Code = R"(

bool GiveBool();

void SetBool(bool &Var) { Var = true; }

void target() {

bool Foo = GiveBool();

SetBool(Foo);

// [[p]]

}

)";

runDataflow(Code,

[](llvm::ArrayRef<

std::pair<std::string, DataflowAnalysisState<NoopLattice>>>

Results,

ASTContext &ASTCtx) {

ASSERT_THAT(Results, ElementsAre(Pair("p", _)));

const Environment &Env = Results[0].second.Env;

const ValueDecl *FooDecl = findValueDecl(ASTCtx, "Foo");

ASSERT_THAT(FooDecl, NotNull());

auto &FooVal =

*cast<BoolValue>(Env.getValue(*FooDecl, SkipPast::None));

EXPECT_FALSE(Env.flowConditionImplies(FooVal));

gribozavr2 (inline comment, Not Done), suggested edit:

EXPECT_FALSE(Env.flowConditionImplies(FooVal));

+ EXPECT_FALSE(Env.flowConditionImplies(Env.makeNot(FooVal)));

samestep (author, Done): Good idea, done.

EXPECT_FALSE(Env.flowConditionImplies(Env.makeNot(FooVal)));

},

{/*.ApplyBuiltinTransfer=*/true,

/*.BuiltinTransferOptions=*/{/*.ContextSensitive=*/false}});

}
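The extra assertion suggested in review matters because `EXPECT_FALSE(Env.flowConditionImplies(FooVal))` alone would also pass if `Foo` were known to be false. Checking the negation too pins the result down to "nothing is known about `Foo`". A minimal, self-contained sketch of that reasoning follows; the `Knowledge` enum and the three helper functions are hypothetical stand-ins invented for illustration, not part of the Clang dataflow API:

```cpp
#include <cassert>

// Hypothetical stand-in for what an analysis knows about one boolean
// variable; the real framework uses Environment and a solver instead.
enum class Knowledge { Unknown, KnownTrue, KnownFalse };

// Toy analogue of Env.flowConditionImplies(FooVal).
bool impliesTrue(Knowledge K) { return K == Knowledge::KnownTrue; }

// Toy analogue of Env.flowConditionImplies(Env.makeNot(FooVal)).
bool impliesFalse(Knowledge K) { return K == Knowledge::KnownFalse; }

// Toy model of analyzing `SetBool(Foo)`, where the callee body assigns
// `Assigned` to its reference parameter. With ContextSensitive off, the
// callee body is not analyzed and the caller's knowledge is unchanged.
Knowledge analyzeSetBool(Knowledge Before, bool Assigned,
                         bool ContextSensitive) {
  if (!ContextSensitive)
    return Before;
  return Assigned ? Knowledge::KnownTrue : Knowledge::KnownFalse;
}
```

In this toy model, `impliesTrue` is false for both `Unknown` and `KnownFalse`, so only the pair of checks distinguishes the disabled case from the `ContextSensitiveSetFalse` case.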

TEST(TransferTest, ContextSensitiveSetTrue) {

std::string Code = R"(

bool GiveBool();

void SetBool(bool &Var) { Var = true; }

void target() {

bool Foo = GiveBool();

SetBool(Foo);

// [[p]]

}

)";

runDataflow(Code,

[](llvm::ArrayRef<

std::pair<std::string, DataflowAnalysisState<NoopLattice>>>

Results,

ASTContext &ASTCtx) {

ASSERT_THAT(Results, ElementsAre(Pair("p", _)));

const Environment &Env = Results[0].second.Env;

const ValueDecl *FooDecl = findValueDecl(ASTCtx, "Foo");

ASSERT_THAT(FooDecl, NotNull());

auto &FooVal =

*cast<BoolValue>(Env.getValue(*FooDecl, SkipPast::None));

EXPECT_TRUE(Env.flowConditionImplies(FooVal));

},

{/*.ApplyBuiltinTransfer=*/true,

/*.BuiltinTransferOptions=*/{/*.ContextSensitive=*/true}});

}

TEST(TransferTest, ContextSensitiveSetFalse) {

std::string Code = R"(

bool GiveBool();

void SetBool(bool &Var) { Var = false; }

void target() {

bool Foo = GiveBool();

SetBool(Foo);

// [[p]]

}

)";

runDataflow(Code,

[](llvm::ArrayRef<

std::pair<std::string, DataflowAnalysisState<NoopLattice>>>

Results,

ASTContext &ASTCtx) {

ASSERT_THAT(Results, ElementsAre(Pair("p", _)));

const Environment &Env = Results[0].second.Env;

const ValueDecl *FooDecl = findValueDecl(ASTCtx, "Foo");

ASSERT_THAT(FooDecl, NotNull());

auto &FooVal =

*cast<BoolValue>(Env.getValue(*FooDecl, SkipPast::None));

EXPECT_TRUE(Env.flowConditionImplies(Env.makeNot(FooVal)));

},

{/*.ApplyBuiltinTransfer=*/true,

/*.BuiltinTransferOptions=*/{/*.ContextSensitive=*/true}});

}

} // namespace


Revision Contents

Diff 447761

clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h

clang/include/clang/Analysis/FlowSensitive/Transfer.h

clang/include/clang/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.h

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp

clang/lib/Analysis/FlowSensitive/Transfer.cpp

clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp

clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
