This is an archive of the discontinued LLVM Phabricator instance.

Add scope information to CFG
ClosedPublic

Authored by m.ostapenko on Jan 21 2016, 6:17 AM.

Details

Summary
  • This patch adds two new CFG elements CFGScopeBegin and CFGScopeEnd that indicate when a local scope begins and ends respectively.
  • The addition has been developed behind a new analyzer-config flag called cfg-scopes (false by default).
  • A new test file called scopes-cfg-output.cpp has been created. This tests the new (experimental) feature.
  • Old test files affected by this patch are analyzer-config.c and analyzer-config.cpp; these have been updated to reflect changes pertaining to the newly added analyzer-config flag.
  • All tests pass.

Diff Detail

Repository
rL LLVM

Event Timeline

bshastry updated this revision to Diff 45522.Jan 21 2016, 6:17 AM
bshastry retitled this revision from to Add scope information to CFG.
bshastry updated this object.
bshastry added reviewers: dcoughlin, klimek, zaks.anna.
NoQ added a subscriber: NoQ.Jan 27 2016, 9:32 AM
dcoughlin edited edge metadata.Jan 28 2016, 9:48 AM

Bhargava,

Thanks for the patch! Here are some quick, superficial comments. Some more in-depth comments coming later.

Also, it is very helpful for reviewers on Phabricator if the you include more context in the diff with -U999999. (See http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface).

include/clang/Analysis/CFG.h
171

Even though a lot of code doesn't follow the guidelines, LLVM style discourages duplicating the function or class name at the beginning of the comment. (See http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments)

192

Same doc comment style issue here.

790

I would suggest calling this "AddScopes" to match the pluralization of the other options and propagate that name throughout the rest of the patch (e.g. "IncludeScopesInCFG", etc.).

lib/Analysis/CFG.cpp
249

I think it would be helpful to name this something more expressive (perhaps "TriggerStmt") so it is immediately clear what the relationship between the statement the LocalScope is.

1225

What is the difference between TriggerStmt and ScopeCreator? Do we need them both here?

1430

I think it is very important to document the different between 'S' and 'ScopeCreator' here.

lib/StaticAnalyzer/Core/AnalysisManager.cpp
29

The indentation looks off here.

bshastry updated this revision to Diff 46471.Jan 30 2016, 4:12 AM
bshastry edited edge metadata.

Updated patch to include full context.

Thanks for including more context. It is very helpful on Phabricator.

This looks like it is in good shape to me, but I think there are still two things to be done.

  1. I think it is important to not create ScopeBegin/ScopeEnd elements in the CFG if the scope doesn't actually contain any local variables. The CFGElements in blocks get iterated over many times, so it is good to keep the CFG as lean as possible. For example, in
if (x) {
 x++;
}

there isn't a need to add a scope pair around the Then statement. I think you should be able to avoid this by modifying addAutomaticObjDtors() to not add the End if the local scope is empty and having it communicate whether it did so or not to callers to that they can avoid adding the corresponding Begin after emitting the code in the scope.

  1. I think you are also missing some important scopes. For example, in:
int stuff();
void bar() {
  int *x = 0;
  if (int i = stuff()) {
    x = &i;
  } else {
  }
}

There needs to be a scope that starts immediately before int i = stuff() is evaluated and ends when the then statement ends. Every control flow statement with a condition variable (if/for/while/switch) will need the same treatment.

It is probably also good to add some tests that don't use C++ objects with destructors but instead just plain-old-C to make sure the scopes are correctly added even when there are no automatic destructors.

Also, don't forget to add cases for ScopeBegin and ScopeEnd to ExprEngine::processCFGElement() and getLocationForCaller() in PathDiagnostic.cpp. These can be llvm_unreachable() for now. Right now compiling your patch warns about those being missing.

Hello Bhargava,

Do you have any progress on this patch?
We have implemented ScopeContext support in the analyzer core but our work relies on this patch. Do you plan to continue work on CFG?
Also, I found a similar review: http://reviews.llvm.org/D15031 and I don't know what patch should really be accepted. Devin, Bhargava, could you please take a look?

Bhargava,

Thanks for the patch! Here are some quick, superficial comments. Some more in-depth comments coming later.

Also, it is very helpful for reviewers on Phabricator if the you include more context in the diff with -U999999. (See http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface).

Hello Bhargava,

Do you have any progress on this patch?
We have implemented ScopeContext support in the analyzer core but our work relies on this patch. Do you plan to continue work on CFG?
Also, I found a similar review: http://reviews.llvm.org/D15031 and I don't know what patch should really be accepted. Devin, Bhargava, could you please take a look?

Hi Aleksei,

I was working on this patch but got sidetracked. It is nice to know that there is going to be support for ScopeContext within the analyzer core. I would be happy to get back on this patch and push it upstream with help from Devin. I cannot promise a strict timeline but hopefully by the end of this week, there should be something commit ready.

Regards,
Bhargava

jbcoe added a subscriber: jbcoe.Apr 4 2016, 7:43 AM
mgehre added a subscriber: mgehre.Apr 4 2016, 9:48 AM
bshastry updated this revision to Diff 53018.EditedApr 8 2016, 6:38 AM
bshastry updated this object.

Addressed Devin's comments from first review:

  • Comments conform to LLVM style
  • Missing switch cases added for CFGElements ScopeBegin and ScopeEnd

Please note that this patch is not final yet. Devin's comments from second review are still being incorporated. Specifically, the comments being addressed are:

  • Document difference between Stmt and TriggerStmt
  • Avoiding scopes when no local VarDecl exists
  • Correcting the position of scopes to before the first VarDecl in scope e.g., as part of a condition statement
bshastry updated this revision to Diff 53026.Apr 8 2016, 7:56 AM
bshastry marked 3 inline comments as done.

Fixed bug: Previous patch caused build failure due to CFG scope option rename (AddScope instead of AddScopeInfo) not being applied to CFG.cpp and the test case.

bshastry updated this revision to Diff 53030.Apr 8 2016, 8:51 AM

Renamed and documented the difference between scope creator statement (TriggerScopeStmt) and auto obj dtors statement (TriggerAutoObjDtorsStmt).

bshastry marked 4 inline comments as done.Apr 8 2016, 9:04 AM

Hi Devin,

Please find my thoughts on your second batch of comments inline:

  1. I think it is important to not create ScopeBegin/ScopeEnd elements in the CFG if the scope doesn't actually contain any local variables. The CFGElements in blocks get iterated over many times, so it is good to keep the CFG as lean as possible. For example, in
if (x) {
 x++;
}

there isn't a need to add a scope pair around the Then statement. I think you should be able to avoid this by modifying addAutomaticObjDtors() to not add the End if the local scope is empty and having it communicate whether it did so or not to callers to that they can avoid adding the corresponding Begin after emitting the code in the scope.

[Task 1] I agree that it is necessary to keep CFG lean. However, your implementation proposal (communicating outcome of ScopeEnd addition up the caller hierarchy) requires careful consideration. Problematic cases are when scope end and scope begin are scattered across visitors e.g., breakstmt, continuestmt etc. For instance, the information about presence/absence of a scopeend element from a break stmt needs to be propogated through to the corresponding switch case stmt. I am afraid I feel a bit underprepared to implement this.

  1. I think you are also missing some important scopes. For example, in:
int stuff();
void bar() {
  int *x = 0;
  if (int i = stuff()) {
    x = &i;
  } else {
  }
}

There needs to be a scope that starts immediately before int i = stuff() is evaluated and ends when the then statement ends. Every control flow statement with a condition variable (if/for/while/switch) will need the same treatment.

[Task 2] Agreed. This is a bit more simpler than the task 1. I will have a go and post a patch when this is ready.

It is probably also good to add some tests that don't use C++ objects with destructors but instead just plain-old-C to make sure the scopes are correctly added even when there are no automatic destructors.

[Task 3] Agreed. Will couple this with task 2.

Also, don't forget to add cases for ScopeBegin and ScopeEnd to ExprEngine::processCFGElement() and getLocationForCaller() in PathDiagnostic.cpp. These can be llvm_unreachable() for now. Right now compiling your patch warns about those being missing.

This has been done in the latest patch.

lib/Analysis/CFG.cpp
1463

Yes. I have renamed TriggerStmt to TriggerAutoObjDtorsStmt and ScopeCreator to TriggerScopeStmt. The former tells us when to add ScopeEnd element and the former ScopeBegin element. Addition of TriggerScopeStmt was necessary because TriggerAutoObjDtorsStmt doesn't tell us anything about the present scope's context; only that the present scope is going to end.

bshastry updated this revision to Diff 53033.Apr 8 2016, 9:24 AM
bshastry marked an inline comment as done.

Nit: Renamed addScopeInfo to addScopes in AnalysisDeclContextManager ctor declaration

Hello Bhargava,

I used your patch as a development base. It is a really good improvement. But there are some things I wish to point on.
First, it is good to document some things.

  1. Scopes are not strictly nested. A ScopeEnd may terminate a number of ScopeBegins - for example, for break, continue or return. It was not evident for me so I'd prefer to state it somewhere.
  2. As I understand, the number of ScopeBegins on any path should be more or equal than the number of ScopeEnds.

Second, I found an issue for SwitchStmts without a CompoundStmt body. Consider this test case:

void test() {
  switch (0)
  case 0:
  default: {
    int a = 0;
  }
}

CFG for this code will contain a ScopeEnd for SwitchStmt but no ScopeBegin (invariant #2 is violated) so there are 2 ScopeBegins and 3 ScopeEnds in this linear CFG. It looks like a bug for me.

BTW, I found that handle exit of multiple Scopes with a single ScopeEnd it is pretty inconvenient for analyzer since the checker should manually check all the scopes we leave. Could we modify this behaviour to have each ScopeBegin correspond to a single ScopeEnd or vise versa?

Another test case, break-related.

void testBreak() {
  for (int i = 0; i < 3; ++i) {
    {
      int unused;
      break;
    }
  }
  {
    int unused;
  }
}

Its CFG looks broken.

Hello Bhargava. Is there any update on this review?

Hello Bhargava. Is there any update on this review?

Hi Aleksei, Firstly, thanks for the good feedback on the initial implementation. It is useful going forward. Sadly, I have not been able to set aside time for working on a follow-up. I will try to pick up where I left off and post a patch when it is ready. It might take a couple of weeks though, since my schedule is very tight at the moment.

Hi,

unless someone has objections (@bshastry ?) I'll commandeer this revision and post updated patch shortly (it still needs some polishing though).

m.ostapenko commandeered this revision.Jun 23 2017, 4:55 AM
m.ostapenko added a reviewer: bshastry.
m.ostapenko added a subscriber: cfe-commits.
m.ostapenko set the repository for this revision to rL LLVM.
m.ostapenko added a project: Restricted Project.

So, updating the diff. This is still a very experimental version and any feedback would be greatly appreciated.
Current patch should support basic {If, While, For, Compound, Switch}Stmts as well as their interactions with {Break, Continue, Return}Stmts.
GotoStmt and CXXForRangeStmt are not supported at this moment.

Please consider also https://reviews.llvm.org/D15031
It already handles all possible control flows. Instead of ScopeBegin and ScopeEnd,
it introduces LifetimeEnds elements. It's a more specific in that is also
correctly models the order of lifetime expiry of variables and the order
or destructor calls / lifetime ending.

Hi Matthias,
I have posted a comment about review duplication (more than a year ago!) in your review but you haven't answered. So, all this time we were thinking that you do separate non-related work.
@dcoughlin As a reviewer of both patches - could you tell us what's the difference between them? And how are we going to resolve this issue?

NoQ added a comment.Jun 24 2017, 11:17 PM

Maxim, totally thanks for picking this up!

Could you explain the idea behind shouldDeferScopeEnd, maybe in a code comment before the function?

Current patch should support basic {If, While, For, Compound, Switch}Stmts as well as their interactions with {Break, Continue, Return}Stmts.
GotoStmt and CXXForRangeStmt are not supported at this moment.

SwitchStmt isn't much easier than GotoStmt; it doesn't jump backwards, but it can still jump into multiple different scopes. Does your code handle Duff's device (1) (2) correctly? We should probably add it as a test, or split out switch support into a separate patch together with goto, if such test isn't yet supported.

zaks.anna edited edge metadata.Jun 26 2017, 1:52 PM

@dcoughlin As a reviewer of both patches - could you tell us what's the difference between them? And how are we going to resolve this issue?

Unfortunately, @dcoughlin is on vacation this week; should be back next week.

dcoughlin edited edge metadata.Jul 5 2017, 11:41 AM

@dcoughlin As a reviewer of both patches - could you tell us what's the difference between them? And how are we going to resolve this issue?

These two patches are emitting markers in the CFG for different things.

Here is how I would characterize the difference between the two patches.

  • Despite its name, https://reviews.llvm.org/D15031, is really emitting markers for when the lifetime of a C++ object in an automatic variable ends. For C++ objects with non-trivial destructors, this point is when the destructor is called. At this point the storage for the variable still exists, but what you can do with that storage is very restricted by the language because its contents have been destroyed. Note that even with the contents of the object have been destroyed, it is still meaningful to, say, take the address of the variable and construct a new object into it with a placement new. In other words, the same variable can have different objects, with different lifetimes, in it at different program points.
  • In contrast, the purpose of this patch (https://reviews.llvm.org/D16403) is to add markers for when the storage duration for the variable begins and ends (this is, when the storage exists). Once the storage duration has ended, you can't placement new into the variables address, because another variable may already be at that address.

I don't think there is an "issue" to resolve here. We should make sure the two patches play nicely with each other, though. In particular, we should make sure that the markers for when lifetime ends and when storage duration ends are ordered correctly.

Oh, I see now. I was wrongly thinking that these patches do the same thing and we're duplicating the work. Thank you very much for your explanation, Devin.

Hi Artem, I'm sorry for a long delay (nasty corporate issues).

In D16403#789957, @NoQ wrote:

Maxim, totally thanks for picking this up!

Could you explain the idea behind shouldDeferScopeEnd, maybe in a code comment before the function?

Current patch should support basic {If, While, For, Compound, Switch}Stmts as well as their interactions with {Break, Continue, Return}Stmts.
GotoStmt and CXXForRangeStmt are not supported at this moment.

SwitchStmt isn't much easier than GotoStmt; it doesn't jump backwards, but it can still jump into multiple different scopes. Does your code handle Duff's device (1) (2) correctly? We should probably add it as a test, or split out switch support into a separate patch together with goto, if such test isn't yet supported.

Ugh, yeah, SwitchStmt is tricky (thank you for pointing to Duff's device example!). I've tried to resolve several issues with switch revealed by this testcase, but didn't succeed for now :(. So, it was decided to remove SwitchStmt support in this patch.
There is one more thing I'd like to clarify: since e.g. BreakStmt can terminate multiple scopes, do I need to create ScopeEnd marks for all of them to make analyzer's work easier? Because right now only one "cumulative" block is generated and I'm not sure that it's acceptable for analyzer.

NoQ added a comment.EditedJul 10 2017, 7:19 AM

There is one more thing I'd like to clarify: since e.g. BreakStmt can terminate multiple scopes, do I need to create ScopeEnd marks for all of them to make analyzer's work easier? Because right now only one "cumulative" block is generated and I'm not sure that it's acceptable for analyzer.

The analyzer intends to maintain a stack of scopes. When the scope is entered during symbolic execution, it gets put on top of the stack, and when it is exited it is taken off the top of the stack. It's probably not important if we have one mark or multiple marks, but it's important to know what scopes, in what order, are getting entered or exited.

m.ostapenko retitled this revision from Add scope information to CFG to Add scope information to CFG for If/While/For/Do/Compound/CXXRangeFor statements.

Updating the diff. I've dropped SwitchStmt support, let's implement in separately as well as GotoStmt.

szepet added a subscriber: szepet.Jul 13 2017, 8:47 AM

Hello Maxim!
Thanks for working on this!

Ugh, yeah, SwitchStmt is tricky (thank you for pointing to Duff's device example!). I've tried to resolve several issues with switch revealed by this testcase, but didn't succeed for now :(. So, it was decided to remove SwitchStmt support in this patch.

I think this is a great decision! It can build up incremental and the patch can be more easily reviewed. When do you plan to upload a patch wihtout casestmt support? (As far as I see there is switchstmt related code but maybe Im missing something.)

NoQ added a comment.Jul 13 2017, 8:56 AM

I think the remaining switch-related code seems to be about C++17 switch condition variables, i.e. switch (int x = ...)(?)

In D16403#808104, @NoQ wrote:

I think the remaining switch-related code seems to be about C++17 switch condition variables, i.e. switch (int x = ...)(?)

Yeah, exactly. I can remove it from this patch if it looks confusing.

Rebased and removed a bunch of stale changes. Also added a check for goto's: if we see GotoStmt and have cfg-scopes == true, make badCFG = true and retry without scopes enabled. This check will be removed once GotoStmt will become supported.

m.ostapenko updated this revision to Diff 108644.EditedJul 28 2017, 7:06 AM

Updated some comments. Could someone take a look please?

Rebased and ping.

Maxim, thanks for commandeering the patch. I apologize for the long delay reviewing!

I really like the approach of retrying without scopes enabled when we hit a construct we can't handle yet. This will make is possible to turn the feature on by default (and get it running on large amounts of code) while still developing it incrementally!

I'd like to suggest a change in the scope representation that will make it much easier to support all the corner cases and also re-use the existing logic for handling variable scopes. Right now the CFGScopeBegin and CFGScopeEnd elements use a statement (a loop or a CompoundStatement) to uniquely identify a scope. However, this is a problem because a for loop with an initializer may have two scopes that we really want to distinguish:

for (int i = 0; i < 10; i++)
  int j = i;

Here i and j really have different scopes, but with the current approach we won't be able tell them apart.

My suggestion is to instead use the first VarDecl declared in a scope to uniquely identify it. This means, for example, that CFGScopeBegin's constructor will take a VarDecl instead of a statement.

What's nice about this VarDecl approach is that the CFGBuilder already has a bunch of infrastructure to correctly track local scopes for variables (see the LocalScope class in CFG.cpp and the addLocalScopeForStmt() function.) Further, there are two functions, addLocalScopeAndDtors() and addAutomaticObjHandling() that are already called in all the places where a scope should end. This means you should only need to add logic to add an end of scope element in those two functions. What's more, future work to extend CFG construction to handle new C++ language features will also use these two functions -- so we will get support for those features for free. For an example of how to change those two functions, take a look at Matthias Gehre's commit in https://reviews.llvm.org/D15031. What you would need to do for CFGScopeEnd would be very similar to that patch. Using this approach should also make it possible to re-use the logic in that patch to eventually handle gotos.

To handle CFGScopeBegin, you would only need to change CFGBuilder::VisitDeclSubExpr() to detect when the variable being visited is the first variable declared in its LocalScope. If so, you would append the CFGScopeBegin element. What's nice about this is that you won't have to handle all the different kinds of loops individually.

What do you think? I'd be happy to have a Skype/Google hangouts talk about this if you want to have real-time discussion about it.

Hi Devin,

now I'm very sorry for a such long delay. Now I have a bunch of time to proceed development of this patch (if scope contexts are still needed, of course).
Regarding to approach you suggested (reuse LocalScope infrastructure and use first VarDecl for ScopeBegin): it seems very reasonable for me, but perhaps I need more advisory here.
I've implemented a draft patch to make sure I've understood you correctly (this is not a final version, the test case is completely inadequate for now). While testing, I've discovered several questions that I would like to discuss:

  1. Do we need to have one-to-one mapping between ScopeBegins and corresponding ScopeEnds or is it OK to assume that ScopeEnd can terminate several nested scopes?
  2. In the following example:
class A {
public:
  A() {}
  ~A() {}
  operator int() const { return 1; }
};

bool UV;
void test_for_implicit_scope() {
  for (A a; A b = a; )
      if (UV) continue;
      A c;
  }
it seems that lifetime of **b** ends when we branch out from the loop body (if entered, of course), but it seems that in current implementation we don't generate an implicit destructor for **b** just before **continue**. Is this a bug, or perhaps I'm missing something?
# The approach with first VarDecl works not so well with the following test case:
void test_goto() {
  A f;
l0:
  A d;
  { 
    A a;
    if (UV) goto l0;
    if (UV) goto l1;
    A b;
  }
l1:
  A c;
}

in this approach we'll don't emit a ScopeBegin for d because the first VarDecl is f. However IMHO we still need to add ScopeBegin for d in order to handle d's scope end occurring when we jumping to l0 via if (UV) goto l0;. I think this can be solved by adding ScopeBegin for d when backpatching blocks late in buildCFG routine (when we know all targets of all gotos). Does this approach look reasonable for you?

Thanks.

m.ostapenko retitled this revision from Add scope information to CFG for If/While/For/Do/Compound/CXXRangeFor statements to Add scope information to CFG.

Some code cleanup + updated test case.

Rebased and ping.

NoQ added a comment.Jan 26 2018, 12:37 PM

Hmm. @m.ostapenko @szepet @mgehre - I think it might be a good time to figure out if ScopeBegin/ScopeEnd, LoopEntrance/LoopExit, LifetimeEnds, AutomaticObjectDtor elements work nicely together. How should they be ordered with respect to each other? Is any of these scope representation a superset of another scope representation, or maybe fully covered by other two or three other scope representations? Would anybody be willing to produce some pictures (-analyzer-checker debug.ViewCFG and attach here) with current and/or intended behavior? Not sure, i guess LifetimeEnds is mostly used in clang-tidy so it does not necessarily need to work together with analyzer-specific elements (or maybe it's so great that we should switch to using it), but it would still be great if we had a single scope representation which would be rich enough to satisfy all needs.

NoQ added a comment.Jan 26 2018, 12:43 PM

Do we need to have one-to-one mapping between ScopeBegins and corresponding ScopeEnds or is it OK to assume that ScopeEnd can terminate several nested scopes?

It's fine if ScopeEnds terminates multiple scopes - as long as it is easy to find out what scopes are being terminated by looking at it. Because in the analyzer we need to put the scope on the stack when we enter it and pop it from the stack when we leave it, and those must match no matter what. So imagine that we look at the current ScopeEnd and at the stack of scopes we currently have. Once we have that, we should be able to figure out what scopes are ending, without using ParentMap or CFGStmtMap or re-visiting a large chunk of the AST recursively - ideally by a direct lookup.

Hi Artem,

In D16403#989451, @NoQ wrote:

Hmm. @m.ostapenko @szepet @mgehre - I think it might be a good time to figure out if ScopeBegin/ScopeEnd, LoopEntrance/LoopExit, LifetimeEnds, AutomaticObjectDtor elements work nicely together. How should they be ordered with respect to each other?

Here are several observations about ScopeBegin/ScopeEnd, LoopEntrance/LoopExit, LifetimeEnds, AutomaticObjectDtor interaction:

  1. LifetimeEnds and AutomaticObjectDtor don't work together (I'm unable to tell why).
  2. The difference between ScopeBegin/ScopeEnd and LifetimeEnds was described by Devin several comments above, in short:
    • LifetimeEnds emits markers for when the lifetime of a C++ object in an automatic variable ends. For C++ objects with non-trivial destructors, this point is when the destructor is called. At this point the storage for the variable still exists, but what you can do with that storage is very restricted by the language because its contents have been destroyed.
    • ScopeBegin/ScopeEnd add markers for when the storage duration for the variable begins and ends. Hence I can conclude that ScopeEnd should reside after implicit destructors and LifetimeEnds markers in CFG.
  3. LoopEntrance/LoopExit improve modelling of unrolled loops and I'm not sure whether scopes across iterations are the only thing that's modeled here.

Is any of these scope representation a superset of another scope representation, or maybe fully covered by other two or three other scope representations?

Given my observations above, I'm not sure whether some representation can be fully covered by others -- all of them serve different purposes. Or perhaps I'm missing something?

Would anybody be willing to produce some pictures (-analyzer-checker debug.ViewCFG and attach here) with current and/or intended behavior?

I'm attaching two CFGs for the following example:

void test_for_implicit_scope() {
  for (A a; A b = a; )
    A c;
}
$ ./bin/clang -cc1 -fcxx-exceptions -fexceptions -analyze -analyzer-checker=debug.ViewCFG -analyzer-config cfg-scopes=true -analyzer-config cfg-loopexit=true -analyzer-config unroll-loops=true /tmp/scopes-cfg-output.cpp
-> CFG-scopes-destructors-loopexit.dot

$ ./bin/clang -cc1 -fcxx-exceptions -fexceptions -analyze -analyzer-checker=debug.ViewCFG -analyzer-config cfg-scopes=true -analyzer-config cfg-loopexit=true -analyzer-config cfg-lifetime=true -analyzer-config cfg-implicit-dtors=false /tmp/scopes-cfg-output.cpp
-> CFG-scopes-loopexit-lifetime.dot{F5792940}

{F5792939}

Not sure, i guess LifetimeEnds is mostly used in clang-tidy so it does not necessarily need to work together with analyzer-specific elements (or maybe it's so great that we should switch to using it), but it would still be great if we had a single scope representation which would be rich enough to satisfy all needs.

Actually upload the files.

NoQ added a comment.Jan 30 2018, 2:09 PM

Thank you, this explanation looks very reasonable.

All right, so right after the termination of the loop we have

[B1]
1: ForStmt (LoopExit)
2: [B4.5].~A() (Implicit destructor)
3: [B5.3].~A() (Implicit destructor)
4: CFGScopeEnd(a)
5: CFGScopeEnd(b)

... where [B4.5] is A b = a; and [B5.3] is A a;. Am i understanding correctly that while destroying a you can still use the storage of b safely? Or should a go out of scope before b gets destroyed? Also, is the order of scope ends actually correct here - shouldn't b go out of scope earlier? Given that they are in very different lifetime scopes (a is one for the whole loop, b is per-loop-iteration). I guess the order would matter for the analyzer.

NoQ added a comment.Jan 30 2018, 2:11 PM

@szepet: so i see that LoopExit goes in the beginning of the cleanup block after the loop, while various ScopeEnds go after the LoopExit. Would loop unrolling be significantly broken if you simply subscribe to ScopeEnd instead of LoopExit and avoid cleaning up the loop state until destructors are processed? I might not be remembering correctly - is LoopExit only used for cleanup, or do we have more work to be done here?

In D16403#992452, @NoQ wrote:

Thank you, this explanation looks very reasonable.

All right, so right after the termination of the loop we have

[B1]
1: ForStmt (LoopExit)
2: [B4.5].~A() (Implicit destructor)
3: [B5.3].~A() (Implicit destructor)
4: CFGScopeEnd(a)
5: CFGScopeEnd(b)

... where [B4.5] is A b = a; and [B5.3] is A a;. Am i understanding correctly that while destroying a you can still use the storage of b safely? Or should a go out of scope before b gets destroyed?

AFAIU, when we destroying a we can still use the storage of b because nothing can be allocated into b's storage between calling destructors for b and a. So, imho the sequence should look like:

[B4.5].~A() (Implicit destructor)
[B5.3].~A() (Implicit destructor)
CFGScopeEnd(b)
CFGScopeEnd(a)

Also, is the order of scope ends actually correct here - shouldn't b go out of scope earlier? Given that they are in very different lifetime scopes (a is one for the whole loop, b is per-loop-iteration). I guess the order would matter for the analyzer.

Yeah, this is a bug in a current implementation. Will fix this in the next patch version.

Fix scope ends order (as discussed above) and adjust a testcase.

NoQ added a comment.Feb 7 2018, 5:58 PM

I poked Devin offline and we agreed that the overall approach is good to go. Maxim, thank you for picking it up!

We still don't have scopes for segments of code that don't have any variables in them, so i guess it's not yet in the shape where it is super useful for loops, but it's already useful for finding use of stale stack variables, which was the whole point originally, so i think this should definitely land soon.

In D16403#992452, @NoQ wrote:

Am i understanding correctly that while destroying a you can still use the storage of b safely? Or should a go out of scope before b gets destroyed?

AFAIU, when we destroying a we can still use the storage of b because nothing can be allocated into b's storage between calling destructors for b and a. So, imho the sequence should look like:

[B4.5].~A() (Implicit destructor)
[B5.3].~A() (Implicit destructor)
CFGScopeEnd(b)
CFGScopeEnd(a)

Thought about it a bit more and this still doesn't look correct to me. Like, a.~A() is a function call. It can do a lot of unexpected stuff to registers and stack space before jumping into the function. And given that a and b are in different scopes (a is in loop scope, b is in single iteration scope), storage of b is not protected from such stuff during call to the destructor of a. So there's definitely something fishy here. I guess scope ends and destructors would have to be properly interleaved in all cases of exiting multiple scopes.

NoQ added a reviewer: rsmith.Feb 7 2018, 6:01 PM
NoQ added a subscriber: rsmith.

Added @rsmith because we're trying to give him a heads up every time large CFG changes are coming, because if we mess up it would affect not only the analyzer but the whole compiler through analysis-based compiler warnings, just in case.

In D16403#992452, @NoQ wrote:

Thank you, this explanation looks very reasonable.

All right, so right after the termination of the loop we have

[B1]
1: ForStmt (LoopExit)
2: [B4.5].~A() (Implicit destructor)
3: [B5.3].~A() (Implicit destructor)
4: CFGScopeEnd(a)
5: CFGScopeEnd(b)

... where [B4.5] is A b = a; and [B5.3] is A a;. Am i understanding correctly that while destroying a you can still use the storage of b safely? Or should a go out of scope before b gets destroyed? Also, is the order of scope ends actually correct here - shouldn't b go out of scope earlier? Given that they are in very different lifetime scopes (a is one for the whole loop, b is per-loop-iteration). I guess the order would matter for the analyzer.

I guess CFGScopeEnd should happen (should be encountered) before the LoopExit and CFGScopeBegin should be after LoopEntrance (still not sure if it is going to be part of the CFG ever.) Just like parenthesis {(())} that is what would make the most sense for me. However, I do not want to block this patch or something so I can do these changes as well by modifying LoopExit (and probably should since ImplicitDestructor calls should precede the LoopExit as well).

In D16403#992454, @NoQ wrote:

@szepet: so i see that LoopExit goes in the beginning of the cleanup block after the loop, while various ScopeEnds go after the LoopExit. Would loop unrolling be significantly broken if you simply subscribe to ScopeEnd instead of LoopExit and avoid cleaning up the loop state until destructors are processed? I might not be remembering correctly - is LoopExit only used for cleanup, or do we have more work to be done here?

I guess your following comment just answers this:

In D16403#1001466, @NoQ wrote:

We still don't have scopes for segments of code that don't have any variables in them, so i guess it's not yet in the shape where it is super useful for loops, but it's already useful for finding use of stale stack variables, which was the whole point originally, so i think this should definitely land soon.

It could be, however, we would lose cases like:

int i = 0;
int a[32];
for(i = 0;i<32;++i) {a[i] = i;}

Since there is no variable which has the scope of the loop, ScopeEnd would be not enough. Sure, we could remove this case, however, the aim is to extend the loop-patterns for completely unrolling. Another thing that there are the patches which would enhance the covered cases by LoopExit (D39398) and add the LoopEntrance to the CFG(D41150) as well. So, at this point, I feel like it would be a huge step back not to use these elements. (Sorry Maxim that we are discussing this here^^)

NoQ added a comment.Feb 16 2018, 8:07 PM
In D16403#992454, @NoQ wrote:

@szepet: so i see that LoopExit goes in the beginning of the cleanup block after the loop, while various ScopeEnds go after the LoopExit. Would loop unrolling be significantly broken if you simply subscribe to ScopeEnd instead of LoopExit and avoid cleaning up the loop state until destructors are processed? I might not be remembering correctly - is LoopExit only used for cleanup, or do we have more work to be done here?

I guess your following comment just answers this:

In D16403#1001466, @NoQ wrote:

We still don't have scopes for segments of code that don't have any variables in them, so i guess it's not yet in the shape where it is super useful for loops, but it's already useful for finding use of stale stack variables, which was the whole point originally, so i think this should definitely land soon.

It could be, however, we would lose cases like:

int i = 0;
int a[32];
for(i = 0;i<32;++i) {a[i] = i;}

Since there is no variable which has the scope of the loop, ScopeEnd would be not enough. Sure, we could remove this case, however, the aim is to extend the loop-patterns for completely unrolling. Another thing that there are the patches which would enhance the covered cases by LoopExit (D39398) and add the LoopEntrance to the CFG(D41150) as well. So, at this point, I feel like it would be a huge step back not to use these elements. (Sorry Maxim that we are discussing this here^^)

Yeah, i mean, like, if we change the scope markers to also appear even when no variables are present in the scope, then it would be possible to replace loop markers with some of the scope markers, right?

In D16403#1011218, @NoQ wrote:
In D16403#992454, @NoQ wrote:

@szepet: so i see that LoopExit goes in the beginning of the cleanup block after the loop, while various ScopeEnds go after the LoopExit. Would loop unrolling be significantly broken if you simply subscribe to ScopeEnd instead of LoopExit and avoid cleaning up the loop state until destructors are processed? I might not be remembering correctly - is LoopExit only used for cleanup, or do we have more work to be done here?

I guess your following comment just answers this:

In D16403#1001466, @NoQ wrote:

We still don't have scopes for segments of code that don't have any variables in them, so i guess it's not yet in the shape where it is super useful for loops, but it's already useful for finding use of stale stack variables, which was the whole point originally, so i think this should definitely land soon.

It could be, however, we would lose cases like:

int i = 0;
int a[32];
for(i = 0;i<32;++i) {a[i] = i;}

Since there is no variable which has the scope of the loop, ScopeEnd would be not enough. Sure, we could remove this case, however, the aim is to extend the loop-patterns for completely unrolling. Another thing that there are the patches which would enhance the covered cases by LoopExit (D39398) and add the LoopEntrance to the CFG(D41150) as well. So, at this point, I feel like it would be a huge step back not to use these elements. (Sorry Maxim that we are discussing this here^^)

Yeah, i mean, like, if we change the scope markers to also appear even when no variables are present in the scope, then it would be possible to replace loop markers with some of the scope markers, right?

Hm, so does this mean that I need to cover the case when no variables are present in loop scope here in this patch?

In D16403#1001466, @NoQ wrote:

I poked Devin offline and we agreed that the overall approach is good to go. Maxim, thank you for picking it up!

We still don't have scopes for segments of code that don't have any variables in them, so i guess it's not yet in the shape where it is super useful for loops, but it's already useful for finding use of stale stack variables, which was the whole point originally, so i think this should definitely land soon.

In D16403#992452, @NoQ wrote:

Am i understanding correctly that while destroying a you can still use the storage of b safely? Or should a go out of scope before b gets destroyed?

AFAIU, when we destroying a we can still use the storage of b because nothing can be allocated into b's storage between calling destructors for b and a. So, imho the sequence should look like:

[B4.5].~A() (Implicit destructor)
[B5.3].~A() (Implicit destructor)
CFGScopeEnd(b)
CFGScopeEnd(a)

Thought about it a bit more and this still doesn't look correct to me. Like, a.~A() is a function call. It can do a lot of unexpected stuff to registers and stack space before jumping into the function. And given that a and b are in different scopes (a is in loop scope, b is in single iteration scope), storage of b is not protected from such stuff during call to the destructor of a. So there's definitely something fishy here. I guess scope ends and destructors would have to be properly interleaved in all cases of exiting multiple scopes.

Sounds reasonable, I'll fix this.

NoQ added a comment.Feb 20 2018, 10:15 AM
In D16403#1011218, @NoQ wrote:

Yeah, i mean, like, if we change the scope markers to also appear even when no variables are present in the scope, then it would be possible to replace loop markers with some of the scope markers, right?

Hm, so does this mean that I need to cover the case when no variables are present in loop scope here in this patch?

Definitely not :) Feel free to address it some day (or let someone else address it) - the culture of keeping our patches small is something that we currently badly lack :)

In D16403#1011218, @NoQ wrote:

Yeah, i mean, like, if we change the scope markers to also appear even when no variables are present in the scope, then it would be possible to replace loop markers with some of the scope markers, right?

OK, probably I am a bit too slow for this, but I dont get it. Yes, it would be possible to replace them IF these markers appears even in case of no variables. However, this patch is based on LocalScopes which practically means that VarDecls are needed. Aaand here we are, it would require a different approach to consistently mark things like LoopExit.
Another thing that what should be the TriggerStmt? I guess the Stmt of the loop. So, LoopExit and ScopeExit would be the same but the underlying TriggerStmt would decide which marks a loop and which marks a variable? Then I can just rewrite LoopExit into ScopeEnd. Or if you would like to see that, a subclass of ScopeEnd. However, the main functionality should stay the same (I guess).
So, could you help me out what are those things I do not see/understand?

Rebased and updated. Fix the issue with ordering between ScopeEnds and implicit destructors.

Now, the cfg for this example:

void test_for_implicit_scope() {
  for (A a; A b = a;)
    A c;
}

looks like this:

NoQ accepted this revision.Mar 7 2018, 5:50 PM

I think we should land this and celebrate.

@szepet: Ouch, i was sure i already answered this, sorry, dunno where this went.

So, LoopExit and ScopeExit would be the same but the underlying TriggerStmt would decide which marks a loop and which marks a variable?

I was thinking of a single CFGElement to mark both. Which would probably be called "ScopeExit", but you could use it for detecting the end of the loop and, if necessary, add whatever information you need into it.

lib/Analysis/CFG.cpp
1550

It seems that something has moved asterisks to a weird spot, might be a rebase artifact or accidental tooling.

This revision is now accepted and ready to land.Mar 7 2018, 5:50 PM
NoQ added inline comments.Mar 7 2018, 5:51 PM
lib/Analysis/CFG.cpp
1550

Uhm, it was always this way, just weird history when i looked at the diff between diffs, nvm, sorry.

thakis added a subscriber: thakis.Mar 8 2018, 9:47 AM

Since this affects analysis-based warnings, have you checked if this patch has any effect on compile times?

NoQ added a comment.Mar 8 2018, 11:04 AM

Since this affects analysis-based warnings, have you checked if this patch has any effect on compile times?

Just in case - this is under an analyzer-only flag, so the actual warnings are not affected. I guess it might be useful to evaluate compilation time impact caused by checking the flag and bailing out, but i doubt it'd be noticeable.

This revision was automatically updated to reflect the committed changes.