This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] (1/n) Support pointee mutation analysis in ExprMutationAnalyzer.
AcceptedPublic

Authored by shuaiwang on Sep 17 2018, 10:53 PM.

Details

Summary

We handle pointee mutation for native pointers & pointer-like types
(loosely defined as having an operator* returning non-const reference)

This diff alone just implemented the cases where pointee of an Exp is
*directly* mutated (e.g. invoking non-const member function.)

The most trivial class of casts is handled as well for easier unit
testing, findPointeeCastMutation is not done yet.

Planned future work:

  • findPointeeDerefMutation: Exp is first dereferenced, and then mutated.
  • findPointee{Member,Array}Mutation: member (or array element) of pointee is accessed and then mutated.
  • findPointeeArithmeticMutation: handling pointer arithmetic
  • findPointeeAssignmentMutation: pointer is assigned to another var and then mutated.

Diff Detail

Event Timeline

shuaiwang created this revision.Sep 17 2018, 10:53 PM

Looks like a good start! I think getting the pointers right will be most difficult, because of the multiple levels of indirection they allow. Do you think it would be possible to the analysis for >const?< int ***-cases? (recursively checking through the pointer levels)

lib/Analysis/ExprMutationAnalyzer.cpp
225

Just to be sure that i understand:
the changes here are more performance optimizations then directly related to detect pointee mutations?

467

no else after return. this makes the short circuit logic clearer

469

values are not marked as const by the llvm code guideline

486

implicit conversion from pointer to bool? Please make that better visible and please dont use const on values.
Not sure for the matchers, I have seen both, but they should be treated as values as well i think.

490

Please make that sentence clearer, i could not understand it (only interpret :D)

508

shouldn't be the constness of the argument considered here?

515

Either remove the comment or make it a full sentence please. I think the variable naming is clear enough to elide

527

Please add a todo to ensure the missing casts are not forgotten

unittests/Analysis/ExprMutationAnalyzerTest.cpp
54

maybe even assert(E && S)?

76

That doesn't look like a super idea. It is super hidden that variable 'p*' will be analyzed for pointee mutation and others aren't. Especially in 12 months and when someone else wants to change something, this will be the source of headaches.

Don't you think there could be a better way of doing this switch?

169

I feel that there a multiple tests missing:

  • multiple levels of pointers int ***, int * const *
  • pointers to references int &*
  • references to pointers int *&
  • ensure that having a const pointer does no influence the pointee analysis int * const p = &i; *p = 42;
  • a class with operator* + operator-> const/non-const and the analysis for pointers to that class
  • pointer returned from a function
  • non-const reference returned
int& foo(int *p) {
  return *p;
}
194

Could you please add the EXPECT_FALSE(isMutated()) && EXPECT_TRUE(isPointeeMutated()?
Maybe a short helper for that like isOnlyPointeeMutated() would be nice.

Here and the other cases as well

705

Please add a mutating example for array access through a pointer (as its very common in C-style code).

shuaiwang updated this revision to Diff 166068.Sep 18 2018, 9:39 PM
shuaiwang marked 7 inline comments as done.

[WIP] Addressed some of review comments.

Do you think it would be possible to the analysis for >const?< int ***-cases? (recursively checking through the pointer levels)

I think that should be possible, will do after single-layer pointee analysis is done. I believe we can build multi-layer analysis based on much of the single-layer analysis.

lib/Analysis/ExprMutationAnalyzer.cpp
225

Yes :)

508

We need that for non-pointee version, but not for pointee version, for example:

void g1(int * const);

void f1() {
  int *x;
  g1(x); // <-- x is passed to `g1`, we consider that as a mutation, the argument type do have a top-level const
}

void g2(const int *);

void f2() {
  int *x;
  g2(x); // <-- declRefExp(to(x)) is NOT directly passed to `g2`, there's a layer a ImplicitCastExpr<NoOp> in between, and after the implicit cast, the type of the expression becomes "const int *" instead of just "int*", so it'll fail the `isPointeeMutable` check at the beginning of `findPointeeDirectMutation`
}

In summary, we rely on:

  • Checking whether pointee is actually mutable at the beginning
  • Carefully handling casts by not trivially ignoring them unless absolutely safe
JonasToth added inline comments.Sep 18 2018, 11:20 PM
lib/Analysis/ExprMutationAnalyzer.cpp
508

I see. That makes sense, thanks for clarification :)

Szelethus added inline comments.Sep 19 2018, 7:39 PM
lib/Analysis/ExprMutationAnalyzer.cpp
86

Didn't you mean these to be doxygen comments?

shuaiwang marked 5 inline comments as done.Sep 19 2018, 10:35 PM
shuaiwang added inline comments.
unittests/Analysis/ExprMutationAnalyzerTest.cpp
76

Yeah... agree :)
But how about let's keep it this way just for now, and I'll pause the work on supporting pointee analysis and fix it properly but in a separate diff following this one?

I've been thinking about this and also did some prototyping, and here are my thoughts:
In order to properly fix this we need to change the interface of ExprMutationAnalyzer to not simply return a Stmt but instead return a MutationTrace with additional metadata. This wasn't needed before because we can reconstruct a mutation chain by continuously calling findMutation, and that works well for cases like "int x; int& r0 = x; int& r1 = r0; r1 = 123;". But now that pointers come into play, and we need to handle cases like "int *p; int *p2 = p; int& r = *p2; r = 123;", and in order to reconstruct the mutation chain, we need to do findPointeeMutation(p) -> findPointeeMutation(p2) -> findMutation(r), notice that we need to switch from calling findPointeeMutation to calling findMutation. However we don't know where the switch should happen simply based on what's returned from findPointeeMutation, we need additional metadata for that.

That interface change will unavoidably affect how memoization is done so we need to change memoization as well.

Now since we need to change both the interface and memoization anyway, we might as well design them properly so that multiple-layers of pointers can be supported nicely. I think we can end up with something like this:

class ExprMutationAnalyzer {
public:
  struct MutationTrace {
    const Stmt *Value;
    const MutationTrace *Next;
    // Other metadata
  };

  // Finds whether any mutative operations are performed on the given expression after dereferencing `DerefLayers` times.
  const MutationTrace *findMutation(const Expr *, int DerefLayers = 0);
  const MutationTrace *findMutation(const Decl *, int DerefLayers = 0);
};

This involves quite some refactoring, and doing this refactoring before this diff and later rebasing seems quite some trouble that I'd like to avoid.

Let me know what do you think :)

JonasToth added inline comments.Sep 20 2018, 6:55 AM
unittests/Analysis/ExprMutationAnalyzerTest.cpp
76

Is the second part of your answer part to adressing this testing issue?
Given the whole Analyzer is WIP it is ok to postpone the testing change to later for me. But I feel that this is a quality issue that more experience clang develops should comment on because it still lands in trunk. Are you aware of other patches then clang-tidy that rely on EMA?

Regarding you second part (its related to multi-layer pointer mutation?!):
Having a dependency-graph that follows the general data flow _with_ mutation annotations would be nice to have. There is probably even something in clang (e.g. CFG is probably similar in idea, just with all execution paths), but I don't know enough to properly judge here.

In my opinion it is ok, to not do the refactoring now and just support one layer of pointer indirection. Especially in C++ there is usually no need for multi-layer pointers but more a relict from C-style.
C on the other hand relies on that.
Designing EMA new for the more generalized functionality (if necessary) would be of course great, but again: Better analyzer ppl should be involved then me, even though I would still be very interested to help :)

169

for the multi-level pointer mutation: it would be enough to test, that the second layer is analyzed properly, and that the int * >const< * would be detected.

shuaiwang added inline comments.Sep 20 2018, 9:29 AM
unittests/Analysis/ExprMutationAnalyzerTest.cpp
76

The second part is more of trying to explain why I'd prefer to keep the test this way just for this diff.
Basically we need a refactoring that makes EMA returns MutationTrace in order to fix this test util here.

But thinking more about this, maybe we don't need that (for now), this util is to help restore a mutation chain, and since we don't support a chain of mutation for pointers yet we can have a simpler version of pointeeMutatedBy that doesn't support chaining.

I mentioned multi-layer pointer mutation because _if_ we were to do a refactoring, we'd better just do it right and take possible features needed in the future into account. But the refactoring itself is motivated by the need of fixing this testing util (and making the return value from findMutation generally more useful.)

BTW by MutationTrace it's not as complicated or sophisticated as CFG, it's simply a trace helping understand why the mutation happens, for example in this case:

int x;
int &y = x;
int &z = y;
z = 123;

it's a bit disconnected if we just say z = 123 mutated declRefExpr(to(x)), instead:

  • findMutation(x) returns y
  • findMutation(y) returns z
  • findMutation(z) returns z = 123

This doesn't apply to (majority) of cases where the mutative expression is an ancestor of the given expression, in those cases it's typically quite clear why the ancestor expression mutated its descendant.

Anyway, I'll remove this little hackery here and address other comments later (hopefully today), and I'll do a refactoring before pointer analysis supports chaining.

JonasToth added inline comments.Sep 20 2018, 9:37 AM
unittests/Analysis/ExprMutationAnalyzerTest.cpp
76

Ok, understanding the decision making of the EMA would be a good thing to have in general as well.

shuaiwang marked 10 inline comments as done.Sep 21 2018, 10:12 PM
shuaiwang added inline comments.
lib/Analysis/ExprMutationAnalyzer.cpp
86

I feel these don't need to be doxygen comments.

unittests/Analysis/ExprMutationAnalyzerTest.cpp
169

Added except for:

  • Anything that requires following a dereference, we need findPointeeDerefMutation for that.
  • Pointer to a class with operator* + operator->, I think those two operators doesn't matter, there's no way to accidentally invoke them from a pointer.
705

This also need findPointeeDerefMutation.

shuaiwang marked 5 inline comments as done.

Addresses review comments.

JonasToth added inline comments.Sep 22 2018, 5:35 AM
unittests/Analysis/ExprMutationAnalyzerTest.cpp
169

But we want to analyze smart pointers in the future as well, not? It would be good to already prepare that in the testing department.
Or would the nature of operator* already make that happen magically?

shuaiwang added inline comments.Sep 22 2018, 10:37 AM
unittests/Analysis/ExprMutationAnalyzerTest.cpp
169

Yes we'll handle smart pointers, and we'll handle that in findPointeeDerefMutation, basically it'll look like:

if (native pointer && derefed with *) findMutation(deref expr)
if (smart pointer && called operator*) findMutation(operator call expr)
if (smart pointer && called operator->) findPointeeMutation(operator call expr)

I think it would be more clear if we can match the implementation progress with unit test cases as that shows what kind of cases starts to be supported by the change.

TDD, thats ok ;)

Am 22.09.2018 um 19:37 schrieb Shuai Wang via Phabricator:

shuaiwang added inline comments.

Comment at: unittests/Analysis/ExprMutationAnalyzerTest.cpp:156

EXPECT_THAT(mutatedBy(Results, AST.get()), ElementsAre("x.mf()"));

+

+ AST = tooling::buildASTFromCode(

JonasToth wrote:

shuaiwang wrote:

JonasToth wrote:

JonasToth wrote:

I feel that there a multiple tests missing:

  • multiple levels of pointers int ***, int * const *
  • pointers to references int &*
  • references to pointers int *&
  • ensure that having a const pointer does no influence the pointee analysis int * const p = &i; *p = 42;
  • a class with operator* + operator-> const/non-const and the analysis for pointers to that class
  • pointer returned from a function
  • non-const reference returned
int& foo(int *p) {
  return *p;
}

for the multi-level pointer mutation: it would be enough to test, that the second layer is analyzed properly, and that the int * >const< * would be detected.

Added except for:

  • Anything that requires following a dereference, we need findPointeeDerefMutation for that.
  • Pointer to a class with operator* + operator->, I think those two operators doesn't matter, there's no way to accidentally invoke them from a pointer.

But we want to analyze smart pointers in the future as well, not? It would be good to already prepare that in the testing department.
Or would the nature of operator* already make that happen magically?

Yes we'll handle smart pointers, and we'll handle that in findPointeeDerefMutation, basically it'll look like:

if (native pointer && derefed with *) findMutation(deref expr)
if (smart pointer && called operator*) findMutation(operator call expr)
if (smart pointer && called operator->) findPointeeMutation(operator call expr)

I think it would be more clear if we can match the implementation progress with unit test cases as that shows what kind of cases starts to be supported by the change.

Repository:

rC Clang

https://reviews.llvm.org/D52219

shuaiwang updated this revision to Diff 166710.Sep 24 2018, 9:59 AM

Added test case place holder for cases that should be supported in later patches.

JonasToth added inline comments.Sep 25 2018, 3:24 AM
unittests/Analysis/ExprMutationAnalyzerTest.cpp
61

Is there a reason why the pointer is marked const? Usually this is not done in the codebase and I feel consistency is key (similar to const for values).

90

You can put this line 2 lines below, 'initialize late'

169

Ok.

639

a nice test to add would be `const int* f() { int* x; return x; }.

shuaiwang marked 6 inline comments as done.

Addressed review comments.

LGTM, with a few nits and a question. Please wait a bit if @aaron.ballman or @george.karpenkov have any comments before committing.

lib/Analysis/ExprMutationAnalyzer.cpp
88

you can elide the braces here

101

Please make this a full sentence (adding ... the overloaded 'operator*' is returning ... is enough.)

226

This comment is one sentence, just replace const. Bail with const, bail

unittests/Analysis/ExprMutationAnalyzerTest.cpp
87

pointer is const, please remove the last const

645

I am confused. The const int* guarantees, that x-pointee is not mutated. Why shall this be diagnosed as a mutation?

JonasToth accepted this revision.Sep 27 2018, 3:42 AM
This revision is now accepted and ready to land.Sep 27 2018, 3:42 AM
shuaiwang updated this revision to Diff 168578.Oct 6 2018, 4:19 PM
shuaiwang marked 5 inline comments as done.

Resolved review comments.

I think you can commit, there was enough opportunity to respond and we pinged directly as well.