Page MenuHomePhabricator

[analyzer][WIP] Implement a primitive reaching definitions analysis
Needs ReviewPublic

Authored by Szelethus on Jul 19 2019, 6:30 AM.

Details

Summary

When I say WIP, I really mean it. This has no comments, no nothing, but follows this algorithm pretty literally.

This analysis isn't conservative enough yet, I really should include function calls with non-const references into the gen set, but hey, at this point I'm optimistic that it can be done.

Diff Detail

Event Timeline

Szelethus created this revision.Jul 19 2019, 6:30 AM
Szelethus updated this revision to Diff 211119.EditedJul 22 2019, 9:04 AM

I plan to split this patch up eventually, but here's where I'm standing right now:

  • Correct the algorithm that by accident did this: GEN[B] = GEN[B] union (IN[B] - KILL[B]) instead of this: OUT[B] = GEN[B] union (IN[B] - KILL[B])
  • Realize that llvm::SmallSet uses both operator== and operator<, and since for a reaching definition set I'd like to sort by VarDecl, but for the kill set that AND the CFGElementRef, give up on it and use std::set instead with two different comparators
  • Collect all non-local variables and regard all function calls as definitions to them
  • Collect parameter declarations as well, add them to GEN[entry block]
  • Plenty more test cases

Still left to do:

  • Document, cite the algorithm
  • If a pointee is written, invalidate all variables with the same type:
int i;
int *ptr;
// ...
*ptr = 0; // might point to i, regard it as i's definition
  • Tests for unreachable successors
  • Evaluate on a large codebase
  • Come up with something smart to handle FieldDecls (or just ignore them :^) )
NoQ added a comment.Jul 22 2019, 1:16 PM

CFGElementRef

Wait, is it a thing already?? Did i miss anything???

This analysis isn't conservative enough yet, I really should include function calls with non-const references into the gen set, but hey, at this point I'm optimistic that it can be done.

"Conservative" is not a property of the algorithm; it depends on how is the algorithm used. For some purposes you'll want to add an extra definition when you're not sure if it'll be a definition, for other purposes you'll want to only have definitions in which you are 100% certain in the set.

clang/lib/Analysis/ReachingDefinitions.cpp
69

I don't think you really need this. Every assignment would anyway appear in the CFG as a separate element. And also forEachDescendant() may accidentally scan items in different CFG blocks (i.e., if the whole statement is a ?: or a logical op).

In D64991#1596292, @NoQ wrote:

CFGElementRef

Wait, is it a thing already?? Did i miss anything???

Oh, yea, I have it locally, still need to iron out a couple things on that before daring to upload it :) I figured it wouldn't cause much confusion so I shared this one kinda hanging in the air.

This analysis isn't conservative enough yet, I really should include function calls with non-const references into the gen set, but hey, at this point I'm optimistic that it can be done.

"Conservative" is not a property of the algorithm; it depends on how is the algorithm used. For some purposes you'll want to add an extra definition when you're not sure if it'll be a definition, for other purposes you'll want to only have definitions in which you are 100% certain in the set.

Aye I'll try to specify on that. Thanks!

Szelethus added a subscriber: steakhal.

@steakhal you some great experience with (strict) aliasing, could you chip in on this maybe? :)

Szelethus updated this revision to Diff 211457.Jul 24 2019, 3:33 AM

Rebase on top of D65196.

Szelethus updated this revision to Diff 211509.Jul 24 2019, 7:58 AM

Use the correct sorting for the GEN, KILL, IN and OUT sets.

xazax.hun added inline comments.Jul 29 2019, 9:17 PM
clang/include/clang/Analysis/Analyses/ReachingDefinitions.h
30

In the future we might also want to reason about FieldDecls.

50

Is it safe to omit the Kind from the comparisons?

clang/lib/Analysis/ReachingDefinitions.cpp
61

Will this work if the LHS is not directly a declrefexpr?

E.g.:

(a, b) = 5;

Or more fun cases:

(cond ? a : b) = 5;
102

What about the non-const arguments of calls, and indirect writes through pointers? Even if you plan not to handle some of these, I prefer to have a TODO for these the appropriate places as soon as possible.

104

There are non-assignment operators too, like operator++. You might want to consider them as definitions too.

Hmm, we could make a redundant assignments checker: if a variable has multiple reaching definitions, but those all assign the same value, emit a warning. We could even use fixits with that.

void t(int a) {
  if (coin())
    a = 2; // note: reaching def
  else
    a = 2; // note: reaching def
  use(a); // warn: a is always 2 here
}
NoQ added a comment.Mon, Sep 9, 5:24 PM

Hmm, we could make a redundant assignments checker: if a variable has multiple reaching definitions, but those all assign the same value, emit a warning. We could even use fixits with that.

void t(int a) {
  if (coin())
    a = 2; // note: reaching def
  else
    a = 2; // note: reaching def
  use(a); // warn: a is always 2 here
}

Sounds like a useful compiler warning to me.

Also what specific fixit do you have in mind and why do you think it'll be easily obtainable from the results of the reaching definition analysis?