This is an archive of the discontinued LLVM Phabricator instance.

[CFLAA] Added an initial working implementation of Andersen's analysis
ClosedPublic

Authored by grievejia on Jul 12 2016, 4:56 PM.

Details

Summary

Sorry for the long delay. This is the initial prototype of CFLAndersAliasAnalysis. The algorithm I used here is an eager variant of what was described in the paper "Demand-driven alias analysis in C". Please refer to the comments for more detailed description of the algorithm.

For now the analysis is NOT in any usable shape. It is not sufficiently tested, can only handle intra-procedural queries, is field-insensitive, and doesn't understand AliasAttrs. There are also lots of code duplications between CFLSteens and CFLAnders. These are all problems that future patches need to address.

Diff Detail

Repository
rL LLVM

Event Timeline

grievejia updated this revision to Diff 63758.Jul 12 2016, 4:56 PM
grievejia retitled this revision from to [CFLAA] Added an initial working implementation of Andersen's analysis.
grievejia updated this object.
grievejia added a subscriber: llvm-commits.

Thanks for the patch!

lib/Analysis/CFLAndersAliasAnalysis.cpp
119 ↗(On Diff #63758)

Redundant else

120 ↗(On Diff #63758)

Is this correct with how we handle insert?

ISTM that the assert wouldn't fire in the following code:

InstantiatedValue A{0x100, 0};
InstantiatedValue B{0x200, 0};

ReachSet.insert(A, B, MatchState::Whatever);
ReachSet.insert(B, A, MatchState::Whatever);
auto BAliases = ReachSet.reachableValueAliases(B);
assert(std::distance(BAliases.begin(), BAliases.end()) == 0);

...Which looks bad. I could be misunderstanding, though.

130 ↗(On Diff #63758)

s/papter/paper. (Also, please add a period)

150 ↗(On Diff #63758)

Same "wouldn't insert make this not work" comment as in ReachabilitySet

154 ↗(On Diff #63758)

Redundant else

186 ↗(On Diff #63758)

Please be consistent with line 193 when comparing against 0. :)

198 ↗(On Diff #63758)

sort is only guaranteed to use operator< to sort its elems. Since we're using std::less<const Value *> elsewhere, do we want to use it here, as well?

216 ↗(On Diff #63758)

Same comment as sort above

255 ↗(On Diff #63758)

Maybe:

if (...)
  return NodeBelow;
return None;

would be a bit cleaner than Optional<...> Ret;?

305 ↗(On Diff #63758)

Looks like there's a fair amount of duplication between this and FlowFromMemAlias (same for FlowTo/FlowToMemAlias). We may want to clean that up at some point, assuming the order in which we propagate doesn't matter.

426 ↗(On Diff #63758)

FWIW, Optional<T>::operator-> will do this assertion for you.

lib/Analysis/CFLGraph.h
219 ↗(On Diff #63758)

I realize that there's a fair amount of duplication in this patch that we plan to kill, but it doesn't seem too difficult to merge this and addLoadEdge's implementations.

(I'm fine with having two separate functions that users call; I just don't like two nearly identical functions sitting right next to each other. :) )

grievejia marked 6 inline comments as done.Jul 14 2016, 8:36 AM
grievejia added inline comments.
lib/Analysis/CFLAndersAliasAnalysis.cpp
120 ↗(On Diff #63758)

I thought so in the beginning. But if we choose not to sort A and B and just let the insertion happen, then I'm pretty sure that half of the storage in ReachSet will be wasted, since we will store the fact that "A is reachable from B" twice.

In your case, we won't put A in B's reaching set, but we do put B in A's reaching set, so if the fixpoint is properly reached, then there shouldn't be any problem because we will eventually propagate everything and it doesn't matter whether the reachability goes through A or B. The question is, will the propagation terminate before a fixpoint is reached? That I'm not sure. More tests are needed to help me figure that out.

198 ↗(On Diff #63758)

Wow I thought std::sort uses std::less by default! What a wonderful design C++ has!

In practices I think I could just use operator< for pointer comparison. Most compiler would just accept it. The reason I used std::less is to add some portability here because strictly speaking comparing pointers to different allocations is UB according to language spec.

255 ↗(On Diff #63758)

Agreed. I wrote it this way because I am pretty sure NRVO can be triggered here. I'm not too sure if today's compiler is smart enough to RVO if different values are returned in different branches?

305 ↗(On Diff #63758)

Agreed. It shouldn't matter, and I hope it doesn't...

grievejia updated this revision to Diff 63983.Jul 14 2016, 8:37 AM
grievejia edited edge metadata.
grievejia marked an inline comment as done.

Code duplication fixes.

lib/Analysis/CFLAndersAliasAnalysis.cpp
120 ↗(On Diff #63983)

I'd much rather just put a FIXME that says "here's a neat optimization we may be able to do: [...]," because this sounds like an optimization that, if it subtly breaks things, can cost a nontrivial amount of debugging time. (...And it potentially makes code harder to reason about, since reachableValueAliases effectively returns an arbitrary subset of the reachable value aliases for a given InstantiatedValue)

To be clear, I think it's a valuable optimization if we can do it, but for now, I think it would be better to focus on getting simple, correct code. When we have a reasonable number of tests/features, we can always swap back to the "sorted" version and see if anything breaks.

255 ↗(On Diff #63983)

Seems that it's inlined everywhere, so rvo/nrvo would be a non-issue. Throwing a noinline on it generates the same machine code for either function body (with clang 3.7 + -O3 + no asserts -- didn't test in any other configuration); the IR differs a tiny bit, but that difference is apparently immaterial on x86. Allowing inlining, there are a few small differences in the produced asm, and stacks end up a handful of bytes bigger in a few cases, but that's about it.

If you want to check things like this in the future, here's what I did:

  • __attribute__((noinline)) on the function definition
  • Generated my cmake directory with -DCMAKE_EXPORT_COMPILE_COMMANDS=Yes
  • Looked for the relevant compile command inside of /path/to/build/dir/compile_commands.json, added -DNDEBUG, and threw -S -emit-llvm on it so it would hand back IR
  • Grepped the resultant IR for getNodeBelow

:)

If you prefer the way this is written, I think the readability difference is basically zero, so I'm fine with it.

grievejia updated this revision to Diff 64063.Jul 14 2016, 4:28 PM
grievejia marked 2 inline comments as done.

Removed the symmetry-based optimization.

george.burgess.iv edited edge metadata.

LGTM -- will commit later today. Thanks again!

This revision is now accepted and ready to land.Jul 15 2016, 11:25 AM
This revision was automatically updated to reflect the committed changes.