This is an archive of the discontinued LLVM Phabricator instance.

[CFLAA] Add interprocerual analysis support to CFLAndersAliasAnalysis
ClosedPublic

Authored by grievejia on Jul 17 2016, 3:07 PM.

Details

Summary

This patch finally makes CFLAnders properly build function summaries.

The approach I took was more complicated than I expected. Initially, I thought that the information we need to build the summary is all included in the ReachabilitySet, and I could just iterate over it once to populate the entire function summary. However, it turns out that we might have cases like this:

define i32* @f(i32*** %arg1) {
  %d = load i32**, i32*** arg1
  %d2 = load i32*, i32** %d
  ret i32* %d2
}

Here node (%d2, 0) does not value-alias with any other nodes in ReachabilitySet [1][2]. All we can deduce from ReachSet is that (%d, 1) is assigned to (%d2, 0), and (%arg, 1) is assigned to (%d, 0). The value %d here acts as an "intermediate value" that connects assignments from/to its different DerefLevel. Such a connection is not something we could obvious obtain from ReachabilitySet, which only holds assignments that are on the same level. If we don't check for this case, we will incorrectly conclude that the return value of @f has no relation with the parameter %arg1 of @f.

There could be many ways to resolve the above issue. The proposed solution in this patch is to build a read&write set for each value V in the function, holding param/return values that are read into or write from V, respectively. If both the read set and the write set of V are non-empty, then we know that V has been used as an "intermediate value", and we should add an cross-level assignment summary edge from everything in the write set to everything in the read set. An alternative design is to bake those read/write sets into ReachabilitySet itself so we don't have to build additional structures at summary construction time. But I decided to go with the current approach since it is relatively cleaner and easy to think about.

[1] Here I use the pair notation (x, y) to denote an InstantiatedValue with value x and dereference-level y.
[2] In theory, (%d2, 0) should alias (%arg1, 2), but our CFL matching algorithm doesn't add the node (%arg1, 2) into ReachabilitySet. The algorithm doesn't do it because (%arg1, 2) never gets explicitly referenced in the function body. If we were to add (%arg1, 2), we need to also consider questions like "why don't we add (%arg1, 3)", "at which DerefLevel should we stop adding nodes", and things may quickly get unnecessarily complicated that way.

Diff Detail

Repository
rL LLVM

Event Timeline

grievejia updated this revision to Diff 64264.Jul 17 2016, 3:07 PM
grievejia retitled this revision from to [CFLAA] Add interprocerual analysis support to CFLAndersAliasAnalysis.
grievejia updated this object.
grievejia added a subscriber: llvm-commits.

Yay tests! Thanks for the patch

lib/Analysis/CFLAndersAliasAnalysis.cpp
99 ↗(On Diff #64264)

Is there a reason these aren't static?

256 ↗(On Diff #64264)

Nit: Apparently we have a function called is_contained in STLExtras that does just this. So, can we use llvm::is_contained(RetVals, Val)) instead, please? :)

302 ↗(On Diff #64264)

I realize that Args and RetVals will generally be small, but can we use a set (for either) instead, please?

303 ↗(On Diff #64264)

Nit: is_contained

339 ↗(On Diff #64264)

Do we intend to do nothing here if it InnerMapping is e.g. FlowToReadWrite?

test/Analysis/CFLAliasAnalysis/Steensgaard/interproc-store-arg-multilevel.ll
32 ↗(On Diff #64264)

Is this a cleanup from before, or is this as a result of this change? If the former, I'll commit it separately. :)

grievejia added inline comments.Jul 19 2016, 9:42 AM
lib/Analysis/CFLAndersAliasAnalysis.cpp
302 ↗(On Diff #64264)

If the vectors are small, isn't it unlikely that sets will be faster?

339 ↗(On Diff #64264)

Yes. All XXXReadWrite states are not useful here.

The reason is, again, that (X, Y, XXXReadWrite) represent structures like "X = Z; Y = Z;". In other words, there is no transitive assignments from X to Y or from Y to X. If you reason about the inclusion relations, the points-to set of X does not have to be a subset of the points-to set of Y, and vice versa. It's therefore imprecise to add "X = Y" or "Y = X" to the function summary.

What should be added to the summary is "X=Z" and "Y=Z", but we don't know what Z is so we can't do that. It is fine that we don't know Z though: if such a Z exists, we must have (X, Z, XXXReadOnly), (Z, X, XXXWriteOnly), (Y, Z, XXXReadOnly), and (Z, Y, XXXWriteOnly) all stored somewhere in ReachSet. Z will eventually be processed when those entries are encountered

test/Analysis/CFLAliasAnalysis/Steensgaard/interproc-store-arg-multilevel.ll
32 ↗(On Diff #64264)

This is a cleanup. I made a mistake. %b must alias %lp here.

grievejia added inline comments.Jul 19 2016, 9:47 AM
lib/Analysis/CFLAndersAliasAnalysis.cpp
99 ↗(On Diff #64264)

They are defined inside anonymous namespace so I assume they're implicitly static? We did the same thing for those StratifiedIndices.

grievejia updated this revision to Diff 64514.Jul 19 2016, 9:53 AM
grievejia edited edge metadata.
grievejia marked 2 inline comments as done.

Replace std::find with is_contained

george.burgess.iv edited edge metadata.

Looks good -- thanks again! (Will commit the change to interproc-store-arg-multilevel.ll separately, as promised, to make it clear that it's unrelated)

lib/Analysis/CFLAndersAliasAnalysis.cpp
110 ↗(On Diff #64514)

That makes sense, then; generally the preference in LLVM tends to be "only put structs/classes in anonymous namespaces; make everything else static," but so long as these aren't externally visible, I'm happy.

313 ↗(On Diff #64514)

Yeah, I guess I don't see a real-world case where either of these will (practically) be huge, so you're right. :)

This revision is now accepted and ready to land.Jul 19 2016, 1:44 PM
This revision was automatically updated to reflect the committed changes.