This is an archive of the discontinued LLVM Phabricator instance.

LLVM CFL Alias Analysis -- Supporting Data Structures
AbandonedPublic

Authored by george.burgess.iv on Jul 16 2014, 6:38 PM.

Details

Summary

For the motivation for this patch, please see the primary CFL Alias Analysis patch titled "LLVM CFL Alias Analysis -- Algorithm"

This was split off from the primary CFL AA patch because a ~1.5KLOC review is never fun, and this can be cleanly committed prior to committing the "main" CFL AA patch.

This contains the implementation of a weighted, bidirected graph and of a stratified sets algorithm. For more information, see here: http://www.cse.cuhk.edu.hk/lyu/_media/paper/pldi2013.pdf

Diff Detail

Event Timeline

gbiv updated this revision to Diff 11548.Jul 16 2014, 6:38 PM
gbiv retitled this revision from to LLVM CFL Alias Analysis -- Supporting Data Structures.
gbiv updated this object.
gbiv edited the test plan for this revision. (Show Details)
gbiv added a project: deleted.
gbiv added a subscriber: Unknown Object (MLST).
dberlin edited edge metadata.Jul 17 2014, 2:03 PM

Just some preliminary comments

lib/Analysis/StratifiedSets.h
95

For this implementation, it would be good to explain what the impact of having something above or below something else is. You later use the "below" property to ensure proper marking of externals aliasing, but without an explanation of what the impact of being below something is, it's not obvious to a reader that what you do is right.

409

An explanation of what exactly this worklist algorithm is doing, and why it is right, would be good :)

gbiv updated this revision to Diff 11663.Jul 18 2014, 12:26 PM
gbiv edited edge metadata.

Addressed dberlin's comments, swapped std::map->DenseMap and std::set->DenseSet, and took out some unnecessary headers.

Hi George,

Thanks for working on this! I apologize ahead of time for not entering these comments on the web site (I was composing this review on the plane). As a general note, you'll need to fix a bunch of compliance issues with LLVM's coding conventions (variable naming, braces, etc.), and while I pointed out a couple of these, I quickly stopped.

+ \brief These are stratified sets, as described in the paper located at
+
www.cse.cuhk.edu.hk/lyu/_media/paper/pldi2013.pdf -- in short, this is meant

This looks like an unofficial link that could become stale. I recommend including the full reference (a search engine can find the PDF, if it is still available, at some future time).

+ class StratifiedInfo;
+
+ class StratifiedInfo {
+ public:

You don't need to pre-declare this one line before defining it. Also, make it a struct.

+// This builder allows the user to add elements with, (at the same level as)

Remove the comma prior to the parenthesis.

+// In the paper above, StratifiedSets are described as a data structure wherein

"In the paper above, StratifiedSets are described as" -> "StratifiedSets are"

+ constexpr static auto SetSentinel =
+ std::numeric_limits<StratifiedIndex>::max();

Add a comment explaining how this is used.

+ and the set deemed as below it. This also keeps information on the notion
+
of "remapping",

What does it mean to "keep information on the notion of" somthing? Does this means it manages the state associated with state remapping?

+ elements that have parents in the union-find structure. This is super
+
difficult to debug, so we use accessors.

You don't need to justify the use of accessors.

+ void setBelow(StratifiedIndex i) {
+ assert(!isRemapped());
+ below = i;
+ }

Please assert that i != SetSentinel. Same applies to setAbove and remapTo.

+ // aliasedExternal without that check. So we have this.

Remove "So we have this."

+ bool hasAliasedExternalRaw() const { return aliasedExternal; }

It sounds like this should not be public. Make it private and make this a friend of whatever needs access.

+ // \brief For initial remapping to another set
+ void remapTo(StratifiedIndex other) {

What's the reason for using the term "remapping" to describe the set unioning operation? There seem to be better choices (union, join, group; maybe some term related to proxying would be more appropriate?).

+ Removing dead sets may not be worth it -- "saves" us 20 bits in the average
+
case (51 sets total) and 4KB in the worst case (out of 1.07M sets == ~133KB
+ // total) when bootstrapping LLVM.

I don't understand this comment in the context of the code. It saves us bits where?

+ while (current->hasAbove()) {
+ current = &linksAt(current->getAbove());
+ }

Remove the {}

+ DenseSet<SetLinks *> visited;

SmallPtrSet<SetLinks, 16> Visisted; (pick some number you feel is reasonable, 16 is just a guess on my part) -- unless you expect the set to be pretty large.

+ bool hasExternals = false;

HasExternals (please follow the LLVM coding conventions for naming variables)

+ if (!insertSuccess) {
+ continue;
+ }

Remove {} (please follow the LLVM coding conventions in this regard)

+ \brief We store external aliasing information as part of SetLinks, when
+
it's way more space efficient to store it as a bitset. This generates said
+ // bitset.

This comment implies that this function moves information into a bitset to save space, but it does not. It simply creates the bitset, leaving the original information in the links unchanged.

+ std::vector<bool> aliasingExternals;
+ aliasingExternals.resize(links.size());

Set the size using the constructor.

+ assert(&linksAt(idx1) != &linksAt(idx2));

This assert needs some text because it is not obvious what it is asserting. Make it something like:

assert(&linksAt(idx1) != &linksAt(idx2) && "Trying to merge two sets with the same <fill in here>");

+ // CASE 1: If into is above or below from, we need to merge both the

Which one is 'into'?

+ We want directional searches up/down, but for the initial case, we want
+
to search both up and down.

Please clarify what 'up/down' means (does this mean 'up or down' or 'up then down').

+ // Funny story: the current implementation of this function makes it

Yea, you know ;)

+ TODO: Maybe find a better way to do this than a forest of if
+
statements?

This will look somewhat better once you remove all of the unnecessary {}

+ // TODO: Maybe restructure? I don't like public->private->public

Either do it or remove the comment.

-Hal

  • Original Message -----

From: "George Burgess IV" <gbiv@google.com>
To: gbiv@google.com, chandlerc@gmail.com, nlewycky@google.com, dberlin@dberlin.org
Cc: llvm-commits@cs.uiuc.edu
Sent: Friday, July 18, 2014 2:26:40 PM
Subject: Re: [PATCH] LLVM CFL Alias Analysis -- Supporting Data Structures

Addressed dberlin's comments, swapped std::map->DenseMap and
std::set->DenseSet, and took out some unnecessary headers.

http://reviews.llvm.org/D4550

Files:

lib/Analysis/StratifiedSets.h
lib/Analysis/WeightedBidirectedGraph.h

llvm-commits mailing list
llvm-commits@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

sanjoy added a subscriber: sanjoy.Jul 29 2014, 1:08 AM

Totally bike-shedding comment on terminology: StratifiedSets.h uses terms that both leak the fact that we're manipulating memory expressions and aliasing information (hasAliasingExternal) and terms that try to hide it (up, down). It may be clearer to rename either hasAliasingExternal to something more general (some property that is transitive in the "low" direction) or up / down to something more specific (addressof / dereference).

lib/Analysis/StratifiedSets.h
136

Should this be %k = select %c, %ad, %bp?

423

Why only update start here? Can't the whole sequence of SetLinks remapped to the final index? If it is beneficial to updateRemap only start, why not do it outside the loop, once we have the first SetLink that isn't remapped?

495

If I understood the logic correctly here, maybe this can be written not as worklist algorithm, but simply as two loops, each in one direction? So something structurally like this:

StratifiedIndex linksIntoIdx = idx1, linksFromIdx = idx2;
while (true) {
  auto &linksInto = linksAt(linksIntoIdx);
  auto &linksFrom = linksAt(linksFromIdx);
  linksInto.remapTo(linksFrom.number);

  if (linksInto.hasAbove() && linksFrom.hasAbove()) {
    linksIntoIdx = linksInto.getAbove();
    linksFromIdx = linksFrom.getAbove();
  } else {
    break;
  }
}

if (linksAt(linksIntoIdx).hasAbove() || linksAt(linksFromIdx).hasAbove()) {
  auto &topLink =
      linksAt(linksIntoIdx).hasAbove() ?
      linksAt(linksIntoIdx).getAbove() : linksAt(linksFromIdx).getAbove();
  topLink.setBelow(linksIntoIdx);
  linksAt(linksIntoIdx).setAbove(topLink);
}

// ... and a second loop in the below direction

If the logic cannot be expressed that way then a comment on why that is the case would help. :)

gbiv updated this revision to Diff 12000.Jul 29 2014, 4:25 PM

Updated code to better match LLVM style, addressed most/all of hfinkel's and sanjoy's concerns.

Responses to a few of the higher-level comments:
hfinkel:

What does it mean to "keep information on the notion of" somthing? Does this means it manages the state associated with state remapping?

Comment updated in an effort to better describe what I'm doing. :)

What's the reason for using the term "remapping" to describe the set unioning operation? There seem to be better choices (union, join, group; maybe some term related to proxying would be more appropriate?).

It seemed like it fit what was happening well enough, because our goal is to effectively treat the "remapped" set as deleted, and a proxy a different set. I do like the terms "ProxyFor" and "Proxies" more though, so I'll swap to one of the two in the next rev.

sanjoy:

{Terminology comment}

Agreed. The notion of an aliasingExternal will be removed in the next revision, because, in implementing interprocedural analysis (coming soon to a code review near you), a more generic approach was necessary. I haven't quite settled on a name for it yet, so if you have any suggestions for names of properties that are transitive in the "low" direction, I'd be happy to hear them. :)

Why only update start here? ...

It only updated start because doing so was correct and simple to code. I see your point though, so I swapped over to updating everything.

gbiv updated this revision to Diff 12177.Aug 4 2014, 1:44 PM

Modified data structures to support interprocedural analysis.

In more specific terms:

Added set relationship data (aboveness/belowness) into StratifiedSets
Expanded aliasExternals to a more generic "Attributes"/"Attrs" type, so we can more accurately express whether uncertainty is coming from arguments or globals
We now remove sets that have been remapped at build-time so we conserve as much space as possible, because now the space penalty of letting a "remapped" set slip passed build time has gone up from a single bit (field in aliasingExternals bitset) to ~12 bytes.

Looks good to me, modulo one issue, but you should wait for Nick or
Hal to give you a final okay.

I know both are busy, so it may be a bit.

As for my last comment on this one:

The attributes mechanism is not entirely well described.
If the data structure is meant to truly be independent of the Aliasing
related stuff, then your description should just be "these are
attributes that builders may use for various purposes. The semantics
are <x> on merging". or something similar.
If they aren't independent, it probably pays to give them a bit more
thought and structure, but i don't think that's worth the time quite
yet, until we know more about which end up useful/don't (IE whether
you want additional attributes"

gbiv updated this revision to Diff 12241.Aug 6 2014, 11:21 AM
gbiv added a subscriber: george.burgess.iv.

+Personal account (internship ends on Friday)

Addressed dberlin's feedback + fixed a merging bug/renamed some attribute-related methods.

nlewycky edited edge metadata.Aug 20 2014, 7:11 PM

For StratifiedSets, I need more documentation (*cough* or to read the paper) in order to continue reviewing it.

WeightedBidirectedGraph.h looks good to me, though I'd prefer "Bidirectional" (optional).

lib/Analysis/StratifiedSets.h
9

Missing high-level conceptual description of stratified sets.

37

Period to end complete sentence.

Also, the reader is left wondering what attributes are and why it should contain 32?

171–173

Extra space between then and recursively.
Extra space between below and %a.
Extra space between would and end.

338

propogateAttrs -> propagateAttrs

389

These methods need docstrings.

396

To check my understanding, this is what causes a cycle (attempting to insert A above B given that B is already above A) to be flattened into a single, uh, "stratus"?

lib/Analysis/WeightedBidirectedGraph.h
1

"Bidirected"? Bidirectional?

1

I'm not convinced this should be pulled out of CFL Alias Analysis.

LLVM's graph infrastructure has:
a) generic algorithms that operate over child_begin/child_end iterators
b) wrappers that are constructed from other objects and provide child_begin/end.
Sometimes the wrappers in (b) are null wrappers the implementation of the graph APIs are intrusive upon the underlying classes.

This doesn't fit that model at all, this is a graph data type. If this needs to exist at all (ie., we can't lazily construct it from other objects we have handy like the IR) then it should be local to CFLAliasAnalysis. That could mean creating a new CFGAlias directory to put this file in.

This is looks pretty good. Once the comments from the others reviewers are addressed, I think this can go in.

lib/Analysis/StratifiedSets.h
195

Don't break the line here.

lib/Analysis/WeightedBidirectedGraph.h
1

I agree. It is also not that big and could probably just go into the main source file too.

george.burgess.iv commandeered this revision.Feb 3 2016, 11:50 AM
george.burgess.iv added a reviewer: gbiv.

Commandeering to abandon; this went in a long time ago as part of D5106.

george.burgess.iv abandoned this revision.Feb 3 2016, 11:50 AM