This is an archive of the discontinued LLVM Phabricator instance.

Add predicateinfo intrinsic, analysis pass, and basic NewGVN support
AbandonedPublic

Authored by dberlin on Jan 30 2017, 9:23 PM.

Details

Summary

I'll split this patch up for review, just starting a discussion and
putting something people can play with.

This patch adds a pass to build extended SSA (see "ABCD: eliminating
array bounds checks on demand"), and an intrinsic to support it. This
is then used to get functionality equivalent to propagateEquality in
GVN, in NewGVN (without having to replace instructions as we go). It
would work similarly in SCCP or other passes. This has been talked
about a few times, so i built a real implementation and tried to
productionize it.

The intrinsic is essentially a copy intrinsic which the passes uses to
attach data to (depending on what predicates apply). It is marked as
readnone and returning it's first argument (which is the operand it's
a copy of).

Copies are inserted for operands used in assumes and conditional
branches that are based on comparisons (see below for more)

Every use affected by the predicate is renamed to the appropriate
intrinsic result.

E.g.
%cmp = icmp eq i32 %x, 50
br i1 %cmp, label %true, label %false
true:
ret i32 %x
false:
ret i32 1

will become

%cmp = icmp eq i32, %x, 50
br i1 %cmp, label %true, label %false
true:
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
%x.0 = call @llvm.predicateinfo.i32(i32 %x)
ret i32 %x.0
false:
ret i23 1

(you can use -print-predicateinfo to get an annotated-with-predicateinfo dump)

This enables us to easily determine what operations are affected by a
given predicate, and how operations affected by a chain of
predicates.

The intrinsic, marked as returning it's first argument, has no code
generation effect (though currently not every optimization pass knows
that intrinsics with the returned attribute can be looked through).

We deliberately do not attach any info through a second operand so
that the intrinsics do not need to dominate the comparisons/etc (since
in the case of assume, we may want to push them up the post-dominator
tree)

The actual pass stores data about each renamed operand. For operands
used in comparisons in branches, it stores what edge you are in (true
or false), the original comparison, and both parts of the branch edge
(IE the branch block and the successor block for this edge). This is
done so they can be moved without worrying and we can detect they are
invalid. For operands renamed due to assumes, we store where the
assume is.

The pass does not do insertions in blocks unless the operand with
predicateinfo is used there.

(see the false edge above)

Time wise, the pass is O(defs+uses of operands of branch ending
comparisons/assumes). For a CFG that contains 2 million blocks and 1
million icmps, it takes 600ms to insert predicate info. The largest
amount of time is acutally spent in the depth first iterator walking
the dominator tree and inserting blocks into the visited set (which is
a waste of time since it's a tree, sadly). Most passes simply crash
on this file :) Renaming uses takes about 150ms. (timing on most
normal testcases is not really measurable)

We could make it possible to insert only for certain operations if we
want, and at least what i've seen so far, it would be fast enough to
not worry horribly about. Given how fast it is, I have not worried
about updating. Currently, NewGVN nowunderstands the returned
attribute, so it destroys them all. Also as mentioned, we can detect
if they've been invalidated by being moved. Just moving them would
not actually invalidate them (since they have all the info in them
necessary to give correct answers), the only thing that actually would
make them do something bad would be to add uses of them that *aren't*
dominated by one of the edges.

Note that if we decide we don't want to go this direction for some
reason, i would likely make this private to NewGVN or something (i
have another way to do what it does, but it's not as nice).

We also may want to centralize some of the knowledge about what things
imply what (IE have "getConstantValue" or "getEquivalentNames" or
various things, otherwise people have to use isTrueWhenEqual and
isImpliedTrue, everywhere. We have a bunch of missed opt reports about
the places that try to do this but don't catch every case, etc).

Event Timeline

dberlin created this revision.Jan 30 2017, 9:23 PM
sanjoy edited edge metadata.Jan 31 2017, 12:38 AM

Just moving them would not actually invalidate them (since they have all the info in them necessary to give correct answers)

Wouldn't speculating them be a problem? That is:

int x = ...;
if (0 s< x s< 20) {
  x_ = predicate_info(x) // attached icmp 0 s< x s< 20
  r0 = 1 / (x_ + 1);
  print r0;
}

to

int x = ...;
x_ = predicate_info(x) // attached icmp 0 s< x s< 20
if (0 s< x s< 20) {
  r0 = 1 / (x_ + 1);
  print r0;
}

to (division safe to speculate since it divides by [1, 20) + 1 =
[2, 21)):

int x = ...;
x_ = predicate_info(x) // attached icmp 0 s< x s< 20
r0 = 1 / (x_ + 1);
if (0 s< x s< 20) {
  print r0;
}

but now you could divide by zero if x was -1.

Generally, (as I've mentioned on IRC) the only concern I had with this was the compile time hit we may have because of the extra dereferences that will now be necessary to go from a use to its "true" operand.

Generally, (as I've mentioned on IRC) the only concern I had with this was the compile time hit we may have because of the extra dereferences that will now be necessary to go from a use to its "true" operand.

Noticed a potential ambiguity here -- I meant going from a use to the value that use actually uses, skipping all of the intermediate predicate_info calls in the use chain.

  • Remove dead code for placing single argument phi nodes

Just moving them would not actually invalidate them (since they have all the info in them necessary to give correct answers)

Wouldn't speculating them be a problem? That is:

int x = ...;
if (0 s< x s< 20) {
  x_ = predicate_info(x) // attached icmp 0 s< x s< 20
  r0 = 1 / (x_ + 1);
  print r0;
}

to

int x = ...;
x_ = predicate_info(x) // attached icmp 0 s< x s< 20
if (0 s< x s< 20) {
  r0 = 1 / (x_ + 1);
  print r0;
}

It actually knows it belongs inside the if block, it stores the branch block and successor block it belonged to in the info. They are placed at very specific points, so it's also possible to make the retrieval function verify they have not been moved before returning info (IE it can choose to either not give an answer, or determine the answer is still valid). We could do this in passes that may move them.

(Probably not return an answer).

So in your above case, if you did this, we would say nothing, and it becomes equivalent to what it was before.

The bigger problem right now is that nothing is looking through the returned intrinsic, but i can easily fix that in any pass we add predicateinfo to.

Generally, (as I've mentioned on IRC) the only concern I had with this was the compile time hit we may have because of the extra dereferences that will now be necessary to go from a use to its "true" operand.

Sure, which is why, for the moment, i'd probably start by cleaning up our existing usage of this type of info where possible, rather than just add it in new places.

Generally, our passes are based on algorithms that assume a single ssa "name" (for those not familiar, these are not names in llvm, but it's easier to talk about names and values than Value and values :P) has a single value that can be determined across the function.

To handle the cases predicateinfo handles (IE where this is not true), they generally take one of a few approaches:

  1. They eagerly try to discover and propagate this info by replacing uses (This is used in part by GVN and EarlyCSE). This tradeoff makes them unsuitable to be used as analysis, and hard to use on any sub-portion of the CFG
  2. They maintain complex data structures that try to say what the value of a given name is in different blocks. This is often expensive to maintain (scoped hash tables, which have to be rebuilt each time due to popping scopes), or expensive to do lookups in (log n if it's an interval tree of dfs numbers, whereas gvn's findleader takes O(N)). Sometimes, they maintain multiple ones of these, in conjunction with #1 (GVN is worse than earlycse here).

I tried this approach in NewGVN on a branch, and it's ... a mess to do as an analysis.

  1. They try to compute the info lazily to avoid the problems of #1 and #2. This is expensive in a different way.
  2. They do nothing, and get worse results (SCCP takes this approach)

The reason, btw, they have to do #1 and #2 at the same time is to handle simple cases like this:

define i32 @test1(i32 %x) {
    %cmp = icmp eq i32 %x, 50
    br i1 %cmp, label %true, label %false
true:
    ret i32 %x
false:
    ret i32 1
}

%x has a different value in the true block, but has no name for that value. So either they have to do eager replacement, or analysis and then separate lookups for each use + general instruction rewriting :(
They all choose the former because the latter is expensive.
With predicateinfo, there is a new name there, so you don't have to do either.
The main cases predicateinfo doesn't get are critical edge cases (which are fixable)

If we can cleanup a bunch of the existing approaches without compile time loss, it's IMHO a win even if we compute it a few times and don't update it.

davide edited edge metadata.Feb 1 2017, 11:34 AM

Before I review, some quick comments/questions.

Just moving them would not actually invalidate them (since they have all the info in them necessary to give correct answers)

Wouldn't speculating them be a problem? That is:

int x = ...;
if (0 s< x s< 20) {
  x_ = predicate_info(x) // attached icmp 0 s< x s< 20
  r0 = 1 / (x_ + 1);
  print r0;
}

to

int x = ...;
x_ = predicate_info(x) // attached icmp 0 s< x s< 20
if (0 s< x s< 20) {
  r0 = 1 / (x_ + 1);
  print r0;
}

It actually knows it belongs inside the if block, it stores the branch block and successor block it belonged to in the info. They are placed at very specific points, so it's also possible to make the retrieval function verify they have not been moved before returning info (IE it can choose to either not give an answer, or determine the answer is still valid). We could do this in passes that may move them.

(Probably not return an answer).

So in your above case, if you did this, we would say nothing, and it becomes equivalent to what it was before.

The bigger problem right now is that nothing is looking through the returned intrinsic, but i can easily fix that in any pass we add predicateinfo to.

Generally, (as I've mentioned on IRC) the only concern I had with this was the compile time hit we may have because of the extra dereferences that will now be necessary to go from a use to its "true" operand.

Sure, which is why, for the moment, i'd probably start by cleaning up our existing usage of this type of info where possible, rather than just add it in new places.

I agree.

Generally, our passes are based on algorithms that assume a single ssa "name" (for those not familiar, these are not names in llvm, but it's easier to talk about names and values than Value and values :P) has a single value that can be determined across the function.

To handle the cases predicateinfo handles (IE where this is not true), they generally take one of a few approaches:

  1. They eagerly try to discover and propagate this info by replacing uses (This is used in part by GVN and EarlyCSE). This tradeoff makes them unsuitable to be used as analysis, and hard to use on any sub-portion of the CFG
  2. They maintain complex data structures that try to say what the value of a given name is in different blocks. This is often expensive to maintain (scoped hash tables, which have to be rebuilt each time due to popping scopes), or expensive to do lookups in (log n if it's an interval tree of dfs numbers, whereas gvn's findleader takes O(N)). Sometimes, they maintain multiple ones of these, in conjunction with #1 (GVN is worse than earlycse here).

I tried this approach in NewGVN on a branch, and it's ... a mess to do as an analysis.

  1. They try to compute the info lazily to avoid the problems of #1 and #2. This is expensive in a different way.

Are you thinking about CVP?

  1. They do nothing, and get worse results (SCCP takes this approach)

I really think SCCP should learn about this.

The reason, btw, they have to do #1 and #2 at the same time is to handle simple cases like this:

define i32 @test1(i32 %x) {
    %cmp = icmp eq i32 %x, 50
    br i1 %cmp, label %true, label %false
true:
    ret i32 %x
false:
    ret i32 1
}

%x has a different value in the true block, but has no name for that value. So either they have to do eager replacement, or analysis and then separate lookups for each use + general instruction rewriting :(
They all choose the former because the latter is expensive.
With predicateinfo, there is a new name there, so you don't have to do either.
The main cases predicateinfo doesn't get are critical edge cases (which are fixable)

If we can cleanup a bunch of the existing approaches without compile time loss, it's IMHO a win even if we compute it a few times and don't update it.

silvas added a subscriber: silvas.

One key issue here is that this analysis mutates the IR, which is rarely done in LLVM, and I think we are trying to remove all cases of doing that (as Chandler puts it in http://llvm.org/devmtg/2014-04/PDFs/Talks/Passes.pdf slide 17 "Forms a new, sub-set IR, which is problematic"). On that point, you probably want to start a discussion on LLVMdev (with a link to this review). I anticipate at least Chandler is going to have something to say about this w.r.t the new PM, so make sure to CC him.

Sure, which is why, for the moment, i'd probably start by cleaning up our existing usage of this type of info where possible, rather than just add it in new places.

Generally, our passes are based on algorithms that assume a single ssa "name" (for those not familiar, these are not names in llvm, but it's easier to talk about names and values than Value and values :P) has a single value that can be determined across the function.

To handle the cases predicateinfo handles (IE where this is not true), they generally take one of a few approaches:

  1. They eagerly try to discover and propagate this info by replacing uses (This is used in part by GVN and EarlyCSE). This tradeoff makes them unsuitable to be used as analysis, and hard to use on any sub-portion of the CFG
  2. They maintain complex data structures that try to say what the value of a given name is in different blocks. This is often expensive to maintain (scoped hash tables, which have to be rebuilt each time due to popping scopes), or expensive to do lookups in (log n if it's an interval tree of dfs numbers, whereas gvn's findleader takes O(N)). Sometimes, they maintain multiple ones of these, in conjunction with #1 (GVN is worse than earlycse here).

I tried this approach in NewGVN on a branch, and it's ... a mess to do as an analysis.

I'm glad to hear that your tried this. My understanding is that traditionally in LLVM we try to keep things inferred from the IR cached in analyses (and only using explicit annotations for things coming in from the frontend). But it seems that what you are saying is that this is really nontrivial to do in this case, with the crux being that you need a primitive for creating a new SSA name to sparsely maintain the information you need (with the rest cached in the analysis). I think it would be better to call the intrinsic "new ssa name" or something which focuses on the primitive mechanism it is providing, rather than one user (can't think of other users off the top of my head though, though I haven't tried very hard). Still, the issue of mutating the IR to insert these new SSA names is quite thorny and deserves a thread on llvm-dev IMO; this use case seems pretty compelling.

One key issue here is that this analysis mutates the IR, which is rarely done in LLVM, and I think we are trying to remove all cases of doing that (as Chandler puts it in http://llvm.org/devmtg/2014-04/PDFs/Talks/Passes.pdf slide 17 "Forms a new, sub-set IR, which is problematic"). On that point, you probably want to start a discussion on LLVMdev (with a link to this review). I anticipate at least Chandler is going to have something to say about this w.r.t the new PM, so make sure to CC him.

Done

Sure, which is why, for the moment, i'd probably start by cleaning up our existing usage of this type of info where possible, rather than just add it in new places.

Generally, our passes are based on algorithms that assume a single ssa "name" (for those not familiar, these are not names in llvm, but it's easier to talk about names and values than Value and values :P) has a single value that can be determined across the function.

To handle the cases predicateinfo handles (IE where this is not true), they generally take one of a few approaches:

  1. They eagerly try to discover and propagate this info by replacing uses (This is used in part by GVN and EarlyCSE). This tradeoff makes them unsuitable to be used as analysis, and hard to use on any sub-portion of the CFG
  2. They maintain complex data structures that try to say what the value of a given name is in different blocks. This is often expensive to maintain (scoped hash tables, which have to be rebuilt each time due to popping scopes), or expensive to do lookups in (log n if it's an interval tree of dfs numbers, whereas gvn's findleader takes O(N)). Sometimes, they maintain multiple ones of these, in conjunction with #1 (GVN is worse than earlycse here).

I tried this approach in NewGVN on a branch, and it's ... a mess to do as an analysis.

I'm glad to hear that your tried this. My understanding is that traditionally in LLVM we try to keep things inferred from the IR cached in analyses (and only using explicit annotations for things coming in from the frontend).

IMHO, this has turned out to be a bad strategy for a number of things, and a good one for others.
I don't think we should pretend it is the right tradeoff for everything.

But it seems that what you are saying is that this is really nontrivial to do in this case, with the crux being that you need a primitive for creating a new SSA name to sparsely maintain the information you need (with the rest cached in the analysis). I think it would be better to call the intrinsic "new ssa name" or something which focuses on the primitive mechanism it is providing, rather than one user (can't think of other users off the top of my head though, though I haven't tried very hard). Still, the issue of mutating the IR to insert these new SSA names is quite thorny and deserves a thread on llvm-dev IMO; this use case seems pretty compelling.

Would you feel the same way if i just made it a pass or a utility?

We have plenty of both that are required that mutate the IR.
(like, for example, LCSSA)
The reason it's not a pass is because passes can't return results from pass.
It could be a utility, right up until we try to decide we want to update it.

ATM, it's fast enough i don't think we have to do that, but it's hard to predict the future.
Truthfully, on the large cfg testcase, most things take 10+ minutes, and we take 600ms.
(dominators takes about the same).
I'm not sure we could really make it faster through updating, we'd just avoid invalidation since most changes are non-destructive.

minor fixes

include/llvm/Transforms/Utils/PredicateInfo.h
157–158

redundant.

245

take unique_ptr by value. This ensures that the pointer value sinks (and it is also less characters to read, and it is easier for the optimizer to optimize it)

lib/Transforms/Scalar/NewGVN.cpp
1124

auto *

lib/Transforms/Utils/PredicateInfo.cpp
87–88

Is it ok here to copy the map?

135–136

auto

261

auto

557

extra line betwwen

558–559

Why just not store unique_ptrs inside the map?

dberlin added inline comments.Feb 2 2017, 10:54 AM
lib/Transforms/Utils/PredicateInfo.cpp
87–88

No.We really don't want it copying the map

558–559

Yeah, i'll fix it.

dberlin marked 8 inline comments as done.Feb 2 2017, 11:29 AM
dberlin added inline comments.
include/llvm/Transforms/Utils/PredicateInfo.h
245

This does not actually work, afaict.
If you try it, it does not compile no matter what because unique ptrs can't be copied by value, only moved.

If you want me to do something here, exact code appreciated.

This is also a direct copy of the idiom we are using in other analysis passes :)

lib/Transforms/Scalar/NewGVN.cpp
1124

As the summary says, this part of code is mainly here so folks can play with it.
I plan on submitting the newgvn changes separately from the rest, and will clean them up prior to submission.

  • Remove dead code for placing single argument phi nodes
  • Fix some merge errors
  • Update for review comments
dberlin updated this revision to Diff 87027.Feb 3 2017, 2:27 PM
  • Move from Analysis to utility
  • NewGVN: Don't merge metadata when replacing predicateinfo with original operand
dberlin abandoned this revision.Feb 3 2017, 2:29 PM

About to abandon this in favor of split patches.

Prazek added inline comments.Feb 3 2017, 4:33 PM
lib/Transforms/Utils/PredicateInfo.cpp
87–88

I will have to look at it later. It looks a little bit suspisous, that this class modifies a map that it doesn't own, doing it in const method. It is just something that I would not expect from name like this.

Is this map used after using this class? I haven't checked it, but if it worked with copying, then probably not, which means that this map could be taken by &&, and own it, without copying.

Prazek added inline comments.Feb 3 2017, 4:35 PM
include/llvm/Transforms/Utils/PredicateInfo.h
245

I am not sure if the code dissapeard, but code like this:

void take(std::unique_ptr<int> p) {
}


void push() {
  std::unique_ptr<int> a;

  take(std::move(a));
}

Compiles fine, so I would guess it should work, but I will check it with your next reviews.