Page MenuHomePhabricator

Introduce llvm.experimental.widenable_condition intrinsic
ClosedPublic

Authored by mkazantsev on Aug 24 2018, 1:12 AM.

Details

Summary

This patch introduces a new instinsic @llvm.experimental.widenable_condition
that allows explicit representation for guards. It is an alternative to using
@llvm.experimental.guard intrinsic that does not contain implicit control flow.

We keep finding places where @llvm.experimental.guard is not supported or
treated too conservatively, and there are 2 reasons to that:

  • @llvm.experimental.guard has memory write side effect to model implicit control flow, and this sometimes confuses passes and analyzes that work with memory;
  • Not all passes and analysis are aware of the semantics of guards. These passes treat them as regular throwing call and have no idea that the condition of guard may be used to prove something. One well-known place which had caused us troubles in the past is explicit loop iteration count calculation in SCEV. Another example is new loop unswitching which is not aware of guards. Whenever a new pass appears, we potentially have this problem there.

Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a deopt block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.

This patch introduces a new explicit form of guards alternative to @llvm.experimental.guard
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:

  %widenable_condition = call i1 @llvm.experimental.widenable.condition()
  %new_condition = and i1 %cond, %widenable_condition
  br i1 %new_condition, label %guarded, label %deopt

guarded:
  ; Guarded instructions

deopt:
  call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

The new intrinsic @llvm.experimental.widenable.condition has semantics of an
undef, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to br instuction, and it still can be
widened by replacing the result of @llvm.experimental.widenable.condition()
with and with any arbitrary boolean value (as long as the branch that is taken when
it is false has a deopt and has no side-effects).

For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".

This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sanjoy removed a reviewer: sanjoy.Aug 24 2018, 9:06 AM
sanjoy added a subscriber: sanjoy.

Conceptually this seems fine to me, but I won't have time to do a proper review.

The code looks fine.
I see that all your tests have empty deopt bundles.
Wouldnt it make sense to add something there, just in case?

include/llvm/IR/Intrinsics.td
833 ↗(On Diff #162330)

for guards represented as explicit branches

include/llvm/Transforms/Scalar/ExplicifyGuards.h
1 ↗(On Diff #162330)

"intinrics" -> intrinsics

mkazantsev marked 2 inline comments as done.

Fixed comments, added bundles to tests.

fedor.sergeev accepted this revision.Sep 4 2018, 2:03 AM

Thanks for test update.
LGTM (with new pm nit as noted).

lib/Transforms/Scalar/ExplicifyGuards.cpp
119 ↗(On Diff #163759)

PreservedAnalyses::all() ?

This revision is now accepted and ready to land.Sep 4 2018, 2:03 AM
mkazantsev added inline comments.
lib/Transforms/Scalar/ExplicifyGuards.cpp
119 ↗(On Diff #163759)

Eh, yes. Thanks for pointing out!

mkazantsev updated this revision to Diff 163770.Sep 4 2018, 2:48 AM
mkazantsev marked an inline comment as done.

Fixed analysis preservation.

apilipenko added inline comments.Sep 4 2018, 2:56 PM
docs/LangRef.rst
15097–15099 ↗(On Diff #163770)

Why do you need this limitation?

reames requested changes to this revision.Sep 5 2018, 7:18 PM

Overall, looks pretty good. One rounds of comments, but I'm expecting this to converge quickly.

I think there's a missing piece here. How do widennable_conditions themselves get lowered? I think that belongs to be in this patch so that it's a whole contained, end to end piece.

A reasonable choice would be to say that CGP or SelectionDAG just lowers them directly to the constant false.

docs/LangRef.rst
15109 ↗(On Diff #163770)

The text here needs tweaked. I'd suggest:
%widenable_cond_orig = call i1 @llvm.experimental.widenable.condition()
%widenable_cond = and i1 %widenable_cond_orig, %any_other_cond

15119 ↗(On Diff #163770)

You're wording here is problematic. As written, it's only legal to lower *if* the else leads to a deopt block which I'm pretty sure is not what you meant. I think you can just drop everything after "either true or false."

15125 ↗(On Diff #163770)

This doesn't sound like part of lowering.

Why don't you move this to semantics and reword it as:
"wcond" will never throw an exception and thus cannot be invoked.

include/llvm/Transforms/Scalar/ExplicifyGuards.h
1 ↗(On Diff #163770)

I'm not actively objecting to the structure here, but this really just feels like another form of guard lowering. Maybe if would be better to reuse the existing implementation and just have a parameter to control which form we're lowing to? LowerGuards vs LowerGuardsToWidennableConds?

lib/Transforms/Scalar/ExplicifyGuards.cpp
80 ↗(On Diff #163770)

"explicify" as a verb feels very awkward. "MakeGuardsExplicit" would be more clear.

This revision now requires changes to proceed.Sep 5 2018, 7:18 PM
mkazantsev added inline comments.Sep 6 2018, 8:02 PM
docs/LangRef.rst
15097–15099 ↗(On Diff #163770)

As stated above, we want these two constructions do exactly the same thing:

; Unguarded instructions
 call void @llvm.experimental.guard(i1 %cond, <args...>) ["deopt"(<deopt_args...>)]
 ; Guarded instructions

and

block:
  ; Unguarded instructions
  %widenable_condition = call i1 @llvm.experimental.widenable.condition()
  %new_condition = and i1 %cond, %widenable_condition
  br i1 %new_condition, label %guarded, label %deopt

guarded:
  ; Guarded instructions

deopt:
  call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

If there is something side-effecting before deoptimization in deopt block, these two constructions are obviously not equivalent.

mkazantsev marked 3 inline comments as done.Sep 7 2018, 12:46 AM
mkazantsev added inline comments.
include/llvm/Transforms/Scalar/ExplicifyGuards.h
1 ↗(On Diff #163770)

I had another model in my head. This pass will make guards explicit, but these explicit guards are still guards (i.e. they can be widened). LowerGuards's comments states:

// This pass lowers the llvm.experimental.guard intrinsic to a conditional call
// to @llvm.experimental.deoptimize.  Once this happens, the guard can no longer
// be widened.

I don't want the semantics of LowerGuards be changed. My plan was to teach it to turn widenable_condition calls to true, so that we preserve the invariant "no widening is possible after LowerGuards".

mkazantsev updated this revision to Diff 164381.Sep 7 2018, 3:26 AM

Fixed LangRef, renamed pass to "MakeGuardsExplicit".

apilipenko added inline comments.Sep 8 2018, 6:40 PM
docs/LangRef.rst
15097–15099 ↗(On Diff #163770)

Sorry, I don't quite follow you. *These two* constructions are exactly the same. widenable condition representation just enables more flexibility and you can have some extra code before deoptimization. So with widenable condition representation you can express more than what you could do with old guards.

I don't think that there are correctness or profitability reasons to impose this limitation.

reames accepted this revision.Sep 10 2018, 1:46 PM

LGTM.

I'm approving this in the current form, despite a bit of hesitation doing so. I'd like to see the conversation around the restriction of widening based on target block to continue. I think there's a good change we'll want to tweak the semantics there, but I see that as a minor tweak, not a major redesign.

There are a couple of follow ups I'd like to see here.

  1. A default lowering strategy for the new representation. (Likely in SelectionDAG).
  2. (Optional) Extending the existing LowerGuard pass to lower the new form as well.
  3. Tests for EarlyCSE, InstCombine, GVN, and DSE which show that two adjacent calls to the widen intrinsic aren't merged and that memory can be forwarded past. (Note: The first part is profitability, but the second is legality.)
lib/Transforms/Scalar/MakeGuardsExplicit.cpp
72 ↗(On Diff #164381)

I don't think this needs to use the guard calling convention. It'll never get lowered. The important bit was using this CC on the deopt call which will.

This revision is now accepted and ready to land.Sep 10 2018, 1:46 PM
sanjoy added inline comments.Sep 10 2018, 2:37 PM
docs/LangRef.rst
15097–15099 ↗(On Diff #163770)

I agree with Artur here; @llvm.experimental.widenable.condition is technically an independent concept from @llvm.experimental.deoptimize though of course it is heavily inspired by it.

For instance with @llvm.experimental.widenable.condition you could do things like:

int f(int x) {
  if (@llvm.experimental.widenable.condition()) {
    return faster_to_execute(x);
  } else {
    return easier_to_constant_fold(x);
  }
}

and later fold @llvm.experimental.widenable.condition() to false or true depending on whether f was inlined into a call site where x is a compile time constant or not.

We should also not spec @llvm.experimental.widenable.condition() as returning undef since undef is problematic. Instead we should say each call of @llvm.experimental.widenable.condition() non-deterministically returns true or false but the returned value is a normal i1 and does not have magic properties like undef.

mkazantsev marked 2 inline comments as done.

I tried to reformulate the semantics to make it independent on deoptimize intrinsic.

Made changes to LangRef to make the widening part independent on deoptimize intrinsic. After giving it some thought, I see no solid reason why we should impose any limitations on deopt block.

mkazantsev requested review of this revision.Sep 16 2018, 10:24 PM

I would ask one more round of review for changes made to LangRef. I tried to make it more generic and less reliant on deopt's specifics than it used to be.

fedor.sergeev added inline comments.Sep 17 2018, 8:05 AM
docs/LangRef.rst
15058 ↗(On Diff #165713)

Plural form looks weird here. Single intrinsic represents a single condition...

15152 ↗(On Diff #165713)

without showing a use of %new_cond this additional instruction does not really change anything.
Since you show the same transform with %new_cond usage below I believe two code-blocks shown above can be deleted without sacrificing anything.

mkazantsev marked 2 inline comments as done.
sanjoy added inline comments.Sep 22 2018, 11:44 AM
docs/LangRef.rst
15085 ↗(On Diff #165897)

I think this spec should just be "The intrinsic `@llvm.experimental.widenable.condition() always non-deterministically returns true` or false." The ", and it is guaranteed that any returned value leads to correct program execution and creates no undefined behavior in code" bit is true of everything in LLVM IR -- the frontend has to ensure the IR it generated is correct and doesn't have UB.

I'd also emphasize that every invocation of this intrinsic produces a single well defined value non-deterministically (so it isn't like undef).

15146 ↗(On Diff #165897)

Drop the "always".

15161 ↗(On Diff #165897)

Not sure if this belongs in the langref, but the intrinsic must be RAUW'ed with the stronger condition, replacing just one use is unsound right?

15174 ↗(On Diff #165897)

This is an important detail; not from a semantics perspective but from a performance perspective. I'm wondering if this behavior should be a part of name of the intrinsic (or maybe even that the intrinsic should have an argument which is what we default lower the intrinsic to).

For instance, given this spec it would be correct but unwise to lower a range check to:

%w = widenable_cond();
if (%w || out_of_bounds()) deoptimize();

but there is nothing in its name that makes this obvious.

mkazantsev marked 2 inline comments as done.Oct 8 2018, 2:32 AM
mkazantsev added inline comments.
docs/LangRef.rst
15161 ↗(On Diff #165897)

I see no problem in replacing only one use. It maybe makes no sense, but by definition it should be no bug.

15174 ↗(On Diff #165897)

I think that the last sentence about the default lowering strategy implies this, but actually it is OK to implement a different lowering strategy if at some use case this one is non-profitable.

mkazantsev updated this revision to Diff 168624.Oct 8 2018, 2:35 AM

Addressed comments to LangRef.

sanjoy added inline comments.Oct 8 2018, 10:53 AM
docs/LangRef.rst
15161 ↗(On Diff #165897)

What happens if the initial program is:

%c = widenable_cond();
%x = xor %c, %c

In this original program %x is always false, but if you replace one use of %c with a different value than the other use then %x may not be false.

mkazantsev added inline comments.Oct 16 2018, 9:03 PM
docs/LangRef.rst
15161 ↗(On Diff #165897)

We should preserve the invariant that the program is correct whether %c is true or false; even in your example we should guarantee that any value of %x that can be produced this way still leads to correct program execution. But I agree that there can be something fishy there; I'll make this correction.

Added clarification about RAUW to Lowering section.

nhaehnle removed a subscriber: nhaehnle.Oct 17 2018, 3:17 AM

Can we go with this version? :)

chandlerc added inline comments.
docs/LangRef.rst
15085 ↗(On Diff #165897)

I don't think "always" adds much value here.

I also think "non-deterministically" can be confusing to the reader as this doesn't cause the *compiler* to fold them non-deterministically.

I wonder if we could phrase the semantics more like:

The intrinsic ``...`` returns either `true` or `false`. For each evaluation of a call to this intrinsic, the program must be valid and correct both if it returns `true` and if it returns `false`. This allows transformation passes to replace evaluations of this intrinsic with either value whenever one is beneficial.

Uncertain how others feel about this approach to the semantics.

15174 ↗(On Diff #165897)

I don't think we need to allow passing in the directionality... FEs can choose to emit the alternatives in the structure necessary?

But I do like the point that maybe the fact that one is the "default" should be evident to the FE author so that they can choose the structure correctly.

15134–15137 ↗(On Diff #169947)

I really like this proposal, but the term "widenable condition" really doesn't help me understand it at all.

Have you all thought of other terminology that might work here?

Some ideas:

"equivalent condition"
"equivalent alternative condition"

Something focusing on the fact that this intrinsic models a value which can inhabit two alternative states that are semantically equivalent when executed.

Thoughts?

15145–15148 ↗(On Diff #169947)

I would phrase this the other way around. I think the important thing to mention about undef here is its *difference*. I also wouldn't focus first on the *use case*, but the *semantic* difference:

While this may appear similar in semantics to `undef`, it is very different in that an invocation produces a particular, singular value. It is also intended to be lowered late, and remain available for specific optimizations and transforms that can benefit from its special properties.
mkazantsev marked 4 inline comments as done.Nov 27 2018, 8:41 PM
mkazantsev added inline comments.
docs/LangRef.rst
15134–15137 ↗(On Diff #169947)

This name was chosen specifically because it is supposed to be used for widening transforms. Actually the semantics that is given in LangRef is wider and has more possible applications than what I was thinking of. :)

I am open to discussion how it can be called, but how about this: we merge it as is and then continue this discussion separately. I don't really like "equivalent condition" (without context, it's unclear equivalent to what?), but I agree that we can change the name. Just let's discuss it separately and merge it as pure NFC when we settle to some option so that I could be unblocked on items where this can be used.

Addressed comments, LangRef fixed. As for naming question, I am open to discussion, but I'd prefer to have it merged so that I could be unblocked on applications of that, and when we find a better name for it, I can make NFC for that.
@chandlerc , how do you feel about this?

One of the alternatives naming schemes which was discussed is to call this intrinsic should_deoptimize. In this case the meaning of the returned value is inverted, so we need to or it with the condition which we want to widen.

  %should_deoptimize = call i1 @llvm.experimental.should_deoptimize()
  %deoptimize = or i1 %cond, %should_deoptimize
  br i1 %deoptimize, label %guarded, label %deopt

guarded:
  ; Guarded instructions

deopt:
  call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

With this phrasing the intrinsic decides whether we want to deoptimize from this method early or not. It's frontend's responsibility to arrange so the true returned from should_deoptimize would result in a deoptimization with correct state. This is a more limiting phrasing then the proposed widenable.condition as it ties the semantics of this intrinsic with deoptimization.

One of the alternatives naming schemes which was discussed is to call this intrinsic should_deoptimize. In this case the meaning of the returned value is inverted, so we need to or it with the condition which we want to widen.

Current definition in LangRef gives it much wider use cases than this. The point is, we have no obligation to *only* use this intrinsic in and condition with deopt in one of branches. For example, this usage is legit:

if (wc()) {
  // Apply some solution that is fast on small data
} else {
  // Apply another alternative solution that is fast on big data
}

No deoptimize at all, but it is OK to make various optimizations that will play with heuristics when to choose which.

reames accepted this revision.Dec 4 2018, 8:53 PM

LGTM w/minor comment to be addressed before landing.

Note: I am specific LGTMing this in the current form, despite the previously raised question about naming. Given the conversation appears to have died and this is in the experimental namespace allowing naming changes anyway, it's time to land this patch and then iterate in tree if needed. If anyone *actively objects* feel free to speak up, but let's not let bikeshedding block progress here if we can avoid it.

include/llvm/IR/Intrinsics.td
833 ↗(On Diff #162330)

please remove the "for guards represented ..." part. The intrinsic is specified in a more generic manner than just for guards.

This revision is now accepted and ready to land.Dec 4 2018, 8:53 PM
This revision was automatically updated to reflect the committed changes.