Page MenuHomePhabricator

[IndVars] Do not branch on poison
Needs ReviewPublic

Authored by sanjoy on Feb 27 2017, 10:56 PM.

Details

Reviewers
majnemer
efriedma
Summary

Earlier, LFTR would introduce branches on values that could
potentially be poison. This change fixes the issue by demoting the
value LFTR is about to introduce a branch on to a non-poison variant
of itself whenever it cannot statically prove that it isn't poison.

Fixes PR31181.

Diff Detail

Event Timeline

sanjoy created this revision.Feb 27 2017, 10:56 PM

ping (on this and all the patches this one depends on)

efriedma edited edge metadata.Mar 15 2017, 12:51 PM

I guess first off, I'd like to take a step back and ask: do we need to do this transform at all? We're using some very complicated logic to conditionally rewrite an induction variable in terms of another induction variable... and then LSR will rewrite all the induction variables from scratch. This also seems difficult to test effectively: if we incorrectly rewrite an induction variable to something which could be poison, it'll almost always work, except in obscure cases.

sanjoy added a subscriber: atrick.Mar 15 2017, 12:56 PM

I guess first off, I'd like to take a step back and ask: do we need to do this transform at all? We're using some very complicated logic to conditionally rewrite an induction variable in terms of another induction variable... and then LSR will rewrite all the induction variables from scratch. This also seems difficult to test effectively: if we incorrectly rewrite an induction variable to something which could be poison, it'll almost always work, except in obscure cases.

I see this transform as a "canonicalization" type transform (supposed to make later loop opts easier), but I did not add it. Maybe @atrick can shed some light here on the initial motivation? I'm most certainly in favor of deleting code where possible.

Having said that, I don't think this is the only place in LLVM where the "it'll almost always work, except in obscure cases" adage applies, so that alone can't be the reason for excising this.

I see this transform as a "canonicalization" type transform (supposed to make later loop opts easier), but I did not add it. Maybe @atrick can shed some light here on the initial motivation? I'm most certainly in favor of deleting code where possible.

I don't know of any good reason to run linearFunctionTestReplace during IndVarSimplify. I think it should be part of LSR's OptimizeLoopTermCond(), which is the only thing that depends on it now AFAIK. I suppose it's possible that some loop vectorizer code relies on LFTR.

If SCEV provides the same information before and after LFTR, then it doesn't seem like a useful canonicalization.

Replacing a quadratic loop test (i*i < N) with a linear loop test does seem like a useful simplification, but I suspect it's exceedingly rare.

I'm not sure how moving it to LSR would help you deal with poison on the backedge though. If you completely remove LFTR then

  • We lose the quadratic to linear optimization (the original meaning of LFTR)
  • I think LSR will fail to generate certain expected patterns like counting down to zero

The part that's actually making this code complicated is the attempt to reuse an existing induction variable. Hopefully we don't have to worry about the trip count itself being poison?

Since LSR makes new induction variables anyway, I think the problem this patch is trying to solve doesn't exist if we transform the loop test in LSR. And it would be a bit more powerful there, because we could transform the loop test whether or not the loop actually has a canonical induction variable.

Materializing a new IV isn't a good canonicalization. It may block other loop transformations and there's no guarantee that LSR will undo it. General rule: never pessimize your loop and expect LSR to rewrite it. LSR only works well on a small subset of loops and essentially gives up in a lot of cases. I agree the complexity smells bad. I don't have a better suggestion at the moment.

Err, oops, I can see how that could be misread. I wasn't suggesting we should make a new IV in LSR; I was suggesting we could more aggressively use existing IVs because LSR will reconstruct the underlying IR.

When I talk about creating or removing an IV, I'm talking about the strongly connected phi cycle. I'm not referring to any "derived IV" stemming from values that depend on the IV.

I found the dependent patches and roughly figured out what's hapenning here. It seems like we could leave the original IV's nw flags in place and simply introduce an add without the nuw flags for the branch's post-increment. Maybe that's what Eli is suggesting? I think that would result in a nicely canonical form, but it may require adjusting downstream passes. I think the risk of interfering with downstream passes could be reduced by moving all of LFTR into LSR.

When I talk about creating or removing an IV, I'm talking about the strongly connected phi cycle. I'm not referring to any "derived IV" stemming from values that depend on the IV.

I found the dependent patches and roughly figured out what's hapenning here. It seems like we could leave the original IV's nw flags in place and simply introduce an add without the nuw flags for the branch's post-increment. Maybe that's what Eli is suggesting? I think that would result in a nicely canonical form, but it may require adjusting downstream passes. I think the risk of interfering with downstream passes could be reduced by moving all of LFTR into LSR.

It isn't just that, but also cases like this:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

;; Slightly edited version of the @f_1 function in the test case, but as I've noted
;; the test case in the patch is broken and needs to be fixed to be the same as this.
define i32 @f_1(i32* %ptr, i1 %always_false) {
entry:
  br label %for.cond

for.cond:
  %iv = phi i32 [ -2147483648, %entry ], [ %iv.inc, %always_taken ]
  %iv2 = phi i32 [ 0, %entry ], [ %iv2.inc, %always_taken ]
  store i32 %iv, i32* %ptr
  br i1 %always_false, label %never_taken, label %always_taken

never_taken:
  store volatile i32 %iv2, i32* %ptr
  br label %always_taken

always_taken:
  %iv.inc = add nsw i32 %iv, 1
  %iv2.inc = add nsw i32 %iv2, 1
  %cmp = icmp slt i32 %iv, 20
  br i1 %cmp, label %for.cond, label %for.end

for.end:
  ret i32 0
}

where LFTR rewrites the exit condition to use %iv2.inc (a "different" induction variable) which may be poison (and is poison in the last two iterations in this case).

Hanging the comparison off of a non-nsw/nuw increment of %iv2 won't help since by the last iteration %iv2 is poison, so a non-nsw/nuw addition will propagate that poison.

test/Transforms/IndVarSimplify/PR31181.ll
51

This test case is wrong -- it the increment on %iv.inc should just be nsw and not nuw.

Ok. I thought the problem was caused by converting the loop test to a post-increment compare. Is that a problem?

The problem you've been talking about is caused by FindLoopCounter. This is mostly about setting up a specific pattern of loop for count-down-to-zero codegen. I wanted to make sure we don't create additional IVs in the process. A side effect is that we can reduce the number of IVs for some loops as a "nice" simplification. That doesn't seem worth doing if we need to erase flags. Why doesn't FindLoopCounter just check isIVNeverPoison?

At any rate, FindLoopCounter is making the wrong choice here. I don't understand why it's picking an "AlmostDead" IV over an IV that starts at non-zero.

Ok. I thought the problem was caused by converting the loop test to a post-increment compare. Is that a problem?

Yes, but that isn't the only problem.

The problem you've been talking about is caused by FindLoopCounter.

Yes.

This is mostly about setting up a specific pattern of loop for count-down-to-zero codegen. I wanted to make sure we don't create additional IVs in the process. A side effect is that we can reduce the number of IVs for some loops as a "nice" simplification. That doesn't seem worth doing if we need to erase flags. Why doesn't FindLoopCounter just check isIVNeverPoison?

That would be correct, but it pessimizes the case where it would have been okay, for the various reasons outlined in the code for this patch, to nevertheless emit a branch on a potentially poison IV.

At any rate, FindLoopCounter is making the wrong choice here. I don't understand why it's picking an "AlmostDead" IV over an IV that starts at non-zero.

Are you talking about the example I pasted above? Then the IV picked by FindLoopCounter is not AlmostDead, since it (deliberately) has a use in the never_taken block.

I was referring to the same example but missed that there were two stores. So FindLoopCounter is working as intended.

I don't think that dropping flags is a good tradeoff for counting from zero. Especially this early in the pipeline. Beyond that, I'll leave it up to you!

sanjoy updated this revision to Diff 92252.Mar 18 2017, 12:26 PM
  • Fix test case

Hi Andy,

I'd like to revive this patch. I tried what you suggested -- I taught LFTR to bail out of the transform if it would have to drop no-wrap flags. With that change, the following tests fail:

LLVM :: Transforms/IndVarSimplify/2011-10-27-lftrnull.ll
LLVM :: Transforms/IndVarSimplify/2011-11-01-lftrptr.ll
LLVM :: Transforms/IndVarSimplify/PR31181.ll
LLVM :: Transforms/IndVarSimplify/lftr-udiv-tripcount.ll
LLVM :: Transforms/IndVarSimplify/lftr-zext.ll
LLVM :: Transforms/IndVarSimplify/lftr_simple.ll
LLVM :: Transforms/IndVarSimplify/widen-nsw.ll

If that seems reasonable (i.e. pessimizing the cases tested in the tests above seem okay), I'll go ahead and change makeValueNonPoisonOnBackEdge to not drop no-wrap flags.

I'd also appreciate it someone takes a look at the dependencies of this patch.

The other option would be to keep LFTR aggressive (i.e. have it drop no-wrap flags if needed), but to do that only if the backedge condition is quadratic.

My suggestion was this: LFTR is not necessary this early in the pipeline if it's only purpose is converting a 'cmp slt' to 'cmp eq'. As long as SCEV can compute the trip count, the loop test shouldn't matter. If we really need to convert the loop test, it could be done during LSR and we could drop the 'nw' flags at that point.

We still want to remove sext/zext by widening, remove quadratic IVs, etc. It's not clear to me why we need to pessimize those particular test cases. In all those cases, is the problem that the rewritten loop test uses a possibly-poison IV? I don't even see nsw/nuw flags in the tests.

@mzolotukhin, if we do need to pessimize any of these cases, can you evaluate the performance impact, or suggest someone who can?

My suggestion was this: LFTR is not necessary this early in the pipeline if it's only purpose is converting a 'cmp slt' to 'cmp eq'. As long as SCEV can compute the trip count, the loop test shouldn't matter. If we really need to convert the loop test, it could be done during LSR and we could drop the 'nw' flags at that point.

We still want to remove sext/zext by widening, remove quadratic IVs, etc.

I think we're on the same page then (https://reviews.llvm.org/D30446#741950).

It's not clear to me why we need to pessimize those particular test cases.  In all those cases, is the problem that the rewritten loop test uses a possibly-poison IV? I don't even see nsw/nuw flags in the tests.

I'll have to check case by case, but in several tests I see an inbounds GEP, which can also generate poison.

@mzolotukhin, if we do need to pessimize any of these cases, can you evaluate the performance impact, or suggest someone who can?

I can provide a version of this patch that avoids stripping no-wrap flags if that helps. Alternatively, you can remove the code paths that call dropPoisonGeneratingFlags and get the same effect.

reames added a subscriber: reames.Jun 5 2017, 1:09 PM

We just ran across this again, with a bit more of an urgent motivation this time. I'd very much like to see this patch make progress.

If I follow the discussion so far, it basically comes down to:

  1. We should factor out the LFTR code such that it can be used by LSR as well as IndVarSimplify.
  2. Once done, we should restrict the IndVar version to "do no harm". Specifically, this means that we do *not* strip nsw/nuw flags, we simply don't convert in that case.
  3. The LSR version should use the "strip flags if needed" strategy.

Is everyone comfortable with that direction?

Looking at a few of the test cases Sanjoy mentioned, I think we could also trivially re-infer flags in several of the cases. We could potentially integrate that logic into the IndVar version of the code if desired - i.e. trigger if either no-nsw, or could trivially infer nsw on IV w/new use added.

lib/Transforms/Scalar/IndVarSimplify.cpp
2174

Ok, seeing the usage here, I think this is probably the wrong interface for the problem even if we want to pursue this approach. I think we'd be much better off with an analysis which provided an interface for asking whether a given Instruction is known to be non-poison. Your IV analysis could be used, but you could also walk forward through uses to by-assumption non-poison use as well.

2193

minor: can't we do this before the more general search since we already have the information? Or are you trying to check where the new IV itself is known non-poison?

atrick added a comment.Jun 5 2017, 1:19 PM

As a generally good direction I like the idea of moving LFTR, and eventually even IV widening, forward to the LSR stage of the pipeline.
So, I'm ok with it as long as you do the normal amount of benchmark validation and @mzolotukhin doesn't see any immediate performance issues with this approach.

Sorry for the delay, I missed this thread somehow.
I'm running tests now, and publish the results as soon as I get them.

Michael

I ran SPEC2006 and CTMark with and without the patches and didn't see any performance difference.

Michael

Thanks for doing the performance validation @mzolotukhin.
Nice code @sanjoy.

sanjoy added inline comments.Jun 17 2017, 3:45 PM
lib/Transforms/Scalar/IndVarSimplify.cpp
2174

I think a context-free analysis with a simpler interface will be less powerful (if I understand your suggestion correctly). For instance, say we have:

loop:
  %iv = phi ...
  %iv.inc = add nsw i32 %iv, ...
  %cmp = %iv.inc s< ...
  call void @may_exit()
  br i1 %cmp, ...

An analysis can't correctly return true for isKnownNonPoison(%iv.inc) since it is possible for it to be poison and have @may_exit call exit(0) to avoid the UB due to branching on poison.

However, if we know the expression we're about to generate needs to be non-poison only at the point of the existing backedge, we can exploit the fact that the original condition for the branch is known to be not poison at that location, which is what I'm trying to do via propagateKnownNonPoison.

2193

I certainly want to first check if it is possible to get away without calling dropPoisonGeneratingFlags on anything at all.

However, I don't remember if there was a good reason to have PoisonGenerator->dropPoisonGeneratingFlags() before IV->dropPoisonGeneratingFlags(). One (somewhat weak) reason to do this: when I do IV->dropPoisonGeneratingFlags() I only affect stuff within the loop[0] I'm trying to optimize but when I do PoisonGenerator->dropPoisonGeneratingFlags() I may affect larger scope of things.

[0]: This is not really true (I affect stuff that uses the exit value of the IV, for one thing), which is why this line of reasoning is weak.