This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] countToEliminateCompares(): fix handling of [in]equality predicates (PR43840)
ClosedPublic

Authored by lebedev.ri on Oct 30 2019, 8:22 AM.

Details

Summary

I believe this bisects to https://reviews.llvm.org/D44983 ([LoopUnroll] Only peel if a predicate becomes known in the loop body.)

While that revision did contain tests that showed arguably-subpar peeling for
[in]equality predicates that [not] happen in the middle of the loop,
it also disabled peeling for the *first* loop iteration,
because latch would be canonicalized to [in]equality comparison..

That was intentional as per https://reviews.llvm.org/D44983#1059583.
I'm not 100% sure that i'm using correct checks here,
but this fix appears to be going in the right direction..

Let me know if i'm missing some checks here..

Fixes PR43840.

Diff Detail

Event Timeline

lebedev.ri created this revision.Oct 30 2019, 8:22 AM
fhahn added a comment.Nov 4 2019, 2:14 AM

I'm not really sure how that would look, if we look at -loop-unroll -O3
for those tests, said loops no longer have that condition anyway.
And if we do peel off one more iteration regardless, then we bloat
the code by performing one loop iteration outside of the loop,
without simplifying the loop.

The interesting test cases for the FIXME are test7 and test8. In those cases, we have to peel off 4 iterations, to eliminate the condition in the loop, but with the current logic we peel off only 3: we currently try to peel until a condition in the loop becomes known. For monotonic predicates, we know it will stay known for the rest of the loop (and can be eliminated further down the road).

For EQ/NE that's not the case. In some cases (like in the tests), it is enough to peel one additional iteration, if the predicate becomes known for the rest of the loop. I think the reason why the conditions in test7 and test8 get eliminated with -loop-unroll -O3 is that they will get peeled twice. It would also be good to add an additional test cases with NE/EQ predicates that alternate (e.g. a check that the induction variable is even/odd). For such loops, peeling of the first iteration is not really helpful.

lebedev.ri updated this revision to Diff 227837.Nov 5 2019, 3:10 AM

I'm not really sure how that would look, if we look at -loop-unroll -O3
for those tests, said loops no longer have that condition anyway.
And if we do peel off one more iteration regardless, then we bloat
the code by performing one loop iteration outside of the loop,
without simplifying the loop.

The interesting test cases for the FIXME are test7 and test8. In those cases, we have to peel off 4 iterations, to eliminate the condition in the loop, but with the current logic we peel off only 3: we currently try to peel until a condition in the loop becomes known. For monotonic predicates, we know it will stay known for the rest of the loop (and can be eliminated further down the road).

For EQ/NE that's not the case. In some cases (like in the tests), it is enough to peel one additional iteration, if the predicate becomes known for the rest of the loop. I think the reason why the conditions in test7 and test8 get eliminated with -loop-unroll -O3 is that they will get peeled twice. It would also be good to add an additional test cases with NE/EQ predicates that alternate (e.g. a check that the induction variable is even/odd). For such loops, peeling of the first iteration is not really helpful.

Thank you for taking a look! I should have been more specific.
That much i understand. It works if we are looking for i!=/==3.
But as you can see from the tests, if we were looking for i!=/==0,
we then end up peeling one iteration more than we need to.

To make it more visible, -loop-unroll -simplifycfg -instcombine -simplifycfg output for those tests:

define void @test12__peel_first_iter_via_eq_pred(i32 %len) {
entry:
  %cmp5 = icmp sgt i32 %len, 0
  br i1 %cmp5, label %if.then.peel, label %for.cond.cleanup

if.then.peel:                                     ; preds = %entry
  call void @init()
  call void @sink()
  %exitcond.peel = icmp eq i32 %len, 1
  br i1 %exitcond.peel, label %for.cond.cleanup, label %if.end.peel5

if.end.peel5:                                     ; preds = %if.then.peel
  call void @sink()                                                          ; <- BAD
  %exitcond.peel7 = icmp eq i32 %len, 2
  br i1 %exitcond.peel7, label %for.cond.cleanup, label %for.body

for.cond.cleanup:                                 ; preds = %if.then.peel, %if.end.peel5, %for.body, %entry
  ret void

for.body:                                         ; preds = %if.end.peel5, %for.body
  %i.06 = phi i32 [ %inc, %for.body ], [ 2, %if.end.peel5 ]
  call void @sink()
  %inc = add nuw nsw i32 %i.06, 1
  %exitcond = icmp eq i32 %inc, %len
  br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !0
}

define void @test13__peel_first_iter_via_ne_pred(i32 %len) {
entry:
  %cmp5 = icmp sgt i32 %len, 0
  br i1 %cmp5, label %if.then.peel, label %for.cond.cleanup

if.then.peel:                                     ; preds = %entry
  call void @init()
  call void @sink()
  %exitcond.peel = icmp eq i32 %len, 1
  br i1 %exitcond.peel, label %for.cond.cleanup, label %if.end.peel5

if.end.peel5:                                     ; preds = %if.then.peel
  call void @sink()                                                          ; <- BAD
  %exitcond.peel7 = icmp eq i32 %len, 2
  br i1 %exitcond.peel7, label %for.cond.cleanup, label %for.body

for.cond.cleanup:                                 ; preds = %if.then.peel, %if.end.peel5, %for.body, %entry
  ret void

for.body:                                         ; preds = %if.end.peel5, %for.body
  %i.06 = phi i32 [ %inc, %for.body ], [ 2, %if.end.peel5 ]
  call void @sink()
  %inc = add nuw nsw i32 %i.06, 1
  %exitcond = icmp eq i32 %inc, %len
  br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !2
}

declare void @init()

declare void @sink()
lebedev.ri updated this revision to Diff 227848.Nov 5 2019, 5:04 AM

Okay, this looks super ugly to me, but this seems to do the right thing..

lebedev.ri updated this revision to Diff 227849.Nov 5 2019, 5:05 AM
lebedev.ri edited the summary of this revision. (Show Details)

Denoise the diff.

lebedev.ri updated this revision to Diff 227850.Nov 5 2019, 5:14 AM

Undo unneeded check.

xbolva00 added inline comments.
llvm/test/Transforms/LoopUnroll/peel-loop-conditions.ll
524–525

Remove fixme

lebedev.ri updated this revision to Diff 227852.Nov 5 2019, 5:35 AM

Update comments.

lebedev.ri marked an inline comment as done.Nov 5 2019, 5:35 AM

Fix edge-case handling - if we need to peel that one extra iteration,
we should still respect the MaxPeelCount, and if we aren't allowed
to peel, we should give up, not do subpar peeling job.

fhahn accepted this revision.Nov 6 2019, 3:32 AM

LGTM, with a few outstanding comments!

I think some cases could still slip through the cracks, but the most important negative cases are covered by the test cases.

llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp
216

please update the comment, it still mentions we not supporting non-monotonic predicates.

241

IMO it would be slightly better to make the captures explicit (here and below)

255

nit: ...does the predicate .... become known...

259

maybe replace 'we're done here' with' Give up'. like below, to make it clear that we do not want to peel for the given condition.

267

I think we don't need to check the inverse predicate is known, when we already know the regular predicate is known, which is the previous check. Similarly, we already know know SE.isKnownPredicate(ICmpInst::getInversePredicate(Pred), IterVal, RightSCEV), so we should not need to check !SE.isKnownPredicate(Pred, IterVal, RightSCEV)

It might be worth having those 2 as an assertion though, in case I am missing something.

This revision is now accepted and ready to land.Nov 6 2019, 3:32 AM
lebedev.ri updated this revision to Diff 228035.Nov 6 2019, 3:55 AM
lebedev.ri marked 6 inline comments as done.

@fhahn thank you for the review!
Addressed review notes.

lebedev.ri updated this revision to Diff 228036.Nov 6 2019, 3:56 AM

Re-rebase, NFC.

lebedev.ri added inline comments.Nov 6 2019, 4:00 AM
llvm/lib/Transforms/Utils/LoopUnrollPeel.cpp
267

This is indeed a mess.
I believe this is what you had in mind?

This revision was automatically updated to reflect the committed changes.

@fhahn @hfinkel @reames
While there, can anyone give any general suggestion about https://bugs.llvm.org/show_bug.cgi?id=43910, https://bugs.llvm.org/show_bug.cgi?id=43892 ?
Is there any existing pass where something like that fit?

This is probably not the best place to discuss other loop optimizations. I'll comment in the bugs.