This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Apply loop guards to divisibility tests
ClosedPublic

Authored by gilr on Jan 27 2021, 6:04 AM.

Details

Summary

Extend applyLoopGuards() to take into account conditions/assumes proving some value %v to be divisible by D by rewriting %v to (%v / D) * D. This lets the loop unroller and the loop vectorizer identify more loops as not requiring remainder loops.

Diff Detail

Event Timeline

gilr created this revision.Jan 27 2021, 6:04 AM
gilr requested review of this revision.Jan 27 2021, 6:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2021, 6:04 AM
gilr updated this revision to Diff 320341.Jan 31 2021, 1:21 AM

Rebased

fhahn added inline comments.Jan 31 2021, 1:58 AM
llvm/lib/Analysis/ScalarEvolution.cpp
13271

Here we basically are matching a SCEV URem expression, right? There's a helper for that (matchURem), which should help to simplify the code a bit?

13282

Here we could just re-use the existing URem expression?

fhahn added inline comments.Jan 31 2021, 2:05 AM
llvm/test/Transforms/LoopUnroll/runtime-unroll-assume-no-remainder.ll
8

could you also add version of this test that checks %cmp in the guard condition, instead of an assume?

46

is this extra branch needed?

58

the extra blocks should not be needed for unrolling.

69

can we either change %i.011 to i64 or the users to use i32 instead of the extension?

81

should not be needed for unrolling.

gilr marked 3 inline comments as done.Jan 31 2021, 10:11 AM
gilr added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
13271

Ah, indeed. Thanks!

13282

Do you mean by computing (A / B) * B from the URem expression e.g. via A - URemExpr?

llvm/test/Transforms/LoopUnroll/runtime-unroll-assume-no-remainder.ll
46

This was done in D94088 to prevent D93974 (should it ever get in) from validating the entry-block assume for GetMinTrailingZeros() w/o an assumption context. But since appyLoopGuards() uses its own assumption validation check it's indeed redundant. Will remove.

fhahn added inline comments.Jan 31 2021, 10:54 AM
llvm/lib/Analysis/ScalarEvolution.cpp
13282

Nevermind, I was thinking about something else and we probably need the div & multiply to make sure the SCEV logic can determine the right multiple.

gilr updated this revision to Diff 320366.Jan 31 2021, 11:22 AM
gilr retitled this revision from [SCEV] Apply loop guards to trailing zero bits to [SCEV] Apply loop guards to zero modulo conditions.
gilr edited the summary of this revision. (Show Details)

Addressed comments.

fhahn accepted this revision.Feb 1 2021, 2:45 AM

LGTM with the additional tests, thanks!

I did some testing and it appears this exposes a crash in matchURem. I'm taking a look at that now, I think it would be good to wait with landing the change until this is resolved.

llvm/test/Transforms/LoopUnroll/runtime-unroll-assume-no-remainder.ll
8

could you add a similar negative test, e.g. where the and does not strip the lowest bit?

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll
67

could you add a similar negative test, e.g. where the and does not strip the lowest bit?

Perhaps also add a test using urem?

This revision is now accepted and ready to land.Feb 1 2021, 2:45 AM
fhahn added a comment.Feb 1 2021, 5:54 AM

I did some testing and it appears this exposes a crash in matchURem. I'm taking a look at that now, I think it would be good to wait with landing the change until this is resolved.

Should be fixed by f1e8136115ac

gilr marked an inline comment as done.Feb 1 2021, 8:45 AM

I did some testing and it appears this exposes a crash in matchURem. I'm taking a look at that now, I think it would be good to wait with landing the change until this is resolved.

Should be fixed by f1e8136115ac

Excellent, thanks!

llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll
67

could you add a similar negative test, e.g. where the and does not strip the lowest bit?

Will do.

Perhaps also add a test using urem?

Ah, good catch!
Trying VF=4, IC=3 exposed two issues:

  • While applyLoopGuards() rewrites my divisible-by-12 TC correctly to (12 * ...), getURemExpr(TC, 12) doesn't fold to zero. This can perhaps be solved by using SCEVDivision instead of getURemExpr(). Will try that as an improvement over this patch.
  • The test then hits the assert "VF*UF must be a power of 2 when folding tail by masking" later in LV. This isn't related to this patch (can be reproduced by forcing optsize and IC=3), so I'll upload a separate fix for it.
fhahn added inline comments.Feb 1 2021, 9:02 AM
llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll
67

While applyLoopGuards() rewrites my divisible-by-12 TC correctly to (12 * ...), getURemExpr(TC, 12) doesn't fold to zero. This can perhaps be solved by using SCEVDivision instead of getURemExpr(). Will try that as an improvement over this patch.

That sounds like an independent improvement to SCEV.

The test then hits the assert "VF*UF must be a power of 2 when folding tail by masking" later in LV. This isn't related to this patch (can be reproduced by forcing optsize and IC=3), so I'll upload a separate fix for it.

Sounds good, thanks!

gilr updated this revision to Diff 320533.Feb 1 2021, 11:14 AM

Addressed comments.

gilr retitled this revision from [SCEV] Apply loop guards to zero modulo conditions to [SCEV] Apply loop guards to divisibility tests.Feb 1 2021, 9:29 PM
This revision was automatically updated to reflect the committed changes.