This is an archive of the discontinued LLVM Phabricator instance.

[LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards
ClosedPublic

Authored by eopXD on Aug 16 2021, 2:52 AM.

Details

Summary

Expression guraded in loop entry can be folded prior to comparison. This patch
proceeds D107353 and makes LIR able to deal with nested for-loop.

Diff Detail

Event Timeline

eopXD created this revision.Aug 16 2021, 2:52 AM
eopXD requested review of this revision.Aug 16 2021, 2:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 2:52 AM

I think this needs to be somewhere in SCEV itself.

eopXD added a comment.Aug 16 2021, 3:20 AM

I think this needs to be somewhere in SCEV itself.

Do you mean the structure SCEVFolder?

eopXD updated this revision to Diff 366579.Aug 16 2021, 3:23 AM

Fix clang-format warning.

I think this needs to be somewhere in SCEV itself.

Do you mean the structure SCEVFolder?

No, i mean that the ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr()
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103
Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().

eopXD added a comment.Aug 16 2021, 3:36 AM

No, i mean that the ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr()
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103

I see. Thank you for the elaboration.

Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().

I agree.

Whitney added a subscriber: bmahjour.
eopXD updated this revision to Diff 366661.Aug 16 2021, 9:26 AM

Fix some minor type-o.

bmahjour added inline comments.Aug 16 2021, 10:02 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
310

that is -> that are

319

greater of -> greater or

321–325
  if (SE.isLoopEntryGuardedByCond(CurLoop, ICmpInst::ICMP_SGE, Expr,
                                  SE.getZero(Expr->getType())))
    return SE.getZeroExtendExpr(visit(Expr->getOperand()), Expr->getType());
 return Expr;
}
988–989

fold expressions that is -> fold an expression that is

988–989

proceed optimization if equal. -> proceed with optimization, if equal.

993

with respect to -> based on

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
223

Given that the stride is o x 4, it's not clear how the loop guard can help make assumptions about sign/zero extension. Can you explain this example in the description section?

eopXD updated this revision to Diff 366783.Aug 16 2021, 6:25 PM
eopXD marked 7 inline comments as done.

Address comments.
Thank you for pointing out the grammatical errors.

eopXD updated this revision to Diff 367393.EditedAug 18 2021, 8:08 PM

Update test case CHECK based on change of debug log.

This revision is now accepted and ready to land.Aug 19 2021, 8:41 AM
fhahn added a comment.Aug 19 2021, 9:14 AM

Could you clean up the basic block names & co in the test case?

eopXD updated this revision to Diff 367706.Aug 19 2021, 9:10 PM

Update test case with better renaming and clear redundant blocks.

eopXD added a comment.Aug 20 2021, 7:15 AM

Thank you @lebedev.ri and @bmahjour for reviewing the patch. I am now going to commit it.

bmahjour added a comment.EditedAug 20 2021, 7:45 AM

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode. But what if we are compiling in 32-bit mode, or if 'n' and 'm' where 'signed long long'?

eopXD added a comment.EditedAug 21 2021, 3:26 AM

But what if we are compiling in 32-bit mode,

This is a valid point. Thank you for pointing this out.
Created D108507 to add test case that is aware of 32-bit mode.

or if 'n' and 'm' where 'signed long long'?

The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
The n, m accepted would always be non-negative. I can't think of scenario when n, m needs to remain as a signed long long. (Please correct me if I'm wrong.

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.

I agree with you that SCEVFolder has trivial functionality. It only to turn sext to zext. This patch only fixes the test case added, which is very limited.
On the other hand, I don't think I have deep understanding to benchmarks. So I am not sure if I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop-guard into SCEV-expression` cases that you wish this patch to have? I would be happy to resolve it.

or if 'n' and 'm' where 'signed long long'?

The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
The n, m accepted would always be non-negative. I can't think of scenario when n, m needs to remain as a signed long long. (Please correct me if I'm wrong.

Since we don't normalize loops we may have reverse loops where the upper bound is a negative signed value. Another example is if IVs start with negative values:

void foo(int n, int m, float Arr[n][m]) {
  for (int i = -100; i < n; i++)
    for (int j = -200; j < m; j++)
      Arr[i+100][j+200] = 1.0;
}

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.

I agree with you that SCEVFolder has trivial functionality. It only to turn sext to zext. This patch only fixes the test case added, which is very limited.

The term "fold" doesn't seem to be the right terminology. I think "rewrite" is more appropriate. For example, SCEVFolder may be called SCEVSignToZeroExtentionRewriter (or something similar).

On the other hand, I don't think I have deep understanding to benchmarks. So I am not sure if I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop-guard into SCEV-expression` cases that you wish this patch to have? I would be happy to resolve it.

I don't have any specific benchmarks in mind.

eopXD retitled this revision from [LoopIdiom] let LIR fold memset pointer / stride SCEV regarding loop guards to [LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards.Oct 11 2021, 5:51 PM
Whitney added a project: Restricted Project.Dec 1 2021, 8:26 AM
eopXD updated this revision to Diff 393150.Dec 9 2021, 6:59 AM

Rebase to latest main.

eopXD updated this revision to Diff 393160.Dec 9 2021, 7:19 AM

Update testcase due to rebase.

eopXD updated this revision to Diff 393170.Dec 9 2021, 8:06 AM

Rename SCEVFolder to SCEVSignToZeroExtentionRewriter.
Add example with negative start.

eopXD added a comment.Dec 9 2021, 8:15 AM

Hi @bmahjour,

Sorry for such delay.

I think the example you raised don't cause a problem to the current implementation.
The comparison is between "pointer stride" and "memset size", which all need to be non-negative.
The start/end of the induction variable don't matter because if the origin stride is negative it will be turned positive before comparison.

I have addressed your comment on renaming the SCEV folding structure and added test case for negative start loops like your example.
(Your example is loop strided store, I modified it to memset in the examples)

bmahjour added inline comments.Dec 10 2021, 11:02 AM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
110–140

Why is this test removed?

eopXD added inline comments.Dec 10 2021, 9:19 PM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll
110–140

The test file is split into 2 files, memset-runtime-32bit.ll and memset-runtime-64bit.ll to test on 32-bit mode and 64-bit mode in D108507. So memset-runtime.ll is gone.

LGTM

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
110–140

If you could put the comment, describing what these tests are meant for (ie lines 120-125), at the beginning of memset-runtime-32bit.ll and/or memset-runtime-64bit.ll it would be helpful.

bmahjour accepted this revision.Dec 13 2021, 7:10 AM
eopXD updated this revision to Diff 393893.Dec 13 2021, 7:29 AM

Rebase.

eopXD updated this revision to Diff 393909.Dec 13 2021, 8:26 AM

Last rebase was with the wrong version, rebase again.

eopXD updated this revision to Diff 393911.Dec 13 2021, 8:34 AM

Address @bmahjour 's comment, add comment to test case this patch improves.

This revision was landed with ongoing or failed builds.Dec 13 2021, 9:37 AM
This revision was automatically updated to reflect the committed changes.