Page MenuHomePhabricator

[LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards
AcceptedPublic

Authored by eopXD on Aug 16 2021, 2:52 AM.

Details

Summary

Expression guraded in loop entry can be folded prior to comparison. This patch
proceeds D107353 and makes LIR able to deal with nested for-loop.

Diff Detail

Event Timeline

eopXD created this revision.Aug 16 2021, 2:52 AM
eopXD requested review of this revision.Aug 16 2021, 2:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 16 2021, 2:52 AM

I think this needs to be somewhere in SCEV itself.

eopXD added a comment.Aug 16 2021, 3:20 AM

I think this needs to be somewhere in SCEV itself.

Do you mean the structure SCEVFolder?

eopXD updated this revision to Diff 366579.Aug 16 2021, 3:23 AM

Fix clang-format warning.

I think this needs to be somewhere in SCEV itself.

Do you mean the structure SCEVFolder?

No, i mean that the ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr()
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103
Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().

eopXD added a comment.Aug 16 2021, 3:36 AM

No, i mean that the ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr()
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103

I see. Thank you for the elaboration.

Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().

I agree.

Whitney added a subscriber: bmahjour.
eopXD updated this revision to Diff 366661.Aug 16 2021, 9:26 AM

Fix some minor type-o.

bmahjour added inline comments.Aug 16 2021, 10:02 AM
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp
310

that is -> that are

319

greater of -> greater or

321–325
  if (SE.isLoopEntryGuardedByCond(CurLoop, ICmpInst::ICMP_SGE, Expr,
                                  SE.getZero(Expr->getType())))
    return SE.getZeroExtendExpr(visit(Expr->getOperand()), Expr->getType());
 return Expr;
}
988–989

fold expressions that is -> fold an expression that is

988–989

proceed optimization if equal. -> proceed with optimization, if equal.

993

with respect to -> based on

llvm/test/Transforms/LoopIdiom/memset-runtime.ll
223

Given that the stride is o x 4, it's not clear how the loop guard can help make assumptions about sign/zero extension. Can you explain this example in the description section?

eopXD updated this revision to Diff 366783.Aug 16 2021, 6:25 PM
eopXD marked 7 inline comments as done.

Address comments.
Thank you for pointing out the grammatical errors.

eopXD updated this revision to Diff 367393.EditedAug 18 2021, 8:08 PM

Update test case CHECK based on change of debug log.

This revision is now accepted and ready to land.Aug 19 2021, 8:41 AM
fhahn added a comment.Aug 19 2021, 9:14 AM

Could you clean up the basic block names & co in the test case?

eopXD updated this revision to Diff 367706.Aug 19 2021, 9:10 PM

Update test case with better renaming and clear redundant blocks.

eopXD added a comment.Aug 20 2021, 7:15 AM

Thank you @lebedev.ri and @bmahjour for reviewing the patch. I am now going to commit it.

bmahjour added a comment.EditedAug 20 2021, 7:45 AM

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode. But what if we are compiling in 32-bit mode, or if 'n' and 'm' where 'signed long long'?

eopXD added a comment.EditedAug 21 2021, 3:26 AM

But what if we are compiling in 32-bit mode,

This is a valid point. Thank you for pointing this out.
Created D108507 to add test case that is aware of 32-bit mode.

or if 'n' and 'm' where 'signed long long'?

The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
The n, m accepted would always be non-negative. I can't think of scenario when n, m needs to remain as a signed long long. (Please correct me if I'm wrong.

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.

I agree with you that SCEVFolder has trivial functionality. It only to turn sext to zext. This patch only fixes the test case added, which is very limited.
On the other hand, I don't think I have deep understanding to benchmarks. So I am not sure if I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop-guard into SCEV-expression` cases that you wish this patch to have? I would be happy to resolve it.

or if 'n' and 'm' where 'signed long long'?

The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
The n, m accepted would always be non-negative. I can't think of scenario when n, m needs to remain as a signed long long. (Please correct me if I'm wrong.

Since we don't normalize loops we may have reverse loops where the upper bound is a negative signed value. Another example is if IVs start with negative values:

void foo(int n, int m, float Arr[n][m]) {
  for (int i = -100; i < n; i++)
    for (int j = -200; j < m; j++)
      Arr[i+100][j+200] = 1.0;
}

I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.

I agree with you that SCEVFolder has trivial functionality. It only to turn sext to zext. This patch only fixes the test case added, which is very limited.

The term "fold" doesn't seem to be the right terminology. I think "rewrite" is more appropriate. For example, SCEVFolder may be called SCEVSignToZeroExtentionRewriter (or something similar).

On the other hand, I don't think I have deep understanding to benchmarks. So I am not sure if I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop-guard into SCEV-expression` cases that you wish this patch to have? I would be happy to resolve it.

I don't have any specific benchmarks in mind.

eopXD retitled this revision from [LoopIdiom] let LIR fold memset pointer / stride SCEV regarding loop guards to [LoopIdiom] Let LIR fold memset pointer / stride SCEV regarding loop guards.Oct 11 2021, 5:51 PM
Whitney added a project: Restricted Project.Wed, Dec 1, 8:26 AM