Expression guraded in loop entry can be folded prior to comparison. This patch
proceeds D107353 and makes LIR able to deal with nested for-loop.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
No, i mean that the ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr()
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103
Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().
No, i mean that the ScalarEvolution should perform this canonicalization internally, in ScalarEvolution::getSignExtendExpr()
It already does it, in fact: https://github.com/llvm/llvm-project/blob/0dc6b597db4d8b25f3df96a8f6574be732e18fdd/llvm/lib/Analysis/ScalarEvolution.cpp#L2100-L2103
I see. Thank you for the elaboration.
Though perhaps that is not feasible because we don't know the loop we're after when in ScalarEvolution::getSignExtendExpr().
I agree.
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
310 | that is -> that are | |
319 | greater of -> greater or | |
321–325 | if (SE.isLoopEntryGuardedByCond(CurLoop, ICmpInst::ICMP_SGE, Expr, SE.getZero(Expr->getType()))) return SE.getZeroExtendExpr(visit(Expr->getOperand()), Expr->getType()); return Expr; } | |
988–989 | fold expressions that is -> fold an expression that is | |
988–989 | proceed optimization if equal. -> proceed with optimization, if equal. | |
993 | with respect to -> based on | |
llvm/test/Transforms/LoopIdiom/memset-runtime.ll | ||
223 | Given that the stride is o x 4, it's not clear how the loop guard can help make assumptions about sign/zero extension. Can you explain this example in the description section? |
Thank you @lebedev.ri and @bmahjour for reviewing the patch. I am now going to commit it.
I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode. But what if we are compiling in 32-bit mode, or if 'n' and 'm' where 'signed long long'?
This is a valid point. Thank you for pointing this out.
Created D108507 to add test case that is aware of 32-bit mode.
or if 'n' and 'm' where 'signed long long'?
The accepted scenario for LIR is MemsetSizeSCEV == PositivePointerStrideSCEV.
The n, m accepted would always be non-negative. I can't think of scenario when n, m needs to remain as a signed long long. (Please correct me if I'm wrong.
I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.
I agree with you that SCEVFolder has trivial functionality. It only to turn sext to zext. This patch only fixes the test case added, which is very limited.
On the other hand, I don't think I have deep understanding to benchmarks. So I am not sure if I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop-guard into SCEV-expression` cases that you wish this patch to have? I would be happy to resolve it.
Since we don't normalize loops we may have reverse loops where the upper bound is a negative signed value. Another example is if IVs start with negative values:
void foo(int n, int m, float Arr[n][m]) { for (int i = -100; i < n; i++) for (int j = -200; j < m; j++) Arr[i+100][j+200] = 1.0; }
I still have some reservations about the SCEVFolder implementation in this patch. It doesn't really fold anything, it just zero extends an expression if it is safe to do so. The assumption being made is that the expression it compares against is unsigned (so a "folding" opportunity is realized). In the LIT test provided the trip count is zero extended by loop simplify because it is originally 32-bit and the code is targeting 64-bit mode.
I agree with you that SCEVFolder has trivial functionality. It only to turn sext to zext. This patch only fixes the test case added, which is very limited.
The term "fold" doesn't seem to be the right terminology. I think "rewrite" is more appropriate. For example, SCEVFolder may be called SCEVSignToZeroExtentionRewriter (or something similar).
On the other hand, I don't think I have deep understanding to benchmarks. So I am not sure if I have covered most of the "practical memset idioms" that can be optimized here. Do you have other "fold loop-guard into SCEV-expression` cases that you wish this patch to have? I would be happy to resolve it.
I don't have any specific benchmarks in mind.
Rename SCEVFolder to SCEVSignToZeroExtentionRewriter.
Add example with negative start.
Hi @bmahjour,
Sorry for such delay.
I think the example you raised don't cause a problem to the current implementation.
The comparison is between "pointer stride" and "memset size", which all need to be non-negative.
The start/end of the induction variable don't matter because if the origin stride is negative it will be turned positive before comparison.
I have addressed your comment on renaming the SCEV folding structure and added test case for negative start loops like your example.
(Your example is loop strided store, I modified it to memset in the examples)
llvm/test/Transforms/LoopIdiom/memset-runtime.ll | ||
---|---|---|
110–140 | Why is this test removed? |
LGTM
llvm/test/Transforms/LoopIdiom/memset-runtime.ll | ||
---|---|---|
110–140 | If you could put the comment, describing what these tests are meant for (ie lines 120-125), at the beginning of memset-runtime-32bit.ll and/or memset-runtime-64bit.ll it would be helpful. |
that is -> that are