When vectorization is forced with a pragma, we optimise for min size, and we need to emit runtime memory checks, then allow this code growth and don't run in an assert like we currently do.
Details
Diff Detail
Event Timeline
When we optimise for size, and need to emit runtime checks to disambiguate memory, we should nicely bail, don't vectorise, and not run in an assert like we currently do.
The logic currently in place to do so is to call LVP.plan() which calls computeMaxVF(), which does
// Bail if runtime checks are required, which are not good when optimising // for size. if (runtimeChecksRequired()) return None;
Does this need to change, presumably move earlier?
Which commit does this trace back to, D65197?
This is the result of D65197 and D66803. The former was the functional change, the latter changed the assert that is now triggered.
Add a reference to the problem @uabelho reported.
There is no corresponding PR for this. The problem reported by @uabelho is added here as a test-case/regression test.
Why is the current logic placed too late, and should it be removed if moved earlier as suggested?
Initially I didn't understand this remark, because I thought I had moved the new logic to happen earlier. But having thought about it, I think your question actually is why the existing logic doesn't work, why it doesn't capture this case as it looks like it should. Thus, this extra bit of logic added in this patch shouldn't be necessary, at least not in this form.
And I agree with that. I think in the existing logic there's a F->hasOptSize() check missing. I am going to look again...
Ah, I had completely missed that this is about the interaction of OptForSize and the loop.vectorize loop hint!
Vectorization is forced, which means that we vectorize even though OptForSize is set and we need to emit runtime checks.
The symptom is that we run in this assert in emitMemRuntimeChecks():
assert(!BB->getParent()->hasOptSize() && "Cannot emit memory checks when optimizing for size");
In getScalarEpilogueLowering(), we do not set CM_ScalarEpilogueNotAllowedOptSize because of the loop hint:
if (Hints.getForce() != LoopVectorizeHints::FK_Enabled && (F->hasOptSize() || llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI))) { SEL = CM_ScalarEpilogueNotAllowedOptSize;
And because CM_ScalarEpilogueNotAllowedOptSize is not set, we do not check runtimeChecksRequired, so that we end up with code growth under OptForSize because of the loop hint.
If we agree that this is desired behaviour, which looks reasonable to me because of this condition, I am going to remove the assert as that is not valid.
Better to make the assert valid by checking !forcedToVectorize in addition to !OptForSize.
So starting from D65197 or so, a loop that is forced to vectorize in a function that is OptForSize, is expected to get vectorized even if it involves replicating the loop.
Another assert which presumably requires similar update is the "Cannot SCEV check stride or overflow when optimizing for size" one.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h | ||
---|---|---|
315 ↗ | (On Diff #221211) | Method itself should be const. |
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
2733–2746 | Assert message should say it all, i.e., "... when optimizing for size unless forced to vectorize"; above documentation seems redundant. In case we are forced to vectorize, optimize for size, and end up replicating the loop due to runtime checks, we should issue an LLVM_DEBUG/ORE message, informing the programmer code bloat may be reduced by removing the forced vectorization, or perhaps by modifying the code s.t. runtime checks are not needed (e.g., adding "restrict"). | |
llvm/test/Transforms/LoopVectorize/runtime-check-optsize.ll | ||
1 ↗ | (On Diff #221211) | Suffice to run loop-vectorize only, instead of -O1? Would be good to add this test to a relevant file, instead of dedicating a new file for each test. |
Another assert which presumably requires similar update is the "Cannot SCEV check stride or overflow when optimizing for size" one.
Yep, you're exactly right. Fallout from the same work.
We have a PR for that one, https://bugs.llvm.org/show_bug.cgi?id=43371, which I will give exactly the same treatment as soon as this change is accepted.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2737 | Check is redundant, Forced is asserted. Probably need to have a dummy use of Forced, or avoid defining it, to avoid warning of unused variable in release mode. | |
llvm/test/Transforms/LoopVectorize/runtime-check.ll | ||
185 | This check line belongs to previous test |
llvm/test/Transforms/LoopVectorize/runtime-check.ll | ||
---|---|---|
187 | missing optsize attribute? |
Added minsize func attr, and also added the check for the new vectorization remark, which I had also missed porting to the new test file... :-( Thanks for spotting. The fix for the other assert will be better than this...
Check is redundant, Forced is asserted.
Probably need to have a dummy use of Forced, or avoid defining it, to avoid warning of unused variable in release mode.