This is an archive of the discontinued LLVM Phabricator instance.

[LV] Forced vectorization with runtime checks and OptForSize
ClosedPublic

Authored by SjoerdMeijer on Sep 19 2019, 7:43 AM.

Details

Summary

When vectorization is forced with a pragma, we optimise for min size, and we need to emit runtime memory checks, then allow this code growth and don't run in an assert like we currently do.

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.Sep 19 2019, 7:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2019, 7:43 AM

It fixes the problem I reported. Thanks!

Ayal added a comment.Sep 20 2019, 9:12 PM

When we optimise for size, and need to emit runtime checks to disambiguate memory, we should nicely bail, don't vectorise, and not run in an assert like we currently do.

The logic currently in place to do so is to call LVP.plan() which calls computeMaxVF(), which does

// Bail if runtime checks are required, which are not good when optimising
// for size.
if (runtimeChecksRequired())
  return None;

Does this need to change, presumably move earlier?

Thanks. The runtime check is now done earlier.

Ayal added a comment.Sep 21 2019, 8:05 AM

Which commit does this trace back to, D65197? Why is the current logic placed too late, and should it be removed if moved earlier as suggested? Add a reference to the problem @uabelho reported.

Which commit does this trace back to, D65197?

This is the result of D65197 and D66803. The former was the functional change, the latter changed the assert that is now triggered.

Add a reference to the problem @uabelho reported.

There is no corresponding PR for this. The problem reported by @uabelho is added here as a test-case/regression test.

Why is the current logic placed too late, and should it be removed if moved earlier as suggested?

Initially I didn't understand this remark, because I thought I had moved the new logic to happen earlier. But having thought about it, I think your question actually is why the existing logic doesn't work, why it doesn't capture this case as it looks like it should. Thus, this extra bit of logic added in this patch shouldn't be necessary, at least not in this form.

And I agree with that. I think in the existing logic there's a F->hasOptSize() check missing. I am going to look again...

Ah, I had completely missed that this is about the interaction of OptForSize and the loop.vectorize loop hint!
Vectorization is forced, which means that we vectorize even though OptForSize is set and we need to emit runtime checks.

The symptom is that we run in this assert in emitMemRuntimeChecks():

assert(!BB->getParent()->hasOptSize() &&
     "Cannot emit memory checks when optimizing for size");

In getScalarEpilogueLowering(), we do not set CM_ScalarEpilogueNotAllowedOptSize because of the loop hint:

if (Hints.getForce() != LoopVectorizeHints::FK_Enabled &&
    (F->hasOptSize() ||
     llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI))) {
  SEL = CM_ScalarEpilogueNotAllowedOptSize;

And because CM_ScalarEpilogueNotAllowedOptSize is not set, we do not check runtimeChecksRequired, so that we end up with code growth under OptForSize because of the loop hint.

If we agree that this is desired behaviour, which looks reasonable to me because of this condition, I am going to remove the assert as that is not valid.

SjoerdMeijer retitled this revision from [LV] Runtime checks with OptForSize to [LV] Forced vectorization with runtime checks and OptForSize.
SjoerdMeijer edited the summary of this revision. (Show Details)
Ayal added a comment.Sep 21 2019, 5:41 PM

Ah, I had completely missed that this is about the interaction of OptForSize and the loop.vectorize loop hint!
Vectorization is forced, which means that we vectorize even though OptForSize is set and we need to emit runtime checks.

The symptom is that we run in this assert in emitMemRuntimeChecks():

assert(!BB->getParent()->hasOptSize() &&
     "Cannot emit memory checks when optimizing for size");

In getScalarEpilogueLowering(), we do not set CM_ScalarEpilogueNotAllowedOptSize because of the loop hint:

if (Hints.getForce() != LoopVectorizeHints::FK_Enabled &&
    (F->hasOptSize() ||
     llvm::shouldOptimizeForSize(L->getHeader(), PSI, BFI))) {
  SEL = CM_ScalarEpilogueNotAllowedOptSize;

And because CM_ScalarEpilogueNotAllowedOptSize is not set, we do not check runtimeChecksRequired, so that we end up with code growth under OptForSize because of the loop hint.

If we agree that this is desired behaviour, which looks reasonable to me because of this condition, I am going to remove the assert as that is not valid.

Better to make the assert valid by checking !forcedToVectorize in addition to !OptForSize.

Ayal added a comment.Sep 21 2019, 5:43 PM

Better to make the assert valid by checking !forcedToVectorize in addition to !OptForSize.

and also dump some warning.

Restored the assert and expanded it with the !forcedToVectorize check

Ayal added a comment.Sep 22 2019, 9:15 PM

So starting from D65197 or so, a loop that is forced to vectorize in a function that is OptForSize, is expected to get vectorized even if it involves replicating the loop.
Another assert which presumably requires similar update is the "Cannot SCEV check stride or overflow when optimizing for size" one.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
315 ↗(On Diff #221211)

Method itself should be const.
Could alternatively get Hints from CostModel.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2738 ↗(On Diff #221211)

Assert message should say it all, i.e., "... when optimizing for size unless forced to vectorize"; above documentation seems redundant.

In case we are forced to vectorize, optimize for size, and end up replicating the loop due to runtime checks, we should issue an LLVM_DEBUG/ORE message, informing the programmer code bloat may be reduced by removing the forced vectorization, or perhaps by modifying the code s.t. runtime checks are not needed (e.g., adding "restrict").

llvm/test/Transforms/LoopVectorize/runtime-check-optsize.ll
1 ↗(On Diff #221211)

Suffice to run loop-vectorize only, instead of -O1?

Would be good to add this test to a relevant file, instead of dedicating a new file for each test.

All comments addressed.

Another assert which presumably requires similar update is the "Cannot SCEV check stride or overflow when optimizing for size" one.

Yep, you're exactly right. Fallout from the same work.
We have a PR for that one, https://bugs.llvm.org/show_bug.cgi?id=43371, which I will give exactly the same treatment as soon as this change is accepted.

Ayal added inline comments.Sep 23 2019, 7:35 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2737 ↗(On Diff #221322)

Check is redundant, Forced is asserted.

Probably need to have a dummy use of Forced, or avoid defining it, to avoid warning of unused variable in release mode.

llvm/test/Transforms/LoopVectorize/runtime-check.ll
185 ↗(On Diff #221322)

This check line belongs to previous test

sorry, forgot to clean that up, keep the original order

Ayal added inline comments.Sep 23 2019, 11:52 AM
llvm/test/Transforms/LoopVectorize/runtime-check.ll
163 ↗(On Diff #221339)

missing optsize attribute?

Added minsize func attr, and also added the check for the new vectorization remark, which I had also missed porting to the new test file... :-( Thanks for spotting. The fix for the other assert will be better than this...

Ayal accepted this revision.Sep 23 2019, 5:36 PM

This LGTM, thanks for fixing!

This revision is now accepted and ready to land.Sep 23 2019, 5:36 PM

Many thanks for reviewing!
Will now look at PR43371

This revision was automatically updated to reflect the committed changes.