This is an archive of the discontinued LLVM Phabricator instance.

[LV] Emitting SCEV checks with OptForSize
ClosedPublic

Authored by SjoerdMeijer on Sep 26 2019, 5:38 AM.

Details

Summary

This is addressing PR43371. When optimizing for size, and SCEV runtime checks
need to be emitted to check overflow behaviour, the loop vectorizer can run in this
assert:

  
LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
&& "Cannot SCEV check stride or overflow when optimizing for size"'

SCEV should not generate predicates while optimising for size, because
code will be generated for predicates, such as these SCEV overflow runtime
checks.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Sep 26 2019, 5:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2019, 5:38 AM
Ayal added a comment.Sep 26 2019, 2:44 PM

At this point, PSE.getUnionPredicate().getPredicates() returns 0

Can you track where/when the predicate is generated and added to PSE?

The original bailout in LoopVectorizationLegality::canVectorize() checking if (PSE.getUnionPredicate().getComplexity() > SCEVThreshold) could potentially be extended - essentially zeroing SCEVThreshold for hasOptSize, but this takes place *earlier* than runtimeChecksRequired() which is claimed to be too early, contrary to the following claim:

// Okay! We've done all the tests. If any have failed, return false. Otherwise
// we can vectorize, and at this point we don't have any other mem analysis
// which may limit our maximum vectorization factor, so just return true with
// no restrictions.

The latest bailout is probably LoopVectorizationCostModel::selectVectorizationFactor(), but it's better to bailout earlier.

BTW, a related opportunity is // FIXME: Avoid specializing for stride==1 instead of bailing out.

SjoerdMeijer updated this revision to Diff 222117.EditedSep 27 2019, 3:23 AM
SjoerdMeijer edited the summary of this revision. (Show Details)

Thanks for all the pointers and info, that helped me getting back on track.
Hopefully things are actually as simple as the change in this patch (this passes check-all).

Ayal added inline comments.Sep 28 2019, 8:00 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6378 ↗(On Diff #222117)

Better try to fix the issue at its core; isConsecutivePtr() should be conservative and avoid generating predicates under hasOptSize, by preventing getPtrStride() from doing so.
This may lead to successfully vectorizing the loop, albeit with a gather/scatter instead of a (speculative) vector load/store.

llvm/test/Transforms/LoopVectorize/pr43371-optsize-scevchecks.ll
5 ↗(On Diff #222117)

Better try to augment existing optsize files with additional tests.

SjoerdMeijer updated this revision to Diff 222449.EditedSep 30 2019, 9:14 AM
SjoerdMeijer edited the summary of this revision. (Show Details)
  • modified SCEV so that it doesn't add Predicates when OptForSize is enabled. This has indeed the consequence that vectorisation will happen, it will scalarise the stores.
  • moved the test case to an exisiting optsize file.
SjoerdMeijer retitled this revision from [LV] Emitting SCEV checks with OptForSize to [SCEV] Don't add Predicates with OptForSize.Sep 30 2019, 9:19 AM
SjoerdMeijer edited the summary of this revision. (Show Details)

Apologies for the early ping and for the impatience! But I am just really keen on getting this fixed. :-)

Ayal added a subscriber: sanjoy.Oct 4 2019, 8:05 AM

Changing PredicatedScalarEvolution::getAsAddRec() to consider hasOptSize sounds a bit strange to me; after all the transformation that created PSCEV should have considered this; adding @sanjoy for more thoughts on such a proposed SCEV change.

The earlier "fix at the core" thought was for LV to do something along these lines:

@@ -409,7 +409,9 @@ int LoopVectorizationLegality::isConsecutivePtr(Value *Ptr) {
   const ValueToValueMap &Strides =
       getSymbolicStrides() ? *getSymbolicStrides() : ValueToValueMap();
 
-  int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides, true, false);
+  bool CanAddPredicates = !TheLoop->getHeader()->getParent()->hasOptSize();
+  int Stride =
+      getPtrStride(PSE, Ptr, TheLoop, Strides, CanAddPredicates, false);
   if (Stride == 1 || Stride == -1)
     return Stride;
   return 0;

and the related "BTW, a related opportunity is // FIXME: Avoid specializing for stride==1 instead of bailing out" was for LAI to do something along these lines:

@@ -2295,6 +2295,11 @@ void LoopAccessInfo::collectStridedAccess(Value *MemAccess) {
                        "versioning:");
   LLVM_DEBUG(dbgs() << "  Ptr: " << *Ptr << " Stride: " << *Stride << "\n");
 
+  if (TheLoop->getHeader()->getParent()->hasOptSize()) {
+    /* issue an LLVM_DEBUG message that versioning is avoided due to opt-for-size */
+    return;
+  }
+
   // Avoid adding the "Stride == 1" predicate when we know that
   // Stride >= Trip-Count. Such a predicate will effectively optimize a single
   // or zero iteration loop, as Trip-Count <= Stride == 1.

The earlier "fix at the core" thought was for LV to do something along these lines:

Okay, sorry for misunderstanding! The "core" for me was the place that produces predicates in the first place, which is happening by in PSE.getAsAddRec(Ptr) called by getPtrStride(). It was my understanding and line of thought that not producing them, return nullptr for getAsAddRec(), was the most clean and most minimal change, but I am not insisting at all of course! So thanks for your suggestions, I will prepare a patch based on this. If @sanjoy in the mean time does think this is an interesting change, then I could change again.

Thanks again for the help

I agree with @Ayal about changing getAsAddRec for OptForSize. In general, it's better to not "Analyze" if we know we won't be using the result of analysis.

In this regard, I do not like pass managers running Analyses for a Transformation pass w/o first letting the Transformation pass inspect the incoming IR, but that's a totally different discussion and I don't have a solution for that problem. Is new pass manager any better there?

sanjoy added a comment.Oct 4 2019, 3:26 PM

I agree with @Ayal about changing getAsAddRec for OptForSize. In general, it's better to not "Analyze" if we know we won't be using the result of analysis.

+1

I vote against having SCEV depend on OptForSize.

In this regard, I do not like pass managers running Analyses for a Transformation pass w/o first letting the Transformation pass inspect the incoming IR, but that's a totally different discussion and I don't have a solution for that problem.

Maybe I'm misunderstanding, but SCEV is lazy so if a transform looks at the IR and decides not to ask SCEV for trip count or call getSCEV then SCEV will not do any actual work.

Is new pass manager any better there?

hsaito added a comment.Oct 4 2019, 3:42 PM
>> In this regard, I do not like pass managers running Analyses for a Transformation pass w/o first letting the Transformation pass inspect the incoming IR, but that's a totally different discussion and I don't have a solution for that problem.

Maybe I'm misunderstanding, but SCEV is lazy so if a transform looks at the IR and decides not to ask SCEV for trip count or call getSCEV then SCEV will not do any actual work.

Not all analyses are lazy, right? For example, if all loops in the function have some structural issues that vectorizer doesn't like to work on, there aren't any reasons why we'd need to satisfy required analyses consumed only after that bail out point. Maybe, we just need to make many analyses lazy, i.e., not pass manager's problem.

SjoerdMeijer retitled this revision from [SCEV] Don't add Predicates with OptForSize to [LV] Emitting SCEV checks with OptForSize.

Thanks for your comments and thoughts.

This adds/implements @Ayal 's suggestion.

Ayal added a comment.Oct 5 2019, 11:09 AM

The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch, being an optimization opportunity rather than preventing an assert -- it's stride predicate is added early enough to be caught by CM.computeMaxVF()'s bailout conditions and avoid assert, as scev4stride1() test in above X86/optsize.ll checks.

llvm/test/Transforms/LoopVectorize/X86/optsize.ll
7

Can one or both existing RUN's above be (re)used instead of adding a third?

201

run in an >> run into an

while >> due to

232

simplify the test: entry block can branch directly to the loop header block.

240

simplify the test: merge the two blocks of the loop into one.

Comments addressed:

  • cleaned up the test case a bit.
  • couldn't reuse an existing run-line, I guess because of -mcpu=skx, but a separate run line seems fine to me.

The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch

Thanks for that suggestions, and I will address this separately. I have unfinished business in the vectorizer, and will add this to me my list of things to do next.

Ayal added a comment.Oct 8 2019, 2:27 PM

Comments addressed:

  • cleaned up the test case a bit.

Can be cleaned up a bit more, "header" block is still redundant.

  • couldn't reuse an existing run-line, I guess because of -mcpu=skx, but a separate run line seems fine to me.

Interesting. Better to check instead of guess. This distinction means that

  1. @pr43371() testcase belongs in test/Transforms/LoopVectorize/optsize.ll instead of test/Transforms/LoopVectorize/X86/optsize.ll
  2. InterleavedAccessInfo::collectConstStrideAccesses() in Analysis/VectorUtils.cpp is responsible for creating the predicate under skx early enough to bailout (despite having no interleave groups eventually). Best teach it too to call getPtrStride() with Assume=!hasOptSize instead of 'true'. This is another performance opportunity rather than prevention of asserts.

The other suggested fix in LoopAccessInfo::collectStridedAccess() indeed deserves a separate patch

Thanks for that suggestions, and I will address this separately. I have unfinished business in the vectorizer, and will add this to me my list of things to do next.

Cheers, moved the test back to directory LoopVectorize where it once was instead of LoopVectorize\X86 (which was indeed a mistake, and then added the extra runline out of frustration as the skx core was unknown to me).

My TODO list has grown to fixing up:

  1. InterleavedAccessInfo::collectConstStrideAccesses()
  2. LoopAccessInfo::collectStridedAccess()

Will follow up on this shortly.

Ayal accepted this revision.Oct 9 2019, 5:47 AM

This LGTM, with additional CHECK-NOT to the test, thanks!

llvm/test/Transforms/LoopVectorize/optsize.ll
100 ↗(On Diff #224012)

Better also verify here that no SCEV predicates get generated, e.g., "CHECK-NOT: vector.scevcheck" as in pr39417-optsize-scevchecks.ll, or CHECK-NOT the predicates themselves as in pr34681.ll.

This revision is now accepted and ready to land.Oct 9 2019, 5:47 AM

Many thanks again for reviewing. I will add the check-not before committing, and as I said, I will follow up soon to address the other improvements opportunities.

This revision was automatically updated to reflect the committed changes.