This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold
ClosedPublic

Authored by TiehuZhang on Mar 21 2022, 4:41 AM.

Details

Summary

The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases.

Diff Detail

Event Timeline

TiehuZhang created this revision.Mar 21 2022, 4:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 4:41 AM
TiehuZhang requested review of this revision.Mar 21 2022, 4:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 4:41 AM
dmgreen added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7474

isTooManyRuntimeChecks -> hasTooManyRuntimeChecks ?

7548–7549

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
1 ↗(On Diff #416894)

If you use -debug, it needs to REQUIRES: asserts
It may be a simpler test to just show that the codegen is not vectorized/interleaved though, with a comment explaining that it would be too much overhead.

3 ↗(On Diff #416894)

CHECK-LABEL

10 ↗(On Diff #416894)

There are quite a lot of extra blocks in this test. Can a lot of them be removed?

fhahn added inline comments.Mar 23 2022, 2:28 AM
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
76 ↗(On Diff #416894)

Please avoid using undef in the test unless necessary.

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.

Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7476

Should this be checking Hints.allowReordering like the other uses of PragmaVectorizeMemoryCheckThreshold?

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.

Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.

  1. Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?<br>
  2. I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7548–7549

Probably to reduce compilation time (although the impact is small) ?

llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
1 ↗(On Diff #416894)

Done, thanks very much!

76 ↗(On Diff #416894)

Done, thanks very much!

dmgreen accepted this revision.Mar 31 2022, 4:25 AM
  1. Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?<br>

Yeah - it might need to pass Requirements.getNumRuntimePointerChecks() through to selectInterleaveCount. I think it's OK as you have it right now, but Florian (or anyone else) can complain if they disagree.

  1. I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.

I was just hoping that if it got to that block then it would prevent the interleaving. It looks like it doesn't work like that though.

LGTM. Perhaps wait a day or two in case anyone else has comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7548–7549

Maybe just use a single if now: if (SelectedVF.Width.getKnownMinValue() > 1 && hasTooManyRuntimeChecks()) {

This revision is now accepted and ready to land.Mar 31 2022, 4:25 AM
fhahn added inline comments.Mar 31 2022, 2:51 PM
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
323

It would be good to document the helper. Also, requiresTooManyRuntimeChecks may be slightly better, because has seems to imply that the checks are already there to me.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10458

This check here should be sufficient, there should be no need to also check in selectInterleaveCount.

Could you just move the remark generation & early exit from ::plan here?

You might want to skip those checks if there's a UserVF or UserIC used, with those I think we should always vectorize if possible. It also might be good to add a check line to your test which forces an interleave count > 1.

llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
1 ↗(On Diff #419415)

As this relies on the PPC cost-model, this needs to go into the PowerPC subdirectory.

12 ↗(On Diff #419415)

nit: can change %z_e_8676_1962 to i64 and remove the first load. Also, names could be tidied up a bit.

13 ↗(On Diff #419415)

nit: it would be good to clean up the names of the blocks a bit more.

  1. Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?<br>

Yeah - it might need to pass Requirements.getNumRuntimePointerChecks() through to selectInterleaveCount. I think it's OK as you have it right now, but Florian (or anyone else) can complain if they disagree.

  1. I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.

I was just hoping that if it got to that block then it would prevent the interleaving. It looks like it doesn't work like that though.

LGTM. Perhaps wait a day or two in case anyone else has comments.

I think SelectedVF.Width.getKnownMinValue() > 1 is used to avoid the check when VF is scalar, but ElementCount::isScalar() seems better.

I don't think the position for the runtimeCheck is good.
Now, the code looks like:
1.If UserVF is OK, do vectorization using UserVF
2.Populate VFCandidates
3.Collect information for vplans
4.Build vplans for all vf candidates and select best VF
5.Check the number of runtime checks for selected VF if it's scalar. If it's too many, stop vectorization.

15234 would be a better order?
1.If UserVF is OK, do vectorization using UserVF
5.Check the number of runtime checks(No need to consider about the VF because the number of runtime checks are same for all VF). If it's too many, stop vectorization.
2.Populate VFCandidates
3.Collect information for vplans
4.Build vplans for all vf candidates

I'm not sure of my opinion. If it's wrong, happy to get a response :)

TiehuZhang updated this revision to Diff 419716.Apr 1 2022, 4:41 AM
fhahn added a comment.Apr 4 2022, 2:00 PM

Thanks for the update!

I don't think the position for the runtimeCheck is good.
Now, the code looks like:
1.If UserVF is OK, do vectorization using UserVF
2.Populate VFCandidates
3.Collect information for vplans
4.Build vplans for all vf candidates and select best VF
5.Check the number of runtime checks for selected VF if it's scalar. If it's too many, stop vectorization.

15234 would be a better order?
1.If UserVF is OK, do vectorization using UserVF
5.Check the number of runtime checks(No need to consider about the VF because the number of runtime checks are same for all VF). If it's too many, stop vectorization.
2.Populate VFCandidates
3.Collect information for vplans
4.Build vplans for all vf candidates

I'm not sure of my opinion. If it's wrong, happy to get a response :)

It's indeed not at the most optimal position focused solely at compile-time. But I think for this patch, I think it would be good to roughly keep roughly the original order, especially if we manage to check & error at a single place.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10458

Can the handling be merged into a single check & diagnostic?

TiehuZhang updated this revision to Diff 420798.Apr 6 2022, 5:04 AM
TiehuZhang updated this revision to Diff 427300.May 5 2022, 6:31 AM

Fix the failed case (optimization-remark-options.c), because the remark info should be updated

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 6:31 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
TiehuZhang edited reviewers, added: Weiwei-2021; removed: weiwei.
TiehuZhang removed a reviewer: Weiwei-2021.

(Updated)
Difference with accepted version: Move memory runtime checks to processLoop to control both VF and IC

The code has been updated since accept. Please review it again. Thank you very much! @fhahn @dmgreen

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7548–7549

Done. Thanks for your review!

10458

Hi, @fhahn, thanks for your review! It sounds similar to doesNotMeet (https://reviews.llvm.org/D98634), but the difference is that I need to use UserIC and UserVF to control whether this check needs to be performed, right? E.g.

if (!UserVF && LVP.requiresTooManyRuntimeChecks()) {
  /*generate remarks*/
  VF = VectorizationFactor::Disabled();
}

if (!UserIC && LVP.requiresTooManyRuntimeChecks()) {
  /*generate remarks*/
  IC = 1;
}

Could you just move the remark generation & early exit from ::plan here?

You might want to skip those checks if there's a UserVF or UserIC used, with those I think we should always vectorize if possible. It also might be good to add a check line to your test which forces an interleave count > 1.

10458

Hi,@fhahn, thanks for your reply! Does the current version meet the requirements?

10458

Hi, @fhahn, is there any other problem with this patch?

ping

fhahn accepted this revision.May 12 2022, 8:46 AM

LGTM with additional suggestions inline, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10462

I think we usually try to use early exits to reduce the indentation, so it might be worth doing something like

if (LVP.requiresTooManyRuntimeChecks()) {
 ORE->emit([&]() {
        return OptimizationRemarkAnalysisAliasing(
                   DEBUG_TYPE, "CantReorderMemOps", L->getStartLoc(),
                   L->getHeader())
               << "loop not vectorized: cannot prove it is safe to reorder "
                  "memory operations";
      });
      LLVM_DEBUG(dbgs() << "LV: Too many memory checks needed.\n");
      Hints.emitRemarkWithHints();
      return false;
}

// Select the interleave count.
 IC = CM.selectInterleaveCount(VF.Width, *VF.Cost.getValue());

(this has the added benefit of not checking for !LVP.requiresTooManyRuntimeChecks() but the unnegated version, which is slightly more straight forward)

llvm/test/Transforms/LoopVectorize/PowerPC/interleaved-pointer-runtime-check-unprofitable.ll
3

Might be good to precommit the test case and then just show the difference in this diff (without the fix ; CHECK: vector.memcheck)

fhahn accepted this revision.May 16 2022, 12:40 AM

Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
324

nit: could be turned into const if possible.

TiehuZhang added a comment.EditedMay 16 2022, 1:30 AM

Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.

Thanks, @fhahn! I'll turn the function into const and add the precommit test when committing the patch

This revision was landed with ongoing or failed builds.May 19 2022, 8:29 AM
This revision was automatically updated to reflect the committed changes.
Allen added a subscriber: Allen.May 21 2022, 10:30 PM