This is an archive of the discontinued LLVM Phabricator instance.

[LV] Calculate max feasible scalable VF.
ClosedPublic

Authored by sdesmalen on Mar 12 2021, 7:21 AM.

Details

Summary

This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.

Diff Detail

Event Timeline

sdesmalen created this revision.Mar 12 2021, 7:21 AM
sdesmalen requested review of this revision.Mar 12 2021, 7:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2021, 7:21 AM

Following discussion on D96021, this patch combines functionality in D96021, D96022 and D96023, and is aligns more closely with the approach outlined by @fhahn.
It does mean the patch is quite a bit bigger and therefore more difficult to review. I'm happy to split off parts of it into separate patches if that's useful.

vkmr added a subscriber: vkmr.Mar 15 2021, 3:54 AM
paulwalker-arm added inline comments.Mar 17 2021, 3:19 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5810–5813

This doesn't look particularly clean to my eye. I guess my immediate question is why WidestRegister is not a TypeSize given that would truly represent the size.

The same question can be asked of getRegisterBitWidth but perhaps that is a heavily used function? If so then what about introducing an explicit getVectorRegisterBitWidth that returns TypeSize plus allowing it to take a parameter that specifies whether the query is against fixed or scalable vectors.

I can see why you're asking for ScalableBitsPerBlock because you want to make sure you're able to safely divide this by the largest element count, but here I think using TypeSize.isKnownMultipleOf(SizeOfLargestElt) makes that intent clearer for all vector types.

sdesmalen updated this revision to Diff 331592.Mar 18 2021, 9:42 AM

Rebased onto D98874, which changes getRegisterBitWidth to return a TypeSize.

david-arm added inline comments.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138 ↗(On Diff #331592)

Hi @sdesmalen, to be honest I'm not sure I fully understand this. If the user has set the pragma:

#pragma clang loop vectorize_width(4, scalable)

then haven't they also explicitly disabled fixed width? Maybe it's just the name of the function that confuses me a little, since from the user's perspective it feels like

#pragma clang loop vectorize_width(4, scalable)

and

#pragma clang loop vectorize_width(scalable)

are both disabling fixed width. I thought they were both hints that can be dropped by the compiler if necessary?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1623

Should this be element count, since that's what the function returns here?

5627

Isn't this always going to fire for cases where UserVF is scalable and we cannot vectorise reductions? For example, getMaxLegalScalableVF returns ElementCount::getScalable(0) in this case.

5670

Are you deliberately ignoring the scalable case for now because this will be addressed in a later patch?

5823

You could also do

if (MaxVectorSize.isZero())
c-rhodes added inline comments.Mar 30 2021, 5:27 AM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
126 ↗(On Diff #331592)

nit: explicitly disabled (?)

llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h
177

nit: Reports an informative vectorization message:

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1060–1061

drop failure

5567

nit: comma can be dropped

5664

nit: MaxVF?

5816–5819

MaxVF?

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
185–188

Should this select the feasible scalable VF? I suppose this is related to the issue @david-arm pointed out in LoopVectorizationCostModel::computeFeasibleMaxVF

fhahn added inline comments.Mar 31 2021, 2:16 AM
llvm/include/llvm/Analysis/TargetTransformInfo.h
933 ↗(On Diff #331592)

This is not needed any longer, right?

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138 ↗(On Diff #331592)

Would it be possible to decouple the change to compute the max VFs from the ones adding a new option & tweaking the handling here?

sdesmalen updated this revision to Diff 336093.Apr 8 2021, 7:06 AM
sdesmalen marked 12 inline comments as done.
  • Dropped -scalable-vectorization= LV option from this patch.
  • Addressed other comments.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138 ↗(On Diff #331592)

I've removed this code for now, as I'll have a follow-up patch that adds an option to enable/disable scalable vectorization separately.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5627

The statement above if (ElementCount::isKnownLE(UserVF, MaxSafeUserVF)) return UserVF should make sure that this assert is always true.

5670

Correct!

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
185–188

I don't think it should do that by default. Personally I think it's best if the vectorizer just ignores hints that are not valid, and calculate a better answer instead, rather than blindly opting for something that seems "close enough", but may not be profitable.

That's akin to what happens for fixed-width vectors; if the UserVF is not legal (e.g. VF=16), and VF=8 would be legal, it will pick the most profitable VF <= 8, which isn't necessarily 8. Extending this to scalable vectors, we can argue that if VF=vscale x 8 is not legal, it should consider all VFs (fixed and scalable) up to whatever is legal. We can then add an LV option to favour scalable VFs if we know the cost-model doesn't have all the information to make the right decision (although that's something to consider for another patch).

Thanks for all the comments and feedback!

llvm/include/llvm/Analysis/TargetTransformInfo.h
933 ↗(On Diff #331592)

You're right, I've removed that now.

c-rhodes added inline comments.Apr 13 2021, 3:52 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5565

Flip this and return early?

5582–5584

should this be moved closer to where it's used?

5652–5654

could you add a comment stating this is ignored for now but will be addressed in a later patch?

5819

I know this came before your patch but I find it a bit confusing VF and VectorSize are use interchangeably

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-analysis.ll
2

nit: multiple spaces

32

nit: add newline above

sdesmalen updated this revision to Diff 337246.Apr 13 2021, 1:25 PM
sdesmalen marked 6 inline comments as done.
  • Addressed nits.
  • s/MaxVectorSize/MaxVectorElementCount/ (because it's not actually a size, but an element count).
sdesmalen added inline comments.Apr 13 2021, 1:28 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5582–5584

This is actually moved as close as possible, because the first use is WidestType (line 5591).

5819

I see how this is confusing, especially because the VectorSize is actually not a size, but rather an element count. The subtle distinction between the two is that the VF is more of a loop-vectorizer concept (the number of lanes handled per vector iteration), whereas the MaxVectorSize (or better yet, MaxVectorElementCount) is the maximum number of elements in a target's vector register.

Given that I'm changing a lot of lines in this function, it probably doesn't really make the diff much worse if I at least remove part of the confusion by renaming VectorSize -> VectorElementCount.

c-rhodes accepted this revision.Apr 14 2021, 2:49 AM

LGTM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5582–5584

This is actually moved as close as possible, because the first use is WidestType (line 5591).

This revision is now accepted and ready to land.Apr 14 2021, 2:49 AM
fhahn added inline comments.Apr 16 2021, 2:13 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1623

the other function refer to as VF factor as well, even though they return ElementCount. Using VF (or vectorization factor) seems more consistent with the rest of the code and also more descriptive I think, as it conveys some information on what the intended use is.

1625

The other related functions refer to VF instead of element count in their name. Is there a reason to use ElementCount instead of VF here? Perhaps the name could be made a bit more descriptive in general, including the fact that the function computes the max VF based on target-specific properties?

Matt added a subscriber: Matt.Apr 17 2021, 8:53 AM
sdesmalen updated this revision to Diff 339261.Apr 21 2021, 8:57 AM
  • s/getMaxVectorElementCount/getMaximizedVFForTarget/
  • Removed redundant call to isSafeForAnyVectorWidth
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1625

You're right that using VF is probably better/more consistent, I mostly chose ElementCount so not to avoid it with other MaxVF-related functions. I've now changed it to getMaximizedVFForTarget, does that make sense to you?

I've chosen the term 'Maximized' instead of 'Max' to distinguish it from 'Maximum', as in:
The LV tries to Maximize the VF based on the target's register width, but with a Maximum of MaxSafeVF.

I think I've addressed all comments now. Unless there are any other, I'll commit the patch in the coming days.

fhahn accepted this revision.Apr 27 2021, 1:40 AM

LGTM, thanks for the updates!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1625

Sounds good to me, but I am no expert of the English language :)

This revision was landed with ongoing or failed builds.Apr 28 2021, 4:33 AM
This revision was automatically updated to reflect the committed changes.

Thank you for all the input/feedback/reviews @fhahn and @c-rhodes!

Reverting this patch locally resolves the following failure (and many more) for me:
https://lab.llvm.org/buildbot/#/builders/140/builds/2/steps/5/logs/FAIL__Clang__loop-vectorize_c

Trying to figure out if this patch is running into UB. Disabling optimizations in the build seems to resolve these failures.

Reverting this patch locally resolves the following failure (and many more) for me:
https://lab.llvm.org/buildbot/#/builders/140/builds/2/steps/5/logs/FAIL__Clang__loop-vectorize_c

Trying to figure out if this patch is running into UB. Disabling optimizations in the build seems to resolve these failure

Hi @hubert.reinterpretcast, I've built this patch with UBSan but that didn't point out any issues. Do you have any suggestions on how to reproduce this locally? (I see that buildbot is using some IBM xlclang (proprietary?) compiler to build)

Hi @hubert.reinterpretcast, I've built this patch with UBSan but that didn't point out any issues. Do you have any suggestions on how to reproduce this locally? (I see that buildbot is using some IBM xlclang (proprietary?) compiler to build)

I'm actively working on diagnosing this. Would reverting this patch for a few days be much trouble for you? We're hoping to keep our CI green so it's easier to notice new problems.

Hi @hubert.reinterpretcast, I've built this patch with UBSan but that didn't point out any issues. Do you have any suggestions on how to reproduce this locally? (I see that buildbot is using some IBM xlclang (proprietary?) compiler to build)

I'm actively working on diagnosing this.

Thanks for looking into this!

Would reverting this patch for a few days be much trouble for you? We're hoping to keep our CI green so it's easier to notice new problems.

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Would reverting this patch for a few days be much trouble for you? We're hoping to keep our CI green so it's easier to notice new problems.

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

Maybe a paranoid step, but I confirmed that the impact is from the compilation of lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/LoopVectorize.cpp.o. Proceeding with cutting down from there.

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2b413fc4950..ec3eb32cbe6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1639,7 +1639,7 @@ private:
   ElementCount getMaximizedVFForTarget(unsigned ConstTripCount,
                                        unsigned SmallestType,
                                        unsigned WidestType,
-                                       ElementCount MaxSafeVF);
+                                       const ElementCount &MaxSafeVF);

   /// \return the maximum legal scalable VF, based on the safe max number
   /// of elements.
@@ -5863,7 +5863,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {

 ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
     unsigned ConstTripCount, unsigned SmallestType, unsigned WidestType,
-    ElementCount MaxSafeVF) {
+    const ElementCount &MaxSafeVF) {
   bool ComputeScalableMaxVF = MaxSafeVF.isScalable();
   TypeSize WidestRegister = TTI.getRegisterBitWidth(
       ComputeScalableMaxVF ? TargetTransformInfo::RGK_ScalableVector

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

Hi @hubert.reinterpretcast, thanks for your investigation so far! I have gone through the code several times, but don't really understand why your suggestion fixes the issue. I've also built it with both GCC and Clang, with- and without sanitizers, and with valgrind.

Having said that, I'd be okay with applying your suggestion for the sake of re-landing the patch and not breaking a specific buildbot, but I would like to aim for a concrete date to remove the work-around. What do you think is sensible here?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5808

@hubert.reinterpretcast Here MinVF returns a const ElementCount & instead of ElementCount, not sure if that could make a difference?

fhahn added a comment.May 4 2021, 7:53 AM

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

@hubert.reinterpretcast can you provide an IR test case that triggers the failure with sanitizers?

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

@hubert.reinterpretcast can you provide an IR test case that triggers the failure with sanitizers?

The build compiler on the specific bot (due to what is currently shipping on the platform) is a proprietary compiler that does not use LLVM IR. We have not reached that point where we can confirm that this is a miscompile issue in the build compiler, but I think there is every chance that would be the conclusion (now that the reduction has reached the point where it is -- we have also encountered latent UB in LLVM source in the past). Prior issues of this sort have taken around two weeks to fix after it is reported to the appropriate team (which is delayed while I'm putting together a reduction). I had spent my whole weekend narrowing this down. I am not sure how much time I will have during the week to make further progress. I think we will be able to deploy a fix to the buildbot in early June.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

Hi @hubert.reinterpretcast, thanks for your investigation so far! I have gone through the code several times, but don't really understand why your suggestion fixes the issue. I've also built it with both GCC and Clang, with- and without sanitizers, and with valgrind.

Thanks for looking into this from your end. I feel terrible that I wasn't more clear that a build compiler issue is looking to be likely.

Thanks for looking into this from your end. I feel terrible that I wasn't more clear that a build compiler issue is looking to be likely.

No worries, at least we have come up with a practical workaround for now. Please do let me know when this issue will be fixed in the compiler used by Buildbot, so that I can remove the workaround.

Thanks for all the effort you have put into investigating this!

MaskRay added a subscriber: MaskRay.EditedAug 12 2021, 2:38 PM

Disabling scalable vectorization, because target does not support scalable vectors. should not be emitted for non-aarch64 non-riscv.
Probably also a good idea not to emit it when sve is disabled on aarch64.

% cat a.cc
float A[128], B[128], C[128];

float foo() {
  float sum = 0;
  for (int i = 0; i < 128; i++)
    C[i] = A[i] + B[i];
  return sum;
}
% clang++ -O2 --target=x86_64-linux-gnu -S a.cc -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
a.cc:5:3: remark: Disabling scalable vectorization, because target does not support scalable vectors. [-Rpass-analysis=loop-vectorize]
  for (int i = 0; i < 128; i++)
  ^
a.cc:5:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]

Patch: D108004