This is an archive of the discontinued LLVM Phabricator instance.

Differential D98509

[LV] Calculate max feasible scalable VF.
ClosedPublic

Authored by sdesmalen on Mar 12 2021, 7:21 AM.

Details

Reviewers

c-rhodes
fhahn
dmgreen
gilr
evandro
paulwalker-arm

Commits

rG584e9b6e4b49: [LV] Calculate max feasible scalable VF.

Summary

This patch also refactors the way the feasible max VF is calculated,
although this is NFC for fixed-width vectors.

After this change scalable VF hints are no longer truncated/clamped
to a shorter scalable VF, nor does it drop the 'scalable flag' from
the suggested VF to vectorize with a similar VF that is fixed.

Instead, the hint is ignored which means the vectorizer is free
to find a more suitable VF, using the CostModel to determine the
best possible VF.
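
The hint handling described above can be illustrated with a minimal, self-contained sketch. It does not use LLVM's `ElementCount` class: the `VF` struct and the `isUserVFUsable` helper are hypothetical names introduced only for illustration, and the maximum safe lane counts are assumed to have been computed elsewhere.

```cpp
// Hypothetical stand-in for a vectorization factor: MinLanes lanes,
// multiplied by the runtime vscale when Scalable is true.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Returns true when the user's hint can be used as-is. A false result means
// the hint is ignored entirely: it is neither clamped to a smaller scalable
// VF nor converted to a fixed-width VF of similar size.
bool isUserVFUsable(VF UserVF, unsigned MaxSafeFixedLanes,
                    unsigned MaxSafeScalableLanes) {
  // Compare the hint only against the limit of its own kind.
  unsigned Limit = UserVF.Scalable ? MaxSafeScalableLanes : MaxSafeFixedLanes;
  return UserVF.MinLanes != 0 && UserVF.MinLanes <= Limit;
}
```

When the helper returns false, the caller would fall back to letting the cost model choose among all legal VFs, which is the behaviour the summary describes.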

Event Timeline

sdesmalen created this revision.Mar 12 2021, 7:21 AM

Herald added a subscriber: hiraditya. Mar 12 2021, 7:21 AM

sdesmalen requested review of this revision.Mar 12 2021, 7:21 AM

Herald added a project: Restricted Project. Mar 12 2021, 7:21 AM

Herald added a subscriber: llvm-commits.

sdesmalen mentioned this in D96021: [LoopVectorize] NFC: Move UserVF feasibility checks to separate function..Mar 12 2021, 7:30 AM

sdesmalen added reviewers: c-rhodes, fhahn, dmgreen, gilr, evandro.Mar 12 2021, 7:33 AM

sdesmalen mentioned this in D96022: [LoopVectorize] NFC: Split off clamping from computeFeasibleUserVF into its own function..

sdesmalen mentioned this in D96023: [LoopVectorize] Calculate Max Feasible Scalable VF..Mar 12 2021, 7:35 AM

Following discussion on D96021, this patch combines functionality in D96021, D96022 and D96023, and aligns more closely with the approach outlined by @fhahn.
It does mean the patch is quite a bit bigger and therefore more difficult to review. I'm happy to split off parts of it into separate patches if that's useful.

sdesmalen mentioned this in D96546: [LoopVectorize] NFCI: BuildVPlansWithVPRecipes to include ScalableVFs..Mar 12 2021, 7:40 AM

sdesmalen mentioned this in D96025: [LoopVectorize] Return both fixed and scalable Max VF from computeMaxVF..

Harbormaster completed remote builds in B93499: Diff 330245.Mar 12 2021, 9:10 AM

vkmr added a subscriber: vkmr.Mar 15 2021, 3:54 AM

sdesmalen added a child revision: D98721: [LV] NFC: Return both fixed and scalable Max VF from computeMaxVF..Mar 16 2021, 9:28 AM

sdesmalen added a reviewer: paulwalker-arm.

paulwalker-arm added inline comments.Mar 17 2021, 3:19 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5818–5821	This doesn't look particularly clean to my eye. I guess my immediate question is why `WidestRegister` is not a `TypeSize` given that would truly represent the size. The same question can be asked of `getRegisterBitWidth` but perhaps that is a heavily used function? If so then what about introducing an explicit `getVectorRegisterBitWidth` that returns TypeSize plus allowing it to take a parameter that specifies whether the query is against fixed or scalable vectors. I can see why you're asking for `ScalableBitsPerBlock` because you want to make sure you're able to safely divide this by the largest element count, but here I think using `TypeSize.isKnownMultipleOf(SizeOfLargestElt)` makes that intent clearer for all vector types.

Rebased onto D98874, which changes getRegisterBitWidth to return a TypeSize.
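
To make the `isKnownMultipleOf` suggestion concrete, here is a small sketch under stated assumptions. `BitSize` is a hypothetical, simplified model of a possibly-scalable size and is not LLVM's `TypeSize`; `minElementsPerRegister` is likewise an illustrative helper, not a function from the patch.

```cpp
#include <cstdint>

// Hypothetical model of a register size in bits: the real size is
// MinBits * vscale when Scalable is true, and exactly MinBits otherwise.
struct BitSize {
  uint64_t MinBits;
  bool Scalable;

  // True when the size is a multiple of RHS for every vscale >= 1. That holds
  // whenever the known minimum is itself a multiple of RHS.
  bool isKnownMultipleOf(uint64_t RHS) const {
    return RHS != 0 && MinBits % RHS == 0;
  }
};

// Number of ElementBits-wide elements that fit in the register. For a scalable
// register this is the count per unit of vscale. Returns 0 when the division
// is not known to be exact.
uint64_t minElementsPerRegister(BitSize Register, uint64_t ElementBits) {
  return Register.isKnownMultipleOf(ElementBits)
             ? Register.MinBits / ElementBits
             : 0;
}
```

The same check works for fixed and scalable sizes, which is the point of the review comment: the intent (a safe, exact division) is stated once for all vector types.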

sdesmalen marked an inline comment as done.Mar 18 2021, 9:42 AM

sdesmalen added a parent revision: D98874: [TTI] Return a TypeSize from getRegisterBitWidth..

Harbormaster completed remote builds in B94485: Diff 331592.Mar 18 2021, 10:21 AM

david-arm added a subscriber: david-arm.Mar 26 2021, 9:15 AM

david-arm added inline comments.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138 ↗	(On Diff #331592)	Hi @sdesmalen, to be honest I'm not sure I fully understand this. If the user has set the pragma: #pragma clang loop vectorize_width(4, scalable) then haven't they also explicitly disabled fixed width? Maybe it's just the name of the function that confuses me a little, since from the user's perspective it feels like #pragma clang loop vectorize_width(4, scalable) and #pragma clang loop vectorize_width(scalable) are both disabling fixed width. I thought they were both hints that can be dropped by the compiler if necessary?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1624	Should this be `element count`, since that's what the function returns here?
5641	Isn't this always going to fire for cases where UserVF is scalable and we cannot vectorise reductions? For example, getMaxLegalScalableVF returns ElementCount::getScalable(0) in this case.
5684	Are you deliberately ignoring the scalable case for now because this will be addressed in a later patch?
5839	You could also do if (MaxVectorSize.isZero())

c-rhodes added inline comments.Mar 30 2021, 5:27 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
126 ↗	(On Diff #331592)	nit: explicitly disabled (?)
llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h
177	nit: `Reports an informative vectorization message:`
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1067–1068	drop failure
5581	nit: comma can be dropped
5678	nit: MaxVF?
5832–5835	`MaxVF`?
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
185–188	Should this select the feasible scalable VF? I suppose this is related to the issue @david-arm pointed out in `LoopVectorizationCostModel::computeFeasibleMaxVF`

fhahn added inline comments.Mar 31 2021, 2:16 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
933 ↗	(On Diff #331592)	This is not needed any longer, right?
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138 ↗	(On Diff #331592)	Would it be possible to decouple the change to compute the max VFs from the ones adding a new option & tweaking the handling here?

Dropped -scalable-vectorization= LV option from this patch.
Addressed other comments.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
138 ↗	(On Diff #331592)	I've removed this code for now, as I'll have a follow-up patch that adds an option to enable/disable scalable vectorization separately.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5641	The statement above `if (ElementCount::isKnownLE(UserVF, MaxSafeUserVF)) return UserVF` should make sure that this assert is always true.
5684	Correct!
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
185–188	I don't think it should do that by default. Personally I think it's best if the vectorizer just ignores hints that are not valid, and calculate a better answer instead, rather than blindly opting for something that seems "close enough", but may not be profitable. That's akin to what happens for fixed-width vectors; if the UserVF is not legal (e.g. VF=16), and VF=8 would be legal, it will pick the most profitable VF <= 8, which isn't necessarily 8. Extending this to scalable vectors, we can argue that if `VF=vscale x 8` is not legal, it should consider all VFs (fixed and scalable) up to whatever is legal. We can then add an LV option to favour scalable VFs if we know the cost-model doesn't have all the information to make the right decision (although that's something to consider for another patch).
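
The selection strategy argued for above (consider every legal VF and let the cost decide, rather than taking the largest legal one) can be sketched as follows. This is an illustration only: `VF`, `candidateVFs`, `pickCheapest` and the cost callback are hypothetical names, and the real cost model is far more involved.

```cpp
#include <functional>
#include <vector>

// Hypothetical vectorization factor: MinLanes lanes, times vscale if Scalable.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Enumerate power-of-two candidates up to each legal maximum. A maximum of 0
// means no VF of that kind is legal.
std::vector<VF> candidateVFs(unsigned MaxFixed, unsigned MaxScalable) {
  std::vector<VF> Out;
  // The N != 0 test stops the loop if doubling wraps around to zero.
  for (unsigned N = 1; N != 0 && N <= MaxFixed; N *= 2)
    Out.push_back({N, false});
  for (unsigned N = 1; N != 0 && N <= MaxScalable; N *= 2)
    Out.push_back({N, true});
  return Out;
}

// Pick the candidate with the lowest cost according to a caller-supplied
// (hypothetical) per-lane cost function. Falls back to scalar VF = 1.
VF pickCheapest(const std::vector<VF> &Candidates,
                const std::function<double(VF)> &CostPerLane) {
  VF Best{1, false};
  double BestCost = CostPerLane(Best);
  for (VF C : Candidates) {
    double Cost = CostPerLane(C);
    if (Cost < BestCost) {
      Best = C;
      BestCost = Cost;
    }
  }
  return Best;
}
```

With a cost function that happens to favour four lanes, `pickCheapest(candidateVFs(8, 0), Cost)` selects VF=4 even though VF=8 is legal, matching the remark that the most profitable VF <= 8 "isn't necessarily 8".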

Thanks for all the comments and feedback!

llvm/include/llvm/Analysis/TargetTransformInfo.h
933 ↗	(On Diff #331592)	You're right, I've removed that now.

Harbormaster completed remote builds in B97718: Diff 336093.Apr 8 2021, 7:48 AM

c-rhodes added inline comments.Apr 13 2021, 3:52 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5579	Flip this and return early?
5596–5598	should this be moved closer to where it's used?
5666–5668	could you add a comment stating this is ignored for now but will be addressed in a later patch?
5835	I know this came before your patch, but I find it a bit confusing that VF and VectorSize are used interchangeably
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-analysis.ll
3	nit: multiple spaces
33	nit: add newline above

Addressed nits.
s/MaxVectorSize/MaxVectorElementCount/ (because it's not actually a size, but an element count).

sdesmalen added inline comments.Apr 13 2021, 1:28 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5596–5598	This is actually moved as close as possible, because the first use is WidestType (line 5591).
5835	I see how this is confusing, especially because the VectorSize is actually not a size, but rather an element count. The subtle distinction between the two is that the VF is more of a loop-vectorizer concept (the number of lanes handled per vector iteration), whereas the MaxVectorSize (or better yet, MaxVectorElementCount) is the maximum number of elements in a target's vector register. Given that I'm changing a lot of lines in this function, it probably doesn't really make the diff much worse if I at least remove part of the confusion by renaming VectorSize -> VectorElementCount.

Harbormaster completed remote builds in B98548: Diff 337246.Apr 13 2021, 3:53 PM

LGTM

This revision is now accepted and ready to land.Apr 14 2021, 2:49 AM

fhahn added inline comments.Apr 16 2021, 2:13 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1624	The other functions refer to it as VF as well, even though they return ElementCount. Using VF (or `vectorization factor`) seems more consistent with the rest of the code and also more descriptive I think, as it conveys some information on what the intended use is.
1626	The other related functions refer to VF instead of element count in their name. Is there a reason to use ElementCount instead of VF here? Perhaps the name could be made a bit more descriptive in general, including the fact that the function computes the max VF based on target-specific properties?

Matt added a subscriber: Matt.Apr 17 2021, 8:53 AM

s/getMaxVectorElementCount/getMaximizedVFForTarget/
Removed redundant call to isSafeForAnyVectorWidth

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1626	You're right that using VF is probably better/more consistent; I mostly chose ElementCount so as not to confuse it with other MaxVF-related functions. I've now changed it to `getMaximizedVFForTarget`, does that make sense to you? I've chosen the term 'Maximized' instead of 'Max' to distinguish it from 'Maximum', as in: the LV tries to Maximize the VF based on the target's register width, but with a Maximum of `MaxSafeVF`.
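
The 'Maximized, but with a Maximum' distinction can be shown with a short arithmetic sketch. The function names and plain integer parameters below are hypothetical simplifications of what `getMaximizedVFForTarget` works with, and the rounding down to a power of two is an assumption based on the existing doc comment that the VF upper bound is a power of 2.

```cpp
#include <algorithm>
#include <cstdint>

// Largest power of two that is <= N, or 0 when N is 0.
uint64_t powerOf2Floor(uint64_t N) {
  if (N == 0)
    return 0;
  uint64_t P = 1;
  while (P <= N / 2) // Comparing against N / 2 avoids overflow when doubling.
    P *= 2;
  return P;
}

// "Maximized" lane count: as many lanes of the widest type as fit in one
// vector register (per unit of vscale for scalable registers), rounded down to
// a power of two, and then capped by the "Maximum" that is still safe.
uint64_t maximizedLanesForTarget(uint64_t RegisterMinBits,
                                 uint64_t WidestTypeBits,
                                 uint64_t MaxSafeLanes) {
  if (WidestTypeBits == 0)
    return 0;
  uint64_t Lanes = powerOf2Floor(RegisterMinBits / WidestTypeBits);
  return std::min(Lanes, MaxSafeLanes);
}
```

For a 128-bit register and a 32-bit widest type the target would allow 4 lanes; if only 2 lanes are safe, the result is capped at 2.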

Harbormaster completed remote builds in B100004: Diff 339261.Apr 21 2021, 9:12 AM

I think I've addressed all comments now. Unless there are any other, I'll commit the patch in the coming days.

LGTM, thanks for the updates!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1626	Sounds good to me, but I am no expert of the English language :)

fhahn mentioned this in D96997: [LV] Compute max scalable & fixed VFs up front, then apply them. (WIP).Apr 27 2021, 4:17 AM

This revision was landed with ongoing or failed builds.Apr 28 2021, 4:33 AM

Closed by commit rG584e9b6e4b49: [LV] Calculate max feasible scalable VF. (authored by sdesmalen).

This revision was automatically updated to reflect the committed changes.

sdesmalen added a commit: rG584e9b6e4b49: [LV] Calculate max feasible scalable VF..

Thank you for all the input/feedback/reviews @fhahn and @c-rhodes!

Reverting this patch locally resolves the following failure (and many more) for me:
https://lab.llvm.org/buildbot/#/builders/140/builds/2/steps/5/logs/FAIL__Clang__loop-vectorize_c

In D98509#2724684, @hubert.reinterpretcast wrote:

Reverting this patch locally resolves the following failure (and many more) for me:
https://lab.llvm.org/buildbot/#/builders/140/builds/2/steps/5/logs/FAIL__Clang__loop-vectorize_c

Trying to figure out if this patch is running into UB. Disabling optimizations in the build seems to resolve these failures.

In D98509#2724752, @hubert.reinterpretcast wrote:

In D98509#2724684, @hubert.reinterpretcast wrote:

Reverting this patch locally resolves the following failure (and many more) for me:
https://lab.llvm.org/buildbot/#/builders/140/builds/2/steps/5/logs/FAIL__Clang__loop-vectorize_c

Trying to figure out if this patch is running into UB. Disabling optimizations in the build seems to resolve these failures.

Hi @hubert.reinterpretcast, I've built this patch with UBSan but that didn't point out any issues. Do you have any suggestions on how to reproduce this locally? (I see that buildbot is using some IBM xlclang (proprietary?) compiler to build)

In D98509#2725326, @sdesmalen wrote:

Hi @hubert.reinterpretcast, I've built this patch with UBSan but that didn't point out any issues. Do you have any suggestions on how to reproduce this locally? (I see that buildbot is using some IBM xlclang (proprietary?) compiler to build)

I'm actively working on diagnosing this. Would reverting this patch for a few days be much trouble for you? We're hoping to keep our CI green so it's easier to notice new problems.

sdesmalen added a reverting change: rG51d648c119d7: Revert "[LV] Calculate max feasible scalable VF.".Apr 29 2021, 8:05 AM

In D98509#2725542, @hubert.reinterpretcast wrote:

In D98509#2725326, @sdesmalen wrote:

Hi @hubert.reinterpretcast, I've built this patch with UBSan but that didn't point out any issues. Do you have any suggestions on how to reproduce this locally? (I see that buildbot is using some IBM xlclang (proprietary?) compiler to build)

I'm actively working on diagnosing this.

Thanks for looking into this!

Would reverting this patch for a few days be much trouble for you? We're hoping to keep our CI green so it's easier to notice new problems.

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

In D98509#2725672, @sdesmalen wrote:

Would reverting this patch for a few days be much trouble for you? We're hoping to keep our CI green so it's easier to notice new problems.

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

In D98509#2725892, @hubert.reinterpretcast wrote:

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

Maybe a paranoid step, but I confirmed that the impact is from the compilation of lib/Transforms/Vectorize/CMakeFiles/LLVMVectorize.dir/LoopVectorize.cpp.o. Proceeding with cutting down from there.

In D98509#2725892, @hubert.reinterpretcast wrote:

In D98509#2725672, @sdesmalen wrote:

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 2b413fc4950..ec3eb32cbe6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1639,7 +1639,7 @@ private:
   ElementCount getMaximizedVFForTarget(unsigned ConstTripCount,
                                        unsigned SmallestType,
                                        unsigned WidestType,
-                                       ElementCount MaxSafeVF);
+                                       const ElementCount &MaxSafeVF);

   /// \return the maximum legal scalable VF, based on the safe max number
   /// of elements.
@@ -5863,7 +5863,7 @@ LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {

 ElementCount LoopVectorizationCostModel::getMaximizedVFForTarget(
     unsigned ConstTripCount, unsigned SmallestType, unsigned WidestType,
-    ElementCount MaxSafeVF) {
+    const ElementCount &MaxSafeVF) {
   bool ComputeScalableMaxVF = MaxSafeVF.isScalable();
   TypeSize WidestRegister = TTI.getRegisterBitWidth(
       ComputeScalableMaxVF ? TargetTransformInfo::RGK_ScalableVector

In D98509#2732628, @hubert.reinterpretcast wrote:

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

Hi @hubert.reinterpretcast, thanks for your investigation so far! I have gone through the code several times, but don't really understand why your suggestion fixes the issue. I've also built it with both GCC and Clang, with- and without sanitizers, and with valgrind.

Having said that, I'd be okay with applying your suggestion for the sake of re-landing the patch and not breaking a specific buildbot, but I would like to aim for a concrete date to remove the work-around. What do you think is sensible here?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5824	@hubert.reinterpretcast Here MinVF returns a `const ElementCount &` instead of `ElementCount`, not sure if that could make a difference?

sdesmalen mentioned this in rG9931ae645eb4: Reland "[LV] Calculate max feasible scalable VF.".May 4 2021, 7:46 AM

In D98509#2732628, @hubert.reinterpretcast wrote:

In D98509#2725892, @hubert.reinterpretcast wrote:

In D98509#2725672, @sdesmalen wrote:

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

@hubert.reinterpretcast can you provide an IR test case that triggers the failure with sanitizers?

In D98509#2736317, @fhahn wrote:

In D98509#2732628, @hubert.reinterpretcast wrote:

In D98509#2725892, @hubert.reinterpretcast wrote:

In D98509#2725672, @sdesmalen wrote:

Reverting the patch for a few days is fine, I've pushed rG51d648c119d7773ce6fb809353bd6bd14bca8818 for now.

Thank you! Our internal CI is green again. I'll update with the investigation progress once it gets to a meaningful point.

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

@hubert.reinterpretcast can you provide an IR test case that triggers the failure with sanitizers?

The build compiler on the specific bot (due to what is currently shipping on the platform) is a proprietary compiler that does not use LLVM IR. We have not reached that point where we can confirm that this is a miscompile issue in the build compiler, but I think there is every chance that would be the conclusion (now that the reduction has reached the point where it is -- we have also encountered latent UB in LLVM source in the past). Prior issues of this sort have taken around two weeks to fix after it is reported to the appropriate team (which is delayed while I'm putting together a reduction). I had spent my whole weekend narrowing this down. I am not sure how much time I will have during the week to make further progress. I think we will be able to deploy a fix to the buildbot in early June.

In D98509#2736263, @sdesmalen wrote:

In D98509#2732628, @hubert.reinterpretcast wrote:

I haven't reached the root cause yet, but it appears that the issue we're hitting can be avoided if the MaxSafeVF parameter is made pass-by-reference instead of pass-by-value.
@sdesmalen, I hope the following is not too intrusive a change to your patch.

Hi @hubert.reinterpretcast, thanks for your investigation so far! I have gone through the code several times, but don't really understand why your suggestion fixes the issue. I've also built it with both GCC and Clang, with- and without sanitizers, and with valgrind.

Thanks for looking into this from your end. I feel terrible that I wasn't more clear that a build compiler issue is looking to be likely.

In D98509#2737470, @hubert.reinterpretcast wrote:

Thanks for looking into this from your end. I feel terrible that I wasn't more clear that a build compiler issue is looking to be likely.

No worries, at least we have come up with a practical workaround for now. Please do let me know when this issue will be fixed in the compiler used by Buildbot, so that I can remove the workaround.

Thanks for all the effort you have put into investigating this!

jeroen.dobbelaere added a subscriber: jeroen.dobbelaere.Jul 12 2021, 4:30 AM

The remark `Disabling scalable vectorization, because target does not support scalable vectors.` should not be emitted for targets other than AArch64 and RISC-V.
It is probably also a good idea not to emit it when SVE is disabled on AArch64.

% cat a.cc
float A[128], B[128], C[128];

float foo() {
  float sum = 0;
  for (int i = 0; i < 128; i++)
    C[i] = A[i] + B[i];
  return sum;
}
% clang++ -O2 --target=x86_64-linux-gnu -S a.cc -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -Rpass-analysis=loop-vectorize
a.cc:5:3: remark: Disabling scalable vectorization, because target does not support scalable vectors. [-Rpass-analysis=loop-vectorize]
  for (int i = 0; i < 128; i++)
  ^
a.cc:5:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]

Patch: D108004

MaskRay mentioned this in D108004: [LoopVectorize] Convert scalable vectorization optimization remark to LLVM_DEBUG.Aug 12 2021, 3:35 PM

hubert.reinterpretcast mentioned this in rG5efb380c263c: [NFC] Undo AIX build compiler workaround.Jun 13 2022, 2:00 PM

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h	7 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	339 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll	42 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-analysis.ll	149 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll	59 lines
llvm/test/Transforms/LoopVectorize/scalable-vf-hint.ll	4 lines

Diff 337246

llvm/include/llvm/Transforms/Vectorize/LoopVectorize.h

	/// Reports a vectorization failure: print \p DebugMsg for debugging			/// Reports a vectorization failure: print \p DebugMsg for debugging
	/// purposes along with the corresponding optimization remark \p RemarkName.			/// purposes along with the corresponding optimization remark \p RemarkName.
	/// If \p I is passed, it is an instruction that prevents vectorization.			/// If \p I is passed, it is an instruction that prevents vectorization.
	/// Otherwise, the loop \p TheLoop is used for the location of the remark.			/// Otherwise, the loop \p TheLoop is used for the location of the remark.
	void reportVectorizationFailure(const StringRef DebugMsg,			void reportVectorizationFailure(const StringRef DebugMsg,
	const StringRef OREMsg, const StringRef ORETag,			const StringRef OREMsg, const StringRef ORETag,
	OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I = nullptr);			OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I = nullptr);

				/// Reports an informative message: print \p Msg for debugging purposes as well
				/// as an optimization remark. Uses either \p I as location of the remark, or
				/// otherwise \p TheLoop.
				void reportVectorizationInfo(const StringRef OREMsg, const StringRef ORETag,
				OptimizationRemarkEmitter *ORE, Loop *TheLoop,
				Instruction *I = nullptr);

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZE_H			#endif // LLVM_TRANSFORMS_VECTORIZE_LOOPVECTORIZE_H

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

if (DIL && Inst->getFunction()->isDebugInfoForProfiling() &&
<< "Failed to create new discriminator: "		<< "Failed to create new discriminator: "
<< DIL->getFilename() << " Line: " << DIL->getLine());		<< DIL->getFilename() << " Line: " << DIL->getLine());
}		}
else		else
B.SetCurrentDebugLocation(DIL);		B.SetCurrentDebugLocation(DIL);
} else		} else
B.SetCurrentDebugLocation(DebugLoc());		B.SetCurrentDebugLocation(DebugLoc());
}		}

/// Write a record \p DebugMsg about vectorization failure to the debug		/// Write a \p DebugMsg about vectorization to the debug output stream. If \p I
/// output stream. If \p I is passed, it is an instruction that prevents		/// is passed, the message relates to that particular instruction.
/// vectorization.
#ifndef NDEBUG		#ifndef NDEBUG
static void debugVectorizationFailure(const StringRef DebugMsg,		static void debugVectorizationMessage(const StringRef Prefix,
		const StringRef DebugMsg,
Instruction *I) {		Instruction *I) {
dbgs() << "LV: Not vectorizing: " << DebugMsg;		dbgs() << "LV: " << Prefix << DebugMsg;
if (I != nullptr)		if (I != nullptr)
dbgs() << " " << *I;		dbgs() << " " << *I;
else		else
dbgs() << '.';		dbgs() << '.';
dbgs() << '\n';		dbgs() << '\n';
}		}
#endif		#endif

static OptimizationRemarkAnalysis createLVAnalysis(const char *PassName,
if (I) {		if (I) {
CodeRegion = I->getParent();		CodeRegion = I->getParent();
// If there is no debug location attached to the instruction, revert back to		// If there is no debug location attached to the instruction, revert back to
// using the loop's.		// using the loop's.
if (I->getDebugLoc())		if (I->getDebugLoc())
DL = I->getDebugLoc();		DL = I->getDebugLoc();
}		}

OptimizationRemarkAnalysis R(PassName, RemarkName, DL, CodeRegion);		return OptimizationRemarkAnalysis(PassName, RemarkName, DL, CodeRegion);
R << "loop not vectorized: ";
return R;
}		}

/// Return a value for Step multiplied by VF.		/// Return a value for Step multiplied by VF.
static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {		static Value *createStepForVF(IRBuilder<> &B, Constant *Step, ElementCount VF) {
assert(isa<ConstantInt>(Step) && "Expected an integer step");		assert(isa<ConstantInt>(Step) && "Expected an integer step");
Constant *StepVal = ConstantInt::get(		Constant *StepVal = ConstantInt::get(
Step->getType(),		Step->getType(),
cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());		cast<ConstantInt>(Step)->getSExtValue() * VF.getKnownMinValue());
return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;		return VF.isScalable() ? B.CreateVScale(StepVal) : StepVal;
}		}

namespace llvm {		namespace llvm {

void reportVectorizationFailure(const StringRef DebugMsg,		void reportVectorizationFailure(const StringRef DebugMsg,
const StringRef OREMsg, const StringRef ORETag,		const StringRef OREMsg, const StringRef ORETag,
OptimizationRemarkEmitter *ORE, Loop *TheLoop, Instruction *I) {		OptimizationRemarkEmitter *ORE, Loop *TheLoop,
LLVM_DEBUG(debugVectorizationFailure(DebugMsg, I));		Instruction *I) {
		LLVM_DEBUG(debugVectorizationMessage("Not vectorizing: ", DebugMsg, I));
LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);		LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);
ORE->emit(createLVAnalysis(Hints.vectorizeAnalysisPassName(),		ORE->emit(
ORETag, TheLoop, I) << OREMsg);		createLVAnalysis(Hints.vectorizeAnalysisPassName(), ORETag, TheLoop, I)
		<< "loop not vectorized: " << OREMsg);
		}

		void reportVectorizationInfo(const StringRef Msg, const StringRef ORETag,
		OptimizationRemarkEmitter *ORE, Loop *TheLoop,
		Instruction *I) {
		LLVM_DEBUG(debugVectorizationMessage("", Msg, I));
		LoopVectorizeHints Hints(TheLoop, true /* doesn't matter */, *ORE);
		ORE->emit(
		createLVAnalysis(Hints.vectorizeAnalysisPassName(), ORETag, TheLoop, I)
		<< Msg);
}		}

} // end namespace llvm		} // end namespace llvm

#ifndef NDEBUG		#ifndef NDEBUG
/// \return string containing a file name and a line # for the given loop.		/// \return string containing a file name and a line # for the given loop.
static std::string getDebugLocString(const Loop *L) {		static std::string getDebugLocString(const Loop *L) {
std::string Result;		std::string Result;
private:
unsigned NumPredStores = 0;		unsigned NumPredStores = 0;

/// \return An upper bound for the vectorization factor, a power-of-2 larger		/// \return An upper bound for the vectorization factor, a power-of-2 larger
/// than zero. One is returned if vectorization should best be avoided due		/// than zero. One is returned if vectorization should best be avoided due
/// to cost.		/// to cost.
ElementCount computeFeasibleMaxVF(unsigned ConstTripCount,		ElementCount computeFeasibleMaxVF(unsigned ConstTripCount,
ElementCount UserVF);		ElementCount UserVF);

		/// \return the maximum element count based on the target's vector registers,
		/// limited to MaxVF. This is a helper function of computeFeasibleMaxVF.
		ElementCount getMaxVectorElementCount(unsigned ConstTripCount,
		unsigned SmallestType,
		unsigned WidestType,
		ElementCount MaxVF);

		/// \return the maximum legal scalable VF, based on the safe max number
		/// of elements.
		ElementCount getMaxLegalScalableVF(unsigned MaxSafeElements);

/// The vectorization cost is a combination of the cost itself and a boolean		/// The vectorization cost is a combination of the cost itself and a boolean
/// indicating whether any of the contributing operations will actually		/// indicating whether any of the contributing operations will actually
/// operate on		/// operate on
/// vector values after type legalization in the backend. If this latter value		/// vector values after type legalization in the backend. If this latter value
/// is		/// is
/// false, then all operations will be scalarized (i.e. no vectorization has		/// false, then all operations will be scalarized (i.e. no vectorization has
/// actually taken place).		/// actually taken place).
using VectorizationCostTy = std::pair<InstructionCost, bool>;		using VectorizationCostTy = std::pair<InstructionCost, bool>;
(3,901 unchanged lines not shown)
reportVectorizationFailure("Runtime stride check for small trip count",		reportVectorizationFailure("Runtime stride check for small trip count",
"this loop without such check by compiling with -Os/-Oz",		"this loop without such check by compiling with -Os/-Oz",
"CantVersionLoopWithOptForSize", ORE, TheLoop);		"CantVersionLoopWithOptForSize", ORE, TheLoop);
return true;		return true;
}		}

return false;		return false;
}		}

		ElementCount
		LoopVectorizationCostModel::getMaxLegalScalableVF(unsigned MaxSafeElements) {
		if (!TTI.supportsScalableVectors() && !ForceTargetSupportsScalableVectors) {
		reportVectorizationInfo(
		"Disabling scalable vectorization, because target does not "
		"support scalable vectors.",
		"ScalableVectorsUnsupported", ORE, TheLoop);
		return ElementCount::getScalable(0);
		}

		auto MaxScalableVF = ElementCount::getScalable(1u << 16);

		// Disable scalable vectorization if the loop contains unsupported reductions.
		// Test that the loop-vectorizer can legalize all operations for this MaxVF.
		// FIXME: While for scalable vectors this is currently sufficient, this should
		// be replaced by a more detailed mechanism that filters out specific VFs,
		// instead of invalidating vectorization for a whole set of VFs based on the
		// MaxVF.
		if (!canVectorizeReductions(MaxScalableVF)) {
		reportVectorizationInfo(
		"Scalable vectorization not supported for the reduction "
		"operations found in this loop.",
		"ScalableVFUnfeasible", ORE, TheLoop);
		return ElementCount::getScalable(0);
		}

		if (Legal->isSafeForAnyVectorWidth())
		return MaxScalableVF;
		c-rhodes: Flip this and return early?

		// Limit MaxScalableVF by the maximum safe dependence distance.
		c-rhodes: nit: comma can be dropped
		Optional<unsigned> MaxVScale = TTI.getMaxVScale();
		MaxScalableVF = ElementCount::getScalable(
		MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);
		if (!MaxScalableVF)
		reportVectorizationInfo(
		"Max legal vector width too small, scalable vectorization "
		"unfeasible.",
		"ScalableVFUnfeasible", ORE, TheLoop);

		return MaxScalableVF;
		}
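
For readers following the review, the decision made in `getMaxLegalScalableVF` above can be sketched as a standalone function. This is a minimal model rather than the LLVM code: `LaneCount` is a hypothetical stand-in for `ElementCount`, the boolean parameters stand in for the `TTI` and `Legal` queries, and a `MaxVScale` of 0 is assumed to mean that the target reports no maximum vscale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount: MinLanes lanes, multiplied by
// the runtime vscale when Scalable is true.
struct LaneCount {
  uint64_t MinLanes;
  bool Scalable;
};

// Sketch of the three outcomes of getMaxLegalScalableVF. The boolean
// parameters replace the TTI/Legal queries; MaxVScale == 0 is this sketch's
// encoding of "the target reports no maximum vscale".
LaneCount maxLegalScalableVF(bool ScalableVectorsUsable,
                             bool ReductionsSupported, bool SafeForAnyWidth,
                             unsigned MaxSafeElements, unsigned MaxVScale) {
  // Unsupported target or unsupported reductions: scalable VFs are disabled.
  if (!ScalableVectorsUsable || !ReductionsSupported)
    return LaneCount{0, true};

  // No dependence constraint: return an effectively unbounded scalable VF.
  if (SafeForAnyWidth)
    return LaneCount{1u << 16, true};

  // Sketch-only precondition: the caller found at least one safe element.
  assert(MaxSafeElements != 0 && "expected a non-zero safe element count");

  // The dependence distance must hold for the largest possible vscale.
  return LaneCount{MaxVScale ? MaxSafeElements / MaxVScale : 0, true};
}
```

With 64 safe elements and a maximum vscale of 16, the sketch yields `vscale x 4`. A safe element count smaller than the maximum vscale yields zero lanes, which the patch reports as "scalable vectorization unfeasible".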

		ElementCount
		LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,
		ElementCount UserVF) {
		MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
		unsigned SmallestType, WidestType;
		c-rhodes: Should this be moved closer to where it's used?
		sdesmalen: This is actually moved as close as possible, because the first use is WidestType (line 5591).
		c-rhodes: > This is actually moved as close as possible, because the first use is WidestType (line 5591).
		std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();

		// Get the maximum safe dependence distance in bits computed by LAA.
		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
		// the memory accesses that is most restrictive (involved in the smallest
		// dependence distance).
		unsigned MaxSafeElements =
		PowerOf2Floor(Legal->getMaxSafeVectorWidthInBits() / WidestType);

		auto MaxSafeFixedVF = ElementCount::getFixed(MaxSafeElements);
		auto MaxSafeScalableVF = getMaxLegalScalableVF(MaxSafeElements);

		LLVM_DEBUG(dbgs() << "LV: The max safe fixed VF is: " << MaxSafeFixedVF
		<< ".\n");
		LLVM_DEBUG(dbgs() << "LV: The max safe scalable VF is: " << MaxSafeScalableVF
		<< ".\n");

		// First analyze the UserVF, fall back if the UserVF should be ignored.
		if (UserVF) {
		auto MaxSafeUserVF =
		UserVF.isScalable() ? MaxSafeScalableVF : MaxSafeFixedVF;

		if (ElementCount::isKnownLE(UserVF, MaxSafeUserVF))
		return UserVF;

		assert(ElementCount::isKnownGT(UserVF, MaxSafeUserVF));

		// Only clamp if the UserVF is not scalable. If the UserVF is scalable, it
		// is better to ignore the hint and let the compiler choose a suitable VF.
		if (!UserVF.isScalable()) {
		LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
		<< " is unsafe, clamping to max safe VF="
		<< MaxSafeFixedVF << ".\n");
		ORE->emit([&]() {
		return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",
		TheLoop->getStartLoc(),
		TheLoop->getHeader())
		<< "User-specified vectorization factor "
		<< ore::NV("UserVectorizationFactor", UserVF)
		<< " is unsafe, clamping to maximum safe vectorization factor "
		<< ore::NV("VectorizationFactor", MaxSafeFixedVF);
		});
		return MaxSafeFixedVF;
		david-arm: Isn't this always going to fire for cases where UserVF is scalable and we cannot vectorise reductions? For example, getMaxLegalScalableVF returns ElementCount::getScalable(0) in this case.
		sdesmalen: The statement above, `if (ElementCount::isKnownLE(UserVF, MaxSafeUserVF)) return UserVF`, should make sure that this assert is always true.
		}

		LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
		<< " is unsafe. Ignoring scalable UserVF.\n");
		ORE->emit([&]() {
		return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",
		TheLoop->getStartLoc(),
		TheLoop->getHeader())
		<< "User-specified vectorization factor "
		<< ore::NV("UserVectorizationFactor", UserVF)
		<< " is unsafe. Ignoring the hint to let the compiler pick a "
		"suitable VF.";
		});
		}

		LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType
		<< " / " << WidestType << " bits.\n");

		ElementCount MaxFixedVF = ElementCount::getFixed(1);
		if (auto MaxVF = getMaxVectorElementCount(ConstTripCount, SmallestType,
		WidestType, MaxSafeFixedVF))
		MaxFixedVF = MaxVF;

		if (auto MaxVF = getMaxVectorElementCount(ConstTripCount, SmallestType,
		WidestType, MaxSafeScalableVF))
		// FIXME: Return scalable VF as well (to be added in future patch).
		if (MaxVF.isScalable())
		c-rhodes: Could you add a comment stating this is ignored for now but will be addressed in a later patch?
		LLVM_DEBUG(dbgs() << "LV: Found feasible scalable VF = " << MaxVF
		<< "\n");

		return MaxFixedVF;
		}
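
The handling of a user-provided VF hint in `computeFeasibleMaxVF` above comes down to three outcomes: use the hint, clamp a fixed-width hint, or ignore a scalable hint. The following is a rough sketch of that decision only, under stated assumptions: `LaneCount`, `HintAction`, `HintResult` and `resolveUserVF` are hypothetical names that do not exist in LLVM, and comparing the known-minimum lane counts is assumed to be equivalent to `ElementCount::isKnownLE` because both operands share the same scalable flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount.
struct LaneCount {
  uint64_t MinLanes;
  bool Scalable;
};

enum class HintAction { UseHint, ClampToSafeFixed, IgnoreHint };

struct HintResult {
  HintAction Action;
  LaneCount VF; // Meaningful only for UseHint and ClampToSafeFixed.
};

// Sketch of how the patch treats a non-zero UserVF.
HintResult resolveUserVF(LaneCount UserVF, LaneCount MaxSafeFixedVF,
                         LaneCount MaxSafeScalableVF) {
  assert(UserVF.MinLanes != 0 && "a zero UserVF means no hint was given");

  // Compare the hint against the limit of the same kind (fixed or scalable).
  const LaneCount Limit = UserVF.Scalable ? MaxSafeScalableVF : MaxSafeFixedVF;
  if (UserVF.MinLanes <= Limit.MinLanes)
    return HintResult{HintAction::UseHint, UserVF};

  // An unsafe fixed-width hint is clamped to the maximum safe fixed VF.
  if (!UserVF.Scalable)
    return HintResult{HintAction::ClampToSafeFixed, MaxSafeFixedVF};

  // An unsafe scalable hint is dropped; the caller then picks a VF itself.
  return HintResult{HintAction::IgnoreHint, LaneCount{0, false}};
}
```

In the patch itself the "ignore" case does not return early. Control falls through to the code that computes the maximized VF for the target, which is why the sketch returns a zero lane count there instead of a replacement VF.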

Optional<ElementCount>		Optional<ElementCount>
LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {		LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
if (Legal->getRuntimePointerChecking()->Need && TTI.hasBranchDivergence()) {		if (Legal->getRuntimePointerChecking()->Need && TTI.hasBranchDivergence()) {
// TODO: It may by useful to do since it's still likely to be dynamically		// TODO: It may by useful to do since it's still likely to be dynamically
		c-rhodes: nit: MaxVF?
// uniform if the target can skip.		// uniform if the target can skip.
reportVectorizationFailure(		reportVectorizationFailure(
"Not inserting runtime ptr check for divergent target",		"Not inserting runtime ptr check for divergent target",
"runtime pointer checks needed. Not enabled for divergent target",		"runtime pointer checks needed. Not enabled for divergent target",
"CantVersionLoopWithDivergentTarget", ORE, TheLoop);		"CantVersionLoopWithDivergentTarget", ORE, TheLoop);
return None;		return None;
		david-arm: Are you deliberately ignoring the scalable case for now because this will be addressed in a later patch?
		sdesmalen: Correct!
}		}

unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);		unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');		LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
if (TC == 1) {		if (TC == 1) {
reportVectorizationFailure("Single iteration (non) loop",		reportVectorizationFailure("Single iteration (non) loop",
"loop trip count is one, irrelevant for vectorization",		"loop trip count is one, irrelevant for vectorization",
"SingleIterationLoop", ORE, TheLoop);		"SingleIterationLoop", ORE, TheLoop);
(114 unchanged lines not shown)
reportVectorizationFailure(		reportVectorizationFailure(
"Cannot optimize for size and vectorize at the same time.",		"Cannot optimize for size and vectorize at the same time.",
"cannot optimize for size and vectorize at the same time. "		"cannot optimize for size and vectorize at the same time. "
"Enable vectorization of this loop with '#pragma clang loop "		"Enable vectorization of this loop with '#pragma clang loop "
"vectorize(enable)' when compiling with -Os/-Oz",		"vectorize(enable)' when compiling with -Os/-Oz",
"NoTailLoopWithOptForSize", ORE, TheLoop);		"NoTailLoopWithOptForSize", ORE, TheLoop);
return None;		return None;
}		}

ElementCount		ElementCount LoopVectorizationCostModel::getMaxVectorElementCount(
LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,		unsigned ConstTripCount, unsigned SmallestType, unsigned WidestType,
ElementCount UserVF) {		ElementCount MaxSafeVF) {
bool IgnoreScalableUserVF = UserVF.isScalable() &&		bool ComputeScalableMaxVF = MaxSafeVF.isScalable();
!TTI.supportsScalableVectors() &&		TypeSize WidestRegister = TTI.getRegisterBitWidth(
!ForceTargetSupportsScalableVectors;		ComputeScalableMaxVF ? TargetTransformInfo::RGK_ScalableVector
if (IgnoreScalableUserVF) {		: TargetTransformInfo::RGK_FixedWidthVector);
		paulwalker-arm: This doesn't look particularly clean to my eye. I guess my immediate question is why `WidestRegister` is not a `TypeSize`, given that would truly represent the size. The same question can be asked of `getRegisterBitWidth`, but perhaps that is a heavily used function? If so, then what about introducing an explicit `getVectorRegisterBitWidth` that returns TypeSize, plus allowing it to take a parameter that specifies whether the query is against fixed or scalable vectors? I can see why you're asking for `ScalableBitsPerBlock`, because you want to make sure you're able to safely divide this by the largest element count, but here I think using `TypeSize.isKnownMultipleOf(SizeOfLargestElt)` makes that intent clearer for all vector types.
LLVM_DEBUG(
dbgs() << "LV: Ignoring VF=" << UserVF		// Convenience function to return the minimum of two ElementCounts.
<< " because target does not support scalable vectors.\n");		auto MinVF = [](const ElementCount &LHS, const ElementCount &RHS) {
		sdesmalen: @hubert.reinterpretcast Here MinVF returns a `const ElementCount &` instead of `ElementCount`; not sure if that could make a difference?
ORE->emit([&]() {		assert((LHS.isScalable() == RHS.isScalable()) &&
return OptimizationRemarkAnalysis(DEBUG_TYPE, "IgnoreScalableUserVF",		"Scalable flags must match");
TheLoop->getStartLoc(),		return ElementCount::isKnownLT(LHS, RHS) ? LHS : RHS;
TheLoop->getHeader())		};
<< "Ignoring VF=" << ore::NV("UserVF", UserVF)
<< " because target does not support scalable vectors.";
});
}

// Beyond this point two scenarios are handled. If UserVF isn't specified
// then a suitable VF is chosen. If UserVF is specified and there are
// dependencies, check if it's legal. However, if a UserVF is specified and
// there are no dependencies, then there's nothing to do.
if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
if (!canVectorizeReductions(UserVF)) {
reportVectorizationFailure(
"LV: Scalable vectorization not supported for the reduction "
"operations found in this loop. Using fixed-width "
"vectorization instead.",
"Scalable vectorization not supported for the reduction operations "
"found in this loop. Using fixed-width vectorization instead.",
"ScalableVFUnfeasible", ORE, TheLoop);
return computeFeasibleMaxVF(
ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));
}

if (Legal->isSafeForAnyVectorWidth())
return UserVF;
}

MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
unsigned SmallestType, WidestType;
std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
unsigned WidestRegister =
TTI.getRegisterBitWidth(TargetTransformInfo::RGK_FixedWidthVector)
.getFixedSize();

// Get the maximum safe dependence distance in bits computed by LAA.
// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
// the memory accesses that is most restrictive (involved in the smallest
// dependence distance).
unsigned MaxSafeVectorWidthInBits = Legal->getMaxSafeVectorWidthInBits();

// If the user vectorization factor is legally unsafe, clamp it to a safe
// value. Otherwise, return as is.
if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
unsigned MaxSafeElements =
PowerOf2Floor(MaxSafeVectorWidthInBits / WidestType);
ElementCount MaxSafeVF = ElementCount::getFixed(MaxSafeElements);

if (UserVF.isScalable()) {
Optional<unsigned> MaxVScale = TTI.getMaxVScale();

// Scale VF by vscale before checking if it's safe.
MaxSafeVF = ElementCount::getScalable(
MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);

if (MaxSafeVF.isZero()) {
// The dependence distance is too small to use scalable vectors,
// fallback on fixed.
LLVM_DEBUG(
dbgs()
<< "LV: Max legal vector width too small, scalable vectorization "
"unfeasible. Using fixed-width vectorization instead.\n");
ORE->emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",
TheLoop->getStartLoc(),
TheLoop->getHeader())
<< "Max legal vector width too small, scalable vectorization "
<< "unfeasible. Using fixed-width vectorization instead.";
});
return computeFeasibleMaxVF(
ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));
}
}

LLVM_DEBUG(dbgs() << "LV: The max safe VF is: " << MaxSafeVF << ".\n");

if (ElementCount::isKnownLE(UserVF, MaxSafeVF))
return UserVF;

LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
<< " is unsafe, clamping to max safe VF=" << MaxSafeVF
<< ".\n");
ORE->emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",
TheLoop->getStartLoc(),
TheLoop->getHeader())
<< "User-specified vectorization factor "
<< ore::NV("UserVectorizationFactor", UserVF)
<< " is unsafe, clamping to maximum safe vectorization factor "
<< ore::NV("VectorizationFactor", MaxSafeVF);
});
return MaxSafeVF;
}

WidestRegister = std::min(WidestRegister, MaxSafeVectorWidthInBits);

// Ensure MaxVF is a power of 2; the dependence distance bound may not be.		// Ensure MaxVF is a power of 2; the dependence distance bound may not be.
// Note that both WidestRegister and WidestType may not be a powers of 2.		// Note that both WidestRegister and WidestType may not be a powers of 2.
auto MaxVectorSize =		auto MaxVectorElementCount = ElementCount::get(
ElementCount::getFixed(PowerOf2Floor(WidestRegister / WidestType));		PowerOf2Floor(WidestRegister.getKnownMinSize() / WidestType),
		ComputeScalableMaxVF);
LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType		MaxVectorElementCount = MinVF(MaxVectorElementCount, MaxSafeVF);
		c-rhodes: `MaxVF`?
		c-rhodes: I know this came before your patch, but I find it a bit confusing that VF and VectorSize are used interchangeably.
		sdesmalen: I see how this is confusing, especially because the VectorSize is actually not a size, but rather an element count. The subtle distinction between the two is that the VF is more of a loop-vectorizer concept (the number of lanes handled per vector iteration), whereas the MaxVectorSize (or better yet, MaxVectorElementCount) is the maximum number of elements in a target's vector register. Given that I'm changing a lot of lines in this function, it probably doesn't really make the diff much worse if I at least remove part of the confusion by renaming VectorSize -> VectorElementCount.
<< " / " << WidestType << " bits.\n");
LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "		LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "
<< WidestRegister << " bits.\n");		<< (MaxVectorElementCount * WidestType) << " bits.\n");

assert(MaxVectorSize.getFixedValue() <= WidestRegister &&		if (!MaxVectorElementCount) {
		david-arm: You could also do `if (MaxVectorSize.isZero())`
"Did not expect to pack so many elements"
" into one vector!");
if (MaxVectorSize.getFixedValue() == 0) {
LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");		LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");
return ElementCount::getFixed(1);		return ElementCount::getFixed(1);
} else if (ConstTripCount && ConstTripCount < MaxVectorSize.getFixedValue() &&		}

		const auto TripCountEC = ElementCount::getFixed(ConstTripCount);
		if (ConstTripCount &&
		ElementCount::isKnownLE(TripCountEC, MaxVectorElementCount) &&
isPowerOf2_32(ConstTripCount)) {		isPowerOf2_32(ConstTripCount)) {
// We need to clamp the VF to be the ConstTripCount. There is no point in		// We need to clamp the VF to be the ConstTripCount. There is no point in
// choosing a higher viable VF as done in the loop below.		// choosing a higher viable VF as done in the loop below. If
		// MaxVectorElementCount is scalable, we only fall back on a fixed VF when
		// the TC is less than or equal to the known number of lanes.
LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to the constant trip count: "		LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to the constant trip count: "
<< ConstTripCount << "\n");		<< ConstTripCount << "\n");
return ElementCount::getFixed(ConstTripCount);		return TripCountEC;
}		}

ElementCount MaxVF = MaxVectorSize;		ElementCount MaxVF = MaxVectorElementCount;
if (TTI.shouldMaximizeVectorBandwidth(!isScalarEpilogueAllowed()) ||		if (TTI.shouldMaximizeVectorBandwidth(!isScalarEpilogueAllowed()) ||
(MaximizeBandwidth && isScalarEpilogueAllowed())) {		(MaximizeBandwidth && isScalarEpilogueAllowed())) {
		auto MaxVectorElementCountMaxBW = ElementCount::get(
		PowerOf2Floor(WidestRegister.getKnownMinSize() / SmallestType),
		ComputeScalableMaxVF);
		if (!Legal->isSafeForAnyVectorWidth())
		MaxVectorElementCountMaxBW = MinVF(MaxVectorElementCountMaxBW, MaxSafeVF);

// Collect all viable vectorization factors larger than the default MaxVF		// Collect all viable vectorization factors larger than the default MaxVF
// (i.e. MaxVectorSize).		// (i.e. MaxVectorElementCount).
SmallVector<ElementCount, 8> VFs;		SmallVector<ElementCount, 8> VFs;
auto MaxVectorSizeMaxBW =		for (ElementCount VS = MaxVectorElementCount * 2;
ElementCount::getFixed(WidestRegister / SmallestType);		ElementCount::isKnownLE(VS, MaxVectorElementCountMaxBW); VS *= 2)
for (ElementCount VS = MaxVectorSize * 2;
ElementCount::isKnownLE(VS, MaxVectorSizeMaxBW); VS *= 2)
VFs.push_back(VS);		VFs.push_back(VS);

// For each VF calculate its register usage.		// For each VF calculate its register usage.
auto RUs = calculateRegisterUsage(VFs);		auto RUs = calculateRegisterUsage(VFs);

// Select the largest VF which doesn't require more registers than existing		// Select the largest VF which doesn't require more registers than existing
// ones.		// ones.
for (int i = RUs.size() - 1; i >= 0; --i) {		for (int i = RUs.size() - 1; i >= 0; --i) {
bool Selected = true;		bool Selected = true;
for (auto &pair : RUs[i].MaxLocalUsers) {		for (auto &pair : RUs[i].MaxLocalUsers) {
unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);		unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);
if (pair.second > TargetNumRegisters)		if (pair.second > TargetNumRegisters)
Selected = false;		Selected = false;
}		}
if (Selected) {		if (Selected) {
MaxVF = VFs[i];		MaxVF = VFs[i];
break;		break;
}		}
}		}
if (ElementCount MinVF =		if (ElementCount MinVF =
TTI.getMinimumVF(SmallestType, /*IsScalable=*/false)) {		TTI.getMinimumVF(SmallestType, ComputeScalableMaxVF)) {
if (ElementCount::isKnownLT(MaxVF, MinVF)) {		if (ElementCount::isKnownLT(MaxVF, MinVF)) {
LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF		LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF
<< ") with target's minimum: " << MinVF << '\n');		<< ") with target's minimum: " << MinVF << '\n');
MaxVF = MinVF;		MaxVF = MinVF;
}		}
}		}
}		}
return MaxVF;		return MaxVF;
(4,086 unchanged lines not shown)
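
The arithmetic in the new `getMaxVectorElementCount` (renamed `getMaximizedVFForTarget` during review) can be illustrated with plain integers. This is a sketch under assumptions, not the LLVM implementation: lane counts are known-minimum values with the scalable flag left out, `powerOf2Floor` is a local reimplementation of the behaviour expected from `PowerOf2Floor`, and `registerLimitedLanes` and `bandwidthCandidates` are hypothetical helper names. A caller that is safe for any vector width is assumed to pass a very large `MaxSafeLanes`. The constant-trip-count clamp and the register-usage filter are not modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Largest power of two that is <= X, or 0 when X is 0.
uint64_t powerOf2Floor(uint64_t X) {
  uint64_t P = 0;
  for (uint64_t Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Default limit: lanes of the widest type that fit in one register, capped by
// the maximum safe lane count derived from the dependence distance.
uint64_t registerLimitedLanes(uint64_t RegisterBits, unsigned WidestTypeBits,
                              uint64_t MaxSafeLanes) {
  assert(WidestTypeBits != 0 && "type width must be non-zero");
  uint64_t Lanes = powerOf2Floor(RegisterBits / WidestTypeBits);
  return Lanes < MaxSafeLanes ? Lanes : MaxSafeLanes;
}

// When maximizing bandwidth, the candidates are successive doublings of the
// default limit, up to the limit computed from the smallest type instead.
std::vector<uint64_t> bandwidthCandidates(uint64_t DefaultLanes,
                                          uint64_t RegisterBits,
                                          unsigned SmallestTypeBits,
                                          uint64_t MaxSafeLanes) {
  assert(SmallestTypeBits != 0 && "type width must be non-zero");
  uint64_t Limit = powerOf2Floor(RegisterBits / SmallestTypeBits);
  if (MaxSafeLanes < Limit)
    Limit = MaxSafeLanes;

  std::vector<uint64_t> Candidates;
  // A zero default would never grow by doubling, so it yields no candidates.
  for (uint64_t VS = DefaultLanes * 2; DefaultLanes != 0 && VS <= Limit;
       VS *= 2)
    Candidates.push_back(VS);
  return Candidates;
}
```

For a 128-bit register, a widest type of 32 bits and a smallest type of 8 bits, the default limit is 4 lanes and the bandwidth candidates are 8 and 16 lanes. The patch then keeps the largest candidate whose estimated register usage fits the target.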

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll

(215 unchanged lines not shown)
for.body:		for.body:
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %iv.next, %n		%exitcond.not = icmp eq i64 %iv.next, %n
br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0		br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

for.end:		for.end:
ret float %add		ret float %add
}		}

; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop. Using fixed-width vectorization instead.		; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
; CHECK-REMARK: vectorized loop (vectorization width: 8, interleaved count: 2)		; CHECK-REMARK: vectorized loop (vectorization width: 8, interleaved count: 2)
define bfloat @fadd_fast_bfloat(bfloat* noalias nocapture readonly %a, i64 %n) {		define bfloat @fadd_fast_bfloat(bfloat* noalias nocapture readonly %a, i64 %n) {
; CHECK-LABEL: @fadd_fast_bfloat		; CHECK-LABEL: @fadd_fast_bfloat
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %[[LOAD1:.*]] = load <8 x bfloat>		; CHECK: %[[LOAD1:.*]] = load <8 x bfloat>
; CHECK: %[[LOAD2:.*]] = load <8 x bfloat>		; CHECK: %[[LOAD2:.*]] = load <8 x bfloat>
; CHECK: %[[FADD1:.*]] = fadd fast <8 x bfloat> %[[LOAD1]]		; CHECK: %[[FADD1:.*]] = fadd fast <8 x bfloat> %[[LOAD1]]
; CHECK: %[[FADD2:.*]] = fadd fast <8 x bfloat> %[[LOAD2]]		; CHECK: %[[FADD2:.*]] = fadd fast <8 x bfloat> %[[LOAD2]]
(84 unchanged lines not shown)
for.end:		for.end:
ret float %.sroa.speculated		ret float %.sroa.speculated
}		}

; Reduction cannot be vectorized		; Reduction cannot be vectorized

; MUL		; MUL

; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop. Using fixed-width vectorization instead.		; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
; CHECK-REMARK: vectorized loop (vectorization width: 8, interleaved count: 2)		; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
define i32 @mul(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {		define i32 @mul(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
; CHECK-LABEL: @mul		; CHECK-LABEL: @mul
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %[[LOAD1:.*]] = load <8 x i32>		; CHECK: %[[LOAD1:.*]] = load <4 x i32>
; CHECK: %[[LOAD2:.*]] = load <8 x i32>		; CHECK: %[[LOAD2:.*]] = load <4 x i32>
; CHECK: %[[MUL1:.*]] = mul <8 x i32> %[[LOAD1]]		; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD1]]
; CHECK: %[[MUL2:.*]] = mul <8 x i32> %[[LOAD2]]		; CHECK: %[[MUL2:.*]] = mul <4 x i32> %[[LOAD2]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK: %[[RDX:.*]] = mul <8 x i32> %[[MUL2]], %[[MUL1]]		; CHECK: %[[RDX:.*]] = mul <4 x i32> %[[MUL2]], %[[MUL1]]
; CHECK: call i32 @llvm.vector.reduce.mul.v8i32(<8 x i32> %[[RDX]])		; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[RDX]])
entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum.07 = phi i32 [ 2, %entry ], [ %mul, %for.body ]		%sum.07 = phi i32 [ 2, %entry ], [ %mul, %for.body ]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv		%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
%mul = mul nsw i32 %0, %sum.07		%mul = mul nsw i32 %0, %sum.07
%iv.next = add nuw nsw i64 %iv, 1		%iv.next = add nuw nsw i64 %iv, 1
%exitcond.not = icmp eq i64 %iv.next, %n		%exitcond.not = icmp eq i64 %iv.next, %n
br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0		br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

for.end: ; preds = %for.body, %entry		for.end: ; preds = %for.body, %entry
ret i32 %mul		ret i32 %mul
}		}

; Note: This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies		; Note: This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies
; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop. Using fixed-width vectorization instead.		; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
; CHECK-REMARK: vectorized loop (vectorization width: 8, interleaved count: 2)		; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
define i32 @memory_dependence(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i64 %n) {		define i32 @memory_dependence(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i64 %n) {
; CHECK-LABEL: @memory_dependence		; CHECK-LABEL: @memory_dependence
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK: %[[LOAD1:.*]] = load <8 x i32>		; CHECK: %[[LOAD1:.*]] = load <4 x i32>
; CHECK: %[[LOAD2:.*]] = load <8 x i32>		; CHECK: %[[LOAD2:.*]] = load <4 x i32>
; CHECK: %[[LOAD3:.*]] = load <8 x i32>		; CHECK: %[[LOAD3:.*]] = load <4 x i32>
; CHECK: %[[LOAD4:.*]] = load <8 x i32>		; CHECK: %[[LOAD4:.*]] = load <4 x i32>
; CHECK: %[[ADD1:.*]] = add nsw <8 x i32> %[[LOAD3]], %[[LOAD1]]		; CHECK: %[[ADD1:.*]] = add nsw <4 x i32> %[[LOAD3]], %[[LOAD1]]
; CHECK: %[[ADD2:.*]] = add nsw <8 x i32> %[[LOAD4]], %[[LOAD2]]		; CHECK: %[[ADD2:.*]] = add nsw <4 x i32> %[[LOAD4]], %[[LOAD2]]
; CHECK: %[[MUL1:.*]] = mul <8 x i32> %[[LOAD3]]		; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD3]]
; CHECK: %[[MUL2:.*]] = mul <8 x i32> %[[LOAD4]]		; CHECK: %[[MUL2:.*]] = mul <4 x i32> %[[LOAD4]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK: %[[RDX:.*]] = mul <8 x i32> %[[MUL2]], %[[MUL1]]		; CHECK: %[[RDX:.*]] = mul <4 x i32> %[[MUL2]], %[[MUL1]]
; CHECK: call i32 @llvm.vector.reduce.mul.v8i32(<8 x i32> %[[RDX]])		; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[RDX]])
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%i = phi i64 [ %inc, %for.body ], [ 0, %entry ]		%i = phi i64 [ %inc, %for.body ], [ 0, %entry ]
%sum = phi i32 [ %mul, %for.body ], [ 2, %entry ]		%sum = phi i32 [ %mul, %for.body ], [ 2, %entry ]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i		%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
(22 unchanged lines not shown)

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-analysis.ll

This file was added.

				; REQUIRES: asserts
				; RUN: opt -mtriple=aarch64-none-linux-gnu -mattr=+sve -force-target-instruction-cost=1 -loop-vectorize -S -debug-only=loop-vectorize < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK_SCALABLE_ON
				; RUN: opt -mtriple=aarch64-none-linux-gnu -mattr=+sve -force-target-instruction-cost=1 -loop-vectorize -S -debug-only=loop-vectorize -vectorizer-maximize-bandwidth < %s 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK_SCALABLE_ON_MAXBW
				c-rhodes: nit: multiple spaces

				; Test that the MaxVF for the following loop, that has no dependence distances,
				; is calculated as vscale x 4 (max legal SVE vector size) or vscale x 16
				; (maximized bandwidth for i8 in the loop).
				define void @test0(i32* %a, i8* %b, i32* %c) {
				; CHECK: LV: Checking a loop in "test0"
				; CHECK_SCALABLE_ON: LV: Found feasible scalable VF = vscale x 4
				; CHECK_SCALABLE_ON_MAXBW: LV: Found feasible scalable VF = vscale x 16
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %c, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 %iv
				%1 = load i8, i8* %arrayidx2, align 4
				%zext = zext i8 %1 to i32
				%add = add nsw i32 %zext, %0
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %iv
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void
				}

				; Test that the MaxVF for the following loop, with a dependence distance
				c-rhodes: nit: add newline above
				; of 64 elements, is calculated as (maxvscale = 16) * 4.
				define void @test1(i32* %a, i8* %b) {
				; CHECK: LV: Checking a loop in "test1"
				; CHECK_SCALABLE_ON: LV: Found feasible scalable VF = vscale x 4
				; CHECK_SCALABLE_ON_MAXBW: LV: Found feasible scalable VF = vscale x 4
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 %iv
				%1 = load i8, i8* %arrayidx2, align 4
				%zext = zext i8 %1 to i32
				%add = add nsw i32 %zext, %0
				%2 = add nuw nsw i64 %iv, 64
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void
				}

				; Test that the MaxVF for the following loop, with a dependence distance
				; of 32 elements, is calculated as (maxvscale = 16) * 2.
				define void @test2(i32* %a, i8* %b) {
				; CHECK: LV: Checking a loop in "test2"
				; CHECK_SCALABLE_ON: LV: Found feasible scalable VF = vscale x 2
				; CHECK_SCALABLE_ON_MAXBW: LV: Found feasible scalable VF = vscale x 2
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 %iv
				%1 = load i8, i8* %arrayidx2, align 4
				%zext = zext i8 %1 to i32
				%add = add nsw i32 %zext, %0
				%2 = add nuw nsw i64 %iv, 32
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void
				}

				; Test that the MaxVF for the following loop, with a dependence distance
				; of 16 elements, is calculated as (maxvscale = 16) * 1.
				define void @test3(i32* %a, i8* %b) {
				; CHECK: LV: Checking a loop in "test3"
				; CHECK_SCALABLE_ON: LV: Found feasible scalable VF = vscale x 1
				; CHECK_SCALABLE_ON_MAXBW: LV: Found feasible scalable VF = vscale x 1
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i8, i8* %b, i64 %iv
				%1 = load i8, i8* %arrayidx2, align 4
				%zext = zext i8 %1 to i32
				%add = add nsw i32 %zext, %0
				%2 = add nuw nsw i64 %iv, 16
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void
				}

				; Test the fallback mechanism when scalable vectors are not feasible due
				; to e.g. dependence distance. For the '-scalable-vectorization=exclusive'
				; it shouldn't try to vectorize with fixed-width vectors.
				define void @test4(i32* %a, i32* %b) {
				; CHECK: LV: Checking a loop in "test4"
				; CHECK_SCALABLE_ON-NOT: LV: Found feasible scalable VF
				; CHECK_SCALABLE_ON_MAXBW-NOT: LV: Found feasible scalable VF
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 8
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !2

				exit:
				ret void
				}

				!0 = distinct !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.enable", i1 true}
				!2 = distinct !{!2, !3, !4}
				!3 = !{!"llvm.loop.vectorize.enable", i1 true}
				!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
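The expected VFs in test0 to test4 above follow from two independent caps on the scalable VF: one from the register size and one from the dependence distance. The sketch below is only an illustration of that arithmetic, not the code in LoopVectorize.cpp. All function names are hypothetical. The 128-bit granule per unit of vscale and the maximum vscale of 16 are the SVE values the test comments rely on. Rounding down to a power of two is an assumption about how a non-power-of-two distance would be treated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest power of two that is <= x. Requires x > 0.
inline uint32_t powerOf2Floor(uint32_t x) {
  assert(x > 0);
  uint32_t p = 1;
  while (p <= x / 2)
    p *= 2;
  return p;
}

// Cap from the register size: how many elements of `elemBits` fit in one
// granule. Pass the widest loop type by default, or the smallest type when
// maximizing bandwidth (the MAXBW check prefix).
inline uint32_t registerCap(uint32_t granuleBits, uint32_t elemBits) {
  assert(elemBits > 0);
  return granuleBits / elemBits;
}

// Cap from the dependence distance: `vscale x N` touches up to
// N * maxVScale elements, so N may not exceed maxSafeElements / maxVScale.
// Returns 0 when even `vscale x 1` would be unsafe.
inline uint32_t dependenceCap(uint32_t maxSafeElements, uint32_t maxVScale) {
  assert(maxVScale > 0);
  uint32_t lanes = maxSafeElements / maxVScale;
  return lanes == 0 ? 0 : powerOf2Floor(lanes);
}

// Known-minimum lane count N of the largest feasible `vscale x N`,
// or 0 if no scalable VF is feasible.
inline uint32_t maxFeasibleScalableLanes(uint32_t maxSafeElements,
                                         uint32_t maxVScale,
                                         uint32_t granuleBits,
                                         uint32_t elemBits) {
  return std::min(registerCap(granuleBits, elemBits),
                  dependenceCap(maxSafeElements, maxVScale));
}
```

With a maximum vscale of 16 and a 128-bit granule, dependence distances of 64, 32, 16 and 8 elements give 4, 2, 1 and 0 lanes for i32. These match the `vscale x 4`, `vscale x 2` and `vscale x 1` expectations in test1 to test3 and the absence of a feasible scalable VF in test4. For test0, which has no dependence distance, passing `UINT32_MAX` as the safe element count leaves only the register cap: 4 lanes for i32, or 16 when the i8 type is used to maximize bandwidth.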

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll

Show All 31 Lines
; a[i + 8] = a[i] + b[i];		; a[i + 8] = a[i] + b[i];
; }		; }
; }		; }
;		;
; For scalable vectorization 'vscale' has to be considered, for this example		; For scalable vectorization 'vscale' has to be considered, for this example
; unless max(vscale)=2 it's unsafe to vectorize. For SVE max(vscale)=16, check		; unless max(vscale)=2 it's unsafe to vectorize. For SVE max(vscale)=16, check
; fixed-width vectorization is used instead.		; fixed-width vectorization is used instead.

; CHECK-DBG: LV: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.		; CHECK-DBG: LV: Checking a loop in "test1"
; CHECK-DBG: remark: <unknown>:0:0: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.		; CHECK-DBG: LV: Max legal vector width too small, scalable vectorization unfeasible.
; CHECK-DBG: LV: The max safe VF is: 8.		; CHECK-DBG: remark: <unknown>:0:0: Max legal vector width too small, scalable vectorization unfeasible.
		; CHECK-DBG: LV: The max safe fixed VF is: 8.
; CHECK-DBG: LV: Selecting VF: 4.		; CHECK-DBG: LV: Selecting VF: 4.
; CHECK-LABEL: @test1		; CHECK-LABEL: @test1
; CHECK: <4 x i32>		; CHECK: <4 x i32>
define void @test1(i32* %a, i32* %b) {		define void @test1(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
Show All 24 Lines
;		;
; void test2(int *a, int *b, int N) {		; void test2(int *a, int *b, int N) {
; #pragma clang loop vectorize(enable) vectorize_width(8, scalable)		; #pragma clang loop vectorize(enable) vectorize_width(8, scalable)
; for (int i=0; i<N; ++i) {		; for (int i=0; i<N; ++i) {
; a[i + 4] = a[i] + b[i];		; a[i + 4] = a[i] + b[i];
; }		; }
; }		; }

; CHECK-DBG: LV: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.		; CHECK-DBG: LV: Checking a loop in "test2"
; CHECK-DBG: LV: The max safe VF is: 4.		; CHECK-DBG: LV: Max legal vector width too small, scalable vectorization unfeasible.
; CHECK-DBG: LV: User VF=8 is unsafe, clamping to max safe VF=4.		; CHECK-DBG: LV: The max safe fixed VF is: 4.
		; CHECK-DBG: LV: User VF=vscale x 8 is unsafe. Ignoring scalable UserVF.
; CHECK-DBG: LV: Selecting VF: 4.		; CHECK-DBG: LV: Selecting VF: 4.
; CHECK-LABEL: @test2		; CHECK-LABEL: @test2
; CHECK: <4 x i32>		; CHECK: <4 x i32>
define void @test2(i32* %a, i32* %b) {		define void @test2(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
Show All 30 Lines
; for (int i=0; i<N; ++i) {		; for (int i=0; i<N; ++i) {
; a[i + 32] = a[i] + b[i];		; a[i + 32] = a[i] + b[i];
; }		; }
; }		; }
;		;
; Max fixed VF=32, Max scalable VF=2, safe to vectorize.		; Max fixed VF=32, Max scalable VF=2, safe to vectorize.

; CHECK-DBG-LABEL: LV: Checking a loop in "test3"		; CHECK-DBG-LABEL: LV: Checking a loop in "test3"
; CHECK-DBG: LV: The max safe VF is: vscale x 2.		; CHECK-DBG: LV: The max safe scalable VF is: vscale x 2.
; CHECK-DBG: LV: Using user VF vscale x 2.		; CHECK-DBG: LV: Using user VF vscale x 2.
; CHECK-LABEL: @test3		; CHECK-LABEL: @test3
; CHECK: <vscale x 2 x i32>		; CHECK: <vscale x 2 x i32>
define void @test3(i32* %a, i32* %b) {		define void @test3(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
Show All 15 Lines
}		}

!6 = !{!6, !7, !8}		!6 = !{!6, !7, !8}
!7 = !{!"llvm.loop.vectorize.width", i32 2}		!7 = !{!"llvm.loop.vectorize.width", i32 2}
!8 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}		!8 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

; test4		; test4
;		;
; Scalable vectorization feasible, but the VF is unsafe. Should clamp.		; Scalable vectorization feasible, but the given VF is unsafe. Should ignore
		; the hint and leave it to the vectorizer to pick a more suitable VF.
;		;
; Specifies a vector of <vscale x 4 x i32>, i.e. maximum of 64 x i32 with 4		; Specifies a vector of <vscale x 4 x i32>, i.e. maximum of 64 x i32 with 4
; words per 128-bits (packed).		; words per 128-bits (packed).
;		;
; void test4(int *a, int *b, int N) {		; void test4(int *a, int *b, int N) {
; #pragma clang loop vectorize(enable) vectorize_width(4, scalable)		; #pragma clang loop vectorize(enable) vectorize_width(4, scalable)
; for (int i=0; i<N; ++i) {		; for (int i=0; i<N; ++i) {
; a[i + 32] = a[i] + b[i];		; a[i + 32] = a[i] + b[i];
; }		; }
; }		; }
;		;
; Max fixed VF=32, Max scalable VF=2, unsafe to vectorize. Should clamp to 2.		; Max fixed VF=32, Max scalable VF=2, unsafe to vectorize.

; CHECK-DBG-LABEL: LV: Checking a loop in "test4"		; CHECK-DBG-LABEL: LV: Checking a loop in "test4"
; CHECK-DBG: LV: The max safe VF is: vscale x 2.		; CHECK-DBG: LV: The max safe scalable VF is: vscale x 2.
; CHECK-DBG: LV: User VF=vscale x 4 is unsafe, clamping to max safe VF=vscale x 2.		; CHECK-DBG: LV: User VF=vscale x 4 is unsafe. Ignoring scalable UserVF.
; CHECK-DBG: remark: <unknown>:0:0: User-specified vectorization factor vscale x 4 is unsafe, clamping to maximum safe vectorization factor vscale x 2		; CHECK-DBG: remark: <unknown>:0:0: User-specified vectorization factor vscale x 4 is unsafe. Ignoring the hint to let the compiler pick a suitable VF.
; CHECK-DBG: LV: Using max VF vscale x 2		; CHECK-DBG: Found feasible scalable VF = vscale x 2
		; CHECK-DBG: LV: Selecting VF: 4.
; CHECK-LABEL: @test4		; CHECK-LABEL: @test4
; CHECK: <vscale x 2 x i32>		; CHECK: <4 x i32>
		c-rhodes: Should this select the feasible scalable VF? I suppose this is related to the issue @david-arm pointed out in `LoopVectorizationCostModel::computeFeasibleMaxVF`.
		sdesmalen (author): I don't think it should do that by default. Personally I think it's best if the vectorizer just ignores hints that are not valid and calculates a better answer instead, rather than blindly opting for something that seems "close enough" but may not be profitable. That's akin to what happens for fixed-width vectors: if the UserVF is not legal (e.g. VF=16) and VF=8 would be legal, it will pick the most profitable VF <= 8, which isn't necessarily 8. Extending this to scalable vectors, we can argue that if `VF=vscale x 8` is not legal, it should consider all VFs (fixed and scalable) up to whatever is legal. We can then add an LV option to favour scalable VFs if we know the cost model doesn't have all the information to make the right decision (although that's something to consider for another patch).
define void @test4(i32* %a, i32* %b) {		define void @test4(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv		%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
Show All 27 Lines
; for (int i=0; i<N; ++i) {		; for (int i=0; i<N; ++i) {
; a[i + 128] = a[i] + b[i];		; a[i + 128] = a[i] + b[i];
; }		; }
; }		; }
;		;
; Max fixed VF=128, Max scalable VF=8, safe to vectorize.		; Max fixed VF=128, Max scalable VF=8, safe to vectorize.

; CHECK-DBG-LABEL: LV: Checking a loop in "test5"		; CHECK-DBG-LABEL: LV: Checking a loop in "test5"
; CHECK-DBG: LV: The max safe VF is: vscale x 8.		; CHECK-DBG: LV: The max safe scalable VF is: vscale x 8.
; CHECK-DBG: LV: Using user VF vscale x 4		; CHECK-DBG: LV: Using user VF vscale x 4
; CHECK-LABEL: @test5		; CHECK-LABEL: @test5
; CHECK: <vscale x 4 x i32>		; CHECK: <vscale x 4 x i32>
define void @test5(i32* %a, i32* %b) {		define void @test5(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
Show All 15 Lines
}		}

!12 = !{!12, !13, !14}		!12 = !{!12, !13, !14}
!13 = !{!"llvm.loop.vectorize.width", i32 4}		!13 = !{!"llvm.loop.vectorize.width", i32 4}
!14 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}		!14 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

; test6		; test6
;		;
; Scalable vectorization feasible, but the VF is unsafe. Should clamp.		; Scalable vectorization feasible, but the VF is unsafe. Should ignore
		; the hint and leave it to the vectorizer to pick a more suitable VF.
;		;
; Specifies a vector of <vscale x 16 x i32>, i.e. maximum of 256 x i32.		; Specifies a vector of <vscale x 16 x i32>, i.e. maximum of 256 x i32.
;		;
; void test6(int *a, int *b, int N) {		; void test6(int *a, int *b, int N) {
; #pragma clang loop vectorize(enable) vectorize_width(16, scalable)		; #pragma clang loop vectorize(enable) vectorize_width(16, scalable)
; for (int i=0; i<N; ++i) {		; for (int i=0; i<N; ++i) {
; a[i + 128] = a[i] + b[i];		; a[i + 128] = a[i] + b[i];
; }		; }
; }		; }
;		;
; Max fixed VF=128, Max scalable VF=8, unsafe to vectorize. Should clamp to 8.		; Max fixed VF=128, Max scalable VF=8, unsafe to vectorize.

; CHECK-DBG-LABEL: LV: Checking a loop in "test6"		; CHECK-DBG-LABEL: LV: Checking a loop in "test6"
; CHECK-DBG: LV: The max safe VF is: vscale x 8.		; CHECK-DBG: LV: The max safe scalable VF is: vscale x 8.
; CHECK-DBG: LV: User VF=vscale x 16 is unsafe, clamping to max safe VF=vscale x 8.		; CHECK-DBG: LV: User VF=vscale x 16 is unsafe. Ignoring scalable UserVF.
; CHECK-DBG: remark: <unknown>:0:0: User-specified vectorization factor vscale x 16 is unsafe, clamping to maximum safe vectorization factor vscale x 8		; CHECK-DBG: remark: <unknown>:0:0: User-specified vectorization factor vscale x 16 is unsafe. Ignoring the hint to let the compiler pick a suitable VF.
; CHECK-DBG: LV: Using max VF vscale x 8		; CHECK-DBG: LV: Found feasible scalable VF = vscale x 4
		; CHECK-DBG: Selecting VF: 4.
; CHECK-LABEL: @test6		; CHECK-LABEL: @test6
; CHECK: <vscale x 8 x i32>		; CHECK: <4 x i32>
define void @test6(i32* %a, i32* %b) {		define void @test6(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv		%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
%0 = load i32, i32* %arrayidx, align 4		%0 = load i32, i32* %arrayidx, align 4
Show All 11 Lines	exit:
ret void		ret void
}		}

!15 = !{!15, !16, !17}		!15 = !{!15, !16, !17}
!16 = !{!"llvm.loop.vectorize.width", i32 16}		!16 = !{!"llvm.loop.vectorize.width", i32 16}
!17 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}		!17 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

; CHECK-NO-SVE-LABEL: LV: Checking a loop in "test_no_sve"		; CHECK-NO-SVE-LABEL: LV: Checking a loop in "test_no_sve"
; CHECK-NO-SVE: LV: Ignoring VF=vscale x 4 because target does not support scalable vectors.		; CHECK-NO-SVE: LV: Disabling scalable vectorization, because target does not support scalable vectors.
; CHECK-NO-SVE: remark: <unknown>:0:0: Ignoring VF=vscale x 4 because target does not support scalable vectors.		; CHECK-NO-SVE: remark: <unknown>:0:0: Disabling scalable vectorization, because target does not support scalable vectors.
		; CHECK-NO-SVE: LV: User VF=vscale x 4 is unsafe. Ignoring scalable UserVF.
; CHECK-NO-SVE: LV: Selecting VF: 4.		; CHECK-NO-SVE: LV: Selecting VF: 4.
; CHECK-NO-SVE: <4 x i32>		; CHECK-NO-SVE: <4 x i32>
; CHECK-NO-SVE-NOT: <vscale x 4 x i32>		; CHECK-NO-SVE-NOT: <vscale x 4 x i32>
define void @test_no_sve(i32* %a, i32* %b) {		define void @test_no_sve(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
Show All 15 Lines
!18 = !{!18, !19, !20}		!18 = !{!18, !19, !20}
!19 = !{!"llvm.loop.vectorize.width", i32 4}		!19 = !{!"llvm.loop.vectorize.width", i32 4}
!20 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}		!20 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

; Test the LV falls back to fixed-width vectorization if scalable vectors are		; Test the LV falls back to fixed-width vectorization if scalable vectors are
; supported but max vscale is undefined.		; supported but max vscale is undefined.
;		;
; CHECK-NO-MAX-VSCALE-LABEL: LV: Checking a loop in "test_no_max_vscale"		; CHECK-NO-MAX-VSCALE-LABEL: LV: Checking a loop in "test_no_max_vscale"
; CHECK-NO-MAX-VSCALE: LV: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.		; CHECK-NO-MAX-VSCALE: The max safe fixed VF is: 4.
; CHECK-NO-MAX-VSCALE: The max safe VF is: 4.		; CHECK-NO-MAX-VSCALE: LV: User VF=vscale x 4 is unsafe. Ignoring scalable UserVF.
; CHECK-NO-MAX-VSCALE: LV: Selecting VF: 4.		; CHECK-NO-MAX-VSCALE: LV: Selecting VF: 4.
; CHECK-NO-MAX-VSCALE: <4 x i32>		; CHECK-NO-MAX-VSCALE: <4 x i32>
define void @test_no_max_vscale(i32* %a, i32* %b) {		define void @test_no_max_vscale(i32* %a, i32* %b) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
Show All 19 Lines
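The policy sdesmalen describes in the inline reply on test4 of this file (use a legal hint as given, otherwise drop it and let the cost model choose among every legal VF) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch's implementation: `VF`, `Limits`, `isLegal` and `candidateVFs` are hypothetical names, candidates are assumed to be powers of two, and the cost model itself is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a vectorization factor: `minLanes` lanes,
// multiplied by vscale at run time when `scalable` is set.
struct VF {
  uint32_t minLanes;
  bool scalable;
};

// Largest legal lane counts for the loop; 0 means that kind is infeasible.
struct Limits {
  uint32_t maxFixed;
  uint32_t maxScalable;
};

// A hint is legal if it is non-zero and within the cap for its kind.
inline bool isLegal(VF vf, Limits lim) {
  uint32_t cap = vf.scalable ? lim.maxScalable : lim.maxFixed;
  return vf.minLanes != 0 && vf.minLanes <= cap;
}

// VFs that would be handed to the cost model. A legal hint is used as-is;
// an illegal one is dropped entirely rather than clamped to the nearest
// legal value.
inline std::vector<VF> candidateVFs(VF hint, Limits lim) {
  if (isLegal(hint, lim))
    return {hint};
  std::vector<VF> out;
  // 64-bit counters so that doubling cannot wrap around.
  for (uint64_t n = 1; n <= lim.maxFixed; n *= 2)
    out.push_back(VF{static_cast<uint32_t>(n), false});
  for (uint64_t n = 1; n <= lim.maxScalable; n *= 2)
    out.push_back(VF{static_cast<uint32_t>(n), true});
  return out;
}
```

With the limits from test3 and test4 (max fixed VF 32, max scalable VF 2), the hint `vscale x 2` is returned unchanged, while the hint `vscale x 4` is discarded and the sketch yields the fixed VFs 1 to 32 plus `vscale x 1` and `vscale x 2`. Which of those wins is the cost model's decision; the updated CHECK lines for test4 show the fixed VF 4 being selected.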

llvm/test/Transforms/LoopVectorize/scalable-vf-hint.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt -loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s			; RUN: opt -loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; CHECK: LV: Ignoring VF=vscale x 4 because target does not support scalable vectors.			; CHECK: LV: Disabling scalable vectorization, because target does not support scalable vectors.
	; CHECK: remark: <unknown>:0:0: Ignoring VF=vscale x 4 because target does not support scalable vectors.			; CHECK: remark: <unknown>:0:0: Disabling scalable vectorization, because target does not support scalable vectors.
	; CHECK: LV: The Widest register safe to use is: 32 bits.			; CHECK: LV: The Widest register safe to use is: 32 bits.
	define void @test1(i32* %a, i32* %b) {			define void @test1(i32* %a, i32* %b) {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
	%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv			%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
	Show All 18 Lines

Diff 337246
