This is an archive of the discontinued LLVM Phabricator instance.

Differential D96021

[LoopVectorize] NFC: Move UserVF feasibility checks to separate function.
AbandonedPublic

Authored by sdesmalen on Feb 4 2021, 5:16 AM.

Details

Reviewers

c-rhodes
fhahn
evandro
gilr
dmgreen

Summary

This patch is NFC and cleans up computeFeasibleMaxVF() by moving all
validity checking (and possibly clamping) of UserVF to a separate function
computeFeasibleUserVF().

This patch is a preparatory patch with the ultimate goal of making
computeMaxVF() return both a max fixed VF and a max scalable VF,
so that selectVectorizationFactor() can pick the most cost-effective
vectorization factor.

Diff Detail

Unit Tests: Failed

	Time	Test
	50 ms	x64 debian > lld.MachO/invalid::stub-link.s
	120 ms	x64 windows > LLVM.Instrumentation/InstrProfiling::profiling.ll
	90 ms	x64 windows > lld.MachO/invalid::stub-link.s

Event Timeline

sdesmalen created this revision.Feb 4 2021, 5:16 AM

Herald added a subscriber: hiraditya.Feb 4 2021, 5:16 AM

sdesmalen requested review of this revision.Feb 4 2021, 5:16 AM

Herald added a project: Restricted Project.Feb 4 2021, 5:16 AM

sdesmalen added a child revision: D96022: [LoopVectorize] NFC: Split off clamping from computeFeasibleUserVF into its own function..Feb 4 2021, 5:17 AM

sdesmalen added a parent revision: D96018: [LoopVectorize] NFC: Change computeFeasibleMaxVF to operate on ElementCount..

sdesmalen added a child revision: D96023: [LoopVectorize] Calculate Max Feasible Scalable VF..Feb 4 2021, 5:22 AM

Harbormaster completed remote builds in B87881: Diff 321390.Feb 4 2021, 7:02 AM

sdesmalen updated this revision to Diff 322655.Feb 10 2021, 4:53 AM

sdesmalen added reviewers: c-rhodes, fhahn, evandro, gilr.

sdesmalen set the repository for this revision to rG LLVM Github Monorepo.

sdesmalen added a reviewer: dmgreen.Feb 11 2021, 2:04 PM

c-rhodes added inline comments.Feb 12 2021, 6:52 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1608–1610	/// \return UserVF if it is non-zero and there are no dependences, otherwise /// return a clamped value. For a scalable UserVF, the resulting feasible VF /// may be fixed-width.
5557–5559	should this be part of the lambda since it's not used elsewhere? Not sure if it's an issue but `MinBWs` could now be computed where it previously wasn't.
5561	should functions start with lower case?
5713	`if (UserVF.isZero())`?
5783–5784	can this be moved above `MaxSafeVectorWidthInBits`?
5788–5798	it's nice how you've reused the remark for the debug message above, not really important for this patch but it would be good to do the same thing here

sdesmalen mentioned this in D96025: [LoopVectorize] Return both fixed and scalable Max VF from computeMaxVF..Feb 16 2021, 4:29 AM

Addressed comments and rebased after D95245 landed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5561	GetFeasibleMaxVF is also a local variable (with a function as value) so I thought it should start with upper case. But it seems the code-base is a bit undecided; in a quick grep I counted slightly (15%) more cases starting with an upper-case, than starting with a lower-case.
5788–5798	Thanks. I'm happy to see if I can do that in a separate patch (it requires some changes to the tests).

LGTM, might be worth leaving a little time before landing in case others have any comments, cheers

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5561	> GetFeasibleMaxVF is also a local variable (with a function as value) so I thought it should start with upper case. But it seems the code-base is a bit undecided; in a quick grep I counted slightly (15%) more cases starting with an upper-case, than starting with a lower-case.
	Ah fair enough, thanks for checking.
5788–5798	> Thanks. I'm happy to see if I can do that in a separate patch (it requires some changes to the tests).
	I agree that would be better in a separate patch considering the changes to tests required.

This revision is now accepted and ready to land.Feb 16 2021, 7:28 AM

fhahn added inline comments.Feb 18 2021, 12:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1609	it also checks if the reductions are supported for scalable vectors, right? Probably better to not go into the specifics in the comment and just say `\return UserVF directly if it is valid. Otherwise clamp UserVF to the largest valid value.`
5566	nit: the code might be slightly easier to follow if you just directly return if there is a valid UserVF, like: `// If there is a valid UserVF, use it.` `if (auto UserVF = computeFeasibleUserVF(...)) return UserVF.getValue();` `return computeFeasibleMaxVF()`
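The early-return shape suggested in the comment on line 5566 can be sketched roughly as follows. This is only an illustration: it uses `std::optional` and plain `unsigned` lane counts instead of LLVM's `Optional<ElementCount>`, and the helpers `feasibleUserVF`, `feasibleMaxVF` and `pickMaxVF` (including their bodies) are hypothetical stand-ins, not the code in this patch.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in: returns the user VF if one was given, clamped to
// MaxSafeElements. A UserVF of 0 means "no user VF", reported as nullopt.
std::optional<unsigned> feasibleUserVF(unsigned UserVF,
                                       unsigned MaxSafeElements) {
  assert(MaxSafeElements > 0 && "expected at least one safe element");
  if (UserVF == 0)
    return std::nullopt;
  return UserVF <= MaxSafeElements ? UserVF : MaxSafeElements;
}

// Hypothetical stand-in for the automatic choice: lanes that fit a register.
unsigned feasibleMaxVF(unsigned WidestRegisterBits, unsigned WidestTypeBits) {
  assert(WidestTypeBits > 0 && "type width must be non-zero");
  return WidestRegisterBits / WidestTypeBits;
}

unsigned pickMaxVF(unsigned UserVF, unsigned MaxSafeElements,
                   unsigned WidestRegisterBits, unsigned WidestTypeBits) {
  // If there is a valid UserVF, use it.
  if (auto VF = feasibleUserVF(UserVF, MaxSafeElements))
    return *VF;
  // Otherwise fall back to the automatically computed maximum.
  return feasibleMaxVF(WidestRegisterBits, WidestTypeBits);
}
```

With this shape the fallback is the last statement of the caller, so there is no need for a separate "was the UserVF ignored" flag.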

Addressed comments.

I had a look at the next patch and I am wondering if it would be possible to avoid splitting out the UserVF handling (which in turn requires splitting out the code to limit the VF by the maximum safe VF, D96022). Unfortunately the code in computeFeasibleMaxVF has not been the most straight-forward, even before adding scalable support and I am a bit worried things will get harder to follow if the code is spread out more.

I think it might be simpler to operate in 3 stages in computeFeasibleMaxVF: 1) compute maximum fixed & scalable VFs, 2) use computed max VFs to pick/limit UserVF, if provided and 3) pick MaxVF based on registers, limit by computed MaxVF. I sketched out this approach in D96997.
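To make the three stages concrete, here is a rough, self-contained sketch. It is not the code from D96997: the `VF` and `Limits` structs, the function names, and the assumption that the scalable bound is the safe element count divided by the maximum vscale are all hypothetical simplifications.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified model of a VF: a minimum lane count plus a
// "scalable" flag. A lane count of 0 means "none given".
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

struct Limits {
  unsigned MaxFixedLanes;    // legal upper bound for fixed-width VFs
  unsigned MaxScalableLanes; // legal upper bound for scalable VFs, 0 = none
};

// Stage 1: compute the legal upper bounds once. MaxVScale == 0 is taken to
// mean that the target has no scalable vectors (an assumption of this sketch).
Limits computeLimits(unsigned MaxSafeElements, unsigned MaxVScale) {
  assert(MaxSafeElements > 0 && "expected at least one safe element");
  Limits L;
  L.MaxFixedLanes = MaxSafeElements;
  L.MaxScalableLanes = MaxVScale ? MaxSafeElements / MaxVScale : 0;
  return L;
}

VF computeMaxVF(VF UserVF, unsigned RegisterLanes, const Limits &L) {
  // Stage 2: a user-provided VF is limited by the bound for its kind.
  if (UserVF.MinLanes != 0) {
    unsigned Bound = UserVF.Scalable ? L.MaxScalableLanes : L.MaxFixedLanes;
    if (Bound != 0)
      return {std::min(UserVF.MinLanes, Bound), UserVF.Scalable};
  }
  // Stage 3: otherwise pick a VF based on registers, limited by the same bound.
  return {std::min(RegisterLanes, L.MaxFixedLanes), false};
}
```

The point of the structure is that the legality bounds are computed in one place and then applied with a plain minimum, both to the UserVF and to the register-based choice.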

I have not looked at the other patches in this stack in detail, so I might be missing something that makes things more complicated. But to pick a scalable VF, we'd just have to split out the logic to compute the MaxVectorSize (something like computeMaxVFBasedOnRegisters) and then call it for both scalable & fixed width vectors and limit by the computed maximum VFs. I think that would be similar to what's done in D96023, but overall I think it might be simpler than the clampFeasibleMaxVF approach, because fewer parameters need to be passed around and we also avoid re-computing the max VFs to clamp with.

Hi @fhahn, thanks for looking into these two patches.

Today the code in computeFeasibleMaxVF is roughly structured as:

if (Need to care about UserVF) {
  // Validate, clamp or fall back to fixed-width or scalars.
  // Meanwhile, giving various UserVF-specific remarks/debug output.
} else
  // Determine MaxFeasibleVF some other way.

Because the function has gotten so big and has practically non-overlapping code paths, I don't see any downside in splitting this code into separate functions to handle their respective cases.

My main reason for doing these two NFC patches first is that the code-paths for fixed- and scalable max VFs would be identical, i.e. no need to maintain separate variables for Max(Scalable|Fixed)VF, or have lots of conditional code based on whether the UserVF or MaxVF is scalable. With these NFC tweaks, scalable vectors just fit in naturally without a lot of specific 'scalable' changes (see D96023 which extends the function to work on scalable vectors, by calling a different interface to determine the WidestRegister. Not really any other scalable-specific changes are needed).

Patch D96022 moves the clamping operation to a separate function, so that it works for both scalable and fixed-width vectors (based on MaxVscale), making it one of the few places that need to be very scalable-specific. The interface is generic though: it only asks to clamp "some VF" to some "num elements". computeFeasibleUserVF then only emits different remarks/debug-output based on the clamped value.
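A generic clamp of "some VF" to "some num elements" could look roughly like the sketch below. This is not the function from D96022: the `VF` struct, `clampVF`, the local `powerOf2Floor` helper and the choice to return `nullopt` when no VF of the requested kind fits are all assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified VF: minimum lane count plus a "scalable" flag.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Largest power of two that is <= N, for N > 0.
unsigned powerOf2Floor(unsigned N) {
  assert(N > 0 && "undefined for zero");
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Clamp Candidate so that it never covers more than MaxElements elements.
// For a scalable candidate the worst case is MinLanes * MaxVScale, so the
// budget is divided by MaxVScale first. Returns nullopt if nothing fits.
std::optional<VF> clampVF(VF Candidate, unsigned MaxElements,
                          std::optional<unsigned> MaxVScale) {
  assert(Candidate.MinLanes > 0 && MaxElements > 0);
  unsigned Budget = MaxElements;
  if (Candidate.Scalable) {
    if (!MaxVScale || *MaxVScale == 0)
      return std::nullopt; // no known upper bound on vscale
    Budget = MaxElements / *MaxVScale;
    if (Budget == 0)
      return std::nullopt; // even one scalable lane could be too wide
  }
  if (Candidate.MinLanes <= Budget)
    return Candidate;
  return VF{powerOf2Floor(Budget), Candidate.Scalable};
}
```

Only the `if (Candidate.Scalable)` branch is scalable-specific; a caller can compare the result with its input to decide which remark or debug message to emit.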
After this bit of cleanup, the code-paths for both computeMaxUserVF and computeFeasibleMaxVF do something very similar to what you suggested in D96997:

Start off with a large vector size (based on register width)
Clamp the vector size to max number of elements <legal max or tripcount>
Pick MaxVF.
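The three steps listed above can be sketched for the fixed-width case as follows. The function name `pickFixedMaxVF`, its parameters, and the convention that a trip count of 0 means "unknown" are hypothetical; the real code also has to deal with scalable VFs, remarks and cost considerations.

```cpp
#include <algorithm>
#include <cassert>

unsigned pickFixedMaxVF(unsigned WidestRegisterBits, unsigned WidestTypeBits,
                        unsigned MaxSafeElements, unsigned TripCount) {
  assert(WidestTypeBits > 0 && MaxSafeElements > 0);

  // 1. Start off with a large vector size, based on the register width.
  unsigned Lanes = WidestRegisterBits / WidestTypeBits;

  // 2. Clamp to the maximum number of elements: the legal maximum and,
  //    if it is known (non-zero here), the trip count.
  Lanes = std::min(Lanes, MaxSafeElements);
  if (TripCount != 0)
    Lanes = std::min(Lanes, TripCount);

  // 3. Pick MaxVF: round down to a power of two, never below 1 (scalar).
  unsigned MaxVF = 1;
  while (MaxVF * 2 <= Lanes)
    MaxVF *= 2;
  return MaxVF;
}
```

For example, with 128-bit registers, 8-bit elements, 16 safe elements and a known trip count of 6, the sketch goes from 16 lanes to 6 and then rounds down to a MaxVF of 4.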

I find the code in D96997 quite difficult to follow. Not necessarily because of the changes you made, I already find the code difficult to follow for the fixed-width case alone, but adding the scalable case just adds to the complication. Things like:

if (some condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
  UserVF = ElementCount::getScalable(0)
}

:

if (some other condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
}

make me wary for example. I think having the code split up makes the code a lot easier to follow than having the combination {UserVF, auto-select} x {Fixed, Scalable} in one function. After D96021 and D96022, practically all code-paths are the same for both the fixed- and scalable case.

In D96021#2572946, @fhahn wrote:

I have not looked at the other patches in this stack in detail, so I might be missing something that makes things more complicated. But to pick a scalable VF, we'd just have to split out the logic to compute the MaxVectorSize (something like computeMaxVFBasedOnRegisters) and then call it for both scalable & fixed width vectors and limit by the computed maximum VFs. I think that would be similar to what's done in D96023, but overall I think it might be simpler than the clampFeasibleMaxVF approach, because fewer parameters need to be passed around and we also avoid re-computing the max VFs to clamp with.

I'd be happy to see if more things can be simplified once things work for scalable vectors, but I don't really see many things that are recomputed after D96023?

Moved calculation for MaxSafeElements out of getFeasibleUserVF.

Hi @fhahn, following your feedback, I've refactored the patches in this series a bit to move out some common calculations from getFeasibleUserVF and computeFeasibleMaxVF. I've also tried to implement your idea to start off with a very wide vector, that gets subsequently limited by: loop dependence-distance, vector-register-width, iteration count, etc. You'll see in D96023 that the code is now relatively straight-forward in how it limits the MaxVF, and how it works for both fixed- and scalable vectors. You'll also notice that computeFeasibleMaxVF no longer takes an explicit bool ComputeScalableMaxVF flag.

Hopefully this is more along the lines that you had in mind!

Patches D96021 and D96022 just move some code around, which I think makes the code quite a bit more readable. Specifically for this patch, if you have no objections I'd be quite eager to land this patch, since it just moves some existing functionality into its own function which I think is an immediate improvement. Discussion on how things will end up for scalable vectors can then be done on the subsequent patches.

Harbormaster completed remote builds in B91892: Diff 327893.Mar 3 2021, 11:21 PM

In D96021#2579034, @sdesmalen wrote:

Hi @fhahn, thanks for looking into these two patches.

Hi, sorry for the delay.

Today the code in computeFeasibleMaxVF is roughly structured as:
if (Need to care about UserVF) {
  // Validate, clamp or fall back to fixed-width or scalars.
  // Meanwhile, giving various UserVF-specific remarks/debug output.
} else
  // Determine MaxFeasibleVF some other way.
Because the function has gotten so big and has practically non-overlapping code paths, I don't see any downside in splitting this code into separate functions to handle their respective cases.

Yes, that's indeed a problem, but I think that was mostly a consequence of adding scalable support for UserVFs only. Before, it was much simpler IIRC and structured similarly to what I outlined earlier (compute max bound, apply to UserVF if present, otherwise apply bound when computing max VF).

I find the code in D96997 quite difficult to follow. Not necessarily because of the changes you made, I already find the code difficult to follow for the fixed-width case alone, but adding the scalable case just adds to the complication. Things like:
if (some condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
  UserVF = ElementCount::getScalable(0)
}

:

if (some other condition specific to scalable VFs) {
  :
  ScalableMaxVF = ElementCount::getScalable(0)
}
make me wary for example. I think having the code split up makes the code a lot easier to follow than having the combination {UserVF, auto-select} x {Fixed, Scalable} in one function. After D96021 and D96022, practically all code-paths are the same for both the fixed- and scalable case.

Agreed, that was not ideal, but I think not necessarily due to the new structure, but because it inlined the upper bound computation for scalable VFs, which is much more verbose than for fixed vectors (BTW I think the extra debug messages are great!). I did a quick pass over the patch to move the scalable VF upper bound computation to a separate function, so those separate checks should be gone now.

In D96021#2601389, @sdesmalen wrote:

Hi @fhahn, following your feedback, I've refactored the patches in this series a bit to move out some common calculations from getFeasibleUserVF and computeFeasibleMaxVF. I've also tried to implement your idea to start off with a very wide vector, that gets subsequently limited by: loop dependence-distance, vector-register-width, iteration count, etc. You'll see in D96023 that the code is now relatively straight-forward in how it limits the MaxVF, and how it works for both fixed- and scalable vectors. You'll also notice that computeFeasibleMaxVF no longer takes an explicit bool ComputeScalableMaxVF flag.

Hopefully this is more along the lines that you had in mind!

Thanks for the update, I just had a look. One thing I am not sure about is whether we want to emit the same debug messages & remarks for scalable vectors when we just compute the max scalable VF, without UserVF? Personally I think it would be helpful to emit them independently of whether UserVF is used or not, if scalable vectorization is enabled. For example, if there's a dependence that prevents scalable vectorization due to the MaxVScale computation, I think it would make sense to display the remark/debug output saying this is causing scalable vectorization to be disabled.

As alluded to earlier, I am not sure if splitting up the logic in getFeasibleUserVF and computeFeasibleMaxVF is ideal. The way I see it, we have to apply the same 'rules'/'restrictions' to both the UserVF and/or the max VF we compute. With that in mind, the UserVF handling should be really simple: we just have to check if it is within the upper bound. For scalable vectors, we also have the fallback. But this may just be temporary, because once we also pick a max scalable VF, we can instead proceed with ignoring the UserVF, same as we do for fixed vectors.

I think the main advantages of the approach outlined earlier are:

There's one place we apply the legality restrictions to the upper bound, which is then used for both the computed max VF and UserVF. I think the advantage here is that it is clearer from the code that the same rules apply in both cases (max VF & UserVF) and if we need to add a new constraint, there's a single place to add it.
The 'clamping' is done by a simple minimum function, whereas clampFeasibleMaxVF contains scalable vector specific code and also seems slightly more complicated (e.g. needs an extra optional out parameter)
The same debug messages & remarks are generated when scalable vectorization is enabled.
Maybe a bit less indirection/complex function calls.

In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

I updated D96997 a bit to better sketch out how it could integrate with computing scalable max VFs. It's still not too polished, it should mainly illustrate the overall structure and there's definitely lots of small things that could be improved further.

Another thing I am not entirely sure about is the reduction checks. For now it seems like for AArch64 the VF is not really used, so in the patch it just passes any scalable VF. Not sure if that might change in the future, but I think I saw one of the comments mentioning that we might want to remove VFs after computing the max VF anyways.

Hi @fhahn,

[..]
In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

Yes, thanks for this, that was helpful! My original reason for splitting the patch series out like this was to make some trivial NFC "move code around" changes that were relatively simple to review, before making functional changes for scalable vectors. After seeing more clearly what you mean, I agree with you that the code becomes a bit more readable by being a bit more rigorous in the refactoring, and it also simplified the follow-up patches a bit.

Long story short, I have updated my patch series to align it more with the approach that you outlined in D96997. My new patch is in D98509 and I'll abandon this series.

Another thing I am not entirely sure about is the reduction checks. For now it seems like for AArch64 the VF is not really used, so in the patch it just passes any scalable VF. Not sure if that might change in the future, but I think I saw one of the comments mentioning that we might want to remove VFs after computing the max VF anyways.

I initially thought we could discard individual VF candidates that don't match the "can this vectorize" criteria. This may actually be too fine grained because of how VFRange is used to specify a range of VFs, making it difficult to discard individual VFs as candidates. As you say, for SVE it doesn't look at the specific VF anyway. For now, I've added a FIXME around it, indicating this will change in the future.

With that in mind, the UserVF handling should be really simple: we just have to check if it is within the upper bound. For scalable vectors, we also have the fallback. But this may just be temporary, because once we also pick a max scalable VF, we can instead proceed with ignoring the UserVF, same as we do for fixed vectors.

Yes, I have actually implemented that behaviour in the new patch. If a scalable UserVF is not valid, it should not fall back on a less-wide scalable VF, or a VF with the same minimum number of lanes as the UserVF and drop the 'scalable' flag, as it does now. It should instead ignore the UserVF, and do what it would otherwise do: pick a suitable VF that is most cost-effective.
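As a rough illustration of that behaviour (not the code in D98509): an invalid scalable UserVF is neither shrunk nor converted to a fixed-width VF, but ignored, after which the usual cost-based choice is made. The types `VF` and `Candidate`, the function `selectVF`, and the per-lane cost numbers are all hypothetical; fixed-width UserVFs are not modelled here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified VF: minimum lane count plus a "scalable" flag.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Hypothetical candidate with a made-up cost; lower is better.
struct Candidate {
  VF Factor;
  double CostPerLane;
};

// UserScalableLanes == 0 means no scalable UserVF was given.
// MaxScalableLanes is the legal upper bound for scalable VFs.
VF selectVF(unsigned UserScalableLanes, unsigned MaxScalableLanes,
            const std::vector<Candidate> &Candidates) {
  assert(!Candidates.empty() && "need at least one candidate VF");

  // A valid scalable UserVF is used as-is.
  if (UserScalableLanes != 0 && UserScalableLanes <= MaxScalableLanes)
    return {UserScalableLanes, true};

  // An invalid UserVF is ignored: no narrower scalable VF and no dropping
  // of the 'scalable' flag. Pick the most cost-effective candidate instead.
  const Candidate *Best = &Candidates.front();
  for (const Candidate &C : Candidates)
    if (C.CostPerLane < Best->CostPerLane)
      Best = &C;
  return Best->Factor;
}
```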

Thanks again!

Abandoned in favour of D98509

sdesmalen abandoned this revision.Mar 12 2021, 7:36 AM

sdesmalen mentioned this in D98509: [LV] Calculate max feasible scalable VF..Mar 12 2021, 7:39 AM

In D96021#2622221, @sdesmalen wrote:

Hi @fhahn,

[..]
In the end, the code is not vastly different and I think there are no absolute answers here. I hope the above explains the direction from which I am looking at this a bit better than I did initially.

Yes, thanks for this, that was helpful! My original reason for splitting the patch series out like this was to make some trivial NFC "move code around" changes that were relatively simple to review, before making functional changes for scalable vectors. After seeing more clearly what you mean, I agree with you that the code becomes a bit more readable by being a bit more rigorous in the refactoring, and it also simplified the follow-up patches a bit.

Long story short, I have updated my patch series to align it more with the approach that you outlined in D96997. My new patch is in D98509 and I'll abandon this series.

Thanks! I would have been happy to update my patch, but it's great you put up an updated patch. I'll take a closer look early in the coming week!

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	201 lines

Diff 327893

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

(1,599 lines not shown)	void invalidateCostModelingDecisions() {
WideningDecisions.clear();		WideningDecisions.clear();
Uniforms.clear();		Uniforms.clear();
Scalars.clear();		Scalars.clear();
}		}

private:		private:
unsigned NumPredStores = 0;		unsigned NumPredStores = 0;

		/// \return UserVF directly if it is valid. Otherwise clamp UserVF to the
		/// largest valid value.
		Optional<ElementCount> getFeasibleUserVF(ElementCount UserVF,
		unsigned MaxSafeElements);

/// \return An upper bound for the vectorization factor, a power-of-2 larger		/// \return An upper bound for the vectorization factor, a power-of-2 larger
/// than zero. One is returned if vectorization should best be avoided due		/// than zero. One is returned if vectorization should best be avoided due
/// to cost.		/// to cost.
ElementCount computeFeasibleMaxVF(unsigned ConstTripCount,		ElementCount computeFeasibleMaxVF(unsigned ConstTripCount,
ElementCount UserVF);		unsigned SmallestType, unsigned WidestType);

/// The vectorization cost is a combination of the cost itself and a boolean		/// The vectorization cost is a combination of the cost itself and a boolean
/// indicating whether any of the contributing operations will actually		/// indicating whether any of the contributing operations will actually
/// operate on		/// operate on
/// vector values after type legalization in the backend. If this latter value		/// vector values after type legalization in the backend. If this latter value
/// is		/// is
/// false, then all operations will be scalarized (i.e. no vectorization has		/// false, then all operations will be scalarized (i.e. no vectorization has
/// actually taken place).		/// actually taken place).
(3,923 lines not shown)	LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');		LLVM_DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
if (TC == 1) {		if (TC == 1) {
reportVectorizationFailure("Single iteration (non) loop",		reportVectorizationFailure("Single iteration (non) loop",
"loop trip count is one, irrelevant for vectorization",		"loop trip count is one, irrelevant for vectorization",
"SingleIterationLoop", ORE, TheLoop);		"SingleIterationLoop", ORE, TheLoop);
return None;		return None;
}		}

		auto GetFeasibleMaxVF = [&]() -> ElementCount {
		MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
		unsigned SmallestType, WidestType;
		std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();

		// Get the maximum safe dependence distance in bits computed by LAA.
		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
		// the memory accesses that is most restrictive (involved in the smallest
		// dependence distance).
		unsigned MaxSafeElements =
		PowerOf2Floor(Legal->getMaxSafeVectorWidthInBits() / WidestType);

		// First analyze the UserVF, fall back if the UserVF should be ignored.
		if (auto MaybeMaxVF = getFeasibleUserVF(UserVF, MaxSafeElements))
		return MaybeMaxVF.getValue();
		return computeFeasibleMaxVF(TC, SmallestType, WidestType);
		};

switch (ScalarEpilogueStatus) {		switch (ScalarEpilogueStatus) {
case CM_ScalarEpilogueAllowed:		case CM_ScalarEpilogueAllowed:
return computeFeasibleMaxVF(TC, UserVF);		return GetFeasibleMaxVF();
case CM_ScalarEpilogueNotAllowedUsePredicate:		case CM_ScalarEpilogueNotAllowedUsePredicate:
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case CM_ScalarEpilogueNotNeededUsePredicate:		case CM_ScalarEpilogueNotNeededUsePredicate:
LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: vector predicate hint/switch found.\n"		dbgs() << "LV: vector predicate hint/switch found.\n"
<< "LV: Not allowing scalar epilogue, creating predicated "		<< "LV: Not allowing scalar epilogue, creating predicated "
<< "vector loop.\n");		<< "vector loop.\n");
break;		break;
(21 lines not shown)	LoopVectorizationCostModel::computeMaxVF(ElementCount UserVF, unsigned UserIC) {
// require a lane mask which varies through the vector loop body. (TODO)		// require a lane mask which varies through the vector loop body. (TODO)
if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {		if (TheLoop->getExitingBlock() != TheLoop->getLoopLatch()) {
// If there was a tail-folding hint/switch, but we can't fold the tail by		// If there was a tail-folding hint/switch, but we can't fold the tail by
// masking, fallback to a vectorization with a scalar epilogue.		// masking, fallback to a vectorization with a scalar epilogue.
if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {		if (ScalarEpilogueStatus == CM_ScalarEpilogueNotNeededUsePredicate) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "		LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking: vectorize with a "
"scalar epilogue instead.\n");		"scalar epilogue instead.\n");
ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;		ScalarEpilogueStatus = CM_ScalarEpilogueAllowed;
return computeFeasibleMaxVF(TC, UserVF);		return GetFeasibleMaxVF();
}		}
return None;		return None;
}		}

// Now try the tail folding		// Now try the tail folding

// Invalidate interleave groups that require an epilogue if we can't mask		// Invalidate interleave groups that require an epilogue if we can't mask
// the interleave-group.		// the interleave-group.
if (!useMaskedInterleavedAccesses(TTI)) {		if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
// Note: There is no need to invalidate any cost modeling decisions here, as		// Note: There is no need to invalidate any cost modeling decisions here, as
// non where taken so far.		// non where taken so far.
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();		InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
}		}

ElementCount MaxVF = computeFeasibleMaxVF(TC, UserVF);		ElementCount MaxVF = GetFeasibleMaxVF();
assert(!MaxVF.isScalable() &&		assert(!MaxVF.isScalable() &&
"Scalable vectors do not yet support tail folding");		"Scalable vectors do not yet support tail folding");
assert((UserVF.isNonZero() || isPowerOf2_32(MaxVF.getFixedValue())) &&		assert((UserVF.isNonZero() || isPowerOf2_32(MaxVF.getFixedValue())) &&
"MaxVF must be a power of 2");		"MaxVF must be a power of 2");
unsigned MaxVFtimesIC =		unsigned MaxVFtimesIC =
UserIC ? MaxVF.getFixedValue() * UserIC : MaxVF.getFixedValue();		UserIC ? MaxVF.getFixedValue() * UserIC : MaxVF.getFixedValue();
// Avoid tail folding if the trip count is known to be a multiple of any VF we		// Avoid tail folding if the trip count is known to be a multiple of any VF we
// chose.		// chose.
(45 lines not shown)	reportVectorizationFailure(
"Cannot optimize for size and vectorize at the same time.",		"Cannot optimize for size and vectorize at the same time.",
"cannot optimize for size and vectorize at the same time. "		"cannot optimize for size and vectorize at the same time. "
"Enable vectorization of this loop with '#pragma clang loop "		"Enable vectorization of this loop with '#pragma clang loop "
"vectorize(enable)' when compiling with -Os/-Oz",		"vectorize(enable)' when compiling with -Os/-Oz",
"NoTailLoopWithOptForSize", ORE, TheLoop);		"NoTailLoopWithOptForSize", ORE, TheLoop);
return None;		return None;
}		}

ElementCount		Optional<ElementCount>
LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,		LoopVectorizationCostModel::getFeasibleUserVF(ElementCount UserVF,
ElementCount UserVF) {		unsigned MaxSafeElements) {
bool IgnoreScalableUserVF = UserVF.isScalable() &&		if (UserVF.isZero())
!TTI.supportsScalableVectors() &&		return None;
!ForceTargetSupportsScalableVectors;
if (IgnoreScalableUserVF) {		if (UserVF.isScalable() && !TTI.supportsScalableVectors() &&
LLVM_DEBUG(		!ForceTargetSupportsScalableVectors) {
dbgs() << "LV: Ignoring VF=" << UserVF		OptimizationRemarkAnalysis R(DEBUG_TYPE, "IgnoreScalableUserVF",
<< " because target does not support scalable vectors.\n");		TheLoop->getStartLoc(), TheLoop->getHeader());
ORE->emit([&]() {		R << "Ignoring VF=" << ore::NV("UserVF", UserVF)
return OptimizationRemarkAnalysis(DEBUG_TYPE, "IgnoreScalableUserVF",
TheLoop->getStartLoc(),
TheLoop->getHeader())
<< "Ignoring VF=" << ore::NV("UserVF", UserVF)
<< " because target does not support scalable vectors.";		<< " because target does not support scalable vectors.";
});		LLVM_DEBUG(dbgs() << "LV: " << R.getMsg() << "\n");
		ORE->emit(R);
		return None;
}		}

// Beyond this point two scenarios are handled. If UserVF isn't specified
// then a suitable VF is chosen. If UserVF is specified and there are
// dependencies, check if it's legal. However, if a UserVF is specified and
// there are no dependencies, then there's nothing to do.
if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
if (!canVectorizeReductions(UserVF)) {		if (!canVectorizeReductions(UserVF)) {
reportVectorizationFailure(		reportVectorizationFailure(
"LV: Scalable vectorization not supported for the reduction "		"LV: Scalable vectorization not supported for the reduction "
"operations found in this loop. Using fixed-width "		"operations found in this loop. Using fixed-width "
"vectorization instead.",		"vectorization instead.",
"Scalable vectorization not supported for the reduction operations "		"Scalable vectorization not supported for the reduction operations "
"found in this loop. Using fixed-width vectorization instead.",		"found in this loop. Using fixed-width vectorization instead.",
"ScalableVFUnfeasible", ORE, TheLoop);		"ScalableVFUnfeasible", ORE, TheLoop);
return computeFeasibleMaxVF(		// FIXME: The UserVF should actually be ignored in this case.
ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));		UserVF = ElementCount::getFixed(UserVF.getKnownMinValue());
}		}

		// If UserVF is specified and there are no dependencies, no need to check
		// if the UserVF is legal.
if (Legal->isSafeForAnyVectorWidth())		if (Legal->isSafeForAnyVectorWidth())
return UserVF;		return UserVF;
}

MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
unsigned SmallestType, WidestType;
std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
unsigned WidestRegister = TTI.getRegisterBitWidth(true);

// Get the maximum safe dependence distance in bits computed by LAA.
// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
// the memory accesses that is most restrictive (involved in the smallest
// dependence distance).
unsigned MaxSafeVectorWidthInBits = Legal->getMaxSafeVectorWidthInBits();

// If the user vectorization factor is legally unsafe, clamp it to a safe
// value. Otherwise, return as is.
if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
unsigned MaxSafeElements =
PowerOf2Floor(MaxSafeVectorWidthInBits / WidestType);
ElementCount MaxSafeVF = ElementCount::getFixed(MaxSafeElements);		ElementCount MaxSafeVF = ElementCount::getFixed(MaxSafeElements);

if (UserVF.isScalable()) {		if (UserVF.isScalable()) {
Optional<unsigned> MaxVScale = TTI.getMaxVScale();		Optional<unsigned> MaxVScale = TTI.getMaxVScale();

// Scale VF by vscale before checking if it's safe.		// Scale VF by vscale before checking if it's safe.
MaxSafeVF = ElementCount::getScalable(		MaxSafeVF = ElementCount::getScalable(
MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);		MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);

if (MaxSafeVF.isZero()) {		if (MaxSafeVF.isZero()) {
// The dependence distance is too small to use scalable vectors,		// The dependence distance is too small to use scalable vectors,
// fallback on fixed.		// fallback on fixed.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "LV: Max legal vector width too small, scalable vectorization "		<< "LV: Max legal vector width too small, scalable vectorization "
"unfeasible. Using fixed-width vectorization instead.\n");		"unfeasible. Using fixed-width vectorization instead.\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",		return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",
TheLoop->getStartLoc(),		TheLoop->getStartLoc(),
TheLoop->getHeader())		TheLoop->getHeader())
<< "Max legal vector width too small, scalable vectorization "		<< "Max legal vector width too small, scalable vectorization "
<< "unfeasible. Using fixed-width vectorization instead.";		<< "unfeasible. Using fixed-width vectorization instead.";
});		});
return computeFeasibleMaxVF(		return getFeasibleUserVF(
ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));		ElementCount::getFixed(UserVF.getKnownMinValue()), MaxSafeElements);
}		}
}		}

LLVM_DEBUG(dbgs() << "LV: The max safe VF is: " << MaxSafeVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: The max safe VF is: " << MaxSafeVF << ".\n");

if (ElementCount::isKnownLE(UserVF, MaxSafeVF))		if (ElementCount::isKnownLE(UserVF, MaxSafeVF))
return UserVF;		return UserVF;

LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF		LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
<< " is unsafe, clamping to max safe VF=" << MaxSafeVF		<< " is unsafe, clamping to max safe VF=" << MaxSafeVF
<< ".\n");		<< ".\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",		return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",
TheLoop->getStartLoc(),		TheLoop->getStartLoc(),
TheLoop->getHeader())		TheLoop->getHeader())
<< "User-specified vectorization factor "		<< "User-specified vectorization factor "
<< ore::NV("UserVectorizationFactor", UserVF)		<< ore::NV("UserVectorizationFactor", UserVF)
<< " is unsafe, clamping to maximum safe vectorization factor "		<< " is unsafe, clamping to maximum safe vectorization factor "
<< ore::NV("VectorizationFactor", MaxSafeVF);		<< ore::NV("VectorizationFactor", MaxSafeVF);
});		});
return MaxSafeVF;		return MaxSafeVF;
}		}
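
For readers following the clamping logic in `getFeasibleUserVF()` above, here is a minimal standalone sketch of the same arithmetic outside LLVM. It is not the patch's code: `SketchVF` and `clampUserVF` are hypothetical names standing in for `ElementCount` and the clamping part of the function, `MaxVScale` stands in for the result of `TTI.getMaxVScale()`, and the remark/debug output is omitted. The extra guard against a zero `MaxVScale` is an assumption added here and is not in the diff.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ElementCount: a minimum element count plus
// a flag saying whether that count is multiplied by the runtime vscale.
struct SketchVF {
  unsigned MinVal;
  bool Scalable;
};

// Clamp a non-zero user VF to the largest VF that respects MaxSafeElements.
// std::nullopt for MaxVScale means the target gives no upper bound on vscale.
SketchVF clampUserVF(SketchVF UserVF, unsigned MaxSafeElements,
                     std::optional<unsigned> MaxVScale) {
  assert(UserVF.MinVal != 0 && "a zero UserVF is handled before clamping");

  SketchVF MaxSafeVF{MaxSafeElements, false};
  if (UserVF.Scalable) {
    // Divide by the largest possible vscale so the bound holds for every
    // runtime vector length. An unknown (or, as an added assumption, zero)
    // vscale bound yields a zero scalable VF.
    unsigned Min = (MaxVScale && *MaxVScale != 0)
                       ? MaxSafeElements / *MaxVScale
                       : 0;
    MaxSafeVF = {Min, true};
    if (Min == 0)
      // Dependence distance too small for scalable vectors: retry as fixed.
      return clampUserVF({UserVF.MinVal, false}, MaxSafeElements, MaxVScale);
  }

  // Both values now have the same scalability, so comparing the minimum
  // element counts is enough.
  return UserVF.MinVal <= MaxSafeVF.MinVal ? UserVF : MaxSafeVF;
}
```

Under these assumptions, a scalable request of `vscale x 4` with 32 safe elements and a maximum vscale of 16 is clamped to `vscale x 2`, while the same request with only 8 safe elements falls back to a fixed VF of 4.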

		ElementCount LoopVectorizationCostModel::computeFeasibleMaxVF(
		unsigned ConstTripCount, unsigned SmallestType, unsigned WidestType) {
		// Get the maximum safe dependence distance in bits computed by LAA.
		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
		// the memory accesses that is most restrictive (involved in the smallest
		// dependence distance).
		unsigned MaxSafeVectorWidthInBits = Legal->getMaxSafeVectorWidthInBits();

		unsigned WidestRegister = TTI.getRegisterBitWidth(true);
WidestRegister = std::min(WidestRegister, MaxSafeVectorWidthInBits);		WidestRegister = std::min(WidestRegister, MaxSafeVectorWidthInBits);
		c-rhodes: can this be moved above `MaxSafeVectorWidthInBits`?

// Ensure MaxVF is a power of 2; the dependence distance bound may not be.		// Ensure MaxVF is a power of 2; the dependence distance bound may not be.
// Note that both WidestRegister and WidestType may not be a powers of 2.		// Note that both WidestRegister and WidestType may not be a powers of 2.
auto MaxVectorSize =		auto MaxVectorSize =
ElementCount::getFixed(PowerOf2Floor(WidestRegister / WidestType));		ElementCount::getFixed(PowerOf2Floor(WidestRegister / WidestType));

LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType		LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType
<< " / " << WidestType << " bits.\n");		<< " / " << WidestType << " bits.\n");
LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "		LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "
<< WidestRegister << " bits.\n");		<< WidestRegister << " bits.\n");

assert(MaxVectorSize.getFixedValue() <= WidestRegister &&		assert(MaxVectorSize.getFixedValue() <= WidestRegister &&
"Did not expect to pack so many elements"		"Did not expect to pack so many elements"
" into one vector!");		" into one vector!");
		c-rhodes: it's nice how you've reused the remark for the debug message above, not really important for this patch but it would be good to do the same thing here
		sdesmalen (Author): Thanks. I'm happy to see if I can do that in a separate patch (it requires some changes to the tests).
		c-rhodes: > Thanks. I'm happy to see if I can do that in a separate patch (it requires some changes to the tests).
		I agree that would be better in a separate patch considering the changes to tests required.
if (MaxVectorSize.getFixedValue() == 0) {		if (MaxVectorSize.getFixedValue() == 0) {
LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");		LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");
return ElementCount::getFixed(1);		return ElementCount::getFixed(1);
} else if (ConstTripCount && ConstTripCount < MaxVectorSize.getFixedValue() &&		} else if (ConstTripCount && ConstTripCount < MaxVectorSize.getFixedValue() &&
isPowerOf2_32(ConstTripCount)) {		isPowerOf2_32(ConstTripCount)) {
// We need to clamp the VF to be the ConstTripCount. There is no point in		// We need to clamp the VF to be the ConstTripCount. There is no point in
// choosing a higher viable VF as done in the loop below.		// choosing a higher viable VF as done in the loop below.
LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to the constant trip count: "		LLVM_DEBUG(dbgs() << "LV: Clamping the MaxVF to the constant trip count: "