This is an archive of the discontinued LLVM Phabricator instance.

Differential D91718

[LV] Legalize scalable VF hints
ClosedPublic

Authored by c-rhodes on Nov 18 2020, 8:30 AM.

Details

Reviewers

sdesmalen
efriedma
fhahn
dmgreen
Meinersbur
ctetreau
craig.topper
david-arm
• zinob

Commits

rG1e7efd397ac2: [LV] Legalize scalable VF hints

Summary

In the following loop:

void foo(int *a, int *b, int N) {
  for (int i=0; i<N; ++i)
    a[i + 4] = a[i] + b[i];
}

The loop dependence constrains the VF to a maximum of (4, fixed), which
would mean using <4 x i32> as the vector type in vectorization.
Extending this to scalable vectorization, a VF of (4, scalable) implies
a vector type of <vscale x 4 x i32>. To determine if this is legal
vscale must be taken into account. For this example, unless
max(vscale)=1, it's unsafe to vectorize.

For SVE, the number of bits in an SVE register is architecturally
defined to be a multiple of 128 bits with a maximum of 2048 bits, thus
the maximum vscale is 16. In the loop above it is therefore unfeasible
to vectorize with SVE. However, in this loop:

void foo(int *a, int *b, int N) {
  #pragma clang loop vectorize_width(X, scalable)
  for (int i=0; i<N; ++i)
    a[i + 32] = a[i] + b[i];
}

As long as max(vscale) multiplied by the number of lanes 'X' doesn't
exceed the dependence distance, it is safe to vectorize. For SVE a VF of
(2, scalable) is within this constraint, since a vector of <16 x 2 x 32>
will have no dependencies between lanes. For any number of lanes larger
than this it would be unsafe to vectorize.
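
The bound in the paragraph above comes down to a little arithmetic. The sketch below is illustrative only: `maxSafeScalableLanes` and `SVEMaxVScale` are hypothetical names, not part of the patch, and the dependence distance is assumed to be given in elements.

```cpp
#include <cassert>

// SVE registers are a multiple of 128 bits, up to 2048 bits, so the largest
// possible vscale is 2048 / 128.
constexpr unsigned SVEMaxVScale = 2048 / 128;

// Hypothetical helper (not from the patch): the largest lane count X such
// that a VF of (X, scalable) stays within a dependence distance given in
// elements, i.e. the largest X with MaxVScale * X <= DepDistance.
// Returns 0 when no scalable VF is feasible.
unsigned maxSafeScalableLanes(unsigned DepDistance, unsigned MaxVScale) {
  assert(MaxVScale > 0 && "max(vscale) must be at least 1");
  return DepDistance / MaxVScale;
}
```

With SVE's max(vscale) of 16, `maxSafeScalableLanes(32, SVEMaxVScale)` yields 2, which matches the second loop, while `maxSafeScalableLanes(4, SVEMaxVScale)` yields 0, so the first loop has no feasible scalable VF.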

This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs
specified as loop hints, implementing the following behaviour:

If the backend does not support scalable vectors, ignore the hint.
If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF.
Accept scalable VFs if it's safe to do so.
Otherwise, clamp scalable VFs that exceed the maximum safe VF.
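
The four behaviours above can be sketched as a small standalone function. This is a simplified model under stated assumptions, not the code in 'computeFeasibleMaxVF': `VF` is a hypothetical stand-in for `ElementCount`, `legalizeScalableHint` is an invented name, `MaxSafeElements` is assumed to be at least 1, and the shortcut for loops with no dependencies is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-in for llvm::ElementCount, for illustration only.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

// Sketch of the hint legalization. MaxSafeElements is the number of elements
// the loop dependence allows to be processed at once. Returns std::nullopt
// when the hint is ignored, leaving the vectorizer to pick a VF itself.
std::optional<VF> legalizeScalableHint(VF Hint, bool TargetSupportsScalable,
                                       std::optional<unsigned> MaxVScale,
                                       unsigned MaxSafeElements) {
  assert(Hint.Scalable && Hint.MinLanes > 0 && "expects a scalable hint");
  assert((!MaxVScale || *MaxVScale > 0) && "max(vscale) must be positive");

  // Backend without scalable vector support: ignore the hint.
  if (!TargetSupportsScalable)
    return std::nullopt;

  // Unknown max(vscale), or not even one lane per vscale step fits within
  // the dependence: scalable vectorization is unfeasible, so use a fixed VF
  // clamped to what the dependence allows.
  unsigned MaxScalableLanes = MaxVScale ? MaxSafeElements / *MaxVScale : 0;
  if (MaxScalableLanes == 0)
    return VF{std::min(Hint.MinLanes, MaxSafeElements), false};

  // Accept the hint when it is safe, otherwise clamp it.
  return VF{std::min(Hint.MinLanes, MaxScalableLanes), true};
}
```

For the second loop in the summary (32 safe elements, max vscale 16), a hint of (2, scalable) is accepted unchanged and a hint of (8, scalable) is clamped to (2, scalable). For the first loop (4 safe elements), a hint of (4, scalable) becomes (4, fixed).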

Event Timeline

c-rhodes created this revision.Nov 18 2020, 8:30 AM

Herald added a project: Restricted Project.Nov 18 2020, 8:30 AM

Herald added a subscriber: hiraditya.

c-rhodes requested review of this revision.Nov 18 2020, 8:30 AM

Harbormaster completed remote builds in B79306: Diff 306119.Nov 18 2020, 8:31 AM

c-rhodes added parent revisions: D90687: [LV] Clamp VF hint when unsafe, D91077: [LoopVectorizer][SVE] Vectorize a simple loop with with a scalable VF..Nov 18 2020, 8:32 AM

sdesmalen added inline comments.Nov 24 2020, 9:41 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5597	nit: `MaxSafeElements`?
5599	Instead of returning an Optional<ElementCount>, I'd prefer the code here to just clamp to MaxFixedVF, in this order:
First, try to see if it can use the requested scalable VF.
If not, then try to see if it can use the requested VF with vscale = 1 (i.e. fixed width).
If not, then try if it can use a clamped fixed VF.

fhahn added inline comments.Nov 24 2020, 1:39 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594	This check seems a bit odd. I think we should at least use a named constant to make things clearer and ensure that LAA & LV are kept in sync on the meaning. But would it be possible to instead compute the maximum width of UserVF using MaxVScale (something like `UserVF * WidestType * MaxVScale`)?

c-rhodes added inline comments.Nov 25 2020, 7:41 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594	This check seems a bit odd. I think we should at least use a named constant to make things clearer and ensure that LAA & LV are kept in sync on the meaning. But would it be possible to instead compute the maximum width of UserVF using MaxVScale (something like `UserVF * WidestType * MaxVScale`)?
I don't see how this is related. The reason I added this check is that this block validates a user VF and clamps it when it exceeds what is safe in terms of dependencies, but if there are no dependencies there's nothing to do here. I'm not sure if there's a better way of querying if there are no dependencies? `MaxSafeRegisterWidth` is initialized as `-1U` in LAA so I compared it against `UINT_MAX`, maybe it would be a little clearer if `MaxSafeRegisterWidth` was initialized as `UINT_MAX`. It would probably have made sense to introduce this in D90687 although I didn't consider it; it became obvious with this patch since large max safe VFs for loops with no dependencies were being emitted.
5597	nit: `MaxSafeElements`?
Cheers that makes more sense, will update it
5599	Instead of returning an Optional<ElementCount>, I'd prefer the code here to just clamp to MaxFixedVF, in this order: first try to see if it can use the requested scalable VF. if not, then try to see if it can use the requested VF with vscale = 1 (i.e. fixed width) if not, then try if it can use a clamped fixed VF.
Deferring to fixed-width so Optional isn't necessary is a good point, the first 2 cases should already be handled, I'll add the latter

Changes:

s/FixedVF/MaxSafeElements/
Rather than bail out, clamp to fixed VF if scalable is unfeasible. Remove change to return Optional<ElementCount> from LoopVectorizationCostModel::computeFeasibleMaxVF.
In LoopVectorizationPlanner::plan, if a UserVF is specified but isn't legal then use MaxVF (earlier).
Simplified the IR in the tests.

c-rhodes marked an inline comment as done.Nov 27 2020, 8:25 AM

sdesmalen added inline comments.Nov 30 2020, 6:57 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5612	This does something different than what I suggested though. For the following example, your code would set the MaxSafeVF to 32 elements instead of using a VF of 4 elements (fixed):
#pragma clang loop vectorize_width(4, scalable)
for (int i=0; i<N; ++i)
  a[i + 32] = a[i] + 1; // for some int* a
I think a VF of 4 would be preferred, because otherwise SelectionDAG will need to expand `<32 x i32>` to 8 x `<4 x i32>` fixed-width vectors (assuming 128bit vectors), which would be somewhat similar to requesting an interleave-count of 8.
llvm/test/Transforms/LoopVectorize/scalable-vf-hint.ll
21	At least until we add lowering of scalable vectors for targets that don't support it, I would rather the code explicitly checks if the target can handle them and ignore the hint if it doesn't (a similar suggestion was made on D88962).

sdesmalen mentioned this in D88962: [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute.Dec 2 2020, 1:54 AM

c-rhodes mentioned this in D93060: [TTI] Add supportsScalableVectors target hook.Dec 10 2020, 12:51 PM

c-rhodes added a parent revision: D93060: [TTI] Add supportsScalableVectors target hook.

Changes:

Rebase since D88962 and D91077 have now landed.
Address @sdesmalen's comments. Added a target hook supportsScalableVectors in D93060 which is used in this patch to bail out of vectorization if unsupported. Also changed the behavior when scalable vectorization is unfeasible given a dependency, it now changes the hint to fixed, where existing behavior applies (i.e. clamping).
Updated tests.

c-rhodes marked 2 inline comments as done.Dec 10 2020, 1:02 PM

c-rhodes added a parent revision: D93063: [LV] Disable epilogue vectorization for scalable VFs.Dec 10 2020, 1:06 PM

sdesmalen added inline comments.Dec 11 2020, 3:05 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5572	MaxSafeVectorWidthInBits can't exceed UINT_MAX. If this early bail-out isn't strictly necessary, I'd suggest removing it.
5596–5602	Instead of returning 'None' (and thus not allowing any vectorization), isn't it better to ignore the UserVF and continue by allowing a VF to be chosen as normal? i.e.
bool IgnoreUserVF = UserVF.isScalable() && !TTI.supportsScalableVectors();
if (IgnoreUserVF)
  ORE->emit([&]() { return OptimizationRemark(....); });
if (UserVF.isNonZero() && !IgnoreUserVF) {
  ...
}
// otherwise, let the LoopVectorizer choose a VF itself.
I guess that can be split out into D93060 as well.
5621	Should this be an OptRemark instead?
5626	Can it know that `ElementCount::getFixed(UserVF.getKnownMinValue())` would be a valid VF? (that check is below on line 5614). Probably good to have a test for that as well.
7464	We still want to clamp to a smaller scalable VF if `UserVF.isScalable() && MaxVF.isScalable() && UserVF > MaxVF`. Because the loop below doesn't support scalable VFs, it is best to single that case out. That means:
1. UserVF = Scalable && MaxVF = Scalable && UserVF > MaxVF => Pick MaxVF
2. UserVF = Scalable && MaxVF = Fixed => Fall through into loop below.
3. UserVF = Fixed && MaxVF = Fixed && UserVF > MaxVF => Fall through into loop below.
Where the clamping in 1. is temporary until the loop below works on scalable vectors, so that it can choose the most profitable VF based on the cost-model.
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
2	Why does it require asserts?
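
The three UserVF/MaxVF cases described in the comment on line 7464 can be modelled in isolation. This is only a sketch: `VF`, `Choice` and `chooseVF` are hypothetical names rather than code from LoopVectorize.cpp, and it assumes that a scalable MaxVF only ever results from a scalable UserVF.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount, for illustration only.
struct VF {
  unsigned MinLanes;
  bool Scalable;
};

enum class Choice { UseUserVF, UseMaxVF, RunCostModel };

// Decide what to do with a user-provided VF once the maximum safe VF is
// known. "RunCostModel" stands for falling through into the loop that picks
// a VF by cost, which at this point only handles fixed-width VFs.
Choice chooseVF(VF UserVF, VF MaxVF) {
  assert(UserVF.MinLanes > 0 && MaxVF.MinLanes > 0);
  // Assumption of this sketch: MaxVF is only scalable for a scalable hint.
  assert(!MaxVF.Scalable || UserVF.Scalable);

  bool SameKind = UserVF.Scalable == MaxVF.Scalable;
  if (SameKind && UserVF.MinLanes <= MaxVF.MinLanes)
    return Choice::UseUserVF; // the hint is legal as requested
  if (UserVF.Scalable && MaxVF.Scalable)
    return Choice::UseMaxVF; // case 1: clamp to the scalable MaxVF
  return Choice::RunCostModel; // cases 2 and 3: fall through to the loop
}
```

Keeping case 1 separate reflects the temporary nature of the clamp: once the cost-model loop handles scalable VFs, that branch could fall through like the other two.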

fhahn added inline comments.Dec 11 2020, 3:32 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594	My point was that the original code handled the case where there are no dependencies naturally without an extra check, right? Is there a reason the logic for scalable vectors cannot do the same?
5596–5602	Agreed, if the user requests scalable vectorization through a hint, we probably should just drop the hint and proceed with normal cost-modeling. (If there's a reason to keep the bail-out, can we just return a VF of 1? This is how other places in the function indicate that vectorization should be avoided.)

sdesmalen added inline comments.Dec 11 2020, 3:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5626	It seems I misread this line/statement, so please ignore this comment.

c-rhodes added inline comments.Dec 11 2020, 6:23 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5572	MaxSafeVectorWidthInBits can't exceed UINT_MAX. If this early bail-out isn't strictly necessary, I'd suggest removing it.
There are a couple of existing tests:
Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll
Transforms/LoopVectorize/metadata-width.ll
that will fall over if the hint is ignored when scalable vectorization isn't supported. I suppose those could become target-specific tests.
5596–5602	Instead of returning 'None' (and thus not allowing any vectorization), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal? ... I guess that can be split out into D93060 as well.
Ok np, I'll split that out then.
5621	Should this be an OptRemark instead?
Good point, I'll add that
7464	Ok, I'll add a temporary workaround for scalable VFs to select MaxVF above if UserVF is unsafe.
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
2	Why does it require asserts?
Because of the debug flags

c-rhodes added inline comments.Dec 11 2020, 6:54 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594	My point was that the original code handled the case where there are no dependencies naturally without an extra check, right? Is there a reason the logic for scalable vectors cannot do the same?
Sorry I missed your comment. One issue is for non-target specific tests with scalable VFs such as:
Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll
Transforms/LoopVectorize/metadata-width.ll
The hint will be ignored since scalable vectorization isn't supported, and also `getMaxVScale` now returns None. I suppose there are a couple of options: we could add new loop vectorizer flags, or those tests could become target-specific. I'm not sure what the best option is.

sdesmalen added inline comments.Dec 11 2020, 8:16 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5572	If the test has a loop-carried distance that may invalidate auto-vectorization that needs checking, I think it's fair enough for the test to be moved to be a target-specific test (because that will require `getMaxVScale`). For a test like metadata-width.ll it may be useful to add some `-force-target-supports-scalable-vectors=true` flag to avoid making all scalable-vector tests for the LoopVectorizer target-specific. @fhahn do you have any strong feelings about this?

Rather than disable vectorization if scalable VF isn't supported, let the LV pick a VF as it normally does and proceed with cost-modelling.
Added a new LV flag -force-target-supports-scalable-vectors to support writing non-target specific tests with scalable VFs.
Added a temporary workaround to use MaxVF in place of UserVF until cost modelling loop works for scalable VFs.
Emit opt remark.

c-rhodes marked 3 inline comments as done.Dec 14 2020, 7:38 AM

c-rhodes added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5596–5602	Instead of returning 'None' (and thus not allowing any vectorization), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal? ... I guess that can be split out into D93060 as well.
Ok np, I'll split that out then.
Done, although I've kept it as part of this patch. Ignoring the hint if scalable vectors aren't supported breaks existing tests, so I've implemented the flag you suggested, `-force-target-supports-scalable-vectors`, which can be used for non-target specific tests. This only works for loops with no dependencies since `getMaxVScale` is None by default, so I've moved `scalable-loop-unpredicated-body-scalar-tail.ll` to AArch64 and enabled SVE.

david-arm added a subscriber: david-arm.Dec 15 2020, 6:40 AM

david-arm added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7445	Is this additional check necessary here? It looks like MaxVF came from computeMaxVF, which makes use of your clamping code above. If you revert this change do your tests still pass?

david-arm added inline comments.Dec 15 2020, 6:51 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7445	Ah I think I misunderstood what the clamping above is doing. It only clamps MaxVF, not UserVF. I wonder if it's worth emitting a remark for this case to let the user know something happened? It might look a bit odd if we downgrade their VF without telling them. I guess it depends upon how quick we get a fix for this.

c-rhodes added inline comments.Dec 15 2020, 6:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7445	Is this additional check necessary here?
Until the loop below supports scalable VFs, yes. With this check we always fall into this block for a scalable VF. It handles the case where the UserVF isn't legal but a scalable VF is feasible, so it clamps to the scalable MaxVF; otherwise the assert below would be triggered. This is changed from the previous revision to implement @sdesmalen's suggestion of `UserVF = Scalable && MaxVF = Scalable && UserVF > MaxVF => Pick MaxVF`, rather than change MaxVF from scalable -> fixed.

Handle scalable VF in a single isScalable block in computeFeasibleMaxVF.

c-rhodes mentioned this in rG7c8796f9db2c: [TTI] Add supportsScalableVectors target hook.Dec 18 2020, 2:59 AM

LGTM. I agree the FIXME in the code isn't ideal, but I understand why it's there and is only a temporary measure until we can fix the loop to support scalable vectors.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll
2	nit: Perhaps it's better to run llvm/utils/update_llc_test_checks.py on these tests rather than hand-write the CHECK lines? This seems to be the accepted way of doing it in LLVM these days. :)

This revision is now accepted and ready to land.Dec 18 2020, 6:46 AM

fhahn added inline comments.Dec 18 2020, 7:15 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
278	It would be good to be clear in the message that this is just for testing (same as the previous option). Perhaps something like `Pretend that scalable vectors are supported, even if the target does not support them. The flag should only be used for testing.`
5571–5591	This is just about ignoring scalable UserVFs, right? Might be good to make that clear as part of the variable name.
5572	Yes that sounds reasonable to me.
5594	Ah right, if `getMaxVScale` can return `None`, then my suggestion doesn't work as expected. In that case, I think it would be good to encapsulate the check in LAA and have something like `areAllVectorWidthsSafe` or something. This should be more robust to future changes in LAA.

Address @fhahn's comments, changes:

Clarify help message in -force-target-supports-scalable-vectors flag.
s/IgnoreUserVF/IgnoreScalableUserVF/g
Added isSafeForAnyVectorWidth to LAA that checks if the number of elements that can be operated on simultaneously is not bound. This is used in computeFeasibleMaxVF to skip the checks for whether the VF is safe given the dependence distance.
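
A toy model of the sentinel check may help show why a dedicated query is more robust than comparing against `UINT_MAX` at the call site. `DepInfo` and `maxSafeElements` below are hypothetical names used for illustration, and dividing the safe width by the widest type (in bits) to obtain an element count is an assumption of this sketch rather than a quote from the patch.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <optional>

// Toy stand-in for the dependence checker state in LAA.
struct DepInfo {
  // UINT_MAX acts as a sentinel meaning "no dependence bounds the width".
  uint64_t MaxSafeVectorWidthInBits = UINT_MAX;

  bool isSafeForAnyVectorWidth() const {
    return MaxSafeVectorWidthInBits == UINT_MAX;
  }
};

// Hypothetical helper: the number of elements that can safely be operated on
// at once, or std::nullopt when the dependence analysis found no bound.
std::optional<uint64_t> maxSafeElements(const DepInfo &D,
                                        unsigned WidestTypeBits) {
  assert(WidestTypeBits > 0 && "element type must have a size");
  if (D.isSafeForAnyVectorWidth())
    return std::nullopt; // nothing to clamp against
  return D.MaxSafeVectorWidthInBits / WidestTypeBits;
}
```

Callers then only ask whether a bound exists, so the sentinel value can change inside the analysis without touching them. For a 128-bit safe width and 32-bit elements the helper returns 4, which corresponds to the first loop in the summary.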

c-rhodes marked 3 inline comments as done.Dec 18 2020, 9:07 AM

hiraditya added a reviewer: • zinob.Dec 18 2020, 10:01 AM

sdesmalen added inline comments.Jan 4 2021, 6:42 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5600–5601	It should be fine for MaxVScale to be undefined for a target architecture. Based on the logic below where it tries to pick a suitable alternative VF, it should probably pick a different (non-scalable) VF, instead of crashing.

Changes:

Rebased. Target hook for max vscale has been removed from this patch since it landed in D93030.
Address Sander's comment w.r.t. falling back to fixed vectorization if scalable vectors are supported, but max vscale is undefined.

c-rhodes marked an inline comment as done.Jan 5 2021, 10:06 AM

LGTM, thanks. Some small comments left inline.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5571–5591	This early exit could also be on the top of the function, before we compute the register widths & co and is not specific to scalable vectors, right? Might be good to move, although the 'scalable ignored' message might also need to move.
llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll
101	`-force-vector-width` should also work, right? In that case, it would be good to use that at least for some tests. Also, could this be kept target-independent with `-force-target-supports-scalable-vectors=true`?
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
2	is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.

c-rhodes added inline comments.Jan 6 2021, 7:13 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5571–5591	This early exit could also be on the top of the function, before we compute the register widths & co and is not specific to scalable vectors, right? Might be good to move, although the 'scalable ignored' message might also need to move.
That's right, it's not specific to scalable vectors; as long as `UserVF.isNonZero()` it applies. I think you're right, if this is moved to the top `IgnoreScalableUserVF` will need to be part of the check, so:
// Nothing to do if there are no dependencies.
if (UserVF.isNonZero() && !IgnoreScalableUserVF && Legal->isSafeForAnyVectorWidth())
  return UserVF;
Otherwise scalable VFs will be accepted for targets without scalable vector support, where it currently falls into the existing code to pick a VF. It's fine with me to move this to the top with that in mind.
llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll
101	`-force-vector-width` should also work, right? In that case, it would be good to use that at least for some tests.
Yeah good idea, not sure we test the combination of that flag and the metadata, I'll change that.
Also, could this be kept target-independent with `-force-target-supports-scalable-vectors=true`?
It can for this test since there are no dependencies, so max vscale isn't required to determine if the scalable VF is valid. I'll implement your suggestion and keep this target-independent.
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
2	is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.
The final test here checks epilogue vectorization is disabled for scalable vectors, I can split that out but it'll still require this flag to be target-independent. Or it could become an AArch64 test with SVE.

Address @fhahn comments.

@fhahn thanks for reviewing, I think I've addressed all comments. I'll probably land this over the coming days unless there are any objections before then, cheers.

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
2	is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.
The final test here checks epilogue vectorization is disabled for scalable vectors, I can split that out but it'll still require this flag to be target-independent. Or it could become an AArch64 test with SVE.
I've split this into a separate test: `llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-scalable.ll`.

LGTM, thanks for your work on this patch @c-rhodes!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5616	nit: is it useful to print MaxSafeElements or MaxSafeVectorWidthInBits here?

c-rhodes added inline comments.Jan 7 2021, 7:35 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5616	nit: is it useful to print MaxSafeElements or MaxSafeVectorWidthInBits here?
`MaxSafeElements` is undefined here but LAA emits the max VF anyway which should be sufficient?

Closed by commit rG1e7efd397ac2: [LV] Legalize scalable VF hints (authored by c-rhodes).Jan 8 2021, 3:03 AM

This revision was automatically updated to reflect the committed changes.

c-rhodes added a commit: rG1e7efd397ac2: [LV] Legalize scalable VF hints.

c-rhodes mentioned this in D90687: [LV] Clamp VF hint when unsafe.Jan 16 2021, 9:07 AM

Revision Contents

Path                                                                                        Size
llvm/include/llvm/Analysis/LoopAccessAnalysis.h                                             6 lines
llvm/include/llvm/Analysis/TargetTransformInfo.h                                            8 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h                                        2 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h                                                    2 lines
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h                          4 lines
llvm/lib/Analysis/TargetTransformInfo.cpp                                                   4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h                                        6 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp                                             90 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll   101 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll                              333 lines
llvm/test/Transforms/LoopVectorize/metadata-width.ll                                        2 lines
llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll              2 lines
llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll
llvm/test/Transforms/LoopVectorize/scalable-vf-hint.ll                                      33 lines

Diff 312819

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

...
   bool areDepsSafe(DepCandidates &AccessSets, MemAccessInfoList &CheckDeps,
                    const ValueToValueMap &Strides);

   /// No memory dependence was encountered that would inhibit
   /// vectorization.
   bool isSafeForVectorization() const {
     return Status == VectorizationSafetyStatus::Safe;
   }

+  /// Return true if the number of elements that are safe to operate on
+  /// simultaneously is not bounded.
+  bool isSafeForAnyVectorWidth() const {
+    return MaxSafeVectorWidthInBits == UINT_MAX;
+  }
+
   /// The maximum number of bytes of a vector register we can vectorize
   /// the accesses safely with.
   uint64_t getMaxSafeDepDistBytes() { return MaxSafeDepDistBytes; }

   /// Return the number of elements that are safe to operate on
   /// simultaneously, multiplied by the size of the element in bits.
   uint64_t getMaxSafeVectorWidthInBits() const {
     return MaxSafeVectorWidthInBits;
...

llvm/include/llvm/Analysis/TargetTransformInfo.h

...
 public:
   const char *getRegisterClassName(unsigned ClassID) const;

   /// \return The width of the largest scalar or vector register type.
   unsigned getRegisterBitWidth(bool Vector) const;

   /// \return The width of the smallest vector register type.
   unsigned getMinVectorRegisterBitWidth() const;

+  /// \return The maximum value for vscale in scalable vectors such as
+  /// <vscale x 4 x i32>. Default is None.
+  Optional<unsigned> getMaxVScale() const;
+
   /// \return True if the vectorization factor should be chosen to
   /// make the vector of the smallest element type match the size of a
   /// vector register. For wider element types, this could result in
   /// creating vectors that span multiple vector registers.
   /// If false, the vectorization factor will be chosen based on the
   /// size of the widest element type.
   bool shouldMaximizeVectorBandwidth(bool OptSize) const;

...
   virtual int getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx,
                                   const APInt &Imm, Type *Ty,
                                   TargetCostKind CostKind) = 0;
   virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;
   virtual unsigned getRegisterClassForType(bool Vector,
                                            Type *Ty = nullptr) const = 0;
   virtual const char *getRegisterClassName(unsigned ClassID) const = 0;
   virtual unsigned getRegisterBitWidth(bool Vector) const = 0;
   virtual unsigned getMinVectorRegisterBitWidth() = 0;
+  virtual Optional<unsigned> getMaxVScale() const = 0;
   virtual bool shouldMaximizeVectorBandwidth(bool OptSize) const = 0;
   virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0;
   virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0;
   virtual bool shouldConsiderAddressTypePromotion(
       const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
   virtual unsigned getCacheLineSize() const = 0;
   virtual Optional<unsigned> getCacheSize(CacheLevel Level) const = 0;
   virtual Optional<unsigned> getCacheAssociativity(CacheLevel Level) const = 0;
...
   const char *getRegisterClassName(unsigned ClassID) const override {
     return Impl.getRegisterClassName(ClassID);
   }
   unsigned getRegisterBitWidth(bool Vector) const override {
     return Impl.getRegisterBitWidth(Vector);
   }
   unsigned getMinVectorRegisterBitWidth() override {
     return Impl.getMinVectorRegisterBitWidth();
   }
+  Optional<unsigned> getMaxVScale() const override {
+    return Impl.getMaxVScale();
+  }
   bool shouldMaximizeVectorBandwidth(bool OptSize) const override {
     return Impl.shouldMaximizeVectorBandwidth(OptSize);
   }
   unsigned getMinimumVF(unsigned ElemWidth) const override {
     return Impl.getMinimumVF(ElemWidth);
   }
   unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const override {
     return Impl.getMaximumVF(ElemWidth, Opcode);
...

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

...
     case 1:
       return "Generic::VectorRC";
     }
   }

   unsigned getRegisterBitWidth(bool Vector) const { return 32; }

   unsigned getMinVectorRegisterBitWidth() { return 128; }

+  llvm::Optional<unsigned> getMaxVScale() const { return None; }
+
   bool shouldMaximizeVectorBandwidth(bool OptSize) const { return false; }

   unsigned getMinimumVF(unsigned ElemWidth) const { return 0; }

   unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const { return 0; }

   bool
   shouldConsiderAddressTypePromotion(const Instruction &I,
...

llvm/include/llvm/CodeGen/BasicTTIImpl.h

...
 public:

   /// @}

   /// \name Vector TTI Implementations
   /// @{

   unsigned getRegisterBitWidth(bool Vector) const { return 32; }

+  Optional<unsigned> getMaxVScale() const { return None; }
+
   /// Estimate the overhead of scalarizing an instruction. Insert and Extract
   /// are set if the demanded result elements need to be inserted and/or
   /// extracted from vectors.
   unsigned getScalarizationOverhead(VectorType *InTy, const APInt &DemandedElts,
                                     bool Insert, bool Extract) {
     /// FIXME: a bitfield is not a reasonable abstraction for talking about
     /// which elements are needed from a scalable vector
     auto *Ty = cast<FixedVectorType>(InTy);
...

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

Show First 20 Lines • Show All 319 Lines • ▼ Show 20 Lines	public:

/// Returns the information that we collected about runtime memory check.		/// Returns the information that we collected about runtime memory check.
const RuntimePointerChecking *getRuntimePointerChecking() const {		const RuntimePointerChecking *getRuntimePointerChecking() const {
return LAI->getRuntimePointerChecking();		return LAI->getRuntimePointerChecking();
}		}

const LoopAccessInfo *getLAI() const { return LAI; }		const LoopAccessInfo *getLAI() const { return LAI; }

		bool isSafeForAnyVectorWidth() const {
		return LAI->getDepChecker().isSafeForAnyVectorWidth();
		}

unsigned getMaxSafeDepDistBytes() { return LAI->getMaxSafeDepDistBytes(); }		unsigned getMaxSafeDepDistBytes() { return LAI->getMaxSafeDepDistBytes(); }

uint64_t getMaxSafeVectorWidthInBits() const {		uint64_t getMaxSafeVectorWidthInBits() const {
return LAI->getDepChecker().getMaxSafeVectorWidthInBits();		return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
}		}

bool hasStride(Value *V) { return LAI->hasStride(V); }		bool hasStride(Value *V) { return LAI->hasStride(V); }

▲ Show 20 Lines • Show All 191 Lines • Show Last 20 Lines
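
The `isSafeForAnyVectorWidth()` accessor added above keeps the "no dependencies" check inside LAA, as requested in the review, instead of having the vectorizer compare against `UINT_MAX`. A minimal sketch of that idea follows, assuming a sentinel value stands for "no limit". The class `DepCheckerSketch`, the constant `kNoWidthLimit` and the method `recordSafeWidth` are hypothetical names for illustration and are not part of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for LAA's dependence checker; not the LLVM class.
class DepCheckerSketch {
public:
  // Assumed sentinel meaning "no dependence limits the vector width".
  static constexpr uint64_t kNoWidthLimit =
      std::numeric_limits<uint64_t>::max();

  // Tighten the bound when a dependence restricts the safe width.
  void recordSafeWidth(uint64_t Bits) {
    assert(Bits > 0 && "a safe width of zero bits is meaningless");
    MaxSafeVectorWidthInBits = std::min(MaxSafeVectorWidthInBits, Bits);
  }

  // True only while no dependence has tightened the bound.
  bool isSafeForAnyVectorWidth() const {
    return MaxSafeVectorWidthInBits == kNoWidthLimit;
  }

  uint64_t getMaxSafeVectorWidthInBits() const {
    return MaxSafeVectorWidthInBits;
  }

private:
  uint64_t MaxSafeVectorWidthInBits = kNoWidthLimit;
};
```

Callers then ask a yes/no question and never see the sentinel, so the sentinel's value can change without touching the vectorizer.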

llvm/lib/Analysis/TargetTransformInfo.cpp

	Show First 20 Lines • Show All 621 Lines • ▼ Show 20 Lines
	unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {			unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {
	return TTIImpl->getRegisterBitWidth(Vector);			return TTIImpl->getRegisterBitWidth(Vector);
	}			}

	unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {			unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {
	return TTIImpl->getMinVectorRegisterBitWidth();			return TTIImpl->getMinVectorRegisterBitWidth();
	}			}

				llvm::Optional<unsigned> TargetTransformInfo::getMaxVScale() const {
				return TTIImpl->getMaxVScale();
				}

	bool TargetTransformInfo::shouldMaximizeVectorBandwidth(bool OptSize) const {			bool TargetTransformInfo::shouldMaximizeVectorBandwidth(bool OptSize) const {
	return TTIImpl->shouldMaximizeVectorBandwidth(OptSize);			return TTIImpl->shouldMaximizeVectorBandwidth(OptSize);
	}			}

	unsigned TargetTransformInfo::getMinimumVF(unsigned ElemWidth) const {			unsigned TargetTransformInfo::getMinimumVF(unsigned ElemWidth) const {
	return TTIImpl->getMinimumVF(ElemWidth);			return TTIImpl->getMinimumVF(ElemWidth);
	}			}

	▲ Show 20 Lines • Show All 820 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	unsigned getRegisterBitWidth(bool Vector) const {
}		}
return 64;		return 64;
}		}

unsigned getMinVectorRegisterBitWidth() {		unsigned getMinVectorRegisterBitWidth() {
return ST->getMinVectorRegisterBitWidth();		return ST->getMinVectorRegisterBitWidth();
}		}

		Optional<unsigned> getMaxVScale() const {
		if (ST->hasSVE())
		return AArch64::SVEMaxBitsPerVector / AArch64::SVEBitsPerBlock;
		return BaseT::getMaxVScale();
		}

unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);

int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,		int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src,
TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,		TTI::CastContextHint CCH, TTI::TargetCostKind CostKind,
const Instruction *I = nullptr);		const Instruction *I = nullptr);

int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,		int getExtractWithExtendCost(unsigned Opcode, Type *Dst, VectorType *VecTy,
unsigned Index);		unsigned Index);
▲ Show 20 Lines • Show All 122 Lines • Show Last 20 Lines
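
The AArch64 override of `getMaxVScale()` above divides the largest SVE register size by the SVE granule size. A small sketch of that arithmetic follows, using the values quoted in the summary (registers are a multiple of 128 bits, up to 2048 bits). The constants and function names are hypothetical stand-ins, `std::optional` (C++17) replaces `llvm::Optional`, and `worstCaseLanes` is an illustrative helper that does not appear in the patch.

```cpp
#include <cassert>
#include <optional>

// Assumed values, taken from the patch summary: SVE registers are a
// multiple of 128 bits, with an architectural maximum of 2048 bits.
constexpr unsigned kSVEBitsPerBlock = 128;
constexpr unsigned kSVEMaxBitsPerVector = 2048;

// Hypothetical stand-in for the target hook: an upper bound on vscale,
// or no value when the target gives no bound.
std::optional<unsigned> maxVScaleSketch(bool HasSVE) {
  if (HasSVE)
    return kSVEMaxBitsPerVector / kSVEBitsPerBlock;
  return std::nullopt;
}

// Worst-case lane count of a <vscale x KnownMinLanes x T> vector.
std::optional<unsigned> worstCaseLanes(unsigned KnownMinLanes, bool HasSVE) {
  assert(KnownMinLanes > 0 && "a vector needs at least one lane");
  if (std::optional<unsigned> VScale = maxVScaleSketch(HasSVE))
    return KnownMinLanes * *VScale;
  return std::nullopt;
}
```

With SVE the bound is 16, so a VF of (4, scalable) may cover up to 64 lanes at run time. This is why the dependence distance has to be checked against the worst case rather than against the known minimum.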

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 266 Lines • ▼ Show 20 Lines	cl::desc("A flag that overrides the target's max interleave factor for "
"vectorized loops."));		"vectorized loops."));

static cl::opt<unsigned> ForceTargetInstructionCost(		static cl::opt<unsigned> ForceTargetInstructionCost(
"force-target-instruction-cost", cl::init(0), cl::Hidden,		"force-target-instruction-cost", cl::init(0), cl::Hidden,
cl::desc("A flag that overrides the target's expected cost for "		cl::desc("A flag that overrides the target's expected cost for "
"an instruction to a single constant value. Mostly "		"an instruction to a single constant value. Mostly "
"useful for getting consistent testing."));		"useful for getting consistent testing."));

		static cl::opt<bool> ForceTargetSupportsScalableVectors(
		"force-target-supports-scalable-vectors", cl::init(false), cl::Hidden,
		cl::desc(
		"Pretend that scalable vectors are supported, even if the target does "
		fhahn: It would be good to be clear in the message that this is just for testing (same as the previous option). Perhaps something like `Pretend that scalable vectors are supported, even if the target does not support them. The flag should only be used for testing.`
		"not support them. This flag should only be used for testing."));

static cl::opt<unsigned> SmallLoopCost(		static cl::opt<unsigned> SmallLoopCost(
"small-loop-cost", cl::init(20), cl::Hidden,		"small-loop-cost", cl::init(20), cl::Hidden,
cl::desc(		cl::desc(
"The cost of a loop that is considered 'small' by the interleaver."));		"The cost of a loop that is considered 'small' by the interleaver."));

static cl::opt<bool> LoopVectorizeWithBlockFrequency(		static cl::opt<bool> LoopVectorizeWithBlockFrequency(
"loop-vectorize-with-block-frequency", cl::init(true), cl::Hidden,		"loop-vectorize-with-block-frequency", cl::init(true), cl::Hidden,
cl::desc("Enable the use of the block frequency analysis to access PGO "		cl::desc("Enable the use of the block frequency analysis to access PGO "
▲ Show 20 Lines • Show All 5,274 Lines • ▼ Show 20 Lines	LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount,
unsigned WidestRegister = TTI.getRegisterBitWidth(true);		unsigned WidestRegister = TTI.getRegisterBitWidth(true);

// Get the maximum safe dependence distance in bits computed by LAA.		// Get the maximum safe dependence distance in bits computed by LAA.
// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
// the memory accesses that is most restrictive (involved in the smallest		// the memory accesses that is most restrictive (involved in the smallest
// dependence distance).		// dependence distance).
unsigned MaxSafeVectorWidthInBits = Legal->getMaxSafeVectorWidthInBits();		unsigned MaxSafeVectorWidthInBits = Legal->getMaxSafeVectorWidthInBits();

if (UserVF.isNonZero()) {		bool IgnoreScalableUserVF = UserVF.isScalable() &&
// For now, don't verify legality of scalable vectors.		!TTI.supportsScalableVectors() &&
		sdesmalen: MaxSafeVectorWidthInBits can't exceed UINT_MAX. If this early bail out isn't strictly necessary, I'd suggest removing it.
		c-rhodes (Author):
		> MaxSafeVectorWidthInBits can't exceed UINT_MAX. If this early bail out isn't strictly necessary, I'd suggest removing it.
		There are a couple of existing tests, Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll and Transforms/LoopVectorize/metadata-width.ll, that will fall over if the hint is ignored when scalable vectorization isn't supported. I suppose those could become target-specific tests.
		sdesmalen: If the test has a loop-carried distance that may invalidate auto-vectorization that needs checking, I think it's fair enough for the test to be moved to be a target-specific test (because that will require `getMaxVScale`). For a test like metadata-width.ll it may be useful to add some `-force-target-supports-scalable-vectors=true` flag to avoid making all scalable-vector tests for the LoopVectorizer target-specific. @fhahn do you have any strong feelings about this?
		fhahn: Yes that sounds reasonable to me.
// This will be addressed properly in https://reviews.llvm.org/D91718.		!ForceTargetSupportsScalableVectors;
if (UserVF.isScalable())		if (IgnoreScalableUserVF) {
		LLVM_DEBUG(
		dbgs() << "LV: Ignoring VF=" << UserVF
		<< " because target does not support scalable vectors.\n");
		ORE->emit([&]() {
		return OptimizationRemarkAnalysis(DEBUG_TYPE, "IgnoreScalableUserVF",
		TheLoop->getStartLoc(),
		TheLoop->getHeader())
		<< "Ignoring VF=" << ore::NV("UserVF", UserVF)
		<< " because target does not support scalable vectors.";
		});
		}

		// If the user vectorization factor is legally unsafe, clamp it to a safe
		// value. Otherwise, return as is.
		if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
		// Nothing to do if there are no dependencies.
		if (Legal->isSafeForAnyVectorWidth())
		fhahn: This is just about ignoring scalable UserVFs, right? Might be good to make that clear as part of the variable name.
		fhahn: This early exit could also be on the top of the function, before we compute the register widths & co and is not specific to scalable vectors, right? Might be good to move, although the 'scalable ignored' message might also need to move.
		c-rhodes (Author):
		> This early exit could also be on the top of the function, before we compute the register widths & co and is not specific to scalable vectors, right? Might be good to move, although the 'scalable ignored' message might also need to move.
		That's right, it's not specific to scalable vectors; it applies as long as `UserVF.isNonZero()`. I think you're right, if this is moved to the top `IgnoreScalableUserVF` will need to be part of the check, so:
		    // Nothing to do if there are no dependencies.
		    if (UserVF.isNonZero() && !IgnoreScalableUserVF &&
		        Legal->isSafeForAnyVectorWidth())
		      return UserVF;
		Otherwise scalable VFs will be accepted for targets without scalable vector support, where it currently falls into the existing code to pick a VF. It's fine with me to move this to the top with that in mind.
return UserVF;		return UserVF;

// If legally unsafe, clamp the user vectorization factor to a safe value.		unsigned MaxSafeElements =
		fhahn: This check seems a bit odd. I think we should at least use a named constant to make things clearer and ensure that LAA & LV are kept in sync on the meaning. But would it be possible to instead compute the maximum width of UserVF using MaxVScale (something like `UserVF * WidestType * MaxVScale`)?
		c-rhodes (Author):
		> This check seems a bit odd. I think we should at least use a named constant to make things clearer and ensure that LAA & LV are kept in sync on the meaning. But would it be possible to instead compute the maximum width of UserVF using MaxVScale (something like `UserVF * WidestType * MaxVScale`)?
		I don't see how this is related. The reason I added this check is that this block validates a user VF and clamps it when it exceeds what is safe in terms of dependencies, but if there are no dependencies there's nothing to do here. I'm not sure if there's a better way of querying if there are no dependencies? `MaxSafeRegisterWidth` is initialized as `-1U` in LAA so I compared it against `UINT_MAX`; maybe it would be a little clearer if `MaxSafeRegisterWidth` was initialized as `UINT_MAX`. It would probably have made sense to have introduced this in D90687, although I didn't consider it; it became obvious with this patch since large max safe VFs for loops with no dependencies were being emitted.
		fhahn: My point was that the original code handled the case where there are no dependencies naturally without an extra check, right? Is there a reason the logic for scalable vectors cannot do the same?
		c-rhodes (Author):
		> My point was that the original code handled the case where there are no dependencies naturally without an extra check, right? Is there a reason the logic for scalable vectors cannot do the same?
		Sorry I missed your comment. One issue is for non-target specific tests with scalable VFs such as Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll and Transforms/LoopVectorize/metadata-width.ll: the hint will be ignored since scalable vectorization isn't supported, and also `getMaxVScale` now returns None. I suppose there's a couple of options: we could add new loop vectorizer flags or those tests could become target-specific. I'm not sure what the best option is.
		fhahn: Ah right, if `getMaxVScale` can return `None`, then my suggestion doesn't work as expected. In that case, I think it would be good to encapsulate the check in LAA and have something like `areAllVectorWidthsSafe` or something. This should be more robust to future changes in LAA.
unsigned MaxSafeVF = PowerOf2Floor(MaxSafeVectorWidthInBits / WidestType);		PowerOf2Floor(MaxSafeVectorWidthInBits / WidestType);
if (UserVF.getFixedValue() <= MaxSafeVF)		ElementCount MaxSafeVF = ElementCount::getFixed(MaxSafeElements);

		sdesmalen: nit: `MaxSafeElements`?
		c-rhodes (Author):
		> nit: `MaxSafeElements`?
		Cheers that makes more sense, will update it
		if (UserVF.isScalable()) {
		Optional<unsigned> MaxVScale = TTI.getMaxVScale();
		sdesmalen: Instead of returning an Optional<ElementCount>, I'd prefer the code here to just clamp to MaxFixedVF, in this order:
		1. first try to see if it can use the requested scalable VF.
		2. if not, then try to see if it can use the requested VF with vscale = 1 (i.e. fixed width)
		3. if not, then try if it can use a clamped fixed VF.
		c-rhodes (Author):
		> Instead of returning an Optional<ElementCount>, I'd prefer the code here to just clamp to MaxFixedVF, in this order: first try to see if it can use the requested scalable VF. if not, then try to see if it can use the requested VF with vscale = 1 (i.e. fixed width) if not, then try if it can use a clamped fixed VF.
		Deferring to fixed-width so Optional isn't necessary is a good point, the first 2 cases should already be handled, I'll add the latter
		assert(MaxVScale &&
		"max vscale undefined for target that supports scalable vectors!");
		sdesmalen: It should be fine for MaxVScale to be undefined for a target architecture. Based on the logic below where it tries to pick a suitable alternative VF, it should probably pick a different (non-scalable) VF, instead of crashing.

		sdesmalen: Instead of returning 'None' (and thus not allowing any vectorization), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal? i.e.
		    bool IgnoreUserVF = UserVF.isScalable() && TTI.supportsScalableVectors();
		    if (IgnoreUserVF)
		      ORE->emit([&]() { return OptimizationRemark(....) };
		    if (UserVF.isNonZero() && !IgnoreUserVF) {
		      ...
		    }
		    // otherwise, let the LoopVectorizer choose a VF itself.
		I guess that can be split out into D93060 as well.
		fhahn: Agreed, if the user requests scalable vectorization through a hint, we should probably just drop the hint and proceed with normal cost-modeling. (If there's a reason to keep the bail-out, can we just return a VF of 1? This is how other places in the function indicate that vectorization should be avoided.)
		c-rhodes (Author):
		> Instead of returning 'None' (and thus not allowing any vectorization), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal? ... I guess that can be split out into D93060 as well.
		Ok np, I'll split that out then.
		c-rhodes (Author):
		> > Instead of returning 'None' (and thus not allowing any vectorization), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal? ... I guess that can be split out into D93060 as well.
		> Ok np, I'll split that out then.
		Done, although I've kept it as part of this patch. Ignoring the hint if scalable vectors aren't supported breaks existing tests so I've implemented the flag you suggested `-force-target-supports-scalable-vectors` which can be used for non-target specific tests. Although this only works for loops with no dependencies since `getMaxVScale` is None by default, so I've moved `scalable-loop-unpredicated-body-scalar-tail.ll` to AArch64 and enabled SVE.
		// Scale VF by vscale before checking if it's safe.
		MaxSafeVF =
		ElementCount::getScalable(MaxSafeElements / MaxVScale.getValue());

		if (MaxSafeVF.isZero()) {
		// The dependence distance is too small to use scalable vectors,
		// fallback on fixed.
		LLVM_DEBUG(
		dbgs()
		<< "LV: Max legal vector width too small, scalable vectorization "
		sdesmalen: This does something different than what I suggested though. For the following example, your code would set the MaxSafeVF to 32 elements instead of using a fixed VF of 4 elements (fixed):
		    #pragma clang loop vectorize_width(4, scalable)
		    for (int i=0; i<N; ++i)
		      a[i + 32] = a[i] + 1; // for some int* a
		I think a VF of 4 would be preferred, because otherwise selectiondag will need to expand `<32 x i32>` to 8 x `<4 x i32>` fixed-width vectors (assuming 128bit vectors), which would be somewhat similar to requesting an interleave-count of 8.
		"unfeasible. Using fixed-width vectorization instead.\n");
		ORE->emit([&]() {
		return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",
		TheLoop->getStartLoc(),
		sdesmalen: nit: is it useful to print MaxSafeElements or MaxSafeVectorWidthInBits here?
		c-rhodes (Author):
		> nit: is it useful to print MaxSafeElements or MaxSafeVectorWidthInBits here?
		`MaxSafeElements` is undefined here but LAA emits the max VF anyway which should be sufficient?
		TheLoop->getHeader())
		<< "Max legal vector width too small, scalable vectorization "
		<< "unfeasible. Using fixed-width vectorization instead.";
		});
		return computeFeasibleMaxVF(
		sdesmalen: Should this be an OptRemark instead?
		c-rhodes (Author):
		> Should this be an OptRemark instead?
		Good point, I'll add that
		ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));
		}
		}

		LLVM_DEBUG(dbgs() << "LV: The max safe VF is: " << MaxSafeVF << ".\n");
		sdesmalen: Can it know that `ElementCount::getFixed(UserVF.getKnownMinValue())` would be a valid VF? (that check is below on line 5614). Probably good to have a test for that as well.
		sdesmalen: It seems I misread this line/statement, so please ignore this comment.

		if (ElementCount::isKnownLE(UserVF, MaxSafeVF))
return UserVF;		return UserVF;

LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF		LLVM_DEBUG(dbgs() << "LV: User VF=" << UserVF
<< " is unsafe, clamping to max safe VF=" << MaxSafeVF		<< " is unsafe, clamping to max safe VF=" << MaxSafeVF
<< ".\n");		<< ".\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",		return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationFactor",
TheLoop->getStartLoc(),		TheLoop->getStartLoc(),
TheLoop->getHeader())		TheLoop->getHeader())
<< "User-specified vectorization factor "		<< "User-specified vectorization factor "
<< ore::NV("UserVectorizationFactor", UserVF)		<< ore::NV("UserVectorizationFactor", UserVF)
<< " is unsafe, clamping to maximum safe vectorization factor "		<< " is unsafe, clamping to maximum safe vectorization factor "
<< ore::NV("VectorizationFactor", MaxSafeVF);		<< ore::NV("VectorizationFactor", MaxSafeVF);
});		});
return ElementCount::getFixed(MaxSafeVF);		return MaxSafeVF;
}		}

WidestRegister = std::min(WidestRegister, MaxSafeVectorWidthInBits);		WidestRegister = std::min(WidestRegister, MaxSafeVectorWidthInBits);

// Ensure MaxVF is a power of 2; the dependence distance bound may not be.		// Ensure MaxVF is a power of 2; the dependence distance bound may not be.
// Note that both WidestRegister and WidestType may not be a powers of 2.		// Note that both WidestRegister and WidestType may not be a powers of 2.
unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);		unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);

▲ Show 20 Lines • Show All 1,783 Lines • ▼ Show 20 Lines	if (CM.InterleaveInfo.invalidateGroups())
// based on them, which includes widening decisions and uniform and scalar		// based on them, which includes widening decisions and uniform and scalar
// values.		// values.
CM.invalidateCostModelingDecisions();		CM.invalidateCostModelingDecisions();
}		}

ElementCount MaxVF = MaybeMaxVF.getValue();		ElementCount MaxVF = MaybeMaxVF.getValue();
assert(MaxVF.isNonZero() && "MaxVF is zero.");		assert(MaxVF.isNonZero() && "MaxVF is zero.");

if (!UserVF.isZero() && ElementCount::isKnownLE(UserVF, MaxVF)) {		bool UserVFIsLegal = ElementCount::isKnownLE(UserVF, MaxVF);
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		if (!UserVF.isZero() &&
assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&		(UserVFIsLegal || (UserVF.isScalable() && MaxVF.isScalable()))) {
		david-arm: Is this additional check necessary here? It looks like MaxVF came from computeMaxVF, which makes use of your clamping code above. If you revert this change do your tests still pass?
		c-rhodes (Author):
		> Is this additional check necessary here?
		Until the loop below supports scalable VFs, yes. With this check we always fall into this block for a scalable VF; this handles the case where the UserVF isn't legal but a scalable VF is feasible, so it clamps to the scalable MaxVF. Otherwise the assert below would be triggered. This is changed from the previous revision to implement @sdesmalen's suggestion of `UserVF = Scalable && MaxVF = Scalable && UserVF > MaxVF => Pick MaxVF`, rather than change MaxVF from scalable -> fixed.
		david-arm: Ah I think I misunderstood what the clamping above is doing. It only clamps MaxVF, not UserVF. I wonder if it's worth emitting a remark for this case to let the user know something happened? It might look a bit odd if we downgrade their VF without telling them. I guess it depends upon how quick we get a fix for this.
		// FIXME: MaxVF is temporarily used inplace of UserVF for illegal scalable
		// VFs here, this should be reverted to only use legal UserVFs once the
		// loop below supports scalable VFs.
		ElementCount VF = UserVFIsLegal ? UserVF : MaxVF;
		LLVM_DEBUG(dbgs() << "LV: Using " << (UserVFIsLegal ? "user" : "max")
		<< " VF " << VF << ".\n");
		assert(isPowerOf2_32(VF.getKnownMinValue()) &&
"VF needs to be a power of two");		"VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(VF);
CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(UserVF, UserVF);		buildVPlansWithVPRecipes(VF, VF);
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
return {{UserVF, 0}};		return {{VF, 0}};
}		}

assert(!MaxVF.isScalable() &&		assert(!MaxVF.isScalable() &&
"Scalable vectors not yet supported beyond this point");		"Scalable vectors not yet supported beyond this point");
		sdesmalen: We still want to clamp to a smaller scalable VF if `UserVF.isScalable() && MaxVF.isScalable() && UserVF > MaxVF`. Because the loop below doesn't support scalable VFs, it is best to single that case out. That means:
		1. UserVF = Scalable && MaxVF = Scalable && UserVF > MaxVF => Pick MaxVF
		2. UserVF = Scalable && MaxVF = Fixed => Fall through into loop below.
		3. UserVF = Fixed && MaxVF = Fixed && UserVF > MaxVF => Fall through into loop below.
		Where the clamping in 1. is temporary until the loop below works on scalable vectors, so that it can choose the most profitable VF based on the cost-model.
		c-rhodes (Author): Ok, I'll add a temporary workaround for scalable VFs to select MaxVF above if UserVF is unsafe.

for (ElementCount VF = ElementCount::getFixed(1);		for (ElementCount VF = ElementCount::getFixed(1);
ElementCount::isKnownLE(VF, MaxVF); VF *= 2) {		ElementCount::isKnownLE(VF, MaxVF); VF *= 2) {
// Collect Uniform and Scalar instructions after vectorization with VF.		// Collect Uniform and Scalar instructions after vectorization with VF.
CM.collectUniformsAndScalars(VF);		CM.collectUniformsAndScalars(VF);

// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
▲ Show 20 Lines • Show All 2,003 Lines • Show Last 20 Lines
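
The user-VF clamp in `computeFeasibleMaxVF` above can be condensed into a short standalone sketch. It assumes a simplified `VFSketch` struct in place of `llvm::ElementCount` and takes the max vscale as a plain integer. All names here are hypothetical, and the early exit for loops without dependencies, the unsupported-target path and the optimization remarks are omitted.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for llvm::ElementCount.
struct VFSketch {
  unsigned MinElts; // known minimum number of lanes
  bool Scalable;    // true: lane count is MinElts * vscale at run time
};

// Largest power of two that is <= X, or 0 when X is 0.
unsigned powerOf2FloorSketch(unsigned X) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Clamp a user-requested VF to what the dependence distance allows.
VFSketch clampUserVF(VFSketch UserVF, unsigned MaxSafeBits,
                     unsigned WidestTypeBits, unsigned MaxVScale) {
  assert(UserVF.MinElts > 0 && WidestTypeBits > 0 && MaxVScale > 0);
  unsigned MaxSafeElts = powerOf2FloorSketch(MaxSafeBits / WidestTypeBits);

  if (UserVF.Scalable) {
    // Even the widest possible vector must stay within the safe bound,
    // so divide the lane budget by the largest vscale.
    unsigned MaxScalableElts = MaxSafeElts / MaxVScale;
    if (MaxScalableElts == 0)
      // Distance too small for any scalable VF: retry as fixed-width.
      return clampUserVF({UserVF.MinElts, false}, MaxSafeBits,
                         WidestTypeBits, MaxVScale);
    return {std::min(UserVF.MinElts, MaxScalableElts), true};
  }
  return {std::min(UserVF.MinElts, MaxSafeElts), false};
}
```

Take the summary's numbers: `i32` elements and a max vscale of 16. A bound of 128 bits (distance of 4 elements) makes `clampUserVF({4, true}, 128, 32, 16)` fall back to a fixed VF of 4. A bound of 1024 bits (distance of 32 elements, assuming LAA reports distance times element width) clamps the same request to (2, scalable).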

llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll

This file was added.

				; RUN: opt -mtriple=aarch64-none-linux-gnu -mattr=+sve -S -loop-vectorize -instcombine -force-vector-interleave=1 < %s | FileCheck %s --check-prefix=CHECKUF1
				; RUN: opt -mtriple=aarch64-none-linux-gnu -mattr=+sve -S -loop-vectorize -instcombine -force-vector-interleave=2 < %s | FileCheck %s --check-prefix=CHECKUF2
				david-arm: nit: Perhaps it's better to run llvm/utils/update_llc_test_checks.py on these tests rather than hand-write the CHECK lines? This seems to be the accepted way of doing it in LLVM these days. :)

				; CHECKUF1: for.body.preheader:
				; CHECKUF1-DAG: %wide.trip.count = zext i32 %N to i64
				; CHECKUF1-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECKUF1-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
				; CHECKUF1-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX4]], %wide.trip.count

				; CHECKUF1: vector.ph:
				; CHECKUF1-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECKUF1-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
				; CHECKUF1-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX4]]
				; CHECKUF1: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

				; CHECKUF1: vector.body:
				; CHECKUF1: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECKUF1: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
				; CHECKUF1: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
				; CHECKUF1: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0
				; CHECKUF1: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
				; CHECKUF1: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
				; CHECKUF1: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
				; CHECKUF1: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0
				; CHECKUF1: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECKUF1: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
				; CHECKUF1: %index.next = add i64 %index, %[[VSCALEX4]]
				; CHECKUF1: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
				; CHECKUF1: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5


				; For an interleave factor of 2, vscale is scaled by 8 instead of 4 (and thus shifted left by 3 instead of 2).
				; There is also the increment for the next iteration, e.g. instead of indexing IDXB, it indexes at IDXB + vscale * 4.

				; CHECKUF2: for.body.preheader:
				; CHECKUF2-DAG: %wide.trip.count = zext i32 %N to i64
				; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
				; CHECKUF2-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX8]], %wide.trip.count

				; CHECKUF2: vector.ph:
				; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
				; CHECKUF2-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX8]]
				; CHECKUF2: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

				; CHECKUF2: vector.body:
				; CHECKUF2: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECKUF2: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
				; CHECKUF2: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
				; CHECKUF2: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0
				; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
				; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
				; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64
				; CHECKUF2: %[[IDXB_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXB]], i64 %[[VSCALE2_EXT]]
				; CHECKUF2: %[[IDXB_NEXT_CAST:.*]] = bitcast double* %[[IDXB_NEXT]] to <vscale x 4 x double>*
				; CHECKUF2: %wide.load{{[0-9]+}} = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_NEXT_CAST]], align 8, !alias.scope !0
				; CHECKUF2: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
				; CHECKUF2: %[[FADD_NEXT:.*]] = fadd <vscale x 4 x double> %wide.load{{[0-9]+}}, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
				; CHECKUF2: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
				; CHECKUF2: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
				; CHECKUF2: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0
				; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
				; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
				; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64
				; CHECKUF2: %[[IDXA_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXA]], i64 %[[VSCALE2_EXT]]
				; CHECKUF2: %[[IDXA_NEXT_CAST:.*]] = bitcast double* %[[IDXA_NEXT]] to <vscale x 4 x double>*
				; CHECKUF2: store <vscale x 4 x double> %[[FADD_NEXT]], <vscale x 4 x double>* %[[IDXA_NEXT_CAST]], align 8, !alias.scope !3, !noalias !0
				; CHECKUF2: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
				; CHECKUF2: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
				; CHECKUF2: %index.next = add i64 %index, %[[VSCALEX8]]
				; CHECKUF2: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
				; CHECKUF2: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5

				define void @loop(i32 %N, double* nocapture %a, double* nocapture readonly %b) {
				entry:
				%cmp7 = icmp sgt i32 %N, 0
				br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				%wide.trip.count = zext i32 %N to i64
				br label %for.body

				for.cond.cleanup: ; preds = %for.body, %entry
				ret void

				for.body: ; preds = %for.body.preheader, %for.body
				%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds double, double* %b, i64 %indvars.iv
				%0 = load double, double* %arrayidx, align 8
				%add = fadd double %0, 1.000000e+00
				%arrayidx2 = getelementptr inbounds double, double* %a, i64 %indvars.iv
				store double %add, double* %arrayidx2, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !1
				}

				!1 = distinct !{!1, !2, !3}
				!2 = !{!"llvm.loop.vectorize.width", i32 4}
				!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				fhahn (Not Done): `-force-vector-width` should also work, right? In that case, it would be good to use that at least for some tests. Also, could this be kept target-independent with `-force-target-supports-scalable-vectors=true`?
				c-rhodes (Author, Done):
				> `-force-vector-width` should also work, right? In that case, it would be good to use that at least for some tests.
				Yeah, good idea. I'm not sure we test the combination of that flag and the metadata; I'll change that.
				> Also, could this be kept target-independent with `-force-target-supports-scalable-vectors=true`?
				It can for this test: there are no dependencies, so max vscale isn't required to determine whether the scalable VF is valid. I'll implement your suggestion and keep this target-independent.
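
As a rough illustration of the arithmetic behind the CHECKUF1 and CHECKUF2 lines in the test above: each vector-loop iteration advances the index by vscale multiplied by the VF hint (4) and by the interleave count, which the vectorizer emits as a left shift of vscale. The helper names below are hypothetical and are not part of LLVM; the sketch assumes the product of VF and interleave count is a nonzero power of two.

```cpp
#include <cstdint>

// Hypothetical helper, not an LLVM API: number of elements consumed per
// vector-loop iteration for a VF of (KnownMinVF, scalable) and an
// interleave count of UF.
constexpr std::uint64_t elementsPerIteration(std::uint64_t VScale,
                                             unsigned KnownMinVF,
                                             unsigned UF) {
  return VScale * KnownMinVF * UF;
}

// Hypothetical helper: the left-shift amount that replaces the multiply by
// KnownMinVF * UF. Assumes the product is a nonzero power of two.
constexpr unsigned shiftFor(unsigned KnownMinVF, unsigned UF) {
  unsigned Product = KnownMinVF * UF;
  unsigned Shift = 0;
  while (Product > 1) {
    Product >>= 1;
    ++Shift;
  }
  return Shift;
}
```

With a VF hint of 4, `shiftFor(4, 1)` corresponds to the `shl i64 %[[VSCALE]], 2` in the CHECKUF1 lines and `shiftFor(4, 2)` to the `shl i64 %[[VSCALE]], 3` in the CHECKUF2 lines.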

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll

This file was added.

				; REQUIRES: asserts
				; RUN: opt -mtriple=aarch64-none-linux-gnu -mattr=+sve -loop-vectorize -S < %s 2>&1 | FileCheck %s
				sdesmalen (Not Done): Why does it require asserts?
				c-rhodes (Author, Not Done):
				> Why does it require asserts?
				Because of the debug flags.
				; RUN: opt -mtriple=aarch64-none-linux-gnu -mattr=+sve -loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck --check-prefix=CHECK-DBG %s
				; RUN: opt -mtriple=aarch64-none-linux-gnu -loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck --check-prefix=CHECK-NO-SVE %s

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"

				; These tests validate the behaviour of scalable vectorization factor hints,
				; where the following applies:
				;
				; * If the backend does not support scalable vectors, ignore the hint and let
				; the vectorizer pick a VF.
				; * If there are no dependencies and assuming the VF is a power of 2 the VF
				; should be accepted. This applies to both fixed and scalable VFs.
				; * If the dependency is too small to use scalable vectors, change the VF to
				; fixed, where existing behavior applies (clamping).
				; * If scalable vectorization is feasible given the dependency and the VF is
				; valid, accept it. Otherwise, clamp to the max scalable VF.

				; test1
				;
				; Scalable vectorization unfeasible, clamp VF from (4, scalable) -> (4, fixed).
				;
				; The pragma applied to this loop implies a scalable vector <vscale x 4 x i32>
				; be used for vectorization. For fixed vectors the MaxVF=8, otherwise there
				; would be a dependence between vector lanes for vectors greater than 256 bits.
				;
				; void test1(int *a, int *b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(4, scalable)
				; for (int i=0; i<N; ++i) {
				; a[i + 8] = a[i] + b[i];
				; }
				; }
				;
				; For scalable vectorization 'vscale' has to be considered: for this example,
				; vectorizing with <vscale x 4 x i32> is only safe if max(vscale) <= 2. For SVE
				; max(vscale)=16, so check that fixed-width vectorization is used instead.

				; CHECK-DBG: LV: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.
				; CHECK-DBG: remark: <unknown>:0:0: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.
				; CHECK-DBG: LV: The max safe VF is: 8.
				; CHECK-DBG: LV: Selecting VF: 4.
				; CHECK-LABEL: @test1
				; CHECK: <4 x i32>
				define void @test1(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 8
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void
				}

				!0 = !{!0, !1, !2}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
				!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; test2
				;
				; Scalable vectorization unfeasible, clamp VF from (8, scalable) -> (4, fixed).
				;
				; void test2(int *a, int *b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(8, scalable)
				; for (int i=0; i<N; ++i) {
				; a[i + 4] = a[i] + b[i];
				; }
				; }

				; CHECK-DBG: LV: Max legal vector width too small, scalable vectorization unfeasible. Using fixed-width vectorization instead.
				; CHECK-DBG: LV: The max safe VF is: 4.
				; CHECK-DBG: LV: User VF=8 is unsafe, clamping to max safe VF=4.
				; CHECK-DBG: LV: Selecting VF: 4.
				; CHECK-LABEL: @test2
				; CHECK: <4 x i32>
				define void @test2(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 4
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !3

				exit:
				ret void
				}

				!3 = !{!3, !4, !5}
				!4 = !{!"llvm.loop.vectorize.width", i32 8}
				!5 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; test3
				;
				; Scalable vectorization feasible and the VF is valid.
				;
				; Specifies a vector of <vscale x 2 x i32>, i.e. maximum of 32 x i32 with 2
				; words per 128-bits (unpacked).
				;
				; void test3(int *a, int *b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(2, scalable)
				; for (int i=0; i<N; ++i) {
				; a[i + 32] = a[i] + b[i];
				; }
				; }
				;
				; Max fixed VF=32, Max scalable VF=2, safe to vectorize.

				; CHECK-DBG-LABEL: LV: Checking a loop in "test3"
				; CHECK-DBG: LV: The max safe VF is: vscale x 2.
				; CHECK-DBG: LV: Using user VF vscale x 2.
				; CHECK-LABEL: @test3
				; CHECK: <vscale x 2 x i32>
				define void @test3(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 32
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !6

				exit:
				ret void
				}

				!6 = !{!6, !7, !8}
				!7 = !{!"llvm.loop.vectorize.width", i32 2}
				!8 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; test4
				;
				; Scalable vectorization feasible, but the VF is unsafe. Should clamp.
				;
				; Specifies a vector of <vscale x 4 x i32>, i.e. maximum of 64 x i32 with 4
				; words per 128-bits (packed).
				;
				; void test4(int *a, int *b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(4, scalable)
				; for (int i=0; i<N; ++i) {
				; a[i + 32] = a[i] + b[i];
				; }
				; }
				;
				; Max fixed VF=32, Max scalable VF=2, unsafe to vectorize. Should clamp to 2.

				; CHECK-DBG-LABEL: LV: Checking a loop in "test4"
				; CHECK-DBG: LV: The max safe VF is: vscale x 2.
				; CHECK-DBG: LV: User VF=vscale x 4 is unsafe, clamping to max safe VF=vscale x 2.
				; CHECK-DBG: remark: <unknown>:0:0: User-specified vectorization factor vscale x 4 is unsafe, clamping to maximum safe vectorization factor vscale x 2
				; CHECK-DBG: LV: Using max VF vscale x 2
				; CHECK-LABEL: @test4
				; CHECK: <vscale x 2 x i32>
				define void @test4(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 32
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !9

				exit:
				ret void
				}

				!9 = !{!9, !10, !11}
				!10 = !{!"llvm.loop.vectorize.width", i32 4}
				!11 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; test5
				;
				; Scalable vectorization feasible and the VF is valid.
				;
				; Specifies a vector of <vscale x 4 x i32>, i.e. maximum of 64 x i32 with 4
				; words per 128-bits (packed).
				;
				; void test5(int *a, int *b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(4, scalable)
				; for (int i=0; i<N; ++i) {
				; a[i + 128] = a[i] + b[i];
				; }
				; }
				;
				; Max fixed VF=128, Max scalable VF=8, safe to vectorize.

				; CHECK-DBG-LABEL: LV: Checking a loop in "test5"
				; CHECK-DBG: LV: The max safe VF is: vscale x 8.
				; CHECK-DBG: LV: Using user VF vscale x 4
				; CHECK-LABEL: @test5
				; CHECK: <vscale x 4 x i32>
				define void @test5(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 128
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !12

				exit:
				ret void
				}

				!12 = !{!12, !13, !14}
				!13 = !{!"llvm.loop.vectorize.width", i32 4}
				!14 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; test6
				;
				; Scalable vectorization feasible, but the VF is unsafe. Should clamp.
				;
				; Specifies a vector of <vscale x 16 x i32>, i.e. maximum of 256 x i32.
				;
				; void test6(int *a, int *b, int N) {
				; #pragma clang loop vectorize(enable) vectorize_width(16, scalable)
				; for (int i=0; i<N; ++i) {
				; a[i + 128] = a[i] + b[i];
				; }
				; }
				;
				; Max fixed VF=128, Max scalable VF=8, unsafe to vectorize. Should clamp to 8.

				; CHECK-DBG-LABEL: LV: Checking a loop in "test6"
				; CHECK-DBG: LV: The max safe VF is: vscale x 8.
				; CHECK-DBG: LV: User VF=vscale x 16 is unsafe, clamping to max safe VF=vscale x 8.
				; CHECK-DBG: remark: <unknown>:0:0: User-specified vectorization factor vscale x 16 is unsafe, clamping to maximum safe vectorization factor vscale x 8
				; CHECK-DBG: LV: Using max VF vscale x 8
				; CHECK-LABEL: @test6
				; CHECK: <vscale x 8 x i32>
				define void @test6(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 128
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !15

				exit:
				ret void
				}

				!15 = !{!15, !16, !17}
				!16 = !{!"llvm.loop.vectorize.width", i32 16}
				!17 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; CHECK-NO-SVE: LV: Ignoring VF=vscale x 4 because target does not support scalable vectors.
				; CHECK-NO-SVE: remark: <unknown>:0:0: Ignoring VF=vscale x 4 because target does not support scalable vectors.
				; CHECK-NO-SVE: LV: Selecting VF: 4.
				; CHECK-NO-SVE: <4 x i32>
				define void @test_no_sve(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 4
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !18

				exit:
				ret void
				}

				!18 = !{!18, !19, !20}
				!19 = !{!"llvm.loop.vectorize.width", i32 4}
				!20 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
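
The outcomes checked by test1 through test6 and test_no_sve above can be reproduced with a small standalone sketch of the clamping rules listed at the top of the file. This is a simplified model under stated assumptions, not the code in LoopVectorize.cpp: the `VF` struct, `legalizeHint`, and its parameters are hypothetical names, the maximum safe fixed VF is taken as an input rather than derived from the dependence analysis, and all values are assumed to be powers of two.

```cpp
#include <algorithm>
#include <optional>

// Hypothetical stand-in for a vectorization factor: KnownMin lanes,
// multiplied by the runtime value vscale when Scalable is true.
struct VF {
  unsigned KnownMin;
  bool Scalable;
};

// Returns the VF to use for a user hint, or std::nullopt when the hint is
// ignored and the choice is left to the vectorizer.
//   MaxFixedVF - largest safe fixed VF implied by the loop dependence
//   MaxVScale  - largest vscale the target allows (16 for SVE)
std::optional<VF> legalizeHint(VF User, unsigned MaxFixedVF,
                               unsigned MaxVScale,
                               bool TargetHasScalableVectors) {
  // Fixed-width hints keep the existing behaviour: clamp to the max safe VF.
  if (!User.Scalable)
    return VF{std::min(User.KnownMin, MaxFixedVF), false};

  // Scalable hint on a target without scalable vectors: ignore the hint.
  if (!TargetHasScalableVectors)
    return std::nullopt;

  // Even at the largest vscale, the vector must not exceed MaxFixedVF lanes.
  unsigned MaxScalableVF = MaxFixedVF / MaxVScale;

  // Dependence distance too small for any scalable VF: fall back to fixed.
  if (MaxScalableVF == 0)
    return VF{std::min(User.KnownMin, MaxFixedVF), false};

  // Scalable vectorization is feasible: accept the hint or clamp it.
  return VF{std::min(User.KnownMin, MaxScalableVF), true};
}
```

For example, test4 corresponds to `legalizeHint({4, true}, 32, 16, true)`, which clamps to a scalable VF of 2, and test1 corresponds to `legalizeHint({4, true}, 8, 16, true)`, which falls back to a fixed VF of 4.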

llvm/test/Transforms/LoopVectorize/metadata-width.ll

	; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -dce -instcombine -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -force-vector-interleave=1 -force-target-supports-scalable-vectors=true -dce -instcombine -S | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK: store <8 x i32>			; CHECK: store <8 x i32>
	; CHECK: ret void			; CHECK: ret void
	define void @test1(i32* nocapture %a, i32 %n) #0 {			define void @test1(i32* nocapture %a, i32 %n) #0 {
	entry:			entry:

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll

	; REQUIRES: asserts			; REQUIRES: asserts
	; RUN: opt < %s -passes='loop-vectorize' -force-vector-width=2 -enable-epilogue-vectorization -epilogue-vectorization-force-VF=2 --debug-only=loop-vectorize -S 2>&1 | FileCheck %s			; RUN: opt < %s -passes='loop-vectorize' -force-vector-width=2 -force-target-supports-scalable-vectors=true -enable-epilogue-vectorization -epilogue-vectorization-force-VF=2 --debug-only=loop-vectorize -S 2>&1 | FileCheck %s
				fhahn (Not Done): Is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.
				c-rhodes (Author, Not Done):
				> Is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.
				The final test here checks that epilogue vectorization is disabled for scalable vectors. I can split that out, but it'll still require this flag to be target-independent. Or it could become an AArch64 test with SVE.
				c-rhodes (Author, Done):
				> The final test here checks that epilogue vectorization is disabled for scalable vectors. I can split that out, but it'll still require this flag to be target-independent. Or it could become an AArch64 test with SVE.
				I've split this into a separate test: `llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-scalable.ll`.

	target datalayout = "e-m:e-i64:64-n32:64-v256:256:256-v512:512:512"			target datalayout = "e-m:e-i64:64-n32:64-v256:256:256-v512:512:512"

	; Currently we cannot handle reduction loops.			; Currently we cannot handle reduction loops.
	; CHECK: LV: Checking a loop in "f1"			; CHECK: LV: Checking a loop in "f1"
	; CHECK: LEV: Unable to vectorize epilogue because the loop is not a supported candidate.			; CHECK: LEV: Unable to vectorize epilogue because the loop is not a supported candidate.

	define signext i32 @f1(i8* noalias %A, i32 signext %n) {			define signext i32 @f1(i8* noalias %A, i32 signext %n) {

llvm/test/Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

This file was deleted.

	; RUN: opt -S -loop-vectorize -instcombine -force-vector-interleave=1 < %s | FileCheck %s --check-prefix=CHECKUF1
	; RUN: opt -S -loop-vectorize -instcombine -force-vector-interleave=2 < %s | FileCheck %s --check-prefix=CHECKUF2

	; CHECKUF1: for.body.preheader:
	; CHECKUF1-DAG: %wide.trip.count = zext i32 %N to i64
	; CHECKUF1-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF1-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
	; CHECKUF1-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX4]], %wide.trip.count

	; CHECKUF1: vector.ph:
	; CHECKUF1-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF1-DAG: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
	; CHECKUF1-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX4]]
	; CHECKUF1: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

	; CHECKUF1: vector.body:
	; CHECKUF1: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECKUF1: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
	; CHECKUF1: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
	; CHECKUF1: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0
	; CHECKUF1: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
	; CHECKUF1: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
	; CHECKUF1: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
	; CHECKUF1: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0
	; CHECKUF1: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF1: %[[VSCALEX4:.*]] = shl i64 %[[VSCALE]], 2
	; CHECKUF1: %index.next = add i64 %index, %[[VSCALEX4]]
	; CHECKUF1: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
	; CHECKUF1: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5


	; For an interleave factor of 2, vscale is scaled by 8 instead of 4 (and thus shifted left by 3 instead of 2).
	; There is also the increment for the next iteration, e.g. instead of indexing IDXB, it indexes at IDXB + vscale * 4.

	; CHECKUF2: for.body.preheader:
	; CHECKUF2-DAG: %wide.trip.count = zext i32 %N to i64
	; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
	; CHECKUF2-DAG: %min.iters.check = icmp ugt i64 %[[VSCALEX8]], %wide.trip.count

	; CHECKUF2: vector.ph:
	; CHECKUF2-DAG: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF2-DAG: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
	; CHECKUF2-DAG: %n.mod.vf = urem i64 %wide.trip.count, %[[VSCALEX8]]
	; CHECKUF2: %n.vec = sub nsw i64 %wide.trip.count, %n.mod.vf

	; CHECKUF2: vector.body:
	; CHECKUF2: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
	; CHECKUF2: %[[IDXB:.*]] = getelementptr inbounds double, double* %b, i64 %index
	; CHECKUF2: %[[IDXB_CAST:.*]] = bitcast double* %[[IDXB]] to <vscale x 4 x double>*
	; CHECKUF2: %wide.load = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_CAST]], align 8, !alias.scope !0
	; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
	; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
	; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64
	; CHECKUF2: %[[IDXB_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXB]], i64 %[[VSCALE2_EXT]]
	; CHECKUF2: %[[IDXB_NEXT_CAST:.*]] = bitcast double* %[[IDXB_NEXT]] to <vscale x 4 x double>*
	; CHECKUF2: %wide.load{{[0-9]+}} = load <vscale x 4 x double>, <vscale x 4 x double>* %[[IDXB_NEXT_CAST]], align 8, !alias.scope !0
	; CHECKUF2: %[[FADD:.*]] = fadd <vscale x 4 x double> %wide.load, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
	; CHECKUF2: %[[FADD_NEXT:.*]] = fadd <vscale x 4 x double> %wide.load{{[0-9]+}}, shufflevector (<vscale x 4 x double> insertelement (<vscale x 4 x double> undef, double 1.000000e+00, i32 0), <vscale x 4 x double> undef, <vscale x 4 x i32> zeroinitializer)
	; CHECKUF2: %[[IDXA:.*]] = getelementptr inbounds double, double* %a, i64 %index
	; CHECKUF2: %[[IDXA_CAST:.*]] = bitcast double* %[[IDXA]] to <vscale x 4 x double>*
	; CHECKUF2: store <vscale x 4 x double> %[[FADD]], <vscale x 4 x double>* %[[IDXA_CAST]], align 8, !alias.scope !3, !noalias !0
	; CHECKUF2: %[[VSCALE:.*]] = call i32 @llvm.vscale.i32()
	; CHECKUF2: %[[VSCALE2:.*]] = shl i32 %[[VSCALE]], 2
	; CHECKUF2: %[[VSCALE2_EXT:.*]] = sext i32 %[[VSCALE2]] to i64
	; CHECKUF2: %[[IDXA_NEXT:.*]] = getelementptr inbounds double, double* %[[IDXA]], i64 %[[VSCALE2_EXT]]
	; CHECKUF2: %[[IDXA_NEXT_CAST:.*]] = bitcast double* %[[IDXA_NEXT]] to <vscale x 4 x double>*
	; CHECKUF2: store <vscale x 4 x double> %[[FADD_NEXT]], <vscale x 4 x double>* %[[IDXA_NEXT_CAST]], align 8, !alias.scope !3, !noalias !0
	; CHECKUF2: %[[VSCALE:.*]] = call i64 @llvm.vscale.i64()
	; CHECKUF2: %[[VSCALEX8:.*]] = shl i64 %[[VSCALE]], 3
	; CHECKUF2: %index.next = add i64 %index, %[[VSCALEX8]]
	; CHECKUF2: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
	; CHECKUF2: br i1 %[[CMP]], label %middle.block, label %vector.body, !llvm.loop !5

	define void @loop(i32 %N, double* nocapture %a, double* nocapture readonly %b) {
	entry:
	%cmp7 = icmp sgt i32 %N, 0
	br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

	for.body.preheader: ; preds = %entry
	%wide.trip.count = zext i32 %N to i64
	br label %for.body

	for.cond.cleanup: ; preds = %for.body, %entry
	ret void

	for.body: ; preds = %for.body.preheader, %for.body
	%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds double, double* %b, i64 %indvars.iv
	%0 = load double, double* %arrayidx, align 8
	%add = fadd double %0, 1.000000e+00
	%arrayidx2 = getelementptr inbounds double, double* %a, i64 %indvars.iv
	store double %add, double* %arrayidx2, align 8
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count
	br i1 %exitcond.not, label %for.cond.cleanup, label %for.body, !llvm.loop !1
	}

	!1 = distinct !{!1, !2, !3}
	!2 = !{!"llvm.loop.vectorize.width", i32 4}
	!3 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

llvm/test/Transforms/LoopVectorize/scalable-vf-hint.ll

This file was added.

				; REQUIRES: asserts
				; RUN: opt -loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; CHECK: LV: Ignoring VF=vscale x 4 because target does not support scalable vectors.
				; CHECK: remark: <unknown>:0:0: Ignoring VF=vscale x 4 because target does not support scalable vectors.
				; CHECK: LV: The Widest register safe to use is: 32 bits.
				define void @test1(i32* %a, i32* %b) {
				entry:
				br label %loop

				loop:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %b, i64 %iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%2 = add nuw nsw i64 %iv, 4
				%arrayidx5 = getelementptr inbounds i32, i32* %a, i64 %2
				sdesmalen (Done): At least until we add lowering of scalable vectors for targets that don't support it, I would rather the code explicitly checks whether the target can handle them and ignores the hint if it can't (a similar suggestion was made on D88962).
				store i32 %add, i32* %arrayidx5, align 4
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, 1024
				br i1 %exitcond.not, label %exit, label %loop, !llvm.loop !0

				exit:
				ret void
				}

				!0 = !{!0, !1, !2}
				!1 = !{!"llvm.loop.vectorize.width", i32 4}
				!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}


Diff 312819
