# [LV] Legalize scalable VF hintsClosedPublic

Authored by c-rhodes on Nov 18 2020, 8:30 AM.

# Details

Reviewers
 sdesmalen efriedma fhahn dmgreen Meinersbur ctetreau craig.topper david-arm • zinob
Commits
rG1e7efd397ac2: [LV] Legalize scalable VF hints
Summary

In the following loop:

```void foo(int *a, int *b, int N) {
for (int i=0; i<N; ++i)
a[i + 4] = a[i] + b[i];
}```

The loop dependence constrains the VF to a maximum of (4, fixed), which
would mean using <4 x i32> as the vector type in vectorization.
Extending this to scalable vectorization, a VF of (4, scalable) implies
a vector type of <vscale x 4 x i32>. To determine if this is legal
vscale must be taken into account. For this example, unless
max(vscale)=1, it's unsafe to vectorize.

For SVE, the number of bits in an SVE register is architecturally
defined to be a multiple of 128 bits with a maximum of 2048 bits, thus
the maximum vscale is 16. In the loop above it is therefore unfeasible
to vectorize with SVE. However, in this loop:

```void foo(int *a, int *b, int N) {
#pragma clang loop vectorize_width(X, scalable)
for (int i=0; i<N; ++i)
a[i + 32] = a[i] + b[i];
}```

As long as max(vscale) multiplied by the number of lanes 'X' doesn't
exceed the dependence distance, it is safe to vectorize. For SVE a VF of
(2, scalable) is within this constraint, since a vector of <16 x 2 x 32>
will have no dependencies between lanes. For any number of lanes larger
than this it would be unsafe to vectorize.

This patch extends 'computeFeasibleMaxVF' to legalize scalable VFs
specified as loop hints, implementing the following behaviour:

• If the backend does not support scalable vectors, ignore the hint.
• If scalable vectorization is unfeasible given the loop dependence, like in the first example above for SVE, then use a fixed VF.
• Accept scalable VFs if it's safe to do so.
• Otherwise, clamp scalable VFs that exceed the maximum safe VF.

# Diff Detail

### Event Timeline

c-rhodes created this revision.Nov 18 2020, 8:30 AM
Herald added a project: Restricted Project. Nov 18 2020, 8:30 AM
c-rhodes requested review of this revision.Nov 18 2020, 8:30 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5597

nit: MaxSafeElements?

5599

Instead of returning an Optional<ElementCount>, I'd prefer the code here to just clamp to MaxFixedVF,
in this order:

• first try to see if it can use the requested scalable VF.
• if not, then try to see if it can use the requested VF with vscale = 1 (i.e. fixed width)
• if not, then try if it can use a clamped fixed VF.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594

This check seems a bit odd. I think we should at least use a named constant to make things clearer and ensure that LAA & LV are kept in sync on the meaning. But would it be possible to instead compute the maximum width of UserVF using MaxVScale (something like UserVF * WidestType * MaxVScale)?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594

This check seems a bit odd. I think we should at least use a named constant to make things clearer and ensure that LAA & LV are kept in sync on the meaning. But would it be possible to instead compute the maximum width of UserVF using MaxVScale (something like UserVF * WidestType * MaxVScale)?

I don't see how this is related, the reason I added this check is this block is validating a user VF and clamping it when it exceeds what is safe in terms of dependencies, but if there are no dependencies there's nothing to do here. I'm not sure if there's a better way of querying if there are no dependencies? MaxSafeRegisterWidth is initialized as -1U in LAA so I compared it against UINT_MAX, maybe it would be a little clearer if MaxSafeRegisterWidth was initialized as UINT_MAX. It would probably have made sense to introduced this in D90687 although I didn't consider it, it became obvious with this patch since large max safe VFs for loops with no dependencies were being emitted.

5597

nit: MaxSafeElements?

Cheers that makes more sense, will update it

5599

Instead of returning an Optional<ElementCount>, I'd prefer the code here to just clamp to MaxFixedVF,
in this order:

• first try to see if it can use the requested scalable VF.
• if not, then try to see if it can use the requested VF with vscale = 1 (i.e. fixed width)
• if not, then try if it can use a clamped fixed VF.

Deferring to fixed-width so Optional isn't necessary is a good point, the first 2 cases should already be handled, I'll add the latter

c-rhodes updated this revision to Diff 308062.Nov 27 2020, 8:25 AM
c-rhodes edited the summary of this revision. (Show Details)

Changes:

• s/FixedVF/MaxSafeElements/
• Rather than bail out, clamp to fixed VF is scalable is unfeasible. Remove change to return Optional<ElementCount> from LoopVectorizationCostModel::computeFeasibleMaxVF.
• In LoopVectorizationPlanner::plan, if a UserVF is specified but isn't legal then use MaxVF (earlier).
• Simplified the IR in the tests.
c-rhodes marked an inline comment as done.Nov 27 2020, 8:25 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5612

This does something different than what I suggested though.

For the following example, your code would set the MaxSafeVF to 32 elements instead of using a fixed VF of 4 elements (fixed):

```#pragma clang loop vectorize_width(4, scalable)
for (int i=0; i<N; ++i)
a[i + 32] = a[i] + 1;   // for some int* a```

I think a VF of 4 would be preferred, because otherwise selectiondag will need to expand <32 x i32> to 8 x <4 x i32> fixed-width vectors (assuming 128bit vectors), which would be somewhat similar to requesting an interleave-count of 8.

llvm/test/Transforms/LoopVectorize/scalable-vf-hint.ll
21

At least until we add lowering of scalable vectors for targets that don't support it, I would rather the code explicitly checks if the target can handle them and ignore the hint if it doesn't (a similar suggestion was made on D88962).

c-rhodes updated this revision to Diff 310996.Dec 10 2020, 1:01 PM
c-rhodes edited the summary of this revision. (Show Details)

Changes:

• Rebase since D88962 and D91077 have now landed.
• Address @sdesmalen's comments. Added a target hook supportsScalableVectors in D93060 which is used in this patch to bail out of vectorization if unsupported. Also changed the behavior when scalable vectorization is unfeasible given a dependency, it now changes the hint to fixed, where existing behavior applies (i.e. clamping).
• Updated tests.
c-rhodes marked 2 inline comments as done.Dec 10 2020, 1:02 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5573

MaxSafeVectorWidthInBits can't exceed UINT_MAX. If this this early bail out isn't strictly necessary, I'd suggest removing it.

5596–5602

Instead of returning 'None' (and thus not allowing any vectorziation), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal?

i.e.

```bool IgnoreUserVF = UserVF.isScalable() && TTI.supportsScalableVectors();
if (IgnoreUserVF)
ORE->emit([&]() {
return OptimizationRemark(....) };

if (UserVF.isNonZero() && !IgnoreUserVF) {
...
}

// otherwise, let the LoopVectorizer choose a VF itself.```

I guess that can be split out into D93060 as well.

5621

Should this be an OptRemark instead?

5626

Can it know that ElementCount::getFixed(UserVF.getKnownMinValue()) would be a valid VF? (that check is below on line 5614).
Probably good to have a test for that as well.

7461

We still want to clamp to a smaller scalable VF if UserVF.isScalable() && MaxVF.isScalable() && UserVF > MaxVF. Because the loop below doesn't support scalable VFs, it is best to single that case out.

That means:

1. UserVF = Scalable && MaxVF = Scalable && UserVF > MaxVF => Pick MaxVF
2. UserVF = Scalable && MaxVF = Fixed => Fall through into loop below.
3. UserVF = Fixed && MaxVF = Fixed && UserVF > MaxVF => Fall through into loop below.

Where the clamping in 1. is temporary until the loop below works on scalable vectors, so that it can choose the most profitable VF based on the cost-model.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
2

Why does it require asserts?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594

My point was that the original code handled the case where there are no dependencies naturally without an extra check, right? Is there a reason the logic for scalable vectors cannot do the same?

5596–5602

Agreed, if the user requests scalable vectorization through a hint, I probably should just drop the hint and proceed with normal cost-modeling.

(if there's a reason to keep the bail-out, can we just return a VF of 1? this is how other places in the function indicate that vectorization should be avoided)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5626

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5573

MaxSafeVectorWidthInBits can't exceed UINT_MAX. If this this early bail out isn't strictly necessary, I'd suggest removing it.

There's a couple of existing tests:

• Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

that will fall over if the hint is ignored when scalable vectorization isn't supported. I suppose those could become target-specific tests.

5596–5602

Instead of returning 'None' (and thus not allowing any vectorziation), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal?

...

I guess that can be split out into D93060 as well.

Ok np, I'll split that out then.

5621

Should this be an OptRemark instead?

7461

Ok, I'll add a temporary workaround for scalable VFs to select MaxVF above if UserVF is unsafe.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-vf-hint.ll
2

Why does it require asserts?

Because of the debug flags

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5594

My point was that the original code handled the case where there are no dependencies naturally without an extra check, right? Is there a reason the logic for scalable vectors cannot do the same?

Sorry I missed your comment. One issue is for non-target specific tests with scalable VFs such as:

• Transforms/LoopVectorize/scalable-loop-unpredicated-body-scalar-tail.ll

the hint will be ignored since scalable vectorization isn't supported, and also getMaxVScale now returns None. I suppose there's a couple of options, we could add new loop vectorizer flags or those tests could become become target-specific, I'm not sure what the best option is.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5573

If the test has a loop-carried distance that may invalidate auto-vectorization that needs checking, I think it's fair enough for the test to be moved to be a target-specific test (because that will require getMaxVScale).

For a test like metadata-width.ll it may be useful to add some -force-target-supports-scalable-vectors=true flag to avoid making all scalable-vector tests for the LoopVectorizer target-specific.

c-rhodes updated this revision to Diff 311576.Dec 14 2020, 7:26 AM
• Rather than disable vectorization if scalable VF isn't supported, let the LV pick a VF as it normally does and process with cost-modelling.
• Added a new LV flag -force-target-supports-scalable-vectors to support writing non-target specific tests with scalable VFs.
• Added a temporary workaround to use MaxVF in place of UserVF until cost modelling loop works for scalable VFs.
• Emit opt remark.
c-rhodes marked 3 inline comments as done.Dec 14 2020, 7:38 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5596–5602

Instead of returning 'None' (and thus not allowing any vectorziation), isn't it better to ignore the UserVF and continuing by allowing a VF to be chosen as normal?

...

I guess that can be split out into D93060 as well.

Ok np, I'll split that out then.

Done, although I've kept it as part of this patch. Ignoring the hint if scalable vectors aren't supported breaks existing tests so I've implemented the flag you suggested -force-target-supports-scalable-vectors which can be used for non-target specific tests. Although this only works for loops with no dependencies since getMaxVScale is None by default, so I've moved scalable-loop-unpredicated-body-scalar-tail.ll to AArch64 and enabled SVE.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7442

Is this additional check necessary here? It looks like MaxVF came from computeMaXVF, which makes use of your clamping code above. If you revert this change do your tests still pass?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7442

Ah I think I misunderstood what the clamping above is doing. It only clamps MaxVF, not UserVF. I wonder if it's worth emitting a remark for this case to let the user know something happened? It might look a bit odd if we downgrade their VF without telling them. I guess it depends upon how quick we get a fix for this.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7442

Is this additional check necessary here?

Until the loop below supports scalable VFs yes, with this check we always fall into this block for a scalable VF, this handles the case where the UserVF isn't legal but a scalable VF is feasible so it clamps to the scalable MaxVF. Otherwise the assert below would be triggered, this is changed from the previous revision to implement @sdesmalen suggestion of: UserVF = Scalable && MaxVF = Scalable && UserVF > MaxVF => Pick MaxVF, rather than change MaxVF from scalable -> fixed.

c-rhodes updated this revision to Diff 312451.Dec 17 2020, 5:02 AM

Handle scalable VF in a single isScalable block in computeFeasibleMaxVF.

david-arm accepted this revision.Dec 18 2020, 6:46 AM

LGTM. I agree the FIXME in the code isn't ideal, but I understand why it's there and is only a temporary measure until we can fix the loop to support scalable vectors.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll
2

nit: Perhaps it's better to run llvm/utils/update_llc_test_checks.py on these tests rather than hand-write the CHECK lines? This seems to be the accepted way of doing it in LLVM these days. :)

This revision is now accepted and ready to land.Dec 18 2020, 6:46 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
278

It would be good to be clear in the message that this is just for testing (same as the previous option). Perhaps something like Pretend that scalable vectors are supported, even if the target does not support them. The flag should only be used for testing.

5572–5591

This is just about ignoring scalable UserVFs, right? Might be good to make that clear as part of the variable name.

5573

Yes that sounds reasonable to me.

5594

Ah right, if getMaxVScale can return None, then my suggestion doesn't work as expected. In that case, I think it would be good to encapsulate the check in LAA and have something like areAllVectorWidthsSafe or something. This should be more robust to future changes in LAA.

c-rhodes updated this revision to Diff 312819.Dec 18 2020, 9:06 AM

• Clarify help message in -force-target-supports-scalable-vectors flag.
• s/IgnoreUserVF/IgnoreScalableUserVF/g
• Added isSafeForAnyVectorWidth to LAA that checks if the number of elements that can be operated on simultaneously is not bound. This is used in computeFeasibleMaxVF to skip the checks for whether the VF is safe given the dependence distance.
c-rhodes marked 3 inline comments as done.Dec 18 2020, 9:07 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5600–5601

It should be fine for MaxVScale to be undefined for a target architecture. Based on the logic below where it tries to pick a suitable alternative VF, it should probably pick a different (non-scalable) VF, instead of crashing.

c-rhodes updated this revision to Diff 314648.Jan 5 2021, 10:02 AM
c-rhodes edited the summary of this revision. (Show Details)

Changes:

• Rebased. Target hook for max vscale has been removed from this patch since it landed in D93030.
• Address Sander's comment w.r.t. falling back to fixed vectorization if scalable vectors are supported, but max vscale is undefined.
c-rhodes marked an inline comment as done.Jan 5 2021, 10:06 AM
fhahn accepted this revision.Jan 6 2021, 2:43 AM

LGTM, thanks. Some small comments left inline.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5572–5591

This early exit could also be on the top of the function, before we compute the register widths & co and is not specific to scalable vectors, right? Might be good to move, although the 'scalable ignored' message might also needs to move.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll
101

-force-vector-width should also work, right? In that case, it would be good to use that at least for some tests.

Also, could this be kept target-independent with -force-target-supports-scalable-vectors=true?

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
2

is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5572–5591

This early exit could also be on the top of the function, before we compute the register widths & co and is not specific to scalable vectors, right? Might be good to move, although the 'scalable ignored' message might also needs to move.

That's right it's not specific to scalable vectors, as long as User.isNonZero() it applies. I think you're right, if this is moved to the top IgnoreScalableUserVF will need to be part of the check, so:

```// Nothing to do if there are no dependencies.
if (UserVF.isNonZero() && !IgnoreScalableUserVF &&
Legal->isSafeForAnyVectorWidth())
return UserVF```

Otherwise scalable VFs will be accepted for targets without scalable vector support, where it currently falls into the existing code to pick a VF. It's fine with me to move this to the top with that in mind.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-loop-unpredicated-body-scalar-tail.ll
101

-force-vector-width should also work, right? In that case, it would be good to use that at least for some tests.

Yeah good idea, not sure we test the combination of that flag and the metadata, I'll change that.

Also, could this be kept target-independent with -force-target-supports-scalable-vectors=true?

It can for this test since there's no dependencies so max vscale isn't required to determine if the scalable VF is valid. I'll implement your suggestion and keep this target-indepedent.

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
2

is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.

The final test here checks epilogue vectorization is disabled for scalable vectors, I can split that out but it'll still require this flag to target-independent. Or it could become an AArch64 test with SVE.

c-rhodes updated this revision to Diff 314903.Jan 6 2021, 8:09 AM

@fhahn thanks for reviewing, I think I've addressed all comments. I'll probably land this over the coming days unless there's any objections between then, cheers.

llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll
2

is that necessary? I think it would be better to just add a separate test specifically checking that epilogue vectorization is disabled with scalable vectors.

The final test here checks epilogue vectorization is disabled for scalable vectors, I can split that out but it'll still require this flag to target-independent. Or it could become an AArch64 test with SVE.

I've split this into a separate test: llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-scalable.ll.

sdesmalen accepted this revision.Jan 6 2021, 2:11 PM

LGTM, thanks for your work on this patch @c-rhodes!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5616

nit: is it useful to print MaxSafeElements or MaxSafeVectorWidthInBits here?