This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute
ClosedPublic

Authored by david-arm on Oct 7 2020, 6:13 AM.

Details

Summary

In this patch I have added support for a new loop hint called
vectorize.scalable.enable that says whether we should enable scalable
vectorization or not. If a user wants to instruct the compiler to
vectorize a loop with scalable vectors they can now do this as
follows:

br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !2
...
!2 = !{!2, !3, !4}
!3 = !{!"llvm.loop.vectorize.width", i32 8}
!4 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

Setting the hint to false simply reverts the behaviour back to the
default, using fixed width vectors.

Diff Detail

Event Timeline

david-arm created this revision.Oct 7 2020, 6:13 AM
david-arm requested review of this revision.Oct 7 2020, 6:13 AM
fhahn requested changes to this revision.Oct 7 2020, 7:23 AM
fhahn added a subscriber: fhahn.

Could you also update the documentation for llvm.loop.vectorize.width? https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-width-metadata

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
60

might be good to add comments indicating what the fields are used for.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
67

are you anticipating additional hints with ElementCount metadata? If not, it might be simpler to just deal with HK_WIDTH up front and validate Val[0] and Val[1] here and leave the code handling the other cases mostly unchanged? Or maybe use named variables instead of an array, to make things a bit clearer?

This revision now requires changes to proceed.Oct 7 2020, 7:23 AM
david-arm added inline comments.Oct 8 2020, 12:52 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
67

So the reason I structured it like this is because we are still maintaining backwards compatibility with the old style and allowing single integer constants instead of the node. I could deal with the HK_WIDTH up front, but I'd still need all this code. Would you be ok with using named variables instead?

david-arm updated this revision to Diff 296895.Oct 8 2020, 1:51 AM
david-arm marked 2 inline comments as done.Oct 8 2020, 1:52 AM
  • I've updated the documentation for the new attribute format.
  • Added comments to the union members.
  • Renamed the variables in validateAndSet.
sdesmalen added inline comments.Oct 16 2020, 1:29 AM
llvm/docs/LangRef.rst
5990

Rather than talking about two forms, is it sufficient to say:

The vector width is an ElementCount tuple, represented in Metadata as:

.. code-block:: llvm

   !0 = !{!"llvm.loop.vectorize.width", !1}
   !1 = !{i32 4, i32 1}

where ``i32 4`` specifies the vector width and ``i32 1`` indicates if the vectorization factor is scalable, meaning that the loop-vectorizer should use vector-length agnostic vectorization.

For fixed-width vectorizatoin-factors, a short-hand `i32` operand for llvm.loop.vectorize.width is also supported.

.. code-block:: llvm
    !0 = !{!"llvm.loop.vectorize.width", i32 4}
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
67

nit: For the cases below, is it worthing using something like:

auto MaySetIntValue = [this](int IntVal, bool Condition) { if (Condition) this->Value.U32 = IntVal; return Condition; };
auto MaySetECValue = [this](ElementCount EC, bool Condition) { if (Condition) this->Value.EC = EC; return Condition; };

switch (Kind) {
case HK_WIDTH:
  return MaySetECValue(ElementCount::get(IntVal, IsScalable), isPowerOf2_32(IntVal) && IntVal <= VectorizerParams::MaxVectorWidth);
case HK_UNROLL:
  return MaySetIntValue(IntVal, isPowerOf2_32(Val) && Val <= MaxInterleaveFactor);
[...]
}
david-arm updated this revision to Diff 298980.Oct 19 2020, 3:14 AM
  • Updated documentation for vectorize_width loop attribute.
  • Added lambda functions to be used when validating and setting loop hint attributes.
sdesmalen accepted this revision.Oct 26 2020, 2:51 PM

LGTM with nits addressed. @fhahn are you happy with the changes?

llvm/docs/LangRef.rst
5984

nit: s/i32 1/i1 true/

5988

nit:

s/vector width/minimum known vector width/
s/and `i32 1`/and the non-zero value `i1 true`/
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
61

nit: s/unsigned IsScalable = 0;/bool IsScalable = false;/

79

nit: s/Maybe/Conditionally/

85

nit: this is only used once, so better to inline.

llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

nit: s/i32 0/i1 false/

david-arm added inline comments.Oct 27 2020, 2:56 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
61

Given that it is possible to specify this as "i32" type, even if the docs says "i1", are you saying that it's fine to specify the vector width as "i32 4, i32 345", for example? In this example with your changes "i32 345" will be treated as scalable = true. This is the reason I had this value as i32, in order to capture odd cases.

fhahn added a comment.Oct 27 2020, 3:05 AM

LGTM with nits addressed. @fhahn are you happy with the changes?

LG in general, thanks for the updates.

llvm/docs/LangRef.rst
5988

nit: I'd start with saying that the first value of the tuple is the vector width and the second value indicates whether the vectorization factor is scalable and then follow with the example saying this means min vector width of 4 & scalable vectorization

david-arm added inline comments.Oct 27 2020, 7:43 AM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

If I make this change the test fails. We will still always print out "i32 0". Is there a way to control the format generated?

david-arm added inline comments.Oct 27 2020, 7:44 AM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

Sorry I just realised this is an input to the test, rather than testing the output! Please ignore my comment.

david-arm updated this revision to Diff 301657.Oct 29 2020, 9:37 AM
  • Updated documentation.
  • Addressed review comments.
david-arm marked 9 inline comments as done.Oct 29 2020, 9:39 AM

Hi @fhahn are you happy with the changes now and happy to accept them?

fhahn added a comment.EditedNov 5 2020, 3:38 AM
This comment has been deleted.
llvm/docs/LangRef.rst
5979

ElementCount is an internal name of the LLVM codebase I guess and nothing defines it in LangRef? I think it would be better to drop it, because here it is just a regular metadata tuple and you already outline what the fields hold.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
66

nit: using {0} seems inconsistent with the other braces used here?

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
66

might be good to a comment that we are looking for a (width, isScalable) tuple here.

232

C is only used in the condition, maybe just check mdconst::dyn_extract<ConstantInt>(Arg) in the if?

But now that we pass the metadata node to validateAndSet, do we actually need the check here or can we just do all the validation in the validateAndSet? Seems like the time-savings by existing a bit earlier would be very minor.

llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

also needs a test which actually sets isScalable to true?

david-arm marked an inline comment as done.Nov 5 2020, 4:38 AM
david-arm added inline comments.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
66

Good spot. Since I'm initialising Value explictly below I can simply remove "Value{0}, " entirely.

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
232

Sure, sounds sensible!

llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

Unfortunately I can't do that at the moment because the vectorizer crashes when trying to perform scalable vectorisation. This is why I only tested the 'false' case. In theory I could create a negative test that tests the vectoriser crashes, which we could remove later once we support vectorisation. Any thoughts @fhahn or @sdesmalen ?

david-arm updated this revision to Diff 303134.Nov 5 2020, 8:50 AM
david-arm marked an inline comment as done.
david-arm marked 4 inline comments as done.
sdesmalen added inline comments.Nov 5 2020, 8:59 AM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

The proof of concept for scalable loop vectorization (D90343) depends on this patch and shows the isScalable = true case does something sensible with this information. Hopefully that is sufficient for now while we work on actually implementing scalable auto-vec support in the LoopVectorizer.

[this time adding the comment to the right patch]

Hi @david-arm I just found that two uses of llvm.loop.vectorize.width are not yet updated.

  • WarnMissedTransforms.cpp in warnAboutLeftoverTransformations.
  • LoopUtils.cpp in llvm::hasVectorizeTransformation.

The cases seem quite trivial to fix up, can you include those changes in this patch?

david-arm updated this revision to Diff 303482.Nov 6 2020, 9:17 AM
  • Fixed up some remaining vectorize.width attribute cases.
fhahn added inline comments.Nov 9 2020, 12:53 PM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

Hm, I am a bit reluctant to add something to LangRef which causes passes to crash on valid IR. Would it be possible to either convert the scalable VF into a fixed one (replacing n with 1 should be valid?), or just bail out on a scalable VF, without too much effort?

sdesmalen added inline comments.Nov 9 2020, 1:40 PM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

Hm, I am a bit reluctant to add something to LangRef which causes passes to crash on valid IR. Would it be possible to either convert the scalable VF into a fixed one (replacing n with 1 should be valid?), or just bail out on a scalable VF, without too much effort?

That would lead to a bit of a chicken/egg problem, because we're adding this feature so we can incrementally make the loop vectorizer cope with scalable vectors. There are currently too many code-paths in there to fix up in one go. D90343 uses this metadata to enable vectorization of an individual loop with scalable vectors and gives us a route to enable and test individual loops in unit tests or in existing code (using the Clang attribute in D89031). The fact that the vectorizer currently crashes on most inputs, is a bug, it just happens to be a bug we know about. It is also a temporary broken state that we will want fix as soon as possible.

Is it not possible to add a line to the LangRef saying that this is experimental until the loop vectorizer supports scalable vectors? (and we remove that line when the vectorizer is stable enough).

llvm/test/Transforms/LoopVectorize/no_array_bounds2.ll
3 ↗(On Diff #303482)

Can you add a description of what this file is testing?

For the file name itself, is there a better name than no_array_bounds2.ll ?

fhahn added inline comments.Nov 10 2020, 3:20 AM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

That would lead to a bit of a chicken/egg problem, because we're adding this feature so we can incrementally make the loop vectorizer cope with scalable vectors. There are currently too many code-paths in there to fix up in one go. D90343 uses this metadata to enable vectorization of an individual loop with scalable vectors and gives us a route to enable and test individual loops in unit tests or in existing code (using the Clang attribute in D89031). The fact that the vectorizer currently crashes on most inputs, is a bug, it just happens to be a bug we know about. It is also a temporary broken state that we will want fix as soon as possible.

That makes sense, my suggestion was to just convert the scalable user VF to a fixed one after getting the hint instead of crashing. As in

   // Get user vectorization factor and interleave count.
   ElementCount UserVF = Hints.getWidth();
+  if (UserVF.isScalable()) {
+    // TODO: Use scalable user VF, once LV is ready. For now, just assume n == 1.
+    UserVF = ElementCount::getFixed(UserVF.getKnownMinValue());
+  }
   unsigned UserIC = Hints.getInterleave();

Only processLoopInVPlanNativePath and processLoop would need to be updated I think. as we are free to drop the hints, I think that would be preferable to crashing and should be legal?

sdesmalen added inline comments.Nov 10 2020, 4:12 AM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

Okay, so you mean assuming Fixed for now and then re-enable this in e.g. D91077 (which adds scalable vector support for some simple loop, but which doesn't yet guarantee that it won't crash on all inputs).
Yes, I think that makes sense.

fhahn added inline comments.Nov 10 2020, 4:17 AM
llvm/test/Transforms/LoopVectorize/metadata-width.ll
77

+1

david-arm updated this revision to Diff 304478.Nov 11 2020, 4:32 AM
  • In LoopVectorize.cpp I've forced the UserVF to fall back on fixed width vectorisation for now.
  • Added comments to the new test file and renamed it.
david-arm marked 6 inline comments as done.Nov 11 2020, 4:33 AM

Can you add a test-case to llvm/test/Transforms/LoopVectorize/metadata-width.ll where it also tries to compile a loop with scalable VF? (and ensure it falls back to the fixed-width case until we add support for it)

llvm/docs/LangRef.rst
5979–5980

nit: @c-rhodes and I noticed yesterday that the term minimum vector width can give confusion, because it is not the width of the vector that the metadata describes (in terms of bytes), but rather the vectorization factor, i.e. the (minimum) number of vector lanes used to vectorize a loop. Can you clarify that in the description?

david-arm updated this revision to Diff 306006.Nov 18 2020, 1:54 AM
david-arm retitled this revision from [SVE] Add support for scalable vectors in vectorize_width loop attribute to [SVE] Add support for scalable vectors with vectorize.scalable.enable loop attribute.
david-arm edited the summary of this revision. (Show Details)
  • Reverted changes to vectorize.width loop hint attribute.
  • Added new vectorize.scalable.enable loop hint attribute to control scalable vectorisation.

After discussions on the child patch (D89031) we've decided to do things differently so instead of adapting vectorize.width to accept a tuple we're now adding a new vectorize.scalable.enable hint that can be used with vectorize.width to create a ElementCount. For the clang patch I also intend to update it so that we use a new vectorize_style #pragma instead of using vectorize_width to force scalable vectorisation.

paulwalker-arm added a comment.EditedNov 18 2020, 2:54 AM

Hi @david-arm, can you document the reasons for the change?

Personally, I don't care much about the details of the metadata layout so if this switched layout represents value then great. However, I'm less happy with the proposed pragma change because there's been a lot of effort spent (with plenty more to come) converting existing fixed length only representations of vectors into new types that can represent both fixed and scalable vectors. To then force this information to be split into its constitute parts at the user level (i.e. pragmas and command line options) seems like a backward step.

Hi @paulwalker-arm, so the reason for this change is related to the child patch where we changed the vectorize_width #pragma to accept an additional optional argument "scalable" or "fixed". @SjoerdMeijer felt this was the wrong approach and that the scalable property should be specified only as an additional pragma, i.e.

#pragma clang loop vectorize_width(4) vectorize_style(scalable)

The reason for this is that we would have to add vectorize_style(scalable) anyway to support vectorisation where the width isn't specified, i.e.

#pragma clang loop some_other_pragma vectorize_style(scalable)

Also, if we support both a vectorize_style(scalable|fixed) pragma in addition to specifying the scalable property in vectorize_width it also means extra work managing potential conflicts. As a result of this change in approach it made sense to update the LLVM loop hint attribute too to reflect this.

Thanks for the info @david-arm. I just figured we'd support vector_width(2), vector_width(2, fixed), vector_width(2, scalable), vector_width(fixed), vector_width(scalable) so I still say splitting the width property across multiple pragmas is against our goal of moving away from fixed length only representations. That said, if this is the consensus then so be it.

Thanks for the info @david-arm. I just figured we'd support vector_width(2), vector_width(2, fixed), vector_width(2, scalable), vector_width(fixed), vector_width(scalable) so I still say splitting the width property across multiple pragmas is against our goal of moving away from fixed length only representations. That said, if this is the consensus then so be it.

I won't block D89031 and would be happy to be convinced otherwise, but I don't see any advantages of D89031, in fact I see only disadvantages.

The approach here looks good to me; have inlined a question.

llvm/docs/LangRef.rst
5959

I am wondering if we need to describe if and how this interacts with llvm.loop.vectorize.enable?

llvm/test/Transforms/LoopVectorize/no_array_bounds_scalable.ll
66

nit: we probably don't need all of this.

david-arm updated this revision to Diff 307029.Nov 23 2020, 3:37 AM
  • Updated the documentation to mention that the new attribute only has any effect if vectorisation is enabled for the loop.
  • Cleaned up one of the test files.
david-arm marked an inline comment as done.Nov 23 2020, 3:43 AM

Hi @fhahn @SjoerdMeijer @paulwalker-arm, are you happy with the patch now? The implementation on the LLVM IR side of things is independent of the clang pragma patch. For the clang patch I intend to write a proposal to the mailing list for changing the vectorize_width #pragma. @fhahn would you be happy with removing the "Request changes" cross?

llvm/docs/LangRef.rst
5959

I adapted this documentation from the vectorize.predicate.enable case above, which didn't discuss the interaction with vectorize.enable so I just thought I didn't need to here either. I've made it clear that this flag only has any effect if vectorisation is already enabled (although this could be through simply building with -O2).

fhahn added a comment.Nov 24 2020, 5:27 AM

The updated metadata looks fine and should be flexible enough to cover the scenarios outlined by @paulwalker-arm, whatever the clang support looks like. A few small remaining comments.

llvm/include/llvm/Transforms/Utils/LoopUtils.h
217

It this is exposed here, this should probably have a comment describing what this does?

llvm/lib/Transforms/Utils/LoopUtils.cpp
306

Here we get both the integer values of llvm.loop.vectorize.width and llvm.loop.vectorize.scalable.enable", right? Could this be simplified by using the existing getOptionalIntLoopAttribute?

llvm/test/Transforms/LoopVectorize/no_array_bounds_scalable.ll
66

looks like the TBAA metadata still can be removed?

david-arm updated this revision to Diff 307537.Nov 25 2020, 1:21 AM
  • Added documentation to getOptionalElementCountLoopAttribute and simplified the implementation in the function.
  • Removed some tbaa metadata from the tests.
david-arm marked 3 inline comments as done.Nov 25 2020, 1:22 AM
fhahn accepted this revision.Nov 30 2020, 1:01 AM

LGTM, thanks. Given that there has been quite some discussion, it would be good to wait with landing this for a few days, in case there are any more comments.

llvm/include/llvm/Transforms/Utils/LoopUtils.h
220

nit: drop llvm:: (there are some inconsistencies about that in the file, but it should not be required and it is also not used for other things in the llvm namespace, like Loop).

llvm/lib/Transforms/Utils/LoopUtils.cpp
475–476

nit: the extra () are not needed?

This revision is now accepted and ready to land.Nov 30 2020, 1:01 AM

I am also happy with this (and the new proposal in D89031 and the mail to cfe dev list).

What do we do when vectorize.scalable is not supported by the target? We want to issue a diagnostic/remark? Do we need to say something about this in the LangRef part?

Hi @SjoerdMeijer, what I have in the clang pragma patch at the moment is a warning emitted when the target doesn't support scalable vectors, and falling back on fixed width. However, this is something for discussion as I believe there are others who think that the fronted should simply accept the scalable hint regardless and let the vectoriser make that decision based upon cost analysis.

What do we do when vectorize.scalable is not supported by the target? We want to issue a diagnostic/remark? Do we need to say something about this in the LangRef part?

Yes, although I think that's worth addressing separately, I made a similar comment on D91718 where such functionality is needed. If that's addressed separately, are you happy @david-arm to land this patch @SjoerdMeijer?

(it is worth noting that this patch still ignores the 'scalable' VF in the vectorizer. This is only enabled again in D91077, which has a dependence on this patch)