This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add support to vectorize_width loop pragma for scalable vectors
ClosedPublic

Authored by david-arm on Oct 8 2020, 3:00 AM.

Details

Summary

This patch adds support for two new variants of the vectorize_width
pragma:

  1. vectorize_width(X[, fixed|scalable]) where an optional second

parameter is passed to the vectorize_width pragma, which indicates if
the user wishes to use fixed width or scalable vectorization. For
example the user can now write something like:

#pragma clang loop vectorize_width(4, fixed)

or

#pragma clang loop vectorize_width(4, scalable)

In the absence of a second parameter it is assumed the user wants
fixed width vectorization, in order to maintain compatibility with
existing code.

  1. vectorize_width(fixed|scalable) where the width is left unspecified,

but the user hints what type of vectorization they prefer, either
fixed width or scalable.

I have implemented this by making use of the LLVM loop hint attribute:

llvm.loop.vectorize.scalable.enable

Tests were added to

clang/test/CodeGenCXX/pragma-loop.cpp

for both the 'fixed' and 'scalable' optional parameter.

See this thread for context: http://lists.llvm.org/pipermail/cfe-dev/2020-November/067262.html

Diff Detail

Event Timeline

david-arm created this revision.Oct 8 2020, 3:00 AM
Herald added a project: Restricted Project. · View Herald Transcript
fhahn added inline comments.Oct 15 2020, 1:56 PM
clang/lib/Parse/ParsePragma.cpp
1096

nit: unrelated change?

clang/lib/Sema/SemaStmtAttr.cpp
144

Is there a way to only accept fixed_width/scalable for targets that support it? Not sure if we have enough information here, but we might be able to reject it eg per target basis or something

david-arm marked an inline comment as done.Oct 19 2020, 5:16 AM
david-arm added inline comments.
clang/lib/Sema/SemaStmtAttr.cpp
144

Hi @fhahn, I think if possible we'd prefer not to reject scalable vectors at this point. Theoretically there is no reason why we can't perform scalable vectorisation for targets that don't have hardware support for scalable vectors. In this case it simply means that vscale is 1. If you want we could add some kind of opt-remark in the vectoriser that says something like "target does not support scalable vectors, vectorising for vscale=1"?

sdesmalen accepted this revision.Oct 30 2020, 8:42 AM

LGTM

clang/lib/Sema/SemaStmtAttr.cpp
144

I agree with @david-arm that we shouldn't prevent this in the front-end. Even if the architecture may not support scalable vectors natively, there may still be reasons to want to create scalable vectors in IR, for example to have more portable IR.

clang/test/CodeGenCXX/pragma-loop.cpp
166

out of curiosity, is there a particular reason you're testing it with a do-while loop instead of a shorter for-loop like the tests in for_template_constant_expression_test ?

This revision is now accepted and ready to land.Oct 30 2020, 8:42 AM
fhahn added inline comments.Oct 30 2020, 8:56 AM
clang/lib/Sema/SemaStmtAttr.cpp
144

Hm, I am just a bit worried that it might be a bit confusing to users that do not know what scalable vectors are (it is obvious when knowing all about SVE, but I would assume most people don't necessarily know what this means). I guess that is not the biggest deal, as long vectorize_width(X, scalable) works for every target.

Even if the architecture may not support scalable vectors natively, there may still be reasons to want to create scalable vectors in IR, for example to have more portable IR.

Sure, but there are so many other target-specific things encoded that make the IR really un-portable between targets. Granted, it is not impossible to convert IR between some architectures (as in arm64_32)

sdesmalen requested changes to this revision.Nov 3 2020, 1:18 AM
sdesmalen added inline comments.
clang/docs/LanguageExtensions.rst
3039

Can you add a comment saying that the use of "scalable" is still experimental and is currently only intended to work for targets that support scalable vectors?

clang/lib/Sema/SemaStmtAttr.cpp
144

Sorry, forgot to reply to this.

Hm, I am just a bit worried that it might be a bit confusing to users that do not know what scalable vectors are (it is obvious when knowing all about SVE, but I would assume most people don't necessarily know what this means). I guess that is not the biggest deal, as long vectorize_width(X, scalable) works for every target.

At the moment this feature is still experimental, so I don't think any target would be able to return true to the question if this is supported :) That said, I agree that the compiler shouldn't crash for other targets after support in the loop-vectorizer stops being experimental. So I'm changing my mind here, and am happy to go with your suggestion to ignore the flag for other targets. When some default mechanism is added to lower scalable vectors to fixed-width vectors (for targets that don't natively support them), this check can probably be removed. @david-arm can you add some target hook to ignore the hint?

Sure, but there are so many other target-specific things encoded that make the IR really un-portable between targets. Granted, it is not impossible to convert IR between some architectures (as in arm64_32)

I didn't mean portable between targets, but more as keeping the length of the vector agnostic in IR and leaving it until code-generation to pick a suitable/available vector extension, so that the same IR could be used to generate code for Neon or 256bit SVE for example. This is more a hypothetical use-case at the moment though.

This revision now requires changes to proceed.Nov 3 2020, 1:18 AM
david-arm updated this revision to Diff 302773.Nov 4 2020, 1:20 AM
david-arm marked an inline comment as done.
david-arm edited the summary of this revision. (Show Details)
sdesmalen added inline comments.Nov 4 2020, 7:44 AM
clang/include/clang/Basic/DiagnosticSemaKinds.td
939 ↗(On Diff #302773)

From what I can see, the vectorize_width flag is not ignored, only the scalable property is. That means this should be:

'scalable' not supported by the target so assuming 'fixed' instead.
clang/lib/Sema/SemaStmtAttr.cpp
147

If the target does not support scalable vectors, it currently assumes "fixed". If we want to stick with that approach, the diagnostic message should be changed (see my other comment). The alternative is dropping the hint entirely by returning nullptr and changing the diagnostic message to say the hint is ignored. I could live with both options. @fhahn do you have a preference here?

nit: to reduce nesting, can you hoist this out one level, e.g.

if (StateLoc && StateLoc->Ident & ...)
  State = LoopHintAttr::ScalableNumeric;
else
  State = LoopHintAttr::Numeric;

if (State == LoopHintAttr::ScalableNumeric &&
    !S.Context.getTargetInfo().supportsScalableVectors()) {
  S.Diag(....);
  State = LoopHintAttr::Numeric;
}

I'll hold off on any more changes for now to give @fhahn a chance to reply to your comment @sdesmalen about the fallback behaviour when scalable vectorisation is unsupported.

clang/include/clang/Basic/DiagnosticSemaKinds.td
939 ↗(On Diff #302773)

OK. I guess it's just when the warning comes out it appears at the start of the line so I wanted to emphasise that this relates to the scalable property passed to the vectorize_width attribute (rather than other attributes) as there could potentially be several pragmas on one line. I think it would be good to mention the vectorize_width pragma/attribute somewhere in the warning message to make it clear. I'll see if I can reword it.

This comment was removed by sdesmalen.
david-arm updated this revision to Diff 304488.Nov 11 2020, 5:16 AM
david-arm marked an inline comment as done.
c-rhodes added inline comments.Nov 11 2020, 5:57 AM
clang/docs/LanguageExtensions.rst
3031

nit: s/__value__/_value_/

I am very sorry that I am late to this... but I do have some concerns.

The concern that I have is that we extend vecorize_width with a scalable/fixed boolean, but there are more vectorisation pragma that set vectorisation options which imply enabling vectorisation:

Pragmas setting transformation options imply the transformation is enabled, as if it was enabled via the corresponding transformation pragma (e.g. vectorize(enable))

Thus, unless I miss something, I don't think this should be an option to vectorize_width, but to me it looks like we need a separate one, e.g.:

vectorize_scalable(enable|disable)

what do you think?

Hi @SjoerdMeijer I think that given we now support scalable vectors we thought it made sense to be able to specify whether the user wants 'fixed' or 'scalable' vectorisation along with the vector width, although without specifying the additional property the default continues to remain 'fixed'. However, what you said about having a vectorize_scalable pragma is correct and we are intending to also add a pragma like this in a future patch.

Hi @SjoerdMeijer I think that given we now support scalable vectors we thought it made sense to be able to specify whether the user wants 'fixed' or 'scalable' vectorisation along with the vector width, although without specifying the additional property the default continues to remain 'fixed'. However, what you said about having a vectorize_scalable pragma is correct and we are intending to also add a pragma like this in a future patch.

Okay, I haven't looked at the implementation to be honest, but am just trying to understand the different use cases of this first.
I just seem to be missing or not understanding why fixed/scalable is an option to only vectorize_width, why not to vectorize(enable) or just a separate one like vectorize_scalable? By making scalable/fixed and option to vectorize_width, you can't toggle this for other pragmas like interleave(enable) that enable vectorisation, which would be inconsistent? It also seems to be more work to me to do this first for vectorize_width, and then fix up other pragmas later. But I might be missing something (obivous) here.

Hi @SjoerdMeijer I think that given we now support scalable vectors we thought it made sense to be able to specify whether the user wants 'fixed' or 'scalable' vectorisation along with the vector width, although without specifying the additional property the default continues to remain 'fixed'. However, what you said about having a vectorize_scalable pragma is correct and we are intending to also add a pragma like this in a future patch.

Okay, I haven't looked at the implementation to be honest, but am just trying to understand the different use cases of this first.
I just seem to be missing or not understanding why fixed/scalable is an option to only vectorize_width, why not to vectorize(enable) or just a separate one like vectorize_scalable? By making scalable/fixed and option to vectorize_width, you can't toggle this for other pragmas like interleave(enable) that enable vectorisation, which would be inconsistent? It also seems to be more work to me to do this first for vectorize_width, and then fix up other pragmas later. But I might be missing something (obivous) here.

Hi @SjoerdMeijer, all valid and good questions. We think it makes sense to allow specifying explicitly what the meaning of '4' is when specifying the width. So that vectorize_width(4, fixed) means vectorizing with <4 x eltty> and vectorize_width(4, scalable) means vectorizing with <vscale x 4 x eltty>. Like @david-arm said, we also plan to add something like vectorize_style(fixed|scalable). This approach should be fully complementary to vectorize_with so that it would be possible to have:

// Use scalable vectors, but leave it to the cost-model to choose the most efficient N in <vscale x N x eltty>.
// If the pragma is not specified, it defaults to vectorize_style(fixed).
#pragma clang loop vectorize_style(scalable)

// Use <4 x eltty>
#pragma clang loop vectorize_width(4, fixed)

// Use <vscale x 4 x eltty>
#pragma clang loop vectorize_width(4, scalable)

// If vectorize_style(scalable) is specified, then use <vscale x 4 x eltty>, otherwise <4 x eltty>
#pragma clang loop vectorize_width(4)                           // uses <4 x eltty>
#pragma clang loop vectorize_width(4) vectorize_style(scalable) // uses <vscale x 4 x eltty>

// Conflicting options, clang should print diagnostic and error or ignore the hint.
#pragma clang loop vectorize_width(4, fixed) vectorize_style(scalable)

I hope that gives a bit more context.

This approach should be fully complementary to vectorize_with so that it would be possible to have:

// Use scalable vectors, but leave it to the cost-model to choose the most efficient N in <vscale x N x eltty>.
// If the pragma is not specified, it defaults to vectorize_style(fixed).
#pragma clang loop vectorize_style(scalable)

// Use <4 x eltty>
#pragma clang loop vectorize_width(4, fixed)

// Use <vscale x 4 x eltty>
#pragma clang loop vectorize_width(4, scalable)

// If vectorize_style(scalable) is specified, then use <vscale x 4 x eltty>, otherwise <4 x eltty>
#pragma clang loop vectorize_width(4)                           // uses <4 x eltty>
#pragma clang loop vectorize_width(4) vectorize_style(scalable) // uses <vscale x 4 x eltty>

// Conflicting options, clang should print diagnostic and error or ignore the hint.
#pragma clang loop vectorize_width(4, fixed) vectorize_style(scalable)

I hope that gives a bit more context.

Ok, thanks for clarifying that!

If:

// Use <vscale x 4 x eltty>
#pragma clang loop vectorize_width(4, scalable)

is equivalent to:

// uses <vscale x 4 x eltty>
#pragma clang loop vectorize_width(4) vectorize_style(scalable)

then I think that illustrates that I don't see the point of extending vectorize_width because we still can't express scalable vectorisation for:

// <VF x eltty>
#pragma clang loop vectorize_predicate(enable)

and also for interleave_count(4)?

Again, when the idea is to have vectorize_style anyway, wouldn't it be easier not to bother extending vectorize_width and just go for vectorize_style? It allows you to specify fixed/scalable vectorisation in one way, and avoids having conflicting options.

The other thing I thought about: this is extending an existing user-facing pragma, and notifying the list would probably be best thing to do.

As I see it there are a bunch of pragmas that all enable vectorisation, with each pragma providing a unit of information. One component of this information is the vectorisation factor hint provided by vectorize_width.

With the introduction of scalable vectors this hint is using the wrong datatype and thus needs to be updated to allow vectorize_width(#num,[fixed|scalable]) and vectorize_width([fixed|scalable]) along side the existing vectorize_width(#num) representation that effectively becomes an alias to vectorize_width(#num, fixed).

Doing this means all existing usages work as expected and there's now extra power to better guide the chosen vectorisation factor.

SjoerdMeijer added a comment.EditedNov 18 2020, 8:14 AM

Because I was not understanding, we have discussed this further offline.

I think the conclusion was: pragma vectorize_width controls the vectorisation vector VF in <vscale x VF x elty>. where vscale is not just a separate thing but it defines a VectorType. That's why it would make sense to attach scalable|fixed to vectorize_width. I agree with this and seems reasonable.

I still don't see how the proposed extension here allows you to specify fixed width vectorisation for:

#pragma clang loop interleave_count(4)

targeting SVE and would appreciate if someone can comment on this example, but I won't be holding up this work anymore as this might be addressed later.

Thanks @david-arm for posting this proposal to the cfe list.
My confusion has been cleared up. The (new) proposal is to have:

  1. vectorize_width(X) where X is an integer.
  2. vectorize_width(X, fixed|scalable)
  3. vectorize_width(fixed|scalable)

And with that 3rd option I agree that this allows us to express everything we want, and this patch needs adapting to this new proposal (just stating the obvious for clarity/completeness)

If scalable is used on a target that doesn't support this, a warning and falling back to fixed seems like the right thing to do.

I did have concerns about this, similarly like @fhahn:

Hm, I am just a bit worried that it might be a bit confusing to users that do not know what scalable vectors are (it is obvious when knowing all about SVE, but I would assume most people don't necessarily know what this means). I guess that is not the biggest deal, as long vectorize_width(X, scalable) works for every target.

But since the new scalable option is opt-in, people don't need know about this if they don't want/need to, this should (hopefully) not be the case.

david-arm updated this revision to Diff 312802.Dec 18 2020, 8:11 AM
david-arm edited the summary of this revision. (Show Details)
aaron.ballman added inline comments.Dec 18 2020, 8:16 AM
clang/include/clang/Basic/Attr.td
3294–3295

Should the documentation in AttrDocs.td be updated for this change?

david-arm added inline comments.Dec 21 2020, 8:29 AM
clang/include/clang/Basic/Attr.td
3294–3295

Hi @aaron.ballman I had a look at LoopHintDocs in AttrDocs.td and it didn't explicitly mention these states, i.e. "assume_safety", "numeric", etc., so I'm not sure if it's necessary to add anything there?

aaron.ballman added inline comments.Dec 21 2020, 10:54 AM
clang/include/clang/Basic/Attr.td
3294–3295

Oh, I see now, we're deferring to the documentation in the language extensions document. I suppose that's fine as far as this patch goes, sorry for the noise.

Hi everyone, I realise that most people have probably been on holiday recently, but just a gentle ping here to see if anyone could take another look? Thanks!

SjoerdMeijer accepted this revision.Jan 6 2021, 9:39 AM

LGTM, perhaps wait a day with committing in case there are more comments.

clang/include/clang/Basic/Attr.td
3294–3295

Nit: formatting, exceeding 80 columns?

3295

same?

sdesmalen added inline comments.Jan 6 2021, 2:02 PM
clang/docs/LanguageExtensions.rst
3033

nit: s/In this case//

3035

nit: Another use of vectorize_width is

3037–3039

nit: In both variants of the pragma the vectorizer may decide to fall back on fixed width vectorization if the target does not support scalable vectors.

clang/include/clang/Basic/DiagnosticParseKinds.td
1384

use vectorize_width(X, fixed) or vectorize_width(X, scalable)
(it may otherwise lead to confusion whether fixed/scalable needs quotes, same below)

clang/lib/AST/AttrImpl.cpp
46 ↗(On Diff #312802)

is there always a value, even when "vectorize_width(scalable)" is specified?

clang/lib/CodeGen/CGLoopInfo.cpp
761–763

is that not something to fix in the code that conditionally sets vectorize.enable later on instead of working around it here?

david-arm added inline comments.Jan 7 2021, 12:30 AM
clang/lib/CodeGen/CGLoopInfo.cpp
761–763

I did originally try to do that, but I had trouble with it and found it broke other places too. It ended up being simpler to fix it here, but I can play around with it again. Even if this is still the simplest solution I can come back with a more detailed explanation at least!

david-arm updated this revision to Diff 315121.Jan 7 2021, 6:24 AM
  • Updated documentation as per review comments.
  • Fixed an issue with using value->prettyPrint on a null ptr.
  • Reworked the code that sets vectorize.enable.
david-arm marked 8 inline comments as done.Jan 7 2021, 6:25 AM
sdesmalen accepted this revision.Jan 7 2021, 7:19 AM

LGTM, thanks for all the changes @david-arm!

clang/lib/AST/AttrImpl.cpp
46 ↗(On Diff #315121)

nit:

if (value) {
  value->printPretty(OS, nullptr, Policy);
  OS << ", ";
}
OS << (state == ScalableWidth ? "scalable" : "fixed";

?

clang/lib/CodeGen/CGLoopInfo.cpp
312

nit: != 1 (it should be functionally the same because the >1 is already caught above, but this is specifically testing that VF=1 (scalar) is not specified)

This revision is now accepted and ready to land.Jan 7 2021, 7:19 AM