Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[Clang][AArch64] Add/implement ACLE keywords for SME.

Authored by sdesmalen on Jun 14 2022, 9:32 AM.



This patch adds all the language-level function keywords defined in: (merged) (update after D148700 landed)

The keywords are used to control PSTATE.ZA and PSTATE.SM, which are
respectively used for enabling the use of the ZA matrix array and Streaming
mode. This information needs to be available on call sites, since the use
of ZA or streaming mode may have to be enabled or disabled around the
call-site (depending on the IR attributes set on the caller and the
callee). For calls to functions from a function pointer, there is no IR
declaration available, so the IR attributes must be added explicitly to the

With the exception of 'arm_locally_streaming' and 'arm_new_za' the
information is part of the function's interface, not just the function
definition, and thus needs to be propagated through the

This patch adds the defintions of these keywords, as well as codegen and
semantic analysis to ensure conversions between function pointers are valid
and that no conflicting keywords are set. For example, 'arm_streaming'
and '
arm_streaming_compatible' are mutually exclusive.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

By standard imp-limits, apparently we only need to handle 8 bits worth for the Exception Spec types. I'd be tempted, based on exception specs being deprecated and rarely used, to reduce it to an 'unsigned short' (2 bytes), which makes this cost-free on 32 bit machines. WDYT?

It's a bit orthogonal to this patch, but I think it's a reasonable change to make so long as we don't break significant code.

@sdesmalen @rsandifo-arm @aaron.ballman @erichkeane What is the plan for this patch? Is it going to be abandoned in favor of D139028, or can it be landed in its current form after addressing all outstanding review comments?

sdesmalen updated this revision to Diff 528366.Jun 5 2023, 3:32 AM
sdesmalen retitled this revision from [Clang][AArch64] Add ACLE attributes for SME. to [Clang][AArch64] Add/implement ACLE keywords for SME..
sdesmalen edited the summary of this revision. (Show Details)
  • Rebased patch after patch to add RegularKeyword attributes (D148702) landed
  • Split off the request to reduce the size of FunctionTypeExtraBitfields (and use alignment of the struct to pointer type) to a new patch: D152140
  • Added test to ensure that the serialization of type attributes is exported/imported to/from modules, see clang/test/Modules/aarch64-sme-keywords.cppm
  • Added some new tests for template instantiation and function overloading with SME attributes:
void normal_func(void) {}
void streaming_func(void) __arm_streaming {}

template<typename T> int test_templated_f(T);
template<> constexpr int test_templated_f<void(*)(void)>(void(*)(void)) { return 1; }
template<> constexpr int test_templated_f<void(*)(void)__arm_streaming>(void(*)(void)__arm_streaming) { return 2; }

static_assert(test_templated_f(&normal_func) == 1, "Instantiated to wrong function");
static_assert(test_templated_f(&streaming_func) == 2, "Instantiated to wrong function");


// expected-cpp-error@+2 {{'__arm_streaming' only applies to function types; type here is 'int'}}
// expected-error@+1 {{'__arm_streaming' only applies to function types; type here is 'int'}}
int invalid_type_for_attribute __arm_streaming;

// Test overloads
constexpr int overload(void f(void)) { return 1; }
constexpr int overload(void f(void) __arm_streaming) { return 2; }
static_assert(overload(&normal_func) == 1, "Overloaded to wrong function");
static_assert(overload(&streaming_func) == 2, "Overloaded to wrong function");
sdesmalen updated this revision to Diff 528376.Jun 5 2023, 4:39 AM
sdesmalen marked 15 inline comments as done.
  • Use only 6 bits for AArch64SMEAttributes to reduce the size of ExtProtoInfo.
  • Added test for implicit instantiation.
sdesmalen marked 4 inline comments as done.Jun 5 2023, 4:42 AM
sdesmalen added inline comments.

Hi @erichkeane I've added a test for module export/import and was surprised to see this work without any additional changes. Do you think the added test is correct/sufficient to show that this works?


Sorry I missed that comment earlier. This is indeed intentional, there is no trailing __ after __ARM_FEATURE_SME.
The macro is defined here:


I've added a few tests for these cases!

erichkeane added inline comments.

I don't have a good idea of that, I know little about our modules implementation. Perhaps @ChuanqiXu knows?

We ended up getting the regular-attribute-like-keywords patch in a different form. Do we still want to do this as is? Does it need rebasing?

sdesmalen marked an inline comment as done.Jun 5 2023, 6:23 AM

We ended up getting the regular-attribute-like-keywords patch in a different form. Do we still want to do this as is? Does it need rebasing?

I've rebased this patch on top of @rsandifo-arm's patches so that it uses the attribute-like-keywords now (note for example that the tests have been updated to use the new keywords).

D148700 and D148702 only addressed the parsing side of things, not the semantics. This patch adds the remaining attributes/keywords (D148700 only added __arm_streaming), integrates the type information into the AST and adds codegen to the LLVM IR function- and call-site attributes.

I think we need a few tests to ensure that these are properly propagated on function templates (both for Sema and codegen). Also, a handful of nits (See my list).


So, why are __arm_shared_za and __arm_preserves_za different tenses, despite being very similar sounding? Should it be __arm_shares_za (or alternatively, __arm_preserved_za)?


I'm missing a touch of context that an additional word or two might help. is this 'the function requires the 'target' processor has SME? or the 'runtime' processor?

Also, the rest of the list ends in a '.', but this one doesnt.


"ZA" should be defined 1st time it is used in this doc, and the ones below.


Same comments as above.


Same here.




This switch should ONLY be 'handle' function calls. Please break this off into its own function.

sdesmalen updated this revision to Diff 536260.Jun 30 2023, 8:17 AM
sdesmalen marked 11 inline comments as done.

Updated patch:

  • Addressed review comments (clarified documentation, minor refactoring)
  • Added new tests (codegen + sema tests to make sure attributes are propagated to types when using variadic templates). If there's any other specific tests you'd like me to add, let me know.

I waited a bit to update this patch, because I wasn't sure if the syntax might still change pending the conversation on D148700, but at the moment the status is that the keywords have a specific benefit over C++ style attributes in that they are not silently ignored. If the concensus on this changes we can easily support the other syntax through another patch. I hope this patch with semantic support can land if everyone is happy with the implementation. I think I've addressed all outstanding comments now.


Gentle ping @ChuanqiXu

Do you know if this patch requires any further changes for reading/writing modules? The test I've added seems to work fine.


I think this was mostly for historical reasons. @rsandifo-arm mentioned to me that he was still open to renaming these.

Gentle ping @aaron.ballman and @erichkeane. I've addressed all comments and suggestions and believe we found agreement on the approach (to use a keyword instead of an attribute).
Are you happy to accept the patch?

Mostly LGTM from a spec POV, but some comments below.


In current main, this attribute is gated on TargetAArch64 rather than TargetAArch64SME. The reason is that, from a QoI point of view, I don't think we should require the +sme extension in order to accept an __arm_streaming type. +sme is instead required to compile a function with __arm_streaming type, or a call to a function with __arm_streaming type. IMO those are more CodeGen rather than Sema restrictions.

E.g., if a header file contains

void sme_foo(void) __arm_streaming;

then an __attribute__((target_version("sme"))) function should be able to call it. But we shouldn't require +sme to be passed on the command line in order to compile the declaration of sme_foo.

I think the same principle applies to the other function-type attributes. In particular, __arm_preserves_za functions and __arm_streaming_compatible functions do not require SME to be present.

I think we should also allow:

__arm_locally_streaming void __attribute__((target_version("sme"))) ifunc() { … }

since __arm_locally_streaming does not change the function signature.

So I think all of these attributes should be gated on TargetAArch64, with checks for SME made elsewhere where necessary.

(Unfortunately, since command-line options are out of scope for ACLE, I don't think it can really specify this.)


The bullet was trying to describe the function's ABI. I don't think “target” is appropriate in an ABI context (where you might be describing a binary that already exists, rather than a compilation). So I've a slight preference for dropping “target”.

I see in another comment, Erich made a distinction between “target” and “run-time”. If we do use one or the other, I think “run-time” is more correct.

“target” is right for __arm_locally_streaming and __arm_new_za though.


How about “non-streaming” rather than “normal”?


The other type attributes also only describe normal returns.


Just trying to avoid “if available”, since the saves aren't being provided for the benefit of anyone else. Feel free to reword to something else.


This comment probably needs updating for the new field, although the comment already seems pretty out of date. (It looks like the separate bool was removed in 2011.)

I suppose the new field doesn't create any ambiguities because it's always present. But given the comment about performance sensitivity, I wonder if we should combine the new field with the trailing return flag?


Are the !InsideCCAttribute conditions necessary? AIUI, InsideCCAttribute exists to suppress the default calling convention when printing the type to which an explicit CC is added. That doesn't apply to SME attributes, because they aren't modelled as traditional CCs, and because the default/normal case doesn't have any syntax associated with it.


How about something like:

// Handle SME attributes that apply to function definitions,
// rather than to function prototypes.

I can see why the current comment helped when we were still using GNU attributes, due to the way that type attributes “slid” from the declaration to the type. But now that the attributes are non-sliding type attributes, it wouldn't make sense to query D->hasAttr for them.

The suggested change also avoids the contradiction between “we only need to” and “the same holds for”.


The meaning of “from” and “to” seem to be the opposite of what I'd initially assumed. For casts, I'd have expected the type of the object that is being casted to have FromType and the cast type (the type of the result) to be ToType. Dropping ZA would then mean that FromAttributes has __arm_preserves_za and ToType doesn't.

I think that also makes sense for the virtual function check above. The overriding function has FromType and the function it overrides has ToType. We need to be able to convert from FromType to ToType (e.g. to initialise the vtable), but don't need to be able to go the other way.


Is it not possible to address this FIXME yet? Might be worth expanding the comment to say why if so.


I think it'd be worth having a few more of these, in particular testing that an override can be __arm_shared_za __arm_preserves_za if it wants to be.

Gentle ping @aaron.ballman and @erichkeane. I've addressed all comments and suggestions and believe we found agreement on the approach (to use a keyword instead of an attribute).
Are you happy to accept the patch?

I think things are looking close, still some minor issues to correct though.


I think this should also be an unsigned bit-field so that it packs together with the new field added below. Otherwise we're liable to end up with padding between this field and the next bit-field (which will take a full allocation unit anyway). How about unsigned NumExceptionType : 10;? We're reducing the count by 6 bits, but.... I struggle to imagine code being impacted and we're still allowing more than the implementation limits require.


IIRC, at least one of the compilers used to build Clang has issues with adjacent bit-fields of differing types, so we'd usually make these bool fields be unsigned at this point to ensure they pack together properly.


Isn't the more idiomatic form of this written as AArch64SMEAttributes &= ~Kind;?


non-streaming sounds good to me.


Also, remove the single quotes around %0 and %1 -- you're getting duplicate quotes in the test cases because of them.

sdesmalen updated this revision to Diff 546479.Aug 2 2023, 7:56 AM
sdesmalen marked 19 inline comments as done.
  • The attributes no longer require +sme to be passed.
  • Fixed IsInvalidSMECallConversion to consider the correct To/From types and added a corresponding test for the virtual override functions.
  • Created a new category for the Arm SME attributes in the Clang Attributes document.
  • Addressed all other comments.

Thanks both for the detailed review and latest round of comments, I've tried to address them all.


That's what I did initially in the patch where I limited NumExceptionType to 16bits (D152140), but there was the preference to have this be a uint16_t instead. I've changed it back again to a bitfield though.


That's a fair point. This change was partially made in D152141, so I've updated both patches to reflect the right behaviour.


@rsandifo-arm pointed out that I marked this comment as done, but hadn't done it. I've fixed that now, and created a new category for the SME extensions.


Thanks! That explains the double quotes indeed.


You're right, they were not necessary! Thanks.


Thanks for pointing out, that was indeed wrong. I've fixed it, and added a test-case for the virtual function override for the __arm_preserves_za case.


This was a stale comment, these cases are checked later on in section:

// Check for incorrect conversions of function pointers with the attributes
aaron.ballman accepted this revision.Aug 2 2023, 9:17 AM

LGTM, though please give other active reviewers a bit of time to chime in before landing. Note, @erichkeane is out on sabbatical at the moment, so there's no need to wait for his LG (he can leave post-commit comments on the review when he gets back though, so keep your eyes peeled for that just in case).


I'm sorry for the back and forth! I think Erich's suggestion was wrong in the other thread, mostly because of MSVC:

This revision is now accepted and ready to land.Aug 2 2023, 9:17 AM

Thanks for dropping the attribute requirements down to TargetAArch64. But the offsetting requirement was that something somewhere (CodeGen?) must raise an error if code requires SME and SME isn't present. I think that means:

  1. SME is not enabled for an __arm_streaming or __arm_locally_streaming function definition.
  2. SME is not enabled for a call to an __arm_streaming function.
  3. SME is not enabled for an __arm_shared_za or __arm_new_za function definition.

(Calls to __arm_shared_za functions don't need to be listed explicitly, since only __arm_shared_za and __arm_new_za functions can make such calls.)

In all cases, the check would apply after __attribute__((target_version)) is taken into account.

Perhaps that's a separate patch though.


Thanks, this is indeed the case I was suggesting to test. But my point was that it shouldn't be an error. It's OK for an override to be __arm_preserves_za even if the function it overrides isn't. Something that calls shared_za_member directly from an S2_2 object can take advantage of the extra guarantee.

It's the other way that's wrong. If a virtual function is __arm_preserves_za then any override must be too.

The principle is similar to covariant return types. E.g.:

struct S1 { virtual S1 *f(); };
struct S2 : S1 { S2 *f() override; }; // OK
struct S3 : S2 { S2 *f() override; }; // OK
struct S4 : S3 { S1 *f() override; }; // Error

An __arm_preserves_za function acts like functions that return S2 * in this example, and a non-__arm_preserves_za function acts like functions that return S1 *.

sdesmalen updated this revision to Diff 547696.Aug 7 2023, 3:25 AM

Added enum AArch64SMECallConversionKind to make it explicit when calling
IsInvalidSMECallConversion whether or not 'preserves_za' can be dropped
(e.g. in assignment), can be added (e.g. when overriding a parent method)
or must match exactly (e.g. when redefining a function).

I'll post a separate patch for guarding the SME attributes when not
compiling for a target with sme.

rsandifo-arm accepted this revision.Aug 7 2023, 6:53 AM

LGTM too, thanks.

This revision was landed with ongoing or failed builds.Aug 8 2023, 12:31 AM
This revision was automatically updated to reflect the committed changes.
dewen added a subscriber: dewen.Tue, Sep 5, 8:47 PM