This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add support for “regular” keyword attributes
ClosedPublic

Authored by rsandifo-arm on Apr 19 2023, 3:21 AM.

Details

Summary

Platform-specific language extensions often want to provide a way of
indicating that certain functions should be called in a different way,
compiled in a different way, or otherwise treated differently from a
“normal” function. Honoring these indications is often required for
correctness, rather being than an optimization/QoI thing.

If a function declaration has a property P that matters for correctness,
it will be ODR-incompatible with a function that does not have property P.
If a function type has a property P that affects the calling convention,
it will not be two-way compatible with a function type that does not
have property P. These properties therefore affect language semantics.
That in turn means that they cannot be treated as standard [[]]
attributes.

Until now, many of these properties have been specified using GNU-style
attributes instead. GNU attributes have traditionally been more lax
than standard attributes, with many of them having semantic meaning.
Examples include calling conventions and the vector_size attribute.

However, there is a big drawback to using GNU attributes for semantic
information: compilers that don't understand the attributes will
(by default) emit a warning rather than an error. They will go on to
compile the code as though the attributes weren't present, which will
inevitably lead to wrong code in most cases. For users who live
dangerously and disable the warning, this wrong code could even be
generated silently.

A more robust approach would be to specify the properties using
keywords, which older compilers would then reject. Some vendor-specific
extensions have already taken this approach. But traditionally, each
such keyword has been treated as a language extension in its own right.
This has three major drawbacks:

(1) The parsing rules need to be kept up-to-date as the language evolves.

(2) There are often corner cases that similar extensions handle differently.

(3) Each extension requires more custom code than a standard attribute.

The underlying problem for all three is that, unlike for true attributes,
there is no established template that extensions can reuse. The purpose
of this patch series is to try to provide such a template.

One option would have been to pick an existing keyword and do whatever
that keyword does. The problem with that is that most keywords only
apply to specific kinds of types, kinds of decls, etc., and so the
parsing rules are (for good reason) not generally applicable to all
types and decls.

Really, the “only” thing wrong with using standard attributes is that
standard attributes cannot affect semantics. In all other respects
they provide exactly what we need: a well-defined grammar that evolves
with the language, clear rules about what an attribute appertains to,
and so on.

This series therefore adds keyword “attributes” that can appear
exactly where a standard attribute can appear and that appertain
to exactly what a standard attribute would appertain to. The link is
mechanical and no opt-outs or variations are allowed. This should
make the keywords predictable for programmers who are already
familiar with standard attributes.

This does mean that these keywords will be accepted for parsing purposes
in many more places than necessary. Inappropriate uses will then be
diagnosed during semantic analysis. However, the compiler would need
to reject the keywords in those positions whatever happens, and treating
them as ostensible attributes shouldn't be any worse than the alternative.
In some cases it might even be better. For example, SME's
__arm_streaming attribute would make conceptual sense as a statement
attribute, so someone who takes a “try-it-and-see” approach might write:

__arm_streaming { …block-of-code…; }

In fact, we did consider supporting this originally. The reason for
rejecting it was that it was too difficult to implement, rather than
because it didn't make conceptual sense.

One slight disadvantage of the keyword-based approach is that it isn't
possible to use #pragma clang attribute with the keywords. Perhaps we
could add support for that in future, if it turns out to be useful.

For want of a better term, I've called the new attributes "regular"
keyword attributes (in the sense that their parsing is regular wrt
standard attributes), as opposed to "custom" keyword attributes that
have their own parsing rules.

This patch adds the Attr.td support for regular keyword attributes.
Adding an attribute with a RegularKeyword spelling causes tablegen
to define the associated tokens and to record that attributes created
with that syntax are regular keyword attributes rather than custom
keyword attributes.

A follow-on patch contains the main Parse and Sema support,
which is enabled automatically by the Attr.td definition.

Other notes:

  • The series does not allow regular keyword attributes to take

arguments, but this could be added in future.

  • I wondered about trying to use tablegen for

TypePrinter::printAttributedAfter too, but decided against it.
RegularKeyword is really a spelling-level classification rather
than an attribute-level classification, and in general, an attribute
could have both GNU and RegularKeyword spellings. In contrast,
printAttributedAfter is only given the attribute kind and the type
that results from applying the attribute. AFAIK, it doesn't have
access to the original attribute spelling. This means that some
attribute-specific or type-specific knowledge might be needed
to print the attribute in the best way.

  • Generating the tokens automatically from Attr.td means that

pseudo's libgrammar does now depend on tablegen.

  • The patch uses the SME __arm_streaming attribute as an example

for testing purposes. The attribute does not do anything at this
stage. Later SME-specific patches will add proper semantics for it,
and add other SME-related keyword attributes.

Diff Detail

Event Timeline

rsandifo-arm created this revision.Apr 19 2023, 3:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2023, 3:21 AM
rsandifo-arm requested review of this revision.Apr 19 2023, 3:21 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 19 2023, 3:21 AM
erichkeane added inline comments.Apr 19 2023, 6:42 AM
clang/include/clang/Basic/TokenKinds.h
106

Does AttrTokenKinds.inc not do this undef for you? Most of our tablegen'ed/.td files tend to...

Edit now that I've gone through the whole patch:

I see the inclusion into TokenKinds.def makes adding one a pain, but I suggest instead making these be KEYWORD_ATTRIBUTE(...), with all the 'fixings' that comes in TokenKinds.def (see the defines at the top), and then including TokenKinds.def here instead.

Make AttrTokenKinds.inc use a new KEYWORD_ATTRIBUTE macro, rather than
using KEYWORD directly. Move the #undef to the .inc file.

rsandifo-arm marked an inline comment as done.Apr 19 2023, 9:40 AM
rsandifo-arm added inline comments.
clang/include/clang/Basic/TokenKinds.h
106

Ah, yeah, that's neater, thanks. It also means that the KEYALL can be moved to TokenKinds.def until such time (hopefully never) as it becomes necessary to vary it based on attribute.

Done in the updated version.

Generally ok with all of this, but want some time to think about it/give aaron/etc time to see it.

clang/include/clang/Basic/TokenKinds.def
751

Probably should be: #ifndef KEYWORD_ATTRIBUTE here.

rsandifo-arm marked an inline comment as done.

Add #ifndef guard.

rsandifo-arm marked an inline comment as done.Apr 19 2023, 11:48 AM
rsandifo-arm added inline comments.
clang/include/clang/Basic/TokenKinds.def
751

Oops, yes, sorry.

rsandifo-arm marked an inline comment as done.May 8 2023, 11:57 PM

Generally ok with all of this, but want some time to think about it/give aaron/etc time to see it.

Hi @erichkeane & @aaron.ballman Thanks for the reviews so far. Do you have any more thoughts on this series?

aaron.ballman added inline comments.May 9 2023, 6:02 AM
clang/include/clang/Basic/Attr.td
2427–2430

I'd feel more comfortable switching an existing attribute over to use this new functionality instead of introducing a new attribute at the same time. (Also, we ask that there be no new undocumented attributes unless there's a Very Good Reason to leave it undocumented.)

clang/include/clang/Lex/Token.h
122
clang/lib/AST/TypePrinter.cpp
1724–1727

This seems like something that tablegen should automatically handle more generally for these attributes.

Thanks for the review.

clang/include/clang/Basic/Attr.td
2427–2430

Could you suggest an attribute that would be suitable? The problem is that, as far as I know, no existing keyword has parse rules that exactly match the rules for standard attributes. (That, and the fact that different keywords have different rules, was one of the main motivations for the patch). So I don't think it would be possible to adopt the new scheme for existing attributes without breaking compatiblity (both ways: some things would become valid that weren't before, and some things that were valid before would become invalid).

E.g.:

__declspec

I don't think anyone was suggesting we try to convert this :-) but for completeness:

int __declspec(align(16)) a1; // OK, but standard attribute rules would say this appertains to the type
int a2 __declspec();          // Not valid, unlike for standard decl attributes
__forceinline
int __forceinline a1() { return 1; } // OK, but standard attribute rules would say this appertains to the type
int a2 __forceinline() { return 1; } // Not valid, but would be for standard decl attributes
Calling convention keywords

These are generally DeclOrType attributes, with attributes sliding from the decl to the type where necessary.

__stdcall int a1();   // OK, but appertains to the decl and relies on sliding behaviour
int a2 __stdcall();   // Not valid, but is another place to put standard decl attributes
int a3() __stdcall;   // Not valid, but is another place to put standard type attributes
int (__stdcall a4)(); // OK, but standard attributes aren't allowed in this position
extern int (*const __stdcall volatile a5) (); // OK, but standard attributes wouldn't be allowed here
Nullability keywords

Like the calling-convention keywords, the nullability keywords appertain to types but are allowed in some (but not all) positions that would normally appertain to decls:

using ptr = int *;
_Nullable ptr a1; // OK, but appertains to the decl and relies on sliding behaviour
ptr a2 _Nullable; // Not valid, but is another place to put standard decl attributes
extern int *const _Nullable volatile a3; // OK, but a standard attribute wouldn't be allowed here

The same distinction applies to Microsoft pointer attributes.

On the documentation side: I admit this is one of the awkward things about using a new keyword to prove the system. The series doesn't actually implement __arm_streaming (@sdesmalen has patches for that, and other SME stuff). So it didn't seem appropriate to document the attribute until it was actually implemented. But the implementation, when it comes, will/should have documentation.

clang/lib/AST/TypePrinter.cpp
1724–1727

I wondered about trying to use tablegen for TypePrinter::printAttributedAfter, but decided against it.
RegularKeyword is really a spelling-level classification rather than an attribute-level classification, and in general, an attribute could have both GNU and RegularKeyword spellings. In contrast, printAttributedAfter is only given the attribute kind and the type that results from applying the attribute. AFAIK, it doesn't have access to the original attribute spelling. This means that some attribute-specific or type-specific knowledge might be needed to print the attribute in the best way.

Hi @aaron.ballman and @erichkeane . Do you have any more thoughts on this? In principle, I'm happy to convert an existing attribute over to the new scheme, but in practice, I can't find one that seems to be suitable. If we're going to do that, I think I'll need guidance as to which attribute to convert.

Alternatively, I could try to classify existing keywords into groups based on their current parsing rules and tablegen-ify them based on that. E.g. I could add NullabilityKeyword for _Nonnull, _Nullable, _Nullable_result, _Null_unspecified, and for anything else that happens to use exactly those parsing rules. I could then replace direct checks for the nullability keyword tokens with checks for a NullabilityKeyword spelling.

But the tablegen side wasn't really the main focus of this. It's more the principle of having an optional, mechanical link between attribute keywords and the standard attribute parsing rules.

I don't have any good suggestions unfortunately. Perhaps Aaron does? I went through our list as well, and don't believe I see a good candidate. While I'm generally against an undocumented attribute, I'd be OK with either an immediate follow-up patch that documents it with an implementation, or, more acceptably: just a note at the top of the document that says something like, "this is not yet implemented, but will SOON do the following" at the top, which gets removed when implemented (it could perhaps also be a "This is a WIP" disclaimer as well).

Add documentation. Fix a comment cut-&-pasto that Aaron pointed out.

rsandifo-arm marked an inline comment as done.May 16 2023, 10:43 AM

Thanks @erichkeane . Adding the documentation with that kind of disclaimer sounds good to me. I've done that in the updated version, and also fixed the comment problem that Aaron pointed out.

Hi @erichkeane and @aaron.ballman. Does the updated patch look OK?

This basically LGTM, but we should add documentation to the internals manual too: https://clang.llvm.org/docs/InternalsManual.html#spellings

clang/lib/AST/TypePrinter.cpp
1724–1727

Yeah, we've run into this problem a number of times. Ultimately, we should retain what spelling the user used because that's salient information for the AST (for example, doing AST matching looking for an AlignedAttr attribute where the spelling used matters for the intended semantics. But that's not something you need to address.

clang/lib/Sema/SemaDeclAttr.cpp
5262

Add internals documentation. Add FIXME to temporary code.

rsandifo-arm marked an inline comment as done.May 23 2023, 12:34 PM

This basically LGTM, but we should add documentation to the internals manual too: https://clang.llvm.org/docs/InternalsManual.html#spellings

Oops, hadn't seen that documentation. Added in the updated patch, thanks!

aaron.ballman accepted this revision.May 24 2023, 11:39 AM

LGTM! Thank you for your patience while we worked through the design details. :-)

This revision is now accepted and ready to land.May 24 2023, 11:39 AM

Thanks @aaron.ballman and @erichkeane for your patience in reviewing the patch and steering me in the right direction.

What do you think about the other two patches in the series:

The first one is pretty mechanical. The second is going to be a slog to review, sorry. It's one of those things that's much easier to write than to check afterwards. I've tried to capture all the potentially non-obvious/non-mechanical parts in the covering note, but a long covering note might just make the review even more painful. I'm happy to try presenting it in a different form if you can think of one that would help.

Thanks again for accepting this patch, really appreciate it.

Thanks @aaron.ballman and @erichkeane for your patience in reviewing the patch and steering me in the right direction.

What do you think about the other two patches in the series:

The first one is pretty mechanical. The second is going to be a slog to review, sorry. It's one of those things that's much easier to write than to check afterwards. I've tried to capture all the potentially non-obvious/non-mechanical parts in the covering note, but a long covering note might just make the review even more painful. I'm happy to try presenting it in a different form if you can think of one that would help.

Thanks again for accepting this patch, really appreciate it.

Thank you for the reminder about the other patches in the stack, I had lost track of those. I was able to review the first one today, but I don't think I'll have the bandwidth to get the second one done today (I'm about to head out for a ~week's vacation) so that one may take a bit longer.

This revision was landed with ongoing or failed builds.May 31 2023, 2:45 AM
This revision was automatically updated to reflect the committed changes.
thakis added inline comments.
clang-tools-extra/pseudo/lib/grammar/CMakeLists.txt
5

Did whoever wrote this comment sign up its removal? Looks like it was @sammccall in https://github.com/llvm/llvm-project/commit/dc63ad8878de6d0b5dc1268691f48035e9234763

rsandifo-arm added inline comments.May 31 2023, 5:42 AM
clang-tools-extra/pseudo/lib/grammar/CMakeLists.txt
5

No, sorry, I should have checked.

The type referenced in the comment (tok::TokenKind) is the one that is now being partially generated by tablegen. Given that, there didn't seem any way of avoiding the dependency.

This makes `CXX11 and C2x` spellings
unsuitable for attributes that affect the type system, that change the
binary interface of the code, or that have other similar semantic meaning.

Yes, standard attributes aren't supposed to be used for things which affect the type system (although, we certainly have many, already, which do, since we expose most GCC-syntax attributes also as C++-standard attribute syntax!)

But I think the C++ standard [[no_unique_address]] attribute is ample precedent that standard-attributes may validly affect ABI. [[no_unique_address]] may indeed be ignored without changing the validity of a program -- but it does change the ABI. So, if you care for ABI compatibility between compilers, all the compilers you're using need to agree on whether they ignore it or not.

rsmith added a subscriber: rsmith.Jun 6 2023, 4:31 PM

Yes, standard attributes aren't supposed to be used for things which affect the type system (although, we certainly have many, already, which do, since we expose most GCC-syntax attributes also as C++-standard attribute syntax!)

The rule that standard attributes don't affect semantics only applies to attributes specified by the language standard. There is no expectation that vendor attributes avoid such effects. In particular, I'm concerned by this in the description of this change:

Really, the “only” thing wrong with using standard attributes is that standard attributes cannot affect semantics.

If the only reason for this patch series is an idea that vendor attributes using [[...]] syntax can't affect program semantics, then I think this change is not justified, because vendor attributes using [[...]] syntax can and usually do affect program semantics. But the documentation change here makes the point that using a keyword attribute may be as good idea in cases where you would always want compilation to fail on a compiler that doesn't understand the annotation, rather than the annotation being ignored (likely with a warning), so maybe that's reasonable justification for this direction.

Yes, standard attributes aren't supposed to be used for things which affect the type system (although, we certainly have many, already, which do, since we expose most GCC-syntax attributes also as C++-standard attribute syntax!)

The rule that standard attributes don't affect semantics only applies to attributes specified by the language standard. There is no expectation that vendor attributes avoid such effects.

Ah. Does that mean that, when the standard says (sorry for the direct latex quote):

For an \grammarterm{attribute-token} (including an \grammarterm{attribute-scoped-token}) not specified in this document, the behavior is \impldef{behavior of non-standard attributes}; any such \grammarterm{attribute-token} that is not recognized by the implementation is ignored.

the “is ignored” rule only applies to unscoped attributes and to attributes in one of the std:: namespaces? I.e. to things that could reasonably be standardised in future, rather than to vendor attributes?

In particular, I'm concerned by this in the description of this change:

Really, the “only” thing wrong with using standard attributes is that standard attributes cannot affect semantics.

If the only reason for this patch series is an idea that vendor attributes using [[...]] syntax can't affect program semantics, then I think this change is not justified, because vendor attributes using [[...]] syntax can and usually do affect program semantics. But the documentation change here makes the point that using a keyword attribute may be as good idea in cases where you would always want compilation to fail on a compiler that doesn't understand the annotation, rather than the annotation being ignored (likely with a warning), so maybe that's reasonable justification for this direction.

FWIW, the original plan had been to use standard attribute syntax with the arm vendor namespace. We started out with things like [[arm::streaming]], but then a colleague pointed out that standard attributes aren't supposed to affect semantics. So then we switched to GNU attributes, which have long included things that can't be ignored (such as vector_size). But there was pushback against that because even unrecognised GNU attributes only generate a warning. (I think GCC made a mistake by establishing that unrecognised GNU attributes are only a warning.)

If using standard attributes with a vendor namespace is acceptable for things that affect semantics and ABI, then that would actually be more convenient, for a few reasons:

  • It would provide proper scoping (via namespaces), rather than the clunky __arm_ prefixes. E.g. some of the ACLE intrinsics have __arm_streaming __arm_shared_za __arm_preserves_za, which is quite a mouthful :)
  • It would allow other attribute-related features to be used, such as #pragma clang attribute
  • It would allow attributes to take arguments, without any compatibility concerns. Keyword attributes could support arguments, but the decision about whether an attribute takes arguments would probably have to be made when the attribute is first added. Trying to retrofit arguments to a keyword attribute that didn't previously take arguments could lead to parsing ambiguities. In contrast, the standard attribute syntax would allow optional arguments to be added later without any backward compatibility concerns.

The main drawback would still be that unrecognised attributes only generate a warning. But if vendor attributes are allowed to affect semantics, we could “solve” the warning problem for current and future compilers by reporting an error for unrecognised arm:: attributes. That wouldn't help with older compilers, but TBH, I think older compilers would reject any practical use of SME attributes anyway. E.g. the attributes would almost always (perhaps always) be used in conjunction with the arm_sme.h header, which older compilers don't provide. We also provide a feature preprocessor macro that careful code can check.

So if using an arm namespace is acceptable, if the current line about what can affect semantics is likely to become more fuzzy, and if there's a risk that keyword attributes are a dead end that no-one else adopts, then TBH I'd still be in favour of using arm namespaces for SME.

Yes, standard attributes aren't supposed to be used for things which affect the type system (although, we certainly have many, already, which do, since we expose most GCC-syntax attributes also as C++-standard attribute syntax!)

The rule that standard attributes don't affect semantics only applies to attributes specified by the language standard. There is no expectation that vendor attributes avoid such effects. In particular, I'm concerned by this in the description of this change:

Really, the “only” thing wrong with using standard attributes is that standard attributes cannot affect semantics.

If the only reason for this patch series is an idea that vendor attributes using [[...]] syntax can't affect program semantics, then I think this change is not justified, because vendor attributes using [[...]] syntax can and usually do affect program semantics. But the documentation change here makes the point that using a keyword attribute may be as good idea in cases where you would always want compilation to fail on a compiler that doesn't understand the annotation, rather than the annotation being ignored (likely with a warning), so maybe that's reasonable justification for this direction.

That is the reason for this direction -- we have enough instances where people want to add a vendor attribute but it is not safe for other implementations to ignore it (due to ABI, correctness, etc). This feature allows adding the attribute as a keyword which can appear anywhere the attribute can appear. Due to it being expressed as a keyword in source, the other implementations will fail to accept the code, whereas use of a vendor-specified attribute would be (potentially silently) accepted with ill-effects.

Hi @jyknight , @rsmith

Do you have any more thoughts on the above? Quick version is:

  1. Is it OK to have [[…]] attributes in the arm namespace that affect semantics?
  2. Is it OK to raise an error for unrecognised attributes in the arm namespace (for a measure of future-proofing)?

Given the differing views, I'm unsure whether to revert the series and do (1) (and possibly (2)), or whether to leave things as they are.

Thanks!

Hi @jyknight , @rsmith

Do you have any more thoughts on the above? Quick version is:

  1. Is it OK to have [[…]] attributes in the arm namespace that affect semantics?

I'd say the consensus is that it is.

  1. Is it OK to raise an error for unrecognised attributes in the arm namespace (for a measure of future-proofing)?

We already have the -Wunknown-attributes warning enabled by default (as a warning). Is it vital for it to be a default-on error (for arm::*), instead of a default-on warning? ISTM that the default-on warning ought to suffice, but I'm happy to hear people's experience of this going badly in their experience. :)

Given the differing views, I'm unsure whether to revert the series and do (1) (and possibly (2)), or whether to leave things as they are.

I don't really feel strongly about the syntax chosen, but given that you've mentioned a fair number of upsides to using normal [[arm::...]] attributes, I'd say it may indeed be worthwhile to go back to that.

Hi @jyknight , @rsmith

Do you have any more thoughts on the above? Quick version is:

  1. Is it OK to have [[…]] attributes in the arm namespace that affect semantics?

Yes, any attribute in a vendor namespace can do anything it wants, including impacting semantics.

  1. Is it OK to raise an error for unrecognised attributes in the arm namespace (for a measure of future-proofing)?

Yes, part of "anything it wants" includes diagnostic behavior.

Given the differing views, I'm unsure whether to revert the series and do (1) (and possibly (2)), or whether to leave things as they are.

The reason why we suggested keywords isn't because of concerns with how *Clang* implements the attribute, it's about portability of the code between Clang and other compilers that don't implement the attribute. Compilers do not issue an error diagnostic on unrecognized attributes in a vendor namespace (per spec they're defined to be ignored: https://eel.is/c++draft/dcl.attr#grammar-6.sentence-1 so if you're lucky, you'll get a warning about the attribute being unknown, but there's no guarantee: https://godbolt.org/z/8qaqPzYYG). So when the attribute impacts semantics in such a way that silently ignoring the attribute would lead to Very Bad Outcomes (think: crashes, ABI mismatches, security concerns, etc), using a keyword for the attribute is a better user experience because unknown keywords are not ignored by implementations (they get diagnosed as a syntax error).