This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Add aarch64_sve_pcs attribute to Clang
ClosedPublic

Authored by MattDevereau on May 5 2022, 3:35 AM.

Details

Summary

[AArch64][SVE] Add aarch64_sve_pcs attribute to Clang

Enable the function attribute aarch64_sve_pcs at the C level, which corresponds to aarch64_sve_vector_pcs at the LLVM IR level.

This requirement was introduced by this addition to the Arm C Language Extensions (ACLE):
https://github.com/ARM-software/acle/pull/194
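
For illustration, a minimal usage sketch (not taken from the patch; the function name is hypothetical): the attribute is written on a C declaration, and Clang lowers it to the aarch64_sve_vector_pcs calling convention in LLVM IR.

// Hypothetical example: force the SVE vector PCS on a function.
__attribute__((aarch64_sve_pcs)) void process_block(double *dst, const double *src, int n);

// Schematic of the expected LLVM IR declaration (exact parameter types depend
// on the LLVM version):
//   declare aarch64_sve_vector_pcs void @process_block(...)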

Diff Detail

Event Timeline

MattDevereau created this revision.May 5 2022, 3:35 AM
Herald added a project: Restricted Project. · View Herald Transcript
MattDevereau requested review of this revision.May 5 2022, 3:35 AM
Herald added a project: Restricted Project. · View Herald Transcript · May 5 2022, 3:35 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
peterwaller-arm accepted this revision.May 5 2022, 5:45 AM

Looks good to me with minor nits.

clang/include/clang-c/Index.h
3448

It shouldn't matter in principle (... "but in practice" ...), but we should probably avoid renumbering existing entries in the enum and instead add new ones to the end of it.

Nit: this is missing a space before the equals sign.
Nit: SVE is an acronym and so is PCS, so their capitalization should be consistent. I see 'PCS' capitalized in AAPCS, for example, so all upper case probably makes the most sense.

This revision is now accepted and ready to land.May 5 2022, 5:45 AM
peterwaller-arm added inline comments.May 5 2022, 5:55 AM
clang/include/clang-c/Index.h
3448

I retract my sloppy "it shouldn't matter in principle [at the source level]"; of course it does matter, and it likely matters in this case (see the 'alias for compatibility' comment above).

To be more specific, renumbering the enum is an ABI break, and it also breaks anything that ever serializes these values, so it is not something you want to do.
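
To make the point concrete, a sketch of what "append, don't renumber" looks like; the neighbouring enumerators and values are illustrative rather than quoted from the diff. Existing enumerators keep their values because they are part of the libclang ABI, and the new one takes the next free value.

enum CXCallingConv {
  /* ... earlier entries keep their existing values ... */
  CXCallingConv_AArch64VectorCall = 16,
  CXCallingConv_SwiftAsync = 17,
  CXCallingConv_AArch64SVEPcs = 18, /* new entry: appended with the next free value */

  CXCallingConv_Invalid = 100,
  CXCallingConv_Unexposed = 200
};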

aaron.ballman added inline comments.May 5 2022, 6:18 AM
clang/include/clang-c/Index.h
3448

+1 -- I was just making that comment when you beat me to it.

clang/include/clang/Basic/Specifiers.h
283

I continue to be concerned with how many custom calling conventions we're adding. We're running out of bits to store them, and when we bump up against the next limit (32 possible calling conventions; this patch brings us up to 20), we're going to lose a significant amount of performance due to extra memory pressure (which, in turn, will greatly reduce our maximum template instantiation depth) with no clear path forward. We've looked into this before, and once we hit that 32 calling convention limit, we won't be adding any new calling conventions until we solve the memory pressure problem.

Are there alternatives to adding a new calling convention? (e.g., can this be handled with a backend pass somehow, or some other implementation strategy?)
Is this calling convention going to be used often enough to warrant adding it? (e.g., will this be showing up in system headers or other major third-party library headers, or is this a one-off for people trying to eke out the last ounce of performance from a function call?)
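
For context on the bit budget being described, an illustrative sketch (not Clang's actual ExtInfo layout) of why the ceiling is 32: the calling convention is packed into a 5-bit field alongside the other function-type bits, and a 5-bit field can hold at most 2^5 = 32 distinct values.

// Illustrative only; field names and widths other than the 5-bit calling
// convention are placeholders.
struct ExtInfoSketch {
  unsigned CallingConv : 5; // 2^5 = 32 possible conventions; ~20 in use after this patch
  unsigned NoReturn : 1;
  // ... remaining function-type bits ...
};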

aaron.ballman requested changes to this revision.May 5 2022, 6:19 AM

Making it clear there are necessary changes (at least to unbreak the C ABI).

This revision now requires changes to proceed.May 5 2022, 6:19 AM
tschuett added inline comments.May 5 2022, 6:22 AM
clang/include/clang-c/Index.h
3448

Could you please add your CC as 18, add a comment that the next CC will be 19, and not touch the other ones?

erichkeane added inline comments.
clang/include/clang/Basic/Specifiers.h
283

I agree with this; our list of calling conventions needs to find a way to get smaller, not larger. Our ExtInfo already uses a shockingly/frighteningly large number of bits, with calling conventions taking 5. IMO, we should reduce this to 4 bits if at all possible, particularly since we're likely to need more bits in the function type thanks to the C++ committee.

Internally, I've been fighting pretty hard to keep us from extending this list unless absolutely necessary, and I think we need to do so here.

Just wanted to say this is not a new calling convention as such, but rather an existing one that is generally auto-detected based on function signature. The problem we're trying to solve here is that we need a way to allow a user to force the calling convention when the function signature would not normally choose it.
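
A hedged illustration of that distinction, following the description above (function names hypothetical):

#include <arm_sve.h>

// The signature contains SVE vector types, so the SVE vector PCS is selected
// automatically; no attribute is needed.
svfloat64_t scale(svfloat64_t v, double s);

// The signature contains no SVE types, so the base AAPCS would normally be
// used; the attribute lets the user force the SVE vector PCS (for example so
// that more Z/P register state is preserved across the call).
__attribute__((aarch64_sve_pcs)) void do_work(double *data, int n);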

MattDevereau updated this revision to Diff 427335. Edited · May 5 2022, 8:35 AM

changed CXCallingConv_AArch64SVEPcs from 17 to 18 to resolve the ABI break
renamed CC_AArch64SVEPcs to CC_AArch64SVEPCS

Just wanted to say this is not a new calling convention as such, but rather an existing one that is generally auto-detected based on function signature. The problem we're trying to solve here is that we need a way to allow a user to force the calling convention when the function signature would not normally choose it.

Thanks for this information! It's still not clear to me whether there's sufficient need for this extension. From this description, it sounds like this will be rarely used because it's only necessary in one-off situations. If that's correct, can those users make use of inline assembly instead of a devoted named calling convention?

paulwalker-arm added a comment. Edited · May 5 2022, 10:07 AM

Just wanted to say this is not a new calling convention as such, but rather an existing one that is generally auto-detected based on function signature. The problem we're trying to solve here is that we need a way to allow a user to force the calling convention when the function signature would not normally choose it.

Thanks for this information! It's still not clear to me whether there's sufficient need for this extension. From this description, it sounds like this will be rarely used because it's only necessary in one-off situations. If that's correct, can those users make use of inline assembly instead of a devoted named calling convention?

It's hard to say how often this will be used but when used it will be fundamental to performance. I don't see inline assembly as a workable solution. It presents an unreasonable burden on a framework's user and will impact compiler optimisations. The ACLE exists to stop developers from needing to use inline assembly.

You have to forgive my naivety here but some of the calling conventions are target specific. Is it possible for them to safely alias for the parts of the code where memory is constrained? Or to put it another way, do the X86 and AArch64 calling conventions need to be uniquely identifiable?

Just wanted to say this is not a new calling convention as such, but rather an existing one that is generally auto-detected based on function signature. The problem we're trying to solve here is that we need a way to allow a user to force the calling convention when the function signature would not normally choose it.

Thanks for this information! It's still not clear to me whether there's sufficient need for this extension. From this description, it sounds like this will be rarely used because it's only necessary in one-off situations. If that's correct, can those users make use of inline assembly instead of a devoted named calling convention?

It's hard to say how often this will be used but when used it will be fundamental to performance. I don't see inline assembly as a workable solution. It presents an unreasonable burden on a framework's user and will impact compiler optimisations. The ACLE exists to stop developers from needing to use inline assembly.

You have to forgive my naivety here but some of the calling conventions are target specific. Is it possible for them to safely alias for the parts of the code where memory is constrained? Or to put it another way, do the X86 and AArch64 calling conventions need to be uniquely identifiable?

This would alleviate my concerns if we were to do something like that. It _IS_ quite unfair how many calling convention bits we burn on old 32-bit-MSVC and X86/x86-64 calling conventions that don't seem particularly well used/valuable.

I looked into doing that at one point about 2 years ago, but it ended up being not possible/beneficial thanks to our multi-target-at-the-same-time compilation model for offload. Basically: we need to identify TWO calling convention lists at the same time, and there are combos that result in us having more than 16 items, thus not saving any bits. If you can come up with a way of doing this that I was unable to come up with at the time, I'm all for it! At that point, it would be up to the individual targets to manage themselves in a way to stay sub-16.

If you're really concerned about the size of FunctionProtoType increasing, can we just shove the infrequently used calling convention bits into TrailingObjects?

If you're really concerned about the size of FunctionProtoType increasing, can we just shove the infrequently used calling convention bits into TrailingObjects?

I don't believe so. These are parts of the bitfield and are intrinsic to the type.

If you're really concerned about the size of FunctionProtoType increasing, can we just shove the infrequently used calling convention bits into TrailingObjects?

I don't believe so. These are parts of the bitfield and are intrinsic to the type.

I don't follow. Everything stored in FunctionProtoType, including information stored in TrailingObjects, is "intrinsic to the type". It's just stored differently. (FunctionTypeExtraBitfields already exists, even...)

tschuett added inline comments.May 5 2022, 10:36 AM
clang/include/clang-c/Index.h
3448

Do you want to add a static_assert to prevent people from doing stupid things?
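
One possible shape for such a check, placed in a C++ consumer of the header (the exact enumerator spelling and value follow the discussion above and may differ in the final patch):

// Pin the libclang enumerator value so an accidental renumbering fails to
// compile; CXCallingConv values are ABI and must only ever be appended.
static_assert(CXCallingConv_AArch64SVEPcs == 18,
              "CXCallingConv enumerators must not be renumbered");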

If you're really concerned about the size of FunctionProtoType increasing, can we just shove the infrequently used calling convention bits into TrailingObjects?

I don't believe so. These are parts of the bitfield and are intrinsic to the type.

I don't follow. Everything stored in FunctionProtoType, including information stored in TrailingObjects, is "intrinsic to the type". It's just stored differently. (FunctionTypeExtraBitfields already exists, even...)

Ah, I see what you mean. I misread and thought you meant on the FunctionDecl itself, so mea culpa.

I was unaware of FunctionTypeExtraBitfields! We perhaps should consider what of the ExtInfo we can move over to the FunctionTypeExtraBitfields. In that list, there are MAYBE 5 bits of the 13 that are used with any level of commonness (though I have no idea what CmseNSCall means). If most of those moved, I'd be pretty ok with having even EXTRA bits added for calling convention (though, if we go over 32, we probably need to have a discussion as to whether they are valuable).
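
For readers unfamiliar with the mechanism being referenced, a minimal sketch of the tail-allocation pattern under discussion; the class and field names are hypothetical, and this is not Clang's actual FunctionProtoType or FunctionTypeExtraBitfields layout.

#include "llvm/Support/TrailingObjects.h"

// Rarely-used bits live in a tail-allocated struct, so only the nodes that
// actually need them pay for the storage.
struct ExtraFunctionBits {
  unsigned UncommonCallingConv : 5;
  unsigned OtherRareFlag : 1;
};

class FunctionTypeSketch final
    : private llvm::TrailingObjects<FunctionTypeSketch, ExtraFunctionBits> {
  friend TrailingObjects;

  bool HasExtraBits;

  size_t numTrailingObjects(OverloadToken<ExtraFunctionBits>) const {
    return HasExtraBits ? 1 : 0;
  }

public:
  explicit FunctionTypeSketch(bool HasExtra) : HasExtraBits(HasExtra) {}

  // Callers allocate with totalSizeToAlloc<ExtraFunctionBits>(HasExtra ? 1 : 0)
  // so the trailing storage exists only when it is needed.
  const ExtraFunctionBits *getExtraBits() const {
    return HasExtraBits ? getTrailingObjects<ExtraFunctionBits>() : nullptr;
  }
};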

If you're really concerned about the size of FunctionProtoType increasing, can we just shove the infrequently used calling convention bits into TrailingObjects?

I don't believe so. These are parts of the bitfield and are intrinsic to the type.

I don't follow. Everything stored in FunctionProtoType, including information stored in TrailingObjects, is "intrinsic to the type". It's just stored differently. (FunctionTypeExtraBitfields already exists, even...)

Ah, I see what you mean. I misread and thought you meant on the FunctionDecl itself, so mea culpa.

I was unaware of FunctionTypeExtraBitfields! We perhaps should consider what of the ExtInfo we can move over to the FunctionTypeExtraBitfields. In that list, there are MAYBE 5 bits of the 13 that are used with any level of commonness (though I have no idea what CmseNSCall means). If most of those moved, I'd be pretty ok with having even EXTRA bits added for calling convention (though, if we go over 32, we probably need to have a discussion as to whether they are valuable).

Not to be a party pooper, but that only works for FunctionProtoType and users can write calling conventions on functions without prototypes too, which have no trailing objects currently. However, functions without prototypes aren't a worry for deep template instantiations or the like (because they don't exist in C++), and if users are making heavy use of functions without prototypes (never been a feature in any ISO C standard), I don't particularly mind if their compilation goes slower because we now need to tail allocate data for each non-prototype function. However, we should be careful not to duplicate code as I'm pretty sure that's why ExtInfo is in FunctionType in the first place (there's no way to add a trailing object to FunctionType because it's subclassed).

If you're really concerned about the size of FunctionProtoType increasing, can we just shove the infrequently used calling convention bits into TrailingObjects?

I don't believe so. These are parts of the bitfield and are intrinsic to the type.

I don't follow. Everything stored in FunctionProtoType, including information stored in TrailingObjects, is "intrinsic to the type". It's just stored differently. (FunctionTypeExtraBitfields already exists, even...)

Ah, I see what you mean. I misread and thought you meant on the FunctionDecl itself, so mea culpa.

I was unaware of FunctionTypeExtraBitfields! We perhaps should consider what of the ExtInfo we can move over to the FunctionTypeExtraBitfields. In that list, there are MAYBE 5 bits of the 13 that are used with any level of commonness (though I have no idea what CmseNSCall means). If most of those moved, I'd be pretty ok with having even EXTRA bits added for calling convention (though, if we go over 32, we probably need to have a discussion as to whether they are valuable).

Not to be a party pooper, but that only works for FunctionProtoType and users can write calling conventions on functions without prototypes too, which have no trailing objects currently. However, functions without prototypes aren't a worry for deep template instantiations or the like (because they don't exist in C++), and if users are making heavy use of functions without prototypes (never been a feature in any ISO C standard), I don't particularly mind if their compilation goes slower because we now need to tail allocate data for each non-prototype function. However, we should be careful not to duplicate code as I'm pretty sure that's why ExtInfo is in FunctionType in the first place (there's no way to add a trailing object to FunctionType because it's subclassed).

First: I care very little about FunctionNoProtoType. Second, it is used only in "C" afaik, where our memory pressure isn't that big of a deal. I'd be OK with having the FunctionTypeExtraBits be just added to FunctionNoProtoType, but stay in FunctionProtoType trailing storage.

It WOULD be more work, but I would wonder if we could combine all 3 classes into FunctionType and replace the NoProtoType-ness with a single bit to differentiate. The result is the non-prototyped versions needing to carry around extra information, but since:
a- our C implementation has low memory pressure
b- prototypeless functions were a mistake

I would think the extra memory they end up carrying might be ok?

I don't want to change the distinction between FunctionProtoType and FunctionNoProtoType; that would involve auditing a bunch of code we don't really want to mess with.

FunctionProtoType and FunctionNoProtoType don't have any member variables at the moment, though. The only difference is that FunctionProtoType has TrailingObjects, and FunctionNoProtoType doesn't. It should be fine to just require the same TrailingObjects for FunctionNoProtoType. So FunctionProtoType and FunctionNoProtoType would have the same memory layout.

First: I care very little about FunctionNoProtoType.

I care insomuch as it doesn't break valid user code.

Second, it is used only in "C" afaik, where our memory pressure isn't that big of a deal. I'd be OK with having the FunctionTypeExtraBits be just added to FunctionNoProtoType, but stay in FunctionProtoType trailing storage.

I agree that the memory pressure here shouldn't be overly concerning, and I would most likely be okay if we add the bits that way. However, I think we could also just make FunctionNoProtoType have the same memory layout as FunctionProtoType by using trailing objects again.

It WOULD be more work, but I would wonder if we could combine all 3 classes into FunctionType and replace the NoProtoType-ness with a single bit to differentiate. The result is the non-prototyped versions needing to carry around extra information, but since:
a- our C implementation has low memory pressure
b- prototypeless functions were a mistake

I would think the extra memory they end up carrying might be ok?

We do a fair amount of checks via isa<FunctionProtoType>() (or similar) that would need to be updated to check for a FunctionType and whether that bit is set, and I don't have a good handle on what that would do to the code. If it doesn't cause significant maintenance issues, I would still want to see some measurements for what it does to compiler performance on large projects (it'd be a good thing to try on http://llvm-compile-time-tracker.com/).

Matt added a subscriber: Matt.May 6 2022, 1:48 PM

@aaron.ballman It looks like the conversation reached a conclusion? Given this is separate to what we're trying to add here, can this patch be unblocked?

aaron.ballman accepted this revision.May 10 2022, 8:44 AM

@aaron.ballman It looks like the conversation reached a conclusion? Given this is separate to what we're trying to add here, can this patch be unblocked?

I continue to dislike adding more calling conventions, but the concerns I had about running out of space are not too immediate, and it sounds like we have a plausible way forward when we get closer to it. So I retract my blocking concerns -- this LGTM. Thanks for the discussion on this!

This revision is now accepted and ready to land.May 10 2022, 8:44 AM