Page MenuHomePhabricator

Introduce _BitInt, deprecate _ExtInt
Needs ReviewPublic

Authored by aaron.ballman on Aug 24 2021, 10:14 AM.

Details

Summary

WG14 adopted the _ExtInt feature from Clang for C23, but renamed the type to be _BitInt. This patch does the vast majority of the work to rename _ExtInt to _BitInt, which accounts for most of its size. The new type is exposed in older C modes and all C++ modes as a conforming extension. However, there are functional changes worth calling out:

  • Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
  • Updates the mangling for the type. MSVC switches from _ExtInt to _BitInt in the private mangling name, and Itanium uses newly-proposed-but-not-yet-accepted basic datatype names (https://github.com/itanium-cxx-abi/cxx-abi/pull/129).
  • Updates the documentation and adds a release note to warn users what is going on.
  • Adds new diagnostics for use of _BitInt to call out when it's used as a Clang extension or as a pre-C23 compatibility concern.
  • Adds new tests for the new diagnostic behaviors.

I want to call out the ABI break specifically. We do not believe that this break will cause a significant imposition for early adopters of the feature, and so this is being done as a full break. If it turns out there are critical uses where recompilation is not an option for some reason, we can consider using ABI tags to ease the transition.

There is one piece of work that's still outstanding: the test files use "ext int" in their name. Renaming the tests would make it harder to see that the changes in the test files are about the renaming of the type because it would show up in Phab as a delete + create of a new file. I will rename the files to use "bit int" in their name as a follow-up once this lands.

Note: I would like to hold off on landing this until the Itanium ABI proposal has been accepted, but I think it's reasonable to start getting Clang review feedback now as I don't expect the ABI discussion to take a significant amount of time.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Removing around 40 subscribers. Everyone was automatically added by Herald because of the mechanical renaming changes in files and so this review is likely spam for many of them. More importantly, having that many people subscribed causes mailman to moderate each patch review email. If you've been removed as a subscriber and this makes you sad, please re-add yourself with my apologies! (And hopefully Herald won't decide to add everyone back as a subscriber again automatically!)

erichkeane added inline comments.Aug 24 2021, 10:32 AM
clang/docs/ReleaseNotes.rst
142

This section should probably mention the ABI break. It shouldn't be a surprise for people upgrading.

143

I would think it would be a better idea to not claim that _ExtInt is still supported , but perhaps just say that it is now an alias for the 'new type'. I realize it is a little pedantic, but _ExtInt is no longer a type in our type system. I think this will make the diagnostics more understandable.

clang/include/clang/Basic/DiagnosticParseKinds.td
1489

should this be warn_cpre2x here, just to be more consistent with the 'group'?

clang/include/clang/Basic/TargetInfo.h
585–589

Concur on the fixme. I would expect after this lands that an llvm-dev discussion happen to do this alert, and have us remove this pretty quickly (a release or two?)

clang/lib/AST/Type.cpp
2021

Note to others: Aaron and I discussed this offline. I don't think it is NECESSARY (as I don't think we really call these functions on dependent types), but I thought it was a good idea to be here for completeness.

clang/lib/Parse/ParseDecl.cpp
3873

Is this extra trailing WS?

aaron.ballman marked 3 inline comments as done.

Update based on review feedback.

As a drive-by, I also moved the definition of DiagnoseBitIntUse() out of ParseTentative.cpp and into ParseDecl.cpp.

clang/include/clang/Basic/DiagnosticParseKinds.td
1489

It turns out that other diagnostics don't put pre in the identifier. Going with warn_c17_compat_bit_int to be consistent with warn_c17_compat_static_assert_no_message which is in the same group.

clang/lib/AST/Type.cpp
2021

+1 -- I could not find a new test case to write for this, but it seemed like an "obvious" oversight because the dependency doesn't matter for determining whether the type is signed or not.

clang/lib/Parse/ParseDecl.cpp
3873

*gasps* the nerve of some patch authors, right? ;-)

Please feel free to wordsmith my ABI change comment. I don't feel strongly other than trying to make it clear that _ExtInt is no longer a type/types. A question for other reviewers: Do we feel strongly enough to try to keep the old mangling around via the clang-abi-version flag? I'm not motivated, as __ExtInt was generally experimental, but I want to make sure noone feels strongly.

clang/docs/ReleaseNotes.rst
184

Hrm... not a huge fan that this still claims that `_ExtInt` is a type(though don't have a better wording), but I'd also probably want to mention that it now matches the new type. Perhaps something like:

The ``_ExtInt(N)`` family of types have been replaced with the C2X standardized version of the feature, ``_BitInt(N)``.  Therefore, source that previously used the ``_ExtInt`` types will now be mangled to instead use the ``_BitInt(N)`` spelling in both the Microsoft and Itanium ABIs.
clang/include/clang/Basic/TargetInfo.h
585–589

To clarify: This should be removed at the beginning of a release-cycle (along with an llvm-dev notice) so that we have as much time as possible in trunk to react/find issues.

aaron.ballman added inline comments.Aug 24 2021, 11:17 AM
clang/docs/ReleaseNotes.rst
184

I can go with this.

clang/include/clang/Basic/TargetInfo.h
585–589

We're basically at the beginning of Clang 14 (13 isn't out the door yet), so I am sort of tempted to alert llvm-dev now and remove it as part of this review. However, if people think that's too harsh, I'm happy to wait as well.

Update release note based on review feedback.

LGTM, pending some level of Itanium ABI blessing.

jyknight added inline comments.Aug 24 2021, 6:03 PM
clang/include/clang/Basic/TargetInfo.h
585–589

Since we already return true for all in-tree targets, removing this is effectively a no-op.

However...

Previously, this was a clang extension, so the ABI didn't really matter, as long as it was functional and not broken. But now, it's a standard feature, which gives (or, at least SHOULD give) a much stronger expectation as to compiler-interoperability and ABI documentation.

So -- where's the ABI specification for how _BitInt is represented in memory (size, alignment, ordering/padding location), and how it's passed/returned from functions? Presumably that needs to be part of each platform's psABI -- ideally before the feature is enabled on each platform, right?

I realize that such idealism may be somewhat impractical, here in the real world, since some platforms' psABIs appear to be effectively unmaintained. But is it going to be added to all those which are being actively maintained?

It would be really quite lovely if this didn't end up like the situation we have with _Atomic. (Where the new feature was implemented and deployed by compilers, without ever hashing out ABI issues. And, so, _Atomic ABI remains undocumented and incompatible between implementations.)

clang/lib/AST/ItaniumMangle.cpp
3958

There ought to be a ItaniumDemangle.h change corresponding to this, too (As a separate review is fine, too.)

clang/lib/AST/MicrosoftMangle.cpp
3338

It seems unlikely that this will be the official mangling for MS platforms, since it has a name __clang in it, right? How do we plan to deal with that future ABI issue?

I agree with James; I know you've reached out to the Itanium ABI group about mangling, but ABI involvement needs to mean also reaching out and getting psABI agreement about representation. I would suggest proposing a generic ABI, getting consensus on that generic ABI with other implementors, and then running that generic ABI past as many platform ABI groups as you can get in touch with.

I think the most logical generic ABI is:

  • Let MaxFundamentalWidth be the (target-chosen) bit-width of the largest fundamental integer type that can be used to represent a _BitInt. Typically this will be the largest integer type supported by the ABI, but some targets may want to use a smaller limit. At the very least, this needs to be locked down in the ABI; the ABI for _BitInt(256) shouldn't change if the ABI adds new support for an int256_t type.
  • Let chunk_t be the (target-chosen) fundamental integer type that will be used to store the components of a _BitInt that is wider than MaxFundamentalWidth. This should be an integer type that the architecture comfortably supports overflow operations on, and it will typically be the full width of a GPR.
  • Let ChunkWidth be defined as CHAR_BITS * sizeof(chunk_t).
  • If N < RepWidth(N), then signed/unsigned _BitInt(N) has the same representation as _BitInt(RepWidth(N)), with the value held in the least significant bits and sign/zero-extended to the full width. Values must be in the proper range of the original type; that is, it is undefined behavior if e.g. a signed _BitInt(7) does not hold a value in the range of -64 ..< 63. Hereafter we will assume that N == RepWidth(N).
  • If N <= MaxFundamentalWidth, then signed/unsigned _BitInt(N) has the same representation as the smallest fundamental integer type (least rank, if material) of at least N bits and the same signedness. _Bool is not considered in this selection.
  • Otherwise, signed/unsigned _BitInt(N) has the same representation as a struct containing a single field of type chunk_t[N / sizeof(chunk_t)], where the element at index i stores bits i*ChunkWidth ..< (i+1)*ChunkWidth of the integer. That is, the array is always stored in little-endian order, which should have better memory-access properties regardless of the endianness of the target. (The individual elements still have natural host endianness, of course.)

Some targets that don't pass/return small structs efficiently as a matter of course may want to treat smallish (but still non-fundamental) _BitInts specially in their calling convention.

I think using a uniform, GPR-sized chunk makes more sense than trying to use a smaller chunk that might eliminate some padding. I could imagine that a 64-bit platform that supports 32-bit overflow checking might consider represent _BitInt(96) with three 32-bit chunks instead of two 64-bit chunks, and that would indeed make an array of them pack better, but it would also mean that e.g. addition would take three operations instead of two.

You can't have bit-fields of _BitInt type, can you? If people want that, or want a truly packed _BitInt where sizeof(_BitInt(56)) == 7, it's going to add a lot of complexity.

I agree with James; I know you've reached out to the Itanium ABI group about mangling, but ABI involvement needs to mean also reaching out and getting psABI agreement about representation. I would suggest proposing a generic ABI, getting consensus on that generic ABI with other implementors, and then running that generic ABI past as many platform ABI groups as you can get in touch with.

I've already reached out to the psABI maintainers and have started those discussions before ever considering the mangling questions: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/21622785649f2cbacb596d135a417171f52ad539 Note that there are some changes expected to that wording (the "bit-aligned" stuff should say "byte-aligned" and we're missing information about bit-fields), so if there are additional changes we'd like to see made, now's a good time to make them.

If you have comments or concerns with what's proposed there, I can pass along feedback to H.J. about potential changes we'd like to see. However, once it's in psABI, I'm not certain what other platform ABI groups even exist -- that's a bit out of my wheelhouse, so help getting those discussions moving would be very much appreciated.

I think the most logical generic ABI is:

  • Let MaxFundamentalWidth be the (target-chosen) bit-width of the largest fundamental integer type that can be used to represent a _BitInt. Typically this will be the largest integer type supported by the ABI, but some targets may want to use a smaller limit. At the very least, this needs to be locked down in the ABI; the ABI for _BitInt(256) shouldn't change if the ABI adds new support for an int256_t type.
  • Let chunk_t be the (target-chosen) fundamental integer type that will be used to store the components of a _BitInt that is wider than MaxFundamentalWidth. This should be an integer type that the architecture comfortably supports overflow operations on, and it will typically be the full width of a GPR.
  • Let ChunkWidth be defined as CHAR_BITS * sizeof(chunk_t).
  • If N < RepWidth(N), then signed/unsigned _BitInt(N) has the same representation as _BitInt(RepWidth(N)), with the value held in the least significant bits and sign/zero-extended to the full width. Values must be in the proper range of the original type; that is, it is undefined behavior if e.g. a signed _BitInt(7) does not hold a value in the range of -64 ..< 63. Hereafter we will assume that N == RepWidth(N).
  • If N <= MaxFundamentalWidth, then signed/unsigned _BitInt(N) has the same representation as the smallest fundamental integer type (least rank, if material) of at least N bits and the same signedness. _Bool is not considered in this selection.
  • Otherwise, signed/unsigned _BitInt(N) has the same representation as a struct containing a single field of type chunk_t[N / sizeof(chunk_t)], where the element at index i stores bits i*ChunkWidth ..< (i+1)*ChunkWidth of the integer. That is, the array is always stored in little-endian order, which should have better memory-access properties regardless of the endianness of the target. (The individual elements still have natural host endianness, of course.)

I believe that's what's already been proposed for psABI, but if I've missed something, please let me know.

Some targets that don't pass/return small structs efficiently as a matter of course may want to treat smallish (but still non-fundamental) _BitInts specially in their calling convention.

I think using a uniform, GPR-sized chunk makes more sense than trying to use a smaller chunk that might eliminate some padding. I could imagine that a 64-bit platform that supports 32-bit overflow checking might consider represent _BitInt(96) with three 32-bit chunks instead of two 64-bit chunks, and that would indeed make an array of them pack better, but it would also mean that e.g. addition would take three operations instead of two.

You can't have bit-fields of _BitInt type, can you? If people want that, or want a truly packed _BitInt where sizeof(_BitInt(56)) == 7, it's going to add a lot of complexity.

You can have bit-fields of _BitInt type (and we even tell you when you do dumb things with it https://godbolt.org/z/sTKKdb1o1), but I'm not seeing where the complexity comes in from that (there was a thought that referencing the existing bit-field wording with a mention about the width of _BitInts would be sufficient). But these are not truly packed -- sizeof(_BitInt(56) == 8.

clang/include/clang/Basic/TargetInfo.h
585–589

So -- where's the ABI specification for how _BitInt is represented in memory (size, alignment, ordering/padding location), and how it's passed/returned from functions? Presumably that needs to be part of each platform's psABI -- ideally before the feature is enabled on each platform, right?

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/21622785649f2cbacb596d135a417171f52ad539 is a work in progress, so one step ahead on that.

It would be really quite lovely if this didn't end up like the situation we have with _Atomic. (Where the new feature was implemented and deployed by compilers, without ever hashing out ABI issues. And, so, _Atomic ABI remains undocumented and incompatible between implementations.)

Agreed! WG14 asked me explicitly to start the ABI discussions now, and I've done so to the best of my ability. Unfortunately, I'm not an ABI expert and so I don't know who all to reach out to (I've talked to psABI and Itanium folks, that's it). Help in this area would be very much appreciated.

clang/lib/AST/ItaniumMangle.cpp
3958

+1, I can do that as a follow-up.

clang/lib/AST/MicrosoftMangle.cpp
3338

With an ABI break. I waffled on this one -- we could leave it as ExtInt instead of BitInt and only deal with the ABI break once, but I believe the number of existing users of _ExtInt on Windows platforms to be vanishingly small enough to not count as much of an ABI break today.

As best I can tell, Microsoft sort of does their own thing for mangling whenever they get around to it, so my thinking was that doing the rename to BitInt now at least makes the mangling intelligible for users of the standard feature from Clang. But I don't feel strongly about it.

  • Changes the dependent ABI to mangle expressions rather than template arguments (per Itanium ABI discussion PR), adds a new test for it.
  • Fixes the failing clang-tidy test case caught by CI.
  • [NFC] Fixes some clang-format issues.

I've already reached out to the psABI maintainers and have started those discussions before ever considering the mangling questions: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/21622785649f2cbacb596d135a417171f52ad539 Note that there are some changes expected to that wording (the "bit-aligned" stuff should say "byte-aligned" and we're missing information about bit-fields), so if there are additional changes we'd like to see made, now's a good time to make them.

Btw, the psABI discussion thread can be found at: https://groups.google.com/g/x86-64-abi/c/XQiSj-zU3w8 and the latest version of the patch corrected the issues I mentioned.

I agree with James; I know you've reached out to the Itanium ABI group about mangling, but ABI involvement needs to mean also reaching out and getting psABI agreement about representation. I would suggest proposing a generic ABI, getting consensus on that generic ABI with other implementors, and then running that generic ABI past as many platform ABI groups as you can get in touch with.

I've already reached out to the psABI maintainers and have started those discussions before ever considering the mangling questions: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/21622785649f2cbacb596d135a417171f52ad539

Okay. To be clear, that's not "the psABI maintainers", that's specifically the x86_64 psABI group. Is your expectation that other psABI groups would more or less copy the wording from the x86_64 psABI, potentially replacing 64 with 32 as appropriate, or would you want to host a generic ABI document somewhere else? If the latter, I think the Clang repository wouldn't be an unreasonable place to do that; we do something similar with blocks.

I would be happy to put you in touch with some other psABI groups when you're ready to talk to them.

Note that there are some changes expected to that wording (the "bit-aligned" stuff should say "byte-aligned" and we're missing information about bit-fields), so if there are additional changes we'd like to see made, now's a good time to make them.

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

One advantage of having a generic ABI is that you can give a rationale in the document, which would be out of scope for a psABI.

You can't have bit-fields of _BitInt type, can you? If people want that, or want a truly packed _BitInt where sizeof(_BitInt(56)) == 7, it's going to add a lot of complexity.

You can have bit-fields of _BitInt type (and we even tell you when you do dumb things with it https://godbolt.org/z/sTKKdb1o1), but I'm not seeing where the complexity comes in from that

What is your expectation of how a _BitInt bit-field interacts with adjacent bit-fields? Does unsigned a: 1; _BitInt(8) b: 7; only use 8 bits? Does unsigned a: 1; _BitInt(95) b: 95; only use 96? Does the latter change based on whether this is a 32-bit or 64-bit target? Does it change if the type is _BitInt(128) instead? What is the "storage unit" of a _BitInt for the purposes of ABIs that are sensitive to such things?

Also, C requires the width of a bit-field to be no larger than the width of the underlying type, but C++ allows "wide" bit-fields, and psABIs are inconsistent about giving rules for their layout. Does this match that where supported, or is a different set of rules?

aaron.ballman added a subscriber: hjl.tools.

I agree with James; I know you've reached out to the Itanium ABI group about mangling, but ABI involvement needs to mean also reaching out and getting psABI agreement about representation. I would suggest proposing a generic ABI, getting consensus on that generic ABI with other implementors, and then running that generic ABI past as many platform ABI groups as you can get in touch with.

I've already reached out to the psABI maintainers and have started those discussions before ever considering the mangling questions: https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/21622785649f2cbacb596d135a417171f52ad539

Okay. To be clear, that's not "the psABI maintainers", that's specifically the x86_64 psABI group.

Heh, you can see how out of my depths I am on the ABI topic. :-D

Is your expectation that other psABI groups would more or less copy the wording from the x86_64 psABI, potentially replacing 64 with 32 as appropriate, or would you want to host a generic ABI document somewhere else? If the latter, I think the Clang repository wouldn't be an unreasonable place to do that; we do something similar with blocks.

I was expecting the ABI groups to determine what's best for their respective ABIs; I imagined the x86-64 ABI was sufficient as a template, but I wasn't certain if that's appropriate. We could host a generic ABI document similar to what we do for blocks if that's what people felt was a good approach.

I would be happy to put you in touch with some other psABI groups when you're ready to talk to them.

Thanks! I'm kind of hoping to avoid being responsible for driving the ABI process with all the different ABI groups because I have no idea how much effort I would be signing myself up for. TBH, I was hoping a "this is a newly standardized type that needs ABI considerations" note would be sufficient because I don't have an opinion on what's best for any given ABI.

Note that there are some changes expected to that wording (the "bit-aligned" stuff should say "byte-aligned" and we're missing information about bit-fields), so if there are additional changes we'd like to see made, now's a good time to make them.

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

I've added @hjl.tools to the review for his opinions, as he was the primary driver for the x64 ABI proposal. HJ, can you help me out here?

One advantage of having a generic ABI is that you can give a rationale in the document, which would be out of scope for a psABI.

Ah, good point.

You can't have bit-fields of _BitInt type, can you? If people want that, or want a truly packed _BitInt where sizeof(_BitInt(56)) == 7, it's going to add a lot of complexity.

You can have bit-fields of _BitInt type (and we even tell you when you do dumb things with it https://godbolt.org/z/sTKKdb1o1), but I'm not seeing where the complexity comes in from that

What is your expectation of how a _BitInt bit-field interacts with adjacent bit-fields?

It's implementation-defined because the type isn't _Bool, signed int, or unsigned int, but the current behavior is that it combines at the type boundary rather than packing.

Does unsigned a: 1; _BitInt(8) b: 7; only use 8 bits?

sizeof(struct S {
    unsigned a: 1;
    _BitInt(8) b: 7;
  });

is 4 bytes (on x64).

Does unsigned a: 1; _BitInt(95) b: 95; only use 96?

sizeof(struct S {
    unsigned a: 1;
    _BitInt(95) b: 95;
  });

is 16 bytes (on x64).

Does the latter change based on whether this is a 32-bit or 64-bit target?

It does; then it's 12 bytes.

Does it change if the type is _BitInt(128) instead?

128-bit is: 24 bytes (x64) and 20 bytes (x86)
127-bit is: 16 bytes on both x64 and x86

What is the "storage unit" of a _BitInt for the purposes of ABIs that are sensitive to such things?

My understanding is that it's the _BitInt itself.

Also, C requires the width of a bit-field to be no larger than the width of the underlying type, but C++ allows "wide" bit-fields, and psABIs are inconsistent about giving rules for their layout. Does this match that where supported, or is a different set of rules?

It should be the same set of rules. At least: https://godbolt.org/z/1n9qn38b7

(I should note, I've been describing the behavior Clang currently has using _ExtInt and testing on godbolt. If we want different answers than what I've given, additional work is required. @erichkeane may have more information on why the current behavior is the way it is.)

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

I've added @hjl.tools to the review for his opinions, as he was the primary driver for the x64 ABI proposal. HJ, can you help me out here?

Please follow up x86-64 psABI proposal with any feedbacks:

https://groups.google.com/g/x86-64-abi/c/XQiSj-zU3w8

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

I've added @hjl.tools to the review for his opinions, as he was the primary driver for the x64 ABI proposal. HJ, can you help me out here?

Please follow up x86-64 psABI proposal with any feedbacks:

https://groups.google.com/g/x86-64-abi/c/XQiSj-zU3w8

We don't have feedback yet, @hjl.tools; we're asking for rationale on the behavior of unused bits in the proposed psABI for x86-64. Can you help us understand why the bits are unspecified rather than extended and whether there are potential performance concerns from this decision (before it gets solidified)? Thanks!

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

I've added @hjl.tools to the review for his opinions, as he was the primary driver for the x64 ABI proposal. HJ, can you help me out here?

Please follow up x86-64 psABI proposal with any feedbacks:

https://groups.google.com/g/x86-64-abi/c/XQiSj-zU3w8

We don't have feedback yet, @hjl.tools; we're asking for rationale on the behavior of unused bits in the proposed psABI for x86-64. Can you help us understand why the bits are unspecified rather than extended and whether there are potential performance concerns from this decision (before it gets solidified)? Thanks!

It was suggested by people who proposed _BitInt.

! In D108643#2965852, @rjmccall wrote:

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

So we chose this for a few reasons:

1- Consistency with struct padding bits. It seemed strange to specify the padding bits here, when the rest of the standards/ABI don't specify padding bits.
2- Flexibility of implementation: Requiring zeroing is a more constraining decision, which limits implementation to having to set these bits. By leaving it unspecified, the implementation is free to zero them out if it feels it is worth-while. I'll note that our backends choose NOT to zero them out when not necessary, since (so I'm told) 'masked' compares are trivial in most processors.
3- Implement-ability on FPGAs: Since this was our motivating example, forcing an FPGA to zero out these bits when dealing with an interaction with a byte-aligned processor would have incredible performance overhead.
4- Ease of implementation: Forcing LLVM to zero out these bits would either mean we had to do quite a bit of work in our CodeGen to zero them out, or modify most of the backends to not zero padding bits in these cases. Since there isn't a particular performance benefit (see #2) we didn't think it would be worth while.

rjmccall added a comment.EditedSep 14 2021, 11:48 AM

! In D108643#2965852, @rjmccall wrote:

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

So we chose this for a few reasons:

1- Consistency with struct padding bits. It seemed strange to specify the padding bits here, when the rest of the standards/ABI don't specify padding bits.

I think it's a mistake to think of these as padding bits. Implementations on general-purpose hardware will be doing operations in word-sized chunks; these are the high bits of the most significant word.

2- Flexibility of implementation: Requiring zeroing is a more constraining decision, which limits implementation to having to set these bits. By leaving it unspecified, the implementation is free to zero them out if it feels it is worth-while.

This is a trade-off. Extending constrains the implementation of operations that produce noise in the high bits. Not extending constrains the implementation of operations that are affected by noise in the high bits. I'm willing to believe that the trade-off favors leaving the bits undefined, but that's why I'm asking, to see if you've actually evaluated this trade-off, because it kindof sounds like you've evaluated one side of it.

I'll note that our backends choose NOT to zero them out when not necessary, since (so I'm told) 'masked' compares are trivial in most processors.

They're trivial to implement in custom hardware, of course, but what existing ISAs actually provide masked compare instructions? Is this a common feature I'm simply totally unaware of? In practice I think this will be 1-2 extra instructions in every comparison.

3- Implement-ability on FPGAs: Since this was our motivating example, forcing an FPGA to zero out these bits when dealing with an interaction with a byte-aligned processor would have incredible performance overhead.

How on earth does making the store unit zero/sign-extend have "incredible performance overhead"? This is totally trivial technologically. It's not like you're otherwise sending 17-bit stores out on the bus.

I'm not sure it's appropriate to think of this as primarily an FPGA feature when in fact it's being added to standard targets.

4- Ease of implementation: Forcing LLVM to zero out these bits would either mean we had to do quite a bit of work in our CodeGen to zero them out, or modify most of the backends to not zero padding bits in these cases. Since there isn't a particular performance benefit (see #2) we didn't think it would be worth while.

The obvious lowering would be for clang to use i17 as the scalar type lowering but i32 as the "in-memory" lowering, then make sure that the backends are reasonably intelligent about folding extend/trunc operations around operations that aren't sensitive / don't produce noise.

! In D108643#2965852, @rjmccall wrote:

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

So we chose this for a few reasons:

1- Consistency with struct padding bits. It seemed strange to specify the padding bits here, when the rest of the standards/ABI don't specify padding bits.

I think it's a mistake to think of these as padding bits. Implementations on general-purpose hardware will be doing operations in word-sized chunks; these are the high bits of the most significant word.

I guess thats fair?

2- Flexibility of implementation: Requiring zeroing is a more constraining decision, which limits implementation to having to set these bits. By leaving it unspecified, the implementation is free to zero them out if it feels it is worth-while.

This is a trade-off. Extending constrains the implementation of operations that produce noise in the high bits. Not extending constrains the implementation of operations that are affected by noise in the high bits. I'm willing to believe that the trade-off favors leaving the bits undefined, but that's why I'm asking, to see if you've actually evaluated this trade-off, because it kindof sounds like you've evaluated one side of it.

Perhaps I'm missing something... the intent in the ABI/standard was to leave it unconstrained so that an implementation could choose to zero it out if possible. We DID consider making zero-ing the padding bits internally. On our hardware (see the reasoning below) there was not a perf-advantage.

I'll note that our backends choose NOT to zero them out when not necessary, since (so I'm told) 'masked' compares are trivial in most processors.

They're trivial to implement in custom hardware, of course, but what existing ISAs actually provide masked compare instructions? Is this a common feature I'm simply totally unaware of? In practice I think this will be 1-2 extra instructions in every comparison.

Intel and AMD's X86-64 both do masked-compare, at least in microcode from what I was told. I'll have to see if it was @craig.topper or @andrew.w.kaylor (perhaps someone else?) who said we'd observed similar behavior on arm.

3- Implement-ability on FPGAs: Since this was our motivating example, forcing an FPGA to zero out these bits when dealing with an interaction with a byte-aligned processor would have incredible performance overhead.

How on earth does making the store unit zero/sign-extend have "incredible performance overhead"? This is totally trivial technologically. It's not like you're otherwise sending 17-bit stores out on the bus.

My understanding is, it IS in fact as if you're putting them out on a bus, extending these bits for pushing to our x86 core from our FPGA core ends up requiring more gates, longer reads during transition between processors, and longer 'interconnects'. My understanding is the bus between the x86_64 cores and FPGA cores is a limited size, so being able to have the FPGA pack them is highly beneficial to the throughput. That said, my understanding of this is limited to the ELI5 explanation of our former FPGA architect.

I'm not sure it's appropriate to think of this as primarily an FPGA feature when in fact it's being added to standard targets.

I don't believe I _AM_ thinking of this primarily as an FPGA feature, it is a motivating example for one of our architectures. Just like standards shouldn't unfairly punish 32 bit devices, it shouldn't punish FPGAs.

4- Ease of implementation: Forcing LLVM to zero out these bits would either mean we had to do quite a bit of work in our CodeGen to zero them out, or modify most of the backends to not zero padding bits in these cases. Since there isn't a particular performance benefit (see #2) we didn't think it would be worth while.

The obvious lowering would be for clang to use i17 as the scalar type lowering but i32 as the "in-memory" lowering, then make sure that the backends are reasonably intelligent about folding extend/trunc operations around operations that aren't sensitive / don't produce noise.

I guess we could? That sounds like more work for opt for, what seems to me, to be little gain.

! In D108643#2965852, @rjmccall wrote:

The choice that high bits are unspecified rather than extended is an interesting one. Can you speak to that? That's good for +, -, *, &, |, ^, <<, and narrowing conversions, but bad for ==, <, /, >>, and widening conversions.

So we chose this for a few reasons:

1- Consistency with struct padding bits. It seemed strange to specify the padding bits here, when the rest of the standards/ABI don't specify padding bits.

I think it's a mistake to think of these as padding bits. Implementations on general-purpose hardware will be doing operations in word-sized chunks; these are the high bits of the most significant word.

I guess thats fair?

2- Flexibility of implementation: Requiring zeroing is a more constraining decision, which limits implementation to having to set these bits. By leaving it unspecified, the implementation is free to zero them out if it feels it is worth-while.

This is a trade-off. Extending constrains the implementation of operations that produce noise in the high bits. Not extending constrains the implementation of operations that are affected by noise in the high bits. I'm willing to believe that the trade-off favors leaving the bits undefined, but that's why I'm asking, to see if you've actually evaluated this trade-off, because it kindof sounds like you've evaluated one side of it.

Perhaps I'm missing something... the intent in the ABI/standard was to leave it unconstrained so that an implementation could choose to zero it out if possible. We DID consider making zero-ing the padding bits internally. On our hardware (see the reasoning below) there was not a perf-advantage.

The question is whether you can rely on extension at places that receive an arbitrary ABI-compatible value, like function parameters or loads. If nobody who receives such a value can rely on extension having been done, then the ABI is not meaningfully "leaving it unconstrained": it is constraining some places by the lack of constraint elsewhere. That is why this is a trade-off.

I'll note that our backends choose NOT to zero them out when not necessary, since (so I'm told) 'masked' compares are trivial in most processors.

They're trivial to implement in custom hardware, of course, but what existing ISAs actually provide masked compare instructions? Is this a common feature I'm simply totally unaware of? In practice I think this will be 1-2 extra instructions in every comparison.

Intel and AMD's X86-64 both do masked-compare, at least in microcode from what I was told. I'll have to see if it was @craig.topper or @andrew.w.kaylor (perhaps someone else?) who said we'd observed similar behavior on arm.

Okay, but this is a general-purpose feature being added to general-purpose targets. Clang cannot directly emit AMD x86-64 microcode, and I doubt that an and; and; cmp instruction sequence gets fused into a masked compare in microcode. Those masked comparisons are probably just used to implement 8/16/32-bit comparisons, selected directly on encountering a compare instruction of that width.

3- Implement-ability on FPGAs: Since this was our motivating example, forcing an FPGA to zero out these bits when dealing with an interaction with a byte-aligned processor would have incredible performance overhead.

How on earth does making the store unit zero/sign-extend have "incredible performance overhead"? This is totally trivial technologically. It's not like you're otherwise sending 17-bit stores out on the bus.

My understanding is, it IS in fact as if you're putting them out on a bus, extending these bits for pushing to our x86 core from our FPGA core ends up requiring more gates, longer reads during transition between processors, and longer 'interconnects'. My understanding is the bus between the x86_64 cores and FPGA cores is a limited size, so being able to have the FPGA pack them is highly beneficial to the throughput. That said, my understanding of this is limited to the ELI5 explanation of our former FPGA architect.

This still doesn't make any sense. If you're transmitting a _BitInt(17) as exactly 17 bits on a dedicated FPGA<->x86 bus, then of course continue to do that. The ABI rules govern the representation of values in the places that affect the interoperation of code, such as calling conventions and in-memory representations. They do not cover bus protocols.

I'm not sure it's appropriate to think of this as primarily an FPGA feature when in fact it's being added to standard targets.

I don't believe I _AM_ thinking of this primarily as an FPGA feature, it is a motivating example for one of our architectures. Just like standards shouldn't unfairly punish 32 bit devices, it shouldn't punish FPGAs.

This entire discussion is about what the ABI rules should be for implementing this feature on general-purpose devices that doesn't directly support e.g. 17-bit arithmetic. Custom hardware that does support native 17-bit arithmetic obviously doesn't need to care about those parts of the ABI and is not being "punished". At some point, 17-bit values will come from that specialized hardware and get exposed to general-purpose hardware by e.g. being written into a GPR; this is the first point at which the ABI even starts dreaming of being involved. Now, it's still okay under a mandatory-extension ABI if that GPR has its upper bits undefined: you're in the exact same situation as you would be after an addition, where it's fine to turn around and use that in some other operation that doesn't care about the upper bits (like a multiplication), but if you want to use it in something that cares about those bits (like a comparison), you need to zero/sign-extend them away first. The only difference between an ABI that leaves the upper bits undefined and one that mandates extension is that places which might expose the value outside of the function — like returning the value, passing the value as an argument, and writing the value into a pointer — have to be considered places that care about the upper bits; and then you get to rely on that before you do things like comparisons.

Again, I'm not trying to insist that a mandatory-extension ABI is the right way to go. I just want to make sure that we've done a fair investigation into this trade-off. Right now, my concern is that it sounds like that investigation invented a lot of extra constraints for mandatory-extension ABIs, like that somehow mandatory extension meant that you would need to transmit a bunch of zero bits between your FPGA and the main CPU. I am not a hardware specialist, but I know enough to know that this doesn't check out.

4- Ease of implementation: Forcing LLVM to zero out these bits would either mean we had to do quite a bit of work in our CodeGen to zero them out, or modify most of the backends to not zero padding bits in these cases. Since there isn't a particular performance benefit (see #2) we didn't think it would be worth while.

The obvious lowering would be for clang to use i17 as the scalar type lowering but i32 as the "in-memory" lowering, then make sure that the backends are reasonably intelligent about folding extend/trunc operations around operations that aren't sensitive / don't produce noise.

I guess we could? That sounds like more work for opt for, what seems to me, to be little gain.

I have a lot of concerns about turning "whatever LLVM does when you pass an i17 as an argument" into platform ABI. My experience is that LLVM does a lot of things that you wouldn't expect when you push it outside of simple cases like power-of-two integers. Different targets may even use different rules, because the IR specification doesn't define this stuff.

The question is whether you can rely on extension at places that receive an arbitrary ABI-compatible value, like function parameters or loads. If nobody who receives such a value can rely on extension having been done, then the ABI is not meaningfully "leaving it unconstrained": it is constraining some places by the lack of constraint elsewhere. That is why this is a trade-off.

I see, thanks for that clarification!

Okay, but this is a general-purpose feature being added to general-purpose targets. Clang cannot directly emit AMD x86-64 microcode, and I doubt that an and; and; cmp instruction sequence gets fused into a masked compare in microcode. Those masked comparisons are probably just used to implement 8/16/32-bit comparisons, selected directly on encountering a compare instruction of that width.

I don't work on the microcode, it is just what I was told when we asked about this. SO until someone can clarify, I have no idea.

This still doesn't make any sense. If you're transmitting a _BitInt(17) as exactly 17 bits on a dedicated FPGA<->x86 bus, then of course continue to do that. The ABI rules govern the representation of values in the places that affect the interoperation of code, such as calling conventions and in-memory representations. They do not cover bus protocols.

Again, it was an argument made at the time that is outside of my direct expertise, so if you have experience with mixed FPGA/traditional core interfaces, I'll have to defer to your expertise.

This entire discussion is about what the ABI rules should be for implementing this feature on general-purpose devices that doesn't directly support e.g. 17-bit arithmetic. Custom hardware that does support native 17-bit arithmetic obviously doesn't need to care about those parts of the ABI and is not being "punished". At some point, 17-bit values will come from that specialized hardware and get exposed to general-purpose hardware by e.g. being written into a GPR; this is the first point at which the ABI even starts dreaming of being involved. Now, it's still okay under a mandatory-extension ABI if that GPR has its upper bits undefined: you're in the exact same situation as you would be after an addition, where it's fine to turn around and use that in some other operation that doesn't care about the upper bits (like a multiplication), but if you want to use it in something that cares about those bits (like a comparison), you need to zero/sign-extend them away first. The only difference between an ABI that leaves the upper bits undefined and one that mandates extension is that places which might expose the value outside of the function — like returning the value, passing the value as an argument, and writing the value into a pointer — have to be considered places that care about the upper bits; and then you get to rely on that before you do things like comparisons.

Again, I'm not trying to insist that a mandatory-extension ABI is the right way to go. I just want to make sure that we've done a fair investigation into this trade-off. Right now, my concern is that it sounds like that investigation invented a lot of extra constraints for mandatory-extension ABIs, like that somehow mandatory extension meant that you would need to transmit a bunch of zero bits between your FPGA and the main CPU. I am not a hardware specialist, but I know enough to know that this doesn't check out.

Again at the time, my FPGA-CPU interconnect experts expressed issue with making the extra-bits 0, and it is filtered by my memory/ the "ELI5" explanation that was given to me, so I apologize it didn't come through correctly.

I have a lot of concerns about turning "whatever LLVM does when you pass an i17 as an argument" into platform ABI. My experience is that LLVM does a lot of things that you wouldn't expect when you push it outside of simple cases like power-of-two integers. Different targets may even use different rules, because the IR specification doesn't define this stuff.

That seems like a better argument for leaving them unspecified I would think. If we can't count on our backends to act consistently, then it is obviously going to be some level of behavior-change/perf-hit to force a decision on them.

I don't work on the microcode, it is just what I was told when we asked about this. SO until someone can clarify, I have no idea.

Again, it was an argument made at the time that is outside of my direct expertise, so if you have experience with mixed FPGA/traditional core interfaces, I'll have to defer to your expertise.

Again at the time, my FPGA-CPU interconnect experts expressed issue with making the extra-bits 0, and it is filtered by my memory/ the "ELI5" explanation that was given to me, so I apologize it didn't come through correctly.

Okay. Sorry if I came down on you personally, I know what it's like to be in the middle on things like this.

I have a lot of concerns about turning "whatever LLVM does when you pass an i17 as an argument" into platform ABI. My experience is that LLVM does a lot of things that you wouldn't expect when you push it outside of simple cases like power-of-two integers. Different targets may even use different rules, because the IR specification doesn't define this stuff.

That seems like a better argument for leaving them unspecified I would think. If we can't count on our backends to act consistently, then it is obviously going to be some level of behavior-change/perf-hit to force a decision on them.

Hmm. I did some experiments, and it looks like there's an inconsistency in a different way than I remembered. All the backends I tested seem to treat an iN parameter without the zeroext or signext attribute as only having N valid bits. However, they also also seem to assume that an iN will always be zero-padded in memory. For example, this module:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx11.0.0"

@a = global i29 0, align 4

define i32 @foo() local_unnamed_addr {
entry:
  %a = load i29, i29* @a, align 4
  %r = zext i29 %a to i32
  ret i32 %r
}

compiles to:

_foo:
	movl	_a(%rip), %eax
	retq

So if you're generating iN without extension attributes for parameters, and you're doing loads and stores of iN, then the effective ABI is that the high bits of arguments (at least in registers?) are undefined, but the high bits of values in memory are defined to be zero, even for signed types. That, uh, makes zero sense as an ABI: I can see no reason to treat these cases differently, and if we're going to assume extension, it should certainly match the signedness of the type. So I think *something* has to change in the implementation here.

I'm not sure if there's a way to get LLVM to treat loaded values as only having N valid bits.

Do you have resources on the patterns of code that you expect to see for _BitInt types? Like, what operations are most important here?

If addition, subtraction, and comparison are the most important operations — especially if we don't consider shifts or multiplication important — the best ABI might actually be to keep the value left-shifted.

Okay. Sorry if I came down on you personally, I know what it's like to be in the middle on things like this

Thank you, I very much appreciate that.

I'm not sure if there's a way to get LLVM to treat loaded values as only having N valid bits.

Do you have resources on the patterns of code that you expect to see for _BitInt types? Like, what operations are most important here?

If addition, subtraction, and comparison are the most important operations — especially if we don't consider shifts or multiplication important — the best ABI might actually be to keep the value left-shifted.

I don't have any such resources. Our users treat them as 'just integers', and my understanding is that is the design intent we brought to WG14. There is SOME experimentation to use these as 'Big Ints' as well.

FYI: @craig.topper noticed my ping above and tells me he at least doesn't agree with my understanding of the hallway discussion we had a few years ago. I am hopeful he can correct my memory/interpretation of the conversation.

Minor updates.

  • Rebased
  • Fixed failing tests in clang-tools-extra
  • Removed FIXME comments about the Itanium mangling being proposed, as that is now accepted

I plan to work on a general ABI document for this patch, similar to what we've produced for Blocks.

Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2021, 6:51 AM

Adds BitIntABI.rst as the start of the ABI documentation.

As I was writing this document, I became less and less comfortable with saying "this is how the ABI is" and more and more comfortable with "please think about these things because they're the things we're thinking about." I didn't think it would be appropriate to dictate architecture-specific details like endian requirements, but I did think it'd be reasonable to ask people to be thinking about them in general.

Is this along the lines of what people were hoping to see on the ABI topic?

I think it would be better to provide a generic ABI specification that is designed to "lower" _BitInt types into more basic concepts that ABIs typically already have rules for. It would be appropriate to note at the top that this is only a generic specification and that the rules for any particular platform should be codified in a platform-specific ABI. But we should make that easy so that platform owners who don't have strong opinions about the ABI can just say "as described in version 1.4 of the generic ABI at https://clang.llvm.org/..."

Now, that is all independent of what we actually decide the ABI should be. But I do think that if we're going to make this available on all targets, we should strongly consider doing a "legalizing" lowering in the frontend: at ABI-exposed points (call arguments/parameters and memory accesses), _BitInt types should be translated into types that a naive backend can be trusted to handle in a way that matches the documented ABI. Individual targets can then opt in to using the natural lowerings if they promise to match the ABI, but that becomes a decision that they make to enable better optimization and code generation, not something that's necessary for correctness.

clang/docs/ReleaseNotes.rst
148

Should we document that this is still considered an experimental feature, in the sense of not yet officially having a stable ABI?

187

Probably more accurate to now say something like:

The ``_ExtInt(N)`` extension has been standardized in C2X as ``_BitInt(N)``.  The mangling of this type in C++ has accordingly changed: under the Microsoft ABI it is now mangled using the ``_BitInt`` spelling, and under the Itanium ABI it is now mangled using a dedicated production.

I think it would be better to provide a generic ABI specification that is designed to "lower" _BitInt types into more basic concepts that ABIs typically already have rules for. It would be appropriate to note at the top that this is only a generic specification and that the rules for any particular platform should be codified in a platform-specific ABI. But we should make that easy so that platform owners who don't have strong opinions about the ABI can just say "as described in version 1.4 of the generic ABI at https://clang.llvm.org/..."

Now, that is all independent of what we actually decide the ABI should be. But I do think that if we're going to make this available on all targets, we should strongly consider doing a "legalizing" lowering in the frontend: at ABI-exposed points (call arguments/parameters and memory accesses), _BitInt types should be translated into types that a naive backend can be trusted to handle in a way that matches the documented ABI. Individual targets can then opt in to using the natural lowerings if they promise to match the ABI, but that becomes a decision that they make to enable better optimization and code generation, not something that's necessary for correctness.

Don't we already have that? We work the same way a 'bool' does, that is, we zero/sign extend for parameters and return values? The backends then reliably up-scale these to the power-of-two/multiple-of-pointer.

Or am I missing something here? The only difference here that I can see vs bool is that we don't do anything for in-function variables:

; Function Attrs: mustprogress noinline nounwind optnone
define dso_local signext i5 @_Z3BarDB5_(i5 signext %b) #0 {
entry:
 %b.addr = alloca i5, align 1
 %c = alloca i5, align 1
  store i5 %b, i5* %b.addr, align 1
  %0 = load i5, i5* %b.addr, align 1
  store i5 %0, i5* %c, align 1
  %1 = load i5, i5* %c, align 1
  ret i5 %1
}

; Function Attrs: mustprogress noinline nounwind optnone
define dso_local zeroext i5 @_Z3BazDU5_(i5 zeroext %b) #0 {
entry:
  %b.addr = alloca i5, align 1
  %c = alloca i5, align 1
  store i5 %b, i5* %b.addr, align 1
  %0 = load i5, i5* %b.addr, align 1
  store i5 %0, i5* %c, align 1
  %1 = load i5, i5* %c, align 1
  ret i5 %1
}
aaron.ballman marked an inline comment as done.

Updated based on review feedback.

  • Rewrote the generic ABI document, please double-check it.
  • Updated the release notes and language extension documents to mention that the ABI is not yet stable everywhere.
rjmccall added inline comments.Mon, Oct 11, 1:18 AM
clang/docs/BitIntABI.rst
17 ↗(On Diff #375602)

I think you should recommend that ABIs codify that by reference to your specification. Perhaps:

This document provides a generic ABI specification for the C23 _BitInt feature, expressed in terms of concepts common to low-level platform ABIs. This document does not directly constrain any platform, as the ABI used for _BitInt on a platform is determined by that platform's ABI specification. It is expected that platform ABIs will codify the ABI for the _BitInt feature on their platform using this specification, either by reference or by copying the text of this specification into the platform ABI specification.

Now, I am assuming that this specification will be complex enough that ABI maintainers won't want to just copy into their specifications. Maybe I'm wrong about that, but I think it will be fairly long by the time it's finished — not, like, Itanium C++ ABI long, but it might be 3-4 pages. If we want to support including this by reference, we need to think about what that reference looks like. I think there are two key parts of that: first, it should explicitly name a version of this document, and second, it should give the parameters that we've listed out below. Basically, someone should be able to see that and have no doubt about what the ABI is.

I would suggest this recommended reference:

Platform ABIs which include this specification by reference should explicitly give a version and the parameters listed below. For example: "The ABI for _BitInt types is specified by version 1 of the specification found at https://clang.llvm.org/docs/BitIntABI.html, using 64 as the maximum fundamental width and long as the chunk type."

To make versioning work, the version history should (1) explicitly say what version this is and (2) be limited to major semantic versions, where you actually change the intended ABI rather than just making editorial changes. (If you never need to do that, great, but you should set up to make it possible.)

42 ↗(On Diff #375602)

On first read, I thought this paragraph was telling me that sizeof behaves differently for _BitInt objects. On second read, I think you're trying to tell me that it *doesn't* behave differently and that arrays of _BitInt might contain padding between elements. However, that's true of all objects of _BitInt that aren't bit-fields, so I think this paragraph is a little misleading.

Perhaps:

_BitInt types are ordinary object types and may be used anywhere an object type can be, such as a struct field, union field, or array element. In addition, they are integer types and may be used as the type of a bit-field. Like any other type, a _BitInt object may require more memory than its stated bit width in order to satisfy the requirements of byte (or higher) alignment. In other words, the width of a _BitInt affects the semantics of operations on the value; it is not a guarantee that the value will be "packed" into exactly that many bits in memory.

47 ↗(On Diff #375602)

In general, this section is a good start, but I think you should split it into three sections:

  • one section describing how _BitInt is laid out as a non-bit-field object;
  • one section describing how a _BitInt is laid out as a bit-field, which I think needs to be much clearer than you've given below; and
  • one section describing how a _BitInt is passed and returned from a function.
71 ↗(On Diff #375602)

Oh, is sign/zero extension the ABI you want to recommend? I wasn't trying to lead you in either direction on this point; I just wanted to make sure this was thought through.

There should be a non-normative rationale / alternatives considered section at the end that makes an argument for the ABI's recommendations here.

90 ↗(On Diff #375602)

BitInts not packing with non-BitInts is a surprising rule.

How does this packing work when allocation units differ in size?

I think it would be better to provide a generic ABI specification that is designed to "lower" _BitInt types into more basic concepts that ABIs typically already have rules for. It would be appropriate to note at the top that this is only a generic specification and that the rules for any particular platform should be codified in a platform-specific ABI. But we should make that easy so that platform owners who don't have strong opinions about the ABI can just say "as described in version 1.4 of the generic ABI at https://clang.llvm.org/..."

Now, that is all independent of what we actually decide the ABI should be. But I do think that if we're going to make this available on all targets, we should strongly consider doing a "legalizing" lowering in the frontend: at ABI-exposed points (call arguments/parameters and memory accesses), _BitInt types should be translated into types that a naive backend can be trusted to handle in a way that matches the documented ABI. Individual targets can then opt in to using the natural lowerings if they promise to match the ABI, but that becomes a decision that they make to enable better optimization and code generation, not something that's necessary for correctness.

Don't we already have that? We work the same way a 'bool' does, that is, we zero/sign extend for parameters and return values? The backends then reliably up-scale these to the power-of-two/multiple-of-pointer.

There's two levels of a "legalizing" lowering.

On the specification level, we'd be formally saying that the ABI matches the ABI of some other C construct. For example, we might say that a signed _BitInt(14) argument is treated exactly as if the argument type was instead signed short (with whatever restriction we do or do not impose on the range). This is very nice on this level because, first, it's a rule that works generically for all targets without needing to care about the details of how arguments are assigned to registers or stack slots or whatever; and second, it means you're never adding new cases to the ABI that e.g. libffi would need to care about; and third, you can reliably match the ABI with manually-lowered C code from a compiler that doesn't even support _BitInts.

On the implementation level, we'd have to actually emit IR which will turn into code which matches that ABI. I think you are right that small integers already reliably work that way in LLVM, so that if you pass an i11 sext it'll get sign-extended to some width — what width exactly, I don't know, but probably the same width that i16 sext would be. Larger integers, I'm not sure. A legalizing lowering of big _BitInt argument would presumably say either that it's exactly the same as a struct containing all the chunks or that it's exactly the same as a series of arguments for each of the chunks. Those are two very different rules! Most ABIs won't "straddle" a single argument between registers and the stack, so e.g. if you'd need two registers and you only have one left then you exhaust the registers and go fully on the stack; but obviously you can get straddling if the lowering expands the sequence. Similarly, some ABIs will go indirect if the struct gets big enough, but you won't get that if the lowering expands. But on the other hand, if the rule is really that it's like a struct, well, some ABIs don't pass small structs in registers. Anyway, my point is that what exactly LLVM targets do if you give them an i117 might not match either of these possibilities reliably across targets. In any case, since we currently have to count registers in the frontend for the register-passing ABIs, we'll need to actually know what the targets do.

! In D108643#3054443, @rjmccall wrote:

! In D108643#3027473, @erichkeane wrote:

! In D108643#3026550, @rjmccall wrote:

I think it would be better to provide a generic ABI specification that is designed to "lower" _BitInt types into more basic concepts that ABIs typically already have rules for. It would be appropriate to note at the top that this is only a generic specification and that the rules for any particular platform should be codified in a platform-specific ABI. But we should make that easy so that platform owners who don't have strong opinions about the ABI can just say "as described in version 1.4 of the generic ABI at https://clang.llvm.org/..."

Now, that is all independent of what we actually decide the ABI should be. But I do think that if we're going to make this available on all targets, we should strongly consider doing a "legalizing" lowering in the frontend: at ABI-exposed points (call arguments/parameters and memory accesses), _BitInt types should be translated into types that a naive backend can be trusted to handle in a way that matches the documented ABI. Individual targets can then opt in to using the natural lowerings if they promise to match the ABI, but that becomes a decision that they make to enable better optimization and code generation, not something that's necessary for correctness.

Don't we already have that? We work the same way a 'bool' does, that is, we zero/sign extend for parameters and return values? The backends then reliably up-scale these to the power-of-two/multiple-of-pointer.

There's two levels of a "legalizing" lowering.

On the specification level, we'd be formally saying that the ABI matches the ABI of some other C construct. For example, we might say that a signed _BitInt(14) argument is treated exactly as if the argument type was instead signed short (with whatever restriction we do or do not impose on the range). This is very nice on this level because, first, it's a rule that works generically for all targets without needing to care about the details of how arguments are assigned to registers or stack slots or whatever; and second, it means you're never adding new cases to the ABI that e.g. libffi would need to care about; and third, you can reliably match the ABI with manually-lowered C code from a compiler that doesn't even support _BitInts.

On the implementation level, we'd have to actually emit IR which will turn into code which matches that ABI. I think you are right that small integers already reliably work that way in LLVM, so that if you pass an i11 sext it'll get sign-extended to some width — what width exactly, I don't know, but probably the same width that i16 sext would be. Larger integers, I'm not sure. A legalizing lowering of big _BitInt argument would presumably say either that it's exactly the same as a struct containing all the chunks or that it's exactly the same as a series of arguments for each of the chunks. Those are two very different rules! Most ABIs won't "straddle" a single argument between registers and the stack, so e.g. if you'd need two registers and you only have one left then you exhaust the registers and go fully on the stack; but obviously you can get straddling if the lowering expands the sequence. Similarly, some ABIs will go indirect if the struct gets big enough, but you won't get that if the lowering expands. But on the other hand, if the rule is really that it's like a struct, well, some ABIs don't pass small structs in registers. Anyway, my point is that what exactly LLVM targets do if you give them an i117 might not match either of these possibilities reliably across targets. In any case, since we currently have to count registers in the frontend for the register-passing ABIs, we'll need to actually know what the targets do.

Hmm... now that I think I understand the concern, I'm not sure we're able to take on this level of commitment. The Clang ABI code is a bit of a mess in general, plus each target has its own code, right (In CodeGen/TargetInfo.cpp)? Additionally, that would require breaking the Microsoft ABI again, and I thought we'd discussed trying not to make changes until Microsoft defined that ABI, right?

I would like us to be able to find a way forward, and hope you can help make a suggestion on how to do so. We'd love to get Clang to support this C23 feature at least on some targets as there are a good amount of users of this feature already as _ExtInt, and we would really want to get them to the standards version ASAP. Do you have any suggestions on how we can move forward?

PS: The current state of the things for the 4 platforms I deal with (and of course the most complicated ones) is:

<32 bit< 64 bit< 128 bit>128 bit
Linux x86_64s/zextalways pass as a coerced i64Pass as 2 i64s 'coerced' while in registers, then Ptr to iN (byval)Ptr to iN (Byval)
Windows x86_64pass as iNpass as iNPtr to iNPtr to iN
Linux i386s/zextpass as iNPtr to iNPtr to iN
Windows i386s/zextpass as iNPtr to iNPtr to iN
+ Your suggestion for ?all? as I understand it?s/zexts/zextSplit into i64s?Split into i64s?

Presumably there is a bit of issue because of register sizes too there, right?

aaron.ballman marked 4 inline comments as done.Mon, Oct 11, 9:44 AM
aaron.ballman added inline comments.
clang/docs/BitIntABI.rst
17 ↗(On Diff #375602)

Thanks, these are good suggestions; I've applied them.

42 ↗(On Diff #375602)

You're correct -- thanks, your new words are far more clear than my old ones.

71 ↗(On Diff #375602)

Oh, is sign/zero extension the ABI you want to recommend? I wasn't trying to lead you in either direction on this point; I just wanted to make sure this was thought through.

We're still nailing this bit down. I personally do not care what the answer is, and I have no idea how we would ever measure to determine what the "right" answer is because. My understanding of the usage pattern for this type is people expect to use it similar to int, so it'll be used for basically all of the operations roughly equally. Based on that, I don't think there is a right answer. So I'm switching back to "unspecified" on the assumption that it gives better performance for a larger set of operations (I'm assuming people "do math" with these types more than they "do logic" (comparisons) with them). But I'll definitely add information to the rationale section.

There should be a non-normative rationale / alternatives considered section at the end that makes an argument for the ABI's recommendations here.

Can do.

90 ↗(On Diff #375602)

BitInts not packing with non-BitInts is a surprising rule.

Hmmm, I thought I had tested and found that unused bits from an oversized _BitInt would not pack together, but now I can't seem to get that behavior. It looks like we *do* pack things into the unused bits. https://godbolt.org/z/4Pavvbeqe

It seems to me that there's no reason to NOT pack into those unused bits, so I think that's the behavior we'd like. I'll correct the docs.

How does this packing work when allocation units differ in size?

Are you wondering about a case like?

struct S {
  unsigned _BitInt(33) i : 8; // 8 byte allocation unit
  unsigned _BitInt(32) j : 8; // 4 byte allocation unit
};

or something else? I think the current behavior here is largely as a result of code review feedback on the original implementation work. At least, this patch isn't changing the behavior from whatever was deemed appropriate for _ExtInt originally. But we should find those rules out and document them here in the ABI doc.

clang/docs/ReleaseNotes.rst
148

Not a bad idea, I've added some additional words here, in the ABI section, and in LanguagateExtensions.rst so the information is widely available.

187

SGTM

aaron.ballman marked 3 inline comments as done.

Updating the generic ABI document based on review feedback.

! In D108643#3054443, @rjmccall wrote:

! In D108643#3027473, @erichkeane wrote:

! In D108643#3026550, @rjmccall wrote:

I think it would be better to provide a generic ABI specification that is designed to "lower" _BitInt types into more basic concepts that ABIs typically already have rules for. It would be appropriate to note at the top that this is only a generic specification and that the rules for any particular platform should be codified in a platform-specific ABI. But we should make that easy so that platform owners who don't have strong opinions about the ABI can just say "as described in version 1.4 of the generic ABI at https://clang.llvm.org/..."

Now, that is all independent of what we actually decide the ABI should be. But I do think that if we're going to make this available on all targets, we should strongly consider doing a "legalizing" lowering in the frontend: at ABI-exposed points (call arguments/parameters and memory accesses), _BitInt types should be translated into types that a naive backend can be trusted to handle in a way that matches the documented ABI. Individual targets can then opt in to using the natural lowerings if they promise to match the ABI, but that becomes a decision that they make to enable better optimization and code generation, not something that's necessary for correctness.

Don't we already have that? We work the same way a 'bool' does, that is, we zero/sign extend for parameters and return values? The backends then reliably up-scale these to the power-of-two/multiple-of-pointer.

There's two levels of a "legalizing" lowering.

On the specification level, we'd be formally saying that the ABI matches the ABI of some other C construct. For example, we might say that a signed _BitInt(14) argument is treated exactly as if the argument type was instead signed short (with whatever restriction we do or do not impose on the range). This is very nice on this level because, first, it's a rule that works generically for all targets without needing to care about the details of how arguments are assigned to registers or stack slots or whatever; and second, it means you're never adding new cases to the ABI that e.g. libffi would need to care about; and third, you can reliably match the ABI with manually-lowered C code from a compiler that doesn't even support _BitInts.

On the implementation level, we'd have to actually emit IR which will turn into code which matches that ABI. I think you are right that small integers already reliably work that way in LLVM, so that if you pass an i11 sext it'll get sign-extended to some width — what width exactly, I don't know, but probably the same width that i16 sext would be. Larger integers, I'm not sure. A legalizing lowering of big _BitInt argument would presumably say either that it's exactly the same as a struct containing all the chunks or that it's exactly the same as a series of arguments for each of the chunks. Those are two very different rules! Most ABIs won't "straddle" a single argument between registers and the stack, so e.g. if you'd need two registers and you only have one left then you exhaust the registers and go fully on the stack; but obviously you can get straddling if the lowering expands the sequence. Similarly, some ABIs will go indirect if the struct gets big enough, but you won't get that if the lowering expands. But on the other hand, if the rule is really that it's like a struct, well, some ABIs don't pass small structs in registers. Anyway, my point is that what exactly LLVM targets do if you give them an i117 might not match either of these possibilities reliably across targets. In any case, since we currently have to count registers in the frontend for the register-passing ABIs, we'll need to actually know what the targets do.

Hmm... now that I think I understand the concern, I'm not sure we're able to take on this level of commitment. The Clang ABI code is a bit of a mess in general, plus each target has its own code, right (In CodeGen/TargetInfo.cpp)?

Ideally we would have a generic handling of it that most ABI code would simply trigger.

Additionally, that would require breaking the Microsoft ABI again, and I thought we'd discussed trying not to make changes until Microsoft defined that ABI, right?

I didn't realize we were matching any other compilers at the moment. Or is that not what you mean?

I think we need to consider the ABI for this experimental and unstable on all platforms.

I would like us to be able to find a way forward, and hope you can help make a suggestion on how to do so. We'd love to get Clang to support this C23 feature at least on some targets as there are a good amount of users of this feature already as _ExtInt, and we would really want to get them to the standards version ASAP. Do you have any suggestions on how we can move forward?

I think the immediate way forward is that we can continue to document this as an experimental feature with an unstable ABI, just with a new name. We do not need to tie these things together as long as we agree that the feature is experimental and the ABI might change. Perhaps it should be enabled by an experimental flag until the ABI is official?

PS: The current state of the things for the 4 platforms I deal with (and of course the most complicated ones) is:

Ah, thank you for this.

<32 bit< 64 bit< 128 bit>128 bit
Linux x86_64s/zextalways pass as a coerced i64Pass as 2 i64s 'coerced' while in registers, then Ptr to iN (byval)Ptr to iN (Byval)
Windows x86_64pass as iNpass as iNPtr to iNPtr to iN
Linux i386s/zextpass as iNPtr to iNPtr to iN
Windows i386s/zextpass as iNPtr to iNPtr to iN
+ Your suggestion for ?all? as I understand it?s/zexts/zextSplit into i64s?Split into i64s?

Lowering semantics would use the platform's normal ABI rules for the lowered type. For example, if we lowered larger-than-fundamental _BitInts to a struct containing the chunks, then unsigned _BitInt(192) would become struct { uint32_t a, b, c, d, e, f; } on 32-bit (which IIUC on both Linux and Windows x86 would be passed directly on the stack, not by reference) and struct { uint64_t a, b, c; } on 64-bit (which IIUC on both Linux and Windows x86_64 would be passed by reference). The target lowering code would recognize _BitInt types and ask the normal code to handle the lowered type instead.

What you're describing above are completely custom target-specific rules, and in some cases they're novel rules that the ABIs don't normally use for other types. The standard i386 CCs basically never pass arguments by reference unless they have a non-trivially-copyable C++ type — the only exception is over-aligned types on Windows. Unless you don't really mean they're passed by pointer; if you're looking at IR, it can be misleading because of byval.

rjmccall added inline comments.Tue, Oct 12, 1:55 AM
clang/docs/BitIntABI.rst
90 ↗(On Diff #375602)

Right, it's fine to start by just documenting what we've got.

90 ↗(On Diff #378708)

It might be worthwhile to pull these definitions into a common Definitions section, since I think they'll apply in most of the sections. You should be explicit about the two parameters, like:

This generic ABI is expressed in terms of two parameters which must be determined by the platform ABI:

  • MaxFundamentalWidth is the bit-width...
  • chunk_t is the type of...

For the convenience of the following specification, other quantities are derived from these:

The paragraph about RepWidth could be clearer. Maybe:

`RepWidth(N) is the *representation width* of a _BitInt of width N. If N <= MaxFundamentalWidth, then RepWidth(N) is the bit-width of the lowest-rank fundamental integer type (excluding _Bool) with at least N bits. Otherwise, RepWidth(N) is the least multiple of the bit-width of chunk_t which is at least N. It is therefore always true that N <= RepWidth(N). When N < RepWidth(N), a _BitInt(N) is represented exactly as if it were extended to a _BitInt(RepWidth(N))` of the same signedness. The value is held in the least significant bits, and the excess (most significant) bits have unspecified values.

The sentence about undefined behavior conflicts with the excess bits taking unspecified values.

145 ↗(On Diff #378708)

Hmm. How about:

Excess bits

When `N < RepWidth(N)`, the ABI has three natural alternatives:

  • The value is kept in the least-significant bits, and the excess (most significant) bits are unspecified.
  • The value is kept in the least-significant bits, and the excess (most significant) bits are required to be a proper zero- or sign-extension of the value (as appropriate for the signedness of the type).
  • The value is left-shifted into the most-significant bits, and the excess (least significant) bits are required to be zero.

Each of these has trade-offs. Leaving the most-significant bits unspecified allows addition, subtraction, multiplication, bitwise complement, left shifts, and narrowing conversions to avoid adjusting these bits in their results. Forcing the most-significant bits to be properly extended allows comparisons, division, right shifts, and widening conversions to avoid adjusting these bits in their operands. Keeping the value left-shifted is good for both addition and comparison, but other operations (especially conversions) become more complex, and the representation is less "natural", which can complicate interacting with other systems. Furthermore, having unspecified bits means that bitwise equality can be false even when semantic equality holds, but not having unspecified bits means that there are "trap representations" which can lead to undefined behavior.

This ABI leaves the most-significant bits unspecified out of a belief that doing so should optimize the most common operations and avoid the most complexity in practice.

! In D108643#3054443, @rjmccall wrote:

! In D108643#3027473, @erichkeane wrote:

! In D108643#3026550, @rjmccall wrote:

I think it would be better to provide a generic ABI specification that is designed to "lower" _BitInt types into more basic concepts that ABIs typically already have rules for. It would be appropriate to note at the top that this is only a generic specification and that the rules for any particular platform should be codified in a platform-specific ABI. But we should make that easy so that platform owners who don't have strong opinions about the ABI can just say "as described in version 1.4 of the generic ABI at https://clang.llvm.org/..."

Now, that is all independent of what we actually decide the ABI should be. But I do think that if we're going to make this available on all targets, we should strongly consider doing a "legalizing" lowering in the frontend: at ABI-exposed points (call arguments/parameters and memory accesses), _BitInt types should be translated into types that a naive backend can be trusted to handle in a way that matches the documented ABI. Individual targets can then opt in to using the natural lowerings if they promise to match the ABI, but that becomes a decision that they make to enable better optimization and code generation, not something that's necessary for correctness.

Don't we already have that? We work the same way a 'bool' does, that is, we zero/sign extend for parameters and return values? The backends then reliably up-scale these to the power-of-two/multiple-of-pointer.

There's two levels of a "legalizing" lowering.

On the specification level, we'd be formally saying that the ABI matches the ABI of some other C construct. For example, we might say that a signed _BitInt(14) argument is treated exactly as if the argument type was instead signed short (with whatever restriction we do or do not impose on the range). This is very nice on this level because, first, it's a rule that works generically for all targets without needing to care about the details of how arguments are assigned to registers or stack slots or whatever; and second, it means you're never adding new cases to the ABI that e.g. libffi would need to care about; and third, you can reliably match the ABI with manually-lowered C code from a compiler that doesn't even support _BitInts.

On the implementation level, we'd have to actually emit IR which will turn into code which matches that ABI. I think you are right that small integers already reliably work that way in LLVM, so that if you pass an i11 sext it'll get sign-extended to some width — what width exactly, I don't know, but probably the same width that i16 sext would be. Larger integers, I'm not sure. A legalizing lowering of big _BitInt argument would presumably say either that it's exactly the same as a struct containing all the chunks or that it's exactly the same as a series of arguments for each of the chunks. Those are two very different rules! Most ABIs won't "straddle" a single argument between registers and the stack, so e.g. if you'd need two registers and you only have one left then you exhaust the registers and go fully on the stack; but obviously you can get straddling if the lowering expands the sequence. Similarly, some ABIs will go indirect if the struct gets big enough, but you won't get that if the lowering expands. But on the other hand, if the rule is really that it's like a struct, well, some ABIs don't pass small structs in registers. Anyway, my point is that what exactly LLVM targets do if you give them an i117 might not match either of these possibilities reliably across targets. In any case, since we currently have to count registers in the frontend for the register-passing ABIs, we'll need to actually know what the targets do.

Hmm... now that I think I understand the concern, I'm not sure we're able to take on this level of commitment. The Clang ABI code is a bit of a mess in general, plus each target has its own code, right (In CodeGen/TargetInfo.cpp)?

Ideally we would have a generic handling of it that most ABI code would simply trigger.

Additionally, that would require breaking the Microsoft ABI again, and I thought we'd discussed trying not to make changes until Microsoft defined that ABI, right?

I didn't realize we were matching any other compilers at the moment. Or is that not what you mean?

Sorry, no, not what I meant, but I see how it is misleading. We currently have an ABI for _ExtInt (admittedly experimental as you said) on Windows that we will have to break at least 1 more time (when MS decides on the ABI). I thought at one point in this discussion we'd decided to keep us from breaking OUR experimental MS ABI as much as possible, so we only broke it 1x.

I think we need to consider the ABI for this experimental and unstable on all platforms.

I would like us to be able to find a way forward, and hope you can help make a suggestion on how to do so. We'd love to get Clang to support this C23 feature at least on some targets as there are a good amount of users of this feature already as _ExtInt, and we would really want to get them to the standards version ASAP. Do you have any suggestions on how we can move forward?

I think the immediate way forward is that we can continue to document this as an experimental feature with an unstable ABI, just with a new name. We do not need to tie these things together as long as we agree that the feature is experimental and the ABI might change. Perhaps it should be enabled by an experimental flag until the ABI is official?

I don't know if we should continue to document this as experimental. This is a language-standardized type that we'd ultimately like to be able to claim support for. I'm not sure on the progress of the ABI going 'official', but I think it has already been published, or nearly so? We've been engaging with HJ, who owns the Itanium ABI.

PS: The current state of the things for the 4 platforms I deal with (and of course the most complicated ones) is:

Ah, thank you for this.

<32 bit< 64 bit< 128 bit>128 bit
Linux x86_64s/zextalways pass as a coerced i64Pass as 2 i64s 'coerced' while in registers, then Ptr to iN (byval)Ptr to iN (Byval)
Windows x86_64pass as iNpass as iNPtr to iNPtr to iN
Linux i386s/zextpass as iNPtr to iNPtr to iN
Windows i386s/zextpass as iNPtr to iNPtr to iN
+ Your suggestion for ?all? as I understand it?s/zexts/zextSplit into i64s?Split into i64s?

Lowering semantics would use the platform's normal ABI rules for the lowered type. For example, if we lowered larger-than-fundamental _BitInts to a struct containing the chunks, then unsigned _BitInt(192) would become struct { uint32_t a, b, c, d, e, f; } on 32-bit (which IIUC on both Linux and Windows x86 would be passed directly on the stack, not by reference) and struct { uint64_t a, b, c; } on 64-bit (which IIUC on both Linux and Windows x86_64 would be passed by reference). The target lowering code would recognize _BitInt types and ask the normal code to handle the lowered type instead.

I think I have a better understanding of this? I perhaps need to ruminate further. I don't know if we'd be able to have a single place where we can do all of this, but it seems like this is a fairly non-trivial task. Aaron and I will discuss it further.

What you're describing above are completely custom target-specific rules, and in some cases they're novel rules that the ABIs don't normally use for other types. The standard i386 CCs basically never pass arguments by reference unless they have a non-trivially-copyable C++ type — the only exception is over-aligned types on Windows. Unless you don't really mean they're passed by pointer; if you're looking at IR, it can be misleading because of byval.

For the windows types, they are currently passed directly as iN*. The only platform that did the 'byval' was the linux 64 bit ABI. This is perhaps novel, but it IS what we do currently for '_ExtInt'.

I think the immediate way forward is that we can continue to document this as an experimental feature with an unstable ABI, just with a new name. We do not need to tie these things together as long as we agree that the feature is experimental and the ABI might change. Perhaps it should be enabled by an experimental flag until the ABI is official?

We've already documented that the ABI is being actively developed in the release notes, in the ABI document, and in LanguagateExtensions.rst, so the ABI information is widely available. I'm fine documenting the *ABI* as being unstable, but I want to make sure we don't give the impression that the *feature* is experimental (it's in a standard that is starting to be finalized for publication), so I'd be opposed to putting a standard feature behind a feature flag (beyond -std).

This is spiraling into a pretty large set of ongoing changes when the original intent was "rename the thing we were already happy with to its new, standard name with the same semantics". Would it make sense to separate some concerns here, as this is starting to block further work? At a high-level, I'm thinking: this patch does the rename, adds the release notes/documentation to be clear that the ABI is not yet stable for the feature, and a second patch is added for the ABI documentation and eventual codegen changes needed for it? This would unblock us from doing additional follow-up work (like adding the BITINT_MAXWIDTH macro to limits.h, which would not make sense without having a type named _BitInt) while giving us the Clang 14 release time frame to clean up the ABI requirements. WDYT?

Yes, I think it would be fine to fork off a conversation about what the ABI should be; I didn't meant to completely derail this patch. To be frank, my expectation was that more of this was already settled and documented, so we just needed to add that documentation and clean up a few details.

Stepping back, I have two concerns about the ABI that are deeper than just these open discussions of trade-offs, and which I need to feel comfortable about before I can really sign off on the implementation of this feature from a code-owner perspective. The first is that I want to make sure we're settling on well-defined ABI rules that aren't just "whatever the corresponding LLVM target happened to do in the summer of 2021 for loads, stores, and arguments of i117 type". The second is that I want to avoid adding a standard C feature on just a handful of platforms and then washing our hands of it. The latter is why I am pushing you in the direction of a "lowering" ABI: because it's a generic solution that gives ABI groups the option of simply signing off on with a pair of so-obvious-they-decide-themselves decisions about underlying types, allowing us to rapidly get consensus from a broad spectrum of ABI maintainers because, after all, it's just the ABI they already use for the same general content. I think that is important because, among other reasons, it is what the C committee has asked you to do as a condition of standardizing this feature. If ABI groups want to go the extra mile of learning about this specific feature and designing custom ABI rules like "pass large _BitInts by pointer even though literally nothing else in our ABI is passed by pointer", that's their prerogative. A large number of language features have ABIs specified by lowering, from block pointers (ABI equivalent to a normal object pointer) to C++ member pointers (ABI equivalent to either an integer or a small struct, depending), although the code in TargetInfo doesn't make that explicit (and really it should, because the way it's done now requires a ton of annoying duplication between targets).

aaron.ballman marked 4 inline comments as done.

Updated the ABI document based on review feedback.