This is an archive of the discontinued LLVM Phabricator instance.

[HLSL] Disable integer promotion to avoid int16_t being promoted to int for HLSL.
ClosedPublic

Authored by python3kgae on Sep 11 2022, 6:05 AM.

Details

Summary

short will be promoted to int in UsualUnaryConversions.
Disable it for HLSL to keep int16_t as 16bit.

Diff Detail

Event Timeline

python3kgae created this revision.Sep 11 2022, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 11 2022, 6:05 AM
python3kgae requested review of this revision.Sep 11 2022, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 11 2022, 6:05 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Drive-by comment before I get into the review: does HLSL intend to follow the standard in terms of behavior of intN_t? If yes, then this doesn't follow the behavior allowed by the standard or the direction WG14 chose. We discussed https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2960.pdf at our Jul 2022 meeting, and this particular topic was part of that paper. The result of the preference poll was:

Opinion poll: Which stdint.h types should be allowed to be bit-precise integer types?
0) Leave it as is - [u]intN_t may not be bit-precise, but [u]intptr_t and [u]intmax_t are unclear.
 (no one asked for this direction)
1) None of [u]intN_t, [u]intptr_t and [u]intmax_t.
 9 / 5 / 4 (this direction)
2) None of [u]intN_t, [u]intptr_t and [u]intmax_t, unless they are wider than int.
 7 / 7 / 5 (not this direction)

So we decided explicitly to not allow intN_t to be defined in terms of a bit-precise integer type.

HLSL deviates from C here. HLSL doesn't actually have short (although I'm actually not sure we should disable it in Clang). We do have int16_t, but we don't promote int16_t to int. We discussed changing codegen to disable promotion for HLSL, but it seemed more straightforward to me to just define int16_t as _BitInt(16).

HLSL deviates from C here. HLSL doesn't actually have short (although I'm actually not sure we should disable it in Clang). We do have int16_t, but we don't promote int16_t to int. We discussed changing codegen to disable promotion for HLSL, but it seemed more straightforward to me to just define int16_t as _BitInt(16).

Okay, that's good to know! If you don't intend to *ever* conform to the standard in this area, then I think this approach is very reasonable. But you should know up front that you're constraining yourself here. (Changing the underlying type in the future is an ABI break: https://godbolt.org/z/P6ndrzMab, note the name mangling.)

clang/lib/Basic/Targets/DirectX.h
66 ↗(On Diff #459358)

This change requires more testing/thought, IMO -- do you support 128-bit operations? When we bump that limit to be arbitrarily high, should DX have the arbitrary limits or do you need to enforce something lower? Have you considered how you want to pack these into structures or other data layout considerations?

beanz added a comment.Sep 12 2022, 8:56 AM

Okay, that's good to know! If you don't intend to *ever* conform to the standard in this area, then I think this approach is very reasonable. But you should know up front that you're constraining yourself here. (Changing the underlying type in the future is an ABI break: https://godbolt.org/z/P6ndrzMab, note the name mangling.)

We have the benefit of ABI escape hatches. HLSL itself doesn't define a permanently stable ABI since GPU hardware and runtime ABIs change too frequently. We instead revision our ABI every few years as the DirectX and Vulkan specifications evolve.

My hope is that as the HLSL language and our runtime ABIs evolve we'll be more and more conformant to the C standard, but there are some odd areas that we might never quite get there on.

The 16-bit integer math is an interesting case. Because GPUs are inherently SIMD machines, on many architectures you can handle twice as many 16-bit operations per instruction as 32-bit (yay vectors!). Combine that with HLSL's SPMD programming model and all scalar math is actually vector math. This makes integer promotion for 16-bit types severely limiting. As a result I don't suspect we'll ever want to conform to C here.

Okay, that's good to know! If you don't intend to *ever* conform to the standard in this area, then I think this approach is very reasonable. But you should know up front that you're constraining yourself here. (Changing the underlying type in the future is an ABI break: https://godbolt.org/z/P6ndrzMab, note the name mangling.)

We have the benefit of ABI escape hatches. HLSL itself doesn't define a permanently stable ABI since GPU hardware and runtime ABIs change too frequently. We instead revision our ABI every few years as the DirectX and Vulkan specifications evolve.

My hope is that as the HLSL language and our runtime ABIs evolve we'll be more and more conformant to the C standard, but there are some odd areas that we might never quite get there on.

The 16-bit integer math is an interesting case. Because GPUs are inherently SIMD machines, on many architectures you can handle twice as many 16-bit operations per instruction as 32-bit (yay vectors!). Combine that with HLSL's SPMD programming model and all scalar math is actually vector math. This makes integer promotion for 16-bit types severely limiting. As a result I don't suspect we'll ever want to conform to C here.

Ah, good to know!

Btw, it looks like precommit CI is finding failures here.

Okay, that's good to know! If you don't intend to *ever* conform to the standard in this area, then I think this approach is very reasonable. But you should know up front that you're constraining yourself here. (Changing the underlying type in the future is an ABI break: https://godbolt.org/z/P6ndrzMab, note the name mangling.)

We have the benefit of ABI escape hatches. HLSL itself doesn't define a permanently stable ABI since GPU hardware and runtime ABIs change too frequently. We instead revision our ABI every few years as the DirectX and Vulkan specifications evolve.

My hope is that as the HLSL language and our runtime ABIs evolve we'll be more and more conformant to the C standard, but there are some odd areas that we might never quite get there on.

The 16-bit integer math is an interesting case. Because GPUs are inherently SIMD machines, on many architectures you can handle twice as many 16-bit operations per instruction as 32-bit (yay vectors!). Combine that with HLSL's SPMD programming model and all scalar math is actually vector math. This makes integer promotion for 16-bit types severely limiting. As a result I don't suspect we'll ever want to conform to C here.

Ah, good to know!

Btw, it looks like precommit CI is finding failures here.

It is strange that the tests passed locally when CI hit fail in this PR. But in https://reviews.llvm.org/D133634, I hit fail locally while CI passed all tests.
I'll check and fix local failures I can repro first.

Okay, that's good to know! If you don't intend to *ever* conform to the standard in this area, then I think this approach is very reasonable. But you should know up front that you're constraining yourself here. (Changing the underlying type in the future is an ABI break: https://godbolt.org/z/P6ndrzMab, note the name mangling.)

We have the benefit of ABI escape hatches. HLSL itself doesn't define a permanently stable ABI since GPU hardware and runtime ABIs change too frequently. We instead revision our ABI every few years as the DirectX and Vulkan specifications evolve.

My hope is that as the HLSL language and our runtime ABIs evolve we'll be more and more conformant to the C standard, but there are some odd areas that we might never quite get there on.

The 16-bit integer math is an interesting case. Because GPUs are inherently SIMD machines, on many architectures you can handle twice as many 16-bit operations per instruction as 32-bit (yay vectors!). Combine that with HLSL's SPMD programming model and all scalar math is actually vector math. This makes integer promotion for 16-bit types severely limiting. As a result I don't suspect we'll ever want to conform to C here.

Ah, good to know!

Btw, it looks like precommit CI is finding failures here.

It is strange that the tests passed locally when CI hit fail in this PR. But in https://reviews.llvm.org/D133634, I hit fail locally while CI passed all tests.
I'll check and fix local failures I can repro first.

Might be some issue with the stack PR. Should be OK once rebase with https://reviews.llvm.org/D133634.

beanz added inline comments.Sep 14 2022, 11:13 AM
clang/lib/Basic/Targets/DirectX.h
66 ↗(On Diff #459358)

Yea, we definitely need to set the max width to 64 for DirectX.

Rebase and update test.

Adding some codegen reviewers for awareness.

clang/lib/Basic/Targets/DirectX.h
66 ↗(On Diff #459358)

Nothing seems to have handled this comment yet. Be sure to add a Sema test for that as well.

Limit max bitint width to 64 for HLSL.

python3kgae marked 3 inline comments as done.Sep 19 2022, 12:37 PM

LGTM, but please wait a bit to land for the codegen reviewers to weigh in on those tests.

Hi, @rjmccall and @efriedma,
Could you take a look at this PR?
Or should I find someone else to review it?

Thanks
Xiang

If your goal is just to pass 16-bit types in some specific way without promotion, you can just do that in the ABI. C only requires short arguments to be promoted to int in variadic or unprototyped positions, and it's not legal under the C type compatibility rules to call a function prototyped with a short parameter using an unprototyped function type. (Not that I expect that you care deeply about supporting unprototyped function calls in HLSL anyway.) So if you're looking an existing targets and feeling constrained by their ABI decision to promote small integer types, be aware that you can simply decide not to do that; those targets are mostly constrained by very old decisions that were informed by the state of languages and libraries a long time, and those decisions do not need to be carried forward to new targets.

But if you're sure you want to do this this way, your tests look fine.

beanz added a comment.Oct 9 2022, 1:24 PM

Avoiding argument promotion is one part of what we need, but not all of it. For example if you take this trial code:

const RWBuffer<int16_t2> In;
RWBuffer<int16_t> Out;

[numthreads(1,1,1)]
void main(uint GI : SV_GroupIndex) {
  Out[GI] = In[GI].x + In[GI].y;
}

Following C rules, clang promotes the short math to int math, so the IR for main looks like:

; Function Attrs: noinline norecurse nounwind optnone
define internal void @"?main@@YAXI@Z"(i32 noundef %GI) #2 {
entry:
  %GI.addr = alloca i32, align 4
  store i32 %GI, ptr %GI.addr, align 4
  %0 = load i32, ptr %GI.addr, align 4
  %call = call noundef nonnull align 4 dereferenceable(4) ptr @"??A?$RWBuffer@T?$__vector@F$01@__clang@@@hlsl@@QBAAAT?$__vector@F$01@__clang@@I@Z"(ptr noundef nonnull align 4 dereferenceable(4) @In, i32 noundef %0)
  %1 = load <2 x i16>, ptr %call, align 4
  %2 = extractelement <2 x i16> %1, i32 0
  %conv = sext i16 %2 to i32
  %3 = load i32, ptr %GI.addr, align 4
  %call1 = call noundef nonnull align 4 dereferenceable(4) ptr @"??A?$RWBuffer@T?$__vector@F$01@__clang@@@hlsl@@QBAAAT?$__vector@F$01@__clang@@I@Z"(ptr noundef nonnull align 4 dereferenceable(4) @In, i32 noundef %3)
  %4 = load <2 x i16>, ptr %call1, align 4
  %5 = extractelement <2 x i16> %4, i32 1
  %conv2 = sext i16 %5 to i32
  %add = add nsw i32 %conv, %conv2
  %conv3 = trunc i32 %add to i16
  %6 = load i32, ptr %GI.addr, align 4
  %call4 = call noundef nonnull align 2 dereferenceable(2) ptr @"??A?$RWBuffer@F@hlsl@@QAAAAFI@Z"(ptr noundef nonnull align 4 dereferenceable(4) @"?Out@@3V?$RWBuffer@F@hlsl@@A", i32 noundef %6)
  store i16 %conv3, ptr %call4, align 2
  ret void
}

Because of the implicit vector nature of HLSL, these promotions and truncations would be extremely expensive. Using _BitInt allows us a language-header only solution that opts HLSL's [u]int16_t out of Sema::UsualUnaryConversions.

Sure, but it's extremely easy to unpromote that arithmetic for most operators, and I'm sure LLVM already has a pass that will do that. The only thing that blocks unpromotion is when you feed a promoted result into something that's sensitive to working on a wider value, like comparison or division, or of course an assignment to a wider type.

Alternatively, if you're worried about your ability to unpromote, and since you're breaking strict conformance anyway, have you considered just removing the initial promotion to int from the usual arithmetic conversions? I'm pretty sure the rest of the rules hang together just fine if you do. Then you have a uniform rule that permits easier vectorization for all small integer types rather than being sensitive specifically to using the int16_t typedef.

beanz added a comment.Oct 10 2022, 1:47 PM

Sure, but it's extremely easy to unpromote that arithmetic for most operators, and I'm sure LLVM already has a pass that will do that.

Okay... but HLSL explicitly doesn't promote. Having the compiler rely on an optimization to generate correct code is undesirable. Especially since we want debug builds to behave correctly.

Alternatively, if you're worried about your ability to unpromote, and since you're breaking strict conformance anyway, have you considered just removing the initial promotion to int from the usual arithmetic conversions? I'm pretty sure the rest of the rules hang together just fine if you do. Then you have a uniform rule that permits easier vectorization for all small integer types rather than being sensitive specifically to using the int16_t typedef.

HLSL isn't conformant to C or C++. One of the big areas that things get wonky is that every int is actually an implicit SIMD vector of a hardware-determined size.

We could disable promotion for HLSL. That is what we do in DXC. Part of what made me think BitInt was a better solution is that HLSL doesn't have the short keyword at all. The only 16-bit int types we support are the [u]int16_t explicit-sized typedefs. If you think disabling promotion under the HLSL language mode is a better approach, we can do that instead.

Sure, but it's extremely easy to unpromote that arithmetic for most operators, and I'm sure LLVM already has a pass that will do that.

Okay... but HLSL explicitly doesn't promote. Having the compiler rely on an optimization to generate correct code is undesirable. Especially since we want debug builds to behave correctly.

Alternatively, if you're worried about your ability to unpromote, and since you're breaking strict conformance anyway, have you considered just removing the initial promotion to int from the usual arithmetic conversions? I'm pretty sure the rest of the rules hang together just fine if you do. Then you have a uniform rule that permits easier vectorization for all small integer types rather than being sensitive specifically to using the int16_t typedef.

HLSL isn't conformant to C or C++. One of the big areas that things get wonky is that every int is actually an implicit SIMD vector of a hardware-determined size.

But that's purely on the implementation level, right? Everything is implicitly vectorized and you're just specifying the computation of a single lane, but as far as that lane-wise computation is concerned, you just have expressions which produce scalar values.

We could disable promotion for HLSL. That is what we do in DXC. Part of what made me think BitInt was a better solution is that HLSL doesn't have the short keyword at all. The only 16-bit int types we support are the [u]int16_t explicit-sized typedefs. If you think disabling promotion under the HLSL language mode is a better approach, we can do that instead.

If you don't otherwise have 16-bit (or 8-bit?) types, and it's the type behavior you want, I'm fine with you just using _BitInt. I just want to make sure I understand the situation well enough to feel confident that you've considered the alternatives.

beanz added a comment.Oct 10 2022, 2:34 PM

But that's purely on the implementation level, right? Everything is implicitly vectorized and you're just specifying the computation of a single lane, but as far as that lane-wise computation is concerned, you just have expressions which produce scalar values.

Yes, with the caveat that the language doesn't dictate the maximum SIMD width, but some features have minimum widths. The language source (and IR) operate on one lane of scalar and vector values, but we do have cross-SIMD lane operations, and true scalar (uniform) values, so we have to model the full breadth of parallel fun...

If you don't otherwise have 16-bit (or 8-bit?) types, and it's the type behavior you want, I'm fine with you just using _BitInt. I just want to make sure I understand the situation well enough to feel confident that you've considered the alternatives.

We don't currently have 8-bit types (although I fully expect someone will want them because ML seems to love small data types). I suspect that the promoting integer behaviors for types smaller than int will likely never make sense for HLSL (or really any SPMD/implicit-SIMD) programming model).

My inclination is that we should define all our smaller than int fixed-size integer types as _BitInt to opt-out of promotion. Along with that I expect that we'll disable short under HLSL. We will still have char, but the intent for char is really for the _extremely_ limited cases where strings get used (i.e. printf for debugging).

If you have char, would you want it to promote? Because turning char to _BitInt(8) is breaking with C on other grounds (like aliasing), for better or worse. So if you just don't want promotion, maybe you really should just disable promotion.

Switch back to short and disable integer promote for hlsl.

python3kgae retitled this revision from [HLSL] Use _BitInt(16) for int16_t to avoid promote to int. to [HLSL] Disable int16_t to avoid promote to int for HLSL..Oct 18 2022, 10:47 AM
python3kgae edited the summary of this revision. (Show Details)

If you have char, would you want it to promote? Because turning char to _BitInt(8) is breaking with C on other grounds (like aliasing), for better or worse. So if you just don't want promotion, maybe you really should just disable promotion.

Hi @rjmccall,
Thanks for the review.
I've switched back to short and disabled the integer promotion for HLSL.

Okay, this seems fine to me. I think you accidentally removed the word "promotion" from the patch title, though.

I assume HLSL also provides uint16_t, and you should test the behavior for it. You should also test this for whatever 8-bit types HLSL provides (char, signed char, and unsigned char, I assume, but maybe also int8_t and uint8_t?).

clang/lib/Sema/SemaExpr.cpp
841
clang/test/CodeGenHLSL/basic_types.hlsl
7 ↗(On Diff #468617)

Why the removal of the alignment?

python3kgae marked an inline comment as done.

Update comment and fix tests.

python3kgae retitled this revision from [HLSL] Disable int16_t to avoid promote to int for HLSL. to [HLSL] Disable integer promotion to avoid int16_t being promoted to int for HLSL..Oct 18 2022, 4:09 PM
aaron.ballman added inline comments.Oct 19 2022, 5:09 AM
clang/lib/Sema/SemaExpr.cpp
841

Hmm, the downside to the change being applied here is that isPromotableIntegerType() is a lie for HLSL and other callers are going to be surprised. I think a better way to achieve this same goal is to move isPromotableIntegerType() from Type to ASTContext so that it has access to the language options, and then hoist the language mode check so that isPromotableIntegerType() returns false for integer types in HLSL.

python3kgae marked an inline comment as done.

Move isPromotableIntegerType to ASTContext so it can access LangOpts.

aaron.ballman accepted this revision.Oct 20 2022, 8:18 AM

LGTM! Please give @rjmccall a few days to look it over as well before landing.

clang/include/clang/AST/ASTContext.h
2379 ↗(On Diff #468993)
This revision is now accepted and ready to land.Oct 20 2022, 8:18 AM

Update test.

Fix clang-format.