This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/clang: Add builtins for llvm.amdgcn.ballot
ClosedPublic

Authored by arsenm on Jun 18 2020, 5:35 AM.

Details

Summary

I wasn't sure what the best strategy was for the wave size
difference. I went for an explicit, enforced builtin for each. The
other option would be to just assume wave64, and IRGen the different
mangling + zext. I didn't see an obvious way to check the wave size
where builtins are emitted, and it might be beneficial to force you to
acknowledge the wave size difference? Or it might be an unnecessary
complication.

The behavior is also slightly odd when directly specifying
-target-feature to cc1 for +/- the size (since we have both positive
and negative forms of both sizes), but this is probably unimportant.

We're also still missing a predefined macro for the wave size, which
we probably need.

Diff Detail

Event Timeline

arsenm created this revision.Jun 18 2020, 5:35 AM

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

clang/lib/Basic/Targets/AMDGPU.cpp
348

So the implication here is that wave32 is the preferred choice on newer architectures, and hence the default when available?

yaxunl added inline comments.Jul 9 2020, 5:45 AM
clang/lib/Basic/Targets/AMDGPU.cpp
353

what's the default wave front size in backend for gfx10* before this change?

clang/test/CodeGenOpenCL/amdgpu-features.cl
7

what happens if both +wavefrontsize32 and +wavefrontsize64 are specified?

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

This seems like a defect in the design to me

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

arsenm marked 2 inline comments as done.Jul 10 2020, 11:04 AM
arsenm added inline comments.
clang/lib/Basic/Targets/AMDGPU.cpp
348

Yes, this has always been the case

353

It's always been wave32

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

sameerds added a comment.EditedJul 12 2020, 12:19 AM

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

Not sure if the frontend should try to infer warpsize and the mask size, or even whether it can in all cases. But this can result in wrong behaviour when the program passes 32-bit mask but then gets compiled for a 64-bit mask. It's easy to say that the programmer must not assume a warp-size, but it would be useful if the language can altogether avoid the confusion.

The documentation for HIP __ballot seems to indicate that the user does not have to explicitly specify the warp size. How is that achieved with these new builtins? Can this be captured in a lit test here?

This seems like a defect in the design to me

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I tend to agree. The HIP vote/ballot builtins are also missing a mask parameter, whose type needs to match the wavesize.

yaxunl added a comment.EditedJul 12 2020, 6:09 AM

https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md#warp-vote-and-ballot-functions

I think if the language interface insists on fixing the wave size, then I think the correct solution is to implement this in the header based on a wave size macro (which we're desperately missing). The library implementation should be responsible for inserting the extension to 64-bit for wave32

Agree that FE should have a predefined macro for wave front size. Since it is per amdgpu target and not per language, it should be named as __amdgpu_wavefront_size__ or something similar, then it could be used by all languages.

Then we need to initialize warpSize in HIP header by this macro
https://github.com/ROCm-Developer-Tools/HIP/blob/386a0e0123d67b95b4c0ebb3ebcf1d1615758146/include/hip/hcc_detail/device_functions.h#L300

I tends to think we should define __ballot in HIP header conditionally by the wavefront size so that the return type is correct for both wave32 and wave64 mode. We should assume normal HIP compilation always have -mcpu specified so that wavefront size is known at compile time. If -mcpu is not specified probably we should not define warpSize or __ballot.

A macro for wavefront size would make targeting gfx10 for openmp easier.

We currently use an int32_t for nvptx and an int64_t for amdgcn in various runtime function interfaces. I'd like to be able to set the latter based on said macro.

Can we land this? I'd like to use the new intrinsics as I don't understand the old ones.

Herald added a project: Restricted Project. · View Herald TranscriptSep 17 2022, 4:43 PM
Herald added a subscriber: kosarev. · View Herald Transcript

Can we land this? I'd like to use the new intrinsics as I don't understand the old ones.

What do you think about using the two separate builtins, vs. one magic builtin that auto-changes the wavesize?

Can we land this? I'd like to use the new intrinsics as I don't understand the old ones.

What do you think about using the two separate builtins, vs. one magic builtin that auto-changes the wavesize?

The magic one would also change its return type, or always be i64 with high bits (zext or maybe sext or maybe copy of low), so less magic seems clearer

arsenm updated this revision to Diff 484980.Dec 22 2022, 3:53 PM

Rebase. Use _w32/_w64 suffixes since some other wave specific builtins seem to have gone with that convention

sameerds added inline comments.Dec 23 2022, 1:01 AM
clang/test/CodeGenOpenCL/amdgpu-features.cl
7

Shouldn't this be separately an error in itself? Is it tested elsewhere?

arsenm added inline comments.Dec 23 2022, 4:53 AM
clang/test/CodeGenOpenCL/amdgpu-features.cl
7

It looks like you end up with both features set by clang, and wave64 wins in codegen

arsenm planned changes to this revision.Dec 23 2022, 5:19 AM

This doesn't work correctly for unspecified wavesize for non-wave32 targets

arsenm updated this revision to Diff 485110.Dec 23 2022, 6:45 AM

Fix unknown target handling, diagnose some more of the errors

sameerds accepted this revision.Dec 26 2022, 12:39 AM
sameerds added inline comments.
clang/lib/Basic/Targets/AMDGPU.cpp
353

I would have preferred this to be a separate change, just like the FIXME for diagnosing wavefrontsize32 on targets that don't support it. But not feeling strongly enough to block this change!

This revision is now accepted and ready to land.Dec 26 2022, 12:39 AM

And note that the change description is written in a first-person train of thought. Please do rewrite it!