Page MenuHomePhabricator

[AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
ClosedPublic

Authored by saiislam on May 29 2020, 8:04 AM.

Details

Summary

builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)

First and second arguments gets transparently passed to the amdgcn atomic
inc/dec intrinsic. Fifth argument of the intrinsic is set as true if the
first argument of the builtin is a volatile pointer. The third argument of
this builtin is one of the memory-ordering specifiers ATOMIC_ACQUIRE,
ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST following C++11 memory
model semantics. This is mapped to corresponding LLVM atomic memory ordering
for the atomic inc/dec instruction using CLANG atomic C ABI. The fourth
argument is an AMDGPU-specific synchronization scope defined as string.

Diff Detail

Event Timeline

saiislam created this revision.May 29 2020, 8:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2020, 8:04 AM
arsenm added inline comments.May 29 2020, 8:34 AM
clang/lib/CodeGen/CGBuiltin.cpp
14579

Should not rely on pointer element type, these are always i32 I think

14583–14584

We should fix this (or move these into atomicrmw)

14588

Should this come from whether the builtin was called with a volatile pointer rather than being an explicit parameter?

arsenm requested changes to this revision.May 29 2020, 8:35 AM
This revision now requires changes to proceed.May 29 2020, 8:35 AM

Thanks for this. I like the refactor to share code with amdgcn_fence. Agreed with Matt's points above.

sameerds added a comment.EditedMay 29 2020, 9:09 AM

The commit description needs fixing. These are not "llvm instructions" they are "AMDGCN intrinsics". They don't exist in the LangRef. Also, I would recommend inverting the first line of the description: "Introduce Clang builtins that are mapped to AMDGCN intrinsics".

EDIT: Another nitpick in the description: it's the Clang C ABI, not LLVM C ABI.

sameerds added inline comments.May 29 2020, 9:26 AM
clang/lib/CodeGen/CGBuiltin.cpp
14583–14584

I am not sure why these intrinsics exist as separate from atomicrmw. But while they do, taking a numerical scope is not a problem since they are target-specific. The LLVM instructions take scope as an opaque string just to keep target-specific bits out of the IR.

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

saiislam updated this revision to Diff 267295.May 29 2020, 10:33 AM
  1. Updated title and description.
  2. Replaced pointer element type usage to i32 type for getIntrinsic.
  3. Volatile argument of the instrinsic now comes from pointer type of the first argument of the builtin.
  4. Updated all test cases wrt above change.
  5. Added test case for volatile pointer type.
saiislam retitled this revision from [AMDGPU] Expose llvm atomic inc/dec instructions as clang builtins for AMDGPU target to [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.May 29 2020, 10:37 AM
saiislam edited the summary of this revision. (Show Details)

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

At the moment, atomic inc/dec exist along with atomicrmw. This patch only aims to devise a way to make it (inc/dec) accessible from the language level.

arsenm added inline comments.May 29 2020, 11:32 AM
clang/lib/CodeGen/CGBuiltin.cpp
14579

Nope, they can be i64 (but I assume these aren't overloadable builtins, so would need a separate pair)

Actually, the question really is about why inc/dec are needed as separate operations either as IR intrinsics or Clang builtins. Why not just expose a __builtin_amdgcn_atomicrmw that takes a scope, and map it to the LLVM atomicrmw? That would be way cleaner. The language can provide convenience functions for inc/dec that internally call the rmw builtin.

At the moment, atomic inc/dec exist along with atomicrmw. This patch only aims to devise a way to make it (inc/dec) accessible from the language level.

Just to clarify that a bit, the point is that atomic inc/dec are not available as operations in atomicrmw, and cannot be treated as an optimization of (add 1) or (sub 1). So the intrinsics are required until atomicrmw is enhanced, which is out of scope for this patch.

sameerds accepted this revision.May 29 2020, 10:23 PM

LGTM, but please wait for approval from @arsenm

Thank you @sameerds for the clarification. Sure, I will wait for @arsenm to review it.

arsenm added inline comments.Jun 4 2020, 8:23 AM
clang/include/clang/Basic/BuiltinsAMDGPU.def
62–63

My main concern is we should probably have both 32 and 64-bit variants

saiislam updated this revision to Diff 269013.Jun 6 2020, 5:05 AM
  1. Added 64 bit variants of inc/dec as well
  2. Added test cases for 64 bit variants
saiislam edited the summary of this revision. (Show Details)Jun 6 2020, 5:07 AM
arsenm added inline comments.Jun 8 2020, 12:31 PM
clang/lib/CodeGen/CGBuiltin.cpp
14565

This should be implied by the return type of the builtin? You shouldn't need to switch over it, and you just need to switch between inc/dec intrinsics

saiislam updated this revision to Diff 269327.Jun 8 2020, 12:52 PM

Atomic inc/dec operation width is now based on return type of the builtin.

arsenm accepted this revision.Tue, Jun 9, 9:02 AM

LGTM

This revision is now accepted and ready to land.Tue, Jun 9, 9:02 AM
This revision was automatically updated to reflect the committed changes.

This patch declares the clang builtin as acting on signed values, but the IR intrinsic maps to an instruction which does an unsigned comparison. We don't have ISA support for a signed comparison equivalent. Addition is the same operation on signed or unsigned integers, but signed integer comparison is not equivalent to unsigned integer comparison.

// 32bit
 tmp = MEM[ADDR];
 MEM[ADDR] = (tmp >= DATA) ? 0 : tmp + 1; // unsigned
compare
 RETURN_DATA = tmp.

The builtins should be changed to take unsigned values, optionally making that clear from the naming scheme, perhaps __amdgcn_builtin_atomic_dec_u32.

Apologies for not reviewing this the first time around.

This patch declares the clang builtin as acting on signed values, but the IR intrinsic maps to an instruction which does an unsigned comparison. We don't have ISA support for a signed comparison equivalent. Addition is the same operation on signed or unsigned integers, but signed integer comparison is not equivalent to unsigned integer comparison.

// 32bit
 tmp = MEM[ADDR];
 MEM[ADDR] = (tmp >= DATA) ? 0 : tmp + 1; // unsigned
compare
 RETURN_DATA = tmp.

The builtins should be changed to take unsigned values, optionally making that clear from the naming scheme, perhaps __amdgcn_builtin_atomic_dec_u32.

Apologies for not reviewing this the first time around.

I have created D83121 for this.