Download Raw Diff

Details

Reviewers

jdoerfert
ABataev
hfinkel
grokos
arsenm

Commits

rG133db44996a8: [libomptarget] Implement most hip atomic functions in terms of intrinsics

Summary

[libomptarget] Implement hip atomic functions in terms of intrinsics

All but atomicInc can be implemented using type generic clang intrinsics.
There is not yet a corresponding intrinsic for atomicInc in clang, only one in
LLVM. This patch leaves atomicInc as an unresolved symbol.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JonChesterfield created this revision.Jan 20 2020, 6:05 PM

Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2020, 6:05 PM

Herald added subscribers: openmp-commits, jfb, mgorny, jvesely. · View Herald Transcript

Harbormaster completed remote builds in B44447: Diff 239220.Jan 20 2020, 6:07 PM

Is is necessary, or useful, to have 3 different kinds of intrinsics here? We have __atomic_fetch_max and __atomic_max_fetch in clang, unsure if they could replace the OpenCL intrinsic. Similarly, there is __atomic_compare_exchange and __atomic_exchange. I would also be fine with only amdgcn intrinsics or only opencl ones but I am a little worried about mixing these 3 kinds.

openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt
73 ↗	(On Diff #239220)	Add a TODO here and in the ll file to state this is a temporary workaround. Also state the proper solution please.

jdoerfert added a reviewer: arsenm.Jan 21 2020, 3:00 PM

Herald added a subscriber: wdng. · View Herald TranscriptJan 21 2020, 3:00 PM

In D73076#1832492, @jdoerfert wrote:

Is is necessary, or useful, to have 3 different kinds of intrinsics here? We have __atomic_fetch_max and __atomic_max_fetch in clang, unsure if they could replace the OpenCL intrinsic. Similarly, there is __atomic_compare_exchange and __atomic_exchange. I would also be fine with only amdgcn intrinsics or only opencl ones but I am a little worried about mixing these 3 kinds.

I share this concern. I especially dislike the forced cast to _Atomic T* for the opencl intrinsics as that's skirting a bit close to an aliasing violation.

atomic_fetch_max I missed because gcc doesn't document it. In clang, it only compiles for 32 bit integer types (and we have a use of 64 bit in loop.cu). Some context in D46386.

I remember seeing different codegen for the opencl compare_exchange and the builtin but am not able to reproduce that now. There was some trial and error choosing intrinsics while comparing the codegen to hand written IR.

How about generic intrinsics where available:

fetch_add
exchange
compare_exchange

Amdgcn prefixed IR functions for:

atomic_inc
atomic_fetch_max

with the intent to provide amdgcn intrinsics for inc and fetch_max?

Prefer non-opencl atomics where available, comment the IR function

Harbormaster completed remote builds in B44534: Diff 239446.Jan 21 2020, 3:45 PM

I like this better. Interestingly, we could even think about sharing the clang builtin versions across targets. I'm fine with us keeping it this way for now. What do u think?

openmp/libomptarget/deviceRTLs/amdgcn/src/hip_atomics.h
53	Add a TODO where we want different builtins later on.

arsenm added inline comments.Jan 21 2020, 5:49 PM

openmp/libomptarget/deviceRTLs/amdgcn/src/atomic_inc.ll
12 ↗	(On Diff #239446)	We should also have a builtin for this. It would not be difficult to add
15 ↗	(On Diff #239446)	This should be false. Where are you seeing this imply volatile? I think we started hacking around OpenCL volatile for atomics in device-libs

review comments

Harbormaster completed remote builds in B44547: Diff 239475.Jan 21 2020, 6:38 PM

comment

Harbormaster completed remote builds in B44548: Diff 239476.Jan 21 2020, 6:38 PM

JonChesterfield added inline comments.Jan 21 2020, 6:39 PM

openmp/libomptarget/deviceRTLs/amdgcn/src/atomic_inc.ll
12 ↗	(On Diff #239446)	Agreed and hopefully. It takes a lot of parameters so I was unsure about testing the codegen. Are you willing to review such a patch?
15 ↗	(On Diff #239446)	Fun story. We were using functions from hc_atomic.ll though a wrapper and those functions specified volatile. This does the same to preserve that behaviour. However, dropping volatile from all of these functions doesn't actually break the build. Compiles about as cleanly as before, same set of tests pass. Therefore dropped the qualifier from all of these functions.

arsenm added inline comments.Jan 21 2020, 6:39 PM

openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_atomic.ll
20 ↗	(On Diff #239476)	There must be an existing builtin for this?

JonChesterfield marked 2 inline comments as done.Jan 21 2020, 6:42 PM

JonChesterfield added inline comments.

openmp/libomptarget/deviceRTLs/common/src/loop.cu
796 ↗	(On Diff #239476)	Buffer and elem are already uint64_t, and unsigned long long == 8 bytes on nvptx and amdgcn, so I'm not sure why there was a cast here.

JonChesterfield marked an inline comment as done.Jan 21 2020, 6:43 PM

JonChesterfield added inline comments.

openmp/libomptarget/deviceRTLs/amdgcn/src/amdgcn_atomic.ll
20 ↗	(On Diff #239476)	There is an opencl builtin, but it wants the pointer type cast to an atomic. I'm reluctant to use that when the underlying type is not atomic qualified. Used in the previous version of this diff. There is also a clang builtin, but only for 32 bits.

JonChesterfield added a subscriber: saiislam.Feb 4 2020, 3:32 AM

Discussed in the biweekly openmp call. Agreed that we should write the clang intrinsics first as deviceRTL isn't on the critical path for offloading to amdgcn anyway. I'll update this patch once they're available.

Use atomic max, reformat

Atomic max is implemented on trunk, as of D55562. Thanks Tim! I didn't notice it land. And failed to spot the difference between fetch_max and max_fetch.

Generates the same IR as the previous revision, updated diff accordingly.

Harbormaster completed remote builds in B46358: Diff 244279.Feb 12 2020, 2:15 PM

Checked cuda documentation, replace max_fetch with fetch_max to match their semantics

Harbormaster completed remote builds in B46359: Diff 244280.Feb 12 2020, 2:50 PM

sri added a subscriber: sri.Feb 12 2020, 8:21 PM

I have an amdgcn intrinsic for atomic_inc working without any semantic checks. SemaChecking is rather complicated though so it's going to take a little while to work out what changes need to go there.

Simplify, drop atomicInc impl

At some point in the future we may want to change over to using atomic functions that closer match the C++ ones, and let nvptx implement those in terms of cuda.

This patch is now very minimal - map the few cuda style atomic functions used onto clang's intrinsics, where available.

Harbormaster failed remote builds in B47938: Diff 247956!Mar 3 2020, 11:57 AM

clang-tidy failed because it tried to compile this file for x86, which doesn't work, and because it doesn't like the naming conventions. I can do T r; => T R; and capitalise various names if we like, but the unknown attributes and error on looking for -DAMDGCN will need an infrastructure change to resolve.

I'm fine with this, @arsenm ?

arsenm accepted this revision.Mar 4 2020, 9:29 AM

This revision is now accepted and ready to land.Mar 4 2020, 9:29 AM

Closed by commit rG133db44996a8: [libomptarget] Implement most hip atomic functions in terms of intrinsics (authored by JonChesterfield). · Explain WhyMar 4 2020, 10:06 AM

This revision was automatically updated to reflect the committed changes.

saiislam added a child revision: D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.May 29 2020, 8:06 AM

saiislam removed a child revision: D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics.Jul 3 2020, 4:05 AM

Only one of the atomicInc functions is used and clang (trunk) now has an intrinsic that can be used to implement it.

Herald added subscribers: sstefan1, yaxunl. · View Herald TranscriptJul 3 2020, 4:56 AM

This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Implement most hip atomic functions in terms of intrinsics
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 248230

openmp/libomptarget/deviceRTLs/amdgcn/src/hip_atomics.h

This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] Implement most hip atomic functions in terms of intrinsicsClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 248230

openmp/libomptarget/deviceRTLs/amdgcn/src/hip_atomics.h

[libomptarget] Implement most hip atomic functions in terms of intrinsics
ClosedPublic