Page MenuHomePhabricator

Expose llvm fence instruction as clang intrinsic
ClosedPublic

Authored by saiislam on Mar 10 2020, 5:59 AM.

Details

Summary

Expose llvm fence instruction as clang builtin for AMDGPU target

__builtin_amdgcn_fence(unsigned int memoryOrdering, const char *syncScope)

The first argument of this builtin is one of the memory-ordering specifiers
ATOMIC_ACQUIRE, ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST
following C++11 memory model semantics. This is mapped to corresponding
LLVM atomic memory ordering for the fence instruction using LLVM atomic C
ABI. The second argument is an AMDGPU-specific synchronization scope
defined as string.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
clang/lib/CodeGen/CGBuiltin.cpp
3734

Interesting, fence can't be relaxed

‘fence’ instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings.

clang/lib/Sema/SemaChecking.cpp
1894

This should reject 'relaxed' - I think that's currently accepted by sema then silently thrown away by codegen

clang/test/SemaOpenCL/atomic-ops.cl
198 ↗(On Diff #249340)

A happy case too please, e.g. to show that it accepts a couple of integers. Looks like __builtin_memory_fence(2, 2); but without an expected-error comment

clang/include/clang/Basic/Builtins.def
1516 ↗(On Diff #249340)

BUILTIN(__builtin_memory_fence, "vii", "n")?

The other fence intrinsics (e.g. __c11_atomic_thread_fence) take signed integers and rely on the built in type checking, which seems reasonable here too

@jdoerfert this is one of the two intrinsics needed to drop the .ll source from the amdgcn deviceRTL. The other is atomic_inc.

Anastasia added inline comments.Mar 10 2020, 9:18 AM
clang/lib/Sema/SemaChecking.cpp
1885

I think we should accept implicit type conversion rules from the language...

The commit summary needs improvement. The syntax is not really necessary there, but instead this needs a better explanation of how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

clang/include/clang/Basic/DiagnosticSemaKinds.td
7860–7863 ↗(On Diff #249340)

Just above this addition, atomic op seems to emit a warning for invalid memory order. Should that be the case with fence too?

clang/lib/CodeGen/CGBuiltin.cpp
3728

The proposed builtin does not claim to be an OpenCL builtin, so it's probably not correct to simply assume the OpenCL model. Should the model be chosen based on the source language specified?

how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

Identical to the fence instruction. Which is assumed well thought through already, given it's an IR instruction.

As far as I can tell, fence composes sensibly with other IR then generates the right thing at the back end. So it looks fit for purpose, just not currently available from clang.

clang/lib/CodeGen/CGBuiltin.cpp
3728

The only values for AtomicScopeModelKind are none and OpenCL.

how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

Identical to the fence instruction. Which is assumed well thought through already, given it's an IR instruction.

As far as I can tell, fence composes sensibly with other IR then generates the right thing at the back end. So it looks fit for purpose, just not currently available from clang.

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this not seem to work with C++, which has no scopes and nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

JonChesterfield added a comment.EditedMar 11 2020, 10:12 AM

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this not seem to work with C++, which has no scopes and nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

I don't follow. IR has a fence instruction. This builtin maps directly to it, passing whatever integer arguments were given to the intrinsic along unchanged. It's exactly as valid, or invalid, as said fence instruction.

Are you objecting to passing enums in the test cases instead of raw integers?

Or is your issue with the fence instruction itself?

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this not seem to work with C++, which has no scopes and nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

I don't follow. IR has a fence instruction. This builtin maps directly to it, passing whatever integer arguments were given to the intrinsic along unchanged. It's exactly as valid, or invalid, as said fence instruction.

Is it really? The scope argument of the IR fence is a target-specific string:
http://llvm.org/docs/LangRef.html#syncscope

The change that I see here is assuming a numerical argument, and also assuming that the numbers used must conform to the OpenCL enum. That would certainly make the builtin quite different from the IR fence.

Is it really? The scope argument of the IR fence is a target-specific string:
http://llvm.org/docs/LangRef.html#syncscope

The change that I see here is assuming a numerical argument, and also assuming that the numbers used must conform to the OpenCL enum. That would certainly make the builtin quite different from the IR fence.

I think I follow. The syncscope takes a string, therefore the builtin that maps onto fence should also take a string for that parameter? That's fine by me. Will help if a new non-opencl syncscope is introduced later.

I think I follow. The syncscope takes a string, therefore the builtin that maps onto fence should also take a string for that parameter? That's fine by me. Will help if a new non-opencl syncscope is introduced later.

Right. To be precise, it is a target-specific string, and should not be processed as if it was an OpenCL scope. The builtin should allow anything that the IR fence would allow in a .ll file created for the specified target.

Can this be revived? Changing the enum to a string still sounds good to me

Please go ahead and update to a string for the scope.

saiislam updated this revision to Diff 255173.Apr 5 2020, 9:45 AM

Removed OpenCL specific dependencies

Now it takes target-specific sync scope as an string.

sameerds requested changes to this revision.Apr 5 2020, 9:04 PM
sameerds added inline comments.
clang/include/clang/Basic/Builtins.def
1583 ↗(On Diff #255173)

This should be moved to be near line 786, where atomic builtins are listed under the comment "// GCC does not support these, they are a Clang extension."

clang/lib/CodeGen/CGBuiltin.cpp
14145

There should no mention of any high-level language here. The correct enum to validate against is llvm::AtomicOrdering from llvm/Support/AtomicOrdering.h, and not the C ABI or any other language ABI.

14166

This seems to be creating a new ID for any arbitrary string passed as sync scope. This should be validated against LLVMContext::getSyncScopeNames().

clang/test/CodeGenHIP/builtin_memory_fence.cpp
5 ↗(On Diff #255173)

There should be a line that tries to do:

__builtin_memory_fence(__ATOMIC_SEQ_CST, "foobar");
9 ↗(On Diff #255173)

Orderings like __ATOMIC_SEQ_CST are defined for C/C++ memory models. They should not be used with the new builtin because this new builtin does not follow any specific language model. For user convenience, the right thing to do is to introduce new tokens in the Clang preprocessor, similar to the __ATOMIC_* tokens. The convenient shortcut is to just tell the user to supply numerical values by looking at the LLVM source code.

From llvm/Support/AtomicOrdering.h, note how the numerical value for __ATOMIC_SEQ_CST is 5, but the numerical value for the LLVM SequentiallyConsistent ordering is 7. The numerical value 5 refers to the LLVM ordering "release". So, if the implementation were correct, this line should result in the following unexpected LLVM IR:

fence syncscope("workgroup") release
This revision now requires changes to proceed.Apr 5 2020, 9:04 PM
saiislam marked 5 inline comments as done.Apr 6 2020, 3:43 AM
saiislam added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
14145

Even though this builtin is supposed to be language-independent, here intention was to provide interface in terms of well known standard C11/C++11 enums for memory order (__ATOMIC_ACQUIRE, etc.), so that user of the builtin don't have to remember and modify their code. The builtin internally maps it as per the expectation of fence instruction.

14166

As the FE is not aware about specific target and implementation of sync scope for that target, getSyncScopeNames() here returns llvm'sdefault sync scopes, which only supports singlethreaded and system as valid scopes. Validity checking of memory scope string is being intentionally left for the later stages which deal with the generated IR.

clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

As you pointed out, the range of acquire to sequentiallly consistent memory orders for llvm::AtomicOrdering is [4, 7], while for llvm::AtomicOrderingCABI is [2, 5]. Enums of C ABI was taken to ensure easy of use for the users who are familiar with C/C++ standard memory model. It allows them to use macros like __ATOMIC_ACQUIRE etc.
Clang CodeGen of the builtin internally maps C ABI ordering to llvm atomic ordering.

JonChesterfield added inline comments.Apr 6 2020, 4:08 AM
clang/lib/Sema/SemaChecking.cpp
1888

I think I'd write this as a switch over the enum instead of a ranged compare.

It'll codegen to the same thing, but we'll get warnings if more values are introduced to the enum and things will keep working (here, anyway) if the values are reordered.

clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

What language, implemented in clang, do you have in mind that reusing the existing __ATOMIC_* macros would be incorrect for?

saiislam updated this revision to Diff 255288.Apr 6 2020, 4:44 AM

Changes:

  1. Moved builtin definition with rest of atomic builtins
  2. Updated validity checking of memory order using exact mathches instead of range matching
  3. Added a sucessful test case which passes arbitrary string scope
  4. Corrected formatting
sameerds added inline comments.Apr 6 2020, 7:45 AM
clang/lib/CodeGen/CGBuiltin.cpp
14166

That's pretty strange. At this point, Clang should know what the target is, and it should have a chance to update the list of sync scopes somewhere. @arsenm, do you see a way around this?

clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

I think we agreed that this builtin exposes the LLVM fence exactly. That would mean it takes arguments defined by LLVM. If you are implementing something different from that, then it first needs to be specified properly. Perhaps you could say that this is a C ABI compatible builtin, that happens to take target specific scopes? That should cover OpenCL whose scope enum is designed to be compatible with C.

Whatever it is that you are trying to implement here, it definitely does not expose a raw LLVM fence.

JonChesterfield added inline comments.Apr 6 2020, 9:35 AM
clang/lib/CodeGen/CGBuiltin.cpp
14166

There is already sufficient IR level checking for the string at the instruction level. Warning in clang as well could be a nicer user experience, but that seems low priority to me.

clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

The llvm fence, in text form, uses a symbol for the memory scope. Not an enum.

This symbol is set using these macros for the existing atomic builtins. Using an implementation detail of clang instead is strictly worse, by layering and by precedent.

ABI is not involved here. Nor is OpenCl.

sameerds requested changes to this revision.Apr 6 2020, 10:12 AM
sameerds added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
14166

If there is some checking happening anywhere, then that needs to be demonstrated in a testcase where the input high-level program passes an illegal string as the scope argument.

clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

The __ATOMIC_* symbols in Clang quite literally represent the C/C++ ABI. See the details in AtomicOrdering.h and InitPreprocessor.cpp. I am not opposed to specifying that the builtin expects these symbols, but then it is incorrect to say that the builtin exposes the raw LLVM builtin. It is a C-ABI-compatible builtin that happens to take target-specific scope as a string argument. And that would also make it an overload of the already existing builting __atomic_fence().

This revision now requires changes to proceed.Apr 6 2020, 10:12 AM
clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

I don't know what you mean by "raw", but am guessing you're asking for documentation for the intrinsic. Said documentation should indeed be added for this builtin - it'll probably be in a tablegen file.

clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

I will try to stop using builtin and intrinsic as synonyms.

clang/test/CodeGenHIP/builtin_memory_fence.cpp
1 ↗(On Diff #255288)

Codegen test should be under CodeGen and/or CodeGenCXX

sameerds added inline comments.Apr 6 2020, 9:46 PM
clang/test/CodeGenHIP/builtin_memory_fence.cpp
9 ↗(On Diff #255173)

Right. It's actually an LLVM instruction, not even an intrinsic. I am guilty of using the wrong term quite often, but usually the context helps. I think the following is still needed:

  1. A testcase that exercises the builtin with an invalid string argument for scope.
  2. An update to the change description describing what is being introduced here. It is more than a direct mapping from builtin to instruction. The C ABI is involved.
  3. An update to the Clang documentation describing this new builtin: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions
b-sumner added a comment.EditedApr 8 2020, 2:27 PM

In addition to predefining __ATOMIC_RELAXED, etc., clang also predefines __OPENCL_MEMORY_SCOPE_WORK_ITEM and friends. So it doesn't really seem unreasonable for clang to also predefine its known syncscopes, and to require the argument to be one of those integers.

saiislam updated this revision to Diff 258368.Apr 17 2020, 10:46 AM
saiislam marked an inline comment as done.
  1. Moved test case to clang/test/CodeGenCXX.
  2. Added a failing test case with invalid sync scope, which gets detected by implementation of fence instruction.
  3. Updated the change description of the builtin.
  4. Updated the clang documentation describing mapping of C++ memory-ordering to LLVM memory-ordering.
saiislam edited the summary of this revision. (Show Details)Apr 17 2020, 10:47 AM

The tests look good, but I can't see the implementation in this diff. Maybe a file missing from the patch? Can be hard to tell with phabricator, the error may be at my end.

saiislam updated this revision to Diff 259238.Apr 22 2020, 4:33 AM

Changed the builtin to be AMDGCN-specific

It is named as __builtin_amdgcn_fence(order, scope)

saiislam updated this revision to Diff 259239.Apr 22 2020, 4:37 AM

Removed stale commented code

saiislam edited the summary of this revision. (Show Details)Apr 22 2020, 4:40 AM

Amdgcn specific is fine by me. Hopefully that unblocks this.

arsenm added inline comments.Apr 22 2020, 7:24 AM
clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5 ↗(On Diff #259239)

Does this really depend on C++? Can it use OpenCL like the other builtin tests?This also belongs in a Sema* test directory since it's checking an error

clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5 ↗(On Diff #259239)

Making it opencl-only would force some of the openmp runtime to be written in opencl, which is not presently the case. Currently that library is written in a dialect of hip, but there's a plan to implement it in openmp instead.

I'd much rather this builtin work from any language, instead of tying it to opencl, as that means one can use it from openmp target regions.

b-sumner added inline comments.Apr 22 2020, 8:17 AM
clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5 ↗(On Diff #259239)

I thought the question was about this test itself. The test being in CodeGenOpenCL doesn't affect whether other languages can use the builtin. Why not put this test in CodeGenOpenCL alongside all of the other builtins-amdgcn-*.cl ?

saiislam updated this revision to Diff 259298.Apr 22 2020, 8:26 AM

Moved builtin-amdgcn-fence-failure.cpp from clang/test/CodeGenCXX/
to clang/test/Sema/ since it is checking an error.

saiislam marked an inline comment as done.Apr 22 2020, 8:33 AM
saiislam added inline comments.
clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5 ↗(On Diff #259239)

This test is basically relying on implementation of sync scope in the AMDGCN backend to check for validity, and is not testing the code generation capability. Doesn't it make more sense in Sema directory?

Won't putting it in SemaOpenCL or SemaOpenCLCXX indicate some kind of relation with OpenCL?

Can you please submit a squashed single commit for review against the master branch? All the recent commits seem to be relative to previous commits, and I am having trouble looking at the change as a whole.

The commit description needs some fixes:

  1. Remove use of Title Case, in places like "using Fence instruction" and "LLVM Atomic Memory Ordering" and "as a String". They are simply common nouns, not proper nouns. When in doubt, look at the LangRef to see how these words are used as common nouns.
  2. Don't mention that enum values are okay too. I am sure they will always be okay, but it's better to encourage the use of symbols.
  3. Don't mention HSA even if the AMDGPU user manual does so.
  4. In fact the last two sentences in the description are not strictly necessary ... they are the logical outcome of the scopes being target-specific strings.
  5. Note that the general practice in documentation is to say "AMDGPU" which covers the LLVM Target, while "amdgcn" is only used in the code because it is one of multiple architectures in the AMDGPU backend.
clang/docs/LanguageExtensions.rst
2458 ↗(On Diff #259298)

We probably don't need to document the new builtin here. Clearly, we have not documented any other AMDGPU builtin, and there is no need to start now. If necessary, we can take that up as a separate task covering all builtins.

saiislam updated this revision to Diff 259485.Apr 22 2020, 11:13 PM

Removed documentation from clang doc. Squashed all changes into a single commit.

saiislam edited the summary of this revision. (Show Details)Apr 22 2020, 11:16 PM
saiislam updated this revision to Diff 259545.Apr 23 2020, 6:18 AM

Added check and test for sync scope to be a constant literal.

saiislam marked an inline comment as done.Apr 23 2020, 6:24 AM
saiislam added inline comments.
clang/lib/Sema/SemaChecking.cpp
3070

@sameerds here is the check for sync scope to be constant literal

sameerds accepted this revision.Apr 23 2020, 7:56 PM

Thanks @saiislam ... this looks much better!

Two nitpicks, that must be fixed. But it is okay if you directly submit after fixing them.

  1. The change description should use "const char *" in the signature and not "String".
  2. Can you please add a test that passes an integer constant as the scope? I am assuming that the signature check will complain that it is not a string.
This revision is now accepted and ready to land.Apr 23 2020, 7:56 PM
saiislam edited the summary of this revision. (Show Details)Apr 24 2020, 5:47 PM
saiislam updated this revision to Diff 260053.Apr 24 2020, 6:40 PM

Updated description and added a failing test case for integer scope.

This revision was automatically updated to reflect the committed changes.