This is an archive of the discontinued LLVM Phabricator instance.

Differential D75917

Expose llvm fence instruction as clang intrinsic
ClosedPublic

Authored by saiislam on Mar 10 2020, 5:59 AM.


Details

Reviewers

JonChesterfield
sameerds
yaxunl
gregrodgers
b-sumner
Anastasia
arsenm

Commits

rG06bdffb2bb45: [AMDGPU] Expose llvm fence instruction as clang intrinsic

Summary

Expose llvm fence instruction as clang builtin for AMDGPU target

__builtin_amdgcn_fence(unsigned int memoryOrdering, const char *syncScope)

The first argument of this builtin is one of the memory-ordering specifiers
ATOMIC_ACQUIRE, ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST
following C++11 memory model semantics. This is mapped to corresponding
LLVM atomic memory ordering for the fence instruction using LLVM atomic C
ABI. The second argument is an AMDGPU-specific synchronization scope
defined as string.
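A minimal usage sketch of the builtin described above, assuming a Clang that targets AMDGPU. The helper name `workgroup_release_fence` and the `HAVE_AMDGCN_FENCE` macro are illustrative and not part of the patch. On any other compiler or target the sketch falls back to `std::atomic_thread_fence`, which has no scope argument.

```cpp
#include <atomic>

// Only probe for the builtin when compiling for AMDGPU, and only on
// compilers that provide __has_builtin.
#if defined(__AMDGCN__) && defined(__has_builtin)
#if __has_builtin(__builtin_amdgcn_fence)
#define HAVE_AMDGCN_FENCE 1
#endif
#endif

// Hypothetical helper: a release fence at "workgroup" scope on AMDGPU,
// a plain C++11 release fence everywhere else.
inline void workgroup_release_fence() {
#ifdef HAVE_AMDGCN_FENCE
  __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
#else
  std::atomic_thread_fence(std::memory_order_release);
#endif
}
```

The scope is passed as a string literal because, as the summary says, it is an AMDGPU-specific synchronization scope rather than a language-level enum.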

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

saiislam created this revision. Mar 10 2020, 5:59 AM

Herald added a project: Restricted Project. Mar 10 2020, 5:59 AM

Herald added subscribers: cfe-commits, jfb.

JonChesterfield added inline comments. Mar 10 2020, 7:11 AM

clang/lib/Sema/SemaChecking.cpp
1885	isIntegerType will return true for signed integers as well as unsigned. It seems reasonable to call this with a signed integer type (e.g. '2'), so perhaps the references to unsigned should be dropped from the code and error message
clang/test/CodeGenOpenCL/atomic-ops.cl
291 ↗	(On Diff #249340)	I'm hoping this intrinsic will be usable from C or C++, as well as from OpenCL - please could you add a non-opencl codegen test. It doesn't need to check all the cases again, just enough to show that the intrinsic and arguments are available (they're spelled like `__ATOMIC_SEQ_CST`, `__OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES` outside of opencl)

JonChesterfield added inline comments. Mar 10 2020, 7:11 AM

clang/lib/CodeGen/CGBuiltin.cpp
3715	Interesting, fence can't be relaxed. From the LangRef: 'fence' instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings.
clang/lib/Sema/SemaChecking.cpp
1894	This should reject 'relaxed' - I think that's currently accepted by sema then silently thrown away by codegen
clang/test/SemaOpenCL/atomic-ops.cl
198 ↗	(On Diff #249340)	A happy case too please, e.g. to show that it accepts a couple of integers. Looks like `__builtin_memory_fence(2, 2);` but without an expected-error comment

JonChesterfield added inline comments. Mar 10 2020, 7:17 AM

clang/include/clang/Basic/Builtins.def
1517	`BUILTIN(__builtin_memory_fence, "vii", "n")`? The other fence intrinsics (e.g. __c11_atomic_thread_fence) take signed integers and rely on the built in type checking, which seems reasonable here too
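The prototype strings in this comment use the type encoding documented at the top of Builtins.def. As a rough illustration, the sketch below decodes only the few letters that appear in this review: `"vii"` as suggested here, and `"vUicC*"` as used in the diff. `decodeBuiltinTypes` is a hypothetical helper written for this note, not a Clang API, and it deliberately rejects anything outside that subset.

```cpp
#include <string>
#include <vector>

// Decodes a small subset of the Builtins.def type encoding:
// base types v (void), b (bool), c (char), i (int); prefix U (unsigned);
// suffixes C (const) and * (pointer). The first entry is the return type.
// Returns an empty vector for anything outside this subset.
std::vector<std::string> decodeBuiltinTypes(const std::string &s) {
  std::vector<std::string> out;
  std::size_t i = 0;
  while (i < s.size()) {
    std::string t;
    if (s[i] == 'U') {
      t = "unsigned ";
      ++i;
      if (i >= s.size())
        return {};
    }
    switch (s[i]) {
    case 'v': t += "void"; break;
    case 'b': t += "bool"; break;
    case 'c': t += "char"; break;
    case 'i': t += "int"; break;
    default: return {}; // letter not handled by this sketch
    }
    ++i;
    // Suffixes apply to the type built so far, left to right.
    while (i < s.size() && (s[i] == 'C' || s[i] == '*')) {
      if (s[i] == '*')
        t += " *";
      else if (!t.empty() && t.back() == '*')
        t += " const";
      else
        t = "const " + t;
      ++i;
    }
    out.push_back(t);
  }
  return out;
}
```

Under this reading, `"vUicC*"` is a function returning `void` that takes an `unsigned int` and a `const char *`, which matches the signature given in the summary.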

JonChesterfield added a reviewer: Anastasia. Mar 10 2020, 8:43 AM

@jdoerfert this is one of the two intrinsics needed to drop the .ll source from the amdgcn deviceRTL. The other is atomic_inc.

Anastasia added inline comments. Mar 10 2020, 9:18 AM

clang/lib/Sema/SemaChecking.cpp
1885	I think we should accept implicit type conversion rules from the language...

The commit summary needs improvement. The syntax is not really necessary there, but instead this needs a better explanation of how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

clang/include/clang/Basic/DiagnosticSemaKinds.td
7860–7863 ↗	(On Diff #249340)	Just above this addition, atomic op seems to emit a warning for invalid memory order. Should that be the case with fence too?
clang/lib/CodeGen/CGBuiltin.cpp
3709	The proposed builtin does not claim to be an OpenCL builtin, so it's probably not correct to simply assume the OpenCL model. Should the model be chosen based on the source language specified?

In D75917#1916160, @sameerds wrote:

how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

Identical to the fence instruction. Which is assumed well thought through already, given it's an IR instruction.

As far as I can tell, fence composes sensibly with other IR then generates the right thing at the back end. So it looks fit for purpose, just not currently available from clang.

clang/lib/CodeGen/CGBuiltin.cpp
3709	The only values for AtomicScopeModelKind are none and OpenCL.

In D75917#1916664, @JonChesterfield wrote:

In D75917#1916160, @sameerds wrote:

how this builtin fits in with the overall scheme of language-specific and target-specific details of an atomic operation. For example, is this meant only for OpenCL? Does it work with CUDA? Or HIP? What is the behaviour for scope in C++?

Identical to the fence instruction. Which is assumed well thought through already, given it's an IR instruction.

As far as I can tell, fence composes sensibly with other IR then generates the right thing at the back end. So it looks fit for purpose, just not currently available from clang.

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this does not seem to work with C++, which has no scopes, nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

In D75917#1916972, @sameerds wrote:

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this does not seem to work with C++, which has no scopes, nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

I don't follow. IR has a fence instruction. This builtin maps directly to it, passing whatever integer arguments were given to the intrinsic along unchanged. It's exactly as valid, or invalid, as said fence instruction.

Are you objecting to passing enums in the test cases instead of raw integers?

Or is your issue with the fence instruction itself?

In D75917#1917296, @JonChesterfield wrote:

In D75917#1916972, @sameerds wrote:

Well, there is a problem: The LangRef says that scopes are target-defined. This change says that scopes are defined by the high-level language and further assumes that OpenCL scopes make sense in all languages. Besides conflicting with the LangRef, this does not seem to work with C++, which has no scopes, nor with CUDA or HIP, whose scopes are not represented in any AtomicScopeModel.

I don't follow. IR has a fence instruction. This builtin maps directly to it, passing whatever integer arguments were given to the intrinsic along unchanged. It's exactly as valid, or invalid, as said fence instruction.

Is it really? The scope argument of the IR fence is a target-specific string:
http://llvm.org/docs/LangRef.html#syncscope

The change that I see here is assuming a numerical argument, and also assuming that the numbers used must conform to the OpenCL enum. That would certainly make the builtin quite different from the IR fence.

In D75917#1925617, @sameerds wrote:

Is it really? The scope argument of the IR fence is a target-specific string:
http://llvm.org/docs/LangRef.html#syncscope

The change that I see here is assuming a numerical argument, and also assuming that the numbers used must conform to the OpenCL enum. That would certainly make the builtin quite different from the IR fence.

I think I follow. The syncscope takes a string, therefore the builtin that maps onto fence should also take a string for that parameter? That's fine by me. Will help if a new non-opencl syncscope is introduced later.

In D75917#1925700, @JonChesterfield wrote:

I think I follow. The syncscope takes a string, therefore the builtin that maps onto fence should also take a string for that parameter? That's fine by me. Will help if a new non-opencl syncscope is introduced later.

Right. To be precise, it is a target-specific string, and should not be processed as if it was an OpenCL scope. The builtin should allow anything that the IR fence would allow in a .ll file created for the specified target.

Can this be revived? Changing the enum to a string still sounds good to me

Please go ahead and update to a string for the scope.

Removed OpenCL specific dependencies

Now it takes target-specific sync scope as a string.

Harbormaster failed remote builds in B51846: Diff 255173! Apr 5 2020, 10:40 AM

sameerds requested changes to this revision. Apr 5 2020, 9:04 PM

sameerds added inline comments.

clang/include/clang/Basic/Builtins.def
1583	This should be moved to be near line 786, where atomic builtins are listed under the comment "// GCC does not support these, they are a Clang extension."
clang/lib/CodeGen/CGBuiltin.cpp
13630	There should be no mention of any high-level language here. The correct enum to validate against is llvm::AtomicOrdering from llvm/Support/AtomicOrdering.h, and not the C ABI or any other language ABI.
13651	This seems to be creating a new ID for any arbitrary string passed as sync scope. This should be validated against LLVMContext::getSyncScopeNames().
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
5	There should be a line that tries to do: __builtin_memory_fence(__ATOMIC_SEQ_CST, "foobar");
9	Orderings like `__ATOMIC_SEQ_CST` are defined for C/C++ memory models. They should not be used with the new builtin because this new builtin does not follow any specific language model. For user convenience, the right thing to do is to introduce new tokens in the Clang preprocessor, similar to the `__ATOMIC_*` tokens. The convenient shortcut is to just tell the user to supply numerical values by looking at the LLVM source code. From llvm/Support/AtomicOrdering.h, note how the numerical value for `__ATOMIC_SEQ_CST` is 5, but the numerical value for the LLVM SequentiallyConsistent ordering is 7. The numerical value 5 refers to the LLVM ordering "release". So, if the implementation were correct, this line should result in the following unexpected LLVM IR: fence syncscope("workgroup") release

This revision now requires changes to proceed. Apr 5 2020, 9:04 PM

saiislam marked 5 inline comments as done. Apr 6 2020, 3:43 AM

saiislam added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
13630	Even though this builtin is supposed to be language-independent, the intention here was to provide an interface in terms of the well-known standard C11/C++11 enums for memory order (__ATOMIC_ACQUIRE, etc.), so that users of the builtin don't have to remember new values or modify their code. The builtin internally maps them to what the fence instruction expects.
13651	As the FE is not aware of the specific target and its implementation of sync scopes, getSyncScopeNames() here returns LLVM's default sync scopes, which only include singlethread and system as valid scopes. Validity checking of the memory scope string is intentionally left to the later stages that deal with the generated IR.
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	As you pointed out, the range of acquire to sequentially consistent memory orders for llvm::AtomicOrdering is [4, 7], while for llvm::AtomicOrderingCABI it is [2, 5]. The C ABI enums were chosen to ensure ease of use for users who are familiar with the C/C++ standard memory model. It allows them to use macros like __ATOMIC_ACQUIRE etc. Clang CodeGen of the builtin internally maps the C ABI ordering to the LLVM atomic ordering.
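The two numbering schemes discussed in this exchange can be put side by side in a small standalone sketch. The enumerator values mirror llvm/Support/AtomicOrdering.h as cited in the thread. The enum and function names (`OrderingCABI`, `OrderingLLVM`, `toFenceOrdering`) are made up for illustration, and returning `NotAtomic` as a "not valid for a fence" sentinel is an assumption of this sketch, not what the patch does.

```cpp
// C ABI values, i.e. what the __ATOMIC_* macros expand to.
enum class OrderingCABI {
  relaxed = 0, consume = 1, acquire = 2, release = 3, acq_rel = 4, seq_cst = 5
};

// LLVM's internal ordering values (3 is intentionally unused).
enum class OrderingLLVM {
  NotAtomic = 0, Unordered = 1, Monotonic = 2,
  Acquire = 4, Release = 5, AcquireRelease = 6, SequentiallyConsistent = 7
};

// Hypothetical helper: translate a C ABI ordering into the LLVM ordering
// for a fence. NotAtomic is used here only as a sentinel for orderings a
// fence cannot take (relaxed, consume).
constexpr OrderingLLVM toFenceOrdering(OrderingCABI o) {
  switch (o) {
  case OrderingCABI::acquire: return OrderingLLVM::Acquire;
  case OrderingCABI::release: return OrderingLLVM::Release;
  case OrderingCABI::acq_rel: return OrderingLLVM::AcquireRelease;
  case OrderingCABI::seq_cst: return OrderingLLVM::SequentiallyConsistent;
  case OrderingCABI::relaxed:
  case OrderingCABI::consume:
    break;
  }
  return OrderingLLVM::NotAtomic;
}

// With the mapping, seq_cst (5) becomes SequentiallyConsistent (7).
static_assert(toFenceOrdering(OrderingCABI::seq_cst) ==
                  OrderingLLVM::SequentiallyConsistent, "mapped value");
// Without it, the raw value 5 would be read as LLVM's Release.
static_assert(static_cast<OrderingLLVM>(5) == OrderingLLVM::Release,
              "raw reinterpretation");
```

The second `static_assert` is the mismatch sameerds describes: passing the C ABI number straight through would turn a requested seq_cst fence into a release fence.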

JonChesterfield added inline comments. Apr 6 2020, 4:08 AM

clang/lib/Sema/SemaChecking.cpp
1888	I think I'd write this as a switch over the enum instead of a ranged compare. It'll codegen to the same thing, but we'll get warnings if more values are introduced to the enum and things will keep working (here, anyway) if the values are reordered.
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	What language, implemented in clang, do you have in mind that reusing the existing __ATOMIC_* macros would be incorrect for?
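The switch-versus-range suggestion for SemaChecking.cpp above can be sketched in isolation as follows. This is not the code from the patch; `CABIOrder`, `isValidFenceOrder`, and `isValidFenceOrderRanged` are hypothetical names, and the enumerator values are the C ABI ones quoted in the thread.

```cpp
// Stand-in for the C ABI ordering enum (values as quoted in this review).
enum class CABIOrder {
  relaxed = 0, consume = 1, acquire = 2, release = 3, acq_rel = 4, seq_cst = 5
};

// Switch form: every enumerator is listed and there is no default, so a
// compiler with -Wswitch can warn if an enumerator is added later.
constexpr bool isValidFenceOrder(CABIOrder o) {
  switch (o) {
  case CABIOrder::acquire:
  case CABIOrder::release:
  case CABIOrder::acq_rel:
  case CABIOrder::seq_cst:
    return true;
  case CABIOrder::relaxed:
  case CABIOrder::consume:
    return false;
  }
  return false; // value that is not a named enumerator
}

// Ranged form: shorter, but only correct while the accepted enumerators
// stay contiguous and in this order.
constexpr bool isValidFenceOrderRanged(int v) { return v >= 2 && v <= 5; }
```

Both forms agree for today's values. The switch keeps working if the enum is reordered, which is the point made in the comment.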

Changes:

Moved builtin definition with rest of atomic builtins
Updated validity checking of memory order using exact matches instead of range matching
Added a successful test case which passes arbitrary string scope
Corrected formatting

Harbormaster failed remote builds in B51917: Diff 255288! Apr 6 2020, 5:22 AM

sameerds added a reviewer: arsenm. Apr 6 2020, 7:36 AM

Herald added a subscriber: wdng. Apr 6 2020, 7:36 AM

sameerds added inline comments. Apr 6 2020, 7:45 AM

clang/lib/CodeGen/CGBuiltin.cpp
13651	That's pretty strange. At this point, Clang should know what the target is, and it should have a chance to update the list of sync scopes somewhere. @arsenm, do you see a way around this?
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	I think we agreed that this builtin exposes the LLVM fence exactly. That would mean it takes arguments defined by LLVM. If you are implementing something different from that, then it first needs to be specified properly. Perhaps you could say that this is a C ABI compatible builtin, that happens to take target specific scopes? That should cover OpenCL whose scope enum is designed to be compatible with C. Whatever it is that you are trying to implement here, it definitely does not expose a raw LLVM fence.

JonChesterfield added inline comments. Apr 6 2020, 9:35 AM

clang/lib/CodeGen/CGBuiltin.cpp
13651	There is already sufficient IR level checking for the string at the instruction level. Warning in clang as well could be a nicer user experience, but that seems low priority to me.
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	The llvm fence, in text form, uses a symbol for the memory scope. Not an enum. This symbol is set using these macros for the existing atomic builtins. Using an implementation detail of clang instead is strictly worse, by layering and by precedent. ABI is not involved here. Nor is OpenCL.

sameerds requested changes to this revision. Apr 6 2020, 10:12 AM

sameerds added inline comments.

clang/lib/CodeGen/CGBuiltin.cpp
13651	If there is some checking happening anywhere, then that needs to be demonstrated in a testcase where the input high-level program passes an illegal string as the scope argument.
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	The `__ATOMIC_*` symbols in Clang quite literally represent the C/C++ ABI. See the details in AtomicOrdering.h and InitPreprocessor.cpp. I am not opposed to specifying that the builtin expects these symbols, but then it is incorrect to say that the builtin exposes the raw LLVM builtin. It is a C-ABI-compatible builtin that happens to take target-specific scope as a string argument. And that would also make it an overload of the already existing builtin __atomic_fence().
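For the scope-validation question raised in this exchange, a frontend-side check could look roughly like the sketch below. It is purely hypothetical: per the thread, the patch leaves validation to later stages. The list of names is an assumption based on the memory-scope table in the AMDGPU usage guide and should be verified against it. The default system scope, which the IR expresses by omitting `syncscope`, is not modelled. The sketch needs C++17 for `std::string_view`.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Assumed AMDGPU sync scope names (check against AMDGPUUsage before relying
// on this list). System scope has no name in the IR and is not listed.
constexpr std::array<std::string_view, 9> kAMDGPUSyncScopes = {
    "agent",        "workgroup",        "wavefront",
    "singlethread", "one-as",           "agent-one-as",
    "workgroup-one-as", "wavefront-one-as", "singlethread-one-as"};

// Hypothetical check: true if the scope string is one of the assumed names.
inline bool isKnownAMDGPUSyncScope(std::string_view scope) {
  return std::find(kAMDGPUSyncScopes.begin(), kAMDGPUSyncScopes.end(),
                   scope) != kAMDGPUSyncScopes.end();
}
```

A call such as `isKnownAMDGPUSyncScope("foobar")` returns false, which is the kind of input the requested failing test case is meant to exercise.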

This revision now requires changes to proceed. Apr 6 2020, 10:12 AM

JonChesterfield added inline comments. Apr 6 2020, 11:50 AM

clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	I don't know what you mean by "raw", but am guessing you're asking for documentation for the intrinsic. Said documentation should indeed be added for this builtin - it'll probably be in a tablegen file.

JonChesterfield added inline comments. Apr 6 2020, 11:52 AM

clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	I will try to stop using builtin and intrinsic as synonyms.

JonChesterfield added inline comments. Apr 6 2020, 12:13 PM

clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
1	Codegen test should be under CodeGen and/or CodeGenCXX

sameerds added inline comments. Apr 6 2020, 9:46 PM

clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp
9	Right. It's actually an LLVM instruction, not even an intrinsic. I am guilty of using the wrong term quite often, but usually the context helps. I think the following is still needed:
A testcase that exercises the builtin with an invalid string argument for scope.
An update to the change description describing what is being introduced here. It is more than a direct mapping from builtin to instruction. The C ABI is involved.
An update to the Clang documentation describing this new builtin: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions

In addition to predefining __ATOMIC_RELAXED, etc., clang also predefines __OPENCL_MEMORY_SCOPE_WORK_ITEM and friends. So it doesn't really seem unreasonable for clang to also predefine its known syncscopes, and to require the argument to be one of those integers.

Moved test case to clang/test/CodeGenCXX.
Added a failing test case with invalid sync scope, which gets detected by implementation of fence instruction.
Updated the change description of the builtin.
Updated the clang documentation describing mapping of C++ memory-ordering to LLVM memory-ordering.

Harbormaster failed remote builds in B53748: Diff 258368! Apr 17 2020, 10:47 AM

saiislam edited the summary of this revision. Apr 17 2020, 10:47 AM

The tests look good, but I can't see the implementation in this diff. Maybe a file missing from the patch? Can be hard to tell with phabricator, the error may be at my end.

Changed the builtin to be AMDGCN-specific

It is named as __builtin_amdgcn_fence(order, scope)

Herald added subscribers: kerbowa, nhaehnle, jvesely. Apr 22 2020, 4:33 AM

Removed stale commented code

saiislam edited the summary of this revision. Apr 22 2020, 4:40 AM

Harbormaster failed remote builds in B54228: Diff 259239! Apr 22 2020, 5:22 AM

Harbormaster failed remote builds in B54227: Diff 259238!

Amdgcn specific is fine by me. Hopefully that unblocks this.

arsenm added inline comments. Apr 22 2020, 7:24 AM

clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5	Does this really depend on C++? Can it use OpenCL like the other builtin tests? This also belongs in a Sema* test directory since it's checking an error

JonChesterfield added inline comments. Apr 22 2020, 7:40 AM

clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5	Making it opencl-only would force some of the openmp runtime to be written in opencl, which is not presently the case. Currently that library is written in a dialect of hip, but there's a plan to implement it in openmp instead. I'd much rather this builtin work from any language, instead of tying it to opencl, as that means one can use it from openmp target regions.

b-sumner added inline comments. Apr 22 2020, 8:17 AM

clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5	I thought the question was about this test itself. The test being in CodeGenOpenCL doesn't affect whether other languages can use the builtin. Why not put this test in CodeGenOpenCL alongside all of the other builtins-amdgcn-*.cl ?

Moved builtin-amdgcn-fence-failure.cpp from clang/test/CodeGenCXX/
to clang/test/Sema/ since it is checking an error.

saiislam marked an inline comment as done. Apr 22 2020, 8:33 AM

saiislam added inline comments.

clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp
5	This test is basically relying on implementation of sync scope in the AMDGCN backend to check for validity, and is not testing the code generation capability. Doesn't it make more sense in Sema directory? Won't putting it in SemaOpenCL or SemaOpenCLCXX indicate some kind of relation with OpenCL?

Harbormaster failed remote builds in B54258: Diff 259298! Apr 22 2020, 9:46 AM

Can you please submit a squashed single commit for review against the master branch? All the recent commits seem to be relative to previous commits, and I am having trouble looking at the change as a whole.

The commit description needs some fixes:

Remove use of Title Case, in places like "using Fence instruction" and "LLVM Atomic Memory Ordering" and "as a String". They are simply common nouns, not proper nouns. When in doubt, look at the LangRef to see how these words are used as common nouns.
Don't mention that enum values are okay too. I am sure they will always be okay, but it's better to encourage the use of symbols.
Don't mention HSA even if the AMDGPU user manual does so.
In fact the last two sentences in the description are not strictly necessary ... they are the logical outcome of the scopes being target-specific strings.
Note that the general practice in documentation is to say "AMDGPU" which covers the LLVM Target, while "amdgcn" is only used in the code because it is one of multiple architectures in the AMDGPU backend.

clang/docs/LanguageExtensions.rst
2458	We probably don't need to document the new builtin here. Clearly, we have not documented any other AMDGPU builtin, and there is no need to start now. If necessary, we can take that up as a separate task covering all builtins.

Removed documentation from clang doc. Squashed all changes into a single commit.

saiislam edited the summary of this revision. Apr 22 2020, 11:16 PM

Herald added a subscriber: tpr. Apr 22 2020, 11:16 PM

Harbormaster failed remote builds in B54358: Diff 259485! Apr 22 2020, 11:57 PM

Added check and test for sync scope to be a constant literal.

saiislam marked an inline comment as done. Apr 23 2020, 6:24 AM

saiislam added inline comments.

clang/lib/Sema/SemaChecking.cpp
2958	@sameerds here is the check for sync scope to be constant literal

Thanks @saiislam ... this looks much better!

Two nitpicks, that must be fixed. But it is okay if you directly submit after fixing them.

The change description should use "const char *" in the signature and not "String".
Can you please add a test that passes an integer constant as the scope? I am assuming that the signature check will complain that it is not a string.

This revision is now accepted and ready to land. Apr 23 2020, 7:56 PM

saiislam edited the summary of this revision. Apr 24 2020, 5:47 PM

Updated description and added a failing test case for integer scope.

Harbormaster failed remote builds in B54661: Diff 260053! Apr 24 2020, 7:30 PM

Closed by commit rG06bdffb2bb45: [AMDGPU] Expose llvm fence instruction as clang intrinsic (authored by saiislam, committed by sameerds). Apr 26 2020, 9:15 PM

This revision was automatically updated to reflect the committed changes.

saiislam added a child revision: D80804: [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics. May 29 2020, 8:06 AM

Revision Contents

Path    Size
clang/docs/LanguageExtensions.rst    57 lines
clang/include/clang/Basic/Builtins.def    5 lines
clang/include/clang/Basic/BuiltinsAMDGPU.def    1 line
clang/include/clang/Sema/Sema.h    1 line
clang/lib/CodeGen/CGBuiltin.cpp    2 lines
clang/lib/Sema/SemaChecking.cpp    63 lines
clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp    9 lines
clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp, clang/test/CodeGenHIP/builtin_memory_fence.cpp    17 lines
clang/test/CodeGenHIP/builtin_memory_fence.cpp
clang/test/Sema/builtins.c    12 lines
clang/test/SemaOpenCL/builtins-amdgcn-error.cl    8 lines

Diff 259239

clang/docs/LanguageExtensions.rst

	builtins and OpenCL 2.0 ``__opencl_atomic_*`` builtins. The OpenCL 2.0			builtins and OpenCL 2.0 ``__opencl_atomic_*`` builtins. The OpenCL 2.0
	atomic builtins are an explicit form of the corresponding OpenCL 2.0			atomic builtins are an explicit form of the corresponding OpenCL 2.0
	builtin function, and are named with a ``__opencl_`` prefix. The macros			builtin function, and are named with a ``__opencl_`` prefix. The macros
	``__OPENCL_MEMORY_SCOPE_WORK_ITEM``, ``__OPENCL_MEMORY_SCOPE_WORK_GROUP``,			``__OPENCL_MEMORY_SCOPE_WORK_ITEM``, ``__OPENCL_MEMORY_SCOPE_WORK_GROUP``,
	``__OPENCL_MEMORY_SCOPE_DEVICE``, ``__OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES``,			``__OPENCL_MEMORY_SCOPE_DEVICE``, ``__OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES``,
	and ``__OPENCL_MEMORY_SCOPE_SUB_GROUP`` are provided, with values			and ``__OPENCL_MEMORY_SCOPE_SUB_GROUP`` are provided, with values
	corresponding to the enumerators of OpenCL's ``memory_scope`` enumeration.)			corresponding to the enumerators of OpenCL's ``memory_scope`` enumeration.)

				AMDGCN specific builtins
				-------------------------

				``__builtin_amdgcn_fence``
				-------------------------

				``__builtin_amdgcn_fence`` allows using `Fence instruction <https://llvm.org/docs/LangRef.html#fence-instruction>`_
				from clang. It takes C++11 compatible memory-ordering and AMDGCN-specific
				sync-scope as arguments, and generates a fence instruction in the IR.

				Syntax:

				.. code-block:: c++

				  __builtin_amdgcn_fence(unsigned int memory_ordering, String sync_scope)

				Example of use:

				.. code-block:: c++

				  void my_fence(int i) {
				    i++;
				    __builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "workgroup");
				    i--;
				    __builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "agent");
				  }

				Description:

				The first argument of ``__builtin_amdgcn_fence()`` builtin is one of the
				memory-ordering specifiers ``__ATOMIC_ACQUIRE``, ``__ATOMIC_RELEASE``,
				``__ATOMIC_ACQ_REL``, or ``__ATOMIC_SEQ_CST`` following C++11 memory model
				semantics. Equivalent enum values of these memory-ordering can also be
				specified. The builtin maps these C++ memory-ordering to corresponding
				LLVM Atomic Memory Ordering for the fence instruction using LLVM Atomic C
				ABI, as given in the table below. The second argument is a AMDGCN-specific
				synchronization scope defined as a String. It can take any of the sync scopes
				defined for `AMDHSA LLVM Sync Scopes <https://llvm.org/docs/AMDGPUUsage.html#memory-scopes>`_
				This builtin transparently passes the second argument to fence instruction
				and relies on AMDGCN implementation for validity check.

				+------------------------------+--------------------------------+
				| Input in clang               | Output in IR                   |
				| (C++11 Memory-ordering)      | (LLVM Atomic Memory-ordering)  |
				+======================+=======+========================+=======+
				| Enum                 | Value | Enum                   | Value |
				+----------------------+-------+------------------------+-------+
				| ``__ATOMIC_ACQUIRE`` | 2     | Acquire                | 4     |
				+----------------------+-------+------------------------+-------+
				| ``__ATOMIC_RELEASE`` | 3     | Release                | 5     |
				+----------------------+-------+------------------------+-------+
				| ``__ATOMIC_ACQ_REL`` | 4     | AcquireRelease         | 6     |
				+----------------------+-------+------------------------+-------+
				| ``__ATOMIC_SEQ_CST`` | 5     | SequentiallyConsistent | 7     |
				+----------------------+-------+------------------------+-------+


	Low-level ARM exclusive memory builtins			Low-level ARM exclusive memory builtins
	---------------------------------------			---------------------------------------

	Clang provides overloaded builtins giving direct access to the three key ARM			Clang provides overloaded builtins giving direct access to the three key ARM
	instructions for implementing atomic operations.			instructions for implementing atomic operations.

	.. code-block:: c			.. code-block:: c


clang/include/clang/Basic/Builtins.def

	// Non-overloaded atomic builtins.			// Non-overloaded atomic builtins.
	BUILTIN(__sync_synchronize, "v", "n")			BUILTIN(__sync_synchronize, "v", "n")
	// GCC does not support these, they are a Clang extension.			// GCC does not support these, they are a Clang extension.
	BUILTIN(__sync_fetch_and_min, "iiD*i", "n")			BUILTIN(__sync_fetch_and_min, "iiD*i", "n")
	BUILTIN(__sync_fetch_and_max, "iiD*i", "n")			BUILTIN(__sync_fetch_and_max, "iiD*i", "n")
	BUILTIN(__sync_fetch_and_umin, "UiUiD*Ui", "n")			BUILTIN(__sync_fetch_and_umin, "UiUiD*Ui", "n")
	BUILTIN(__sync_fetch_and_umax, "UiUiD*Ui", "n")			BUILTIN(__sync_fetch_and_umax, "UiUiD*Ui", "n")

	// clang builtin to expose llvm fence instruction
	// First argument : uint in range [2, 5] i.e. [acquire, seq_cst]
	// Second argument : target specific sync scope string
	BUILTIN(__builtin_memory_fence, "vUicC*", "n")

	// Random libc builtins.			// Random libc builtins.
	BUILTIN(__builtin_abort, "v", "Fnr")			BUILTIN(__builtin_abort, "v", "Fnr")
	BUILTIN(__builtin_index, "ccCi", "Fn")			BUILTIN(__builtin_index, "ccCi", "Fn")
	BUILTIN(__builtin_rindex, "ccCi", "Fn")			BUILTIN(__builtin_rindex, "ccCi", "Fn")

	// ignored glibc builtin, see https://sourceware.org/bugzilla/show_bug.cgi?id=25399			// ignored glibc builtin, see https://sourceware.org/bugzilla/show_bug.cgi?id=25399
	BUILTIN(__warn_memset_zero_len, "v", "nU")			BUILTIN(__warn_memset_zero_len, "v", "nU")


	BUILTIN(__builtin_coro_id, "vIivvv", "n")			BUILTIN(__builtin_coro_id, "vIivvv", "n")
	BUILTIN(__builtin_coro_alloc, "b", "n")			BUILTIN(__builtin_coro_alloc, "b", "n")
	BUILTIN(__builtin_coro_begin, "vv", "n")			BUILTIN(__builtin_coro_begin, "vv", "n")
	BUILTIN(__builtin_coro_end, "bv*Ib", "n")			BUILTIN(__builtin_coro_end, "bv*Ib", "n")
	BUILTIN(__builtin_coro_suspend, "cIb", "n")			BUILTIN(__builtin_coro_suspend, "cIb", "n")
	BUILTIN(__builtin_coro_param, "bvv", "n")			BUILTIN(__builtin_coro_param, "bvv", "n")

	// OpenCL v2.0 s6.13.16, s9.17.3.5 - Pipe functions.			// OpenCL v2.0 s6.13.16, s9.17.3.5 - Pipe functions.
	// We need the generic prototype, since the packet type could be anything.			// We need the generic prototype, since the packet type could be anything.
	LANGBUILTIN(read_pipe, "i.", "tn", OCLC20_LANG)			LANGBUILTIN(read_pipe, "i.", "tn", OCLC20_LANG)
	LANGBUILTIN(write_pipe, "i.", "tn", OCLC20_LANG)			LANGBUILTIN(write_pipe, "i.", "tn", OCLC20_LANG)

	LANGBUILTIN(reserve_read_pipe, "i.", "tn", OCLC20_LANG)			LANGBUILTIN(reserve_read_pipe, "i.", "tn", OCLC20_LANG)
	LANGBUILTIN(reserve_write_pipe, "i.", "tn", OCLC20_LANG)			LANGBUILTIN(reserve_write_pipe, "i.", "tn", OCLC20_LANG)

	LANGBUILTIN(commit_write_pipe, "v.", "tn", OCLC20_LANG)			LANGBUILTIN(commit_write_pipe, "v.", "tn", OCLC20_LANG)

	// Win64-compatible va_list functions			// Win64-compatible va_list functions
	BUILTIN(__builtin_ms_va_start, "vc*&.", "nt")			BUILTIN(__builtin_ms_va_start, "vc*&.", "nt")
	BUILTIN(__builtin_ms_va_end, "vc*&", "n")			BUILTIN(__builtin_ms_va_end, "vc*&", "n")
	BUILTIN(__builtin_ms_va_copy, "vc&c&", "n")			BUILTIN(__builtin_ms_va_copy, "vc&c&", "n")

	#undef BUILTIN			#undef BUILTIN
	#undef LIBBUILTIN			#undef LIBBUILTIN
	#undef LANGBUILTIN			#undef LANGBUILTIN

clang/include/clang/Basic/BuiltinsAMDGPU.def

	Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	BUILTIN(__builtin_amdgcn_wave_barrier, "v", "n")			BUILTIN(__builtin_amdgcn_wave_barrier, "v", "n")
	BUILTIN(__builtin_amdgcn_s_dcache_inv, "v", "n")			BUILTIN(__builtin_amdgcn_s_dcache_inv, "v", "n")
	BUILTIN(__builtin_amdgcn_buffer_wbinvl1, "v", "n")			BUILTIN(__builtin_amdgcn_buffer_wbinvl1, "v", "n")
	BUILTIN(__builtin_amdgcn_ds_gws_init, "vUiUi", "n")			BUILTIN(__builtin_amdgcn_ds_gws_init, "vUiUi", "n")
	BUILTIN(__builtin_amdgcn_ds_gws_barrier, "vUiUi", "n")			BUILTIN(__builtin_amdgcn_ds_gws_barrier, "vUiUi", "n")
	BUILTIN(__builtin_amdgcn_ds_gws_sema_v, "vUi", "n")			BUILTIN(__builtin_amdgcn_ds_gws_sema_v, "vUi", "n")
	BUILTIN(__builtin_amdgcn_ds_gws_sema_br, "vUiUi", "n")			BUILTIN(__builtin_amdgcn_ds_gws_sema_br, "vUiUi", "n")
	BUILTIN(__builtin_amdgcn_ds_gws_sema_p, "vUi", "n")			BUILTIN(__builtin_amdgcn_ds_gws_sema_p, "vUi", "n")
				BUILTIN(__builtin_amdgcn_fence, "vUicC*", "n")

	// FIXME: Need to disallow constant address space.			// FIXME: Need to disallow constant address space.
	BUILTIN(__builtin_amdgcn_div_scale, "dddbb*", "n")			BUILTIN(__builtin_amdgcn_div_scale, "dddbb*", "n")
	BUILTIN(__builtin_amdgcn_div_scalef, "fffbb*", "n")			BUILTIN(__builtin_amdgcn_div_scalef, "fffbb*", "n")
	BUILTIN(__builtin_amdgcn_div_fmas, "ddddb", "nc")			BUILTIN(__builtin_amdgcn_div_fmas, "ddddb", "nc")
	BUILTIN(__builtin_amdgcn_div_fmasf, "ffffb", "nc")			BUILTIN(__builtin_amdgcn_div_fmasf, "ffffb", "nc")
	BUILTIN(__builtin_amdgcn_div_fixup, "dddd", "nc")			BUILTIN(__builtin_amdgcn_div_fixup, "dddd", "nc")
	BUILTIN(__builtin_amdgcn_div_fixupf, "ffff", "nc")			BUILTIN(__builtin_amdgcn_div_fixupf, "ffff", "nc")
	▲ Show 20 Lines • Show All 178 Lines • Show Last 20 Lines

clang/include/clang/Sema/Sema.h

Show First 20 Lines • Show All 11,890 Lines • ▼ Show 20 Lines	private:
bool CheckMipsBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);		bool CheckMipsBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);
bool CheckMipsBuiltinCpu(unsigned BuiltinID, CallExpr *TheCall);		bool CheckMipsBuiltinCpu(unsigned BuiltinID, CallExpr *TheCall);
bool CheckMipsBuiltinArgument(unsigned BuiltinID, CallExpr *TheCall);		bool CheckMipsBuiltinArgument(unsigned BuiltinID, CallExpr *TheCall);
bool CheckSystemZBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);		bool CheckSystemZBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);
bool CheckX86BuiltinRoundingOrSAE(unsigned BuiltinID, CallExpr *TheCall);		bool CheckX86BuiltinRoundingOrSAE(unsigned BuiltinID, CallExpr *TheCall);
bool CheckX86BuiltinGatherScatterScale(unsigned BuiltinID, CallExpr *TheCall);		bool CheckX86BuiltinGatherScatterScale(unsigned BuiltinID, CallExpr *TheCall);
bool CheckX86BuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);		bool CheckX86BuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);
bool CheckPPCBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);		bool CheckPPCBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);
		bool CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall);

bool SemaBuiltinVAStart(unsigned BuiltinID, CallExpr *TheCall);		bool SemaBuiltinVAStart(unsigned BuiltinID, CallExpr *TheCall);
bool SemaBuiltinVAStartARMMicrosoft(CallExpr *Call);		bool SemaBuiltinVAStartARMMicrosoft(CallExpr *Call);
bool SemaBuiltinUnorderedCompare(CallExpr *TheCall);		bool SemaBuiltinUnorderedCompare(CallExpr *TheCall);
bool SemaBuiltinFPClassification(CallExpr *TheCall, unsigned NumArgs);		bool SemaBuiltinFPClassification(CallExpr *TheCall, unsigned NumArgs);
bool SemaBuiltinVSX(CallExpr *TheCall);		bool SemaBuiltinVSX(CallExpr *TheCall);
bool SemaBuiltinOSLogFormat(CallExpr *TheCall);		bool SemaBuiltinOSLogFormat(CallExpr *TheCall);

▲ Show 20 Lines • Show All 431 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGBuiltin.cpp

Show First 20 Lines • Show All 3,700 Lines • ▼ Show 20 Lines	case Builtin::BI__builtin_coro_param:
return EmitCoroutineIntrinsic(E, Intrinsic::coro_param);		return EmitCoroutineIntrinsic(E, Intrinsic::coro_param);

// OpenCL v2.0 s6.13.16.2, Built-in pipe read and write functions		// OpenCL v2.0 s6.13.16.2, Built-in pipe read and write functions
case Builtin::BIread_pipe:		case Builtin::BIread_pipe:
case Builtin::BIwrite_pipe: {		case Builtin::BIwrite_pipe: {
Value *Arg0 = EmitScalarExpr(E->getArg(0)),		Value *Arg0 = EmitScalarExpr(E->getArg(0)),
*Arg1 = EmitScalarExpr(E->getArg(1));		*Arg1 = EmitScalarExpr(E->getArg(1));
CGOpenCLRuntime OpenCLRT(CGM);		CGOpenCLRuntime OpenCLRT(CGM);
Value *PacketSize = OpenCLRT.getPipeElemSize(E->getArg(0));		Value *PacketSize = OpenCLRT.getPipeElemSize(E->getArg(0));
		sameerds: The proposed builtin does not claim to be an OpenCL builtin, so it's probably not correct to simply assume the OpenCL model. Should the model be chosen based on the source language specified?
		JonChesterfield: The only values for AtomicScopeModelKind are none and OpenCL.
Value *PacketAlign = OpenCLRT.getPipeElemAlign(E->getArg(0));		Value *PacketAlign = OpenCLRT.getPipeElemAlign(E->getArg(0));

// Type of the generic packet parameter.		// Type of the generic packet parameter.
unsigned GenericAS =		unsigned GenericAS =
getContext().getTargetAddressSpace(LangAS::opencl_generic);		getContext().getTargetAddressSpace(LangAS::opencl_generic);
llvm::Type *I8PTy = llvm::PointerType::get(		llvm::Type *I8PTy = llvm::PointerType::get(
		JonChesterfield: Interesting, fence can't be relaxed: "‘fence’ instructions take an ordering argument which defines what synchronizes-with edges they add. They can only be given acquire, release, acq_rel, and seq_cst orderings."
llvm::Type::getInt8Ty(getLLVMContext()), GenericAS);		llvm::Type::getInt8Ty(getLLVMContext()), GenericAS);

// Testing which overloaded version we should generate the call for.		// Testing which overloaded version we should generate the call for.
if (2U == E->getNumArgs()) {		if (2U == E->getNumArgs()) {
const char *Name = (BuiltinID == Builtin::BIread_pipe) ? "__read_pipe_2"		const char *Name = (BuiltinID == Builtin::BIread_pipe) ? "__read_pipe_2"
: "__write_pipe_2";		: "__write_pipe_2";
// Creating a generic function type to be able to call with any builtin or		// Creating a generic function type to be able to call with any builtin or
// user defined type.		// user defined type.
▲ Show 20 Lines • Show All 9,889 Lines • ▼ Show 20 Lines	Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
case AMDGPU::BI__builtin_amdgcn_alignbit: {		case AMDGPU::BI__builtin_amdgcn_alignbit: {
llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));		llvm::Value *Src0 = EmitScalarExpr(E->getArg(0));
llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));		llvm::Value *Src1 = EmitScalarExpr(E->getArg(1));
llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));		llvm::Value *Src2 = EmitScalarExpr(E->getArg(2));
Function *F = CGM.getIntrinsic(Intrinsic::fshr, Src0->getType());		Function *F = CGM.getIntrinsic(Intrinsic::fshr, Src0->getType());
return Builder.CreateCall(F, { Src0, Src1, Src2 });		return Builder.CreateCall(F, { Src0, Src1, Src2 });
}		}

case Builtin::BI__builtin_memory_fence: {		case AMDGPU::BI__builtin_amdgcn_fence: {
llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;		llvm::AtomicOrdering AO = llvm::AtomicOrdering::SequentiallyConsistent;
llvm::SyncScope::ID SSID;		llvm::SyncScope::ID SSID;
Value *Order = EmitScalarExpr(E->getArg(0));		Value *Order = EmitScalarExpr(E->getArg(0));
Value *Scope = EmitScalarExpr(E->getArg(1));		Value *Scope = EmitScalarExpr(E->getArg(1));

if (isa<llvm::ConstantInt>(Order)) {		if (isa<llvm::ConstantInt>(Order)) {
int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();		int ord = cast<llvm::ConstantInt>(Order)->getZExtValue();

// Map C11/C++11 memory ordering to LLVM memory ordering		// Map C11/C++11 memory ordering to LLVM memory ordering
		sameerds: There should be no mention of any high-level language here. The correct enum to validate against is llvm::AtomicOrdering from llvm/Support/AtomicOrdering.h, and not the C ABI or any other language ABI.
		saiislam (author): Even though this builtin is supposed to be language-independent, the intention here was to provide an interface in terms of the well-known standard C11/C++11 enums for memory order (__ATOMIC_ACQUIRE, etc.), so that users of the builtin don't have to remember new values or modify their code. The builtin internally maps the value to what the fence instruction expects.
switch (static_cast<llvm::AtomicOrderingCABI>(ord)) {		switch (static_cast<llvm::AtomicOrderingCABI>(ord)) {
case llvm::AtomicOrderingCABI::acquire:		case llvm::AtomicOrderingCABI::acquire:
AO = llvm::AtomicOrdering::Acquire;		AO = llvm::AtomicOrdering::Acquire;
break;		break;
case llvm::AtomicOrderingCABI::release:		case llvm::AtomicOrderingCABI::release:
AO = llvm::AtomicOrdering::Release;		AO = llvm::AtomicOrdering::Release;
break;		break;
case llvm::AtomicOrderingCABI::acq_rel:		case llvm::AtomicOrderingCABI::acq_rel:
AO = llvm::AtomicOrdering::AcquireRelease;		AO = llvm::AtomicOrdering::AcquireRelease;
break;		break;
case llvm::AtomicOrderingCABI::seq_cst:		case llvm::AtomicOrderingCABI::seq_cst:
AO = llvm::AtomicOrdering::SequentiallyConsistent;		AO = llvm::AtomicOrdering::SequentiallyConsistent;
break;		break;
case llvm::AtomicOrderingCABI::consume: // not supported by LLVM fence		case llvm::AtomicOrderingCABI::consume: // not supported by LLVM fence
case llvm::AtomicOrderingCABI::relaxed: // not supported by LLVM fence		case llvm::AtomicOrderingCABI::relaxed: // not supported by LLVM fence
break;		break;
}		}

StringRef scp;		StringRef scp;
llvm::getConstantStringInfo(Scope, scp);		llvm::getConstantStringInfo(Scope, scp);
SSID = getLLVMContext().getOrInsertSyncScopeID(scp);		SSID = getLLVMContext().getOrInsertSyncScopeID(scp);
		sameerds: This seems to be creating a new ID for any arbitrary string passed as sync scope. This should be validated against LLVMContext::getSyncScopeNames().
		saiislam (author): As the FE is not aware of the specific target and its implementation of sync scopes, getSyncScopeNames() here returns LLVM's default sync scopes, which only support singlethreaded and system as valid scopes. Validity checking of the memory scope string is intentionally left to the later stages that deal with the generated IR.
		sameerds: That's pretty strange. At this point, Clang should know what the target is, and it should have a chance to update the list of sync scopes somewhere. @arsenm, do you see a way around this?
		JonChesterfield: There is already sufficient IR-level checking for the string at the instruction level. Warning in clang as well could be a nicer user experience, but that seems low priority to me.
		sameerds: If there is some checking happening anywhere, then that needs to be demonstrated in a testcase where the input high-level program passes an illegal string as the scope argument.

return Builder.CreateFence(AO, SSID);		return Builder.CreateFence(AO, SSID);
}		}
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
}		}
default:		default:
return nullptr;		return nullptr;
}		}
▲ Show 20 Lines • Show All 1,886 Lines • Show Last 20 Lines
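
The sync-scope thread above asks whether the scope string could be validated before a sync scope ID is created for it. The following is only a sketch of what such a check might look like, not what the patch does (the patch leaves validation to later stages). The helper name `isKnownAMDGPUSyncScope` is hypothetical, and the scope list is an assumption based on the scope names used in this review ("", "workgroup", "agent") plus "singlethread" and "wavefront"; it is not claimed to be exhaustive.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of the patch: returns true if `scope` is one
// of a small, assumed set of AMDGPU synchronization scope names. The empty
// string stands for the default (system) scope.
bool isKnownAMDGPUSyncScope(const char *scope) {
  // Assumed list for illustration only; the authoritative set lives in the
  // AMDGPU backend, not in this sketch.
  static const char *const Known[] = {"", "singlethread", "wavefront",
                                      "workgroup", "agent"};
  for (const char *Name : Known)
    if (std::strcmp(scope, Name) == 0)
      return true;
  return false;
}
```

With a check of this shape, a call such as `__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "foobar")` could be diagnosed in the front end instead of relying on the backend error exercised by the failure test.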

clang/lib/Sema/SemaChecking.cpp

Show First 20 Lines • Show All 1,864 Lines • ▼ Show 20 Lines	case Builtin::BI__builtin_return_address: {
if (TheCall->getArg(0)->EvaluateAsInt(Result, getASTContext()) &&		if (TheCall->getArg(0)->EvaluateAsInt(Result, getASTContext()) &&
Result.Val.getInt() != 0)		Result.Val.getInt() != 0)
Diag(TheCall->getBeginLoc(), diag::warn_frame_address)		Diag(TheCall->getBeginLoc(), diag::warn_frame_address)
<< ((BuiltinID == Builtin::BI__builtin_return_address)		<< ((BuiltinID == Builtin::BI__builtin_return_address)
? "__builtin_return_address"		? "__builtin_return_address"
: "__builtin_frame_address")		: "__builtin_frame_address")
<< TheCall->getSourceRange();		<< TheCall->getSourceRange();
} break;		} break;

case Builtin::BI__builtin_memory_fence: {
ExprResult Arg = TheCall->getArg(0);
auto ArgExpr = Arg.get();
Expr::EvalResult ArgResult;

if(!ArgExpr->EvaluateAsInt(ArgResult, Context)) {
Diag(ArgExpr->getExprLoc(), diag::err_typecheck_expect_int)
<< ArgExpr->getType();
return ExprError();
}
int ord = ArgResult.Val.getInt().getZExtValue();

// Check validity of memory ordering as per C11 / C++11's memory model.
switch (static_cast<llvm::AtomicOrderingCABI>(ord)) {
case llvm::AtomicOrderingCABI::acquire:
case llvm::AtomicOrderingCABI::release:
case llvm::AtomicOrderingCABI::acq_rel:
case llvm::AtomicOrderingCABI::seq_cst:
break;
default: {
Diag(ArgExpr->getBeginLoc(),
diag::warn_atomic_op_has_invalid_memory_order)
<< ArgExpr->getSourceRange();
return ExprError();
}
}
} break;
}		}

// Since the target specific builtins for each arch overlap, only check those		// Since the target specific builtins for each arch overlap, only check those
// of the arch we are compiling for.		// of the arch we are compiling for.
if (Context.BuiltinInfo.isTSBuiltin(BuiltinID)) {		if (Context.BuiltinInfo.isTSBuiltin(BuiltinID)) {
switch (Context.getTargetInfo().getTriple().getArch()) {		switch (Context.getTargetInfo().getTriple().getArch()) {
case llvm::Triple::arm:		case llvm::Triple::arm:
case llvm::Triple::armeb:		case llvm::Triple::armeb:
case llvm::Triple::thumb:		case llvm::Triple::thumb:
case llvm::Triple::thumbeb:		case llvm::Triple::thumbeb:
if (CheckARMBuiltinFunctionCall(BuiltinID, TheCall))		if (CheckARMBuiltinFunctionCall(BuiltinID, TheCall))
return ExprError();		return ExprError();
break;		break;
		JonChesterfield: isIntegerType will return true for signed integers as well as unsigned. It seems reasonable to call this with a signed integer type (e.g. '2'), so perhaps the references to unsigned should be dropped from the code and error message.
		Anastasia: I think we should accept implicit type conversion rules from the language...
case llvm::Triple::aarch64:		case llvm::Triple::aarch64:
case llvm::Triple::aarch64_32:		case llvm::Triple::aarch64_32:
case llvm::Triple::aarch64_be:		case llvm::Triple::aarch64_be:
		JonChesterfield: I think I'd write this as a switch over the enum instead of a ranged compare. It'll codegen to the same thing, but we'll get warnings if more values are introduced to the enum and things will keep working (here, anyway) if the values are reordered.
if (CheckAArch64BuiltinFunctionCall(BuiltinID, TheCall))		if (CheckAArch64BuiltinFunctionCall(BuiltinID, TheCall))
return ExprError();		return ExprError();
break;		break;
case llvm::Triple::bpfeb:		case llvm::Triple::bpfeb:
case llvm::Triple::bpfel:		case llvm::Triple::bpfel:
if (CheckBPFBuiltinFunctionCall(BuiltinID, TheCall))		if (CheckBPFBuiltinFunctionCall(BuiltinID, TheCall))
		JonChesterfield: This should reject 'relaxed' - I think that's currently accepted by sema then silently thrown away by codegen.
return ExprError();		return ExprError();
break;		break;
case llvm::Triple::hexagon:		case llvm::Triple::hexagon:
if (CheckHexagonBuiltinFunctionCall(BuiltinID, TheCall))		if (CheckHexagonBuiltinFunctionCall(BuiltinID, TheCall))
return ExprError();		return ExprError();
break;		break;
case llvm::Triple::mips:		case llvm::Triple::mips:
case llvm::Triple::mipsel:		case llvm::Triple::mipsel:
Show All 12 Lines	switch (Context.getTargetInfo().getTriple().getArch()) {
return ExprError();		return ExprError();
break;		break;
case llvm::Triple::ppc:		case llvm::Triple::ppc:
case llvm::Triple::ppc64:		case llvm::Triple::ppc64:
case llvm::Triple::ppc64le:		case llvm::Triple::ppc64le:
if (CheckPPCBuiltinFunctionCall(BuiltinID, TheCall))		if (CheckPPCBuiltinFunctionCall(BuiltinID, TheCall))
return ExprError();		return ExprError();
break;		break;
		case llvm::Triple::amdgcn:
		if (CheckAMDGCNBuiltinFunctionCall(BuiltinID, TheCall))
		return ExprError();
		break;
default:		default:
break;		break;
}		}
}		}

return TheCallResult;		return TheCallResult;
}		}

▲ Show 20 Lines • Show All 985 Lines • ▼ Show 20 Lines	case PPC::BI__builtin_unpack_vector_int128:
return SemaVSXCheck(TheCall) \|\|		return SemaVSXCheck(TheCall) \|\|
SemaBuiltinConstantArgRange(TheCall, 1, 0, 1);		SemaBuiltinConstantArgRange(TheCall, 1, 0, 1);
case PPC::BI__builtin_pack_vector_int128:		case PPC::BI__builtin_pack_vector_int128:
return SemaVSXCheck(TheCall);		return SemaVSXCheck(TheCall);
}		}
return SemaBuiltinConstantArgRange(TheCall, i, l, u);		return SemaBuiltinConstantArgRange(TheCall, i, l, u);
}		}

		bool Sema::CheckAMDGCNBuiltinFunctionCall(unsigned BuiltinID, CallExpr *TheCall) {
		switch (BuiltinID) {
		case AMDGPU::BI__builtin_amdgcn_fence: {
		ExprResult Arg = TheCall->getArg(0);
		auto ArgExpr = Arg.get();
		Expr::EvalResult ArgResult;

		if(!ArgExpr->EvaluateAsInt(ArgResult, Context)) {
		return Diag(ArgExpr->getExprLoc(), diag::err_typecheck_expect_int)
		<< ArgExpr->getType();
		}
		int ord = ArgResult.Val.getInt().getZExtValue();

		// Check validity of memory ordering as per C11 / C++11's memory model.
		switch (static_cast<llvm::AtomicOrderingCABI>(ord)) {
		case llvm::AtomicOrderingCABI::acquire:
		case llvm::AtomicOrderingCABI::release:
		case llvm::AtomicOrderingCABI::acq_rel:
		case llvm::AtomicOrderingCABI::seq_cst:
		break;
		default: {
		return Diag(ArgExpr->getBeginLoc(),
		diag::warn_atomic_op_has_invalid_memory_order)
		<< ArgExpr->getSourceRange();
		}
		}
		} break;
		}
		return false;
		}

		saiislam (author): @sameerds here is the check for the sync scope to be a constant literal.
bool Sema::CheckSystemZBuiltinFunctionCall(unsigned BuiltinID,		bool Sema::CheckSystemZBuiltinFunctionCall(unsigned BuiltinID,
CallExpr *TheCall) {		CallExpr *TheCall) {
if (BuiltinID == SystemZ::BI__builtin_tabort) {		if (BuiltinID == SystemZ::BI__builtin_tabort) {
Expr *Arg = TheCall->getArg(0);		Expr *Arg = TheCall->getArg(0);
llvm::APSInt AbortCode(32);		llvm::APSInt AbortCode(32);
if (Arg->isIntegerConstantExpr(AbortCode, Context) &&		if (Arg->isIntegerConstantExpr(AbortCode, Context) &&
AbortCode.getSExtValue() >= 0 && AbortCode.getSExtValue() < 256)		AbortCode.getSExtValue() >= 0 && AbortCode.getSExtValue() < 256)
return Diag(Arg->getBeginLoc(), diag::err_systemz_invalid_tabort_code)		return Diag(Arg->getBeginLoc(), diag::err_systemz_invalid_tabort_code)
▲ Show 20 Lines • Show All 11,653 Lines • Show Last 20 Lines

clang/test/CodeGenCXX/builtin-amdgcn-fence-failure.cpp

This file was added.

				// REQUIRES: amdgpu-registered-target
				// RUN: not %clang_cc1 %s -S \
				// RUN: -triple=amdgcn-amd-amdhsa 2>&1 \| FileCheck %s

				void test_amdgcn_fence_failure() {
				arsenm: Does this really depend on C++? Can it use OpenCL like the other builtin tests? This also belongs in a Sema* test directory since it's checking an error.
				JonChesterfield: Making it opencl-only would force some of the openmp runtime to be written in opencl, which is not presently the case. Currently that library is written in a dialect of hip, but there's a plan to implement it in openmp instead. I'd much rather this builtin work from any language, instead of tying it to opencl, as that means one can use it from openmp target regions.
				b-sumner: I thought the question was about this test itself. The test being in CodeGenOpenCL doesn't affect whether other languages can use the builtin. Why not put this test in CodeGenOpenCL alongside all of the other builtins-amdgcn-*.cl?
				saiislam (author): This test basically relies on the implementation of sync scope in the AMDGCN backend to check for validity, and is not testing the code generation capability. Doesn't it make more sense in a Sema directory? Won't putting it in SemaOpenCL or SemaOpenCLCXX indicate some kind of relation with OpenCL?

				// CHECK: error: Unsupported atomic synchronization scope
				__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "foobar");
				}
				No newline at end of file

clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp

This file was moved from clang/test/CodeGenHIP/builtin_memory_fence.cpp.

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target
				JonChesterfield: Codegen test should be under CodeGen and/or CodeGenCXX.
	// RUN: %clang_cc1 %s -x hip -emit-llvm -O0 -o - \			// RUN: %clang_cc1 %s -emit-llvm -O0 -o - \
	// RUN: -triple=amdgcn-amd-amdhsa \| opt -instnamer -S \| FileCheck %s			// RUN: -triple=amdgcn-amd-amdhsa \| opt -instnamer -S \| FileCheck %s

	void test_memory_fence_success() {			void test_memory_fence_success() {
				sameerds: There should be a line that tries to do: __builtin_memory_fence(__ATOMIC_SEQ_CST, "foobar");
	// CHECK-LABEL: test_memory_fence_success			// CHECK-LABEL: test_memory_fence_success

	// CHECK: fence syncscope("workgroup") seq_cst			// CHECK: fence syncscope("workgroup") seq_cst
	__builtin_memory_fence(__ATOMIC_SEQ_CST, "workgroup");			__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "workgroup");
				sameerds: Orderings like `__ATOMIC_SEQ_CST` are defined for C/C++ memory models. They should not be used with the new builtin because this new builtin does not follow any specific language model. For user convenience, the right thing to do is to introduce new tokens in the Clang preprocessor, similar to the `__ATOMIC_*` tokens. The convenient shortcut is to just tell the user to supply numerical values by looking at the LLVM source code. From llvm/Support/AtomicOrdering.h, note how the numerical value for `__ATOMIC_SEQ_CST` is 5, but the numerical value for the LLVM SequentiallyConsistent ordering is 7. The numerical value 5 refers to the LLVM ordering "release". So, if the implementation were correct, this line should result in the following unexpected LLVM IR: fence syncscope("workgroup") release
				saiislam (author): As you pointed out, the range of acquire to sequentially consistent memory orders for llvm::AtomicOrdering is [4, 7], while for llvm::AtomicOrderingCABI it is [2, 5]. The C ABI enums were chosen to ensure ease of use for users who are familiar with the C/C++ standard memory model. It allows them to use macros like __ATOMIC_ACQUIRE etc. Clang CodeGen of the builtin internally maps C ABI ordering to LLVM atomic ordering.
				JonChesterfield: What language, implemented in clang, do you have in mind that reusing the existing __ATOMIC_* macros would be incorrect for?
				sameerds: I think we agreed that this builtin exposes the LLVM fence exactly. That would mean it takes arguments defined by LLVM. If you are implementing something different from that, then it first needs to be specified properly. Perhaps you could say that this is a C ABI compatible builtin, that happens to take target specific scopes? That should cover OpenCL whose scope enum is designed to be compatible with C. Whatever it is that you are trying to implement here, it definitely does not expose a raw LLVM fence.
				JonChesterfield: The llvm fence, in text form, uses a symbol for the memory scope. Not an enum. This symbol is set using these macros for the existing atomic builtins. Using an implementation detail of clang instead is strictly worse, by layering and by precedent. ABI is not involved here. Nor is OpenCL.
				sameerds: The `__ATOMIC_*` symbols in Clang quite literally represent the C/C++ ABI. See the details in AtomicOrdering.h and InitPreprocessor.cpp. I am not opposed to specifying that the builtin expects these symbols, but then it is incorrect to say that the builtin exposes the raw LLVM builtin. It is a C-ABI-compatible builtin that happens to take target-specific scope as a string argument. And that would also make it an overload of the already existing builtin __atomic_fence().
				JonChesterfield: I don't know what you mean by "raw", but am guessing you're asking for documentation for the intrinsic. Said documentation should indeed be added for this builtin - it'll probably be in a tablegen file.
				JonChesterfield: I will try to stop using builtin and intrinsic as synonyms.
				sameerds: Right. It's actually an LLVM instruction, not even an intrinsic. I am guilty of using the wrong term quite often, but usually the context helps. I think the following is still needed: (1) A testcase that exercises the builtin with an invalid string argument for scope. (2) An update to the change description describing what is being introduced here. It is more than a direct mapping from builtin to instruction. The C ABI is involved. (3) An update to the Clang documentation describing this new builtin: https://clang.llvm.org/docs/LanguageExtensions.html#builtin-functions

	// CHECK: fence syncscope("agent") acquire			// CHECK: fence syncscope("agent") acquire
	__builtin_memory_fence(__ATOMIC_ACQUIRE, "agent");			__builtin_amdgcn_fence(__ATOMIC_ACQUIRE, "agent");

	// CHECK: fence seq_cst			// CHECK: fence seq_cst
	__builtin_memory_fence(__ATOMIC_SEQ_CST, "");			__builtin_amdgcn_fence(__ATOMIC_SEQ_CST, "");

	// CHECK: fence syncscope("agent") acq_rel			// CHECK: fence syncscope("agent") acq_rel
	__builtin_memory_fence(4, "agent");			__builtin_amdgcn_fence(4, "agent");

	// CHECK: fence syncscope("workgroup") release			// CHECK: fence syncscope("workgroup") release
	__builtin_memory_fence(3, "workgroup");			__builtin_amdgcn_fence(3, "workgroup");
				}
	// CHECK: fence syncscope("foobar") release
	__builtin_memory_fence(3, "foobar");
	}
	No newline at end of file
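
The ordering discussion in the inline comments above turns on two different numberings: the C ABI values behind the `__ATOMIC_*` macros (acquire through seq_cst are 2 to 5) and the LLVM ordering values (4 to 7). The sketch below restates that mapping without depending on LLVM headers. The enums `CABIOrder` and `LLVMOrder` and the helper `toFenceOrdering` are illustrative names introduced here, and the enumerator values are assumed to mirror llvm::AtomicOrderingCABI and llvm::AtomicOrdering from llvm/Support/AtomicOrdering.h.

```cpp
#include <cassert>

// Assumed to mirror llvm::AtomicOrderingCABI, whose values match the
// __ATOMIC_* macros (for example __ATOMIC_SEQ_CST == 5).
enum class CABIOrder : int {
  relaxed = 0, consume = 1, acquire = 2, release = 3, acq_rel = 4, seq_cst = 5
};

// Assumed to mirror llvm::AtomicOrdering (the value 3 is unused there).
enum class LLVMOrder : int {
  NotAtomic = 0, Unordered = 1, Monotonic = 2,
  Acquire = 4, Release = 5, AcquireRelease = 6, SequentiallyConsistent = 7
};

// Hypothetical helper: maps a C ABI memory order to the ordering a fence can
// carry. Returns false for relaxed, consume, and out-of-range values, since a
// fence only accepts acquire, release, acq_rel, and seq_cst.
bool toFenceOrdering(int ord, LLVMOrder &out) {
  switch (static_cast<CABIOrder>(ord)) {
  case CABIOrder::acquire: out = LLVMOrder::Acquire; return true;
  case CABIOrder::release: out = LLVMOrder::Release; return true;
  case CABIOrder::acq_rel: out = LLVMOrder::AcquireRelease; return true;
  case CABIOrder::seq_cst: out = LLVMOrder::SequentiallyConsistent; return true;
  case CABIOrder::relaxed:
  case CABIOrder::consume:
    return false;
  }
  return false; // value outside the enum's named range
}
```

Under these assumptions, passing 5 (`__ATOMIC_SEQ_CST`) yields the LLVM value 7, whereas reinterpreting 5 directly as an LLVM ordering would select release, which is the mismatch raised in the review.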

clang/test/CodeGenHIP/builtin_memory_fence.cpp

This file was moved to clang/test/CodeGenCXX/builtin-amdgcn-fence.cpp.

clang/test/Sema/builtins.c

Show First 20 Lines • Show All 314 Lines • ▼ Show 20 Lines	void test23() {
my_memcpy(buf, src, 11); // expected-warning{{'memcpy' will always overflow; destination buffer has size 10, but size argument is 11}}		my_memcpy(buf, src, 11); // expected-warning{{'memcpy' will always overflow; destination buffer has size 10, but size argument is 11}}
}		}

// Test that __builtin_is_constant_evaluated() is not allowed in C		// Test that __builtin_is_constant_evaluated() is not allowed in C
int test_cxx_builtin() {		int test_cxx_builtin() {
// expected-error@+1 {{use of unknown builtin '__builtin_is_constant_evaluated'}}		// expected-error@+1 {{use of unknown builtin '__builtin_is_constant_evaluated'}}
return __builtin_is_constant_evaluated();		return __builtin_is_constant_evaluated();
}		}

void test_memory_fence_errors() {
__builtin_memory_fence(__ATOMIC_SEQ_CST + 1, "workgroup"); // expected-warning {{memory order argument to atomic operation is invalid}}

__builtin_memory_fence(__ATOMIC_ACQUIRE - 1, "workgroup"); // expected-warning {{memory order argument to atomic operation is invalid}}

__builtin_memory_fence(4); // expected-error {{too few arguments to function call, expected 2}}

__builtin_memory_fence(4, 4, 4); // expected-error {{too many arguments to function call, expected 2}}

__builtin_memory_fence(3.14, ""); // expected-warning {{implicit conversion from 'double' to 'unsigned int' changes value from 3.14 to 3}}
}

clang/test/SemaOpenCL/builtins-amdgcn-error.cl

Show First 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	void test_ds_fminf(local float *out, float src, int a) {
*out = __builtin_amdgcn_ds_fminf(out, src, 0, 0, a); // expected-error {{argument to '__builtin_amdgcn_ds_fminf' must be a constant integer}}		*out = __builtin_amdgcn_ds_fminf(out, src, 0, 0, a); // expected-error {{argument to '__builtin_amdgcn_ds_fminf' must be a constant integer}}
}		}

void test_ds_fmaxf(local float *out, float src, int a) {		void test_ds_fmaxf(local float *out, float src, int a) {
*out = __builtin_amdgcn_ds_fmaxf(out, src, a, 0, false); // expected-error {{argument to '__builtin_amdgcn_ds_fmaxf' must be a constant integer}}		*out = __builtin_amdgcn_ds_fmaxf(out, src, a, 0, false); // expected-error {{argument to '__builtin_amdgcn_ds_fmaxf' must be a constant integer}}
*out = __builtin_amdgcn_ds_fmaxf(out, src, 0, a, false); // expected-error {{argument to '__builtin_amdgcn_ds_fmaxf' must be a constant integer}}		*out = __builtin_amdgcn_ds_fmaxf(out, src, 0, a, false); // expected-error {{argument to '__builtin_amdgcn_ds_fmaxf' must be a constant integer}}
*out = __builtin_amdgcn_ds_fmaxf(out, src, 0, 0, a); // expected-error {{argument to '__builtin_amdgcn_ds_fmaxf' must be a constant integer}}		*out = __builtin_amdgcn_ds_fmaxf(out, src, 0, 0, a); // expected-error {{argument to '__builtin_amdgcn_ds_fmaxf' must be a constant integer}}
}		}

		void test_fence() {
		__builtin_amdgcn_fence(__ATOMIC_SEQ_CST + 1, "workgroup"); // expected-warning {{memory order argument to atomic operation is invalid}}
		__builtin_amdgcn_fence(__ATOMIC_ACQUIRE - 1, "workgroup"); // expected-warning {{memory order argument to atomic operation is invalid}}
		__builtin_amdgcn_fence(4); // expected-error {{too few arguments to function call, expected 2}}
		__builtin_amdgcn_fence(4, 4, 4); // expected-error {{too many arguments to function call, expected 2}}
		__builtin_amdgcn_fence(3.14, ""); // expected-warning {{implicit conversion from 'double' to 'unsigned int' changes value from 3.14 to 3}}
		}

Revision Contents

Diff 259239
