Page MenuHomePhabricator

Add OpenCL 2.0 atomic builtin functions as Clang builtin
ClosedPublic

Authored by yaxunl on Jan 13 2017, 12:16 PM.

Details

Summary

OpenCL 2.0 atomic builtin functions have a scope argument which is ideally represented as synchronization scope argument in LLVM atomic instructions.

Clang supports translating Clang atomic builtin functions to LLVM atomic instructions. However it currently does not support synchronization scope of LLVM atomic instructions. Without this, users have to use LLVM assembly code to implement OpenCL atomic builtin functions.

This patch adds OpenCL 2.0 atomic builtin functions as Clang builtin functions, which supports generating LLVM atomic instructions with synchronization scope operand.

Currently only constant memory scope argument is supported. Support of non-constant memory scope argument will be added later.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
yaxunl added inline comments.Jul 25 2017, 7:16 AM
test/CodeGenOpenCL/atomic-ops.cl
7 ↗(On Diff #102374)

This is to test the builtin works in pch. When generating pch, ALREADY_INCLUDED is undefined, therefore pch will include all functions. When including pch, since ALREADY_INCLUDED is defined through pch, the cl file is empty and function definitions from pch is used for codegen.

15 ↗(On Diff #102374)

will add them.

32 ↗(On Diff #102374)

No,

test/SemaOpenCL/atomic-ops.cl
22 ↗(On Diff #102374)

will do

30 ↗(On Diff #102374)

will add tests for different addr spaces.

yaxunl updated this revision to Diff 108073.Jul 25 2017, 7:20 AM
yaxunl marked 16 inline comments as done.
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: kzhuravl.

Rebased to ToT and revised by Anastasia's comments.

b-sumner added inline comments.Jul 25 2017, 7:32 AM
include/clang/Basic/Builtins.def
713 ↗(On Diff #102374)

Yes, they are. Please look again at 6.13.11.7.5 in the 2.0 C spec.

yaxunl marked 2 inline comments as done.Jul 25 2017, 7:56 AM
yaxunl added inline comments.
include/clang/Basic/Builtins.def
713 ↗(On Diff #102374)

sorry I thought they are just some atomic extensions since C++11 atomic builtins do not have those. I will add them.

kzhuravl edited edge metadata.Jul 25 2017, 10:43 AM

Seems like SyncScope.h file is missing?

b-sumner edited edge metadata.Jul 25 2017, 11:32 AM

Can we drop the "opencl" part of the name and use something like __scoped_atomic_*? Also, it may not make sense to support non-constant scope here since we can't predict what other scopes may be added by other languages in the future.

yaxunl marked an inline comment as done.Jul 25 2017, 11:40 AM

Seems like SyncScope.h file is missing?

Right. I will add it.

yaxunl updated this revision to Diff 108127.Jul 25 2017, 11:41 AM

Add min/max and missing file.

Can we drop the "opencl" part of the name and use something like __scoped_atomic_*? Also, it may not make sense to support non-constant scope here since we can't predict what other scopes may be added by other languages in the future.

we could use the approach of LangAS, i.e. we allow targets to map all language specific scopes to target-specific scope names, since IR only cares about scope names, which are target specific. And this is what the current implementation does.

I have no objection to use the __scoped_atomic_ name. It is more general and extensible. John/Anastasia, any comments? Thanks.

Can we drop the "opencl" part of the name and use something like __scoped_atomic_*? Also, it may not make sense to support non-constant scope here since we can't predict what other scopes may be added by other languages in the future.

we could use the approach of LangAS, i.e. we allow targets to map all language specific scopes to target-specific scope names, since IR only cares about scope names, which are target specific. And this is what the current implementation does.

I have no objection to use the __scoped_atomic_ name. It is more general and extensible. John/Anastasia, any comments? Thanks.

I think I would prefer __opencl_atomic_* until we have some evidence that this concept is more general than just OpenCL.

t-tye added a subscriber: t-tye.Jul 25 2017, 12:41 PM

Can we drop the "opencl" part of the name and use something like __scoped_atomic_*? Also, it may not make sense to support non-constant scope here since we can't predict what other scopes may be added by other languages in the future.

we could use the approach of LangAS, i.e. we allow targets to map all language specific scopes to target-specific scope names, since IR only cares about scope names, which are target specific. And this is what the current implementation does.

I have no objection to use the __scoped_atomic_ name. It is more general and extensible. John/Anastasia, any comments? Thanks.

I think I would prefer __opencl_atomic_* until we have some evidence that this concept is more general than just OpenCL.

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

But would those languages use the same language design for these scopes as OpenCL if they did expose them, as opposed to some more elaborate scoping specification? My objection is not that the concept is inherently OpenCL-specific, it's that the presentation in the language might be inherently OpenCL-specific, which makes staying in the opencl namespace is prudent.

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

But would those languages use the same language design for these scopes as OpenCL if they did expose them, as opposed to some more elaborate scoping specification? My objection is not that the concept is inherently OpenCL-specific, it's that the presentation in the language might be inherently OpenCL-specific, which makes staying in the opencl namespace is prudent.

Are you envisioning a language far enough from C/C++ that a standard library or header would not be able to map a scoped atomic operation into a call to one of these new builtins? Would we expect more of such languages than languages that would do such a mapping?

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

But would those languages use the same language design for these scopes as OpenCL if they did expose them, as opposed to some more elaborate scoping specification? My objection is not that the concept is inherently OpenCL-specific, it's that the presentation in the language might be inherently OpenCL-specific, which makes staying in the opencl namespace is prudent.

Are you envisioning a language far enough from C/C++ that a standard library or header would not be able to map a scoped atomic operation into a call to one of these new builtins? Would we expect more of such languages than languages that would do such a mapping?

If you're using Clang as a frontend for your language, it must be similar enough to C to call a builtin function. That's not at issue. The central question here is whether these builtins are meaningfully general outside of OpenCL. The concept of heterogenous computation is certainly not specific to OpenCL; however, these builtins are defined in terms of scopes — "work item", "work group", "device", and "all svm devices" — which, it seems to me, are basically only defined by reference to the OpenCL architecture. A different heterogenous compute environment might reasonably formalize scopes in a completely different way; for example, it might wish to be more explicit about exactly which peers / devices to synchronize with.

SPIR is explicitly defined on top of the OpenCL model. Users should be able to use OpenCL builtins when targeting it. That does not make those builtins more general than OpenCL.

John.

t-tye added inline comments.Jul 25 2017, 3:53 PM
include/clang/Basic/Builtins.def
717 ↗(On Diff #108127)

Will the OpenCL 2.0 memory fences also be supported which also have a memory order and memory scope?

include/clang/Basic/SyncScope.h
25–29 ↗(On Diff #108127)

Since the builtins are being named as __opencl then should these also be named as opencl_ since they are the memory scopes for OpenCL using the OpenCL numeric values?

If another language wants to use memory scopes, would it then add its own langx_* names in a similar way that is done for address spaces where the LangAS enumeration type has values for each distinct language. Each target is then responsible for mapping each language scope to the appropriate target specific scope as is done for address spaces?

If so then the builtins are really supporting the concept of memory scopes and are not language specific as this enumeration can support multiple languages in the same way as the LangAS enumeration supports multiple languages. If so the builtins would be better named to reflect this as @b-sumner suggested.

Anastasia edited edge metadata.Jul 28 2017, 3:14 AM

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

But would those languages use the same language design for these scopes as OpenCL if they did expose them, as opposed to some more elaborate scoping specification? My objection is not that the concept is inherently OpenCL-specific, it's that the presentation in the language might be inherently OpenCL-specific, which makes staying in the opencl namespace is prudent.

Are you envisioning a language far enough from C/C++ that a standard library or header would not be able to map a scoped atomic operation into a call to one of these new builtins? Would we expect more of such languages than languages that would do such a mapping?

If you're using Clang as a frontend for your language, it must be similar enough to C to call a builtin function. That's not at issue. The central question here is whether these builtins are meaningfully general outside of OpenCL. The concept of heterogenous computation is certainly not specific to OpenCL; however, these builtins are defined in terms of scopes — "work item", "work group", "device", and "all svm devices" — which, it seems to me, are basically only defined by reference to the OpenCL architecture. A different heterogenous compute environment might reasonably formalize scopes in a completely different way; for example, it might wish to be more explicit about exactly which peers / devices to synchronize with.

SPIR is explicitly defined on top of the OpenCL model. Users should be able to use OpenCL builtins when targeting it. That does not make those builtins more general than OpenCL.

John.

The scope concept in OpenCL is fairly generic. And the builtins just take one extra argument on top of the existing C11 builtin style. The OpenCL scopes have of course specific meaning to the OpenCL model, but there is nothing preventing other uses of the scope feature. As far as I understand atomic scope implementation in LLVM is fairly generic wrt scope types and it is intended for broader functionality than just OpenCL. So I would vote for having this as generic as possible in Clang too even though I don't think name prefix __opencl is preventing from using this feature in other languages unless we would disallow the builtins in the other language dialects. Which is not the case with the current patch because the builtins are added as ATOMIC_BUILTIN and not LANGBUILTIN. We have for example used C11 builtin in OpenCL already. So other cases are possible too.

include/clang/Basic/SyncScope.h
25–29 ↗(On Diff #108127)

We generally prefix the names of OpenCL specific implementations. So perhaps we should do some renaming here in case we don't intend this to be generic implementation.

lib/Sema/SemaChecking.cpp
3160 ↗(On Diff #108127)

Could we merge this and the line after please.

There are other languages for heterogeneous compute that have scopes, although not exposed quite as explicitly as OpenCL. For example AMD's "HC" language. And any language making use of clang and targeting SPIR-V would likely use these builtins. I think a more generic prefix is appropriate, and "scoped" tells us exactly when these are needed.

But would those languages use the same language design for these scopes as OpenCL if they did expose them, as opposed to some more elaborate scoping specification? My objection is not that the concept is inherently OpenCL-specific, it's that the presentation in the language might be inherently OpenCL-specific, which makes staying in the opencl namespace is prudent.

Are you envisioning a language far enough from C/C++ that a standard library or header would not be able to map a scoped atomic operation into a call to one of these new builtins? Would we expect more of such languages than languages that would do such a mapping?

If you're using Clang as a frontend for your language, it must be similar enough to C to call a builtin function. That's not at issue. The central question here is whether these builtins are meaningfully general outside of OpenCL. The concept of heterogenous computation is certainly not specific to OpenCL; however, these builtins are defined in terms of scopes — "work item", "work group", "device", and "all svm devices" — which, it seems to me, are basically only defined by reference to the OpenCL architecture. A different heterogenous compute environment might reasonably formalize scopes in a completely different way; for example, it might wish to be more explicit about exactly which peers / devices to synchronize with.

SPIR is explicitly defined on top of the OpenCL model. Users should be able to use OpenCL builtins when targeting it. That does not make those builtins more general than OpenCL.

John.

The scope concept in OpenCL is fairly generic. And the builtins just take one extra argument on top of the existing C11 builtin style. The OpenCL scopes have of course specific meaning to the OpenCL model, but there is nothing preventing other uses of the scope feature.

Yes, it is possible that some other language could introduce exactly the same scope-atomics concept only with a slightly different enumeration of scopes. But then we really shouldn't allow the OpenCL scopes to be used in that language, which means there would still not be a language-independent way of using these builtins.

As far as I understand atomic scope implementation in LLVM is fairly generic wrt scope types and it is intended for broader functionality than just OpenCL.

In some ways this is reasoning the wrong way around. I am not deeply informed about heterogenous computing, so I am happy to accept that llvm::SynchronizationScope is well-designed for our current needs. But LLVM's representation is, by design, ultimately just an implementation detail and can be easily updated — it's just a matter of changing a few calls and adding an upgrade path to the bitcode loader, exactly as we did when we introduced llvm::SynchronizationScope. That is not true of source language features, even builtins; their design is a contract with programmers, and the bar is substantially higher.

We do not have a compelling reason to claim that these are generally useful, so we should not. They should stay namespaced and be flagged as language-specific builtins.

John.

rjmccall added inline comments.Jul 28 2017, 8:48 AM
include/clang/Basic/SyncScope.h
1 ↗(On Diff #108127)

Capitalization.

20 ↗(On Diff #108127)

LLVM uses this namespace pattern in code that predates the addition of scoped enums to C++. Those days are behind us; we should just use a scoped enum.

22 ↗(On Diff #108127)

It defines the synch scope values used by the atomic builtins and expressions. LLVM's headers define the values used by the instructions.

25–29 ↗(On Diff #108127)

I agree that we should name the OpenCL-specific ones, like WorkGroup, with an OpenCL prefix.

32 ↗(On Diff #108127)

This is C++; please just use () instead of (void).

lib/CodeGen/CGAtomic.cpp
503 ↗(On Diff #108127)

None of the .getTypePtr() stuff here is necessary.

This function shouldn't really be necessary. I would encourage you to add a getValueType() accessor to AtomicExpr:

QualType AtomicExpr::getValueType() const {
  auto T = getPtr()->getType()->castTo<PointerType>()->getPointeeType();
  if (auto AT = T->getAs<AtomicType>()) {
    return AT->getValueType();
  } else {
    return T;
  }
}

You can then just use E->getValueType()->isSignedIntegerType() and eliminate this helper function.

919 ↗(On Diff #108127)

Again you're using getTypePtr() unnecessarily.

The check should be whether the AST-level address spaces match, not whether the lowered address spaces match.

Please pass E->getType() instead of E here.

You should remove the DoIt parameter and just check E->isOpenCL() (where E is the AtomicExpr already in scope).

lib/CodeGen/CGCall.cpp
3911 ↗(On Diff #108127)

No. Callers should ensure that they've added the right argument type, at least at the level of address spaces.

lib/CodeGen/TargetInfo.h
266 ↗(On Diff #108127)

Why does this return a StringRef instead of an llvm::SynchronizationScope?

yaxunl updated this revision to Diff 109406.Aug 2 2017, 1:31 PM
yaxunl marked 29 inline comments as done.

Revised by reviewers' comments.

t-tye added inline comments.Aug 2 2017, 9:03 PM
include/clang/Basic/SyncScope.h
23 ↗(On Diff #109406)

The OpenCL workitem scope is only used for image fences and does not apply to atomic operations so not sure that it should be in this enumeration which is used only for memory atomics.

lib/CodeGen/TargetInfo.cpp
7554–7555 ↗(On Diff #109406)

OpenCL only uses workitem for image fences which are not the same as atomic memory fences.

For image fences I don't think it would map to singlethread which is intended for signal handler support to ensure a signal handler has visibility of the updates done by a thread which is more of an optimization barrier. In contrast an image fence may need to flush caches to make the image and vector access paths coherent in a single thread.

Since this patch does not support fences probably want to leave workitem scope out. Current AMDGPU targets do not need to do anything for an OpenCL image fence, but other targets may need to generate an intrinsic since there is no LLVMIR instruction for this.

lib/Headers/opencl-c.h
13145–13150 ↗(On Diff #109406)

Do these have to have the same values as the SycnScope enumeration? Should that be ensured in a similar way to the memory_order enumeration?

lib/Sema/SemaChecking.cpp
3103 ↗(On Diff #109406)

IIRC OpenCL allows the scope to be a runtime value. So will doing this will likely cause failures in conformance?

The patch generally looks good, but if you need to handle non-constant scopes, you should submit a new patch to address that.

lib/CodeGen/CGAtomic.cpp
896 ↗(On Diff #109406)

You can sink this line and the next.

lib/Sema/SemaChecking.cpp
3103 ↗(On Diff #109406)

Ah, if that's true, you'll need to emit a switch in IRGen, the same way we handle non-constant memory orders.

yaxunl marked 6 inline comments as done.Aug 3 2017, 6:30 AM
yaxunl added inline comments.
include/clang/Basic/Builtins.def
717 ↗(On Diff #108127)

I am considering supporting it with a separate patch since this patch is already quite large.

include/clang/Basic/SyncScope.h
23 ↗(On Diff #109406)

You are right. I think we should drop it from this enum for now.

lib/CodeGen/CGAtomic.cpp
896 ↗(On Diff #109406)

will do.

lib/CodeGen/CGCall.cpp
3911 ↗(On Diff #108127)

Will remove.

lib/CodeGen/TargetInfo.cpp
7554–7555 ↗(On Diff #109406)

Thanks. I will remove this for now.

lib/Headers/opencl-c.h
13145–13150 ↗(On Diff #109406)

It is desirable to have the same value as SyncScope enumeration, otherwise the library has to do the translation when calling __opencl_atomic_* builtins.

Will do.

lib/Sema/SemaChecking.cpp
3103 ↗(On Diff #109406)

Will support it in another patch since this one is already quite large.

3160 ↗(On Diff #108127)

will do

yaxunl updated this revision to Diff 109542.Aug 3 2017, 6:41 AM
yaxunl marked 5 inline comments as done.

Revised by Tony's and John's comments.

yaxunl updated this revision to Diff 109556.Aug 3 2017, 7:27 AM

Add assert to make sure pre-defined macros __OPENCL_MEMORY_SCOP_* to be consistent with SyncScope enum.

LGTM. I'm fine with the plan to handle potentially non-constant scopes in a follow-up patch.

include/clang/Basic/SyncScope.h
21 ↗(On Diff #109556)

If the numeric values here were chosen to align with the arguments to some runtime function, that's important to leave as a comment.

lib/Headers/opencl-c.h
13145–13150 ↗(On Diff #109406)

Since we're defining these builtins ourselves de novo, it's fine to pick argument values that align with what the existing runtime functions expect. Once the builtins are defined and in-use, of course, we cannot subsequently change the builtin values, even if the runtime changes.

yaxunl updated this revision to Diff 109591.Aug 3 2017, 10:30 AM

Add comments to SyncScope.h

t-tye added inline comments.Aug 3 2017, 11:29 AM
docs/LanguageExtensions.rst
1935 ↗(On Diff #109591)

Should it also say:

The macros ``__OPENCL_MEMORY_SCOPE_WORK_ITEM``, ``__OPENCL_MEMORY_SCOPE_WORK_GROUP``, ``__OPENCL_MEMORY_SCOPE_DEVICE``, ``__OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES``, and ``__OPENCL_MEMORY_SCOPE_SUB_GROUP`` are provided, with values corresponding to the enumerators of OpenCL's ``memory_scope`` enumeration.
yaxunl marked an inline comment as done.Aug 3 2017, 12:12 PM
yaxunl added inline comments.
docs/LanguageExtensions.rst
1935 ↗(On Diff #109591)

Thanks. Will do.

yaxunl updated this revision to Diff 109612.Aug 3 2017, 12:14 PM
yaxunl marked an inline comment as done.

Added documentation about __OPENCL_MEMORY_SCOPE_* by Tony's comments.

t-tye accepted this revision.Aug 3 2017, 1:54 PM

LGTM

yaxunl marked 10 inline comments as done.Aug 4 2017, 6:25 AM
yaxunl added inline comments.
include/clang/Basic/SyncScope.h
21 ↗(On Diff #109556)

I have added the comments and updated the review.. On the Phabricator web page, it may take a bit effort to find it since it is interleaved with reviewers' comments.

bader accepted this revision.Aug 4 2017, 8:11 AM
bader added inline comments.
lib/AST/Expr.cpp
4000–4004 ↗(On Diff #109612)

No need in else branch after return:

if (...) {
  return AT->getValueType();
}

return T;

http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return

test/SemaOpenCL/atomic-ops.cl
1 ↗(On Diff #109612)

It's a pity, we have to parse the whole opencl-c.h file to get two enums and one typedef...

yaxunl marked 3 inline comments as done.Aug 4 2017, 8:33 AM

Ping. Any other comments? Thanks.

lib/AST/Expr.cpp
4000–4004 ↗(On Diff #109612)

will fix when committing.

test/SemaOpenCL/atomic-ops.cl
1 ↗(On Diff #109612)

Since there is change in opencl-c.h, using the header can test that.

rjmccall accepted this revision.Aug 4 2017, 10:01 AM

Still LGTM.

This revision is now accepted and ready to land.Aug 4 2017, 10:01 AM
This revision was automatically updated to reflect the committed changes.
yaxunl marked 2 inline comments as done.