This aims to support cl_ext_float_atomics on CFE, see https://github.com/KhronosGroup/OpenCL-Docs/pull/552 for details.
Details
Diff Detail
Event Timeline
The extension spec seems to also mention atomic_half. Are planning to add it too?
clang/include/clang/Basic/OpenCLExtensions.def | ||
---|---|---|
85 ↗ | (On Diff #360018) | If the only purpose of adding this extension here is to define the macros you should just use the internal header for it. See guidelines: https://clang.llvm.org/docs/OpenCLSupport.html#implementation-guidelines The same applies to all the feature macros below. You could check how new subgroup extensions are implemented: https://clang.llvm.org/doxygen/opencl-c-base_8h_source.html // For SPIR all extensions are supported. #if defined(__SPIR__) #define cl_khr_subgroup_extended_types 1 #define cl_khr_subgroup_non_uniform_vote 1 #define cl_khr_subgroup_ballot 1 #define cl_khr_subgroup_non_uniform_arithmetic 1 #define cl_khr_subgroup_shuffle 1 #define cl_khr_subgroup_shuffle_relative 1 #define cl_khr_subgroup_clustered_reduce 1 #endif // defined(__SPIR__) |
clang/lib/Headers/opencl-c-base.h | ||
40 | From the spec it feels like those macros should be defined conditionally? | |
clang/lib/Headers/opencl-c.h | ||
13419 | should this not be conditioned on __opencl_c_ext_fp32_global_atomic? Otherwise, I am missing what is the intent of those macros... |
Please consider uploading full diff: https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface
Hi, Anastasia. I am not very familiar with the process, could you please help to add to cfe-commits if possible? Thanks very much.
Just to make sure you are aware Clang doesn't use this header by default, so the upstream users won't be able to call those functions unless you add them into OpenCLBuiltins.td:
https://clang.llvm.org/docs/OpenCLSupport.html#opencl-builtins
This header is only accessible via the frontend options: https://clang.llvm.org/docs/OpenCLSupport.html#cmdoption-finclude-default-header
clang/lib/Headers/opencl-c.h | ||
---|---|---|
13702 | Can you annotate the #endifs with a comment describing what they correspond to. i.e. something like: #endif //defined(__opencl_c_ext_fp32_global_atomic_min_max) |
clang/lib/Headers/opencl-c-base.h | ||
---|---|---|
24 | Should this be defined as 1? Should this define be tested in clang/test/Headers/opencl-c-header.cl too? |
clang/lib/Headers/opencl-c-base.h | ||
---|---|---|
24 | Actually, now that I think more about this, it seems incorrect to add this here without adding the functions to OpenCLBuiltins.td because then the feature macro will be present without the feature when the default header is used? So I guess we either need to extend OpenCLBuiltins.td with new functions or move the new macros into opencl-c.h? |
clang/lib/Headers/opencl-c-base.h | ||
---|---|---|
24 | Good catch! Indeed, adding the define in the shared opencl-c-base.h without also providing the builtins through OpenCLBuiltins.td is incorrect. The preferred solution would be to add the new builtins to clang/lib/Sema/OpenCLBuiltins.td too, to avoid diverging the header and tablegen-driven code paths. |
Hi, Anastasia and svenvh.
Sorry for late reply. I have updated patch per your comments.
Many thanks for your comments.
Thanks for the update! I have a few points to improve the patch.
clang/lib/Sema/OpenCLBuiltins.td | ||
---|---|---|
1107 | Do we really need to guard these additions behind OpenCL 3.0? The spec mentions
(same applies to the opencl-h.c changes of course) | |
1112 | The feature macros seem to be missing. See FuncExtOpenCLCPipes for an example how to do that. | |
1132–1135 | This can be merged into the preceeding foreach parts I think? | |
clang/test/SemaOpenCL/fdeclare-opencl-builtins.cl | ||
137 | As mentioned in the comment on lines 13-17, this test is not meant to be exhaustive. So you don't have to test every overload, checking one or two builtins should suffice. |
Remove OpenCL3.0 macro guards on opencl-c.h and OpenCLBuiltins.td.
Add missing feature macro.
simplify test.
clang/lib/Sema/OpenCLBuiltins.td | ||
---|---|---|
1108 | So now all of those builtins are guarded by cl_ext_float_atomics, which is good, but not by any of the __opencl_c_ext_... macros yet. To guard by multiple macros, we'd need to do something like: def FuncExtFloatAtomicsFp32GlobalMinMax : FunctionExtension<"cl_ext_float_atomics __opencl_c_ext_fp32_global_atomic_min_max">; def FuncExtFloatAtomicsFp32LocalMinMax : FunctionExtension<"cl_ext_float_atomics __opencl_c_ext_fp32_local_atomic_min_max">; And then use let Extension = FuncExtFloatAtomics... around the corresponding builtins. You shouldn't have to change the loop structure much for this, as you can hopefully use # concatenation to construct the appropriate FuncExt name (and then !cast it to a record). However, I do see some problematic cases: the generic address space builtins are enabled by one of multiple feature macros, which is something that is currently not supported by the OpenCLBuiltins.td handling. If it's not too late, could we ask the extension spec editors to provide a dedicated feature macro for generic perhaps? |
clang/lib/Sema/OpenCLBuiltins.td | ||
---|---|---|
1110 | Please try to follow the formatting used in the rest of this file: def : Builtin<... So a space after def, then no newline after the :. This applies to all the new defs below too. | |
1111 | The paste operator # is a binary operator, so it makes more sense to put a space on both sides. | |
1187 | Wrong extension guard. | |
1283 | Wrong extension guard. |
Thanks for the update! I have a comment about indentation, other than that this is looking good to me.
clang/lib/Sema/OpenCLBuiltins.td | ||
---|---|---|
1118 | Please indent the content inside all let blocks. |
Hi, Anastasia. I also have some work to translate these atomic builtins on SPIRV currently, so I'd like to have another patch to add atomic_half later.
Do you have any comments?
Thanks very much.
Hi, svenvh.
Should we use cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics to guard the functions using atomic_double type?
Thanks very much.
#if defined(__opencl_c_ext_fp64_local_atomic_min_max) double __ovld atomic_fetch_min(volatile __local atomic_double *object, double operand); #endif
Hi, svenvh and Anastasia. If you approve the patch, could you please submit it?
I don't have permission to do it.
Hi, svenvh and Anastasia. Do you have any comments for adding cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics to guard atomic_double type? I'd appreciate it if you have time to answer it.
And if there is no any comment, please commit the patch.
Thanks very much.
Apologies for the delayed response.
This is perhaps something to raise at the specification level?
We can adjust the guards after any followup discussion if needed. To progress the support of this extension, I've just committed your patch (with some minor whitespace fixes).
Hi, svenvh.
I am ok with the patch. Thanks very much.
I'd appreciate it if you help review https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/1116 as well.
Should this be defined as 1?
Should this define be tested in clang/test/Headers/opencl-c-header.cl too?