This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Upstreaming khronos OpenCL header file.
ClosedPublic

Authored by yaxunl on Mar 22 2016, 12:13 PM.

Details

Summary

OpenCL has large number of "builtin" functions ("builtin" in the sense of OpenCL spec) which are defined in header files. To compile OpenCL kernels using these builtin functions, the header files are needed. There are different implementations of OpenCL header files. The Khronos implementation can be found at this link:

https://github.com/KhronosGroup/SPIR/blob/spirv-1.0/lib/Headers/opencl.h

This patch is an effort to elicit further discussions about upstreaming OpenCL header file to clang.

Diff Detail

Repository
rL LLVM

Event Timeline

yaxunl updated this revision to Diff 51317.Mar 22 2016, 12:13 PM
yaxunl retitled this revision from to [OpenCL] Upstreaming khronos OpenCL 1.2/2.0 header files..
yaxunl updated this object.
yaxunl added reviewers: Anastasia, rsmith, bader, pxli168.
yaxunl added subscribers: tstellarAMD, cfe-commits.
Anastasia edited edge metadata.Mar 29 2016, 9:55 AM

Regarding half types since there is inconsistency in both headers (commented in CL1.2), should we just enable the extension cl_khr_fp16 in the header and then have the overloads with half there with all the other types? They shouldn't be visible to custom code unless the same extension is enabled in the compiled cl file because half type itself won't be allowed without enabling it.

What about OpenCL 1.1 header? Ideally it would be nice to have them in too!

Is there any chance we could factor out the common bits into separate files to avoid large code repetition? I would imagine it should be quite doable as libs of each standard contain incremental changes.

Do you plan integrating it into the Clang driver too by automatic inclusion since it's not done with normal #include?

lib/Headers/opencl-12.h
585 ↗(On Diff #51317)

Why commented code here?

lib/Headers/opencl-20.h
4150 ↗(On Diff #51317)

I think we should have a normal declaration of these BIFs, because otherwise this won't appear as a symbol anywhere and would prevent for example error reporting with the original OpenCL function name.

An implementation of the builtin function can just call the Clang builtin __builtin_astype in its definition. This is also more general approach in case some implementations will decide to do something else here.

9866 ↗(On Diff #51317)

There is a TODO here!

11213 ↗(On Diff #51317)

This doesn't come directly from Spec.

11222 ↗(On Diff #51317)

This isn't defined by Spec but it seems sensible to define it this way.

However, there is a conflict here as ndrange_t is already added as a Clang builtin type:
https://llvm.org/svn/llvm-project/cfe/trunk@247676
and it is compiled to opaque type in IR. However, considering that we can have local variables and returns of this type, we might remove it from Clang type then and add it here in a header.

Any thoughts?

11251 ↗(On Diff #51317)

I think I would prefer to add an enqueue kernel as a Clang builtin because it requires custom check of type of variadic arguments as well as block arguments.

11263 ↗(On Diff #51317)

Also here we need a special check of parameters to block, and therefore it should be added as a Clang builtin.

11572 ↗(On Diff #51317)

I think this should be target specific?

11605 ↗(On Diff #51317)

Could you change atomic_init_prototype to upper case letters to match the style?

The same below.

Also it seems like some BIFs are declared with macros and some not. Any system there?

11886 ↗(On Diff #51317)

This isn't correct prototype according to Spec though. It should take any pointer type and not a void*.

This approach will introduce extra type conversions and might lead to loss of type information.

13851 ↗(On Diff #51317)

Also there seems to be inconsistency in documentation.

Regarding half types since there is inconsistency in both headers (commented in CL1.2), should we just enable the extension cl_khr_fp16 in the header and then have the overloads with half there with all the other types? They shouldn't be visible to custom code unless the same extension is enabled in the compiled cl file because half type itself won't be allowed without enabling it.

The 2.0 header uses #ifdef cl_khr_fp16 for half builtins, we could do the same for 1.2 headers.

What about OpenCL 1.1 header? Ideally it would be nice to have them in too!

We can add it later after we done with 1.2 and 2.0 headers.

Is there any chance we could factor out the common bits into separate files to avoid large code repetition? I would imagine it should be quite doable as libs of each standard contain incremental changes.

I saw some inconsistencies in the common part of the 1.2 and 2.0 headers. I will try to consolidate them first then try to split.

Do you plan integrating it into the Clang driver too by automatic inclusion since it's not done with normal #include?

Yes.

jprice added a subscriber: jprice.Mar 31 2016, 8:57 AM

Is there any chance we could factor out the common bits into separate files to avoid large code repetition? I would imagine it should be quite doable as libs of each standard contain incremental changes.

I saw some inconsistencies in the common part of the 1.2 and 2.0 headers. I will try to consolidate them first then try to split.

Rather than having a separate header for each version with another header for the common parts, couldn't we just have a single header that uses #if guards for the version specific functionality? e.g.

size_t __attribute__((overloadable)) get_global_id(uint dim);

#if __OPENCL_C_VERSION__ >= 200
size_t __attribute__((overloadable)) get_global_linear_id(void);
#endif

This also allows an easy path for having OpenCL 1.1 and 1.0 support without having to add more copies of the header.

Rather than having a separate header for each version with another header for the common parts, couldn't we just have a single header that uses #if guards for the version specific functionality? e.g.

size_t __attribute__((overloadable)) get_global_id(uint dim);

#if __OPENCL_C_VERSION__ >= 200
size_t __attribute__((overloadable)) get_global_linear_id(void);
#endif

This also allows an easy path for having OpenCL 1.1 and 1.0 support without having to add more copies of the header.

I think it is a good idea. It simplifies the file names and organization.

yaxunl added a comment.Apr 1 2016, 8:09 AM

Anastasia/Alexey/Xiuli,

Do you agree that we should have one single opencl.h instead of headers for different OpenCL versions?

Since most 1.2 builtins are included in 2.0. I think this is doable.

If no objection, I will try to merge them into one header first then addressing other issues.

Thanks.

yaxunl added a comment.Apr 1 2016, 8:31 AM

One of the difference between opencl-12.cl and opencl-20.cl is opencl-12.cl defines

#define const_func __attribute__((const))
#define readonly __attribute__((pure))

and uses them for many functions, e.g.

float const_func __attribute__((overloadable)) acos(float);

I think this is a nice feature for performance. However surprisingly opencl-20.cl does not do that.

I suggest to keep these attributes in the merged file. What do you think? Thanks.

One of the difference between opencl-12.cl and opencl-20.cl is opencl-12.cl defines

#define const_func __attribute__((const))
#define readonly __attribute__((pure))

and uses them for many functions, e.g.

float const_func __attribute__((overloadable)) acos(float);

I think this is a nice feature for performance. However surprisingly opencl-20.cl does not do that.

I suggest to keep these attributes in the merged file. What do you think? Thanks.

I think it's OK to have them them in. But if we keep this, could we use "__" prefix to avoid possible clashes with the custom code identifiers (the same should apply to any other identifiers which we declare in this header that is not specified in Spec).

Do you agree that we should have one single opencl.h instead of headers for different OpenCL versions?

The only concern about having one header would be its parsing speed due to large size. But I believe that the proportion of CL1.2 and CL2.0 declarations isn't really large. So perhaps we won't gain much from separating into multiple.

What about OpenCL 1.1 header? Ideally it would be nice to have them in too!

We can add it later after we done with 1.2 and 2.0 headers.

Since CL1.2 and CL2.0 are derived from CL1.1, it would make sense to fix the declarations (header) for CL1.1 too.

yaxunl added a comment.EditedApr 6 2016, 9:09 AM

We need to discuss the layout of the header file.

For the builtin functions, I plan to follow the order of the spec and extension spec. If there are difference in 1.2 and 2.0, use condition macro.

For functions with double or half arguments, we have two options:

  1. keep all functions with double arguments as one section, keep all functions with half arguments as another section. This is the current layout of opencl-20.h
  1. keep functions with the same name together so that they follow the spec order.

Any suggestions? Thanks.

We need to discuss the layout of the header file.

For the builtin functions, I plan to follow the order of the spec and extension spec. If there are difference in 1.2 and 2.0, use condition macro.

For functions with double or half arguments, we have two options:

  1. keep all functions with double arguments as one section, keep all functions with half arguments as another section. This is the current layout of opencl-20.h
  1. keep functions with the same name together so that they follow the spec order.

Any suggestions? Thanks.

I am fine with either option actually.

Thanks,
Anastasia

pxli168 edited edge metadata.Apr 11 2016, 12:50 AM

I think merge them into one file is a good idea.
And the layout by sepc order is OK, is will makes the review easy.

yaxunl updated this revision to Diff 53401.EditedApr 12 2016, 8:40 AM
yaxunl edited edge metadata.

Merged the header files for OpenCL 1.2 and 2.0 as one opencl.h.

Currently it assumes 1.2 by default and uses __OPENCL_C_VERSION__ for 2.0.

The builtin functions follows the order of spec.

double and half builtin functions stay with the float versions of the same name.

builtin functions for extensions are conditioned with macro cl_khr_*.

builtin functions for extensions are ordered as extension spec after core builtin functions, except cl_khr_half, cl_khr_depth_images, cl_khr_gl_msaa_sharing, and atomics extensions, which stays with their non-extension counter parts.

pxli168 added inline comments.Apr 13 2016, 8:42 PM
lib/Headers/opencl.h
13721–13726 ↗(On Diff #53401)

Move this to the barrier part with worg_group_barrier maybe better.

15636–15637 ↗(On Diff #53401)

Is this macro needed in this header?
And what happens to spir32 and spir64 difference?

15661 ↗(On Diff #53401)

What is this UINT32MAX?
And what about spir64?

15675 ↗(On Diff #53401)

Is this part necessary for up-streaming to the llvm3.9? It will never be hint.

yaxunl marked 4 inline comments as done.Apr 14 2016, 12:41 PM
yaxunl added inline comments.
lib/Headers/opencl.h
13721–13726 ↗(On Diff #53401)

Will do.

15636–15637 ↗(On Diff #53401)

The spec requires to define this macro.

I agree this definition seems arbitrary since the spec does not define PIPE_RESERVE_ID_VALID_BIT.

How about

// Define an internally used macro for the maximum value of size_t.
#if defined(__SPIR32__)
#define _SIZET_MAX UINT_MAX
#elif defined(__SPIR64__ )
#define _SIZET_MAX ULONG_MAX
#endif

and use it for defining CLK_NULL_RESERVE_ID.

15661 ↗(On Diff #53401)

How about replacing UINT32MAX with _SIZET_MAX defined above?

15675 ↗(On Diff #53401)

I will remove it.

lib/Headers/opencl.h
11–13 ↗(On Diff #53401)

This should be removed so they can be used with any target.

53–57 ↗(On Diff #53401)

This needs to be removed too.

65–69 ↗(On Diff #53401)

And this.

70–84 ↗(On Diff #53401)

Do we actually need these typdefs? I thought clang automatically set these for OpenCL.

yaxunl marked 8 inline comments as done.Apr 14 2016, 1:54 PM
yaxunl added inline comments.
lib/Headers/opencl.h
11–13 ↗(On Diff #53401)

Agree.

53–57 ↗(On Diff #53401)

clang does not define size_t for OpenCL but defines __SIZE_TYPE__, so I can use

#define size_t __SIZE_TYPE__

since __SIZE_TYPE__ is defined based on pointer width of the architecture, we will get a size_t definition consistent with target pointer width.

65–69 ↗(On Diff #53401)

Clang does not define ptrdiff_t for OpenCL but I can define it as __PTRDIFF_TYPE__ which is defined by Clang.

70–84 ↗(On Diff #53401)

Clang does not define intptr_t nor uintptr_t, but it defines __INTPTR_TYPE__ and __UINTPTR_TYPE__. I can use them.

yaxunl marked 4 inline comments as done.Apr 14 2016, 2:08 PM
yaxunl added inline comments.
lib/Headers/opencl.h
15636–15637 ↗(On Diff #53401)

Actually there is a predefined macro __SIZE_MAX__ by Clang which is just for this purpose.

yaxunl updated this revision to Diff 53788.Apr 14 2016, 2:27 PM
yaxunl retitled this revision from [OpenCL] Upstreaming khronos OpenCL 1.2/2.0 header files. to [OpenCL] Upstreaming khronos OpenCL header file..
yaxunl updated this object.

Fixed some builtin funcs issues with Alexey Bader's patch.

Revised by Xiuli's and Tom's comments.

yaxunl updated this revision to Diff 53791.Apr 14 2016, 2:49 PM

Revised based on Xiuli's comments in another review http://reviews.llvm.org/D19071#inline-160513 .

bader added inline comments.Apr 15 2016, 3:42 AM
lib/Headers/opencl.h
14892–14898 ↗(On Diff #53791)

OpenCL 2.0 allows sampler-less reading from an image with 'read_write' access qualifiers and write_image to images with 'read_write' access qualifiers.
Since 265783 clang differentiate image types with different access qualifiers, so now we have explicitly declare new built-in functions.
Could you add these declaration, please?

bader added inline comments.Apr 15 2016, 5:44 AM
lib/Headers/opencl.h
15372–15373 ↗(On Diff #53791)

Please, add image access qualifier to Built-in Image Query Functions declarations. It's required by OpenCL 2.0 specification.
OpenCL 1.x spec. defines built-in functions w/o access qualifiers (i.e. 'read_only' is applied).

bader edited edge metadata.Apr 15 2016, 5:52 AM

To clarify my last comment: I don't think we should define Image Query built-in functions only for 'read_only' access qualifier in OpenCL 1.x.
It's just bug in OpenCL 1.x specifications. I think the intention was to provide these built-in functions for both 'read_only' and 'write_only' images. OpenCL 2.0 also provides 'read_write' flavors.

yaxunl updated this revision to Diff 53897.Apr 15 2016, 8:53 AM
yaxunl edited edge metadata.
yaxunl marked an inline comment as done.

Revised as Alexey suggested.

Add access qualifiers to image query builtin functions.

bader added a comment.Apr 15 2016, 8:59 AM

Sam, could you also add declarations of samplerless read_image and write_image built-in functions with read_write qualified images, please?

bader added inline comments.Apr 18 2016, 10:34 AM
lib/Headers/opencl.h
14473–14474 ↗(On Diff #53897)

'const' attribute can't be applied to the read_image* functions, because image2d_t it's handle to the image that located in global memory:

"Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
Note that a function that has pointer arguments and examines the data pointed to must not be declared const. Likewise, a function that calls a non-const function usually must not be const"

https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

Please, apply 'pure' attribute.

yaxunl updated this revision to Diff 54087.Apr 18 2016, 11:29 AM

Revised as Alexey suggested.

Added read_write version of read/write image builtin functions.
Removed __const_func from read image builtin functions.

bader added inline comments.Apr 18 2016, 12:18 PM
lib/Headers/opencl.h
14476–14479 ↗(On Diff #54087)

According to OpenCL 2.0 (page 46 of https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf):

The sampler-less read image and write image built-ins can be used with image declared with the read_write (or read_write) qualifier. Calls to built-ins that read from an image using a sampler for images declared with the read_write (or read_write) qualifier will be a compilation error.

yaxunl updated this revision to Diff 54101.Apr 18 2016, 1:03 PM

Remove read_write version of read_image builtin functions with sampler argument.

bader accepted this revision.Apr 19 2016, 6:35 AM
bader edited edge metadata.

LGTM. Thanks!

This revision is now accepted and ready to land.Apr 19 2016, 6:35 AM
yaxunl updated this revision to Diff 54421.Apr 20 2016, 2:37 PM
yaxunl edited edge metadata.

Add missing convert_* functions between half and double types.

Make vload* functions with pointer argument in constant addr space shared between 1.2 and 2.0.

Add condition macro ifdef cl_khr_fp64 for double type functions.

Add __const_func to some convert functions.

bader added inline comments.Apr 22 2016, 1:42 AM
lib/Headers/opencl.h
4870 ↗(On Diff #54421)

Sam, could you confirm that this macro id implicitly defined for OpenCL versions 1.2+?

bader added a comment.Apr 22 2016, 1:48 AM

BTW, there is a comment on GitHub that opencl.h might be a bad name for that header, since Khronos group provides the header with the same name, but it defines host API. So if some developer is using clang to compile OpenCL application it might accidentally include opencl.h provided with Clang instead of opencl.h with OpenCL host API.

bader added inline comments.Apr 22 2016, 2:39 AM
lib/Headers/opencl.h
2 ↗(On Diff #54421)

This comment should be updated.

BTW, there is a comment on GitHub that opencl.h might be a bad name for that header, since Khronos group provides the header with the same name, but it defines host API. So if some developer is using clang to compile OpenCL application it might accidentally include opencl.h provided with Clang instead of opencl.h with OpenCL host API.

Right. We'd better come up with another name. How about opencl-c.h?

lib/Headers/opencl.h
4870 ↗(On Diff #54421)

AMDGPU target defines it. However, SPIR target does not. As discussed at cfe-dev, SPIR target should define all extensions/optional core features as supported. I will open a review for that.

BTW, there is a comment on GitHub that opencl.h might be a bad name for that header, since Khronos group provides the header with the same name, but it defines host API. So if some developer is using clang to compile OpenCL application it might accidentally include opencl.h provided with Clang instead of opencl.h with OpenCL host API.

Right. We'd better come up with another name. How about opencl-c.h?

Sounds good to me.

yaxunl marked 12 inline comments as done.Apr 25 2016, 12:03 PM

Changed file name to opencl-c.h and updated comments.

Addressed Anastasia's early comments.

I am thinking we may need several more patches to address remaining issues to keep the changes manageable.

lib/Headers/opencl-20.h
13851 ↗(On Diff #51317)

Fixed.

yaxunl added inline comments.Apr 25 2016, 12:03 PM
lib/Headers/opencl-12.h
585 ↗(On Diff #51317)

Fixed.

lib/Headers/opencl-20.h
4150 ↗(On Diff #51317)

as_type is defined as operators in v1.2 s6.2.4.2 instead of as builtin functions in 6.12.

9866 ↗(On Diff #51317)

Fixed.

11213 ↗(On Diff #51317)

s6.13.16 requires macro CLK_NULL_RESERVE_ID to be defined.

11222 ↗(On Diff #51317)

We need another patch to address this. For now I suggest keep the clang definition of ndrange_t and remove this from header.

11251 ↗(On Diff #51317)

need another patch for this work.

11263 ↗(On Diff #51317)

need another patch

11572 ↗(On Diff #51317)

We need a uniform representation for this macro in SPIR. As far as I know this works fine on different platforms. If a target needs special handling of this macro condition macro maybe added.

11605 ↗(On Diff #51317)

Fixed.

The newly added builtin functions of opencl 2.0 are using macros.

11886 ↗(On Diff #51317)

We need another patch to handle this. Since the argument type allows user defined type, it cannot be declared in header file.

yaxunl updated this revision to Diff 54883.Apr 25 2016, 12:06 PM
yaxunl marked 11 inline comments as done.

update the file.

yaxunl updated this revision to Diff 54910.Apr 25 2016, 2:30 PM

Remove some invalid atomic_fetch_ functions.

Anastasia added inline comments.Apr 29 2016, 12:53 PM
lib/Headers/opencl-c.h
14 ↗(On Diff #54910)

Is there any reason to define this extra macro?

Could we not just check OPENCL_C_VERSION >= CL_VERSION_2_0

137 ↗(On Diff #54910)

cl_khr_fp64 is not enabled!

144 ↗(On Diff #54910)

I think it should be (void*)0.

Btw, I think we should make it available in all OpenCL versions (at least as a part of C99 itself).

4872 ↗(On Diff #54910)

Interesting, macro has the same name as an extension?

6581 ↗(On Diff #54910)

I would like to avoid mapping to Clang builtin in macro because it breaks error reporting (i.e. the error is reported about the macro/Clang builtin and not a proper OpenCL function)

Ex:

error: too many arguments provided to function-like macro invocation
as_char(1,1);
                 ^
note: macro 'as_char' defined here

This also reduces generality in case some implementation need to do something different here.

Such mapping can be simply done in library implementation and should be easily inlined during optimizations.

13318 ↗(On Diff #54910)

Could we remove this functions from here please as they are not properly declared anyways?

I think they should be added as Clang builtins directly.

14056 ↗(On Diff #54910)

I am not sure this implementation of macro is generic enough!

15312 ↗(On Diff #54910)

Should we just check the CL version instead? The same for other occurrences!

yaxunl updated this revision to Diff 55879.May 2 2016, 1:09 PM
yaxunl marked 7 inline comments as done.

Revised as Anastasia suggested.

Also absorbed attribute((overloadable)) into macro __const_func, which saves 300MB space.

lib/Headers/opencl-c.h
4872 ↗(On Diff #54910)

The spec requires a macro to be defined with the same name as the supported extension, so it is natural to use it here.

14056 ↗(On Diff #54910)

This macro is for initializing global variables with atomic type. In SPIR it is represented as usual initializer the same way as a non-atomic type. OpenCL runtime initializes it the same way as a non-atomic type before kernel execution. I don't think there is need to define this macro another way.

yaxunl added a comment.May 2 2016, 1:14 PM

typo. saved 300KB space.

If we want to save some space, could we use some macro to expand the gentype or some script to expand the gentype into each types when the clang is build?

yaxunl added a comment.May 5 2016, 7:47 AM

If we want to save some space, could we use some macro to expand the gentype or some script to expand the gentype into each types when the clang is build?

We need to balance between space and readability. When I absorbed attribute((overloadable)) into __const_func to save space, it did not sacrifice readability. However using macro with gentype will cause the header file more difficult to read.

yaxunl added inline comments.May 6 2016, 2:43 PM
lib/Headers/opencl-c.h
14057 ↗(On Diff #55879)

If this representation is not generic enough. Any suggestion for an alternative? Thanks.

If we want to save some space, could we use some macro to expand the gentype or some script to expand the gentype into each types when the clang is build?

We need to balance between space and readability. When I absorbed attribute((overloadable)) into __const_func to save space, it did not sacrifice readability. However using macro with gentype will cause the header file more difficult to read.

But I don't think this kind of so long header is easy to read, it contains full pages of the same function with different types and it is hard to see if anyone is missing or find where these functions end. In spec builtin functions can be represented by

gentype ctz (gentype x)
gentype max (gentype x, gentype y)
gentype max (gentype x, sgentype y)

If we could use some macro or script to generate the actual builtin functions, it seems more likely spec and clear to read, also will avoid missing or typo.

yaxunl added a comment.May 9 2016, 7:08 AM

If we want to save some space, could we use some macro to expand the gentype or some script to expand the gentype into each types when the clang is build?

We need to balance between space and readability. When I absorbed attribute((overloadable)) into __const_func to save space, it did not sacrifice readability. However using macro with gentype will cause the header file more difficult to read.

But I don't think this kind of so long header is easy to read, it contains full pages of the same function with different types and it is hard to see if anyone is missing or find where these functions end. In spec builtin functions can be represented by

gentype ctz (gentype x)
gentype max (gentype x, gentype y)
gentype max (gentype x, sgentype y)

If we could use some macro or script to generate the actual builtin functions, it seems more likely spec and clear to read, also will avoid missing or typo.

For simple cases, we could define something like this:

#define MATH_FUN(F) \
float F(float x); \
double F(double x); \
half F(half x);

MATH_FUN(sin)
MATH_FUN(cos)

It could be concise without sacrificing much clarity. However, for more complicated patterns, like

uchar8 __const_func convert_uchar8_rtn(int8);

It seems hard to balance between readability and brevity. Mostly it will ends up with multiple layers of macros calling each other, and the original function declarations obscured. This also opens Pandora's box of subjectivity of readability.

I think using macro and not using macro both have advantages and disadvantages. That's why I did not take efforts to change one representation to another.

It will take considerable efforts to define all builtin functions as macros, whereas this review already dragged too long.

Anastasia, what's your suggestion? Thanks.

If we want to save some space, could we use some macro to expand the gentype or some script to expand the gentype into each types when the clang is build?

We need to balance between space and readability. When I absorbed attribute((overloadable)) into __const_func to save space, it did not sacrifice readability. However using macro with gentype will cause the header file more difficult to read.

But I don't think this kind of so long header is easy to read, it contains full pages of the same function with different types and it is hard to see if anyone is missing or find where these functions end. In spec builtin functions can be represented by

gentype ctz (gentype x)
gentype max (gentype x, gentype y)
gentype max (gentype x, sgentype y)

If we could use some macro or script to generate the actual builtin functions, it seems more likely spec and clear to read, also will avoid missing or typo.

For simple cases, we could define something like this:

#define MATH_FUN(F) \
float F(float x); \
double F(double x); \
half F(half x);

MATH_FUN(sin)
MATH_FUN(cos)

It could be concise without sacrificing much clarity. However, for more complicated patterns, like

uchar8 __const_func convert_uchar8_rtn(int8);

It seems hard to balance between readability and brevity. Mostly it will ends up with multiple layers of macros calling each other, and the original function declarations obscured. This also opens Pandora's box of subjectivity of readability.

I think using macro and not using macro both have advantages and disadvantages. That's why I did not take efforts to change one representation to another.

It will take considerable efforts to define all builtin functions as macros, whereas this review already dragged too long.

Anastasia, what's your suggestion? Thanks.

Let's don't exaggerate with macros for this. Using them has negative effect of diagnostics being reported about a macro and not the original function, which won't conform the Spec any longer.

See my earlier comment to line 6582:

error: too many arguments provided to function-like macro invocation
lib/Headers/opencl-c.h
14057 ↗(On Diff #55879)

I don't think Spec imposes any specific implementation of this macro.

I am thinking we might better leave it out to allow adding in a way suitable for other implementations.

yaxunl marked 3 inline comments as done.May 13 2016, 7:45 AM
yaxunl added inline comments.
lib/Headers/opencl-c.h
14057 ↗(On Diff #55879)

How about this?

#ifndef ATOMIC_VAR_INIT
#define ATOMIC_VAR_INIT(x) (x)
#endif

This way we have a default declaration and also allows user to override it.

Another way is to remove it from header and define it in Clang on target by target basis.

Anastasia added inline comments.May 13 2016, 10:59 AM
lib/Headers/opencl-c.h
7452 ↗(On Diff #55879)

Could you put OpenCL v1.1 section too?

7767 ↗(On Diff #55879)

What does +ve mean?

7795 ↗(On Diff #55879)

Could y be merged with the previous line?

7952 ↗(On Diff #55879)

Can't read this comment.

Perhaps: "Compute base e exponent"?

8306 ↗(On Diff #55879)

Does this mean that all non-generic versions are not available in CL2.0 forcing the dynamic AS conversion for all BIFs with a pointer type?

Wondering if adding those unconditionally would be better thing to do...

8457 ↗(On Diff #55879)

Could you add space please: x^2 + y^2

9110 ↗(On Diff #55879)

Overloadable here and in other places too?

12772 ↗(On Diff #55879)

Should it be vload_half instead?

14484 ↗(On Diff #55879)

btw, perhaps it's a good idea to have a macro for attribute((overloadable)). This should be relatively simple to replace everywhere.

14766 ↗(On Diff #55879)

Let's not have macros for function declarations. As commented earlier they can break proper diagnostics reporting (ie. diagnostics will display macro instead of actual function).

15674 ↗(On Diff #55879)

I feel that some bits of this description are being repeated multiple times. Could we have a common description at the beginning instead that would cover all the different cases.

16220 ↗(On Diff #55879)

The same here. Could we unify the description for all write_image functions somehow?

16960 ↗(On Diff #55879)

Let's remove macros from here too.

17051 ↗(On Diff #55879)

Are those arbitrary taken values I am guessing?

17099 ↗(On Diff #55879)

Could we remove the enqueue_kernel as it's being added as Clang builtin? It's prototype is incorrect here anyways.

Btw, I have started the review for it: http://reviews.llvm.org/D20249

17252 ↗(On Diff #55879)

I think this should be better grouped with other image functions above.

Anastasia added inline comments.May 13 2016, 11:00 AM
lib/Headers/opencl-c.h
14057 ↗(On Diff #55879)

Not sure. I guess this way would work too.

lib/Headers/opencl-c.h
9110 ↗(On Diff #55879)

It seems macro __const_func has overloadable attribute.

#define const_func attribute__((const, overloadable))

13534 ↗(On Diff #55879)

I have seen a lot of comments with such few words in one line. It seems very strange here.

14484 ↗(On Diff #55879)

Agree, it seems a lot of functions need overloable attribute.

14766 ↗(On Diff #55879)

So maybe we could use see script to generate the header when build? It will save space and avoid macro break readability and diagnostics reporting. The clang builtin functions are defined in some .def files, or maybe we could try something like that?
This can be done after we have a stable version of what is need for header, we may just decide what should be in this header and don't focus on the performance.

yaxunl added inline comments.May 16 2016, 9:28 AM
lib/Headers/opencl-c.h
17051 ↗(On Diff #55879)

How about this?

// The value are defined according to their order in the spec.
#define CLK_SUCCESS                                 0
#define CLK_ENQUEUE_FAILURE                         -1
#define CLK_INVALID_QUEUE                           -2
#define CLK_INVALID_NDRANGE                         -3
#define CLK_INVALID_EVENT_WAIT_LIST                 -4
#define CLK_DEVICE_QUEUE_FULL                       -5
#define CLK_INVALID_ARG_SIZE                        -6
#define CLK_EVENT_ALLOCATION_FAILURE                -8
#define CLK_OUT_OF_RESOURCES                        -9
jprice added inline comments.May 16 2016, 9:43 AM
lib/Headers/opencl-c.h
17051 ↗(On Diff #55879)

These values were taken from the table in section 2.10.6.1 in the SPIR 2.0 provisional spec. Some of the choices seem arbitrary, while others were chosen to match the value of the equivalent runtime API error.

I imagine changing these values now might cause incompatibilities with existing SPIR consumers?

yaxunl added inline comments.May 16 2016, 10:14 AM
lib/Headers/opencl-c.h
17051 ↗(On Diff #55879)

Then I guess we'd better keep it as is.

yaxunl updated this revision to Diff 57382.May 16 2016, 12:21 PM

Revised according to Anastasia and Xiuli's comments.

yaxunl updated this revision to Diff 57383.May 16 2016, 12:29 PM

Revert enqueue_kernel error code value changed by accident.

Looking good already! I just have one general concern about committing 16K lines of code with no tests. But perhaps testing all the declarations isn't an option actually.

Could we at least have a test that includes the header and makes sure it's compiled successfully. Any thought's on this?

Also I can see that other headers are added to lib/Headers/CMakeLists.txt.

yaxunl updated this revision to Diff 57488.May 17 2016, 9:22 AM

Add opencl-c.h to lib/Headers/CMakeLists.txt. Add a simple test for the header file.

pxli168 accepted this revision.May 19 2016, 9:48 PM
pxli168 edited edge metadata.

LGTM!
I think we may add optimization on this base later.

Sam, there are a few small comments, otherwise it seems to be in a good shape.

@rsmith, do you think you could take a look here? The OpenCL side is fine, but I was just wondering if you see any issue with us adding a header of ~17K lines. It is all part of OpenCL standard libraries though and would be very helpful to have it upstream for all of us. Any suggestion on testing would be useful too as it's a bit simplified for now due to its size.

lib/Headers/opencl-c.h
19 ↗(On Diff #57488)

Could we change to ovld and cnfn with double underscores please?

143 ↗(On Diff #57488)

indentation seems wrong!

175 ↗(On Diff #57488)

Is this a part of spec? I don't see it in s6.13.2.1.

7953 ↗(On Diff #57488)

Ping!

8307 ↗(On Diff #57488)

Any problem with doing this unconditionally?

8458 ↗(On Diff #57488)

Ping!

12646 ↗(On Diff #57488)

This description is missing explaining what gentype is! The same happens in other places too.

12773 ↗(On Diff #57488)

Ping!

14004 ↗(On Diff #57488)

Formatting is wrong!

yaxunl updated this revision to Diff 58340.May 24 2016, 3:39 PM
yaxunl edited edge metadata.
yaxunl marked 35 inline comments as done.

Revised by Anastasia's comments.

yaxunl added inline comments.May 24 2016, 4:26 PM
lib/Headers/opencl-c.h
143 ↗(On Diff #57488)

fixed.

175 ↗(On Diff #57488)

see v1.1 s6.11.2.

7953 ↗(On Diff #57488)

I think "base e exponential" is the correct name. It could be renamed as "base e exponential function" to be more clear.

"exponent" is the 'superscript' part of an exponential. e.g. in e^n, n is the exponent.

8307 ↗(On Diff #57488)

This requires vendors to implement functions not required by OpenCL 2.0 spec. For example, in our 2.0 builtin library we don't have such functions.

8458 ↗(On Diff #57488)

Fixed.

12773 ↗(On Diff #57488)

no. they are different. see OpenCL extensions v1.1 s9.6.6, v1.2 s9.5.6, v2.0 s9.4.6

Anastasia accepted this revision.May 25 2016, 8:58 AM
Anastasia edited edge metadata.

LGTM! Thanks!

Since it has been in review for quite a while, should we try to commit it ASAP? I think Richard can give us his feedback later as well.

This revision was automatically updated to reflect the committed changes.