Page MenuHomePhabricator

[OpenCL] Make generic addrspace optional for -fdeclare-opencl-builtins
ClosedPublic

Authored by svenvh on Aug 9 2021, 8:55 AM.

Details

Summary

Currently, -fdeclare-opencl-builtins always adds the generic address
space overloads of e.g. the vload builtin functions in OpenCL 3.0
mode, even when the generic address space feature is disabled.

Guard the generic address space overloads by the
__opencl_c_generic_address_space feature instead of by OpenCL
version.

Guard the private, global, and local overloads using the internal
__opencl_c_named_address_space_builtins feature.

Diff Detail

Event Timeline

svenvh created this revision.Aug 9 2021, 8:55 AM
svenvh requested review of this revision.Aug 9 2021, 8:55 AM
svenvh updated this revision to Diff 368136.Aug 23 2021, 10:03 AM
svenvh edited the summary of this revision. (Show Details)

I have done an alternative spin of this patch: instead of introducing RequireDisabledExtension, simply always make the private, global, and local overloads available. This makes tablegen diverge from opencl-c.h though. Perhaps we should also always add make the private, global, and local overloads available in opencl-c.h?

Anastasia added a comment.EditedAug 26 2021, 10:05 AM

I have done an alternative spin of this patch: instead of introducing RequireDisabledExtension, simply always make the private, global, and local overloads available. This makes tablegen diverge from opencl-c.h though. Perhaps we should also always add make the private, global, and local overloads available in opencl-c.h?

Yes, we could do this indeed as a clang extension. I don't think this will impact the user code. However, this means vendors will have to add definitions for extra overloads in OpenCL 2.0 mode?

I have done an alternative spin of this patch: instead of introducing RequireDisabledExtension, simply always make the private, global, and local overloads available. This makes tablegen diverge from opencl-c.h though. Perhaps we should also always add make the private, global, and local overloads available in opencl-c.h?

Yes, we could do this indeed as a clang extension. I don't think this will impact the user code. However, this means vendors will have to add definitions for extra overloads in OpenCL 2.0 mode?

I wonder if making the private, global, and local overloads always available would even be considered an extension. Unless I overlooked something, I cannot find anything in the spec saying that the private, global, and local overloads should not be available when a generic overload is available (even though this is what Clang has always done)?

Making all overloads always available in e.g. CL2.0 mode will indeed affect the generated calls, which means the definitions should be available when consuming the generated IR.

I have done an alternative spin of this patch: instead of introducing RequireDisabledExtension, simply always make the private, global, and local overloads available. This makes tablegen diverge from opencl-c.h though. Perhaps we should also always add make the private, global, and local overloads available in opencl-c.h?

Yes, we could do this indeed as a clang extension. I don't think this will impact the user code. However, this means vendors will have to add definitions for extra overloads in OpenCL 2.0 mode?

I wonder if making the private, global, and local overloads always available would even be considered an extension. Unless I overlooked something, I cannot find anything in the spec saying that the private, global, and local overloads should not be available when a generic overload is available (even though this is what Clang has always done)?

Making all overloads always available in e.g. CL2.0 mode will indeed affect the generated calls, which means the definitions should be available when consuming the generated IR.

The way I understand the spec for OpenCL C 2.0 is that whenever the address space of the pointer is not listed it means generic has to be used, here is one example:
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#vector-data-load-and-store-functions

gentypen vloadn(size_t offset, const gentype *p)
gentypen vloadn(size_t offset, const __constant gentype *p)

that has no address space (i.e. generic) and constant explicitly. So I think if address spaces are not listed explicitly they are not supposed to be available.

One implication of adding all address space overloads is that it makes library size larger, but my feeling is that we don't have that many functions with pointers to significantly impace the library size?

The way I understand the spec for OpenCL C 2.0 is that whenever the address space of the pointer is not listed it means generic has to be used, here is one example:
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#vector-data-load-and-store-functions

gentypen vloadn(size_t offset, const gentype *p)
gentypen vloadn(size_t offset, const __constant gentype *p)

that has no address space (i.e. generic) and constant explicitly. So I think if address spaces are not listed explicitly they are not supposed to be available.

The unified specification (which "specifies all versions of OpenCL C") seems to be making all overloads available as I understand; it is perhaps subtly different from the previous specification?

https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#vector-data-load-and-store-functions

gentypen vloadn(size_t offset, const __global gentype *p)
gentypen vloadn(size_t offset, const __local gentype *p)
gentypen vloadn(size_t offset, const __constant gentype *p)
gentypen vloadn(size_t offset, const __private gentype *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

gentypen vloadn(size_t offset, const gentype *p)

Since the __constant overload should always be available, I think a reader can assume that the overloads directly above and below __constant are also always available? So that the generic overload is an optional addition to the list of overloads. If not, I'd expect the spec to specify a condition before listing the specific overloads.

One implication of adding all address space overloads is that it makes library size larger, but my feeling is that we don't have that many functions with pointers to significantly impace the library size?

This patch should be touching all of them. Not that many indeed, but it might still have a non-negligible impact on OpenCL libraries due to the combination of #vector-sizes * #types * #addrspaces.

The way I understand the spec for OpenCL C 2.0 is that whenever the address space of the pointer is not listed it means generic has to be used, here is one example:
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_C.html#vector-data-load-and-store-functions

gentypen vloadn(size_t offset, const gentype *p)
gentypen vloadn(size_t offset, const __constant gentype *p)

that has no address space (i.e. generic) and constant explicitly. So I think if address spaces are not listed explicitly they are not supposed to be available.

The unified specification (which "specifies all versions of OpenCL C") seems to be making all overloads available as I understand; it is perhaps subtly different from the previous specification?

https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#vector-data-load-and-store-functions

gentypen vloadn(size_t offset, const __global gentype *p)
gentypen vloadn(size_t offset, const __local gentype *p)
gentypen vloadn(size_t offset, const __constant gentype *p)
gentypen vloadn(size_t offset, const __private gentype *p)

For OpenCL C 2.0, or OpenCL C 3.0 or newer with the __opencl_c_generic_address_space feature:

gentypen vloadn(size_t offset, const gentype *p)

Since the __constant overload should always be available, I think a reader can assume that the overloads directly above and below __constant are also always available? So that the generic overload is an optional addition to the list of overloads. If not, I'd expect the spec to specify a condition before listing the specific overloads.

I agree that OpenCL 3.0 spec might have broken the backward compatibility for OpenCL 2.0, however, I am not convinced it has been done intentionally. Perhaps, it's worth to clarify that?

Looping in @yaxunl for further feedback regarding OpenCL 2.0 behavior specifically.

svenvh updated this revision to Diff 403705.Jan 27 2022, 9:48 AM
svenvh edited the summary of this revision. (Show Details)

Make use of the __opencl_c_named_address_space_builtins internal feature added by D118158. This should avoid affecting OpenCL 2.0.

Anastasia accepted this revision.Jan 28 2022, 8:43 AM

LGTM. Thanks for changing this. I have no issues with this patch as it doesn't change the behavior for OpenCL 2.0 but I did create an issue for OpenCL-Docs to fix this problem somehow: https://github.com/KhronosGroup/OpenCL-Docs/issues/753. So in case, there is any feedback we can refine this behavior later on.

clang/test/SemaOpenCL/fdeclare-opencl-builtins.cl
73

btw this is not correct for C++ for OpenCL 2021 but we are not testing this with C++ for OpenCL 2021 which we should.

However it doesn't belong to this patch, but would you be able to add a FIXME here to indicate the issue?

This revision is now accepted and ready to land.Jan 28 2022, 8:43 AM
svenvh added inline comments.Jan 31 2022, 2:11 AM
clang/test/SemaOpenCL/fdeclare-opencl-builtins.cl
73

I'll add C++ for OpenCL 2021 (and CL3.0) testing in a separate commit.

As discussed offline, we are likely to drop headerless testing from here soonish, since we're relying more and more on opencl-c-base.h for -fdeclare-opencl-builtins. We essentially have to redefine all the feature macros in the test too, when they're already in the header that we're deliberately excluding.

This revision was landed with ongoing or failed builds.Jan 31 2022, 2:21 AM
This revision was automatically updated to reflect the committed changes.