This is an archive of the discontinued LLVM Phabricator instance.

OpenCL: Fix some missing predefined macros
Needs ReviewPublic

Authored by arsenm on Apr 10 2020, 5:04 PM.

Details

Summary

Some of the device specific standard predefines were missing.

IMAGE_SUPPORT was only hardcoded for SPIR. OPENCL_VERSION
wasn't defined at all, so allow targets to pick values for these.

Diff Detail

Event Timeline

arsenm created this revision.Apr 10 2020, 5:04 PM
jvesely added a comment.EditedApr 11 2020, 9:24 PM

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

The offline compiler needs to be perfectly usable, with no driver/runtime dependencies. Any information needed from the driver should be captured by the target triple/OS type. We would invent a new OS type if there was such a need

https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/preprocessorDirectives.html describes __OPENCL_VERSION__ as "an integer reflecting the version number of the OpenCL supported by the OpenCL device." as contrasted with __OPENCL_C_VERSION__ which is described as "an integer reflecting the OpenCL C version specified by the -cl-std build option to clBuildProgram or clCompileProgram. If the -cl-std build option is not specified, the OpenCL C version supported by the compiler for this OpenCL device will be used."

But I don't see where the correct use of these is defined? How should the user decide which to use? It does seem like __OPENCL_VERSION__ and __OPENCL_C_VERSION__ can differ, e.g. if you compile for a device supporting 2.0 with -cl-std=1.2, but why would __OPENCL_VERSION__ ever be referenced then?

https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/preprocessorDirectives.html describes __OPENCL_VERSION__ as "an integer reflecting the version number of the OpenCL supported by the OpenCL device." as contrasted with __OPENCL_C_VERSION__ which is described as "an integer reflecting the OpenCL C version specified by the -cl-std build option to clBuildProgram or clCompileProgram. If the -cl-std build option is not specified, the OpenCL C version supported by the compiler for this OpenCL device will be used."

But I don't see where the correct use of these is defined? How should the user decide which to use? It does seem like __OPENCL_VERSION__ and __OPENCL_C_VERSION__ can differ, e.g. if you compile for a device supporting 2.0 with -cl-std=1.2, but why would __OPENCL_VERSION__ ever be referenced then?

I don't know why you would ever use this; but the standard says it should be defined, so I guess we have to define it

https://www.khronos.org/registry/OpenCL/sdk/2.0/docs/man/xhtml/preprocessorDirectives.html describes __OPENCL_VERSION__ as "an integer reflecting the version number of the OpenCL supported by the OpenCL device." as contrasted with __OPENCL_C_VERSION__ which is described as "an integer reflecting the OpenCL C version specified by the -cl-std build option to clBuildProgram or clCompileProgram. If the -cl-std build option is not specified, the OpenCL C version supported by the compiler for this OpenCL device will be used."

But I don't see where the correct use of these is defined? How should the user decide which to use? It does seem like __OPENCL_VERSION__ and __OPENCL_C_VERSION__ can differ, e.g. if you compile for a device supporting 2.0 with -cl-std=1.2, but why would __OPENCL_VERSION__ ever be referenced then?

I don't know why you would ever use this; but the standard says it should be defined, so I guess we have to define it

Right, I just want to understand the actual semantics and it doesn't seem clear. I agree that I don't see a use for __OPENCL_VERSION__ but we define it so I'm sure people are using it, so likely we need to be careful that this patch preserves the existing behavior of the runtime.

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.

It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.

b-sumner added a comment.EditedApr 13 2020, 2:27 PM

In my opinion, for on-line compile for OpenCL, the platform is responsible for setting __OPENCL_VERSION__. Also, it should be the platform's choice as to how to respond the image support query and how __IMAGE_SUPPORT__ is set. For offline compile, it doesn't seem unreasonable to ask the developer to set these.

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.

It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.

The __OPENCL_VERSION__ macro in the kernel needs to match the version returned by clGetDeviceInfo (both refer to OpenCL version supported by the device),
which in turn must be <= version returned by clGetPlatformInfo.
This part makes it unsuitable for hardcoding in the compiler.

__OPENCL_VERSION__ should indicate device capabilities rather than language features. specs talk about available resources.
For example, CL_DEVICE_IMAGE2D_MAX_HEIGHT is required to be >= 8192 for OpenCL 1.2 while it's required to be >=16384 for OpenCL 2.1.

if there's to be a default value hardcoded in the compiler it'd be nice if platform drivers had an easy way to override (lower) it.

As for OpenCL C spec we should ideally add __OPENCL_VERSION__ but I feel it should be taken from architecture or OS type... I think there were other cases where we needed something like an OpenCL runtime version in the triple too but I can't remember this now.

I can't think of any use case where this macro would be useful in the kernel code though.

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.

It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.

The __OPENCL_VERSION__ macro in the kernel needs to match the version returned by clGetDeviceInfo (both refer to OpenCL version supported by the device),
which in turn must be <= version returned by clGetPlatformInfo.
This part makes it unsuitable for hardcoding in the compiler.

__OPENCL_VERSION__ should indicate device capabilities rather than language features. specs talk about available resources.
For example, CL_DEVICE_IMAGE2D_MAX_HEIGHT is required to be >= 8192 for OpenCL 1.2 while it's required to be >=16384 for OpenCL 2.1.

But why would the driver not report the device values which we do know in the compiler? A discrepancy between these sounds like a driver bug.

if there's to be a default value hardcoded in the compiler it'd be nice if platform drivers had an easy way to override (lower) it.

I found a hacky way to do this. You can -include a header, which happens after the predefines, with undef and redef of the predefined value. This does trigger -Wreserved-id-macro (which I only found through -Weverything), but you should be able to pragma push/pop the warning off inside the special header.

For now I could assume the platform is 2.0 based on -amdhsa, and not for -mesa3d to work around clover not currently supporting CL2.0.

jvesely added a comment.EditedApr 14 2020, 1:09 PM

OPENCL_VERSION is something that should be really set by the opencl driver rather than the compiler.
coarse-grained SVM can be done without FEATURE_FLAT_ADDRESS_SPACE, so those gpus can get over 1.2, while the compiler can be used in an implementation that doesn't go above 1.2 even for gfx700+

I don't know why this would imply we can't just have a static __OPENCL_VERSION__ for each device in Clang? I don't see anything in the standard that says you are not allowed to have (going with your example) __OPENCL_VERSION__=200 and __OPENCL_C_VERSION=120 at the same time.

It seems like we should be able to just have __OPENCL_VERSION__ defined in Clang, and let the runtime control setting __OPENCL_C_VERSION__ via -cl-std=. If the current patch sets __OPENCL_VERSION__ differently than the runtime does then it seems like we should make it match, but I'm not sure how best to confirm that.

The __OPENCL_VERSION__ macro in the kernel needs to match the version returned by clGetDeviceInfo (both refer to OpenCL version supported by the device),
which in turn must be <= version returned by clGetPlatformInfo.
This part makes it unsuitable for hardcoding in the compiler.

__OPENCL_VERSION__ should indicate device capabilities rather than language features. specs talk about available resources.
For example, CL_DEVICE_IMAGE2D_MAX_HEIGHT is required to be >= 8192 for OpenCL 1.2 while it's required to be >=16384 for OpenCL 2.1.

But why would the driver not report the device values which we do know in the compiler? A discrepancy between these sounds like a driver bug.

because it needs to be capped by the platform opencl version. devices supporting opencl x.y need to support opencl <x.y as well, so capping it at lower version is ok.

if there's to be a default value hardcoded in the compiler it'd be nice if platform drivers had an easy way to override (lower) it.

I found a hacky way to do this. You can -include a header, which happens after the predefines, with undef and redef of the predefined value. This does trigger -Wreserved-id-macro (which I only found through -Weverything), but you should be able to pragma push/pop the warning off inside the special header.

does passing -U__OPENCL_VERSION__ -D __OPENCL_VERSION__=x.y on the commadline work as well? the header trick was my fallback path.

For now I could assume the platform is 2.0 based on -amdhsa, and not for -mesa3d to work around clover not currently supporting CL2.0.

does passing -U__OPENCL_VERSION__ -D __OPENCL_VERSION__=x.y on the commadline work as well? the header trick was my fallback path.

Yes the command line option -U or -D override predefined macros.

arsenm updated this revision to Diff 257681.Apr 15 2020, 5:35 AM

Check triple for support. Report 2.0 for -amdhsa and -amdpal with flat support, but 1.2 for clover/-mesa3d. Also require targets to explicitly set a value to define, rather than defaulting.

arsenm updated this revision to Diff 257684.Apr 15 2020, 5:36 AM

Attach correct diff

LGTM. The objective of the change is to simplify offline compilation. We want to avoid asking users to add -D if a proper macro can be inferred from triple and cpu. Ideally, users only need to specify triple and cpu with clang driver. In case users want to override the default macros for the triple, they can always override them by -D or -U in command line.

The question is whether the default value of the macro is proper. To me this change maintained the old behavior for spir and other targets and the default value seems fine for amdgcn.

LGTM. The objective of the change is to simplify offline compilation. We want to avoid asking users to add -D if a proper macro can be inferred from triple and cpu. Ideally, users only need to specify triple and cpu with clang driver. In case users want to override the default macros for the triple, they can always override them by -D or -U in command line.

will this trigger extra warnings for the user?
as long as there's a way to set custom values I'm fine.
I just don't understand why this can' be appended to the command line by the offline compiler distributed with the platform (and based on release if needed).
The platform will need to get the same number + any additional information it wants to report in clGetDeviceInfo(CL_DEVICE_VERSION) anyway so having it in one place would be prudent.

The question is whether the default value of the macro is proper. To me this change maintained the old behavior for spir and other targets and the default value seems fine for amdgcn.

will this trigger extra warnings for the user?

No.

The current solution seems very specific and doesn't provide generic uniform mechanisms for the targets. Would something based on a triple component not work?

Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2020, 3:48 AM

FYI there has been a related spec change that makes this functionality optional in the offline compilation: https://github.com/KhronosGroup/OpenCL-Docs/pull/522/files