Page MenuHomePhabricator

[Docs][OpenCL] Added OpenCL feature description to user manual.
AcceptedPublic

Authored by Anastasia on Dec 23 2016, 4:57 AM.

Details

Summary

Adding information on OpenCL to Clang documentation. This currently contains the main features only. I am hoping we can improve it with time documenting more features.

Can be build by:

cmake  -DLLVM_ENABLE_SPHINX=ON -DLLVM_BUILD_DOCS=ON ...
make sphinx

Diff Detail

Event Timeline

Anastasia updated this revision to Diff 82410.Dec 23 2016, 4:57 AM
Anastasia retitled this revision from to [Docs][OpenCL] Added OpenCL feature description to user manual..
Anastasia updated this object.
Anastasia updated this object.
Anastasia updated this object.
Anastasia added subscribers: cfe-commits, rsmith.
docs/UsersManual.rst
2083

Should this be opencl_addrsp (two d's)?

2242

Shouldn't that be __attribute__?

2259

Compared to the unrolled code below, this should probably say i < 4 rather than i < 3?

2278

Obviously, change this as well to opencl_addrsp...

svenvh added a subscriber: svenvh.Jan 6 2017, 8:47 AM
Anastasia updated this revision to Diff 83824.Jan 10 2017, 10:29 AM

Updated with comments from Mats!

Anastasia marked 4 inline comments as done.Jan 10 2017, 10:30 AM

Ping! Any more comments here? It would be nice to commit this before the 4.0 rel branch is taken (on Jan 12).

yaxunl added inline comments.Jan 10 2017, 12:15 PM
docs/UsersManual.rst
2003

$ clang -target amdgcn-amd-amdhsa-opencl test.cl

2053

Maybe rephrase as "Disables support of OpenCL extensions" instead of "Disable standard target extensions" since "standard target extension" is confusing and also to avoid confusion with the effect of disabling OpenCL extension as in '#pragam OPENCL EXTENSION X: disable'. Disabling support of an extension means that they cannot be enabled by pragmas in the OpenCL kernels since they are deemed as non-supported.

2107

$ clang -target amdgcn-amd-amdhsa-opencl test.cl

2154

To enable modules for OpenCL, add option -fmodules -fimplicit-module-maps -fmodules-cache-path=X where X is the directory for storing the generated modules.

pekka.jaaskelainen requested changes to this revision.Jan 10 2017, 10:34 PM
pekka.jaaskelainen edited edge metadata.

Great work.

docs/UsersManual.rst
44

Should this be OpenCL C Language?

2013

I'd like to add a sentence or two for instructing how to use it for non-SPMD targets and of course advertise pocl here too ;)

Something along the lines of:

"For "non-SPMD" targets which cannot spawn multiple work-items on the fly using hardware, which covers practically all non-GPU devices such as CPUs and DSPs, additional processing is needed for the kernels to support multiple WI execution. For this, a 3rd party project such as pocl <http://portablecl.org/>_ can be used."

2078

I'd convert to something like:

"This allows adding explicit address space ids to the bitcode for architectures without separate IDs for each of the OpenCL logical address spaces."

2114

This is an alternative location for my pocl advertisement above with a subtitle:

  • For CPU/DSP architectures such as x86, ARM or TCE:

...

2120

Which version of SPIR is generated?

2130

This is a bit confusing paragraph, probably due to my confusing explanations of the problems with pocl. Currently pocl tries not to use FASM for preserving logical AS IDs to LLVM IR due to the bunch of problems it constantly produces with seemingly little benefits for common CPUs. My patch related to this considered only the argument info switch. Now pocl only derives the logical iDS from kernel arguments (and all other IDs within the body of the IR function are lost for flat machines). In my patch, the argument info's address space IDs were made constant and identical to SPIR's as previously they were the target's (which meant losing the AS IDs altogether for flat AS machines).

You seem to document the arg-info md switch later, so this paragraph might be removed or converted to my pocl blurb which mentions the need for further processing for CPUs.

2173

Something like: "The IDs used to encode the OpenCL's logical address spaces in the argument info metadata ..."

2206

add space before /

2310

+conversion

www/index.html
19

OpenCL C. Maybe OpenCL C++ also?

This revision now requires changes to proceed.Jan 10 2017, 10:34 PM
bader edited edge metadata.Jan 11 2017, 1:09 AM

Thanks for working on this!
Looks good, except a few pedantic notes.

docs/UsersManual.rst
44

It should and to be perfectly clear we might consider using term "OpenCL C kernel language" to avoid confusion with OpenCL host code language or OpenCL C++ kernel language.

2009

Just for the sake of consistency it would be nice to switch input file name with the options:

clang -c -emit-llvm test.cl
2014

I suggest specifying that Clang support OpenCL C kernel language standards up to v2.0.
For instance OpenCL v2.1 doesn't introduce new OpenCL C kernel language standard - it still uses v2.0. So existing clang is sufficient enough to serve as front-end compiler in OpenCL 2.1 driver.

2120

For -cl-std=CL1.x (where x is 0, 1 or 2), SPIR version is 1.2.
For -cl-std=CL2.0, SPIR version is 2.0.

2142–2143

Not sure it's worth to mention, but even built-in vector types are defined in the opencl-c.h.

Anastasia updated this revision to Diff 83962.Jan 11 2017, 5:02 AM
Anastasia edited edge metadata.

Addressed comments from Alexey, Pekka, and Sam!

Anastasia marked 16 inline comments as done.Jan 11 2017, 5:22 AM
Anastasia added inline comments.
docs/UsersManual.rst
2013

Cool! I have added this later in the description of x86 target use for OpenCL. :)

2053

Good point!

2114

Added later in the x86 description.

2120

Is it worth mentioning this in this doc?

2130

Yes, it's perhaps a bit confusing indeed. I find the usage of the x86 backend for OpenCL is generally a bit confusing. Also there are a lot of program paths even in Clang that don't play well for OpenCL purposes or are not handled properly and are therefore a source of bugs.

I was just wondering whether it would be possible to switch to using SPIR as a generic target at some point in the future and "deprecate" the x86 target for OpenCL . Do you think this might work for POCL? At the same time we have upcoming SPIR-V support as a new feature that might reshape things.

Regarding the address space I just know that it's common to use fake address space map with the x86 backend for some out of tree implementations to add missing memory segments to this target. That's why I added this text here.

2142–2143

Good point!

www/index.html
19

We don't have OpenCL C++ officially yet. :(

bader accepted this revision.Jan 11 2017, 6:15 AM
bader edited edge metadata.

LGTM.
Thanks!

yaxunl accepted this revision.Jan 11 2017, 7:51 AM
yaxunl edited edge metadata.

Great work. LGTM. Thanks!

Anastasia marked 5 inline comments as done.Jan 11 2017, 10:57 AM
docs/UsersManual.rst
2120

I think so as SPIR V is now there also.

2130

We have considered using SPIR internally, but it is likely to bring the same problems as with the FASM as we need to convert to the target bitcode to ensure all target specific IR level optimizations (especially vectorization) is applied properly. Why do you think using the real target when generating from OpenCL C kernels is problematic?

2133

ARM (and others) are treated similarly in pocl: OpenCL C is just yet another frontend language in this picture, like C/C++. I would not "deprecate" that path as of yet as forcing to another intermediate representation (which SPIR basically is, albeit derived from LLVM IR) always loses some information on the way.

Anastasia updated this revision to Diff 84136.Jan 12 2017, 9:57 AM
Anastasia edited edge metadata.

Update based on comments from Pekka.

Anastasia added inline comments.Jan 12 2017, 9:59 AM
docs/UsersManual.rst
2130

Perhaps, it would be nice to understand better how you use x86 backend in your toolchain. What I have seen in some projects that some invalid IR was produced or Clang hit an asset in the place which has to be done differently in some CodeGen paths just for OpenCL. So the typical fix for this to specialize on the OpenCL is not that easy to get into upstream because the target hasn't been created for what it's being used for and people like to see more modular code in general anyways.

The target is ideally not just a processor architecture but a combination of hardware and runtime software layers (i.e. OpenCL runtime in our case). Btw what triple do you use exactly while compiling OpenCL for x86?

2133

I have removed the deprecation bit from the documentation now. Do you want me to change anything else here?

I think the idea of SPIR target was not to force the developers to yet another IR but rather to create a common generic target that is specifically tailored to OpenCL needs (not just as a language but RT as well). Also the intention of SPIR was to add missing semantics of OpenCL to LLVM IR, so it's supposed not to lose the information ideally even though I am not completely sure the goal was achieved. But since SPIR-V is gaining more popularity I am not sure how much SPIR will continue to be relevant in the community.

One more problem with x86 target is that there are currently not many contributors in the community that are willing to support it for OpenCL. That's why I am a bit reluctant to put it in the document as fully supported target. Do you have any relevant fixes in your internal branches btw?

Since everyone agreed on the core part of this documentation I have committed the current revision in 291780 to make it easier for the release.

Feel free to give me more feedback (if any) as small changes can still be merged in.

@yaxunl Sam, I think it would be nice to describe your recent change for adding OpenCL extensions. Would you like to write up some text? Otherwise I can draft something and you can review. :)

@yaxunl Sam, I think it would be nice to describe your recent change for adding OpenCL extensions. Would you like to write up some text? Otherwise I can draft something and you can review. :)

Anastasia, I am busy with some other work. If you can add it I will be happy to review it. Thanks!

@yaxunl Sam, I think it would be nice to describe your recent change for adding OpenCL extensions. Would you like to write up some text? Otherwise I can draft something and you can review. :)

Anastasia, I am busy with some other work. If you can add it I will be happy to review it. Thanks!

Sure. No worries!

pekka.jaaskelainen edited edge metadata.
pekka.jaaskelainen added inline comments.
docs/UsersManual.rst
2130

We use just the regular target triplet without any opencl specific parts.This is because of at least two reasons a) we link in builtins as target-specific bitcodes b) we want to ensure all ordinary LLVM IR optimizations (some of which are target specific) are applied to the fully linked kernel bitcode to get all the performance improvements. Apart from the address spaces, I'm not sure which OpenCL-specific code paths you might mean in the code gen. I'd turn it the other way: if there's no need such special casing, it's better to do like with other frontends and have the usual hardware target instead of SPIR (which then needs to be converted to the target IR just for optimizations and bitcode linking).

2133

No additional fixes, the upstream Clang/LLVM works just fine. The main source of grief was the fake address space map to transfer the logical address space data, which I decided to get off of for now.

This revision is now accepted and ready to land.Jan 13 2017, 3:31 AM
pekka.jaaskelainen requested changes to this revision.Jan 17 2017, 12:00 AM
pekka.jaaskelainen added inline comments.
docs/UsersManual.rst
2065

Is this correct? I cannot make it work:

~/local/stow/llvm-4.0-unpatched-Debug/bin/clang -Xclang -finclude-default-header -emit-llvm -cc1 -triple spir64-unknown-unknown kernel/test_halfs.cl -c -S -o -
clang-4.0: error: unknown argument: '-cc1'
clang-4.0: error: unknown argument: '-triple'
clang-4.0: error: no such file or directory: 'spir64-unknown-unknown'

-target works instead. (But reveals another issue, there's no printf() declaration, should I file a bug?)

~/local/stow/llvm-4.0-unpatched-Debug/bin/clang -Xclang -finclude-default-header -emit-llvm -target spir64-unknown-unknown kernel/test_halfs.cl -c -S -o -
kernel/test_halfs.cl:10:9: warning: implicit declaration of function 'printf' is invalid in C99 [-Wimplicit-function-declaration]
        printf("max(a,b)[0] type=uint2 a=0x15b348c9 b=0xf88e7d07 want=0xf88e7d07 got=%x\n", max_[0]);
        ^
kernel/test_halfs.cl:10:9: error: function with no prototype cannot use the spir_function calling convention
1 warning and 1 error generated.

With 3.9 that kernel which calls printf() passes:

/local/stow/llvm-3.9-debug/bin/clang -Xclang -finclude-default-header -emit-llvm -target spir64-unknown-unknown kernel/test_halfs.cl -c -S -o -
; ModuleID = 'kernel/test_halfs.cl'
source_filename = "kernel/test_halfs.cl"
target datalayout = "e-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024"
target triple = "spir64-unknown-unknown"
....
This revision now requires changes to proceed.Jan 17 2017, 12:00 AM
Anastasia added inline comments.Jan 17 2017, 11:25 AM
docs/UsersManual.rst
2065

It should be -triple with -cc1 and -target without.

clang -cc1 -triple ...
clang -target ...

We have disabled the C99 builtin functions as per spec v1.2 s6.9.f, but the OpenCL builtins should be available via the header now.

As I can see declaration of printf is in the header under CL1.2 and higher.

Could you try this command instead:

clang -std=CL1.2 test.cl -Xclang -finclude-default-header
docs/UsersManual.rst
2065

Yes, std=CL1.2 helped, good point! But -cc1 -triple doesn't work (unknown argument). However -target works. This is with Clang 4.0.

bader added inline comments.Jan 18 2017, 11:49 AM
docs/UsersManual.rst
2065

I think you pass something before -cc1. -cc1 option must be the first argument to the clang executable, otherwise it won't work.

@pekka.jaaskelainen, I just compiled the release 4.0 branch and it all works for me as expected:
clang -cc1 -triple spir64-unknown-unknown test.cl
clang -target spir64-unknown-unknown test.cl

@pekka.jaaskelainen, I just compiled the release 4.0 branch and it all works for me as expected:
clang -cc1 -triple spir64-unknown-unknown test.cl
clang -target spir64-unknown-unknown test.cl

Yes, that seems to work.

The problem was confusion with the frontend flag parameter. With the -cc1 it switches all parameters to
the frontend mode, not only the next one (like -Xclang does).

When I pass -Xclang before -cc1, it fails, if after, it fails also, but differently:

$ clang -cc1 -triple spir64-unknown-unknown -Xclang -finclude-default-header ~/temp/local.cl
error: unknown argument: '-Xclang'
$ clang -Xclang -finclude-default-header -cc1 -triple spir64-unknown-unknown ~/temp/local.cl
clang-4.0: error: unknown argument: '-cc1'
clang-4.0: error: unknown argument: '-triple'
clang-4.0: error: no such file or directory: 'spir64-unknown-unknown'

(I should pass -finclude-default-header after -cc1 without -Xclang then it works, but I think also the
latter version should work, no?)
Also, I cannot add -S (even with -emit-llvm), then it complains about a missing target:

$ clang -cc1 -triple spir64-unknown-unknown ~/temp/local.cl -S
error: unable to create target: 'No available targets are compatible with this triple.'

I think -S used to work with -emit-llvm as a switch to output LLVM text instead of bitcode, but now it seems to
output text by default so it should not be used. Is there are reason to use/document the -cc1 -triple instead of
-target here? Also, for me the command without -emit-llvm doesn't output anything.

(I should pass -finclude-default-header after -cc1 without -Xclang then it works, but I think also the
latter version should work, no?)

In general -cc1 and -Xclang have similar effect so I either should work indeed.

Also, I cannot add -S (even with -emit-llvm), then it complains about a missing target:

$ clang -cc1 -triple spir64-unknown-unknown ~/temp/local.cl -S
error: unable to create target: 'No available targets are compatible with this triple.'

I think -S used to work with -emit-llvm as a switch to output LLVM text instead of bitcode, but now it seems to
output text by default so it should not be used.

I think this has to do with spir being a 'generic' target.

Is there are reason to use/document the -cc1 -triple instead of
-target here?

Yes, because in some cases we have added frontend flag only, so we need to either pass -cc1 or -Xclang. But I could change the documentation to try to pass -target before the -cc1 flag.

Also, for me the command without -emit-llvm doesn't output anything.

What targets do you use? I think the problem is that it's not clear what the default output of the standalone compiler should be for the most of the OpenCL targets, since it does't produce a valid binary. If let's say we emit an x86 binary, would it be valid to run it for POCL directly? Perhaps, we could try to emit the bitcode by default then.

Currently if I run:

clang test.cl

I get an error: undefined reference to `main'. But if I add a main (which isn't really valid for OpenCL) I get:
error: function cannot be called 'main'

Also, for me the command without -emit-llvm doesn't output anything.

What targets do you use?

In this case I used only SPIR. I suppose this is fine as the concept of linking is unclear in case of Clang's OpenCL
support.

I think the problem is that it's not clear what the default output of the standalone compiler should be for the most of the OpenCL targets, since it does't produce a valid binary. If let's say we emit an x86 binary, would it be valid to run it for POCL directly? Perhaps, we could try to emit the bitcode by default then.

An x86 OpenCL binary from Clang is not executable as is (especially in case of multi-WI WGs), but needs a transformation to convert the multi-WI
to parallel loops, fibers, straight to vector instructions, or similar. However, I try to keep the Clang's OpenCL output of any target valid for pocl without
a special triplet code for now. This way it should be possible to get OpenCL support for most, if not all, targets supported by LLVM out of the box.

One problem I've found are the host callable kernel functions. We don't want the target-specific CC/ABI used for them as they have clSetKernelArg()
mappable arguments and other interfacing considerations, especially with heterogeneous setups. Recently I discovered that there might be a way to fix this
nicely via the SPIR target. If we use SPIR ABI by default for OpenCL kernel functions, the rest of it might just work with the target's default settings.

Just a heads up, I will propose this patch to upstream after some testing:
https://raw.githubusercontent.com/pocl/pocl/c9ae29f2268d3b92462d6ca97d1d596dae0d15a1/tools/patches/clang-3.9-always-spir-cc-for-kernels.patch

@pekka, do you think there is something more to be done here or can I close this revision now?

Apparently, I won't be able to pass` -target` in all cases because -cc has to be always the first parameter.

Also I am preparing additional text and some minor corrections in another review.

This revision is now accepted and ready to land.Feb 9 2017, 11:11 AM