Page MenuHomePhabricator

[SYCL] Add design document for SYCL mode
Needs ReviewPublic

Authored by bader on Mar 23 2021, 7:45 AM.

Details

Summary

Initial version of the document covers high level details of overall
design and SYCL mode support in the front-end compiler.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
bader added inline comments.Mar 25 2021, 6:11 AM
clang/docs/SYCLSupport.md
140 ↗(On Diff #332676)

I replaced "overcome OpenCL limitations" to "use OpenCL as a back-end", but I don't see the difference between these two options.

155 ↗(On Diff #332676)

"OpenCL extensions" are two keywords: __kernel and global.

Do you have a suggestion how to improve it?

! In D99190#2650326, @bader wrote:

  1. I'm looking for suggestions on "OpenCL extensions" clarification.

I said that "OpenCL extensions" are misleading because it can be understood as either extensions inside OpenCL (cl_khr_ext...) or extensions to C/C++ with specific OpenCL features (like kernel or global).

bader updated this revision to Diff 333502.Mar 25 2021, 10:48 PM

Resolve "OpenCL extensions" ambiguity.

bader marked an inline comment as done.Mar 25 2021, 10:50 PM

! In D99190#2650326, @bader wrote:

  1. I'm looking for suggestions on "OpenCL extensions" clarification.

I said that "OpenCL extensions" are misleading because it can be understood as either extensions inside OpenCL (cl_khr_ext...) or extensions to C/C++ with specific OpenCL features (like kernel or global).

Thanks for clarification. I replaced "using OpenCL extensions" with "in OpenCL-like pseudo-code". Hopefully this resolves the ambiguity.

bader updated this revision to Diff 333581.Mar 26 2021, 10:06 AM

Add links to SYCL specification with address space behavior description.

Anastasia added inline comments.Mar 26 2021, 11:19 AM
clang/docs/SYCLSupport.md
857 ↗(On Diff #333581)

FYI OpenCL also deduced private and global address spaces. I suggest you change this to something like:

SYCL doesn't perform address space qualifier inference detailed in v3.0 s6.7.8: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-inference`.

Since the inference happens in the parsing as a final step (during CodeGen) instead, it would be good to add that certain variables are bind to the physical global memory segments i.e. globals, static locals, etc. You already have some info at the end of the paragraph.

859 ↗(On Diff #333581)

I think this implied from the statement above i.e. the address spaces are not inferred.

What might be important to describe is where other objects are bound to i.e. non-globals are bound to a private memory? You could consider refering to OpenCL spec on that since I feel it might be fairly similar?

861 ↗(On Diff #333581)

Ok this is an implementation details but from the language sematic it would be good to describe what logic you are expecting.

So Default address space is primarily used for C/C++ flat memory which means everything in standard C or C++ will be in Default and this is where the local memory is allocated too.

When not specified otherwise, objects are allocated by default in a generic address space, which corresponds to the single address space of ISO/IEC 9899:1999.

I am guessing this doesn't entirely apply to SYCL? I think this would be important to clarify so it is clear what your semantic of Default is. It would make sense to reference OpenCL generic address space or any other documentation if you want to be concise.

863 ↗(On Diff #333581)

Ok, I suggested to lift this to where you describe the inference. It would be good to elaborate on what objects are bound to what memory segments. You might also refer to OpenCL spec since I believe the memory segments are fairly similar.

Can you explain this point a bit more and pointers to them are cast to generic? Having an example might help too.

864 ↗(On Diff #333581)

Ok, I would put the design goals to the top.

Btw I am not sure this is the case "keeps the type system consistent with C++" since your semantic of default address spaces is different to C++. Perhaps you can elaborate more what it means...

886 ↗(On Diff #333581)

I don't understand what is the message of this paragraph. The example compiles in accordance with OpenCL language semantic... Perhaps you can elaborate more.

906 ↗(On Diff #333581)

I would suggest to move this table to the top and also as an introduction to make references to OpenCL memory model and address space qualifiers to make it clear that this is reused from OpenCL.

919 ↗(On Diff #333581)

I am not sure what limited means? I would suggest being more specific and state something like that there is no binding to the constant memory region. Unless this is something that you intend to support later?

922 ↗(On Diff #333581)

It would also be good to see some clarifications regarding your address space overlapping rules and the implicit/explicit conversions and whether there is any behavior that doesn't comply to embedded C/OpenCL C: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-conversions

Also do you have any constraints on variable initialization wrt address spaces?

Naghasan requested changes to this revision.Mar 29 2021, 6:14 AM

Added comment for the front end support part.

I think the document present to much at once, front-end, driver, tool chains, SPIR-V backend, compiler-runtime bindings etc.
It is good to have everything eventually, but it would be beneficial to split this patch into more focus ones.
Having a link to the main document in your github would still allow one to get the big picture.

I see the address space bit has already been split, that's better to keep things focused I guess.

@keryell, thanks for the feedback.

I applied your suggestions with two exceptions:

  1. I'm not sure if other back-ends need special processing for kernel function parameters we apply for SPIR kernels.

I'm not sure TBH, but passing just the object as a whole is not helping DAE passes. So either way, it is not a bad thing to have that processing.

clang/docs/SYCLSupport.md
51 ↗(On Diff #333780)

Do you plan on upstreaming your integration header approach ? Even if it is useful in some situations and speeds up the creation of a prototype, it comes with its complications.

An integration header creates a by-product then used as input for another compilation phase. I haven't at the upstream driver capabilities for awhile, but I don't think you can model 2 outputs yet. Are you planing on adding this capability ?
If not, wouldn't that forces you to trigger a compilation step just for the generation of that files ? If so that puts a strong burden on the compilation speed as you now have to process 3 times your input C++ file.

73 ↗(On Diff #333780)

First, it must be possible to use any host compiler

I don't understand the link with the integration header. SYCL being implementable as a library is a design principle of the specs but it doesn't means the clang host compiler has to remain a vanilla C++ compiler.

information provided in the integration header is used (included) by the SYCL runtime implementation, so the header must be available before the host compilation starts

Another approach to the integration header would be for clang as the host compiler to generate the used type traits.

123 ↗(On Diff #333780)

OpenMP offload uses a similar approach isn't it? Might be worth to describe how the 2 relates to each other and where they diverge.

128 ↗(On Diff #333780)

I find "shared" ambiguous, to CUDA developer shared memory is the OpenCL local memory. "Unified" is less prone misinterpretation (and matches the U of USM :).

130 ↗(On Diff #333780)

Raw pointers map to kernel parameters one-to-one without additional transformations

Is this true ? I thought they should be passed as "pointer to global memory" (under the OpenCL or CUDA model).

131 ↗(On Diff #333780)
132 ↗(On Diff #333780)

Won't be better to refer to SPIR or SPIR-V kernels rather than OpenCL ?

Or even SPIR-like or SPIR-defacto to be even less normative.

134 ↗(On Diff #333780)

This is a bit ambiguous and leaves the impression you can't objects pass by value.

144 ↗(On Diff #333780)

A PTX / cuda backend would use the exact same lowering logic. The generated LLVM IR will look a bit different (as the identification of PTX kernels is different), but this section also applies.

161 ↗(On Diff #333780)

This is missing the template instantiation that will eventually lead to that lowering.

I would also suggest to split code block in 2 as to mark what is in header and source file (SYCL code) and what is compiler generated (that pseudo OpenCL).

Might be good to also mention the glue generated in the integration header as this is what allows arguments to be set by the runtime (bridge between the structure in C++ and the SPIR-like kernel arguments).

139 ↗(On Diff #332676)

SYCL's mechanism isn't special, just its way to.

Making an analogy with another programming model is I think a good way to describe it, but I would suggest to use a single source programming model. To that end, I think CUDA would be a better programming model to use here.

This revision now requires changes to proceed.Mar 29 2021, 6:14 AM
bader marked 8 inline comments as done.Mar 29 2021, 6:51 AM

@Anastasia, I've addressed the comments for the address space section in https://reviews.llvm.org/D99488. Let's move discussion there.

clang/docs/SYCLSupport.md
861 ↗(On Diff #333581)

I hope that the language semantics defined by the SYCL specification is clear enough. Please, see https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_address_space_deduction for details. The only thing left for implementation to define is address space assignment for namespace scope variables. I've added a clarification for that at the end of the paragraph.

863 ↗(On Diff #333581)

This is a description of how CodeGen library implements Target hooks like getGlobalVarAddressSpace/getASTAllocaAddressSpace. As it's not related to SYCL, I just removed this confusing sentence.

864 ↗(On Diff #333581)

Moved right after the links to SYCL specification.

Btw I am not sure this is the case "keeps the type system consistent with C++" your semantic of default address spaces is different to C++

The point here is that SYCL compiler doesn't change standard C++ types by assigning non-default address space attribute implicitly. That way C++ types not using extensions are left intact.

886 ↗(On Diff #333581)

This example demonstrates the problem with compiling C++ code when address space type qualifiers are inferred.

The example compiles in accordance with OpenCL language semantic...

https://godbolt.org/z/9jzxK5xc4 - ToT clang doesn't compile this example.

919 ↗(On Diff #333581)

I removed this confusing section. It describes the functionality, which is not being considered for upstream yet.

922 ↗(On Diff #333581)

Added.

I don't have anything in addition to SYCL and SPIR constraints.

bader updated this revision to Diff 334229.Mar 30 2021, 10:58 AM
bader marked 12 inline comments as done.

Apply code review comments from @Naghasan.

Reduced the scope of this patch to front-end compiler part.

bader added a subscriber: ABataev.Mar 30 2021, 10:59 AM
bader added inline comments.
clang/docs/SYCLSupport.md
73 ↗(On Diff #333780)

First, it must be possible to use any host compiler

I don't understand the link with the integration header. SYCL being implementable as a library is a design principle of the specs but it doesn't means the clang host compiler has to remain a vanilla C++ compiler.

information provided in the integration header is used (included) by the SYCL runtime implementation, so the header must be available before the host compilation starts

Another approach to the integration header would be for clang as the host compiler to generate the used type traits.

If there are no objections from @keryell, I'd like to prototype this approach for SYCL first to make sure there are no blocking issues.
This option seems to be worth to explore considering integration header approach disadvantages.

123 ↗(On Diff #333780)

Do you mean the approach OpenMP compiler uses to outline single-source code parts to offload?
To be honest, I'm not sure... @ABataev, is there any description how OpenMP compiler outlines device code?
https://clang.llvm.org/docs/OpenMPSupport.html doesn't provide much details unfortunately.

130 ↗(On Diff #333780)

Good catch. I'll adjust the wording.

132 ↗(On Diff #333780)

Is it okay if I refer to OpenCL SPIR-V environment specification:
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_kernels?

134 ↗(On Diff #333780)

Agree. Removed.

161 ↗(On Diff #333780)

This is missing the template instantiation that will eventually lead to that lowering.
I would also suggest to split code block in 2 as to mark what is in header and source file (SYCL code) and what is compiler generated (that pseudo OpenCL).

Do you suggest to sketch an SYCL kernel invocation API usage example with the complete implementation of these methods to demonstrate the full stack? I was trying to avoid runtime/header part of implementation in this section and describe only API compiler relies on or provides for the runtime library. I think these details were in this document before, but moved to https://github.com/intel/llvm/blob/sycl/sycl/doc/KernelParameterPassing.md referenced below.

Might be good to also mention the glue generated in the integration header as this is what allows arguments to be set by the runtime (bridge between the structure in C++ and the SPIR-like kernel arguments).

I've mentioned that, but I hope KernelParameterPassing.md document provides the rest of the details.
Is that okay?

ABataev added inline comments.Mar 30 2021, 12:11 PM
clang/docs/SYCLSupport.md
123 ↗(On Diff #333780)

I don't think we have anything like this. Moreover, currently, there are 2 different models, one with outlining by the frontend and another one with the outlining by the LLVM.

The OMPIRBuilder is the modern version of late outlining:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
You may consider to go that way with SYCL as well.

bader updated this revision to Diff 334429.Mar 31 2021, 6:43 AM

Convert document from Markdown to ReST format.

The OMPIRBuilder is the modern version of late outlining:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
You may consider to go that way with SYCL as well.

triSYCL had a late outlining version, giving different trade-offs. https://github.com/triSYCL/llvm/blob/sycl/master/lib/SYCL/SYCLKernelFilter.cpp
Now we are just using the early outliner of oneAPI DPC++ because there are quite more developers working on it. :-)

mstorsjo removed a subscriber: mstorsjo.Mar 31 2021, 1:36 PM
keryell added inline comments.Mar 31 2021, 1:47 PM
clang/docs/SYCLSupport.md
73 ↗(On Diff #333780)

First, it must be possible to use any host compiler

I don't understand the link with the integration header. SYCL being implementable as a library is a design principle of the specs but it doesn't means the clang host compiler has to remain a vanilla C++ compiler.

information provided in the integration header is used (included) by the SYCL runtime implementation, so the header must be available before the host compilation starts

Another approach to the integration header would be for clang as the host compiler to generate the used type traits.

Type traits are somehow misleading and probably implementation details.
I guess what this means is a meta-representation of the kernels and their API/ABI.

If there are no objections from @keryell, I'd like to prototype this approach for SYCL first to make sure there are no blocking issues.
This option seems to be worth to explore considering integration header approach disadvantages.

No objection.
triSYCL did not have this integration header since we were using Clang as the host compiler and were able to generate directly the kernel call on the host https://github.com/triSYCL/llvm/blob/sycl/master/lib/SYCL/SYCLSerializeArguments.cpp and the kernel stub on device https://github.com/triSYCL/llvm/blob/sycl/master/lib/SYCL/SYCLSerializeArgumentsInside.cpp

Obviously, it was in the context of late outlining in LLVM but it can work with early outlining in Clang.

Naghasan added inline comments.Apr 1 2021, 2:48 AM
clang/docs/SYCLSupport.md
123 ↗(On Diff #333780)

I mentioned that as I know there is some support for CUDA and the clang driver openmp offload works with multiple frontend passes.
If the model(s) is too different then there is no point going further here.

Moreover, currently, there are 2 different models, one with outlining by the frontend and another one with the outlining by the LLVM.

I do recall some brief conversations about that. Are they meant to work in pair or one aims to replace the other ?

161 ↗(On Diff #333780)

Do you suggest to sketch an SYCL kernel invocation API usage example with the complete implementation of these methods to demonstrate the full stack?

Not the full stack, I agree that would be too much. Just the bare minimal that can show where the global int* a comes from and help connecting the dots. I agree details should be kept in the kernel param passing document.

So maybe consider adding something of those lines:

class MyObj {
  accessor<int> _ptr; // accessor<int> contains a pointer to the global address space.
public:

  void operator()(); // Body of the kernel
};

[...]
MyObj Obj{/*some init*/};
sycl_kernel_function<MyObj>(Obj);

This shows the instanciation and one can make the relation between this and the pseudo OpenCL kernel.

bader updated this revision to Diff 334680.Apr 1 2021, 7:14 AM
bader marked 2 inline comments as done.

Applied code review suggestions from @Naghasan.

clang/docs/SYCLSupport.md
123 ↗(On Diff #333780)

What do say if add a TODO here (or in a separate TODO document) to study more about differences between SYCL/OpenMP-offload/CUDA implementation designs?
It seems to be useful to understanding if we want to re-use other programming model experience with implementing common tasks like device code outlining.

bader retitled this revision from WIP: [SYCL] Add design document for SYCL mode to [SYCL] Add design document for SYCL mode.Apr 1 2021, 7:16 AM
bader edited the summary of this revision. (Show Details)
Naghasan added inline comments.Apr 1 2021, 7:54 AM
clang/docs/SYCLSupport.md
123 ↗(On Diff #333780)

fine by me :)

clang/docs/SYCLSupport.rst
198

Oups sorry, I just noticed I made a mistake in my suggestion, that's more in line with the statement KernelFuncObj.A.__init(a); in the pseudo opencl

bader updated this revision to Diff 334714.Apr 1 2021, 8:46 AM
bader marked an inline comment as done.

Added simplified accessor declaration to the example and TODO section.

bader updated this revision to Diff 334715.Apr 1 2021, 8:48 AM

Fixed a typo in the code example.

bader marked an inline comment as done.Apr 1 2021, 8:48 AM
bader added inline comments.
clang/docs/SYCLSupport.md
123 ↗(On Diff #333780)

I've added a TODO section to the document.

clang/docs/SYCLSupport.rst
198

Thanks. Fixed.

bader marked 2 inline comments as done.Apr 1 2021, 11:30 PM
bader added inline comments.
clang/docs/SYCLSupport.md
51 ↗(On Diff #333780)

It looks like Mike addressed this issue with https://github.com/intel/llvm/pull/3471. :)

Naghasan accepted this revision.Apr 7 2021, 2:33 AM

LGTM

bader updated this revision to Diff 340541.Apr 26 2021, 8:47 AM
bader marked an inline comment as done.

Rebased on ToT to resolve merge conflicts.