Initial version of the document covers high level details of overall
design and SYCL mode support in the front-end compiler.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
25,520 ms | x64 debian > LLVM.Bindings/Go::go.test |
Event Timeline
Great to have some design documentation!
Just a feedback on the first 20%...
clang/docs/SYCLSupport.md | ||
---|---|---|
24 | SPIR-V is just an example | |
36 | ||
49 | ||
56 | ||
57 | ||
58 | ||
61 | Should we add somewhere other back-ends like PTX? | |
70 | ||
81 | ||
96 | ||
109 | ||
115 | ||
119 | ||
126 | We need to introduce USM too somehow... | |
128 | To be generalized to non OpenCL case. | |
140 | To generalize to non SPIR & non OpenCL | |
141 | It is not about overcoming some OpenCL limitations but just to use OpenCL as a back-end. | |
156 | "OpenCL extensions" is probably misleading here and the sentence should be clarified. | |
173 | Where is it? |
@keryell, thanks for the feedback.
I applied your suggestions with two exceptions:
- I'm not sure if other back-ends need special processing for kernel function parameters we apply for SPIR kernels.
- I'm looking for suggestions on "OpenCL extensions" clarification.
clang/docs/SYCLSupport.md | ||
---|---|---|
128 | @Naghasan, do you know if CUDA has similar limitation? |
! In D99190#2650326, @bader wrote:
- I'm looking for suggestions on "OpenCL extensions" clarification.
I said that "OpenCL extensions" are misleading because it can be understood as either extensions inside OpenCL (cl_khr_ext...) or extensions to C/C++ with specific OpenCL features (like kernel or global).
Thanks for clarification. I replaced "using OpenCL extensions" with "in OpenCL-like pseudo-code". Hopefully this resolves the ambiguity.
clang/docs/SYCLSupport.md | ||
---|---|---|
858 | FYI OpenCL also deduced private and global address spaces. I suggest you change this to something like: SYCL doesn't perform address space qualifier inference detailed in v3.0 s6.7.8: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-inference`. Since the inference happens in the parsing as a final step (during CodeGen) instead, it would be good to add that certain variables are bind to the physical global memory segments i.e. globals, static locals, etc. You already have some info at the end of the paragraph. | |
860 | I think this implied from the statement above i.e. the address spaces are not inferred. What might be important to describe is where other objects are bound to i.e. non-globals are bound to a private memory? You could consider refering to OpenCL spec on that since I feel it might be fairly similar? | |
862 | Ok this is an implementation details but from the language sematic it would be good to describe what logic you are expecting. So Default address space is primarily used for C/C++ flat memory which means everything in standard C or C++ will be in Default and this is where the local memory is allocated too. When not specified otherwise, objects are allocated by default in a generic address space, which corresponds to the single address space of ISO/IEC 9899:1999. I am guessing this doesn't entirely apply to SYCL? I think this would be important to clarify so it is clear what your semantic of Default is. It would make sense to reference OpenCL generic address space or any other documentation if you want to be concise. | |
864 | Ok, I suggested to lift this to where you describe the inference. It would be good to elaborate on what objects are bound to what memory segments. You might also refer to OpenCL spec since I believe the memory segments are fairly similar. Can you explain this point a bit more and pointers to them are cast to generic? Having an example might help too. | |
865 | Ok, I would put the design goals to the top. Btw I am not sure this is the case "keeps the type system consistent with C++" since your semantic of default address spaces is different to C++. Perhaps you can elaborate more what it means... | |
887 | I don't understand what is the message of this paragraph. The example compiles in accordance with OpenCL language semantic... Perhaps you can elaborate more. | |
907 | I would suggest to move this table to the top and also as an introduction to make references to OpenCL memory model and address space qualifiers to make it clear that this is reused from OpenCL. | |
920 | I am not sure what limited means? I would suggest being more specific and state something like that there is no binding to the constant memory region. Unless this is something that you intend to support later? | |
923 | It would also be good to see some clarifications regarding your address space overlapping rules and the implicit/explicit conversions and whether there is any behavior that doesn't comply to embedded C/OpenCL C: https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_C.html#addr-spaces-conversions Also do you have any constraints on variable initialization wrt address spaces? |
Move address space handling section to https://reviews.llvm.org/D99488 to address https://reviews.llvm.org/D89909#2653452.
Added comment for the front end support part.
I think the document present to much at once, front-end, driver, tool chains, SPIR-V backend, compiler-runtime bindings etc.
It is good to have everything eventually, but it would be beneficial to split this patch into more focus ones.
Having a link to the main document in your github would still allow one to get the big picture.
I see the address space bit has already been split, that's better to keep things focused I guess.
I'm not sure TBH, but passing just the object as a whole is not helping DAE passes. So either way, it is not a bad thing to have that processing.
clang/docs/SYCLSupport.md | ||
---|---|---|
51 | Do you plan on upstreaming your integration header approach ? Even if it is useful in some situations and speeds up the creation of a prototype, it comes with its complications. An integration header creates a by-product then used as input for another compilation phase. I haven't at the upstream driver capabilities for awhile, but I don't think you can model 2 outputs yet. Are you planing on adding this capability ? | |
73 |
I don't understand the link with the integration header. SYCL being implementable as a library is a design principle of the specs but it doesn't means the clang host compiler has to remain a vanilla C++ compiler.
Another approach to the integration header would be for clang as the host compiler to generate the used type traits. | |
123 | OpenMP offload uses a similar approach isn't it? Might be worth to describe how the 2 relates to each other and where they diverge. | |
128 | I find "shared" ambiguous, to CUDA developer shared memory is the OpenCL local memory. "Unified" is less prone misinterpretation (and matches the U of USM :). | |
130 |
Is this true ? I thought they should be passed as "pointer to global memory" (under the OpenCL or CUDA model). | |
131 | ||
132 | Won't be better to refer to SPIR or SPIR-V kernels rather than OpenCL ? Or even SPIR-like or SPIR-defacto to be even less normative. | |
134 | This is a bit ambiguous and leaves the impression you can't objects pass by value. | |
140 | SYCL's mechanism isn't special, just its way to. Making an analogy with another programming model is I think a good way to describe it, but I would suggest to use a single source programming model. To that end, I think CUDA would be a better programming model to use here. | |
144 | A PTX / cuda backend would use the exact same lowering logic. The generated LLVM IR will look a bit different (as the identification of PTX kernels is different), but this section also applies. | |
161 | This is missing the template instantiation that will eventually lead to that lowering. I would also suggest to split code block in 2 as to mark what is in header and source file (SYCL code) and what is compiler generated (that pseudo OpenCL). Might be good to also mention the glue generated in the integration header as this is what allows arguments to be set by the runtime (bridge between the structure in C++ and the SPIR-like kernel arguments). |
@Anastasia, I've addressed the comments for the address space section in https://reviews.llvm.org/D99488. Let's move discussion there.
clang/docs/SYCLSupport.md | ||
---|---|---|
862 | I hope that the language semantics defined by the SYCL specification is clear enough. Please, see https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#_address_space_deduction for details. The only thing left for implementation to define is address space assignment for namespace scope variables. I've added a clarification for that at the end of the paragraph. | |
864 | This is a description of how CodeGen library implements Target hooks like getGlobalVarAddressSpace/getASTAllocaAddressSpace. As it's not related to SYCL, I just removed this confusing sentence. | |
865 | Moved right after the links to SYCL specification.
The point here is that SYCL compiler doesn't change standard C++ types by assigning non-default address space attribute implicitly. That way C++ types not using extensions are left intact. | |
887 | This example demonstrates the problem with compiling C++ code when address space type qualifiers are inferred.
https://godbolt.org/z/9jzxK5xc4 - ToT clang doesn't compile this example. | |
920 | I removed this confusing section. It describes the functionality, which is not being considered for upstream yet. | |
923 | Added. I don't have anything in addition to SYCL and SPIR constraints. |
Apply code review comments from @Naghasan.
Reduced the scope of this patch to front-end compiler part.
clang/docs/SYCLSupport.md | ||
---|---|---|
73 |
If there are no objections from @keryell, I'd like to prototype this approach for SYCL first to make sure there are no blocking issues. | |
123 | Do you mean the approach OpenMP compiler uses to outline single-source code parts to offload? | |
130 | Good catch. I'll adjust the wording. | |
132 | Is it okay if I refer to OpenCL SPIR-V environment specification: | |
134 | Agree. Removed. | |
161 |
Do you suggest to sketch an SYCL kernel invocation API usage example with the complete implementation of these methods to demonstrate the full stack? I was trying to avoid runtime/header part of implementation in this section and describe only API compiler relies on or provides for the runtime library. I think these details were in this document before, but moved to https://github.com/intel/llvm/blob/sycl/sycl/doc/KernelParameterPassing.md referenced below.
I've mentioned that, but I hope KernelParameterPassing.md document provides the rest of the details. |
clang/docs/SYCLSupport.md | ||
---|---|---|
123 | I don't think we have anything like this. Moreover, currently, there are 2 different models, one with outlining by the frontend and another one with the outlining by the LLVM. |
The OMPIRBuilder is the modern version of late outlining:
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
You may consider to go that way with SYCL as well.
triSYCL had a late outlining version, giving different trade-offs. https://github.com/triSYCL/llvm/blob/sycl/master/lib/SYCL/SYCLKernelFilter.cpp
Now we are just using the early outliner of oneAPI DPC++ because there are quite more developers working on it. :-)
clang/docs/SYCLSupport.md | ||
---|---|---|
73 |
Type traits are somehow misleading and probably implementation details.
No objection. Obviously, it was in the context of late outlining in LLVM but it can work with early outlining in Clang. |
clang/docs/SYCLSupport.md | ||
---|---|---|
123 | I mentioned that as I know there is some support for CUDA and the clang driver openmp offload works with multiple frontend passes.
I do recall some brief conversations about that. Are they meant to work in pair or one aims to replace the other ? | |
161 |
Not the full stack, I agree that would be too much. Just the bare minimal that can show where the global int* a comes from and help connecting the dots. I agree details should be kept in the kernel param passing document. So maybe consider adding something of those lines: class MyObj { accessor<int> _ptr; // accessor<int> contains a pointer to the global address space. public: void operator()(); // Body of the kernel }; [...] MyObj Obj{/*some init*/}; sycl_kernel_function<MyObj>(Obj); This shows the instanciation and one can make the relation between this and the pseudo OpenCL kernel. |
Applied code review suggestions from @Naghasan.
clang/docs/SYCLSupport.md | ||
---|---|---|
123 | What do say if add a TODO here (or in a separate TODO document) to study more about differences between SYCL/OpenMP-offload/CUDA implementation designs? |
clang/docs/SYCLSupport.md | ||
---|---|---|
51 | It looks like Mike addressed this issue with https://github.com/intel/llvm/pull/3471. :) |
SPIR-V is just an example