diff --git a/clang/docs/SYCLSupport.md b/clang/docs/SYCLSupport.md new file mode 100644 --- /dev/null +++ b/clang/docs/SYCLSupport.md @@ -0,0 +1,197 @@ +# SYCL Compiler and Runtime architecture design + +## Introduction + +This document describes the architecture of the SYCL compiler and runtime +library. More details are provided in [external document][1], which are going to +be added to clang documentation in the future. + +## SYCL Compiler architecture + +SYCL application compilation flow for `spir`/`spir64` target: + +![High level component diagram for SYCL Compiler](images/Compiler-HLD.svg) + +*Diagram 1. Application build flow.* + +SYCL compiler logically can be split into the host compiler and a number of +device compilers—one per each supported target. Clang driver orchestrates the +compilation process, it will invoke the device compiler once per each requested +target, then it will invoke the host compiler to compile the host part of a +SYCL source. In the simplest case, when compilation and linkage are done in one +compiler driver invocation, once compilation is finished, the device object +files (which are really LLVM IR files) are linked with the `llvm-link` tool. +The resulting LLVM IR module can additionally processed to get program format +suitable for execution by specific back-end. For instance, `llvm-spirv` tool +translates LLVM bitcode to SPIR-V module for OpenCL back-end or `ptxas`/`fatbin` +tools prepare LLVM for execution via CUDA back-end. After that +`clang-offload-wrapper` tool wraps device module in a host object file. Once all +the host object files and the wrapped object with device code are ready, the +driver invokes the usual platform linker and the final executable called "fat +binary" is produced. This is a host executable or library with embedded linked +images for each target specified at the command line. + +There are many variations of the compilation process depending on whether user +chose to do one or more of the following: + +- perform compilation separately from linkage +- compile the device module ahead-of-time for one or more targets +- perform device code splitting so that device code is distributed across + multiple modules rather than enclosed in a single one +- perform linkage of static device libraries +Sections below provide more details on some of those scenarios. + +SYCL sources can be also compiled as a regular C++ code, in this mode there is +no "device part" of the code — everything is executed on the host. + +Device compiler is further split into the following major components: + +- **Front-end** - parses input source, "outlines" device part of the code, +applies additional restrictions on the device code (e.g. no exceptions or +virtual calls), generates LLVM IR for the device code only and an "integration +header" which provides information like kernel name, parameters order and data +type for the runtime library. +- **Middle-end** - transforms the initial LLVM IR to get consumed by the +back-end. Today middle-end transformations include subset of LLVM standard +passes: + - Any LLVM IR transformation can be applied with only one limitation: + the back-end compiler should be able to handle the transformed LLVM IR. + NOTE: the performance impact of transformation passes depends on accurate + target information, so it makes sense to disable such transformation for + "virtual" targets like SPIR. + - Optionally: Address space inference pass + - Optionally: LLVM IR -> SPIR-V translator or LLVM -> PTX. +- **Back-end** - produces native "device" code. It is shown as +"Target-specific LLVM compiler" box on Diagram 1. It is invoked either at +compile time (in ahead-of-time compilation scenario) or at runtime +(in just-in-time compilation scenario). + +*Design note: in current design we use SYCL device front-end compiler to produce +the integration header for two reasons. First, it must be possible to use any +host compiler to produce SYCL heterogeneous applications. Second, even if the +same Clang compiler is used for the host compilation, information provided in +the integration header is used (included) by the SYCL runtime implementation, so +the header must be available before the host compilation starts.* + +### SYCL support in Clang front-end + +SYCL support in Clang front-end can be split into the following components: + +- Device code outlining. This component is responsible for identifying and +outlining "device code" in the single source. +- SYCL kernel function object (functor or lambda) lowering. This component +creates a SPIR kernel function interface for SYCL kernels. +- Device code diagnostics. This component enforces language restrictions on +device code. +- Integration header generation. This component emits information required for +binding host and device parts of the SYCL code. + +#### Device code outlining + +Here is a code example of a SYCL program that demonstrates compiler outlining +work: + +```C++ +int foo(int x) { return ++x; } +int bar(int x) { throw std::exception{"CPU code only!"}; } +... +using namespace sycl; +queue Q; +buffer a{range<1>{1024}}; +Q.submit([&](handler& cgh) { + auto A = a.get_access(cgh); + cgh.parallel_for(range<1>{1024}, [=](id<1> index) { + A[index] = index[0] * 2 + foo(42); + }); + } +... +``` + +In this example, the compiler needs to compile the lambda expression passed +to the `sycl::handler::parallel_for` method, as well as the function `foo` +called from the lambda expression for the device. + +The compiler must also ignore the `bar` function when we compile the +"device" part of the single source code, as it's unused inside the device +portion of the source code (the contents of the lambda expression passed to the +`sycl::handler::parallel_for` and any function called from this lambda +expression). + +The current approach is to use the SYCL kernel attribute in the runtime to +mark code passed to `sycl::handler::parallel_for` as "kernel functions". The +runtime library can't mark foo as "device" code - this is a compiler job: to +traverse all symbols accessible from kernel functions and add them to the +"device part" of the code marking them with the new SYCL device attribute. + +#### Lowering of lambda function objects and named function objects + +SYCL memory objects shared between host and device can be accessed either +through the use of raw pointers to unified memory (known as USM +https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:usm) +or special `accessor` classes. SYCL device compiler sets address space attribute +for raw pointers captured by SYCL kernels. `accessor` classes require additional +processing as the "device" implementation of this class contains pointers to the +device memory as a class member. +[OpenCL SPIR-V environment specification](https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_kernels) +doesn't allow passing structures with pointer type members as kernel parameters. + +SYCL also has a special mechanism for passing kernel arguments from host to +the device. In OpenCL kernel arguments are set by calling `clSetKernelArg` +function for each kernel argument, meanwhile in SYCL all the kernel arguments +are fields of "SYCL kernel function" which can be defined as a lambda function +or a named function object and passed as an argument to SYCL function for +invoking kernels (such as `parallel_for` or `single_task`). For example, in the +previous code snippet above `accessor` `A` is one such captured kernel argument. + +To facilitate the mapping of SYCL kernel data members to SPIR-like kernel +arguments we added the generation of an SPIR kernel function inside the +compiler. A SPIR-like kernel function contains the body of the SYCL kernel +function, receives OpenCL-like parameters and additionally does some +manipulation to initialize SYCL kernel data members with these parameters. +A PTX / CUDA backend uses the same lowering logic, but generated LLVM IR has +minor differences (e.g. the identification of PTX kernels is different). + +The pseudo code for SPIR kernel function generated by the compiler looks like +this: + +```C++ +// Generated kernel function (expressed in OpenCL-like pseudo-code) +__kernel KernelName(global int* a) { + KernelType KernelFuncObj; // Actually kernel function object declaration + // doesn't have a name in AST. + // Let the kernel function object have one captured field - accessor A. + // We need to init it with global pointer from arguments: + KernelFuncObj.A.__init(a); + // Body of the SYCL kernel from SYCL headers: + { + KernelFuncObj(); + } +} +``` + +The compiler generates such kernel function for an instantiation of a function +template with `sycl_kernel` attribute. SYCL kernel invocation methods can use +following helper function template to lower SYCL kernel function object to +SPIR-like kernel function: + +```C++ +// SYCL kernel is defined in SYCL headers: +template +__attribute__((sycl_kernel)) void sycl_kernel_function(KernelType KernelFuncObj) { + // ... + KernelFuncObj(); +} +``` + +The compiler also emits the integration header with the information required for +marshalling SPIR-like kernel arguments by the runtime library. Kernel function +and integration header are generated by the compiler from `Sema` library using +AST nodes. + +Additional details of kernel parameter passing may be found in the +document +[SYCL Kernel Parameter Handling and Array Support](https://github.com/intel/llvm/blob/sycl/sycl/doc/KernelParameterPassing.md). + +[//]: # (TODO: move KernelParameterPassing.md to llvm-project) + +[1]: https://github.com/intel/llvm/blob/sycl/sycl/doc/CompilerAndRuntimeDesign.md \ No newline at end of file diff --git a/clang/docs/images/Compiler-HLD.svg b/clang/docs/images/Compiler-HLD.svg new file mode 100644 --- /dev/null +++ b/clang/docs/images/Compiler-HLD.svg @@ -0,0 +1,16522 @@ + +image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Page-1 + + Task.136llvm-linkSheet.141Sheet.145Sheet.146Sheet.147Sheet.148Sheet.149Sheet.150Sheet.154Sheet.155Sheet.156Sheet.157Sheet.158Task.176Offload-wrapperSheet.181Sheet.185Sheet.186Sheet.187Sheet.188Sheet.189Sheet.190Sheet.194Sheet.195Sheet.196Sheet.197Sheet.198Offload-wrapper + + +Sheet.137llvm-link + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object + SourceFile.cpp + + Sheet.2 + + + + + + + Sheet.3 + + + + + SourceFile.cpp + + + + + + Dynamic Connector + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Task + Compiler driver + + Sheet.5 + + + + + + + Sheet.6 + + Sheet.7 + + + + Sheet.8 + + + + + Sheet.9 + + + + + + + + + + + Sheet.10 + + Sheet.11 + + + + Sheet.12 + + + + Sheet.13 + + + + + + + Sheet.14 + + + + + + + + + + + Sheet.15 + + Sheet.16 + + + + Sheet.17 + + + Sheet.18 + + + Sheet.19 + + Sheet.20 + + + + Sheet.21 + + + + Sheet.22 + + + Sheet.23 + + + Sheet.24 + + + Sheet.25 + + + Sheet.26 + + + + + + + + + Compiler driver + + + + + + Dynamic Connector.51 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Task.28 + SYCL device front-end compiler + + Sheet.29 + + + + + + + Sheet.30 + + Sheet.31 + + + + Sheet.32 + + + + + Sheet.33 + + + + + + + + + + + Sheet.34 + + Sheet.35 + + + + Sheet.36 + + + + Sheet.37 + + + + + + + Sheet.38 + + + + + + + + + + + Sheet.39 + + Sheet.40 + + + + Sheet.41 + + + Sheet.42 + + + Sheet.43 + + Sheet.44 + + + + Sheet.45 + + + + Sheet.46 + + + Sheet.47 + + + Sheet.48 + + + Sheet.49 + + + Sheet.50 + + + + + + + + + SYCL device front-end compiler + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Task.52 + C++ host compiler + + Sheet.53 + + + + + + + Sheet.54 + + Sheet.55 + + + + Sheet.56 + + + + + Sheet.57 + + + + + + + + + + + Sheet.58 + + Sheet.59 + + + + Sheet.60 + + + + Sheet.61 + + + + + + + Sheet.62 + + + + + + + + + + + Sheet.63 + + Sheet.64 + + + + Sheet.65 + + + Sheet.66 + + + Sheet.67 + + Sheet.68 + + + + Sheet.69 + + + + Sheet.70 + + + Sheet.71 + + + Sheet.72 + + + Sheet.73 + + + Sheet.74 + + + + + + + + + C++ host compiler + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.76 + LLVM IR + + Sheet.77 + + + + + + + Sheet.78 + + + + + LLVM IR + + + + + + Dynamic Connector.127 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Task.104 + Target specific LLVM compiler + + Sheet.105 + + + + + + + Sheet.106 + + + Sheet.108 + + + + + Sheet.109 + + + + + + + + Sheet.113 + + + + + + + Sheet.114 + + + + + + + + + + + Sheet.115 + + Sheet.116 + + + + Sheet.117 + + + Sheet.118 + + + + Sheet.122 + + + Sheet.123 + + + Sheet.124 + + + Sheet.125 + + + Sheet.126 + + + + + + + + + Target specific LLVM compiler + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.128 + Target binary + + Sheet.129 + + + + + + + Sheet.130 + + + + + Target binary + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.165 + Host object file + + Sheet.166 + + + + + + + Sheet.167 + + + + + Host object file + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Task.176 + Offload-wrapper + + Sheet.177 + + + + + + + + Sheet.181 + + + + + + + + Sheet.185 + + + + + + + Sheet.186 + + + + + + + + + + + Sheet.187 + + Sheet.188 + + + + Sheet.189 + + + Sheet.190 + + + + Sheet.194 + + + Sheet.195 + + + Sheet.196 + + + Sheet.197 + + + Sheet.198 + + + + + + + + + Offload-wrapper + + + + + + + + Dynamic Connector.205 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.206 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.228 + Integration header + + Sheet.229 + + + + + + + Sheet.230 + + + + + Integrationheader + + + + + + Dynamic Connector.231 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.232 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.76LLVM IRSheet.77Sheet.78LLVM IR + + +Data Object.76SPIRVSheet.77Sheet.78SPIRV + + + + Dynamic Connector.227 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Task.136llvm-spirvSheet.141Sheet.145Sheet.146Sheet.147Sheet.148Sheet.149Sheet.150Sheet.154Sheet.155Sheet.156Sheet.157Sheet.158Task.176Offload-wrapperSheet.181Sheet.185Sheet.186Sheet.187Sheet.188Sheet.189Sheet.190Sheet.194Sheet.195Sheet.196Sheet.197Sheet.198Offload-wrapper + + +Sheet.137llvm-spirv + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.205 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.127 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.165Host object fileSheet.166Sheet.167Wrapper object file + + + + + + + + + + + + + + + + + + + + Data Object.201 + Fat binary file + + Sheet.202 + + + + + Sheet.203 + + + Task.176Offload-wrapperSheet.181Sheet.185Sheet.186Sheet.187Sheet.188Sheet.189Sheet.190Sheet.194Sheet.195Sheet.196Sheet.197Sheet.198Sheet.177Linker + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.227 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data Object.165Host object fileSheet.166Sheet.167Fat binary file + + + + + + + + + + + + + + + + + + + + Data Object.201 + Fat binary file + + Sheet.202 + + + + + Sheet.203 + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + (another source compiled) + + + + + + + + + + + + + + + + + + + + Data Object.165 + Host object file + + Sheet.166 + + + + + + + Sheet.167 + + + + + Host object file + + + + + Dynamic Connector.131 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + (another source compiled) + Dynamic Connector.227 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +