Index: clang/docs/OpenCLSupport.rst
===================================================================
--- clang/docs/OpenCLSupport.rst
+++ clang/docs/OpenCLSupport.rst
@@ -17,31 +17,82 @@
 OpenCL Support
 ==================
-Clang fully supports all OpenCL C versions from 1.1 to 2.0.
+Clang has complete support of OpenCL C versions from 1.0 to 2.0.
-Please refer to `Bugzilla
-`_
-for the most up to date bug reports.
+Clang also supports :ref:`the C++ for OpenCL kernel language `.
+There is ongoing work to support :ref:`OpenCL 3.0 `.
+
+For general issues and bugs with OpenCL in clang refer to `Bugzilla
+`_.
+
+.. _cxx_for_opencl_impl:
 C++ for OpenCL Implementation Status
 ====================================
-Bugzilla bugs for this functionality are typically prefixed
-with '[C++]'.
+Clang implements language version 1.0 published in `the official
+release of C++ for OpenCL Documentation
+`_.
-Differences to OpenCL C
------------------------
+Bugzilla bugs for this functionality are typically prefixed
+with '[C++4OpenCL]' - click `here
+`_
+to view the full bug list.
-TODO!
 Missing features or with limited support
 ---------------------------------------
-- Use of ObjC blocks is disabled.
-
-- Global destructor invocation is not generated correctly.
-
-- Initialization of objects in `__constant` address spaces is not guaranteed to work.
-
-- `addrspace_cast` operator is not supported.
+- Use of ObjC blocks is disabled and therefore the ``enqueue_kernel`` builtin
+  function is not supported currently. It is expected that if support for this
+  feature is added in the future, it will utilize C++ lambdas instead of ObjC
+  blocks.
+
+- IR generation for global destructors is incomplete (See:
+  `PR48047 `_).
+
+- There is no distinct file extension for sources that are to be compiled
+  in C++ for OpenCL mode (See: `PR48097 `_).
+
+.. _opencl_300:
+
+OpenCL 3.0 Implementation Status
+================================
+
+The following table provides an overview of features in OpenCL C 3.0 and their
+implementation status.
+
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Category                     | Feature                                                      | Status               | Reviews                                                                   |
++==============================+==============================================================+======================+===========================================================================+
+| Command line interface       | New value for `-cl-std` flag                                 | :good:`done`         | https://reviews.llvm.org/D88300                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Predefined macros            | New version macro                                            | :good:`done`         | https://reviews.llvm.org/D88300                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Predefined macros            | Feature macros                                               | :part:`worked on`    | https://reviews.llvm.org/D89869                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | Generic address space                                        | :none:`unclaimed`    |                                                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | Builtin function overloads with generic address space        | :part:`worked on`    | https://reviews.llvm.org/D92004                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | Program scope variables in global memory                     | :none:`unclaimed`    |                                                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | 3D image writes including builtin functions                  | :none:`unclaimed`    |                                                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | read_write images including builtin functions                | :none:`unclaimed`    |                                                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | C11 atomics memory scopes, ordering and builtin function     | :part:`worked on`    | https://reviews.llvm.org/D92004 (functions only)                          |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | Device-side kernel enqueue including builtin functions       | :none:`unclaimed`    |                                                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | Pipes including builtin functions                            | :part:`worked on`    | https://reviews.llvm.org/D92004 (functions only)                          |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| Feature optionality          | Work group collective functions                              | :part:`worked on`    | https://reviews.llvm.org/D92004                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| New functionality            | RGBA vector components                                       | :none:`unclaimed`    |                                                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| New functionality            | Subgroup functions                                           | :part:`worked on`    | https://reviews.llvm.org/D92004                                           |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
+| New functionality            | Atomic mem scopes: subgroup, all devices including functions | :part:`worked on`    | https://reviews.llvm.org/D92004 (functions only)                          |
++------------------------------+--------------------------------------------------------------+----------------------+---------------------------------------------------------------------------+
Index: clang/docs/UsersManual.rst
===================================================================
--- clang/docs/UsersManual.rst
+++ clang/docs/UsersManual.rst
@@ -41,7 +41,8 @@
 variants depending on base language.
 - :ref:`C++ Language `
 - :ref:`Objective C++ Language `
-- :ref:`OpenCL C Language `: v1.0, v1.1, v1.2, v2.0.
+- :ref:`OpenCL Kernel Language `: OpenCL C v1.0, v1.1, v1.2, v2.0,
+  plus C++ for OpenCL.
 In addition to these base languages and their dialects, Clang supports a broad variety of language extensions, which are documented in the
@@ -2796,8 +2797,8 @@
 ===============
 Clang can be used to compile OpenCL kernels for execution on a device
-(e.g. GPU). It is possible to compile the kernel into a binary (e.g. for AMD or
-Nvidia targets) that can be uploaded to run directly on a device (e.g. using
+(e.g. GPU). It is possible to compile the kernel into a binary (e.g. for AMDGPU)
+that can be uploaded to run directly on a device (e.g. using
 `clCreateProgramWithBinary `_) or into generic bitcode files loadable into other toolchains.
@@ -2824,13 +2825,26 @@
    $ clang -c -emit-llvm test.cl
-This will produce a generic test.bc file that can be used in vendor toolchains
+This will produce a file `test.bc` that can be used in vendor toolchains
 to perform machine code generation.
-Clang currently supports OpenCL C language standards up to v2.0. Starting from
-clang 9 a C++ mode is available for OpenCL (see
+Note that if compiled to bitcode for generic targets such as SPIR,
+portable IR is produced that can be used with various vendor
+tools as well as open source tools such as `SPIRV-LLVM Translator
+`_
+to produce a SPIR-V binary.
+
+
+Clang currently supports OpenCL C language standards up to v2.0. Clang mainly
+supports the full profile. There is only very limited support of the embedded
+profile.
+Starting from clang 9 a C++ mode is available for OpenCL (see
 :ref:`C++ for OpenCL `).
+Work on OpenCL v3.0 support is ongoing and is documented along with other
+experimental functionality and features in development on the
+:doc:`OpenCLSupport` page.
+
 OpenCL Specific Options
 -----------------------
@@ -2847,24 +2861,31 @@
 .. option:: -finclude-default-header
-Loads standard includes during compilations. By default OpenCL headers are not
-loaded and therefore standard library includes are not available. To load them
-automatically a flag has been added to the frontend (see also :ref:`the section
-on the OpenCL Header `):
+Adds most builtin types and function declarations during compilation. By
+default the OpenCL headers are not loaded and therefore certain builtin
+types and most builtin functions are not declared. To load them
+automatically this flag can be passed to the frontend (see also :ref:`the
+section on the OpenCL Header `):
 .. code-block:: console
    $ clang -Xclang -finclude-default-header test.cl
-Alternatively ``-include`` or ``-I`` followed by the path to the header location
-can be given manually.
+Note that this is a frontend-only flag and therefore it requires the use of
+flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.
+
+Alternatively the internal header `opencl-c.h` containing the declarations
+can be included manually using ``-include`` or ``-I`` followed by the path
+to the header location. The header can be found in the clang source tree or
+installation directory.
 .. code-block:: console
-   $ clang -I/lib/Headers/opencl-c.h test.cl
+   $ clang -I/lib/Headers/opencl-c.h test.cl
+   $ clang -I/lib/clang//include/opencl-c.h/opencl-c.h test.cl
-In this case the kernel code should contain ``#include `` just as a
-regular C include.
+In this example it is assumed that the kernel code contains
+``#include `` just as a regular C include.
 .. _opencl_cl_ext:
@@ -2874,10 +2895,14 @@
 of extensions that they support. Clang allows to amend this using the ``-cl-ext`` flag with a comma-separated list of extensions prefixed with ``'+'`` or ``'-'``. The syntax: ``-cl-ext=<(['-'|'+'][,])+>``, where extensions
-can be either one of `the OpenCL specification extensions
-`_
-or any known vendor extension. Alternatively, ``'all'`` can be used to enable
+can be either one of `the OpenCL published extensions
+`_
+or any vendor extension. Alternatively, ``'all'`` can be used to enable
 or disable all known extensions.
+
+Note that this is a frontend-only flag and therefore it requires the use of
+flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.
+
 Example disabling double support for the 64-bit SPIR target:
 .. code-block:: console
@@ -2896,7 +2921,7 @@
 Overrides the target address space map with a fake map. This allows adding explicit address space IDs to the bitcode for non-segmented
-memory architectures that don't have separate IDs for each of the OpenCL
+memory architectures that do not have separate IDs for each of the OpenCL
 logical address spaces by default. Passing ``-ffake-address-space-map`` will add/override address spaces of the target compiled for with the following values: ``1-global``, ``2-constant``, ``3-local``, ``4-generic``. The private address
@@ -2905,7 +2930,10 @@
 .. code-block:: console
-   $ clang -ffake-address-space-map test.cl
+   $ clang -cc1 -ffake-address-space-map test.cl
+
+Note that this is a frontend-only flag and therefore it requires the use of
+flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.
 Some other flags used for the compilation for C can also be passed while compiling for OpenCL, examples: ``-c``, ``-O<1-4|s>``, ``-o``, ``-emit-llvm``, etc.
@@ -2945,12 +2973,15 @@
   .. code-block:: console
-     $ clang -target spir-unknown-unknown test.cl
-     $ clang -target spir64-unknown-unknown test.cl
+     $ clang -cc1 -triple=spir test.cl
+     $ clang -cc1 -triple=spir64 test.cl
+
+  Note that this is a frontend-only target and therefore it requires the use of
+  flags that forward options to the frontend, e.g. ``-cc1`` or ``-Xclang``.
   All known OpenCL extensions are supported in the SPIR targets. Clang will generate SPIR v1.2 compatible IR for OpenCL versions up to 2.0 and SPIR v2.0
-  for OpenCL v2.0.
+  for OpenCL v2.0 or C++ for OpenCL.
 - x86 is used by some implementations that are x86 compatible and currently remains for backwards compatibility (with older implementations prior to
@@ -2972,7 +3003,8 @@
 By default Clang will not include standard headers and therefore OpenCL builtin functions and some types (i.e. vectors) are unknown. The default CL header is, however, provided in the Clang installation and can be enabled by passing the
-``-finclude-default-header`` flag to the Clang frontend.
+``-finclude-default-header`` flag (see :ref:`flags description `
+for more details).
 .. code-block:: console
@@ -2992,10 +3024,10 @@
 OpenCL Extensions
 -----------------
-All of the ``cl_khr_*`` extensions from `the official OpenCL specification
-`_
-up to and including version 2.0 are available and set per target depending on the
-support available in the specific architecture.
+Most of the ``cl_khr_*`` extensions to OpenCL C from `the official OpenCL
+registry `_ are available and
+configured per target depending on the support available in the specific
+architecture.
 It is possible to alter the default extensions setting per target using ``-cl-ext`` flag. (See :ref:`flags description ` for more details).
@@ -3022,7 +3054,10 @@
    void my_func(my_t);
    #pragma OPENCL EXTENSION my_ext : end
-Declaring the same types in different vendor extensions is disallowed.
+There is no conflict resolution for identifier clashes among extensions.
+It is therefore recommended that the identifiers are prefixed with a
+double underscore to avoid clashing with user space identifiers. Vendor
+extensions should use a reserved identifier prefix, e.g. amd, arm, intel.
 Clang also supports language extensions documented in `The OpenCL C Language Extensions Documentation
@@ -3203,13 +3238,14 @@
 `_ and there is no plan to support it in clang in any new releases in the near future.
-For detailed information about this language refer to `The C++ for OpenCL
-Programming Language Documentation
-`_.
-Since C++ features are to be used on top of OpenCL C functionality, all existing
-restrictions from OpenCL C v2.0 will inherently apply. All OpenCL C builtin types
-and function libraries are supported and can be used in this mode.
+Clang currently supports C++ for OpenCL v1.0.
+For detailed information about this language refer to the C++ for OpenCL
+Programming Language Documentation available
+in `the latest build
+`_
+or in `the official release
+`_.
 To enable the C++ for OpenCL mode, pass one of following command line options when compiling ``.cl`` file ``-cl-std=clc++``, ``-cl-std=CLC++``, ``-std=clc++`` or
@@ -3236,31 +3272,41 @@
 Constructing and destroying global objects
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Global objects must be constructed before the first kernel using the global objects
-is executed and destroyed just after the last kernel using the program objects is
-executed. In OpenCL v2.0 drivers there is no specific API for invoking global
-constructors. However, an easy workaround would be to enqueue a constructor
-initialization kernel that has a name ``_GLOBAL__sub_I_``.
-This kernel is only present if there are any global objects to be initialized in
-the compiled binary. One way to check this is by passing ``CL_PROGRAM_KERNEL_NAMES``
-to ``clGetProgramInfo`` (OpenCL v2.0 s5.8.7).
-
-Note that if multiple files are compiled and linked into libraries, multiple kernels
-that initialize global objects for multiple modules would have to be invoked.
-
-Applications are currently required to run initialization of global objects manually
-before running any kernels in which the objects are used.
+Global objects with non-trivial constructors require the constructors to be run
+before the first kernel using the global objects is executed. Similarly, global
+objects with non-trivial destructors require destructor invocation just after
+the last kernel using the program objects is executed.
+In OpenCL versions earlier than v2.2 there is no support for invoking global
+constructors. However, an easy workaround is to manually enqueue the
+constructor initialization kernel that has the following name scheme
+``_GLOBAL__sub_I_``.
+This kernel is only present if there are global objects with non-trivial
+constructors present in the compiled binary. One way to check this is by
+passing ``CL_PROGRAM_KERNEL_NAMES`` to ``clGetProgramInfo`` (OpenCL v2.0
+s5.8.7) and then checking whether any kernel name matches the naming scheme of
+the global constructor initialization kernel above.
+
+Note that if multiple files are compiled and linked into libraries, multiple
+kernels that initialize global objects for multiple modules would have to be
+invoked.
+
+Applications are currently required to run initialization of global objects
+manually before running any kernels in which the objects are used.
 .. code-block:: console
    clang -cl-std=clc++ test.cl
-If there are any global objects to be initialized, the final binary will contain
-the ``_GLOBAL__sub_I_test.cl`` kernel to be enqueued.
+If there are any global objects to be initialized, the final binary will
+contain the ``_GLOBAL__sub_I_test.cl`` kernel to be enqueued.
-Global destructors can not be invoked in OpenCL v2.0 drivers. However, all memory used
-for program scope objects is released on ``clReleaseProgram``.
+Note that the manual workaround only applies to objects declared at the
+program scope. There is no manual workaround for the construction of static
+objects with non-trivial constructors inside functions.
+Global destructors cannot be invoked manually in OpenCL v2.0 drivers.
+However, all memory used for program scope objects should be released on
+``clReleaseProgram``.
 .. _target_features:
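
Reviewer note on the global constructor workaround in the last hunk: the check against ``CL_PROGRAM_KERNEL_NAMES`` could look roughly like the sketch below. It is not part of the patch and not clang or driver code. It assumes the host application has already obtained the semicolon-separated kernel name string from ``clGetProgramInfo``; the helper name ``global_ctor_kernels`` and the sample string are hypothetical.

```python
# Minimal sketch: pick out global constructor initialization kernels from the
# string returned for CL_PROGRAM_KERNEL_NAMES (a semicolon-separated list).
# The helper name and the sample input are illustrative assumptions.

GLOBAL_CTOR_PREFIX = "_GLOBAL__sub_I_"


def global_ctor_kernels(kernel_names):
    """Return the kernel names that follow the _GLOBAL__sub_I_ naming scheme."""
    names = [name.strip() for name in kernel_names.split(";")]
    return [name for name in names if name.startswith(GLOBAL_CTOR_PREFIX)]


if __name__ == "__main__":
    # Made-up example of what a program built from test.cl might report.
    sample = "my_kernel;_GLOBAL__sub_I_test.cl;other_kernel"
    for name in global_ctor_kernels(sample):
        print(name)
```

Each returned name would then be enqueued once before any kernel that uses the corresponding global objects. With several linked modules the list may contain more than one entry.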