
Commit 3bdd600

committed Jul 26, 2018
[OPENMP] What's new for OpenMP in clang.
Updated ReleaseNotes + Status of the OpenMP support in clang. llvm-svn: 338049
1 parent 66d405d commit 3bdd600

2 files changed: +84 −7 lines
 

clang/docs/OpenMPSupport.rst

+69 −6
@@ -10,13 +10,15 @@
 .. role:: partial
 .. role:: good
 
+.. contents::
+   :local:
+
 ==================
 OpenMP Support
 ==================
 
-Clang fully supports OpenMP 3.1 + some elements of OpenMP 4.5. Clang supports offloading to X86_64, AArch64 and PPC64[LE] devices.
-Support for Cuda devices is not ready yet.
-The status of major OpenMP 4.5 features support in Clang.
+Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
+PPC64[LE] and has `basic support for Cuda devices`_.
 
 Standalone directives
 =====================
@@ -35,7 +37,7 @@ Standalone directives
 
 * #pragma omp target: :good:`Complete`.
 
-* #pragma omp declare target: :partial:`Partial`. No full codegen support.
+* #pragma omp declare target: :good:`Complete`.
 
 * #pragma omp teams: :good:`Complete`.
 
@@ -64,5 +66,66 @@ Combined directives
 
 * #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
 
-Clang does not support any constructs/updates from upcoming OpenMP 5.0 except for `reduction`-based clauses in the `task` and `target`-based directives.
-In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+Clang does not support any constructs/updates from upcoming OpenMP 5.0 except
+for `reduction`-based clauses in the `task` and `target`-based directives.
+
+In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
+
+.. _basic support for Cuda devices:
+
+Cuda devices support
+====================
+
+Directives execution modes
+--------------------------
+
+Clang code generation for target regions supports two modes: the SPMD and
+non-SPMD modes. Clang chooses one of these two modes automatically based on the
+way directives and clauses on those directives are used. The SPMD mode uses a
+simplified set of runtime functions, thus increasing performance at the cost of
+supporting some OpenMP features. The non-SPMD mode is the most generic mode and
+supports all currently available OpenMP features. The compiler will always
+attempt to use the SPMD mode wherever possible. SPMD mode will not be used if:
+
+- The target region contains an `if()` clause that refers to a `parallel`
+  directive.
+
+- The target region contains a `parallel` directive with a `num_threads()`
+  clause.
+
+- The target region contains user code (other than OpenMP-specific
+  directives) in between the `target` and the `parallel` directives.
+
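
To make these conditions concrete, here is a minimal illustrative sketch (editorial, not part of the committed text; the function name and the `num_threads(64)` value are arbitrary). The first target region should be eligible for SPMD mode under the rules above, while the second should not, because its inner `parallel` directive carries a `num_threads()` clause.

.. code-block:: c

  // Sketch only: illustrates the SPMD-mode selection rules listed above.
  int spmd_example(int *a, int n) {
    // Eligible for SPMD mode: a combined construct, no user code between
    // `target` and `parallel`, and no if()/num_threads() on the parallel part.
    #pragma omp target teams distribute parallel for map(tofrom: a[0:n])
    for (int i = 0; i < n; ++i)
      a[i] = i;

    // Not eligible for SPMD mode: the inner `parallel` directive carries a
    // num_threads() clause (one of the conditions listed above).
    int total = 0;
    #pragma omp target map(to: a[0:n]) map(tofrom: total)
    #pragma omp parallel for num_threads(64) reduction(+:total)
    for (int i = 0; i < n; ++i)
      total += a[i];

    return total;
  }
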
+Data-sharing modes
+------------------
+
+Clang supports two data-sharing models for Cuda devices: `Generic` and `Cuda`
+modes. The default mode is `Generic`. `Cuda` mode can give additional
+performance and can be activated using the `-fopenmp-cuda-mode` flag. In
+`Generic` mode, all local variables that can be shared in the parallel regions
+are stored in the global memory. In `Cuda` mode, local variables are not shared
+between the threads and it is the user's responsibility to share the required
+data between the threads in the parallel regions.
+
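
What counts as a "local variable that can be shared" may be easier to see in code. The sketch below (editorial and illustrative only; the function and variable names are made up) declares `scale` in the sequential part of a target region and reads it from every thread of the enclosed parallel region. Per the text above, in the default `Generic` mode such a variable is stored in global memory so the threads can see it, while under `-fopenmp-cuda-mode` it is not shared and arranging any required sharing is left to the programmer.

.. code-block:: c

  // Sketch only: `scale` is the kind of shareable target-region local that
  // the data-sharing modes described above govern.
  int scaled_sum(const int *v, int n, int factor) {
    int sum = 0;
    #pragma omp target map(to: v[0:n]) map(tofrom: sum)
    {
      int scale = factor + 1;       // set by the sequential part of the region
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < n; ++i)   // ...and read by every thread
        sum += scale * v[i];
    }
    return sum;
  }
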
+Features not supported or with limited support for Cuda devices
+---------------------------------------------------------------
+
+- Reductions across the teams are not supported yet.
+
+- Cancellation constructs are not supported.
+
+- Doacross loop nest is not supported.
+
+- User-defined reductions are supported only for trivial types.
+
+- Nested parallelism: inner parallel regions are executed sequentially.
+
+- Static linking of libraries containing device code is not supported yet.
+
+- Automatic translation of math functions in target regions to device-specific
+  math functions is not implemented yet.
+
+- Debug information for OpenMP target regions is not supported yet.

clang/docs/ReleaseNotes.rst

+15 −1
@@ -216,7 +216,21 @@ OpenCL C Language Changes in Clang
 OpenMP Support in Clang
 ----------------------------------
 
-- ...
+- Clang gained basic support for OpenMP 4.5 offloading for the NVPTX target.
+  To compile your program for the NVPTX target, use the following options:
+  `-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda` for 64-bit platforms or
+  `-fopenmp -fopenmp-targets=nvptx-nvidia-cuda` for 32-bit platforms.
+
+- Passing options to the OpenMP device offloading toolchain can be done with
+  the `-Xopenmp-target=<triple> -opt=val` flag. In this way, the `-opt=val`
+  option will be forwarded to the OpenMP device offloading toolchain described
+  by the triple. For example, the compute capability can be passed to the
+  OpenMP NVPTX offloading toolchain as follows:
+  `-Xopenmp-target=nvptx64-nvidia-cuda -march=sm_60`. When only one target
+  offload toolchain is specified under the `-fopenmp-targets=<triples>`
+  option, the triple can be skipped: `-Xopenmp-target -march=sm_60`.
+
+- Other bugfixes.
 
 CUDA Support in Clang
 ---------------------
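
For readers who want to try the options from the notes above, here is a minimal self-contained sketch (editorial; the file name `offload_probe.c` and the `sm_60` compute capability are illustrative assumptions, not requirements of the release notes):

.. code-block:: c

  /* offload_probe.c -- checks whether the target region actually ran on the
   * device. Example invocations (assuming a 64-bit host and a CUDA GPU):
   *
   *   clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda offload_probe.c
   *   clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
   *         -Xopenmp-target -march=sm_60 offload_probe.c
   */
  #include <omp.h>
  #include <stdio.h>

  int main(void) {
    int on_device = 0;

    /* Runs on the GPU when offloading is configured; otherwise the region
       falls back to the host and omp_is_initial_device() returns nonzero. */
    #pragma omp target map(tofrom: on_device)
    on_device = !omp_is_initial_device();

    printf("target region ran on the %s\n", on_device ? "device" : "host");
    return 0;
  }
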
