diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -13,11 +13,34 @@ ============ This document contains the release notes for the OpenMP runtime, release 12.0.0. -Here we describe the status of openmp, including major improvements -from the previous release. All openmp releases may be downloaded +Here we describe the status of OpenMP, including major improvements +from the previous release. All OpenMP releases may be downloaded from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= +- Extended the ``libomptarget`` API functions to include source location + information and OpenMP target mapper support. This allows ``libomptarget`` to + know the source location of the OpenMP region it is executing, as well as the + name and declarations of all the variables used inside the region. Each + function generated now uses its ``mapper`` variant. The old API calls now call + into the new API functions with ``nullptr`` arguments for backwards + compatibility with old binaries. Source location information for + ``libomptarget`` is now generated by Clang at any level of debugging + information. +- Added improved error messages for ``libomptarget`` and ``CUDA`` plugins. Error + messages are now presented without requiring a debug build of + ``libomptarget``. The newly added source location information can also be used + to identify which OpenMP target region the failure occurred in. More + information can be found :ref:`here `. + +- Added additional environment variables to control output from the + ``libomptarget`` runtime library. ``LIBOMPTARGET_PROFILE`` to + generate time profile output similar to Clang's ``-ftime-trace`` option. + ``LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`` sets the threshold size for which + the ``libomptarget`` memory manager will handle the allocation. + ``LIBOMPTARGET_INFO`` allows the user to request certain information from the + ``libomptarget`` runtime using a 32-bit field. A full description of each + environment variable is described :ref:`here `. diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -16,6 +16,8 @@ LLVM/OpenMP Target Host Runtime (``libomptarget``) -------------------------------------------------- +.. _libopenmptarget_environment_vars: + Environment Variables ^^^^^^^^^^^^^^^^^^^^^ @@ -171,6 +173,95 @@ kernel region ends even though it isn't written to. Finally, at the end of the OpenMP data region the entries for ``X`` and ``Y`` are removed from the table. +.. _libopenmptarget_errors: + +Errors: +^^^^^^^ + +``libomptarget`` provides error messages when the program fails inside the +OpenMP target region. Common causes of failure could be an invalid pointer +access, running out of device memory, or trying to offload when the device is +busy. If the application was built with debugging symbols the error messages +will additionally provide the source location of the OpenMP target region. + +For example, consider the following code that implements a simple parallel +reduction on the GPU. This code has a bug that causes it to fail in the +offloading region. + +.. code-block:: c++ + + #include + + double sum(double *A, std::size_t N) { + double sum = 0.0; + #pragma omp target teams distribute parallel for reduction(+:sum) + for (int i = 0; i < N; ++i) + sum += A[i]; + + return sum; + } + + int main() { + const int N = 1024; + double A[N]; + sum(A, N); + } + +If this code is compiled and run, there will be an error message indicating what is +going wrong. + +.. code-block:: console + + $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o sum + $ ./sum + +.. code-block:: text + + CUDA error: Error when copying data from device to host. + CUDA error: an illegal memory access was encountered + Libomptarget error: Copying data from device failed. + Libomptarget error: Call to targetDataEnd failed, abort target. + Libomptarget error: Failed to process data after launching the kernel. + Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings. + sum.cpp:5:1: Libomptarget error 1: failure of target construct while offloading is mandatory + +This shows that there is an illegal memory access occuring inside the OpenMP +target region once execution has moved to the CUDA device, suggesting a +segmentation fault. This then causes a chain reaction of failures in +``libomptarget``. Another message suggests using the ``LIBOMPTARGET_INFO`` +environment variable as described in :ref:`libopenmptarget_environment_vars`. If +we do this it will print the sate of the host-target pointer mappings at the +time of failure. + +.. code-block:: console + + $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o sum + $ env LIBOMPTARGET_INFO=4 ./sum + +.. code-block:: text + + info: OpenMP Host-Device pointer mappings after block at sum.cpp:5:1: + info: Host Ptr Target Ptr Size (B) RefCount Declaration + info: 0x00007ffc058280f8 0x00007f4186600000 8 1 sum at sum.cpp:4:10 + +This tells us that the only data mapped between the host and the device is the +``sum`` variable that will be copied back from the device once the reduction has +ended. There is no entry mapping the host array ``A`` to the device. In this +situation, the compiler cannot determine the size of the array at compile time +so it will simply assume that the pointer is mapped on the device already by +default. The solution is to add an explicit map clause in the target region. + +.. code-block:: c++ + + double sum(double *A, std::size_t N) { + double sum = 0.0; + #pragma omp target teams distribute parallel for reduction(+:sum) map(to:A[0 : N]) + for (int i = 0; i < N; ++i) + sum += A[i]; + + return sum; + } + .. toctree:: :hidden: :maxdepth: 1