diff --git a/openmp/docs/README.txt b/openmp/docs/README.txt
--- a/openmp/docs/README.txt
+++ b/openmp/docs/README.txt
@@ -12,9 +12,9 @@
 Sphinx and then do:
 
     cd
-    cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true
-    make
-    $BROWSER /projects/openmp/docs//html/index.html
+    cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true -DCMAKE_MODULE_PATH=/path/to/llvm/cmake/modules
+    make docs-openmp-html
+    $BROWSER /docs/html/index.html
 
 The mapping between reStructuredText files and generated documentation is
 `docs/Foo.rst` <-> `/projects/openmp/docs//html/Foo.html` <->
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -376,7 +376,7 @@
 granularity down to group since that is the largest granularity allowed by the OS.
 
 KMP_HIDDEN_HELPER_AFFINITY (Windows, Linux)
-"""""""""""""""""""""""""""""
+"""""""""""""""""""""""""""""""""""""""""""
 
 Enables run-time library to bind hidden helper threads to physical processing units.
 This environment variable has the same syntax and semantics as ``KMP_AFFINIY`` but only
@@ -1058,6 +1058,7 @@
 The default behavior of LLVM 14 is to force atomic maps clauses, prior versions
 of LLVM did not.
 
+.. _libomptarget_jit_opt_level:
 
 LIBOMPTARGET_JIT_OPT_LEVEL
 """"""""""""""""""""""""""
@@ -1083,6 +1084,8 @@
 pipeline and backend are skipped and only target specific post-processing is
 performed on the object file before it is loaded onto the device.
 
+.. _libomptarget_jit_replacement_module:
+
 LIBOMPTARGET_JIT_REPLACEMENT_MODULE
 """""""""""""""""""""""""""""""""""
 
@@ -1096,6 +1099,7 @@
 :ref:`LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE` or
 :ref:`LIBOMPTARGET_JIT_POST_OPT_IR_MODULE` environment variables.
 
+.. _libomptarget_jit_pre_opt_ir_module:
 
 LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE
 """"""""""""""""""""""""""""""""""
@@ -1107,6 +1111,7 @@
 transformed and loaded back into the JIT pipeline via
 :ref:`LIBOMPTARGET_JIT_REPLACEMENT_MODULE`.
 
+.. _libomptarget_jit_post_opt_ir_module:
 
 LIBOMPTARGET_JIT_POST_OPT_IR_MODULE
 """""""""""""""""""""""""""""""""""
@@ -1126,7 +1131,7 @@
 combined kernel, e.g., `target teams distribute parallel for`, has insufficient
 parallelism. Especially if the trip count of the loops is lower than the number
 of threads possible times the number of teams (aka. blocks) the device preferes
-(see also :ref:`LIBOMPTARGET_AMDGPU_TEAMS_PER_CU), we will reduce the thread
+(see also :ref:`LIBOMPTARGET_AMDGPU_TEAMS_PER_CU`), we will reduce the thread
 count to increase outer (team/block) parallelism. The thread count will never be
 reduced below the value passed for this environment variable though.
 
@@ -1225,6 +1230,8 @@
 It is also the number of AQL packets that can be pushed into each queue without
 waiting the driver to process them. The default value is ``512``.
 
+.. _libomptarget_amdgpu_teams_per_cu:
+
 LIBOMPTARGET_AMDGPU_TEAMS_PER_CU
 """"""""""""""""""""""""""""""""