diff --git a/openmp/docs/README.txt b/openmp/docs/README.txt
--- a/openmp/docs/README.txt
+++ b/openmp/docs/README.txt
@@ -12,9 +12,9 @@
Sphinx and then do:
cd
- cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true
- make
- $BROWSER /projects/openmp/docs//html/index.html
+ cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true -DCMAKE_MODULE_PATH=/path/to/llvm/cmake/modules
+ make docs-openmp-html
+ $BROWSER /docs/html/index.html
The mapping between reStructuredText files and generated documentation is
`docs/Foo.rst` <-> `/projects/openmp/docs//html/Foo.html` <->
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -376,7 +376,7 @@
granularity down to group since that is the largest granularity allowed by the OS.
KMP_HIDDEN_HELPER_AFFINITY (Windows, Linux)
-"""""""""""""""""""""""""""""
+"""""""""""""""""""""""""""""""""""""""""""
Enables run-time library to bind hidden helper threads to physical processing units.
-This environment variable has the same syntax and semantics as ``KMP_AFFINIY`` but only
+This environment variable has the same syntax and semantics as ``KMP_AFFINITY`` but only
@@ -1058,6 +1058,7 @@
The default behavior of LLVM 14 is to force atomic maps clauses, prior versions
of LLVM did not.
+.. _libomptarget_jit_opt_level:
LIBOMPTARGET_JIT_OPT_LEVEL
""""""""""""""""""""""""""
@@ -1083,6 +1084,8 @@
pipeline and backend are skipped and only target specific post-processing is
performed on the object file before it is loaded onto the device.
+.. _libomptarget_jit_replacement_module:
+
LIBOMPTARGET_JIT_REPLACEMENT_MODULE
"""""""""""""""""""""""""""""""""""
@@ -1096,6 +1099,7 @@
:ref:`LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE` or
:ref:`LIBOMPTARGET_JIT_POST_OPT_IR_MODULE` environment variables.
+.. _libomptarget_jit_pre_opt_ir_module:
LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE
""""""""""""""""""""""""""""""""""
@@ -1107,6 +1111,7 @@
transformed and loaded back into the JIT pipeline via
:ref:`LIBOMPTARGET_JIT_REPLACEMENT_MODULE`.
+.. _libomptarget_jit_post_opt_ir_module:
LIBOMPTARGET_JIT_POST_OPT_IR_MODULE
"""""""""""""""""""""""""""""""""""
@@ -1126,7 +1131,7 @@
combined kernel, e.g., `target teams distribute parallel for`, has insufficient
parallelism. Especially if the trip count of the loops is lower than the number
-of threads possible times the number of teams (aka. blocks) the device preferes
-(see also :ref:`LIBOMPTARGET_AMDGPU_TEAMS_PER_CU), we will reduce the thread
+of threads possible times the number of teams (aka. blocks) the device prefers
+(see also :ref:`LIBOMPTARGET_AMDGPU_TEAMS_PER_CU`), we will reduce the thread
count to increase outer (team/block) parallelism. The thread count will never
be reduced below the value passed for this environment variable though.
@@ -1225,6 +1230,8 @@
It is also the number of AQL packets that can be pushed into each queue without
-waiting the driver to process them. The default value is ``512``.
+waiting for the driver to process them. The default value is ``512``.
+.. _libomptarget_amdgpu_teams_per_cu:
+
LIBOMPTARGET_AMDGPU_TEAMS_PER_CU
""""""""""""""""""""""""""""""""