diff --git a/openmp/docs/README.txt b/openmp/docs/README.txt
--- a/openmp/docs/README.txt
+++ b/openmp/docs/README.txt
@@ -12,9 +12,9 @@
 Sphinx <http://sphinx-doc.org/> and then do:
 
     cd <build-dir>
-    cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true <src-dir>
-    make
-    $BROWSER <build-dir>/projects/openmp/docs//html/index.html
+    cmake -DLLVM_ENABLE_SPHINX=true -DSPHINX_OUTPUT_HTML=true -DCMAKE_MODULE_PATH=/path/to/llvm/cmake/modules <src-dir>
+    make docs-openmp-html
+    $BROWSER <build-dir>/docs/html/index.html
 
 The mapping between reStructuredText files and generated documentation is
 `docs/Foo.rst` <-> `<build-dir>/projects/openmp/docs//html/Foo.html` <->
diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -376,7 +376,7 @@
     granularity down to group since that is the largest granularity allowed by the OS.
 
 KMP_HIDDEN_HELPER_AFFINITY (Windows, Linux)
-"""""""""""""""""""""""""""""
+"""""""""""""""""""""""""""""""""""""""""""
 
 Enables run-time library to bind hidden helper threads to physical processing units.
-This environment variable has the same syntax and semantics as ``KMP_AFFINIY`` but only
+This environment variable has the same syntax and semantics as ``KMP_AFFINITY`` but only
@@ -1058,6 +1058,7 @@
 The default behavior of LLVM 14 is to force atomic maps clauses, prior versions
 of LLVM did not.
 
+.. _libomptarget_jit_opt_level:
 
 LIBOMPTARGET_JIT_OPT_LEVEL
 """"""""""""""""""""""""""
@@ -1083,6 +1084,8 @@
 pipeline and backend are skipped and only target specific post-processing is
 performed on the object file before it is loaded onto the device.
 
+.. _libomptarget_jit_replacement_module:
+
 LIBOMPTARGET_JIT_REPLACEMENT_MODULE
 """""""""""""""""""""""""""""""""""
 
@@ -1096,6 +1099,7 @@
 :ref:`LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE` or
 :ref:`LIBOMPTARGET_JIT_POST_OPT_IR_MODULE` environment variables.
 
+.. _libomptarget_jit_pre_opt_ir_module:
 
 LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE
 """"""""""""""""""""""""""""""""""
@@ -1107,6 +1111,7 @@
 transformed and loaded back into the JIT pipeline via
 :ref:`LIBOMPTARGET_JIT_REPLACEMENT_MODULE`.
 
+.. _libomptarget_jit_post_opt_ir_module:
 
 LIBOMPTARGET_JIT_POST_OPT_IR_MODULE
 """""""""""""""""""""""""""""""""""
@@ -1126,7 +1131,7 @@
 combined kernel, e.g., `target teams distribute parallel for`, has insufficient
 parallelism. Especially if the trip count of the loops is lower than the number
-of threads possible times the number of teams (aka. blocks) the device preferes
-(see also :ref:`LIBOMPTARGET_AMDGPU_TEAMS_PER_CU), we will reduce the thread
+of threads possible times the number of teams (aka. blocks) the device prefers
+(see also :ref:`LIBOMPTARGET_AMDGPU_TEAMS_PER_CU`), we will reduce the thread
 count to increase outer (team/block) parallelism. The thread count will never
 be reduced below the value passed for this environment variable though.
 
@@ -1225,6 +1230,8 @@
 It is also the number of AQL packets that can be pushed into each queue without
-waiting the driver to process them. The default value is ``512``.
+waiting for the driver to process them. The default value is ``512``.
 
+.. _libomptarget_amdgpu_teams_per_cu:
+
 LIBOMPTARGET_AMDGPU_TEAMS_PER_CU
 """"""""""""""""""""""""""""""""