diff --git a/llvm/docs/ConvergentOperations.rst b/llvm/docs/ConvergentOperations.rst
new file mode 100644
--- /dev/null
+++ b/llvm/docs/ConvergentOperations.rst
@@ -0,0 +1,625 @@
+==============================
+Convergent Operation Semantics
+==============================
+
+.. contents::
+   :local:
+   :depth: 4
+
+Overview
+========
+
+Some parallel execution environments execute threads in groups that allow
+efficient communication within each group. Notably this is the case for
+whole-program vectorization environments such as GPUs, where threads are mapped
+to lanes of a SIMD vector. Efficient communication among threads is possible in
+this case by simply exchanging data between the lanes of a vector. However, the
+semantics defined in this document are independent of such implementation
+details.
+
+When control flow diverges, i.e. threads of the same group follow different
+paths through the CFG, not all threads of the group may be available to
+participate in this communication. This is the defining characteristic that
+distinguishes convergent operations from other inter-thread communication and
+which requires the use of the ``convergent`` attribute to indicate
+*additional constraints* on program transforms:
+
+A convergent operation involves inter-thread communication outside of the
+memory model, where the set of threads which participate in communication is
+implicitly defined or at least affected by control flow.
+
+For example, in the following GPU compute kernel, communication during the
+convergent operation is expected to occur precisely among those threads of an
+implementation-defined execution scope (such as workgroup or subgroup) for
+which ``condition`` is true:
+
+.. code-block:: c++
+
+  void example_kernel() {
+      ...
+      if (condition)
+          convergent_operation();
+      ...
+  }
+
+In structured programming languages, there is often an intuitive and
+unambiguous way of determining the threads that are expected to communicate.
+However, this is not always the case even in structured programming languages,
+and the intuition breaks down entirely in unstructured control flow. This
+document describes the formal semantics in LLVM, i.e. how to determine the set
+of communicating threads for convergent operations.
+
+The definitions in this document leave many details open, such as how groups of
+threads are formed in the first place. It focuses on the questions that are
+relevant for deciding the correctness of generic program transforms and
+convergence-related analyses such as divergence analysis.
+
+Note: It is common among practitioners to think about convergent operations in
+terms of *divergence* and *reconvergence*: sets of threads split at branch
+instructions if threads follow different paths through the control flow graph
+(divergence) and may later merge when they reach the same static point in the
+program (reconvergence). This operational point of view is often convenient for
+backend implementations and it can sometimes be useful for guiding intuition.
+However, the semantics defined here are declarative, since experience has shown
+that the operational view is too restrictive in practice for general compiler
+transforms. It is up to each implementation to operationalize those declarative
+semantics in a way that makes sense for the underlying hardware, which varies
+wildly.
+
+
+Motivating Examples of Convergent Operations
+============================================
+
+(This section is informative.)
+
+The following stylized pixel shader samples a texture at a given set of
+coordinates. Texture sampling requires screen-space derivatives of the
+coordinates to determine the level of detail (mipmap level) of the sample.
+They are commonly approximated by taking the difference between neighboring
+pixels, which are computed by different threads in the same group:
+
+.. code-block:: c++
+
+  void example_shader() {
+      ...
+      color = textureSample(texture, coordinates);
+      if (condition) {
+          use(color);
+      }
+      ...
+  }
+
+From a purely single-threaded perspective, sinking the ``textureSample`` into
+the if-statement appears legal. However, if the condition is false for some
+neighboring pixels, then their corresponding threads will not execute together
+in the group, making it impossible to take the difference of coordinates as an
+approximation of the screen-space derivative. In practice, the outcome will be
+an undefined value.
+
+That is, the ``textureSample`` operation fits our definition of a convergent
+operation:
+
+ 1. It communicates with a set of threads that implicitly depends on control
+    flow.
+
+ 2. Correctness depends on this set of threads.
+
+The following example shows that merging common code of branches can be
+incorrect in the face of convergent operations:
+
+.. code-block:: c++
+
+  void example_kernel() {
+      delta = ...
+      if (delta > 0) {
+          total_gains = subgroupAdd(delta);
+          ...
+      } else {
+          total_losses = subgroupAdd(delta);
+          ...
+      }
+  }
+
+The ``subgroupAdd`` computing the ``total_gains`` will be executed by the
+subset of threads with positive ``delta`` in a subgroup (wave), and so will sum
+up all the ``delta`` values of those threads; and similarly for the
+``subgroupAdd`` that computes the ``total_losses``.
+
+If we were to hoist and merge the ``subgroupAdd`` above the if-statement, it
+would sum up the ``delta`` across *all* threads instead.
+
+Finally, consider an example of how jump threading removes structure in a way
+that can make semantics non-obvious:
+
+.. code-block:: llvm
+
+  void example_original() {
+  entry:
+      ...
+      br i1 %cond1, label %then1, label %mid
+
+  then1:
+      ...
+      %cond2 = ...
+      br label %mid
+
+  mid:
+      %flag = phi i1 [ true, %entry ], [ %cond2, %then1 ]
+      br i1 %flag, label %then2, label %end
+
+  then2:
+      ...
+      call void @subgroupControlBarrier()
+      ...
+      br label %end
+
+  end:
+  }
+
+  void example_jumpthreaded() {
+  entry:
+      br i1 %cond1, label %then1, label %then2
+
+  then1:
+      ...
+      %cond2 = ...
+      br i1 %cond2, label %then2, label %end
+
+  then2:
+      ...
+      call void @subgroupControlBarrier()
+      ...
+      br label %end
+
+  end:
+  }
+
+Is the control barrier guaranteed to synchronize among the same set of threads
+in both cases? Different implementations in the literature may give different
+answers to this question:
+
+* In an implementation that reconverges at post-dominators, threads reconverge
+  at ``mid`` in the first version, so that all threads (within a subgroup/wave)
+  that execute the control barrier do so together. In the second version,
+  threads that reach the control barrier via different paths synchronize
+  separately.
+
+* An implementation that sorts basic blocks topologically and ensures maximal
+  reconvergence for each basic block would behave the same way in both
+  versions.
+
+
+Dynamic Instances and Convergence Tokens
+========================================
+
+Every execution of an LLVM IR instruction occurs in a *dynamic instance* of
+the instruction. Dynamic instances are the formal objects by which we talk
+about communicating threads in convergent operations. They satisfy:
+
+1. Different executions of the same static instruction by a single thread
+   give rise to different dynamic instances of that instruction.
+
+2. Executions of different static instructions always occur in different
+   dynamic instances.
+
+3. Executions of the same static instruction by different threads may occur in
+   the same dynamic instance.
+
+4. When executing a convergent operation, the set of threads that execute the
+   same dynamic instance is the set of threads that communicate with each other
+   for that operation.
+
+*Convergence tokens* are values of ``token`` type, i.e. they cannot be used in
+``phi`` or ``select`` instructions. A convergence token value represents the
+dynamic instance of the instruction that produced it.
+
+Convergent operations typically have a ``convergencectrl`` operand bundle with
+a convergence token operand to define the set of communicating threads relative
+to some anchor. The details are described in the
+:ref:`Formal Rules <convergence_formal_rules>` section.
+
+The convergence control intrinsics described in this document and convergent
+operations that have a ``convergencectrl`` operand bundle are considered
+*controlled* convergent operations. Other convergent operations are
+*uncontrolled*.
+
+The use of uncontrolled convergent operations is deprecated.
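+
+As a minimal sketch of a controlled convergent operation, assume a hypothetical
+convergent function ``@subgroup.op`` (the name is illustrative and not part of
+LLVM). The call refers to the token produced by an anchor through its
+``convergencectrl`` operand bundle:
+
+.. code-block:: llvm
+
+  declare token @llvm.experimental.convergence.anchor()
+  ; Hypothetical convergent operation, used for illustration only.
+  declare i32 @subgroup.op(i32) convergent
+
+  define i32 @example(i32 %x) {
+  entry:
+    ; %tok represents the dynamic instance of the anchor call.
+    %tok = call token @llvm.experimental.convergence.anchor()
+    ; Threads that obtained %tok from the same dynamic instance of the anchor
+    ; execute the same dynamic instance of this call and communicate.
+    %r = call i32 @subgroup.op(i32 %x) [ "convergencectrl"(token %tok) ]
+    ret i32 %r
+  }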
+
+
+Convergence Control Intrinsics
+==============================
+
+This section describes target-independent intrinsics that can be used to
+produce convergence tokens.
+
+.. _llvm.experimental.convergence.anchor:
+
+``llvm.experimental.convergence.anchor``
+----------------------------------------
+
+.. code-block:: llvm
+
+  token @llvm.experimental.convergence.anchor() convergent readnone
+
+This intrinsic is a marker that acts as an "anchor" producing an initial
+convergence token. The set of threads executing the same dynamic instance of
+this intrinsic is implementation-defined.
+
+The expectation is that all threads within a group that "happen to be active at
+the same time" will execute the same dynamic instance, so that programs can
+detect the maximal set of threads that can communicate efficiently within
+some local region of the program.
+
+
+.. _llvm.experimental.convergence.loop:
+
+``llvm.experimental.convergence.loop``
+--------------------------------------
+
+.. code-block:: llvm
+
+  token @llvm.experimental.convergence.loop() [ "convergencectrl"(token) ] convergent readnone
+
+This intrinsic defines the "heart" of a loop, i.e. the place where an imaginary
+loop counter is incremented for the purpose of determining convergence
+semantics.
+
+The convergence control token operand is usually defined outside of the loop,
+but this is not a requirement for the validity of a program (the resulting
+behavior is quite different, though).
+
+The resulting convergence token *can* be used outside of the loop; see the
+:ref:`Formal Rules <convergence_formal_rules>` section for details.
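+
+A minimal sketch of the typical usage, in which the token operand is defined
+by an anchor outside of the loop and the heart sits in the loop header.
+``@subgroup.op`` is a hypothetical convergent operation used only for
+illustration:
+
+.. code-block:: llvm
+
+  declare token @llvm.experimental.convergence.anchor()
+  declare token @llvm.experimental.convergence.loop()
+  ; Hypothetical convergent operation, used for illustration only.
+  declare void @subgroup.op() convergent
+
+  define void @example(i32 %n) {
+  entry:
+    %outer = call token @llvm.experimental.convergence.anchor()
+    br label %loop
+
+  loop:
+    %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
+    ; The heart: threads that obtained %outer from the same dynamic instance
+    ; and execute this call for the n'th time share a dynamic instance.
+    %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
+    call void @subgroup.op() [ "convergencectrl"(token %inner) ]
+    %i.next = add i32 %i, 1
+    %again = icmp ult i32 %i.next, %n
+    br i1 %again, label %loop, label %exit
+
+  exit:
+    ret void
+  }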
+
+
+.. _llvm.experimental.convergence.entry:
+
+``llvm.experimental.convergence.entry``
+----------------------------------------
+
+.. code-block:: llvm
+
+  token @llvm.experimental.convergence.entry() convergent readnone
+
+This intrinsic returns the convergence token that was used in the
+``convergencectrl`` operand bundle when the current function was called.
+
+Behavior is undefined if the containing function was called from IR without
+such a bundle.
+
+The expectation is that for program "main" functions, such as kernel entry
+functions, whose caller is not visible to LLVM, the implementation returns a
+convergence token that represents uniform control flow, i.e. that is guaranteed
+to refer to all threads within a (target- or environment-dependent) group.
+
+Behavior is undefined if this intrinsic appears in a function that isn't
+``convergent``.
+
+Behavior is undefined if this intrinsic appears inside of another convergence
+region or outside of a function's entry block.
+
+Function inlining substitutes this intrinsic with the token from the operand
+bundle. For example:
+
+.. code-block:: c++
+
+  // Before inlining:
+
+  void callee() convergent {
+    %tok = call token @llvm.experimental.convergence.entry()
+    convergent_operation(...) [ "convergencectrl"(token %tok) ]
+  }
+
+  void main() {
+    %outer = call token @llvm.experimental.convergence.anchor()
+    for (...) {
+      %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
+      callee() [ "convergencectrl"(token %inner) ]
+    }
+  }
+
+  // After inlining:
+
+  void main() {
+    %outer = call token @llvm.experimental.convergence.anchor()
+    for (...) {
+      %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
+      convergent_operation(...) [ "convergencectrl"(token %inner) ]
+    }
+  }
+
+
+.. _convergence_formal_rules:
+
+Formal Rules
+============
+
+Rules on the execution of dynamic instances:
+
+1. Let U be a static controlled convergent operation other than
+   :ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
+   whose convergence token is produced by an instruction D. Two threads
+   executing U execute the same dynamic instance of U if and only if they
+   obtained the token value from the same dynamic instance of D.
+
+2. Two threads executing the same static call U of
+   :ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
+   execute the same dynamic instance of U if and only if
+   (1) they obtained the ``convergencectrl`` token operand value from the same
+   dynamic instance of the defining instruction, and
+   (2) there is an *n* such that both threads execute U for the *n*'th time
+   with that same token operand value.
+
+3. If two threads execute the same static call U of
+   :ref:`llvm.experimental.convergence.entry <llvm.experimental.convergence.entry>`,
+   and at least one of them executes the function F containing U because it was
+   called by a ``call``, ``invoke``, or ``callbr`` instruction, then they
+   execute the same dynamic instance of U if and only if both threads execute F
+   because it was called by the same dynamic instance of a ``call``, ``invoke``,
+   or ``callbr`` instruction.
+
+   (Informational note: If the function is executed for some reason outside of
+   the scope of LLVM IR, e.g. because it is a kernel entry function, then this
+   rule does not apply. On the other hand, if a thread executes the function
+   due to a call from IR, then the thread cannot "spontaneously converge" with
+   threads that execute the function for some other reason.)
+
+4. Target-specific rules determine whether two threads execute the same
+   dynamic instance of an uncontrolled convergent operation.
+
+Static rules that must be satisfied by valid programs:
+
+1. Every cycle in the CFG that contains a use of a convergence token other
+   than a use by
+   :ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
+   must also contain the definition of the token.
+
+2. Every cycle in the CFG that contains two or more static uses of a
+   convergence token by
+   :ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
+   must also contain the definition of the token.
+
+3. The *convergence region* corresponding to a convergence token T is the
+   region in which T is live (i.e., the subset of the dominance region of the
+   definition of T from which a use of T can be reached without leaving the
+   dominance region).
+
+4. If a convergence region contains a use of a convergence token, then it must
+   also contain its definition.
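+
+As an illustration of the first static rule, the following fragment is *not*
+part of a valid program: the cycle through ``loop`` contains a use of ``%tok``
+that is not a call to the loop intrinsic, but it does not contain the
+definition of ``%tok``. ``@subgroup.op`` is a hypothetical convergent
+operation, and ``%again`` stands for some loop condition:
+
+.. code-block:: llvm
+
+  entry:
+    %tok = call token @llvm.experimental.convergence.anchor()
+    br label %loop
+
+  loop:
+    ; Invalid: %tok is used in the cycle but defined outside of it.
+    call void @subgroup.op() [ "convergencectrl"(token %tok) ]
+    br i1 %again, label %loop, label %exit
+
+The fragment would satisfy the rule if the anchor were moved into ``loop``, or
+if the convergent operation instead used a token produced inside ``loop`` by
+the loop intrinsic with ``%tok`` as its operand.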
+
+The freedom of targets to define the target-specific rule about uncontrolled
+convergent operations is limited by the following rule: A transform is correct
+for uncontrolled convergent operations if it does not make such operations
+control-dependent on additional values.
+
+
+Memory Model Non-Interaction
+============================
+
+The fact that an operation is convergent has no effect on how it is treated for
+memory model purposes. In particular, an operation that is ``convergent`` and
+``readnone`` does not introduce additional ordering constraints as far as the
+memory model is concerned. There is no implied barrier, neither in the memory
+barrier sense nor in the control barrier sense of synchronizing the execution
+of threads.
+
+Threads that execute the same dynamic instance do not necessarily do so at the
+same time.
+
+
+Other Interactions
+==================
+
+``convergent`` vs. ``speculatable``. A function can be both ``convergent`` and
+``speculatable``, indicating that the function does not have undefined
+behavior and has no effects besides calculating its result, but is still
+affected by the set of threads executing this function. This typically
+prevents speculation of calls to the function unless the constraint imposed
+by ``convergent`` is further relaxed by some other means.
+
+
+Examples for the Correctness of Program Transforms
+==================================================
+
+(This section is informative.)
+
+As implied by the rules in the previous sections, program transforms are correct
+with respect to convergent operations if they preserve or refine their
+semantics. This means that the set of communicating threads in the transformed
+program must have been possible in the original program.
+
+Program transforms with a single-threaded focus are generally conservatively
+correct if they do not sink or hoist convergent operations across a branch.
+This applies even to program transforms that change the control flow graph.
+
+For example, unrolling a loop that does not contain convergent operations
+cannot break any of the guarantees required for convergent operations outside
+of the loop.
+
+An arbitrary loop that contains convergent operations *can* be unrolled if
+all convergent operations refer back to an anchor inside the loop.
+For example (in pseudo-code):
+
+.. code-block:: llvm
+
+  while (counter > 0) {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+    counter--;
+  }
+
+This can be unrolled to:
+
+.. code-block:: llvm
+
+  while (counter >= 2) {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+    counter -= 2;
+  }
+  while (counter > 0) {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+    counter--;
+  }
+
+This is likely to change the behavior of the convergent operation if there
+are threads whose initial counter value is not a multiple of 2. That is allowed
+because the anchor intrinsic has implementation-defined convergence behavior
+and the loop unrolling transform is considered to be part of the
+implementation.
+
+If the loop contains uncontrolled convergent operations, this unrolling is
+forbidden.
+
+Unrolling a loop with convergent operations that refer to tokens produced
+outside the loop is also forbidden in the general case. Consider:
+
+.. code-block:: llvm
+
+  %outer = call token @llvm.experimental.convergence.anchor()
+  while (counter > 0) {
+    %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
+    call void @convergent.operation() [ "convergencectrl"(token %inner) ]
+    counter--;
+  }
+
+Threads whose counter is not a multiple of the unroll count will not communicate
+with the expected set of threads during the remainder loop. On the other hand,
+if the loop counter is known to be a multiple, then unrolling is allowed,
+though care must be taken to correct the use of the loop intrinsic.
+For example, unrolling by 2:
+
+.. code-block:: llvm
+
+  %outer = call token @llvm.experimental.convergence.anchor()
+  while (counter > 0) {
+    %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
+    call void @convergent.operation() [ "convergencectrl"(token %inner) ]
+    call void @convergent.operation() [ "convergencectrl"(token %inner) ]
+    counter -= 2;
+  }
+
+Loops in which a
+:ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
+intrinsic outside of the loop header uses a token defined outside of the loop
+can generally not be unrolled.
+
+Even though
+:ref:`llvm.experimental.convergence.anchor <llvm.experimental.convergence.anchor>`
+is marked as ``convergent``, it can be sunk in some cases. For example, in
+pseudo-code:
+
+.. code-block:: llvm
+
+  %tok = call token @llvm.experimental.convergence.anchor()
+  if (condition) {
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+  }
+
+Assuming that ``%tok`` is only used inside the conditional block, the anchor can
+be sunk. Again, the rationale is that the anchor has implementation-defined
+behavior, and the sinking is part of the implementation.
+
+Anchors can be hoisted in acyclic control flow. For example:
+
+.. code-block:: llvm
+
+  if (condition) {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+  } else {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+  }
+
+The anchors can be hoisted, resulting in:
+
+.. code-block:: llvm
+
+  %tok = call token @llvm.experimental.convergence.anchor()
+  if (condition) {
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+  } else {
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+  }
+
+The behavior is unchanged, since each of the static convergent operations only
+ever communicates with threads that have the same ``condition`` value.
+By contrast, hoisting the convergent operations themselves is forbidden.
+
+Hoisting and sinking anchors out of and into loops is forbidden. For example:
+
+.. code-block:: llvm
+
+  for (;;) {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    call void @convergent.operation() [ "convergencectrl"(token %tok) ]
+  }
+
+Hoisting the anchor would make the program invalid according to the static
+validity rules. Conversely:
+
+.. code-block:: llvm
+
+  %outer = call token @llvm.experimental.convergence.anchor()
+  while (counter > 0) {
+    %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
+    call void @convergent.operation() [ "convergencectrl"(token %inner) ]
+    counter--;
+  }
+
+The program would stay valid if the anchor was sunk into the loop, but its
+behavior could end up being different. If the anchor is inside the loop, then
+the grouping of threads during the execution of the anchor -- i.e., the sets of
+threads executing the same dynamic instance of it -- can change in an arbitrary,
+implementation-defined way in each iteration.
+
+Convergent operations can be sunk together with their anchor. Again in
+pseudo-code:
+
+.. code-block:: llvm
+
+  %tok = call token @llvm.experimental.convergence.anchor()
+  %a = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
+  %b = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
+  if (condition) {
+    use(%a, %b)
+  }
+
+Assuming that ``%tok``, ``%a``, and ``%b`` are only used inside the conditional
+block, all can be sunk together:
+
+.. code-block:: llvm
+
+  if (condition) {
+    %tok = call token @llvm.experimental.convergence.anchor()
+    %a = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
+    %b = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
+    use(%a, %b)
+  }
+
+The rationale is that the anchor intrinsic has implementation-defined behavior,
+and the sinking transform is considered to be part of the implementation.
+
+However, sinking *only* the convergent operation producing ``%b`` would be
+incorrect. That would allow the (remainder of the) implementation to include
+threads for which ``condition`` is false to participate in the same dynamic
+instance of the anchor and therefore in the calculation of ``%a``, and so the
+set of threads communicating for the calculations of ``%a`` and ``%b`` could be
+different, which the original program doesn't allow.
+
+Note that the entry intrinsic behaves differently. Sinking the convergent
+operations is forbidden in the following snippet:
+
+.. code-block:: llvm
+
+  %tok = call token @llvm.experimental.convergence.entry()
+  %a = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
+  %b = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
+  if (condition) {
+    use(%a, %b)
+  }
+
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -1471,24 +1471,22 @@
     function call are also considered to be cold; and, thus, given low
     weight.
 ``convergent``
-    In some parallel execution models, there exist operations that cannot be
-    made control-dependent on any additional values.  We call such operations
-    ``convergent``, and mark them with this attribute.
-
-    The ``convergent`` attribute may appear on functions or call/invoke
-    instructions.  When it appears on a function, it indicates that calls to
-    this function should not be made control-dependent on additional values.
-    For example, the intrinsic ``llvm.nvvm.barrier0`` is ``convergent``, so
-    calls to this intrinsic cannot be made control-dependent on additional
-    values.
+    Some parallel execution environments execute threads in groups that allow
+    efficient communication within the group, among a subset of threads that
+    is implicitly defined by control flow. We call operations that communicate
+    in this way ``convergent`` and mark them with this attribute.
+
+    The ``convergent`` attribute may appear on call/invoke instructions to
+    indicate that the instruction is a convergent operation, or on functions
+    to indicate that calls to this function are convergent operations.
 
-    When it appears on a call/invoke, the ``convergent`` attribute indicates
-    that we should treat the call as though we're calling a convergent
-    function.  This is particularly useful on indirect calls; without this we
-    may treat such calls as though the target is non-convergent.
+    The presence of this attribute indicates that certain program transforms
+    involving control flow are forbidden. For a detailed description, see the
+    `Convergent Operations <ConvergentOperations.html>`_ document.
 
     The optimizer may remove the ``convergent`` attribute on functions when it
-    can prove that the function does not execute any convergent operations.
+    can prove that the function does not execute uncontrolled convergent
+    operations or ``llvm.experimental.convergence.entry``.
     Similarly, the optimizer may remove ``convergent`` on calls/invokes when it
     can prove that the call/invoke cannot call a convergent function.
 ``inaccessiblememonly``
@@ -1603,10 +1601,14 @@
     (synchronize) with another thread through memory or other well-defined means.
     Synchronization is considered possible in the presence of `atomic` accesses
     that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses,
-    as well as `convergent` function calls. Note that through `convergent` function calls
-    non-memory communication, e.g., cross-lane operations, are possible and are also
-    considered synchronization. However `convergent` does not contradict `nosync`.
-    If an annotated function does ever synchronize with another thread,
+    as well as `convergent` function calls.
+
+    Note that `convergent` operations can involve communication that is
+    considered to be not through memory and does not necessarily imply an
+    ordering between threads for the purposes of the memory model. Therefore,
+    an operation can be both `convergent` and `nosync`.
+
+    If a `nosync` function does ever synchronize with another thread,
     the behavior is undefined.
 ``nounwind``
     This function attribute indicates that the function never raises an
@@ -2283,6 +2285,14 @@
 :ref:`stackmap entry <statepoint-stackmap-format>`.  See the intrinsic description
 for further details.
 
+Convergence Control Operand Bundles
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A ``"convergencectrl"`` operand bundle is only valid on a ``convergent`` operation.
+When present, the operand bundle must contain exactly one value of token type.
+See the `Convergent Operations <ConvergentOperations.html>`_ document for
+details.
+
 .. _moduleasm:
 
 Module-Level Inline Assembly
@@ -16089,6 +16099,14 @@
       %a = load i16, i16* @x, align 2
       %res = call float @llvm.convert.from.fp16(i16 %a)
 
+Convergence Intrinsics
+----------------------
+
+The LLVM convergence intrinsics for controlling the semantics of ``convergent``
+operations, which all start with the ``llvm.experimental.convergence.``
+prefix, are described in the `Convergent Operations <ConvergentOperations.html>`_
+document.
+
 .. _dbg_intrinsics:
 
 Debugger Intrinsics
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
--- a/llvm/docs/Reference.rst
+++ b/llvm/docs/Reference.rst
@@ -15,6 +15,7 @@
    BranchWeightMetadata
    Bugpoint
    CommandGuide/index
+   ConvergentOperations
    Coroutines
    DependenceGraphs/index
    ExceptionHandling
@@ -130,6 +131,9 @@
 :doc:`GlobalISel/index`
   This describes the prototype instruction selection replacement, GlobalISel.
 
+:doc:`ConvergentOperations`
+  Description of ``convergent`` operation semantics and related intrinsics.
+
 =====================
 Testing and Debugging
 =====================
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1556,7 +1556,14 @@
 //===---------- Intrinsics to query properties of scalable vectors --------===//
 def int_vscale : Intrinsic<[llvm_anyint_ty], [], [IntrNoMem]>;
 
-//===----------------------------------------------------------------------===//
+//===------- Convergence Intrinsics ---------------------------------------===//
+
+def int_experimental_convergence_entry
+  : Intrinsic<[llvm_token_ty], [], [IntrNoMem, IntrConvergent]>;
+def int_experimental_convergence_anchor
+  : Intrinsic<[llvm_token_ty], [], [IntrNoMem, IntrConvergent]>;
+def int_experimental_convergence_loop
+  : Intrinsic<[llvm_token_ty], [], [IntrNoMem, IntrConvergent]>;
 
 //===----------------------------------------------------------------------===//
 // Target-specific intrinsics
diff --git a/llvm/include/llvm/IR/LLVMContext.h b/llvm/include/llvm/IR/LLVMContext.h
--- a/llvm/include/llvm/IR/LLVMContext.h
+++ b/llvm/include/llvm/IR/LLVMContext.h
@@ -87,12 +87,13 @@
   /// operand bundle tags without comparing strings. Keep this in sync with
   /// LLVMContext::LLVMContext().
   enum : unsigned {
-    OB_deopt = 0,         // "deopt"
-    OB_funclet = 1,       // "funclet"
-    OB_gc_transition = 2, // "gc-transition"
-    OB_cfguardtarget = 3, // "cfguardtarget"
-    OB_preallocated = 4,  // "preallocated"
-    OB_gc_live = 5,       // "gc-live"
+    OB_deopt = 0,           // "deopt"
+    OB_funclet = 1,         // "funclet"
+    OB_gc_transition = 2,   // "gc-transition"
+    OB_cfguardtarget = 3,   // "cfguardtarget"
+    OB_preallocated = 4,    // "preallocated"
+    OB_gc_live = 5,         // "gc-live"
+    OB_convergencectrl = 6, // "convergencectrl"
   };
 
   /// getMDKindID - Return a unique non-zero ID for the specified metadata kind.
diff --git a/llvm/lib/IR/LLVMContext.cpp b/llvm/lib/IR/LLVMContext.cpp
--- a/llvm/lib/IR/LLVMContext.cpp
+++ b/llvm/lib/IR/LLVMContext.cpp
@@ -78,6 +78,11 @@
          "gc-transition operand bundle id drifted!");
   (void)GCLiveEntry;
 
+  auto *ConvergenceCtrlEntry = pImpl->getOrInsertBundleTag("convergencectrl");
+  assert(ConvergenceCtrlEntry->second == LLVMContext::OB_convergencectrl &&
+         "convergencectrl operand bundle id drifted!");
+  (void)ConvergenceCtrlEntry;
+
   SyncScope::ID SingleThreadSSID =
       pImpl->getOrInsertSyncScopeID("singlethread");
   assert(SingleThreadSSID == SyncScope::SingleThread &&
diff --git a/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll b/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll
--- a/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll
+++ b/llvm/test/Bitcode/operand-bundles-bc-analyzer.ll
@@ -9,6 +9,7 @@
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
 ; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
+; CHECK-NEXT:    <OPERAND_BUNDLE_TAG
 ; CHECK-NEXT:  </OPERAND_BUNDLE_TAGS_BLOCK
 
 ; CHECK:   <FUNCTION_BLOCK