diff --git a/llvm/docs/DynamicInstances.rst b/llvm/docs/DynamicInstances.rst new file mode 100644 --- /dev/null +++ b/llvm/docs/DynamicInstances.rst @@ -0,0 +1,431 @@ +==================================================== +Dynamic Instances and Convergent Operation Semantics +==================================================== + +.. contents:: + :local: + :depth: 4 + +Overview +======== + +This document defines dynamic instances and the semantics of related intrinsics +and the ``convergent`` attribute. Convergent operations communicate with other +threads that execute the same operation. Typical examples of convergent +operations are control barriers, which pause execution until all threads in some +set of threads have reached the same barrier, and thread group reductions, which +reduce an input value across some set of threads, e.g. summing up values +computed by each thread. + +Convergent operations are commonly available in SPMD programming environments +such as GPUs, where a large number of threads execute the same program +concurrently. Implementations commonly map threads of execution onto lanes of +a hardware SIMD unit, and the intent of convergent operations is often to +communicate among those threads that are mapped to the same SIMD vector. +However, the semantics defined in this document are independent of such +implementation details. + +The defining characteristic that distinguishes convergent operations from other +inter-thread communication is that the set of threads among which communication +occurs is implicitly defined by control flow. For example, in the following GPU +compute kernel, communication occurs precisely among those threads of an +implementation-defined execution scope (such as workgroup or subgroup) for +which ``condition`` is true: + +.. code-block:: text + + void example_kernel() { + ... + if (condition) + convergent_operation(); + ... 
+ } + +One can think of the communicating sets of threads in terms of *divergence* +and *reconvergence*: sets of threads split at branch instructions if threads +follow different paths through the control flow graph (divergence) and may +later merge when they reach the same static point in the program +(reconvergence). + +In structured programming languages, there is usually an intuitive and +unambiguous way of determining when threads reconverge. However, this is not +the case in unstructured control flow representations as used by LLVM. The +language of *dynamic instances* of instructions is used to formalize divergence +and reconvergence and provide a global, static view of when reconvergence +occurs. + + +Waves/Warps and an Illustrating Example of Dynamic Instances +============================================================ + +Consider a simple loop: + +.. code-block:: text + + void example_loop() { + A: + br label %B + + B: + ... + br i1 %cond, label %B, label %C + + C: + ... + } + +We can illustrate all possible executions of this code by drawing a graph of +dynamic instances (of basic blocks) that arises from "unrolling" the CFG: + +.. code-block:: text + + A -> B0 -> B1 -> B2 -> B3 -> ... + \ \ \ \ + -------------------- ... -> C + + // or, for example: + + A -> B0 -> B1 -> B2 -> B3 -> ... + \ \ \ \ + C0 C1 C2 C3 + +The difference between the two graphs is whether threads "reconverge" after +the loop: if `C` contains a subgroup operation (in the GPU language sense), +will it communicate among all threads of the subgroup (usually: wave or warp) +or only among those that left the loop in the same iteration? + +Note that other graphs are possible as well, where threads reconverge partially. + +As a first approximation, you can think of the dynamic instances as all +executions of an instruction or basic block by the SIMD unit on which a +wave/warp is running. 
If the threads reconverge in `C`, then `C` is only
+executed once for each wave/warp, corresponding to only one dynamic instance.
+
+However, note that this analogy is not precise. Lockstep execution is not
+enforced in the LLVM IR semantics. This falls out naturally from the wave/warp
+model of execution when considering workgroup scope. Even if `C` is considered
+to be reconverged at the workgroup scope, different waves/warps of the same
+workgroup will generally execute instructions in `C` at different points in
+time, except for explicit barrier instructions.
+
+
+Motivating Examples of Convergent Operations
+============================================
+
+The following stylized pixel shader samples a texture at a given set of
+coordinates. Texture sampling requires screen-space derivatives of the
+coordinates to determine the level of detail (mipmap level) of the sample,
+which is commonly approximated by taking the difference with respect to
+neighboring pixels -- which are computed by different threads in the same
+wave or warp:
+
+.. code-block:: text
+
+  void example_shader() {
+    ...
+    color = textureSample(texture, coordinates);
+    if (condition) {
+      use_color();
+    }
+    ...
+  }
+
+From a purely single-threaded perspective, sinking the `textureSample` into
+the if-statement appears legal. However, if the condition is false for some
+neighboring pixels, then their corresponding threads will not execute together
+in the wave/warp, making it impossible to take the difference of coordinates
+as an approximation of the screen-space derivative (in practice, the outcome
+will be an undefined value).
+
+That is, the `textureSample` operation fits our definition of a convergent
+operation:
+
+ * It communicates with a set of threads that implicitly depends on control flow.
+
+ * Correctness depends on this set of threads.
+
+The following example shows that merging common code of branches can be
+incorrect in the face of convergent operations:
+
+.. 
code-block:: text
+
+  void example_kernel() {
+    delta = ...
+    if (delta > 0) {
+      total_gains = nonuniformSubgroupSum(delta);
+      ...
+    } else {
+      total_losses = nonuniformSubgroupSum(delta);
+      ...
+    }
+  }
+
+The `nonuniformSubgroupSum` computing the `total_gains` will be executed by
+the subset of threads with positive `delta` in a warp, and so will sum up all
+the `delta`s of those threads; and similarly for the `nonuniformSubgroupSum`
+that computes the `total_losses`.
+
+If we were to move and merge the `nonuniformSubgroupSum` above the if-statement,
+it would sum up the `delta` across *all* threads.
+
+Consider a simple example of jump threading:
+
+.. code-block:: text
+
+  void example_original() {
+  entry:
+    ...
+    br i1 %cond1, label %then1, label %mid
+
+  then1:
+    ...
+    %cond2 = ...
+    br label %mid
+
+  mid:
+    %flag = phi i1 [ true, %entry ], [ %cond2, %then1 ]
+    br i1 %flag, label %then2, label %end
+
+  then2:
+    ...
+    call void @nonuniformSubgroupControlBarrier()
+    ...
+    br label %end
+
+  end:
+  }
+
+  void example_jumpthreaded() {
+  entry:
+    br i1 %cond1, label %then1, label %then2
+
+  then1:
+    ...
+    %cond2 = ...
+    br i1 %cond2, label %then2, label %end
+
+  then2:
+    ...
+    call void @nonuniformSubgroupControlBarrier()
+    ...
+    br label %end
+
+  end:
+  }
+
+Is the control barrier guaranteed to synchronize among the same set of threads
+in both cases? Different implementations in the literature may give different
+answers to this question:
+
+* In an implementation that reconverges at post-dominators, threads reconverge
+  at `%mid` in the first version, so that all threads (within a wave/warp) that
+  execute the control barrier do so together. In the second version, threads
+  that reach the control barrier via different paths synchronize separately.
+
+* An implementation that sorts basic blocks topologically and ensures maximal
+  reconvergence for each basic block would behave the same way in both
+  versions. 
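+
+In the language of dynamic instances introduced above, the two behaviors
+correspond to different graphs. The following sketch is illustrative only,
+not normative; it assumes a single wave whose threads split at `%cond1`:
+
+.. code-block:: text
+
+  // Reconvergence at the post-dominator %mid in example_original:
+  // all threads share one dynamic instance of the barrier in then2.
+
+  entry -> then1 -> mid -> then2 -> end
+       \__________/
+
+  // In example_jumpthreaded, %mid no longer exists; an implementation
+  // may create separate dynamic instances of then2 for threads arriving
+  // directly from entry and for threads arriving via then1:
+
+  entry -> then1 -> then2  -> end
+       \
+        -> then2' --------> end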
+
+Therefore, we need to be able to specify explicitly which threads communicate
+in a convergent operation.
+
+
+
+Definitions and Rules
+=====================
+
+1. Every instruction in an LLVM IR function has zero or more dynamic instances.
+
+2. There is exactly one dynamic instance of the function entry point.
+
+3. The dynamic instances form a directed acyclic (potentially infinite) graph.
+
+4. If there is a walk *W* (in this graph) from a dynamic instance *A* to a
+   dynamic instance *B*, then there is a corresponding walk *w* in the
+   function's control flow graph in the following sense: the walks have the
+   same length, and the i-th dynamic instance in *W* is a dynamic instance of
+   the i-th instruction in *w*.
+
+   We say that *w* is the *projection* of *W*, and *W* is a *lifting* of *w*.
+
+5. Given a dynamic instance *A*, every walk through the function's control flow
+   graph starting at the instruction corresponding to *A* has a unique lifting
+   that starts in *A*. That is, two threads that follow the same control flow
+   path will not diverge.
+
+   For the purpose of this rule, edges in the control flow graph are
+   significant. For example, if both labels of a branch instruction point to the
+   same basic block, threads executing the same dynamic instance of the branch
+   instruction with different truth values of the condition may diverge.
+
+   Note: This rule implies that when threads diverge inside of a function call,
+   they will reconverge when returning to the calling function.
+
+6. When executing a dynamic instance *A* of a convergent operation, the set of
+   communicating threads is the subset of threads that execute *A* out of some
+   implementation-defined set of threads.
+
+   Examples of this implementation-defined set of threads are the OpenCL
+   workgroup of the executing thread or a SPIR-V subgroup. 
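+
+As an illustration of the edge-significance caveat in rule 5 (a sketch, not
+part of the rules themselves), consider a conditional branch whose two labels
+name the same basic block:
+
+.. code-block:: text
+
+  ; Both targets are %bb, but the true and false edges are distinct CFG
+  ; edges. Threads executing the same dynamic instance of this branch with
+  ; different values of %cond may therefore diverge into different dynamic
+  ; instances of %bb.
+  br i1 %cond, label %bb, label %bb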
+
+These rules intentionally do not fully define the set (and connectivity) of
+dynamic instances: the only constraint on when reconvergence may happen is
+that the graph must be acyclic. In particular, reconvergence may happen in the
+middle of a basic block. Later in the document, we describe intrinsics that can
+be used to constrain where reconvergence may or may not occur.
+
+7. When the graph of dynamic instances is not fully defined, it is considered
+   to be (partially) implementation-defined. Program transforms are free to add
+   additional constraints to the graph connectivity in this case.
+
+
+Correctness of Program Transforms
+=================================
+
+(This section is informative.)
+
+As implied by the rules in the previous section, program transforms are correct
+with regard to convergent operations if they preserve their semantics, which
+means that the set of communicating threads must remain the same.
+
+If the connectivity of the graph of dynamic instances is fully defined, this
+means that if two possible executions of the original program do/do not execute
+the same dynamic instance of a given convergent operation, then they must/must
+not do so in the transformed program.
+Since the connectivity is in general *not* fully defined, the following more
+complex rule applies: if the transformed program allows a graph of dynamic
+instances such that two possible executions *E1* and *E2* of a program do/do not
+execute the same dynamic instance of a convergent operation, then the original
+program must allow a graph such that *E1* and *E2* do/do not execute the same
+dynamic instance at the corresponding point of their execution.
+
+In general, program transforms with a single-threaded focus are correct if they
+do not move convergent operations in the control flow graph, i.e. if they do
+not sink or hoist convergent operations across a branch. This applies even to
+program transforms that change the control flow graph. 
+
+For example, consider unrolling a simple loop that in itself does not contain
+convergent operations. This loop causes divergence based on, and only based on,
+the trip count. This remains true even after (partial) unrolling. Therefore,
+unrolling the loop does not change the possible graphs of dynamic instances in
+any way that could affect convergent operations. The same argument applies to
+other loop transforms and to transforms such as replacing a switch statement by
+a binary tree of conditional branches.
+
+On the other hand, a loop that contains convergent operations and has an
+unknown trip count cannot be unrolled in general, since the remainder loop
+changes the possible graph structures for convergent operations in the loop
+body.
+
+A major exception to the rule of thumb that program transforms are
+conservatively correct if they "do not touch" convergent operations is the
+class of program transforms that create control flow "from thin air". Examples
+of this are a transform that specializes parts of a function based on testing
+a value against a constant, or a transform that converts ``select``
+instructions into branches. Both kinds of transforms introduce additional
+divergence without guaranteeing reconvergence, which may change the behavior
+of convergent operations that occur much later in the program. (Note, however,
+that if-conversion is perfectly legal as long as the converted basic blocks
+themselves do not contain any convergent operations, because it only reduces
+the set of possible graphs of dynamic instances as far as convergent operations
+are concerned.)
+
+
+Memory Model Non-Interaction
+============================
+
+The fact that an operation is convergent has no effect on how it should be
+treated for memory model purposes. In particular, an operation that is
+``convergent`` and ``readnone`` does not introduce additional ordering
+constraints as far as the memory model is concerned. 
There is no implied
+barrier, neither in the memory barrier sense nor in the control barrier
+sense of synchronizing the execution of threads. Threads executing the same
+dynamic instance do not necessarily do so at the same time.
+
+
+Other Interactions
+==================
+
+1. `convergent` vs. `speculatable`. A function can be both `convergent` and
+   `speculatable`, indicating that the function does not have undefined
+   behavior and has no effects besides calculating its result -- but is still
+   affected by the set of threads executing this function. This typically
+   prevents speculation of calls to the function unless the constraint imposed
+   by `convergent` is further relaxed by some other means.
+
+
+Intrinsics
+==========
+
+This section defines a set of intrinsics that can be used together to restrict
+the possible graphs of dynamic instances by ensuring reconvergence or
+non-reconvergence at certain points of the program.
+
+.. _llvm.convergence.anchor:
+
+``llvm.convergence.anchor``
+------------------------------------
+
+.. code-block:: text
+
+  token @llvm.convergence.anchor() convergent readnone
+
+
+This intrinsic is a marker that acts as an "anchor". The produced token can
+be consumed by *join* or *point* intrinsics in order to restrict the dynamic
+instances DAG structure.
+
+The *region* of an anchor is the subset of its dominance region from which
+corresponding join or point intrinsics are reachable (without leaving
+the dominance region).
+
+Anchor regions must be well-nested: if the region of anchor *a1* contains
+a join or point anchored by *a2*, then *a1* must dominate *a2*.
+
+
+.. _llvm.convergence.join:
+
+``llvm.convergence.join``
+------------------------------------
+
+.. code-block:: text
+
+  void @llvm.convergence.join(token %anchor) convergent readnone
+
+The anchor token must refer to a call to the anchor intrinsic, which we
+simply call the anchor.
+
+This intrinsic ensures reconvergence in the following sense. 
Let *b* be a call +to this intrinsic that can be reached from the function entry (called a +"join"), and let *A* be a dynamic instance of the anchor *a*. Then there is a +dynamic instance *B* of *b* such that for every walk in the graph of dynamic +instances that begins at *A* and encounters a dynamic instance of *b* before +encountering another dynamic instance of *a*, *B* is that dynamic instance of +*b*. + +The program is undefined if the control flow graph has a cycle that contains +a *join* but not its corresponding *anchor*. + + +.. _llvm.convergence.point: + +``llvm.convergence.point`` +----------------------------------- + +.. code-block:: text + + void @llvm.convergence.point(token %anchor) convergent readnone + +The anchor token must be produced by a call to the anchor intrinsic, which we +simply call the anchor. + +This intrinsic ensures non-reconvergence. It is named after the points at the +end of the tines of a fork. Non-reconvergence is ensured in the following +sense. Let *b* be a call to this intrinsic that can be reached from the +function entry (called a "point"), let *a* be the anchor, and let *B* be a +dynamic instance of *b*. Then there is a dynamic instance *A* of *a* such that +on every walk in the graph of dynamic instances starting at the function entry +point and reaching *B*, *A* is the most recently encountered dynamic instance +of *a*. + +The program is undefined if the control flow graph has a cycle that contains +a *point* but not its corresponding *anchor*. diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -1401,21 +1401,17 @@ function call are also considered to be cold; and, thus, given low weight. ``convergent`` - In some parallel execution models, there exist operations that cannot be - made control-dependent on any additional values. We call such operations - ``convergent``, and mark them with this attribute. 
- - The ``convergent`` attribute may appear on functions or call/invoke - instructions. When it appears on a function, it indicates that calls to - this function should not be made control-dependent on additional values. - For example, the intrinsic ``llvm.nvvm.barrier0`` is ``convergent``, so - calls to this intrinsic cannot be made control-dependent on additional - values. + In some parallel execution models, there exist operations that communicate + with a set of threads that is implicitly defined by control flow. We call + such operations ``convergent`` and mark them with this attribute. + + The ``convergent`` attribute may appear on call/invoke instructions to + indicate that the instruction is a convergent operation, or on functions + to indicate that calls to this function are convergent operations. - When it appears on a call/invoke, the ``convergent`` attribute indicates - that we should treat the call as though we're calling a convergent - function. This is particularly useful on indirect calls; without this we - may treat such calls as though the target is non-convergent. + The presence of this attribute indicates that certain program transforms + involving control flow are forbidden. For a detailed description, see the + `Dynamic Instances `_ document. The optimizer may remove the ``convergent`` attribute on functions when it can prove that the function does not execute any convergent operations. @@ -1518,10 +1514,15 @@ (synchronize) with another thread through memory or other well-defined means. Synchronization is considered possible in the presence of `atomic` accesses that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses, - as well as `convergent` function calls. Note that through `convergent` function calls - non-memory communication, e.g., cross-lane operations, are possible and are also - considered synchronization. However `convergent` does not contradict `nosync`. 
- If an annotated function does ever synchronize with another thread, + as well as `convergent` function calls. + + Note that `convergent` operations can involve communication that is + considered to be not through memory (e.g. through cross-lane operations) + and does not necessarily imply an ordering between threads for the purposes + of the memory model. Therefore, an operation can be both `convergent` and + `nosync`. + + If a `nosync` function does ever synchronize with another thread, the behavior is undefined. ``nounwind`` This function attribute indicates that the function never raises an @@ -14438,6 +14439,13 @@ .. _dbg_intrinsics: +Convergence Intrinsics +---------------------- + +The LLVM convergence intrinsics for controlling the semantics of convergent +operations, which all start with the ``llvm.convergence.`` prefix, are +described in the `Dynamic Instances `_ document. + Debugger Intrinsics ------------------- diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst --- a/llvm/docs/Reference.rst +++ b/llvm/docs/Reference.rst @@ -17,6 +17,7 @@ CommandGuide/index Coroutines DependenceGraphs/index + DynamicInstances ExceptionHandling Extensions FaultMaps @@ -128,6 +129,9 @@ :doc:`GlobalISel/index` This describes the prototype instruction selection replacement, GlobalISel. +:doc:`DynamicInstances` + Description of dynamic instances semantics and convergent operations. 
+ ===================== Testing and Debugging ===================== diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td --- a/llvm/include/llvm/IR/Intrinsics.td +++ b/llvm/include/llvm/IR/Intrinsics.td @@ -1279,6 +1279,15 @@ [IntrNoMem, ImmArg<1>, ImmArg<2>]>; +//===------- Convergence Intrinsics ---------------------------------------===// + +def int_convergence_anchor : Intrinsic<[llvm_token_ty], [], + [IntrNoMem, IntrConvergent]>; +def int_convergence_join : Intrinsic<[], [llvm_token_ty], + [IntrNoMem, IntrConvergent]>; +def int_convergence_point : Intrinsic<[], [llvm_token_ty], + [IntrNoMem, IntrConvergent]>; + //===----------------------------------------------------------------------===// // Target-specific intrinsics //===----------------------------------------------------------------------===// diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp --- a/llvm/lib/Analysis/ValueTracking.cpp +++ b/llvm/lib/Analysis/ValueTracking.cpp @@ -3946,6 +3946,12 @@ auto *CI = cast(Inst); const Function *Callee = CI->getCalledFunction(); + // The called function depends on the set of threads executing it, which + // could change if the call is moved to a different location in control + // flow. + if (CI->isConvergent()) + return false; + // The called function could have undefined behavior or side-effects, even // if marked readnone nounwind. return Callee && Callee->isSpeculatable(); diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp --- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp +++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp @@ -1094,13 +1094,9 @@ // is unsafe -- it adds a control-flow dependency to the convergent // operation. Therefore restrict remainder loop (try unrollig without). // - // TODO: This is quite conservative. In practice, convergent_op() - // is likely to be called unconditionally in the loop. 
In this - // case, the program would be ill-formed (on most architectures) - // unless n were the same on all threads in a thread group. - // Assuming n is the same on all threads, any kind of unrolling is - // safe. But currently llvm's notion of convergence isn't powerful - // enough to express this. + // TODO: This is somewhat conservative, as we could allow unrolling if the + // trip count is uniform across the threads in a thread group, which it + // should be if convergent_op() is a barrier. if (Convergent) UP.AllowRemainder = false; diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp --- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp +++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp @@ -1263,6 +1263,13 @@ isa(I1)) return false; + // Cannot hoist convergent calls since that could change the set of threads + // with which they communicate. + if (const auto *C = dyn_cast(I1)) { + if (C->isConvergent()) + return false; + } + BasicBlock *BIParent = BI->getParent(); bool Changed = false; @@ -1455,13 +1462,19 @@ I->getType()->isTokenTy()) return false; - // Conservatively return false if I is an inline-asm instruction. Sinking - // and merging inline-asm instructions can potentially create arguments - // that cannot satisfy the inline-asm constraints. - if (const auto *C = dyn_cast(I)) + if (const auto *C = dyn_cast(I)) { + // Conservatively return false if I is an inline-asm instruction. Sinking + // and merging inline-asm instructions can potentially create arguments + // that cannot satisfy the inline-asm constraints. if (C->isInlineAsm()) return false; + // Do not sink convergent calls, as sinking may affect the set of threads + // with which the convergent operation communicates. + if (C->isConvergent()) + return false; + } + // Each instruction must have zero or one use. 
if (HasUse && !I->hasOneUse()) return false; diff --git a/llvm/test/Transforms/JumpThreading/basic.ll b/llvm/test/Transforms/JumpThreading/basic.ll --- a/llvm/test/Transforms/JumpThreading/basic.ll +++ b/llvm/test/Transforms/JumpThreading/basic.ll @@ -605,6 +605,40 @@ ; CHECK: } } +; CHECK-LABEL: @convergence_intrinsics +; CHECK: entry: +; CHECK: if1.then: +; CHECK: br label %if1.end +; CHECK: if1.end: +; CHECK: if2.then: +; CHECK: if2.end +define i32 @convergence_intrinsics(i1 %cc1) { +entry: + %tok = call token @llvm.convergence.anchor() + br i1 %cc1, label %if1.then, label %if1.end + +if1.then: + %v = call i32 @f1() + %cc.v = icmp ne i32 %v, 0 + br label %if1.end + +if1.end: + %cc2 = phi i1 [ false, %entry ], [ %cc.v, %if1.then ] + call void @llvm.convergence.join(token %tok) + br i1 %cc2, label %if2.then, label %if2.end + +if2.then: + %ballot1 = call i32 @ballot(i1 true) + br label %if2.end + +if2.end: + %r = phi i32 [ 0, %if1.end ], [ %ballot1, %if2.then ] + ret i32 %r +} + +declare token @llvm.convergence.anchor() convergent readnone +declare void @llvm.convergence.join(token) convergent readnone +declare i32 @ballot(i1) convergent readnone ; CHECK: attributes [[$NOD]] = { noduplicate } ; CHECK: attributes [[$CON]] = { convergent } diff --git a/llvm/test/Transforms/SimplifyCFG/attr-convergent.ll b/llvm/test/Transforms/SimplifyCFG/attr-convergent.ll --- a/llvm/test/Transforms/SimplifyCFG/attr-convergent.ll +++ b/llvm/test/Transforms/SimplifyCFG/attr-convergent.ll @@ -3,6 +3,7 @@ ; Checks that the SimplifyCFG pass won't duplicate a call to a function marked ; convergent. 
; +; CHECK-LABEL: @check ; CHECK: call void @barrier ; CHECK-NOT: call void @barrier define void @check(i1 %cond, i32* %out) { @@ -25,4 +26,46 @@ ret void } +; CHECK-LABEL: @dont_merge_convergent +; CHECK: if.then: +; CHECK: %ballot.then = call i32 @ballot +; CHECK: if.else: +; CHECK: %ballot.else = call i32 @ballot +define i32 @dont_merge_convergent(i1 %cond1, i1 %cond2) { +entry: + br i1 %cond1, label %if.then, label %if.else + +if.then: + %ballot.then = call i32 @ballot(i1 %cond2) + br label %end + +if.else: + %ballot.else = call i32 @ballot(i1 %cond2) + br label %end + +end: + %ballot = phi i32 [ %ballot.then, %if.then ], [ %ballot.else, %if.else ] + ret i32 %ballot +} + +; CHECK-LABEL: @dont_speculate_convergent +; CHECK: entry: +; CHECK: then: +; CHECK: call i32 @speculatable_convergent +; CHECK: end: +define i32 @dont_speculate_convergent(i1 %cond1, i32 %in) { +entry: + br i1 %cond1, label %then, label %end + +then: + %v = call i32 @speculatable_convergent(i32 %in) + br label %end + +end: + %r = phi i32 [ 0, %entry ], [ %v, %then ] + ret i32 %r +} + +declare i32 @ballot(i1 %arg) convergent readnone +declare i32 @speculatable_convergent(i32) convergent readnone speculatable declare void @barrier() convergent