diff --git a/llvm/docs/ConvergentOperations.rst b/llvm/docs/ConvergentOperations.rst --- a/llvm/docs/ConvergentOperations.rst +++ b/llvm/docs/ConvergentOperations.rst @@ -9,21 +9,6 @@ Overview ======== -Some parallel execution environments execute threads in groups that allow -efficient communication within each group. Notably this is the case for -whole-program vectorization environments such as GPUs, where threads are mapped -to lanes of a SIMD vector. Efficient communication among threads is possible in -this case by simply exchanging data between the lanes of a vector. However, the -semantics defined in this document are independent of such implementation -details. - -When control flow diverges, i.e. threads of the same group follow different -paths through the CFG, not all threads of the group may be available to -participate in this communication. This is the defining characteristic that -distinguishes convergent operations from other inter-thread communication and -which requires the use of the ``convergent`` attribute to indicate -*additional constraints* on program transforms: - A convergent operation involves inter-thread communication or synchronization that occurs outside of the memory model, where the set of threads which participate in communication is implicitly affected by control flow. @@ -401,60 +386,27 @@ region of code that is controlled by the anchor. -.. _dynamic_instances_and_convergence_tokens: - -Dynamic Instances and Convergence Tokens -======================================== - -Every execution of an LLVM IR instruction occurs in a *dynamic instance* of -the instruction. Dynamic instances are the formal objects by which we talk -about communicating threads in convergent operations. They satisfy: - -1. Different executions of the same instruction by a single thread - give rise to different dynamic instances of that instruction. - -2. Executions of different instructions always occur in different dynamic - instances. For this and other rules in this document, instructions of the - same type at different points in the program are considered to be different - instructions. - -3. Executions of the same instruction by different threads may occur in - the same dynamic instance. +.. _convergence_tokens: -4. When executing a convergent operation, the set of threads that execute the - same dynamic instance is the set of threads that communicate with each other - for that operation. +Convergence Tokens +================== *Convergence tokens* are values of ``token`` type, i.e. they cannot be used in ``phi`` or ``select`` instructions. A convergence token value represents the -dynamic instance of the instruction that produced it. +:ref:`dynamic instance ` of the instruction that produced it. Convergent operations typically have a ``convergencectrl`` operand bundle with a convergence token operand to define the set of communicating threads relative to some anchor. The details are described in the :ref:`Formal Rules ` section. -.. _controlled_convergent_operation: - -The convergence control intrinsics described in this document and convergent -operations that have a ``convergencectrl`` operand bundle are considered -*controlled* convergent operations. - -Other convergent operations are *uncontrolled*. Their use is deprecated. -Program transforms are correct for uncontrolled convergent operations if they -do not make such operations control-dependent on additional values. The -remainder of this document is only concerned with controlled convergent -operations. +If a call/invoke instruction is marked ``noconvergent`` or its callee +is marked ``noconvergent``, then any explicit token argument is +ignored. -Informational notes: +Informational note: -1. The rules above define dynamic instances for *all* LLVM IR instructions, - whether convergent or not. However, the dynamic instances for non-convergent - instructions are entirely irrelevant. The only way that dynamic instances - can have an effect on the execution of a program is via rule 4 about the - cross-thread communication in convergent operatons. - -2. The text defines convergence token values as representing a dynamic + The text defines convergence token values as representing a dynamic instance, but you could almost think of them as representing a set of threads instead -- specifically, the set S of threads that executed the dynamic instance, i.e. that executed the defining instruction D "together". @@ -491,7 +443,7 @@ .. code-block:: llvm - token @llvm.experimental.convergence.entry() convergent readnone + token @llvm.experimental.convergence.entry() readnone This intrinsic is used to tie the dynamic instances inside of a function to those in the caller. Informally, one can think of it as returning the @@ -499,51 +451,16 @@ when the current function was called. The formal definition based on dynamic instances is given :ref:`later `. -Behavior is undefined if the containing function was called from IR without -a ``convergencectrl`` bundle. - -The expectation is that for program "main" functions whose caller is not +For program "main" functions whose caller is not visible to LLVM, such as kernel entry functions, the implementation returns a convergence token that represents uniform control flow, i.e. that is guaranteed to refer to all threads within a (target- or environment-dependent) group. -Behavior is undefined if this intrinsic appears in a function that isn't -``convergent``. - -Behavior is undefined if this intrinsic appears inside of another -:ref:`convergence region ` or outside of a function's entry -block. - -Function inlining substitutes this intrinsic with the token from the operand -bundle. For example: - -.. code-block:: c++ - - // Before inlining: - - void callee() convergent { - %tok = call token @llvm.experimental.convergence.entry() - convergent_operation(...) [ "convergencectrl"(token %tok) ] - } - - void main() { - %outer = call token @llvm.experimental.convergence.anchor() - for (...) { - %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ] - callee() [ "convergencectrl"(token %inner) ] - } - } - - // After inlining: - - void main() { - %outer = call token @llvm.experimental.convergence.anchor() - for (...) { - %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ] - convergent_operation(...) [ "convergencectrl"(token %inner) ] - } - } +Behavior is undefined if this intrinsic appears outside of a +function's entry block. +It is an error to add the ``noconvergent`` attribute or a convergence +token operand on any call to this intrinsic. .. _llvm.experimental.convergence.loop: @@ -552,7 +469,7 @@ .. code-block:: llvm - token @llvm.experimental.convergence.loop() [ "convergencectrl"(token) ] convergent readnone + token @llvm.experimental.convergence.loop() [ "convergencectrl"(token) ] readnone This intrinsic defines the *heart* of a loop, i.e. the place where an imaginary loop counter is incremented for the purpose of determining convergence @@ -573,7 +490,7 @@ .. code-block:: llvm - token @llvm.experimental.convergence.anchor() convergent readnone + token @llvm.experimental.convergence.anchor() noconvergent readnone This intrinsic is a marker that acts as an *anchor* producing an initial convergence token that is independent from any "outer scope". The set of @@ -585,6 +502,65 @@ detect the maximal set of threads that can communicate efficiently within some local region of the program. +.. _convergent_function_inlining: + +Function Inlining +================= + +Function inlining substitutes any occurrence of the +:ref:`llvm.experimental.convergence.entry +` intrinsic in the callee with +the token from the ``convergencectrl`` operand bundle passed at the +callsite being inlined. + +- If the callsite is marked ``noconvergent`` or the callee is marked + ``noconvergent``, then every occurrence of the + :ref:`llvm.experimental.convergence.entry + ` intrinsic in the callee is + replaced by a unique call to + :ref:`llvm.experimental.convergence.anchor + ` inserted just before the + callsite being inlined. + +- If no token is present at the callsite being inlined, then any + occurrence of the :ref:`llvm.experimental.convergence.entry + ` intrinsic in the callee is + removed along with its uses in any ``convergencectrl`` operand + bundles. + +For example: + +.. code-block:: c++ + + // Before inlining: + + void callee() convergent { + %tok = call token @llvm.experimental.convergence.entry() + convergent_operation(...) [ "convergencectrl"(token %tok) ] + } + + void main() { + %outer = call token @llvm.experimental.convergence.anchor() + for (...) { + %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ] + callee() [ "convergencectrl"(token %inner) ] + callee() // no token operand + callee() noconvergent + } + } + + // After inlining: + + void main() { + %outer = call token @llvm.experimental.convergence.anchor() + for (...) { + %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ] + convergent_operation(...) [ "convergencectrl"(token %inner) ] + convergent_operation(...) // no token operand + %implicit = call token @llvm.experimental.convergence.anchor() + convergent_operation(...) ["convergencectrl"(token %implicit)] // originally noconvergent + } + } .. _convergence_formal_rules: @@ -592,23 +568,23 @@ ============ The convergence control intrinsics described in the previous section place -additional constraints on the execution of dynamic instances, which should be -understood on top of the -:ref:`basic rules about dynamic instances `: - -1. Let U be a :ref:`controlled ` convergent - operation other than the convergence control intrinsics. Let D be the +constraints on the execution of dynamic instances as follows: + +1. Except at a ``noconvergent`` call or a call to the + :ref:`llvm.experimental.convergence.entry + ` intrinsic, if a token value + is not supplied at a call/invoke instruction in the + ``convergencectrl`` operand bundle, the behaviour is as if the + dynamic instance at the call is indeterminable. None of the rules + defined in this document can be applied when transforming the + control flow that reaches such a call. + +2. Let U be a call to a ``convergent`` operation. Let D be the instruction that defines the convergence token used by U. Two threads executing U execute the same dynamic instance of U if and only if they obtained the token value from the same dynamic instance of D. - (Informational note: As mentioned in the - :ref:`basic rules about dynamic instances `, - the requirement here is that U is the same point in the program and not just - the same type of instruction. In particular, this rule does not apply when - the same ``convergent`` function is called from different call sites.) - -2. Two threads executing the same call U of +3. Two threads executing the same call U of :ref:`llvm.experimental.convergence.loop ` execute the same dynamic instance of U if and only if: @@ -619,7 +595,7 @@ .. _convergence_formal_rules_entry: -3. If two threads execute the same call U of +4. If two threads execute the same call U of :ref:`llvm.experimental.convergence.entry `, and at least one of them executes the function F containing U because it was called by a ``call``, ``invoke``, or ``callbr`` instruction, then they @@ -638,6 +614,9 @@ point functions, is expected to be defined elsewhere, e.g. in reference to the relevant language or API (e.g. OpenCL, Vulkan) specifications. +Cycles +------ + For the purpose of the following rules, a *cycle* is a walk in the CFG, i.e. a directed sequence of nodes and edges in the CFG whose start and end points are the same. @@ -661,6 +640,9 @@ .. _convergence_region: +Convergence Regions +------------------- + The *convergence region* of a convergence token T is the minimal region in which T is live and used, i.e., the set of program points dominated by the definition D of T from which a use of T can be reached by a walk in the CFG @@ -669,36 +651,11 @@ The following static rule about convergence regions must be satisfied by valid programs: -1. If a convergence region R for a token T1 contains a use of a convergence + If a convergence region R for a token T1 contains a use of a convergence token T2, then R must also contain the definition of T2. (In other words, convergence regions must be reasonably nested.) -Memory Model Non-Interaction -============================ - -The fact that an operation is convergent has no effect on how it is treated for -memory model purposes. In particular, an operation that is ``convergent`` and -``readnone`` does not introduce additional ordering constraints as far as the -memory model is concerned. There is no implied barrier, neither in the memory -barrier sense nor in the control barrier sense of synchronizing the execution -of threads. - -Informational note: Threads that execute the same dynamic instance do not -necessarily do so at the same time. - - -Other Interactions -================== - -``convergent`` vs. ``speculatable``. A function can be both ``convergent`` and -``speculatable``, indicating that the function does not have undefined -behavior and has no effects besides calculating its result, but is still -affected by the set of threads executing this function. This typically -prevents speculation of calls to the function unless the constraint imposed -by ``convergent`` is further relaxed by some other means. - - Rationales ========== diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst --- a/llvm/docs/LangRef.rst +++ b/llvm/docs/LangRef.rst @@ -1476,25 +1476,7 @@ function call are also considered to be cold; and, thus, given low weight. ``convergent`` - Some parallel execution environments execute threads in groups that allow - efficient communication within the group, among a subset of threads that - is implicitly defined by control flow. We call such operations - ``convergent`` and mark them with this attribute. - - The ``convergent`` attribute may appear on call/invoke instructions to - indicate that the instruction is a convergent operation, or on functions - to indicate that calls to this function are convergent operations. - - The presence of this attribute indicates that certain program transforms - involving control flow are forbidden. For a detailed description, see the - :doc:`ConvergentOperations` document. - - The optimizer may remove the ``convergent`` attribute on functions when it - can prove that the function does not execute - ``llvm.experimental.convergent.entry`` or uncontrolled convergent - operations (see :ref:`dynamic_instances_and_convergence_tokens`). - Similarly, the optimizer may remove ``convergent`` on calls/invokes when it - can prove that the call/invoke cannot call a convergent function. + This attribute has no effect. See ``noconvergent`` instead. ``inaccessiblememonly`` This attribute indicates that the function may only access memory that is not accessible by the module being compiled. This is a weaker form @@ -1542,6 +1524,12 @@ with equivalent code based on the semantics of the built-in function, unless the call site uses the ``builtin`` attribute. This is valid at call sites and on function declarations and definitions. +``noconvergent`` + This attribute indicates that the execution of this function is + not :ref:`convergent `. The attribute may + also appear on a call/invoke instruction, which indicates that + this call/invoke should be treated as if it is calling a + ``noconvergent`` function. ``noduplicate`` This attribute indicates that calls to the function cannot be duplicated. A call to a ``noduplicate`` function may be moved @@ -2328,9 +2316,9 @@ Convergence Control Operand Bundles ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -A "convergencectrl" operand bundle is only valid on a ``convergent`` operation. +A ``convergencectrl`` operand bundle on a call :ref:`identifies the +dynamic instance ` of a convergent operation. When present, the operand bundle must contain exactly one value of token type. -See the :doc:`ConvergentOperations` document for details. .. _moduleasm: @@ -2867,6 +2855,85 @@ ``fast`` This flag implies all of the others. +.. _convergent: + +Convergent Operations +--------------------- + +Some parallel execution environments execute threads in groups that +make use of specialized operations for efficient communication within +each group. When control flow diverges, i.e. threads of the same group +follow different paths through the CFG, not all threads of the group +may be available to participate in this communication. An operation +whose outcome is determined by the set of threads that communicate at +that operation, is said to be ``convergent``. + +A call/invoke instruction is assumed to be convergent unless the +``noconvergent`` attribute is present on that instruction or on the +called function. The set of threads that communicate at an operation +marked ``noconvergent`` is implementation-defined. + +.. _dynamic_instance: + +Dynamic Instances +^^^^^^^^^^^^^^^^^ + +Every execution of an LLVM IR instruction occurs in a *dynamic instance* of +the instruction. Dynamic instances are the formal objects by which we talk +about communicating threads in convergent operations. They satisfy: + +1. Different executions of the same instruction by a single thread + give rise to different dynamic instances of that instruction. + +2. Executions of different instructions always occur in different dynamic + instances. For this and other rules in this document, instructions of the + same type at different points in the program are considered to be different + instructions. + +3. Executions of the same instruction by different threads may occur in + the same dynamic instance. (Informational note: Threads that + execute the same dynamic instance do not necessarily do so at the + same time.) + +4. When executing a convergent operation, the set of threads that execute the + same dynamic instance is the set of threads that communicate with each other + for that operation. + +Informational note: + + The rules above can define dynamic instances for *all* LLVM IR instructions, + whether convergent or not. However, the dynamic instances for non-convergent + instructions are entirely irrelevant. The only way that dynamic instances + can have an effect on the execution of a program is via rule 4 about the + cross-thread communication in convergent operatons. + +In general, an optimization that modifies control flow in the program +must ensure that the set of threads communicating at each dynamic +instance of a convergent operation remains unchanged. If a +``convergencectrl`` operand bundle is present on a call/invoke +instruction, then the the rules defined in :doc:`ConvergentOperations` +determine the set of threads that together execute that instruction. + +Memory Model Non-Interaction +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The fact that an operation is convergent has no effect on how it is treated for +memory model purposes. In particular, an operation that is ``convergent`` and +``readnone`` does not introduce additional ordering constraints as far as the +memory model is concerned. There is no implied barrier, neither in the memory +barrier sense nor in the control barrier sense of synchronizing the execution +of threads. + +Other Interactions +^^^^^^^^^^^^^^^^^^ + +A function can be both ``convergent`` and +``speculatable``, indicating that the function does not have undefined +behavior and has no effects besides calculating its result, but is still +affected by the set of threads executing this function. This typically +prevents speculation of calls to the function unless the constraint imposed +by ``convergent`` is further relaxed by some other means. + .. _uselistorder: Use-list Order Directives