This is an archive of the discontinued LLVM Phabricator instance.


Differential D147116

[RFC] Introduce convergence control intrinsics
ClosedPublic

Authored by sameerds on Mar 28 2023, 11:58 PM.


Details

Reviewers

nhaehnle
arsenm
efriedma
t-tye
simoll
jdoerfert
jlebar
jholewinski
Anastasia
ruiling
foad
jsilvanus
tra

Commits

rGda61c865e734: [RFC] Introduce convergence control intrinsics

Summary

This is a reboot of the original design and implementation by
Nicolai Haehnle <nicolai.haehnle@amd.com>:
https://reviews.llvm.org/D85603

This change also obsoletes an earlier attempt at restarting the work on
convergence tokens:
https://reviews.llvm.org/D104504

Changes relative to D85603:

Clean up the definition of a "convergent operation", a convergent call and convergent function.
Clean up the relationship between dynamic instances, sets of threads and convergence tokens.
Redistribute the formal rules into the definitions of the convergence intrinsics.
Expand on the semantics of entering a function from outside LLVM, and the environment-defined outcome of the entry intrinsic.
Replace the term "cycle" with "closed path". The static rules are defined in terms of closed paths, and then a relation is established with cycles.
Specify that if a function contains a controlled convergent operation, then all convergent operations in that function must be controlled.
Describe an optional procedure to infer tokens for uncontrolled convergent operations.
Introduce controlled maximal convergence-before and controlled m-converged property as an update to the original properties in UniformityAnalysis.
Additional constraint that a cycle heart can only occur in the header of a reducible cycle (natural loop).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sameerds created this revision. Mar 28 2023, 11:58 PM

Herald added a project: Restricted Project. Mar 28 2023, 11:58 PM

Herald added subscribers: jdoerfert, hiraditya.

sameerds requested review of this revision. Mar 28 2023, 11:58 PM

Herald added a project: Restricted Project. Mar 28 2023, 11:58 PM

Herald added a subscriber: llvm-commits.

sameerds added reviewers: nhaehnle, arsenm, efriedma, t-tye, simoll, jdoerfert, tra, jlebar, jholewinski, Anastasia, ruiling, foad. Mar 29 2023, 12:04 AM

Herald added subscribers: StephenFan, wdng. Mar 29 2023, 12:04 AM

Harbormaster completed remote builds in B222417: Diff 509224. Mar 29 2023, 5:01 AM

maksimsab added a subscriber: maksimsab. Apr 3 2023, 7:43 AM

sameerds added a subscriber: Restricted Project. Apr 5 2023, 12:27 AM

jsilvanus added a subscriber: jsilvanus. Apr 14 2023, 2:06 AM

jsilvanus added inline comments. Apr 14 2023, 5:44 AM

llvm/docs/ConvergentOperations.rst
575	Maybe mention that `n` is an integer, so "there is an integer `n` such that [..]".
726–728	Maybe "does not satisfy" -> "violates"? In the current form, there is an ambiguity as to whether all properties are violated, or just a single one. I assume a single one is intended?
830–831	I think this property should be communicated more prominently. Per my understanding, loops in reducible control flow have unique headers, which give rise to a "natural" convergence (implicit maximal convergence?) by counting executions of the header and considering those converged if the counters agree. For irreducible control flow, there are no unique headers; instead, there is an ambiguity caused by the dependency on a choice of a cycle hierarchy (and their headers). Explicit convergence control intrinsics eliminate this ambiguity by allowing one to explicitly define loop hearts. But for reducible control flow, if loop hearts are not placed at loop headers, then the notion of convergence may be different. I believe this is what these lines refer to. For example, in the example in llvm/docs/convergence-heart.png, removing the edge from D to R makes the CFG reducible, but iteration counts at R might be different from iteration counts at H, due to the shortcut from H to L. Is that intentional, a "neutral" side-effect of the model, or a side-effect of the model that we would want to eliminate but cannot easily? In any case, I think we should discuss this more explicitly. Also, an example somewhere suggests constructing explicit control intrinsics by putting hearts into loop headers; maybe we can mention there that this ensures that the two notions of convergence agree, because the imaginary counters do.
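The header-versus-heart difference described in this comment can be made concrete with a small C++ sketch. It assumes a hypothetical reducible loop (not the CFG from convergence-heart.png, which is not reproduced here) with header `H`, a body block `R`, a shortcut edge `H -> L`, and a latch `L -> H`. Each thread is modelled as the list of blocks it visits, and each execution of `L` is labelled with an "imaginary counter": how often a chosen block ran earlier on that thread. The names `Path`, `kThread0`, `kThread1` and `labelsAt` are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A thread's execution, modelled as the sequence of basic blocks it visits.
using Path = std::vector<std::string>;

// Hypothetical reducible loop: H -> R -> L, H -> L (shortcut), L -> H (latch).
// Thread 0 passes through R on both iterations; thread 1 takes the shortcut
// on its first iteration.
const Path kThread0 = {"H", "R", "L", "H", "R", "L"};
const Path kThread1 = {"H", "L", "H", "R", "L"};

// For every visit of block `op`, record how many times block `counter` was
// executed earlier on the same path (the imaginary counter at `counter`).
std::vector<int> labelsAt(const Path &path, const std::string &op,
                          const std::string &counter) {
  std::vector<int> labels;
  int seen = 0;
  for (const std::string &block : path) {
    if (block == op)
      labels.push_back(seen);
    if (block == counter)
      ++seen;
  }
  return labels;
}
```

Counting at `H`, both threads label their executions of `L` as {1, 2}, so they pair up iteration by iteration. Counting at `R`, thread 1 gets {0, 1}, so its second `L` would pair with thread 0's first `L`: the same reducible loop yields a different notion of "same iteration" when the counter is not at the header.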

nhaehnle added inline comments. Apr 17 2023, 8:30 AM

llvm/docs/ConvergentOperations.rst
830–831	I'd say it's somewhere between intentional and a neutral side-effect. It certainly allows for some interesting experiments. By the way, the whole loop intrinsic business isn't only about eliminating ambiguity for irreducible loops. Consider:

do {
  do {
    a();
  } while (conda);
  b();
} while (condb);

// vs.

do {
  a();
  if (conda)
    continue;
  b();
} while (condb);

Assuming that `conda` implies `condb`, these two loops are semantically identical from a single-threaded perspective and could easily result in identical CFGs. But the "intuitively expected" convergence behavior is very different. Convergence control allows us to explicitly encode in the IR which of the two intuitive behaviors is expected. (And for the first version of the loop, this would result in two nested loops in the CFG that cannot be collapsed into a single one because the loop intrinsics are "in the way".)
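A toy C++ model of the two loops in this comment, assuming each thread is summarized by how many times `a()` runs before each call to `b()` (every count is at least 1, since both loops are do-while, and `conda` implies `condb`). Both shapes produce the same single-threaded trace, but labelling each `a()` by the iteration counters of the loop headers it sits in pairs up threads differently. `Trips`, `runNested` and `runFlat` are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// trips[k] = number of a() calls before the k-th call to b(); each entry >= 1.
using Trips = std::vector<int>;

struct NestedRun {
  std::string trace;                         // sequence of 'a'/'b' calls
  std::vector<std::pair<int, int>> aLabels;  // (outer iter, inner iter) per a()
};

struct FlatRun {
  std::string trace;
  std::vector<int> aLabels;  // iteration of the single loop per a()
};

// do { do { a(); } while (conda); b(); } while (condb);
// Hearts in both loop headers: a() is identified by both iteration counts.
NestedRun runNested(const Trips &trips) {
  NestedRun run;
  for (int outer = 0; outer < static_cast<int>(trips.size()); ++outer) {
    for (int inner = 0; inner < trips[outer]; ++inner) {
      run.trace += 'a';
      run.aLabels.emplace_back(outer, inner);
    }
    run.trace += 'b';
  }
  return run;
}

// do { a(); if (conda) continue; b(); } while (condb);
// One heart: every a() starts a new iteration of the only loop.
FlatRun runFlat(const Trips &trips) {
  FlatRun run;
  int iteration = 0;
  for (int count : trips) {
    for (int k = 0; k < count; ++k) {
      run.trace += 'a';
      run.aLabels.push_back(iteration++);
    }
    run.trace += 'b';
  }
  return run;
}
```

For threads with trip counts {2, 1} and {1, 2}, both shapes print `aabab` and `abaab` respectively. The single-loop labels match one-to-one ({0, 1, 2} for both threads), while the nested labels differ: the first thread's second `a()` is in outer iteration 0, where the other thread runs `a()` only once.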

jsilvanus added inline comments. Apr 20 2023, 4:17 AM

llvm/docs/ConvergentOperations.rst
830–831	Thanks for the background and the example, that definitely helps. I still feel this should be stated more explicitly, possibly near the beginning when introducing these new concepts? We motivate the use of convergent operations in great detail, but motivate only very briefly why we need to control convergence explicitly. An example where naive loop-header-based convergence is not intended would be helpful for that. Also, generalizing this example: explicit control intrinsics allow changing the structure of nested loops while preserving convergence semantics.

jsilvanus added inline comments. Apr 20 2023, 4:24 AM

llvm/docs/ConvergentOperations.rst
830–831	To expand on the above, it probably suffices to add a loop-based example (the one above?) to the list of examples motivating intrinsics, and mention early on that deviating from natural loop-header based convergence is possible and intended, referencing the above new example.

nhaehnle added inline comments. Apr 25 2023, 8:11 AM

llvm/docs/ConvergentOperations.rst
830–831	Ironically, there's now been some discussion that we may be able to simplify this by only allowing loop hearts in natural loop headers. They're still required (the example above with the two loops vs. single loop with continue still stands). But yeah, perhaps that could be added to the document.

nhaehnle mentioned this in D150976: [LangRef] Document the de facto meaning of convergent. May 30 2023, 7:17 AM

rebase
cycle heart is allowed only in the header of a natural loop
address review comments

Herald added subscribers: kerbowa, jvesely. Jun 6 2023, 5:55 AM

sameerds added a reviewer: jsilvanus. Jun 6 2023, 6:03 AM

sameerds marked 7 inline comments as done.

sameerds added inline comments.

llvm/docs/ConvergentOperations.rst
231	New section to really bring out the benefit of explicit convergence control. Something that both @jdoerfert and @jsilvanus had asked about, at different points of time.
726–728	Reworded to remove the confusion. The intention is "one or more".
830–831	The spec is now updated to allow a cycle heart only in the header of a natural loop. Thus the notion of an iteration under convergence remains unchanged from the intuitive notion. Explicit cycle hearts would have allowed the user to specify how "all threads" (for a suitable definition of "all") converge inside an irreducible cycle. But the usefulness of this is rare enough that we can discount it for now.

Harbormaster completed remote builds in B236916: Diff 528817. Jun 6 2023, 7:41 AM

add missing checks in verifier, along with tests

sameerds added a child revision: D152431: [Inliner] Handle convergence control when inlining a call. Jun 8 2023, 2:41 AM

Harbormaster completed remote builds in B237453: Diff 529541. Jun 8 2023, 3:34 AM

arsenm added inline comments. Jun 8 2023, 1:35 PM

llvm/docs/ConvergentOperations.rst
141	Block is missing a terminator. Also should have a token use?
391	Use opaque pointers
llvm/include/llvm/IR/Intrinsics.td
2531	Needs a lot more intrinsic properties. Should really be DefaultAttrsIntrinsic +Convergent+NoMem
llvm/lib/IR/Verifier.cpp
41	Typo ConvertentOperations
llvm/test/Analysis/UniformityAnalysis/AMDGPU/join-at-loop-heart.ll
9	Why was this deleted?
llvm/test/Verifier/convergencectrl-invalid.ll
172	Avoid undef
220	Need some tests with invoke and demonstrate the exception issues
llvm/test/Verifier/convergencectrl-valid.ll
1 ↗	(On Diff #529541)	These kinds of tests should go in test/Assembler and roundtrip through llvm-as/llvm-dis Also should get a bitcode compatibility test

arsenm added inline comments. Jun 8 2023, 1:39 PM

llvm/lib/IR/Verifier.cpp
2572	Don't reconstruct each iteration?
2587	This is a pretty long and indented block, move to helper function?
2642	Don't need llvm::

Add mention to release notes

release notes
added more tests
default attributes for intrinsics

sameerds marked 9 inline comments as done. Jun 14 2023, 8:43 PM

sameerds added inline comments.

llvm/docs/ConvergentOperations.rst
141	Tokens are used only on convergent operations. A token doesn't need to be kept alive beyond the last convergent op that uses it.
llvm/lib/IR/Verifier.cpp
2572	Reconstruction on each iteration is not something I've thought about much. But the programmer's manual only mentions this for std::vector and not SmallVector. Moved the declaration out of the loop anyway.
llvm/test/Analysis/UniformityAnalysis/AMDGPU/join-at-loop-heart.ll
9	This test was incorrectly added with the change that introduced uniformity analysis. The current semantics disallow a heart anywhere other than a loop header, so the example in this test is now invalid.
llvm/test/Verifier/convergencectrl-invalid.ll
220	Well it turns out that EH landing pads occur in blocks where we can't have a call to the entry() or loop() intrinsics. There are some rules about the predecessor blocks, which prevent landing pads in the entry block or in a loop header. So there is nothing to test here. Note to self: If there is no conflict, might as well remove the comments from Verifier.cpp. FIXME: A loop intrinsic is required to be the first non-PHI only if it is a true heart (in a loop header). Verifier should not complain if it occurs in any other block.

Harbormaster completed remote builds in B239022: Diff 531598. Jun 14 2023, 9:29 PM

I'm wondering if there's a better way to represent loops. At the core, the key notion for loop convergence is that all threads converge for every iteration. So in general, a while loop is something like the following:

start:
@llvm.experimental.convergence.loop()
if (!cond) goto end
body;
goto start;
end:

For most cases, making this work doesn't require llvm.experimental.convergence.loop to return a token; the mere existence of a convergent call that executes on every thread every iteration forces the necessary structure on the code. The problem is that after a "break" or "return" inside, the loop doesn't execute the llvm.experimental.convergence.loop call; to solve this, you make llvm.experimental.convergence.loop return a token, and impose a bunch of rules on the placement of the intrinsic and the usage of the token, so the control flow can be reconstructed.

But the fact that we don't execute the llvm.experimental.convergence.loop after a "break" isn't fundamental to the definition of "break", it's just an artifact of the way you're choosing to lower "break". You could instead define a hidden condition:

bool didbreak = false;
start:
if (!cond) goto end
body;
@llvm.experimental.convergence.loop()
if (didbreak) goto end
goto start;
end:

And then define "break" to be equivalent to "didbreak = true; continue;". "return" requires generating a bit more code after the loop, but works similarly.
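A minimal C++ sketch of the lowering described in this comment, assuming a hypothetical loop `while (i < n) { w; if (i == breakAt) break; r; }`. Each function returns a trace of what ran: `w` for the body work before the break, `r` for the rest of the body, and `L` for the point where the `llvm.experimental.convergence.loop` call would sit. The function names are illustrative; this models control flow only, not IR.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Plain lowering: "break" jumps straight to the loop exit.
std::string plainBreak(int n, int breakAt) {
  std::string trace;
  for (int i = 0; i < n; ++i) {
    trace += 'w';
    if (i == breakAt)
      break;
    trace += 'r';
  }
  return trace;
}

// Lowering from the comment: "break" becomes "didbreak = true; continue;",
// and "continue" lands on a latch that calls the loop intrinsic before
// testing didbreak.
std::string didbreakLowering(int n, int breakAt) {
  std::string trace;
  bool didbreak = false;
  for (int i = 0; i < n; ++i) {  // start: if (!cond) goto end
    trace += 'w';
    if (i == breakAt)
      didbreak = true;           // "break": skip the rest of the body
    else
      trace += 'r';
    trace += 'L';                // @llvm.experimental.convergence.loop()
    if (didbreak)
      break;                     // if (didbreak) goto end
  }                              // goto start
  return trace;
}

// Removes the intrinsic markers so the two traces can be compared.
std::string withoutMarkers(std::string trace) {
  trace.erase(std::remove(trace.begin(), trace.end(), 'L'), trace.end());
  return trace;
}
```

With `n = 5` and `breakAt = 2`, the plain lowering yields `wrwrw` and the `didbreak` form yields `wrLwrLwL`: stripping the markers leaves identical traces, and the marker now runs on every iteration that enters the body, including the one that breaks.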

clang already has the infrastructure necessary to make this work; it's exactly the same kind of control flow you need for C++ destructors with "break"/"return".

The advantage of this formulation is that it makes the invariants for llvm.experimental.convergence.loop a lot simpler. Actually, you don't need any special rules at all: the fact that the intrinsic is convergent dictates the result you want. The disadvantage is that the scalar control flow is a bit more complicated.

(I'm really not an expert on convergence, so I might be missing something. But I didn't see any other discussion along these lines.)

In D147116#4426454, @efriedma wrote:
start:
@llvm.experimental.convergence.loop()
if (!cond) goto end
body;
goto start;
end:
For most cases, making this work doesn't require llvm.experimental.convergence.loop to return a token; the mere existence of a convergent call that executes on every thread every iteration forces the necessary structure on the code. The problem is that after a "break" or "return" inside, the loop doesn't execute the llvm.experimental.convergence.loop call; to solve this, you make llvm.experimental.convergence.loop return a token, and impose a bunch of rules on the placement of the intrinsic and the usage of the token, so the control flow can be reconstructed.

The problem is not that exited threads don't execute the call to llvm.experimental.convergence.loop. We actually want to allow that, and then identify subsets of threads that executed the loop the same number of times and then broke out. The key part is this missing convergent op in the example:

start:
  %inner = @llvm.experimental.convergence.loop() [token %outer]
  if (!cond) {
    convergent_op() [%inner]
    goto end
  }
  body;
  goto start;
end:

The tokens allow us to identify the subsets of threads that will execute convergent_op() "together", on their way out of the loop along the break statement. The token %outer defines the set S of threads that entered the loop together, and the token %inner now identifies subsets of S that exited "together". The explicit use of %inner specifies which threads should communicate at convergent_op(). If the call had used %outer as an argument, it would have meant that the communication at convergent_op() is "outside" the loop, and all threads that entered the loop should execute it together.

The important fact is that convergent_op() is itself outside the CFG loop, although lexically it looks like it is inside the loop. This distinction is even more pronounced when we replace the "start ... goto start" with a proper high-level loop statement.
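The grouping described here can be sketched in C++ under a simplifying assumption: all threads in the `%outer` set leave through the break, and a hypothetical map `exitIter` records, per thread, the loop-heart iteration (the dynamic instance of `%inner`) in which it did so. Using `%outer` yields one communication set; using `%inner` partitions that set by exit iteration. This is an illustrative model, not an LLVM API.

```cpp
#include <cassert>
#include <map>
#include <set>

using ThreadId = int;
using Groups = std::set<std::set<ThreadId>>;

// Hypothetical input: thread -> loop-heart iteration in which it took the break.
using ExitIterations = std::map<ThreadId, int>;

// convergent_op() [%outer]: every thread that entered the loop together
// communicates as one set.
Groups groupsForOuter(const ExitIterations &exitIter) {
  std::set<ThreadId> all;
  for (const auto &entry : exitIter)
    all.insert(entry.first);
  Groups groups;
  groups.insert(all);
  return groups;
}

// convergent_op() [%inner]: only threads that exited in the same iteration
// communicate, which partitions the %outer set into subsets.
Groups groupsForInner(const ExitIterations &exitIter) {
  std::map<int, std::set<ThreadId>> byIteration;
  for (const auto &entry : exitIter)
    byIteration[entry.second].insert(entry.first);
  Groups groups;
  for (const auto &entry : byIteration)
    groups.insert(entry.second);
  return groups;
}
```

For example, if threads 0 and 1 break in iteration 1 and thread 2 breaks in iteration 3, `%outer` gives the single set {0, 1, 2}, while `%inner` gives {0, 1} and {2}, each a subset of S.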

I think what I'm describing didn't really get across... probably the convergence-related terminology I'm using isn't quite right.

The result of the change I'm suggesting is that for a loop like while (true) { if (g()) { convergent_op(); break; } }, convergent_op() actually stays inside the CFG loop. If all operations lexically inside the loop are also inside the CFG loop, we don't need tokens to figure out which operations are lexically inside the loop.

In D147116#4429100, @efriedma wrote:

The result of the change I'm suggesting is that for a loop like while (true) { if (g()) { convergent_op(); break; } }, convergent_op() actually stays inside the CFG loop. If all operations lexically inside the loop are also inside the CFG loop, we don't need tokens to figure out which operations are lexically inside the loop.

Yeah, this is how we have always looked at convergence. The new tokens actually try to move away from that picture. A number of different angles to view this from:

It's not useful to always think in terms of "all threads". The tokens returned by the new intrinsics help further specify "which set of threads" converges at a given operation.
The implicit convergence derived from control dependences is kinda sufficient to work with "all threads". It is an approximation that allows a single-thread view to do safe things around convergent operations. But it is not sufficient to clearly specify what the actual relation is between the CFG and the convergence of multiple threads.
For example, in the same loop or CFG region, etc., one convergent op might be interested in a local convergence captured by the anchor intrinsic, while another might be interested in the threads captured by the loop intrinsic. Now that loop intrinsic might itself have a token argument returned by an anchor intrinsic outside the loop. The subset relationship of all these threads is captured by the constraint on convergence regions.
Until code generation, it is sufficient to just record the relationship between sets of convergent threads. The usual transforms only have to follow the simple static rules about loop hearts and convergence regions to ensure correctness.
The transformation that you are thinking of is actually performed by the backend, where it will "pull" the convergent ops on the exit edges into the loop, and introduce proper mask manipulation to make sure that the right set of threads in a wave/warp executes it.
Until then, we do not actually want to pull that convergent op into the loop body. That will produce unnecessary constraints on transforms working with the loop. The convergent op is most definitely on the exit of the loop. And it's useful to keep it there.
The next step in these patches is to introduce an analysis (D85608) that captures "extended cycles" like the one we are discussing here. This will be used by other analyses and transforms that are "convergence aware" to reason about these extended cycles. This does not require the frontend or any other entity to modify the cycle structure, and no new rules are imposed on the LLVM IR. One example is an enhancement to UniformityAnalysis, where it will recognize some cases of "temporal divergence" that are actually uniform because they are on the exit path of this example.

Until then, we do not actually want to pull that convergent op into the loop body. That will produce unnecessary constraints on transforms working with the loop. The convergent op is most definitely on the exit of the loop. And it's useful to keep it there.

This is not obvious to me. For example, pushing the convergent op out of the loop makes fully unrolling a lot harder: you need to compute a region containing all the uses of the loop token outside the loop, sink any uses of other convergence tokens out of that region, pull all the relevant basic blocks into the loop, then unroll. (I guess in general, full unroll is actually impossible? As far as I can tell, none of the rules guarantee that "sink any uses of other convergence tokens out of that region" is a legal transform. Or am I missing some rule that actually ensures that?)

Or consider loop strength reduction: my first impression is that you want LSR to treat the operands of convergent operations that use a loop's token as if they're inside the loop, because the computation of those operands ends up inside the loop in the final assembly.

I guess pushing the blocks out of the loop makes it simpler to sink non-convergent operations out of the loop?

As far as I can tell, none of the rules guarantee that "sink any uses of other convergence tokens out of that region" is a legal transform. Or am I missing some rule that actually ensures that?

Looking again, I guess this is the "convergence region" rule. It doesn't look like the verifier enforces this rule at the moment, but I assume that's planned?

Spent a bit more time thinking about this... the whole "explicit convergence" thing is starting to make more sense. At a conceptual level, I can see why the notion of barriers is sort of incompatible with what you want to do with tokens. And if you don't have a barrier, then there's nothing actually keeping anything inside the loop, so the notion of an extended loop region is the natural result.

That said, I still think it would be very inconvenient to adapt certain transforms, like unrolling and LSR, to actually handle extended loop regions effectively. It might make sense to have a utility to transform an extended loop into a plain loop.

I think this is in pretty good shape. You may want to give it a bit more time in case more discussion shows up, but it's good for me.

This revision is now accepted and ready to land. Jun 19 2023, 9:29 AM

In D147116#4431982, @efriedma wrote:

Spent a bit more time thinking about this... the whole "explicit convergence" thing is starting to make more sense. At a conceptual level, I can see why the notion of barriers is sort of incompatible with what you want to do with tokens. And if you don't have a barrier, then there's nothing actually keeping anything inside the loop, so the notion of an extended loop region is the natural result.

That said, I still think it would be very inconvenient to adapt certain transforms, like unrolling and LSR, to actually handle extended loop regions effectively. It might make sense to have a utility to transform an extended loop into a plain loop.

Thanks, Eli, for all the attention you are giving here! It's really important to see that people are understanding the spec enough to look beyond it at the implications!

About unrolling, please do see D85605 for an initial attempt at fixing those passes. It only unrolls in the trivial cases. If a token defined inside a loop is used outside, then there is no attempt at reconstructing the extended region. Instead, we rely on the super-conservative existing behaviour that unrolling is disabled if a token type is used outside the block where it is defined. Clearly, there is a long way to go from there.
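The conservative rule mentioned here (no unrolling if a token is used outside the block where it is defined) amounts to a simple predicate. The sketch below uses a hypothetical, simplified value model rather than LLVM's `Instruction`/`BasicBlock` classes, and does not reproduce LLVM's actual implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified value model.
struct Value {
  int defBlock;               // basic block that defines the value
  bool isToken;               // true for values of the IR `token` type
  std::vector<int> useBlocks; // basic block of each use
};

// Returns true if some token-typed value is used outside its defining
// block; under the conservative rule, unrolling is then refused.
bool tokenUsedOutsideDefBlock(const std::vector<Value> &values) {
  for (const Value &v : values) {
    if (!v.isToken)
      continue;  // only token-typed values matter for this rule
    for (int block : v.useBlocks)
      if (block != v.defBlock)
        return true;
  }
  return false;
}
```

Non-token values are ignored, so an ordinary value used in another block does not block unrolling in this model.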

Notably, the stack of reviews starting at D85603 is the original attempt at introducing convergence tokens. The new stack of reviews is mostly a rebase with some simplifications as we discovered them.

Good to know that the rule about nesting convergence regions works out. It is indeed enforced by the verifier in the lambda "checkBundle", where it tracks a stack of "live tokens" and ensures that these intervals are well-nested.
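The stack-of-live-tokens idea can be illustrated with a simplified C++ sketch. It assumes a single linearized sequence of token definitions and uses, whereas the actual rule is stated over convergence regions in the CFG; the `Event` type and `wellNested` function are hypothetical and do not reproduce the verifier's `checkBundle` lambda.

```cpp
#include <cassert>
#include <vector>

// One step along a hypothetical linearized path: a token is defined or used.
struct Event {
  enum Kind { Def, Use } kind;
  int token;
};

// Returns true if the token intervals (definition to last use) along this
// sequence are well-nested.
bool wellNested(const std::vector<Event> &events) {
  std::vector<int> live; // stack of tokens whose interval is still open
  for (const Event &e : events) {
    if (e.kind == Event::Def) {
      live.push_back(e.token);
      continue;
    }
    // A use of token T closes every interval opened after T. If T is no
    // longer on the stack, its interval was already closed (or T was never
    // defined), so the intervals overlap without nesting.
    while (!live.empty() && live.back() != e.token)
      live.pop_back();
    if (live.empty())
      return false;
  }
  return true;
}
```

For example, `def t1, def t2, use t2, use t1` is accepted, while `def t1, def t2, use t1, use t2` is rejected: the use of `t1` closes the interval of `t2`, so the later use of `t2` finds it no longer live.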

rebase
removed the verifier comment about conflict with EH landing pads

In D147116#4433173, @nhaehnle wrote:

I think this is in pretty good shape. You may want to give it a bit more time in case more discussion shows up, but it's good for me.

Thanks. I'll wait until Monday for further comments.

sameerds mentioned this in D152431: [Inliner] Handle convergence control when inlining a call. Jun 19 2023, 10:12 PM

sameerds marked an inline comment as done. Jun 19 2023, 10:19 PM

sameerds added inline comments.

llvm/test/Verifier/convergencectrl-invalid.ll
220	Simplified the comments in the verifier, but did not relax the check for non-heart loop calls. For now, the verifier allows a call to `loop` only at the start of a block. That's not necessary if the call is not a loop heart, but relaxing this check is not very important yet. We can revisit if we have a real use-case where a non-heart `loop` call needs to be in the middle of a block.

Harbormaster completed remote builds in B239913: Diff 532784. Jun 19 2023, 11:18 PM

My understanding of nuances here is not sufficient for a meaningful review. @nhaehnle's LGTM works for me.

sameerds mentioned this in D153744: [LoopUnroll] adjust for new `convergent` semantics. Jun 26 2023, 7:12 AM

Bump! @jdoerfert mostly just waiting for a rubber-stamp at this point.

https://discourse.llvm.org/t/rfc-introduce-convergence-control-intrinsics/69613

rebase
preparing to commit

This revision was landed with ongoing or failed builds. Jul 12 2023, 12:02 AM

Closed by commit rGda61c865e734: [RFC] Introduce convergence control intrinsics (authored by sameerds).

This revision was automatically updated to reflect the committed changes.

sameerds added a commit: rGda61c865e734: [RFC] Introduce convergence control intrinsics.

Harbormaster completed remote builds in B244674: Diff 539394. Jul 12 2023, 2:03 AM

sameerds added a child revision: D153744: [LoopUnroll] adjust for new `convergent` semantics. Jul 18 2023, 9:31 AM

sameerds mentioned this in D85603: IR: Add convergence control operand bundle and intrinsics. Aug 21 2023, 11:48 PM

Revision Contents

Path	Size
llvm/docs/ConvergenceAndUniformity.rst	79 lines
llvm/docs/ConvergentOperations.rst	1607 lines
llvm/docs/LangRef.rst	61 lines
llvm/docs/Reference.rst	4 lines
llvm/docs/ReleaseNotes.rst	4 lines
llvm/include/llvm/ADT/GenericCycleImpl.h	4 lines
llvm/include/llvm/ADT/GenericCycleInfo.h	1 line
llvm/include/llvm/Analysis/CycleAnalysis.h	2 lines
llvm/include/llvm/IR/CycleInfo.h	31 lines
llvm/include/llvm/IR/Intrinsics.td	8 lines
llvm/include/llvm/IR/LLVMContext.h	1 line
llvm/lib/Analysis/CycleAnalysis.cpp	3 lines
llvm/lib/IR/CMakeLists.txt	1 line
llvm/lib/IR/CycleInfo.cpp	16 lines
llvm/lib/IR/LLVMContext.cpp	5 lines
llvm/lib/IR/Verifier.cpp	182 lines
llvm/test/Analysis/UniformityAnalysis/AMDGPU/join-at-loop-heart.ll
llvm/test/Assembler/convergence-control.ll	91 lines
llvm/test/Bitcode/convergence-control.ll	42 lines
llvm/test/Bitcode/convergence-control.ll.bc
llvm/test/Bitcode/operand-bundles-bc-analyzer.ll	1 line
llvm/test/Verifier/convergencectrl-invalid.ll	225 lines

Diff 539407

llvm/docs/ConvergenceAndUniformity.rst

+.. _convergence-and-uniformity:

 ==========================
 Convergence And Uniformity
 ==========================

 .. contents::
 :local:

 Introduction
(68 unchanged lines not shown)
 Join node
 A join node of a branch is a node reachable along disjoint paths
 starting from that branch.

 Diverged path
 A diverged path is a path that starts from a divergent branch and
 either reaches a join node of the branch or reaches the end of the
 function without passing through any join node of the branch.

+.. _convergence-dynamic-instances:

 Threads and Dynamic Instances
 =============================

 Each occurrence of an instruction in the program source is called a
 static instance. When a thread executes a program, each execution of
 a static instance produces a distinct dynamic instance of that
 instruction.

(37 unchanged lines not shown)
 ===========

 Converged-with is a transitive symmetric relation over dynamic
 instances produced by different threads for the *same static
 instance*. Informally, two threads that produce converged dynamic
 instances are said to be converged, and they are said to execute
 that static instance convergently, at that point in the execution.

-Convergence order is a strict partial order over dynamic instances
+Convergence-before is a strict partial order over dynamic instances
 that is defined as the transitive closure of:

 1. If dynamic instance ``P`` is executed strictly before ``Q`` in the
 same thread, then ``P`` is convergence-before ``Q``.
 2. If dynamic instance ``P`` is executed strictly before ``Q1`` in the
 same thread, and ``Q1`` is converged-with ``Q2``, then ``P`` is
 convergence-before ``Q2``.
 3. If dynamic instance ``P1`` is converged-with ``P2``, and ``P2``
(19 unchanged lines not shown)
 to be converged (i.e., related to each other in the converged-with
 relation). The resulting convergence order includes the edges ``P ->
 Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.

 The fact that convergence-before is a strict partial order is a
 constraint on the converged-with relation. It is trivially satisfied
 if different dynamic instances are never converged. It is also
 trivially satisfied for all known implementations for which
-convergence plays some role. Aside from the strict partial convergence
-order, there are currently no additional constraints on the
-converged-with relation imposed in LLVM IR.
+convergence plays some role.

 .. _convergence-note-convergence:

 .. note::

-1. The ``convergent`` attribute on convergent operations does
-constrain changes to ``converged-with``, but it is expressed in
-terms of control flow and does not explicitly deal with thread
-convergence.
-
-2. The convergence-before relation is not
+1. The convergence-before relation is not
 directly observable. Program transforms are in general free to
 change the order of instructions, even though that obviously
 changes the convergence-before relation.

-3. Converged dynamic instances need not be executed at the same
+2. Converged dynamic instances need not be executed at the same
 time or even on the same resource. Converged dynamic instances
 of a convergent operation may appear to do so but that is an
-implementation detail. The fact that ``P`` is convergence-before
+implementation detail.
+
+3. The fact that ``P`` is convergence-before
 ``Q`` does not automatically imply that ``P`` happens-before
 ``Q`` in a memory model sense.

-4. Future work: Providing convergence-related guarantees to
-compiler frontends enables some powerful optimization techniques
-that can be used by programmers or by high-level program
-transforms. Constraints on the ``converged-with`` relation may
-be added eventually as part of the definition of LLVM
-IR, so that guarantees can be made that frontends can rely on.
-For a proposal on how this might work, see `D85603
-<https://reviews.llvm.org/D85603>`_.
-
 .. _convergence-maximal:

 Maximal Convergence
 -------------------

 This section defines a constraint that may be used to
 produce a maximal converged-with relation without violating the
 strict convergence-before order. This maximal converged-with
 relation is reasonable for real targets and is compatible with
 convergent operations.

 The maximal converged-with relation is defined in terms of cycle
-headers, which are not unique to a given CFG. Each cycle hierarchy for
-the same CFG results in a different maximal converged-with relation.
+headers, with the assumption that threads converge at the header on every
+"iteration" of the cycle. Informally, two threads execute the same iteration of
+a cycle if they both previously executed the cycle header the same number of
+times after they entered that cycle. In general, this needs to account for the
+iterations of parent cycles as well.

 Maximal converged-with:

 Dynamic instances ``X1`` and ``X2`` produced by different threads
 for the same static instance ``X`` are converged in the maximal
 converged-with relation if and only if for every cycle ``C`` with
 header ``H`` that contains ``X``:

- every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in		- every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
the respective thread is convergence-before ``X2``, and,		the respective thread is convergence-before ``X2``, and,
- every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in		- every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
the respective thread is convergence-before ``X1``,		the respective thread is convergence-before ``X1``,
- without assuming that ``X1`` is converged with ``X2``.		- without assuming that ``X1`` is converged with ``X2``.

.. note::		.. note::

		Cycle headers may not be unique to a given CFG if it is irreducible. Each
		cycle hierarchy for the same CFG results in a different maximal
		converged-with relation.

For brevity, the rest of the document restricts the term		For brevity, the rest of the document restricts the term
converged to mean "related under the maximal converged-with		converged to mean "related under the maximal converged-with
relation for the given cycle hierarchy".		relation for the given cycle hierarchy".

Maximal convergence can now be demonstrated in the earlier example as follows:		Maximal convergence can now be demonstrated in the earlier example as follows:

.. table::		.. table::
:align: left		:align: left
(18 unchanged lines not shown)
- ``L3`` is not converged with ``L5`` due to ``H5`` which is not		- ``L3`` is not converged with ``L5`` due to ``H5`` which is not
convergence-before ``L3``.		convergence-before ``L3``.

.. _convergence-cycle-headers:		.. _convergence-cycle-headers:

Dependence on Cycles Headers		Dependence on Cycles Headers
----------------------------		----------------------------

Contradictions in convergence order are possible only between two		Contradictions in convergence-before are possible only between two
nodes that are inside some cycle. The dynamic instances of such nodes		nodes that are inside some cycle. The dynamic instances of such nodes
may be interleaved in the same thread, and this interleaving may be		may be interleaved in the same thread, and this interleaving may be
different for different threads.		different for different threads.

When a thread executes a node ``X`` once and then executes it again,		When a thread executes a node ``X`` once and then executes it again,
it must have followed a closed path in the CFG that includes ``X``.		it must have followed a closed path in the CFG that includes ``X``.
Such a path must pass through the header of at least one cycle --- the		Such a path must pass through the header of at least one cycle --- the
smallest cycle that includes the entire closed path. In a given		smallest cycle that includes the entire closed path. In a given
(141 unchanged lines not shown)
the cycle produce uniform values, but exit the cycle along the same		the cycle produce uniform values, but exit the cycle along the same
divergent path after executing the header a different number of times		divergent path after executing the header a different number of times
(informally, on different iterations of the cycle). For a node ``N``		(informally, on different iterations of the cycle). For a node ``N``
inside the cycle the outputs may be uniform for the two threads, but		inside the cycle the outputs may be uniform for the two threads, but
any use ``U`` outside the cycle receives a value from non-converged		any use ``U`` outside the cycle receives a value from non-converged
dynamic instances of ``N``. An output of ``U`` may be divergent,		dynamic instances of ``N``. An output of ``U`` may be divergent,
depending on the semantics of the instruction.		depending on the semantics of the instruction.

		.. _uniformity-analysis:

Static Uniformity Analysis		Static Uniformity Analysis
==========================		==========================

Irreducible control flow results in different cycle hierarchies		Irreducible control flow results in different cycle hierarchies
depending on the choice of headers during depth-first traversal. As a		depending on the choice of headers during depth-first traversal. As a
result, a static analysis cannot always determine the convergence of		result, a static analysis cannot always determine the convergence of
nodes in irreducible cycles, and any uniformity analysis is limited to		nodes in irreducible cycles, and any uniformity analysis is limited to
those static instances whose convergence is independent of the cycle		those static instances whose convergence is independent of the cycle
(15 unchanged lines not shown)	.. _convergence-m-converged:
cycle hierarchy for the same CFG.		cycle hierarchy for the same CFG.

As noted earlier, for brevity, we restrict the term converged to		As noted earlier, for brevity, we restrict the term converged to
mean "related under the maximal converged-with relation for a given		mean "related under the maximal converged-with relation for a given
cycle hierarchy".		cycle hierarchy".


Each node ``X`` in a given CFG is reported to be m-converged if and		Each node ``X`` in a given CFG is reported to be m-converged if and
only if:		only if every cycle that contains ``X`` satisfies the following necessary
		conditions:
1. ``X`` is a :ref:`top-level<cycle-toplevel-block>` node, in which
case, there are no cycle headers to influence the convergence of
``X``.

2. Otherwise, if ``X`` is inside a cycle, then every cycle that		1. Every divergent branch inside the cycle satisfies the
contains ``X`` satisfies the following necessary conditions:

a. Every divergent branch inside the cycle satisfies the
:ref:`diverged entry criterion<convergence-diverged-entry>`, and,		:ref:`diverged entry criterion<convergence-diverged-entry>`, and,
b. There are no :ref:`diverged paths reaching the		2. There are no :ref:`diverged paths reaching the
cycle<convergence-diverged-outside>` from a divergent branch		cycle<convergence-diverged-outside>` from a divergent branch
outside it.		outside it.

.. note::		.. note::

A reducible cycle :ref:`trivially satisfies		A reducible cycle :ref:`trivially satisfies
<convergence-reducible-cycle>` the above conditions. In particular,		<convergence-reducible-cycle>` the above conditions. In particular,
if the whole CFG is reducible, then all nodes in the CFG are		if the whole CFG is reducible, then all nodes in the CFG are
m-converged.		m-converged.

(212 unchanged lines not shown)
2. When diverged paths reach the subgraph ``C`` from outside, their		2. When diverged paths reach the subgraph ``C`` from outside, their
convergence is always determined by the same header ``H``.		convergence is always determined by the same header ``H``.

Clearly, this can be determined only in a cycle hierarchy ``T`` where		Clearly, this can be determined only in a cycle hierarchy ``T`` where
``C`` is detected as a reducible cycle. No such conclusion can be made		``C`` is detected as a reducible cycle. No such conclusion can be made
in a different cycle hierarchy ``T'`` where ``C`` is part of a larger		in a different cycle hierarchy ``T'`` where ``C`` is part of a larger
cycle ``C'`` with the same header, but this does not contradict the		cycle ``C'`` with the same header, but this does not contradict the
conclusion in ``T``.		conclusion in ``T``.

		Controlled Convergence
		======================

		:ref:`Convergence control tokens <dynamic_instances_and_convergence_tokens>`
		provide an explicit semantics for determining which threads are converged at a
		given point in the program. The impact of this is incorporated in a
		:ref:`controlled maximal converged-with <controlled_maximal_converged_with>`
		relation over dynamic instances and a :ref:`controlled m-converged
		<controlled_m_converged>` property of static instances. The :ref:`uniformity
		analysis <uniformity-analysis>` implemented in LLVM includes this for targets
		that support convergence control tokens.

llvm/docs/ConvergentOperations.rst

This file was added.

				==============================
				Convergent Operation Semantics
				==============================

.. contents::
   :local:
   :depth: 4

				Overview
				========

				Some parallel execution environments execute threads in groups that allow
				efficient communication within the group using special primitives called
				convergent operations. The outcome of a convergent operation is sensitive to
				the set of threads that executes it "together", i.e., convergently. When control
flow :ref:`diverges <convergence-and-uniformity>`, i.e., threads of the same
group follow different paths through the CFG, not all threads of the group may
be available to participate in this communication. This is the defining
characteristic that
				distinguishes convergent operations from other inter-thread communication:

				A convergent operation involves inter-thread communication or synchronization
				that occurs outside of the memory model, where the set of threads which
				participate in communication is implicitly affected by control flow.

				For example, in the following GPU compute kernel, communication during the
				convergent operation is expected to occur precisely among those threads of an
				implementation-defined execution scope (such as workgroup or subgroup) for
				which ``condition`` is true:

.. code-block:: c++

  void example_kernel() {
    ...
    if (condition)
      convergent_operation();
    ...
  }

				In structured programming languages, there is often an intuitive and
				unambiguous way of determining the threads that are expected to communicate.
				However, this is not always the case even in structured programming languages,
				and the intuition breaks down entirely in unstructured control flow. This
				document describes the formal semantics in LLVM, i.e. how to determine the set
				of communicating threads for convergent operations.

				The definitions in this document leave many details open, such as how groups of
				threads are formed in the first place. It focuses on the questions that are
				relevant for deciding the correctness of generic program transforms and
				convergence-related analyses such as :ref:`uniformity analysis
				<convergence-and-uniformity>`.

				.. _convergent_operations:

				Convergent Operations
				=====================

				In LLVM IR, the only way to communicate between threads as described
				above is by calling target-defined convergent intrinsics. Hence, only
				a call-site in LLVM IR (a :ref:`call <i_call>`, :ref:`invoke
				<i_invoke>`, or :ref:`callbr <i_callbr>` instruction) can result in a
				convergent operation.

				A function in LLVM IR is said to be convergent if it has the
				:ref:`convergent <attr_convergent>` attribute.

				A call-site in LLVM IR is said to be convergent if it is a direct
				call to a convergent function or it has the :ref:`convergent
				<attr_convergent>` attribute or a :ref:`convergencectrl operand bundle
				<convergencectrl>`.
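Read together, the two definitions above amount to a simple predicate over a call-site. The following C++ sketch spells it out. The structs ``FunctionInfo`` and ``CallSiteInfo`` are hypothetical and stand in only for the relevant IR properties. They are not the LLVM C++ API, and the sketch ignores everything else about a call-site.

.. code-block:: c++

  #include <cassert>
  #include <string>

  // Hypothetical stand-in for an IR function; not the LLVM C++ API.
  struct FunctionInfo {
    std::string name;
    bool hasConvergentAttr = false; // `convergent` function attribute
  };

  // Hypothetical stand-in for a call, invoke or callbr instruction.
  struct CallSiteInfo {
    const FunctionInfo *directCallee = nullptr; // null for an indirect call
    bool hasConvergentAttr = false;             // `convergent` on the call-site
    bool hasConvergenceCtrlBundle = false;      // "convergencectrl" bundle
  };

  // A function is convergent if it has the `convergent` attribute.
  bool isConvergentFunction(const FunctionInfo &f) {
    return f.hasConvergentAttr;
  }

  // A call-site is convergent if it directly calls a convergent function,
  // has the `convergent` attribute, or carries a `convergencectrl` bundle.
  bool isConvergentCallSite(const CallSiteInfo &cs) {
    if (cs.directCallee != nullptr && isConvergentFunction(*cs.directCallee))
      return true;
    return cs.hasConvergentAttr || cs.hasConvergenceCtrlBundle;
  }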

				Informational notes:

				A function may have to be treated as convergent if that function, or
				transitively, any function called from it, contains a convergent call-site. A
				frontend generating the ``convergent`` attribute should take this into account
				when emitting functions and function calls. But this is not always the case:

				A non-convergent function may contain convergent operations; such operations
				do not directly depend on the set of threads that enter the function as a
				single communicating group. Instead, these operations depend on an
				implementation-defined subset of threads within the body of the function, as
				shown in :ref:`opportunistic_convergence`.
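The first note can be approximated by a conservative fixed-point propagation over the call graph. The C++ sketch below illustrates this under stated assumptions. ``FunctionSummary``, ``CallGraph`` and ``mayNeedConvergent`` are hypothetical names. The propagation deliberately over-approximates, so it would also flag a function like the one in :ref:`opportunistic_convergence`, which is intentionally not convergent.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  // Hypothetical per-function summary: whether the body contains a
  // convergent call-site of its own, and which functions it calls directly.
  struct FunctionSummary {
    bool hasOwnConvergentCallSite = false;
    std::vector<std::string> callees;
  };

  using CallGraph = std::map<std::string, FunctionSummary>;

  // Conservatively collect functions that may need `convergent`: those with
  // a convergent call-site, plus every transitive caller of such a function.
  std::set<std::string> mayNeedConvergent(const CallGraph &cg) {
    std::set<std::string> result;
    for (const auto &entry : cg)
      if (entry.second.hasOwnConvergentCallSite)
        result.insert(entry.first);

    // A call to a convergent function is itself a convergent call-site, so
    // iterate until no more callers are added.
    bool changed = true;
    while (changed) {
      changed = false;
      for (const auto &entry : cg) {
        if (result.count(entry.first) != 0)
          continue;
        for (const std::string &callee : entry.second.callees) {
          if (result.count(callee) != 0) {
            result.insert(entry.first);
            changed = true;
            break;
          }
        }
      }
    }
    return result;
  }

Suppose ``kernel`` calls ``helper``, and ``helper`` calls ``intrinsicUser``, which contains a convergent call-site. The sketch then reports all three functions, while an unrelated leaf function is not reported.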

				Examples of Convergent Operations
				========================================

				(This section is informative.)

				Texture sampling in a pixel shader
				----------------------------------

The following stylized pixel shader samples a texture at a given set of
coordinates, using the built-in function ``textureSample``. Texture sampling
requires screen-space derivatives of the coordinates to determine the level of
detail (mipmap level) of the sample. They are commonly approximated by taking
the difference in coordinates between neighboring pixels, which are computed by
different threads in the same group:

.. code-block:: c++

  void example_shader() {
    ...
    color = textureSample(texture, coordinates);
    if (condition) {
      use(color);
    }
    ...
  }

From a purely single-threaded perspective, sinking the ``textureSample`` into
				the if-statement appears legal. However, if the condition is false for some
				neighboring pixels, then their corresponding threads will not execute together
				in the group, making it impossible to take the difference of coordinates as an
				approximation of the screen-space derivative. In practice, the outcome will be
				an undefined value.

That is, the ``textureSample`` operation fits our definition of a convergent
operation:

1. It communicates with a set of threads that implicitly depends on control
   flow.
2. Correctness depends on this set of threads.
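A toy model makes the failure mode of sinking concrete. This C++ sketch assumes a 2x2 quad whose lanes are numbered row by row. It approximates the horizontal derivative by a difference within a row. The layout, the ``Quad`` struct and the ``ddx`` helper are illustrative assumptions. The undefined outcome is modelled as an empty ``std::optional``, and actual hardware behavior is target-specific.

.. code-block:: c++

  #include <array>
  #include <cassert>
  #include <optional>

  // A 2x2 quad of pixel threads, laid out as
  //   lane 0 | lane 1
  //   lane 2 | lane 3
  // active[i] says whether lane i executes the current operation.
  struct Quad {
    std::array<float, 4> coord; // e.g. the x texture coordinate per lane
    std::array<bool, 4> active;
  };

  // Approximate d(coord)/dx for `lane` as a difference within its row. If
  // the neighbouring lane is not executing together with `lane`, there is
  // nothing to subtract; the result is modelled as std::nullopt.
  std::optional<float> ddx(const Quad &q, int lane) {
    assert(lane >= 0 && lane < 4);
    int left = lane & ~1; // lanes 0 and 2 start a row
    int right = left + 1;
    if (!q.active[left] || !q.active[right])
      return std::nullopt;
    return q.coord[right] - q.coord[left];
  }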

				The compiler frontend can emit IR that expresses the convergence constraints as
				follows:

.. code-block:: llvm

  define void @example_shader() convergent {
    %entry = call token @llvm.experimental.convergence.entry()
    ...
    %color = call T @textureSample(U %texture, V %coordinates) [ "convergencectrl"(token %entry) ]
    br i1 %condition, label %then, label %end

  then:
    call void @use(T %color)
    br label %end

  end:
    ret void
  }

arsenm: Block is missing a terminator. Also should have a token use?

sameerds (author): Tokens are used only on convergent operations. A token doesn't need to be kept alive beyond the last convergent op that uses it.

				The :ref:`llvm.experimental.convergence.entry <llvm.experimental.convergence.entry>`
				intrinsic is itself ``convergent``, and we expect it to communicate at least
				among all threads of the same "quad" -- a group of 2x2 pixels that are
				evaluated together for the purpose of approximating screen-space derivatives.
				This fact is not part of the generic LLVM IR semantics; it would have to be
				defined somewhere else, for example as part of target-specific ABI definitions
				and/or in reference to some relevant API specs.

				Since the ``@textureSample`` call then uses the token produced by the entry
				intrinsic in its ``convergencectrl`` bundle, and has no additional control
				dependencies, it must communicate among the same set of threads. This indicates
				to generic program transforms that sinking the ``@textureSample`` call is
				forbidden. (A program transform can still sink the call if it can prove somehow,
				e.g. by leaning on target-specific callbacks that can analyze the program with
				additional knowledge, that ``%condition`` is always uniform across the threads
				referenced by the convergence token ``%entry``.)

				.. _convergence_example_reductions:

				Reductions inside divergent control flow
				----------------------------------------

				The following example shows that merging common code of branches can be
				incorrect in the face of convergent operations:

.. code-block:: c++

  void example_kernel() {
    delta = ...
    if (delta > 0) {
      total_gains = subgroupAdd(delta);
      ...
    } else {
      total_losses = subgroupAdd(delta);
      ...
    }
  }

				The ``subgroupAdd`` computing the ``total_gains`` will be executed by the
				subset of threads with positive ``delta`` in a subgroup (wave), and so will sum
				up all the ``delta`` values of those threads; and similarly for the
				``subgroupAdd`` that computes the ``total_losses``.

				If we were to hoist and merge the ``subgroupAdd`` above the if-statement, it
				would sum up the ``delta`` across all threads instead.
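A toy simulation shows why the two placements differ. The C++ sketch below models a single subgroup with an explicit active-lane mask. Here ``subgroupAdd`` is a hypothetical host-side stand-in that sums over exactly the lanes executing it together. It is not a real intrinsic.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <vector>

  // Every lane in `active` receives the sum of `value` over exactly the
  // lanes that execute this reduction together.
  int subgroupAdd(const std::vector<int> &value,
                  const std::vector<bool> &active) {
    assert(value.size() == active.size());
    int sum = 0;
    for (std::size_t lane = 0; lane < value.size(); ++lane)
      if (active[lane])
        sum += value[lane];
    return sum;
  }

  // The original program: each branch reduces over its own subset of lanes.
  void branchedReductions(const std::vector<int> &delta, int &totalGains,
                          int &totalLosses) {
    std::vector<bool> gains(delta.size()), losses(delta.size());
    for (std::size_t lane = 0; lane < delta.size(); ++lane) {
      gains[lane] = delta[lane] > 0;
      losses[lane] = !gains[lane];
    }
    totalGains = subgroupAdd(delta, gains);
    totalLosses = subgroupAdd(delta, losses);
  }

With ``delta = {3, -2, 5, -1}``, the branched version produces ``total_gains == 8`` and ``total_losses == -3``. A single reduction hoisted above the branch sums all four lanes and produces 5.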

				The compiler frontend can emit IR that expresses the convergence constraints
				as follows:

.. code-block:: llvm

  define void @example_kernel() convergent {
    %entry = call token @llvm.experimental.convergence.entry()
    %delta = ...
    %cc = icmp sgt i32 %delta, 0
    br i1 %cc, label %then, label %else

  then:
    %total_gains = call i32 @subgroupAdd(i32 %delta) [ "convergencectrl"(token %entry) ]
    ...
    br label %end

  else:
    %total_losses = call i32 @subgroupAdd(i32 %delta) [ "convergencectrl"(token %entry) ]
    ...
    br label %end

  end:
    ...
    ret void
  }

The entry intrinsic behaves as in the previous example: assuming that
				``@example_kernel`` is an OpenCL kernel (as hinted at by the "subgroup"
				terminology), we expect it to communicate among all threads within the
				"subgroup". This typically maps to a SIMD vector on GPU hardware.

				The calls to ``@subgroupAdd`` use the token produced by the entry intrinsic,
				but they also have an additional control dependency. According to the rules
				defined in this document, they only communicate among the subset of threads
				that actually end up executing the respective (static) call site.

				Hoisting them would remove the control dependency and cause them to communicate
				among the full set of threads that the entry intrinsic communicated with.
				Again, hoisting is allowed if it can be proven that ``%cc`` is always uniform
				among the relevant set of threads: in that case, the ``@subgroupAdd`` already
				communicates among the full set of threads in the original program.

Motivating Examples of Convergence Control
==========================================

sameerds (author): New section to really bring out the benefit of explicit convergence control. Something that both @jdoerfert and @jsilvanus had asked about, at different points of time.

				(This section is informative.)

				Unstructured control flow
				-------------------------

				Consider an example of how jump threading removes structure in a way that can
				make semantics non-obvious without the convergence intrinsics described in this
				document:

.. code-block:: llvm

  define void @example_original() {
  entry:
    ...
    br i1 %cond1, label %then1, label %mid

  then1:
    ...
    %cond2 = ...
    br label %mid

  mid:
    %flag = phi i1 [ true, %entry ], [ %cond2, %then1 ]
    br i1 %flag, label %then2, label %end

  then2:
    ...
    call void @subgroupControlBarrier()
    ...
    br label %end

  end:
    ret void
  }

  define void @example_jumpthreaded() {
  entry:
    ...
    br i1 %cond1, label %then1, label %then2

  then1:
    ...
    %cond2 = ...
    br i1 %cond2, label %then2, label %end

  then2:
    ...
    call void @subgroupControlBarrier()
    ...
    br label %end

  end:
    ret void
  }

				Is the control barrier guaranteed to synchronize among the same set of threads
				in both cases? Different implementations in the literature may give different
				answers to this question:

* In an implementation that reconverges at post-dominators, threads reconverge
  at ``mid`` in the first version, so that all threads (within a subgroup/wave)
  that execute the control barrier do so together. In the second version,
  threads that reach the control barrier via different paths synchronize
  separately: the first (and only) post-dominator is ``end``, so threads do not
  reconverge before then.

* An implementation that sorts basic blocks topologically and ensures maximal
  reconvergence for each basic block would behave the same way in both
  versions.

				We generally take the stance that reconvergence in acyclic control flow must
				be maximal. The compiler frontend could augment the original code as follows:

.. code-block:: llvm

  define void @example_original() convergent {
  entry:
    %tok = call token @llvm.experimental.convergence.entry()
    ...
    br i1 %cond1, label %then1, label %mid

  then1:
    ...
    %cond2 = ...
    br label %mid

  mid:
    %flag = phi i1 [ true, %entry ], [ %cond2, %then1 ]
    br i1 %flag, label %then2, label %end

  then2:
    ...
    call void @subgroupControlBarrier() [ "convergencectrl"(token %tok) ]
    ...
    br label %end

  end:
    ret void
  }

If ``S`` is the set of threads that the entry intrinsic communicated with,
then the ``@subgroupControlBarrier`` call communicates with the subset of ``S``
that actually reaches the call site. This set of threads doesn't change after
jump-threading, so the answer to the question posed above remains the same.

				.. _opportunistic_convergence:

				Opportunistic convergent operations
				-----------------------------------

				Some programs have local regions of code that contain a sequence of convergent
				operations where the code does not care about the exact set of threads with
				which it is executed, but only that the set of threads is the same for all the
				operations within the sequence. (If a subset of the convergent operations in the
				sequence have additional, non-uniform control dependencies, then this is not
				possible. However, the code may still require that the sets of threads are
				logically consistent with the conditions of those control dependencies.) In this
				case, :ref:`llvm.experimental.convergence.anchor
				<llvm.experimental.convergence.anchor>` can be used to express the desired
				semantics.

				The following example function could be part of a hypothetical "append buffer"
implementation, where threads conditionally write fixed-size records
				contiguously into a global buffer. The function ``@reserveSpaceInBuffer``
				returns the index into the buffer at which the calling thread should store its
				data.

				This could be achieved by using a simple atomic operation in every thread to
				bump an allocation counter.

				However, the following implementation can be more performant on some hardware,
				because it uses only a single atomic operation for an entire group of threads.
				To do this, it first determines the total size of the group, which will be the
				operand to the atomic operation, and then later broadcasts the result of the
				atomic operation to all threads of the group, so that each thread can compute
				its individual position in the buffer:

.. code-block:: llvm

  define i32 @reserveSpaceInBuffer() { ; NOTE: _not_ a convergent function!
  entry:
    %anchor = call token @llvm.experimental.convergence.anchor()

    %ballot = call i64 @subgroupBallot(i1 true) [ "convergencectrl"(token %anchor) ]
    %numThreads.p = call i64 @llvm.ctpop.i64(i64 %ballot)
    %numThreads = trunc i64 %numThreads.p to i32

    %absoluteThreadIdx = call i32 @getSubgroupLocalInvocationId()
    %absoluteThreadIdx.ext = zext i32 %absoluteThreadIdx to i64
    %mask.p = shl i64 1, %absoluteThreadIdx.ext
    %mask = sub i64 %mask.p, 1

    %maskedBallot = and i64 %ballot, %mask
    %relativeThreadIdx.p = call i64 @llvm.ctpop.i64(i64 %maskedBallot)
    %relativeThreadIdx = trunc i64 %relativeThreadIdx.p to i32

    %isFirstThread = icmp eq i32 %relativeThreadIdx, 0
    br i1 %isFirstThread, label %then, label %end

  then:
    %baseOffset.1 = atomicrmw add ptr @bufferAllocationCount, i32 %numThreads monotonic
    br label %end

  end:
    %baseOffset.2 = phi i32 [ undef, %entry ], [ %baseOffset.1, %then ]
    %baseOffset = call i32 @subgroupBroadcastFirst(i32 %baseOffset.2) [ "convergencectrl"(token %anchor) ]
    %offset = add i32 %baseOffset, %relativeThreadIdx
    ret i32 %offset
  }

arsenm: Use opaque pointers

				The key here is that the function really doesn't care which set of threads it
				is being called with. It takes whatever set of threads it can get. What the
				implementation of the function cares about is that the initial
				``@subgroupBallot`` -- which is used to retrieve the bitmask of threads that
				executed the anchor together -- executes with the same set of threads as the
				final ``@subgroupBroadcastFirst``. Nothing else is required for correctness as
				far as convergence is concerned.
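The contract described above can be mirrored on the host for a single group. The following C++ sketch is hypothetical: ``reserveSpaceForGroup`` and the 64-lane limit are assumptions. It builds the ballot for one group of lanes that executed the anchor together. It then performs one atomic add on behalf of the group, in place of the first lane's ``atomicrmw``. Finally, it gives each lane the shared base offset plus its rank among the active lanes, in place of ``@subgroupBroadcastFirst``.

.. code-block:: c++

  #include <atomic>
  #include <bitset>
  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Simulate one group of lanes (distinct subgroup-local ids, each < 64)
  // that executed the anchor together; return each lane's buffer offset.
  std::vector<std::uint32_t>
  reserveSpaceForGroup(const std::vector<unsigned> &activeLanes,
                       std::atomic<std::uint32_t> &bufferAllocationCount) {
    std::vector<std::uint32_t> offsets;
    if (activeLanes.empty())
      return offsets;

    // subgroupBallot(true): one bit per lane of the group.
    std::uint64_t ballot = 0;
    for (unsigned lane : activeLanes) {
      assert(lane < 64 && "lane ids are assumed to fit in a 64-bit ballot");
      ballot |= std::uint64_t{1} << lane;
    }
    const auto numThreads =
        static_cast<std::uint32_t>(std::bitset<64>(ballot).count());

    // One atomic add for the whole group, standing in for the first lane's
    // `atomicrmw ... monotonic` (relaxed ordering in C++ terms). Its result
    // is shared by all lanes, standing in for @subgroupBroadcastFirst.
    const std::uint32_t baseOffset =
        bufferAllocationCount.fetch_add(numThreads, std::memory_order_relaxed);

    // Each lane's rank = number of active lanes with a smaller id.
    for (unsigned lane : activeLanes) {
      const std::uint64_t lowerLanes = (std::uint64_t{1} << lane) - 1;
      const auto relativeIdx = static_cast<std::uint32_t>(
          std::bitset<64>(ballot & lowerLanes).count());
      offsets.push_back(baseOffset + relativeIdx);
    }
    return offsets;
  }

Starting from a counter of 10, a group with lanes ``{1, 4, 7}`` receives offsets 10, 11 and 12 and leaves the counter at 13. A later group with lanes ``{0, 2}`` receives 13 and 14. The anchor leaves the split of lanes into groups to the implementation, so only the order of records in the buffer changes.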

The function ``@reserveSpaceInBuffer`` itself is *not* ``convergent``: callers
				are free to move call sites of the function as they see fit. This can change
				the behavior in practice, by changing the sets of threads that are grouped
				together for the atomic operation. This can be visible in the output of the
				program, since the order in which outputs appear in the buffer is changed.
				However, this does not break the overall contract that ``@reserveSpaceInBuffer``
				has with its caller -- which makes sense: the order of outputs is
				non-deterministic anyway because of the atomic operation that is involved.

				If the function is inlined, the use of the anchor intrinsic similarly indicates
				that certain transforms which are usually forbidden by the presence of
				convergent operations are in fact allowed, as long as they don't break up the
				region of code that is controlled by the anchor.

				.. _convergence_high-level_break:

				Extended Cycles: Divergent Exit from a Loop
				-------------------------------------------

				High-level languages typically provide a ``break`` statement that transfers
				control out of a loop statement. In most cases, the loop is structured and hence
				there is no ambiguity about convergence inside the loop. But an ambiguity arises
				when a ``break`` is control dependent on a divergent condition inside the loop.
				Consider the following example:

.. code-block:: c++

  void example() {
    // A
    ...
    for (...) {
      // B
      if (condition) { // divergent condition
        // C
        convergent_op();
        break;
      }
      // D
      ...
    }
    // E
  }

In this program, the call to ``convergent_op()`` is lexically "inside" the
``for`` loop. But when translated to LLVM IR, the basic block ``B`` is an
exiting block ending in a divergent branch, and the basic block ``C`` is an
exit of the loop. Thus, the call to ``convergent_op()`` is outside the loop.
This causes a mismatch between the programmer's expectation and the compiled
program. The programmer expects the call to be executed convergently, on each
iteration of the loop, by the threads that together take the branch to exit
the loop on that iteration. But when compiled, all threads that take the
divergent exit on different iterations first converge at the beginning of
basic block ``C`` and then together execute the call to ``convergent_op()``.

In this case, :ref:`llvm.experimental.convergence.loop
<llvm.experimental.convergence.loop>` can be used to express the desired
semantics. A call to this intrinsic, placed in the loop header, tracks each
iteration of the loop. The token it produces is used as a ``convergencectrl``
operand to the convergent call. The semantics of the ``loop`` intrinsic ensure
that the convergent call is performed convergently only by those threads that
convergently exited the loop in a given iteration.

.. code-block:: llvm

  define void @example() convergent {
    %entry = call token @llvm.experimental.convergence.entry()
    br label %for

  for:
    %inner = call token @llvm.experimental.convergence.loop() ["convergencectrl"(token %entry)]
    %for.cond = i1 ...
    br i1 %for.cond, label %B, label %E

  B:
    ...
    %condition = i1 ...
    br i1 %condition, label %C, label %D

  C:
    call void @convergent_op() ["convergencectrl"(token %inner)]
    br label %E

  D:
    ...
    br label %for

  E:
    ...
    ret void
  }

				The LLVM IR version of the same program shows a cycle consisting of the basic
				blocks ``%for``, ``%B`` and ``%D``, while ``%C`` is an exit reached by the
				divergent branch at the end of the exiting block ``%B``. But the use of
				convergence control tokens makes it clear that block ``%C`` must be executed
convergently only by those threads that convergently take the exit edge from ``%B``
				to ``%C``. In other words, the convergent execution of ``%C`` is governed by the
				call to the :ref:`llvm.experimental.convergence.loop
				<llvm.experimental.convergence.loop>` intrinsic inside the cycle. The cycle is
				effectively extended to include all uses of this token that lie outside the
				cycle.
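Assume that all threads entered the loop together, so they share one dynamic instance of ``%entry``. Under that assumption, the effect of the ``%inner`` token can be sketched as a grouping by iteration. Each execution of the loop intrinsic in ``%for`` produces a new token, so only threads that leave through ``%C`` on the same iteration execute ``@convergent_op`` together. The C++ function name and the map-based input are hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <set>

  // Input: for each thread that leaves the loop through %C, the iteration
  // (counted from 0) on which it did so. Threads leaving via %E are absent.
  // Output: iteration -> set of threads that execute @convergent_op
  // together, since the call uses the token of that iteration's
  // llvm.experimental.convergence.loop.
  std::map<unsigned, std::set<unsigned>>
  groupsWithLoopToken(const std::map<unsigned, unsigned> &exitIterationOfThread) {
    std::map<unsigned, std::set<unsigned>> groups;
    for (const auto &entry : exitIterationOfThread)
      groups[entry.second].insert(entry.first);
    return groups;
  }

Suppose threads 0 and 2 exit on iteration 2, thread 1 on iteration 0 and thread 3 on iteration 5. The sketch then yields three separate groups. The uncontrolled compilation described earlier would instead have all four threads execute the call together at the beginning of ``%C``.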

				.. _dynamic_instances_and_convergence_tokens:

				Dynamic Instances and Convergence Tokens
				========================================

				Every execution of an LLVM IR instruction occurs in a :ref:`dynamic instance
				<convergence-dynamic-instances>` of the instruction. Dynamic instances are the
				formal objects by which we talk about communicating threads in convergent
				operations. Dynamic instances are defined for all operations in an LLVM
				program, whether convergent or not. Convergence control is primarily about the
				dynamic instances of convergent operations since they affect execution of the
				program through inter-thread communication. The dynamic instances for
				non-convergent operations are relevant for determining :ref:`uniformity
				<convergence-and-uniformity>` of values.

				Dynamic instances produced by the execution of the same convergent operation
				by different threads may be :ref:`converged <convergence-definition>`. When
				executing a convergent operation, the set of threads that execute converged
				dynamic instances is the set of threads that communicate with each other.
				Convergence tokens capture this convergence as described below.

				Convergence tokens are values of ``token`` type, i.e. they cannot be used in
				``phi`` or ``select`` instructions. A convergence token value represents the
				dynamic instance of the instruction that produced it.

				Convergent operations may have an optional ``convergencectrl`` operand bundle with
				a convergence token operand to define the set of communicating threads relative
				to the operation that defined the token.

				Let ``U`` be a convergent operation other than a call to a convergence
				control intrinsic, and ``D`` be the convergent operation that defines
				the token value used as the ``convergencectrl`` operand to ``U``. Two
				threads execute converged dynamic instances of ``U`` if and only if the
				token value in both threads was returned by converged dynamic
				instances of ``D``.
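The rule above can be modeled in a few lines. The following C++ sketch is illustrative only and is not LLVM code: it assumes a hypothetical encoding in which each dynamic instance of ``D`` carries an integer "converged group" identifier, and the names ``DefInstance``, ``TokenValue`` and ``convergedAtUse`` are invented for this example. It decides whether two threads execute converged dynamic instances of ``U`` from the token values they used.

.. code-block:: c++

   #include <cassert>
   #include <string>

   // Hypothetical model, not an LLVM API: a dynamic instance of the token
   // definition D is summarized by an integer "converged group"; instances that
   // share a group are converged with each other.
   struct DefInstance {
     std::string staticOp;  // static instruction that produced the token, e.g. "D"
     int convergedGroup;    // equal groups <=> converged dynamic instances of D
   };

   // A convergence token value represents the dynamic instance that produced it.
   struct TokenValue {
     DefInstance producer;
   };

   // Rule for a convergent operation U (not a convergence control intrinsic)
   // whose convergencectrl operand is defined by D: two threads execute converged
   // dynamic instances of U if and only if their token values were returned by
   // converged dynamic instances of D.
   bool convergedAtUse(const TokenValue &inThread1, const TokenValue &inThread2) {
     assert(inThread1.producer.staticOp == inThread2.producer.staticOp &&
            "both threads must use the token defined by the same D");
     return inThread1.producer.convergedGroup == inThread2.producer.convergedGroup;
   }

For example, two threads whose tokens came from instances of ``D`` in the same group execute converged instances of ``U``; a thread whose token came from a different group does not.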

.. note::

   The text defines convergence token values as representing dynamic instances.
   But if we were to assume that converged dynamic instances produce the same
   token value, then we could almost think of the token value as representing a
   set of threads instead: specifically, the set ``S`` of threads that
   executed converged dynamic instances of the defining instruction ``D``.

   In this intuitive picture, when a convergence token value ``T`` is used by a
   ``convergencectrl`` bundle on an instruction ``I``, then the set of threads that
   communicates in ``I`` is a subset of the set ``S`` represented by the token value.
   Specifically, it is the subset of threads that ends up executing ``I`` while
   using the token value.

   This by itself wouldn't quite work as a definition: what if ``I`` is executed
   multiple times by the same threads? Which execution of ``I`` in thread 1
   communicates with which execution of ``I`` in thread 2? Leaning on the notion
   of dynamic instances gives a robust answer to this question as long as ``D``
   and ``I`` are at the same loop (or cycle) nesting level.

   The case where ``D`` and ``I`` are at different loop nesting levels is
   forbidden by the :ref:`static rules <convergence_static_rules>`; handling
   that case is the purpose of :ref:`llvm.experimental.convergence.loop
   <llvm.experimental.convergence.loop>`.

				.. _convergence_control_intrinsics:

				Convergence Control Intrinsics
				==============================

				This section describes target-independent intrinsics that can be used to
				produce convergence tokens.

				Behaviour is undefined if a convergence control intrinsic is called
				indirectly.

				.. _llvm.experimental.convergence.entry:

				``llvm.experimental.convergence.entry``
				----------------------------------------

.. code-block:: llvm

   token @llvm.experimental.convergence.entry() convergent readnone

				This intrinsic is used to tie the dynamic instances inside of a function to
				those in the caller.

1. If the function is called from outside the scope of LLVM, the convergence of
   dynamic instances of this intrinsic is environment-defined. For example:

   a. In an OpenCL kernel launch, the maximal set of threads that
      can communicate outside the memory model is a workgroup.
      Hence, a suitable choice is to specify that all the threads from
      a single workgroup in OpenCL execute converged dynamic instances
      of this intrinsic.
   b. In a C/C++ program, threads are launched independently and they can
      communicate only through the memory model. Hence the dynamic instances of
      this intrinsic in a C/C++ program are never converged.

2. If the function is called from a call-site in LLVM IR, then two
   threads execute converged dynamic instances of this intrinsic if and
   only if both threads entered the function by executing converged
   dynamic instances of the call-site.

				This intrinsic can occur at most once in a function, and only at the start of
				the entry block of the function.

				It is an error if this intrinsic appears in a non-convergent function.

				It is an error to specify a ``convergencectrl`` operand bundle at a
				call to this intrinsic.

Function inlining substitutes this intrinsic with the token from the
``convergencectrl`` operand bundle at the call-site. For example:

.. code-block:: c++

   // Before inlining:

   void callee() convergent {
     %tok = call token @llvm.experimental.convergence.entry()
     convergent_operation(...) [ "convergencectrl"(token %tok) ]
   }

   void main() {
     %outer = call token @llvm.experimental.convergence.anchor()
     for (...) {
       %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
       callee() [ "convergencectrl"(token %inner) ]
     }
   }

   // After inlining:

   void main() {
     %outer = call token @llvm.experimental.convergence.anchor()
     for (...) {
       %inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
       convergent_operation(...) [ "convergencectrl"(token %inner) ]
     }
   }
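A minimal sketch of that substitution follows, assuming a hypothetical toy instruction list rather than LLVM's own IR classes (``Instr`` and ``inlineBody`` are invented for illustration). It performs only the token substitution and ignores the value renaming that a real inliner also has to do.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical toy IR, not LLVM's data structures: an instruction may define
   // a token and may carry a "convergencectrl" bundle naming a token.
   struct Instr {
     std::string result;     // name of the defined token, empty if none
     std::string callee;     // called function or intrinsic
     std::string ctrlToken;  // token in the convergencectrl bundle, empty if none
   };

   // Inlines `body` at a call-site whose convergencectrl bundle uses `siteToken`:
   // the call to llvm.experimental.convergence.entry is dropped and every use of
   // the token it defined is replaced by the call-site token. Renaming of other
   // values, which a real inliner must also do, is ignored here.
   std::vector<Instr> inlineBody(const std::vector<Instr> &body,
                                 const std::string &siteToken) {
     std::map<std::string, std::string> replacement;
     std::vector<Instr> inlined;
     for (const Instr &inst : body) {
       if (inst.callee == "llvm.experimental.convergence.entry") {
         // The entry intrinsic may only appear at the start of the entry block.
         assert(inlined.empty() && "entry intrinsic must come first");
         replacement[inst.result] = siteToken;
         continue;
       }
       Instr copy = inst;
       auto it = replacement.find(copy.ctrlToken);
       if (it != replacement.end())
         copy.ctrlToken = it->second;
       inlined.push_back(copy);
     }
     return inlined;
   }

Applied to the ``callee`` above with the call-site token ``inner``, the sketch yields a single ``convergent_operation`` whose bundle names ``inner``, mirroring the "After inlining" listing.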

				.. _llvm.experimental.convergence.loop:

				``llvm.experimental.convergence.loop``
				--------------------------------------

.. code-block:: llvm

   token @llvm.experimental.convergence.loop() [ "convergencectrl"(token) ] convergent readnone

				This intrinsic represents the place where an imaginary counter is incremented
				for determining convergence inside a control flow cycle.

				Let ``U`` be a call to this intrinsic and ``D`` be the convergent operation that
				defines the token value used as the ``convergencectrl`` operand to ``U``. Two
				threads execute converged dynamic instances of ``U`` if and only if:

1. The token value in both threads was returned by converged dynamic
   instances of ``D``, and,
2. There is an integer ``n`` such that both threads execute ``U`` for the
   ``n``-th time with that token value.
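The imaginary counter can be sketched as a per-thread map from token value to execution count. This is a hypothetical C++ model, not part of LLVM: token values are summarized by the converged group of the instance of ``D`` that produced them, and ``HeartInstance``, ``HeartCounter`` and ``converged`` are invented names.

.. code-block:: c++

   #include <cassert>
   #include <map>

   // Hypothetical model of llvm.experimental.convergence.loop (U). The token
   // operand is summarized by the converged group of the dynamic instance of D
   // that produced it: equal groups mean converged instances of D.
   struct HeartInstance {
     int parentGroup;  // converged group of the token operand's producer
     int n;            // this is the n-th execution of U with that token value
   };

   // Per-thread imaginary counter, kept separately for each token value. Within
   // one thread, different token values come from different (non-converged)
   // instances of D, so the group identifies the token value.
   class HeartCounter {
   public:
     HeartInstance execute(int parentGroup) {
       int n = ++count_[parentGroup];
       return HeartInstance{parentGroup, n};
     }

   private:
     std::map<int, int> count_;
   };

   // Converged if and only if the token operands come from converged instances
   // of D and both threads execute U for the n-th time with that token value.
   bool converged(const HeartInstance &a, const HeartInstance &b) {
     return a.parentGroup == b.parentGroup && a.n == b.n;
   }

Two threads that each run two iterations with tokens from the same group get instances ``{g, 1}`` and ``{g, 2}``; only instances with equal ``n`` are converged.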

				It is an error to omit the ``convergencectrl`` operand bundle on a
				call to this intrinsic.

				This intrinsic can only occur at the start of a basic block.

				.. _convergence_cycle_heart:

				Heart of a Cycle:

				If a :ref:`cycle <cycle-terminology>` ``C`` contains an occurrence ``H`` of
				this intrinsic whose token operand is defined outside ``C``, then ``H`` is
				called the heart of ``C``.

.. note::

   The static rules for cycles imply that a heart can occur only in the header
   of a natural loop. This ensures that the heart closely represents the
   intuitive notion of a loop iteration. If this restriction is relaxed, the
   resulting semantics provides a new notion of "cycle iteration" even for
   irreducible cycles. But this allows a natural loop to have a heart in a
   node other than its header, which has interesting consequences for the
   meaning of a loop iteration in terms of convergence. For now, we disallow
   this situation since its practical application is very rare.

				.. _llvm.experimental.convergence.anchor:

				``llvm.experimental.convergence.anchor``
				----------------------------------------

.. code-block:: llvm

   token @llvm.experimental.convergence.anchor() convergent readnone

This intrinsic produces an initial convergence token that is independent of
any "outer scope". The set of threads executing converged dynamic instances of
this intrinsic is implementation-defined.

				It is an error to pass a ``convergencectrl`` operand bundle at a
				call to this intrinsic.

.. note::

   The expectation is that all threads within a group that "happen to be active
   at the same time" will execute converged dynamic instances, so that programs
   can detect the maximal set of threads that can communicate efficiently within
   some local region of the program.

				.. _convergence_uncontrolled:

				Uncontrolled Convergent Operations
				==================================

				Convergent operations with an explicit ``convergencectrl`` operand bundle are
				called controlled convergent operations. All other convergent operations are
				said to be uncontrolled.

				An uncontrolled convergent operation is said to have *implicit convergence
				control* determined by the ``convergent`` attribute alone. The semantics of the
				``convergent`` attribute as implemented in LLVM differs from the documented
				semantics. The implementation tries to follow common intuition about convergent
				operations, which remains under-specified. As such, it is not possible to fully
				translate implicit convergence control into explicit convergence control tokens,
				and these two modes cannot be mixed in the same function.

				If a function contains a controlled convergent operation, then all convergent
				operations in that function must either be controlled operations or calls to
				the convergence control intrinsics.
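This restriction amounts to a simple check over the convergent operations of a function. The sketch below assumes a hypothetical list-of-operations summary (``ConvergentOp``, ``isConvergenceControlIntrinsic`` and ``satisfiesMixingRule`` are invented names) and does not reproduce how LLVM's verifier implements the check.

.. code-block:: c++

   #include <cassert>
   #include <string>
   #include <vector>

   // Hypothetical summary of one convergent operation in a function.
   struct ConvergentOp {
     std::string callee;
     bool hasConvergenceCtrlBundle;  // controlled iff the bundle is present
   };

   bool isConvergenceControlIntrinsic(const std::string &callee) {
     return callee == "llvm.experimental.convergence.entry" ||
            callee == "llvm.experimental.convergence.anchor" ||
            callee == "llvm.experimental.convergence.loop";
   }

   // If any operation is controlled, every convergent operation must be either
   // controlled or a call to a convergence control intrinsic.
   bool satisfiesMixingRule(const std::vector<ConvergentOp> &ops) {
     bool anyControlled = false;
     for (const ConvergentOp &op : ops)
       if (op.hasConvergenceCtrlBundle)
         anyControlled = true;
     if (!anyControlled)
       return true;  // implicit convergence control only
     for (const ConvergentOp &op : ops)
       if (!op.hasConvergenceCtrlBundle &&
           !isConvergenceControlIntrinsic(op.callee))
         return false;  // uncontrolled operation next to controlled ones
     return true;
   }

A function whose only bundle-less convergent call is the entry intrinsic passes; adding one more uncontrolled operation to it makes the check fail.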

				Inferring Tokens
				----------------

(This section is informative.)

				Sometimes, it may be necessary to reinterpret the implicit convergence control
				in terms of explicit convergence control tokens. For example, this may happen
				when a function call is inlined, and either the caller or the callee contains
				uncontrolled convergent operations.

				Some uses of uncontrolled convergent operations may need to satisfy the
				following property:

   For an environment-defined group of threads (such as an OpenCL workgroup or
   subgroup), if one thread in the group executes a convergent operation, then
   all threads in the group do so convergently with that thread.

				In terms of explicit convergence control, this means that the
				``convergencectrl`` operand on each convergent operation ``X`` must ultimately
				originate from a call to the :ref:`llvm.experimental.convergence.entry
				<llvm.experimental.convergence.entry>` intrinsic. This preserves the possibility
				that the group of threads that converge on reaching ``X`` is the same group that
				originally started executing the program in convergence. In comparison, the
				:ref:`llvm.experimental.convergence.anchor
				<llvm.experimental.convergence.anchor>` intrinsic captures an
				implementation-defined group of threads, which is insufficient to support the
				above property.

				One way to approximate implicit convergence control in terms of explicit
				convergence control tokens is the following procedure, which preserves the above
				mentioned property:

1. Convert every irreducible cycle into a reducible cycle.
2. Insert a call to :ref:`llvm.experimental.convergence.entry
   <llvm.experimental.convergence.entry>` at the start of the entry block of the
   function.
3. Insert a call to :ref:`llvm.experimental.convergence.loop
   <llvm.experimental.convergence.loop>` at the start of every loop header. If
   this loop is an outermost loop, the ``convergencectrl`` operand is the call
   to :ref:`llvm.experimental.convergence.entry
   <llvm.experimental.convergence.entry>` in the entry block of the function.
   Otherwise, the ``convergencectrl`` operand is the call to
   :ref:`llvm.experimental.convergence.loop
   <llvm.experimental.convergence.loop>` in the parent loop's header.
4. For each uncontrolled convergent operation ``X``, add a ``convergencectrl``
   operand bundle using the token defined by a definition ``D`` that is a
   :ref:`sibling <cycle-sibling>` to this operation. ``D`` always dominates
   ``X``: if ``X`` is not in any cycle, then ``D`` is a call to
   :ref:`llvm.experimental.convergence.entry
   <llvm.experimental.convergence.entry>`; otherwise ``D`` is the heart of the
   parent cycle of ``X``.
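Steps 2 to 4 can be sketched on a summary of the loop nest. This hypothetical C++ model assumes step 1 has already produced reducible loops, represents each loop only by its header name, and spells tokens as strings such as ``%entry`` and ``%loop.<header>``; none of these names come from LLVM.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical loop-nest summary, assuming step 1 (making every cycle
   // reducible) already ran: each loop is named by its header block and records
   // the header of its parent loop (empty for an outermost loop).
   struct Loop {
     std::string header;
     std::string parentHeader;
   };

   struct TokenAssignment {
     // Loop header -> operand of the loop intrinsic inserted there (step 3).
     std::map<std::string, std::string> heartOperand;
     // Convergent operation -> token placed in its convergencectrl bundle (step 4).
     std::map<std::string, std::string> opToken;
   };

   const std::string kEntryToken = "%entry";  // result of the step 2 entry call

   std::string heartToken(const std::string &header) { return "%loop." + header; }

   // `innermostLoop` maps each uncontrolled convergent operation to the header of
   // the innermost loop containing it (empty if it is in no loop).
   TokenAssignment inferTokens(
       const std::vector<Loop> &loops,
       const std::map<std::string, std::string> &innermostLoop) {
     TokenAssignment result;
     for (const Loop &loop : loops)
       result.heartOperand[loop.header] =
           loop.parentHeader.empty() ? kEntryToken : heartToken(loop.parentHeader);
     for (const auto &op : innermostLoop)
       result.opToken[op.first] =
           op.second.empty() ? kEntryToken : heartToken(op.second);
     return result;
   }

For a nest ``outer`` containing ``inner``, the heart of ``outer`` uses ``%entry``, the heart of ``inner`` uses ``%loop.outer``, and each operation uses the heart token of its innermost loop, or ``%entry`` outside all loops.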

				.. _convergence_static_rules:

				Static Rules
				============

				A well-formed program in LLVM IR must satisfy the following static
				rules about cycles and convergence regions.

				Closed Paths
				------------

				A :ref:`closed path <cycle-closed-path>` in a CFG is a connected sequence of
				nodes and edges in the CFG whose start and end points are the same.

1. Every closed path in the CFG that contains a use of a convergence token T other
   than a use by
   :ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
   must also contain the definition of T.

2. Every closed path in the CFG that contains two different uses of a convergence
   token T must also contain the definition of T.

3. Every closed path in the CFG that contains uses of two different convergence tokens
   T1 and T2 must also contain the definition of at least one of them.

Taken together, these rules imply that for every closed path C, there can be at most
one convergence token T which is used in C but defined outside of it, and that
T can be used only once in C, and only by ``llvm.experimental.convergence.loop``.

4. In every closed path that contains a use U of a token T but not the
   definition of T, U must dominate all nodes in the closed path.

This implies that ``llvm.experimental.convergence.loop`` can appear as a heart
only in the header of a natural loop.

				Sufficient Conditions: From the :ref:`properties of cycles
				<cycle-closed-path>`, it is sufficient to prove the above properties
				for cycles instead of closed paths. Briefly, any closed path that violates
				one or more of the above static rules is contained in a cycle that also
				violates the same rule(s).
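For a single cycle given as a set of blocks, rules 1 to 3 can be checked as in the following sketch. It is a hypothetical C++ model (``TokenUse`` and ``satisfiesCycleRules`` are invented names); it does not enumerate the cycles of a CFG and does not check rule 4, which needs dominance information.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <set>
   #include <string>
   #include <vector>

   // Hypothetical record of one use of a convergence token.
   struct TokenUse {
     std::string token;
     std::string block;
     bool byLoopIntrinsic;  // true for a use by llvm.experimental.convergence.loop
   };

   // Checks static rules 1-3 for one cycle, given as its set of blocks.
   // `defBlock` maps every token to the block that contains its definition.
   bool satisfiesCycleRules(const std::set<std::string> &cycle,
                            const std::map<std::string, std::string> &defBlock,
                            const std::vector<TokenUse> &uses) {
     auto definedInside = [&](const std::string &token) {
       return cycle.count(defBlock.at(token)) != 0;
     };
     std::map<std::string, int> useCount;  // uses inside the cycle, per token
     for (const TokenUse &use : uses) {
       if (cycle.count(use.block) == 0)
         continue;
       ++useCount[use.token];
       // Rule 1: any use other than by the loop intrinsic needs the definition.
       if (!use.byLoopIntrinsic && !definedInside(use.token))
         return false;
     }
     int tokensFromOutside = 0;
     for (const auto &entry : useCount) {
       if (definedInside(entry.first))
         continue;
       // Rule 2: two different uses of the same token need its definition.
       if (entry.second > 1)
         return false;
       ++tokensFromOutside;
     }
     // Rule 3: uses of two different tokens need at least one of the definitions.
     return tokensFromOutside <= 1;
   }

With ``%anchor`` defined outside a single-block loop ``B``, a plain convergent use of ``%anchor`` in ``B`` violates rule 1, while a loop intrinsic in ``B`` that consumes ``%anchor`` and feeds the convergent operation satisfies rules 1 to 3.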

				.. _convergence_region:

				Convergence Regions
				-------------------

				The convergence region of a convergence token T is the minimal region in
				which T is live and used, i.e., the set of program points dominated by the
				definition D of T from which a use of T can be reached.

				The following static rule about convergence regions must be satisfied by
				valid programs:

				If a convergence region R for a token T1 contains a use of a convergence
				token T2, then R must also contain the definition of T2. (In other words,
				convergence regions must be reasonably nested.)

.. note::

   For brevity, this document uses the term "convergence region of a token
   definition ``D``" to actually refer to the convergence region of the token
   ``T`` defined by ``D``.
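At basic-block granularity, the region and the nesting rule can be approximated as below. The sketch is hypothetical (``canReach``, ``convergenceRegion`` and ``properlyNested`` are invented names): it takes the set of blocks dominated by the defining block as an input instead of computing dominators, and it works on whole blocks rather than individual program points, so it is coarser than the definition above.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <set>
   #include <string>
   #include <vector>

   using Block = std::string;
   using CFG = std::map<Block, std::vector<Block>>;  // successor lists

   // Blocks from which `target` can be reached (including `target` itself).
   std::set<Block> canReach(const CFG &cfg, const Block &target) {
     std::set<Block> result{target};
     bool changed = true;
     while (changed) {
       changed = false;
       for (const auto &node : cfg) {
         if (result.count(node.first))
           continue;
         for (const Block &succ : node.second) {
           if (result.count(succ)) {
             result.insert(node.first);
             changed = true;
             break;
           }
         }
       }
     }
     return result;
   }

   // Blocks dominated by the definition from which some use can be reached.
   std::set<Block> convergenceRegion(const CFG &cfg,
                                     const std::set<Block> &dominatedByDef,
                                     const std::vector<Block> &useBlocks) {
     std::set<Block> region;
     for (const Block &use : useBlocks)
       for (const Block &b : canReach(cfg, use))
         if (dominatedByDef.count(b))
           region.insert(b);
     return region;
   }

   // Nesting rule: if the region of T1 contains a use of T2, it must also
   // contain the definition of T2.
   bool properlyNested(const std::set<Block> &regionOfT1,
                       const std::vector<Block> &useBlocksOfT2,
                       const Block &defBlockOfT2) {
     for (const Block &use : useBlocksOfT2)
       if (regionOfT1.count(use) && !regionOfT1.count(defBlockOfT2))
         return false;
     return true;
   }

In a straight-line CFG ``entry -> A -> B -> C -> exit`` with ``T1`` defined in ``A`` and used in ``C``, the region is ``{A, B, C}``; a token defined in ``B`` and used in ``C`` nests properly, while a token defined in ``entry`` and used in ``B`` does not.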

				.. _inferring_noconvergent:

				Inferring non-convergence
				=========================

				When the target or the environment guarantees that threads do not
				communicate using convergent operations or that threads never diverge,
				the dynamic instances in the program are irrelevant and an optimizer
				may remove any occurrence of the ``convergent`` attribute on a
				call-site or a function and any explicit ``convergencectrl`` operand
				bundle at a call-site.

				An optimizer may remove the ``convergent`` attribute and any explicit
				``convergencectrl`` operand bundle from a call-site if it can prove
				that the execution of this call-site always results in a call to a
				non-convergent function.

				An optimizer may remove the ``convergent`` attribute on a function if it can
				prove that the function does not contain a call to
				:ref:`llvm.experimental.convergence.entry
				<llvm.experimental.convergence.entry>`, or any uncontrolled convergent
				operations.
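The last condition can be phrased as a predicate over a function's operations. This C++ sketch is hypothetical: it assumes each operation has already been classified (the ``OpKind`` values and ``mayDropConvergentAttr`` are invented for illustration) and leaves the classification of the anchor and loop intrinsics to the caller.

.. code-block:: c++

   #include <cassert>
   #include <vector>

   // Hypothetical classification of the operations in a function; how each
   // operation is classified is assumed to be known.
   enum class OpKind {
     EntryIntrinsic,  // call to llvm.experimental.convergence.entry
     ControlledOp,    // convergent operation with a convergencectrl bundle
     UncontrolledOp,  // convergent operation without a convergencectrl bundle
     NonConvergent    // anything else
   };

   // The convergent attribute of a function may be removed if the function
   // contains neither a call to the entry intrinsic nor an uncontrolled
   // convergent operation.
   bool mayDropConvergentAttr(const std::vector<OpKind> &ops) {
     for (OpKind kind : ops)
       if (kind == OpKind::EntryIntrinsic || kind == OpKind::UncontrolledOp)
         return false;
     return true;
   }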

				Memory Model Non-Interaction
				============================

				The fact that an operation is convergent has no effect on how it is treated for
				memory model purposes. In particular, an operation that is ``convergent`` and
				``readnone`` does not introduce additional ordering constraints as far as the
				memory model is concerned. There is no implied barrier, neither in the memory
				barrier sense nor in the control barrier sense of synchronizing the execution
				of threads.

				Informational note: Threads that execute converged dynamic instances do not
				necessarily do so at the same time.


				Other Interactions
				==================

				A function can be both ``convergent`` and
				``speculatable``, indicating that the function does not have undefined
				behavior and has no effects besides calculating its result, but is still
				affected by the set of threads executing this function. This typically
				prevents speculation of calls to the function unless the constraint imposed
				by ``convergent`` is further relaxed by some other means.

				Controlled Maximal Convergence
				==============================

				The :ref:`converged-with relation <convergence-definition>` over dynamic
				instances of each controlled convergent operation is completely defined by the
				semantics of convergence tokens. But the implementation-defined convergence at a
				call to :ref:`llvm.experimental.convergence.anchor
				<llvm.experimental.convergence.anchor>` also depends on the cycle hierarchy
				chosen if it occurs inside an irreducible cycle.

				When the token defined by a convergent operation ``D`` is used at another
				convergent operation ``U``, the implementation must ensure that the threads that
				converge at ``U`` are all the threads that reached ``U`` after converging at
				``D``. On most implementations, it is reasonable to assume that only these
				threads are converged at every node they reach on any path from ``D`` to ``U``.
				In other words, the converged-with relation at ``D`` produces groups of threads
				that can converge only within each group, while inside the convergence region of
				``D``.

				All this affects the :ref:`maximal converged-with relation
				<convergence-maximal>` over dynamic instances and in turn the :ref:`m-converged
				property <uniformity-analysis>` of static instances in the convergence region of
				``D``.

				.. _controlled_maximal_converged_with:

				Controlled Maximal converged-with Relation

1. Dynamic instances of a convergent operation are related in the controlled
   maximal converged-with relation according to the semantics of the convergence
   control tokens.
2. Dynamic instances ``X1`` and ``X2`` produced by different threads for the
   same non-convergent operation ``X`` are related in the controlled maximal
   converged-with relation if and only if:

   1. Both threads executed converged dynamic instances of every token
      definition ``D`` such that ``X`` is in the convergence region of ``D``,
      and,
   2. For every cycle ``C`` with header ``H`` that contains ``X``:

      - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in the
        respective thread is convergence-before ``X2``, and,
      - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in the
        respective thread is convergence-before ``X1``,
      - without assuming that ``X1`` is converged with ``X2``.

				.. _controlled_m_converged:

				Controlled m-converged Static Instances

				A node ``X`` in a given CFG is reported to be m-converged if and only if:

1. For any token definition ``D`` such that ``X`` is inside the convergence region
   of ``D``, ``D`` itself is m-converged, and,
2. Every cycle that contains ``X`` satisfies the following necessary
   conditions:

   a. Every divergent branch inside the cycle satisfies the :ref:`diverged
      entry criterion<convergence-diverged-entry>`, and,
   b. There are no :ref:`diverged paths reaching the
      cycle<convergence-diverged-outside>` from a divergent branch outside it.

				Temporal Divergence at Cycle Exit
				---------------------------------

				When a cycle has a divergent exit, maximal convergence assumes that all threads
				converge at the exit block. But if a controlled convergent operation outside the
				cycle uses a token defined by an operation ``D`` inside the cycle, the
				convergence region of ``D`` now extends outside the cycle. If two threads
				executed converged dynamic instances of ``D`` before exiting the cycle, then
				they continue to execute converged dynamic instances of nodes in the convergence
				region of ``D`` outside the cycle. Thus, for a value ``V`` defined inside the
cycle, any use ``U`` of ``V`` within the convergence region of ``D`` uses the
				output of converged dynamic instances of ``V``. If ``V`` is uniform, then its
				use at such a ``U`` is also uniform. In other words, temporal divergence applies
				only to a use of ``V`` that is outside the convergence region of ``D``.

				Rationales for Static rules about cycles
				========================================

				(This section is informative.)

.. note::

   For convenience, we use the operator ``==`` to represent the relation
   ``converged-with`` and the operator ``!=`` to represent its negation.

				Consider a loop with (incorrect!) convergence control as in the following
				pseudocode:

.. code-block:: llvm

   ; WARNING: Example of incorrect convergence control!

   %anchor = call token @llvm.experimental.convergence.anchor()
   for (;;) {
     ...
     call void @convergent.op() [ "convergencectrl"(token %anchor) ]
     ...
   }

				This code is forbidden by the first static rule about cycles.

A first formal argument for forbidding this code is that the dynamic rule for
deciding whether two threads execute converged dynamic instances of
``@convergent.op`` leads to a logical contradiction in this code.
				Assume two threads execute converged dynamic instances of the anchor
				followed by two iterations of the loop. Thread 1 executes dynamic instances
				I1 and I2 of ``@convergent.op``, thread 2 executes dynamic instances J1 and J2.
				Using all the rules, we can deduce:

1. ``I1 != I2`` and ``J1 != J2`` by the basic rules of dynamic instances.

2. ``I1 == J1`` by the first dynamic rule about controlled convergent
   operations: both threads execute the same static instruction while using
   a convergence token value produced by converged dynamic instances of an
   instruction (the anchor).

3. ``I1 == J2`` by the same argument. Also, ``I2 == J1`` and ``I2 == J2``.

   The fact that one may be intuitively tempted to think of ``I1`` and ``J2``
   as being executed in different loop iterations is completely irrelevant for
   the formal argument. There is no mechanism in LLVM IR semantics for
   forming associations between loop iterations in different threads, except
   for the rules defined in this document, and the rules in this document
   require a loop heart intrinsic for talking about loop iterations.

4. By transitivity, we have ``I1 == I2`` and ``J1 == J2``. That is a
   contradiction.
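The derivation can be mechanized by treating converged-with as an equivalence relation and closing it under transitivity, for example with a union-find structure. The C++ sketch below uses hypothetical names (``UnionFind``, ``Instance``, ``applyAnchorOnlyRule``); it applies steps 2 and 3 and shows that transitivity alone forces ``I1 == I2``, which contradicts step 1.

.. code-block:: c++

   #include <cassert>
   #include <numeric>
   #include <vector>

   // Minimal union-find used to close a relation under transitivity.
   class UnionFind {
   public:
     explicit UnionFind(int n) : parent_(n) {
       std::iota(parent_.begin(), parent_.end(), 0);  // each element alone
     }
     int find(int x) const {
       while (parent_[x] != x)
         x = parent_[x];
       return x;
     }
     void unite(int a, int b) { parent_[find(a)] = find(b); }
     bool same(int a, int b) const { return find(a) == find(b); }

   private:
     std::vector<int> parent_;
   };

   // Dynamic instances of @convergent.op: thread 1 executes I1, I2 and thread 2
   // executes J1, J2 (hypothetical numbering for this sketch).
   enum Instance { I1, I2, J1, J2, NumInstances };

   // Steps 2 and 3: every instance in thread 1 uses a token from the same
   // converged anchor instances as every instance in thread 2.
   UnionFind applyAnchorOnlyRule() {
     UnionFind uf(NumInstances);
     const Instance thread1[] = {I1, I2};
     const Instance thread2[] = {J1, J2};
     for (Instance i : thread1)
       for (Instance j : thread2)
         uf.unite(i, j);
     return uf;
   }

After ``applyAnchorOnlyRule()``, ``same(I1, I2)`` and ``same(J1, J2)`` both hold, which is exactly the contradiction of step 4.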

				This problem goes away by inserting a loop heart intrinsic as follows, which
				establishes a relationship between loop iterations across threads.

.. code-block:: llvm

   %anchor = call token @llvm.experimental.convergence.anchor()
   for (;;) {
     %loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
     ...
     call void @convergent.op() [ "convergencectrl"(token %loop) ]
     ...
   }

				In the same scenario of two threads executing converged dynamic instances of the
				anchor and then two iterations of the loop, the dynamic rule about loop heart
intrinsics implies that both threads execute converged dynamic instances of
				the loop heart intrinsic in their respective first iterations and then again in
				their respective second iterations of the loop.

				This then implies that they execute converged dynamic instances ``I1 == J1`` of
				the ``@convergent.op`` in their first iterations and then
				``I2 == J2`` in their second iterations. The rule is an "if and only if" rule,
				so it also implies that ``I1 != J2`` and ``I2 != J1``, because those executions
				see token values of ``%loop`` originating from non-converged dynamic
				instances of the loop intrinsic.

				One may ask whether we could change the dynamic rule instead of adding the
				static rule about cycles. That is impractical due to deeper difficulties.
				Consider the following loop, again with incorrect convergence control:

.. code-block:: llvm

   ; WARNING: Example of incorrect convergence control!

   ; (A)
   %anchor = call token @llvm.experimental.convergence.anchor()
   for (;;) {
     ; (B)
     if (condition1) {
       ; (C)
       call void @convergent.op.1() [ "convergencectrl"(token %anchor) ]
     }
     ; (D)
     if (condition2) {
       ; (E)
       call void @convergent.op.2() [ "convergencectrl"(token %anchor) ]
     }
     ; (F)
   }
   ; (G)

				Assume two threads execute converged dynamic instances of the anchor followed
				by this sequence of basic blocks:

.. code-block:: text

   Thread 1: A B C D F B D E F G
   Thread 2: A B D E F B C D F G

				That is, both threads execute two iterations of the loop, but they execute
				the different convergent operations in different iterations. Without forming a
				relation between loop iterations across the threads, there is no reasonable way
				of defining which dynamic instances of the convergent operations should be the
				same across the threads, if any.

				Again, this can be addressed by adding a loop heart intrinsic, most naturally
				as:

.. code-block:: llvm

   ; (A)
   %anchor = call token @llvm.experimental.convergence.anchor()
   for (;;) {
     ; (B)
     %loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
     if (condition1) {
       ; (C)
       call void @convergent.op.1() [ "convergencectrl"(token %loop) ]
     }
     ; (D)
     if (condition2) {
       ; (E)
       call void @convergent.op.2() [ "convergencectrl"(token %loop) ]
     }
     ; (F)
   }
   ; (G)

Let ``%loop(i;j)`` be the dynamic instance of the ``j``-th execution of the loop
heart intrinsic by thread ``i``, and analogously let ``@op.k(i)`` be the dynamic
instance of the execution of ``@convergent.op.k`` by thread ``i``.
Then we have:

1. ``%loop(1;j) == %loop(2;j)`` for ``j = 1, 2`` because of the dynamic rule
   about loop heart intrinsics.

2. ``%loop(i;1) != %loop(i;2)`` for ``i = 1, 2`` because of the basic rule that
   different executions by the same thread happen in different dynamic
   instances.

3. ``@op.1(1) != @op.1(2)``, since ``@op.1(1)`` uses the token value of ``%loop``
   referring to ``%loop(1;1)``, and ``@op.1(2)`` uses the one
   referring to ``%loop(2;2) == %loop(1;2)``, which is different from
   ``%loop(1;1)``.

4. Similarly, ``@op.2(1) != @op.2(2)``.
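The reasoning in steps 1 to 4 can be replayed on the two traces shown earlier. In this hypothetical C++ sketch (``heartIterationPerOp`` and ``convergedAcrossThreads`` are invented names), the heart in ``B`` increments a per-thread counter ``j``, and each convergent operation records the ``j`` of the heart execution whose token it uses; because ``%loop(1;j) == %loop(2;j)``, two executions of the same operation are converged exactly when their ``j`` values match. It assumes each operation executes at most once per thread, as in these traces.

.. code-block:: c++

   #include <cassert>
   #include <map>
   #include <sstream>
   #include <string>

   // Hypothetical encoding of the example: the loop heart runs at the start of
   // block B, @convergent.op.1 runs in C and @convergent.op.2 runs in E.
   // Returns, per operation, the j of the heart execution whose token it uses.
   // Assumes each operation executes at most once per thread.
   std::map<std::string, int> heartIterationPerOp(const std::string &trace) {
     std::map<std::string, int> result;
     std::istringstream blocks(trace);
     std::string block;
     int j = 0;  // number of executions of %loop so far in this thread
     while (blocks >> block) {
       if (block == "B")
         ++j;
       else if (block == "C")
         result["op.1"] = j;
       else if (block == "E")
         result["op.2"] = j;
     }
     return result;
   }

   // Since %loop(1;j) == %loop(2;j), two executions of the same operation are
   // converged if and only if they use tokens from the same iteration j.
   bool convergedAcrossThreads(const std::string &trace1,
                               const std::string &trace2, const std::string &op) {
     return heartIterationPerOp(trace1).at(op) ==
            heartIterationPerOp(trace2).at(op);
   }

With the traces above, ``op.1`` uses iteration 1 in thread 1 and iteration 2 in thread 2 (and the reverse holds for ``op.2``), so the sketch reports both operations as non-converged, matching steps 3 and 4.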

				However, loop heart intrinsics could be inserted differently, at the cost
				of also inserting a free-standing anchor:

.. code-block:: llvm

   ; (A)
   %anchor = call token @llvm.experimental.convergence.anchor()
   for (;;) {
     ; (B)
     if (condition1) {
       ; (C)
       %loop = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
       call void @convergent.op.1() [ "convergencectrl"(token %loop) ]
     }
     ; (D)
     if (condition2) {
       ; (E)
       %free = call token @llvm.experimental.convergence.anchor()
       call void @convergent.op.2() [ "convergencectrl"(token %free) ]
     }
     ; (F)
   }
   ; (G)

				This leads to the "unnatural counting of loop iterations" that is also mentioned
				elsewhere. Let ``%loop(i)`` be the dynamic instance of the execution of the
				loop heart intrinsic by thread ``i`` (each thread executes it only once), and
				let ``@op.k(i)`` be as before. Then:

				1. ``%loop(1) == %loop(2)`` because of the dynamic rule about loop heart
				intrinsics.

				2. ``@op.1(1) == @op.1(2)`` because ``@op.1(i)`` uses the value of ``%loop``
				referring to ``%loop(i)``, and ``%loop(1) == %loop(2)``.

				3. Whether ``@op.2(1) == @op.2(2)`` is implementation-defined because of the
				use of the ``%free`` anchor intrinsic.

				In practice, they almost certainly have to be non-converged dynamic
				instances. Consider that if an implementation strictly follows the order of
				instructions given in the program, the executions of the threads can be
				"aligned" as follows:

				.. code-block:: text

				Thread 1: A B C D F B D E F G
				Thread 2: A B D E F B C D F G

				So then ``@op.2(1)`` physically executes later than ``@op.2(2)`` and there
				can be no communication between the threads, which means they execute
				non-converged dynamic instances.

				That said, it is conceivable that there aren't actually any data or other
				dependencies that would enforce this execution order. In that case, a highly
				out-of-order implementation could potentially allow communication. That's
				why the rules defined in this document are silent about whether
				``@op.2(1) == @op.2(2)`` or not.

				This type of convergence control seems relatively unlikely to appear in real
				programs. Its possibility is simply a logical consequence of the model.

				An equivalent issue arises if the convergent operations are replaced by nested
				loops whose loop heart intrinsics refer directly to ``%anchor``. This is why
				the static rules about cycles have variants that apply to loop heart
				intrinsics:

				.. code-block:: llvm

				; WARNING: Example of incorrect convergence control!

				%anchor = call token @llvm.experimental.convergence.anchor()
				for (;;) {
				if (condition1) {
				for (;;) {
				%loop1 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				}
				}
				if (condition2) {
				for (;;) {
				%loop2 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
				}
				}
				}

				There is a cycle (closed walk in the CFG) that goes through both loop heart
				intrinsics using ``%anchor`` but not through the definition of ``%anchor``,
				so this code is invalid.
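The condition that makes this code invalid can be phrased as a reachability test. A closed walk through both loop heart intrinsics that avoids the definition of ``%anchor`` exists exactly when each of the two blocks can reach the other without entering the defining block. The following is a minimal sketch of that single check. It is not the implementation in the LLVM verifier and does not cover the full set of static rules. The block numbering is hypothetical, and because the pseudo-code elides the exits of the inner ``for (;;)`` loops, the model assumes that each inner loop has one.

```cpp
#include <vector>

// Hypothetical CFG model (not LLVM's verifier): blocks are numbered
// 0..N-1 and Succs[B] lists the successors of block B.
using CFG = std::vector<std::vector<int>>;

// Iterative depth-first search: can `From` reach `To` along a non-empty path
// that never enters `Avoid`? Assumes `To != Avoid`.
bool reachesAvoiding(const CFG &Succs, int From, int To, int Avoid) {
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Stack{From};
  Seen[From] = true;
  while (!Stack.empty()) {
    int B = Stack.back();
    Stack.pop_back();
    for (int S : Succs[B]) {
      if (S == To)
        return true;
      if (S == Avoid || Seen[S])
        continue;
      Seen[S] = true;
      Stack.push_back(S);
    }
  }
  return false;
}

// True if some closed walk passes through both LoopA and LoopB without
// passing through AnchorDef: concatenating a path LoopA -> LoopB with a path
// LoopB -> LoopA (both avoiding AnchorDef) yields such a walk, and vice versa.
bool hasBadCycle(const CFG &Succs, int LoopA, int LoopB, int AnchorDef) {
  return reachesAvoiding(Succs, LoopA, LoopB, AnchorDef) &&
         reachesAvoiding(Succs, LoopB, LoopA, AnchorDef);
}
```

For the example, one possible numbering is: the block defining ``%anchor`` (0), the outer loop body testing ``condition1`` (1), the first inner loop with ``%loop1`` (2), the block testing ``condition2`` (3), and the second inner loop with ``%loop2`` (4). With successor lists ``{{1}, {2, 3}, {2, 3}, {4, 1}, {4, 1}}``, ``hasBadCycle(G, 2, 4, 0)`` is true. If the token were instead defined in block 1, this particular check would no longer fire.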


				Examples for the Correctness of Program Transforms
				==================================================

				(This section is informative.)

				As implied by the rules in the previous sections, program transforms are correct
				with respect to convergent operations if they preserve or refine their
				semantics. This means that the set of communicating threads in the transformed
				program must have been possible in the original program.

				Program transforms with a single-threaded focus are generally conservatively
				correct if they do not sink or hoist convergent operations across a branch.
				This applies even to program transforms that change the control flow graph.

				For example, unrolling a loop that does not contain convergent operations
				cannot break any of the guarantees required for convergent operations outside
				of the loop.


				Loop unrolling examples
				-----------------------

				We consider three kinds of loop unrolling here:

				* Partial unrolling with no known trip multiple, so a "tail" is required to
				collect the remaining elements.
				* Partial unrolling by a trip multiple, so no "tail" is required.
				* Full unrolling, which eliminates the loop.

				The first kind is forbidden when ``@llvm.experimental.convergence.loop`` is
				used. We illustrate the reasoning with some examples.

				First, an arbitrary loop that contains convergent operations can be unrolled
				in all of these ways, even with a "tail", if all convergent operations refer back
				to an anchor inside the loop. For example (in pseudo-code):

				.. code-block:: llvm

				while (counter > 0) {
				%tok = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok) ]
				counter--;
				}

				This can be unrolled to:

				.. code-block:: llvm

				while (counter >= 2) {
				%tok.1 = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok.1) ]
				%tok.2 = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok.2) ]
				counter -= 2;
				}
				while (counter > 0) {
				%tok = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok) ]
				counter--;
				}

				This is likely to change the behavior of the convergent operation if there
				are threads whose initial counter value is not a multiple of 2. In particular,
				all threads with an odd trip count are now likely to execute the convergent
				operation in their respective final iterations together because the
				underlying implementation is likely to try to group as many threads together
				as possible for the execution of the "tail".

				This change is allowed because the anchor intrinsic has implementation-defined
				convergence behavior and the loop unrolling transform is considered to be part
				of the implementation. Another way of reasoning is that while the likely
				behavior of the code has changed, the guarantees about its behavior have
				remained the same.

				If the loop contains uncontrolled convergent operations, this kind of unrolling
				is forbidden.

				Unrolling a loop with convergent operations that refer to tokens produced
				outside the loop is forbidden when a "tail" or "remainder" would have to
				be introduced. Consider:

				.. code-block:: llvm

				; (A)
				%outer = call token @llvm.experimental.convergence.anchor()
				while (counter > 0) {
				%inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
				; (B)
				call void @convergent.operation() [ "convergencectrl"(token %inner) ]
				counter--;
				}
				; (C)

				To understand why unrolling is forbidden, consider two threads that execute
				converged dynamic instances of the anchor and then proceed with 3 and 4 loop
				iterations, respectively:

				.. code-block:: text

				Thread 1: A B B B C
				Thread 2: A B B B B C

				By the dynamic rule on loop heart intrinsics, these threads execute converged
				dynamic instances of the loop intrinsic for the first 3 iterations, and then
				thread 2 executes another dynamic instance by itself.

				By the dynamic rule on general convergent operations, the threads execute
				converged dynamic instances of the ``@convergent.operation`` in the first 3
				iterations (that is, the dynamic instance executed by thread 1 in iteration
				n is the same as that executed by thread 2 in iteration n, for n = 1,2,3;
				the dynamic instance executed in iteration 1 is different from that in
				iteration 2, etc.).
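This per-iteration grouping can be written down as a toy model. It assumes, as the text does, that both threads execute converged dynamic instances of ``%outer``. Under that assumption, the threads executing converged dynamic instances of ``%inner`` in iteration ``n`` are exactly those whose trip count is at least ``n``. Threads are numbered from 0, and the function name ``threadsInIteration`` is hypothetical.

```cpp
#include <set>
#include <vector>

// Hypothetical toy model of the loop heart rule for this example.
// Trips[T] is the trip count of thread T; all threads are assumed to share
// one dynamic instance of %outer.
// Returns the threads that execute the (single, shared) dynamic instance of
// %inner for iteration J, and hence of @convergent.operation in iteration J.
std::set<int> threadsInIteration(const std::vector<int> &Trips, int J) {
  std::set<int> S;
  for (int T = 0; T < static_cast<int>(Trips.size()); ++T)
    if (Trips[T] >= J)  // thread T is still in the loop in iteration J
      S.insert(T);
  return S;
}
```

With trip counts ``{3, 4}``, iterations 1 to 3 yield ``{0, 1}`` and iteration 4 yields ``{1}``, matching the trace above.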

				Now assume that the loop is unrolled by a factor of 2, which requires a
				remainder as follows:

				.. code-block:: llvm

				; (A)
				%outer = call token @llvm.experimental.convergence.anchor()
				while (counter >= 2) {
				%inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
				; (B)
				call void @convergent.operation() [ "convergencectrl"(token %inner) ]
				call void @convergent.operation() [ "convergencectrl"(token %inner) ]
				counter -= 2;
				}
				; (C)
				if (counter > 0) {
				%remainder = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
				; (D)
				call void @convergent.operation() [ "convergencectrl"(token %remainder) ]
				}
				; (E)

				First of all, note some interesting problems surrounding the loop intrinsic:

				1. It is not duplicated inside the unrolled loop. This is to comply with
				the :ref:`convergence_static_rules`.

				2. It is unclear whether the loop intrinsic ought to be duplicated in the
				remainder, or whether the final ``@convergent.operation`` in D should just
				refer to either ``%inner`` (which is possible in SSA form) or directly to
				``%outer``. The decision made here is arbitrary and doesn't change the
				argument that follows. Ultimately, it simply doesn't matter because the
				transform is incorrect either way.

				The threads now execute the following sequences of blocks:

				.. code-block:: text

				Thread 1: A B C D E
				Thread 2: A B B C D E

				Analogously to the argument above, they execute converged dynamic instances of the
				``%inner`` intrinsic and the ``@convergent.operation`` in the first iteration
				of the unrolled loop, which corresponds to the first 2 iterations of the
				original loop.

				However, they execute different static calls to ``@convergent.operation`` for
				the 3rd iteration of the original loop. In thread 1, that iteration corresponds
				to the call in the remainder, while in thread 2 it corresponds to the first
				call to ``@convergent.operation`` in the unrolled loop. Therefore, they execute
				non-converged dynamic instances, which means that the set of communicating threads
				for the 3rd iteration of the original loop is different. This is why the
				unrolling is incorrect.
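The mismatch can be reproduced with a hypothetical helper. It maps each iteration of the original loop to the static call that executes it after unrolling by 2 with a remainder. The helper models only the control flow of the unrolled code shown above, and the call-site labels are made up for illustration.

```cpp
#include <string>

// Hypothetical model of the unrolled-by-2 loop with a remainder.
// N is a thread's trip count in the original loop; K (1-based) is an
// iteration of the original loop. Returns a label for the static call to
// @convergent.operation that executes iteration K after unrolling.
std::string staticCallFor(int N, int K) {
  int UnrolledIters = N / 2;  // iterations of `while (counter >= 2)`
  if (K <= 2 * UnrolledIters)
    return (K % 2 == 1) ? "body.call1" : "body.call2";
  return "remainder.call";    // the single leftover iteration when N is odd
}
```

``staticCallFor(3, 3)`` and ``staticCallFor(4, 3)`` differ, which is the non-convergence described above. If every thread's trip count is even, the remainder is never selected and iteration ``k`` always maps to the same call.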

				On the other hand, unrolling without "tail" is allowed. For example, assuming
				that the trip count is known to be a multiple of 2, we can unroll the loop
				as follows:

				.. code-block:: llvm

				%outer = call token @llvm.experimental.convergence.anchor()
				while (counter > 0) {
				%inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
				call void @convergent.operation() [ "convergencectrl"(token %inner) ]
				call void @convergent.operation() [ "convergencectrl"(token %inner) ]
				counter -= 2;
				}

				Note again that the loop intrinsic is not duplicated.

				The
				:ref:`llvm.experimental.convergence.loop <llvm.experimental.convergence.loop>`
				intrinsic is typically expected to appear in the header of a natural loop.
				However, it can also appear in non-header blocks of a loop. In that case, the
				loop generally cannot be unrolled.


				Hoisting and sinking
				--------------------

				In general, hoisting and sinking of convergent operations is forbidden. This is
				because moving the operation to a different point in control flow generally
				changes the set of threads that reach the operation and, therefore, the set of
				threads that execute converged dynamic instances of the operation. By
				definition, this changes the set of threads that participate in the
				communication of the convergent operation, which will typically change its
				result.

				There are a number of exceptions, though most of them require additional
				knowledge.

				For example, hoisting and sinking across uniform conditional branches -- i.e.,
				conditional branches where within every possible relevant set of threads, all
				threads will always take the same direction -- is generally allowed. See the end
				of the :ref:`example of reductions inside control flow
				<convergence_example_reductions>` for a brief discussion.

				Some convergent operations can be hoisted but not sunk, or vice versa. A simple
				example is the ``subgroupShuffle(data, id)`` operation. It returns the ``data``
				operand of the thread identified by ``id``, where thread IDs are fixed and
				assigned to each thread at launch. The result is undefined (or perhaps there is
				UB, depending on the language and environment) if thread ``id`` is not in the
				communicating set of threads. So hoisting is allowed in the following
				pseudo-code example:

				.. code-block:: llvm

				define void @example(...) convergent {
				%entry = call token @llvm.experimental.convergence.entry()
				%data = ...
				%id = ...
				if (condition) {
				%shuffled = call i32 @subgroupShuffle(i32 %data, i32 %id) [ "convergencectrl"(token %entry) ]
				...
				} else {
				%shuffled = call i32 @subgroupShuffle(i32 %data, i32 %id) [ "convergencectrl"(token %entry) ]
				...
				}
				}

				After hoisting the calls to ``@subgroupShuffle``, the communicating set of
				threads is the union of the two sets of threads in the original program, so
				``%id`` can only go "out of range" after hoisting if it did so in the original
				program.
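The set-union argument can be spelled out with a toy model in which a shuffle from thread ``id`` is in range only if ``id`` belongs to the communicating set. This is an illustrative sketch with hypothetical names (``ThreadSet``, ``inRange``, ``setUnion``). It is not a definition of any real ``subgroupShuffle``.

```cpp
#include <set>

// Hypothetical toy model: the thread ids of one communicating set.
using ThreadSet = std::set<int>;

// A shuffle reading from thread `Id` is "in range" only if `Id` is one of the
// threads executing the same dynamic instance.
bool inRange(const ThreadSet &Communicating, int Id) {
  return Communicating.count(Id) != 0;
}

// After hoisting, the single call is reached by the threads of both branches.
ThreadSet setUnion(const ThreadSet &A, const ThreadSet &B) {
  ThreadSet U = A;
  U.insert(B.begin(), B.end());
  return U;
}
```

Since each branch's set is a subset of ``setUnion(Then, Else)``, any ``id`` that was in range before hoisting is still in range afterwards.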

				However, speculative execution of ``@subgroupShuffle`` in the following program
				may be forbidden:

				.. code-block:: llvm

				define void @example(...) convergent {
				%entry = call token @llvm.experimental.convergence.entry()
				%data = ...
				%id = ...
				if (condition) {
				%shuffled = call i32 @subgroupShuffle(i32 %data, i32 %id) [ "convergencectrl"(token %entry) ]
				...
				}
				}

				There is no guarantee about the value of ``%id`` in the threads where
				``condition`` is false. If ``@subgroupShuffle`` is defined to have UB when
				``%id`` is outside of the set of communicating threads, then speculating and
				hoisting ``@subgroupShuffle`` might introduce UB.

				On the other hand, if ``@subgroupShuffle`` is defined such that it merely
				produces an undefined value or poison as its result when ``%id`` is "out of
				range", then speculating it is allowed.
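Under the poison-producing definition, speculation can only refine the results seen by threads for which ``condition`` is true. Threads for which it is false at worst compute a poison value that they never use. The following sketch models poison as ``std::nullopt``. The data values and the helper names ``shuffle`` and ``refines`` are hypothetical, and the model deliberately does not cover the UB-producing variant.

```cpp
#include <map>
#include <optional>
#include <set>

// Hypothetical toy model of the poison-producing flavour of @subgroupShuffle.
// Data maps each thread id to its %data; Communicating is the set of threads
// that execute the same dynamic instance. std::nullopt stands for poison.
std::optional<int> shuffle(const std::map<int, int> &Data,
                           const std::set<int> &Communicating, int Id) {
  if (Communicating.count(Id) == 0)
    return std::nullopt;  // "out of range": poison, not UB
  return Data.at(Id);
}

// `After` refines `Before` if `Before` was poison or both hold the same value.
bool refines(const std::optional<int> &After,
             const std::optional<int> &Before) {
  return !Before.has_value() || After == Before;
}
```

With ``condition`` true only in threads 0 and 1, speculation enlarges the communicating set from ``{0, 1}`` to ``{0, 1, 2, 3}``. Thread 1's read from thread 3 changes from poison to a value, and an arbitrary ``%id`` in thread 2 merely yields a poison value that is never used.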

				Even though
				:ref:`llvm.experimental.convergence.anchor <llvm.experimental.convergence.anchor>`
				is marked as ``convergent``, it can be sunk in some cases. For example, in
				pseudo-code:

				.. code-block:: llvm

				%tok = call token @llvm.experimental.convergence.anchor()
				if (condition) {
				call void @convergent.operation() [ "convergencectrl"(token %tok) ]
				}

				Assuming that ``%tok`` is only used inside the conditional block, the anchor can
				be sunk. The rationale is two-fold. First, the anchor has implementation-defined
				behavior, and the sinking is part of the implementation. Second, already in the
				original program, the set of threads that communicates in the
				``@convergent.operation`` is automatically restricted to the threads for which
				``condition`` is true.

				Anchors can be hoisted in acyclic control flow. For example:

				.. code-block:: llvm

				if (condition) {
				%tok1 = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok1) ]
				} else {
				%tok2 = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok2) ]
				}

				The anchors can be hoisted, resulting in:

				.. code-block:: llvm

				%tok = call token @llvm.experimental.convergence.anchor()
				if (condition) {
				call void @convergent.operation() [ "convergencectrl"(token %tok) ]
				} else {
				call void @convergent.operation() [ "convergencectrl"(token %tok) ]
				}

				The behavior is unchanged, since each of the static convergent operations only
				ever communicates with threads that have the same ``condition`` value.
				By contrast, hoisting the convergent operations themselves is forbidden.
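One way to check this is to group threads by the pair (static call, dynamic instance of its token), since threads in the same group execute converged dynamic instances. The sketch assumes an implementation in which all threads that reach an anchor share one dynamic instance of it. The names ``Key``, ``partition``, ``beforeHoist`` and ``afterHoist`` are hypothetical.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a dynamic instance of a convergent call is keyed by
// (static call site, dynamic instance of its token).
using Key = std::pair<std::string, std::string>;

// Returns the communicating sets: threads with equal keys share a set.
std::set<std::set<int>> partition(const std::vector<Key> &PerThread) {
  std::map<Key, std::set<int>> Groups;
  for (int T = 0; T < static_cast<int>(PerThread.size()); ++T)
    Groups[PerThread[T]].insert(T);
  std::set<std::set<int>> P;
  for (const auto &Entry : Groups)
    P.insert(Entry.second);
  return P;
}

// Before hoisting: each branch has its own anchor and its own call site.
std::vector<Key> beforeHoist(const std::vector<bool> &Cond) {
  std::vector<Key> K;
  for (bool C : Cond)
    K.push_back(C ? Key{"then.call", "tok1"} : Key{"else.call", "tok2"});
  return K;
}

// After hoisting the anchors: one token instance, but still two call sites.
std::vector<Key> afterHoist(const std::vector<bool> &Cond) {
  std::vector<Key> K;
  for (bool C : Cond)
    K.push_back(C ? Key{"then.call", "tok"} : Key{"else.call", "tok"});
  return K;
}
```

For ``condition`` values ``{true, false, true, false}``, both versions partition the threads into ``{0, 2}`` and ``{1, 3}``, whereas a single hoisted call would put all four threads in one group.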

				Hoisting and sinking anchors out of and into loops is forbidden. For example:

				.. code-block:: llvm

				for (;;) {
				%tok = call token @llvm.experimental.convergence.anchor()
				call void @convergent.operation() [ "convergencectrl"(token %tok) ]
				}

				Hoisting the anchor would make the program invalid according to the static
				validity rules. Conversely:

				.. code-block:: llvm

				%outer = call token @llvm.experimental.convergence.anchor()
				while (counter > 0) {
				%inner = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %outer) ]
				call void @convergent.operation() [ "convergencectrl"(token %inner) ]
				counter--;
				}

				The program would stay valid if the anchor were sunk into the loop, but its
				behavior could end up being different. If the anchor is inside the loop, then
				each loop iteration has a new dynamic instance of the anchor, and the set of
				threads participating in those dynamic instances of the anchor could be
				different in arbitrary implementation-defined ways. Via the dynamic rules about
				dynamic instances of convergent operations, this then implies that the set of
				threads executing ``@convergent.operation`` could be different in each loop
				iteration in arbitrary implementation-defined ways.

				Convergent operations can be sunk together with their anchor. Again in
				pseudo-code:

				.. code-block:: llvm

				%tok = call token @llvm.experimental.convergence.anchor()
				%a = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
				%b = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
				if (condition) {
				use(%a, %b)
				}

				Assuming that ``%tok``, ``%a``, and ``%b`` are only used inside the conditional
				block, all can be sunk together:

				.. code-block:: llvm

				if (condition) {
				%tok = call token @llvm.experimental.convergence.anchor()
				%a = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
				%b = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
				use(%a, %b)
				}

				The rationale is that the anchor intrinsic has implementation-defined behavior,
				and the sinking transform is considered to be part of the implementation:
				the sinking will restrict the set of communicating threads to those for which
				``condition`` is true, but that could have happened in the original program
				anyway for some arbitrary other reason.

				However, sinking only the convergent operation producing ``%b`` would be
				incorrect. That would allow threads for which ``condition`` is false to
				communicate at ``%a``, but not at ``%b``, which the original program doesn't
				allow.
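A toy model makes the asymmetry visible. It assumes an implementation in which every thread that reaches the anchor executes the same dynamic instance of it. The communicating set of each call is then either all threads or, if that call was sunk, only the threads for which ``condition`` is true. The names ``Sets``, ``communicatingAt`` and ``communicating`` are hypothetical.

```cpp
#include <set>
#include <vector>

// Hypothetical toy model of the %a / %b example. Cond[T] is `condition` in
// thread T. All threads reaching the anchor are assumed to be converged.
struct Sets {
  std::set<int> A, B;  // communicating threads at %a and at %b
};

// Threads that execute a call: all of them, or only those with `condition`
// true if the call was sunk into the conditional block.
std::set<int> communicatingAt(const std::vector<bool> &Cond, bool Sunk) {
  std::set<int> S;
  for (int T = 0; T < static_cast<int>(Cond.size()); ++T)
    if (!Sunk || Cond[T])
      S.insert(T);
  return S;
}

// SinkA / SinkB: whether the call producing %a / %b was sunk.
Sets communicating(const std::vector<bool> &Cond, bool SinkA, bool SinkB) {
  return {communicatingAt(Cond, SinkA), communicatingAt(Cond, SinkB)};
}
```

In the original program and after sinking everything together, ``A == B``; sinking only ``%b`` gives ``A != B``, which the original program never allows.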

				Note that the entry intrinsic behaves differently. Sinking the convergent
				operations is forbidden in the following snippet:

				.. code-block:: llvm

				%tok = call token @llvm.experimental.convergence.entry()
				%a = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
				%b = call T @pure.convergent.operation(...) [ "convergencectrl"(token %tok) ]
				if (condition) {
				use(%a, %b)
				}

llvm/docs/LangRef.rst


Show First 20 Lines • Show All 1,745 Lines • ▼ Show 20 Lines	``builtin``
uses the ``nobuiltin`` attribute. This is only valid at call sites for		uses the ``nobuiltin`` attribute. This is only valid at call sites for
direct calls to functions that are declared with the ``nobuiltin``		direct calls to functions that are declared with the ``nobuiltin``
attribute.		attribute.
``cold``		``cold``
This attribute indicates that this function is rarely called. When		This attribute indicates that this function is rarely called. When
computing edge weights, basic blocks post-dominated by a cold		computing edge weights, basic blocks post-dominated by a cold
function call are also considered to be cold; and, thus, given low		function call are also considered to be cold; and, thus, given low
weight.		weight.
``convergent``
In some parallel execution models, there exist operations that cannot be
made control-dependent on any additional values. We call such operations
``convergent``, and mark them with this attribute.

The ``convergent`` attribute may appear on functions or call/invoke
instructions. When it appears on a function, it indicates that calls to
this function should not be made control-dependent on additional values.
For example, the intrinsic ``llvm.nvvm.barrier0`` is ``convergent``, so
calls to this intrinsic cannot be made control-dependent on additional
values.

When it appears on a call/invoke, the ``convergent`` attribute indicates		.. _attr_convergent:
that we should treat the call as though we're calling a convergent
function. This is particularly useful on indirect calls; without this we		``convergent``
may treat such calls as though the target is non-convergent.		This attribute indicates that this function is convergent.
		When it appears on a call/invoke, the convergent attribute
The optimizer may remove the ``convergent`` attribute on functions when it		indicates that we should treat the call as though we’re calling a
can prove that the function does not execute any convergent operations.		convergent function. This is particularly useful on indirect
Similarly, the optimizer may remove ``convergent`` on calls/invokes when it		calls; without this we may treat such calls as though the target
can prove that the call/invoke cannot call a convergent function.		is non-convergent.

		See :doc:`ConvergentOperations` for further details.

		It is an error to call :ref:`llvm.experimental.convergence.entry
		<llvm.experimental.convergence.entry>` from a function that
		does not have this attribute.
``disable_sanitizer_instrumentation``		``disable_sanitizer_instrumentation``
When instrumenting code with sanitizers, it can be important to skip certain		When instrumenting code with sanitizers, it can be important to skip certain
functions to ensure no instrumentation is applied to them.		functions to ensure no instrumentation is applied to them.

This attribute is not always similar to absent ``sanitize_<name>``		This attribute is not always similar to absent ``sanitize_<name>``
attributes: depending on the specific sanitizer, code can be inserted into		attributes: depending on the specific sanitizer, code can be inserted into
functions regardless of the ``sanitize_<name>`` attribute to prevent false		functions regardless of the ``sanitize_<name>`` attribute to prevent false
positive reports.		positive reports.
▲ Show 20 Lines • Show All 211 Lines • ▼ Show 20 Lines	``willreturn``
Annotated functions may still raise an exception, i.a., ``nounwind`` is not implied.		Annotated functions may still raise an exception, i.a., ``nounwind`` is not implied.
If an invocation of an annotated function does not return control back		If an invocation of an annotated function does not return control back
to a point in the call stack, the behavior is undefined.		to a point in the call stack, the behavior is undefined.
``nosync``		``nosync``
This function attribute indicates that the function does not communicate		This function attribute indicates that the function does not communicate
(synchronize) with another thread through memory or other well-defined means.		(synchronize) with another thread through memory or other well-defined means.
Synchronization is considered possible in the presence of `atomic` accesses		Synchronization is considered possible in the presence of `atomic` accesses
that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses,		that enforce an order, thus not "unordered" and "monotonic", `volatile` accesses,
as well as `convergent` function calls. Note that through `convergent` function calls		as well as `convergent` function calls.
non-memory communication, e.g., cross-lane operations, are possible and are also
considered synchronization. However `convergent` does not contradict `nosync`.		Note that `convergent` operations can involve communication that is
If an annotated function does ever synchronize with another thread,		considered to be not through memory and does not necessarily imply an
		ordering between threads for the purposes of the memory model. Therefore,
		an operation can be both `convergent` and `nosync`.

		If a `nosync` function does ever synchronize with another thread,
the behavior is undefined.		the behavior is undefined.
``nounwind``		``nounwind``
This function attribute indicates that the function never raises an		This function attribute indicates that the function never raises an
exception. If the function does raise an exception, its runtime		exception. If the function does raise an exception, its runtime
behavior is undefined. However, functions marked nounwind may still		behavior is undefined. However, functions marked nounwind may still
trap or generate asynchronous exceptions. Exception handling schemes		trap or generate asynchronous exceptions. Exception handling schemes
that are recognized by LLVM to handle asynchronous exceptions, such		that are recognized by LLVM to handle asynchronous exceptions, such
as SEH, will still provide their implementation defined semantics.		as SEH, will still provide their implementation defined semantics.
▲ Show 20 Lines • Show All 764 Lines • ▼ Show 20 Lines

.. code-block:: llvm		.. code-block:: llvm

call void %0() ["kcfi"(i32 1234)]		call void %0() ["kcfi"(i32 1234)]

Clang emits KCFI operand bundles and the necessary metadata with		Clang emits KCFI operand bundles and the necessary metadata with
``-fsanitize=kcfi``.		``-fsanitize=kcfi``.

		.. _convergencectrl:

		Convergence Control Operand Bundles
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		A "convergencectrl" operand bundle is only valid on a ``convergent`` operation.
		When present, the operand bundle must contain exactly one value of token type.
		See the :doc:`ConvergentOperations` document for details.

.. _moduleasm:		.. _moduleasm:

Module-Level Inline Assembly		Module-Level Inline Assembly
----------------------------		----------------------------

Modules may contain "module-level inline asm" blocks, which corresponds		Modules may contain "module-level inline asm" blocks, which corresponds
to the GCC "file scope inline asm" blocks. These blocks are internally		to the GCC "file scope inline asm" blocks. These blocks are internally
concatenated by LLVM and treated as a single unit, but may be separated		concatenated by LLVM and treated as a single unit, but may be separated
▲ Show 20 Lines • Show All 15,851 Lines • ▼ Show 20 Lines

.. code-block:: text		.. code-block:: text

%a = call i8 @llvm.fptosi.sat.i8.f32(float 23.9) ; yields i8: 23		%a = call i8 @llvm.fptosi.sat.i8.f32(float 23.9) ; yields i8: 23
%b = call i8 @llvm.fptosi.sat.i8.f32(float -130.8) ; yields i8: -128		%b = call i8 @llvm.fptosi.sat.i8.f32(float -130.8) ; yields i8: -128
%c = call i8 @llvm.fptosi.sat.i8.f32(float 999.0) ; yields i8: 127		%c = call i8 @llvm.fptosi.sat.i8.f32(float 999.0) ; yields i8: 127
%d = call i8 @llvm.fptosi.sat.i8.f32(float 0xFFF8000000000000) ; yields i8: 0		%d = call i8 @llvm.fptosi.sat.i8.f32(float 0xFFF8000000000000) ; yields i8: 0

		Convergence Intrinsics
		----------------------

		The LLVM convergence intrinsics for controlling the semantics of ``convergent``
		operations, which all start with the ``llvm.experimental.convergence.``
		prefix, are described in the :doc:`ConvergentOperations` document.

.. _dbg_intrinsics:		.. _dbg_intrinsics:

Debugger Intrinsics		Debugger Intrinsics
-------------------		-------------------

The LLVM debugger intrinsics (which all start with ``llvm.dbg.``		The LLVM debugger intrinsics (which all start with ``llvm.dbg.``
prefix), are described in the `LLVM Source Level		prefix), are described in the `LLVM Source Level
Debugging <SourceLevelDebugging.html#format-common-intrinsics>`_		Debugging <SourceLevelDebugging.html#format-common-intrinsics>`_
▲ Show 20 Lines • Show All 8,720 Lines • Show Last 20 Lines

llvm/docs/Reference.rst

Show All 10 Lines	.. toctree::

Atomics		Atomics
BitCodeFormat		BitCodeFormat
BlockFrequencyTerminology		BlockFrequencyTerminology
BranchWeightMetadata		BranchWeightMetadata
Bugpoint		Bugpoint
CommandGuide/index		CommandGuide/index
ConvergenceAndUniformity		ConvergenceAndUniformity
		ConvergentOperations
Coroutines		Coroutines
DependenceGraphs/index		DependenceGraphs/index
ExceptionHandling		ExceptionHandling
Extensions		Extensions
FaultMaps		FaultMaps
FuzzingLLVM		FuzzingLLVM
GarbageCollection		GarbageCollection
GetElementPtr		GetElementPtr
▲ Show 20 Lines • Show All 99 Lines • ▼ Show 20 Lines

:doc:`Machine IR (MIR) Format Reference Manual <MIRLangRef>`		:doc:`Machine IR (MIR) Format Reference Manual <MIRLangRef>`
A reference manual for the MIR serialization format, which is used to test		A reference manual for the MIR serialization format, which is used to test
LLVM's code generation passes.		LLVM's code generation passes.

:doc:`GlobalISel/index`		:doc:`GlobalISel/index`
This describes the prototype instruction selection replacement, GlobalISel.		This describes the prototype instruction selection replacement, GlobalISel.

		:doc:`ConvergentOperations`
		Description of ``convergent`` operation semantics and related intrinsics.

=====================		=====================
Testing and Debugging		Testing and Debugging
=====================		=====================

:doc:`LLVM Testing Infrastructure Guide <TestingGuide>`		:doc:`LLVM Testing Infrastructure Guide <TestingGuide>`
A reference manual for using the LLVM testing infrastructure.		A reference manual for using the LLVM testing infrastructure.

:doc:`TestSuiteGuide`		:doc:`TestSuiteGuide`
▲ Show 20 Lines • Show All 85 Lines • Show Last 20 Lines

llvm/docs/ReleaseNotes.rst

	Show First 20 Lines • Show All 64 Lines • ▼ Show 20 Lines

	* Introduced new ``llvm.frexp`` intrinsic.			* Introduced new ``llvm.frexp`` intrinsic.

	* The constant expression variants of the following instructions have been			* The constant expression variants of the following instructions have been
	removed:			removed:

	* ``select``			* ``select``

				* Introduced a set of experimental `convergence control intrinsics
				<ConvergentOperations.html>`__ to explicitly define the semantics of convergent
				operations.

	Changes to LLVM infrastructure			Changes to LLVM infrastructure
	------------------------------			------------------------------

	* The legacy optimization pipeline has been removed.			* The legacy optimization pipeline has been removed.

	* Alloca merging in the inliner has been removed, since it only worked with the			* Alloca merging in the inliner has been removed, since it only worked with the
	legacy inliner pass. Backend stack coloring should handle cases alloca			legacy inliner pass. Backend stack coloring should handle cases alloca
	merging initially set out to handle.			merging initially set out to handle.
	▲ Show 20 Lines • Show All 331 Lines • Show Last 20 Lines

llvm/include/llvm/ADT/GenericCycleImpl.h

	Show All 9 Lines
	/// This template implementation resides in a separate file so that it			/// This template implementation resides in a separate file so that it
	/// does not get injected into every .cpp file that includes the			/// does not get injected into every .cpp file that includes the
	/// generic header.			/// generic header.
	///			///
	/// DO NOT INCLUDE THIS FILE WHEN MERELY USING CYCLEINFO.			/// DO NOT INCLUDE THIS FILE WHEN MERELY USING CYCLEINFO.
	///			///
	/// This file should only be included by files that implement a			/// This file should only be included by files that implement a
	/// specialization of the relevant templates. Currently these are:			/// specialization of the relevant templates. Currently these are:
	/// - CycleAnalysis.cpp			/// - llvm/lib/IR/CycleInfo.cpp
	/// - MachineCycleAnalysis.cpp			/// - llvm/lib/CodeGen/MachineCycleAnalysis.cpp
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_ADT_GENERICCYCLEIMPL_H			#ifndef LLVM_ADT_GENERICCYCLEIMPL_H
	#define LLVM_ADT_GENERICCYCLEIMPL_H			#define LLVM_ADT_GENERICCYCLEIMPL_H

	#include "llvm/ADT/DenseSet.h"			#include "llvm/ADT/DenseSet.h"
	#include "llvm/ADT/DepthFirstIterator.h"			#include "llvm/ADT/DepthFirstIterator.h"

llvm/include/llvm/ADT/GenericCycleInfo.h

public:		public:

/// Methods for debug and self-test.		/// Methods for debug and self-test.
//@{		//@{
#ifndef NDEBUG		#ifndef NDEBUG
bool validateTree() const;		bool validateTree() const;
#endif		#endif
void print(raw_ostream &Out) const;		void print(raw_ostream &Out) const;
void dump() const { print(dbgs()); }		void dump() const { print(dbgs()); }
		Printable print(const CycleT *Cycle) { return Cycle->print(Context); }
//@}		//@}

/// Iteration over top-level cycles.		/// Iteration over top-level cycles.
//@{		//@{
using const_toplevel_iterator_base =		using const_toplevel_iterator_base =
typename std::vector<std::unique_ptr<CycleT>>::const_iterator;		typename std::vector<std::unique_ptr<CycleT>>::const_iterator;
struct const_toplevel_iterator		struct const_toplevel_iterator
: iterator_adaptor_base<const_toplevel_iterator,		: iterator_adaptor_base<const_toplevel_iterator,
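The new `CycleInfo::print(const CycleT *)` helper returns a `Printable`. This lets the Verifier stream a cycle through its new `Write(Printable P)` overload without having to know the cycle's context. The standalone sketch below models that pattern. `PrintableSketch`, `CycleSketch` and `CycleInfoSketch` are hypothetical stand-ins rather than LLVM's classes, and the sketch uses `std::ostream` where LLVM uses `raw_ostream`.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for llvm::Printable: it stores a callback that knows
// how to print an object, so the object can be streamed with operator<<.
class PrintableSketch {
public:
  explicit PrintableSketch(std::function<void(std::ostream &)> Print)
      : Print(std::move(Print)) {}
  std::function<void(std::ostream &)> Print;
};

inline std::ostream &operator<<(std::ostream &OS, const PrintableSketch &P) {
  P.Print(OS);
  return OS;
}

// Hypothetical cycle whose printing needs an external context (here just a
// name prefix), similar to how a cycle is printed relative to a Context.
struct CycleSketch {
  std::string Header;
  std::string print(const std::string &Context) const {
    return Context + ": cycle with header " + Header;
  }
};

// Mirrors the shape of the added helper: bind the context now, print later.
struct CycleInfoSketch {
  std::string Context = "f";
  PrintableSketch print(const CycleSketch *Cycle) const {
    std::string Ctx = Context;
    return PrintableSketch([Cycle, Ctx](std::ostream &OS) {
      assert(Cycle && "cannot print a null cycle");
      OS << Cycle->print(Ctx);
    });
  }
};
```

A caller such as a diagnostic writer only needs `OS << CI.print(C)`, which is the role `CI.print(BBCycle)` plays in the Verifier's `Check(...)` messages.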

llvm/include/llvm/Analysis/CycleAnalysis.h

	/// This file declares an analysis pass that computes CycleInfo for			/// This file declares an analysis pass that computes CycleInfo for
	/// LLVM IR, specialized from GenericCycleInfo.			/// LLVM IR, specialized from GenericCycleInfo.
	///			///
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_ANALYSIS_CYCLEANALYSIS_H			#ifndef LLVM_ANALYSIS_CYCLEANALYSIS_H
	#define LLVM_ANALYSIS_CYCLEANALYSIS_H			#define LLVM_ANALYSIS_CYCLEANALYSIS_H

	#include "llvm/ADT/GenericCycleInfo.h"			#include "llvm/IR/CycleInfo.h"
	#include "llvm/IR/PassManager.h"			#include "llvm/IR/PassManager.h"
	#include "llvm/IR/SSAContext.h"			#include "llvm/IR/SSAContext.h"
	#include "llvm/Pass.h"			#include "llvm/Pass.h"

	namespace llvm {			namespace llvm {
	extern template class GenericCycleInfo<SSAContext>;			extern template class GenericCycleInfo<SSAContext>;
	extern template class GenericCycle<SSAContext>;			extern template class GenericCycle<SSAContext>;


llvm/include/llvm/IR/CycleInfo.h

This file was added.

				//===- CycleInfo.h - Cycle Info for LLVM IR ------------------ C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				/// \file
				///
				/// This file declares the LLVM IR specialization of the GenericCycle
				/// templates.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_IR_CYCLEINFO_H
				#define LLVM_IR_CYCLEINFO_H

				#include "llvm/ADT/GenericCycleInfo.h"
				#include "llvm/IR/SSAContext.h"

				namespace llvm {

				extern template class GenericCycleInfo<SSAContext>;
				extern template class GenericCycle<SSAContext>;

				using CycleInfo = GenericCycleInfo<SSAContext>;
				using Cycle = CycleInfo::CycleT;

				} // namespace llvm

				#endif // LLVM_IR_CYCLEINFO_H

llvm/include/llvm/IR/Intrinsics.td

	// signature in the pointer, but instead returns the signature as a value.			// signature in the pointer, but instead returns the signature as a value.
	// That allows it to be used to sign non-pointer data: in that sense, it is			// That allows it to be used to sign non-pointer data: in that sense, it is
	// generic. There is no generic @llvm.ptrauth.auth: instead, the signature			// generic. There is no generic @llvm.ptrauth.auth: instead, the signature
	// can be computed using @llvm.ptrauth.sign_generic, and compared with icmp.			// can be computed using @llvm.ptrauth.sign_generic, and compared with icmp.
	def int_ptrauth_sign_generic :			def int_ptrauth_sign_generic :
	DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty], [IntrNoMem]>;			DefaultAttrsIntrinsic<[llvm_i64_ty], [llvm_i64_ty, llvm_i64_ty], [IntrNoMem]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
				//===------- Convergence Intrinsics ---------------------------------------===//

				def int_experimental_convergence_entry
				: DefaultAttrsIntrinsic<[llvm_token_ty], [], [IntrNoMem, IntrConvergent]>;
				arsenm: Needs a lot more intrinsic properties. Should really be DefaultAttrsIntrinsic +Convergent+NoMem
				def int_experimental_convergence_anchor
				: DefaultAttrsIntrinsic<[llvm_token_ty], [], [IntrNoMem, IntrConvergent]>;
				def int_experimental_convergence_loop
				: DefaultAttrsIntrinsic<[llvm_token_ty], [], [IntrNoMem, IntrConvergent]>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Target-specific intrinsics			// Target-specific intrinsics
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	include "llvm/IR/IntrinsicsPowerPC.td"			include "llvm/IR/IntrinsicsPowerPC.td"
	include "llvm/IR/IntrinsicsX86.td"			include "llvm/IR/IntrinsicsX86.td"
	include "llvm/IR/IntrinsicsARM.td"			include "llvm/IR/IntrinsicsARM.td"

llvm/include/llvm/IR/LLVMContext.h

enum : unsigned {		enum : unsigned {
OB_funclet = 1, // "funclet"		OB_funclet = 1, // "funclet"
OB_gc_transition = 2, // "gc-transition"		OB_gc_transition = 2, // "gc-transition"
OB_cfguardtarget = 3, // "cfguardtarget"		OB_cfguardtarget = 3, // "cfguardtarget"
OB_preallocated = 4, // "preallocated"		OB_preallocated = 4, // "preallocated"
OB_gc_live = 5, // "gc-live"		OB_gc_live = 5, // "gc-live"
OB_clang_arc_attachedcall = 6, // "clang.arc.attachedcall"		OB_clang_arc_attachedcall = 6, // "clang.arc.attachedcall"
OB_ptrauth = 7, // "ptrauth"		OB_ptrauth = 7, // "ptrauth"
OB_kcfi = 8, // "kcfi"		OB_kcfi = 8, // "kcfi"
		OB_convergencectrl = 9, // "convergencectrl"
};		};

/// getMDKindID - Return a unique non-zero ID for the specified metadata kind.		/// getMDKindID - Return a unique non-zero ID for the specified metadata kind.
/// This ID is uniqued across modules in the current LLVMContext.		/// This ID is uniqued across modules in the current LLVMContext.
unsigned getMDKindID(StringRef Name) const;		unsigned getMDKindID(StringRef Name) const;

/// getMDKindNames - Populate client supplied SmallVector with the name for		/// getMDKindNames - Populate client supplied SmallVector with the name for
/// custom metadata IDs registered in this LLVMContext.		/// custom metadata IDs registered in this LLVMContext.

llvm/lib/Analysis/CycleAnalysis.cpp

	#include "llvm/InitializePasses.h"			#include "llvm/InitializePasses.h"

	using namespace llvm;			using namespace llvm;

	namespace llvm {			namespace llvm {
	class Module;			class Module;
	}			}

	template class llvm::GenericCycleInfo<SSAContext>;
	template class llvm::GenericCycle<SSAContext>;

	CycleInfo CycleAnalysis::run(Function &F, FunctionAnalysisManager &) {			CycleInfo CycleAnalysis::run(Function &F, FunctionAnalysisManager &) {
	CycleInfo CI;			CycleInfo CI;
	CI.compute(F);			CI.compute(F);
	return CI;			return CI;
	}			}

	AnalysisKey CycleAnalysis::Key;			AnalysisKey CycleAnalysis::Key;


llvm/lib/IR/CMakeLists.txt

	add_llvm_component_library(LLVMCore			add_llvm_component_library(LLVMCore
	AbstractCallSite.cpp			AbstractCallSite.cpp
	AsmWriter.cpp			AsmWriter.cpp
	Assumptions.cpp			Assumptions.cpp
	Attributes.cpp			Attributes.cpp
	AutoUpgrade.cpp			AutoUpgrade.cpp
	BasicBlock.cpp			BasicBlock.cpp
	BuiltinGCs.cpp			BuiltinGCs.cpp
	Comdat.cpp			Comdat.cpp
	ConstantFold.cpp			ConstantFold.cpp
	ConstantRange.cpp			ConstantRange.cpp
	Constants.cpp			Constants.cpp
	Core.cpp			Core.cpp
				CycleInfo.cpp
	DIBuilder.cpp			DIBuilder.cpp
	DataLayout.cpp			DataLayout.cpp
	DebugInfo.cpp			DebugInfo.cpp
	DebugInfoMetadata.cpp			DebugInfoMetadata.cpp
	DebugLoc.cpp			DebugLoc.cpp
	DiagnosticHandler.cpp			DiagnosticHandler.cpp
	DiagnosticInfo.cpp			DiagnosticInfo.cpp
	DiagnosticPrinter.cpp			DiagnosticPrinter.cpp

llvm/lib/IR/CycleInfo.cpp

This file was added.

				//===- CycleInfo.cpp - IR Cycle Info ----------------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/IR/CycleInfo.h"
				#include "llvm/ADT/GenericCycleImpl.h"
				#include "llvm/IR/CFG.h"

				using namespace llvm;

				template class llvm::GenericCycleInfo<SSAContext>;
				template class llvm::GenericCycle<SSAContext>;
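CycleInfo.h declares `extern template class GenericCycleInfo<SSAContext>;` and CycleInfo.cpp provides the matching `template class ...;` definition. As a result, the template bodies from GenericCycleImpl.h are instantiated once, in one file. The following is a minimal single-file sketch of that split. The `GenericCounter` template is hypothetical and stands in for `GenericCycleInfo`.

```cpp
#include <cassert>
#include <vector>

// --- What a header like CycleInfo.h would contain (hypothetical names) ---
template <typename T> class GenericCounter {
public:
  void add(T V);
  T sum() const;

private:
  std::vector<T> Values;
};

// Explicit instantiation declaration: includers must not instantiate the
// members of GenericCounter<int> themselves; one .cpp file provides them.
extern template class GenericCounter<int>;

// Convenience alias, analogous to `using CycleInfo = GenericCycleInfo<...>;`.
using IntCounter = GenericCounter<int>;

// --- What the implementation file (like CycleInfo.cpp) would contain ---
// Member definitions come from the "Impl" part that only this file includes.
template <typename T> void GenericCounter<T>::add(T V) { Values.push_back(V); }

template <typename T> T GenericCounter<T>::sum() const {
  T Total{};
  for (const T &V : Values)
    Total += V;
  return Total;
}

// Explicit instantiation definition: emit every member of
// GenericCounter<int> exactly once, in this translation unit.
template class GenericCounter<int>;
```

In a real project, the first half would live in a header and the second half in exactly one `.cpp` file. Every other file that includes the header links against that single instantiation.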

llvm/lib/IR/LLVMContext.cpp

assert(PtrauthEntry->second == LLVMContext::OB_ptrauth &&		assert(PtrauthEntry->second == LLVMContext::OB_ptrauth &&
"ptrauth operand bundle id drifted!");		"ptrauth operand bundle id drifted!");
(void)PtrauthEntry;		(void)PtrauthEntry;

auto *KCFIEntry = pImpl->getOrInsertBundleTag("kcfi");		auto *KCFIEntry = pImpl->getOrInsertBundleTag("kcfi");
assert(KCFIEntry->second == LLVMContext::OB_kcfi &&		assert(KCFIEntry->second == LLVMContext::OB_kcfi &&
"kcfi operand bundle id drifted!");		"kcfi operand bundle id drifted!");
(void)KCFIEntry;		(void)KCFIEntry;

		auto *ConvergenceCtrlEntry = pImpl->getOrInsertBundleTag("convergencectrl");
		assert(ConvergenceCtrlEntry->second == LLVMContext::OB_convergencectrl &&
		"convergencectrl operand bundle id drifted!");
		(void)ConvergenceCtrlEntry;

SyncScope::ID SingleThreadSSID =		SyncScope::ID SingleThreadSSID =
pImpl->getOrInsertSyncScopeID("singlethread");		pImpl->getOrInsertSyncScopeID("singlethread");
assert(SingleThreadSSID == SyncScope::SingleThread &&		assert(SingleThreadSSID == SyncScope::SingleThread &&
"singlethread synchronization scope ID drifted!");		"singlethread synchronization scope ID drifted!");
(void)SingleThreadSSID;		(void)SingleThreadSSID;

SyncScope::ID SystemSSID =		SyncScope::ID SystemSSID =
pImpl->getOrInsertSyncScopeID("");		pImpl->getOrInsertSyncScopeID("");
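The `convergencectrl` tag receives a fixed ID (`OB_convergencectrl = 9`) in LLVMContext.h, and LLVMContext.cpp asserts that registering the tag yields exactly that ID. The sketch below models this "register in a fixed order, then check for drift" scheme. `BundleTagRegistry`, `BundleID` and the three-tag list are hypothetical, and the enum values are illustrative rather than LLVM's.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical fixed IDs that the rest of the code would rely on.
enum BundleID : unsigned {
  BID_deopt = 0,
  BID_funclet = 1,
  BID_convergencectrl = 2, // illustrative only; LLVM's value is 9
};

class BundleTagRegistry {
public:
  // Returns the ID for Tag, assigning the next free ID if Tag is new.
  unsigned getOrInsertBundleTag(const std::string &Tag) {
    auto It = IDs.find(Tag);
    if (It != IDs.end())
      return It->second;
    unsigned ID = static_cast<unsigned>(Names.size());
    IDs.emplace(Tag, ID);
    Names.push_back(Tag);
    return ID;
  }

private:
  std::unordered_map<std::string, unsigned> IDs;
  std::vector<std::string> Names;
};

// Registers the well-known tags in enum order and checks that no ID drifted,
// which is the purpose of the asserts next to each getOrInsertBundleTag call.
inline bool registerWellKnownTags(BundleTagRegistry &R) {
  const bool DeoptOK = R.getOrInsertBundleTag("deopt") == BID_deopt;
  const bool FuncletOK = R.getOrInsertBundleTag("funclet") == BID_funclet;
  const bool CtrlOK =
      R.getOrInsertBundleTag("convergencectrl") == BID_convergencectrl;
  const bool OK = DeoptOK && FuncletOK && CtrlOK;
  assert(OK && "operand bundle id drifted!");
  return OK;
}
```

A new tag must be appended at the end, both in the enum and in the registration order. Otherwise every later ID shifts, and the drift check fires.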

llvm/lib/IR/Verifier.cpp


// * It is illegal to have a ret instruction that returns a value that does not		// * It is illegal to have a ret instruction that returns a value that does not
// agree with the function return value type.		// agree with the function return value type.
// * Function call argument types match the function prototype		// * Function call argument types match the function prototype
// * A landing pad is defined by a landingpad instruction, and can be jumped to		// * A landing pad is defined by a landingpad instruction, and can be jumped to
// only by the unwind edge of an invoke instruction.		// only by the unwind edge of an invoke instruction.
// * A landingpad instruction must be the first non-PHI instruction in the		// * A landingpad instruction must be the first non-PHI instruction in the
// block.		// block.
// * Landingpad instructions must be in a function with a personality function.		// * Landingpad instructions must be in a function with a personality function.
		// * Convergence control intrinsics are introduced in ConvergentOperations.rst.
		arsenm: Typo ConvertentOperations
		// The applied restrictions are too numerous to list here.
		// * The convergence entry intrinsic and the loop heart must be the first
		// non-PHI instruction in their respective block. This does not conflict with
		// the landing pads, since these two kinds cannot occur in the same block.
// * All other things that are tested by asserts spread about the code...		// * All other things that are tested by asserts spread about the code...
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
#include "llvm/ADT/APFloat.h"		#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
#include "llvm/BinaryFormat/Dwarf.h"		#include "llvm/BinaryFormat/Dwarf.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
#include "llvm/IR/AttributeMask.h"		#include "llvm/IR/AttributeMask.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/CallingConv.h"		#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Comdat.h"		#include "llvm/IR/Comdat.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/ConstantRange.h"		#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
		#include "llvm/IR/CycleInfo.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfo.h"		#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/EHPersonalities.h"		#include "llvm/IR/EHPersonalities.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
private:		private:

// NOLINTNEXTLINE(readability-identifier-naming)		// NOLINTNEXTLINE(readability-identifier-naming)
void Write(const AttributeList *AL) {		void Write(const AttributeList *AL) {
if (!AL)		if (!AL)
return;		return;
AL->print(*OS);		AL->print(*OS);
}		}

		void Write(Printable P) { *OS << P << '\n'; }

template <typename T> void Write(ArrayRef<T> Vs) {		template <typename T> void Write(ArrayRef<T> Vs) {
for (const T &V : Vs)		for (const T &V : Vs)
Write(V);		Write(V);
}		}

template <typename T1, typename... Ts>		template <typename T1, typename... Ts>
void WriteTs(const T1 &V1, const Ts &... Vs) {		void WriteTs(const T1 &V1, const Ts &... Vs) {
Write(V1);		Write(V1);
class Verifier : public InstVisitor<Verifier>, VerifierSupport {		class Verifier : public InstVisitor<Verifier>, VerifierSupport {
bool SawFrameEscape;		bool SawFrameEscape;

/// Whether the current function has a DISubprogram attached to it.		/// Whether the current function has a DISubprogram attached to it.
bool HasDebugInfo = false;		bool HasDebugInfo = false;

/// The current source language.		/// The current source language.
dwarf::SourceLanguage CurrentSourceLang = dwarf::DW_LANG_lo_user;		dwarf::SourceLanguage CurrentSourceLang = dwarf::DW_LANG_lo_user;

		/// Whether the current function uses controlled convergence (convergencectrl
		/// operand bundles or convergence intrinsics), uncontrolled convergence, or
		/// neither.
		enum {
		ControlledConvergence,
		UncontrolledConvergence,
		NoConvergence
		} ConvergenceKind = NoConvergence;

/// Whether source was present on the first DIFile encountered in each CU.		/// Whether source was present on the first DIFile encountered in each CU.
DenseMap<const DICompileUnit *, bool> HasSourceDebugInfo;		DenseMap<const DICompileUnit *, bool> HasSourceDebugInfo;

/// Stores the count of how many objects were passed to llvm.localescape for a		/// Stores the count of how many objects were passed to llvm.localescape for a
/// given function and the largest index passed to llvm.localrecover.		/// given function and the largest index passed to llvm.localrecover.
DenseMap<Function *, std::pair<unsigned, unsigned>> FrameEscapeInfo;		DenseMap<Function *, std::pair<unsigned, unsigned>> FrameEscapeInfo;

// Maps catchswitches and cleanuppads that unwind to siblings to the		// Maps catchswitches and cleanuppads that unwind to siblings to the
for (const BasicBlock &BB : F) {		for (const BasicBlock &BB : F) {
}		}
return false;		return false;
}		}

Broken = false;		Broken = false;
// FIXME: We strip const here because the inst visitor strips const.		// FIXME: We strip const here because the inst visitor strips const.
visit(const_cast<Function &>(F));		visit(const_cast<Function &>(F));
verifySiblingFuncletUnwinds();		verifySiblingFuncletUnwinds();
		if (ConvergenceKind == ControlledConvergence)
		verifyConvergenceControl(const_cast<Function &>(F));
InstsInThisBlock.clear();		InstsInThisBlock.clear();
DebugFnArgs.clear();		DebugFnArgs.clear();
LandingPadResultTy = nullptr;		LandingPadResultTy = nullptr;
SawFrameEscape = false;		SawFrameEscape = false;
SiblingFuncletInfo.clear();		SiblingFuncletInfo.clear();
verifyNoAliasScopeDecl();		verifyNoAliasScopeDecl();
NoAliasScopeDecls.clear();		NoAliasScopeDecls.clear();
		ConvergenceKind = NoConvergence;

return !Broken;		return !Broken;
}		}

/// Verify the module that this instance of \c Verifier was initialized with.		/// Verify the module that this instance of \c Verifier was initialized with.
bool verify() {		bool verify() {
Broken = false;		Broken = false;

#include "llvm/IR/Metadata.def"		#include "llvm/IR/Metadata.def"
void verifyFunctionMetadata(ArrayRef<std::pair<unsigned, MDNode *>> MDs);		void verifyFunctionMetadata(ArrayRef<std::pair<unsigned, MDNode *>> MDs);

void visitConstantExprsRecursively(const Constant *EntryC);		void visitConstantExprsRecursively(const Constant *EntryC);
void visitConstantExpr(const ConstantExpr *CE);		void visitConstantExpr(const ConstantExpr *CE);
void verifyInlineAsmCall(const CallBase &Call);		void verifyInlineAsmCall(const CallBase &Call);
void verifyStatepoint(const CallBase &Call);		void verifyStatepoint(const CallBase &Call);
void verifyFrameRecoverIndices();		void verifyFrameRecoverIndices();
void verifySiblingFuncletUnwinds();		void verifySiblingFuncletUnwinds();
		void verifyConvergenceControl(Function &F);

void verifyFragmentExpression(const DbgVariableIntrinsic &I);		void verifyFragmentExpression(const DbgVariableIntrinsic &I);
template <typename ValueOrMetadata>		template <typename ValueOrMetadata>
void verifyFragmentExpression(const DIVariable &V,		void verifyFragmentExpression(const DIVariable &V,
DIExpression::FragmentInfo Fragment,		DIExpression::FragmentInfo Fragment,
ValueOrMetadata *Desc);		ValueOrMetadata *Desc);
void verifyFnArgs(const DbgVariableIntrinsic &I);		void verifyFnArgs(const DbgVariableIntrinsic &I);
void verifyNotEntryValue(const DbgVariableIntrinsic &I);		void verifyNotEntryValue(const DbgVariableIntrinsic &I);
do {		do {
Active.insert(PredPad);		Active.insert(PredPad);
} while (true);		} while (true);
// Each node only has one successor, so we've walked all the active		// Each node only has one successor, so we've walked all the active
// nodes' successors.		// nodes' successors.
Active.clear();		Active.clear();
}		}
}		}

		void Verifier::verifyConvergenceControl(Function &F) {
		DenseMap<BasicBlock *, SmallVector<CallBase *, 8>> LiveTokenMap;
		DenseMap<const Cycle *, const CallBase *> CycleHearts;

		// Just like the DominatorTree, compute the CycleInfo locally so that we
		// can run the verifier outside of a pass manager and we don't rely on
		// potentially outdated analysis results.
		CycleInfo CI;
		CI.compute(F);

		auto checkBundle = [&](OperandBundleUse &Bundle, CallBase *CB,
		SmallVectorImpl<CallBase *> &LiveTokens) {
		Check(Bundle.Inputs.size() == 1 && Bundle.Inputs[0]->getType()->isTokenTy(),
		arsenm: Don't reconstruct each iteration?
		sameerds: Reconstruction on each iteration is not something I've thought about much. But the programmer's manual only mentions this for std::vector and not SmallVector. Moved the declaration out of the loop anyway.
		"The 'convergencectrl' bundle requires exactly one token use.", CB);

		Value *Token = Bundle.Inputs[0].get();
		auto *Def = dyn_cast<CallBase>(Token);
		Check(Def != nullptr,
		"Convergence control tokens can only be produced by call "
		"instructions.",
		Token);

		Check(llvm::is_contained(LiveTokens, Token),
		"Convergence region is not well-nested.", Token, CB);

		while (LiveTokens.back() != Token)
		LiveTokens.pop_back();

		arsenm: This is a pretty long and indented block, move to helper function?
		// Check static rules about cycles.
		auto *BB = CB->getParent();
		auto *BBCycle = CI.getCycle(BB);
		if (!BBCycle)
		return;

		BasicBlock *DefBB = Def->getParent();
		if (DefBB == BB || BBCycle->contains(DefBB)) {
		// degenerate occurrence of a loop intrinsic
		return;
		}

		auto *II = dyn_cast<IntrinsicInst>(CB);
		Check(II &&
		II->getIntrinsicID() == Intrinsic::experimental_convergence_loop,
		"Convergence token used by an instruction other than "
		"llvm.experimental.convergence.loop in a cycle that does "
		"not contain the token's definition.",
		CB, CI.print(BBCycle));

		while (true) {
		auto *Parent = BBCycle->getParentCycle();
		if (!Parent || Parent->contains(DefBB))
		break;
		BBCycle = Parent;
		}

		Check(BBCycle->isReducible() && BB == BBCycle->getHeader(),
		"Cycle heart must dominate all blocks in the cycle.", CB, BB,
		CI.print(BBCycle));
		Check(!CycleHearts.count(BBCycle),
		"Two static convergence token uses in a cycle that does "
		"not contain either token's definition.",
		CB, CycleHearts[BBCycle], CI.print(BBCycle));
		CycleHearts[BBCycle] = CB;
		};

		ReversePostOrderTraversal<Function *> RPOT(&F);
		SmallVector<CallBase *, 8> LiveTokens;
		for (BasicBlock *BB : RPOT) {
		LiveTokens.clear();
		auto LTIt = LiveTokenMap.find(BB);
		if (LTIt != LiveTokenMap.end()) {
		LiveTokens = std::move(LTIt->second);
		LiveTokenMap.erase(LTIt);
		}

		for (Instruction &I : *BB) {
		CallBase *CB = dyn_cast<CallBase>(&I);
		if (!CB)
		continue;

		auto Bundle = CB->getOperandBundle(LLVMContext::OB_convergencectrl);
		if (Bundle)
		checkBundle(*Bundle, CB, LiveTokens);
		arsenm: Don't need llvm::

		if (CB->getType()->isTokenTy())
		LiveTokens.push_back(CB);
		}

		// Propagate token liveness
		for (BasicBlock *Succ : successors(BB)) {
		DomTreeNode *SuccNode = DT.getNode(Succ);
		LTIt = LiveTokenMap.find(Succ);
		if (LTIt == LiveTokenMap.end()) {
		// We're the first predecessor: all tokens which dominate the
		// successor are live for now.
		LTIt = LiveTokenMap.try_emplace(Succ).first;
		for (CallBase *LiveToken : LiveTokens) {
		if (!DT.dominates(DT.getNode(LiveToken->getParent()), SuccNode))
		break;
		LTIt->second.push_back(LiveToken);
		}
		} else {
		// Compute the intersection of live tokens.
		auto It = llvm::partition(LTIt->second, [&LiveTokens](CallBase *Token) {
		return llvm::is_contained(LiveTokens, Token);
		});
		LTIt->second.erase(It, LTIt->second.end());
		}
		}
		}
		}
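verifyConvergenceControl keeps a stack of live tokens for each block. A `convergencectrl` use must name a token on the stack (otherwise the regions are not well-nested), and every token defined after that one is popped. At a join block, the stored list is intersected with the list arriving from each further predecessor.

The sketch below models both steps with strings in place of `CallBase *`. `LiveTokenStack` and `intersectLiveTokens` are hypothetical names. The dominance filtering that the Verifier applies for the first predecessor is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the live-token stack along one path through a block.
class LiveTokenStack {
public:
  // A token-producing call pushes its token.
  void define(const std::string &Tok) { Stack.push_back(Tok); }

  // A call with a "convergencectrl"(Tok) bundle. Returns false if Tok is not
  // live (the regions are not well-nested). Otherwise, pops every token
  // defined after Tok, because using an outer token closes the inner regions.
  bool use(const std::string &Tok) {
    if (std::find(Stack.begin(), Stack.end(), Tok) == Stack.end())
      return false;
    while (Stack.back() != Tok)
      Stack.pop_back();
    return true;
  }

  const std::vector<std::string> &tokens() const { return Stack; }

private:
  std::vector<std::string> Stack;
};

// Merging at a join block: keep only the stored tokens that are also live on
// the incoming path. std::remove_if keeps survivors in their original order,
// which matters because the list is later used as a stack.
inline void intersectLiveTokens(std::vector<std::string> &Stored,
                                const std::vector<std::string> &Incoming) {
  auto NotIncoming = [&Incoming](const std::string &Tok) {
    return std::find(Incoming.begin(), Incoming.end(), Tok) == Incoming.end();
  };
  Stored.erase(std::remove_if(Stored.begin(), Stored.end(), NotIncoming),
               Stored.end());
}
```

On a single path, using `tok1` and then `tok2` is rejected, because the first use closes `tok2`'s region. In `@region_nesting1`, the two tokens are used on different paths (blocks C and D), so neither use affects the other.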

// visitFunction - Verify that a function is ok.		// visitFunction - Verify that a function is ok.
//		//
void Verifier::visitFunction(const Function &F) {		void Verifier::visitFunction(const Function &F) {
visitGlobalValue(F);		visitGlobalValue(F);

// Check function arguments.		// Check function arguments.
FunctionType *FT = F.getFunctionType();		FunctionType *FT = F.getFunctionType();
unsigned NumArgs = F.arg_size();		unsigned NumArgs = F.arg_size();
Check(PN.getType() == IncValue->getType(),		Check(PN.getType() == IncValue->getType(),
"PHI node operands are not the same type as the result!", &PN);		"PHI node operands are not the same type as the result!", &PN);
}		}

// All other PHI node constraints are checked in the visitBasicBlock method.		// All other PHI node constraints are checked in the visitBasicBlock method.

visitInstruction(PN);		visitInstruction(PN);
}		}

		static bool isControlledConvergent(const CallBase &Call) {
		if (Call.getOperandBundle(LLVMContext::OB_convergencectrl))
		return true;
		if (const auto *F = dyn_cast<Function>(Call.getCalledOperand())) {
		switch (F->getIntrinsicID()) {
		case Intrinsic::experimental_convergence_anchor:
		case Intrinsic::experimental_convergence_entry:
		case Intrinsic::experimental_convergence_loop:
		return true;
		}
		}
		return false;
		}

void Verifier::visitCallBase(CallBase &Call) {		void Verifier::visitCallBase(CallBase &Call) {
Check(Call.getCalledOperand()->getType()->isPointerTy(),		Check(Call.getCalledOperand()->getType()->isPointerTy(),
"Called function must be a pointer!", Call);		"Called function must be a pointer!", Call);
PointerType *FPTy = cast<PointerType>(Call.getCalledOperand()->getType());		PointerType *FPTy = cast<PointerType>(Call.getCalledOperand()->getType());

Check(FPTy->isOpaqueOrPointeeTypeMatches(Call.getFunctionType()),		Check(FPTy->isOpaqueOrPointeeTypeMatches(Call.getFunctionType()),
"Called function is not the same type as the call!", Call);		"Called function is not the same type as the call!", Call);

if (Call.getFunction()->getSubprogram() && Call.getCalledFunction() &&		if (Call.getFunction()->getSubprogram() && Call.getCalledFunction() &&
CheckDI(Call.getDebugLoc(),		CheckDI(Call.getDebugLoc(),
"inlinable function call in a function with "		"inlinable function call in a function with "
"debug info must have a !dbg location",		"debug info must have a !dbg location",
Call);		Call);

if (Call.isInlineAsm())		if (Call.isInlineAsm())
verifyInlineAsmCall(Call);		verifyInlineAsmCall(Call);

		if (isControlledConvergent(Call)) {
		Check(Call.isConvergent(),
		"Expected convergent attribute on a controlled convergent call.",
		Call);
		Check(ConvergenceKind != UncontrolledConvergence,
		"Cannot mix controlled and uncontrolled convergence in the same "
		"function.",
		Call);
		ConvergenceKind = ControlledConvergence;
		} else if (Call.isConvergent()) {
		Check(ConvergenceKind != ControlledConvergence,
		"Cannot mix controlled and uncontrolled convergence in the same "
		"function.",
		Call);
		ConvergenceKind = UncontrolledConvergence;
		}

visitInstruction(Call);		visitInstruction(Call);
}		}
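visitCallBase sorts each call into one of three kinds:

- controlled convergent: the call has a `convergencectrl` bundle or is one of the three convergence intrinsics;
- uncontrolled convergent;
- neither.

It rejects any function that mixes the first two kinds. `verify(F)` then runs verifyConvergenceControl only for controlled functions, and it resets the state for the next function.

The simplified model below reduces each call to two flags. `CallSummary` and `checkConvergenceMix` are hypothetical names. The Verifier reports a failure and keeps visiting, whereas this sketch stops at the first error.

```cpp
#include <cassert>
#include <vector>

// Hypothetical two-flag summary of a call site.
struct CallSummary {
  bool IsConvergent; // has the convergent attribute
  bool IsControlled; // has a convergencectrl bundle, or is one of the
                     // convergence.{entry,anchor,loop} intrinsics
};

enum class ConvergenceKind { Controlled, Uncontrolled, None };

// Returns false if a controlled call lacks the convergent attribute, or if
// controlled and uncontrolled convergent calls are mixed. On success, Kind
// reports what the function uses.
inline bool checkConvergenceMix(const std::vector<CallSummary> &Calls,
                                ConvergenceKind &Kind) {
  Kind = ConvergenceKind::None; // reset per function, as verify(F) does
  for (const CallSummary &C : Calls) {
    if (C.IsControlled) {
      if (!C.IsConvergent || Kind == ConvergenceKind::Uncontrolled)
        return false;
      Kind = ConvergenceKind::Controlled;
    } else if (C.IsConvergent) {
      if (Kind == ConvergenceKind::Controlled)
        return false;
      Kind = ConvergenceKind::Uncontrolled;
    }
  }
  return true;
}
```

For example, the calls in `@mixed1` (not convergent, uncontrolled convergent, not convergent) yield `Uncontrolled`. The calls in `@mixed2` yield `Controlled`, assuming `@f` is declared `convergent`.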

void Verifier::verifyTailCCMustTailAttrs(const AttrBuilder &Attrs,		void Verifier::verifyTailCCMustTailAttrs(const AttrBuilder &Attrs,
StringRef Context) {		StringRef Context) {
Check(!Attrs.contains(Attribute::InAlloca),		Check(!Attrs.contains(Attribute::InAlloca),
Twine("inalloca attribute not allowed in ") + Context);		Twine("inalloca attribute not allowed in ") + Context);
Check(!Attrs.contains(Attribute::InReg),		Check(!Attrs.contains(Attribute::InReg),
default:		default:
CheckFailed("Intrinsic can only be used from functions with the "		CheckFailed("Intrinsic can only be used from functions with the "
"amdgpu_cs, amdgpu_cs_chain or amdgpu_cs_chain_preserve "		"amdgpu_cs, amdgpu_cs_chain or amdgpu_cs_chain_preserve "
"calling conventions",		"calling conventions",
&Call);		&Call);
break;		break;
}		}
break;		break;
}		}
		case Intrinsic::experimental_convergence_entry:
		Check(Call.getFunction()->isConvergent(),
		"Entry intrinsic can occur only in a convergent function.", &Call);
		Check(Call.getParent()->isEntryBlock(),
		"Entry intrinsic must occur in the entry block.", &Call);
		Check(Call.getParent()->getFirstNonPHI() == &Call,
		"Entry intrinsic must occur at the start of the basic block.", &Call);
		LLVM_FALLTHROUGH;
		case Intrinsic::experimental_convergence_anchor:
		Check(!Call.getOperandBundle(LLVMContext::OB_convergencectrl),
		"Entry or anchor intrinsic must not have a convergencectrl bundle.",
		&Call);
		break;
		case Intrinsic::experimental_convergence_loop:
		Check(Call.getOperandBundle(LLVMContext::OB_convergencectrl),
		"Loop intrinsic must have a convergencectrl bundle.", &Call);
		Check(Call.getParent()->getFirstNonPHI() == &Call,
		"Loop intrinsic must occur at the start of the basic block.", &Call);
		break;
};		};

// Verify that there aren't any unmediated control transfers between funclets.		// Verify that there aren't any unmediated control transfers between funclets.
if (IntrinsicInst::mayLowerToFunctionCall(ID)) {		if (IntrinsicInst::mayLowerToFunctionCall(ID)) {
Function *F = Call.getParent()->getParent();		Function *F = Call.getParent()->getParent();
if (F->hasPersonalityFn() &&		if (F->hasPersonalityFn() &&
isScopedEHPersonality(classifyEHPersonality(F->getPersonalityFn()))) {		isScopedEHPersonality(classifyEHPersonality(F->getPersonalityFn()))) {
// Run EH funclet coloring on-demand and cache results for other intrinsic		// Run EH funclet coloring on-demand and cache results for other intrinsic
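The intrinsic-specific checks constrain where tokens may be created:

- `entry` may appear only at the start of the entry block of a convergent function;
- `entry` and `anchor` must not carry a `convergencectrl` bundle;
- `loop` must carry a bundle and must start its block.

The standalone sketch below encodes these placement rules over a hypothetical block model. `InstKind`, `InstSketch`, `BlockSketch` and `checkIntrinsicPlacement` are not LLVM types. The sketch needs C++17 for `[[fallthrough]]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Only the instruction kinds that matter for these rules are modelled.
enum class InstKind { Phi, Entry, Anchor, Loop, Other };

struct InstSketch {
  InstKind Kind;
  bool HasConvergenceCtrlBundle = false;
};

struct BlockSketch {
  bool IsEntryBlock = false;
  std::vector<InstSketch> Insts;
};

// True if instruction Idx is the first non-PHI instruction of B.
inline bool isFirstNonPHI(const BlockSketch &B, std::size_t Idx) {
  assert(Idx < B.Insts.size() && "index out of range");
  for (std::size_t I = 0; I < B.Insts.size(); ++I)
    if (B.Insts[I].Kind != InstKind::Phi)
      return I == Idx;
  return false;
}

// Checks every convergence intrinsic in B. FunctionIsConvergent models
// Function::isConvergent() for the enclosing function.
inline bool checkIntrinsicPlacement(const BlockSketch &B,
                                    bool FunctionIsConvergent) {
  for (std::size_t I = 0; I < B.Insts.size(); ++I) {
    const InstSketch &Inst = B.Insts[I];
    switch (Inst.Kind) {
    case InstKind::Entry:
      if (!FunctionIsConvergent || !B.IsEntryBlock || !isFirstNonPHI(B, I))
        return false;
      [[fallthrough]];
    case InstKind::Anchor:
      // Entry and anchor create a fresh token and must not consume one.
      if (Inst.HasConvergenceCtrlBundle)
        return false;
      break;
    case InstKind::Loop:
      // The loop heart must consume a token and open its block.
      if (!Inst.HasConvergenceCtrlBundle || !isFirstNonPHI(B, I))
        return false;
      break;
    case InstKind::Phi:
    case InstKind::Other:
      break;
    }
  }
  return true;
}
```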

llvm/test/Analysis/UniformityAnalysis/AMDGPU/join-at-loop-heart.ll

This file was deleted.

	; RUN: opt -mtriple amdgcn-unknown-amdhsa -passes='print<uniformity>' -disable-output %s 2>&1 | FileCheck %s

	; CHECK: DIVERGENT: %phi.h = phi i32 [ 0, %entry ], [ %inc, %C ], [ %inc, %D ], [ %inc, %E ]
	; CHECK: DIVERGENT: %tid = call i32 @llvm.amdgcn.workitem.id.x()
	; CHECK: DIVERGENT: %div.cond = icmp slt i32 %tid, 0
	; CHECK: DIVERGENT: %inc = add i32 %phi.h, 1
	; CHECK: DIVERGENT: br i1 %div.cond, label %C, label %D

	define void @nested_loop_extension() {
	arsenm: Why was this deleted?
	sameerds: This test was incorrectly added with the change that introduced uniformity analysis. The current semantics disallow a heart anywhere other than a loop header, so the example in this test is now invalid.
	entry:
	%anchor = call token @llvm.experimental.convergence.anchor()
	br label %A

	A:
	%phi.h = phi i32 [ 0, %entry ], [ %inc, %C ], [ %inc, %D ], [ %inc, %E ]
	br label %B

	B:
	%tid = call i32 @llvm.amdgcn.workitem.id.x()
	%div.cond = icmp slt i32 %tid, 0
	%inc = add i32 %phi.h, 1
	br i1 %div.cond, label %C, label %D

	C:
	br i1 undef, label %A, label %E

	D:
	br i1 undef, label %A, label %E

	E:
	%b = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %anchor) ]
	br i1 undef, label %A, label %F

	F:
	ret void
	}

	declare i32 @llvm.amdgcn.workitem.id.x() #0

	declare token @llvm.experimental.convergence.anchor()
	declare token @llvm.experimental.convergence.loop()

	attributes #0 = { nounwind readnone }
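The author's reply rests on the verifier rule "Cycle heart must dominate all blocks in the cycle." In the deleted test, the `loop` call sits in `E`, which is reachable from the header `A` only through `C` or `D`, so it cannot dominate `A`. The following C++ sketch checks this on a hand-built copy of the test's CFG using the textbook iterative dominator algorithm. The `CFG` encoding and the `computeDominators`/`dominatesCycle` helpers are hypothetical illustrations, not LLVM's `DominatorTree` or `CycleInfo` APIs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy CFG: block name -> successor block names.
using CFG = std::map<std::string, std::vector<std::string>>;
using DomSets = std::map<std::string, std::set<std::string>>;

// Iterative data-flow formulation of dominance:
//   Dom(entry) = {entry}
//   Dom(b)     = {b} union (intersection of Dom(p) over all predecessors p of b)
DomSets computeDominators(const CFG &cfg, const std::string &entry) {
  assert(cfg.count(entry) == 1 && "entry block must be in the CFG");
  std::set<std::string> all;
  std::map<std::string, std::vector<std::string>> preds;
  for (const auto &node : cfg) {
    all.insert(node.first);
    for (const auto &succ : node.second) {
      all.insert(succ);
      preds[succ].push_back(node.first);
    }
  }
  DomSets dom;
  for (const auto &b : all)
    dom[b] = (b == entry) ? std::set<std::string>{entry} : all;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &b : all) {
      if (b == entry)
        continue;
      std::set<std::string> meet = all;
      for (const auto &p : preds[b]) {
        std::set<std::string> kept;
        for (const auto &x : meet)
          if (dom[p].count(x))
            kept.insert(x);
        meet = kept;
      }
      meet.insert(b);
      if (meet != dom[b]) {
        dom[b] = meet;
        changed = true;
      }
    }
  }
  return dom;
}

// The verifier's rule restated: a heart block must dominate every block of its cycle.
bool dominatesCycle(const DomSets &dom, const std::string &heart,
                    const std::set<std::string> &cycle) {
  for (const auto &b : cycle)
    if (dom.at(b).count(heart) == 0)
      return false;
  return true;
}
```

For the CFG of `@nested_loop_extension` (entry to A, A to B, B to C/D, C and D to A/E, E to A/F), the cycle {A, B, C, D, E} is dominated by `A` but not by `E`, which matches the author's reason for deleting the test.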

llvm/test/Assembler/convergence-control.ll

This file was added.

				; RUN: llvm-as < %s -disable-output 2>&1 | FileCheck %s -allow-empty

				; RUN: llvm-as < %s | llvm-dis > %t1.ll
				; RUN: llvm-as %t1.ll -o - | llvm-dis > %t2.ll
				; RUN: diff %t1.ll %t2.ll

				; RUN: llvm-as < %t1.ll -disable-output 2>&1 | FileCheck %s -allow-empty

				; CHECK-NOT: error
				; CHECK-NOT: warning

				define void @mixed1() {
				call void @g() ; not convergent
				call void @f() ; uncontrolled convergent
				call void @g() ; not convergent
				ret void
				}

				define void @mixed2() {
				call void @g() ; not convergent
				%t1_tok1 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %t1_tok1) ]
				call void @g() ; not convergent
				ret void
				}


				define void @region_nesting1() convergent {
				A:
				%tok1 = call token @llvm.experimental.convergence.entry()
				%tok2 = call token @llvm.experimental.convergence.anchor()
				br label %B

				B:
				br i1 undef, label %C, label %D

				C:
				call void @f() [ "convergencectrl"(token %tok1) ]
				ret void

				D:
				call void @f() [ "convergencectrl"(token %tok2) ]
				ret void
				}

				; Mirror image of @region_nesting1
				define void @region_nesting2() {
				A:
				%tok1 = call token @llvm.experimental.convergence.anchor()
				%tok2 = call token @llvm.experimental.convergence.anchor()
				br label %B

				B:
				br i1 undef, label %C, label %D

				C:
				call void @f() [ "convergencectrl"(token %tok2) ]
				ret void

				D:
				call void @f() [ "convergencectrl"(token %tok1) ]
				ret void
				}

				define void @loop_nesting() convergent {
				A:
				%a = call token @llvm.experimental.convergence.entry()
				br label %B

				B:
				%b = call token @llvm.experimental.convergence.anchor()
				br i1 undef, label %C, label %D

				C:
				%c = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %b) ]
				call void @f() [ "convergencectrl"(token %c) ]
				br label %B

				D:
				call void @f() [ "convergencectrl"(token %b) ]
				br i1 undef, label %B, label %E

				E:
				ret void
				}
				declare void @f() convergent
				declare void @g()

				declare token @llvm.experimental.convergence.entry()
				declare token @llvm.experimental.convergence.anchor()
				declare token @llvm.experimental.convergence.loop()
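The `@mixed1` and `@mixed2` functions above are accepted because each uses only one style of convergence, while `convergencectrl-invalid.ll` rejects functions that combine controlled and uncontrolled convergent calls, or that attach a `convergencectrl` bundle to a non-convergent callee. A minimal C++ sketch of those two checks, over a hypothetical call-site model (not LLVM's `CallBase` or `Verifier`), might look like this; only the diagnostic strings come from the test's CHECK lines.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified call-site model (not LLVM's CallBase).
struct ToyCall {
  std::string callee;
  bool convergent;             // callee has the `convergent` attribute
  bool hasCtrlBundle;          // call carries a "convergencectrl" operand bundle
  bool isConvergenceIntrinsic; // call to entry/anchor/loop
};

// Returns the diagnostics for one function; the strings are copied from the
// CHECK lines of convergencectrl-invalid.ll.
std::vector<std::string> checkConvergenceMix(const std::vector<ToyCall> &calls) {
  std::vector<std::string> errors;
  bool sawControlled = false;
  bool sawUncontrolled = false;
  for (const ToyCall &c : calls) {
    if (c.hasCtrlBundle && !c.convergent)
      errors.push_back("Expected convergent attribute on a controlled convergent call.");
    // Convergence intrinsics and bundled calls both count as controlled.
    if (c.hasCtrlBundle || c.isConvergenceIntrinsic)
      sawControlled = true;
    else if (c.convergent)
      sawUncontrolled = true;
  }
  if (sawControlled && sawUncontrolled)
    errors.push_back("Cannot mix controlled and uncontrolled convergence in the same function");
  return errors;
}
```

Unlike the real verifier, which reports each offending instruction, this sketch reports the mix once per function.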

llvm/test/Bitcode/convergence-control.ll

This file was added.

				; RUN: llvm-dis < %s.bc | FileCheck %s

				define void @loop_nesting() convergent {
				A:
				; CHECK-LABEL: A:
				; CHECK: [[A:%.*]] = call token @llvm.experimental.convergence.entry()
				;
				%a = call token @llvm.experimental.convergence.entry()
				br label %B

				B:
				; CHECK-LABEL: B:
				; CHECK: [[B:%.*]] = call token @llvm.experimental.convergence.anchor()
				;
				%b = call token @llvm.experimental.convergence.anchor()
				br i1 undef, label %C, label %D

				C:
				; CHECK-LABEL: C:
				; CHECK: [[C:%.*]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[B]]) ]
				; CHECK: call void @f() [ "convergencectrl"(token [[C]]) ]
				;
				%c = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %b) ]
				call void @f() [ "convergencectrl"(token %c) ]
				br label %B

				D:
				; CHECK-LABEL: D:
				; CHECK: call void @f() [ "convergencectrl"(token [[B]]) ]
				;
				call void @f() [ "convergencectrl"(token %b) ]
				br i1 undef, label %B, label %E

				E:
				ret void
				}

				declare void @f() convergent

				declare token @llvm.experimental.convergence.entry()
				declare token @llvm.experimental.convergence.anchor()
				declare token @llvm.experimental.convergence.loop()
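In this test the `entry` intrinsic is the first instruction of the entry block and the `loop` intrinsic is the first instruction of block `C`; `convergencectrl-invalid.ll` rejects other placements. The following C++ sketch models those placement checks on a hypothetical function representation (a list of opcode names per block). It reuses the verifier's diagnostic strings but is not the verifier's implementation. Following the review discussion, it requires `loop` to be the first non-PHI instruction even when the call is not a heart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR: each block is a list of opcode names such as
// "phi", "entry", "loop", "anchor", "add", "call", "br", "ret".
struct ToyBlock {
  std::string name;
  std::vector<std::string> insts;
};

struct ToyFunction {
  bool convergent;              // function carries the `convergent` attribute
  std::vector<ToyBlock> blocks; // blocks[0] is the entry block
};

// Returns diagnostics (strings copied from convergencectrl-invalid.ll)
// for misplaced `entry` and `loop` intrinsic calls.
std::vector<std::string> checkIntrinsicPlacement(const ToyFunction &F) {
  std::vector<std::string> errors;
  for (std::size_t b = 0; b < F.blocks.size(); ++b) {
    const std::vector<std::string> &insts = F.blocks[b].insts;
    // "Start of the basic block" means the first non-PHI instruction.
    std::size_t firstNonPhi = 0;
    while (firstNonPhi < insts.size() && insts[firstNonPhi] == "phi")
      ++firstNonPhi;
    for (std::size_t i = 0; i < insts.size(); ++i) {
      if (insts[i] == "entry") {
        if (!F.convergent)
          errors.push_back("Entry intrinsic can occur only in a convergent function.");
        if (b != 0)
          errors.push_back("Entry intrinsic must occur in the entry block.");
        if (i != firstNonPhi)
          errors.push_back("Entry intrinsic must occur at the start of the basic block.");
      } else if (insts[i] == "loop") {
        // Like the verifier in this patch, this also applies to non-heart loop calls.
        if (i != firstNonPhi)
          errors.push_back("Loop intrinsic must occur at the start of the basic block.");
      }
    }
  }
  return errors;
}
```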

llvm/test/Bitcode/convergence-control.ll.bc

This binary file was added.

llvm/test/Bitcode/operand-bundles-bc-analyzer.ll

	; RUN: llvm-as < %s | llvm-bcanalyzer -dump -disable-histogram | FileCheck %s

	; CHECK: <OPERAND_BUNDLE_TAGS_BLOCK
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: <OPERAND_BUNDLE_TAG
	; CHECK-NEXT: </OPERAND_BUNDLE_TAGS_BLOCK

	; CHECK: <FUNCTION_BLOCK
	; CHECK: <OPERAND_BUNDLE
	; CHECK: <OPERAND_BUNDLE
	; CHECK-NOT: <OPERAND_BUNDLE
	; CHECK: </FUNCTION_BLOCK


llvm/test/Verifier/convergencectrl-invalid.ll

This file was added.

				; RUN: not llvm-as < %s -o /dev/null 2>&1 | FileCheck %s

				; CHECK: Expected convergent attribute on a controlled convergent call.
				; CHECK-NEXT: call void @g(){{.*}}%t05_tok1
				define void @missing.attribute() {
				%t05_tok1 = call token @llvm.experimental.convergence.anchor()
				call void @g() [ "convergencectrl"(token %t05_tok1) ]
				ret void
				}

				; CHECK: Cannot mix controlled and uncontrolled convergence in the same function
				; CHECK-NEXT: call void @f()
				define void @mixed1() {
				call void @g() ; not convergent
				%t10_tok1 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %t10_tok1) ]
				call void @g()
				call void @f() ; uncontrolled convergent
				ret void
				}

				; CHECK: Cannot mix controlled and uncontrolled convergence in the same function
				; CHECK: %t20_tok1 = call token @llvm.experimental.convergence.anchor()
				; CHECK: Cannot mix controlled and uncontrolled convergence in the same function
				; CHECK: call void @f() [ "convergencectrl"(token %t20_tok1) ]
				define void @mixed2() {
				call void @g() ; not convergent
				call void @f() ; uncontrolled convergent
				call void @g()
				%t20_tok1 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %t20_tok1) ]
				ret void
				}

				; CHECK: Convergence region is not well-nested.
				; CHECK: %t30_tok2
				define void @region_nesting1() {
				%t30_tok1 = call token @llvm.experimental.convergence.anchor()
				%t30_tok2 = call token @llvm.experimental.convergence.anchor()
				call void @f() [ "convergencectrl"(token %t30_tok1) ]
				call void @f() [ "convergencectrl"(token %t30_tok2) ]
				ret void
				}

				; CHECK: Convergence region is not well-nested.
				; CHECK: %t40_tok2
				define void @region_nesting2(i1 %cond) {
				A:
				%t40_tok1 = call token @llvm.experimental.convergence.anchor()
				%t40_tok2 = call token @llvm.experimental.convergence.anchor()
				br i1 %cond, label %B, label %C

				B:
				call void @f() [ "convergencectrl"(token %t40_tok1) ]
				br label %C

				C:
				call void @f() [ "convergencectrl"(token %t40_tok2) ]
				ret void
				}

				; CHECK: Convergence token used by an instruction other than llvm.experimental.convergence.loop in a cycle that does not contain the token's definition.
				; CHECK: token %t50_tok1
				define void @use_in_cycle() {
				A:
				%t50_tok1 = call token @llvm.experimental.convergence.anchor()
				br label %B

				B:
				call void @f() [ "convergencectrl"(token %t50_tok1) ]
				br label %B
				}

				; CHECK: Entry intrinsic must occur at the start of the basic block.
				; CHECK: %t60_tok1
				define void @entry_at_start(i32 %x, i32 %y) convergent {
				%z = add i32 %x, %y
				%t60_tok1 = call token @llvm.experimental.convergence.entry()
				ret void
				}

				; CHECK: Entry intrinsic can occur only in a convergent function.
				; CHECK: %t60_tok2
				define void @entry_in_convergent(i32 %x, i32 %y) {
				%t60_tok2 = call token @llvm.experimental.convergence.entry()
				ret void
				}

				; CHECK: Loop intrinsic must occur at the start of the basic block.
				; CHECK: %t60_tok3
				define void @loop_at_start(i32 %x, i32 %y) convergent {
				A:
				%t60_tok3 = call token @llvm.experimental.convergence.entry()
				br label %B
				B:
				%z = add i32 %x, %y
				%h1 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %t60_tok3) ]
				ret void
				}

				; CHECK: Entry intrinsic must occur in the entry block.
				; CHECK: %t60_tok4
				define void @entry_at_entry(i32 %x, i32 %y) convergent {
				A:
				%z = add i32 %x, %y
				br label %B
				B:
				%t60_tok4 = call token @llvm.experimental.convergence.entry()
				ret void
				}

				; CHECK: Two static convergence token uses in a cycle that does not contain either token's definition.
				; CHECK: token %t70_tok1
				; CHECK: token %t70_tok2
				define void @multiple_hearts() {
				A:
				%t70_tok1 = call token @llvm.experimental.convergence.anchor()
				%t70_tok2 = call token @llvm.experimental.convergence.anchor()
				br label %B

				B:
				%h2 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %t70_tok2) ]
				%h1 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %t70_tok1) ]
				br label %B
				}

				; CHECK: Two static convergence token uses in a cycle that does not contain either token's definition.
				; CHECK: token %h0
				; CHECK: token %h0
				define void @multiple_hearts_nested(i1 %cond1, i1 %cond2) {
				A:
				%t70_tok3 = call token @llvm.experimental.convergence.anchor()
				br label %B

				B:
				%h0 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %t70_tok3) ]
				br i1 %cond1, label %C, label %B

				C:
				%h1 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %h0) ]
				%h2 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %h0) ]
				br i1 %cond2, label %C, label %B
				}

				; CHECK: Cycle heart must dominate all blocks in the cycle.
				; CHECK: %h3 = call token
				; CHECK: label %C
				define void @invalid_heart_nested(i1 %cond1, i1 %cond2) {
				A:
				%t70_tok4 = call token @llvm.experimental.convergence.anchor()
				br label %B

				B:
				br i1 %cond1, label %C, label %B

				C:
				%h3 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %t70_tok4) ]
				br i1 %cond2, label %C, label %B
				}

				; CHECK: Cycle heart must dominate all blocks in the cycle.
				; CHECK: %h4 = call token
				; CHECK: label %C
				define void @irreducible1(i1 %cond) {
				A:
				%a = call token @llvm.experimental.convergence.anchor()
				br i1 %cond, label %B, label %C

				B:
				%b = call token @llvm.experimental.convergence.anchor()
				br i1 %cond, label %C, label %D

				arsenm: Avoid undef
				C:
				%h4 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %a) ]
				br i1 %cond, label %B, label %E

				D:
				call void @f() [ "convergencectrl"(token %b) ]
				br i1 %cond, label %B, label %F

				E:
				call void @f() [ "convergencectrl"(token %h4) ]
				br i1 %cond, label %C, label %F

				F:
				call void @f() [ "convergencectrl"(token %a) ]
				ret void
				}

				; Mirror image of @irreducible1
				; CHECK: Cycle heart must dominate all blocks in the cycle.
				; CHECK: %h5 = call token
				; CHECK: label %B
				define void @irreducible2(i1 %cond) {
				A:
				%a = call token @llvm.experimental.convergence.anchor()
				br i1 %cond, label %B, label %C

				B:
				%h5 = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token %a) ]
				br i1 %cond, label %C, label %D

				C:
				%c = call token @llvm.experimental.convergence.anchor()
				br i1 %cond, label %B, label %E

				D:
				call void @f() [ "convergencectrl"(token %h5) ]
				br i1 %cond, label %B, label %F

				E:
				call void @f() [ "convergencectrl"(token %c) ]
				br i1 %cond, label %C, label %F

				F:
				call void @f() [ "convergencectrl"(token %a) ]
				ret void
				}

				declare void @f() convergent
				arsenm: Need some tests with invoke that demonstrate the exception issues.
				sameerds (Author): Well, it turns out that EH landing pads occur in blocks where we can't have a call to the entry() or loop() intrinsics. There are some rules about the predecessor blocks which prevent landing pads in the entry block or in a loop header, so there is nothing to test here. Note to self: if there is no conflict, might as well remove the comments from Verifier.cpp. FIXME: A loop intrinsic is required to be the first non-PHI only if it is a true heart (in a loop header). The verifier should not complain if it occurs in any other block.
				sameerds (Author): Simplified the comments in the verifier, but did not relax the check for non-heart loop calls. For now, the verifier allows a call to `loop` only at the start of a block. That's not necessary if the call is not a loop heart, but relaxing this check is not very important yet. We can revisit if we have a real use-case where a non-heart `loop` call needs to be in the middle of a block.
				declare void @g()

				declare token @llvm.experimental.convergence.entry()
				declare token @llvm.experimental.convergence.anchor()
				declare token @llvm.experimental.convergence.loop()
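The `@region_nesting*` functions in these tests exercise the well-nesting rule for convergence regions. Roughly paraphrasing the convergence-control documentation: the region of a token T is the set of program points dominated by T's definition from which a use of T can be reached, and if that region contains a use of another token T2, it must also contain T2's definition. The C++ sketch below applies that rule only to straight-line code in a single block, where a region reduces to the instructions after the definition up to the last use. The `ToyInst` encoding (tokens as integer ids) and `isWellNested` are hypothetical; a CFG-wide check would also need dominance and reachability.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy instruction in a single basic block.
struct ToyInst {
  int defines; // id of the token defined here, or -1
  int uses;    // id of the token named in its "convergencectrl" bundle, or -1
};

// Straight-line approximation: the region of token T is (def(T), lastUse(T)].
// If it contains a use of another token T2, it must also contain def(T2).
bool isWellNested(const std::vector<ToyInst> &block) {
  std::map<int, std::size_t> defAt, lastUseAt;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].defines >= 0)
      defAt[block[i].defines] = i;
    if (block[i].uses >= 0)
      lastUseAt[block[i].uses] = i;
  }
  for (const auto &entry : lastUseAt) {
    const int tok = entry.first;
    const std::size_t begin = defAt.at(tok); // assumes each used token is defined in this block
    const std::size_t end = entry.second;
    for (std::size_t i = begin + 1; i <= end; ++i) {
      const int other = block[i].uses;
      if (other < 0 || other == tok)
        continue;
      const std::size_t otherDef = defAt.at(other);
      if (otherDef <= begin || otherDef > end)
        return false;
    }
  }
  return true;
}
```

Starting each region just after the definition matters: an instruction that defines a token from its parent, as `loop` does, uses the parent token at that point, and including it would wrongly reject ordinary nested loops.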

Diff 539407
