This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Redefine `convergent` in terms of dynamic instances
AbandonedPublic

Authored by nhaehnle on Oct 15 2019, 8:48 AM.

Details

Reviewers
jdoerfert
Summary

GPU-oriented programming languages have some operations with constraints
that cannot currently be expressed properly in LLVM IR. For example:

uvec4 result;
if (cc) {
  result = ballot(true);
} else {
  result = ballot(true);
}

Even though both sides of the branch are identical, it is incorrect to
replace the if-statement with a single ballot call. This is because
ballot communicates with other threads, and the set of those threads
depends on where ballot is with respect to control flow.
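To see why, consider a small Python model of ballot() (the thread model and helper here are assumptions for illustration only; conceptually, ballot returns a mask of the threads that reached the call together):

```python
# Toy model: a "subgroup" of 4 threads; ballot() returns a bitmask of
# the threads that arrive at that exact point in control flow together.
def ballot(active_threads):
    mask = 0
    for t in active_threads:
        mask |= 1 << t
    return mask

threads = [0, 1, 2, 3]
cc = {0: True, 1: False, 2: True, 3: False}  # a divergent condition

# Original program: each side of the branch contains its own ballot
# call, so each thread only sees the threads on the same side.
then_side = ballot([t for t in threads if cc[t]])       # threads 0 and 2
else_side = ballot([t for t in threads if not cc[t]])   # threads 1 and 3

# Incorrectly merged program: a single hoisted ballot call sees all
# threads, so the transform changes the observable result.
merged = ballot(threads)

assert then_side == 0b0101
assert else_side == 0b1010
assert merged == 0b1111
```

The assertions show that hoisting the call out of the branch changes the value every thread observes, even though both call sites are textually identical.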

In the past, we have tried to fix this up somewhat by putting the
convergent attribute on functions. However, this approach has some
weaknesses. First, the restrictions imposed by convergent are not
actually strong enough for some cases such as the example above. Second,
the definition of convergent relies on the notion of
control-dependencies, which have action at a distance that makes it
difficult to satisfy. For example, the jump threading pass currently
does not honor the convergent attribute correctly in cases
such as:

bool flag = false;
if (cc1) {
  ...
  if (cc2)
    flag = true;
}
if (flag) {
  result = ballot(true);
}

Since the convergent ballot operation is at a distance from the part
of the code inspected by the jump threading pass, the pass will decide
to transform the code in an incorrect way.

This patch proposes to fix these and related problems by putting the
convergent attribute and the underlying notions of divergence and
reconvergence on a solid formal basis. At the same time, the impact
on generic transforms is small by design: a new set of intrinsics is
introduced that can be used to control reconvergence without being
prone to action at a distance. Frontends for GPU-oriented programming
languages are expected to insert these intrinsics, so that passes such
as jump threading will be "correct by default".

In the jump threading example above, a frontend would be expected to
insert intrinsics as follows:

bool flag = false;
token tok = @llvm.convergence.anchor();
if (cc1) {
  ...
  if (cc2)
    flag = true;
}
@llvm.convergence.join(tok);
if (flag) {
  result = ballot(true);
}

The convergence intrinsics indicate that threads are expected to
reconverge before the second if-statement, which affects the behavior
of the ballot call. The join intrinsic call guards against incorrect
jump threading.

The intention of this RFC is to gauge the interest of the LLVM community
and whether this direction can be accepted going forward. Frontend and
backend parts are required for a complete solution, though the frontend
parts are language-specific and therefore not part of LLVM itself.

Additional Notes:

  • Function inlining really needs to add convergence intrinsics when the caller is convergent and the callee contains control flow

Event Timeline

nhaehnle created this revision.Oct 15 2019, 8:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 15 2019, 8:48 AM
nhaehnle added subscribers: arsenm, alex-t, tpr and 7 others.

Thanks for posting this. Please send a plain email explaining the proposal to llvm-dev - RFCs regarding new intrinsics and other IR extensions should go there to reach a wide audience.

Thanks for posting this. Please send a plain email explaining the proposal to llvm-dev - RFCs regarding new intrinsics and other IR extensions should go there to reach a wide audience.

RFC is here: http://lists.llvm.org/pipermail/llvm-dev/2019-October/135929.html

chapeiro removed a subscriber: chapeiro.
chapeiro added a subscriber: chapeiro.

Are you planning on adding codegen support in a separate patch?

None of these tests use llvm.convergence.point.

Can you also add tests for unroll with a convergent operation in the presence of the new intrinsics?

llvm/docs/DynamicInstances.rst
230

I don't fully understand what the 'point' of this one is. What is the point of ensuring non-reconvergence? Which real example does this correspond to?

Nicola added a subscriber: Nicola.Oct 15 2019, 10:53 AM
piotr added a subscriber: piotr.Oct 16 2019, 6:50 AM
nhaehnle marked an inline comment as done.Oct 16 2019, 8:01 AM

Are you planning on adding codegen support in a separate patch?

Yes, though it'll be a while before it's ready.

llvm/docs/DynamicInstances.rst
230

It allows us to mark regions of code that are semantically part of a loop for the purposes of convergence even when they're not part of the loop as far as loop analysis is concerned. This applies to high-level code such as the following:

int divergent_key = ...;
int v = ...;
int sum;
for (;;) {
  tok = @llvm.convergence.anchor()
  int uniform_key = readfirstlane(divergent_key);
  if (uniform_key == divergent_key) {
    sum = subgroup_reduce_add(v);
    @llvm.convergence.point(tok)
    break;
  }
}

The point indicates that no reconvergence should happen for the execution of the reduce operation (or in a sense, that the reduce operation should not be moved out of the loop).
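To make the partitioned execution concrete, here is a small Python model of the loop above (the 4-thread subgroup and the readfirstlane/subgroup_reduce_add helpers are illustrative assumptions, not real implementations):

```python
# Toy model of the key-partitioned ("waterfall") loop: in each iteration,
# the first active thread's key is broadcast, the matching threads reduce
# together and exit, and the rest loop again.
def readfirstlane(active, key):
    return key[active[0]]

def subgroup_reduce_add(active, v):
    return sum(v[t] for t in active)

key = {0: 7, 1: 3, 2: 7, 3: 3}          # divergent keys
v   = {0: 1, 1: 10, 2: 100, 3: 1000}    # per-thread values
sums = {}

active = [0, 1, 2, 3]
while active:
    uniform_key = readfirstlane(active, key)
    group = [t for t in active if key[t] == uniform_key]
    # Only the threads whose key matches execute the reduce together;
    # the convergence point pins the reduce inside the loop, so each
    # key-partition reduces separately.
    total = subgroup_reduce_add(group, v)
    for t in group:
        sums[t] = total
    active = [t for t in active if t not in group]

assert sums == {0: 101, 2: 101, 1: 1010, 3: 1010}
```

Threads 0 and 2 (key 7) reduce to 101 in the first iteration; threads 1 and 3 (key 3) reduce to 1010 in the second. A reduce hoisted out of the loop would instead combine all four values, which is exactly what the point forbids.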

Points are in some sense redundant, as the code above could be equivalently rewritten as:

int divergent_key = ...;
int v = ...;
int sum;
for (;;) {
  tok = @llvm.convergence.anchor()
  int uniform_key = readfirstlane(divergent_key);
  if (uniform_key == divergent_key) {
    sum = subgroup_reduce_add(v);
  }
  @llvm.convergence.join(tok)
  if (uniform_key == divergent_key)
    break;
}

That reformulation loses some information about control flow, however; in particular, it leads to more conservative live ranges.

foad added a subscriber: foad.Oct 21 2019, 2:18 AM

What is the expectation for lowering a loop like the one you mentioned above?

int divergent_key = ...;
int v = ...;
int sum;

for (;;) {
  tok = @llvm.convergence.anchor()
  int uniform_key = readfirstlane(divergent_key);
  if (uniform_key == divergent_key) {
    sum = subgroup_reduce_add(v);
    @llvm.convergence.point(tok)
    break;
  }
}

In particular, the usual expectation when lowering LLVM IR is that the block containing the "break" is an exit block and, as such, is typically moved outside of the loop (to where the threads have reconverged). Is the expectation that LLVM.CONVERGENCE.POINT is lowered in the backends to a pseudo instruction similar to DBG_VALUE (and then ignored by regalloc and the like) until the final control-flow block ordering is decided, so that the "unconverged" part of the exit block ends up being executed inside the loop instead of outside?

llvm/docs/DynamicInstances.rst
62

While I like the idea of formalizing the whole "Dynamic Instances" concept, I wonder whether it could also be explained in a more intuitive way, so that readers approach the formal definitions already with an intuition of what they mean.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1474

Is this the only change in SimplifyCFG we would have to do?

What about FoldBranchToCommonDest()? It seems to be hoisting instructions into the predecessors, but it only checks for speculatability (through isSafeToSpeculativelyExecute()) and seems to ignore the convergent attribute.

In general, how thoroughly has the new behavior been vetted across LLVM passes with respect to hoisting or changes in the thread execution mask?

nhaehnle marked 2 inline comments as done.Oct 29 2019, 3:46 AM

What is the expectation for lowering a loop like the one you mentioned above?

int divergent_key = ...;
int v = ...;
int sum;

for (;;) {
  tok = @llvm.convergence.anchor()
  int uniform_key = readfirstlane(divergent_key);
  if (uniform_key == divergent_key) {
    sum = subgroup_reduce_add(v);
    @llvm.convergence.point(tok)
    break;
  }
}

In particular, the usual expectation when lowering LLVM IR is that the block containing the "break" is an exit block and, as such, is typically moved outside of the loop (to where the threads have reconverged). Is the expectation that LLVM.CONVERGENCE.POINT is lowered in the backends to a pseudo instruction similar to DBG_VALUE (and then ignored by regalloc and the like) until the final control-flow block ordering is decided, so that the "unconverged" part of the exit block ends up being executed inside the loop instead of outside?

Something along those lines, yes. The details would obviously depend on how the backend handles divergence and reconvergence. For example, in AMDGPU we fairly early on convert into a form where divergent values are computed using vector instructions, i.e. mostly explicitly SIMD instructions that use an EXEC physical register. So in our case, the intrinsic would affect this conversion of the code into SIMD form.
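The SIMD form mentioned here can be sketched with a toy exec-mask model (the 4-lane wave and the specific mask values are assumptions for illustration; real AMDGPU lowering operates on the EXEC register with wave-wide masks and compiler-generated save/restore code):

```python
# Toy model of exec-mask lowering for "if (cond) { A } else { B }" on a
# 4-lane wave. Bit i of a mask means lane i is active; vector
# instructions only take effect in lanes whose bit is set.
ALL_LANES = 0b1111
cond      = 0b0101   # lanes 0 and 2 take the "then" side (illustrative)

saved = ALL_LANES                       # save the mask on entering the branch

exec_mask = saved & cond                # "then" side: lanes 0 and 2 active
then_lanes = exec_mask                  # ... vector instructions for A run here

exec_mask = saved & ~cond & ALL_LANES   # "else" side: lanes 1 and 3 active
else_lanes = exec_mask                  # ... vector instructions for B run here

exec_mask = saved                       # join point: restoring the mask is
                                        # what reconvergence means in this form

assert then_lanes == 0b0101
assert else_lanes == 0b1010
assert exec_mask == ALL_LANES
```

In this form, the convergence intrinsics constrain where the compiler may place the mask save/restore, i.e. where reconvergence happens.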

llvm/docs/DynamicInstances.rst
62

I don't think we can get away without a formal description, but I'm not against having some more illustrative examples. Do you have anything in mind?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1474

I've fixed the places that I was aware of based on bugs we've encountered over the years. I wouldn't be surprised if there were more places like you say...

simoll added inline comments.Oct 29 2019, 9:11 AM
llvm/docs/DynamicInstances.rst
230

Couldn't you align the conceptual convergence point with the loop structure by adding a @llvm.always.true intrinsic like so:

int divergent_key = ...;
int v = ...;
int sum;
for (;;) {
  int uniform_key = readfirstlane(divergent_key);
  if (uniform_key == divergent_key) {
    sum = subgroup_reduce_add(v);
    %we.know.its.true = call i1 @llvm.always.true()
    br i1 %we.know.its.true, label %loopExitBlock, label %nextBlockInTheLoop
  }
}

That way you do not need the intrinsic, and the DA/SDA already assume that threads only synchronize at (LoopInfo) loop exits.

Thanks for the answers! It was about time somebody tackled this problem in LLVM :)

A couple extra questions I have:

  1. Changing the convergent attribute from being sink prevention only to both sink and hoist prevention (which I believe should impact CSE as well, even if I didn't see any change in that regard) would have some performance impact. Considering there's another proposal from Matt to make convergent the default, how do we see this impacting performance? Should we have two (or more?) attributes instead? For example, quad-scoped texture operations don't really care about being hoisted or CSE'd, but they do care about being sunk.
  2. llvm.convergence.join is marked IntrNoMem. It's not clear to me how it is going to survive instcombine, considering that it does not produce any value, so nothing is in place to keep it alive. What's the mechanism that keeps it around?
  3. Is there a plan for how we are going to identify all the optimizations that need to be updated, and in particular, how are we going to make sure new optimizations (probably written with single-threaded execution in mind) do not break our invariants? (Basically, how do we make everybody aware this exists and get them to think about it?) In particular, there could be classes of optimizations that operate purely on the CFG and would now have to look "inside the blocks" before transforming, to make sure there is no convergent instruction or call in the blocks they are considering.
llvm/docs/DynamicInstances.rst
62

Yep, my suggestion was about adding a preamble that explains in more "earthly" terms what the concept is before jumping into the formal definition; certainly not replacing the formal definition. I really like the fact that you added it in there!
Maybe commonly known concepts like warps, threads in a warp, and divergence could be used to that effect. If I understood correctly, a "dynamic instance" is the execution of an instruction by a certain group of threads, and the same instruction can be executed by different groups of threads at different points in time. (Please correct me if I misunderstood.) If it's something like that, it could be placed, more intuitively, before the formal definition.

nhaehnle marked 2 inline comments as done.Oct 30 2019, 6:53 AM

Thank you for your detailed comments!

  1. Changing the convergent attribute from being sink prevention only to both sink and hoist prevention (which I believe should impact CSE as well, even if I didn't see any change in that regard) would have some performance impact. Considering there's another proposal from Matt to make convergent the default, how do we see this impacting performance? Should we have two (or more?) attributes instead? For example, quad-scoped texture operations don't really care about being hoisted or CSE'd, but they do care about being sunk.

That's a good point. Something like allowconvergence and allowdivergence?

  2. llvm.convergence.join is marked IntrNoMem. It's not clear to me how it is going to survive instcombine, considering that it does not produce any value, so nothing is in place to keep it alive. What's the mechanism that keeps it around?

You're right. The immediate answer to your question is that I missed marking the intrinsics as nounwind. Once I do that, they are indeed removed like you say. Any ideas on that?

  3. Is there a plan for how we are going to identify all the optimizations that need to be updated, and in particular, how are we going to make sure new optimizations (probably written with single-threaded execution in mind) do not break our invariants? (Basically, how do we make everybody aware this exists and get them to think about it?) In particular, there could be classes of optimizations that operate purely on the CFG and would now have to look "inside the blocks" before transforming, to make sure there is no convergent instruction or call in the blocks they are considering.

I don't have a good answer for that, unfortunately. In general, education about convergent is the only answer that comes to mind for the first half. For the second half, I did go to some lengths to think about whether transforms need to worry about "spooky action at a distance", i.e. whether they need to scan code for convergent operations that they would not otherwise have to scan, and I believe this is largely not a problem (though I cannot prove it). This is summarized in the "Correctness" section of the new document.

llvm/docs/DynamicInstances.rst
230

I would like to avoid having to insert fake edges in the CFG: it might trip up generic transforms.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1474

Giving this some more thought, it does raise an interesting question: is there a legitimate use for a function that is both speculatable and convergent?

It seems initially strange, given that speculation changes the set of threads calling the function, which goes counter to the whole point of convergent. But then, as you mentioned in your later comment and I've mentioned in D69498, we'll want to relax convergent. Something like a subgroup shuffle could perhaps be speculatable convergent allowconvergence, indicating that it can be hoisted but not sunk.

I'm going to add some comments to that effect.

nhaehnle updated this revision to Diff 227086.Oct 30 2019, 6:54 AM
  • add some more expository text and examples
  • handle the case of speculatable+convergent
kpet added a subscriber: kpet.May 21 2020, 9:13 AM

What is the status of this change?

I'm introducing extended subgroup functions in OpenCL: https://reviews.llvm.org/D79781#inline-735663 and currently marking them with the convergent attribute doesn't prevent invalid optimizations.

ychen added a subscriber: ychen.May 28 2020, 9:52 AM

What is the status of this change?

I still want to pursue a proper modeling of convergent, one that would express that the invalid optimizations you're likely to see are forbidden. However, the model described here has some issues, and I hope to introduce a slight variation that addresses those issues.

As a matter of pragmatism, I wonder whether we shouldn't just already disable those problematic optimizations today...

However, the model described here has some issues, and I hope to introduce a slight variation that addresses those issues.

Can you expand on the issues that are present here?

What is the status of this change?

I still want to pursue a proper modeling of convergent, one that would express that the invalid optimizations you're likely to see are forbidden. However, the model described here has some issues, and I hope to introduce a slight variation that addresses those issues.

As a matter of pragmatism, I wonder whether we shouldn't just already disable those problematic optimizations today...

Do you mean disable them for the convergent functions?

Can you expand on the issues that are present here?

The intrinsics are fine; the formal modeling using dynamic instances isn't, because it prevents desirable optimizations. For example:

%tok1 = @llvm.convergence.anchor()
if (divergent_condition) {
   ...
}
v1 = ballot(..., %tok1)

%tok2 = @llvm.convergence.anchor()
v2 = ballot(..., %tok2)

The ballot for v1 enforces reconvergence of the dynamic instances after the divergent branch, which means the ballot for v2 also executes fully converged (relative to whatever dynamic instances we had when entering this snippet).

If v1 is unused, we would like to DCE the ballot. However, after doing this there is no longer anything that formally guarantees reconvergence before the second anchor, which means the ballot for v2 is now allowed to execute in a way that's partitioned according to the divergent condition.
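The hazard can be sketched with a small Python model (the ballot() helper over explicit thread sets and the 4-thread subgroup are illustrative assumptions, not part of the proposal):

```python
# Toy model of the DCE hazard: ballot() returns a bitmask of the
# threads that execute the call together.
def ballot(active):
    mask = 0
    for t in active:
        mask |= 1 << t
    return mask

threads = [0, 1, 2, 3]
divergent = {0: True, 1: False, 2: True, 3: False}  # divergent_condition

# With the v1 ballot present, its use of %tok1 forces reconvergence
# after the divergent branch, so both ballots see all four threads.
v1 = ballot(threads)
v2_converged = ballot(threads)

# If v1 is dead-code-eliminated, nothing formally ties the second anchor
# to the reconverged thread set anymore, so an implementation may legally
# execute the v2 ballot with threads still partitioned by the condition.
v2_partitioned = ballot([t for t in threads if divergent[t]])

assert v2_converged == 0b1111
assert v2_partitioned == 0b0101  # a different, yet formally allowed, answer
```

The DCE is desirable, but under the dynamic-instances model it changes the set of legal results for v2, which is exactly the issue described above.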

As a matter of pragmatism, I wonder whether we shouldn't just already disable those problematic optimizations today...

Do you mean disable them for the convergent functions?

Yes.

Now that I'm actually editing this text, I realize that this is an older version than I thought. I've given up on "join" and "point" as being too noisy in the IR, in favor of something that more explicitly tracks loop structure, which is the real point of concern, at least for what we care about.

nhaehnle abandoned this revision.Aug 9 2020, 7:47 AM

I am abandoning this change in favor of D85603, which is the next and hopefully final iteration on this.

Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 1:10 AM