This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
docs/
50
LangRef.rst
-
include/llvm/
-
llvm/
-
Analysis/
-
CodeMetrics.h
-
IR/
-
Instructions.h
13
Intrinsics.td
-
IntrinsicsAMDGPU.td
-
lib/
-
Analysis/
-
CodeMetrics.cpp
-
AsmParser/
-
LLParser.cpp
-
IR/
-
Verifier.cpp
-
Transforms/
-
Scalar/
-
GVNHoist.cpp
-
LoopUnrollPass.cpp
-
Utils/
2
SimplifyCFG.cpp
-
test/
-
Bitcode/
-
attributes.ll
-
Transforms/
-
GVNHoist/
-
hoist-convergent.ll
-
InstCombine/
-
select-call.ll
-
LoopUnroll/
-
convergent.ll
-
SimplifyCFG/
-
convergent.ll
-
utils/TableGen/
-
TableGen/
-
CodeGenIntrinsics.h
-
CodeGenTarget.cpp
-
IntrinsicEmitter.cpp

Differential D26348

Allow convergent attribute for function arguments
Needs RevisionPublic

Authored by nhaehnle on Nov 7 2016, 2:29 AM.

Download Raw Diff

Details

Reviewers

chandlerc
• tstellarAMD
jlebar
mehdi_amini
simoll
fpetrogalli
sanjoy
arsenm

Summary

While convergent functions are functions where the fact that they are called
must be uniform across multiple threads in an SIMT/SPMD-type execution
model, convergent function arguments are arguments whose value must be
uniform across multiple threads.

The problem that this is intended to address is that (for AMDGPU, but also
for general GLSL semantics):

%v1 = texelFetch(%sampler, %coord0)
%v2 = texelFetch(%sampler, %coord1)
%v = select i1 %cond, vType %v1, %v2

is logically equivalent to and could benefit from being transformed to:

%coord = select i1 %cond, cType %coord0, %coord1
%v = texelFetch(%sampler, %coord)

On the other hand,

%v1 = texelFetch(%sampler0, %coord)
%v2 = texelFetch(%sampler1, %coord)
%v = select i1 %cond, vType %v1, %v2

must not be transformed to

%s = select i1 %cond, sType %sampler0, %sampler1
%v = texelFetch(%s, %coord)

because of uniformity restrictions on the first argument of texelFetch.

While InstCombine does not actually perform these transforms today,
SimplifyCFG does tail sinking that amounts to the same thing, and there are
shaders in the wild that are mis-compiled because of it.

In other words, this patch is really a bug fix, but it tries to fix the bug
without unnecessary performance regression and keep the door open for future
optimization improvements.

This is very much an RFC with feedback very much appreciated, but I'd
personally be happy to push the patch as-is (minus the part of the
select-call.ll test which merely illustrates a potential future improvement
and minus the incomplete formalization).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988

Diff Detail

Build Status

Buildable 1543
Build 1543: arc lint + arc unit

Event Timeline

nhaehnle updated this revision to Diff 77022.Nov 7 2016, 2:29 AM

nhaehnle retitled this revision from to Allow convergent attribute for function arguments.

nhaehnle updated this object.

nhaehnle added reviewers: arsenm, • tstellarAMD, mehdi_amini, jlebar.

nhaehnle added a subscriber: llvm-commits.

Herald edited edge metadata. · View Herald TranscriptNov 7 2016, 2:29 AM

Herald added a subscriber: wdng. · View Herald Transcript

I looked at just the langref and intrinsics.td.

docs/LangRef.rst
1154	Nit, I think you want "whereas" or "while". Sort of like "although" but a less strong sense of contrast.
1168	This is going to be the paragraph to wordsmith. I think the approach of saying "A correct transformation has the following property: For any two runs of the program r1 and r2, if r1 and r2 were compatible before the transformation, they must be compatible after the transformation" makes some sense as a way to import SPMD semantics into the langref. But I am not sure about the definition of 'compatible'. For one thing, this doesn't work if the transformation adds or removes a callsite with convergent parameters. But you're sometimes allowed to do this. For example, you could unroll a loop which contains a convergent-parameter call. This is really tricky...
include/llvm/IR/Intrinsics.td
54	This definition actually seems fairly clear to me, if we s/control-//. Is there a reason we don't want to use this as our canonical definition?

mehdi_amini added reviewers: sanjoy, chandlerc.Nov 7 2016, 3:02 PM

This seems to import the "SIMT/SPMD/..." model into LangRef, and the constraint here seems a bit scary since I suspect it may impact significantly the optimizer.
(adding @chandlerc as reviewer as well)

This seems to import the "SIMT/SPMD/..." model into LangRef, and the constraint here seems a bit scary since I suspect it may impact significantly the optimizer.

Right, see previous discussion on llvm-dev. We're trying to craft something that doesn't import SIMT/SPMD into the langref while still allowing us to model the constraints the hardware puts on these functions.

The existing convergent function attr also has nontrivial impacts on the optimizer.

Thank you for taking a look. I've replied to some points inline.

As for the overall impact on passes, (a) the impact is local in the sense that you only have to care when you touch a call-site which calls a function with this attribute, and (b) I suspect the places where you do need to consider the attribute in e.g. loop transformations are quite similar to the existing use of convergent as a function argument.

docs/LangRef.rst
1154	Makes sense.
1168	So, the intention is for the language to allow the addition of callsites in loop unrolling. Compatible means that at every call site, the sequences of actual convergent function arguments are equal or one is a subset of the other. If you add a callsite, you have to make sure that this property holds. In the case of loop unrolling, I think the constraint ends up being similar to what happens for convergent functions, i.e. as long as you know that the loop has a multiple of N iterations, you can unroll the loop N times. The effect is that the sequence of actual function arguments at the call site in the loop from run r1 on the original program is split into N sequences of actual function arguments, one for each of the N call sites in the loop-unrolled program. If the sequences in r1 and r2 were equal originally, then they're trivially equal afterwards. If the sequence in r1 was a subsequence of the sequence in r2, then this must mean that the loop took fewer iterations in r1 than in r2. Then it follows that the sequences in r1 are still subsequences of the corresponding sequences in r2. (I think this approach is fine for the proof that unrolling exactly N times satisfies the condition. For the reverse proof that loop transformations satisfying the condition actually do the right thing with respect to the real SIMT model, I'm cheating slightly because there could be an outer loop so that there is not necessarily a clear implication from subsequence -> fewer iterations, but then e.g. use the freedom to choose r1 and r2 in such a way that the outer loop is run through only once.)
include/llvm/IR/Intrinsics.td
54	I'd be happy with it. When I posted something like this on the mailing list initially, Mehdi asked for clarification :)

mehdi_amini added inline comments.Nov 8 2016, 9:36 AM

include/llvm/IR/Intrinsics.td

My concern on the mailing was the original definition was based on "uniformity".

Anyway, what about this example:

if (cond) {
  Tmp1 = ...
  Foo(Tmp1 [convergent])
} else {
  Tmp2 = ...
  Foo(Tmp2 [convergent])
}

I think your definition does not prevent:

if (cond) {
  Tmp1 = ...
} else {
  Tmp2 = ...
}

Tmp3 = phi [Tmp1, Tmp2]

if (cond) {
  Foo(Tmp3 [convergent])
} else {
  Foo(Tmp3 [convergent])
}

However it breaks uniformity.

nhaehnle added inline comments.Nov 8 2016, 9:56 AM

include/llvm/IR/Intrinsics.td
54	That transform is allowed. It will cause some inefficiencies in the code generation, but the semantics are indeed okay. At each call site of Foo, the argument is "dynamically uniform" using the language of the GLSL specification, i.e. all threads that actually go through the call site will see the same value (of course assuming that this was true for the original program). So maybe it's really just a matter of finding the right wording.

mehdi_amini added inline comments.Nov 8 2016, 9:57 AM

include/llvm/IR/Intrinsics.td
54	OK let's just do one extra transformation: if (cond) { Tmp1 = ... } else { Tmp2 = ... } Tmp3 = phi [Tmp1, Tmp2] Foo(Tmp3 [convergent])

jlebar added inline comments.Nov 8 2016, 10:00 AM

docs/LangRef.rst
1168	Oh, I see what you mean now! Okay. My problem was I was trying to compare call sites before and after the transformation, but you only compare calls made within one piece of code (i.e. either before compared to before or after compared to after). Now that I've slept on it (and understood my mistake here), I think that reasoning about how programs run is going to be pretty complicated -- to the point that I am still not 100% sure this is correct. If we can agree on a data-flow formulation, I'd probably find that a lot easier to reason about myself. It sounds like Mehdi doesn't have as much of a problem with it as you might have thought?

nhaehnle added inline comments.Nov 9 2016, 5:14 AM

docs/LangRef.rst
1168	Right, the comparison is within one piece of code. See the last example that Mehdi brought up as an example for why a pure data-flow formulation may be challenging.
include/llvm/IR/Intrinsics.td
54	That transformation is incorrect, and this RFC patch already disables it (in SimplifyCFG/tail sinking). A simpler example would be just: if (cond) { Tmp1 = Foo(v [convergent]) } else { Tmp2 = Foo(v [convergent]) } Tmp3 = phi [Tmp1, Tmp2] to Tmp3 = Foo(v [convergent]) This transform is incorrect in general because cond could be something like `v == 0`, where v is an int that can only take values 0 or 1. Formally, the constraint formulation with runs r1 and r2 forbids this transform because you can choose r1 = { cond = true, v = A }, r2 = { cond = false, v = B }. These runs are compatible in the original program, but incompatible in the transformed program. (This is basically what happens in the linked bug report.) The transform would be correct if the branch condition were proved to be "uncorrelated" with the convergent argument. This can sometimes be done with target-specific knowledge. This target-specific knowledge effectively restricts which pairs of runs need to be considered in the runs-based constraint formulation.

mehdi_amini added inline comments.Nov 9 2016, 8:04 AM

include/llvm/IR/Intrinsics.td
54	I'm addressing the definition proposed here: " The values to this argument are convergent, and calls to the intrinsic may not be modified in a way that makes the argument value control-dependent on additional values." I don't see how any of this two definition forbids the transformation I mentioned. Or the one in LangRef: "When a function argument has the `convergent` attribute, a call to the function may only be transformed in a way that does not add additional sources of divergence to the argument." It is using the concept of "divergence" without defining it, so this one is more iffy, but still not clear.

hfinkel added a subscriber: hfinkel.Nov 9 2016, 8:11 AM

hfinkel added inline comments.

include/llvm/IR/Intrinsics.td
54	It is using the concept of "divergence" without defining it, so this one is more iffy, but still not clear. Yea, we normally talk about this in terms of adding control dependencies, but we'll need to carefully define what we mean here.

nhaehnle added inline comments.Nov 10 2016, 4:13 AM

include/llvm/IR/Intrinsics.td
54	Okay, so how about in this location in Intrinsics.td we remove the "control" from "control-dependent" and refer explicitly to the IR attribute? Then at least it's clear that the proper definition is located in one specific place. Maybe as short as: // Convergent - The values to this argument are convergent. This corresponds // to the IR function argument attribute of the same name. Then we can focus on the language in LangRef - any feedback on that?

mehdi_amini added inline comments.Nov 10 2016, 8:47 AM

include/llvm/IR/Intrinsics.td
54	The quote I posted just above about using "source of divergence" comes from LangRef.

hfinkel added inline comments.Nov 10 2016, 9:39 AM

include/llvm/IR/Intrinsics.td
54	No, because there are important differences. Both refer to control dependencies, but on a function it refers to control dependencies on the execution of the function, but on the value, it is talking about control dependencies on the value being passed into the function, right?

mehdi_amini added inline comments.Nov 14 2016, 10:07 PM

include/llvm/IR/Intrinsics.td
54	@hfinkel: were you answering to me? If so it looks like we're talking past each other.

hfinkel added inline comments.Nov 14 2016, 11:00 PM

include/llvm/IR/Intrinsics.td
54	@mehdi_amini: no, I was responding to @nhaehnle

Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Fixes fdo#97988 for me (together with the Mesa part attached to that bug report).

@mehdi_amini: I changed Intrinsics.td like I suggested, so that only LangRef
should matter.

In LangRef, I removed the 'source of divergence' part and instead made the
definition of compatibility with two runs as discussed in the sub-thread
with @jlebar more explicit. I don't know if that's more to your taste, at
least I couldn't find anything clearly wrong with it yet.

@hfinkel: I'm not sure what you were replying to exactly. The current
version does not talk about control-dependencies anymore at all, because
that language does not make it obvious what should happen with Mehdi's
example, i.e. you're not allowed to eliminate the branches in:

if (cond) {
  Tmp1 = Foo(v [convergent])
} else {
  Tmp2 = Foo(v [convergent])
}
Tmp3 = phi [Tmp1, Tmp2]

"Control-dependencies on a value" doesn't really capture that. Unless you
understand that the control-dependencies of a value depend on which basic
block you're reading the value in. But that's not how people would
generally understand it, right?

Herald edited edge metadata. · View Herald TranscriptNov 16 2016, 3:04 PM

mehdi_amini added inline comments.Nov 16 2016, 3:45 PM

docs/LangRef.rst
1147	`uniform` isn't defined in the LangRef.

mehdi_amini added inline comments.Nov 16 2016, 3:45 PM

docs/LangRef.rst
1155	(Ignore this, phab bug)
1161	It is not clear to me what is r1/S1 and r2/S2 that prevent to CSE: if (cond) { Tmp1 = Foo(v [convergent]) } else { Tmp2 = Foo(v [convergent]) } Tmp3 = phi [Tmp1, Tmp2] to: Tmp3 = Foo(v[convergent]) But your definition reference a "call site" as if there is a one-to-one mapping before/after a transformation.

nhaehnle added inline comments.Nov 17 2016, 1:57 AM

docs/LangRef.rst
1147	Right, this is also used in the paragraph below. This whole paragraph isn't intended to be normative, but to provide context that allows one to understand the actual condition. Perhaps this could be phrased better. What about: "In some parallel execution models, there exist operations that are executed simultaneously for multiple threads, and one or more arguments of the operation must have the same value across all simultaneously executing threads."
1161	The definition of "compatible" and everything inside it is completely independent of transformations. It talks about two runs of the same program. Your example is forbidden because in the original program, the runs `r1 = { cond = true, v = V0 }` and `r2 = { cond = false, v = V1 }` (with V0 != V1) are compatible (the first call-site of Foo has the sequences r1(CS) = ( V0 ) and r2(CS) = ( ), i.e. r2(CS) is a subsequence of r1(CS); similarly for the second call-site), but in the transformed program they're not (the single call-site of Foo has the sequences r1(CS) = ( V0 ) and r2(CS) = ( V1 ), and neither is a subsequence of the other). I do think that the language I used is pretty clear on that, actually, but since both you and @jlebar seem to have had the same difficulty, I'd like to understand that better. Perhaps it would help to swap those two paragraphs around as follows? Program transformations must ensure that every pair of runs that is compatible with respect to convergent function attributes in the original program must also be compatible in the transformed program. Compatibility is defined as follows: Two runs r1 and r2 of a program are said to be compatible (wrt convergent function attributes) if ... Maybe that helps the reader to parse the definition of compatibility because it first establishes a context where compatibility is about runs within a program rather than runs across multiple (transformed) versions of a program.

Add missing handling of the convergent attribute in GVNHoist and
LoopUnroll.

Also update LangRef with the suggested changes to phrasing.

Herald edited edge metadata. · View Herald TranscriptNov 17 2016, 4:54 AM

Herald added a subscriber: mzolotukhin. · View Herald Transcript

hfinkel added inline comments.Nov 17 2016, 6:51 AM

docs/LangRef.rst
1163	Please write out "with respect to." Also, you mean convergent function-parameter attributes, not function attributes. The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program. I think it would be better to say, "Two executions of the calling function, e1 and e2, are compatible..." Then we can say that transformations must preserve compatibility w.r.t. convergent function-parameter attributes for all potential executions of the calling function. We might even further specify that this is for potential simultaneous executions. Finally, I'd really like it if you would include the example we've been discussing in the reference as a non-normative example. I did not really understand the definition until I read @nhaehnle's explanation above.

nhaehnle added inline comments.Nov 18 2016, 11:18 AM

docs/LangRef.rst
1163	I'm only one person :) I'm making the editing changes and adding the example. s/run/execution/ is fine with me, and I'm adding that as an alternative below, but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function. Perhaps you can clarify: The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program. What's the difference between threads and executions in this context?

Editing changes and s/run/execution/ alternative.

Herald edited edge metadata. · View Herald TranscriptNov 18 2016, 11:19 AM

hfinkel added inline comments.Nov 18 2016, 2:52 PM

docs/LangRef.rst
1163	but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function. Feel free to establish some terminology early in the definition for later use. We're talking about a function-parameter attribute. That attribute must appear on the function parameters of some call site. That call site exists in some function (the caller a.k.a. the calling function) and calls some other function (the callee function). What's the difference between threads and executions in this context? As I think about it, a thread is some entity that executes things (e.g. functions). Threads execute functions, and each time a thread executes some function, that constitutes an execution of that function.

jlebar added inline comments.Nov 20 2016, 12:17 PM

docs/LangRef.rst
1153	While the `convergent` attribute on functions can be thought of as "whether and when each call site to this function is reached must be uniform across multiple threads", I know this isn't meant to be normative, but I'm a little concerned by this language. Specifically, our semantics say that the compiler is not allowed to introduce new sources of divergence to convergent calls, but it is also not allowed to assume that convergent calls are uniform. I presume the same is true for convergent args -- the compiler cannot introduce divergence into the arg values, but it also cannot assume anything about the arg values as a consequence of their being marked convergent. I know rewriting this to be precise might defeat the purpose of your summary here. But could we add some weasel words so that people can't point to this as justification that divergent calls / args marked with `convergent` are LLVM UB?
1159	Nit, suggest putting "compatible" in quotes so it's clear it's a term we're about to define.

hfinkel added inline comments.Nov 20 2016, 12:37 PM

docs/LangRef.rst
1153	This is a good point. We might want to say something like that there is some implementation-defined set of simultaneously-executing threads for which the restriction holds. Thus we won't be able to make an target-independent assumptions about the implied behavior.

mehdi_amini added inline comments.Nov 20 2016, 12:40 PM

docs/LangRef.rst
1153	I don't see why we should mention anything about thread or concurrency in the LangRef? Also, I'm not thrilled by this paragraph either (to begin with, the sentence about the convergent attribute on function isn't clear to me).

nhaehnle added inline comments.Nov 21 2016, 7:43 AM

docs/LangRef.rst
1153	Hmm. I wonder what the general opinion is on whether this paragraph is still helpful at all. The way I see it, it can serve two purposes: Explain the attribute in an intuitive way to someone with an SIMT/SPMD background. Explain why the same term is used here for a function-parameter attribute and elsewhere as a function attribute. But if it just adds confusion for others, maybe it's best to remove it rather than discuss it to death? After all, the first purpose is kind of fulfilled already by the paragraph before that... If the paragraph is kept around, then I agree with something like: ... whether and when each call site to this function is reached must be uniform across a target-dependent set of simultaneously-executing threads ...
1159	Sure, no problem.
1163	I see. I played around with this some more, and here's an example I'd like to hear your thoughts on: define void @foo(i32 %a) { ... call void @bar(i32 %a) ... } define void @bar(i32 %a) { call void @baz(i32 convergent %a) } This code has a potential problem because at least the parameter of bar should arguably have been labelled convergent. I'd say that inlining shouldn't have to worry about the convergent attribute at all. However, if the definition we're discussing here takes a function-centric view, then we cannot prove in a target-independent way that inlining bar is correct (because foo gets a new callsite with a convergent function argument, where previously there were no such callsites, i.e. every pair of runs of foo was compatible). If the definition takes a program-centric view (i.e., executions that start at foo implicitly also consider the sequence of arguments passed into baz inside bar), then inlining bar is clearly correct. That would be an argument in favour of staying with the language about "executions of a program" rather than "executions of a function".

nhaehnle added inline comments.Nov 21 2016, 8:15 AM

docs/LangRef.rst
1163	Hmm, looking at other transforms of similar cases, I think we have to say that cases like this are undefined behaviour. Curious, I think the same applies to the convergent attribute for functions: if a function F contains calls to a convergent function G, but is not itself marked convergent, then calls to F should be undefined behaviour, right? LangRef is currently somewhat silent on this particular point.

hfinkel added inline comments.Nov 21 2016, 8:20 AM

docs/LangRef.rst
1163	The execution of a function semantically includes the execution of any functions called, so I still feel this is the better terminology. We're talking about preventing transformations that invalidate executions of individual program executions, comparing different program executions is irrelevant, but each single program execution can contain multiple simultaneous executions of the functions in question. I think that the inlining would be correct regardless. Nothing that the inliner does will change the convergence properties of function arguments, as far as I can tell. However, you also raise a second important point. If the first argument of `@bar` is not marked as convergent, then other optimizations around calls to `@bar` can ruin the convergent property regardless. Thus, it too needs to be marked.

hfinkel added inline comments.Nov 21 2016, 8:23 AM

docs/LangRef.rst
1163	Regarding the convergent function attribute, yes. This is why Clang in CUDA mode marks all functions as convergent and then we remove the attribute when we prove it is not needed (in lib/Transforms/IPO/FunctionAttrs.cpp).

Clarify that the set of threads is target-dependent (I'm still open to eliminating that paragraph outright if people find it confusing)
Talk about executions of functions
No mention of concurrency or simultaneous execution (except for the motivational explanation in the first paragraph), per @mehdi_amini
Add an explicit paragraph about calling functions containing call-sites with convergent function parameters

On the last point, this is fairly restrictive right now. E.g. one could
ask what happens if the called function F passes a value that was loaded
from memory to a convergent parameter. At least for AMDGPU purposes it's
fine to leave that undefined, and probably will remain so for a long time.

When/If it does become an issue, I suspect that solving this will require a
new function attribute (convergentmem?). However, I'd rather leave this
open for now.

Herald edited edge metadata. · View Herald TranscriptNov 23 2016, 6:15 AM

hfinkel added inline comments.Dec 2 2016, 4:43 PM

docs/LangRef.rst
1147	I think that would be better. I still find the current text in this paragraph and the next confusing because understanding them hinges on understanding the term `uniform` which we never define.
1164	So, the clarify, we need to ensure compatibility between all call sites with a convergent argument, or all call sites with a convergent argument to the same function, or all call sites with a convergent argument subject to some other condition?

nhaehnle added inline comments.Dec 7 2016, 6:59 AM

docs/LangRef.rst
1147	Okay, that's what the current version says.
1164	I hope we're not talking past each other, but for the definition of compatibility, each call site is considered separately. Compatibility is a property of a pair of executions; it is satisfied if a certain property is satisfied at each call site to a function with convergent arguments. When one proves that a transformation preserves compatibility, one would look at each call-site in the transformed code, and prove the property based on the fact that it was satisfied at some call-site in the original code, but the definition of compatibility itself is never concerned with the interaction or relationship of multiple call sites to each other.

hfinkel added inline comments.Dec 7 2016, 7:15 AM

docs/LangRef.rst
1147	Yes, but it still says uniform in the paragraph below. Please either define that term or don't use it.
1164	but for the definition of compatibility, each call site is considered separately. That's what I thought, thanks! I just wanted to make sure.

Since we seem to be getting close to agreement on the langref wording, I went ahead and looked at the rest of the code. Just one minor comment, otherwise it looks good to me. I guess I have a broader question of, did you audit in order to make sure that these are the only places where we need to handle specifically convergent args? Do you need a reviewer to check that?

lib/Transforms/Utils/SimplifyCFG.cpp
1443	I don't understand the "because" clause here; can we rephrase?

This revision is now accepted and ready to land.Dec 7 2016, 2:05 PM

I don't believe we're there yet on the wording.

This revision now requires changes to proceed.Dec 7 2016, 2:19 PM

divergence is used in multiple places in comments in the code, it isn't great as it is based on SIMT divergence. And even worse it is *data* divergence while we usually consider the control flow when talking about divergence on GPU. I don't like the fact that we spread this everywhere slowly.

I also think we should have someone else look at the LangRef (with a fresh look) change as well, the wording seems "correct" to me, but not necessarily easy to understand or easy to forge a mental model for the engineer working on optimizations on the LLVM IR. @sanjoy or @chandlerc maybe?

docs/LangRef.rst
1149	I don't like this paragraph, and especially the use of "threads", which in general does not imply the lock-step model. It seems like a loose definition that makes little sense for anyone not versed into SIMT and GPU execution model.
1154	I believe: `s/function argument/function parameter/` ( http://llvm.org/docs/LangRef.html#parameter-attributes ) (Check the other places as well)
1156	This paragraph isn't providing much either, refers to a definition of the `convergent` attribute on functions that is not present elsewhere in the document, and is still using "uniform".
1163	What is the current status on this? How do we propagate/infer this attribute?

Thank you for taking a look.

I mostly agree about the use of 'divergence', I think the places where it's necessary can be filled in using the language of executions. However, there's definitely precedence for talking about divergence of data in LLVM. The DivergenceAnalysis pass explicitly determines divergence vs. uniformity of data. The purpose is ultimately to decide whether control flow is uniform or divergent, but to decide that, it obviously has to look at the divergence of data as well because branch conditions are data.

docs/LangRef.rst
1149	"threads" is literally the term used by AMD for the thing we're talking about: threads executing in lock-step within a single "wave". It would be counter-productive to deviate from that. And yes, "thread" doesn't always imply lock-step, but the paragraph does explicitly talk about "operations that are executed simultaneously for multiple threads". Maybe this would help drive the point home even more: In some parallel execution models, there exist operations that are executed by a single hardware execution unit on behalf of multiple threads simultaneously and one or more parameters of the operation ... ... but I was previously encouraged to stay away from talking about hardware in the LangRef, so I have to say that this feels increasingly like bike-shedding to me. I'd actually be happy to use language that admits that at the end of the day, this is all about doing something useful with hardware, because I think it helps understand the intuition behind the definition. But the back-and-forth needs to converge at some point (in the limit sense, not in the attribute sense ;-)).
1154	Ok, will do.
1156	Ok, let's just kill it. Of all the fuss about this definition, the one thing that this paragraph was really meant to address hasn't ever come up, so... (although, perhaps it hasn't come up precisely because the paragraph served its purpose :))
1163	See the paragraph starting at line 1198 ("Calling a function F that contains a call-site CS..."). Also, to quote myself about what that paragraph has to say: this is fairly restrictive right now. E.g. one could ask what happens if the called function F passes a value that was loaded from memory to a convergent parameter. At least for AMDGPU purposes it's fine to leave that undefined, and probably will remain so for a long time. We currently have no need to actually either infer or remove the attribute, but while the paragraph doesn't explicitly state it, it contains all the definitional power to work with and without a flow that does what is done for the convergent function attribute in some compilation flows: add it everywhere in the beginning, then remove it in places where it can be proved to be unneeded.
lib/Transforms/Utils/SimplifyCFG.cpp
1443	Yes, let's rephrase using the language of executions: // We cannot sink calls if any parameter is convergent: two executions that go // through different predecessor blocks are compatible with respect to // convergent parameters at the considered call-sites before sinking even if // the concrete values passed to the called function are different. After sinking, // they are no longer compatible, because different concrete values are passed // to the called function at the same call-site. Or the strategically better variant: // We cannot sink calls if any parameter is convergent.

In D26348#616989, @nhaehnle wrote:

Thank you for taking a look.

I mostly agree about the use of 'divergence', I think the places where it's necessary can be filled in using the language of executions. However, there's definitely precedence for talking about divergence of data in LLVM. The DivergenceAnalysis pass explicitly determines divergence vs. uniformity of data. The purpose is ultimately to decide whether control flow is uniform or divergent, but to decide that, it obviously has to look at the divergence of data as well because branch conditions are data.

We need to clarify, there are two parts in my comments:

"divergence" is not common in LLVM. I stand on this point: the DivergenceAnalysis is a GPU specific pass and is hardly part of the core semantic of the IR.
"divergence" as it is usually encounter is about control flow and not data flow.

About 2, since you're pointing at the DivergenceAnalysis pass, it mentions This file implements divergence analysis which determines whether a branch in a GPU program is divergent.. My understand of divergence in this context has always been a control flow divergence, which does not "break" SSA.

I mean that unless I missed something here, the usual notion of divergence does not prevent the folding we took as an example in LangRef for this patch:

if (cond) {
     Tmp1 = Foo(v [convergent])
   } else {
     Tmp2 = Foo(v [convergent])
   }
   Tmp3 = phi [Tmp1, Tmp2]

too

Tmp3 = Foo(v [convergent])

I'm looking at sinking may add divergence even when the argument is the same IR value in all blocks, here there is no control flow divergence, I don't believe a GPU programmer would say that the transformation above "could create divergence".

(Of course I understand the comment, and it makes perfect sense to me, but I have been following this patch and it is hard to judge, which is why I asked for @chandlerc or @sanjoy to provide another look at this).

docs/LangRef.rst
1149	The fact that AMD is using "threads" is irrelevant to me: we're not writing an AMDGPU documentation here. This is why the use of `threads` alone seems wrong and misleading to me in the context of this document. (My point wouldn't stand in a GPU-centric document, but the GPU model is not part of the LangRef right now) Otherwise, I would prefer that we spell it clearly, something along these lines: GPUs (and other similar hardware) are frequently based on a SIMT (or SPMD) model, where a single hardware execution unit executes identical operations simultaneously on behalf of multiple threads of execution. Some of these operations require that one or more arguments of the operation must have the same value across all simultaneously executing threads. Such arguments are called ``convergent``.

In D26348#617266, @mehdi_amini wrote:

We need to clarify, there are two parts in my comments:

"divergence" is not common in LLVM. I stand on this point: the DivergenceAnalysis is a GPU specific pass and is hardly part of the core semantic of the IR.

Fair enough.

"divergence" as it is usually encounter is about control flow and not data flow.

About 2, since you're pointing at the DivergenceAnalysis pass, it mentions This file implements divergence analysis which determines whether a branch in a GPU program is divergent.. My understand of divergence in this context has always been a control flow divergence, which does not "break" SSA.

I mean that unless I missed something here, the usual notion of divergence does not prevent the folding we took as an example in LangRef for this patch:
if (cond) {
     Tmp1 = Foo(v [convergent])
   } else {
     Tmp2 = Foo(v [convergent])
   }
   Tmp3 = phi [Tmp1, Tmp2]
too
Tmp3 = Foo(v [convergent])
I'm looking at sinking may add divergence even when the argument is the same IR value in all blocks, here there is no control flow divergence, I don't believe a GPU programmer would say that the transformation above "could create divergence".

(Of course I understand the comment, and it makes perfect sense to me, but I have been following this patch and it is hard to judge, which is why I asked for @chandlerc or @sanjoy to provide another look at this).

Within the scope of LLVM, I agree with you. In the actual context of GPU programming, at least the GLSL spec doesn't really use "divergent" anyway and instead talks about uniform vs. non-uniform. Rules involving (dynamically) uniform data and uniform control flow are actually roughly equally frequent. It's just that when you write a compiler for it, the control flow part is by far the bigger issue. The data part mostly takes care of itself from the compiler perspective, especially when optimizations aren't too aggressive, which is why we managed to miss the issue that this patch addresses for so long.

That said, I agree that "create divergence" is unfortunate phrasing either way. If one wanted to stick to "divergence", it would be better to say that it exposes divergence or something along those lines.

docs/LangRef.rst
1149	My point was that inventing a new term to use instead of 'threads' is counter-productive, but it seems we agree on that anyway. I agree that something needs to be added for the purpose of LangRef, and I originally thought that talking about "simultaneously executing" would be sufficient, but clearly it isn't. I'd be happy to use your wording.

Few comments inline

docs/LangRef.rst
1147	(Based on what I've heard about GPUs) Should you emphasize that they're executed in "lockstep"? Or is that not a correct statement?
1158	I think you need to be specific and explicit about how you're relating "executions" between the pre-transform and the post-transform functions. For instance if the pre-transform function is: void f(int* val dereferenceable(8)) { int k = val } and the post-transform program is: void f(int val dereferenceable(8)) { int k0 = (long )val; int k = k & 0xff..ff; } an execution of the pre-transform program where `k = 400` may not make sense in the context of the post-transform program since you need to produce a 8 byte value for `k0`.
1175	What if `Foo` is a function pointer that we loaded from some place (and thus we don't have a declaration that we can inspect to know for sure if the argument is marked `convergent`)? This rule would imply we can't do this sinking transform in that case.
1199	This is a little tricky -- what if the source program is: void f(int convergent i); void g(int convergent j, int /* normal */ k) { f(j - j + k); } Can we simplify the call site to call `f(k)`? If so, did we introduce UB in functions that call `g`?

mehdi_amini added inline comments.Dec 9 2016, 4:51 PM

docs/LangRef.rst
1199	Isn't your source program already UB? You have a call site with an argument that is data-dependent on a non-convergent parameter, `k`. The kind of annoying and SSA-breaking limitation of convergent would be: void f(int convergent i); void g(int convergent j, int /* normal */ k) { if (j == k) f(j); } You can't replace `f(j);` with `f(k);` IIUC.

sanjoy added inline comments.Dec 9 2016, 5:11 PM

docs/LangRef.rst
1199	You're right, I misread "only data-dependencies" as "there must be at least one convergent data dependencies". The other interesting corner case here is (assuming I did not misread this second time around) around PHI nodes and selects -- if we have: void @other_convergent(int convergent x); void @f(int convergent k, int /* normal */ l) { entry: br cond label left, label merge left: br label merge merge: val = PHI [ k, entry ], [ l, left ] call @other_convergent(val); } Is calling the function above well defined if `cond` is is always `false`? What happens if we convert the phi to a select? What happens when we convert the select to arithmetic?

Thank you for taking a look!

docs/LangRef.rst
1147	It is a correct statement; whether it needs to be emphasized is anyone's guess ;-) I didn't think so, but it seems that this a bit of a point of contention, so probably something like it is required. Did you see @mehdi_amini 's alternative wording for this paragraph? Copying it here: GPUs (and other similar hardware) are frequently based on a SIMT (or SPMD) model, where a single hardware execution unit executes identical operations simultaneously on behalf of multiple threads of execution. Some of these operations require that one or more arguments of the operation must have the same value across all simultaneously executing threads. Such arguments are called `convergent`. I'd be fine with it, and I think it's fairly clear.
1158	As far as I'd understand it, the initial state of memory is part of what defines an execution. Since the parameter is dereferenceable(8), that includes 8 bytes of memory even in the original program.
1175	In that case, we may not have a way to specify that the parameter is convergent. This may be a gap in this patch (I haven't actually checked), but not a gap that needs to be filled immediately. This is similar to the convergent attribute of functions: if you're calling through a function pointer, you may want to add the convergent attribute to the call-site. As long as that attribute isn't present, you can assume that the called function is not convergent. The same holds for the function parameter attribute.
1199	@mehdi_amini: This example is interesting, because the transform breaks the condition of the definition, but it is correct in the target semantics. This is very unfortunate, and I don't have an immediate answer for it. @sanjoy's second example is similar. My first intuition would actually have been to say that in all variations, calling the function is undefined. But that's not great if in the future you want to map the semantics of higher-level languages directly to LLVM, including function calls, because there the example is perfectly fine. Anyway, @mehdi_amini's example is the bigger problem. Forcing GVN and others to keep track of the convergent-ness is clearly a bad idea. Thinking out loud, maybe the job could be done by a function attribute that indicates that any parameter may have to be treated as convergent.

Is there any progress on this? I'd be really nice, if this could land so I could enjoy HW acceleration for videos with mpv without the need to build LLVM constantly and carry an additional patch.

In D26348#634025, @curan wrote:

Is there any progress on this? I'd be really nice, if this could land so I could enjoy HW acceleration for videos with mpv without the need to build LLVM constantly and carry an additional patch.

I'm fairly confident it is not gonna ship in LLVM 4.0 as we branch tomorrow.

dark_sylinc added a subscriber: dark_sylinc.Feb 12 2017, 11:23 AM

Ok!

So I landed here after a referral. We were literally hit by this bug in LLVM (https://bugs.freedesktop.org/show_bug.cgi?id=99780)

If anyone needs something tangible to know what a miscompiled program looks like due to a missing convergent argument, this is what the bug looks like:
http://imgur.com/43tIuI6
http://imgur.com/iBhxoBj

From reading this thread I see there are a few problems with the terms, some confussions, as well as understanding why this is needed.

So I'm going to chime in with an explanation (feel free to add it to the code) in hope of speeding up the acceptance/enhancement of this patch:

GCN executes a 'wavefront'. Each wavefront consists 64 'waves' (aka threads). In NVIDIA's lingo, wavefront = warp and wave = thread; and 1 warp consists of 32 threads (though actualy numbers could vary within architectures and is not relevant).

If a video cards needs to process 128 pixels, it will likely do so using at least 2 wavefronts (2 wavefronts x 64 waves = 128).
OpenGL uses the term "uniform" to refer to parameters that will be constant for all wavefronts within the same dispatch.
The problem being addressed here is not uniformity in OpenGL terms, but rather convergence. Convergence is a parameter that must be the same for all waves within the same wavefront, but it is ok if it is different across different wavefronts.

The graph illustrates the idea much better than words:

Terms get messier if we add DirectX terminology, because Microsoft uses the term "const" to refer to what OpenGL understands as "uniform". In fact, DirectX treats uniforms as a synonym of convergence.
I'd suggest you stick to OpenGL terms for consistency with your surrounding ecosystem.

DX12's HLSL (via Shader Model 5.1) allows programmers to explicitly tag variables that require convergence as such (by default DX12 assumes variables do not need to enforce convergence on indexing, but it requires on branches), as this can lead to performance optimizations if we promise in advance our reads won't diverge.
One such example is rendering sprites via textured quads (1 quad = 2 triangles): Every sprite may have its own texture (and therefore index a different texture). However we promise that each triangle will read the same texture. Due to rasterization ordering rules, different triangles won't be put together in the same wavefront therefore giving us the ability to lift away this restriction and have the compiler produce faster code.

Unfortunately GLSL doesn't have this modifier (yet?), but it also assumes variables do not need convergence, just like DX12. The problem however, is that LLVM is compiling code incorrectly:

As per both OpenGL & DirectX rules, the following does not need to have convergence enforced at compiler level:

in int myIndex;
sampler2D myTextures[2];

texture( myTextures[myIndex], uv ); //If myIndex diverges, it's programmer's fault.

However the following DOES require to enforce convergence at compiler level:

in int myIndex;
sampler2D myTextures[2];

if( myIndex == 0 )
    texture( myTextures[0], uv );
else if( 
    texture( myTextures[1], uv );

Currently, the problem is that right now LLVM in the latter case is "optimizing" the code to execute like the former code to save one image_sample instruction, which is wrong.

Right now what I see wrong about the patch' terminology is that it marks variables requiring to enforce convergence at compiler level as "convergent". That's exactly the opposite! It should be tagged as "divergent". If a variable is divergent, then the compiler must enforce convergence. If a variable is already convergent, then it can perform aggressive optimizations. You don't enforce convergence on something that is already convergent!
It's like calling a C variable "const" because it can mutate. I think this contradiction is what's causing a buzzing noise in some of the reviewers.

I propose instead:

Reserve keyword "convergent_branch". If a variable is tagged as such, it means additional optimizations can be done (that could be dangerous if the programmer isn't careful). In C terms, tagging a variable "convergent_branch" is analogous to tagging a C pointer with __restrict in terms of programmer responsability.
Implement keyword "divergent_branch" (what the current patch refers to as "convergent"). If a variable is tagged divergent_branch, it means the compiler must go to great lengths to ensure correct output even if branches diverge.
Reserve the keywords "divergent_index" & "convergent_index" for future use.
Tagging a variable as both convergent_branch and divergent_branch at the same time would be illegal and contradictory.
Tagging a variable as both convergent_index and divergent_index at the same time would be illegal and contradictory.
It is legal for a variable to be both divergent_index and convergent_branch at the same time, or both convergent_index and divergent_branch.
Full level paranoia would be to tag a variable as both divergent_index & divergent_branch
Maximum optimization can be obtained by tagging a variable as both convergent_index & convergent_branch

About terms:
What does a variable that is marked as divergent mean?
It means the compiler must enforce results are correct even if the execution diverges within the same wavefront. We say it is divergent because the compiler must enforce convergence at code level.
What does a variable that is marked as convergent mean?
It means the compiler is free to optimize code in a way that could result in improper output if the programmer doesn't guarantee convergence via external means (e.g. by the data it will feed to the program, by exploiting rasterization rules, etc)

I hope this explanation and proposal (to change the name of the keyword currently being used) shed lights to all of you; and accelerates the inclusion of the final patch.
I'm not a language lawyer, I'm not an LLVM expert either. I'm just a graphics guy who was pissed enough to care.

Cheers
Matias

I've given it a little more thought during coffee break.

divergent_branch, divergent_index, convergent_branch & convergent_index aren't enough.
At least one more modifier is needed: cannot_diverge (or similar name).

By default most variables would be convergent_branch convergent_index; while texture variables would be divergent_branch convergent_index. These defaults could be changed on demand (i.e. to accomodate to GPU arch needs).

Consider the following examples:

Example 1:

float4 texture( cannot_diverge_branch sampler2D texSampler, float2 uv ); //Internally implemented by LLVM to map to a GPU instruction(s).

divergent_branch convergent_index sampler2D myTexture0;
divergent_branch convergent_index sampler2D myTexture1;
convergent_branch convergent_index int condValue;
convergent_branch convergent_index float2 uv;

void main()
{
    if( condValue > 1 )
        outColour = texture( myTexture0, uv );
    else
        outColour = texture( myTexture1, uv );
}

Here in this case, myTexture0 & myTexture1 are both marked as divergent_branch. Because we enter a branch and we are calling texture() which has cannot_diverge on its argument, the following should happen:

LLVM checks that an argument with divergent_branch was passed to a function that requires this argument to not diverge.
The branch can only be flattened if it performs 2 texture samples then branchless select. It cannot do branchless select first, then perform the 1 sample.
myTexture0 can not be factored out of the branch to do branchless selection with myTexture1.
myTexture1 can not be factored out of the branch to do branchless selection with myTexture0.

If myTexture0 & myTexture1 were both to be tagged as convergent_branch instead, then there would be no problem.

Example 2:

divergent_branch convergent_index sampler2D myTexture0;
divergent_branch convergent_index sampler2D myTexture1;
convergent_branch convergent_index int condValue;
convergent_branch convergent_index float2 uv;

float4 doSomething( sampler2D texSampler, float2 uv );
{
    return uv.xyxy;
}

void main()
{
    if( condValue > 1 )
        outColour = doSomething( myTexture0, uv );
    else
        outColour = doSomething( myTexture1, uv );
}

Here, 'doSomething' was called. However its argument does not have cannot_diverge; therefore the branch can still be flattened in any way. myTexture0 is not used, but if it could be used in any form (arithmetic operations??? not possible but just for the sake of the example); it could be moved by the optimizer outside the loop for branchless select against myTexture1.

Example 3:

divergent_branch convergent_index sampler2D myTexture0;
divergent_branch convergent_index sampler2D myTexture1;
convergent_branch convergent_index int condValue;
convergent_branch convergent_index float2 uv;

float4 doSomething( sampler2D texSampler, float2 uv );
{
    return texture( texSampler, uv.xy );
}

void main()
{
    if( condValue > 1 )
        outColour = doSomething( myTexture0, uv );
    else
        outColour = doSomething( myTexture1, uv );
}

Here, doSomething() does not have 'cannot_diverge' attribute, but it ends up calling texture() which does. Therefore the same outcome as example 1 ends up happening.

Example 4:

divergent_branch convergent_index sampler2D myTexture0[4];
divergent_branch convergent_index sampler2D myTexture1;
convergent_branch convergent_index int condValue;
convergent_branch convergent_index int index;
convergent_branch convergent_index float2 uv;

float4 doSomething( sampler2D texSampler, float2 uv );
{
    return texture( texSampler, uv.xy );
}

void main()
{
    if( condValue > 1 )
        outColour = doSomething( myTexture0[index], uv );
    else
        outColour = doSomething( myTexture1, uv );
}

Here myTexture0 was indexed. Since the sampler is marked as convergent_index, the comparison of cannot_diverge_branch && divergent_branch is ignored. This means the myTexture0 can be factored out of the loop, branch could be flattened in any way, etc.

Example 5:

divergent_branch convergent_index sampler2D myTexture0[4];
divergent_branch convergent_index sampler2D myTexture1;
convergent_branch convergent_index int condValue;
convergent_branch convergent_index int index;
convergent_branch convergent_index float2 uv;

float4 doSomething( sampler2D texSampler, float2 uv );
{
    return texture( texSampler, uv.xy );
}

void main()
{
    if( condValue > 1 )
        outColour = doSomething( myTexture0[divergent_index(index)], uv );
    else
        outColour = doSomething( myTexture1, uv );
}

This is like Example 4; but the index caused the variable texSampler = myTexture0[divergent_index(index)] to become divergent_index as well.
This means the compiler now behaves like in Example 1.

In summary, the final logic inside LLVM would be something like this:

if( function.argument.cannot_diverge_branch &&
    (variable.divergent_branch || variable.divergent_index) &&
    !variable.was_converge_indexed /*set to true if it was read from an array without divergent_index(index)*/ )
{
    beCarefulWithFlattenning( variable );
    cannotMoveOutOfBranch( variable );
}

I don't know how easy/hard it would be to implement all of this; but it is clear the number of use cases cannot be handled easily with just one attribute if one intends to compile the code to produce correct output AND optimize as much as possible.

Of course at the time being implementing divergent_branch & cannot_diverge is the most urgent matter. The rest of the keywords could be left reserved.

May I point out this bug does not only produce graphical artifacts. It can also cause GPU hangs:

int nCount = 0;
if( someCond )
    nCount = texture( myTex0, uv );
else
    nCount = texture( myTex1, uv );
    
for( int i=0; i<nCount; ++i )
{
    ...
}

What happens if nCount is read from the wrong path and ends up a very large value? That's right. GPU hang (if someone's lost here; a GPU hang is whenever the GPU stops responding to perform graphics commands like updating the display. This can be caused because of HW malfunctions, it crashed, or it's stuck doing a very long computation).
Considering there's a lot of GPU hang tickets out there in the radeonsi bugtracker, I wouldn't be surprised if one of these hangs is actually caused by this. I cannot stress enough this should be a priority bug.

In D26348#674624, @dark_sylinc wrote:

Currently, the problem is that right now LLVM in the latter case is "optimizing" the code to execute like the former code to save one image_sample instruction, which is wrong.

I'd like to clarify here that the existing bug is not really in the LLVM optimizer, but rather in the fact that graphic frontend are not emitting these calls with the right set of conservative attributes. The immediate fix would be for them to change this and not leave the optimizer mess up with these operations. (And not to put pressure on LLVM to adopt this patch as a correctness fix, it is not, or only incidentally).
(Of course being conservative means losing some optimizations possibilities in some cases, and the purpose of this patch is to help getting these back with finer modeling of the constraint).

Right now what I see wrong about the patch' terminology is that it marks variables requiring to enforce convergence at compiler level as "convergent". That's exactly the opposite! It should be tagged as "divergent". If a variable is divergent, then the compiler must enforce convergence. If a variable is already convergent, then it can perform aggressive optimizations. You don't enforce convergence on something that is already convergent!

I'm not sure I follow. We're maybe not talking about the same thing. We not constraining variables here, but instead "operations" if I get it correctly (going back to this every two months isn't making it easy).
IIUC, taking your examples:

int myIndex;
 sampler2D myTextures[2];
 texture( myTextures[myIndex], uv ); //If myIndex diverges, it's programmer's fault.

The information we're trying to carry right now with this patch, is not about myTextures[myIndex], but about the fact that a call to texture needs to preserve the convergence on its first argument.

We are *not* carrying any information supplied by the programmer about the convergence property of a variable. We can just assume that in the source program the first argument of a call texture is convergent, and we need to maintain it.

I propose instead:
[snip]

None of it seems to fit LLVM IR but looks rather like a language specification to me.
As a starting point, LLVM IR doesn't have any concept of "variables". We have SSA values and memory (which can be stack allocations or not).

It also seems to me that you're trying to bring the source-level programmer annotations of convergence on source program variables, in order to enable more optimizations. There may be some optimizations to do around this, but that seems orthogonal to what this patch is doing which is, as far as I understand, about modeling the correctness constraint associated with a call to texture (i.e. to express what is the property that needs to be maintained on the first argument).

In D26348#674679, @dark_sylinc wrote:

I've given it a little more thought during coffee break.

divergent_branch, divergent_index, convergent_branch & convergent_index aren't enough.
At least one more modifier is needed: cannot_diverge (or similar name).

You just figured what this patch is about :)

The only thing the current patch is doing is exactly adding this knowledge to LLVM, you wrote

float4 texture( cannot_diverge_branch sampler2D texSampler, float2 uv ); //Internally implemented by LLVM to map to a GPU instruction(s).

This patch is proposing:

float4 texture(convergent sampler2D texSampler, float2 uv ); //Internally implemented by LLVM to map to a GPU instruction(s).

I don't think we really need to care about tagging variables right now, there might be some more optimizations possible, but that should be addressed separately.

I cannot stress enough this should be a priority bug.

As I mentioned previously, I believe it is very easy to fix, no need to change LLVM: marking the texture operation as having unknown side-effect should be enough.

Hi!

As I said I'm not an LLVM expert so if someone thinks I'm confusing IR with source level aspects then I won't get in the way.

You just figured what this patch is about :)

The only thing the current patch is doing is exactly adding this knowledge to LLVM, you wrote
float4 texture( cannot_diverge_branch sampler2D texSampler, float2 uv ); //Internally implemented by LLVM to map to a GPU instruction(s).
This patch is proposing:
float4 texture(convergent sampler2D texSampler, float2 uv ); //Internally implemented by LLVM to map to a GPU instruction(s).
I don't think we really need to care about tagging variables right now, there might be some more optimizations possible, but that should be addressed separately.

Then we agree and we understand each other. What I was saying is to add even more information (information which may come from the source level annotations, or may be deducted automatically, or may depend on GPU arch) in order to perform aggressive optimizations, which you if believe it should be addressed separately, I won't object.

As I mentioned previously, I believe it is very easy to fix, no need to change LLVM: marking the texture operation as having unknown side-effect should be enough.

I will pass this on to the radeonsi ticket.

arsenm mentioned this in D30609: [SimplifyCFG] do not sink intrinsics even with non-constant operands.Mar 6 2017, 10:48 AM

Hi there,

This patch should also fix https://bugs.freedesktop.org/show_bug.cgi?id=99484 and the minor rendering issue (Dirt Rally) I tried to fix with https://reviews.llvm.org/D30609.

Although my patch works, it's not the correct solution and it's also too agressive. This one definitely makes more sense.

In D26348#693528, @hakzsam wrote:

This patch should also fix https://bugs.freedesktop.org/show_bug.cgi?id=99484

Please read above my comment about this bug and how it can be fixed without changing LLVM.

In D26348#693537, @mehdi_amini wrote:

In D26348#693528, @hakzsam wrote:

This patch should also fix https://bugs.freedesktop.org/show_bug.cgi?id=99484

Please read above my comment about this bug and how it can be fixed without changing LLVM.

While true, making texture calls have unknown behavior forces the compiler to prevent reordering at all... on an architecture that heavily relies on latency hiding (both horizontal i.e. wave occupancy and vertical i.e. instruction reordering).

Btw. I've been running Mesa radeonsi with this patch on top of LLVM 5.0 for my daily work and leisure time for almost a month now, and haven't encountered any issue.

In D26348#693554, @dark_sylinc wrote:

In D26348#693537, @mehdi_amini wrote:

In D26348#693528, @hakzsam wrote:

This patch should also fix https://bugs.freedesktop.org/show_bug.cgi?id=99484

Please read above my comment about this bug and how it can be fixed without changing LLVM.

While true, making texture calls have unknown behavior forces the compiler to prevent reordering at all... on an architecture that heavily relies on latency hiding (both horizontal i.e. wave occupancy and vertical i.e. instruction reordering).

Right now, frontends are given a choice between performance and correctness...
My point is that one can't choose performance and sacrifices knowingly correctness, and then complains about a LLVM bug that need to be fixed. The bug is in the shader compiler in the first place!

I perfectly understand how having a solution to enable these optimizations is very important for shader compilers.
It is likely that if I was in charge of such a shader compiler, I'd try to explore how to express my intrinsics differently such that I can get "most" optimizations (i.e. not having to say to LLVM that this call can do anything).

Example to think about: can token help? Could the intrinsic be modeled by taking a pointer to a memory location instead of a SSA value? (there are finer grain way to express the effect on a pointer argument, and it wouldn't break SSA).

Btw. I've been running Mesa radeonsi with this patch on top of LLVM 5.0 for my daily work and leisure time for almost a month now, and haven't encountered any issue.

Great. But be aware that anecdotal data are unlikely to expedite a review here. What is needed is someone to spend time to find a solution to and nail the issues discussed in LangRef above.

In D26348#693574, @mehdi_amini wrote:

In D26348#693554, @dark_sylinc wrote:

In D26348#693537, @mehdi_amini wrote:

In D26348#693528, @hakzsam wrote:

This patch should also fix https://bugs.freedesktop.org/show_bug.cgi?id=99484

Please read above my comment about this bug and how it can be fixed without changing LLVM.

While true, making texture calls have unknown behavior forces the compiler to prevent reordering at all... on an architecture that heavily relies on latency hiding (both horizontal i.e. wave occupancy and vertical i.e. instruction reordering).

Right now, frontends are given a choice between performance and correctness...
My point is that one can't choose performance and sacrifices knowingly correctness, and then complains about a LLVM bug that need to be fixed. The bug is in the shader compiler in the first place!

I missed the rationale for this statement, but this does not seem obviously correct to me. Even marking the intrinsics as having unknown side effects does not seem strong enough to ensure the semantics required here - we don't really have a strong definition for "unknown side effects", AFAIK, but I'd not have thought that even unknown side effects on Foo would prevent the folding of:

if (cond) {
  Tmp1 = Foo(v [convergent])
} else {
  Tmp2 = Foo(v [convergent])
}
Tmp3 = phi [Tmp1, Tmp2]

and so, no, I don't think we currently have a solution for this of any kind (even a conservative one) - at least in theory. The closest thing I can think of might be the sideeffect marker on inline-asm statements: That might prevent the folding if we allow inline asm statements to define labels visible to other inline asm statements?

I perfectly understand how having a solution to enable these optimizations is very important for shader compilers.
It is likely that if I was in charge of such a shader compiler, I'd try to explore how to express my intrinsics differently such that I can get "most" optimizations (i.e. not having to say to LLVM that this call can do anything).

Example to think about: can token help? Could the intrinsic be modeled by taking a pointer to a memory location instead of a SSA value? (there are finer grain way to express the effect on a pointer argument, and it wouldn't break SSA).

Btw. I've been running Mesa radeonsi with this patch on top of LLVM 5.0 for my daily work and leisure time for almost a month now, and haven't encountered any issue.

Great. But be aware that anecdotal data are unlikely to expedite a review here. What is needed is someone to spend time to find a solution to and nail the issues discussed in LangRef above.

In D26348#694245, @hfinkel wrote:
In D26348#693574, @mehdi_amini wrote:

My point is that one can't choose performance and sacrifices knowingly correctness, and then complains about a LLVM bug that need to be fixed. The bug is in the shader compiler in the first place!

I missed the rationale for this statement, but this does not seem obviously correct to me. Even marking the intrinsics as having unknown side effects does not seem strong enough to ensure the semantics required here - we don't really have a strong definition for "unknown side effects", AFAIK, but I'd not have thought that even unknown side effects on Foo would prevent the folding of:
if (cond) {
  Tmp1 = Foo(v [convergent])
} else {
  Tmp2 = Foo(v [convergent])
}
Tmp3 = phi [Tmp1, Tmp2]

You're totally correct. I don't see any attribute that would model the convergent effect.
The issue stays the same: the intrinsic are "lying" by using SSA value with an unmodeled constraints.

Example to think about: can token help? Could the intrinsic be modeled by taking a pointer to a memory location instead of a SSA value? (there are finer grain way to express the effect on a pointer argument, and it wouldn't break SSA).

Let me expand on this aspect instead, what about something like:

void foo(int v, bool cond) {
  if (cond) {
    token1 = statepoint()
    bar(v, token1); // v needs to be convergent
  } else {
    token2 = statepoint()
    bar(v, token2);  // v needs to be convergent
  }
}

This is trying to capture the value of v at a given point using the token to bind the call to bar to this particular value of v.

In D26348#694393, @mehdi_amini wrote:
In D26348#694245, @hfinkel wrote:
In D26348#693574, @mehdi_amini wrote:

My point is that one can't choose performance and sacrifices knowingly correctness, and then complains about a LLVM bug that need to be fixed. The bug is in the shader compiler in the first place!

I missed the rationale for this statement, but this does not seem obviously correct to me. Even marking the intrinsics as having unknown side effects does not seem strong enough to ensure the semantics required here - we don't really have a strong definition for "unknown side effects", AFAIK, but I'd not have thought that even unknown side effects on Foo would prevent the folding of:
if (cond) {
  Tmp1 = Foo(v [convergent])
} else {
  Tmp2 = Foo(v [convergent])
}
Tmp3 = phi [Tmp1, Tmp2]
You're totally correct. I don't see any attribute that would model the convergent effect.
The issue stays the same: the intrinsic are "lying" by using SSA value with an unmodeled constraints.

Example to think about: can token help? Could the intrinsic be modeled by taking a pointer to a memory location instead of a SSA value? (there are finer grain way to express the effect on a pointer argument, and it wouldn't break SSA).

Let me expand on this aspect instead, what about something like:
void foo(int v, bool cond) {
  if (cond) {
    token1 = statepoint()
    bar(v, token1); // v needs to be convergent
  } else {
    token2 = statepoint()
    bar(v, token2);  // v needs to be convergent
  }
}
This is trying to capture the value of v at a given point using the token to bind the call to bar to this particular value of v.

Is it okay to fold that to:

void foo(int v, bool cond) {
  if (cond) {
    // token1 = statepoint()
    // bar(v, token1); // v needs to be convergent
  } else {
    // token2 = statepoint()
    // bar(v, token2);  // v needs to be convergent
  }
  token_merged = statepoint()
  bar(v, token_merged); // v needs to be convergent
}

In D26348#694427, @sanjoy wrote:

Is it okay to fold that to:

void foo(int v, bool cond) {
  if (cond) {
    // token1 = statepoint()
    // bar(v, token1); // v needs to be convergent
  } else {
    // token2 = statepoint()
    // bar(v, token2);  // v needs to be convergent
  }
  token_merged = statepoint()
  bar(v, token_merged); // v needs to be convergent
}

Yes, that is ok.

Examples that are ok to fold:

//Example 1: Safe to fold.
void foo(int v, bool cond) {
  if (cond) {
    // token1 = statepoint()
    // bar(v, token1); // v needs to be convergent
  } else {
    // token2 = statepoint()
    // bar(v, token2);  // v needs to be convergent
  }
  token_merged = statepoint()
  bar(v, token_merged); // v needs to be convergent
}

//Example 2: Unsafe to fold, but the language allows it thus it's programmer's
//responsibility to ensure correctness through data fed to the machine.
void foo(int v, bool cond) {
  token = statepoint()
  if (cond) {
    //idx0 = statepoint();
  } else {
    //idx1 = statepoint();
  }
  idx3 = phi[idx0, idx1];
  bar(v[idx3], token); // v needs to be convergent
}

//Example 3: Safe to fold, proven at compile time that v1 == v2
void foo(int v, bool cond) {
  token = statepoint()
  if (cond) {
    // v1 = statepoint()
    // bar(v1, token); // v needs to be convergent
  } else {
    // v2 = statepoint()
    // bar(v2, token);  // v needs to be convergent
  }
  //proven at compile time that v1 == v2 or that cond is always true / always false
  bar(v1, token); // v needs to be convergent
}

//Example 4: Safe to fold, proven at compile time that idx0 == idx1
void foo(int v, bool cond) {
  token = statepoint()
  if (cond) {
    //idx0 = statepoint();
  } else {
    //idx1 = statepoint();
  }
  //proven at compile time that idx1 == idx2 or that cond is always true / always false
  bar(v[idx1], token); // v needs to be convergent
}

Examples that is NOT ok to fold:

//Example 5 Unsafe to fold, should not be fold.
void foo(int v, bool cond) {
  token = statepoint()
  if (cond) {
    // v1 = statepoint()
    // bar(v1, token); // v needs to be convergent
  } else {
    // v2 = statepoint()
    // bar(v2, token);  // v needs to be convergent
  }
  v_merged = phi[v1, v2]
  bar(v_merged, token); // v needs to be convergent
}

//Example 6: (HLSL only) Unsafe to fold and programmer explicitly requested not to fold through language semantics (NonUniformResourceIndex).
//This is identical to Example 2, except the programmer explicitly asked not to fold through a special keyword or intrinsic.
//As far as I can see support for differentiating between these two scenarios is beyond the scope of this patch.
void foo(int v, bool cond) {
  token = statepoint()
  if (cond) {
    //idx0 = statepoint();
  } else {
    //idx1 = statepoint();
  }
  idx3 = noncovergent_phi[idx0, idx1];
  bar(v[idx3], token); // v needs to be convergent
}

//Example 1: Safe to fold.
void foo(int v, bool cond) {

if (cond) {
  // token1 = statepoint()
  // bar(v, token1); // v needs to be convergent
} else {
  // token2 = statepoint()
  // bar(v, token2);  // v needs to be convergent
}
token_merged = statepoint()
bar(v, token_merged); // v needs to be convergent

}

My understanding was that this is *not* OK to fold, we don't know that two different "thread" have the same value for v. Can you elaborate, it is very possible that I'm misunderstanding something here.

I meant that T1 would run with { v1 = 0; cond = true; } and T1 with with { v1 = 1; cond = false; }

In D26348#694722, @mehdi_amini wrote:
//Example 1: Safe to fold.
void foo(int v, bool cond) {
if (cond) {
  // token1 = statepoint()
  // bar(v, token1); // v needs to be convergent
} else {
  // token2 = statepoint()
  // bar(v, token2);  // v needs to be convergent
}
token_merged = statepoint()
bar(v, token_merged); // v needs to be convergent
}
My understanding was that this is *not* OK to fold, we don't know that two different "thread" have the same value for v. Can you elaborate, it is very possible that I'm misunderstanding something here.

My mistake was not being explicit enough in the examples. My apologies. GPUs differentiate dispatch, threadgroup and thread. A dispatch is made of threadgroups, a threadgroup is made of threads.
Uniform values are read only and have the same value the whole dispatch, while convergent values have the same value for the whole threadgroup.
I assumed v came from a uniform. In this example if v is a uniform, it is safe to fold. As both T0 and T1 are guaranteed to have the same value in v.
If v is neither uniform nor convergent, then you're right, it's not safe to fold.

In the example of selecting between v0 and v1; if the condition comes from a uniform (or a convergent), it's safe to fold. If the condition does not come from a uniform or convergent, it's not safe.

//OK to fold. Notice v is uniform; v never changes within the threadgroup.
//"token" can be different because it's not required to be convergent.
void foo( uniform sampler v, bool cond ) {
  if (cond) {
    // token1 = statepoint()
    // bar(v, token1); // v needs to be convergent
  } else {
    // token2 = statepoint()
    // bar(v, token2);  // v needs to be convergent
  }
  token_merged = statepoint()
  bar(v, token_merged); // v is guaranteed to be the same for all threads within the threadgroup.
}

//Unsafe code. Can be fold. This code is unsafe no matter what we do. v may be
//different between threads in the same threadgroup, but all threadgroups will follow
//the same path (cond is true for all threads within the threadgroup, or false for all threads
//within the threadgroup).
void foo( sampler v, uniform bool cond ) {
  if (cond) {
    // token1 = statepoint()
    // bar(v, token1);
  } else {
    // token2 = statepoint()
    // bar(v, token2);
  }
  token_merged = statepoint()
  bar(v, token_merged); // v may be different in the same threadgroup, but there's nothing compiler can do.
}

//Unsafe to fold. Must not be fold. v and cond are neither uniform nor convergent.
void foo( sampler v, bool cond ) {
  if (cond) {
    // token1 = statepoint()
    // bar(v, token1);
  } else {
    // token2 = statepoint()
    // bar(v, token2);
  }
  token_merged = statepoint()
  bar(v, token_merged);
}

//OK to fold. Notice v0, v1, and cond are all uniform.
void foo( uniform sampler v0, uniform sampler v1, uniform bool cond ) {
  token = statepoint();
  if (cond) {
    //bar( v0, token );
  } else {
    //bar( v1, token );
  }
  v2 = phi[ v0, v1 ];
  bar( v2, token );
}

//Unsafe to fold. Must not be fold. bool is not uniform, thus threads
//within the same threadgroup may follow a different path.
void foo( uniform sampler v0, uniform sampler v1, bool cond ) {
  token = statepoint();
  if (cond) {
    //bar( v0, token );
  } else {
    //bar( v1, token );
  }
  v2 = phi[ v0, v1 ];
  bar( v2, token );
}

Of course things aren't that simple. If I write:

uniform sampler mySampler[2];
uniform bool cond;

void foo( sampler v0, sampler v1, bool cond );

void main()
{
    foo( mySampler[0], mySampler[1], cond );
}

Then LLVM optimizer needs to see that all arguments to foo are uniform; then it's safe to fold foo (once foo is inlined). If instead I were to write:

uniform sampler mySampler[2];
uniform sampler anotherSampler;

void foo( sampler v0, sampler v1, bool cond );

void main()
{
    bool cond = anotherSampler.sample( uv = 0 );; //source of non-convergence.
    foo( mySampler[0], mySampler[1], cond );
}

Now foo cannot be folded because cond is not convergent.

If I were to write:

uniform sampler mySampler[2];
uniform sampler anotherSampler;
uniform bool someCond;

void foo( sampler v0, sampler v1, bool cond );

void main()
{
    foo( mySampler[0], mySampler[1], someCond ); // inline expansion can be folded
    bool anotherCond = anotherSampler.sample( uv = 0 );; //source of non-convergence.
    foo( mySampler[0], mySampler[1], anotherCond ); // inline expansion cannot be folded
}

then the first foo() call can be folded, the second one cannot.

It gets even funnier:

uniform sampler mySampler[2];
uniform sampler anotherSampler;

void foo( sampler v0, sampler v1, bool cond );

void main()
{
    bool cond = anotherSampler.sample( uv = 0 );; //source of non-convergence.
    int idx0 = anotherSampler.sample( uv = 1 ); //source of non-convergence.
    // mySampler[idx0] is no longer uniform nor convergent, and cond isn't convergent either. Cannot fold.
    foo( mySampler[idx0], mySampler[1], cond );
}

In other words the optimizer has to track whether the convergence property has been lost.

t-tye added a subscriber: t-tye.Feb 15 2018, 4:20 PM

Herald added subscribers: tpr, javed.absar. · View Herald TranscriptFeb 15 2018, 4:20 PM

arsenm resigned from this revision.Apr 5 2020, 8:04 AM

Herald added subscribers: kerbowa, jdoerfert, zzheng, jvesely. · View Herald TranscriptApr 5 2020, 8:04 AM

jdoerfert added reviewers: simoll, fpetrogalli.Apr 5 2020, 9:32 AM

sanjoy resigned from this revision.Jan 29 2022, 5:27 PM

Revision Contents

Path

Size

docs/

LangRef.rst

62 lines

include/

llvm/

Analysis/

CodeMetrics.h

3 lines

IR/

Instructions.h

12 lines

Intrinsics.td

6 lines

IntrinsicsAMDGPU.td

18 lines

lib/

Analysis/

CodeMetrics.cpp

2 lines

AsmParser/

LLParser.cpp

1 line

IR/

Verifier.cpp

5 lines

Transforms/

Scalar/

GVNHoist.cpp

2 lines

LoopUnrollPass.cpp

7 lines

Utils/

SimplifyCFG.cpp

12 lines

test/

Bitcode/

attributes.ll

5 lines

Transforms/

GVNHoist/

hoist-convergent.ll

27 lines

InstCombine/

select-call.ll

35 lines

LoopUnroll/

convergent.ll

70 lines

SimplifyCFG/

convergent.ll

65 lines

utils/

TableGen/

CodeGenIntrinsics.h

2 lines

CodeGenTarget.cpp

5 lines

IntrinsicEmitter.cpp

6 lines

Diff 79063

docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 1,136 Lines • ▼ Show 20 Lines	``swifterror``
call boundaries rather than placing them in memory. Since this does change		call boundaries rather than placing them in memory. Since this does change
the calling convention, a function which uses the ``swifterror`` attribute		the calling convention, a function which uses the ``swifterror`` attribute
on a parameter is not ABI-compatible with one which does not.		on a parameter is not ABI-compatible with one which does not.

These constraints also allow LLVM to assume that a ``swifterror`` argument		These constraints also allow LLVM to assume that a ``swifterror`` argument
does not alias any other memory visible within a function and that a		does not alias any other memory visible within a function and that a
``swifterror`` alloca passed as an argument does not escape.		``swifterror`` alloca passed as an argument does not escape.

		``convergent``
		In some parallel execution models, there exist operations that are executed
		simultaneously for multiple threads and one or more arguments of the
		sanjoyUnsubmitted Not Done Reply Inline Actions (Based on what I've heard about GPUs) Should you emphasize that they're executed in "lockstep"? Or is that not a correct statement? sanjoy: (Based on what I've heard about GPUs) Should you emphasize that they're executed in "lockstep"?
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions It is a correct statement; whether it needs to be emphasized is anyone's guess ;-) I didn't think so, but it seems that this a bit of a point of contention, so probably something like it is required. Did you see @mehdi_amini 's alternative wording for this paragraph? Copying it here: GPUs (and other similar hardware) are frequently based on a SIMT (or SPMD) model, where a single hardware execution unit executes identical operations simultaneously on behalf of multiple threads of execution. Some of these operations require that one or more arguments of the operation must have the same value across all simultaneously executing threads. Such arguments are called `convergent`. I'd be fine with it, and I think it's fairly clear. nhaehnle: It is a correct statement; whether it needs to be emphasized is anyone's guess ;-) I didn't…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions `uniform` isn't defined in the LangRef. mehdi_amini: `uniform` isn't defined in the LangRef.
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Right, this is also used in the paragraph below. This whole paragraph isn't intended to be normative, but to provide context that allows one to understand the actual condition. Perhaps this could be phrased better. What about: "In some parallel execution models, there exist operations that are executed simultaneously for multiple threads, and one or more arguments of the operation must have the same value across all simultaneously executing threads." nhaehnle: Right, this is also used in the paragraph below. This whole paragraph isn't intended to be…
		hfinkelUnsubmitted Not Done Reply Inline Actions I think that would be better. I still find the current text in this paragraph and the next confusing because understanding them hinges on understanding the term `uniform` which we never define. hfinkel: I think that would be better. I still find the current text in this paragraph and the next…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Okay, that's what the current version says. nhaehnle: Okay, that's what the current version says.
		hfinkelUnsubmitted Not Done Reply Inline Actions Yes, but it still says uniform in the paragraph below. Please either define that term or don't use it. hfinkel: Yes, but it still says uniform in the paragraph below. Please either define that term or don't…
		operation must have the same value across all simultaneously executing
		threads. Such arguments are called ``convergent``.
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I don't like this paragraph, and especially the use of "threads", which in general does not imply the lock-step model. It seems like a loose definition that makes little sense for anyone not versed into SIMT and GPU execution model. mehdi_amini: I don't like this paragraph, and especially the use of "threads", which in general does not…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions "threads" is literally the term used by AMD for the thing we're talking about: threads executing in lock-step within a single "wave". It would be counter-productive to deviate from that. And yes, "thread" doesn't always imply lock-step, but the paragraph does explicitly talk about "operations that are executed simultaneously for multiple threads". Maybe this would help drive the point home even more: In some parallel execution models, there exist operations that are executed by a single hardware execution unit on behalf of multiple threads simultaneously and one or more parameters of the operation ... ... but I was previously encouraged to stay away from talking about hardware in the LangRef, so I have to say that this feels increasingly like bike-shedding to me. I'd actually be happy to use language that admits that at the end of the day, this is all about doing something useful with hardware, because I think it helps understand the intuition behind the definition. But the back-and-forth needs to converge at some point (in the limit sense, not in the attribute sense ;-)). nhaehnle: "threads" is literally the term used by AMD for the thing we're talking about: threads…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions The fact that AMD is using "threads" is irrelevant to me: we're not writing an AMDGPU documentation here. This is why the use of `threads` alone seems wrong and misleading to me in the context of this document. (My point wouldn't stand in a GPU-centric document, but the GPU model is not part of the LangRef right now) Otherwise, I would prefer that we spell it clearly, something along these lines: GPUs (and other similar hardware) are frequently based on a SIMT (or SPMD) model, where a single hardware execution unit executes identical operations simultaneously on behalf of multiple threads of execution. Some of these operations require that one or more arguments of the operation must have the same value across all simultaneously executing threads. Such arguments are called ``convergent``. mehdi_amini: The fact that AMD is using "threads" is irrelevant to me: we're not writing an AMDGPU…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions My point was that inventing a new term to use instead of 'threads' is counter-productive, but it seems we agree on that anyway. I agree that something needs to be added for the purpose of LangRef, and I originally thought that talking about "simultaneously executing" would be sufficient, but clearly it isn't. I'd be happy to use your wording. nhaehnle: My point was that inventing a new term to use instead of 'threads' is counter-productive, but…

		While the ``convergent`` attribute on functions can be thought of as
		"whether and when each call site to this function is reached must be
		uniform across a (target-dependent) set of threads", the ``convergent``
		jlebarUnsubmitted Not Done Reply Inline Actions While the `convergent` attribute on functions can be thought of as "whether and when each call site to this function is reached must be uniform across multiple threads", I know this isn't meant to be normative, but I'm a little concerned by this language. Specifically, our semantics say that the compiler is not allowed to introduce new sources of divergence to convergent calls, but it is also not allowed to assume that convergent calls are uniform. I presume the same is true for convergent args -- the compiler cannot introduce divergence into the arg values, but it also cannot assume anything about the arg values as a consequence of their being marked convergent. I know rewriting this to be precise might defeat the purpose of your summary here. But could we add some weasel words so that people can't point to this as justification that divergent calls / args marked with `convergent` are LLVM UB? jlebar: > While the ``convergent`` attribute on functions can be thought of as "whether and when each…
		hfinkelUnsubmitted Not Done Reply Inline Actions This is a good point. We might want to say something like that there is some implementation-defined set of simultaneously-executing threads for which the restriction holds. Thus we won't be able to make an target-independent assumptions about the implied behavior. hfinkel: This is a good point. We might want to say something like that there is some implementation…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I don't see why we should mention anything about thread or concurrency in the LangRef? Also, I'm not thrilled by this paragraph either (to begin with, the sentence about the convergent attribute on function isn't clear to me). mehdi_amini: I don't see why we should mention anything about thread or concurrency in the LangRef? Also…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Hmm. I wonder what the general opinion is on whether this paragraph is still helpful at all. The way I see it, it can serve two purposes: Explain the attribute in an intuitive way to someone with an SIMT/SPMD background. Explain why the same term is used here for a function-parameter attribute and elsewhere as a function attribute. But if it just adds confusion for others, maybe it's best to remove it rather than discuss it to death? After all, the first purpose is kind of fulfilled already by the paragraph before that... If the paragraph is kept around, then I agree with something like: ... whether and when each call site to this function is reached must be uniform across a target-dependent set of simultaneously-executing threads ... nhaehnle: Hmm. I wonder what the general opinion is on whether this paragraph is still helpful at all.
		attribute on function arguments can be thought of as "the value of this
		jlebarUnsubmitted Not Done Reply Inline Actions Nit, I think you want "whereas" or "while". Sort of like "although" but a less strong sense of contrast. jlebar: Nit, I think you want "whereas" or "while". Sort of like "although" but a less strong sense of…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Makes sense. nhaehnle: Makes sense.
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I believe: `s/function argument/function parameter/` ( http://llvm.org/docs/LangRef.html#parameter-attributes ) (Check the other places as well) mehdi_amini: I believe: `s/function argument/function parameter/` ( http://llvm.org/docs/LangRef.
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Ok, will do. nhaehnle: Ok, will do.
		argument at each execution of each call site to this function must be
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions (Ignore this, phab bug) mehdi_amini: (Ignore this, phab bug)
		uniform across a (target-dependent) set of threads".
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions This paragraph isn't providing much either, refers to a definition of the `convergent` attribute on functions that is not present elsewhere in the document, and is still using "uniform". mehdi_amini: This paragraph isn't providing much either, refers to a definition of the ``convergent``…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Ok, let's just kill it. Of all the fuss about this definition, the one thing that this paragraph was really meant to address hasn't ever come up, so... (although, perhaps it hasn't come up precisely because the paragraph served its purpose :)) nhaehnle: Ok, let's just kill it. Of all the fuss about this definition, the one thing that this…

		For every function F, transformations must ensure that every pair of
		sanjoyUnsubmitted Not Done Reply Inline Actions I think you need to be specific and explicit about how you're relating "executions" between the pre-transform and the post-transform functions. For instance if the pre-transform function is: void f(int* val dereferenceable(8)) { int k = val } and the post-transform program is: void f(int val dereferenceable(8)) { int k0 = (long )val; int k = k & 0xff..ff; } an execution of the pre-transform program where `k = 400` may not make sense in the context of the post-transform program since you need to produce a 8 byte value for `k0`. sanjoy: I think you need to be specific and explicit about how you're relating "executions" between the…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions As far as I'd understand it, the initial state of memory is part of what defines an execution. Since the parameter is dereferenceable(8), that includes 8 bytes of memory even in the original program. nhaehnle: As far as I'd understand it, the initial state of memory is part of what defines an execution.
		executions of F that is "compatible" with respect to convergent function
		jlebarUnsubmitted Not Done Reply Inline Actions Nit, suggest putting "compatible" in quotes so it's clear it's a term we're about to define. jlebar: Nit, suggest putting "compatible" in quotes so it's clear it's a term we're about to define.
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Sure, no problem. nhaehnle: Sure, no problem.
		arguments in the original function F is also "compatible" when executing
		the transformed function F. "Compatibility" is defined as follows:
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions It is not clear to me what is r1/S1 and r2/S2 that prevent to CSE: if (cond) { Tmp1 = Foo(v [convergent]) } else { Tmp2 = Foo(v [convergent]) } Tmp3 = phi [Tmp1, Tmp2] to: Tmp3 = Foo(v[convergent]) But your definition reference a "call site" as if there is a one-to-one mapping before/after a transformation. mehdi_amini: It is not clear to me what is r1/S1 and r2/S2 that prevent to CSE: ``` if (cond) { Tmp1 =…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions The definition of "compatible" and everything inside it is completely independent of transformations. It talks about two runs of the same program. Your example is forbidden because in the original program, the runs `r1 = { cond = true, v = V0 }` and `r2 = { cond = false, v = V1 }` (with V0 != V1) are compatible (the first call-site of Foo has the sequences r1(CS) = ( V0 ) and r2(CS) = ( ), i.e. r2(CS) is a subsequence of r1(CS); similarly for the second call-site), but in the transformed program they're not (the single call-site of Foo has the sequences r1(CS) = ( V0 ) and r2(CS) = ( V1 ), and neither is a subsequence of the other). I do think that the language I used is pretty clear on that, actually, but since both you and @jlebar seem to have had the same difficulty, I'd like to understand that better. Perhaps it would help to swap those two paragraphs around as follows? Program transformations must ensure that every pair of runs that is compatible with respect to convergent function attributes in the original program must also be compatible in the transformed program. Compatibility is defined as follows: Two runs r1 and r2 of a program are said to be compatible (wrt convergent function attributes) if ... Maybe that helps the reader to parse the definition of compatibility because it first establishes a context where compatibility is about runs within a program rather than runs across multiple (transformed) versions of a program. nhaehnle: The definition of "compatible" and everything inside it is completely independent of…

		Two executions e1 and e2 of a function are compatible (with respect to
		hfinkelUnsubmitted Not Done Reply Inline Actions Please write out "with respect to." Also, you mean convergent function-parameter attributes, not function attributes. The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program. I think it would be better to say, "Two executions of the calling function, e1 and e2, are compatible..." Then we can say that transformations must preserve compatibility w.r.t. convergent function-parameter attributes for all potential executions of the calling function. We might even further specify that this is for potential simultaneous executions. Finally, I'd really like it if you would include the example we've been discussing in the reference as a non-normative example. I did not really understand the definition until I read @nhaehnle's explanation above. hfinkel: Please write out "with respect to." Also, you mean convergent function-parameter attributes…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions I'm only one person :) I'm making the editing changes and adding the example. s/run/execution/ is fine with me, and I'm adding that as an alternative below, but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function. Perhaps you can clarify: The runs of the program terminology I find bothersome, in part because that's not what we're talking about. We're talking about the behavior of different threads within the same program. What's the difference between threads and executions in this context? nhaehnle: I'm only one person :) I'm making the editing changes and adding the example.
		hfinkelUnsubmitted Not Done Reply Inline Actions but whenever I try to formulate some text that talks about executing a calling function rather than executing a function, I feel confused about the calling vs. the callee function. Feel free to establish some terminology early in the definition for later use. We're talking about a function-parameter attribute. That attribute must appear on the function parameters of some call site. That call site exists in some function (the caller a.k.a. the calling function) and calls some other function (the callee function). What's the difference between threads and executions in this context? As I think about it, a thread is some entity that executes things (e.g. functions). Threads execute functions, and each time a thread executes some function, that constitutes an execution of that function. hfinkel: > but whenever I try to formulate some text that talks about executing a calling function…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions I see. I played around with this some more, and here's an example I'd like to hear your thoughts on: define void @foo(i32 %a) { ... call void @bar(i32 %a) ... } define void @bar(i32 %a) { call void @baz(i32 convergent %a) } This code has a potential problem because at least the parameter of bar should arguably have been labelled convergent. I'd say that inlining shouldn't have to worry about the convergent attribute at all. However, if the definition we're discussing here takes a function-centric view, then we cannot prove in a target-independent way that inlining bar is correct (because foo gets a new callsite with a convergent function argument, where previously there were no such callsites, i.e. every pair of runs of foo was compatible). If the definition takes a program-centric view (i.e., executions that start at foo implicitly also consider the sequence of arguments passed into baz inside bar), then inlining bar is clearly correct. That would be an argument in favour of staying with the language about "executions of a program" rather than "executions of a function". nhaehnle: I see. I played around with this some more, and here's an example I'd like to hear your…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Hmm, looking at other transforms of similar cases, I think we have to say that cases like this are undefined behaviour. Curious, I think the same applies to the convergent attribute for functions: if a function F contains calls to a convergent function G, but is not itself marked convergent, then calls to F should be undefined behaviour, right? LangRef is currently somewhat silent on this particular point. nhaehnle: Hmm, looking at other transforms of similar cases, I think we have to say that cases like this…
		hfinkelUnsubmitted Not Done Reply Inline Actions Regarding the convergent function attribute, yes. This is why Clang in CUDA mode marks all functions as convergent and then we remove the attribute when we prove it is not needed (in lib/Transforms/IPO/FunctionAttrs.cpp). hfinkel: Regarding the convergent function attribute, yes. This is why Clang in CUDA mode marks all…
		hfinkelUnsubmitted Not Done Reply Inline Actions The execution of a function semantically includes the execution of any functions called, so I still feel this is the better terminology. We're talking about preventing transformations that invalidate executions of individual program executions, comparing different program executions is irrelevant, but each single program execution can contain multiple simultaneous executions of the functions in question. I think that the inlining would be correct regardless. Nothing that the inliner does will change the convergence properties of function arguments, as far as I can tell. However, you also raise a second important point. If the first argument of `@bar` is not marked as convergent, then other optimizations around calls to `@bar` can ruin the convergent property regardless. Thus, it too needs to be marked. hfinkel: The execution of a function semantically includes the execution of any functions called, so I…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions What is the current status on this? How do we propagate/infer this attribute? mehdi_amini: What is the current status on this? How do we propagate/infer this attribute?
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions See the paragraph starting at line 1198 ("Calling a function F that contains a call-site CS..."). Also, to quote myself about what that paragraph has to say: this is fairly restrictive right now. E.g. one could ask what happens if the called function F passes a value that was loaded from memory to a convergent parameter. At least for AMDGPU purposes it's fine to leave that undefined, and probably will remain so for a long time. We currently have no need to actually either infer or remove the attribute, but while the paragraph doesn't explicitly state it, it contains all the definitional power to work with and without a flow that does what is done for the convergent function attribute in some compilation flows: add it everywhere in the beginning, then remove it in places where it can be proved to be unneeded. nhaehnle: See the paragraph starting at line 1198 ("Calling a function F that contains a call-site CS...
		convergent function arguments) if for every call site CS to a function with
		hfinkelUnsubmitted Not Done Reply Inline Actions So, the clarify, we need to ensure compatibility between all call sites with a convergent argument, or all call sites with a convergent argument to the same function, or all call sites with a convergent argument subject to some other condition? hfinkel: So, the clarify, we need to ensure compatibility between all call sites with a convergent…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions I hope we're not talking past each other, but for the definition of compatibility, each call site is considered separately. Compatibility is a property of a pair of executions; it is satisfied if a certain property is satisfied at each call site to a function with convergent arguments. When one proves that a transformation preserves compatibility, one would look at each call-site in the transformed code, and prove the property based on the fact that it was satisfied at some call-site in the original code, but the definition of compatibility itself is never concerned with the interaction or relationship of multiple call sites to each other. nhaehnle: I hope we're not talking past each other, but for the definition of compatibility, each call…
		hfinkelUnsubmitted Not Done Reply Inline Actions but for the definition of compatibility, each call site is considered separately. That's what I thought, thanks! I just wanted to make sure. hfinkel: > but for the definition of compatibility, each call site is considered separately. That's…
		a convergent argument, the sequences e1(CS) and e2(CS) of values supplied
		to the convergent argument at CS during e1 and e2, respectively, satisfy
		that e1(CS) is a subsequence of e2(CS) or vice versa.

		jlebarUnsubmitted Not Done Reply Inline Actions This is going to be the paragraph to wordsmith. I think the approach of saying "A correct transformation has the following property: For any two runs of the program r1 and r2, if r1 and r2 were compatible before the transformation, they must be compatible after the transformation" makes some sense as a way to import SPMD semantics into the langref. But I am not sure about the definition of 'compatible'. For one thing, this doesn't work if the transformation adds or removes a callsite with convergent parameters. But you're sometimes allowed to do this. For example, you could unroll a loop which contains a convergent-parameter call. This is really tricky... jlebar: This is going to be the paragraph to wordsmith. I think the approach of saying "A correct…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions So, the intention is for the language to allow the addition of callsites in loop unrolling. Compatible means that at every call site, the sequences of actual convergent function arguments are equal or one is a subset of the other. If you add a callsite, you have to make sure that this property holds. In the case of loop unrolling, I think the constraint ends up being similar to what happens for convergent functions, i.e. as long as you know that the loop has a multiple of N iterations, you can unroll the loop N times. The effect is that the sequence of actual function arguments at the call site in the loop from run r1 on the original program is split into N sequences of actual function arguments, one for each of the N call sites in the loop-unrolled program. If the sequences in r1 and r2 were equal originally, then they're trivially equal afterwards. If the sequence in r1 was a subsequence of the sequence in r2, then this must mean that the loop took fewer iterations in r1 than in r2. Then it follows that the sequences in r1 are still subsequences of the corresponding sequences in r2. (I think this approach is fine for the proof that unrolling exactly N times satisfies the condition. For the reverse proof that loop transformations satisfying the condition actually do the right thing with respect to the real SIMT model, I'm cheating slightly because there could be an outer loop so that there is not necessarily a clear implication from subsequence -> fewer iterations, but then e.g. use the freedom to choose r1 and r2 in such a way that the outer loop is run through only once.) nhaehnle: So, the intention is for the language to allow the addition of callsites in loop unrolling.
		jlebarUnsubmitted Not Done Reply Inline Actions Oh, I see what you mean now! Okay. My problem was I was trying to compare call sites before and after the transformation, but you only compare calls made within one piece of code (i.e. either before compared to before or after compared to after). Now that I've slept on it (and understood my mistake here), I think that reasoning about how programs run is going to be pretty complicated -- to the point that I am still not 100% sure this is correct. If we can agree on a data-flow formulation, I'd probably find that a lot easier to reason about myself. It sounds like Mehdi doesn't have as much of a problem with it as you might have thought? jlebar: Oh, I see what you mean now! Okay. My problem was I was trying to compare call sites before…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Right, the comparison is within one piece of code. See the last example that Mehdi brought up as an example for why a pure data-flow formulation may be challenging. nhaehnle: Right, the comparison is within one piece of code. See the last example that Mehdi brought up…
		As an example to illustrate this definition, consider the following
		stylized code fragment.

		.. code-block:: text

		if (cond) {
		Tmp1 = Foo(v [convergent])
		sanjoyUnsubmitted Not Done Reply Inline Actions What if `Foo` is a function pointer that we loaded from some place (and thus we don't have a declaration that we can inspect to know for sure if the argument is marked `convergent`)? This rule would imply we can't do this sinking transform in that case. sanjoy: What if `Foo` is a function pointer that we loaded from some place (and thus we don't have a…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions In that case, we may not have a way to specify that the parameter is convergent. This may be a gap in this patch (I haven't actually checked), but not a gap that needs to be filled immediately. This is similar to the convergent attribute of functions: if you're calling through a function pointer, you may want to add the convergent attribute to the call-site. As long as that attribute isn't present, you can assume that the called function is not convergent. The same holds for the function parameter attribute. nhaehnle: In that case, we may not have a way to specify that the parameter is convergent. This may be a…
		} else {
		Tmp2 = Foo(v [convergent])
		}
		Tmp3 = phi [Tmp1, Tmp2]

		Without the restrictions imposed by the convergent attribute, this could be
		simplified to:

		.. code-block:: text

		Tmp3 = Foo(v [convergent])

		To see why this transformation is forbidden, consider executions
		``e1 = { cond = true, v = V0 }`` and ``e2 = { cond = false, v = V1 }`` with
		V0 different from V1. These executions are compatible in the original code
		fragment: At the first call-site of ``Foo``, the sequences are
		``e1(CS) = (V0)`` and ``e2(CS) = ()``, so ``e2(CS)`` is a subsequence of
		``e1(CS)``. The second call-site is similar. However, the executions are
		incompatible in the transformed code fragment: The sequences at the
		call-site are ``e1(CS) = (V0)`` and ``e2(CS) = (V1)``, so neither is a
		subsequence of the other.

		Calling a function F that contains a call-site CS with convergent function
		arguments is only defined behavior if the only data-dependencies of the
		sanjoyUnsubmitted Not Done Reply Inline Actions This is a little tricky -- what if the source program is: void f(int convergent i); void g(int convergent j, int /* normal / k) { f(j - j + k); } Can we simplify the call site to call `f(k)`? If so, did we introduce UB in functions that call `g`? sanjoy:* This is a little tricky -- what if the source program is: ``` void f(int convergent i); void g…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions Isn't your source program already UB? You have a call site with an argument that is data-dependent on a non-convergent parameter, `k`. The kind of annoying and SSA-breaking limitation of convergent would be: void f(int convergent i); void g(int convergent j, int /* normal / k) { if (j == k) f(j); } You can't replace `f(j);` with `f(k);` IIUC. mehdi_amini:* Isn't your source program already UB? You have a call site with an argument that is data…
		sanjoyUnsubmitted Not Done Reply Inline Actions You're right, I misread "only data-dependencies" as "there must be at least one convergent data dependencies". The other interesting corner case here is (assuming I did not misread this second time around) around PHI nodes and selects -- if we have: void @other_convergent(int convergent x); void @f(int convergent k, int /* normal / l) { entry: br cond label left, label merge left: br label merge merge: val = PHI [ k, entry ], [ l, left ] call @other_convergent(val); } Is calling the function above well defined if `cond` is is always `false`? What happens if we convert the phi to a select? What happens when we convert the select to arithmetic? sanjoy:* You're right, I misread "only data-dependencies" as "there must be at least one convergent data…
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions @mehdi_amini: This example is interesting, because the transform breaks the condition of the definition, but it is correct in the target semantics. This is very unfortunate, and I don't have an immediate answer for it. @sanjoy's second example is similar. My first intuition would actually have been to say that in all variations, calling the function is undefined. But that's not great if in the future you want to map the semantics of higher-level languages directly to LLVM, including function calls, because there the example is perfectly fine. Anyway, @mehdi_amini's example is the bigger problem. Forcing GVN and others to keep track of the convergent-ness is clearly a bad idea. Thinking out loud, maybe the job could be done by a function attribute that indicates that any parameter may have to be treated as convergent. nhaehnle: @mehdi_amini: This example is interesting, because the transform breaks the condition of the…
		convergent function arguments of CS are convergent parameters of F. If this
		condition is not satisfied, then the behavior in F itself is defined, but a
		function G that calls F has undefined behavior. The purpose of this rule is
		to allow transformations to be applied to G without having to transitively
		look inside all functions called by G.


.. _gc:		.. _gc:

Garbage Collector Strategy Names		Garbage Collector Strategy Names
--------------------------------		--------------------------------

Each function may specify a garbage collector strategy name, which is simply a		Each function may specify a garbage collector strategy name, which is simply a
string:		string:

▲ Show 20 Lines • Show All 11,489 Lines • Show Last 20 Lines

include/llvm/Analysis/CodeMetrics.h

Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	struct CodeMetrics {
bool isRecursive = false;		bool isRecursive = false;

/// \brief True if this function cannot be duplicated.		/// \brief True if this function cannot be duplicated.
///		///
/// True if this function contains one or more indirect branches, or it contains		/// True if this function contains one or more indirect branches, or it contains
/// one or more 'noduplicate' instructions.		/// one or more 'noduplicate' instructions.
bool notDuplicatable = false;		bool notDuplicatable = false;

/// \brief True if this function contains a call to a convergent function.		/// \brief True if this function contains a call to a convergent function or
		/// a function with convergent arguments.
bool convergent = false;		bool convergent = false;

/// \brief True if this function calls alloca (in the C sense).		/// \brief True if this function calls alloca (in the C sense).
bool usesDynamicAlloca = false;		bool usesDynamicAlloca = false;

/// \brief Number of instructions in the analyzed blocks.		/// \brief Number of instructions in the analyzed blocks.
unsigned NumInsts = false;		unsigned NumInsts = false;

▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

include/llvm/IR/Instructions.h

Show First 20 Lines • Show All 1,769 Lines • ▼ Show 20 Lines	bool hasStructRetAttr() const {
return paramHasAttr(1, Attribute::StructRet);		return paramHasAttr(1, Attribute::StructRet);
}		}

/// Determine if any call argument is an aggregate passed by value.		/// Determine if any call argument is an aggregate passed by value.
bool hasByValArgument() const {		bool hasByValArgument() const {
return AttributeList.hasAttrSomewhere(Attribute::ByVal);		return AttributeList.hasAttrSomewhere(Attribute::ByVal);
}		}

		/// Determine if any call argument is convergent.
		bool hasConvergentArgument() const {
		if (AttributeList.hasAttrSomewhere(Attribute::Convergent))
		return true;

		if (const Function *F = getCalledFunction())
		if (F->getAttributes().hasAttrSomewhere(Attribute::Convergent))
		return true;

		return false;
		}

/// Return the function called, or null if this is an		/// Return the function called, or null if this is an
/// indirect function invocation.		/// indirect function invocation.
///		///
Function *getCalledFunction() const {		Function *getCalledFunction() const {
return dyn_cast<Function>(Op<-1>());		return dyn_cast<Function>(Op<-1>());
}		}

/// Get a pointer to the function that is invoked by this		/// Get a pointer to the function that is invoked by this
▲ Show 20 Lines • Show All 3,136 Lines • Show Last 20 Lines

include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
	def IntrArgMemOnly : IntrinsicProperty;			def IntrArgMemOnly : IntrinsicProperty;

	// Commutative - This intrinsic is commutative: X op Y == Y op X.			// Commutative - This intrinsic is commutative: X op Y == Y op X.
	def Commutative : IntrinsicProperty;			def Commutative : IntrinsicProperty;

	// Throws - This intrinsic can throw.			// Throws - This intrinsic can throw.
	def Throws : IntrinsicProperty;			def Throws : IntrinsicProperty;

				// Convergent - The specified argument is convergent. This corresponds to the
				// IR function argument attribute of the same name.
				class Convergent<int argNo> : IntrinsicProperty {
				jlebarUnsubmitted Not Done Reply Inline Actions This definition actually seems fairly clear to me, if we s/control-//. Is there a reason we don't want to use this as our canonical definition? jlebar: This definition actually seems fairly clear to me, if we s/control-//. Is there a reason we…
				nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions I'd be happy with it. When I posted something like this on the mailing list initially, Mehdi asked for clarification :) nhaehnle: I'd be happy with it. When I posted something like this on the mailing list initially, Mehdi…
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions My concern on the mailing was the original definition was based on "uniformity". Anyway, what about this example: if (cond) { Tmp1 = ... Foo(Tmp1 [convergent]) } else { Tmp2 = ... Foo(Tmp2 [convergent]) } I think your definition does not prevent: if (cond) { Tmp1 = ... } else { Tmp2 = ... } Tmp3 = phi [Tmp1, Tmp2] if (cond) { Foo(Tmp3 [convergent]) } else { Foo(Tmp3 [convergent]) } However it breaks uniformity. mehdi_amini: My concern on the mailing was the original definition was based on "uniformity". Anyway, what…
				nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions That transform is allowed. It will cause some inefficiencies in the code generation, but the semantics are indeed okay. At each call site of Foo, the argument is "dynamically uniform" using the language of the GLSL specification, i.e. all threads that actually go through the call site will see the same value (of course assuming that this was true for the original program). So maybe it's really just a matter of finding the right wording. nhaehnle: That transform is allowed. It will cause some inefficiencies in the code generation, but the…
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions OK let's just do one extra transformation: if (cond) { Tmp1 = ... } else { Tmp2 = ... } Tmp3 = phi [Tmp1, Tmp2] Foo(Tmp3 [convergent]) mehdi_amini: OK let's just do one extra transformation: ``` if (cond) { Tmp1 = ... } else { Tmp2 = ...
				nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions That transformation is incorrect, and this RFC patch already disables it (in SimplifyCFG/tail sinking). A simpler example would be just: if (cond) { Tmp1 = Foo(v [convergent]) } else { Tmp2 = Foo(v [convergent]) } Tmp3 = phi [Tmp1, Tmp2] to Tmp3 = Foo(v [convergent]) This transform is incorrect in general because cond could be something like `v == 0`, where v is an int that can only take values 0 or 1. Formally, the constraint formulation with runs r1 and r2 forbids this transform because you can choose r1 = { cond = true, v = A }, r2 = { cond = false, v = B }. These runs are compatible in the original program, but incompatible in the transformed program. (This is basically what happens in the linked bug report.) The transform would be correct if the branch condition were proved to be "uncorrelated" with the convergent argument. This can sometimes be done with target-specific knowledge. This target-specific knowledge effectively restricts which pairs of runs need to be considered in the runs-based constraint formulation. nhaehnle: That transformation is incorrect, and this RFC patch already disables it (in SimplifyCFG/tail…
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions I'm addressing the definition proposed here: " The values to this argument are convergent, and calls to the intrinsic may not be modified in a way that makes the argument value control-dependent on additional values." I don't see how any of this two definition forbids the transformation I mentioned. Or the one in LangRef: "When a function argument has the `convergent` attribute, a call to the function may only be transformed in a way that does not add additional sources of divergence to the argument." It is using the concept of "divergence" without defining it, so this one is more iffy, but still not clear. mehdi_amini: I'm addressing the definition proposed here: " The values to this argument are convergent, and…
				hfinkelUnsubmitted Not Done Reply Inline Actions It is using the concept of "divergence" without defining it, so this one is more iffy, but still not clear. Yea, we normally talk about this in terms of adding control dependencies, but we'll need to carefully define what we mean here. hfinkel: > It is using the concept of "divergence" without defining it, so this one is more iffy, but…
				nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Okay, so how about in this location in Intrinsics.td we remove the "control" from "control-dependent" and refer explicitly to the IR attribute? Then at least it's clear that the proper definition is located in one specific place. Maybe as short as: // Convergent - The values to this argument are convergent. This corresponds // to the IR function argument attribute of the same name. Then we can focus on the language in LangRef - any feedback on that? nhaehnle: Okay, so how about in this location in Intrinsics.td we remove the "control" from "control…
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions The quote I posted just above about using "source of divergence" comes from LangRef. mehdi_amini: The quote I posted just above about using "source of divergence" comes from LangRef.
				hfinkelUnsubmitted Not Done Reply Inline Actions No, because there are important differences. Both refer to control dependencies, but on a function it refers to control dependencies on the execution of the function, but on the value, it is talking about control dependencies on the value being passed into the function, right? hfinkel: No, because there are important differences. Both refer to control dependencies, but on a…
				mehdi_aminiUnsubmitted Not Done Reply Inline Actions @hfinkel: were you answering to me? If so it looks like we're talking past each other. mehdi_amini: @hfinkel: were you answering to me? If so it looks like we're talking past each other.
				hfinkelUnsubmitted Not Done Reply Inline Actions @mehdi_amini: no, I was responding to @nhaehnle hfinkel: @mehdi_amini: no, I was responding to @nhaehnle
				int ArgNo = argNo;
				}

	// NoCapture - The specified argument pointer is not captured by the intrinsic.			// NoCapture - The specified argument pointer is not captured by the intrinsic.
	class NoCapture<int argNo> : IntrinsicProperty {			class NoCapture<int argNo> : IntrinsicProperty {
	int ArgNo = argNo;			int ArgNo = argNo;
	}			}

	// Returned - The specified argument is always the return value of the			// Returned - The specified argument is always the return value of the
	// intrinsic.			// intrinsic.
	class Returned<int argNo> : IntrinsicProperty {			class Returned<int argNo> : IntrinsicProperty {
	▲ Show 20 Lines • Show All 709 Lines • Show Last 20 Lines

include/llvm/IR/IntrinsicsAMDGPU.td

Show First 20 Lines • Show All 234 Lines • ▼ Show 20 Lines	class AMDGPUImageLoad : Intrinsic <
[llvm_anyfloat_ty], // vdata(VGPR)		[llvm_anyfloat_ty], // vdata(VGPR)
[llvm_anyint_ty, // vaddr(VGPR)		[llvm_anyint_ty, // vaddr(VGPR)
llvm_anyint_ty, // rsrc(SGPR)		llvm_anyint_ty, // rsrc(SGPR)
llvm_i32_ty, // dmask(imm)		llvm_i32_ty, // dmask(imm)
llvm_i1_ty, // glc(imm)		llvm_i1_ty, // glc(imm)
llvm_i1_ty, // slc(imm)		llvm_i1_ty, // slc(imm)
llvm_i1_ty, // lwe(imm)		llvm_i1_ty, // lwe(imm)
llvm_i1_ty], // da(imm)		llvm_i1_ty], // da(imm)
[IntrReadMem]>;		[IntrReadMem, Convergent<1>]>;

def int_amdgcn_image_load : AMDGPUImageLoad;		def int_amdgcn_image_load : AMDGPUImageLoad;
def int_amdgcn_image_load_mip : AMDGPUImageLoad;		def int_amdgcn_image_load_mip : AMDGPUImageLoad;
def int_amdgcn_image_getresinfo : AMDGPUImageLoad;		def int_amdgcn_image_getresinfo : AMDGPUImageLoad;

class AMDGPUImageStore : Intrinsic <		class AMDGPUImageStore : Intrinsic <
[],		[],
[llvm_anyfloat_ty, // vdata(VGPR)		[llvm_anyfloat_ty, // vdata(VGPR)
llvm_anyint_ty, // vaddr(VGPR)		llvm_anyint_ty, // vaddr(VGPR)
llvm_anyint_ty, // rsrc(SGPR)		llvm_anyint_ty, // rsrc(SGPR)
llvm_i32_ty, // dmask(imm)		llvm_i32_ty, // dmask(imm)
llvm_i1_ty, // glc(imm)		llvm_i1_ty, // glc(imm)
llvm_i1_ty, // slc(imm)		llvm_i1_ty, // slc(imm)
llvm_i1_ty, // lwe(imm)		llvm_i1_ty, // lwe(imm)
llvm_i1_ty], // da(imm)		llvm_i1_ty], // da(imm)
[]>;		[Convergent<2>]>;

def int_amdgcn_image_store : AMDGPUImageStore;		def int_amdgcn_image_store : AMDGPUImageStore;
def int_amdgcn_image_store_mip : AMDGPUImageStore;		def int_amdgcn_image_store_mip : AMDGPUImageStore;

class AMDGPUImageSample : Intrinsic <		class AMDGPUImageSample : Intrinsic <
[llvm_anyfloat_ty], // vdata(VGPR)		[llvm_anyfloat_ty], // vdata(VGPR)
[llvm_anyfloat_ty, // vaddr(VGPR)		[llvm_anyfloat_ty, // vaddr(VGPR)
llvm_anyint_ty, // rsrc(SGPR)		llvm_anyint_ty, // rsrc(SGPR)
llvm_v4i32_ty, // sampler(SGPR)		llvm_v4i32_ty, // sampler(SGPR)
llvm_i32_ty, // dmask(imm)		llvm_i32_ty, // dmask(imm)
llvm_i1_ty, // unorm(imm)		llvm_i1_ty, // unorm(imm)
llvm_i1_ty, // glc(imm)		llvm_i1_ty, // glc(imm)
llvm_i1_ty, // slc(imm)		llvm_i1_ty, // slc(imm)
llvm_i1_ty, // lwe(imm)		llvm_i1_ty, // lwe(imm)
llvm_i1_ty], // da(imm)		llvm_i1_ty], // da(imm)
[IntrReadMem]>;		[IntrReadMem, Convergent<1>, Convergent<2>]>;

// Basic sample		// Basic sample
def int_amdgcn_image_sample : AMDGPUImageSample;		def int_amdgcn_image_sample : AMDGPUImageSample;
def int_amdgcn_image_sample_cl : AMDGPUImageSample;		def int_amdgcn_image_sample_cl : AMDGPUImageSample;
def int_amdgcn_image_sample_d : AMDGPUImageSample;		def int_amdgcn_image_sample_d : AMDGPUImageSample;
def int_amdgcn_image_sample_d_cl : AMDGPUImageSample;		def int_amdgcn_image_sample_d_cl : AMDGPUImageSample;
def int_amdgcn_image_sample_l : AMDGPUImageSample;		def int_amdgcn_image_sample_l : AMDGPUImageSample;
def int_amdgcn_image_sample_b : AMDGPUImageSample;		def int_amdgcn_image_sample_b : AMDGPUImageSample;
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines
class AMDGPUImageAtomic : Intrinsic <		class AMDGPUImageAtomic : Intrinsic <
[llvm_i32_ty],		[llvm_i32_ty],
[llvm_i32_ty, // vdata(VGPR)		[llvm_i32_ty, // vdata(VGPR)
llvm_anyint_ty, // vaddr(VGPR)		llvm_anyint_ty, // vaddr(VGPR)
llvm_v8i32_ty, // rsrc(SGPR)		llvm_v8i32_ty, // rsrc(SGPR)
llvm_i1_ty, // r128(imm)		llvm_i1_ty, // r128(imm)
llvm_i1_ty, // da(imm)		llvm_i1_ty, // da(imm)
llvm_i1_ty], // slc(imm)		llvm_i1_ty], // slc(imm)
[]>;		[Convergent<2>]>;

def int_amdgcn_image_atomic_swap : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_swap : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_add : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_add : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_sub : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_sub : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_smin : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_smin : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_umin : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_umin : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_smax : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_smax : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_umax : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_umax : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_and : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_and : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_or : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_or : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_xor : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_xor : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_inc : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_inc : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_dec : AMDGPUImageAtomic;		def int_amdgcn_image_atomic_dec : AMDGPUImageAtomic;
def int_amdgcn_image_atomic_cmpswap : Intrinsic <		def int_amdgcn_image_atomic_cmpswap : Intrinsic <
[llvm_i32_ty],		[llvm_i32_ty],
[llvm_i32_ty, // src(VGPR)		[llvm_i32_ty, // src(VGPR)
llvm_i32_ty, // cmp(VGPR)		llvm_i32_ty, // cmp(VGPR)
llvm_anyint_ty, // vaddr(VGPR)		llvm_anyint_ty, // vaddr(VGPR)
llvm_v8i32_ty, // rsrc(SGPR)		llvm_v8i32_ty, // rsrc(SGPR)
llvm_i1_ty, // r128(imm)		llvm_i1_ty, // r128(imm)
llvm_i1_ty, // da(imm)		llvm_i1_ty, // da(imm)
llvm_i1_ty], // slc(imm)		llvm_i1_ty], // slc(imm)
[]>;		[Convergent<3>]>;

class AMDGPUBufferLoad : Intrinsic <		class AMDGPUBufferLoad : Intrinsic <
[llvm_anyfloat_ty],		[llvm_anyfloat_ty],
[llvm_v4i32_ty, // rsrc(SGPR)		[llvm_v4i32_ty, // rsrc(SGPR)
llvm_i32_ty, // vindex(VGPR)		llvm_i32_ty, // vindex(VGPR)
llvm_i32_ty, // offset(SGPR/VGPR/imm)		llvm_i32_ty, // offset(SGPR/VGPR/imm)
llvm_i1_ty, // glc(imm)		llvm_i1_ty, // glc(imm)
llvm_i1_ty], // slc(imm)		llvm_i1_ty], // slc(imm)
[IntrReadMem]>;		[IntrReadMem, Convergent<0>]>;
def int_amdgcn_buffer_load_format : AMDGPUBufferLoad;		def int_amdgcn_buffer_load_format : AMDGPUBufferLoad;
def int_amdgcn_buffer_load : AMDGPUBufferLoad;		def int_amdgcn_buffer_load : AMDGPUBufferLoad;

class AMDGPUBufferStore : Intrinsic <		class AMDGPUBufferStore : Intrinsic <
[],		[],
[llvm_anyfloat_ty, // vdata(VGPR) -- can currently only select f32, v2f32, v4f32		[llvm_anyfloat_ty, // vdata(VGPR) -- can currently only select f32, v2f32, v4f32
llvm_v4i32_ty, // rsrc(SGPR)		llvm_v4i32_ty, // rsrc(SGPR)
llvm_i32_ty, // vindex(VGPR)		llvm_i32_ty, // vindex(VGPR)
llvm_i32_ty, // offset(SGPR/VGPR/imm)		llvm_i32_ty, // offset(SGPR/VGPR/imm)
llvm_i1_ty, // glc(imm)		llvm_i1_ty, // glc(imm)
llvm_i1_ty], // slc(imm)		llvm_i1_ty], // slc(imm)
[IntrWriteMem]>;		[IntrWriteMem, Convergent<1>]>;
def int_amdgcn_buffer_store_format : AMDGPUBufferStore;		def int_amdgcn_buffer_store_format : AMDGPUBufferStore;
def int_amdgcn_buffer_store : AMDGPUBufferStore;		def int_amdgcn_buffer_store : AMDGPUBufferStore;

class AMDGPUBufferAtomic : Intrinsic <		class AMDGPUBufferAtomic : Intrinsic <
[llvm_i32_ty],		[llvm_i32_ty],
[llvm_i32_ty, // vdata(VGPR)		[llvm_i32_ty, // vdata(VGPR)
llvm_v4i32_ty, // rsrc(SGPR)		llvm_v4i32_ty, // rsrc(SGPR)
llvm_i32_ty, // vindex(VGPR)		llvm_i32_ty, // vindex(VGPR)
llvm_i32_ty, // offset(SGPR/VGPR/imm)		llvm_i32_ty, // offset(SGPR/VGPR/imm)
llvm_i1_ty], // slc(imm)		llvm_i1_ty], // slc(imm)
[]>;		[Convergent<1>]>;
def int_amdgcn_buffer_atomic_swap : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_swap : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_add : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_add : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_sub : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_sub : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_smin : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_smin : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_umin : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_umin : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_smax : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_smax : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_umax : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_umax : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_and : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_and : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_or : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_or : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_xor : AMDGPUBufferAtomic;		def int_amdgcn_buffer_atomic_xor : AMDGPUBufferAtomic;
def int_amdgcn_buffer_atomic_cmpswap : Intrinsic<		def int_amdgcn_buffer_atomic_cmpswap : Intrinsic<
[llvm_i32_ty],		[llvm_i32_ty],
[llvm_i32_ty, // src(VGPR)		[llvm_i32_ty, // src(VGPR)
llvm_i32_ty, // cmp(VGPR)		llvm_i32_ty, // cmp(VGPR)
llvm_v4i32_ty, // rsrc(SGPR)		llvm_v4i32_ty, // rsrc(SGPR)
llvm_i32_ty, // vindex(VGPR)		llvm_i32_ty, // vindex(VGPR)
llvm_i32_ty, // offset(SGPR/VGPR/imm)		llvm_i32_ty, // offset(SGPR/VGPR/imm)
llvm_i1_ty], // slc(imm)		llvm_i1_ty], // slc(imm)
[]>;		[Convergent<2>]>;

def int_amdgcn_buffer_wbinvl1_sc :		def int_amdgcn_buffer_wbinvl1_sc :
GCCBuiltin<"__builtin_amdgcn_buffer_wbinvl1_sc">,		GCCBuiltin<"__builtin_amdgcn_buffer_wbinvl1_sc">,
Intrinsic<[], [], []>;		Intrinsic<[], [], []>;

def int_amdgcn_buffer_wbinvl1 :		def int_amdgcn_buffer_wbinvl1 :
GCCBuiltin<"__builtin_amdgcn_buffer_wbinvl1">,		GCCBuiltin<"__builtin_amdgcn_buffer_wbinvl1">,
Intrinsic<[], [], []>;		Intrinsic<[], [], []>;
▲ Show 20 Lines • Show All 160 Lines • Show Last 20 Lines

lib/Analysis/CodeMetrics.cpp

Show First 20 Lines • Show All 162 Lines • ▼ Show 20 Lines	if (isa<ExtractElementInst>(I) \|\| I.getType()->isVectorTy())
++NumVectorInsts;		++NumVectorInsts;

if (I.getType()->isTokenTy() && I.isUsedOutsideOfBlock(BB))		if (I.getType()->isTokenTy() && I.isUsedOutsideOfBlock(BB))
notDuplicatable = true;		notDuplicatable = true;

if (const CallInst *CI = dyn_cast<CallInst>(&I)) {		if (const CallInst *CI = dyn_cast<CallInst>(&I)) {
if (CI->cannotDuplicate())		if (CI->cannotDuplicate())
notDuplicatable = true;		notDuplicatable = true;
if (CI->isConvergent())		if (CI->isConvergent() \|\| CI->hasConvergentArgument())
convergent = true;		convergent = true;
}		}

if (const InvokeInst *InvI = dyn_cast<InvokeInst>(&I))		if (const InvokeInst *InvI = dyn_cast<InvokeInst>(&I))
if (InvI->cannotDuplicate())		if (InvI->cannotDuplicate())
notDuplicatable = true;		notDuplicatable = true;

NumInsts += TTI.getUserCost(&I);		NumInsts += TTI.getUserCost(&I);
Show All 21 Lines

lib/AsmParser/LLParser.cpp

Show First 20 Lines • Show All 1,381 Lines • ▼ Show 20 Lines	while (true) {
case lltok::kw_align: {		case lltok::kw_align: {
unsigned Alignment;		unsigned Alignment;
if (ParseOptionalAlignment(Alignment))		if (ParseOptionalAlignment(Alignment))
return true;		return true;
B.addAlignmentAttr(Alignment);		B.addAlignmentAttr(Alignment);
continue;		continue;
}		}
case lltok::kw_byval: B.addAttribute(Attribute::ByVal); break;		case lltok::kw_byval: B.addAttribute(Attribute::ByVal); break;
		case lltok::kw_convergent: B.addAttribute(Attribute::Convergent); break;
case lltok::kw_dereferenceable: {		case lltok::kw_dereferenceable: {
uint64_t Bytes;		uint64_t Bytes;
if (ParseOptionalDerefAttrBytes(lltok::kw_dereferenceable, Bytes))		if (ParseOptionalDerefAttrBytes(lltok::kw_dereferenceable, Bytes))
return true;		return true;
B.addDereferenceableAttr(Bytes);		B.addDereferenceableAttr(Bytes);
continue;		continue;
}		}
case lltok::kw_dereferenceable_or_null: {		case lltok::kw_dereferenceable_or_null: {
▲ Show 20 Lines • Show All 5,128 Lines • Show Last 20 Lines

lib/IR/Verifier.cpp

Show First 20 Lines • Show All 1,318 Lines • ▼ Show 20 Lines	if (I->getKindAsEnum() == Attribute::NoReturn \|\|
I->getKindAsEnum() == Attribute::SanitizeMemory \|\|		I->getKindAsEnum() == Attribute::SanitizeMemory \|\|
I->getKindAsEnum() == Attribute::MinSize \|\|		I->getKindAsEnum() == Attribute::MinSize \|\|
I->getKindAsEnum() == Attribute::NoDuplicate \|\|		I->getKindAsEnum() == Attribute::NoDuplicate \|\|
I->getKindAsEnum() == Attribute::Builtin \|\|		I->getKindAsEnum() == Attribute::Builtin \|\|
I->getKindAsEnum() == Attribute::NoBuiltin \|\|		I->getKindAsEnum() == Attribute::NoBuiltin \|\|
I->getKindAsEnum() == Attribute::Cold \|\|		I->getKindAsEnum() == Attribute::Cold \|\|
I->getKindAsEnum() == Attribute::OptimizeNone \|\|		I->getKindAsEnum() == Attribute::OptimizeNone \|\|
I->getKindAsEnum() == Attribute::JumpTable \|\|		I->getKindAsEnum() == Attribute::JumpTable \|\|
I->getKindAsEnum() == Attribute::Convergent \|\|
I->getKindAsEnum() == Attribute::ArgMemOnly \|\|		I->getKindAsEnum() == Attribute::ArgMemOnly \|\|
I->getKindAsEnum() == Attribute::NoRecurse \|\|		I->getKindAsEnum() == Attribute::NoRecurse \|\|
I->getKindAsEnum() == Attribute::InaccessibleMemOnly \|\|		I->getKindAsEnum() == Attribute::InaccessibleMemOnly \|\|
I->getKindAsEnum() == Attribute::InaccessibleMemOrArgMemOnly \|\|		I->getKindAsEnum() == Attribute::InaccessibleMemOrArgMemOnly \|\|
I->getKindAsEnum() == Attribute::AllocSize) {		I->getKindAsEnum() == Attribute::AllocSize) {
if (!isFunction) {		if (!isFunction) {
CheckFailed("Attribute '" + I->getAsString() +		CheckFailed("Attribute '" + I->getAsString() +
"' only applies to functions!", V);		"' only applies to functions!", V);
return;		return;
}		}
} else if (I->getKindAsEnum() == Attribute::ReadOnly \|\|		} else if (I->getKindAsEnum() == Attribute::Convergent \|\|
		I->getKindAsEnum() == Attribute::ReadOnly \|\|
I->getKindAsEnum() == Attribute::WriteOnly \|\|		I->getKindAsEnum() == Attribute::WriteOnly \|\|
I->getKindAsEnum() == Attribute::ReadNone) {		I->getKindAsEnum() == Attribute::ReadNone) {
if (Idx == 0) {		if (Idx == 0) {
CheckFailed("Attribute '" + I->getAsString() +		CheckFailed("Attribute '" + I->getAsString() +
"' does not apply to function returns");		"' does not apply to function returns");
return;		return;
}		}
} else if (isFunction) {		} else if (isFunction) {
Show All 10 Lines	void Verifier::verifyParameterAttrs(AttributeSet Attrs, unsigned Idx, Type *Ty,
bool isReturnValue, const Value *V) {		bool isReturnValue, const Value *V) {
if (!Attrs.hasAttributes(Idx))		if (!Attrs.hasAttributes(Idx))
return;		return;

verifyAttributeTypes(Attrs, Idx, false, V);		verifyAttributeTypes(Attrs, Idx, false, V);

if (isReturnValue)		if (isReturnValue)
Assert(!Attrs.hasAttribute(Idx, Attribute::ByVal) &&		Assert(!Attrs.hasAttribute(Idx, Attribute::ByVal) &&
		!Attrs.hasAttribute(Idx, Attribute::Convergent) &&
!Attrs.hasAttribute(Idx, Attribute::Nest) &&		!Attrs.hasAttribute(Idx, Attribute::Nest) &&
!Attrs.hasAttribute(Idx, Attribute::StructRet) &&		!Attrs.hasAttribute(Idx, Attribute::StructRet) &&
!Attrs.hasAttribute(Idx, Attribute::NoCapture) &&		!Attrs.hasAttribute(Idx, Attribute::NoCapture) &&
!Attrs.hasAttribute(Idx, Attribute::Returned) &&		!Attrs.hasAttribute(Idx, Attribute::Returned) &&
!Attrs.hasAttribute(Idx, Attribute::InAlloca) &&		!Attrs.hasAttribute(Idx, Attribute::InAlloca) &&
!Attrs.hasAttribute(Idx, Attribute::SwiftSelf) &&		!Attrs.hasAttribute(Idx, Attribute::SwiftSelf) &&
!Attrs.hasAttribute(Idx, Attribute::SwiftError),		!Attrs.hasAttribute(Idx, Attribute::SwiftError),
"Attributes 'byval', 'inalloca', 'nest', 'sret', 'nocapture', "		"Attributes 'byval', 'inalloca', 'nest', 'sret', 'nocapture', "
▲ Show 20 Lines • Show All 3,141 Lines • Show Last 20 Lines

lib/Transforms/Scalar/GVNHoist.cpp

Show First 20 Lines • Show All 914 Lines • ▼ Show 20 Lines	for (BasicBlock *BB : depth_first(&F.getEntryBlock())) {
if (Call->mayHaveSideEffects()) {		if (Call->mayHaveSideEffects()) {
if (!OptForMinSize)		if (!OptForMinSize)
break;		break;
// We may continue hoisting across calls which write to memory.		// We may continue hoisting across calls which write to memory.
if (Call->mayThrow())		if (Call->mayThrow())
break;		break;
}		}

if (Call->isConvergent())		if (Call->isConvergent() \|\| Call->hasConvergentArgument())
break;		break;

CI.insert(Call, VN);		CI.insert(Call, VN);
} else if (HoistingGeps \|\| !isa<GetElementPtrInst>(&I1))		} else if (HoistingGeps \|\| !isa<GetElementPtrInst>(&I1))
// Do not hoist scalars past calls that may write to memory because		// Do not hoist scalars past calls that may write to memory because
// that could result in spills later. geps are handled separately.		// that could result in spills later. geps are handled separately.
// TODO: We can relax this for targets like AArch64 as they have more		// TODO: We can relax this for targets like AArch64 as they have more
// registers than X86.		// registers than X86.
▲ Show 20 Lines • Show All 72 Lines • Show Last 20 Lines

lib/Transforms/Scalar/LoopUnrollPass.cpp

Show First 20 Lines • Show All 1,001 Lines • ▼ Show 20 Lines	static bool tryToUnrollLoop(Loop L, DominatorTree &DT, LoopInfo LI,
//		//
// TODO: This is quite conservative. In practice, convergent_op()		// TODO: This is quite conservative. In practice, convergent_op()
// is likely to be called unconditionally in the loop. In this		// is likely to be called unconditionally in the loop. In this
// case, the program would be ill-formed (on most architectures)		// case, the program would be ill-formed (on most architectures)
// unless n were the same on all threads in a thread group.		// unless n were the same on all threads in a thread group.
// Assuming n is the same on all threads, any kind of unrolling is		// Assuming n is the same on all threads, any kind of unrolling is
// safe. But currently llvm's notion of convergence isn't powerful		// safe. But currently llvm's notion of convergence isn't powerful
// enough to express this.		// enough to express this.
		//
		// Similarly, if the loop contains an operation with a convergent argument,
		// the prelude could shift the sequences of argument values in a way that
		// affects the compatibility of runs wrt convergent function arguments.
		//
		// TODO: This is also conservative. If all convergent arguments that appear
		// in the loop are loop invariant, then compatibility of runs is preserved.
if (Convergent)		if (Convergent)
UP.AllowRemainder = false;		UP.AllowRemainder = false;

// Try to find the trip count upper bound if we cannot find the exact trip		// Try to find the trip count upper bound if we cannot find the exact trip
// count.		// count.
bool MaxOrZero = false;		bool MaxOrZero = false;
if (!TripCount) {		if (!TripCount) {
MaxTripCount = SE->getSmallConstantMaxTripCount(L);		MaxTripCount = SE->getSmallConstantMaxTripCount(L);
▲ Show 20 Lines • Show All 160 Lines • Show Last 20 Lines

lib/Transforms/Utils/SimplifyCFG.cpp

Show First 20 Lines • Show All 1,365 Lines • ▼ Show 20 Lines	if (isa<IntrinsicInst>(I))
return false;		return false;

// Constant bundle operands may need to retain their constant-ness for		// Constant bundle operands may need to retain their constant-ness for
// correctness.		// correctness.
if (ImmutableCallSite(I).isBundleOperand(OpIdx))		if (ImmutableCallSite(I).isBundleOperand(OpIdx))
return false;		return false;

return true;		return true;

case Instruction::ShuffleVector:		case Instruction::ShuffleVector:
// Shufflevector masks are constant.		// Shufflevector masks are constant.
return OpIdx != 2;		return OpIdx != 2;
case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::InsertValue:		case Instruction::InsertValue:
// All operands apart from the first are constant.		// All operands apart from the first are constant.
return OpIdx == 0;		return OpIdx == 0;
case Instruction::Alloca:		case Instruction::Alloca:
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	if (!all_of(Insts, [&PNUse,&Succ](const Instruction *I) -> bool {
return false;		return false;
}		}

for (unsigned OI = 0, OE = I0->getNumOperands(); OI != OE; ++OI) {		for (unsigned OI = 0, OE = I0->getNumOperands(); OI != OE; ++OI) {
if (I0->getOperand(OI)->getType()->isTokenTy())		if (I0->getOperand(OI)->getType()->isTokenTy())
// Don't touch any operand of token type.		// Don't touch any operand of token type.
return false;		return false;

		// We cannot sink calls if any argument is convergent: sinking may add
		// divergence even when the argument is the same IR value in all blocks,
		// because there may be divergence in the argument that was removed by the
		// control flow conditions for those blocks.
		jlebarUnsubmitted Not Done Reply Inline Actions I don't understand the "because" clause here; can we rephrase? jlebar: I don't understand the "because" clause here; can we rephrase?
		nhaehnleAuthorUnsubmitted Not Done Reply Inline Actions Yes, let's rephrase using the language of executions: // We cannot sink calls if any parameter is convergent: two executions that go // through different predecessor blocks are compatible with respect to // convergent parameters at the considered call-sites before sinking even if // the concrete values passed to the called function are different. After sinking, // they are no longer compatible, because different concrete values are passed // to the called function at the same call-site. Or the strategically better variant: // We cannot sink calls if any parameter is convergent. nhaehnle: Yes, let's rephrase using the language of executions: // We cannot sink calls if any…
		if (isa<CallInst>(I0) \|\| isa<InvokeInst>(I0)) {
		ImmutableCallSite CS(I0);
		if (OI < CS.getNumArgOperands() &&
		CS.paramHasAttr(OI + 1, Attribute::Convergent))
		return false;
		}

// Because SROA can't handle speculating stores of selects, try not		// Because SROA can't handle speculating stores of selects, try not
// to sink loads or stores of allocas when we'd have to create a PHI for		// to sink loads or stores of allocas when we'd have to create a PHI for
// the address operand. Also, because it is likely that loads or stores		// the address operand. Also, because it is likely that loads or stores
// of allocas will disappear when Mem2Reg/SROA is run, don't sink them.		// of allocas will disappear when Mem2Reg/SROA is run, don't sink them.
// This can cause code churn which can have unintended consequences down		// This can cause code churn which can have unintended consequences down
// the line - see https://llvm.org/bugs/show_bug.cgi?id=30244.		// the line - see https://llvm.org/bugs/show_bug.cgi?id=30244.
// FIXME: This is a workaround for a deficiency in SROA - see		// FIXME: This is a workaround for a deficiency in SROA - see
// https://llvm.org/bugs/show_bug.cgi?id=30188		// https://llvm.org/bugs/show_bug.cgi?id=30188
▲ Show 20 Lines • Show All 4,515 Lines • Show Last 20 Lines

test/Bitcode/attributes.ll

	Show First 20 Lines • Show All 328 Lines • ▼ Show 20 Lines
	}			}

	; CHECK: define void @f56() #32			; CHECK: define void @f56() #32
	define void @f56() writeonly			define void @f56() writeonly
	{			{
	ret void			ret void
	}			}

				; CHECK: define void @f57(i32 convergent)
				define void @f57(i32 convergent) {
				ret void
				}

	; CHECK: attributes #0 = { noreturn }			; CHECK: attributes #0 = { noreturn }
	; CHECK: attributes #1 = { nounwind }			; CHECK: attributes #1 = { nounwind }
	; CHECK: attributes #2 = { readnone }			; CHECK: attributes #2 = { readnone }
	; CHECK: attributes #3 = { readonly }			; CHECK: attributes #3 = { readonly }
	; CHECK: attributes #4 = { noinline }			; CHECK: attributes #4 = { noinline }
	; CHECK: attributes #5 = { alwaysinline }			; CHECK: attributes #5 = { alwaysinline }
	; CHECK: attributes #6 = { optsize }			; CHECK: attributes #6 = { optsize }
	; CHECK: attributes #7 = { ssp }			; CHECK: attributes #7 = { ssp }
	Show All 26 Lines

test/Transforms/GVNHoist/hoist-convergent.ll

Show First 20 Lines • Show All 80 Lines • ▼ Show 20 Lines	if.else:
br label %if.end		br label %if.end

if.end:		if.end:
%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]		%tmax.0 = phi float [ %mul2, %if.then ], [ %mul6, %if.else ]
%add = fadd float %tmax.0, 10.0		%add = fadd float %tmax.0, 10.0
ret float %add		ret float %add
}		}

		; Cannot hoist because there could be divergence in the argument that was
		; filtered out by the branch condition.
		;
		; CHECK-LABEL: @no_convergent_arg_hoisting(
		; CHECK: if.then:
		; CHECK: call float @convergent_arg_func(

		; CHECK: if.else:
		; CHECK: call float @convergent_arg_func(
		define float @no_convergent_arg_hoisting(i1 %cc, float %x) {
		entry:
		br i1 %cc, label %if.then, label %if.else

		if.then:
		%r1 = call float @convergent_arg_func(float %x)
		br label %if.end

		if.else:
		%r2 = call float @convergent_arg_func(float %x)
		br label %if.end

		if.end:
		%r = phi float [ %r1, %if.then ], [ %r2, %if.else ]
		ret float %r
		}

declare float @convergent_func(float, float) #0		declare float @convergent_func(float, float) #0
declare float @func(float, float) #1		declare float @func(float, float) #1
		declare float @convergent_arg_func(float convergent) #1

attributes #0 = { nounwind readnone convergent }		attributes #0 = { nounwind readnone convergent }
attributes #1 = { nounwind readnone }		attributes #1 = { nounwind readnone }

test/Transforms/InstCombine/select-call.ll

This file was added.

				; RUN: opt < %s -instcombine -S \| FileCheck %s

				declare i32 @f(i32 convergent %a, i32 %b) nounwind readnone

				; Must keep both calls to f here, since they're not compatible wrt convergent
				; function arguments.
				define i32 @check_bad(i1 %cond, i32 %v) {
				; CHECK-LABEL: @check_bad(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[R0:%.*]] = call i32 @f(i32 0, i32 %v)
				; CHECK-NEXT: [[R1:%.*]] = call i32 @f(i32 1, i32 %v)
				; CHECK-NEXT: [[OUT:%.*]] = select i1 %cond, i32 [[R0]], i32 [[R1]]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				entry:
				%r0 = call i32 @f(i32 0, i32 %v)
				%r1 = call i32 @f(i32 1, i32 %v)
				%out = select i1 %cond, i32 %r0, i32 %r1
				ret i32 %out
				}

				; TODO: In this case, select C, call, call -> call select would be beneficial
				; and correct.
				define i32 @check_good(i1 %cond, i32 %v) {
				; CHECK-LABEL: @check_good
				; NO-CHECK: %out.v = select i1 %cond, i32 0, i32 1
				; NO-CHECK: %out = call i32 @f(i32 %v, i32 %out.v)
				;
				entry:
				%r0 = call i32 @f(i32 %v, i32 0)
				%r1 = call i32 @f(i32 %v, i32 1)
				%out = select i1 %cond, i32 %r0, i32 %r1
				ret i32 %out
				}

test/Transforms/LoopUnroll/convergent.ll

; RUN: opt < %s -loop-unroll -unroll-runtime -unroll-allow-partial -S \| FileCheck %s		; RUN: opt < %s -loop-unroll -unroll-runtime -unroll-allow-partial -S \| FileCheck %s

declare void @f() convergent		declare void @f() convergent
		declare void @g(i32 convergent)

; Although this loop contains a convergent instruction, it should be		; Although this loop contains a convergent instruction, it should be
; fully unrolled.		; fully unrolled.
;		;
; CHECK-LABEL: @full_unroll(		; CHECK-LABEL: @full_unroll(
define i32 @full_unroll() {		define i32 @full_unroll() {
entry:		entry:
br label %l3		br label %l3

l3:		l3:
%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]		%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
; CHECK: call void @f()		; CHECK: call void @f()
; CHECK: call void @f()		; CHECK: call void @f()
; CHECK: call void @f()		; CHECK: call void @f()
; CHECK-NOT: call void @f()		; CHECK-NOT: call void @f()
call void @f() ;convergent		call void @f() ;convergent
%inc = add nsw i32 %x.0, 1		%inc = add nsw i32 %x.0, 1
%exitcond = icmp eq i32 %inc, 3		%exitcond = icmp eq i32 %inc, 3
br i1 %exitcond, label %exit, label %l3		br i1 %exitcond, label %exit, label %l3

exit:		exit:
ret i32 0		ret i32 0
}		}

		; CHECK-LABEL: @full_unroll_arg(
		define i32 @full_unroll_arg() {
		entry:
		br label %l3

		l3:
		%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK-NOT: call void @g(
		call void @g(i32 %x.0)
		%inc = add nsw i32 %x.0, 1
		%exitcond = icmp eq i32 %inc, 3
		br i1 %exitcond, label %exit, label %l3

		exit:
		ret i32 0
		}


; This loop contains a convergent instruction, but it should be partially		; This loop contains a convergent instruction, but it should be partially
; unrolled. The unroll count is the largest power of 2 that divides the		; unrolled. The unroll count is the largest power of 2 that divides the
; multiple -- 4, in this case.		; multiple -- 4, in this case.
;		;
; CHECK-LABEL: @runtime_unroll(		; CHECK-LABEL: @runtime_unroll(
define i32 @runtime_unroll(i32 %n) {		define i32 @runtime_unroll(i32 %n) {
entry:		entry:
%loop_ctl = mul nsw i32 %n, 12		%loop_ctl = mul nsw i32 %n, 12
Show All 10 Lines	; CHECK-NOT: call void @f()
%inc = add nsw i32 %x.0, 1		%inc = add nsw i32 %x.0, 1
%exitcond = icmp eq i32 %inc, %loop_ctl		%exitcond = icmp eq i32 %inc, %loop_ctl
br i1 %exitcond, label %exit, label %l3		br i1 %exitcond, label %exit, label %l3

exit:		exit:
ret i32 0		ret i32 0
}		}

		; CHECK-LABEL: @runtime_unroll_arg(
		define i32 @runtime_unroll_arg(i32 %n) {
		entry:
		%loop_ctl = mul nsw i32 %n, 12
		br label %l3

		l3:
		%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK-NOT: call void @g(
		call void @g(i32 %x.0)
		%inc = add nsw i32 %x.0, 1
		%exitcond = icmp eq i32 %inc, %loop_ctl
		br i1 %exitcond, label %exit, label %l3

		exit:
		ret i32 0
		}

; This loop contains a convergent instruction, so its partial unroll		; This loop contains a convergent instruction, so its partial unroll
; count must divide its trip multiple. This overrides its unroll		; count must divide its trip multiple. This overrides its unroll
; pragma -- we unroll exactly 8 times, even though 16 is requested.		; pragma -- we unroll exactly 8 times, even though 16 is requested.
; CHECK-LABEL: @pragma_unroll		; CHECK-LABEL: @pragma_unroll
define i32 @pragma_unroll(i32 %n) {		define i32 @pragma_unroll(i32 %n) {
entry:		entry:
%loop_ctl = mul nsw i32 %n, 24		%loop_ctl = mul nsw i32 %n, 24
br label %l3, !llvm.loop !0		br label %l3, !llvm.loop !0
Show All 13 Lines	; CHECK-NOT: call void @f()
%inc = add nsw i32 %x.0, 1		%inc = add nsw i32 %x.0, 1
%exitcond = icmp eq i32 %inc, %loop_ctl		%exitcond = icmp eq i32 %inc, %loop_ctl
br i1 %exitcond, label %exit, label %l3, !llvm.loop !0		br i1 %exitcond, label %exit, label %l3, !llvm.loop !0

exit:		exit:
ret i32 0		ret i32 0
}		}

		; CHECK-LABEL: @pragma_unroll_arg
		define i32 @pragma_unroll_arg(i32 %n) {
		entry:
		%loop_ctl = mul nsw i32 %n, 24
		br label %l3, !llvm.loop !0

		l3:
		%x.0 = phi i32 [ 0, %entry ], [ %inc, %l3 ]
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK: call void @g(
		; CHECK-NOT: call void @g(
		call void @g(i32 %x.0)
		%inc = add nsw i32 %x.0, 1
		%exitcond = icmp eq i32 %inc, %loop_ctl
		br i1 %exitcond, label %exit, label %l3, !llvm.loop !0

		exit:
		ret i32 0
		}

!0 = !{!0, !{!"llvm.loop.unroll.count", i32 16}}		!0 = !{!0, !{!"llvm.loop.unroll.count", i32 16}}

test/Transforms/SimplifyCFG/convergent.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -simplifycfg -S \| FileCheck %s

				declare i32 @f(i32 convergent %a, i32 %b) nounwind readonly

				; Cannot sink because there would clearly be divergence in the first argument.
				;
				define i32 @check1(i1 %cond, i32 %a, i32 %b, i32 %v) {
				; CHECK-LABEL: @check1(
				; CHECK-NEXT: br i1 %cond, label %if.then, label %if.else
				; CHECK: if.then:
				; CHECK-NEXT: [[RA:%.*]] = call i32 @f(i32 %a, i32 %v)
				; CHECK-NEXT: br label %if.end
				; CHECK: if.else:
				; CHECK-NEXT: [[RB:%.*]] = call i32 @f(i32 %b, i32 %v)
				; CHECK-NEXT: br label %if.end
				; CHECK: if.end:
				; CHECK-NEXT: [[OUT:%.*]] = phi i32 [ [[RA]], %if.then ], [ [[RB]], %if.else ]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				br i1 %cond, label %if.then, label %if.else

				if.then:
				%ra = call i32 @f(i32 %a, i32 %v)
				br label %if.end

				if.else:
				%rb = call i32 @f(i32 %b, i32 %v)
				br label %if.end

				if.end:
				%out = phi i32 [ %ra, %if.then ], [ %rb, %if.else ]
				ret i32 %out
				}

				; Cannot sink because there could be divergence in the first argument that was
				; filtered out by the branch condition.
				;
				define i32 @check2(i1 %cond, i32 %a, i32 %b, i32 %v) {
				; CHECK-LABEL: @check2(
				; CHECK-NEXT: br i1 %cond, label %if.then, label %if.else
				; CHECK: if.then:
				; CHECK-NEXT: [[RA:%.*]] = call i32 @f(i32 %v, i32 %a)
				; CHECK-NEXT: br label %if.end
				; CHECK: if.else:
				; CHECK-NEXT: [[RB:%.*]] = call i32 @f(i32 %v, i32 %b)
				; CHECK-NEXT: br label %if.end
				; CHECK: if.end:
				; CHECK-NEXT: [[OUT:%.*]] = phi i32 [ [[RA]], %if.then ], [ [[RB]], %if.else ]
				; CHECK-NEXT: ret i32 [[OUT]]
				;
				br i1 %cond, label %if.then, label %if.else

				if.then:
				%ra = call i32 @f(i32 %v, i32 %a)
				br label %if.end

				if.else:
				%rb = call i32 @f(i32 %v, i32 %b)
				br label %if.end

				if.end:
				%out = phi i32 [ %ra, %if.then ], [ %rb, %if.else ]
				ret i32 %out
				}

utils/TableGen/CodeGenIntrinsics.h

Show First 20 Lines • Show All 102 Lines • ▼ Show 20 Lines	struct CodeGenIntrinsic {
bool isNoDuplicate;		bool isNoDuplicate;

/// True if the intrinsic is no-return.		/// True if the intrinsic is no-return.
bool isNoReturn;		bool isNoReturn;

/// True if the intrinsic is marked as convergent.		/// True if the intrinsic is marked as convergent.
bool isConvergent;		bool isConvergent;

enum ArgAttribute { NoCapture, Returned, ReadOnly, WriteOnly, ReadNone };		enum ArgAttribute { Convergent, NoCapture, Returned, ReadOnly, WriteOnly, ReadNone };
std::vector<std::pair<unsigned, ArgAttribute>> ArgumentAttributes;		std::vector<std::pair<unsigned, ArgAttribute>> ArgumentAttributes;

CodeGenIntrinsic(Record *R);		CodeGenIntrinsic(Record *R);
};		};

class CodeGenIntrinsicTable {		class CodeGenIntrinsicTable {
std::vector<CodeGenIntrinsic> Intrinsics;		std::vector<CodeGenIntrinsic> Intrinsics;

Show All 21 Lines

utils/TableGen/CodeGenTarget.cpp

Show First 20 Lines • Show All 598 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = PropList->size(); i != e; ++i) {
else if (Property->getName() == "Throws")		else if (Property->getName() == "Throws")
canThrow = true;		canThrow = true;
else if (Property->getName() == "IntrNoDuplicate")		else if (Property->getName() == "IntrNoDuplicate")
isNoDuplicate = true;		isNoDuplicate = true;
else if (Property->getName() == "IntrConvergent")		else if (Property->getName() == "IntrConvergent")
isConvergent = true;		isConvergent = true;
else if (Property->getName() == "IntrNoReturn")		else if (Property->getName() == "IntrNoReturn")
isNoReturn = true;		isNoReturn = true;
else if (Property->isSubClassOf("NoCapture")) {		else if (Property->isSubClassOf("Convergent")) {
		unsigned ArgNo = Property->getValueAsInt("ArgNo");
		ArgumentAttributes.push_back(std::make_pair(ArgNo, Convergent));
		} else if (Property->isSubClassOf("NoCapture")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, NoCapture));		ArgumentAttributes.push_back(std::make_pair(ArgNo, NoCapture));
} else if (Property->isSubClassOf("Returned")) {		} else if (Property->isSubClassOf("Returned")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, Returned));		ArgumentAttributes.push_back(std::make_pair(ArgNo, Returned));
} else if (Property->isSubClassOf("ReadOnly")) {		} else if (Property->isSubClassOf("ReadOnly")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, ReadOnly));		ArgumentAttributes.push_back(std::make_pair(ArgNo, ReadOnly));
Show All 13 Lines

utils/TableGen/IntrinsicEmitter.cpp

Show First 20 Lines • Show All 554 Lines • ▼ Show 20 Lines	if (ae) {
while (ai != ae) {		while (ai != ae) {
unsigned argNo = intrinsic.ArgumentAttributes[ai].first;		unsigned argNo = intrinsic.ArgumentAttributes[ai].first;

OS << " const Attribute::AttrKind AttrParam" << argNo + 1 <<"[]= {";		OS << " const Attribute::AttrKind AttrParam" << argNo + 1 <<"[]= {";
bool addComma = false;		bool addComma = false;

do {		do {
switch (intrinsic.ArgumentAttributes[ai].second) {		switch (intrinsic.ArgumentAttributes[ai].second) {
		case CodeGenIntrinsic::Convergent:
		if (addComma)
		OS << ",";
		OS << "Attribute::Convergent";
		addComma = true;
		break;
case CodeGenIntrinsic::NoCapture:		case CodeGenIntrinsic::NoCapture:
if (addComma)		if (addComma)
OS << ",";		OS << ",";
OS << "Attribute::NoCapture";		OS << "Attribute::NoCapture";
addComma = true;		addComma = true;
break;		break;
case CodeGenIntrinsic::Returned:		case CodeGenIntrinsic::Returned:
if (addComma)		if (addComma)
▲ Show 20 Lines • Show All 207 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

Allow convergent attribute for function argumentsNeeds RevisionPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 79063

docs/LangRef.rst

include/llvm/Analysis/CodeMetrics.h

include/llvm/IR/Instructions.h

include/llvm/IR/Intrinsics.td

include/llvm/IR/IntrinsicsAMDGPU.td

lib/Analysis/CodeMetrics.cpp

lib/AsmParser/LLParser.cpp

lib/IR/Verifier.cpp

lib/Transforms/Scalar/GVNHoist.cpp

lib/Transforms/Scalar/LoopUnrollPass.cpp

lib/Transforms/Utils/SimplifyCFG.cpp

test/Bitcode/attributes.ll

test/Transforms/GVNHoist/hoist-convergent.ll

test/Transforms/InstCombine/select-call.ll

test/Transforms/LoopUnroll/convergent.ll

test/Transforms/SimplifyCFG/convergent.ll

utils/TableGen/CodeGenIntrinsics.h

utils/TableGen/CodeGenTarget.cpp

utils/TableGen/IntrinsicEmitter.cpp

Allow convergent attribute for function arguments
Needs RevisionPublic