This is an archive of the discontinued LLVM Phabricator instance.

Differential D35267

Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection
ClosedPublic

Authored by alex-t on Jul 11 2017, 9:52 AM.

Details

Reviewers

chandlerc
rampitec
vpykhtin
llvm-commits
arsenm
bogner
lattner

Commits

rG2e5eeceeb7a1: Pass Divergence Analysis data to Selection DAG to drive divergence dependent…
rL326703: Pass Divergence Analysis data to Selection DAG to drive divergence

Summary

In SIMT architectures, VGPRs are a high-demand resource. At the same time, a significant part of the computation operates on naturally scalar data.
Those computations can be performed by the SALU, saving many VGPRs. This is intended to increase occupancy.
Also, splitting the data flow into scalar and vector parts gives the instruction scheduler more flexibility, which can increase HW utilization.

On GPU targets, we say that an instruction is vector if it operates on VGPR operands, each lane of which may contain a different value.
We say an instruction is scalar if it operates on SGPRs, which are shared among all the threads in the warp.

Divergence Analysis was introduced by F. Pereira & Co in 2013 and is now part of the LLVM core analyses.
Unfortunately, its results are mostly useless because there is no way to inform the instruction selection DAG about the divergence property of a concrete instruction.
Literally, an IR operation that has no divergent operands produces a uniform result and should be selected to a scalar instruction.

We used to pass divergence data for memory access instructions through metadata, simply because a MemSDNode has a memory operand that refers to the IR.
This approach is restricted to memory accesses only. That is why we would need another pass working on the machine code that propagates the divergence property
from the value load to the computations and finally to the result store. Apart from the fact that we would need one more pass,
this pass would repeat on the machine instructions the same algorithm that the divergence analysis has already run over the IR.

Since the SDNode flags field was recently widened to 16 bits and 5 bits are still unoccupied, we have a chance to use them for passing divergence data to instruction selection.

This change introduces a possible approach to implementing such an enhancement.
It passes DA data for load instructions only. If accepted, we'll go ahead and add the same code to handle other instructions as well.

Event Timeline

alex-t created this revision. Jul 11 2017, 9:52 AM

Herald added subscribers: nhaehnle, arsenm. Jul 11 2017, 9:52 AM

lattner resigned from this revision. Jul 11 2017, 10:05 AM

rampitec added inline comments. Jul 11 2017, 1:33 PM

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
334	The analysis is pretty expensive, but not needed by all targets. There is TTI.hasBranchDivergence(). How about adding it as required only if TTI.hasBranchDivergence()? It also means you will need to default isDivergent to 0 if the analysis is unavailable.

alex-t added inline comments. Jul 12 2017, 5:20 AM

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

334

Not sure if it makes sense. DA.runOnFunction itself bails out if target has no divergence:

bool DivergenceAnalysis::runOnFunction(Function &F) {
  auto *TTIWP = getAnalysisIfAvailable<TargetTransformInfoWrapperPass>();
  if (TTIWP == nullptr)
    return false;

  TargetTransformInfo &TTI = TTIWP->getTTI(F);
  // Fast path: if the target does not have branch divergence, we do not mark
  // any branch as divergent.
  if (!TTI.hasBranchDivergence())
    return false;
  ...
}

rampitec added inline comments. Jul 12 2017, 8:03 AM

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
334	It bails, yet it depends on DominatorTree on its own, so adding it as a required pass will cause DT to build.

alex-t added a reviewer: llvm-commits. Jul 12 2017, 9:02 AM

rampitec mentioned this in D36292: AMDGPU: Add pass to cleanup DAG SALU/VALU messes. Aug 3 2017, 9:09 PM

Ping. Is anybody going to look at this? :)

alex-t added a reviewer: arsenm. Sep 4 2017, 7:01 AM

arsenm added inline comments. Sep 11 2017, 8:42 AM

include/llvm/CodeGen/SelectionDAGNodes.h
667–668	I have a general concern about this. The way this is used is going to not fit with how SelectionDAG APIs work, and is going to be very invasive. An SDNode is supposed to be immutable and some level of CSE is done by getNode. You can't have an API that involves setting a bit on a newly created node. Anything setting this needs to be done in getNode. Are divergent and non-divergent nodes CSEable? These need to be handled somewhere to prevent them from folding. You seem to only specially handle loads, but we have a lot of cases where we have combine issues from not knowing whether it's going to be selected to SALU or VALU instructions. If we have to somehow propagate this on every place a node is produced, that is a massive undertaking. I don't think that at this point it's worth trying to do such a level of work on SelectionDAG with GlobalISel on the way. Only handling loads I thought we could do just from the MemOperand.
test/CodeGen/AMDGPU/hsa-func.ll
2 ↗	(On Diff #106050)	This should be dropped

alex-t added inline comments. Sep 11 2017, 9:07 AM

include/llvm/CodeGen/SelectionDAGNodes.h
667–668	I agree with you in general... I also don't like explicitly propagating the divergence flag at each place in combining and/or legalizing. The problem is that getNode is not the only point where a new SDNode may be created. For example, getLoad and getExtLoad bypass getNode and create a LoadSDNode explicitly. As for handling divergence in the CSE map, maybe I do not understand your point? If the node is CSEed, we don't care whether it is divergent or not.

Implementation changed according to the reviewers' suggestions.

Herald added a subscriber: wdng. Nov 9 2017, 11:06 AM

This actually looks clean to me, thank you!

alex-t added a reviewer: bogner. Nov 10 2017, 4:40 AM

LGTM

This revision is now accepted and ready to land. Nov 16 2017, 1:29 PM

In general, adding "custom" code to SelectionDAGBuilder::setValue looks odd. Instead I would add a target-customizable postprocessing loop over pairs of Value <-> SDNode into SelectionDAGISel::SelectBasicBlock right after the DAG is created. The target hook should be able to get whatever LLVM IR analysis it requires and annotate SDNodes.

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
334	this isn't fixed yet.

Targets that have no divergence do not depend on Divergence Analysis anymore.

In D35267#945021, @vpykhtin wrote:

In general, adding "custom" code to SelectionDAGBuilder::setValue looks odd. Instead I would add a target-customizable postprocessing loop over pairs of Value <-> SDNode into SelectionDAGISel::SelectBasicBlock right after the DAG is created. The target hook should be able to get whatever LLVM IR analysis it requires and annotate SDNodes.

The problem I see here is that the original Value is no longer available after the DAG builder, which would mean we need to expose NodeMap to targets. In fact, the current solution looks better to me.

Attention please! If nobody has objections this will be committed next Friday.

Herald added a subscriber: qcolombet. Dec 6 2017, 7:27 AM

Does ReplaceAllUsesWith need to propagate changes to the "IsDivergent" bit?

include/llvm/CodeGen/SelectionDAG.h
354	I like this. :)
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
688 ↗	(On Diff #125725)	You're mutating the node after it's been inserted into the CSEMap, which is generally bad. Also, it's not clear this is the node you need to set the "divergent" bit on (NewN could be something which will be eliminated by DAGCombine, like a BITCAST or MERGE_VALUES). Can we do this some other way which is more obviously correct?

There can actually be a problem with folding the node if we patch it after creation. At least this needs to be checked.

In D35267#946933, @efriedma wrote:

Does ReplaceAllUsesWith need to propagate changes to the "IsDivergent" bit?

Divergence Analysis is an iterative solver over the SSA form. So, after it is done, we assume all the Values are correctly annotated with the divergence flag.
When we change some DAG pattern (combiner/legalizer etc.) to some other pattern, the divergence of any new node (and, recursively, of the resulting pattern root) is a superposition of the divergence of its operands.
So we partially repeat the work that was done by the DA, but locally, for each newly created node. This works because we assume all the operands have the correct bit set.
Any DAG transformation that changes a divergent pattern to a non-divergent one, or vice versa, is illegal.
Given that, we don't need to propagate the flag in ReplaceAllUsesWith.
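The local rule described above (a new node is divergent if any of its operands is) can be sketched in a few lines. This is a standalone model, not LLVM code: `Node`, `Dag`, `makeLeaf` and `makeNode` are hypothetical names, and leaves simply carry a divergence bit that is assumed to come from the IR analysis.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an SDNode; this is not the LLVM class.
struct Node {
  std::vector<const Node *> Operands;
  bool IsDivergent = false;
};

// Owns the nodes so that operand pointers stay valid.
struct Dag {
  std::vector<std::unique_ptr<Node>> Nodes;

  // A leaf whose divergence is known up front (assumed to come from the IR).
  const Node *makeLeaf(bool Divergent) {
    Nodes.push_back(std::make_unique<Node>());
    Nodes.back()->IsDivergent = Divergent;
    return Nodes.back().get();
  }

  // A newly created node is divergent iff at least one operand is divergent.
  const Node *makeNode(std::vector<const Node *> Ops) {
    bool Divergent = false;
    for (const Node *Op : Ops) {
      assert(Op && "operands must be created before their user");
      Divergent = Divergent || Op->IsDivergent;
    }
    Nodes.push_back(std::make_unique<Node>());
    Nodes.back()->Operands = std::move(Ops);
    Nodes.back()->IsDivergent = Divergent;
    return Nodes.back().get();
  }
};
```

Under this rule a replacement pattern built from the same inputs ends up with the same bit on its root, which is the argument made here against extra propagation in ReplaceAllUsesWith.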

In D35267#948666, @rampitec wrote:

There actually can be problem with folding the node if we patch it after creation. At least this needs to be checked.

That's true. The problem is that in SelectionDAG::getNode (where the CSEMap insertion is) we have no Value and no chance to check its divergence.
And this is correct: SelectionDAG is for selection and we should not expose the IR Values to it.

The only way I see is to pass the divergence parameter to getNode from all the SelectionDAGBuilder visitors. This would be correct, but it requires changing each of the 109 visitors and getNode().

alex-t added inline comments. Dec 8 2017, 5:06 AM

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
688 ↗	(On Diff #125725)	You are right. BTW, this is not the only place that mutates an SDNode that has already been created. Look at SelectionDAG.cpp, lines 4719-4722: if (SDNode *E = FindNodeOrInsertPos(ID, DL, IP)) { E->intersectFlagsWith(Flags); return SDValue(E, 0); } The SDNode that is found in the map is mutated and then returned w/o any memoization of the mutation.

Any DAG transformation that change divergent pattern to not-divergent or vice versa is illegal.

Transforming "x*0 -> 0" is illegal if x is divergent? That seems surprising.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
688 ↗	(On Diff #125725)	I'm more concerned about whether you're attaching the "divergent" bit to the right node. The mutation is probably mostly harmless; as you note, other places mess with flags.

In D35267#949981, @efriedma wrote:

Any DAG transformation that change divergent pattern to not-divergent or vice versa is illegal.

Transforming "x*0 -> 0" is illegal if x is divergent? That seems surprising.

Okay, I was unclear. Except for constants. Your example is a corner case that turns a variable into a constant.
In this case, w/o bit propagation we're still correct but sub-optimal.
I can imagine, though, a case where a long sequence of constant folding ends up with a pure zero. If, in addition, the operand that becomes constant was the only divergent operand, we'd like to propagate.

In D35267#950795, @alex-t wrote:

In D35267#949981, @efriedma wrote:

Any DAG transformation that change divergent pattern to not-divergent or vice versa is illegal.

Transforming "x*0 -> 0" is illegal if x is divergent? That seems surprising.

Okay, I was unclear. Except for the constants. Your example is a corner case that turn the variable to the constant.
In this case w/o bit propagation we're still correct but sub-optimal.
I can imagine though the case where a long sequence of constant folding ends up with pure zero. If in addition the operand that becomes constant was the only divergent operand, we'd like to propagate.

More generally, there is a known corner case: for example, "get_local_id(x) & ~63" is uniform.

The idea here is that get_local_id() is a source of divergence, but only its low 6 bits are divergent; the upper bits are uniform for our target. Handling such cases would require extending the DA interface to produce a divergent bit mask instead of a one-bit answer, and employing computeKnownBits on an expression to deduce expressions that converge to uniform even if they depend on a non-uniform value.

The corner case is an order of magnitude less frequent than more straightforward uses of a divergent expression, though. We have plans to extend divergence analysis in the future to handle this, but without a good mechanism to propagate through the DAG it will not be very useful anyway.
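To make the bit-mask idea concrete, here is a minimal standalone sketch. It is not an existing DA interface: `LaneBits`, `localId` and `andWithConstant` are hypothetical names, the low-6-bits assumption is taken from the comment above, and only AND with a uniform constant is modeled (operations with carries, such as add, would need more care).

```cpp
#include <cstdint>

// Hypothetical per-value summary: which bits may differ between lanes.
struct LaneBits {
  std::uint64_t DivergentMask; // bit i set => bit i may differ across lanes

  constexpr bool isUniform() const { return DivergentMask == 0; }
};

// Assumption from the discussion: only the low 6 bits of the local ID differ
// between the lanes of one wavefront.
constexpr LaneBits localId() { return {0x3F}; }

// AND with a uniform constant: a result bit can differ across lanes only
// where the constant has a 1 and the input bit may differ.
constexpr LaneBits andWithConstant(LaneBits V, std::uint64_t C) {
  return {V.DivergentMask & C};
}
```

With a one-bit answer, `get_local_id(x) & ~63` inherits divergence from its operand; with the mask, `andWithConstant(localId(), ~std::uint64_t{63})` has an empty mask and can be classified as uniform.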

In D35267#949394, @alex-t wrote:

In D35267#948666, @rampitec wrote:

There actually can be problem with folding the node if we patch it after creation. At least this needs to be checked.

That's true. The problem is that in SelectionDAG::getNode (where the CSEMap insertion is) we have no Value and no chance to check its divergence.
And this is correct: SelectionDAG is for selection and we should not expose the IR Values to it.

The only way I see is to pass the divergence parameter to getNode from all the SelectionDAGBuilder visitors. This would be correct, but it requires changing each of the 109 visitors and getNode().

In fact, there is no way to end up with 2 SDNodes that differ by the divergence flag only.
Please note that selection operates per block. SelectionDAGBuilder constructs the DAG for one block at a time.
Then it selects and emits the code. Then all the data, including the CSE map, gets cleared.
FoldingSetNodeID creates the hash from the node and its operands.
Thus we hit the hash only if there is the same node with the same operands.
From the data dependency point of view, it must have the same divergence. So it is literally the same node, and setting the same value of the divergence flag does no harm.
The only case where we could have 2 nodes that differ by divergence only is if both have the same operands but one is control-dependent on a divergent branch.
That immediately means that the 2 nodes belong to different basic blocks and hence cannot be folded.
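The argument can be illustrated with a toy per-block CSE map. This is only a sketch: `BlockDag` and its methods are hypothetical, node identity is a plain index, and the key is the opcode plus operand ids, loosely mirroring what the comment says FoldingSetNodeID hashes.

```cpp
#include <map>
#include <utility>
#include <vector>

struct Node {
  unsigned Opcode;
  std::vector<unsigned> Operands; // ids of operand nodes
  bool IsDivergent;
};

// Toy DAG for a single basic block; a node id is its index in Nodes.
class BlockDag {
  std::vector<Node> Nodes;
  std::map<std::pair<unsigned, std::vector<unsigned>>, unsigned> CSEMap;

public:
  // Leaves are not CSEd here; their divergence is given by the caller.
  unsigned getLeaf(bool Divergent) {
    Nodes.push_back({0, {}, Divergent});
    return static_cast<unsigned>(Nodes.size() - 1);
  }

  unsigned getNode(unsigned Opcode, const std::vector<unsigned> &Ops) {
    auto Key = std::make_pair(Opcode, Ops);
    auto It = CSEMap.find(Key);
    if (It != CSEMap.end())
      return It->second; // CSE hit: same opcode, same operands

    // Divergence is a function of the operands only, so two requests with
    // the same key can never disagree on it.
    bool Divergent = false;
    for (unsigned Op : Ops)
      Divergent = Divergent || Nodes.at(Op).IsDivergent;

    Nodes.push_back({Opcode, Ops, Divergent});
    unsigned Id = static_cast<unsigned>(Nodes.size() - 1);
    CSEMap.emplace(Key, Id);
    return Id;
  }

  bool isDivergent(unsigned Id) const { return Nodes.at(Id).IsDivergent; }

  // Called between blocks: nothing is shared across basic blocks.
  void clear() {
    Nodes.clear();
    CSEMap.clear();
  }
};
```

The sketch covers data dependence only; the control-dependent case mentioned above is exactly what this per-block model cannot express.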

In D35267#952151, @alex-t wrote:

In D35267#949394, @alex-t wrote:

In D35267#948666, @rampitec wrote:

There actually can be problem with folding the node if we patch it after creation. At least this needs to be checked.

That's true. The problem is that in SelectionDAG::getNode (where the CSEMap insertion is) we have no Value and no chance to check its divergence.
And this is correct: SelectionDAG is for selection and we should not expose the IR Values to it.

The only way I see is to pass the divergence parameter to getNode from all the SelectionDAGBuilder visitors. This would be correct, but it requires changing each of the 109 visitors and getNode().

In fact, there is no way to end up with 2 SDNodes that differ by the divergence flag only.
Please note that selection operates per block. SelectionDAGBuilder constructs the DAG for one block at a time.
Then it selects and emits the code. Then all the data, including the CSE map, gets cleared.
FoldingSetNodeID creates the hash from the node and its operands.
Thus we hit the hash only if there is the same node with the same operands.
From the data dependency point of view, it must have the same divergence. So it is literally the same node, and setting the same value of the divergence flag does no harm.
The only case where we could have 2 nodes that differ by divergence only is if both have the same operands but one is control-dependent on a divergent branch.
That immediately means that the 2 nodes belong to different basic blocks and hence cannot be folded.

Thank you, that makes sense. I have no objections then, since creating a node with the proper flags would result in a really massive patch changing all constructors and visitors. E.g., all node visitors would then need to gain knowledge of a non-node-specific property. Post-patching seems better to me.

In D35267#950795, @alex-t wrote:

In this case w/o bit propagation we're still correct but sub-optimal.

I'm worried that you're covering up bugs by accepting "sub-optimal" results. Specifically, if you have a node which is marked divergent, but doesn't actually have any divergent operands, it will stay marked divergent in most cases... but if DAGCombine or legalization transforms it to some other equivalent operation, it'll erase the "divergent" marking. So your markings will look mostly correct in simple cases, but break for more complicated cases.

Divergence bit propagation added to ReplaceAllUsesWith

efriedma added inline comments. Dec 19 2017, 4:30 PM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7352–7353	Missing code to unset IsDivergent if a node becomes non-divergent, and missing code to recursively propagate changes.

Actually, what I'd really like to see here is some sort of verifier for the divergent bit. It should be possible to recompute the divergence of the SelectionDAG at any point from first principles. There's a small set of operations which are fundamentally divergent: CopyFromReg where the register contains a divergent value (you should be able to derive this from DivergenceAnalysis), divergent memory accesses, and some target-specific intrinsics. (Not sure that's a complete list, but should be close.) All other operations are divergent if and only if they have a divergent predecessor.
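A verifier of the kind requested here could look roughly like the following standalone sketch. It is not the patch's code: `Node`, `recomputeDivergence` and `firstMismatch` are hypothetical names, the DAG is assumed to be stored in topological order, and whether a node is "naturally" divergent is supplied as plain data rather than derived from DivergenceAnalysis or a target hook.

```cpp
#include <cstddef>
#include <vector>

struct Node {
  std::vector<std::size_t> Operands; // indices of earlier nodes
  bool NaturallyDivergent; // e.g. a CopyFromReg of a divergent register
  bool IsDivergent;        // the stored bit that is being verified
};

// Recomputes divergence from first principles. Assumes Dag is in topological
// order, i.e. every operand index is smaller than the index of its user.
std::vector<bool> recomputeDivergence(const std::vector<Node> &Dag) {
  std::vector<bool> Result(Dag.size(), false);
  for (std::size_t I = 0; I < Dag.size(); ++I) {
    bool Divergent = Dag[I].NaturallyDivergent;
    for (std::size_t Op : Dag[I].Operands)
      Divergent = Divergent || Result.at(Op);
    Result[I] = Divergent;
  }
  return Result;
}

// Returns the index of the first node whose stored bit disagrees with the
// recomputed one, or Dag.size() if the DAG is consistent.
std::size_t firstMismatch(const std::vector<Node> &Dag) {
  const std::vector<bool> Expected = recomputeDivergence(Dag);
  for (std::size_t I = 0; I < Dag.size(); ++I)
    if (Dag[I].IsDivergent != Expected[I])
      return I;
  return Dag.size();
}
```

An asserts build would check `firstMismatch(Dag) == Dag.size()` after each DAG phase instead of patching the bits.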

In D35267#960250, @efriedma wrote:

Actually, what I'd really like to see here is some sort of verifier for the divergent bit. It should be possible to recompute the divergence of the SelectionDAG at any point from first principles. There's a small set of operations which are fundamentally divergent: CopyFromReg where the register contains a divergent value (you should be able to derive this from DivergenceAnalysis), divergent memory accesses, and some target-specific intrinsics. (Not sure that's a complete list, but should be close.) All other operations are divergent if and only if they have a divergent predecessor.

What you described is exactly how the Divergence Analysis works. Do you really consider creating one more DA upon the Selection DAG?

In D35267#952801, @efriedma wrote:

In D35267#950795, @alex-t wrote:

In this case w/o bit propagation we're still correct but sub-optimal.

I'm worried that you're covering up bugs by accepting "sub-optimal" results. Specifically, if you have a node which is marked divergent, but doesn't actually have any divergent operands, it will stay marked divergent in most cases... but if DAGCombine or legalization transforms it to some other equivalent operation, it'll erase the "divergent" marking. So your markings will look mostly correct in simple cases, but break for more complicated cases.

I still insist that divergence bit propagation on use replacement is not necessary. In most cases it is useless.
Please note that any node created in a combiner/legalizer transformation already has the correct divergence, because in CreateOperands the new node's divergence is computed as the OR of its operands' divergence bits.
Even if some transformation manages to replace a non-divergent use with a divergent one, it is illegal.
The corner case in your example, which turns a variable into a constant, can never lead to incorrect code. In the worst case we'll have a splat of zeroes that wastes a vector register. So the code is still correct but not optimal.

Do you really consider creating one more DA upon the Selection DAG?

Yes. IR instructions don't have a one-to-one correspondence to SelectionDAG nodes, so I think you're inevitably going to run into subtle bugs which will be difficult to track down.

The key here is computing whether a SelectionDAG node is "naturally" divergent (divergent regardless of its operands); once you have that, computing and verifying complete divergence is trivial. And computing whether a SelectionDAG node is naturally divergent shouldn't be hard, as far as I can tell, so this really shouldn't be much code overall.

Please note that any node that was created in the combiner/legalizer transformation already has the correct divergence because in CreateOperands the new node divergence is computed as OR over it's operands divergence bits.

This is only true if your original computation is correct, and if DAGCombine/Legalization doesn't create any nodes which are naturally divergent. Neither of those are safe assumptions, I think. DAGCombine and legalization will transform loads and stores, which could end up creating a naturally divergent node. And some divergent nodes will never be passed to SelectionDAGBuilder::setValue when you build the DAG, due to the way SelectionDAGBuilder handles values with illegal types. But I'm not sure that's a complete list of the issues with the current version, and there's no practical way to check without a verifier.

To start with, let's make sure that we're agreed on terms.
A divergent machine runs a set of threads (a warp or wavefront) that execute the same instructions in the same order (SIMT).
A divergent operation operates on "vector" registers, where each register consists of many lanes and each thread operates on the data in its corresponding lane.
From the above it immediately follows that the only source of divergence is the thread ID or any data derived from the thread ID.
Usually a small set of target intrinsics may be the source of such data.

There are 2 reasons for an operation to be divergent:

It is data-dependent on some divergent operation:

%tid = call i64 @get_global_id_x()                          ; source of divergence
%1 = add i64 %x, %tid                                       ; data dependence on operand 1
%2 = shl i64 %1, 16                                         ; data dependence on operand 0
%gep = getelementptr i32, i32 addrspace(1)* %array, i64 %2  ; data dependence on operand 1
%val = load i32, i32 addrspace(1)* %gep                     ; data dependence on operand 0

It is uniform itself but control-dependent on a divergent branch:

int tid = get_global_id(0);

if (tid < n) {
  x = 1;    // no data dependency on any divergent data
} else {
  x = 2;    // no data dependency on any divergent data
}
y = x + 5;  // threads taking different branch targets have different "y" values:
            // the operation is divergent (a vector addition on vector registers)
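The control-dependent case can be checked with a tiny lane-by-lane simulation of the snippet above. This is a host-side sketch only: `WaveSize`, `runLanes` and `isUniform` are hypothetical, and an 8-lane wave is assumed purely to keep the example small.

```cpp
#include <array>
#include <cstddef>

// Assumption: a small wave, just for illustration.
constexpr std::size_t WaveSize = 8;

// Executes the snippet once per lane and returns each lane's value of y.
std::array<int, WaveSize> runLanes(int N) {
  std::array<int, WaveSize> Y{};
  for (std::size_t Tid = 0; Tid < WaveSize; ++Tid) {
    int X = (static_cast<int>(Tid) < N) ? 1 : 2; // constants only, no tid data
    Y[Tid] = X + 5;
  }
  return Y;
}

// True if all lanes hold the same value.
bool isUniform(const std::array<int, WaveSize> &V) {
  for (int E : V)
    if (E != V[0])
      return false;
  return true;
}
```

For `N = 4` the first four lanes compute 6 and the rest compute 7, so `y` is divergent although neither operand of the addition carries thread-ID data; when every lane takes the same branch (`N = 0` or `N = 8`) the result is uniform.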

Since the selection DAG only models data dependencies, the latter case is out of the scope of this discussion.
The DAG is constructed, transformed and selected per block.

It follows from the above that an operation in the selection DAG may be divergent only if there is a path in the DAG from some divergent node to the current node.

Initially, the DAG is constructed by walking the IR (SelectionDAGBuilder) and models the IR exactly. Thus the divergence property is kept unchanged.

Neither the DAG peephole optimizations (combiner) nor operation/type legalization create new edges in the data dependence graph.
I mean that they match a pattern following the existing edges and then replace it with another sub-graph such that all incoming edges of the old sub-graph become incoming edges of the new one, and the same holds for the outgoing edges.
Even if several incoming/outgoing edges are merged together, the data flow pattern is kept.

This is only true if your original computation is correct, and if DAGCombine/Legalization doesn't create any nodes which are naturally divergent. Neither of those are safe assumptions, I think. DAGCombine and legalization will transform loads and stores, which could end up creating a naturally divergent node.

So, my question is: can you imagine even a theoretical sensible transformation that converts the graph in such a way that a uniform node gets a divergent input?

And some divergent nodes will never be passed to SelectionDAGBuilder::setValue when you build the DAG, due to the way SelectionDAGBuilder handles values with illegal types. But I'm not sure that's a complete list of the issues with the current version, and there's no practical way to check without a verifier.

Even if it creates a new DAG pattern, it returns its root, which (because of CreateOperands) has the correct divergence that will be passed to setValue. Or did I not understand what you meant?

In D35267#962949, @alex-t wrote:

This is only true if your original computation is correct, and if DAGCombine/Legalization doesn't create any nodes which are naturally divergent. Neither of those are safe assumptions, I think. DAGCombine and legalization will transform loads and stores, which could end up creating a naturally divergent node.

So, my question is: could you imagine even theoretical sensible transformation that convert the graph in such a way that uniform node will get divergent income?

No, but that isn't the point. The problem is that you could replace a naturally divergent node with an equivalent naturally divergent node, but the new node doesn't have the divergent bit set (since the bit only gets set in DAGCombine for nodes with divergent operands, and naturally divergent nodes might not have divergent operands). Thinking about it a bit more, I guess regular load/store operations are a bad example; if a load produced multiple values given a uniform address, it would be a data race. But I think atomic memory operations could run into this issue? (Consider, for example, the code in DAGTypeLegalizer::PromoteIntRes_Atomic1.)

In D35267#962949, @alex-t wrote:

And some divergent nodes will never be passed to SelectionDAGBuilder::setValue when you build the DAG, due to the way SelectionDAGBuilder handles values with illegal types. But I'm not sure that's a complete list of the issues with the current version, and there's no practical way to check without a verifier.

Even if it creates new DAG pattern it returns it's root that (because of CreateOperands) has correct divergence that will be passed to setValue. Or I did not understand what you meant?

That's not what I meant.

Say you have a call to a divergent function which returns an i64, but i64 isn't legal on your target (so the function effectively returns two values of type i32). We create the call, a couple CopyFromReg nodes, and then a MERGE_VALUES to merge the value. Then you set the MERGE_VALUES to be divergent... but that isn't really helpful: legalization for MERGE_VALUES erases the node, so the "divergent" bit goes away.

This is a draft of a divergence analysis solver on the selection DAG. In the course of the discussion, divergence bit verification was requested.
Analysis of one given block cannot cover control dependencies. Thus the divergence bits set from the IR, which reflect control dependencies, cannot match those computed on the DAG of one isolated block. That is why it is not exactly verification. The analysis performed on the DAG augments the divergence information passed from the IR.

In D35267#966245, @efriedma wrote:

In D35267#962949, @alex-t wrote:

This is only true if your original computation is correct, and if DAGCombine/Legalization doesn't create any nodes which are naturally divergent. Neither of those are safe assumptions, I think. DAGCombine and legalization will transform loads and stores, which could end up creating a naturally divergent node.

So, my question is: could you imagine even theoretical sensible transformation that convert the graph in such a way that uniform node will get divergent income?

No, but that isn't the point. The problem is that you could replace a naturally divergent node with an equivalent naturally divergent node, but the new node doesn't have the divergent bit set (since the bit only gets set in DAGCombine for nodes with divergent operands, and naturally divergent nodes might not have divergent operands). Thinking about it a bit more, I guess regular load/store operations are a bad example; if a load produced multiple values given a uniform address, it would be a data race. But I think atomic memory operations could run into this issue? (Consider, for example, the code in DAGTypeLegalizer::PromoteIntRes_Atomic1.)

In D35267#962949, @alex-t wrote:

And some divergent nodes will never be passed to SelectionDAGBuilder::setValue when you build the DAG, due to the way SelectionDAGBuilder handles values with illegal types. But I'm not sure that's a complete list of the issues with the current version, and there's no practical way to check without a verifier.

Even if it creates new DAG pattern it returns it's root that (because of CreateOperands) has correct divergence that will be passed to setValue. Or I did not understand what you meant?

That's not what I meant.

Say you have a call to a divergent function which returns an i64, but i64 isn't legal on your target (so the function effectively returns two values of type i32). We create the call, a couple CopyFromReg nodes, and then a MERGE_VALUES to merge the value. Then you set the MERGE_VALUES to be divergent... but that isn't really helpful: legalization for MERGE_VALUES erases the node, so the "divergent" bit goes away.

The uploaded diff is a draft just to check: does it look like what you meant? In fact there are some issues to resolve:

The content of the target-specific "isSDNodeSourceOfDivergence" procedure depends on the stage of DAG lowering at which it is called. The most reasonable place is just before selection, after all combining/legalizing is done. In this case all the intrinsics are already expanded and turned into CopyFromReg or similar elementary operations. So it is unclear whether it is reasonable to have the code handling these intrinsics.
All the divergence flag propagation in "ReplaceAllUsesWith" is useless and should be removed.
This solution is not in fact verification, because the flags computed on a single block in general don't match those passed from the IR because of the control dependencies. This is just yet another part of the analysis to augment the information.

I was thinking of a verifier more like the LLVM IR verifier... so we would constantly maintain correct divergence information, then check it in asserts builds. That way we can be confident the bit is right from building the DAG through ISel. In terms of code changes, essentially make the divergence computation in createOperands call isSDNodeSourceOfDivergence, delete the changes to setValue, and make VerifyDAGDiverence assert rather than modify the node when it detects a difference.

In D35267#977124, @alex-t wrote:

The content of the target specific "isSDNodeSourceOfDivergence" procedure depend on the stage of the DAG lowering where it is called. The most reasonable place is just before the selection after all combining/legalizing are done. In this case all the intrinsics are already expanded and turned to the CopyFromReg or similar elementary operations. So it is unclear if it reasonable to have the code handling this intrinsics.

It makes the rest of the patch cleaner if you handle intrinsics in isSDNodeSourceOfDivergence, I think.

This solution is not in fact verification because the flags computed on single block in general don't match those passed from the IR because of the control dependencies. This is just yet another part of analysis to augment the information.

Specifically which nodes are a problem here? We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register. (Not sure there's an existing map from registers to values, but you could easily construct one; basically the inverse of FunctionLoweringInfo::ValueMap.)
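The register-to-value lookup suggested here could be sketched as follows. This is a standalone model, not FunctionLoweringInfo: values are represented by their names, `invertValueMap` and `isCopyFromRegDivergent` are hypothetical helpers, and treating an unknown register as divergent is an assumption made here to stay conservative.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

using ValueName = std::string; // stand-in for an IR value
using VirtReg = unsigned;      // stand-in for a virtual register number

// Builds the inverse of a value -> register map.
std::unordered_map<VirtReg, ValueName>
invertValueMap(const std::unordered_map<ValueName, VirtReg> &ValueMap) {
  std::unordered_map<VirtReg, ValueName> RegToValue;
  for (const auto &Entry : ValueMap)
    RegToValue.emplace(Entry.second, Entry.first);
  return RegToValue;
}

// A CopyFromReg of a live-in register is divergent if the IR-level analysis
// marked the corresponding value as divergent.
bool isCopyFromRegDivergent(
    VirtReg Reg, const std::unordered_map<VirtReg, ValueName> &RegToValue,
    const std::unordered_set<ValueName> &DivergentValues) {
  auto It = RegToValue.find(Reg);
  if (It == RegToValue.end())
    return true; // assumption: unknown registers are treated as divergent
  return DivergentValues.count(It->second) != 0;
}
```

Because the IR analysis already accounts for control dependence, a live-in register that holds a control-dependent value would be reported as divergent through this lookup.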

Specifically which nodes are a problem here? We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register. (Not sure there's an existing map from registers to values, but you could easily construct one; basically the inverse of FunctionLoweringInfo::ValueMap.)

In one of my previous posts I have explained what control dependencies are. Let's try again.
Consider the following OpenCL code:

uint tid = get_global_id(0);    // returns the ID of the individual workitem
if (tid < 10) {
  x = 2;
} else {
  x = 3;
} 
z = y + x; // all threads 0-9 have x= 2, others x= 3

Please note that the addition "z = y + x" is divergent because different threads compute different values of "z".
Please also note that this addition does not depend on "tid" or any other divergent data. It is not possible to discover this dependency by analyzing an individual block. We need CFG information.
Divergence Analysis on IR covers control dependencies by means of special processing of PHI nodes.
For a regular node, divergence is computed as literally the logical OR of the divergence bits of all its operands.
For a PHI node, it adds to the list all the branch instructions that terminate basic blocks in the post-dominance frontier of the PHI's source blocks.
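
The two rules above can be illustrated with a toy model. This is a sketch under stated assumptions, not the LLVM implementation: ToyValue, its fields, and computeDivergence are hypothetical names, and each PHI is simply handed the branch conditions it is control dependent on instead of deriving them from a post-dominance frontier.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR value; indices refer to positions in the value list.
struct ToyValue {
  bool IsSource = false;                 // e.g. the result of get_global_id(0)
  bool IsPhi = false;                    // merges values from several blocks
  std::vector<std::size_t> Operands;     // data dependencies
  std::vector<std::size_t> ControlDeps;  // PHIs only: controlling branch conditions
};

// Returns one divergence bit per value. Iterates to a fixed point because,
// unlike a per-block DAG, IR may contain loops.
std::vector<bool> computeDivergence(const std::vector<ToyValue> &Values) {
  std::vector<bool> Divergent(Values.size(), false);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < Values.size(); ++I) {
      if (Divergent[I])
        continue;
      const ToyValue &V = Values[I];
      bool D = V.IsSource;
      // Regular rule: logical OR over the operands' divergence bits.
      for (std::size_t Op : V.Operands) {
        assert(Op < Values.size() && "operand index out of range");
        D = D || Divergent[Op];
      }
      // PHI rule: a divergent controlling branch makes the PHI divergent.
      if (V.IsPhi) {
        for (std::size_t C : V.ControlDeps) {
          assert(C < Values.size() && "branch condition index out of range");
          D = D || Divergent[C];
        }
      }
      if (D) {
        Divergent[I] = true;
        Changed = true;
      }
    }
  }
  return Divergent;
}
```

Encoding the OpenCL example this way (tid as a source, the comparison using tid, x as a PHI of two constants controlled by that comparison, z using y and x) marks z as divergent. Dropping the PHI's control dependency leaves z uniform, which is the information a data-only walk would lose.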

All of the above means that we cannot just drop the IR divergence analysis results. The DAG only reflects data dependencies.
Analyzing an individual block on the DAG, we can only follow data dependencies. So if we try to match the divergence bits computed on the IR (which account for control flow)
with those computed on the individual block DAG, we will hit an assert on the divergence bits set on nodes that are control dependent on divergent branches.

To track all the nodes that are divergent through control dependencies, we would need to maintain a special data structure through all stages of DAG processing.
This all looks too resource consuming.

There is one possible trade-off:
We can add a virtual hook in TargetTransformInfo to query whether the target supports divergence-analysis-driven selection. It returns true iff the target ensures it has no transformations that may break divergence data integrity.
For AMDGPU that is always true.

If the target does not support this, we don't use the divergence bit at all.

This would allow us to use the functionality without even a theoretical threat to other targets.

Please also note that this addition does not depend on "tid" or any other divergent data. It is not possible to discover this dependency by analyzing an individual block. We need CFG information.

Yes, this is what I was getting at with "We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register."; the nodes which need CFG information are precisely CopyFromReg nodes from virtual registers. Each virtual register created by the SelectionDAGBuilder should correspond to exactly one IR instruction.

In general this would work, but we still have several issues:

As I understand, you are concerned about mutating the SDNode after it has been created in getNode().

FunctionLoweringInfo::ValueMap is created during the SelectionDAGBuilder walk through the BasicBlock. So we cannot query live-in register divergence from CreateOperands => TargetLoweringInfo::isSDNodeSourceOfDivergence. By this point ValueMap has not yet been filled in.
Even if we were able to account for control dependencies from the SelectionDAGBuilder, we would still need a means to propagate the flag value through the DAG along the data dependency edges.

All of the above means that we cannot just validate the flag values and assert if they do not match. We have to run an iterative solver for each block just before selection to account for the control dependencies and to propagate the flag values.

I tried this approach and it works at first glance.

One more item that should be discussed is the target-specific exceptions to the common divergence modeling algorithm.
For instance, in the AMDGPU target we have the amdgcn.readfirstlane/readlane intrinsics. They accept a vector register and return the value of the first lane or of a specific lane.
So both accept a naturally divergent VGPR but return a scalar value.
Following the common divergence computing algorithm ("the divergence of an operation's result is the superposition of its operands' divergence"), we would mark %scalar = tail call i32 @llvm.amdgcn.readfirstlane(i32 %tid) as divergent, which is not true.
In the IR form of the divergence-driven selection we rely on the TargetTransformInfo::isAlwaysUniform hook that was added to the interface for this purpose.
It allows the target to declare an arbitrary set of target operations as "always uniform" so that the analysis does not take their operands' divergence into account.

To meet this design we would have to add a similar hook to the TargetLoweringInfo interface. Is this feasible?
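
A sketch of how such an "always uniform" override could sit next to the ordinary OR rule. Everything here is a hypothetical toy (ToyOpcode, ToyNode, and the three free functions); it does not reproduce the TargetTransformInfo or TargetLoweringInfo interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical opcodes standing in for target operations.
enum class ToyOpcode { Constant, WorkItemId, Add, ReadFirstLane };

struct ToyNode {
  ToyOpcode Opcode;
  std::vector<const ToyNode *> Operands;
};

// Toy counterpart of a "source of divergence" hook.
bool isSourceOfDivergence(const ToyNode &N) {
  return N.Opcode == ToyOpcode::WorkItemId;
}

// Toy counterpart of an "always uniform" hook: the result is uniform
// regardless of how divergent the operands are.
bool isAlwaysUniform(const ToyNode &N) {
  return N.Opcode == ToyOpcode::ReadFirstLane;
}

// Recursive for brevity; shared operands are revisited, which is acceptable
// for a small illustration but not for a real DAG.
bool isDivergent(const ToyNode &N) {
  if (isAlwaysUniform(N))
    return false;  // the override wins over the operands
  if (isSourceOfDivergence(N))
    return true;
  for (const ToyNode *Op : N.Operands) {
    assert(Op && "operand must not be null");
    if (isDivergent(*Op))
      return true;
  }
  return false;
}
```

With these definitions, a ReadFirstLane of a work-item id is uniform, and so is anything computed only from it, while an Add that also reads the work-item id directly stays divergent.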

In D35267#990432, @alex-t wrote:

As I understand, you are concerned about mutating the SDNode after it has been created in getNode().

My most important concern is actually getting the modeling correct, so queries come up with the correct result when it gets queried by DAGCombine. If the bit on the SDNode is just a cache which can be recomputed/verified, it's fine to mutate it when we need to.

FunctionLoweringInfo::ValueMap is created during the SelectionDAGBuilder walk through the BasicBlock. So we cannot query live-in register divergence from CreateOperands => TargetLoweringInfo::isSDNodeSourceOfDivergence. By this point ValueMap has not yet been filled in.

Really? I thought we fill it in before we actually start building the SelectionDAG (in FunctionLoweringInfo::set). But you can move it earlier if you need to.

All of the above means that we cannot just validate the flag values and assert if they do not match. We have to run an iterative solver for each block just before selection to account for the control dependencies and to propagate the flag values.

I tried this approach and it works at first glance.

Great!

To meet this design we would have to add a similar hook to the TargetLoweringInfo interface. Is this feasible?

Yes, this should be fine.

Oops... That was my mistake.

FunctionLoweringInfo::ValueMap gets filled in by FunctionLoweringInfo::CreateRegs in SelectionDAGISel::SelectAllBasicBlocks, much earlier than the SelectionDAGBuilder walks the IR. So, everything works! :)

BTW, we don't need to verify the flags since we're creating them in createOperands.
The flag for each node is computed from its own divergence and that of its operands. This happens during the SelectionDAGBuilder walk.
For each node, its operands have already been computed at this point, so the node's divergence is immediately set to the correct value.
This is correct simply because, in contrast to the IR, the DAG has no loops.
The same holds if createOperands is called from the Combiner/Legalizer.
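
The creation-time argument can be sketched as follows. ToyDAGBuilder and its methods are hypothetical and only model the ordering property: a node can name as operands only nodes that already exist, so one OR over the operands' stored bits is enough and no iteration is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy builder; a node is identified by its creation index.
class ToyDAGBuilder {
public:
  // Creates a node and fixes its divergence bit immediately.
  std::size_t createNode(bool IsSourceOfDivergence,
                         const std::vector<std::size_t> &Operands) {
    bool Divergent = IsSourceOfDivergence;
    for (std::size_t Op : Operands) {
      // Operands must already exist, which is what rules out cycles.
      assert(Op < Bits.size() && "operands must be created before their user");
      Divergent = Divergent || Bits[Op];
    }
    Bits.push_back(Divergent);
    return Bits.size() - 1;
  }

  bool isDivergent(std::size_t Node) const {
    assert(Node < Bits.size() && "unknown node");
    return Bits[Node];
  }

private:
  std::vector<bool> Bits;  // one divergence bit per created node
};
```

This covers node creation only. A transformation that later rewires the operands of an existing node is outside what the sketch models.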

Here is an alternative implementation based on the TargetLoweringInfo hooks.

I'd like to see a verifier somewhere that the divergence bit is still correct after DAGCombine (it could be different from what SelectionDAG::createOperands would compute given how ReplaceAllUsesWith works).

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
8311	This is good; I'm happy we're cleanly computing divergence for a DAG node.

rampitec added inline comments. Feb 5 2018, 11:48 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
782	SIRegisterInfo::isVGPR()
795	!DA || DA->isDivergent(...) You are using getAnalysisIfAvailable, so it can be missing.
814	Can you make isIntrinsicSourceOfDivergence() external and use it instead?

rampitec added inline comments. Feb 5 2018, 11:51 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
787	I am afraid it is not true to say that a VGPR is necessarily divergent.

In D35267#998083, @efriedma wrote:

I'd like to see a verifier somewhere that the divergence bit is still correct after DAGCombine (it could be different from what SelectionDAG::createOperands would compute given how ReplaceAllUsesWith works).

Could you please clarify the goal of the verification? Let's say we managed to transform the DAG in such a way that a uniform pattern has been changed to a divergent one.
Then the approach depends on our attitude to the transformation.
If we agree that a transformation that changes the pattern divergence is illegal, as in the case where we change uniform to divergent, we should assert and bail out.
If we assume that the transformation is legal, as in your example of folding a divergent variable to a zero constant (x*0 => 0), we should recompute the divergence bits instead.
To handle both cases we need one more recomputation over all DAG nodes, as was done in my previous implementation, but with an error message if a uniform node becomes divergent.

I would like to just recompute the bits right before selection and leave the legality of DAG transformations to the authors of those transformations. In other words, we compute what we have.
If someone transforms the DAG incorrectly, that is their own problem.

alex-t added inline comments. Feb 8 2018, 2:18 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
787	That's not true. Do we have a means to detect that this is a splat vector? If not, I'd stay with the conservative approach that considers all VGPRs divergent. Alternatively, we could add one more target hook to query for special VGPRs that are uniform.

rampitec added inline comments. Feb 8 2018, 10:00 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
787	I doubt you can reliably detect it. The concern is potential unneeded moves and readfirstlane instructions, one thing that we are trying to avoid here.

If we're going to include the "divergent" bit in SDNodes, so we can query it all the time, the bit needs to be correct all the time. The goal of a verifier is to ensure that at any given point, the bits stored in the SelectionDAG are the same as the bits we would compute from scratch. So code still needs to do the right thing to update the divergence bits, if necessary, but the verifier lets us catch mistakes early. This is similar to the way we have a domtree verifier, to ensure transforms correctly update the domtree.

rampitec added inline comments. Feb 8 2018, 12:42 PM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
787	I withdraw this objection. Apparently this is all about physregs, and we do not have a lot of them at this stage.

In D35267#1002305, @efriedma wrote:

If we're going to include the "divergent" bit in SDNodes, so we can query it all the time, the bit needs to be correct all the time. The goal of a verifier is to ensure that at any given point, the bits stored in the SelectionDAG are the same as the bits we would compute from scratch. So code still needs to do the right thing to update the divergence bits, if necessary, but the verifier lets us catch mistakes early. This is similar to the way we have a domtree verifier, to ensure transforms correctly update the domtree.

Recomputing the bits for the entire DAG any time the combiner changes something is too expensive.
In this case I'd opt to propagate the bit in the ReplaceAllUses methods.
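
One way to picture propagation on replacement is a worklist that recomputes only the nodes whose bit may have changed. This is a sketch, not the patch's code: ToyDAG and its members are hypothetical, and it assumes the replacement does not introduce a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy DAG with explicit user lists; nodes are indices.
struct ToyDAG {
  struct Node {
    bool IsSource;
    bool IsDivergent;
    std::vector<std::size_t> Ops;
    std::vector<std::size_t> Users;
  };
  std::vector<Node> Nodes;

  std::size_t create(bool IsSource, std::vector<std::size_t> Ops) {
    std::size_t Id = Nodes.size();
    bool D = IsSource;
    for (std::size_t Op : Ops) {
      assert(Op < Id && "operands must already exist");
      D = D || Nodes[Op].IsDivergent;
      Nodes[Op].Users.push_back(Id);
    }
    Nodes.push_back({IsSource, D, std::move(Ops), {}});
    return Id;
  }

  // Bit that the node should have given its operands' current bits.
  bool recompute(std::size_t Id) const {
    bool D = Nodes[Id].IsSource;
    for (std::size_t Op : Nodes[Id].Ops)
      D = D || Nodes[Op].IsDivergent;
    return D;
  }

  // Re-evaluates Start and, whenever a bit flips, the users of that node.
  void updateDivergence(std::size_t Start) {
    std::vector<std::size_t> Worklist{Start};
    while (!Worklist.empty()) {
      std::size_t Id = Worklist.back();
      Worklist.pop_back();
      bool D = recompute(Id);
      if (D == Nodes[Id].IsDivergent)
        continue;  // nothing changed, users are unaffected
      Nodes[Id].IsDivergent = D;
      for (std::size_t U : Nodes[Id].Users)
        Worklist.push_back(U);
    }
  }

  // Redirects every use of From to To and repairs the affected bits.
  void replaceAllUsesWith(std::size_t From, std::size_t To) {
    assert(From != To && "cannot replace a node with itself");
    std::vector<std::size_t> Users = std::move(Nodes[From].Users);
    Nodes[From].Users.clear();
    for (std::size_t U : Users) {
      for (std::size_t &Op : Nodes[U].Ops)
        if (Op == From)
          Op = To;
      Nodes[To].Users.push_back(U);
      updateDivergence(U);
    }
  }
};
```

For the x*0 => 0 fold mentioned earlier in the thread, replacing the divergent multiply with the constant makes its former user uniform again, and only that user and its transitive users are revisited.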

Preliminary revision illustrating a possible approach to keeping divergence information consistent across DAG transformations.

alex-t added inline comments. Feb 13 2018, 6:29 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
795	If DA == nullptr in the case above, we'd return ((bool)!DA), i.e. true? Maybe it's better to return false for targets that have no DA? I mean "DA && DA->isDivergent()": if we have no DA we return false; if we have one, the returned value is defined by the isDivergent result.

alex-t added inline comments. Feb 13 2018, 7:32 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
782	What should we do for R600Subtarget?

rampitec added inline comments. Feb 13 2018, 9:59 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
782	Since we are not going to change selection in R600, it is practically unimportant what we do for this case.
795	It is conservatively correct to return true. Presumably targets w/o DA will have no use for the bit anyway, but if they do, it is dangerous to assume uniformity.

alex-t added inline comments. Feb 13 2018, 10:06 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
795	"targets w/o DA will have no use for the bit anyway, but if they do, it is dangerous" Sounds a bit paranoid :) I just noted that returning "true" for a target that has no divergence at all looks misleading.

alex-t added inline comments. Feb 14 2018, 1:00 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
795	Moreover, targets that have no DA have not overridden isSDNodeSourceOfDivergence either and will never get here.

Some bugfixes and changes according to the reviewers' requirements.

rampitec added inline comments. Feb 15 2018, 9:22 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
795	What if DA is invalidated and thus NULL, even for targets with divergence?
827	It still duplicates implementation in AMDGPUTargetTransformInfo.

ping @efriedma

I still want a verifier, a function which checks the bits currently saved on SelectionDAG nodes are the same as the bits we would compute from scratch (and calls report_fatal_error() if they aren't). Maybe call it in a couple places in SelectionDAGISel::CodeGenAndEmitDAG() if assertions are enabled.

alex-t added a comment. Feb 20 2018, 6:02 AM

This comment was removed by alex-t.

This is a preview of the implementation that keeps the divergence bits consistent throughout.
Please note that the verification algorithm has polynomial complexity and is expected to be switched ON/OFF by an option (coming soon) that defaults to OFF.

You should be able to do verification in linear time. Just call SelectionDAG::AssignTopologicalOrder() before you start iterating over allnodes().

Actually, on second thought, maybe don't do that; AssignTopologicalOrder() mutates the SelectionDAG, so it could change the generated code if we call it conditionally. But anyway, a topological sort should be straightforward.
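
A linear-time check along these lines can keep its topological order in side tables, so the graph under test is never reordered. The sketch below uses Kahn's algorithm on a hypothetical StoredNode type and returns false on a mismatch instead of calling report_fatal_error(); none of the names come from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy node: its own divergence source flag, the bit currently
// stored on it, and operand indices in arbitrary storage order.
struct StoredNode {
  bool IsSource;
  bool StoredDivergent;
  std::vector<std::size_t> Ops;
};

// Returns true if every stored bit equals the bit recomputed from scratch.
// Work is proportional to nodes plus operand edges; Nodes is not modified.
bool verifyDivergence(const std::vector<StoredNode> &Nodes) {
  const std::size_t N = Nodes.size();
  std::vector<std::vector<std::size_t>> Users(N);
  std::vector<std::size_t> Pending(N);  // operands not yet processed
  std::vector<std::size_t> Ready;       // nodes whose operands are all done

  for (std::size_t I = 0; I < N; ++I) {
    Pending[I] = Nodes[I].Ops.size();
    for (std::size_t Op : Nodes[I].Ops) {
      assert(Op < N && "operand index out of range");
      Users[Op].push_back(I);
    }
    if (Pending[I] == 0)
      Ready.push_back(I);
  }

  std::vector<bool> Expected(N, false);
  std::size_t Visited = 0;
  while (!Ready.empty()) {
    std::size_t I = Ready.back();
    Ready.pop_back();
    ++Visited;
    bool D = Nodes[I].IsSource;
    for (std::size_t Op : Nodes[I].Ops)
      D = D || Expected[Op];
    Expected[I] = D;
    if (D != Nodes[I].StoredDivergent)
      return false;  // stored bit disagrees with the recomputed one
    for (std::size_t U : Users[I])
      if (--Pending[U] == 0)
        Ready.push_back(U);
  }
  assert(Visited == N && "graph contains a cycle");
  (void)Visited;
  return true;
}
```

Each node is popped once and each operand edge is touched a constant number of times, which is where the linear bound comes from.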

Verification algorithm of linear complexity

efriedma added inline comments. Feb 22 2018, 12:42 PM

include/llvm/CodeGen/TargetLowering.h
2561	Weird indentation; try clang-format?
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7506	This is exactly what I was looking for; thanks.
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
748	Maybe `#ifndef NDEBUG`. Can we somehow skip this for targets which don't use divergence information?

Formatting fixed.
DAG divergence verification for "divergent" targets only.

One test fixed

make check-llvm has passed

Target-independent bits LGTM.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
838	Formatting?

ready to land

alex-t marked 6 inline comments as done. Mar 2 2018, 5:40 AM

alex-t updated this revision to Diff 136772. Mar 2 2018, 9:21 AM

alex-t updated this revision to Diff 136985. Mar 5 2018, 5:58 AM

Closed by commit rL326703: Pass Divergence Analysis data to Selection DAG to drive divergence (authored by alex-t). Mar 5 2018, 7:17 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Analysis/DivergenceAnalysis.h	4 lines
include/llvm/CodeGen/FunctionLoweringInfo.h	11 lines
include/llvm/CodeGen/SelectionDAG.h	31 lines
include/llvm/CodeGen/SelectionDAGNodes.h	6 lines
include/llvm/CodeGen/TargetLowering.h	10 lines
lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp	10 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp	83 lines
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp	2 lines
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp	17 lines
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp	2 lines
lib/Target/AMDGPU/AMDGPUISelLowering.h	3 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp	97 lines
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp	53 lines
lib/Target/AMDGPU/SIISelLowering.cpp	4 lines
lib/Target/AMDGPU/SMInstructions.td	7 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h	3 lines
lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	51 lines
test/CodeGen/AMDGPU/callee-special-input-sgprs.ll	52 lines
test/CodeGen/AMDGPU/llvm.amdgcn.implicitarg.ptr.ll	25 lines

Diff 135310

include/llvm/Analysis/DivergenceAnalysis.h

//===- llvm/Analysis/DivergenceAnalysis.h - Divergence Analysis -- C++ --===//		//===- llvm/Analysis/DivergenceAnalysis.h - Divergence Analysis -- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// The divergence analysis is an LLVM pass which can be used to find out		// The divergence analysis is an LLVM pass which can be used to find out
// if a branch instruction in a GPU program is divergent or not. It can help		// if a branch instruction in a GPU program is divergent or not. It can help
// branch optimizations such as jump threading and loop unswitching to make		// branch optimizations such as jump threading and loop unswitching to make
// better decisions.		// better decisions.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		#ifndef LLVM_ANALYSIS_DIVERGENCE_ANALYSIS_H
		#define LLVM_ANALYSIS_DIVERGENCE_ANALYSIS_H

#include "llvm/ADT/DenseSet.h"		#include "llvm/ADT/DenseSet.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"

namespace llvm {		namespace llvm {
class Value;		class Value;
class DivergenceAnalysis : public FunctionPass {		class DivergenceAnalysis : public FunctionPass {
[17 lines not shown]	public:
// Returns true if V is uniform/non-divergent.		// Returns true if V is uniform/non-divergent.
bool isUniform(const Value *V) const { return !isDivergent(V); }		bool isUniform(const Value *V) const { return !isDivergent(V); }

private:		private:
// Stores all divergent values.		// Stores all divergent values.
DenseSet<const Value *> DivergentValues;		DenseSet<const Value *> DivergentValues;
};		};
} // End llvm namespace		} // End llvm namespace

		#endif //LLVM_ANALYSIS_DIVERGENCE_ANALYSIS_H
		No newline at end of file

include/llvm/CodeGen/FunctionLoweringInfo.h

[112 lines not shown]	public:
getOrCreateSwiftErrorVRegUseAt(const Instruction *, const MachineBasicBlock *,		getOrCreateSwiftErrorVRegUseAt(const Instruction *, const MachineBasicBlock *,
const Value *);		const Value *);

/// ValueMap - Since we emit code for the function a basic block at a time,		/// ValueMap - Since we emit code for the function a basic block at a time,
/// we must remember which virtual registers hold the values for		/// we must remember which virtual registers hold the values for
/// cross-basic-block values.		/// cross-basic-block values.
DenseMap<const Value *, unsigned> ValueMap;		DenseMap<const Value *, unsigned> ValueMap;

		/// VirtReg2Value map is needed by the Divergence Analysis driven
		/// instruction selection. It is reverted ValueMap. It is computed
		/// in lazy style - on demand. It is used to get the Value corresponding
		/// to the live in virtual register and is called from the
		/// TargetLowerinInfo::isSDNodeSourceOfDivergence.
		DenseMap<unsigned, const Value*> VirtReg2Value;

		/// This method is called from TargetLowerinInfo::isSDNodeSourceOfDivergence
		/// to get the Value corresponding to the live-in virtual register.
		const Value * getValueFromVirtualReg(unsigned Vreg);

/// Track virtual registers created for exception pointers.		/// Track virtual registers created for exception pointers.
DenseMap<const Value *, unsigned> CatchPadExceptionPointers;		DenseMap<const Value *, unsigned> CatchPadExceptionPointers;

/// Keep track of frame indices allocated for statepoints as they could be		/// Keep track of frame indices allocated for statepoints as they could be
/// used across basic block boundaries. This struct is more complex than a		/// used across basic block boundaries. This struct is more complex than a
/// simple map because the stateopint lowering code de-duplicates gc pointers		/// simple map because the stateopint lowering code de-duplicates gc pointers
/// based on their SDValue (so %p and (bitcast %p to T) will get the same		/// based on their SDValue (so %p and (bitcast %p to T) will get the same
/// slot), and we track that here.		/// slot), and we track that here.
[184 lines not shown]

include/llvm/CodeGen/SelectionDAG.h

[22 lines not shown]
#include "llvm/ADT/FoldingSet.h"		#include "llvm/ADT/FoldingSet.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/ilist.h"		#include "llvm/ADT/ilist.h"
#include "llvm/ADT/iterator.h"		#include "llvm/ADT/iterator.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/CodeGen/DAGCombine.h"		#include "llvm/CodeGen/DAGCombine.h"
#include "llvm/CodeGen/ISDOpcodes.h"		#include "llvm/CodeGen/ISDOpcodes.h"
		#include "llvm/CodeGen/FunctionLoweringInfo.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineMemOperand.h"		#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineValueType.h"		#include "llvm/CodeGen/MachineValueType.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"		#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/CodeGen/ValueTypes.h"		#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
[171 lines not shown]	class SelectionDAG {
const SelectionDAGTargetInfo *TSI = nullptr;		const SelectionDAGTargetInfo *TSI = nullptr;
const TargetLowering *TLI = nullptr;		const TargetLowering *TLI = nullptr;
const TargetLibraryInfo *LibInfo = nullptr;		const TargetLibraryInfo *LibInfo = nullptr;
MachineFunction *MF;		MachineFunction *MF;
Pass *SDAGISelPass = nullptr;		Pass *SDAGISelPass = nullptr;
LLVMContext *Context;		LLVMContext *Context;
CodeGenOpt::Level OptLevel;		CodeGenOpt::Level OptLevel;

		DivergenceAnalysis * DA = nullptr;
		FunctionLoweringInfo * FLI = nullptr;

/// The function-level optimization remark emitter. Used to emit remarks		/// The function-level optimization remark emitter. Used to emit remarks
/// whenever manipulating the DAG.		/// whenever manipulating the DAG.
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;

/// The starting token.		/// The starting token.
SDNode EntryNode;		SDNode EntryNode;

/// The root of the entire DAG.		/// The root of the entire DAG.
[113 lines not shown]	private:
template <typename SDNodeTy>		template <typename SDNodeTy>
static uint16_t getSyntheticNodeSubclassData(unsigned Opc, unsigned Order,		static uint16_t getSyntheticNodeSubclassData(unsigned Opc, unsigned Order,
SDVTList VTs, EVT MemoryVT,		SDVTList VTs, EVT MemoryVT,
MachineMemOperand *MMO) {		MachineMemOperand *MMO) {
return SDNodeTy(Opc, Order, DebugLoc(), VTs, MemoryVT, MMO)		return SDNodeTy(Opc, Order, DebugLoc(), VTs, MemoryVT, MMO)
.getRawSubclassData();		.getRawSubclassData();
}		}

void createOperands(SDNode *Node, ArrayRef<SDValue> Vals) {		void createOperands(SDNode *Node, ArrayRef<SDValue> Vals);
		efriedma: I like this. :)
assert(!Node->OperandList && "Node already has operands");
SDUse *Ops = OperandRecycler.allocate(
ArrayRecycler<SDUse>::Capacity::get(Vals.size()), OperandAllocator);

for (unsigned I = 0; I != Vals.size(); ++I) {
Ops[I].setUser(Node);
Ops[I].setInitial(Vals[I]);
}
Node->NumOperands = Vals.size();
Node->OperandList = Ops;
checkForCycles(Node);
}

void removeOperands(SDNode *Node) {		void removeOperands(SDNode *Node) {
if (!Node->OperandList)		if (!Node->OperandList)
return;		return;
OperandRecycler.deallocate(		OperandRecycler.deallocate(
ArrayRecycler<SDUse>::Capacity::get(Node->NumOperands),		ArrayRecycler<SDUse>::Capacity::get(Node->NumOperands),
Node->OperandList);		Node->OperandList);
Node->NumOperands = 0;		Node->NumOperands = 0;
Node->OperandList = nullptr;		Node->OperandList = nullptr;
}		}

public:		public:
explicit SelectionDAG(const TargetMachine &TM, CodeGenOpt::Level);		explicit SelectionDAG(const TargetMachine &TM, CodeGenOpt::Level);
SelectionDAG(const SelectionDAG &) = delete;		SelectionDAG(const SelectionDAG &) = delete;
SelectionDAG &operator=(const SelectionDAG &) = delete;		SelectionDAG &operator=(const SelectionDAG &) = delete;
~SelectionDAG();		~SelectionDAG();

/// Prepare this SelectionDAG to process code in the given MachineFunction.		/// Prepare this SelectionDAG to process code in the given MachineFunction.
void init(MachineFunction &NewMF, OptimizationRemarkEmitter &NewORE,		void init(MachineFunction &NewMF, OptimizationRemarkEmitter &NewORE,
Pass *PassPtr, const TargetLibraryInfo *LibraryInfo);		Pass *PassPtr, const TargetLibraryInfo *LibraryInfo,
		DivergenceAnalysis * DA);

		void setFunctionLoweringInfo(FunctionLoweringInfo * FuncInfo) {
		FLI = FuncInfo;
		}

/// Clear state and free memory necessary to make this		/// Clear state and free memory necessary to make this
/// SelectionDAG ready to process a new block.		/// SelectionDAG ready to process a new block.
void clear();		void clear();

MachineFunction &getMachineFunction() const { return *MF; }		MachineFunction &getMachineFunction() const { return *MF; }
const Pass *getPass() const { return SDAGISelPass; }		const Pass *getPass() const { return SDAGISelPass; }

[68 lines not shown]	const SDValue &setRoot(SDValue N) {
if (N.getNode())		if (N.getNode())
checkForCycles(N.getNode(), this);		checkForCycles(N.getNode(), this);
Root = N;		Root = N;
if (N.getNode())		if (N.getNode())
checkForCycles(this);		checkForCycles(this);
return Root;		return Root;
}		}

		void VerifyDAGDiverence();

/// This iterates over the nodes in the SelectionDAG, folding		/// This iterates over the nodes in the SelectionDAG, folding
/// certain types of nodes together, or eliminating superfluous nodes. The		/// certain types of nodes together, or eliminating superfluous nodes. The
/// Level argument controls whether Combine is allowed to produce nodes and		/// Level argument controls whether Combine is allowed to produce nodes and
/// types that are illegal on the target.		/// types that are illegal on the target.
void Combine(CombineLevel Level, AliasAnalysis *AA,		void Combine(CombineLevel Level, AliasAnalysis *AA,
CodeGenOpt::Level OptLevel);		CodeGenOpt::Level OptLevel);

/// This transforms the SelectionDAG into a SelectionDAG that		/// This transforms the SelectionDAG into a SelectionDAG that
[649 lines not shown]	#endif
SDNode *UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2,		SDNode *UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2,
SDValue Op3);		SDValue Op3);
SDNode *UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2,		SDNode *UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2,
SDValue Op3, SDValue Op4);		SDValue Op3, SDValue Op4);
SDNode *UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2,		SDNode *UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2,
SDValue Op3, SDValue Op4, SDValue Op5);		SDValue Op3, SDValue Op4, SDValue Op5);
SDNode *UpdateNodeOperands(SDNode *N, ArrayRef<SDValue> Ops);		SDNode *UpdateNodeOperands(SDNode *N, ArrayRef<SDValue> Ops);

		// Propagates the change in divergence to users
		void updateDivergence(SDNode * N);

/// These are used for target selectors to mutate the		/// These are used for target selectors to mutate the
/// specified node to have the specified return type, Target opcode, and		/// specified node to have the specified return type, Target opcode, and
/// operands. Note that target opcodes are stored as		/// operands. Note that target opcodes are stored as
/// ~TargetOpcode in the node opcode field. The resultant node is returned.		/// ~TargetOpcode in the node opcode field. The resultant node is returned.
SDNode *SelectNodeTo(SDNode *N, unsigned TargetOpc, EVT VT);		SDNode *SelectNodeTo(SDNode *N, unsigned TargetOpc, EVT VT);
SDNode *SelectNodeTo(SDNode *N, unsigned TargetOpc, EVT VT, SDValue Op1);		SDNode *SelectNodeTo(SDNode *N, unsigned TargetOpc, EVT VT, SDValue Op1);
SDNode *SelectNodeTo(SDNode *N, unsigned TargetOpc, EVT VT,		SDNode *SelectNodeTo(SDNode *N, unsigned TargetOpc, EVT VT,
SDValue Op1, SDValue Op2);		SDValue Op1, SDValue Op2);
[470 lines not shown]

include/llvm/CodeGen/SelectionDAGNodes.h

[460 lines not shown]	protected:
// We define a set of mini-helper classes to help us interpret the bits in our		// We define a set of mini-helper classes to help us interpret the bits in our
// SubclassData. These are designed to fit within a uint16_t so they pack		// SubclassData. These are designed to fit within a uint16_t so they pack
// with NodeType.		// with NodeType.

class SDNodeBitfields {		class SDNodeBitfields {
friend class SDNode;		friend class SDNode;
friend class MemIntrinsicSDNode;		friend class MemIntrinsicSDNode;
friend class MemSDNode;		friend class MemSDNode;
		friend class SelectionDAG;

uint16_t HasDebugValue : 1;		uint16_t HasDebugValue : 1;
uint16_t IsMemIntrinsic : 1;		uint16_t IsMemIntrinsic : 1;
		uint16_t IsDivergent : 1;
};		};
enum { NumSDNodeBits = 2 };		enum { NumSDNodeBits = 3 };

class ConstantSDNodeBitfields {		class ConstantSDNodeBitfields {
friend class ConstantSDNode;		friend class ConstantSDNode;

uint16_t : NumSDNodeBits;		uint16_t : NumSDNodeBits;

uint16_t IsOpaque : 1;		uint16_t IsOpaque : 1;
};		};
[175 lines not shown]	public:
unsigned getMachineOpcode() const {		unsigned getMachineOpcode() const {
assert(isMachineOpcode() && "Not a MachineInstr opcode!");		assert(isMachineOpcode() && "Not a MachineInstr opcode!");
return ~NodeType;		return ~NodeType;
}		}

bool getHasDebugValue() const { return SDNodeBits.HasDebugValue; }		bool getHasDebugValue() const { return SDNodeBits.HasDebugValue; }
void setHasDebugValue(bool b) { SDNodeBits.HasDebugValue = b; }		void setHasDebugValue(bool b) { SDNodeBits.HasDebugValue = b; }

		bool isDivergent() const { return SDNodeBits.IsDivergent; }

		arsenmUnsubmitted Not Done Reply Inline Actions I have a general concern about this. The way this is used is going to not fit with how SelectionDAG APIs work, and is going to be very invasive. An SDNode is supposed to be immutable and some level of CSE is done by getNode. You can't have an API that involves setting a bit on a newly created node. Anything setting this needs to be done in getNode. Are divergent and non-divergent nodes CSEable? These need to be handled somewhere to prevent them from folding. You seem to only specially handle loads, but we have a lot of cases where we have combine issues from not knowing whether it's going to be selected to SALU or VALU instructions. If we have to somehow propagate this on every place a node is produced, that is a massive undertaking. I don't think that at this point it's worth trying to do such a level of work on SelectionDAG with GlobalISel on the way. Only handling loads I thought we could do just from the MemOperand. arsenm: I have a general concern about this. The way this is used is going to not fit with how…
		alex-t (author): I agree with you in general. I also don't like explicitly propagating the divergence flag at each place in combining and/or legalizing. The problem is that getNode is not the only point where a new SDNode may be created. For example, getLoad and getExtLoad bypass getNode and create a LoadSDNode explicitly. As for handling divergence in the CSE map, maybe I do not understand your point: if the node is CSEed, we don't care whether it is divergent or not.
/// Return true if there are no uses of this node.		/// Return true if there are no uses of this node.
bool use_empty() const { return UseList == nullptr; }		bool use_empty() const { return UseList == nullptr; }

/// Return true if there is exactly one use of this node.		/// Return true if there is exactly one use of this node.
bool hasOneUse() const {		bool hasOneUse() const {
return !use_empty() && std::next(use_begin()) == use_end();		return !use_empty() && std::next(use_begin()) == use_end();
}		}

▲ Show 20 Lines • Show All 1,694 Lines • Show Last 20 Lines

include/llvm/CodeGen/TargetLowering.h

Show All 23 Lines
#define LLVM_CODEGEN_TARGETLOWERING_H		#define LLVM_CODEGEN_TARGETLOWERING_H

#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/CodeGen/DAGCombine.h"		#include "llvm/CodeGen/DAGCombine.h"
#include "llvm/CodeGen/ISDOpcodes.h"		#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/MachineValueType.h"		#include "llvm/CodeGen/MachineValueType.h"
#include "llvm/CodeGen/RuntimeLibcalls.h"		#include "llvm/CodeGen/RuntimeLibcalls.h"
#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"		#include "llvm/CodeGen/SelectionDAGNodes.h"
#include "llvm/CodeGen/TargetCallingConv.h"		#include "llvm/CodeGen/TargetCallingConv.h"
#include "llvm/CodeGen/ValueTypes.h"		#include "llvm/CodeGen/ValueTypes.h"
▲ Show 20 Lines • Show All 2,511 Lines • ▼ Show 20 Lines	public:
TargetLowering(const TargetLowering &) = delete;		TargetLowering(const TargetLowering &) = delete;
TargetLowering &operator=(const TargetLowering &) = delete;		TargetLowering &operator=(const TargetLowering &) = delete;

/// NOTE: The TargetMachine owns TLOF.		/// NOTE: The TargetMachine owns TLOF.
explicit TargetLowering(const TargetMachine &TM);		explicit TargetLowering(const TargetMachine &TM);

bool isPositionIndependent() const;		bool isPositionIndependent() const;

		virtual bool isSDNodeSourceOfDivergence(const SDNode * N,
		FunctionLoweringInfo * FLI, DivergenceAnalysis * DA) const {
		efriedma: Weird indentation; try clang-format?
		return false;
		}

		virtual bool isSDNodeAlwaysUniform(const SDNode * N) const {
		return false;
		}

/// Returns true by value, base pointer and offset pointer and addressing mode		/// Returns true by value, base pointer and offset pointer and addressing mode
/// by reference if the node's address can be legally represented as		/// by reference if the node's address can be legally represented as
/// pre-indexed load / store address.		/// pre-indexed load / store address.
virtual bool getPreIndexedAddressParts(SDNode * /*N*/, SDValue &/*Base*/,		virtual bool getPreIndexedAddressParts(SDNode * /*N*/, SDValue &/*Base*/,
SDValue &/*Offset*/,		SDValue &/*Offset*/,
ISD::MemIndexedMode &/*AM*/,		ISD::MemIndexedMode &/*AM*/,
SelectionDAG &/*DAG*/) const {		SelectionDAG &/*DAG*/) const {
return false;		return false;
▲ Show 20 Lines • Show All 989 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp

Show First 20 Lines • Show All 541 Lines • ▼ Show 20 Lines	FunctionLoweringInfo::getOrCreateSwiftErrorVRegUseAt(const Instruction *I, const MachineBasicBlock *MBB, const Value *Val) {
auto It = SwiftErrorVRegDefUses.find(Key);		auto It = SwiftErrorVRegDefUses.find(Key);
if (It == SwiftErrorVRegDefUses.end()) {		if (It == SwiftErrorVRegDefUses.end()) {
unsigned VReg = getOrCreateSwiftErrorVReg(MBB, Val);		unsigned VReg = getOrCreateSwiftErrorVReg(MBB, Val);
SwiftErrorVRegDefUses[Key] = VReg;		SwiftErrorVRegDefUses[Key] = VReg;
return std::make_pair(VReg, true);		return std::make_pair(VReg, true);
}		}
return std::make_pair(It->second, false);		return std::make_pair(It->second, false);
}		}

		const Value *
		FunctionLoweringInfo::getValueFromVirtualReg(unsigned Vreg) {
		if (VirtReg2Value.empty()) {
		for (auto &P : ValueMap) {
		VirtReg2Value[P.second] = P.first;
		}
		}
		return VirtReg2Value[Vreg];
		}

lib/CodeGen/SelectionDAG/SelectionDAG.cpp


Show First 20 Lines • Show All 898 Lines • ▼ Show 20 Lines	: TM(tm), OptLevel(OL),
EntryNode(ISD::EntryToken, 0, DebugLoc(), getVTList(MVT::Other)),		EntryNode(ISD::EntryToken, 0, DebugLoc(), getVTList(MVT::Other)),
Root(getEntryNode()) {		Root(getEntryNode()) {
InsertNode(&EntryNode);		InsertNode(&EntryNode);
DbgInfo = new SDDbgInfo();		DbgInfo = new SDDbgInfo();
}		}

void SelectionDAG::init(MachineFunction &NewMF,		void SelectionDAG::init(MachineFunction &NewMF,
OptimizationRemarkEmitter &NewORE,		OptimizationRemarkEmitter &NewORE,
Pass *PassPtr, const TargetLibraryInfo *LibraryInfo) {		Pass *PassPtr, const TargetLibraryInfo *LibraryInfo,
		DivergenceAnalysis * Divergence) {
MF = &NewMF;		MF = &NewMF;
SDAGISelPass = PassPtr;		SDAGISelPass = PassPtr;
ORE = &NewORE;		ORE = &NewORE;
TLI = getSubtarget().getTargetLowering();		TLI = getSubtarget().getTargetLowering();
TSI = getSubtarget().getSelectionDAGInfo();		TSI = getSubtarget().getSelectionDAGInfo();
LibInfo = LibraryInfo;		LibInfo = LibraryInfo;
Context = &MF->getFunction().getContext();		Context = &MF->getFunction().getContext();
		DA = Divergence;
}		}

SelectionDAG::~SelectionDAG() {		SelectionDAG::~SelectionDAG() {
assert(!UpdateListeners && "Dangling registered DAGUpdateListeners");		assert(!UpdateListeners && "Dangling registered DAGUpdateListeners");
allnodes_clear();		allnodes_clear();
OperandRecycler.clear(OperandAllocator);		OperandRecycler.clear(OperandAllocator);
delete DbgInfo;		delete DbgInfo;
}		}
▲ Show 20 Lines • Show All 739 Lines • ▼ Show 20 Lines	SDValue SelectionDAG::getRegister(unsigned RegNo, EVT VT) {
FoldingSetNodeID ID;		FoldingSetNodeID ID;
AddNodeIDNode(ID, ISD::Register, getVTList(VT), None);		AddNodeIDNode(ID, ISD::Register, getVTList(VT), None);
ID.AddInteger(RegNo);		ID.AddInteger(RegNo);
void *IP = nullptr;		void *IP = nullptr;
if (SDNode *E = FindNodeOrInsertPos(ID, IP))		if (SDNode *E = FindNodeOrInsertPos(ID, IP))
return SDValue(E, 0);		return SDValue(E, 0);

auto *N = newSDNode<RegisterSDNode>(RegNo, VT);		auto *N = newSDNode<RegisterSDNode>(RegNo, VT);
		N->SDNodeBits.IsDivergent = TLI->isSDNodeSourceOfDivergence(N, FLI, DA);
CSEMap.InsertNode(N, IP);		CSEMap.InsertNode(N, IP);
InsertNode(N);		InsertNode(N);
return SDValue(N, 0);		return SDValue(N, 0);
}		}

SDValue SelectionDAG::getRegisterMask(const uint32_t *RegMask) {		SDValue SelectionDAG::getRegisterMask(const uint32_t *RegMask) {
FoldingSetNodeID ID;		FoldingSetNodeID ID;
AddNodeIDNode(ID, ISD::RegisterMask, getVTList(MVT::Untyped), None);		AddNodeIDNode(ID, ISD::RegisterMask, getVTList(MVT::Untyped), None);
▲ Show 20 Lines • Show All 4,917 Lines • ▼ Show 20 Lines	if (!RemoveNodeFromCSEMaps(N))
InsertPos = nullptr;		InsertPos = nullptr;

// Now we update the operands.		// Now we update the operands.
if (N->OperandList[0] != Op1)		if (N->OperandList[0] != Op1)
N->OperandList[0].set(Op1);		N->OperandList[0].set(Op1);
if (N->OperandList[1] != Op2)		if (N->OperandList[1] != Op2)
N->OperandList[1].set(Op2);		N->OperandList[1].set(Op2);

		updateDivergence(N);
// If this gets put into a CSE map, add it.		// If this gets put into a CSE map, add it.
if (InsertPos) CSEMap.InsertNode(N, InsertPos);		if (InsertPos) CSEMap.InsertNode(N, InsertPos);
return N;		return N;
}		}

SDNode *SelectionDAG::		SDNode *SelectionDAG::
UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2, SDValue Op3) {		UpdateNodeOperands(SDNode *N, SDValue Op1, SDValue Op2, SDValue Op3) {
SDValue Ops[] = { Op1, Op2, Op3 };		SDValue Ops[] = { Op1, Op2, Op3 };
▲ Show 20 Lines • Show All 625 Lines • ▼ Show 20 Lines	while (UI != UE) {
// A user can appear in a use list multiple times, and when this		// A user can appear in a use list multiple times, and when this
// happens the uses are usually next to each other in the list.		// happens the uses are usually next to each other in the list.
// To help reduce the number of CSE recomputations, process all		// To help reduce the number of CSE recomputations, process all
// the uses of this user that we can find this way.		// the uses of this user that we can find this way.
do {		do {
SDUse &Use = UI.getUse();		SDUse &Use = UI.getUse();
++UI;		++UI;
Use.set(To);		Use.set(To);
		if (To->isDivergent() != From->isDivergent())
		updateDivergence(User);
} while (UI != UE && *UI == User);		} while (UI != UE && *UI == User);

// Now that we have modified User, add it back to the CSE maps. If it		// Now that we have modified User, add it back to the CSE maps. If it
// already exists there, recursively merge the results together.		// already exists there, recursively merge the results together.
AddModifiedNodeToCSEMaps(User);		AddModifiedNodeToCSEMaps(User);
}		}

// If we just RAUW'd the root, take note.		// If we just RAUW'd the root, take note.
if (FromN == getRoot())		if (FromN == getRoot())
setRoot(To);		setRoot(To);
Show All 37 Lines	while (UI != UE) {
// A user can appear in a use list multiple times, and when this		// A user can appear in a use list multiple times, and when this
// happens the uses are usually next to each other in the list.		// happens the uses are usually next to each other in the list.
// To help reduce the number of CSE recomputations, process all		// To help reduce the number of CSE recomputations, process all
// the uses of this user that we can find this way.		// the uses of this user that we can find this way.
do {		do {
SDUse &Use = UI.getUse();		SDUse &Use = UI.getUse();
++UI;		++UI;
Use.setNode(To);		Use.setNode(To);
		if (To->isDivergent() != From->isDivergent())
		updateDivergence(User);
} while (UI != UE && *UI == User);		} while (UI != UE && *UI == User);

// Now that we have modified User, add it back to the CSE maps. If it		// Now that we have modified User, add it back to the CSE maps. If it
// already exists there, recursively merge the results together.		// already exists there, recursively merge the results together.
AddModifiedNodeToCSEMaps(User);		AddModifiedNodeToCSEMaps(User);
}		}

// If we just RAUW'd the root, take note.		// If we just RAUW'd the root, take note.
Show All 28 Lines	while (UI != UE) {
// happens the uses are usually next to each other in the list.		// happens the uses are usually next to each other in the list.
// To help reduce the number of CSE recomputations, process all		// To help reduce the number of CSE recomputations, process all
// the uses of this user that we can find this way.		// the uses of this user that we can find this way.
do {		do {
SDUse &Use = UI.getUse();		SDUse &Use = UI.getUse();
const SDValue &ToOp = To[Use.getResNo()];		const SDValue &ToOp = To[Use.getResNo()];
++UI;		++UI;
Use.set(ToOp);		Use.set(ToOp);
		if (To->getNode()->isDivergent() != From->isDivergent())
		updateDivergence(User);
} while (UI != UE && *UI == User);		} while (UI != UE && *UI == User);

// Now that we have modified User, add it back to the CSE maps. If it		// Now that we have modified User, add it back to the CSE maps. If it
		efriedma: Missing code to unset IsDivergent if a node becomes non-divergent, and missing code to recursively propagate changes.
// already exists there, recursively merge the results together.		// already exists there, recursively merge the results together.
AddModifiedNodeToCSEMaps(User);		AddModifiedNodeToCSEMaps(User);
}		}

// If we just RAUW'd the root, take note.		// If we just RAUW'd the root, take note.
if (From == getRoot().getNode())		if (From == getRoot().getNode())
setRoot(SDValue(To[getRoot().getResNo()]));		setRoot(SDValue(To[getRoot().getResNo()]));
}		}
Show All 40 Lines	do {
// so remove its old self from the CSE maps.		// so remove its old self from the CSE maps.
if (!UserRemovedFromCSEMaps) {		if (!UserRemovedFromCSEMaps) {
RemoveNodeFromCSEMaps(User);		RemoveNodeFromCSEMaps(User);
UserRemovedFromCSEMaps = true;		UserRemovedFromCSEMaps = true;
}		}

++UI;		++UI;
Use.set(To);		Use.set(To);
		if (To->isDivergent() != From->isDivergent())
		updateDivergence(User);
} while (UI != UE && *UI == User);		} while (UI != UE && *UI == User);

// We are iterating over all uses of the From node, so if a use		// We are iterating over all uses of the From node, so if a use
// doesn't use the specific value, no changes are made.		// doesn't use the specific value, no changes are made.
if (!UserRemovedFromCSEMaps)		if (!UserRemovedFromCSEMaps)
continue;		continue;

// Now that we have modified User, add it back to the CSE maps. If it		// Now that we have modified User, add it back to the CSE maps. If it
// already exists there, recursively merge the results together.		// already exists there, recursively merge the results together.
AddModifiedNodeToCSEMaps(User);		AddModifiedNodeToCSEMaps(User);
Show All 16 Lines	namespace {

/// operator< - Sort Memos by User.		/// operator< - Sort Memos by User.
bool operator<(const UseMemo &L, const UseMemo &R) {		bool operator<(const UseMemo &L, const UseMemo &R) {
return (intptr_t)L.User < (intptr_t)R.User;		return (intptr_t)L.User < (intptr_t)R.User;
}		}

} // end anonymous namespace		} // end anonymous namespace

		void SelectionDAG::updateDivergence(SDNode * N)
		{
		if (TLI->isSDNodeAlwaysUniform(N))
		return;
		bool IsDivergent = TLI->isSDNodeSourceOfDivergence(N, FLI, DA);
		for (auto &Op : N->ops()) {
		if (Op.Val.getValueType() != MVT::Other)
		IsDivergent |= Op.getNode()->isDivergent();
		}
		if (N->SDNodeBits.IsDivergent != IsDivergent) {
		N->SDNodeBits.IsDivergent = IsDivergent;
		for (auto U : N->uses()) {
		updateDivergence(U);
		}
		}
		}
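
To make the propagation in updateDivergence easier to follow, here is a minimal standalone sketch of the same idea. It is not the patch's code: `ToyNode`, `addOperand` and `replaceOperand` are hypothetical names, chain operands are not modelled, and the two boolean flags merely stand in for the `isSDNodeSourceOfDivergence` and `isSDNodeAlwaysUniform` target hooks.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for SDNode (hypothetical, not the LLVM class).
struct ToyNode {
  bool IsSource = false;        // stands in for isSDNodeSourceOfDivergence
  bool AlwaysUniform = false;   // stands in for isSDNodeAlwaysUniform
  bool IsDivergent = false;     // the cached divergence bit
  std::vector<ToyNode *> Ops;   // operands
  std::vector<ToyNode *> Users; // reverse edges
};

// Link User -> Op in both directions.
inline void addOperand(ToyNode &User, ToyNode &Op) {
  User.Ops.push_back(&Op);
  Op.Users.push_back(&User);
}

// Recompute N's bit from its operands and push any change to its users.
// The bit is recomputed rather than OR-ed into the old value, so it can
// flip from true back to false.
inline void updateDivergence(ToyNode &N) {
  if (N.AlwaysUniform)
    return;
  bool IsDivergent = N.IsSource;
  for (const ToyNode *Op : N.Ops)
    IsDivergent |= Op->IsDivergent;
  if (N.IsDivergent == IsDivergent)
    return; // unchanged, so no user can change either
  N.IsDivergent = IsDivergent;
  for (ToyNode *U : N.Users)
    updateDivergence(*U);
}

// Replace operand Idx of User with To, then refresh the cached bits.
inline void replaceOperand(ToyNode &User, unsigned Idx, ToyNode &To) {
  ToyNode *From = User.Ops[Idx];
  // Drop one reverse edge From -> User.
  for (auto It = From->Users.begin(); It != From->Users.end(); ++It)
    if (*It == &User) {
      From->Users.erase(It);
      break;
    }
  User.Ops[Idx] = &To;
  To.Users.push_back(&User);
  updateDivergence(User);
}
```

In this sketch, replacing a divergent operand with a uniform one clears the bit on the user and on everything that depends on it, and the recursion stops at the first node whose bit does not change.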

		void SelectionDAG::VerifyDAGDiverence()
		{
		const TargetLowering &TLI = getTargetLoweringInfo();
		DenseMap<const SDNode *, bool> DivergenceMap;
		for (auto &N : allnodes()) {
		DivergenceMap[&N] = false;
		}
		bool Changed = true;
		while (Changed) {
		Changed = false;
		for (auto &N : allnodes()) {
		bool IsDivergent = DivergenceMap[&N];
		bool IsSDNodeDivergent = TLI.isSDNodeSourceOfDivergence(&N, FLI, DA);
		for (auto &Op : N.ops()) {
		if (Op.Val.getValueType() != MVT::Other)
		IsSDNodeDivergent |= DivergenceMap[Op.getNode()];
		}
		if (!IsDivergent && IsSDNodeDivergent && !TLI.isSDNodeAlwaysUniform(&N)) {
		DivergenceMap[&N] = true;
		Changed = true;
		}
		}
		}
		for (auto &N : allnodes()) {
		assert(DivergenceMap[&N] == N.isDivergent() && "Divergence bit inconsistency detected\n");
		}
		}
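
The verifier above recomputes divergence from scratch with a fixed-point iteration and compares the result with the cached bits. A minimal standalone sketch of that check follows. `ToyDAG`, `computeDivergence` and `verifyDivergence` are hypothetical names, nodes are plain indices, and the flag vectors stand in for the target hooks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy DAG: node I has operand indices Ops[I]. Not the LLVM data structure.
struct ToyDAG {
  std::vector<std::vector<std::size_t>> Ops;
  std::vector<bool> IsSource;      // hypothetical "source of divergence" flag
  std::vector<bool> AlwaysUniform; // hypothetical "always uniform" flag
  std::vector<bool> CachedBit;     // the bit stored on each node
};

// Start from "nothing is divergent" and iterate until no bit changes.
inline std::vector<bool> computeDivergence(const ToyDAG &G) {
  std::vector<bool> Div(G.Ops.size(), false);
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t N = 0; N != G.Ops.size(); ++N) {
      if (Div[N] || G.AlwaysUniform[N])
        continue; // already divergent, or forced uniform
      bool D = G.IsSource[N];
      for (std::size_t Op : G.Ops[N])
        D = D || Div[Op];
      if (D) {
        Div[N] = true;
        Changed = true;
      }
    }
  }
  return Div;
}

// True when every cached bit matches the recomputed value.
inline bool verifyDivergence(const ToyDAG &G) {
  return computeDivergence(G) == G.CachedBit;
}
```

Bits only ever go from false to true during the iteration, so the loop terminates after at most one pass per node plus a final pass that changes nothing.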


/// ReplaceAllUsesOfValuesWith - Replace any uses of From with To, leaving		/// ReplaceAllUsesOfValuesWith - Replace any uses of From with To, leaving
/// uses of other values produced by From.getNode() alone. The same value		/// uses of other values produced by From.getNode() alone. The same value
/// may appear in both the From and To list. The Deleted vector is		/// may appear in both the From and To list. The Deleted vector is
/// handled the same way as for ReplaceAllUsesWith.		/// handled the same way as for ReplaceAllUsesWith.
void SelectionDAG::ReplaceAllUsesOfValuesWith(const SDValue *From,		void SelectionDAG::ReplaceAllUsesOfValuesWith(const SDValue *From,
const SDValue *To,		const SDValue *To,
unsigned Num){		unsigned Num){
// Handle the simple, trivial case efficiently.		// Handle the simple, trivial case efficiently.
if (Num == 1)		if (Num == 1)
return ReplaceAllUsesOfValueWith(From, To);		return ReplaceAllUsesOfValueWith(From, To);

transferDbgValues(From, To);		transferDbgValues(From, To);

// Read up all the uses and make records of them. This helps		// Read up all the uses and make records of them. This helps
// processing new uses that are introduced during the		// processing new uses that are introduced during the
// replacement process.		// replacement process.
		efriedma: This is exactly what I was looking for; thanks.
SmallVector<UseMemo, 4> Uses;		SmallVector<UseMemo, 4> Uses;
for (unsigned i = 0; i != Num; ++i) {		for (unsigned i = 0; i != Num; ++i) {
unsigned FromResNo = From[i].getResNo();		unsigned FromResNo = From[i].getResNo();
SDNode *FromNode = From[i].getNode();		SDNode *FromNode = From[i].getNode();
for (SDNode::use_iterator UI = FromNode->use_begin(),		for (SDNode::use_iterator UI = FromNode->use_begin(),
E = FromNode->use_end(); UI != E; ++UI) {		E = FromNode->use_end(); UI != E; ++UI) {
SDUse &Use = UI.getUse();		SDUse &Use = UI.getUse();
if (Use.getResNo() == FromResNo) {		if (Use.getResNo() == FromResNo) {
▲ Show 20 Lines • Show All 773 Lines • ▼ Show 20 Lines	if (isa<ConstantFPSDNode>(N))
return N.getNode();		return N.getNode();

if (ISD::isBuildVectorOfConstantFPSDNodes(N.getNode()))		if (ISD::isBuildVectorOfConstantFPSDNodes(N.getNode()))
return N.getNode();		return N.getNode();

return nullptr;		return nullptr;
}		}

		void SelectionDAG::createOperands(SDNode *Node, ArrayRef<SDValue> Vals) {
		assert(!Node->OperandList && "Node already has operands");
		SDUse *Ops = OperandRecycler.allocate(
		ArrayRecycler<SDUse>::Capacity::get(Vals.size()), OperandAllocator);

		bool IsDivergent = false;
		for (unsigned I = 0; I != Vals.size(); ++I) {
		Ops[I].setUser(Node);
		Ops[I].setInitial(Vals[I]);
		if (Ops[I].Val.getValueType() != MVT::Other) // Skip Chain. It does not carry divergence.
		IsDivergent = IsDivergent || Ops[I].getNode()->isDivergent();
		}
		Node->NumOperands = Vals.size();
		Node->OperandList = Ops;
		IsDivergent |= TLI->isSDNodeSourceOfDivergence(Node, FLI, DA);
		if (!TLI->isSDNodeAlwaysUniform(Node))
		efriedma: This is good; I'm happy we're cleanly computing divergence for a DAG node.
		Node->SDNodeBits.IsDivergent = IsDivergent;
		checkForCycles(Node);
		}

#ifndef NDEBUG		#ifndef NDEBUG
static void checkForCyclesHelper(const SDNode *N,		static void checkForCyclesHelper(const SDNode *N,
SmallPtrSetImpl<const SDNode*> &Visited,		SmallPtrSetImpl<const SDNode*> &Visited,
SmallPtrSetImpl<const SDNode*> &Checked,		SmallPtrSetImpl<const SDNode*> &Checked,
const llvm::SelectionDAG *DAG) {		const llvm::SelectionDAG *DAG) {
// If this node has already been checked, don't check it again.		// If this node has already been checked, don't check it again.
if (Checked.count(N))		if (Checked.count(N))
return;		return;
Show All 38 Lines

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Show First 20 Lines • Show All 623 Lines • ▼ Show 20 Lines	void SDNode::print_details(raw_ostream &OS, const SelectionDAG *G) const {
}		}

if (VerboseDAGDumping) {		if (VerboseDAGDumping) {
if (unsigned Order = getIROrder())		if (unsigned Order = getIROrder())
OS << " [ORD=" << Order << ']';		OS << " [ORD=" << Order << ']';

if (getNodeId() != -1)		if (getNodeId() != -1)
OS << " [ID=" << getNodeId() << ']';		OS << " [ID=" << getNodeId() << ']';
		if (!(isa<ConstantSDNode>(this) || (isa<ConstantFPSDNode>(this))))
		OS << "# D:" << isDivergent();

if (!G)		if (!G)
return;		return;

DILocation *L = getDebugLoc();		DILocation *L = getDebugLoc();
if (!L)		if (!L)
return;		return;

▲ Show 20 Lines • Show All 164 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

Show First 20 Lines • Show All 325 Lines • ▼ Show 20 Lines	if (OptLevel != CodeGenOpt::None)
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<GCModuleInfo>();		AU.addRequired<GCModuleInfo>();
AU.addRequired<StackProtector>();		AU.addRequired<StackProtector>();
AU.addPreserved<StackProtector>();		AU.addPreserved<StackProtector>();
AU.addPreserved<GCModuleInfo>();		AU.addPreserved<GCModuleInfo>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
if (UseMBPI && OptLevel != CodeGenOpt::None)		if (UseMBPI && OptLevel != CodeGenOpt::None)
AU.addRequired<BranchProbabilityInfoWrapperPass>();		AU.addRequired<BranchProbabilityInfoWrapperPass>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
		rampitec: The analysis is pretty expensive, but not needed by all targets. There is TTI.hasBranchDivergence(). How about adding it as required only if TTI.hasBranchDivergence()? It also means you will need to default isDivergent to 0 if the analysis is unavailable.
		alex-t (author): Not sure if it makes sense. DA.runOnFunction itself bails out if the target has no divergence:
		    bool DivergenceAnalysis::runOnFunction(Function &F) {
		      auto *TTIWP = getAnalysisIfAvailable<TargetTransformInfoWrapperPass>();
		      if (TTIWP == nullptr)
		        return false;
		      TargetTransformInfo &TTI = TTIWP->getTTI(F);
		      // Fast path: if the target does not have branch divergence, we do not mark
		      // any branch as divergent.
		      if (!TTI.hasBranchDivergence())
		        return false;
		rampitec: It bails, yet it depends on DominatorTree on its own, so adding it as a required pass will cause DT to be built.
		vpykhtin: This isn't fixed yet.
}		}

/// SplitCriticalSideEffectEdges - Look for critical edges with a PHI value that		/// SplitCriticalSideEffectEdges - Look for critical edges with a PHI value that
/// may trap on it. In this case we have to split the edge so that the path		/// may trap on it. In this case we have to split the edge so that the path
/// through the predecessor block that doesn't go to the phi block doesn't		/// through the predecessor block that doesn't go to the phi block doesn't
/// execute the possibly trapping instruction. If available, we pass domtree		/// execute the possibly trapping instruction. If available, we pass domtree
/// and loop info to be updated when we split critical edges. This is because		/// and loop info to be updated when we split critical edges. This is because
/// SelectionDAGISel preserves these analyses.		/// SelectionDAGISel preserves these analyses.
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	bool SelectionDAGISel::runOnMachineFunction(MachineFunction &mf) {
DominatorTree *DT = DTWP ? &DTWP->getDomTree() : nullptr;		DominatorTree *DT = DTWP ? &DTWP->getDomTree() : nullptr;
auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();		auto *LIWP = getAnalysisIfAvailable<LoopInfoWrapperPass>();
LoopInfo *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;		LoopInfo *LI = LIWP ? &LIWP->getLoopInfo() : nullptr;

DEBUG(dbgs() << "\n\n\n=== " << Fn.getName() << "\n");		DEBUG(dbgs() << "\n\n\n=== " << Fn.getName() << "\n");

SplitCriticalSideEffectEdges(const_cast<Function &>(Fn), DT, LI);		SplitCriticalSideEffectEdges(const_cast<Function &>(Fn), DT, LI);

CurDAG->init(MF, ORE, this, LibInfo);		CurDAG->init(MF, ORE, this, LibInfo,
		getAnalysisIfAvailable<DivergenceAnalysis>());
FuncInfo->set(Fn, *MF, CurDAG);		FuncInfo->set(Fn, *MF, CurDAG);

// Now get the optional analyzes if we want to.		// Now get the optional analyzes if we want to.
// This is based on the possibly changed OptLevel (after optnone is taken		// This is based on the possibly changed OptLevel (after optnone is taken
// into account). That's unfortunate but OK because it just means we won't		// into account). That's unfortunate but OK because it just means we won't
// ask for passes that have been required anyway.		// ask for passes that have been required anyway.

if (UseMBPI && OptLevel != CodeGenOpt::None)		if (UseMBPI && OptLevel != CodeGenOpt::None)
▲ Show 20 Lines • Show All 313 Lines • ▼ Show 20 Lines	#endif

// Run the DAG combiner in pre-legalize mode.		// Run the DAG combiner in pre-legalize mode.
{		{
NamedRegionTimer T("combine1", "DAG Combining 1", GroupName,		NamedRegionTimer T("combine1", "DAG Combining 1", GroupName,
GroupDescription, TimePassesIsEnabled);		GroupDescription, TimePassesIsEnabled);
CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);		CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);
}		}

		CurDAG->VerifyDAGDiverence();
		efriedma: Maybe `#ifndef NDEBUG`. Can we somehow skip this for targets which don't use divergence information?
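
One possible shape for the guard suggested in this comment is sketched below, as a standalone toy rather than the patch's code. `verifyDAGDivergence`, `verifyCalls` and `afterCombine` are hypothetical names, and the boolean parameter stands in for a query such as TTI.hasBranchDivergence().

```cpp
#include <cassert>

// Hypothetical call counter, only here to make the effect observable.
inline int &verifyCalls() {
  static int Calls = 0;
  return Calls;
}

// Stand-in for the (potentially expensive) whole-DAG verifier.
inline void verifyDAGDivergence() { ++verifyCalls(); }

// Run the verifier only in assertion-enabled builds, and only for targets
// that actually have branch divergence.
inline void afterCombine(bool TargetHasBranchDivergence) {
#ifndef NDEBUG
  if (TargetHasBranchDivergence)
    verifyDAGDivergence();
#else
  (void)TargetHasBranchDivergence; // silence the unused-parameter warning
#endif
}
```

With this arrangement, release builds pay nothing for the check, and debug builds skip it on targets without divergence.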

DEBUG(dbgs() << "Optimized lowered selection DAG: "		DEBUG(dbgs() << "Optimized lowered selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());

// Second step, hack on the DAG until it only uses operations and types that		// Second step, hack on the DAG until it only uses operations and types that
// the target supports.		// the target supports.
if (ViewLegalizeTypesDAGs && MatchFilterBB)		if (ViewLegalizeTypesDAGs && MatchFilterBB)
CurDAG->viewGraph("legalize-types input for " + BlockName);		CurDAG->viewGraph("legalize-types input for " + BlockName);

bool Changed;		bool Changed;
{		{
NamedRegionTimer T("legalize_types", "Type Legalization", GroupName,		NamedRegionTimer T("legalize_types", "Type Legalization", GroupName,
GroupDescription, TimePassesIsEnabled);		GroupDescription, TimePassesIsEnabled);
Changed = CurDAG->LegalizeTypes();		Changed = CurDAG->LegalizeTypes();
}		}

		CurDAG->VerifyDAGDiverence();

DEBUG(dbgs() << "Type-legalized selection DAG: "		DEBUG(dbgs() << "Type-legalized selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());

// Only allow creation of legal node types.		// Only allow creation of legal node types.
CurDAG->NewNodesMustHaveLegalTypes = true;		CurDAG->NewNodesMustHaveLegalTypes = true;

if (Changed) {		if (Changed) {
if (ViewDAGCombineLT && MatchFilterBB)		if (ViewDAGCombineLT && MatchFilterBB)
CurDAG->viewGraph("dag-combine-lt input for " + BlockName);		CurDAG->viewGraph("dag-combine-lt input for " + BlockName);

// Run the DAG combiner in post-type-legalize mode.		// Run the DAG combiner in post-type-legalize mode.
{		{
NamedRegionTimer T("combine_lt", "DAG Combining after legalize types",		NamedRegionTimer T("combine_lt", "DAG Combining after legalize types",
GroupName, GroupDescription, TimePassesIsEnabled);		GroupName, GroupDescription, TimePassesIsEnabled);
CurDAG->Combine(AfterLegalizeTypes, AA, OptLevel);		CurDAG->Combine(AfterLegalizeTypes, AA, OptLevel);
}		}

		CurDAG->VerifyDAGDiverence();

DEBUG(dbgs() << "Optimized type-legalized selection DAG: "		DEBUG(dbgs() << "Optimized type-legalized selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());
}		}

{		{
NamedRegionTimer T("legalize_vec", "Vector Legalization", GroupName,		NamedRegionTimer T("legalize_vec", "Vector Legalization", GroupName,
Show All 27 Lines	// Run the DAG combiner in post-type-legalize mode.
GroupName, GroupDescription, TimePassesIsEnabled);		GroupName, GroupDescription, TimePassesIsEnabled);
CurDAG->Combine(AfterLegalizeVectorOps, AA, OptLevel);		CurDAG->Combine(AfterLegalizeVectorOps, AA, OptLevel);
}		}

DEBUG(dbgs() << "Optimized vector-legalized selection DAG: "		DEBUG(dbgs() << "Optimized vector-legalized selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());

		CurDAG->VerifyDAGDiverence();
}		}

if (ViewLegalizeDAGs && MatchFilterBB)		if (ViewLegalizeDAGs && MatchFilterBB)
CurDAG->viewGraph("legalize input for " + BlockName);		CurDAG->viewGraph("legalize input for " + BlockName);

{		{
NamedRegionTimer T("legalize", "DAG Legalization", GroupName,		NamedRegionTimer T("legalize", "DAG Legalization", GroupName,
GroupDescription, TimePassesIsEnabled);		GroupDescription, TimePassesIsEnabled);
CurDAG->Legalize();		CurDAG->Legalize();
}		}

		CurDAG->VerifyDAGDiverence();

DEBUG(dbgs() << "Legalized selection DAG: "		DEBUG(dbgs() << "Legalized selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());

if (ViewDAGCombine2 && MatchFilterBB)		if (ViewDAGCombine2 && MatchFilterBB)
CurDAG->viewGraph("dag-combine2 input for " + BlockName);		CurDAG->viewGraph("dag-combine2 input for " + BlockName);

// Run the DAG combiner in post-legalize mode.		// Run the DAG combiner in post-legalize mode.
{		{
NamedRegionTimer T("combine2", "DAG Combining 2", GroupName,		NamedRegionTimer T("combine2", "DAG Combining 2", GroupName,
GroupDescription, TimePassesIsEnabled);		GroupDescription, TimePassesIsEnabled);
CurDAG->Combine(AfterLegalizeDAG, AA, OptLevel);		CurDAG->Combine(AfterLegalizeDAG, AA, OptLevel);
}		}

		CurDAG->VerifyDAGDiverence();

DEBUG(dbgs() << "Optimized legalized selection DAG: "		DEBUG(dbgs() << "Optimized legalized selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
CurDAG->dump());		CurDAG->dump());

if (OptLevel != CodeGenOpt::None)		if (OptLevel != CodeGenOpt::None)
ComputeLiveOutVRegInfo();		ComputeLiveOutVRegInfo();

▲ Show 20 Lines • Show All 536 Lines • ▼ Show 20 Lines	void SelectionDAGISel::SelectAllBasicBlocks(const Function &Fn) {
// first.		// first.
assert(*RPOT.begin() == &Fn.getEntryBlock());		assert(*RPOT.begin() == &Fn.getEntryBlock());
++NumEntryBlocks;		++NumEntryBlocks;

// Set up FuncInfo for ISel. Entry blocks never have PHIs.		// Set up FuncInfo for ISel. Entry blocks never have PHIs.
FuncInfo->MBB = FuncInfo->MBBMap[&Fn.getEntryBlock()];		FuncInfo->MBB = FuncInfo->MBBMap[&Fn.getEntryBlock()];
FuncInfo->InsertPt = FuncInfo->MBB->begin();		FuncInfo->InsertPt = FuncInfo->MBB->begin();

		CurDAG->setFunctionLoweringInfo(FuncInfo);

if (!FastIS) {		if (!FastIS) {
LowerArguments(Fn);		LowerArguments(Fn);
} else {		} else {
// See if fast isel can lower the arguments.		// See if fast isel can lower the arguments.
FastIS->startNewBlock();		FastIS->startNewBlock();
if (!FastIS->lowerArguments()) {		if (!FastIS->lowerArguments()) {
FastISelFailed = true;		FastISelFailed = true;
// Fast isel failed to lower these arguments		// Fast isel failed to lower these arguments
▲ Show 20 Lines • Show All 2,278 Lines • Show Last 20 Lines

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

[...]
#include "SIDefines.h"		#include "SIDefines.h"
#include "SIISelLowering.h"		#include "SIISelLowering.h"
#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
#include "SIMachineFunctionInfo.h"		#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"		#include "SIRegisterInfo.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
		#include "llvm/Analysis/DivergenceAnalysis.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/FunctionLoweringInfo.h"		#include "llvm/CodeGen/FunctionLoweringInfo.h"
#include "llvm/CodeGen/ISDOpcodes.h"		#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/MachineFunction.h"		#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MachineValueType.h"		#include "llvm/CodeGen/MachineValueType.h"
#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/SelectionDAGISel.h"		#include "llvm/CodeGen/SelectionDAGISel.h"
[...]	explicit AMDGPUDAGToDAGISel(TargetMachine *TM = nullptr,
: SelectionDAGISel(*TM, OptLevel) {		: SelectionDAGISel(*TM, OptLevel) {
AMDGPUASI = AMDGPU::getAMDGPUAS(*TM);		AMDGPUASI = AMDGPU::getAMDGPUAS(*TM);
EnableLateStructurizeCFG = AMDGPUTargetMachine::EnableLateStructurizeCFG;		EnableLateStructurizeCFG = AMDGPUTargetMachine::EnableLateStructurizeCFG;
}		}
~AMDGPUDAGToDAGISel() override = default;		~AMDGPUDAGToDAGISel() override = default;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AMDGPUArgumentUsageInfo>();		AU.addRequired<AMDGPUArgumentUsageInfo>();
		AU.addRequired<DivergenceAnalysis>();
SelectionDAGISel::getAnalysisUsage(AU);		SelectionDAGISel::getAnalysisUsage(AU);
}		}

bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;
void Select(SDNode *N) override;		void Select(SDNode *N) override;
StringRef getPassName() const override;		StringRef getPassName() const override;
void PostprocessISelDAG() override;		void PostprocessISelDAG() override;

[...]

lib/Target/AMDGPU/AMDGPUISelLowering.h

[...]	public:

bool storeOfVectorConstantIsCheap(EVT MemVT,		bool storeOfVectorConstantIsCheap(EVT MemVT,
unsigned NumElem,		unsigned NumElem,
unsigned AS) const override;		unsigned AS) const override;
bool aggressivelyPreferBuildVectorSources(EVT VecVT) const override;		bool aggressivelyPreferBuildVectorSources(EVT VecVT) const override;
bool isCheapToSpeculateCttz() const override;		bool isCheapToSpeculateCttz() const override;
bool isCheapToSpeculateCtlz() const override;		bool isCheapToSpeculateCtlz() const override;

		bool isSDNodeSourceOfDivergence(const SDNode * N,
		FunctionLoweringInfo * FLI, DivergenceAnalysis * DA) const;
		bool isSDNodeAlwaysUniform(const SDNode * N) const;
static CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool IsVarArg);		static CCAssignFn *CCAssignFnForCall(CallingConv::ID CC, bool IsVarArg);
static CCAssignFn *CCAssignFnForReturn(CallingConv::ID CC, bool IsVarArg);		static CCAssignFn *CCAssignFnForReturn(CallingConv::ID CC, bool IsVarArg);

SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,		SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,		const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals, const SDLoc &DL,		const SmallVectorImpl<SDValue> &OutVals, const SDLoc &DL,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;

[...]

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

	[...]
	#include "AMDGPUISelLowering.h"			#include "AMDGPUISelLowering.h"
	#include "AMDGPU.h"			#include "AMDGPU.h"
	#include "AMDGPUCallLowering.h"			#include "AMDGPUCallLowering.h"
	#include "AMDGPUFrameLowering.h"			#include "AMDGPUFrameLowering.h"
	#include "AMDGPUIntrinsicInfo.h"			#include "AMDGPUIntrinsicInfo.h"
	#include "AMDGPURegisterInfo.h"			#include "AMDGPURegisterInfo.h"
	#include "AMDGPUSubtarget.h"			#include "AMDGPUSubtarget.h"
	#include "AMDGPUTargetMachine.h"			#include "AMDGPUTargetMachine.h"
				#include "Utils/AMDGPUBaseInfo.h"
	#include "R600MachineFunctionInfo.h"			#include "R600MachineFunctionInfo.h"
	#include "SIInstrInfo.h"			#include "SIInstrInfo.h"
	#include "SIMachineFunctionInfo.h"			#include "SIMachineFunctionInfo.h"
	#include "llvm/CodeGen/CallingConvLower.h"			#include "llvm/CodeGen/CallingConvLower.h"
	#include "llvm/CodeGen/MachineFunction.h"			#include "llvm/CodeGen/MachineFunction.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
	#include "llvm/CodeGen/SelectionDAG.h"			#include "llvm/CodeGen/SelectionDAG.h"
	#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"			#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
	[...]
	bool AMDGPUTargetLowering::isCheapToSpeculateCttz() const {			bool AMDGPUTargetLowering::isCheapToSpeculateCttz() const {
	return true;			return true;
	}			}

	bool AMDGPUTargetLowering::isCheapToSpeculateCtlz() const {			bool AMDGPUTargetLowering::isCheapToSpeculateCtlz() const {
	return true;			return true;
	}			}

				bool AMDGPUTargetLowering::isSDNodeAlwaysUniform(const SDNode * N) const {
				switch (N->getOpcode()) {
				default:
				return false;
				case ISD::EntryToken:
				case ISD::TokenFactor:
				return true;
				case ISD::INTRINSIC_WO_CHAIN:
				{
				unsigned IntrID = cast<ConstantSDNode>(N->getOperand(0))->getZExtValue();
				switch (IntrID) {
				default:
				return false;
				case Intrinsic::amdgcn_readfirstlane:
				case Intrinsic::amdgcn_readlane:
				return true;
				}
				}
				break;
				case ISD::LOAD:
				{
				const LoadSDNode * L = dyn_cast<LoadSDNode>(N);
				if (L->getMemOperand()->getAddrSpace()
				== Subtarget->getAMDGPUAS().CONSTANT_ADDRESS_32BIT)
				return true;
				return false;
				}
				break;
				}
				}

				rampitec: SIRegisterInfo::isVGPR()
				alex-t: What should we do for R600Subtarget?
				rampitec: Since we are not going to change selection in R600, it practically does not matter what we do in this case.
				bool AMDGPUTargetLowering::isSDNodeSourceOfDivergence(const SDNode * N,
				FunctionLoweringInfo * FLI, DivergenceAnalysis * DA) const
				{
				switch (N->getOpcode()) {
				case ISD::Register:
				rampitec: I am afraid it is not true to say that a VGPR is necessarily divergent.
				alex-t: That's not true. Do we have a means to detect that this is a splat vector? If not, I'd stay with the conservative approach that considers all VGPRs divergent. Alternatively, we could add one more target hook to query for special VGPRs that are uniform.
				rampitec: I doubt you can reliably detect it. The concern is potential unneeded moves and readfirstlane instructions, the one thing that we are trying to avoid here.
				rampitec: I withdraw this objection. Apparently this is all about physregs, and we do not have a lot of them at this stage.
				case ISD::CopyFromReg:
				{
				const RegisterSDNode *R = nullptr;
				if (N->getOpcode() == ISD::Register) {
				R = dyn_cast<RegisterSDNode>(N);
				}
				else {
				R = dyn_cast<RegisterSDNode>(N->getOperand(1));
				rampitec: !DA || DA->isDivergent(...). You are using getAnalysisIfAvailable, so it can be missing.
				alex-t: If DA == nullptr in the case above, we'd return ((bool)!DA), i.e. true? Maybe it's better to return false for the targets that have no DA? I mean "DA && DA->isDivergent()": if we have no DA, we return false. If we have one, the returned value is defined by the isDivergent result.
				rampitec: It is conservatively correct to return true. Presumably targets w/o DA will have no use of the bit anyway, but if they are it is dangerous to assume uniformness.
				alex-t: "targets w/o DA will have no use of the bit anyway, but if they are it is dangerous" sounds a bit paranoid :) I just noted that returning "true" for a target that has no divergence at all looks misleading.
				alex-t: Moreover, targets that have no DA have not overridden isSDNodeSourceOfDivergence either and will never get here.
				rampitec: What if DA is invalidated and thus NULL, even for targets with divergence?
				}
				if (R)
				{
				const MachineFunction * MF = FLI->MF;
				const SISubtarget &ST = MF->getSubtarget<SISubtarget>();
				const MachineRegisterInfo &MRI = MF->getRegInfo();
				const SIRegisterInfo &TRI = ST.getInstrInfo()->getRegisterInfo();
				unsigned Reg = R->getReg();
				if (TRI.isPhysicalRegister(Reg))
				return TRI.isVGPR(MRI, Reg);

				if (MRI.isLiveIn(Reg)) {
				// workitem.id.x workitem.id.y workitem.id.z
				if ((MRI.getLiveInPhysReg(Reg) == AMDGPU::T0_X) ||
				(MRI.getLiveInPhysReg(Reg) == AMDGPU::T0_Y) ||
				(MRI.getLiveInPhysReg(Reg) == AMDGPU::T0_Z)||
				(MRI.getLiveInPhysReg(Reg) == AMDGPU::VGPR0) ||
				(MRI.getLiveInPhysReg(Reg) == AMDGPU::VGPR1) ||
				(MRI.getLiveInPhysReg(Reg) == AMDGPU::VGPR2))
				rampitec: Can you make isIntrinsicSourceOfDivergence() external and use it instead?
				return true;
				// Formal arguments of non-entry functions
				// are conservatively considered divergent
				else if (!AMDGPU::isEntryFunctionCC(FLI->Fn->getCallingConv()))
				return true;
				}
				return !DA || DA->isDivergent(FLI->getValueFromVirtualReg(Reg));
				}
				}
				break;
				case ISD::LOAD:
				{
				const LoadSDNode * L = dyn_cast<LoadSDNode>(N);
				rampitec: It still duplicates the implementation in AMDGPUTargetTransformInfo.
				if (L->getMemOperand()->getAddrSpace() == Subtarget->getAMDGPUAS().PRIVATE_ADDRESS)
				return true;
				}
				break;
				case ISD::CALLSEQ_END:
				return true;
				break;
				case ISD::INTRINSIC_WO_CHAIN:
				{

				}
				efriedma: Formatting?
				return AMDGPU::isIntrinsicSourceOfDivergence(
				cast<ConstantSDNode>(N->getOperand(0))->getZExtValue());
				case ISD::INTRINSIC_W_CHAIN:
				return AMDGPU::isIntrinsicSourceOfDivergence(
				cast<ConstantSDNode>(N->getOperand(1))->getZExtValue());
				}
				return false;
				}
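
The inline discussion above about a missing DivergenceAnalysis comes down to which default is returned for a null pointer. A minimal sketch of the two options, assuming a hypothetical `MockDivergenceAnalysis` type and integer value ids standing in for `llvm::DivergenceAnalysis` and `const Value *` (the mock and both helper names are illustrative and not part of the patch):

```cpp
#include <unordered_set>

// Hypothetical stand-in for llvm::DivergenceAnalysis. Only the single
// query used in the diff is modelled; values are identified by an int.
struct MockDivergenceAnalysis {
  std::unordered_set<int> DivergentValues;

  bool isDivergent(int ValueId) const {
    return DivergentValues.count(ValueId) != 0;
  }
};

// Form used in the diff: if the analysis is unavailable, assume the
// value is divergent.
bool isDivergentConservative(const MockDivergenceAnalysis *DA, int ValueId) {
  return !DA || DA->isDivergent(ValueId);
}

// Alternative raised in review: if the analysis is unavailable, assume
// the value is uniform.
bool isDivergentOptimistic(const MockDivergenceAnalysis *DA, int ValueId) {
  return DA && DA->isDivergent(ValueId);
}
```

With a null analysis the conservative form reports divergence, which may cost an unnecessary vector register but never treats a divergent value as uniform. The optimistic form reports uniformity in that situation, which is the case rampitec flags as dangerous.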

	//===---------------------------------------------------------------------===//			//===---------------------------------------------------------------------===//
	// Target Properties			// Target Properties
	//===---------------------------------------------------------------------===//			//===---------------------------------------------------------------------===//

	bool AMDGPUTargetLowering::isFAbsFree(EVT VT) const {			bool AMDGPUTargetLowering::isFAbsFree(EVT VT) const {
	assert(VT.isFloatingPoint());			assert(VT.isFloatingPoint());

	// Packed operations do not have a fabs modifier.			// Packed operations do not have a fabs modifier.
	[...]

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

[...]
// AMDGPU target machine. It uses the target's detailed information to provide		// AMDGPU target machine. It uses the target's detailed information to provide
// more precise answers to certain TTI queries, while letting the target		// more precise answers to certain TTI queries, while letting the target
// independent and default TTI implementations handle the rest.		// independent and default TTI implementations handle the rest.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPUTargetTransformInfo.h"		#include "AMDGPUTargetTransformInfo.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
		#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/CodeGen/ISDOpcodes.h"		#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/MachineValueType.h"		#include "llvm/CodeGen/MachineValueType.h"
#include "llvm/CodeGen/ValueTypes.h"		#include "llvm/CodeGen/ValueTypes.h"
#include "llvm/IR/Argument.h"		#include "llvm/IR/Argument.h"
[...]	case Instruction::InsertElement: {
// Dynamic indexing isn't free and is best avoided.		// Dynamic indexing isn't free and is best avoided.
return Index == ~0u ? 2 : 0;		return Index == ~0u ? 2 : 0;
}		}
default:		default:
return BaseT::getVectorInstrCost(Opcode, ValTy, Index);		return BaseT::getVectorInstrCost(Opcode, ValTy, Index);
}		}
}		}

static bool isIntrinsicSourceOfDivergence(const IntrinsicInst *I) {
switch (I->getIntrinsicID()) {
case Intrinsic::amdgcn_workitem_id_x:
case Intrinsic::amdgcn_workitem_id_y:
case Intrinsic::amdgcn_workitem_id_z:
case Intrinsic::amdgcn_interp_mov:
case Intrinsic::amdgcn_interp_p1:
case Intrinsic::amdgcn_interp_p2:
case Intrinsic::amdgcn_mbcnt_hi:
case Intrinsic::amdgcn_mbcnt_lo:
case Intrinsic::r600_read_tidig_x:
case Intrinsic::r600_read_tidig_y:
case Intrinsic::r600_read_tidig_z:
case Intrinsic::amdgcn_atomic_inc:
case Intrinsic::amdgcn_atomic_dec:
case Intrinsic::amdgcn_ds_fadd:
case Intrinsic::amdgcn_ds_fmin:
case Intrinsic::amdgcn_ds_fmax:
case Intrinsic::amdgcn_image_atomic_swap:
case Intrinsic::amdgcn_image_atomic_add:
case Intrinsic::amdgcn_image_atomic_sub:
case Intrinsic::amdgcn_image_atomic_smin:
case Intrinsic::amdgcn_image_atomic_umin:
case Intrinsic::amdgcn_image_atomic_smax:
case Intrinsic::amdgcn_image_atomic_umax:
case Intrinsic::amdgcn_image_atomic_and:
case Intrinsic::amdgcn_image_atomic_or:
case Intrinsic::amdgcn_image_atomic_xor:
case Intrinsic::amdgcn_image_atomic_inc:
case Intrinsic::amdgcn_image_atomic_dec:
case Intrinsic::amdgcn_image_atomic_cmpswap:
case Intrinsic::amdgcn_buffer_atomic_swap:
case Intrinsic::amdgcn_buffer_atomic_add:
case Intrinsic::amdgcn_buffer_atomic_sub:
case Intrinsic::amdgcn_buffer_atomic_smin:
case Intrinsic::amdgcn_buffer_atomic_umin:
case Intrinsic::amdgcn_buffer_atomic_smax:
case Intrinsic::amdgcn_buffer_atomic_umax:
case Intrinsic::amdgcn_buffer_atomic_and:
case Intrinsic::amdgcn_buffer_atomic_or:
case Intrinsic::amdgcn_buffer_atomic_xor:
case Intrinsic::amdgcn_buffer_atomic_cmpswap:
case Intrinsic::amdgcn_ps_live:
case Intrinsic::amdgcn_ds_swizzle:
return true;
default:
return false;
}
}

static bool isArgPassedInSGPR(const Argument *A) {		static bool isArgPassedInSGPR(const Argument *A) {
const Function *F = A->getParent();		const Function *F = A->getParent();

// Arguments to compute shaders are never a source of divergence.		// Arguments to compute shaders are never a source of divergence.
CallingConv::ID CC = F->getCallingConv();		CallingConv::ID CC = F->getCallingConv();
switch (CC) {		switch (CC) {
case CallingConv::AMDGPU_KERNEL:		case CallingConv::AMDGPU_KERNEL:
[...]	bool AMDGPUTTIImpl::isSourceOfDivergence(const Value *V) const {
// Atomics are divergent because they are executed sequentially: when an		// Atomics are divergent because they are executed sequentially: when an
// atomic operation refers to the same address in each thread, then each		// atomic operation refers to the same address in each thread, then each
// thread after the first sees the value written by the previous thread as		// thread after the first sees the value written by the previous thread as
// original value.		// original value.
if (isa<AtomicRMWInst>(V) || isa<AtomicCmpXchgInst>(V))		if (isa<AtomicRMWInst>(V) || isa<AtomicCmpXchgInst>(V))
return true;		return true;

if (const IntrinsicInst *Intrinsic = dyn_cast<IntrinsicInst>(V))		if (const IntrinsicInst *Intrinsic = dyn_cast<IntrinsicInst>(V))
return isIntrinsicSourceOfDivergence(Intrinsic);		return AMDGPU::isIntrinsicSourceOfDivergence(Intrinsic->getIntrinsicID());

// Assume all function calls are a source of divergence.		// Assume all function calls are a source of divergence.
if (isa<CallInst>(V) || isa<InvokeInst>(V))		if (isa<CallInst>(V) || isa<InvokeInst>(V))
return true;		return true;

return false;		return false;
}		}
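
The comment in isSourceOfDivergence explains why atomics are treated as divergent even when every thread uses the same address. A small simulation may make this concrete. It is only a sketch: `simulateAtomicAdd` is a hypothetical helper, and the lanes are processed in index order purely for illustration, since the actual serialization order on hardware is not specified by the source.

```cpp
#include <cstddef>
#include <vector>

// Simulates the lanes of one wavefront performing an atomic add on the
// same memory location. Returns the "old value" that each lane receives.
// Lane order is an assumption made for this illustration.
std::vector<int> simulateAtomicAdd(int &Memory, std::size_t NumLanes,
                                   int Increment) {
  std::vector<int> Observed;
  Observed.reserve(NumLanes);
  for (std::size_t Lane = 0; Lane < NumLanes; ++Lane) {
    Observed.push_back(Memory); // value returned to this lane
    Memory += Increment;        // update becomes visible to the next lane
  }
  return Observed;
}
```

Although the address and the increment are uniform, each lane gets a different result, so the result of the atomic has to live in a vector register.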

[...]

lib/Target/AMDGPU/SIISelLowering.cpp

[...]	SDValue SITargetLowering::LowerLOAD(SDValue Op, SelectionDAG &DAG) const {
// then we need to use the same legalization rules we use for private.		// then we need to use the same legalization rules we use for private.
if (AS == AMDGPUASI.FLAT_ADDRESS)		if (AS == AMDGPUASI.FLAT_ADDRESS)
AS = MFI->hasFlatScratchInit() ?		AS = MFI->hasFlatScratchInit() ?
AMDGPUASI.PRIVATE_ADDRESS : AMDGPUASI.GLOBAL_ADDRESS;		AMDGPUASI.PRIVATE_ADDRESS : AMDGPUASI.GLOBAL_ADDRESS;

unsigned NumElements = MemVT.getVectorNumElements();		unsigned NumElements = MemVT.getVectorNumElements();
if (AS == AMDGPUASI.CONSTANT_ADDRESS ||		if (AS == AMDGPUASI.CONSTANT_ADDRESS ||
AS == AMDGPUASI.CONSTANT_ADDRESS_32BIT) {		AS == AMDGPUASI.CONSTANT_ADDRESS_32BIT) {
if (isMemOpUniform(Load))		if (!Op->isDivergent())
return SDValue();		return SDValue();
// Non-uniform loads will be selected to MUBUF instructions, so they		// Non-uniform loads will be selected to MUBUF instructions, so they
// have the same legalization requirements as global and private		// have the same legalization requirements as global and private
// loads.		// loads.
//		//
}		}
if (AS == AMDGPUASI.CONSTANT_ADDRESS ||		if (AS == AMDGPUASI.CONSTANT_ADDRESS ||
AS == AMDGPUASI.CONSTANT_ADDRESS_32BIT ||		AS == AMDGPUASI.CONSTANT_ADDRESS_32BIT ||
AS == AMDGPUASI.GLOBAL_ADDRESS) {		AS == AMDGPUASI.GLOBAL_ADDRESS) {
if (Subtarget->getScalarizeGlobalBehavior() && isMemOpUniform(Load) &&		if (Subtarget->getScalarizeGlobalBehavior() && !Op->isDivergent() &&
!Load->isVolatile() && isMemOpHasNoClobberedMemOperand(Load))		!Load->isVolatile() && isMemOpHasNoClobberedMemOperand(Load))
return SDValue();		return SDValue();
// Non-uniform loads will be selected to MUBUF instructions, so they		// Non-uniform loads will be selected to MUBUF instructions, so they
// have the same legalization requirements as global and private		// have the same legalization requirements as global and private
// loads.		// loads.
//		//
}		}
if (AS == AMDGPUASI.CONSTANT_ADDRESS ||		if (AS == AMDGPUASI.CONSTANT_ADDRESS ||
[...]

lib/Target/AMDGPU/SMInstructions.td

	[...]
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Scalar Memory Patterns			// Scalar Memory Patterns
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//


	def smrd_load : PatFrag <(ops node:$ptr), (load node:$ptr), [{			def smrd_load : PatFrag <(ops node:$ptr), (load node:$ptr), [{
	auto Ld = cast<LoadSDNode>(N);			auto Ld = cast<LoadSDNode>(N);
	return Ld->getAlignment() >= 4 &&			return Ld->getAlignment() >= 4 &&
	(((Ld->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS ||			((((Ld->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS) || (Ld->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT)) && !N->isDivergent()) ||
	Ld->getAddressSpace() == AMDGPUASI.CONSTANT_ADDRESS_32BIT) &&
	static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpUniform(N)) \|\|
	(Subtarget->getScalarizeGlobalBehavior() && Ld->getAddressSpace() == AMDGPUASI.GLOBAL_ADDRESS &&			(Subtarget->getScalarizeGlobalBehavior() && Ld->getAddressSpace() == AMDGPUASI.GLOBAL_ADDRESS &&
	!Ld->isVolatile() &&			!Ld->isVolatile() && !N->isDivergent() &&
	static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpUniform(N) &&
	static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpHasNoClobberedMemOperand(N)));			static_cast<const SITargetLowering *>(getTargetLowering())->isMemOpHasNoClobberedMemOperand(N)));
	}]>;			}]>;
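
The updated smrd_load predicate is a dense boolean expression, so it may help to restate it as a plain function. The following is a sketch only: `LoadInfo`, `AddrSpace` and `canSelectScalarLoad` are hypothetical names, and the fields simply mirror the queries made on the right-hand side of the diff (alignment, address space, the new divergence bit, volatility and the no-clobber check).

```cpp
// Hypothetical address-space tags; the real values come from AMDGPUAS.
enum class AddrSpace { Constant, Constant32Bit, Global, Other };

// Hypothetical flattened view of the properties the predicate inspects.
struct LoadInfo {
  unsigned Alignment;
  AddrSpace AS;
  bool IsDivergent;              // corresponds to N->isDivergent()
  bool IsVolatile;
  bool HasNoClobberedMemOperand; // corresponds to isMemOpHasNoClobberedMemOperand(N)
};

// Mirrors the structure of the new predicate: a load is a candidate for a
// scalar memory instruction only if it is sufficiently aligned and uniform.
bool canSelectScalarLoad(const LoadInfo &Ld, bool ScalarizeGlobal) {
  if (Ld.Alignment < 4 || Ld.IsDivergent)
    return false;
  if (Ld.AS == AddrSpace::Constant || Ld.AS == AddrSpace::Constant32Bit)
    return true;
  return ScalarizeGlobal && Ld.AS == AddrSpace::Global && !Ld.IsVolatile &&
         Ld.HasNoClobberedMemOperand;
}
```

Under this reading, a constant-address load whose node is marked divergent no longer matches the scalar pattern, which is consistent with the test updates in this diff where non-entry functions switch from s_load_dword to flat or global loads.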

	def SMRDImm : ComplexPattern<i64, 2, "SelectSMRDImm">;			def SMRDImm : ComplexPattern<i64, 2, "SelectSMRDImm">;
	def SMRDImm32 : ComplexPattern<i64, 2, "SelectSMRDImm32">;			def SMRDImm32 : ComplexPattern<i64, 2, "SelectSMRDImm32">;
	def SMRDSgpr : ComplexPattern<i64, 2, "SelectSMRDSgpr">;			def SMRDSgpr : ComplexPattern<i64, 2, "SelectSMRDSgpr">;
	def SMRDBufferImm : ComplexPattern<i32, 1, "SelectSMRDBufferImm">;			def SMRDBufferImm : ComplexPattern<i32, 1, "SelectSMRDBufferImm">;
	def SMRDBufferImm32 : ComplexPattern<i32, 1, "SelectSMRDBufferImm32">;			def SMRDBufferImm32 : ComplexPattern<i32, 1, "SelectSMRDBufferImm32">;
	[...]

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h

	[...]
	/// offset field.			/// offset field.
	int64_t getSMRDEncodedOffset(const MCSubtargetInfo &ST, int64_t ByteOffset);			int64_t getSMRDEncodedOffset(const MCSubtargetInfo &ST, int64_t ByteOffset);

	/// \returns true if this offset is small enough to fit in the SMRD			/// \returns true if this offset is small enough to fit in the SMRD
	/// offset field. \p ByteOffset should be the offset in bytes and			/// offset field. \p ByteOffset should be the offset in bytes and
	/// not the encoded offset.			/// not the encoded offset.
	bool isLegalSMRDImmOffset(const MCSubtargetInfo &ST, int64_t ByteOffset);			bool isLegalSMRDImmOffset(const MCSubtargetInfo &ST, int64_t ByteOffset);

				/// \returns true if the intrinsic is divergent
				bool isIntrinsicSourceOfDivergence(unsigned IntrID);

	} // end namespace AMDGPU			} // end namespace AMDGPU
	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPUBASEINFO_H			#endif // LLVM_LIB_TARGET_AMDGPU_UTILS_AMDGPUBASEINFO_H

lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp

	//===- AMDGPUBaseInfo.cpp - AMDGPU Base encoding information --------------===//			//===- AMDGPUBaseInfo.cpp - AMDGPU Base encoding information --------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "AMDGPUBaseInfo.h"			#include "AMDGPUBaseInfo.h"
				#include "AMDGPUTargetTransformInfo.h"
	#include "AMDGPU.h"			#include "AMDGPU.h"
	#include "SIDefines.h"			#include "SIDefines.h"
	#include "llvm/ADT/StringRef.h"			#include "llvm/ADT/StringRef.h"
	#include "llvm/ADT/Triple.h"			#include "llvm/ADT/Triple.h"
	#include "llvm/BinaryFormat/ELF.h"			#include "llvm/BinaryFormat/ELF.h"
	#include "llvm/CodeGen/MachineMemOperand.h"			#include "llvm/CodeGen/MachineMemOperand.h"
	#include "llvm/IR/Attributes.h"			#include "llvm/IR/Attributes.h"
	#include "llvm/IR/Constants.h"			#include "llvm/IR/Constants.h"
	[...]

	AMDGPUAS getAMDGPUAS(const TargetMachine &M) {			AMDGPUAS getAMDGPUAS(const TargetMachine &M) {
	return getAMDGPUAS(M.getTargetTriple());			return getAMDGPUAS(M.getTargetTriple());
	}			}

	AMDGPUAS getAMDGPUAS(const Module &M) {			AMDGPUAS getAMDGPUAS(const Module &M) {
	return getAMDGPUAS(Triple(M.getTargetTriple()));			return getAMDGPUAS(Triple(M.getTargetTriple()));
	}			}

				bool isIntrinsicSourceOfDivergence(unsigned IntrID) {
				switch (IntrID) {
				case Intrinsic::amdgcn_workitem_id_x:
				case Intrinsic::amdgcn_workitem_id_y:
				case Intrinsic::amdgcn_workitem_id_z:
				case Intrinsic::amdgcn_interp_mov:
				case Intrinsic::amdgcn_interp_p1:
				case Intrinsic::amdgcn_interp_p2:
				case Intrinsic::amdgcn_mbcnt_hi:
				case Intrinsic::amdgcn_mbcnt_lo:
				case Intrinsic::r600_read_tidig_x:
				case Intrinsic::r600_read_tidig_y:
				case Intrinsic::r600_read_tidig_z:
				case Intrinsic::amdgcn_atomic_inc:
				case Intrinsic::amdgcn_atomic_dec:
				case Intrinsic::amdgcn_ds_fadd:
				case Intrinsic::amdgcn_ds_fmin:
				case Intrinsic::amdgcn_ds_fmax:
				case Intrinsic::amdgcn_image_atomic_swap:
				case Intrinsic::amdgcn_image_atomic_add:
				case Intrinsic::amdgcn_image_atomic_sub:
				case Intrinsic::amdgcn_image_atomic_smin:
				case Intrinsic::amdgcn_image_atomic_umin:
				case Intrinsic::amdgcn_image_atomic_smax:
				case Intrinsic::amdgcn_image_atomic_umax:
				case Intrinsic::amdgcn_image_atomic_and:
				case Intrinsic::amdgcn_image_atomic_or:
				case Intrinsic::amdgcn_image_atomic_xor:
				case Intrinsic::amdgcn_image_atomic_inc:
				case Intrinsic::amdgcn_image_atomic_dec:
				case Intrinsic::amdgcn_image_atomic_cmpswap:
				case Intrinsic::amdgcn_buffer_atomic_swap:
				case Intrinsic::amdgcn_buffer_atomic_add:
				case Intrinsic::amdgcn_buffer_atomic_sub:
				case Intrinsic::amdgcn_buffer_atomic_smin:
				case Intrinsic::amdgcn_buffer_atomic_umin:
				case Intrinsic::amdgcn_buffer_atomic_smax:
				case Intrinsic::amdgcn_buffer_atomic_umax:
				case Intrinsic::amdgcn_buffer_atomic_and:
				case Intrinsic::amdgcn_buffer_atomic_or:
				case Intrinsic::amdgcn_buffer_atomic_xor:
				case Intrinsic::amdgcn_buffer_atomic_cmpswap:
				case Intrinsic::amdgcn_ps_live:
				case Intrinsic::amdgcn_ds_swizzle:
				return true;
				default:
				return false;
				}
				}
	} // namespace AMDGPU			} // namespace AMDGPU
	} // namespace llvm			} // namespace llvm
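
Keying isIntrinsicSourceOfDivergence on the intrinsic id rather than on an IntrinsicInst is what lets the IR-level TTI query and the DAG-level hook share one table. On the DAG side the id sits at operand 0 for INTRINSIC_WO_CHAIN and at operand 1 for INTRINSIC_W_CHAIN, because operand 0 of the chained form is the chain. A rough sketch with mock types (every name below is hypothetical, and the three ids are placeholders rather than real `llvm::Intrinsic` values):

```cpp
#include <vector>

// Placeholder intrinsic ids for illustration only.
enum MockIntrinsic : unsigned {
  MockWorkitemIdX = 1,
  MockDsSwizzle = 2,
  MockUniformQuery = 3
};

enum class MockOpcode { IntrinsicWithoutChain, IntrinsicWithChain, Other };

// Mock DAG node whose operands are reduced to plain integers.
struct MockNode {
  MockOpcode Opcode;
  std::vector<unsigned> Operands;
};

// Shared, id-keyed table, analogous in shape to the helper moved into
// AMDGPUBaseInfo.
bool isMockIntrinsicSourceOfDivergence(unsigned IntrID) {
  switch (IntrID) {
  case MockWorkitemIdX:
  case MockDsSwizzle:
    return true;
  default:
    return false;
  }
}

// DAG-side caller: picks the operand that holds the intrinsic id.
bool isMockNodeSourceOfDivergence(const MockNode &N) {
  switch (N.Opcode) {
  case MockOpcode::IntrinsicWithoutChain:
    return isMockIntrinsicSourceOfDivergence(N.Operands[0]);
  case MockOpcode::IntrinsicWithChain:
    // Operand 0 is the chain, so the id is operand 1.
    return isMockIntrinsicSourceOfDivergence(N.Operands[1]);
  default:
    return false;
  }
}
```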

test/CodeGen/AMDGPU/callee-special-input-sgprs.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,CIVI %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9 %s

	; GCN-LABEL: {{^}}use_dispatch_ptr:			; GCN-LABEL: {{^}}use_dispatch_ptr:
	; GCN: s_load_dword s{{[0-9]+}}, s[6:7], 0x0			; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s6
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s7
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
	define void @use_dispatch_ptr() #1 {			define void @use_dispatch_ptr() #1 {
	%dispatch_ptr = call noalias i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr() #0			%dispatch_ptr = call noalias i8 addrspace(4)* @llvm.amdgcn.dispatch.ptr() #0
	%header_ptr = bitcast i8 addrspace(4)* %dispatch_ptr to i32 addrspace(4)*			%header_ptr = bitcast i8 addrspace(4)* %dispatch_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %header_ptr			%value = load volatile i32, i32 addrspace(4)* %header_ptr
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kern_indirect_use_dispatch_ptr:			; GCN-LABEL: {{^}}kern_indirect_use_dispatch_ptr:
	; GCN: enable_sgpr_dispatch_ptr = 1			; GCN: enable_sgpr_dispatch_ptr = 1
	; GCN: s_mov_b64 s[6:7], s[4:5]			; GCN: s_mov_b64 s[6:7], s[4:5]
	define amdgpu_kernel void @kern_indirect_use_dispatch_ptr(i32) #1 {			define amdgpu_kernel void @kern_indirect_use_dispatch_ptr(i32) #1 {
	call void @use_dispatch_ptr()			call void @use_dispatch_ptr()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_queue_ptr:			; GCN-LABEL: {{^}}use_queue_ptr:
	; GCN: s_load_dword s{{[0-9]+}}, s[6:7], 0x0			; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s6
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s7
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
	define void @use_queue_ptr() #1 {			define void @use_queue_ptr() #1 {
	%queue_ptr = call noalias i8 addrspace(4)* @llvm.amdgcn.queue.ptr() #0			%queue_ptr = call noalias i8 addrspace(4)* @llvm.amdgcn.queue.ptr() #0
	%header_ptr = bitcast i8 addrspace(4)* %queue_ptr to i32 addrspace(4)*			%header_ptr = bitcast i8 addrspace(4)* %queue_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %header_ptr			%value = load volatile i32, i32 addrspace(4)* %header_ptr
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kern_indirect_use_queue_ptr:			; GCN-LABEL: {{^}}kern_indirect_use_queue_ptr:
	; GCN: enable_sgpr_queue_ptr = 1			; GCN: enable_sgpr_queue_ptr = 1
	; GCN: s_mov_b64 s[6:7], s[4:5]			; GCN: s_mov_b64 s[6:7], s[4:5]
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kern_indirect_use_queue_ptr(i32) #1 {			define amdgpu_kernel void @kern_indirect_use_queue_ptr(i32) #1 {
	call void @use_queue_ptr()			call void @use_queue_ptr()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_queue_ptr_addrspacecast:			; GCN-LABEL: {{^}}use_queue_ptr_addrspacecast:
	; CIVI: s_load_dword [[APERTURE_LOAD:s[0-9]+]], s[6:7], 0x10			; CIVI: flat_load_dword v[[HI:[0-9]+]], v[0:1]
	; GFX9: s_getreg_b32 [[APERTURE_LOAD:s[0-9]+]]			; GFX9: s_getreg_b32 [[APERTURE_LOAD:s[0-9]+]]
				; CIVI: v_mov_b32_e32 v[[LO:[0-9]+]], 16
	; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], [[APERTURE_LOAD]]			; GFX9: v_mov_b32_e32 v[[HI:[0-9]+]], [[APERTURE_LOAD]]
	; GCN: {{flat|global}}_store_dword v{{\[[0-9]+}}:[[HI]]{{\]}}			; GFX9: {{flat|global}}_store_dword v{{\[[0-9]+}}:[[HI]]{{\]}}
				; CIVI: {{flat|global}}_store_dword v{{\[}}[[LO]]:[[HI]]{{\]}}
	define void @use_queue_ptr_addrspacecast() #1 {			define void @use_queue_ptr_addrspacecast() #1 {
	%asc = addrspacecast i32 addrspace(3)* inttoptr (i32 16 to i32 addrspace(3)*) to i32*			%asc = addrspacecast i32 addrspace(3)* inttoptr (i32 16 to i32 addrspace(3)*) to i32*
	store volatile i32 0, i32* %asc			store volatile i32 0, i32* %asc
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kern_indirect_use_queue_ptr_addrspacecast:			; GCN-LABEL: {{^}}kern_indirect_use_queue_ptr_addrspacecast:
	; CIVI: enable_sgpr_queue_ptr = 1			; CIVI: enable_sgpr_queue_ptr = 1

	; CIVI: s_mov_b64 s[6:7], s[4:5]			; CIVI: s_mov_b64 s[6:7], s[4:5]
	; GFX9-NOT: s_mov_b64			; GFX9-NOT: s_mov_b64
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kern_indirect_use_queue_ptr_addrspacecast(i32) #1 {			define amdgpu_kernel void @kern_indirect_use_queue_ptr_addrspacecast(i32) #1 {
	call void @use_queue_ptr_addrspacecast()			call void @use_queue_ptr_addrspacecast()
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_kernarg_segment_ptr:			; GCN-LABEL: {{^}}use_kernarg_segment_ptr:
	; GCN: s_load_dword s{{[0-9]+}}, s[6:7], 0x0			; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s6
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s7
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
	define void @use_kernarg_segment_ptr() #1 {			define void @use_kernarg_segment_ptr() #1 {
	%kernarg_segment_ptr = call noalias i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr() #0			%kernarg_segment_ptr = call noalias i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr() #0
	%header_ptr = bitcast i8 addrspace(4)* %kernarg_segment_ptr to i32 addrspace(4)*			%header_ptr = bitcast i8 addrspace(4)* %kernarg_segment_ptr to i32 addrspace(4)*
	%value = load volatile i32, i32 addrspace(4)* %header_ptr			%value = load volatile i32, i32 addrspace(4)* %header_ptr
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}kern_indirect_use_kernarg_segment_ptr:			; GCN-LABEL: {{^}}kern_indirect_use_kernarg_segment_ptr:
	[... 347 lines not shown ...]
	; GCN: s_swappc_b64			; GCN: s_swappc_b64
	define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_z() #1 {			define amdgpu_kernel void @kern_indirect_other_arg_use_workgroup_id_z() #1 {
	call void @other_arg_use_workgroup_id_z(i32 555)			call void @other_arg_use_workgroup_id_z(i32 555)
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}use_every_sgpr_input:			; GCN-LABEL: {{^}}use_every_sgpr_input:
	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4
	; GCN: s_load_dword s{{[0-9]+}}, s[6:7], 0x0			; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s6
	; GCN: s_load_dword s{{[0-9]+}}, s[8:9], 0x0			; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s7
	; GCN: s_load_dword s{{[0-9]+}}, s[10:11], 0x0			; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
				; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s8
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s9
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
				; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s10
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s11
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
	; GCN: ; use s[12:13]			; GCN: ; use s[12:13]
	; GCN: ; use s14			; GCN: ; use s14
	; GCN: ; use s15			; GCN: ; use s15
	; GCN: ; use s16			; GCN: ; use s16
	define void @use_every_sgpr_input() #1 {			define void @use_every_sgpr_input() #1 {
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	store volatile i32 0, i32 addrspace(5)* %alloca			store volatile i32 0, i32 addrspace(5)* %alloca

	[... 111 lines not shown ...]

	; GCN-DAG: s_mov_b32 [[SAVE_X:s[0-57-9][0-9]*]], s14			; GCN-DAG: s_mov_b32 [[SAVE_X:s[0-57-9][0-9]*]], s14
	; GCN-DAG: s_mov_b32 [[SAVE_Y:s[0-68-9][0-9]*]], s15			; GCN-DAG: s_mov_b32 [[SAVE_Y:s[0-68-9][0-9]*]], s15
	; GCN-DAG: s_mov_b32 [[SAVE_Z:s[0-79][0-9]*]], s16			; GCN-DAG: s_mov_b32 [[SAVE_Z:s[0-79][0-9]*]], s16
	; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[6:7]			; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[6:7]
	; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[8:9]			; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[8:9]
	; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[10:11]			; GCN-DAG: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, s[10:11]

	; GCN-DAG: s_mov_b32 s6, s14			; GCN-DAG: s_mov_b32 s6, s14
	; GCN-DAG: s_mov_b32 s7, s15			; GCN-DAG: s_mov_b32 s7, s15
	; GCN-DAG: s_mov_b32 s8, s16			; GCN-DAG: s_mov_b32 s8, s16

				; GCN-DAG: s_mov_b64 s{{\[}}[[LO_X:[0-9]+]]{{\:}}[[HI_X:[0-9]+]]{{\]}}, s[6:7]
				; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Y:[0-9]+]]{{\:}}[[HI_Y:[0-9]+]]{{\]}}, s[8:9]
				; GCN-DAG: s_mov_b64 s{{\[}}[[LO_Z:[0-9]+]]{{\:}}[[HI_Z:[0-9]+]]{{\]}}, s[10:11]

	; GCN: s_swappc_b64			; GCN: s_swappc_b64

	; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4			; GCN: buffer_store_dword v{{[0-9]+}}, off, s[0:3], s5 offset:4
	; GCN: s_load_dword s{{[0-9]+}},			; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s[[LO_X]]
	; GCN: s_load_dword s{{[0-9]+}},			; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s[[HI_X]]
	; GCN: s_load_dword s{{[0-9]+}},			; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
				; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s[[LO_Y]]
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s[[HI_Y]]
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
				; GCN: v_mov_b32_e32 v[[LO:[0-9]+]], s[[LO_Z]]
				; GCN: v_mov_b32_e32 v[[HI:[0-9]+]], s[[HI_Z]]
				; GCN: {{flat|global}}_load_dword v{{[0-9]+}}, v{{\[}}[[LO]]:[[HI]]{{\]}}
	; GCN: ; use			; GCN: ; use
	; GCN: ; use [[SAVE_X]]			; GCN: ; use [[SAVE_X]]
	; GCN: ; use [[SAVE_Y]]			; GCN: ; use [[SAVE_Y]]
	; GCN: ; use [[SAVE_Z]]			; GCN: ; use [[SAVE_Z]]
	define void @func_use_every_sgpr_input_call_use_workgroup_id_xyz_spill() #1 {			define void @func_use_every_sgpr_input_call_use_workgroup_id_xyz_spill() #1 {
	%alloca = alloca i32, align 4, addrspace(5)			%alloca = alloca i32, align 4, addrspace(5)
	call void @use_workgroup_id_xyz()			call void @use_workgroup_id_xyz()

	[... 39 lines not shown ...]

test/CodeGen/AMDGPU/llvm.amdgcn.implicitarg.ptr.ll

[... 28 lines not shown ...]
define amdgpu_kernel void @kernel_implicitarg_ptr([112 x i8]) #0 {		define amdgpu_kernel void @kernel_implicitarg_ptr([112 x i8]) #0 {
%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()		%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()
%cast = bitcast i8 addrspace(4)* %implicitarg.ptr to i32 addrspace(4)*		%cast = bitcast i8 addrspace(4)* %implicitarg.ptr to i32 addrspace(4)*
%load = load volatile i32, i32 addrspace(4)* %cast		%load = load volatile i32, i32 addrspace(4)* %cast
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_implicitarg_ptr:		; GCN-LABEL: {{^}}func_implicitarg_ptr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN-NEXT: s_load_dword s{{[0-9]+}}, s[6:7], 0x0{{$}}		; MESA: s_mov_b64 s[8:9], s[6:7]
		; MESA: s_mov_b32 s11, 0xf000
		; MESA: s_mov_b32 s10, -1
		; MESA: buffer_load_dword v0, off, s[8:11], 0
		; HSA: v_mov_b32_e32 v0, s6
		; HSA: v_mov_b32_e32 v1, s7
		; HSA: flat_load_dword v0, v[0:1]
; GCN-NEXT: s_waitcnt		; GCN-NEXT: s_waitcnt
; GCN-NEXT: s_setpc_b64		; GCN-NEXT: s_setpc_b64
define void @func_implicitarg_ptr() #1 {		define void @func_implicitarg_ptr() #1 {
%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()		%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()
%cast = bitcast i8 addrspace(4)* %implicitarg.ptr to i32 addrspace(4)*		%cast = bitcast i8 addrspace(4)* %implicitarg.ptr to i32 addrspace(4)*
%load = load volatile i32, i32 addrspace(4)* %cast		%load = load volatile i32, i32 addrspace(4)* %cast
ret void		ret void
}		}
[... 32 lines not shown ...]
; GCN-NOT: s[6:7]		; GCN-NOT: s[6:7]
define void @func_call_implicitarg_ptr_func() #1 {		define void @func_call_implicitarg_ptr_func() #1 {
call void @func_implicitarg_ptr()		call void @func_implicitarg_ptr()
ret void		ret void
}		}

; GCN-LABEL: {{^}}func_kernarg_implicitarg_ptr:		; GCN-LABEL: {{^}}func_kernarg_implicitarg_ptr:
; GCN: s_waitcnt		; GCN: s_waitcnt
; GCN: s_load_dword s{{[0-9]+}}, s[6:7], 0x0{{$}}		; MESA: s_mov_b64 s[12:13], s[6:7]
; GCN: s_load_dword s{{[0-9]+}}, s[8:9], 0x0{{$}}		; MESA: s_mov_b32 s15, 0xf000
		; MESA: s_mov_b32 s14, -1
		; MESA: buffer_load_dword v0, off, s[12:15], 0
		; HSA: v_mov_b32_e32 v0, s6
		; HSA: v_mov_b32_e32 v1, s7
		; HSA: flat_load_dword v0, v[0:1]
		; MESA: s_mov_b32 s10, s14
		; MESA: s_mov_b32 s11, s15
		; MESA: buffer_load_dword v0, off, s[8:11], 0
		; HSA: v_mov_b32_e32 v0, s8
		; HSA: v_mov_b32_e32 v1, s9
		; HSA: flat_load_dword v0, v[0:1]

		; GCN: s_waitcnt vmcnt(0)
define void @func_kernarg_implicitarg_ptr() #1 {		define void @func_kernarg_implicitarg_ptr() #1 {
%kernarg.segment.ptr = call i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()		%kernarg.segment.ptr = call i8 addrspace(4)* @llvm.amdgcn.kernarg.segment.ptr()
%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()		%implicitarg.ptr = call i8 addrspace(4)* @llvm.amdgcn.implicitarg.ptr()
%cast.kernarg.segment.ptr = bitcast i8 addrspace(4)* %kernarg.segment.ptr to i32 addrspace(4)*		%cast.kernarg.segment.ptr = bitcast i8 addrspace(4)* %kernarg.segment.ptr to i32 addrspace(4)*
%cast.implicitarg = bitcast i8 addrspace(4)* %implicitarg.ptr to i32 addrspace(4)*		%cast.implicitarg = bitcast i8 addrspace(4)* %implicitarg.ptr to i32 addrspace(4)*
%load0 = load volatile i32, i32 addrspace(4)* %cast.kernarg.segment.ptr		%load0 = load volatile i32, i32 addrspace(4)* %cast.kernarg.segment.ptr
%load1 = load volatile i32, i32 addrspace(4)* %cast.implicitarg		%load1 = load volatile i32, i32 addrspace(4)* %cast.implicitarg
ret void		ret void
[... 19 lines not shown ...]


Diff 135310
