In SIMT architectures, VGPRs are a high-demand resource, yet a significant share of the computation operates on naturally scalar data.
Such computations can be performed by the SALU, saving a lot of VGPRs; this is intended to increase occupancy.
Also, splitting the data flow into scalar and vector parts gives the instruction scheduler more flexibility, which can increase HW utilization.
On GPU targets we say that an instruction is vector if it operates on VGPR operands, where each lane may contain a different value.
We say an instruction is scalar if it operates on SGPRs, which are shared among all threads in the warp.
Divergence Analysis was introduced by F. Pereira et al. in 2013 and is now part of the core LLVM analyses.
Unfortunately, its results are mostly unused because there is no way to inform the instruction selection DAG of the divergence property of a concrete instruction.
Put simply, an IR operation that has no divergent operands produces a uniform result and should be selected to a scalar instruction.
We used to pass divergence data for memory access instructions through metadata, simply because a MemSDNode has a memory operand that refers back to the IR.
This approach is restricted to memory accesses only. That's why we would need another pass, working on the machine code, that propagates the divergence property
from the value load through the computations and finally to the result store. Besides requiring one more pass,
it would repeat on the machine instructions the same algorithm that the divergence analysis has already performed over the IR.
Since the SDNode flags field was recently widened to 16 bits and 5 bits are still unoccupied, we have a chance to use them to pass divergence data to instruction selection.
This change introduces a possible approach to implementing such an enhancement.
It passes DA data for load instructions only. If accepted, we'll go ahead and add the same handling for other instructions as well.
I have a general concern about this. The way this is used is not going to fit with how the SelectionDAG APIs work, and is going to be very invasive. An SDNode is supposed to be immutable, and some level of CSE is done by getNode. You can't have an API that involves setting a bit on a newly created node; anything setting this needs to be done in getNode.
Are divergent and non-divergent nodes CSEable? These need to be handled somewhere to prevent them from folding together.
You seem to only specially handle loads, but we have a lot of cases where we get combine issues from not knowing whether a node is going to be selected to SALU or VALU instructions. If we have to somehow propagate this at every place a node is produced, that is a massive undertaking. I don't think it's worth trying to do that level of work on SelectionDAG at this point, with GlobalISel on the way. I thought we could handle only loads just from the MemOperand.