- User Since
- Jul 26 2016, 7:17 AM (102 w, 5 d)
Jun 4 2018
should I just disable this analysis completely for r600?
My apologies for the delay. Thanks for handling this. LGTM.
May 24 2018
I agree isVGPR should be illegal to call for R600
May 21 2018
Thanks for catching this
May 3 2018
Could you please clarify - why do you consider that check meaningless for r600?
I see that this line : " const SISubtarget &ST = MF->getSubtarget<SISubtarget>(); " is misleading and in fact is not correct.
I'd better check and choose the R600Subtarget or SISubtarget.
If I understand right we need just check which subtarget to retrieve for physregs check.
Apr 25 2018
Sorry for the delay. Just submitted to trunk.
It was not about the reverse patch only. I had to change tests accordingly.
Apr 16 2018
Apr 13 2018
I have no idea how to backport yet.
I 'my going to submit revert patch.
This change existed for a few months and I' my not sure what may be broken
if reverted. I have to figure out this. Some precheckin tests needed.
Apr 10 2018
In fact yes. This is the most correct solution. The stuff in
SIFixSGPRCopies.cpp has got to gone.
So it does not make sense to spend efforts on it. Moreover there is no
correct solution at all
unless we implement one more DivergenceAnalysis upon the machine IR.
So, if it's okay I'd prefer to revert.
In general LGTM.
I also concerned about the magic test that has no checks and has no visible side effects on shared data but is necessary to reproduce buggy behavior.
Could you please clarify what do you need it for?
Apr 9 2018
It seems like this patch should be reverted. Recently we have no reliable
way to determine the PHI (or whatever else) divergence on the MI level.
The only correct way is to add the DA algorithm that re-computes all the MI
Since we're going to remove the SGPRFix stuff as soon as diverence driven
ISel is ready I consider this patch is not necessary.
Mar 27 2018
Mar 17 2018
I m looking on this.
Mar 15 2018
Mar 14 2018
Mar 13 2018
Mar 5 2018
Mar 2 2018
ready to land
Mar 1 2018
Feb 26 2018
make check-llvm has passed
Following the discussion. The only register classes that appeared necessary according to the test coverage were added.
This preview can be the starting point for the discussion with respect to the proper approach.
One test fixed
One more question - where should R600 reg classes be processed?
Should we implement getRegClass in R600RegisterInfo?
Or it's okay to handle all them in SIRegisterInfo?
DAG divergence verification for "divergent" targets only.
Feb 22 2018
Verification algorithm of linear complexity
Feb 21 2018
This is the preview of the implementation that provide walk-through divergence bits consistency.
Please note that the verification algorithm has polynomial complexity and is expected to be switched ON/OFF by the option (upcoming soon) with default to OFF.
Feb 20 2018
Feb 16 2018
Feb 15 2018
Some bugfixes and changes according to the reviewers requirements.
In fact v_readfirstlane is inserted by the ISel to glue vector input to the unexpected scalar instruction.
This means that compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?
Feb 14 2018
Feb 13 2018
Feb 9 2018
Preliminary revision illustrating possible approach to keeping divergence information consistent along the DAG transformation
Feb 8 2018
Feb 5 2018
Here is alternative implementation based on the TargetLoweringInfo hooks.
Jan 31 2018
- FunctionLoweringInfo::ValueMap is created during the SelectionDAGBuilder walk through the BasicBlock. So we cannot query live-in register divergence from the CreateOperands => TargetLoweringInfo::isSDNodeSourceOfDivergence. By this point ValueMap has not yet been filled in.
Really? I thought we fill it in before we actually start building the SelectionDAG (in FunctionLoweringInfo::set). But you can move it earlier if you need to.
All above means that we cannot just validate the flag values and assert if it does not match. We have to run iterative solver for each block just before the selection to count the control dependencies and to propagate the flag values.
Jan 30 2018
Jan 29 2018
One more item that should be discussed is the target-specific exceptions to the common divergence modeling algorithm.
For instance in AMDGPU target we have amdgcn.readfirstlane/readlane intrinsics. They accept vector register and return the first or specific lane value.
So both accept naturally divergent VGPR but return the scalar value.
Following the common divergence computing algorithm - "the divergence of operation's result is superposition of the operands divergence" we'd set %scalar = tail call i32 @llvm.amdgcn.readfirstlane(i32 %tid) to divergent that is not true.
In the IR form of the divergence-driven selection we rely on the TargetTransformInfo::isAlwaysUniform hook that was added to interface for this purpose.
It allows the target to declare arbitrary set of target operations as "always uniform" so that the analysis does not count for their operands divergence.
Jan 23 2018
Specifically which nodes are a problem here? We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register. (Not sure there's an existing map from registers to values, but you could easily construct one; basically the inverse of FunctionLoweringInfo::ValueMap.)
Jan 16 2018
This is a draft of the divergence analysis solver on the selection DAG. In the course of discussion the divergence bit verification was requested.
Analysis of the one given block cannot cover control dependencies. Thus the divergence bits set from the IR reflecting control dependencies cannot match those computed on the one isolated block DAG. That's why it is not exactly the verification. The analysis performed on the DAG augments the divergence information passed from the IR.
Dec 21 2017
To start with, let's make sure that we're agreed on terms.
Divergent machine runs a set of threads (warp or wavefront) that execute same set of instructions in same order (SIMT).
Divergent operation operates on "vector" registers such that each register consists of many lanes - each thread operates on the data in corresponding lane.
From the above immediately follows that the only source of divergence is thread ID or any data that is derived from thread ID.
Usually it is a small set of target intrinsics that may be the source of such a data.
Dec 13 2017
Divergence bit propagation added to ReplaceAllUsesWith
Dec 12 2017
Dec 11 2017
Dec 8 2017
Dec 6 2017
Attention please! If nobody has objections this will be committed next Friday.
Dec 5 2017
Targets that have no divergence do not depend on Divergence Analysis anymore.
Dec 1 2017
Nov 29 2017
updated according to the comments
If I understand everything correct...
The problem you're trying to solve is well known.
You have divergent loop-exit and a value that is uniformly defined inside the loop but used outside the loop.
In this case different threads would have different values.
Traditional Divergence Analysis cannot handle this. Since definition inside the loop body is uniform the use is uniform as well.
Since the value has no explicit data dependency of the loop index, the PHI-node in the loop header (that is divergent if loop-exit is) does not affect it's divergence formally.
The value in fact does have loop-carried dependency. For example:
In general looks correct to me. You definitely should check the branch itself - not the condition.
In your case (divergent loop-exit) branch itself is divergent because of the control dependency.
Nov 28 2017
Nov 10 2017
Nov 9 2017
Implementation changed according to the reviewers suggestions.
Oct 20 2017
In other words:
Oct 17 2017
Back to initial approach.
The fix you suggested breaks x86 backend on SingleSource/Benchmarks/Adobe-C++/CMakeFiles/simple_types_loop_invariant.dir/simple_types_loop_invariant.cpp
Oct 16 2017
Oct 13 2017
Fixed according to the Matthias suggestion.
Oct 11 2017
Oct 10 2017
Oct 3 2017
Reverted back to the simplest approach.
Oct 2 2017
It is really make sense to take care of the V_READFIRSTLANE/V_READLANE destination register under exec == 0 condition in case their source VGPR is re-defined in SI_MASK_BRANCH target block. Otherwise we assume that source VGPR is defined in one of the dominating blocks and contain correct value.
- Added the code that checks that scalar register produced by V_READFIRSTLANE/V_READLANE
Is really used by some SALU instruction. This is necessary to avoid de-optimization of those cases when the scalar register is distinctly the scalar operand of vector instruction.