In SIMT architectures, VGPRs are a high-demand resource, yet a significant share of the computation operates on naturally scalar data.
Such computations can be performed by the SALU, saving a lot of VGPRs; this is intended to increase occupancy.
Also, splitting the data flow into scalar and vector parts gives the instruction scheduler more flexibility, which can increase HW utilization.
On GPU targets we say that an instruction is vector if it operates on VGPR operands, where each lane may contain a different value.
We say an instruction is scalar if it operates on SGPRs, which are shared among all threads in the warp.
Divergence Analysis was introduced by F. Pereira et al. in 2013 and is now part of the core LLVM analyses.
Unfortunately, its results are mostly unused because there is no way to inform the instruction selection DAG of the divergence property of a concrete instruction.
Put simply, an IR operation that has no divergent operands produces a uniform result and should be selected to a scalar instruction.
We used to pass divergence data for memory access instructions through metadata, simply because a MemSDNode has a memory operand that refers back to the IR.
This approach is restricted to memory accesses only. That's why we would need another pass, working on the machine code, that propagates the divergence property
from the value load through the computations and finally to the result store. Besides requiring one more pass,
it would repeat on the machine instructions the same algorithm that the divergence analysis has already performed over the IR.
Since the SDNode flags field was recently widened to 16 bits and 5 bits are still unoccupied, we have a chance to use them to pass divergence data to instruction selection.
This change introduces a possible approach to implementing such an enhancement.
It passes DA data for load instructions only. If accepted, we'll go ahead and add the same handling for other instructions as well.
I have a general concern about this. The way this is used is not going to fit with how the SelectionDAG APIs work, and is going to be very invasive. An SDNode is supposed to be immutable, and some level of CSE is done by getNode. You can't have an API that involves setting a bit on a newly created node; anything setting this needs to be done in getNode.
Are divergent and non-divergent nodes CSEable? These need to be handled somewhere to prevent them from folding together.
You seem to only specially handle loads, but we have a lot of cases where we get combine issues from not knowing whether a node is going to be selected to SALU or VALU instructions. If we have to somehow propagate this at every place a node is produced, that is a massive undertaking. I don't think it's worth trying to do that level of work on SelectionDAG at this point, with GlobalISel on the way. I thought we could handle only loads just from the MemOperand.