This patch contains the changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds an edge to the SDNode scheduler's dependency graph instead of inserting an SCC copy between each definition and use. This lets the scheduler place instructions optimally, inserting a copy only when the dependency cannot be resolved any other way.
llvm/include/llvm/CodeGen/TargetLowering.h | ||
---|---|---|
4002 | Need a description of the function. | |
4003 | Alignment is off. | |
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp | ||
119 | Place it after the check for isVirtualRegister below? | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
12983 | Alignment. | |
12998 | We may actually later extend it beyond compares and to include VCC. | |
llvm/lib/Target/AMDGPU/SIISelLowering.h | ||
483 | Alignment. | |
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
935 | getWaveMaskRegClass()? The difference is that it returns SReg_32_XM0_XEXECRegClass and not SReg_32_XEXEC_HIRegClass. |
This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. This allows avoiding inserting an SCC copy between each definition and use.
Please explain more. Why doesn't the generic support for physical registers work already? Why do we need to "avoid inserting an SCC copy"?
I was also working on this in D124450 but I have not looked at that patch for a few months.
If we allow SCC to be copied (CopyCost() != -1), then every def-use pair will look like:
  s_cmp_***
  %1 = s_cselect_b64 -1, 0                 <-- lowered CopyToReg where srcreg == SCC
  s_and_b64 %1, exec                       <-- lowered CopyFromReg where dstreg == SCC
  %res = s_cselect_b32 %trueval, %falseval // not necessarily s_cselect_b32, may be any SCC user
We could handle the trivial cases where the SCC copy is immediately copied back to SCC. More complex cases, where other instructions are interleaved between these copies, are harder to analyze.
At the same time, if we keep the SCC copy cost high, we let the SDNode scheduler place the instructions optimally according to the register-carried dependencies.
In other words, a copy is inserted only when there is no other way to satisfy the data dependency.
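To make the argument above concrete, here is a minimal toy model, not LLVM code. It assumes a single condition register and a linear instruction order; `ToyInst` and `countSCCRestores` are hypothetical names introduced only for illustration. It counts how many SCC uses would need the value restored from a saved copy for a given order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction, not an LLVM type. DefsSCC marks an instruction that
// writes SCC. UsesDef is the index of the SCC def this instruction reads,
// or -1 if it does not read SCC.
struct ToyInst {
  bool DefsSCC;
  int UsesDef;
};

// Count the uses that find SCC holding a different value than the one they
// need, i.e. the uses that would require a save/restore copy in this order.
// Order holds indices into Insts; every def is assumed to precede its uses.
int countSCCRestores(const std::vector<ToyInst> &Insts,
                     const std::vector<int> &Order) {
  int Restores = 0;
  int LiveDef = -1; // index of the def whose value SCC currently holds
  for (int Idx : Order) {
    assert(Idx >= 0 && static_cast<std::size_t>(Idx) < Insts.size());
    const ToyInst &I = Insts[Idx];
    if (I.UsesDef >= 0 && I.UsesDef != LiveDef) {
      ++Restores;          // SCC was clobbered since the def we need
      LiveDef = I.UsesDef; // the restore itself overwrites SCC
    }
    if (I.DefsSCC)
      LiveDef = Idx;
  }
  return Restores;
}
```

With two compares (indices 0 and 1) and two selects reading them (indices 2 and 3), the order 0, 1, 2, 3 needs a restore for both selects in this model, while the order 0, 2, 1, 3 needs none. That is the situation a dependency edge lets the scheduler find on its own.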
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp | ||
---|---|---|
119 | It is deliberately placed here. A cross-BB SCC-carried dependency shows up as a CopyToReg with the VReg as operand 1. In the defining BB: CopyToReg chain, VregN, SCC def. In the successor BB: CopyFromReg SCC, VregN. This is really an SCC-carried dependency, but it is implemented via a virtual register copy. |
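The ordering argument in the reply above can be sketched with a toy model. This is not the patch's code: `ToyReg`, `ToyCopyToReg` and `recordsSCCDependency` are hypothetical stand-ins, and the only assumption is that the generic path bails out on virtual destination registers.

```cpp
// Toy stand-ins, not LLVM types.
enum class ToyReg { Virtual, PhysSCC, PhysOther };

struct ToyCopyToReg {
  ToyReg Dest;        // operand 1: destination register of the copy
  bool SrcDefinesSCC; // whether the copied value comes from an SCC def
};

// Returns true if an SCC dependency would be recorded for this copy.
// HookBeforeVirtualCheck selects whether the (hypothetical) target check
// runs before or after the virtual-register early exit.
bool recordsSCCDependency(const ToyCopyToReg &N, bool HookBeforeVirtualCheck) {
  if (HookBeforeVirtualCheck && N.SrcDefinesSCC)
    return true;  // target check sees the node before the filter
  if (N.Dest == ToyReg::Virtual)
    return false; // early exit: virtual destinations are skipped
  return N.SrcDefinesSCC;
}
```

In this model a cross-BB copy, `ToyCopyToReg{ToyReg::Virtual, true}`, is recognized only when the check runs first, which is the case the reply describes.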
addressed changes requested by the reviewer
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
12998 | What needs to be changed here right now for that? |
llvm/lib/CodeGen/MachineRegisterInfo.cpp | ||
---|---|---|
86 | formatting: 2 spaces |