This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Always select s_cselect_b32 for uniform 'select' SDNode
ClosedPublic

Authored by alex-t on Sep 9 2022, 11:00 AM.

Details

Summary

This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. It adds the edge in the SDNodeScheduler dependency graph instead of inserting the SCC copy between each definition and use. This approach lets the scheduler place instructions in an optimal way placing the copy only when the dependency cannot be resolved.

Diff Detail

Event Timeline

alex-t created this revision.Sep 9 2022, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2022, 11:00 AM
alex-t requested review of this revision.Sep 9 2022, 11:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2022, 11:00 AM
Herald added a subscriber: wdng. · View Herald Transcript
rampitec added inline comments.Sep 9 2022, 12:28 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4026

Need a description of the function.

4027

Alignment is off.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
119

Place it after the check for isVirtualRegister below?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12993

Alignment.

13008

We may actually later extend it beyond compares and to include VCC.

llvm/lib/Target/AMDGPU/SIISelLowering.h
483

Alignment.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
935

getWaveMaskRegClass()? The difference that it returns SReg_32_XM0_XEXECRegClass and not SReg_32_XEXEC_HIRegClass.

foad added a comment.Sep 12 2022, 1:20 AM

This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. This allows avoiding inserting an SCC copy between each definition and use.

Please explain more. Why doesn't the generic support for physical registers work already? Why do we need to "avoid inserting an SCC copy"?

I was also working on this in D124450 but I have not looked at that patch for a few months.

This patch contains changes necessary to carry physical condition register (SCC) dependencies through the SDNode scheduler. This allows avoiding inserting an SCC copy between each definition and use.

Please explain more. Why doesn't the generic support for physical registers work already? Why do we need to "avoid inserting an SCC copy"?

I was also working on this in D124450 but I have not looked at that patch for a few months.

If we allow SCC to be copied (CopyCost() != -1) then any pair of the def-use will look like:

s_cmp_***
%1 = s_cselect_b64 -1, 0  <-- lowered CopyToReg  where srcreg == SCC
s_and_b64 %1, exec          <-- lowered CopyFromReg where dstreg == SCC
%res = s_cselect_b32 %trueval, %falseval  // not necessarily  s_cselect_b32 may be any SCC user

We could handle the trivial cases where the SCC copy is immediately copied back to SCC. Some more complex cases where these copies are interleaved by other instructions are more difficult to analyze.
At the same time, if we keep the SCC copy cost high, we let the SDNode scheduler place the instructions in an optimal way according to the register carried dependencies.
Literally, we will insert a copy only in case there is no other way to meet the data dependency.

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
119

It is deliberately here. In case we have a cross-BB SCC-carried dependency it is CopyToReg with the VReg as an operand 1.

CopyToReg chain, VregN, SCC def

and in successor BB:

CopyFromReg SCC, VregN

This is really SCC-carried dependency but it implemented via virtual register copy

alex-t edited the summary of this revision. (Show Details)Sep 13 2022, 8:39 AM
alex-t updated this revision to Diff 459768.Sep 13 2022, 8:53 AM
alex-t marked 7 inline comments as done.

addressed changes requested by the reviewer

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13008

What needs to be changed here right now for that?

rampitec added inline comments.Sep 13 2022, 1:20 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4026

Could you please use the same comment style as the rest of the code?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
13008

Nothing right now.

alex-t updated this revision to Diff 460056.Sep 14 2022, 6:06 AM

Comments style changed.

alex-t marked an inline comment as done.Sep 14 2022, 6:07 AM
This revision is now accepted and ready to land.Sep 14 2022, 1:18 PM
vpykhtin added inline comments.Sep 15 2022, 8:53 AM
llvm/lib/CodeGen/MachineRegisterInfo.cpp
83

formatting: 2 spaces

alex-t updated this revision to Diff 460479.Sep 15 2022, 12:21 PM

small formatting

alex-t marked an inline comment as done.Sep 15 2022, 12:21 PM
llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll