Generate the minimal set of s_mov instructions required when
expanding a SGPR copy operation in copyPhysReg.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
- Merge code generation loops to avoid needing to generate work list
- Fix potential issue when all elements of copy are overwritten
- Fix test file location error
To motivate the peephole.
This pattern effects 2% of graphics shaders on GFX9, and nearly 7% on GFX10.
On average we save ~1.5 instructions per effected shader.
On some VulkanCTS tests the savings are much higher.
Given the relatively low gain, I assume it was not worth introducing a new peephole pass, and took this approach to address the duplicate s_mov instructions at the point of generation (when they cheapest to spot).
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
665 | Perhaps getRegSplitParts could be fixed to handle this degenerate case. It seems fragile to list all the SReg_32 classes here -- e.g. did you forget SReg_32_XEXEC_HI and SReg_32_XM0_XEXEC? |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
645–653 | Can we do something earlier? I don't think copyPhysReg should be considering context beyond the given instruction |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
645–653 | I think the earliest we can do this is immediately following "Simple Register Coalescing", although I am unsure of the RA implications and I do not think we have an appropriate pass it fits into. To avoid adding a new pass I think this might be a reasonable addition to "SI Fix VGPR copies" as that already passes over the whole IR and is doing work on copies. |
How about splitting this change into two parts? Limit this change to generating mixes of 64- and 32-bit moves, and then separately consider what to do about those moves of registers where subregisters are overwritten immediately.
That was my plan.
I was just hoping I could get some input on where would be considered acceptable for the overwrite dedup to go.
Can we do something earlier? I don't think copyPhysReg should be considering context beyond the given instruction