Legalize the soffset operand of buffer instructions. If a VGPR is assigned to soffset, use waterfall loop logic to legalize it.
Update the loadSRsrcFromVGPR method so that it can be used for both the rsrc and soffset operands.
Differential D141030
[AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic.
Closed. Public. Authored by skc7 on Jan 4 2023, 9:48 PM.
Event Timeline
skc7 retitled this revision from "[WIP][AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic." to "[AMDGPU] Legalize soffset of buffer instruction. Use Waterfall loop logic." (Jan 9 2023, 3:01 AM)
I think this will mishandle the case where both the SRD and the soffset are VGPRs. You need to handle both at the same time in one waterfall loop (this should show up if your tests used a meaningful SRD). You can also just look into the GlobalISel tests for these intrinsics; they already test all the permutations.
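The "one waterfall loop for both operands" idea can be sketched as a small host-side simulation. This is only an illustration under stated assumptions, not the code from this revision: `LaneOperands` and `waterfall` are hypothetical names, the resource descriptor is shrunk to a single 64-bit value, and the EXEC mask is modeled as a vector of booleans. Each iteration takes the operand pair from the first active lane, runs every lane whose pair matches, and then retires those lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-lane operand values; not an LLVM type.
struct LaneOperands {
  uint64_t rsrc;    // stand-in for the (really 128-bit) resource descriptor
  uint32_t soffset; // offset that should be scalar but ended up in a VGPR
};

// Simulates a single waterfall loop keyed on BOTH operands.
// Returns the lanes that would execute together, one group per iteration.
std::vector<std::vector<int>> waterfall(const std::vector<LaneOperands> &lanes) {
  std::vector<bool> active(lanes.size(), true); // models the EXEC mask
  std::vector<std::vector<int>> iterations;
  for (;;) {
    // Models a read-first-lane step: pick the first still-active lane.
    int first = -1;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
      if (active[i]) {
        first = static_cast<int>(i);
        break;
      }
    }
    if (first < 0)
      break; // every lane has been handled

    const LaneOperands uniform = lanes[first];
    std::vector<int> group;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
      // A lane joins this iteration only if rsrc AND soffset both match.
      if (active[i] && lanes[i].rsrc == uniform.rsrc &&
          lanes[i].soffset == uniform.soffset) {
        group.push_back(static_cast<int>(i));
        active[i] = false; // retire the lane
      }
    }
    iterations.push_back(group);
  }
  return iterations;
}
```

With four lanes holding the pairs (1,0), (1,4), (1,0), (2,4), the sketch needs three iterations: lanes 0 and 2 share a pair and run together, while lanes 1 and 3 each run alone. Comparing on only one of the two operands would group lanes whose other operand differs, which is the mishandling the comment above warns about.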
Made changes to legalize rsrc and soffset using a single waterfall loop. As I understand it, the SIFixSGPRCopies pass calls moveToVALU sequentially for each operand (not for soffset and rsrc together), so we still see two waterfall loops in the tests.
This revision is now accepted and ready to land. (Apr 26 2023, 10:19 AM)
This revision was landed with ongoing or failed builds. (Apr 27 2023, 7:08 AM)
Closed by commit rGe016fb57b353: [AMDGPU] Legalize soffset of buffer instructions. Use Waterfall loop logic. (authored by skc7).
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 517546
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/test/CodeGen/AMDGPU/legalize-amdgcn.raw.buffer.load.format.f16.ll
llvm/test/CodeGen/AMDGPU/legalize-amdgcn.raw.buffer.load.format.ll
llvm/test/CodeGen/AMDGPU/legalize-amdgcn.raw.buffer.load.ll
I don't understand the isIdenticalTo check; this will miss flag mismatches. Register and subregister equality should be sufficient (maybe it doesn't matter in this context).
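The difference between a full-identity comparison and a register-plus-subregister comparison can be shown with a toy model. Everything here is hypothetical: `Operand`, `identical`, and `sameRegister` are illustrative names that do not mirror `llvm::MachineOperand`, and the single `flags` field stands in for whatever operand flags are relevant. Which flags the real `isIdenticalTo` considers is not stated in this review.

```cpp
#include <cassert>

// Hypothetical, simplified operand; NOT llvm::MachineOperand.
struct Operand {
  unsigned reg;    // register number
  unsigned subReg; // subregister index
  unsigned flags;  // stand-in for any per-operand flags
};

// Strict comparison: every field, including flags, must match.
bool identical(const Operand &a, const Operand &b) {
  return a.reg == b.reg && a.subReg == b.subReg && a.flags == b.flags;
}

// Looser comparison: only the register and subregister must match.
bool sameRegister(const Operand &a, const Operand &b) {
  return a.reg == b.reg && a.subReg == b.subReg;
}
```

In this model, two operands that name the same register and subregister but carry different flags compare equal under `sameRegister` and unequal under `identical`, so the two checks can disagree exactly when flags differ.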