Instead of legalizing the saddr operand with a readfirstlane when the address is moved from an SGPR to a VGPR, we can just change the opcode to the vaddr form of the instruction.
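As a rough illustration of the idea (not the actual diff), a sketch of the check in the SIInstrInfo legalization path might look like the following. The helper AMDGPU::getGlobalVaddrOp, used here as the SADDR-to-VADDR opcode mapping, and the surrounding operand handling are assumptions for illustration only; folding the old 32-bit voffset into the new 64-bit vaddr is deliberately elided.

```cpp
// Illustrative sketch, not the code from this revision: if a global/FLAT
// SADDR instruction's saddr operand has ended up in a VGPR, prefer switching
// to the VADDR opcode over inserting V_READFIRSTLANE_B32 copies.
#include "SIInstrInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"

using namespace llvm;

// Returns the VADDR opcode to use, or -1 if the usual readfirstlane
// legalization should be done instead.
static int pickVAddrOpcode(const MachineInstr &MI, const SIInstrInfo &TII,
                           const MachineRegisterInfo &MRI) {
  const MachineOperand *SAddr =
      TII.getNamedOperand(MI, AMDGPU::OpName::saddr);
  if (!SAddr || !SAddr->isReg())
    return -1;

  // Only interesting when the "scalar" base address is actually in a VGPR.
  if (!TII.getRegisterInfo().isVGPR(MRI, SAddr->getReg()))
    return -1;

  // Assumed mapping from the SADDR form to the corresponding VADDR form;
  // a negative result means there is no VADDR counterpart.
  return AMDGPU::getGlobalVaddrOp(MI.getOpcode());
}
```

If the mapping comes back negative, the existing readfirstlane legalization would still be the fallback.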
Event Timeline
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line | Comment
---|---
5037 | I see it now, thanks.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line | Comment
---|---
5049–5063 | At this point wouldn't it be simpler to just create a fresh new instruction and delete the old one?
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line | Comment
---|---
5049–5063 | Unfortunately the callers expect the iterator to remain valid. Otherwise probably yes, although it would not be less code.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line | Comment
---|---
5049–5060 | Probably should add a comment explaining why this nightmare is here.
5049–5063 | I think in some contexts in SIFixSGPRCopies the original instruction is erased, but it's a bit of a mess.
llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
Line | Comment
---|---
36 | Can you also add an IR test where this matters? I'm not understanding why the DAG divergence analysis would let this happen.
llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
Line | Comment
---|---
36 | The test case has to be more than 100 instructions long; that is the threshold at which memory dependency analysis gives up on the "noclobber" check. The divergence analysis says it is fine, the address is uniform, but noclobber is not set, so the address registers are not known to be unclobbered and we end up here. It really happens in large basic blocks. It could probably also happen if we simply had no SALU instructions to do some of the computations and ended up with a VALU result that was propagated, but that is not the case I started with. I.e. the address really is uniform and a readfirstlane would be valid, but we do not want to issue it. I am not really sure we want a test of that size; it would be more than obscure in the first place.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line | Comment
---|---
5027 | Use Inst.getMF()?
5050 | Redundant assert, you've just tested this condition.
5055–5056 | Are these last two lines really necessary? Hasn't MRI.moveOperands already handled this?
5062 | VAddrDef can't be 0 here, that was already checked earlier.
llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
Line | Comment
---|---
36 | You could test with -memdep-block-scan-limit=1?
Addressed review comments. One thing left is a reasonable IR test.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line | Comment
---|---
5055–5056 | It did not. I have added the comment.
5062 | Right, this is part of D101408. Removed from this patch.
llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
Line | Comment
---|---
36 | I will check if I can produce a reasonable IR test.
LGTM with test comment
llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll
Line | Comment
---|---
3 | Add a comment explaining what this is testing.
22–24 | Oh, this is the case where the loop index/pointer is really uniform, but forced to be a vector branch. It's just better to do the VALU address computation than to do it scalar and copy with readfirstlane.
llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
Line | Comment
---|---
36 | Whether or not this can be converted to SMRD is a different problem.
Added a comment to the test.
llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll
Line | Comment
---|---
22–24 | Yes, the memory dependency tracing was another case, related to the opposite situation. This is just because we are producing VGPRs with a uniform value for the pointer.