The function that generates S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR-to-AGPR copies, but it missed checking for SGPR clobbering
before generating S_MOV_B64_IMM_PSEUDO.
Diff Detail
Repository: rG LLVM Github Monorepo
Event Timeline
This fixes the codegen of test_rocrand_kernel_xorwow.cpp of rocRAND (tracked by SWDEV-306338).
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

Line 102: Why do we check for COPY here? If Reg is not an AGPR then we will not touch any COPY instructions.

Line 119: Is there a reason why we need a new loop to check this? This should do the same as the removed lines of code below: if there is more than one def of a subreg, bail.
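The "more than one def of a subreg" check can be sketched in isolation. This is a toy model, not LLVM API: the `Def` struct and `hasSingleDefPerSubreg` name are hypothetical stand-ins for walking the defs of a 64-bit register split into two 32-bit halves.

```cpp
#include <vector>

// Hypothetical model: each def records which 32-bit half (subreg 0 or 1)
// of the 64-bit register it writes. If any half is written more than once,
// the optimization cannot reason about the final value and must bail.
struct Def { unsigned SubReg; }; // 0 = lo half, 1 = hi half

bool hasSingleDefPerSubreg(const std::vector<Def> &Defs) {
  unsigned Count[2] = {0, 0};
  for (const Def &D : Defs) {
    if (D.SubReg > 1 || ++Count[D.SubReg] > 1)
      return false; // second def of the same subreg: bail
  }
  return true;
}
```

The point of the review comment is that this predicate does not need a dedicated extra loop; it is the same bail-out the removed code already performed while scanning defs once.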
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

Line 150: If you do a separate loop, I do not understand why this handling is still in the second loop.
We don't need the new loop; sorry for the noise. We just need to make sure
that we don't generate S_MOV_B64_IMM_PSEUDO when we have an SGPR COPY, as
D104874 did.
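The guard being discussed can be sketched as a stand-alone predicate. The names `OpKind` and `canUseMovB64ImmPseudo` are hypothetical, not LLVM API: folding the two 32-bit halves into a single 64-bit immediate pseudo is only safe when both halves are defined by immediate moves, because a COPY def (e.g. of an SGPR) carries a register dependency that the pseudo would drop.

```cpp
#include <vector>

// Hypothetical classification of how each 32-bit half of the 64-bit
// register is defined.
enum class OpKind { MovImm, Copy };

// Only combine into an S_MOV_B64_IMM_PSEUDO-style def when both halves are
// immediate moves; any COPY def means we must leave the instructions alone.
bool canUseMovB64ImmPseudo(const std::vector<OpKind> &HalfDefs) {
  if (HalfDefs.size() != 2)
    return false; // need exactly one def per 32-bit half
  for (OpKind K : HalfDefs)
    if (K != OpKind::MovImm)
      return false; // COPY (e.g. from an SGPR): do not fold
  return true;
}
```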
Can you just change the range loop to preincrement the iterator?
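The preincrement suggestion refers to the standard pattern for erasing elements while iterating: a range-based for loop's iterator is invalidated when the current element is erased, so the loop is rewritten with an explicit iterator that is advanced *before* the erase. A minimal sketch with a `std::list` standing in for the instruction list (the function name `eraseEvens` is illustrative only):

```cpp
#include <list>

// Erase every even element. Incrementing I before erasing Cur keeps the
// loop iterator valid even though the erased node is destroyed.
int eraseEvens(std::list<int> &Insts) {
  int Erased = 0;
  for (auto I = Insts.begin(), E = Insts.end(); I != E;) {
    auto Cur = I++;       // advance first; I stays valid after the erase
    if (*Cur % 2 == 0) {
      Insts.erase(Cur);
      ++Erased;
    }
  }
  return Erased;
}
```

The same shape applies to a MachineBasicBlock instruction walk that may delete the current MachineInstr.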