This is to avoid using the callee saved registers for the return address of the tail call return instruction.
Details
- Reviewers
arsenm cdevadas bcahoon rampitec - Commits
- rG3bc1e084eef0: AMDGPU: Created a subclass for the return address operand in the tail call…
rG461a559bc9bd: AMDGPU: Created a subclass for the return address operand in the tail call…
rG7a98934fadc3: AMDGPU: Created a subclass for the return address operand in the tail call…
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Description is imprecise. It's a subclass, not a subregister class.
LGTM with fixed commit message
What will happen to getMinimalPhysRegClass query for SGPR_64 registers?
If this returns the new register class, ccr_sgpr_64, that's not the desired behavior.
getMinimalPhysRegClass should always return the minimal physical register class. I think CCR_SGPR_64
has all the properties of SGPR_64, so we won't have any issue for the (srcReg, RC) pair.
But we want the most lax register class possible. CCR_SGPR_64 should be used in this context and this context alone
In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )
I also noticed these bots have been down for over 24 hours, so I've reverted this in ec1f5b947e6c197c0d61cc0a5c7a219ac4873614 to hopefully get the bots back to green.
I am not sure what happened. But the test actually passed on my end. So I re-commit it as it is.
Am I missing anything in the cmake options? I see it still fails:
cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_USE_LINKER=lld ../llvm
In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )
I added -verify-machineinstrs to the offending test so you don't need to enable expensive checks to provoke the failure: https://github.com/llvm/llvm-project/commit/13822e22968426d9a9ad7a9aa6e201e23f7bd3d1
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2490 ↗ | (On Diff #511530) | Looking at the test diffs, I think we should introduce a copy to the correct register class when TCRETURN is inserted. If the G_GLOBAL_VALUE has other uses, it's over constrained |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2490 ↗ | (On Diff #511530) | I would say the test diff you saw is from different places even though here we may have the same issue. I plan to commit as it is for now. |
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | ||
---|---|---|
822 | Doesn't this depend on the calling convention? CSR_AMDGPU_SGPRs and CSR_AMDGPU_SI_Gfx_SGPRs are different. |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
2490 ↗ | (On Diff #511530) | Using constrainOperandRegClass should work |
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | ||
---|---|---|
822 | Yeah, the amdgpu_gfx calling convention has s[32:63] as scratch registers, so picking a register out of s[0:32] as return register makes tail calls with the amdgpu_gfx calling convention impossible. |
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | ||
---|---|---|
822 | We are seeing a tail call return address in callee saved registers, which were over-written during restore of the CSR before return. This is a correct issue. To solve this, we think it is better to restrict the tail call return address to be in the scratch registers, rather than in sreg_64 which could be a CSR. I am not sure whether the same issue exists with amdgpu_gfx calling convention. If yes, we may need to do similar thing to restrict to scratch register for tail call return address. |
llvm/lib/Target/AMDGPU/SIRegisterInfo.td | ||
---|---|---|
822 | The same problem will exist for all calling conventions |
Doesn't this depend on the calling convention? CSR_AMDGPU_SGPRs and CSR_AMDGPU_SI_Gfx_SGPRs are different.