This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Created a sub-register class for the return address operand in the tail call return instruction
ClosedPublic

Authored by cfang on Mar 28 2023, 4:57 PM.

Diff Detail

Event Timeline

cfang created this revision.Mar 28 2023, 4:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 4:57 PM
cfang requested review of this revision.Mar 28 2023, 4:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2023, 4:57 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm accepted this revision.Mar 28 2023, 6:55 PM

Description is imprecise. It's a subclass, not a subregister class.

LGTM with fixed commit message

This revision is now accepted and ready to land.Mar 28 2023, 6:55 PM

What will happen to getMinimalPhysRegClass query for SGPR_64 registers?
If this returns the new register class, ccr_sgpr_64, that's not the desired behavior.

cfang added a comment.Mar 28 2023, 9:06 PM

What will happen to getMinimalPhysRegClass query for SGPR_64 registers?
If this returns the new register class, ccr_sgpr_64, that's not the desired behavior.

getMinimalPhysRegClass should always return the minimal physical register class. I think CCR_SGPR_64
has all the properties of SGPR_64, so we won't have any issue for the (srcReg, RC) pair.

What will happen to getMinimalPhysRegClass query for SGPR_64 registers?
If this returns the new register class, ccr_sgpr_64, that's not the desired behavior.

getMinimalPhysRegClass should always return the minimal physical register class. I think CCR_SGPR_64
has all the properties of SGPR_64, so we won't have any issue for the (srcReg, RC) pair.

But we want the most lax register class possible. CCR_SGPR_64 should be used in this context and this context alone

bjope added a subscriber: bjope.Mar 31 2023, 1:08 AM

In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )

In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )

I also noticed these bots have been down for over 24 hours, so I've reverted this in ec1f5b947e6c197c0d61cc0a5c7a219ac4873614 to hopefully get the bots back to green.

cfang added a comment.Apr 4 2023, 11:13 AM

In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )

I also noticed these bots have been down for over 24 hours, so I've reverted this in ec1f5b947e6c197c0d61cc0a5c7a219ac4873614 to hopefully get the bots back to green.

I am not sure what happened. But the test actually passed on my end. So I re-commit it as it is.

cfang added a comment.Apr 4 2023, 11:39 AM

In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )

I also noticed these bots have been down for over 24 hours, so I've reverted this in ec1f5b947e6c197c0d61cc0a5c7a219ac4873614 to hopefully get the bots back to green.

I am not sure what happened. But the test actually passed on my end. So I re-commit it as it is.

Am I missing anything in the cmake options? I see it still fails:
cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_USE_LINKER=lld ../llvm

bjope added a comment.Apr 4 2023, 11:52 AM

In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )

I also noticed these bots have been down for over 24 hours, so I've reverted this in ec1f5b947e6c197c0d61cc0a5c7a219ac4873614 to hopefully get the bots back to green.

I am not sure what happened. But the test actually passed on my end. So I re-commit it as it is.

Am I missing anything in the cmake options? I see it still fails:
cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ -DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86" -DLLVM_USE_LINKER=lld ../llvm

Expensive checks are enabled by using

-DLLVM_ENABLE_EXPENSIVE_CHECKS=ON
foad added a comment.Apr 5 2023, 2:38 AM

In case you haven't noticed, the expensive checks build bots started to fail (after this patch afaict).
See https://lab.llvm.org/buildbot/#/builders/16 (or more specific this first fail when testing this patch https://lab.llvm.org/buildbot/#/builders/16/builds/45691 )

I added -verify-machineinstrs to the offending test so you don't need to enable expensive checks to provoke the failure: https://github.com/llvm/llvm-project/commit/13822e22968426d9a9ad7a9aa6e201e23f7bd3d1

cfang updated this revision to Diff 511527.Apr 6 2023, 2:25 PM

Fix a LIT test failure by keeping the original RegClass, if any.

cfang updated this revision to Diff 511530.Apr 6 2023, 2:32 PM
cfang added a reviewer: rampitec.
cdevadas added inline comments.Apr 7 2023, 4:11 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2490

Not sure this is the right thing to do. @foad can you review this?

arsenm added inline comments.Apr 7 2023, 3:38 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2490

Looking at the test diffs, I think we should introduce a copy to the correct register class when TCRETURN is inserted. If the G_GLOBAL_VALUE has other uses, it's over constrained

cfang added inline comments.Apr 10 2023, 10:30 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2490

I would say the test diff you saw is from different places even though here we may have the same issue. I plan to commit as it is for now.
and continue to investigate the performance issue related to the regClass change. Thanks.

foad added a comment.Apr 11 2023, 3:55 AM

Can you add a test where this affects codegen?

llvm/lib/Target/AMDGPU/SIRegisterInfo.td
822

Doesn't this depend on the calling convention? CSR_AMDGPU_SGPRs and CSR_AMDGPU_SI_Gfx_SGPRs are different.

arsenm added inline comments.Apr 11 2023, 4:43 AM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2490

Using constrainOperandRegClass should work

sebastian-ne added inline comments.Apr 11 2023, 5:08 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
822

Yeah, the amdgpu_gfx calling convention has s[32:63] as scratch registers, so picking a register out of s[0:32] as return register makes tail calls with the amdgpu_gfx calling convention impossible.

cfang added inline comments.Apr 13 2023, 2:04 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
822

We are seeing a tail call return address in callee saved registers, which were over-written during restore of the CSR before return. This is a correct issue. To solve this, we think it is better to restrict the tail call return address to be in the scratch registers, rather than in sreg_64 which could be a CSR.

I am not sure whether the same issue exists with amdgpu_gfx calling convention. If yes, we may need to do similar thing to restrict to scratch register for tail call return address.
arsenm added inline comments.Apr 13 2023, 3:36 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
822

The same problem will exist for all calling conventions