This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix saving fp and bp
ClosedPublic

Authored by sebastian-ne on Feb 17 2021, 7:14 AM.

Details

Summary

Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.

This builds on the assumptions that

  1. a function is never called with exec=0
  2. lanes do not die in a function, i.e. exec!=0 in the function epilog
  3. no new lanes are active when exiting the function, i.e. exec in the epilog is a subset of exec in the prolog.

Diff Detail

Event Timeline

sebastian-ne created this revision.Feb 17 2021, 7:14 AM
sebastian-ne requested review of this revision.Feb 17 2021, 7:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2021, 7:14 AM

Should comment why the register isn't considered killed

llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
716

Don't need hasValue

728
  • instead of getValue
774

Comment the isKill

796

Typo imlicit

988

Comment isKill

1002
  • instead of getValue
1045

Typo imlicit

sebastian-ne marked 7 inline comments as done.

Fix review comments and remove writelane from epilogs.

arsenm accepted this revision.Feb 19 2021, 5:24 AM
This revision is now accepted and ready to land.Feb 19 2021, 5:24 AM

In the light of changes in D96980, can we assume that

  1. exec != 0 in the function prolog and epilog?
  2. exec in the epilog is a subset of exec in the prolog (i.e. no new lanes are active)?

That would simplify the code here.

arsenm added a comment.Mar 3 2021, 5:54 AM

In the light of changes in D96980, can we assume that

  1. exec != 0 in the function prolog and epilog?

Yes (at least for the prolog). Allowing functions to kill exec and return is an open question. We have to assume a call to an arbitrary function will have any of the problematic cases that can't run with exec=0.

  1. exec in the epilog is a subset of exec in the prolog (i.e. no new lanes are active)?

This should be fine

  1. exec != 0 in the function prolog and epilog?

Yes (at least for the prolog). Allowing functions to kill exec and return is an open question. We have to assume a call to an arbitrary function will have any of the problematic cases that can't run with exec=0.

Thanks. I’ll leave it as it is then, because we are not sure about the epilog.

  1. exec != 0 in the function prolog and epilog?

Yes (at least for the prolog). Allowing functions to kill exec and return is an open question. We have to assume a call to an arbitrary function will have any of the problematic cases that can't run with exec=0.

Thanks. I’ll leave it as it is then, because we are not sure about the epilog.

If the only way lanes can be killed is llvm.amdgcn.kill (or demote) then epilog will never be reached with EXEC=0 as kill now immediately terminates a shader if all lanes are dead (or helpers). Are there other mechanisms that reduce EXEC? Inline ASM?

arsenm added a comment.Mar 3 2021, 5:42 PM
  1. exec != 0 in the function prolog and epilog?

Yes (at least for the prolog). Allowing functions to kill exec and return is an open question. We have to assume a call to an arbitrary function will have any of the problematic cases that can't run with exec=0.

Thanks. I’ll leave it as it is then, because we are not sure about the epilog.

If the only way lanes can be killed is llvm.amdgcn.kill (or demote) then epilog will never be reached with EXEC=0 as kill now immediately terminates a shader if all lanes are dead (or helpers). Are there other mechanisms that reduce EXEC? Inline ASM?

Inline asm theoretically could, but we don't even attempt to handle exec modifications on it. We could also just define the ABI to make that an invalid use

Rewrite the patch to not rely on D96336 anymore.

Instead of spilling all lanes of the used VGPR, assume that the live lanes are fitting in the prolog and epilog and use only live lanes for spilling.

sebastian-ne edited the summary of this revision. (Show Details)Mar 30 2021, 2:11 AM
sebastian-ne requested review of this revision.Mar 30 2021, 2:13 AM

I changed the frame-setup-without-sgpr-to-vgpr-spills.ll test to be autogenerated (it is currently not autogenerated on main). I will pre-commit that if it’s ok.

arsenm accepted this revision.Mar 30 2021, 3:35 PM

LGTM. I think "lanes do not die in a function" may not be true in the future, but is good enough for now (not sure whether callers or callees should be responsible for wave termination on kill)

This revision is now accepted and ready to land.Mar 30 2021, 3:35 PM
This revision was landed with ongoing or failed builds.Apr 12 2021, 2:54 AM
This revision was automatically updated to reflect the committed changes.