This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Handle Additional Cases in tryFoldPhiAGPR
ClosedPublic

Authored by Pierre-vh on Jun 27 2023, 7:50 AM.

Details

Reviewers
arsenm
Group Reviewers
Restricted Project
Commits
rG026fc9e9c41d: [AMDGPU] Handle Additional Cases in tryFoldPhiAGPR
Summary

Sometimes PHI have different incoming values, such as:

%1:vgpr_256 = COPY %0:agpr_256
%2:vgpr_32 = COPY %1:vgpr_256.sub0

Those weren't handled, which could lead to massive performance issues if break-large-PHIs kicked in + AGPRs were used (MFMA)

Fixes SWDEV-407986

Diff Detail

Event Timeline

Pierre-vh created this revision.Jun 27 2023, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2023, 7:50 AM
Pierre-vh requested review of this revision.Jun 27 2023, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2023, 7:50 AM
arsenm added inline comments.Jun 27 2023, 8:27 AM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
1644–1684

Doc comment

1647

Don't call it SubRegMask, it's just the SubReg

1667

We really ought to ban defless registers in SSA but you do need this for undef sources

1764

It's not a regmask, it's just a subreg index

llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505

Could also use and end to end IR test

Pierre-vh marked 4 inline comments as done.Jun 28 2023, 12:03 AM
Pierre-vh added inline comments.
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505

I don't have one; I can try reducing the original app's code but that may take a while, the kernel is big.

I tried reproducing the pattern I thought was relevant using some functions from mfma_loop but I can't get it to create that two copy pattern. I will try digging a bit more, but is a end-to-end test necessary here?

arsenm added inline comments.Jun 28 2023, 8:15 AM
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505

I'd like an end to end test because eventually all this code should be deleted but we should preserve the copy-avoiding behavior. In the ideal future RegBankSelect would take care of this

Pierre-vh added inline comments.Jun 29 2023, 12:41 AM
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505

mfma-loop already has quite a few end-to-end testcases that will fail if this fold doesn't work, but break-large-PHIs is applied. I can try to come up with another case that _maybe_ reproduces the two-copies pattern in the current trunk LLVM, but I can't guarantee it'll always produce that pattern in an end-to-end testcase. Changes anywhere in the pipeline may cause the lowering to be different and this new code path to no longer be hit, hence why I don't think it's particularly useful to have a end-to-end case here. I don't think it's worth the effort to come up with a testcase that may be entirely irrelevant very soon.

Now I'm thinking, something that may be useful would be to run mfma-loop.ll through CGP, and add that to the tests. That way if break-large-PHIs is disabled at some point in the future, we still have a stress test for AGPRs folding like SIFoldOperand does. Would you like me to add that?

Lastly, note that all of this is very specific to break-large-PHIs + DAGISel. There's a very good chance that if we switch to GISel (and no longer break PHIs in IR), that this fold will be entirely irrelevant because we won't break PHIs that way anymore, or some combine will take care of it, etc.

arsenm accepted this revision.Jun 29 2023, 5:06 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/fold-agpr-phis.mir
505

it doesn't really need to be guaranteed, if you can come up with something add it

This revision is now accepted and ready to land.Jun 29 2023, 5:06 AM
This revision was landed with ongoing or failed builds.Jun 29 2023, 5:49 AM
This revision was automatically updated to reflect the committed changes.