AMDGPU: Move m0 initializations earlier

Description

Summary:
After hoisting and merging m0 initializations, schedule them as early as
possible in the MBB. This helps the scheduler avoid hazards in some
cases.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67450

Details

Committed
kerbowa on Sep 11 2019, 2:28 PM
Differential Revision
D67450: AMDGPU: Move m0 initializations earlier
Parents
rL371670: gn build: Merge r371661

Event Timeline

arsenm added a subscriber: arsenm. (Mon, Oct 7, 10:02 PM)
arsenm added inline comments.
/llvm/trunk/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
606–621

I'm attempting to stop reserving M0, and having a lot of trouble with this placement logic. I'm getting placement after terminators in some cases.

Using a threshold here seems incorrect; I don't understand its function. If a threshold is valid to use, then it should be possible to simply insert at the beginning of the block without this scan?

kerbowa marked an inline comment as done. (Mon, Oct 7, 10:16 PM)
kerbowa added inline comments.
/llvm/trunk/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
606–621

The threshold is just an arbitrary limit for the scan, but we do need to check every instruction we want to move past.

Placement after terminators is obviously wrong. Do you have an example?

arsenm added inline comments. (Tue, Oct 8, 11:31 AM)
/llvm/trunk/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
606–621

I think I'm still hitting placement logic issues here, but the specific terminator case I started looking at turns out to have been wrong already coming out of isel.

kerbowa marked an inline comment as done. (Tue, Oct 8, 3:17 PM)
kerbowa added inline comments.
/llvm/trunk/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
606–621

We can remove the threshold if you think that's better. This logic essentially limits movement to scheduling regions: we won't move past calls or uses/defs of m0.

It's a different story when doing the merging further above.

If you see a placement issue, you can try disabling '-amdgpu-enable-merge-m0' to see if that's the cause.
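
For readers following the thread, the scan described above can be illustrated with a toy model. This is a minimal sketch and not the code in SIFixSGPRCopies.cpp: the `Instr` struct, its fields, and `findEarliestSlot` are hypothetical names invented for illustration, and the direction of the scan (backward from the initialization's current position) is an assumption of the sketch. It shows why every instruction has to be checked before moving past it, and how a threshold merely caps how far the scan looks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction. All fields are hypothetical and
// exist only for this illustration.
struct Instr {
  bool ReadsM0 = false;    // instruction uses m0
  bool WritesM0 = false;   // instruction defines m0
  bool IsBoundary = false; // e.g. a call or other scheduling boundary
};

// Returns the index at which the m0 initialization currently located at
// InitIdx could be re-inserted. The scan walks backward from InitIdx, stops
// at the first instruction that reads or writes m0 or is a boundary, and
// gives up after inspecting Threshold instructions.
std::size_t findEarliestSlot(const std::vector<Instr> &Block,
                             std::size_t InitIdx, unsigned Threshold) {
  std::size_t Slot = InitIdx;
  for (unsigned Steps = 0; Slot > 0 && Steps < Threshold; ++Steps) {
    const Instr &Prev = Block[Slot - 1];
    if (Prev.ReadsM0 || Prev.WritesM0 || Prev.IsBoundary)
      break; // cannot move the initialization above this instruction
    --Slot;
  }
  return Slot;
}
```

In this sketch the returned slot is never later than the initialization's starting position, so the move can only hoist. Dropping the threshold would change only how far the scan may travel, not which instructions block it.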