This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Implement register pressure callbacks
ClosedPublic

Authored by rampitec on Feb 7 2017, 7:26 PM.

Details

Summary

Implement getRegPressureLimit and getRegPressureSetLimit callbacks in
SIRegisterInfo.

This makes standard converge scheduler to behave almost the same as
GCNScheduler, sometime slightly better sometimes a bit worse.
In gerenal that is also possible to switch GCNScheduler to use these
callbacks instead of getMaxWaves(), which also makes GCNScheduler
slightly better on some tests and slightly worse on another. A big
win is behavior with converge scheduler.

Note, these are used not only by scheduling, but in places like
MachineLICM.

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision.Feb 7 2017, 7:26 PM
rampitec edited the summary of this revision. (Show Details)Feb 7 2017, 7:27 PM
arsenm edited edge metadata.Feb 7 2017, 7:36 PM

Typo in commit message "startdard"W

lib/Target/AMDGPU/SIRegisterInfo.cpp
1509–1510 ↗(On Diff #87584)

Why doesn't this need to handle the tuples of registers?

1513 ↗(On Diff #87584)

SGPR register classes are also more complex because of the variants that exclude m0 and add vcc etc. Why don't those need to be handled?

lib/Target/AMDGPU/SIRegisterInfo.h
275 ↗(On Diff #87584)

const should be added here instead of using const_cast from the other overload?

Typo in commit message "startdard"W

Thanks ;)

rampitec added inline comments.Feb 7 2017, 7:48 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
1509–1510 ↗(On Diff #87584)

Since we are tracking subregs all supreregs are counted as their parts. I.e. only 32 bit registers are relevant. If you check GCNScheduler it does the same. In fact that is correct, as you want to count them in the same bucket.

1513 ↗(On Diff #87584)

Again, what is matter are real 32 bit SGPRs. Everything else either counted through subregs, or returns RC size in a default case. What we need is to limit SGPR and VGPR more than their RC size to handle occupancy.

rampitec edited the summary of this revision. (Show Details)Feb 7 2017, 7:48 PM
rampitec added inline comments.
lib/Target/AMDGPU/SIRegisterInfo.h
275 ↗(On Diff #87584)

There is no prototype with const MF. Too bad.

rampitec added inline comments.Feb 8 2017, 10:47 AM
lib/Target/AMDGPU/SIRegisterInfo.h
275 ↗(On Diff #87584)

Note, even if I implement my own function taking const MF and call it from non-const overload, the const_cast will just move into that function to call parent's getRegPressureLimit().

arsenm accepted this revision.Feb 8 2017, 11:49 AM

LGTM

lib/Target/AMDGPU/SIRegisterInfo.h
275 ↗(On Diff #87584)

I think the parent class should be fixed

This revision is now accepted and ready to land.Feb 8 2017, 11:49 AM
This revision was automatically updated to reflect the committed changes.