This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Implement a custom MachineSchedStrategy
ClosedPublic

Authored by tstellarAMD on Aug 18 2016, 12:13 PM.

Details

Summary

GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Diff Detail

Repository
rL LLVM

Event Timeline

tstellarAMD retitled this revision from to AMDGPU/SI: Implement a custom MachineSchedStrategy.
tstellarAMD updated this object.
tstellarAMD added a subscriber: llvm-commits.

Forgot to add new files.

arsenm edited edge metadata.Aug 18 2016, 12:19 PM

I think you forgot to add the new file

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
110–117 ↗(On Diff #68588)

I think the naming here is too confusing. I think both schedulers need new names

arsenm added inline comments.Aug 18 2016, 12:23 PM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
21 ↗(On Diff #68590)

Should include SIInstrInfo instead

30–55 ↗(On Diff #68590)

We already have similar functions in the subtarget, so these should go there

114–117 ↗(On Diff #68590)

Braces

193 ↗(On Diff #68590)

Can this use a range loop?

267–270 ↗(On Diff #68590)

One DEBUG block

301–302 ↗(On Diff #68590)

Can be one DEBUG

arsenm added inline comments.Aug 18 2016, 12:25 PM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
154 ↗(On Diff #68590)

You already have ST

tstellarAMD marked 6 inline comments as done.
tstellarAMD edited edge metadata.
  • Rename scheduler to GCNMaxOccupancyScheduler.
  • Take LDS size into account when determining current occupancy
  • Enable by default.
  • Fix lit tests.
  • Addressed other review comments.
tstellarAMD marked an inline comment as done.Aug 25 2016, 8:17 AM
arsenm accepted this revision.Aug 25 2016, 9:58 AM
arsenm edited edge metadata.

LGTM with minor fixes

lib/Target/AMDGPU/AMDGPUSubtarget.h
435 ↗(On Diff #69252)

Line before comment

lib/Target/AMDGPU/GCNSchedStrategy.cpp
15 ↗(On Diff #69252)

I don't think this needs AMDGPU.h

35–37 ↗(On Diff #69252)

I think this would be easier to read with the first min in a variable, std::min(MinRegOccupancy, LocalMemOccupancy)

138 ↗(On Diff #69252)

Missing reference, maybe should just be an ArrayRef

This revision is now accepted and ready to land.Aug 25 2016, 9:58 AM
mareko edited edge metadata.Aug 25 2016, 11:47 AM

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

Please also ignore the LDS size for shader type = GS. It's the same reason as VS.

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

What is the "known static LDS"?

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

What is the "known static LDS"?

The declared. if you mean these shaders are creating a huge LDS array and don't intend to use it, they should switch to passing in a local pointer argument

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

What is the "known static LDS"?

The declared. if you mean these shaders are creating a huge LDS array and don't intend to use it, they should switch to passing in a local pointer argument

Alternatively if you can't spare an input we could add an intrinsic to get the LDS base pointer (which I guess would just return 0)

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

What is the "known static LDS"?

The declared. if you mean these shaders are creating a huge LDS array and don't intend to use it, they should switch to passing in a local pointer argument

What would the LLVM IR look like in this case? Do we have tests exercising LDS passed in a pointer argument?

For shader type = VS, please don't take the LDS size into account. The LDS size is unknown at compile time and it's almost always much less than what's declared.

This is only accounting for the known static LDS, there's no reason to special case this

What is the "known static LDS"?

The declared. if you mean these shaders are creating a huge LDS array and don't intend to use it, they should switch to passing in a local pointer argument

What would the LLVM IR look like in this case? Do we have tests exercising LDS passed in a pointer argument?

Yes, this is how OpenCL handles dynamic LDS. See local-memory.ll, it's just an addrspace(3)* argument to the function.

The alternative is to either have an intrinsic that returns the base LDS pointer so we know that dynamic LDS is in use, or to have an attribute that dynamic LDS is used and you can just use 0 directly

tstellarAMD marked 4 inline comments as done.
tstellarAMD edited edge metadata.

Only take LDS size into account for compute shaders.

Remove the special case for non-compute shaders. This will be fixed in mesa.

LGTM

lib/Target/AMDGPU/GCNSchedStrategy.cpp
9–12 ↗(On Diff #69420)

Should remove the empty file comment or add something

175 ↗(On Diff #69420)

This can be just -Reason

This revision was automatically updated to reflect the committed changes.