This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Set most sched model resource's BufferSize to one
ClosedPublic

Authored by kerbowa on Nov 30 2021, 12:26 AM.

Details

Summary

Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.

Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.

This type of change needs to be evaluated experimentally. Preliminary
results are positive.

See test: schedule-ilp.mir

Diff Detail

Event Timeline

kerbowa created this revision.Nov 30 2021, 12:26 AM
kerbowa requested review of this revision.Nov 30 2021, 12:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2021, 12:26 AM
foad added a comment.Nov 30 2021, 2:13 AM

Would it be possible to write some llvm-mca tests that show exactly how this change affects the scheduling model? schedule-ilp.mir is better than nothing, but it is still a very indirect way of observing what's going on under the covers.

foad added a comment.Nov 30 2021, 2:33 AM

I am mostly in favour of this change just because I never understood how setting BufferSize to (say) 15 bore any relation to that fact that we can have 15 VMEM loads in flight at once. In fact I really don't understand what effect all these different ProcResources have. I think they will stop the scheduler from trying to issue two different (say) VMEM instructions in the same cycle, because they both need to use the same resource; but we have already set IssueWidth = 1, so the scheduler should already know that it can't issue any two instructions in the same cycle.

Incidentally in D87621 we discussed whether most resources should use BufferSize = 0 or 1. I am still unsure.

rampitec accepted this revision.Nov 30 2021, 9:11 AM

LGTM. As previously discussed that seem to describe our HW better. If you can add a test as suggested by Jay that will not hurt.

This revision is now accepted and ready to land.Nov 30 2021, 9:11 AM

On GFX10.1 Vulkan with our usual test cases this seems be performance agnostic.

Would it be possible to write some llvm-mca tests that show exactly how this change affects the scheduling model? schedule-ilp.mir is better than nothing, but it is still a very indirect way of observing what's going on under the covers.

I will add these mca tests along with another patch to improve scheduling for latency. This fixes a few pressing perf issues so I'm committing it now.