This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use max waves for scheduler's initial occupancy target
Closed · Public

Authored by kerbowa on Oct 23 2021, 5:17 PM.

Details

Summary

The scheduler should set critical/excess register usage thresholds that
are guided by the maximum possible occupancy for the function. This
change is focused on setting proper lower bounds on register usage which
we would typically only see when a specific number of maximum waves is
requested with the "waves-per-eu" attribute, or by setting
"amdgpu-num-vgpr|sgpr" directly. This was broken previously. I have a
follow-on patch that will address issues with the scheduler not
targeting correct upper bounds on register usage which is typical with
launch bounds and min "waves-per-eu".

Changes by this patch:

Set the initial critical register usage thresholds to minimum values
that are determined by the maximum possible occupancy for the function,
or the number of allocatable registers, whichever is lower.

Avoid unsigned overflow if register limits are lower than the register
tracking "ErrorMargin", i.e. when using stress-regalloc=2.
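
The two changes above can be illustrated with a minimal sketch. This is not the code from the patch: the function names, parameter names, and the choice to clamp to zero when the margin exceeds the limit are all assumptions made for illustration. Only the ideas of taking the lower of the two limits and of not letting an unsigned subtraction wrap around come from the summary.

```cpp
#include <algorithm>

// Hypothetical helper (not an LLVM API): subtract a safety margin from a
// register limit without letting the unsigned subtraction wrap around.
// Clamping to zero is an assumption; the summary only says that overflow
// is avoided.
unsigned subtractMargin(unsigned Limit, unsigned Margin) {
  return Limit > Margin ? Limit - Margin : 0u;
}

// Hypothetical helper: choose the initial critical limit as the lower of
// (a) the register budget implied by the maximum possible occupancy and
// (b) the number of allocatable registers, then apply the error margin.
unsigned initialCriticalLimit(unsigned RegsForMaxOccupancy,
                              unsigned AllocatableRegs,
                              unsigned ErrorMargin) {
  unsigned Limit = std::min(RegsForMaxOccupancy, AllocatableRegs);
  return subtractMargin(Limit, ErrorMargin);
}
```

With a small allocatable count, as a stress-regalloc=2 run would produce, a plain `Limit - ErrorMargin` would wrap to a very large unsigned value and effectively disable the threshold; the guarded subtraction keeps the limit small instead.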

Event Timeline

kerbowa created this revision. Oct 23 2021, 5:17 PM
kerbowa requested review of this revision. Oct 23 2021, 5:17 PM
Herald added a project: Restricted Project. Oct 23 2021, 5:17 PM

I wanted to clarify some things I saw with the test schedule-regpressure-limit3.ll. Before this change, the scheduler was actually targeting max occupancy in this test even though only one wave was requested; this is the main motivation for the patch. However, what we actually see is higher register usage, despite the scheduler trying to limit VGPR register pressure (RP) at basically every step. I found that the reason was that disabling amdgpu-aa, so that the flat stores may alias the LDS loads, greatly restricted the scheduler's ability to reduce RP. So I've removed -enable-amdgpu-aa from this test.
This test is also a good example of the machine scheduler's apparent shortcomings when trying to maximize ILP. Even when running with -misched=ilpmax, we get worse results than some naive ILP heuristics.

arsenm accepted this revision. Oct 25 2021, 6:32 AM
This revision is now accepted and ready to land. Oct 25 2021, 6:32 AM
This revision was landed with ongoing or failed builds. Oct 26 2021, 3:31 PM
This revision was automatically updated to reflect the committed changes.