This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add second pass of the scheduler
ClosedPublic

Authored by rampitec on Feb 27 2017, 8:47 PM.

Details

Summary

If during scheduling we find that we cannot maintain the optimistic
occupancy, increase the critical register pressure limit and try scheduling
the whole function again. In this case, blocks with lower pressure
will have a chance to be scheduled better.
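
The retry described in the summary can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in the patch: `ToyBlock`, `scheduleBlock` and `scheduleFunction` are hypothetical names, each block is reduced to two pressure numbers, and the relaxed limit is simply the worst pressure seen in the first pass (the real scheduler would derive it from the lower occupancy level).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not an LLVM type.
struct ToyBlock {
  unsigned MinPressure;       // pressure of the most register-frugal schedule
  unsigned PreferredPressure; // pressure of the schedule the block would like
};

// Pressure the toy scheduler ends up with under a given critical limit:
// it uses as many registers as it wants up to the limit, but it can never
// go below the block's minimum.
unsigned scheduleBlock(const ToyBlock &B, unsigned Limit) {
  assert(B.MinPressure <= B.PreferredPressure);
  return std::max(B.MinPressure, std::min(B.PreferredPressure, Limit));
}

// First pass with the optimistic limit. If any block exceeds it, raise the
// limit to the worst pressure observed and schedule every block again.
std::vector<unsigned> scheduleFunction(const std::vector<ToyBlock> &Blocks,
                                       unsigned OptimisticLimit) {
  std::vector<unsigned> Pressure;
  unsigned Worst = 0;
  for (const ToyBlock &B : Blocks) {
    Pressure.push_back(scheduleBlock(B, OptimisticLimit));
    Worst = std::max(Worst, Pressure.back());
  }
  if (Worst <= OptimisticLimit)
    return Pressure; // the optimistic occupancy holds, no second pass

  // Second pass: the occupancy is lost anyway, so all blocks may use
  // the relaxed limit.
  for (std::size_t I = 0; I < Blocks.size(); ++I)
    Pressure[I] = scheduleBlock(Blocks[I], Worst);
  return Pressure;
}
```

In this model, a block that was squeezed to the optimistic limit in the first pass gets the relaxed limit in the second pass, which is the "chance for better scheduling" the summary refers to.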

Diff Detail

Repository
rL LLVM

Event Timeline

rampitec created this revision. Feb 27 2017, 8:47 PM
rampitec retitled this revision from Add second pass of the scheduler to [AMDGPU] Add second pass of the scheduler. Feb 27 2017, 8:48 PM
tstellar added inline comments.
lib/Target/AMDGPU/GCNSchedStrategy.cpp
48–50 ↗(On Diff #89972)

Can you store TargetOccupancy in the SIMachineFunctionInfo object? I think that would be a little cleaner.

tstellar added inline comments. Feb 28 2017, 8:26 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
48–50 ↗(On Diff #89972)

To clarify:

TargetOccupancy = MFI->getTargetOccupancy();

rampitec added inline comments. Feb 28 2017, 8:29 AM
lib/Target/AMDGPU/GCNSchedStrategy.cpp
48–50 ↗(On Diff #89972)

It is an override for the scheduler and will not always be initialized to a non-zero value. I'm afraid that exposing this field in MFI would be misleading.

vpykhtin edited edge metadata. Feb 28 2017, 9:57 AM

I'm just a bit confused: what gives the scheduler more register freedom on the rescheduling run?

With lower occupancy we have more registers available. If we cannot maintain the minimal occupancy for some block, the other blocks can use the same number of registers as well. That is achieved by bumping the critical limits.
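
To make the trade-off concrete, here is a minimal sketch of how a per-wave VGPR budget grows as occupancy drops. The numbers are assumptions for illustration only and do not come from the patch: 256 VGPRs shared by the waves on one SIMD, an allocation granule of 4, and at most 10 waves. The function name `maxVGPRsForOccupancy` is hypothetical; the real backend queries the subtarget for these limits.

```cpp
#include <cassert>

// Hypothetical helper: VGPR budget per wave for a given number of waves
// per SIMD, under the assumed 256-register file and granule of 4.
unsigned maxVGPRsForOccupancy(unsigned WavesPerSIMD) {
  assert(WavesPerSIMD >= 1 && WavesPerSIMD <= 10);
  const unsigned TotalVGPRs = 256; // assumed register file size
  const unsigned Granule = 4;      // assumed allocation granularity
  unsigned Budget = TotalVGPRs / WavesPerSIMD;
  return Budget - Budget % Granule; // round down to the granule
}
```

Under these assumptions, a function that cannot fit in the budget for 10 waves falls back to the budget for a lower occupancy, and every block of the function may then use that larger budget.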

vpykhtin accepted this revision. Feb 28 2017, 10:13 AM

Ok, it's just not that easy to follow this in GCNMaxOccupancySchedStrategy::initialize.

This revision is now accepted and ready to land. Feb 28 2017, 10:13 AM
kzhuravl added inline comments.
lib/Target/AMDGPU/GCNSchedStrategy.cpp
62 ↗(On Diff #89972)

I think this should also respect the "amdgpu-waves-per-eu" attribute (https://clang.llvm.org/docs/AttributeReference.html#amdgpu-waves-per-eu)?

How expensive is it to do this? The scheduler is already frequently the most expensive pass after RA, sometimes surpassing it.

The algorithm itself is not very expensive. Live-ins are scanned for all defined registers once per region. This is probably the most expensive part if there are a lot of registers. The main scan after that only touches the registers that are live in the region. This is obviously more expensive than not doing it, but not terribly so, given that LIS is already available.

It would be possible to preserve LiveRegs between regions, but the main scheduler loop can skip some regions.
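
The per-region scan described above can be approximated by the following toy sketch. It is not the patch's implementation: `ToyInstr` and `maxPressure` are hypothetical names, registers are plain integers, and the live-in set is passed in directly instead of being computed from LIS.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction, not an LLVM type.
struct ToyInstr {
  std::vector<int> Defs;  // registers that become live at this instruction
  std::vector<int> Kills; // registers whose last use is at this instruction
};

// Walk the region once, starting from its live-in set, and return the
// largest number of simultaneously live registers. Only registers that are
// live in the region are ever touched.
unsigned maxPressure(std::set<int> Live, const std::vector<ToyInstr> &Region) {
  unsigned Max = static_cast<unsigned>(Live.size());
  for (const ToyInstr &I : Region) {
    // Kills are processed first, so a def may reuse a killed register.
    for (int R : I.Kills)
      Live.erase(R);
    for (int R : I.Defs)
      Live.insert(R);
    Max = std::max(Max, static_cast<unsigned>(Live.size()));
  }
  return Max;
}
```

The cost of the walk is proportional to the number of defs and kills in the region, which matches the remark that the initial live-in computation is likely the more expensive step.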

lib/Target/AMDGPU/GCNSchedStrategy.cpp
62 ↗(On Diff #89972)

It does when it calls getRegPressureSetLimit(). However, if we are not limited, or cannot stay within the guessed optimistic limits, we override TargetOccupancy and reschedule.

This revision was automatically updated to reflect the committed changes.