If during scheduling we find that we cannot keep the optimistic
occupancy, increase the critical register pressure limit and try scheduling
the whole function again. In this case blocks with lower pressure
will have a chance to be scheduled better.
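The retry described above can be sketched as a toy model. This is only an illustration under stated assumptions, not the patch itself: `Region`, `scheduleRegion`, `scheduleAll` and `scheduleFunction` are hypothetical names, and the sketch relaxes the limit to the worst pressure seen, whereas the patch derives the new limit from the lowered occupancy.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a scheduling region; not an LLVM class.
// Assumes MinPressure <= WantedPressure.
struct Region {
  unsigned MinPressure;    // lowest pressure any schedule of the region can reach
  unsigned WantedPressure; // pressure of the schedule preferred when unconstrained
  unsigned Scheduled;      // pressure of the most recently chosen schedule
};

// Toy scheduler: get as close to WantedPressure as Limit allows,
// but never below what the region needs at minimum.
unsigned scheduleRegion(Region &R, unsigned Limit) {
  R.Scheduled = std::max(R.MinPressure, std::min(R.WantedPressure, Limit));
  return R.Scheduled;
}

// Schedule every region under the same limit and report the worst pressure.
unsigned scheduleAll(std::vector<Region> &Regions, unsigned Limit) {
  unsigned Worst = 0;
  for (Region &R : Regions)
    Worst = std::max(Worst, scheduleRegion(R, Limit));
  return Worst;
}

// First pass uses the optimistic limit. If any region could not stay within
// it, the limit is relaxed and the whole function is scheduled again, so
// regions that were squeezed needlessly can use the extra registers.
unsigned scheduleFunction(std::vector<Region> &Regions,
                          unsigned OptimisticLimit) {
  unsigned Limit = OptimisticLimit;
  unsigned Worst = scheduleAll(Regions, Limit);
  if (Worst > Limit) {
    Limit = Worst; // simplification: the patch picks a limit from occupancy
    scheduleAll(Regions, Limit);
  }
  return Limit;
}
```

With regions `{10, 40, 0}` and `{32, 36, 0}` and an optimistic limit of 24, the second region cannot go below 32, so the function is rescheduled with a limit of 32 and the first region ends up at 32 instead of 24.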
Repository: rL LLVM
Event Timeline
lib/Target/AMDGPU/GCNSchedStrategy.cpp | ||
---|---|---|
48–50 ↗ | (On Diff #89972) | Can you store TargetOccupancy in the SIMachineFunctionInfo object? I think that would be a little cleaner. |
lib/Target/AMDGPU/GCNSchedStrategy.cpp | ||
---|---|---|
48–50 ↗ | (On Diff #89972) | To clarify: TargetOccupancy = MFI->getTargetOccupancy(); |
lib/Target/AMDGPU/GCNSchedStrategy.cpp | ||
---|---|---|
48–50 ↗ | (On Diff #89972) | It is an override for the scheduler, and it will not always be initialized to a non-zero value. I'm afraid that exposing this field in MFI would be misleading. |
I'm just a bit confused: what gives the scheduler more register freedom on the rescheduling run?
With lower occupancy we have more registers available. If we cannot maintain the minimal occupancy for any one block, we can use the same number of registers in the other blocks. That is achieved by bumping the critical limits.
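As a rough illustration of why lower occupancy frees registers, here is a minimal sketch that assumes a register file shared evenly between resident waves and allocated in fixed-size granules. `registersPerWave` and its parameters are hypothetical and not part of the patch; the real limits come from the target's subtarget queries.

```cpp
// Registers each wave may use when `Waves` waves share a file of `TotalRegs`
// registers that is handed out in multiples of `Granule`.
// Simplified model for illustration only.
unsigned registersPerWave(unsigned TotalRegs, unsigned Waves,
                          unsigned Granule) {
  unsigned Share = TotalRegs / Waves; // even split between resident waves
  return Share - Share % Granule;     // round down to the allocation granule
}
```

With the illustrative values of a 256-register file and a granule of 4, `registersPerWave(256, 10, 4)` is 24 while `registersPerWave(256, 8, 4)` is 32, so dropping from 10 to 8 waves gives every block 8 more registers to work with.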
Ok, it's just not that easy to follow this in GCNMaxOccupancySchedStrategy::initialize.
lib/Target/AMDGPU/GCNSchedStrategy.cpp | ||
---|---|---|
62 ↗ | (On Diff #89972) | I think this should also respect the "amdgpu-waves-per-eu" attribute (https://clang.llvm.org/docs/AttributeReference.html#amdgpu-waves-per-eu)? |
How expensive is it to do this? The scheduler is already frequently the most expensive pass after RA, sometimes surpassing it.
The algorithm itself is not very expensive. Live-ins are scanned for all defined registers once per region. This is probably the most expensive part if there are a lot of registers. The main scan after that only touches the registers that are live in the region. This is obviously more expensive than not doing it, but not terribly so, given that LIS is already available.
It would be possible to preserve LiveRegs between regions, but the main scheduler loop can skip some regions.
lib/Target/AMDGPU/GCNSchedStrategy.cpp | ||
---|---|---|
62 ↗ | (On Diff #89972) | It does when it calls getRegPressureSetLimit(). However, if we are not limited, or cannot keep within the guessed optimistic limits, we override TargetOccupancy and reschedule. |