An iterative approach to finding the best schedule is essential for the GCN architecture. This change combines a number of ideas for iterative scheduling and presents the infrastructure.
Lightweight scheduling
The default schedulers apply a schedule immediately at the MIR level, reordering instructions and updating IR data such as LiveIntervals. This is relatively heavy. Instead, a scheduling strategy can return an array of MachineInstr pointers (or an equivalent, as SIScheduler does) that defines a particular schedule. Such a lightweight schedule can be scored against other variants and applied to the IR only once. There are two types of lightweight schedules:
- array of pointers to DAG SUnits - this is what strategies are supposed to return. The benefit is that the scoring function can use the DAG SUnits. It doesn't include debug values.
- array of pointers to MachineInstr - this is a so-called 'detached' schedule, in the sense that it no longer depends on DAG state and it includes debug values. This is useful when some variants need to be stored for later selection.
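The relation between the two schedule types can be sketched as follows. This is a minimal illustration, not the code from this change: `Instr` and `Node` are hypothetical stand-ins for MachineInstr and SUnit, and `detach` together with the `DebugAttachments` list (each debug value paired with the instruction it should follow) is an assumed way of re-attaching debug values.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-ins for llvm::MachineInstr and llvm::SUnit.
struct Instr { int Id; bool IsDebug; };
struct Node { Instr *MI; unsigned NodeNum; };

// Schedule type 1: pointers to DAG nodes, no debug values.
using DagSchedule = std::vector<const Node *>;
// Schedule type 2: 'detached' schedule, plain instruction pointers.
using DetachedSchedule = std::vector<Instr *>;
// Assumed bookkeeping: (debug value, instruction it follows).
using DebugAttachments = std::vector<std::pair<Instr *, Instr *>>;

// Convert a DAG-based schedule into a detached one, placing each
// debug value right after the instruction it is attached to.
DetachedSchedule detach(const DagSchedule &S, const DebugAttachments &Dbg) {
  DetachedSchedule Res;
  Res.reserve(S.size() + Dbg.size());
  for (const Node *N : S) {
    Res.push_back(N->MI);
    for (const auto &P : Dbg)
      if (P.second == N->MI)
        Res.push_back(P.first);
  }
  return Res;
}
```

The result holds no pointers into the DAG, so it stays valid after the DAG is rebuilt or reused by another strategy.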
Scheduling with different strategies requires each strategy to preserve DAG state so that other strategies can reuse the same DAG. This can be achieved either by saving the DAG data that was touched or, better, by not touching the DAG at all and instead annotating DAG SUnits with the information relevant to a particular strategy: SUnit has a NodeNum field, which allows easy annotation without maps. The minreg strategy implements the latter approach.
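The NodeNum annotation idea can be sketched as below. This is an illustration only, not the minreg implementation: `Node` is a hypothetical stand-in for SUnit, the DAG is assumed to be stored so that `DAG[i].NodeNum == i`, and the strategy shown is a plain bottom-up topological ordering. The point is that all per-node state lives in a side vector indexed by NodeNum, so the DAG is taken by const reference and left untouched.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::SUnit; edges are stored as NodeNums.
struct Node {
  unsigned NodeNum;
  std::vector<unsigned> Succs;
  std::vector<unsigned> Preds;
};

// Returns NodeNums in bottom-up order. Assumes DAG[i].NodeNum == i.
std::vector<unsigned> scheduleBottomUp(const std::vector<Node> &DAG) {
  // Strategy-private annotation: unscheduled successors per node,
  // indexed by NodeNum instead of being stored in the node or in a map.
  std::vector<unsigned> NumSuccsLeft(DAG.size());
  std::vector<unsigned> Ready;
  for (const Node &N : DAG) {
    NumSuccsLeft[N.NodeNum] = static_cast<unsigned>(N.Succs.size());
    if (N.Succs.empty())
      Ready.push_back(N.NodeNum);
  }
  std::vector<unsigned> BottomUp;
  while (!Ready.empty()) {
    unsigned Cur = Ready.back();
    Ready.pop_back();
    BottomUp.push_back(Cur);
    // A predecessor becomes ready once all its successors are scheduled.
    for (unsigned P : DAG[Cur].Preds)
      if (--NumSuccsLeft[P] == 0)
        Ready.push_back(P);
  }
  return BottomUp;
}
```

Because the side vector is discarded when the function returns, another strategy can run on the same DAG right afterwards.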
GCNUpwardRPTracker
Lightweight schedules cannot be tracked using the LLVM RP trackers, so GCNUpwardRPTracker was introduced for this purpose. As the name states, it can only go upward, instruction by instruction. The order of instructions is defined by the tracker's caller, so it can be used for tracking both lightweight schedules and IR sequences. Upward tracking is easier to implement because it only requires the region liveout set to operate, except for one case: finding the used livemask for a use of a large register. Although LiveIntervals has not yet been updated for a given instruction in a lightweight schedule, it can still be used, because the livemask for a use does not change between schedules, as all defs must dominate the use. The defs can be reordered, but the overall mask remains the same.
TODO: save liveout sets for every region when recording and reuse them for subsequent RP tracking, as liveouts don't depend on the schedule.
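The upward tracking scheme can be sketched roughly as follows. This is a simplified model, not GCNUpwardRPTracker itself: `Instr` and `UpwardTracker` are hypothetical names, registers are plain integers, and lane masks and register classes are ignored, so pressure is just the number of live registers.

```cpp
#include <algorithm>
#include <set>
#include <utility>
#include <vector>

// Hypothetical instruction model: registers defined and used.
struct Instr {
  std::vector<int> Defs;
  std::vector<int> Uses;
};

class UpwardTracker {
  std::set<int> Live;   // registers live below the current position
  unsigned MaxPressure; // highest live-register count seen so far

public:
  // Only the region liveout set is needed to start tracking.
  explicit UpwardTracker(std::set<int> LiveOuts)
      : Live(std::move(LiveOuts)),
        MaxPressure(static_cast<unsigned>(Live.size())) {}

  // Move the tracker one instruction upward.
  void recede(const Instr &I) {
    // Defs occupy a register at the instruction even if they are dead.
    std::set<int> AtInstr = Live;
    AtInstr.insert(I.Defs.begin(), I.Defs.end());
    MaxPressure = std::max(MaxPressure, static_cast<unsigned>(AtInstr.size()));
    for (int D : I.Defs)
      Live.erase(D);
    Live.insert(I.Uses.begin(), I.Uses.end());
    MaxPressure = std::max(MaxPressure, static_cast<unsigned>(Live.size()));
  }

  unsigned maxPressure() const { return MaxPressure; }
};

// The caller defines the order: here a schedule listed top-down
// is walked from its last instruction to its first.
unsigned maxPressureOf(const std::vector<const Instr *> &Schedule,
                       std::set<int> LiveOuts) {
  UpwardTracker T(std::move(LiveOuts));
  for (auto It = Schedule.rbegin(); It != Schedule.rend(); ++It)
    T.recede(**It);
  return T.maxPressure();
}
```

Since the tracker only sees the sequence it is fed, the same code scores a lightweight schedule or an existing IR sequence without touching the IR.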
GCNRegPressure
The structure used to track register pressure. It contains the number of SGPRs/VGPRs used, weights for large SGPRs/VGPRs, and a compare function: the pressure giving the higher occupancy wins; otherwise the pressure with the lowest large-register weight wins.
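The comparison rule can be sketched as below. The struct layout and names are assumptions, not the GCNRegPressure definition. In particular, `toyOccupancy` is a made-up occupancy model with arbitrary constants (real limits come from the subtarget), and summing the SGPR and VGPR large-register weights for the tie-break is also an assumption.

```cpp
#include <algorithm>

// Hypothetical mirror of the fields described above.
struct RegPressure {
  unsigned SGPRs;
  unsigned VGPRs;
  unsigned LargeSGPRWeight;
  unsigned LargeVGPRWeight;
};

// Toy occupancy model with arbitrary constants, for illustration only.
unsigned toyOccupancy(const RegPressure &P) {
  const unsigned MaxWaves = 10;
  unsigned ByVGPR = P.VGPRs ? std::min(MaxWaves, 256u / P.VGPRs) : MaxWaves;
  unsigned BySGPR = P.SGPRs ? std::min(MaxWaves, 800u / P.SGPRs) : MaxWaves;
  return std::min(ByVGPR, BySGPR);
}

// True if A is a better pressure than B: higher occupancy wins,
// and on equal occupancy the lower large-register weight wins.
bool isBetter(const RegPressure &A, const RegPressure &B) {
  unsigned OccA = toyOccupancy(A), OccB = toyOccupancy(B);
  if (OccA != OccB)
    return OccA > OccB;
  return A.LargeSGPRWeight + A.LargeVGPRWeight <
         B.LargeSGPRWeight + B.LargeVGPRWeight;
}
```

A scoring loop over candidate lightweight schedules would then keep the variant for which `isBetter` holds against the current best.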
Minimal register scheduler (example)
This is a simple experimental scheduler whose main purpose is to learn how to use as few registers as possible for a region (it doesn't care about performance at all). It doesn't always return the minimal usage but works relatively well on large regions with unrolled loops. It's also used in the tryMaximizeOccupancy scheduling pass.
Legacy Max occupancy scheduler
Included as an example; it mimics the current behaviour. It doesn't use lightweight schedules but shows how legacy and lightweight schedulers can be intermixed. The main difference is that it first collects all the regions to schedule and sorts them by register pressure. This way it starts with the fattest region, knowing the best achievable occupancy beforehand. It also includes the tryMaximizeOccupancy pass, which tries to minimize register usage with the minreg strategy for the most register-consuming regions.
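The collect-then-sort step can be sketched as below. This is an illustration under simplifying assumptions: `Region` is a hypothetical record, and the pressure is reduced to a single number, whereas the change compares GCNRegPressure values.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical region record; pressure simplified to one number.
struct Region {
  int Id;
  unsigned MaxPressure;
};

// Order regions so that the one with the highest pressure comes first.
// stable_sort keeps the original order among equally fat regions.
void sortFattestFirst(std::vector<Region> &Regions) {
  std::stable_sort(Regions.begin(), Regions.end(),
                   [](const Region &A, const Region &B) {
                     return A.MaxPressure > B.MaxPressure;
                   });
}
```

After sorting, the first region bounds the occupancy the whole function can reach, so later regions need not be scheduled for a better occupancy than that.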
None of these schedulers are turned on by default in this change.
Testing:
Legacy Max occupancy scheduler fully passes lit tests.
Minreg runs lit tests without asserts.
No performance impact so far.
Tests to be added soon.
I think it's better to put the call to the function under DEBUG rather than wrapping an entire function body