Short: Patch with a new SystemZ SchedStrategy for Pre-RA scheduling, with refactored GenereicScheduler to reuse all of tryCandidate(). Advice needed on how to know that SU uses the vector unit.
In continuation of the discussion we had last November (http://lists.llvm.org/pipermail/llvm-dev/2017-November/119250.html), I am now much closer to actually committing something for the SystemZ backend. As then, I want to do an extra latency check on specific SUs in tryCandidate(). If it would be accepted - as Andy previously explained quite clearly it is not - this might have looked something like:
--- a/lib/CodeGen/MachineScheduler.cpp +++ b/lib/CodeGen/MachineScheduler.cpp @@ -2968,14 +2968,20 @@ void GenericScheduler::tryCandidate(SchedCandidate &Cand, // Avoid increasing the max pressure of the entire region. if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax, Cand.RPDelta.CurrentMax, TryCand, Cand, RegMax, TRI, DAG->MF)) return; + // Let target give priority to latency for certain instructions, e.g. those + // using a particular pipeline. + if (RegionPolicy.LatencyBoost && ST.tryAggrLatency(TryCand.SU) && + tryLatency(TryCand, Cand, *Zone)) + return; + if (SameBoundary) { // Avoid critical resource consumption and balance the schedule.
This is quite simple, but another "knob" to turn, instead of a truly flexible tryCandidate() method. I agree with Andy that tuning the pre-RA scheduling heruistics is important enough to motivate a target to derive its own strategy, so that's what I did now instead by:
- Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.
- tryLatency() becomes a protected method of GenericSchedulerBase instead of a static function in MachineScheduler.cpp, so that the derived SystemZ strategy can reuse it. More static functions may follow this pattern when needed, such as tryLess(), tryGreater(), etc.
- A new class SystemZPreRASchedStrategy. Its tryCandidate() method basically needs to check if SU uses the vector unit (Z13_VecUnit / Z14_VecUnit), and if so call tryLatency(). I am not sure what the best way to do this check would be, and the current way of using a string comparison on the MCProcResourceDesc::Name is of course temporary. It would have been nice to have enums for the execution units, but Tablegen does not print them. Another alternative might be to use a TSFlag bit for this in the instruction descriptor, but that would mean duplicating information unnecessarily. What is the recommended way of identifying a specific processor resource? I don't see this being done anywhere. Is there another way, such as giving the VecUnit some type of flag or value?
Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().
As before, there are some nice improvements on benchmarks with this :-)
SystemZ tests updated.