This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ, MachineScheduler] Refactor GenericScheduler::tryCandidate() to reuse parts in a new SystemZ scheduling strategy.
Needs ReviewPublic

Authored by jonpa on Feb 15 2018, 2:31 AM.

Details

Reviewers
uweigand
Summary

Short: Patch with a new SystemZ SchedStrategy for Pre-RA scheduling, with refactored GenereicScheduler to reuse all of tryCandidate(). Advice needed on how to know that SU uses the vector unit.

In continuation of the discussion we had last November (http://lists.llvm.org/pipermail/llvm-dev/2017-November/119250.html), I am now much closer to actually committing something for the SystemZ backend. As then, I want to do an extra latency check on specific SUs in tryCandidate(). If it would be accepted - as Andy previously explained quite clearly it is not - this might have looked something like:

--- a/lib/CodeGen/MachineScheduler.cpp
+++ b/lib/CodeGen/MachineScheduler.cpp
@@ -2968,14 +2968,20 @@ void GenericScheduler::tryCandidate(SchedCandidate &Cand,
   // Avoid increasing the max pressure of the entire region.
   if (DAG->isTrackingPressure() && tryPressure(TryCand.RPDelta.CurrentMax,
                                                Cand.RPDelta.CurrentMax,
                                                TryCand, Cand, RegMax, TRI,
                                                DAG->MF))
     return;

+  // Let target give priority to latency for certain instructions, e.g. those
+  // using a particular pipeline.
+  if (RegionPolicy.LatencyBoost && ST.tryAggrLatency(TryCand.SU) &&
+      tryLatency(TryCand, Cand, *Zone))
+    return;
+
   if (SameBoundary) {
     // Avoid critical resource consumption and balance the schedule.

This is quite simple, but another "knob" to turn, instead of a truly flexible tryCandidate() method. I agree with Andy that tuning the pre-RA scheduling heruistics is important enough to motivate a target to derive its own strategy, so that's what I did now instead by:

  • Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.
  • tryLatency() becomes a protected method of GenericSchedulerBase instead of a static function in MachineScheduler.cpp, so that the derived SystemZ strategy can reuse it. More static functions may follow this pattern when needed, such as tryLess(), tryGreater(), etc.
  • A new class SystemZPreRASchedStrategy. Its tryCandidate() method basically needs to check if SU uses the vector unit (Z13_VecUnit / Z14_VecUnit), and if so call tryLatency(). I am not sure what the best way to do this check would be, and the current way of using a string comparison on the MCProcResourceDesc::Name is of course temporary. It would have been nice to have enums for the execution units, but Tablegen does not print them. Another alternative might be to use a TSFlag bit for this in the instruction descriptor, but that would mean duplicating information unnecessarily. What is the recommended way of identifying a specific processor resource? I don't see this being done anywhere. Is there another way, such as giving the VecUnit some type of flag or value?

Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().

As before, there are some nice improvements on benchmarks with this :-)

SystemZ tests updated.

Diff Detail

Event Timeline

jonpa created this revision.Feb 15 2018, 2:31 AM

Refactor tryCandidate() into digestible parts so that a target that wants to override this method with minor modifications can do so easily. I tried to do a separation into the logical parts, but I am of course open to alternatives, including name changing of the new methods.

Thanks for picking this up! This is along the lines what I outlined in the email thread and what I was planning to do, but did not find the time so far.

I left some inline comments and I think it would be best if you could split this up into a MachineScheduler patch and a SystemZ patch. One advantage of having target specific schedulers IMO should be that target maintainers can more easily accept changes without worrying about other targets.

Coming back to the original mail discussion, this will perhaps be somewhat awkward to maintain over time - in order to keep up with the developments in the base class, one has to 1) diff both tryCandidate(), and also 2) check what calls to DAG->addMutation() are done in the generic createGenericSchedLive().

I do not think this will be a huge issue, as there are not many changes to the generic heuristics (and with more target-specific schedulers, I expect even less need to tweak the generic heuristics). As for DAG mutations, at least the AArch64 backend already manages that itself and I do not think this has been a problem so far.

include/llvm/CodeGen/MachineScheduler.h
894–895 ↗(On Diff #134392)

I am not sure what the benefit of that is and at the moment it seems like one setting too much to me. I think it is not unreasonable to expect people to provide their own tryCandidate implementation if they want custom latency heuristics.

969 ↗(On Diff #134392)

Usually camel case is used for function names, so I think it would be better to have tryCandidateRegPressure - same for other methods below.

981 ↗(On Diff #134392)

I think for now it seems like making tryCandidate a virtual function gives peopl eenough freedom to ship their own heuristics while re-using most of the existing bits. I don't see the benefit of making the helper functions class members however.

Also, tryCandidate should not modify the scheduler state, so I think it should be const.

And as this is part of the interface now and we expect people to extend it, it would be good to document it.

lib/CodeGen/MachineScheduler.cpp
2602 ↗(On Diff #134392)

Splitting the code into multiple helper functions would be great opportunity to document the heuristics :)

jonpa added inline comments.Feb 17 2018, 9:30 AM
include/llvm/CodeGen/MachineScheduler.h
894–895 ↗(On Diff #134392)

Well... why have duplicated code in a Target backend, if it can be helped?

To me, it is generally reasonable to assume that a target starts out with enabling the generic mischeduler, and then as a next step experiment with it by adding / removing / reordering heuristics. What harm is it then to provide the means to do so with protected member methods?

981 ↗(On Diff #134392)

How are those bits going to be reused if not by inheritance?

How do you picture SystemZ adding this simple heuristic (see top of page) while reusing all else?

fhahn added inline comments.Feb 19 2018, 5:58 AM
include/llvm/CodeGen/MachineScheduler.h
894–895 ↗(On Diff #134392)

IMO with too much subclassing and overriding, it can be quite complicated to see what's going on - but I think that comes down to personal preference, mostly. And that's just me, I suppose with those kinds of things @MatzeB and @atrick 's thoughts are much more important ;)

I also think it would be nice to have all heuristic functions somewhere together, to make it easier to keep track of them.

981 ↗(On Diff #134392)

How do you picture SystemZ adding this simple heuristic (see top of page) while reusing all else?

They do not have to be member functions, they could be regular exported functions. They don't access the scheduler object (I think), so I don't think they would need to be member functions. To me, it seems simpler not to couple the scheduler and the heuristics too much, but that's again mostly personal perference.

jonpa added inline comments.Feb 19 2018, 8:13 AM
include/llvm/CodeGen/MachineScheduler.h
894–895 ↗(On Diff #134392)

I tried adding const to all the new methods, which seemed to increase readability at least to me.

tryLatency() is used both pre- and post-RA, so it needs to be in the base. I guess tryGreater() and other such small methods should also all go in the base class.

Personally, I think it makes sense to have these type of functions as part of the class, since after all that is a scheduling strategy class for which these methods actually are very central. But like you said as well, that's just me...

981 ↗(On Diff #134392)

I tried removing the 'static' from such a function in MachineScheduler.cpp, and then adding an 'extern' declaration in SystemZMachineScheduler.cpp, but that lead to a lot of linker errors. How would you do this?

fhahn added inline comments.Feb 19 2018, 8:55 AM
include/llvm/CodeGen/MachineScheduler.h
981 ↗(On Diff #134392)

ah sorry. I meant removing static and moving the declaration to the header file.

General thoughts:

  • Factoring code textually across components is a non-goal in itself. The resulting abstraction is often less clear and harder to maintain. It's always tempting to factor code that appears repetitious but isn't actually a serious maintainability problem. Code factoring should be driven by logical boundaries, not textual repetition.
  • Inheritance as a code reuse strategy usually results in less clarity and less maintainability. I prefer object composition or free standing functions and only use inheritance when polymorphism is required.

(I know that's not very helpful advice without examples, so just take it FWIW.)

MachineScheduler thoughts:

  • Creating small utility functions that can be easily combined into a useful scheduling strategy for targets is great.
  • Breaking up the top-level tryCandidate and moving the boilerplate and pesky details into smaller helpers looks good.

So, on these points, the patch is a positive contribution.

  • An important design principle is that when GenericScheduler's implementation changes it should *not* affect targets that have already been tuned and are overriding the scheduling strategy.

See the problem that inheritance creates? As a code reuse strategy, it violates the decoupling between Target and CodeGen.

In particular, arbitrarily gouping the heuristics into RegPressure vs. RegPressure2 and Latency vs. Latency2 is unhelpful. Each heuristic entry point that you expose to the Target should have clear semantics that aren't likely to change as GenericScheduler evolves. The contract for each entry point should be clear.

So, for example, tryCandidateClusteredWeak makes sense. It isn't going to be reimplemented as something else. You can't do this for "Latency" and "RegPressure" because that could mean a number of different things. I think the challenge here is to come up with a name for each heuristic that makes sense to expose outside of the GenericScheduler implementation.

You do not *need* to expose all heuristics used by the GenericScheduler directly to targets. It's not hard for targets to copy the few lines of code for a heuristic. Copying the code doesn't create a maintenance problem, it solves one. Just expose the heuristics that are clear and easy to describe.

jonpa added a comment.Feb 21 2018, 2:07 AM

Creating small utility functions that can be easily combined into a useful scheduling strategy for targets is great.

I followed Florians and your suggestion and simply removed the static keyword and put the declaration in the header file. I then realized I had to pass some member objects as arguments instead when needed. For instance, tryCandidate_RegPress() calls DAG->isTrackingPressure(), which is a ScheduleDAGMILive method. Since the SchedBoundary *Zone argument only has a ScheduleDAGMI *DAG member, it cannot be used directly. Also, tryCandidate_Latency() needs the Rem object passed.

Regardless of which utility functions we end up with, I suppose this is is acceptable and preferred still to inheritance?

An important design principle is that when GenericScheduler's implementation changes it should *not* affect targets that have already been tuned and are overriding the scheduling strategy.

Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have it's own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?

So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.

On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().

BTW, I am still open to suggestions on the SystemZ specific issue of answering the question "does this SU use MCProcResource X?" -- see top of page.

Just to express my thoughts: I see this point in the sense that if a target truly had a perfect set-in-stone tuning, it would be disastrous to change anything in an uncontrollable way. But given that mischeduler is relatively new and evolving, and a target may merely have been able to improve benchmarks with a minor modification, I think it's more natural to think that a target would really want to be in on the improvements to come in the future. In other words, *not* to decouple. Take register pressure, for instance. I don't think a target that does not have it's own register pressure heuristics would want to fall behind if the common code changes in the future. That change should then be general goodness, or it should be put in a target specific strategy, right?

So to me personally, working on just one backend, it would be slightly preferred to have e.g. the tryCandidate_RegPress() function in the target strategy, so that if somebody improved it, my target would immediately get that improvement. I don't want to decouple this, since I was merely adding a heuristic with lesser priority.

On the other hand, providing just the smaller utility functions and then doing a copy-and-paste of tryCandidate() should probably work quite well in practice, as you say. In the very long run, this would also give maximum flexibility for each target.
I suppose then we could just have one or two of those new methods, like tryCandidate_Clustered_Weak().

OK. I think it's very hard to group heuristics in a meaningful way. Some repetition of code on the target side will make it easier to maintain. Backing up a bit, my broader concern is that when Targets become too dependent on the incidental behavior of machine independent code, it really inhibits changes to the machine independent code.

BTW, I am still open to suggestions on the SystemZ specific issue of answering the question "does this SU use MCProcResource X?" -- see top of page.

I haven't worked with the code in years, but here's my take. In your subtarget you can define your own symbolic constant, like SystemZVectorUnitIdx = 6. You can assert that 0 == strcmp(getProcResource(SytemZVectorUnitIdx).Name, "Z14_VecUnit"). If, in the rare event that you add resources to this subtarget, the assert will force you to update this constant.

You can define a method on your Subtarget roughly like this:

auto *resource = getWriteProcResBegin(SC);
if (resource->ProcResourceIdx == SystemZVectorUnitIdx)

Hope that works.

-Andy

jonpa updated this revision to Diff 135902.Feb 26 2018, 7:33 AM

Thanks for review and advice, patch updated.

OK. I think it's very hard to group heuristics in a meaningful way. Some repetition of code on the target side will make it easier to maintain. Backing up a bit, my broader concern is that when Targets become too dependent on the incidental behavior of machine independent code, it really inhibits changes to the machine independent code.

I have updated the patch per your guidelines where the target copies tryCandidate() while reusing only small utility functions like tryLess() etc, which are now declared in MachineScheduler.h.

... In your subtarget you can define your own symbolic constant...

OK, I tried that, which is at least a slight improvement to looking it up every time.

I had to wrap the AMDGPU tryLess() and tryGreater() methods in a local namespace to avoid conflict.

Do you still want me to split this into a separate patch without the SystemZ part? (You don't have to approve that)

atrick accepted this revision.Feb 26 2018, 10:54 AM

This looks fine to me based on a quick review. I don't know if @fhahn or @MatzeB still want to weigh in. Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.

This revision is now accepted and ready to land.Feb 26 2018, 10:54 AM
fhahn accepted this revision.Feb 26 2018, 11:08 AM

Great, thanks Jonas! LGTM too.

lib/Target/AMDGPU/SIMachineScheduler.cpp
157 ↗(On Diff #135902)

nit: not sure, should nested namespaces be separated by newlines?

jonpa updated this revision to Diff 136042.Feb 27 2018, 12:24 AM

Thanks for review!

Not sure if anyone else needs to review the SystemZ specific code or if you effectively own that.

Uli is the reviewer for the SystemZ part, as usual.

NFC update to put nested namespaces on separate lines.

lib/Target/AMDGPU/SIMachineScheduler.cpp
157 ↗(On Diff #135902)

I copied this from the PowerPC backend, but looking at the "Namespace Indentation" section of the coding guidlines, it seems that at least in the example there newlines are used, so it seems you are right.

fhahn added a comment.Apr 11 2018, 7:42 AM

Hi Jonas,

are you planning on committing this patch in the near future? I suppose you are waiting until @uweigand had a look at the SystemZ specific changes?

Recently, a few people have been looking into tweaking scheduler heuristics, and after this change goes in we could easily add an experimental scheduler for AArch64 to experiment with tweaking heuristics.

Thanks again for putting up the patch!

Florian

Hi Jonas,

are you planning on committing this patch in the near future? I suppose you are waiting until @uweigand had a look at the SystemZ specific changes?

Recently, a few people have been looking into tweaking scheduler heuristics, and after this change goes in we could easily add an experimental scheduler for AArch64 to experiment with tweaking heuristics.

Thanks again for putting up the patch!

Florian

I committed the common code parts as r329884. We are waiting a bit with this for SystemZ, since we are having second thoughts that we probably should enable bi-directional scheduling before enabling tryLatency() in a specialized case (without bi-directional the second tryLatency() call is never made).

Thanks for review.

jonpa updated this revision to Diff 142126.Apr 12 2018, 1:27 AM
jonpa removed reviewers: atrick, fhahn, MatzeB.

Since the common code changes have now been committed, this is an update to only keep the SystemZ specific parts.

This revision now requires review to proceed.Apr 12 2018, 1:27 AM
nhaehnle removed a subscriber: nhaehnle.Apr 12 2018, 8:32 AM