This is an archive of the discontinued LLVM Phabricator instance.

[PPC] Allow two post RA schedulers to be in the pipeline and select one depending on the Machine Function's subtarget
AcceptedPublic

Authored by echristo on Oct 11 2016, 11:37 AM.

Details

Summary

For PWR9, PPC backend migrates to the new Post RA MI scheduler. We continue to use PostRA top down list scheduler for PWR8 and older processors. Given that each function can override module's target-cpu, we need a mechanism that allows us to use proper scheduler, depending on the function's target-cpu.

The solution here has the following components:

1- We add a new method MachineFunctionPass::skipFunction (const MachineFunction &) which internally passes the pass ID to Machine Function's subtarget and ask whether this pass should run for this subtarget or not. If Subtarget is OK, we then internally call FunctionPass::skipFunction.

2- A new virtual getPassID method is added to MachineFunctionPass. Each subclass, can override this to return proper ID. This is used in the skipFunction of (1).

3- Some refactoring in TargetPassConfig and PPCTargetMachine is done to allow PPC to add both PostRA schedulers to the pipeline and disable the post RA scheduler with command line options.

4- TargetSubtargetInfo now has a virtual runPass method that is called by the new skipFunction method of (1).

I have tested this by checking which pass is run when invoking llc and pass different value for mcpu option. To commit those tests, I would need to run test that rely on checking DEBUG output of these passes. I am not clear if committing this kind of testcase is OK/necessary or not. Comments are welcome. I have checked that -print-after= option works properly with these changes. I have not checked if the subtarget constructed for a function that overrides target-cpu contains all necessary information. But I believe that is a separate problem. We have to check that at some point and fix any bug that we find. (I suspect that the project that allowed functions to override target-cpus has done that check....)

Diff Detail

Event Timeline

amehsan updated this revision to Diff 74274.Oct 11 2016, 11:37 AM
amehsan retitled this revision from to [PPC] Allow two post RA schedulers to be in the pipeline and select one depending on the Machine Function's subtarget.
amehsan updated this object.
amehsan added reviewers: kbarton, echristo, hfinkel.
amehsan added a subscriber: llvm-commits.
MatzeB requested changes to this revision.Oct 11 2016, 11:43 AM
MatzeB added a reviewer: MatzeB.

This adds a bunch of infrastructure in a bad way! This should be handled with a Subtarget callback like TargetSubtargetInfo::enablePostRAScheduler() which already exists!

include/llvm/CodeGen/MachineFunctionPass.h
34 ↗(On Diff #74274)

What is this good for? It seems unrelevant to the patch and this is just a base class nothing that can/should be scheduled or end up in the pass registry.

This revision now requires changes to proceed.Oct 11 2016, 11:43 AM

I think I should be more specific in my critique and I noticed that I missed part of the problem while reading this the first time:

  • I'm against adding infarstructure/special cases at a basic level like MachineFunctionPass. It is already more complicated than it should be today.
  • The current way of replacing the PostRAScheduler with the MachinePostRAScheduler is messy indeed. How about always scheduling both passes and letting enablePostRAScheduler() return an enum value indicating whether we want the machine or the list scheduler to run?

I think I should be more specific in my critique and I noticed that I missed part of the problem while reading this the first time:

  • I'm against adding infarstructure/special cases at a basic level like MachineFunctionPass. It is already more complicated than it should be today.
  • The current way of replacing the PostRAScheduler with the MachinePostRAScheduler is messy indeed. How about always scheduling both passes and letting enablePostRAScheduler() return an enum value indicating whether we want the machine or the list scheduler to run?

Thanks. First of all part of the solution that you are proposing is what is being done in the patch. Both passes are being added to the pipeline. You are suggesting a somewhat different approach for each pass to decide whether it should run or not. I think using enablePostRAScheduler() pollutes subtarget classes. Details follow.

The problem is that, if I want to take this approach then:

1- Subtarget has a method devoted to turning on or off PostRA scheduling.
2- Subtarget has to be aware of different PostRA scheduling passes that we have (to be able to pass proper enum values).

The solution in this patch provides one general solution for subtargets to answer queries about whether a pass should run or not. It also relies on existing Pass IDs. If this problem pops up again for a different pass or a different platform, all we need to do is to change the platform's runPass method that I have introduced here (and add a couple of lines to it). Also the given pass will need to change the skipFunction method that it calls. (all that is needed for that, is to remove a few characters of the code, as you can see in this patch).

Also please note that the method enablePostRASchedule can be removed from the subtarget classes with the infrastructure introduced in this patch. I can open a PR so we can do that cleanup later.

In comparison, the solution proposed in this comment, requires having a method corresponding to the given pass in the subtarget AND also making subtarget aware of PassIDs or something equivalent (like enums that you are proposing).

Just to give you an example that this is not an isolated problem, I can mention modulo scheduling pass. If we want to turn that on for PPC most likely it will be only for PWR9 not earlier processors. So do we need to add a new method enableMachinePipelining to the subtarget? I think it will be cleaner to use the infrastructure introduced in this patch. Call proper overload of skipFunction and add a couple of lines to runPass method and check for the ID of that pass.

If we come to the conclusion that this is the right approach for solving this problem, the fact that MachineFunctionPass is already complicated is beside the point. If something should not be in there, it should be removed. If we need to refactor it to handle its complexity that is a different problem to tackle.

I think I should be more specific in my critique and I noticed that I missed part of the problem while reading this the first time:

  • I'm against adding infarstructure/special cases at a basic level like MachineFunctionPass. It is already more complicated than it should be today.
  • The current way of replacing the PostRAScheduler with the MachinePostRAScheduler is messy indeed. How about always scheduling both passes and letting enablePostRAScheduler() return an enum value indicating whether we want the machine or the list scheduler to run?

Thanks. First of all part of the solution that you are proposing is what is being done in the patch. Both passes are being added to the pipeline.

Well I was proposing to always schedule both passes in TargetPassConfig rather than requiring interested targets to overload a newly introduced addPostRASched() functions to keep things simpler.

You are suggesting a somewhat different approach for each pass to decide whether it should run or not. I think using enablePostRAScheduler() pollutes subtarget classes. Details follow.

The problem is that, if I want to take this approach then:

1- Subtarget has a method devoted to turning on or off PostRA scheduling.
2- Subtarget has to be aware of different PostRA scheduling passes that we have (to be able to pass proper enum values).

I think the fact that you propose this very patch shows that the choice of post-ra scheduling algorithm is subtarget specific hence subtargets have to potentially be aware of it. In any way implementers can choose not to override that function and go with the default (which enables the machine scheduler or nothing depending on the choice made in the tablegen/MCSchedule).

The solution in this patch provides one general solution for subtargets to answer queries about whether a pass should run or not. It also relies on existing Pass IDs. If this problem pops up again for a different pass or a different platform, all we need to do is to change the platform's runPass method that I have introduced here (and add a couple of lines to it). Also the given pass will need to change the skipFunction method that it calls. (all that is needed for that, is to remove a few characters of the code, as you can see in this patch).

I am being wary about the "general solution" part. The pattern adds an extra non-obvious possibility why a pass wouldn't run, which has its costs when reading and debugging any pass. There is exactly one runPass() function which has to handle a combination of subtarget + passid to make a decision, this is very likely to get unwieldy as more passes and more subtarget exceptions get added in the future. Having 1 callback per pass (like enablePostRAScheduling) at least keeps the passid dimension out of it and makes it easier to search and understand the behaviour IMO.

Also please note that the method enablePostRASchedule can be removed from the subtarget classes with the infrastructure introduced in this patch. I can open a PR so we can do that cleanup later.

In comparison, the solution proposed in this comment, requires having a method corresponding to the given pass in the subtarget AND also making subtarget aware of PassIDs or something equivalent (like enums that you are proposing).

Just to give you an example that this is not an isolated problem, I can mention modulo scheduling pass. If we want to turn that on for PPC most likely it will be only for PWR9 not earlier processors. So do we need to add a new method enableMachinePipelining to the subtarget? I think it will be cleaner to use the infrastructure introduced in this patch. Call proper overload of skipFunction and add a couple of lines to runPass method and check for the ID of that pass.

Given the current choices I rather see enableMachinePipelining() added than a generic skipPass() callback.

If we come to the conclusion that this is the right approach for solving this problem, the fact that MachineFunctionPass is already complicated is beside the point. If something should not be in there, it should be removed. If we need to refactor it to handle its complexity that is a different problem to tackle.

Well I believe there are reasonable specific solutions here so we do not need to increase the general complexity.

The pattern adds an extra non-obvious possibility why a pass wouldn't run, which has its costs when reading and debugging any pass.

Not sure I follow here. Note that we have two overload of skipFunction(). Any pass that does not want to call the subtarget, can call the current overload. Any pass that calls the new overload, will need to call subtarget one way or another. So the extra check is there in both cases.

There is exactly one runPass() function which has to handle a combination of subtarget + passid to make a decision, this is very likely to get unwieldy as more passes and more subtarget exceptions get added in the future.

In most cases this function looks at (cpu, pass) pair and return true/false based on that. Any subtarget that starts to have a complicated runPass, can engineer their code properly. For example by adding private functions for each pass, and calling the private function based on pass id. (Note that the difference with having enablePostRA... function is that the pass dependent extra function is no longer part of the subtarget interface).

Having 1 callback per pass (like enablePostRAScheduling) at least keeps the passid dimension out of it and makes it easier to search and understand the behaviour IMO.

I see two problems with this. The minor problem is this: for the example that we have discussed so far, we need one enablePostRA function that returns an enum and an enableMachinePipeliner that returns a bool. This is a minor advantage for the current patch in terms of code quality. (In addition to a less polluted interface for subtarget). The more important problem is related to the following:

Well I was proposing to always schedule both passes in TargetPassConfig rather than requiring interested targets to overload a newly introduced addPostRASched() functions to keep things simpler.

This is not a good idea. There are already backends that have stopped using Post RA top down scheduler and use the new MI scheduler. Also the general idea, AFAIK, is that the top down scheduler is the old mechanism and MI scheduler is the new mechanism. With what you are proposing, we are saying that people who have moved/will move to the new scheduler (or potentially any new subtarget that starts using new mechanism from the beginning), should continue calling the old scheduler until every backend has moved off of it so we can entirely remove it. I don't think this is a good idea. This reminds me of the first point of your comment that I quoted here: "The pattern adds an extra non-obvious possibility why a pass wouldn't run, which has its costs when reading and debugging any pass.". Not exactly the same issue, but similar.

The pattern adds an extra non-obvious possibility why a pass wouldn't run, which has its costs when reading and debugging any pass.

Not sure I follow here. Note that we have two overload of skipFunction(). Any pass that does not want to call the subtarget, can call the current overload. Any pass that calls the new overload, will need to call subtarget one way or another. So the extra check is there in both cases.

There is exactly one runPass() function which has to handle a combination of subtarget + passid to make a decision, this is very likely to get unwieldy as more passes and more subtarget exceptions get added in the future.

In most cases this function looks at (cpu, pass) pair and return true/false based on that. Any subtarget that starts to have a complicated runPass, can engineer their code properly. For example by adding private functions for each pass, and calling the private function based on pass id. (Note that the difference with having enablePostRA... function is that the pass dependent extra function is no longer part of the subtarget interface).

Having 1 callback per pass (like enablePostRAScheduling) at least keeps the passid dimension out of it and makes it easier to search and understand the behaviour IMO.

I see two problems with this. The minor problem is this: for the example that we have discussed so far, we need one enablePostRA function that returns an enum and an enableMachinePipeliner that returns a bool. This is a minor advantage for the current patch in terms of code quality. (In addition to a less polluted interface for subtarget). The more important problem is related to the following:

I don't see the code becoming cleaner here. Instead of a targeted patch that just affects the behavior of PostMachineScheduler and PostRASchedulerList you add a generic mechanism that affects every pass calling skipFunction(). I don't see the majority of passes using this functionality.
So what you see as helpful/useful utility for other passes, mainly strikes me as a (small but not neglibile) increase in accidental complexity.

Well I was proposing to always schedule both passes in TargetPassConfig rather than requiring interested targets to overload a newly introduced addPostRASched() functions to keep things simpler.

This is not a good idea. There are already backends that have stopped using Post RA top down scheduler and use the new MI scheduler. Also the general idea, AFAIK, is that the top down scheduler is the old mechanism and MI scheduler is the new mechanism. With what you are proposing, we are saying that people who have moved/will move to the new scheduler (or potentially any new subtarget that starts using new mechanism from the beginning), should continue calling the old scheduler until every backend has moved off of it so we can entirely remove it. I don't think this is a good idea. This reminds me of the first point of your comment that I quoted here: "The pattern adds an extra non-obvious possibility why a pass wouldn't run, which has its costs when reading and debugging any pass.". Not exactly the same issue, but similar.

I don't see the point. This is one more pass scheduled that most likely immediately boils out at the beginning of runOnMachineFunction(), I don't see how that is worse than adding one more if() to every MachinePass. Sure I would love to not have to schedule two passes, so go ahead and upgrade your subtargets PPC! Until then I am fine with the simple solution.

amehsan added a comment.EditedOct 14 2016, 8:24 PM

you add a generic mechanism that affects every pass calling skipFunction().

This patch has zero impact on any pass other than the two PostRA schedulers.

I don't see how that is worse than adding one more if() to every MachinePass.

Why do you think this patch adds one more if() to every MachinePass? The impact on any pass other than the two PostRA schedulers, is literally zero.

you add a generic mechanism that affects every pass calling skipFunction().

This patch has zero impact on any pass other than the two PostRA schedulers.

I don't see how that is worse than adding one more if() to every MachinePass.

Why do you think this patch adds one more if() to every MachinePass? The impact on any pass other than the two PostRA schedulers, is literally zero.

Oh you overload skipFunction, it really looks like an override given that there is already a skipFunction in the base class, that is confusing. And I still don't think the patch should add things to MachineFunctionPass.

include/llvm/CodeGen/MachineFunctionPass.h
74–75 ↗(On Diff #74274)

This is highly confusing to have different behavior based on which overload you choose, there is not even a comment here describing the difference.

include/llvm/Target/TargetSubtargetInfo.h
225–228 ↗(On Diff #74274)

This does not mention that this will only work if the MachineFunction variant of skipFunction is used.

lib/Target/PowerPC/PPCSubtarget.cpp
263 ↗(On Diff #74274)

This cannot be reached and can be removed.

lib/Target/PowerPC/PPCTargetMachine.cpp
438–439

indentation

443–449

I don't see what is PPC specific about this, maybe put it into TargetPassConfig? It may not be necessary though as there is already -enable-post-misched=false.

444

indentation, please check clang-format.

MatzeB added inline comments.Oct 17 2016, 11:39 AM
include/llvm/CodeGen/MachineFunctionPass.h
34 ↗(On Diff #74274)

We should not need to define an ID for the base class as we never put a bare MachineFunctionPass into the PassManager pipeline.

70–72 ↗(On Diff #74274)

There is already AnalysisID getPassID() const in the base class which does that (just returns a AnalysisID pointer instead of a reference).

you add a generic mechanism that affects every pass calling skipFunction().

This patch has zero impact on any pass other than the two PostRA schedulers.

I don't see how that is worse than adding one more if() to every MachinePass.

Why do you think this patch adds one more if() to every MachinePass? The impact on any pass other than the two PostRA schedulers, is literally zero.

Oh you overload skipFunction, it really looks like an override given that there is already a skipFunction in the base class, that is confusing. And I still don't think the patch should add things to MachineFunctionPass.

:) OK. I had mentioned a couple of times in my earlier comments that this is an overload, but I guess long comments are mostly glanced over. So I will remove the getPassID function and the static ID field that I added and instead use existing getPassID. (I did not realize that this exists. Thanks for pointing it out :). I will also add comments before declaration of each skipFunction overload (and other places that you mentioned) to clarify that there exists another overload.

include/llvm/CodeGen/MachineFunctionPass.h
74–75 ↗(On Diff #74274)

I will add comment. If I understand correctly the using clause would not be needed if this was an override. That should indicate that this is an overload. But a comment will definitely help.

lib/Target/PowerPC/PPCSubtarget.cpp
263 ↗(On Diff #74274)

OK

lib/Target/PowerPC/PPCTargetMachine.cpp
438–439

OK

443–449

I need to make sure that if someone compile with -mcpu=pwr9 and there are functions with target-cpu=pwr8, still with -disable-post-ra option passed to the command line all post ra schedulers will be turned off.

444

OK

I just happened to run accross this patch: https://reviews.llvm.org/D8717

So there is precedent on how to achieve this, we should probably use this style for consistency.

I just happened to run accross this patch: https://reviews.llvm.org/D8717

So there is precedent on how to achieve this, we should probably use this style for consistency.

Thanks. Will look into it.

I just happened to run accross this patch: https://reviews.llvm.org/D8717

So there is precedent on how to achieve this, we should probably use this style for consistency.

Thanks. Will look into it.

My main concern with the approach of D8717 is that now I will need to change functions like createGenericSchedPostRA() and either create Target Specific overrides for that or add some target specific code to it (get PPCSubtarget, if not null then...). I think both of these are ugly and more invasive.

I just happened to run accross this patch: https://reviews.llvm.org/D8717

So there is precedent on how to achieve this, we should probably use this style for consistency.

Thanks. Will look into it.

My main concern with the approach of D8717 is that now I will need to change functions like createGenericSchedPostRA() and either create Target Specific overrides for that or add some target specific code to it (get PPCSubtarget, if not null then...). I think both of these are ugly and more invasive.

As for consistency, I think the approach here can replace both D8717 and enablePostRAScheduler() approaches. We need a PR to remove those in favor of the simpler and more general approach.

amehsan updated this revision to Diff 75728.EditedOct 25 2016, 10:35 AM
amehsan edited edge metadata.

Even though we may continue discussing the approach, I update the patch to

1- Use existing getPassID instead of introducing a new one.
2- Adding more comments to the code.

Will this patch become obsolete when darwin support is removed from the PowerPC target (see http://lists.llvm.org/pipermail/llvm-dev/2016-October/106359.html)? Even if we would really decide to keep PPC support, I am pretty sure that in the interest of easy maintainability we should not have an alternate scheduling strategy just for darwin...

amehsan added a comment.EditedOct 25 2016, 12:55 PM

Will this patch become obsolete when darwin support is removed from the PowerPC target (see http://lists.llvm.org/pipermail/llvm-dev/2016-October/106359.html)? Even if we would really decide to keep PPC support, I am pretty sure that in the interest of easy maintainability we should not have an alternate scheduling strategy just for darwin...

No, this patch has nothing to do with Darwin support. I know that I am checking DarwinDirective somewhere. That is because I saw that pattern used to check which cpu we are running on elsewhere in the code. I can change that line of code (if that directive becomes obsolete after we drop Darwin support), but otherwise this patch is mostly about PWR8 and PWR9 and has nothing to do with darwin support.

Will this patch become obsolete when darwin support is removed from the PowerPC target (see http://lists.llvm.org/pipermail/llvm-dev/2016-October/106359.html)? Even if we would really decide to keep PPC support, I am pretty sure that in the interest of easy maintainability we should not have an alternate scheduling strategy just for darwin...

No, this patch has nothing to do with Darwin support. I know that I am checking DarwinDirective somewhere. That is because I saw that pattern used to check which cpu we are running on elsewhere in the code. I can change that line of code, but otherwise this patch is mostly about PWR8 and PWR9 and has nothing to do with darwin support.

Ok, checking DarwinDirective to differentiate between PWR8 and PWR9 is confusing for the uninitiated :)

Will this patch become obsolete when darwin support is removed from the PowerPC target (see http://lists.llvm.org/pipermail/llvm-dev/2016-October/106359.html)? Even if we would really decide to keep PPC support, I am pretty sure that in the interest of easy maintainability we should not have an alternate scheduling strategy just for darwin...

No, this patch has nothing to do with Darwin support. I know that I am checking DarwinDirective somewhere. That is because I saw that pattern used to check which cpu we are running on elsewhere in the code. I can change that line of code, but otherwise this patch is mostly about PWR8 and PWR9 and has nothing to do with darwin support.

Ok, checking DarwinDirective to differentiate between PWR8 and PWR9 is confusing for the uninitiated :)

You are right :) Unfortunately that seems to be fairly common in the code, and the alternative that I found was checking CPU string. I thought this is still better than checking a string. Definitely we need to improve that.

hfinkel edited edge metadata.Oct 25 2016, 6:34 PM

Will this patch become obsolete when darwin support is removed from the PowerPC target (see http://lists.llvm.org/pipermail/llvm-dev/2016-October/106359.html)? Even if we would really decide to keep PPC support, I am pretty sure that in the interest of easy maintainability we should not have an alternate scheduling strategy just for darwin...

No, this patch has nothing to do with Darwin support. I know that I am checking DarwinDirective somewhere. That is because I saw that pattern used to check which cpu we are running on elsewhere in the code. I can change that line of code, but otherwise this patch is mostly about PWR8 and PWR9 and has nothing to do with darwin support.

Ok, checking DarwinDirective to differentiate between PWR8 and PWR9 is confusing for the uninitiated :)

You are right :) Unfortunately that seems to be fairly common in the code, and the alternative that I found was checking CPU string. I thought this is still better than checking a string. Definitely we need to improve that.

Yea, we've never renamed the variable. We should.

Yea, we've never renamed the variable. We should.

Opened a PR for this

kbarton edited edge metadata.Oct 27 2016, 12:40 PM

I'm OK with this, as long as all of @MatzeB issues are addressed.
It would be good to get comments from either @hfinkel or @echristo as well.

@MatzeB
Do you have further comments on this?

echristo added inline comments.Oct 27 2016, 4:40 PM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

I'm not sure why you need all of this and can't write the skipFunction without needing a MachineFunction in specific. Can you explain for me?

amehsan added inline comments.Oct 27 2016, 6:07 PM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

Because I need MachineFunction's subtarget. Does that answer your question?

echristo added inline comments.Oct 28 2016, 12:17 AM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

Not really, but I'd missed any resolution (which didn't appear to happen) for the basic idea of passing in a function to the pass.

Can this be done similar to IfConversion instead? I'd really prefer not to have this overloading of the idea of skipFunction.

amehsan added inline comments.Oct 28 2016, 8:48 AM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

I don't really understand what you mean by "the basic idea of passing in a function to the pass". Could you clarify?

By IfConversion I believe you are referring to D8717. Could you explain why you prefer that to overloading skipFunction?

echristo added inline comments.Oct 28 2016, 1:32 PM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

So D8717 allows targets to configure passes by passing in a callback at configuration time. skipFunction is a pretty terrible hack that originally existed as a mechanism to deal with optnone and has an ok side effect of being able to use it for bisection. It's not a use case that should be promulgated further, especially given that the configuration via call back is both easier to understand and just as configurable (and more localized).

amehsan added inline comments.Oct 29 2016, 10:23 AM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

Let me try to explain why "skipFunction is a pretty terrible hack"

Ideally, when we have a noopt function, no optimization pass should be invoked. Our infrastructure does not allow us to do that, so we use this hack that each optimization first checks the function if it is noopt or not.

Is this correct? If yes, are there other reason as well or not. If no, why skipFunction is a hack?

amehsan added inline comments.Oct 29 2016, 11:00 AM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

I wanted to edit my comment if possible, and clicked on "hide comment". Let me repeat it.

Let me try to explain why "skipFunction is a pretty terrible hack"

Ideally, when we have a noopt function, no optimization pass should be invoked. For some reason we cannot do that, so we use this hack that each optimization first checks the function if it is noopt or not.

Is this correct? If yes, are there other reason as well? If no, why skipFunction is a hack?

echristo added inline comments.Oct 29 2016, 11:06 AM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

It's a bit more complicated than that, it also involves how the code generator is constructed, the problems with the DAG combiner, various code generators not handling code that the DAG combiner hasn't run on, etc. You should take a look at the threads around the time the code came in about it. I believe Paul Robinson and I were part of the discussions here.

There's also my dislike of hardcoding pass IDs in the backend - I think that's poor maintenance and more confusing than a simple function passed in at pass construction time. Do you disagree? If so, can you cite an example of why?

Let me rephrase your question back at you: Is there a good reason for not doing it the way I'm proposing and have used across all of the backends to enable/disable passes based on subtargets?

amehsan added inline comments.Oct 30 2016, 7:09 PM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

Before I answer your question, let me say that I can completely decouple this patch from skipFunction by a few small changes.That should address your concern of promulgating skipFunction.

With D8717, there is extra boilerplate code for creating, passing and checking the subtarget callback to the pass. This boilerplate code should be repeated for each pass that is subtarget dependent. We need all of this, despite the fact that a subtarget object is already available through MachineFunction object. My proposed patch, minimizes the boilerplate code and uses the available subtarget object. It is simpler and it is more general.

Additionally, from a quick look, I am not sure that implementing D8717 approach for PostRA MIScheduler is as straightforward as IfConversion, given creation of scheduler seems to be more complicated. Again, this is my impression from a quick look at the code.

I have found an email thread from early 2014 which should be what you mentioned. I have not yet gone through it to see if I find a reason there against this idea or not. If there is a reason, that explains why the extra boilerplate of 8717 is required, I can happily look into what it takes to apply that idea to PostRA scheduler.

echristo added inline comments.Oct 30 2016, 7:37 PM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

I feel there's very little boilerplate added for a single pass. I would definitely prefer that mechanism. While the MI Scheduler code is more complicated than IfConversion I'd prefer you work on integrating this style of callback and simplifying the code that's currently there (i.e. removing the current way the command line option works if necessary) than the current direction you're going.

amehsan added inline comments.Oct 30 2016, 7:56 PM
lib/CodeGen/MachineFunctionPass.cpp
94–98 ↗(On Diff #75728)

OK. I will update the patch.

amehsan updated this revision to Diff 76497.Oct 31 2016, 4:00 PM
amehsan edited edge metadata.
amehsan added inline comments.Oct 31 2016, 4:03 PM
lib/Target/PowerPC/PPCTargetMachine.cpp
447

I need to make sure that the lower bound here (PPC::DIR_32) is correct.

This looks pretty good. I just have some small nitpicks left:

include/llvm/CodeGen/Passes.h
120–121

I recently changed the callback for IfConversion/ExpandBundle to take a MachineFunction& instead of a Function& which should also make life slightly easier in this case.

181–182

Same as above, use MachineFunction&

include/llvm/CodeGen/TargetPassConfig.h
256–257

That fact that targets can override these functions is true for the whole file. I'd just state what the function is supposed to be doing. Something like
Add scheduler passes to be executed after register allocation.

258

Rename this to addPostRASched, a target could just as well add multiple passes or none at all.

lib/CodeGen/MachineScheduler.cpp
368–369

There is an interesting difference to IfConversion here which first checks skipFunction and then performs this test. Thinking about it I think this variant is preferable in the pass bisection use case. So we can keep this and should change IfConversion in a subsequent patch.

lib/CodeGen/TargetPassConfig.cpp
275–278

This comment should go to TargetPassConfig.h.

286–297

This comment should go to TargetPassConfig.h.

539–541

This comment seems redundant with the doxygen comment for the whole addPostRASchedPass() so we can remove it.

lib/Target/PowerPC/PPCTargetMachine.cpp
440

This can be simplified to MF.getSubtarget<PPCSubtarget>().getDarwinDirective() when the argument is changed to MachineFunction& (see above). Similar in the next callback.

MatzeB added inline comments.Oct 31 2016, 4:17 PM
lib/CodeGen/TargetPassConfig.cpp
543–544

It looks like we supersede ™::targetSchedulesPostRAScheduling()"? It should be trivially for the targets reporting true here to instead just overload this method with an empty one. (We can do this in a follow up patch though).

amehsan added inline comments.Nov 1 2016, 7:42 AM
lib/CodeGen/TargetPassConfig.cpp
543–544

My understanding is that if a target returns true, it means that they probably want to schedule PostRAScheduler pass somewhere else in the pipeline. I think this condition should guard the call to addPostRASchedPass(), even though the functionality will be the same.

amehsan added inline comments.Nov 1 2016, 9:24 AM
lib/CodeGen/TargetPassConfig.cpp
543–544

I also leave this comment here as it talks about the functionality of targetSchedulesPostRAScheduling. I think we need a PR to clean up all the different ways and method of scheduling post ra sched and remove redundants and those that don't have a use case. There is the virtual function and this targetSchedulesPostRAScheduling() and enablePostRA.... and various methods of choosing between old scheduler and new scheduer (command line, substitutePass)

amehsan updated this revision to Diff 76589.Nov 1 2016, 10:15 AM
MatzeB accepted this revision.Nov 1 2016, 10:23 AM
MatzeB edited edge metadata.

LGTM, thanks for working on this.
I couldn't stop myself marking a few more nitpicks, but there is no further review necessary.

lib/CodeGen/MachineScheduler.cpp
140

We usually don't have an empty line at the beginning of a class.

367

We usually do no have a newline at the beginning of a function. Same for PostRAScheduler::runOnMachineFunction.

lib/CodeGen/PostRASchedulerList.cpp
717

No newline at end of file.

lib/CodeGen/TargetPassConfig.cpp
543–544

Yes, this was meant as a side remark and as I said we can deal with it another time, not as part of this patch.

lib/Target/PowerPC/PPCTargetMachine.cpp
305

We usually do not have a newline before the end of a class.

This revision is now accepted and ready to land.Nov 1 2016, 10:23 AM
amehsan updated this revision to Diff 76605.Nov 1 2016, 11:18 AM
amehsan edited edge metadata.

Eric, asked for a little more clean up.

I am aware of the following mechanisms for scheduling post ra schedulers.

1- The new virtual function in this patch
2- targetSchedulesPostRAScheduling
3- enablePostRA method in the subtarget
4- PostRAScheduler flag in include/llvm/Target/TargetSchedule.td
5- command line option to turn on new scheduler instead of old one.
6- command line option to turn off post ra schedulers
7- SubstitutePass to use new scheduler instead of old one.

Each of these seems to have a slightly different usecase. If there is a likelihood that we can remove some of these, it might be worthwhile to open a PR. otherwise we can leave it as is.

Matt: Sorry I just realized I could have applied your comment to this update. I will make sure to check those before committing though.

echristo added inline comments.Nov 1 2016, 1:16 PM
lib/CodeGen/TargetPassConfig.cpp
543–544

When were you planning on tackling this? I think unraveling this as part of a prepartory patch and splitting out the necessary elements would be general project goodness.

MatzeB added a comment.Nov 1 2016, 2:17 PM

Indeed there are number of mechanisms in place nowadays, so for what it's worth:

Eric, asked for a little more clean up.

I am aware of the following mechanisms for scheduling post ra schedulers.

1- The new virtual function in this patch

I consider overriding methods in TargetPassConfig to be the most canonical way for targets to influence pass scheduling. So this should stay.

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

3- enablePostRA method in the subtarget
4- PostRAScheduler flag in include/llvm/Target/TargetSchedule.td

This redundancy is unfortunate, IMO we should just have the one in TargetSchedModel/TargetSchedule.td and not an extra callback in TargetSubtargetInfo. Last time I tried fixing this I ran into the problem that aarch64 reuses a scheduling model for two targets with different postrasched settings. One way to attack this would be to figure out if we may not better share the same postra setting or whether we should duplicate the scheduling model with different postra sched settings. The other thing which I would like to see longer term is having a way to subclass TargetSchedModel so targets can specialize a few more scheduling related functions (there is some functions like shouldScheduleAdjacent() or isAsCheapAsMove() which I think would make more sense as part of a per-subtarget scheduling model). I'd really like to see some cleanups here but don't think we should block this patch (at least not on this specific point).

5- command line option to turn on new scheduler instead of old one.
6- command line option to turn off post ra schedulers

I don't think we need to spend too much time thinking about our debugging commandline options. And those two are fine IMO.

7- SubstitutePass to use new scheduler instead of old one.

I never liked this mechanism. I believe it should be possible to (re-)model TargetPassConfig in a way that there is a method that can be overridden to replace a pass. (I also believe that we could simplify target passmanager configuration by doing the nonintuitive move of actually duplicating some of the setup code across our targets so they have an easier time creating variants/deviating from the default, but I can see this becoming a controversial debate I'd like to not start in this review ;-) )

Each of these seems to have a slightly different usecase. If there is a likelihood that we can remove some of these, it might be worthwhile to open a PR. otherwise we can leave it as is.

A PR or even patches would be apreciated.

Matt: Sorry I just realized I could have applied your comment to this update. I will make sure to check those before committing though.

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

amehsan added a comment.EditedNov 1 2016, 6:24 PM

I opened a PR for this and copied the last couple of comments there. CC'ed Eric and Matt on the PR.

echristo added inline comments.Nov 1 2016, 6:25 PM
lib/CodeGen/TargetPassConfig.cpp
543–544

To elaborate more on this, I'm not sure we want to have the technical debt of even more ways to deal with this than we already have. I'm pretty unhappy with the existing aspects :)

MatzeB added a comment.Nov 1 2016, 6:28 PM

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

Overriding addPostRASched() with an empty function has the same effect as returning true in targetSchedulesPostRAScheduling(), hasn't it?

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

Overriding addPostRASched() with an empty function has the same effect as returning true in targetSchedulesPostRAScheduling(), hasn't it?

Correct :)

MatzeB added a comment.Nov 1 2016, 6:39 PM

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

Overriding addPostRASched() with an empty function has the same effect as returning true in targetSchedulesPostRAScheduling(), hasn't it?

Correct :)

So we can remove targetSchedulesPostRAScheduling() and replace it with the empty function overload after adding addPostRASched().

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

Overriding addPostRASched() with an empty function has the same effect as returning true in targetSchedulesPostRAScheduling(), hasn't it?

Correct :)

So we can remove targetSchedulesPostRAScheduling() and replace it with the empty function overload after adding addPostRASched().

That's a reasonable change for this patch. I'll wait to see Eric's comments well, before updating the patch though. Just to save time.

amehsan requested a review of this revision.Nov 3 2016, 8:41 AM
amehsan edited edge metadata.

I put this back in review as Eric wanted to take a look.

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

Overriding addPostRASched() with an empty function has the same effect as returning true in targetSchedulesPostRAScheduling(), hasn't it?

Correct :)

So we can remove targetSchedulesPostRAScheduling() and replace it with the empty function overload after adding addPostRASched().

That's a reasonable change for this patch. I'll wait to see Eric's comments well, before updating the patch though. Just to save time.

Actually let's make this change in a follow up patch. That change will require other reviewers (from SystemZ). I will make it high priority after I commit this one, but let's finish this patch so I have P9 scheduling work checked in.

2- targetSchedulesPostRAScheduling

This seems redundant to me as soon as 1) is implemented.

Maybe I am missing something, but this seems different from (1). Note that addPostRASched is called from addMachinePasses which is not typically overriden by targets. SystemZ has a usecase for targetSchedulesPostRAScheduling which cannot be covered by (1).

Overriding addPostRASched() with an empty function has the same effect as returning true in targetSchedulesPostRAScheduling(), hasn't it?

Correct :)

So we can remove targetSchedulesPostRAScheduling() and replace it with the empty function overload after adding addPostRASched().

That's a reasonable change for this patch. I'll wait to see Eric's comments well, before updating the patch though. Just to save time.

Actually let's make this change in a follow up patch. That change will require other reviewers (from SystemZ). I will make it high priority after I commit this one, but let's finish this patch so I have P9 scheduling work checked in.

I believe this case is obvious enough that we could go without a review from a SystemZ person (of course if you have some SystemZ people in mind add them as subscribers/reviewers).

amehsan updated this revision to Diff 79767.Nov 30 2016, 9:33 AM
amehsan edited edge metadata.

Added the clean up step that we discussed. Also, made some changes in the comments in the code.

echristo edited edge metadata.Dec 9 2016, 6:44 PM

One inline request for a comment, otherwise LGTM. Thanks for working on this and your patience!

-eric

lib/Target/PowerPC/PPCTargetMachine.cpp
435

Needs a comment as to what's going on here.

amehsan accepted this revision.Dec 19 2016, 11:54 AM
amehsan added a reviewer: amehsan.

@echristo explicitly approved this in his last comment, but I think he forgot to change the action box. So I accept just for easier book-keeping. (Eric is in vacation, AFAIK)

This revision is now accepted and ready to land.Dec 19 2016, 11:54 AM
echristo accepted this revision.Jan 4 2017, 4:35 PM
echristo edited edge metadata.

Relatedly I think we're ok with this for now.

echristo commandeered this revision.Mar 30 2017, 7:26 PM
echristo removed a reviewer: echristo.

Ehsan has left the building so I'm going to take this over and get it committed.