Page MenuHomePhabricator

[LLVM] [SCCP] [WIP] : Add Function Specialization pass
Needs ReviewPublic

Authored by mivnay on Dec 27 2020, 7:25 AM.

Details

Summary

This patch adds Function specialization pass to LLVM. The constant parameters (like function pointers, numerical constants) are propagated to the callee by specializing the function. It also handles specialization of the recursive functions. This change is just to give the full context for patch D93762. The pass is still missing the profitability.

The pass is based on the Function specialization proposed here.

Performance: There are significant gains in two SPEC CPU 2017 intrate benchmarks. I am yet to do the complete SPEC run and measure the speed /size change.

Diff Detail

Event Timeline

mivnay created this revision.Dec 27 2020, 7:25 AM
mivnay requested review of this revision.Dec 27 2020, 7:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2020, 7:25 AM
xbolva00 added a subscriber: xbolva00.EditedDec 27 2020, 9:32 AM

Great! So something like gcc function spliting/cloning? (It would be good to support noipa, noclone attribute to disable this optimalization)

It also handles specialization of the recursive functions

So you should see gains for exchange2, right? For this case, this decision should be driven by PGO data in the future.

dongAxis1944 added inline comments.Jan 18 2021, 7:32 PM
lib/Transforms/IPO/FunctionSpecialization.cpp
209

i think cloneCandidateFunction might become ‘compile time killer’, think the following case:

f(int arg1, int arg2) {...}
g(1, 2);

this patch might clone 'f' twice. But i think it might be avoid.

hi @mivnay, could you please rebase this and D93762 ? So we can test it.
It say that this patch can improve mcf 8%, we are interested at this because we also found that mcf has performance regression caused by ILP data dependency.

kuter added a subscriber: kuter.Jan 24 2021, 7:49 AM

Hi @mivnay,

As the approach here is pretty much identical to D36432, do you have any plans on the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.

To help with review, could you make sure this patch is based on top of D93762 so that those changes don't appear here again?

Thanks!

lib/Transforms/IPO/FunctionSpecialization.cpp
209

It looks like the outer loop would terminate on the first argument that can be used for specialization. So for your example, assuming there is only one call to f we would first specialize into

f.arg1(int arg2) { arg1 = 1; ... }
f.arg1(2);

and on the next call to specializeFunctions into

f.arg1.arg2() { arg1 = 1; arg2 = 2; ... }
f.arg1.arg2();

In general, yes, this pass has the potential to impact compile time significantly, especially if there are many possible (known constant) values for some arguments. That is why a good cost model (isArgumentInteresting) is important.

dongAxis1944 added inline comments.Wed, Jan 27, 6:50 AM
lib/Transforms/IPO/FunctionSpecialization.cpp
209

yes, you right. do you have any plan to construct a call graph(not likely scc in llvm) to calculate all the possible value of arguments? Then clone and simplify the function, i think it might faster than this.

mivnay added a comment.EditedWed, Jan 27, 10:19 PM

Hi @mivnay,

As the approach here is pretty much identical to D36432, do you have any plans on the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.

To help with review, could you make sure this patch is based on top of D93762 so that those changes don't appear here again?

Thanks!

Hi,

Thanks for looking into the patch. I have uploaded this patch just as a reference for the review D93762. The pass is not complete and might have issues. As you mentioned, the code is identical to D36432 but we are trying to add a new pass (which can be invoked optionally) instead of tight integration with the SCCP functions. We are trying to utilize the SCCPSolver by moving it to the header file. Can you please look at D393762?

fhahn added a comment.Thu, Feb 11, 8:03 AM

Hi @mivnay,

As the approach here is pretty much identical to D36432, do you have any plans on the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.

To help with review, could you make sure this patch is based on top of D93762 so that those changes don't appear here again?

Thanks!

Hi,

Thanks for looking into the patch. I have uploaded this patch just as a reference for the review D93762. The pass is not complete and might have issues. As you mentioned, the code is identical to D36432 but we are trying to add a new pass (which can be invoked optionally) instead of tight integration with the SCCP functions. We are trying to utilize the SCCPSolver by moving it to the header file. Can you please look at D393762?

IICU the review for D393762 raised some issues with the approach and also has comments with a few alternatives that may be viable to explore instead of function specialization? I am a bit reluctant to add a pass without a clear path towards enabling without causing regressions in general, especially when there are potential alternatives that may not suffer from the same problem.

mivnay added a comment.EditedSat, Feb 20, 4:07 AM

hi @mivnay, could you please rebase this and D93762 ? So we can test it.
It say that this patch can improve mcf 8%, we are interested at this because we also found that mcf has performance regression caused by ILP data dependency.

Hi,

Hi @mivnay,

As the approach here is pretty much identical to D36432, do you have any plans on the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.

To help with review, could you make sure this patch is based on top of D93762 so that those changes don't appear here again?

Thanks!

Hi,

Thanks for looking into the patch. I have uploaded this patch just as a reference for the review D93762. The pass is not complete and might have issues. As you mentioned, the code is identical to D36432 but we are trying to add a new pass (which can be invoked optionally) instead of tight integration with the SCCP functions. We are trying to utilize the SCCPSolver by moving it to the header file. Can you please look at D393762?

IICU the review for D393762 raised some issues with the approach and also has comments with a few alternatives that may be viable to explore instead of function specialization? I am a bit reluctant to add a pass without a clear path towards enabling without causing regressions in general, especially when there are potential alternatives that may not suffer from the same problem.

@fhahn I am not sure if there was any conclusion with respect to the different approach. Are you referring to "integrating with inlining"? D36432 looked at specializing the function pointer types only and relied heavily on inlining the specialized function arguments. We are trying to accept other types as well. We are currently running spec with various architectures (AArch64 / X86) to enable the pass by default by having an aggressive mode option as well.