This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Compute specialisation gain even when forcing specialisation
ClosedPublic

Authored by chill on Oct 18 2022, 8:26 AM.

Details

Summary

When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function

void f(int x, int y)

the call like f(1, 2) could be matched by either

void f.1(int x /* int y == 2 */);

or

void f.2(/* int x == 1, int y == 2 */);

The FunctionSpecialisation pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag -force-function-specialization, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.

This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when -force-function-specialization
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.

Diff Detail

Event Timeline

chill created this revision.Oct 18 2022, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2022, 8:26 AM
chill requested review of this revision.Oct 18 2022, 8:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2022, 8:26 AM
labrinea added inline comments.Oct 18 2022, 2:15 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
456–458

I can see that you are not removing unprofitable specializations, which means you've preserved the "force" semantics of the command line option. Have you checked whether any of the existing tests breaks with this change?

The patch itself look good to me. And it looks like the fundamental problem is that we'll match different specializations with a single callsties. Although we workaround it, it'll be better solve the root problems.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
456–458

It looks like we can preserve the current semantics since we don't check the Gains further. It is a little bit odd if we'll specialize the callosities of which has Gain less than 0. But the author explains "assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it incurs." And this explanation looks reasonable to me.

chill added a comment.Oct 19 2022, 2:44 AM

The patch itself look good to me. And it looks like the fundamental problem is that we'll match different specializations with a single callsties. Although we workaround it, it'll be better solve the root problems.

I don't think it's a problem at all, it's just what is, for a call site with n constants, there are 2^n-1 possible specialisations, that match the call.
They would have different cost/benefit and we will choose one with the biggest gain.

The cost function must be such that if we have two specialisations

s_0(c_0, c_1, ..., c_n, a_0, a_1, ..., a_m)

and

s_1(d_0, d_1, .., d_k, b_0, b_1, .., b_p)

(where n + m == k + p, c_i, d_i - constants, a_i, b_i - formal parameters, parameters reordered without loss of generality)

such that

{c_0, c_1, ..., c_n} < {d_0, d_1, ..., d_m}

then s_1 has a better cost than s_0.

That leaves as with specialisations over disjoint sets of constants, and, indeed, some of these may have equal gain, but then there is no reason to prefer one over the other, so we choose the one that deterministically comes first.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
456–458

Have you checked whether any of the existing tests breaks with this change?

No test breaks (which might mean there is lack of test coverage).

chill marked 2 inline comments as done.Oct 19 2022, 2:45 AM

I don't think it's a problem at all, it's just what is, for a call site with n constants, there are 2^n-1 possible specialisations, that match the call.

But what's the case that we'll get worse gain with more constants? I mean, if a callsite has n constants, then we can get the gain for this callsite by specializing all the constants. For a callsite foo(1, 2), in what case it will be better to specialize it as foo_1(2) than foo_1_2()? Hope I express my meaning clearly.

chill added a comment.Oct 19 2022, 3:13 AM

I don't think it's a problem at all, it's just what is, for a call site with n constants, there are 2^n-1 possible specialisations, that match the call.

For a callsite foo(1, 2), in what case it will be better to specialize it as foo_1(2) than foo_1_2()?

I don't think there is such a case where foo_1(2) would be better than foo_1_2(). As explained in my comment above, foo_1(2) is specialised over a subset
of the constants over which foo_1_2() is specialised, thus foo_1_2() must have better cost, will appear first in the list and will be used to rewrite the call site.

Note that these considerations pertain for the case when we have already created several matching specialisations *from several call sites* and must choose which ones to apply.

It is a separate question given an individual call site, which specialisation to create. And we create the one over the greatest number of constants, as any other one would have worse gain.

Ideally the ordering in which the specializations appear should have nothing to do with how we decide to match them with the callsites. However, it looks like the ordering does matter as far as rewriteCallSites() is concerned. Your suggestion to use the cost model to address this issue seems to work, but we need to come up with a better way of associating specializations with callsites. That said I am ok with the change as an intermediate solution.

chill added a comment.Oct 19 2022, 6:33 AM

Ideally the ordering in which the specializations appear should have nothing to do with how we decide to match them with the callsites. However, it looks like the ordering does matter as far as rewriteCallSites() is concerned. Your suggestion to use the cost model to address this issue seems to work, but we need to come up with a better way of associating specializations with callsites. That said I am ok with the change as an intermediate solution.

Well, we may change the way call sites and specialisations are matched. I have an alternative design for this, which explicitly maintains an association between a specialisation and the call sites used to generate it (which means it's the best specialisation for these call sites).

But as long as we have the current algorithm and data structures, IMHO the approach with comparing gain in fact works, not just "appear to work", as explained by my comment above.

Do you see any fault in the reasoning there?

labrinea accepted this revision.Oct 19 2022, 6:50 AM

I don't see any fault, the reasoning makes sense. I just wanted to emphasize the need to address the specialization-callsite association. The reassurance that you're working towards that direction is all I wanted before we make this change.

This revision is now accepted and ready to land.Oct 19 2022, 6:50 AM
ChuanqiXu accepted this revision.Oct 19 2022, 7:14 PM

@chill I mean, for a callsite with n constants, we don't need to calculate the cost for other cases, it should be good to calculate the cost for specializing all the n constants. And in this way, we can avoid to have multiple specialization info for a single call site in the process. It should save the time and avoid problems like this one. But this may be chatty talk since as you said, it is not possible to make this unless we make a big refactoring. So this is not a requirement to this patch of course. This may be day-1 defect. And this patch itself looks good.

This revision was landed with ongoing or failed builds.Oct 26 2022, 2:17 AM
This revision was automatically updated to reflect the committed changes.