This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/IPO/
-
Transforms/
-
IPO/
3/3
FunctionSpecialization.cpp
-
test/Transforms/FunctionSpecialization/
-
Transforms/
-
FunctionSpecialization/
-
specialization-order.ll

Differential D136180

[FuncSpec] Compute specialisation gain even when forcing specialisation
ClosedPublic

Authored by chill on Oct 18 2022, 8:26 AM.

Download Raw Diff

Details

Reviewers

SjoerdMeijer
ChuanqiXu
labrinea

Commits

rG606d25e545d3: [FuncSpec] Compute specialisation gain even when forcing specialisation

Summary

When rewriting the call sites to call the new specialised functions, a
single call site can be matched by two different specialisations - a
"less specialised" version of the function and a "more specialised"
version of the function, e.g. for a function

void f(int x, int y)

the call like f(1, 2) could be matched by either

void f.1(int x /* int y == 2 */);

void f.2(/* int x == 1, int y == 2 */);

The FunctionSpecialisation pass tries to match specialisation in the
order of decreasing gain, so "more specialised" functions are
preferred to "less specialised" functions. This breaks, however, when
using the flag -force-function-specialization, in which case the
cost/benefit analysis is not performed and all the specialisations are
equally preferable.

This patch makes the pass calculate specialisation gain and order the
specialisations accordingly even when -force-function-specialization
is used, under the assumption that this flag has purely debugging
purpose and it is reasonable to ignore the extra computing effort it
incurs.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

chill created this revision.Oct 18 2022, 8:26 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2022, 8:26 AM

Herald added subscribers: snehasish, ormris, hiraditya. · View Herald Transcript

chill requested review of this revision.Oct 18 2022, 8:26 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2022, 8:26 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

chill added a parent revision: D135893: [FuncSpec] Fix specialisation based on literals.Oct 18 2022, 8:26 AM

chill added a child revision: D135968: [FuncSpec][NFC] Refactor finding specialisation opportunities .

chill added reviewers: SjoerdMeijer, ChuanqiXu, labrinea.

Harbormaster completed remote builds in B192764: Diff 468558.Oct 18 2022, 9:56 AM

labrinea added inline comments.Oct 18 2022, 2:15 PM

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
456–458	I can see that you are not removing unprofitable specializations, which means you've preserved the "force" semantics of the command line option. Have you checked whether any of the existing tests breaks with this change?

The patch itself look good to me. And it looks like the fundamental problem is that we'll match different specializations with a single callsties. Although we workaround it, it'll be better solve the root problems.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
456–458	It looks like we can preserve the current semantics since we don't check the Gains further. It is a little bit odd if we'll specialize the callosities of which has Gain less than 0. But the author explains "assumption that this flag has purely debugging purpose and it is reasonable to ignore the extra computing effort it incurs." And this explanation looks reasonable to me.

In D136180#3867123, @ChuanqiXu wrote:

The patch itself look good to me. And it looks like the fundamental problem is that we'll match different specializations with a single callsties. Although we workaround it, it'll be better solve the root problems.

I don't think it's a problem at all, it's just what is, for a call site with n constants, there are 2^n-1 possible specialisations, that match the call.
They would have different cost/benefit and we will choose one with the biggest gain.

The cost function must be such that if we have two specialisations

s_0(c_0, c_1, ..., c_n, a_0, a_1, ..., a_m)

and

s_1(d_0, d_1, .., d_k, b_0, b_1, .., b_p)

(where n + m == k + p, c_i, d_i - constants, a_i, b_i - formal parameters, parameters reordered without loss of generality)

such that

{c_0, c_1, ..., c_n} < {d_0, d_1, ..., d_m}

then s_1 has a better cost than s_0.

That leaves as with specialisations over disjoint sets of constants, and, indeed, some of these may have equal gain, but then there is no reason to prefer one over the other, so we choose the one that deterministically comes first.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
456–458	Have you checked whether any of the existing tests breaks with this change? No test breaks (which might mean there is lack of test coverage).

chill marked 2 inline comments as done.Oct 19 2022, 2:45 AM

In D136180#3867731, @chill wrote:

I don't think it's a problem at all, it's just what is, for a call site with n constants, there are 2^n-1 possible specialisations, that match the call.

But what's the case that we'll get worse gain with more constants? I mean, if a callsite has n constants, then we can get the gain for this callsite by specializing all the constants. For a callsite foo(1, 2), in what case it will be better to specialize it as foo_1(2) than foo_1_2()? Hope I express my meaning clearly.

In D136180#3867775, @ChuanqiXu wrote:

In D136180#3867731, @chill wrote:

I don't think it's a problem at all, it's just what is, for a call site with n constants, there are 2^n-1 possible specialisations, that match the call.

For a callsite foo(1, 2), in what case it will be better to specialize it as foo_1(2) than foo_1_2()?

I don't think there is such a case where foo_1(2) would be better than foo_1_2(). As explained in my comment above, foo_1(2) is specialised over a subset
of the constants over which foo_1_2() is specialised, thus foo_1_2() must have better cost, will appear first in the list and will be used to rewrite the call site.

Note that these considerations pertain for the case when we have already created several matching specialisations *from several call sites* and must choose which ones to apply.

It is a separate question given an individual call site, which specialisation to create. And we create the one over the greatest number of constants, as any other one would have worse gain.

Ideally the ordering in which the specializations appear should have nothing to do with how we decide to match them with the callsites. However, it looks like the ordering does matter as far as rewriteCallSites() is concerned. Your suggestion to use the cost model to address this issue seems to work, but we need to come up with a better way of associating specializations with callsites. That said I am ok with the change as an intermediate solution.

In D136180#3868139, @labrinea wrote:

Ideally the ordering in which the specializations appear should have nothing to do with how we decide to match them with the callsites. However, it looks like the ordering does matter as far as rewriteCallSites() is concerned. Your suggestion to use the cost model to address this issue seems to work, but we need to come up with a better way of associating specializations with callsites. That said I am ok with the change as an intermediate solution.

Well, we may change the way call sites and specialisations are matched. I have an alternative design for this, which explicitly maintains an association between a specialisation and the call sites used to generate it (which means it's the best specialisation for these call sites).

But as long as we have the current algorithm and data structures, IMHO the approach with comparing gain in fact works, not just "appear to work", as explained by my comment above.

Do you see any fault in the reasoning there?

I don't see any fault, the reasoning makes sense. I just wanted to emphasize the need to address the specialization-callsite association. The reassurance that you're working towards that direction is all I wanted before we make this change.

This revision is now accepted and ready to land.Oct 19 2022, 6:50 AM

@chill I mean, for a callsite with n constants, we don't need to calculate the cost for other cases, it should be good to calculate the cost for specializing all the n constants. And in this way, we can avoid to have multiple specialization info for a single call site in the process. It should save the time and avoid problems like this one. But this may be chatty talk since as you said, it is not possible to make this unless we make a big refactoring. So this is not a requirement to this patch of course. This may be day-1 defect. And this patch itself looks good.

chill added a child revision: D136332: [FuncSpec][NFC] Avoid redundant computations of DominatorTree/LoopInfo.Oct 20 2022, 3:56 AM

chill removed a child revision: D136332: [FuncSpec][NFC] Avoid redundant computations of DominatorTree/LoopInfo.

This revision was landed with ongoing or failed builds.Oct 26 2022, 2:17 AM

Closed by commit rG606d25e545d3: [FuncSpec] Compute specialisation gain even when forcing specialisation (authored by chill). · Explain Why

This revision was automatically updated to reflect the committed changes.

chill added a commit: rG606d25e545d3: [FuncSpec] Compute specialisation gain even when forcing specialisation.

chill removed a child revision: D135968: [FuncSpec][NFC] Refactor finding specialisation opportunities .Oct 27 2022, 3:04 AM

chill removed a parent revision: D135893: [FuncSpec] Fix specialisation based on literals.

Revision Contents

Path

Size

llvm/

lib/

Transforms/

IPO/

FunctionSpecialization.cpp

10 lines

test/

Transforms/

FunctionSpecialization/

specialization-order.ll

58 lines

Diff 470751

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

Show First 20 Lines • Show All 440 Lines • ▼ Show 20 Lines	for (Argument &FormalArg : F->args()) {
for (const auto &Entry : ActualArgs) {		for (const auto &Entry : ActualArgs) {
CallBase *Call = Entry.first;		CallBase *Call = Entry.first;
Constant *ActualArg = Entry.second;		Constant *ActualArg = Entry.second;

auto I = Specializations.insert({Call, SpecializationInfo()});		auto I = Specializations.insert({Call, SpecializationInfo()});
SpecializationInfo &S = I.first->second;		SpecializationInfo &S = I.first->second;

if (I.second)		if (I.second)
S.Gain = ForceFunctionSpecialization ? 1 : 0 - Cost;		S.Gain = 0 - Cost;
if (!ForceFunctionSpecialization)
S.Gain += getSpecializationBonus(&FormalArg, ActualArg);		S.Gain += getSpecializationBonus(&FormalArg, ActualArg);
S.Args.push_back({&FormalArg, ActualArg});		S.Args.push_back({&FormalArg, ActualArg});
}		}
}		}

// Remove unprofitable specializations.		// Remove unprofitable specializations.
		if (!ForceFunctionSpecialization)
Specializations.remove_if(		Specializations.remove_if(
[](const auto &Entry) { return Entry.second.Gain <= 0; });		[](const auto &Entry) { return Entry.second.Gain <= 0; });
		labrineaUnsubmitted Done Reply Inline Actions I can see that you are not removing unprofitable specializations, which means you've preserved the "force" semantics of the command line option. Have you checked whether any of the existing tests breaks with this change? labrinea: I can see that you are not removing unprofitable specializations, which means you've preserved…
		ChuanqiXuUnsubmitted Done Reply Inline Actions It looks like we can preserve the current semantics since we don't check the Gains further. It is a little bit odd if we'll specialize the callosities of which has Gain less than 0. But the author explains "assumption that this flag has purely debugging purpose and it is reasonable to ignore the extra computing effort it incurs." And this explanation looks reasonable to me. ChuanqiXu: It looks like we can preserve the current semantics since we don't check the Gains further. It…
		chillAuthorUnsubmitted Done Reply Inline Actions Have you checked whether any of the existing tests breaks with this change? No test breaks (which might mean there is lack of test coverage). chill: > Have you checked whether any of the existing tests breaks with this change? No test breaks…

// Clear the MapVector and return the underlying vector.		// Clear the MapVector and return the underlying vector.
WorkList = Specializations.takeVector();		WorkList = Specializations.takeVector();

// Sort the candidates in descending order.		// Sort the candidates in descending order.
llvm::stable_sort(WorkList, [](const auto &L, const auto &R) {		llvm::stable_sort(WorkList, [](const auto &L, const auto &R) {
return L.second.Gain > R.second.Gain;		return L.second.Gain > R.second.Gain;
});		});
▲ Show 20 Lines • Show All 508 Lines • Show Last 20 Lines

llvm/test/Transforms/FunctionSpecialization/specialization-order.ll

This file was added.

				; RUN: opt -S --passes=function-specialization,deadargelim -force-function-specialization < %s \| FileCheck %s
				define dso_local i32 @add(i32 %x, i32 %y) {
				entry:
				%add = add nsw i32 %y, %x
				ret i32 %add
				}

				define dso_local i32 @sub(i32 %x, i32 %y) {
				entry:
				%sub = sub nsw i32 %x, %y
				ret i32 %sub
				}

				define internal i32 @f(i32 %x, i32 %y, ptr %u, ptr %v) noinline {
				entry:
				%call = tail call i32 %u(i32 %x, i32 %y)
				%call1 = tail call i32 %v(i32 %x, i32 %y)
				%mul = mul nsw i32 %call1, %call
				ret i32 %mul
				}

				define dso_local i32 @g0(i32 %x, i32 %y) {
				; CHECK-LABEL: @g0
				; CHECK: call i32 @f.2(i32 [[X:%.]], i32 [[Y:%.]])
				entry:
				%call = tail call i32 @f(i32 %x, i32 %y, ptr @add, ptr @add)
				ret i32 %call
				}


				define dso_local i32 @g1(i32 %x, i32 %y) {
				; CHECK-LABEL: @g1(
				; CHECK: call i32 @f.1(i32 [[X:%.]], i32 [[Y:%.]])
				entry:
				%call = tail call i32 @f(i32 %x, i32 %y, ptr @sub, ptr @add)
				ret i32 %call
				}

				define dso_local i32 @g2(i32 %x, i32 %y, ptr %v) {
				; CHECK-LABEL @g2
				; CHECK call i32 @f.3(i32 [[X:%.]], i32 [[Y:%.]], ptr [[V:%.*]])
				entry:
				%call = tail call i32 @f(i32 %x, i32 %y, ptr @sub, ptr %v)
				ret i32 %call
				}

				; CHECK-LABEL: define {{.*}} i32 @f.1
				; CHECK: call i32 @sub(i32 %x, i32 %y)
				; CHECK-NEXT: call i32 @add(i32 %x, i32 %y)

				; CHECK-LABEL: define {{.*}} i32 @f.2
				; CHECK: call i32 @add(i32 %x, i32 %y)
				; CHECK-NEXT call i32 @add(i32 %x, i32 %y)

				; CHECK-LABEL: define {{.*}} i32 @f.3
				; CHECK: call i32 @sub(i32 %x, i32 %y)
				; CHECK-NEXT: call i32 %v(i32 %x, i32 %y)