Page MenuHomePhabricator

[FuncSpec] Do not generate multiple copies for identical specializations.
AbandonedPublic

Authored by labrinea on Oct 7 2022, 10:15 AM.

Details

Summary

Fix for the problem shown in D135459.

Diff Detail

Event Timeline

labrinea created this revision.Oct 7 2022, 10:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 10:15 AM
labrinea requested review of this revision.Oct 7 2022, 10:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 10:15 AM
chill added inline comments.Oct 10 2022, 3:47 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
427

Avoid C-style casts. It should work as hash_value(ArrayRef<ArgInfo>(Info.Args));

429

This is not correct, if the hash values clash two different specialisation will be regarded as the same and one will be discarded.

430

Discarding duplicates should happen before we decide which of the possible specialisations to perform. For example, in calculateGains the MaxClonedThreshold limit is applied on a list, which potentially contains duplicates.

435

This looks like a big redundancy, we traverse all the call sites of F for each specialisation while all we need is traverse all the call sites of F once and redirect the call site to the correct specialisation.

440

So the idea here is, when specializing on the second parameter only, to turn

void g(int x, int y) {
...
  g(x, y);
...
}
...
g(1, 2);

into

void g.1(int x, /* unused */ int y) {
...
  g.1(x, 2);
...
}
...
g.1(1, 2);

But the same ought to happen for:

void g(int x, int y) {
...
  g(x, 2);
...
}
...
g(1, 2);

and it looks to me the test will miss it because it will compare 2 (from the call argument) to y in the cloned function.

(Similar issue in the original code as well).

chill added inline comments.Oct 10 2022, 7:01 AM
llvm/include/llvm/Transforms/Utils/SCCPSolver.h
59

I don't really see the point in creating the temporary pair.

return hash_combine(Info.Formal, Info.Actual)
labrinea updated this revision to Diff 475864.Nov 16 2022, 10:18 AM

This adds just a 0.03% geomean overhead in terms of instruction count for CTMark (llvmtestsuite O3+NewPM) over the parent revision D126455 (with specialization of functions enabled).

chill added inline comments.Nov 17 2022, 1:56 AM
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
97

This has to be a DenseSet. std::set has logarithmic complexity of operations.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
393

Here evaluation of the gain occurs even if the specialization is a duplicate.
I would expect getSpecializationBonus to take non-negligible time.

455

(nit) This function return Close twice, once as a return value and once as an out argument. A cleaner design would be to just return the value as the function does not depend on the Clone member variable and need not know anything about it.

labrinea added inline comments.Nov 17 2022, 5:47 AM
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
97

DenseSet is more expensive to iterate on. I have another patch which sorts the specializations globally (across module, not per function) where I keep the specializations sorted in a std::multiset and I use std::merge to unite the two sets. That's relatively cheap as it doesn't perform copies of objects but shuffles pointers around. Also that multiset is iterated a lot more than the set I am adding here. The compexity is logarithmic to the number of callsites, still not a huge number in my opinion.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
393

This is already happening, but good point there's room for improvement.

455

I cached the Clone pointer to the SpecInfo for the parent revision D126455, so that updateCallSites() does not rely on two vectors (clones, specialziations) being in sync. So you are suggesting to not set the member variable here, but after returning from this function?

chill added inline comments.Nov 18 2022, 8:45 AM
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
97

This should be a comparison function which induces strict weak ordering on the elements.

labrinea updated this revision to Diff 481216.Dec 8 2022, 2:40 AM

Added strict weak ordering for the elements inserted to std::set.

labrinea added inline comments.Dec 8 2022, 2:51 AM
llvm/lib/Transforms/IPO/SCCP.cpp
46

oops

labrinea abandoned this revision.Dec 12 2022, 8:40 AM

This patch achieves the same as D139346 with ~0.001% (geomean) increase of Instruction Count for CTMark (LTO) or ~0.033% (geomean) increase of execution time (user+system) for the IPSCCP pass. I am abandoning this patch in favor of D139346.