This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Global ranking of specialisations
ClosedPublic

Authored by chill on Dec 5 2022, 10:14 AM.

Details

Summary

The FunctionSpecialization pass chooses specializations among the opportunities
presented by a single function and its calls, progressively penalizing subsequent
specialization attempts by artificially increasing the cost of a specialization, depending
on how many specializations were applied before. Thus the chosen specializations are
sensitive to the order in which the functions appear in the module and may be worse than
others that would have been chosen had those others been considered earlier.

This patch makes the FunctionSpecialization pass rank the specializations globally, i.e.
choose the "best" specializations amongst all possible specializations
in the module, for all functions.

Since this involved quite a bit of redesign of the pass data structures, this patch also carries:

  • removal of duplicate specializations
  • optimization of the call site update, by collecting per specialization the list of call sites that can be rewritten directly, without a prior expensive check whether the call's constants and their positions match those of the specialized function.

A bit of a write-up about the FuncSpec data structures and operation:

All potential function specialisations are kept in a single vector (AllSpecs in
FunctionSpecializer::run). This vector is populated by
FunctionSpecializer::findSpecializations.
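
For illustration, here is a rough sketch of what one entry and the module-wide vector could look like; the field names and types are assumptions made for the example, not necessarily what the patch uses:

```cpp
// Sketch of a per-specialisation record and the module-wide vector.
// Field names/types are illustrative assumptions, not the patch's exact code.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include <cstdint>

namespace {

// One (formal argument, constant) pair of a specialisation signature.
struct ArgInfo {
  llvm::Argument *Formal;
  llvm::Constant *Actual;
};

// The signature identifying a specialisation; used for de-duplication.
struct SpecSig {
  llvm::SmallVector<ArgInfo, 4> Args;
};

// One candidate specialisation of a function.
struct Spec {
  llvm::Function *F;                                // function to specialise
  SpecSig Sig;                                      // constant arguments
  int64_t Gain;                                     // estimated bonus - cost
  llvm::SmallVector<llvm::CallBase *, 8> CallSites; // calls to redirect
  llvm::Function *Clone = nullptr;                  // set once cloned
};

// AllSpecs: every candidate in the module, across all functions.
using SpecVector = llvm::SmallVector<Spec, 32>;

} // namespace
```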

The findSpecializations member function has a local DenseMap to eliminate
duplicates - for each call to the current function, findSpecializations builds a
specialisation signature (SpecSig) and looks it up in the duplicates map. If the
signature is present, the function records the call to rewrite into the
existing specialisation instance. If the signature is absent, it means we have
a new specialisation instance - the function calculates the gain and creates a
new entry in AllSpecs. Negative-gain specialisations are ignored at this
point, unless forced.
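
A minimal, self-contained sketch of that de-duplication step, using plain standard containers instead of LLVM's DenseMap and with made-up helper names (the real findSpecializations also consults the solver and the cost model to compute the gain):

```cpp
// Sketch only: map a specialisation signature to its index in AllSpecs so a
// repeated signature just records another call site instead of a new entry.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct SpecSig {
  // (formal argument number, id of the constant) pairs, kept in argument
  // order so equal signatures compare equal.
  std::vector<std::pair<unsigned, uintptr_t>> Args;
  bool operator<(const SpecSig &O) const { return Args < O.Args; }
};

struct Spec {
  SpecSig Sig;
  int64_t Gain = 0;
  std::vector<unsigned> CallSites; // call sites that can be rewritten directly
};

// Called once per candidate call site of the current function.
void addCandidate(std::vector<Spec> &AllSpecs,
                  std::map<SpecSig, unsigned> &UniqueSpecs,
                  const SpecSig &Sig, unsigned CallSite, int64_t Gain,
                  bool Forced) {
  auto It = UniqueSpecs.find(Sig);
  if (It != UniqueSpecs.end()) {
    // Duplicate signature: remember one more call site for the existing entry.
    AllSpecs[It->second].CallSites.push_back(CallSite);
    return;
  }
  if (Gain <= 0 && !Forced)
    return; // non-positive gain and not forced: ignore
  AllSpecs.push_back(Spec{Sig, Gain, {CallSite}});
  UniqueSpecs[Sig] = static_cast<unsigned>(AllSpecs.size()) - 1;
}
```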

The potential specialisations for a function form a contiguous range in
AllSpecs [1]. This range is recorded in the SpecMap SM, so we can quickly find
all specialisations for a given function.
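
Expressed as a sketch (names and exact types are assumptions, not the patch's definitions), the per-function lookup is just:

```cpp
// Sketch: SpecMap records, for each function, the half-open range
// [Begin, End) of its candidate specialisations inside AllSpecs.
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Function.h"
#include <utility>

using SpecMap = llvm::DenseMap<llvm::Function *, std::pair<unsigned, unsigned>>;

// Return F's slice of AllSpecs, or an empty range if F has no candidates.
inline std::pair<unsigned, unsigned> specRange(const SpecMap &SM,
                                               llvm::Function *F) {
  auto It = SM.find(F);
  return It == SM.end() ? std::make_pair(0u, 0u) : It->second;
}
```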

Once we have all the potential specialisations with their gains, we need to
choose the best ones that fit in the module specialisation budget. This is
done by using a max-heap (std::make_heap, std::push_heap, etc.) to find the
best NSpec specialisations with a single traversal of the AllSpecs
vector. The heap itself is contained in a small vector (BestSpecs) of
indices into AllSpecs, since the elements of AllSpecs are a bit too heavy to
shuffle around.
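
A standalone sketch of that selection (illustrative names; the real code lives in FunctionSpecializer::run and may differ in details): the heap holds indices, and its root is the weakest of the currently kept candidates, so each remaining element needs only one comparison against the root.

```cpp
// Keep the indices of the NSpecs highest-gain entries of AllSpecs,
// visiting the vector exactly once and never moving the Spec objects.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct Spec {
  int64_t Gain = 0;
  // ... signature, call sites, clone, etc.
};

std::vector<unsigned> selectBest(const std::vector<Spec> &AllSpecs,
                                 unsigned NSpecs) {
  NSpecs = std::min(NSpecs, static_cast<unsigned>(AllSpecs.size()));
  // The comparator puts the lowest gain at the heap root, i.e. the entry
  // that should be evicted first when a better candidate shows up.
  auto LowestGainAtTop = [&](unsigned I, unsigned J) {
    return AllSpecs[I].Gain > AllSpecs[J].Gain;
  };
  std::vector<unsigned> Best(NSpecs + 1); // one spare slot for push/pop
  std::iota(Best.begin(), Best.begin() + NSpecs, 0u);
  std::make_heap(Best.begin(), Best.begin() + NSpecs, LowestGainAtTop);
  for (unsigned I = NSpecs, N = static_cast<unsigned>(AllSpecs.size()); I < N;
       ++I) {
    if (AllSpecs[I].Gain <= AllSpecs[Best.front()].Gain)
      continue; // not better than the weakest kept candidate
    Best[NSpecs] = I;                                          // stage it
    std::push_heap(Best.begin(), Best.end(), LowestGainAtTop); // into heap
    std::pop_heap(Best.begin(), Best.end(), LowestGainAtTop);  // evict worst
  }
  Best.resize(NSpecs); // indices of the best NSpecs specialisations
  return Best;
}
```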

Next the chosen specialisations are performed, that is, functions are cloned,
the SCCPSolver is primed, and known call sites are updated.

Then we run the SCCPSolver to propagate constants in the cloned functions,
after which we walk the calls of the original functions to update them to call
the specialised functions.
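
A hedged sketch of that final rewrite (illustrative helper names, not the pass's actual functions): walk the users of the original function and redirect the calls whose constant arguments match a chosen signature.

```cpp
// Sketch: redirect calls of F to Clone when the call's constant arguments
// match the specialisation signature. Names are illustrative only.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/Casting.h"

namespace {

struct ArgInfo {
  llvm::Argument *Formal;
  llvm::Constant *Actual;
};

bool callMatches(const llvm::CallBase &CB, llvm::ArrayRef<ArgInfo> Sig) {
  for (const ArgInfo &A : Sig)
    if (CB.getArgOperand(A.Formal->getArgNo()) != A.Actual)
      return false;
  return true;
}

void redirectCalls(llvm::Function &F, llvm::ArrayRef<ArgInfo> Sig,
                   llvm::Function &Clone) {
  llvm::SmallVector<llvm::CallBase *, 8> ToRewrite;
  for (llvm::User *U : F.users())
    if (auto *CB = llvm::dyn_cast<llvm::CallBase>(U))
      if (CB->getCalledFunction() == &F && callMatches(*CB, Sig))
        ToRewrite.push_back(CB);
  // Mutate outside the use-list traversal.
  for (llvm::CallBase *CB : ToRewrite)
    CB->setCalledFunction(&Clone);
}

} // namespace
```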


[1] This range may contain specialisations that were discarded and is not ordered
in any way. An alternative design is to keep a vector of indices of all
specialisations for this function (which would initially be i, i+1,
i+2, etc.) and later sort them by gain, pushing non-applied ones to the
back. This has the potential to speed up updateCallSites.
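
A small sketch of what that alternative could look like (not what this patch implements; names are illustrative):

```cpp
// Sketch of the alternative: one index vector per function, sorted so that
// applied specialisations come first, ordered by decreasing gain.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Spec {
  int64_t Gain = 0;
  bool Applied = false;
};

void orderForCallSiteUpdate(std::vector<unsigned> &Indices,
                            const std::vector<Spec> &AllSpecs) {
  std::stable_sort(Indices.begin(), Indices.end(), [&](unsigned I, unsigned J) {
    if (AllSpecs[I].Applied != AllSpecs[J].Applied)
      return AllSpecs[I].Applied; // applied first, non-applied at the back
    return AllSpecs[I].Gain > AllSpecs[J].Gain;
  });
}
```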

Diff Detail

Event Timeline

chill created this revision.Dec 5 2022, 10:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2022, 10:14 AM
chill requested review of this revision.Dec 5 2022, 10:14 AM
ChuanqiXu accepted this revision.Dec 5 2022, 10:59 PM

I didn't take part in the previous reviews due to limited time. But this change is relatively independent and pretty good.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
402–403

The comment here didn't get changed. It would be better to copy the comments from the declaration to here, or simply remove the comments here.

456–477
This revision is now accepted and ready to land.Dec 5 2022, 10:59 PM

Agreed, nice change, some nits inlined.

llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
84

Nit: perhaps a more descriptive name. The original SpecInfo was a tiny bit better.

109–113

Nit: range of what exactly? I don't know what a "specialisations array" is.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
351–352

The inline cost used to be a fundamental part of the cost-model. In getSpecializationBonus, you will see calls to getInlineCost, which I hope takes the increase in code size into account.

I've seen earlier revisions of the patch before it was on phabricator, so I am happy with its current form. My only worry was a bit of a slowdown in compile time for lencod but I don't have ideas to improve it. I'll let other people review :)

I will fix the strict weak ordering issue in D135463 and will upload a new revision for people to compare before we merge this change.

chill edited the summary of this revision. (Show Details)Dec 6 2022, 4:57 AM

Ok, I think we've found a bug from compiling lencod (llvm test-suite, CTMark):

2163 FnSpecialization: Function get_mem_mv , gain 19006
2164 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_field_mb, i64 0, i32 17)
2165 FnSpecialization: Function get_mem_mv , gain 19006
2166 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_field_mb, i64 0, i32 16)
2167 FnSpecialization: Function get_mem_mv , gain 19006
2168 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_field_mb, i64 0, i32 17)
2169 FnSpecialization: Function get_mem_mv , gain 19006
2170 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_field_mb, i64 0, i32 16)
2171 FnSpecialization: Function get_mem_mv , gain 19006
2172 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_frame_mb, i64 0, i32 17)
2173 FnSpecialization: Function get_mem_mv , gain 19006
2174 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_frame_mb, i64 0, i32 16)
2175 FnSpecialization: Function get_mem_mv , gain 19006
2176 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_frame_mb, i64 0, i32 17)
2177 FnSpecialization: Function get_mem_mv , gain 19006
2178 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_frame_mb, i64 0, i32 16)
2179 FnSpecialization: Function get_mem_mv , gain 19006
2180 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 82)
2181 FnSpecialization: Function get_mem_mv , gain 19006
2182 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 81)
2183 FnSpecialization: Function get_mem_mv , gain 19006
2184 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 80)
2185 FnSpecialization: Function get_mem_mv , gain 19006
2186 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 79)
...
2199 FnSpecialization: Specialized 12 functions in module ld-temp.o

All 12 specializations are coming from get_mem_mv, but only MaxClonesThreshold ought to be kept.

chill updated this revision to Diff 481234.Dec 8 2022, 4:05 AM
chill marked 3 inline comments as done.Dec 8 2022, 4:06 AM
chill added a comment.Dec 8 2022, 4:14 AM

> All 12 specializations are coming from get_mem_mv, but only MaxClonesThreshold ought to be kept.

Well, yes and no.

For the "no" part:
We may well create more than MaxClonesThreshold specialisations for a single function, as long as the total number of specializations across all functions does not exceed NumCandidates * MaxClonesThreshold, where NumCandidates is the number of functions with at least one specialisation.

For the "yes" part:
There was a bug in computing NumCandidates, which resulted in the pass doing more specialisations than allowed in this test.

chill added a comment.Dec 8 2022, 4:26 AM

Latest update:

  • fixed the bug in calculating the specialisation budget
  • let the compiler choose the size of some SmallVectors
  • moved a few debug prints around
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
84

It's not easy to come up with a more descriptive name; it's just a specialization, and Spec / Specs is quite easy to pronounce.
I don't mind spelling it out fully - Specialization.
As for "Info", "Data", "Attr", and similar adornments, I support the idea that these suffixes/prefixes do not add any information and are just noise that should be avoided as much as possible (cf. "Clean Code").

labrinea accepted this revision.Dec 11 2022, 7:02 AM

Compilation times look way better now, thanks!

My measurements for the llvm test-suite with [NewPM-O3 + LTO + FuncSpec=ON] of the single-threaded [user + system] time spent on IPSCCP show:

testname           Delta %
mafft              -0.397
Bullet             +0.075
consumer-typeset   -1.931
SPASS              -0.212
kimwitu++          -0.891
ClamAV             -0.694
sqlite3            -0.320
lencod             +0.527
7zip               +0.063
tramp3d-v4         -0.470
geomean            -0.427

(measured best three runs by passing the -time-passes option to lld)

In terms of Instruction Count of the total compilation (not of IPSCCP only) I am seeing:

testname           Delta %
ClamAV             +0.082
7zip               -0.023
tramp3d-v4         -0.019
kimwitu++          -0.010
sqlite3            -0.040
mafft              -0.013
lencod             -0.006
SPASS              -0.004
consumer-typeset   +0.017
Bullet             -0.020
geomean            -0.004

>> All 12 specializations are coming from get_mem_mv, but only MaxClonesThreshold ought to be kept.
>
> Well, yes and no.
>
> For the "no" part:
> We may well create more than MaxClonesThreshold specialisations for a single function, as long as the total number of specializations across all functions does not exceed NumCandidates * MaxClonesThreshold, where NumCandidates is the number of functions with at least one specialisation.
>
> For the "yes" part:
> There was a bug in computing NumCandidates, which resulted in the pass doing more specialisations than allowed in this test.

Apologies for not being clear. That's what I meant: NumCandidates=1 (function get_mem_mv()) for lencod, and so we should be keeping (NumCandidates=1) x (MaxClonesThreshold=3) = 3 specializations. Thanks for fixing the issue with NumCandidates :)

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
237–244

Can we move these into the corresponding header files, where they are declared as friends? I am seeing a warning when I compile:

warning: ‘llvm::hash_code llvm::hash_value(const llvm::ArgInfo&)’ has not been declared within ‘llvm’
warning: ‘llvm::hash_code llvm::hash_value(const llvm::SpecSig&)’ has not been declared within ‘llvm’

246–258

Should we maybe move this one too inside FunctionSpecialization.h?

453–454

Please remove whitespace.

labrinea added inline comments.Dec 11 2022, 7:23 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
733

typo

chill added inline comments.Dec 14 2022, 3:45 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
237–244

The friend declarations in the header file refer to names in the innermost enclosing namespace (https://eel.is/c++draft/dcl.meaning#general-2.3), which is llvm in this case. GCC seems a bit overzealous here, but
I don't mind moving these functions to the header; in spirit they are quite like operator==, etc.
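
For reference, a hedged illustration of the pattern being discussed (placeholder types and fields, not the patch's exact code): defining the befriended hash_value functions inside the class bodies in the header gives them an in-namespace definition in llvm, which is what the GCC diagnostic is asking for.

```cpp
// Illustrative only: a friend hash_value defined in-class lives in namespace
// llvm and is found by ADL (e.g. from hash_combine_range), so no separate
// namespace-scope definition in the .cpp is needed.
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Constant.h"

namespace llvm {

struct ArgInfoSketch {
  Argument *Formal = nullptr;
  Constant *Actual = nullptr;
  friend hash_code hash_value(const ArgInfoSketch &AI) {
    return hash_combine(AI.Formal, AI.Actual);
  }
};

struct SpecSigSketch {
  SmallVector<ArgInfoSketch, 4> Args;
  friend hash_code hash_value(const SpecSigSketch &S) {
    return hash_combine_range(S.Args.begin(), S.Args.end());
  }
};

} // namespace llvm
```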

246–258

What would we gain from it?

This is purely an implementation detail that is best kept away from header files.

chill updated this revision to Diff 482802.Dec 14 2022, 4:00 AM
chill marked 6 inline comments as done.
labrinea accepted this revision.Dec 14 2022, 4:21 AM
This revision was landed with ongoing or failed builds.Dec 14 2022, 7:35 AM
This revision was automatically updated to reflect the committed changes.