The FunctionSpecialization pass chooses specializations among the opportunities
presented by a single function and its calls, progressively penalizing subsequent
specialization attempts by artificially increasing the cost of a specialization, depending
on how many specialization were applied before. Thus the chosen specializations are
sensitive to the order the functions appear in the module and may be worse than
others, had those others been considered earlier.
This patch makes the FunctionSpecialization pass rank the specializations globally, i.e.
choose the "best" specializations amongst the all possible specializations
in the module, for all functions.
Since this involved quite a bit of redesign of the pass data structures, this patch also carries:
- removal of duplicate specializations
- optimization of call sites update, by collecting per specialization the list of call sites that can be directly rewritten, without prior expensive check if the call constants and their positions match those of the specialized function.
A bit of a write-up up about the FuncSpec data structures and operation:
Each potential function specialisation is kept in a single vector (AllSpecs in
FunctionSpecializer::run). This vector is populated by
FunctionSpecializer::findSpecializations.
The findSpecializations member function has a local DenseMap to eliminate
duplicates - with each call to the current function, findSpecializations builds a
specialisation signature (SpecSig) and looks it in the duplicates map. If the
signature is present, the function records the call to rewrite into the
existing specialisation instance. If the signature is absent, it means we have
a new specialisation instance - the function calculates the gain and creates a
new entry in AllSpecs. Negative gain specialisation are ignored at this
point, unless forced.
The potential specialisations for a function form a contiguous range in the
AllSpecs [1]. This range is recorded in SpecMap SM, so we can quickly find
all specialisations for a function.
Once we have all the potential specialisations with their gains we need to
choose the best ones, which fit in the module specialisation budget. This is
done by using a max-heap (std::make_heap, std::push_heap, etc) to find the
best NSpec specialisations with a single traversal of the AllSpecs
vector. The heap itself is contained with a small vector (BestSpecs) of
indices into AllSpecs, since elements of AllSpecs are a bit too heavy to
shuffle around.
Next the chosen specialisation are performed, that is, functions cloned,
SCCPSolver primed, and known call sites updated.
Then we run the SCCPSolver to propagate constants in the cloned functions,
after which we walk the calls of the original functions to update them to call
the specialised functions.
[1] This range may contain specialisation that were discarded and is not ordered
in any way. One alternative design is to keep a vector indices of all
specialisations for this function (which would initially be, i, i+1,
i+2, etc) and later sort them by gain, pushing non-applied ones to the
back. This has the potential to speed updateCallSites up.
Nit: perhaps a more descriptive name. The original SpecInfo was a tiny bit better.