This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Global ranking of specialisations
ClosedPublic

Authored by chill on Dec 5 2022, 10:14 AM.

Details

Summary

The FunctionSpecialization pass chooses specializations among the opportunities
presented by a single function and its calls, progressively penalizing subsequent
specialization attempts by artificially increasing the cost of a specialization, depending
on how many specializations were applied before. Thus the chosen specializations are
sensitive to the order in which the functions appear in the module and may be worse than
others that would have been chosen had those others been considered earlier.

This patch makes the FunctionSpecialization pass rank the specializations globally, i.e.
choose the "best" specializations amongst all possible specializations
in the module, for all functions.

Since this involved quite a bit of redesign of the pass data structures, this patch also carries:

  • removal of duplicate specializations
  • optimization of the call site update, by collecting per specialization the list of call sites that can be rewritten directly, without a prior expensive check whether the call's constants and their positions match those of the specialized function.

A bit of a write-up about the FuncSpec data structures and operation:

All potential function specialisations are kept in a single vector (AllSpecs in
FunctionSpecializer::run). This vector is populated by
FunctionSpecializer::findSpecializations.
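
For illustration, here is a rough sketch of what one entry and the module-wide vector could look like; the field names and types are assumptions made for the example, not necessarily what the patch uses:

```cpp
// Sketch of a per-specialisation record and the module-wide vector.
// Field names/types are illustrative assumptions, not the patch's exact code.
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include <cstdint>

namespace {

// One (formal argument, constant) pair of a specialisation signature.
struct ArgInfo {
  llvm::Argument *Formal;
  llvm::Constant *Actual;
};

// The signature identifying a specialisation; used for de-duplication.
struct SpecSig {
  llvm::SmallVector<ArgInfo, 4> Args;
};

// One candidate specialisation of a function.
struct Spec {
  llvm::Function *F;                                // function to specialise
  SpecSig Sig;                                      // constant arguments
  int64_t Gain;                                     // estimated bonus - cost
  llvm::SmallVector<llvm::CallBase *, 8> CallSites; // calls to redirect
  llvm::Function *Clone = nullptr;                  // set once cloned
};

// AllSpecs: every candidate in the module, across all functions.
using SpecVector = llvm::SmallVector<Spec, 32>;

} // namespace
```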

The findSpecializations member function has a local DenseMap to eliminate
duplicates - for each call to the current function, findSpecializations builds a
specialisation signature (SpecSig) and looks it up in the duplicates map. If the
signature is present, the function records the call to rewrite into the
existing specialisation instance. If the signature is absent, it means we have
a new specialisation instance - the function calculates the gain and creates a
new entry in AllSpecs. Negative-gain specialisations are ignored at this
point, unless forced.
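
A minimal, self-contained sketch of that de-duplication step, using plain standard containers instead of LLVM's DenseMap and with made-up helper names (the real findSpecializations also consults the solver and the cost model to compute the gain):

```cpp
// Sketch only: map a specialisation signature to its index in AllSpecs so a
// repeated signature just records another call site instead of a new entry.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

struct SpecSig {
  // (formal argument number, id of the constant) pairs, kept in argument
  // order so equal signatures compare equal.
  std::vector<std::pair<unsigned, uintptr_t>> Args;
  bool operator<(const SpecSig &O) const { return Args < O.Args; }
};

struct Spec {
  SpecSig Sig;
  int64_t Gain = 0;
  std::vector<unsigned> CallSites; // call sites that can be rewritten directly
};

// Called once per candidate call site of the current function.
void addCandidate(std::vector<Spec> &AllSpecs,
                  std::map<SpecSig, unsigned> &UniqueSpecs,
                  const SpecSig &Sig, unsigned CallSite, int64_t Gain,
                  bool Forced) {
  auto It = UniqueSpecs.find(Sig);
  if (It != UniqueSpecs.end()) {
    // Duplicate signature: remember one more call site for the existing entry.
    AllSpecs[It->second].CallSites.push_back(CallSite);
    return;
  }
  if (Gain <= 0 && !Forced)
    return; // non-positive gain and not forced: ignore
  AllSpecs.push_back(Spec{Sig, Gain, {CallSite}});
  UniqueSpecs[Sig] = static_cast<unsigned>(AllSpecs.size()) - 1;
}
```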

The potential specialisations for a function form a contiguous range in
AllSpecs [1]. This range is recorded in the SpecMap SM, so we can quickly find
all specialisations for a given function.
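
Expressed as a sketch (names and exact types are assumptions, not the patch's definitions), the per-function lookup is just:

```cpp
// Sketch: SpecMap records, for each function, the half-open range
// [Begin, End) of its candidate specialisations inside AllSpecs.
#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Function.h"
#include <utility>

using SpecMap = llvm::DenseMap<llvm::Function *, std::pair<unsigned, unsigned>>;

// Return F's slice of AllSpecs, or an empty range if F has no candidates.
inline std::pair<unsigned, unsigned> specRange(const SpecMap &SM,
                                               llvm::Function *F) {
  auto It = SM.find(F);
  return It == SM.end() ? std::make_pair(0u, 0u) : It->second;
}
```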

Once we have all the potential specialisations with their gains, we need to
choose the best ones that fit in the module specialisation budget. This is
done by using a max-heap (std::make_heap, std::push_heap, etc.) to find the
best NSpec specialisations with a single traversal of the AllSpecs
vector. The heap itself is contained in a small vector (BestSpecs) of
indices into AllSpecs, since the elements of AllSpecs are a bit too heavy to
shuffle around.
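
A standalone sketch of that selection (illustrative names; the real code lives in FunctionSpecializer::run and may differ in details): the heap holds indices, and its root is the weakest of the currently kept candidates, so each remaining element needs only one comparison against the root.

```cpp
// Keep the indices of the NSpecs highest-gain entries of AllSpecs,
// visiting the vector exactly once and never moving the Spec objects.
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

struct Spec {
  int64_t Gain = 0;
  // ... signature, call sites, clone, etc.
};

std::vector<unsigned> selectBest(const std::vector<Spec> &AllSpecs,
                                 unsigned NSpecs) {
  NSpecs = std::min(NSpecs, static_cast<unsigned>(AllSpecs.size()));
  // The comparator puts the lowest gain at the heap root, i.e. the entry
  // that should be evicted first when a better candidate shows up.
  auto LowestGainAtTop = [&](unsigned I, unsigned J) {
    return AllSpecs[I].Gain > AllSpecs[J].Gain;
  };
  std::vector<unsigned> Best(NSpecs + 1); // one spare slot for push/pop
  std::iota(Best.begin(), Best.begin() + NSpecs, 0u);
  std::make_heap(Best.begin(), Best.begin() + NSpecs, LowestGainAtTop);
  for (unsigned I = NSpecs, N = static_cast<unsigned>(AllSpecs.size()); I < N;
       ++I) {
    if (AllSpecs[I].Gain <= AllSpecs[Best.front()].Gain)
      continue; // not better than the weakest kept candidate
    Best[NSpecs] = I;                                          // stage it
    std::push_heap(Best.begin(), Best.end(), LowestGainAtTop); // into heap
    std::pop_heap(Best.begin(), Best.end(), LowestGainAtTop);  // evict worst
  }
  Best.resize(NSpecs); // indices of the best NSpecs specialisations
  return Best;
}
```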

Next the chosen specialisations are performed, that is, functions are cloned,
the SCCPSolver is primed, and known call sites are updated.

Then we run the SCCPSolver to propagate constants in the cloned functions,
after which we walk the calls of the original functions to update them to call
the specialised functions.
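
A hedged sketch of that final rewrite (illustrative helper names, not the pass's actual functions): walk the users of the original function and redirect the calls whose constant arguments match a chosen signature.

```cpp
// Sketch: redirect calls of F to Clone when the call's constant arguments
// match the specialisation signature. Names are illustrative only.
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/Support/Casting.h"

namespace {

struct ArgInfo {
  llvm::Argument *Formal;
  llvm::Constant *Actual;
};

bool callMatches(const llvm::CallBase &CB, llvm::ArrayRef<ArgInfo> Sig) {
  for (const ArgInfo &A : Sig)
    if (CB.getArgOperand(A.Formal->getArgNo()) != A.Actual)
      return false;
  return true;
}

void redirectCalls(llvm::Function &F, llvm::ArrayRef<ArgInfo> Sig,
                   llvm::Function &Clone) {
  llvm::SmallVector<llvm::CallBase *, 8> ToRewrite;
  for (llvm::User *U : F.users())
    if (auto *CB = llvm::dyn_cast<llvm::CallBase>(U))
      if (CB->getCalledFunction() == &F && callMatches(*CB, Sig))
        ToRewrite.push_back(CB);
  // Mutate outside the use-list traversal.
  for (llvm::CallBase *CB : ToRewrite)
    CB->setCalledFunction(&Clone);
}

} // namespace
```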


[1] This range may contain specialisations that were discarded and is not ordered
in any way. An alternative design is to keep a vector of indices of all
specialisations for this function (which would initially be i, i+1,
i+2, etc.) and later sort them by gain, pushing non-applied ones to the
back. This has the potential to speed up updateCallSites.
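
A small sketch of what that alternative could look like (not what this patch implements; names are illustrative):

```cpp
// Sketch of the alternative: one index vector per function, sorted so that
// applied specialisations come first, ordered by decreasing gain.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Spec {
  int64_t Gain = 0;
  bool Applied = false;
};

void orderForCallSiteUpdate(std::vector<unsigned> &Indices,
                            const std::vector<Spec> &AllSpecs) {
  std::stable_sort(Indices.begin(), Indices.end(), [&](unsigned I, unsigned J) {
    if (AllSpecs[I].Applied != AllSpecs[J].Applied)
      return AllSpecs[I].Applied; // applied first, non-applied at the back
    return AllSpecs[I].Gain > AllSpecs[J].Gain;
  });
}
```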

Diff Detail

Event Timeline

chill created this revision.Dec 5 2022, 10:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 5 2022, 10:14 AM
chill requested review of this revision.Dec 5 2022, 10:14 AM
ChuanqiXu accepted this revision.Dec 5 2022, 10:59 PM

I didn't take part in the previous reviews due to limited time. But this change is relatively independent and pretty good.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
402–403

The comment here didn't get changed. It would be better to copy the comments from the declaration to here, or simply remove the comments here.

456–477
This revision is now accepted and ready to land.Dec 5 2022, 10:59 PM

Agreed, nice change, some nits inlined.

llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
84

Nit: perhaps a more descriptive name. The original SpecInfo was a tiny bit better.

109–113

Nit: range of what exactly? I don't know what a "specialisations array" is.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
351–352

The inline cost used to be a fundamental part of the cost-model. In getSpecializationBonus, you will see calls to getInlineCost, which I hope takes the increase in code size into account.

I've seen earlier revisions of the patch before it was on phabricator, so I am happy with its current form. My only worry was a bit of a slowdown in compile time for lencod but I don't have ideas to improve it. I'll let other people review :)

I will fix the strict weak ordering issue in D135463 and will upload a new revision for people to compare before we merge this change.

chill edited the summary of this revision. (Show Details)Dec 6 2022, 4:57 AM

Ok, I think we've found a bug from compiling lencod (llvm test-suite, CTMark):

2163 FnSpecialization: Function get_mem_mv , gain 19006
2164 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_field_mb, i64 0, i32 17)
2165 FnSpecialization: Function get_mem_mv , gain 19006
2166 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_field_mb, i64 0, i32 16)
2167 FnSpecialization: Function get_mem_mv , gain 19006
2168 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_field_mb, i64 0, i32 17)
2169 FnSpecialization: Function get_mem_mv , gain 19006
2170 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_field_mb, i64 0, i32 16)
2171 FnSpecialization: Function get_mem_mv , gain 19006
2172 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_frame_mb, i64 0, i32 17)
2173 FnSpecialization: Function get_mem_mv , gain 19006
2174 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_bot_frame_mb, i64 0, i32 16)
2175 FnSpecialization: Function get_mem_mv , gain 19006
2176 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_frame_mb, i64 0, i32 17)
2177 FnSpecialization: Function get_mem_mv , gain 19006
2178 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.RD_DATA, ptr @rddata_top_frame_mb, i64 0, i32 16)
2179 FnSpecialization: Function get_mem_mv , gain 19006
2180 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 82)
2181 FnSpecialization: Function get_mem_mv , gain 19006
2182 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 81)
2183 FnSpecialization: Function get_mem_mv , gain 19006
2184 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 80)
2185 FnSpecialization: Function get_mem_mv , gain 19006
2186 FnSpecialization:   FormalArg = %0, ActualArg = getelementptr inbounds (%struct.ImageParameters, ptr @images, i64 0, i32 79)
...
2199 FnSpecialization: Specialized 12 functions in module ld-temp.o

All 12 specializations are coming from get_mem_mv, but only MaxClonesThreshold ought to be kept.

chill updated this revision to Diff 481234.Dec 8 2022, 4:05 AM
chill marked 3 inline comments as done.Dec 8 2022, 4:06 AM
chill added a comment.Dec 8 2022, 4:14 AM

> All 12 specializations are coming from get_mem_mv, but only MaxClonesThreshold ought to be kept.

Well, yes and no.

For the "no" part:
We may well create more than MaxClonesThreshold specialisations for a single function, as long as the total number of specializations across all functions does not exceed NumCandidates * MaxClonesThreshold, where NumCandidates is the number of functions with at least one specialisation.

For the "yes" part:
There was a bug in computing NumCandidates, which resulted in the pass doing more specialisations than allowed in this test.

chill added a comment.Dec 8 2022, 4:26 AM

Latest update:

  • fixed the bug in calculating the specialisation budget
  • let the compiler choose the size of some SmallVectors
  • moved a few debug prints around
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
84

It's not easy to come up with a more descriptive name; it's just a specialization, and Spec / Specs is quite easy to pronounce.
I don't mind spelling it out fully - Specialization.
As for "Info", "Data", "Attr", and similar adornments, I support the idea that these suffixes/prefixes do not add any information and are just noise that should be avoided as much as possible (cf. "Clean Code").

labrinea accepted this revision.Dec 11 2022, 7:02 AM

Compilation times look way better now, thanks!

My measurements for the llvm test-suite with [NewPM-O3 + LTO + FuncSpec=ON] of the single-threaded [user + system] time spent on IPSCCP show:

testname           Delta %
mafft              -0.397
Bullet             +0.075
consumer-typeset   -1.931
SPASS              -0.212
kimwitu++          -0.891
ClamAV             -0.694
sqlite3            -0.320
lencod             +0.527
7zip               +0.063
tramp3d-v4         -0.470
geomean            -0.427

(measured best three runs by passing the -time-passes option to lld)

In terms of Instruction Count of the total compilation (not of IPSCCP only) I am seeing:

testname           Delta %
ClamAV             +0.082
7zip               -0.023
tramp3d-v4         -0.019
kimwitu++          -0.010
sqlite3            -0.040
mafft              -0.013
lencod             -0.006
SPASS              -0.004
consumer-typeset   +0.017
Bullet             -0.020
geomean            -0.004

>> All 12 specializations are coming from get_mem_mv, but only MaxClonesThreshold ought to be kept.
>
> Well, yes and no.
>
> For the "no" part:
> We may well create more than MaxClonesThreshold specialisations for a single function, as long as the total number of specializations across all functions does not exceed NumCandidates * MaxClonesThreshold, where NumCandidates is the number of functions with at least one specialisation.
>
> For the "yes" part:
> There was a bug in computing NumCandidates, which resulted in the pass doing more specialisations than allowed in this test.

Apologies for not being clear. That's what I meant: NumCandidates=1 (function get_mem_mv()) for lencod, and so we should be keeping (NumCandidates=1) x (MaxClonesThreshold=3) = 3 specializations. Thanks for fixing the issue with NumCandidates :)

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
237–244

Can we move these into the corresponding header files, where they are declared as friends? I am seeing a warning when I compile:

warning: ‘llvm::hash_code llvm::hash_value(const llvm::ArgInfo&)’ has not been declared within ‘llvm’
warning: ‘llvm::hash_code llvm::hash_value(const llvm::SpecSig&)’ has not been declared within ‘llvm’

246–258

Should we maybe move this one too inside FunctionSpecialization.h?

453–454

Please remove whitespace.

labrinea added inline comments.Dec 11 2022, 7:23 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
733

typo

chill added inline comments.Dec 14 2022, 3:45 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
237–244

The friend declarations in the header file refer to names in the innermost enclosing namespace (https://eel.is/c++draft/dcl.meaning#general-2.3), which is llvm in this case. GCC seems a bit overzealous here, but
I don't mind moving these functions to the header; in spirit they are quite like operator==, etc.
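
For reference, a hedged illustration of the pattern being discussed (placeholder types and fields, not the patch's exact code): defining the befriended hash_value functions inside the class bodies in the header gives them an in-namespace definition in llvm, which is what the GCC diagnostic is asking for.

```cpp
// Illustrative only: a friend hash_value defined in-class lives in namespace
// llvm and is found by ADL (e.g. from hash_combine_range), so no separate
// namespace-scope definition in the .cpp is needed.
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Constant.h"

namespace llvm {

struct ArgInfoSketch {
  Argument *Formal = nullptr;
  Constant *Actual = nullptr;
  friend hash_code hash_value(const ArgInfoSketch &AI) {
    return hash_combine(AI.Formal, AI.Actual);
  }
};

struct SpecSigSketch {
  SmallVector<ArgInfoSketch, 4> Args;
  friend hash_code hash_value(const SpecSigSketch &S) {
    return hash_combine_range(S.Args.begin(), S.Args.end());
  }
};

} // namespace llvm
```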

246–258

What would we gain from it?

This is purely an implementation detail that is best kept away from header files.

chill updated this revision to Diff 482802.Dec 14 2022, 4:00 AM
chill marked 6 inline comments as done.
labrinea accepted this revision.Dec 14 2022, 4:21 AM
This revision was landed with ongoing or failed builds.Dec 14 2022, 7:35 AM
This revision was automatically updated to reflect the committed changes.