- Separate the user bonus into Latency and CodeSize.
- Do not re-count the bonus of an already-visited user (reached from other arguments).
- Limit the recursion depth when accumulating the user bonus.
- Penalize the latency of loop nests the deeper we get in the use-def chain.
- Penalize specialization on the same argument.
- Formulate the specialization score as a Latency/CodeSize ratio (an illustrative sketch follows this list).
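Since the summary is terse, here is a minimal, self-contained sketch of what scoring by a Latency/CodeSize ratio could look like. It is not the actual patch: SpecializationGain and isBeneficial are invented names, and the exact semantics of MinScore is an assumption for illustration.

```cpp
#include <cstdint>

struct SpecializationGain {
  uint64_t Latency;  // estimated latency savings from the specialization
  uint64_t CodeSize; // estimated code size growth from cloning the function
};

// Score the candidate as a Latency/CodeSize ratio and compare it against a
// minimum threshold, so clones with little benefit or large size growth
// are rejected.
static bool isBeneficial(const SpecializationGain &G, uint64_t MinScore) {
  if (G.CodeSize == 0)
    return G.Latency > 0; // no size growth: any latency saving is a win
  return G.Latency / G.CodeSize >= MinScore;
}
```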
Details
- Reviewers: ChuanqiXu, SjoerdMeijer, chill

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
Old cost model (LTO)
Compilation
FnSpecialization: Specialized 1 functions in module test-suite/CTMark/SPASS/terminator.c
FnSpecialization: Specialized 5 functions in module test-suite/CTMark/ClamAV/libclamav_message.c
FnSpecialization: Specialized 1 functions in module test-suite/CTMark/SPASS/rules-inf.c
FnSpecialization: Specialized 1 functions in module test-suite/CTMark/sqlite3/sqlite3.c
Linking
FnSpecialization: Specialized 1 functions in module ld-temp.o (ClamAV)
FnSpecialization: Specialized 1 functions in module ld-temp.o (sqlite3)
FnSpecialization: Specialized 3 functions in module ld-temp.o (lencod)
New cost model (LTO)
Compilation
FnSpecialization: Created 1 specializations in module test-suite/CTMark/ClamAV/libclamav_message.c
Linking
FnSpecialization: Created 2 specializations in module ld-temp.o (ClamAV)
FnSpecialization: Created 2 specializations in module ld-temp.o (sqlite3)
FnSpecialization: Created 3 specializations in module ld-temp.o (lencod)
Instruction Count % delta

Benchmark | % delta
---|---
ClamAV | +0.381
7zip | -0.012
tramp3d-v4 | -0.007
kimwitu++ | +0.097
sqlite3 | +0.170
mafft | +0.026
lencod | -0.054
SPASS | -0.673
consumer-typeset | +0.054
Bullet | -0.012
geomean | -0.003
(The specializations happening at link time seem to be affecting the total compilation time a lot more than those happening pre-link time).
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
140–141 | Oops, forgot to remove it from here.
Changes from last revision:
- Removed redefinition of SpecMap type from the header file.
- Removed @param Cost from the description of findSpecializations() in the header file.
- Decreased the value of MinScore from 100 to 80, since at 100 mcf_r from SPEC2017 was no longer specialized.
- Adjusted the test output for recursive-penalty.ll: decreasing MinScore from 100 to 80 translates to 10 instead of 8 specializations being created for the test.
I am wondering if you're trying to do too much in this patch. But I only had a quick look at this, and need to look again.
But my first request is going to be about the cost/bonus calculation. Since this is crucial for this transformation, it would be useful to document and comment the idea/algorithm in one place, with all the formulas and definitions (costs, bonus, latency, code size, etc.). I think this would greatly improve the readability of these changes and of the code in general.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Thanks for adding these descriptions, that is very helpful. Just trying to understand the initial code size calculation better, specifically the last part, [number of inline candidates] x [small function size]: why is that not the [number of instructions] for each inline candidate?
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
88 | It's not clear to me what this range exactly is.
90 | Nit: perhaps consider a more informative name for this.
92 | Nit: abreviation -> abbreviation

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | Nit: UM is a bit cryptic.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Because this information is not available in CodeMetrics. We don't know who that inline candidate is at this point, and if we did we would have to compute its CodeMetrics. So instead we use the small function size as an approximation. Knowing the exact number of instructions does not matter that much. What matters is to increase the code size hit if there are inline candidates, meaning the entire function might end up even larger than it is now.
88 | This typedef is unrelated to this patch, I just grouped all the typedefs here. Just ignore it.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | UniqueMap. It was already named as such before this patch, so it is unrelated. Please ignore it for this review.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Thanks, if you can add this explanation to the comments that would help.
88 | Okay, so it lived somewhere else before and you just moved it here, but can you explain what the range is while we are at it?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | Naming is difficult, but it would be a shame if we cannot come up with something better while we are at it.
543–544 | Bit of duplication here with the comment above.
567–568 | We are not calculating one bonus anymore, but two values: latency and code size.
588 | I don't quite get why we are subtracting UserSize here. I guess it is something related to double counting, but after a quick look it wasn't clear to me; perhaps a comment would be helpful here.

llvm/test/Transforms/FunctionSpecialization/recursive-penalty.ll
10 | Do we want to check the resulting IR too?
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
88 | Isn't that clear from the comment above? We have a vector of specializations. In this vector the specializations of any one function are adjacent to one another, IOW they "form a contiguous range". A somewhat longer description of the data structures is given in the commit message of https://reviews.llvm.org/rGe6b9fc4c8be00660c5e1f1605a6a5d92475bdba7 (a rough illustrative sketch also follows this table).
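For readers without that commit message at hand, the following is a rough, hypothetical sketch of the layout being described: a single vector in which each function's specializations occupy one contiguous range. The type and member names (Spec, SpecCollection, addSpecsFor) are invented for illustration and do not mirror the actual LLVM code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Spec {
  std::string OriginalFn; // the function this specialization was cloned from
  // ... constant arguments, score, etc. elided ...
};

struct SpecCollection {
  std::vector<Spec> AllSpecs; // all specializations, grouped by function
  // Per-function [Begin, End) index range into AllSpecs.
  std::map<std::string, std::pair<size_t, size_t>> Range;

  // Appending all specializations of one function in one go keeps them
  // adjacent, i.e. they "form a contiguous range" in AllSpecs.
  void addSpecsFor(const std::string &Fn, std::vector<Spec> Specs) {
    size_t Begin = AllSpecs.size();
    for (Spec &S : Specs)
      AllSpecs.push_back(std::move(S));
    Range[Fn] = {Begin, AllSpecs.size()};
  }
};
```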
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
88 | No, sorry, it wasn't clear to me. This, however, is crystal clear to me:

As it also answers my follow-up question, i.e. what the relevance/rationale of this is, I think exactly this should be part of the comment/explanation here.

So I missed that review, i.e. didn't have time to look into it, but am now dipping back into FuncSpec, which is why I am missing some of the rationale. But I think this is an excellent write-up, and it, or parts of it, should be part of the write-up here to make this self-contained (i.e. it is a bit of a missed opportunity that this explanation is only in the commit message).
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
543–544 | Good point, I forgot to remove these comments.
567–568 | Will update the wording.
588 | We start from Cost CodeSize = Metrics.NumInsts + Metrics.NumInlineCandidates * MinFunctionSize; (see the invocation of getSpecializationBonus). Then we start decreasing it by the footprint of each user (a hedged sketch of this follows this table).

llvm/test/Transforms/FunctionSpecialization/recursive-penalty.ll
10 | Not really. All we care about in this test is to show that the number of specializations created is not linear in funcspec-max-iters.
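To make the reply to line 588 a bit more concrete, here is a hedged sketch of the code-size side of the calculation as described above. SimpleMetrics, estimateCodeSize and UserSizes are hypothetical names; only NumInsts, NumInlineCandidates and MinFunctionSize correspond to identifiers mentioned in the review.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct SimpleMetrics {
  uint64_t NumInsts;            // instruction count of the function
  uint64_t NumInlineCandidates; // calls that may later be inlined into it
};

static uint64_t estimateCodeSize(const SimpleMetrics &Metrics,
                                 const std::vector<uint64_t> &UserSizes,
                                 uint64_t MinFunctionSize) {
  // Start from the function's size plus a fixed penalty per inline candidate.
  uint64_t CodeSize =
      Metrics.NumInsts + Metrics.NumInlineCandidates * MinFunctionSize;
  // Then decrease it by the footprint of each user that is expected to
  // simplify away once the argument becomes a known constant.
  for (uint64_t UserSize : UserSizes)
    CodeSize -= std::min(UserSize, CodeSize); // clamp so we never underflow
  return CodeSize;
}
```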
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Will do.
88 | I disagree with repeating the write-up in a comment, as it may become irrelevant at some point. I think git blame should suffice. I find the existing comment already quite extensive/informative.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | Agreed. I'll update it.
I am having second thoughts about this change. I will merge the trivial NFC changes (variable renaming, type aliasing) and will raise new tickets for the remaining parts. So abandoning the cost model improvements for now.