This is an archive of the discontinued LLVM Phabricator instance.

Differential D157123

[FuncSpec] Rework the discardment logic for unprofitable specializations.
Closed, Public

Authored by labrinea on Aug 4 2023, 11:37 AM.

Details

Reviewers

ChuanqiXu
chill

Commits

rGd1b376fd7bf7: [FuncSpec] Rework the discardment logic for unprofitable specializations.

Summary

Currently we make an arbitrary comparison between codesize and latency
in order to decide whether to keep a specialization or not. Sometimes
the latency savings are biased in favor of loops because of imprecise
block frequencies, therefore this metric contains a lot of noise. This
patch tries to address the problem as follows:

Reject specializations whose codesize savings are less than X% of the original function size.
Reject specializations whose latency savings are less than Y% of the original function size.
Reject specializations whose inlining bonus is less than Z% of the original function size.

I am not saying this is super precise, but at least X, Y and Z are
configurable, allowing us to tweak the cost model. Moreover, it lets
us prioritize codesize over latency, which is a less noisy metric.
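
The rules above can be sketched as a single predicate. This is a minimal illustration, not the patch itself: the `Bonus` and `Thresholds` structs and the plain integer types are stand-ins chosen for the example, the defaults copy the `cl::init` values that appear in this diff, and the order of the checks follows the `IsProfitable` lambda of the final revision, where a large enough inlining bonus accepts the specialization before the codesize and latency checks run.

```cpp
#include <cstdint>

// Hypothetical stand-in for the pass's Bonus type: estimated savings from
// specializing, expressed in the same units as the function size.
struct Bonus {
  std::uint64_t CodeSize;
  std::uint64_t Latency;
};

// Hypothetical bundle of the percentages X, Y and Z from the summary.
// The defaults copy the cl::init values shown in this diff.
struct Thresholds {
  unsigned MinCodeSizeSavings = 20; // X
  unsigned MinLatencySavings = 70;  // Y
  unsigned MinInliningBonus = 300;  // Z
};

// Returns true if the specialization should be kept.
constexpr bool isProfitable(Bonus B, unsigned InliningBonus, unsigned FuncSize,
                            Thresholds T = Thresholds{}) {
  // A large enough inlining bonus is sufficient on its own.
  if (InliningBonus > T.MinInliningBonus * FuncSize / 100)
    return true;
  // Otherwise both the codesize and the latency savings must clear their bar.
  if (B.CodeSize < T.MinCodeSizeSavings * FuncSize / 100)
    return false;
  if (B.Latency < T.MinLatencySavings * FuncSize / 100)
    return false;
  return true;
}
```

With these defaults, a function of size 300 needs codesize savings of at least 60 and latency savings of at least 210, unless its inlining bonus exceeds 900.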

I am also increasing the minimum size a function should have to be
considered a candidate for specialization. Initially the cost of
a function was calculated as

  
CodeMetrics::NumInsts * InlineConstants::getInstrCost()

which later in D150464 was altered into CodeMetrics::NumInsts since
the metric is supposed to model TargetTransformInfo::TCK_CodeSize.
However, we omitted adjusting MinFunctionSize in that commit.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

labrinea created this revision. (Aug 4 2023, 11:37 AM)

Herald added a project: Restricted Project. (Aug 4 2023, 11:37 AM)

Herald added subscribers: hoy, ormris, hiraditya.

labrinea requested review of this revision. (Aug 4 2023, 11:37 AM)

labrinea added a parent revision: D156903: [FuncSpec] Estimate dead blocks more accurately. (Aug 4 2023, 11:37 AM)

labrinea added inline comments.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
902–903	I couldn't think of an example where this would be useful. Besides, calling `getInlineCost` is expensive according to this comment: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/InlineCost.h#L274

Harbormaster completed remote builds in B250390: Diff 547289. (Aug 4 2023, 1:55 PM)

ChuanqiXu added inline comments. (Aug 6 2023, 6:47 PM)

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
864–865	I am wondering if the condition `||` may be too strict. My rough feeling is that it is good enough to perform the specialization if we can see one of the benefits.
902–903	Maybe I just feel that reusing the existing mature cost model may be a better choice. But given that it is expensive and we're willing to build and fine-tune our own cost model, it may be good not to depend on the expensive `getInlineCost`.
933–938	Let's move the comment to the definition of the function. It will be much clearer.

labrinea added inline comments. (Aug 7 2023, 1:10 AM)

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
864–865	Without the latency condition CTMark triggers too much. The two conditions seem to work quite well in conjunction: CTMark gets fewer specializations than before, which is good for compile times, but we still specialize the interesting cases (mcf, exchange).

LGTM with moving the comment.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
864–865	The result from benchmark is overwhelming : )

This revision is now accepted and ready to land. (Aug 7 2023, 1:16 AM)

@ChuanqiXu I was mistaken, the condition is indeed too strict. I'll rethink it.

Changes from last revision:

Increased the minimum function size
Added options to control the codesize/latency/inlining savings as percentages of the function size

labrinea requested review of this revision. (Aug 8 2023, 10:42 AM)

For the practical effects, I feel good as long as the interesting benchmarks remain optimized.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
108	What is an `inlining saving`? I am not sure if this is a widely known definition.
833	May this be a better name?
851	What's the meaning of the magic number `100`?
851–857	nit: may this be slightly clearer?

labrinea added inline comments. (Aug 8 2023, 11:58 PM)

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
108	I can rename it to MinInliningBonus
833	Ok
851	Percentage. We can't do floating point decision so we multiply by MinSavings and then divide by 100.
851	*division
851–857	Okay
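
The integer percentage arithmetic described in the reply on line 851 can be sketched in isolation. `percentOf` and `percentOfDivideFirst` are hypothetical helper names introduced only for this illustration; the patch writes the expression inline.

```cpp
// Percent% of FuncSize, rounded down, using only integer arithmetic.
// Multiplying first keeps the fractional part until the final division.
constexpr unsigned percentOf(unsigned Percent, unsigned FuncSize) {
  return Percent * FuncSize / 100;
}

// The same formula with the division first. For any Percent below 100 the
// quotient Percent / 100 is already 0, so the threshold collapses to 0.
constexpr unsigned percentOfDivideFirst(unsigned Percent, unsigned FuncSize) {
  return Percent / 100 * FuncSize;
}
```

Because the result is rounded down, very small functions get a threshold of 0 (20% of a size-4 function, for example), which is one reason a minimum function size matters alongside these percentages.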

labrinea added inline comments. (Aug 9 2023, 12:37 AM)

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
833	Sorry I misunderstood. I am naming it score because at the end it contains InliningBonus + std::max(B.CodeSize, B.Latency).
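
The composition of the score described in this reply can be written out as a one-line helper. `finalScore` and the plain integer types are assumptions made for illustration; in the patch the value is accumulated in place in the `Score` variable.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: the inlining bonus plus whichever of the codesize or
// latency savings is larger.
constexpr std::uint64_t finalScore(std::uint64_t InliningBonus,
                                   std::uint64_t CodeSize,
                                   std::uint64_t Latency) {
  return InliningBonus + std::max(CodeSize, Latency);
}
```

Taking the maximum rather than the sum means a specialization is not credited twice when the same removed instructions show up in both the codesize and the latency estimate.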

Addressed review comments.

labrinea marked 7 inline comments as done. (Aug 9 2023, 12:39 AM)

LGTM then. Thanks.

This revision is now accepted and ready to land. (Aug 9 2023, 2:05 AM)

This revision was landed with ongoing or failed builds. (Aug 9 2023, 2:33 AM)

Closed by commit rGd1b376fd7bf7: [FuncSpec] Rework the discardment logic for unprofitable specializations. (authored by labrinea).

This revision was automatically updated to reflect the committed changes.

labrinea added a commit: rGd1b376fd7bf7: [FuncSpec] Rework the discardment logic for unprofitable specializations.

Harbormaster completed remote builds in B251301: Diff 548495. (Aug 9 2023, 3:14 AM)

Revision Contents

Path	Size
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h	18 lines
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp	105 lines
llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp	26 lines

Diff 548531

llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

[170 lines not shown] public:
    InstCostVisitor(const DataLayout &DL, BlockFrequencyInfo &BFI,
                    TargetTransformInfo &TTI, SCCPSolver &Solver)
        : DL(DL), BFI(BFI), TTI(TTI), Solver(Solver) {}

    bool isBlockExecutable(BasicBlock *BB) {
      return Solver.isBlockExecutable(BB) && !DeadBlocks.contains(BB);
    }

-   Bonus getUserBonus(Instruction *User, Value *Use = nullptr,
-                      Constant *C = nullptr);
+   Bonus getSpecializationBonus(Argument *A, Constant *C);

    Bonus getBonusFromPendingPHIs();

  private:
    friend class InstVisitor<InstCostVisitor, Constant *>;

    static bool canEliminateSuccessor(BasicBlock *BB, BasicBlock *Succ,
                                      DenseSet<BasicBlock *> &DeadBlocks);

+   Bonus getUserBonus(Instruction *User, Value *Use = nullptr,
+                      Constant *C = nullptr);
+
    Cost estimateBasicBlocks(SmallVectorImpl<BasicBlock *> &WorkList);
    Cost estimateSwitchInst(SwitchInst &I);
    Cost estimateBranchInst(BranchInst &I);

    Constant *visitInstruction(Instruction &I) { return nullptr; }
    Constant *visitPHINode(PHINode &I);
    Constant *visitFreezeInst(FreezeInst &I);
    Constant *visitCallBase(CallBase &I);

[41 lines not shown] public:
    bool run();

    InstCostVisitor getInstCostVisitorFor(Function *F) {
      auto &BFI = GetBFI(*F);
      auto &TTI = GetTTI(*F);
      return InstCostVisitor(M.getDataLayout(), BFI, TTI, Solver);
    }

-   /// Compute a bonus for replacing argument \p A with constant \p C.
-   Bonus getSpecializationBonus(Argument *A, Constant *C,
-                                InstCostVisitor &Visitor);
-
  private:
    Constant *getPromotableAlloca(AllocaInst *Alloca, CallInst *Call);

    /// A constant stack value is an AllocaInst that has a single constant
    /// value stored to it. Return this constant if such an alloca stack value
    /// is a function argument.
    Constant *getConstantStackValue(CallInst *Call, Value *Val);

    /// See if there are any new constant values for the callers of \p F via
    /// stack variables and promote them to global variables.
    void promoteConstantStackValues(Function *F);

    /// Clean up fully specialized functions.
    void removeDeadFunctions();

    /// Remove any ssa_copy intrinsics that may have been introduced.
    void cleanUpSSA();

    /// @brief Find potential specialization opportunities.
    /// @param F Function to specialize
-   /// @param SpecCost Cost of specializing a function. Final score is benefit
-   /// minus this cost.
+   /// @param FuncSize Cost of specializing a function.
    /// @param AllSpecs A vector to add potential specializations to.
    /// @param SM A map for a function's specialisation range
    /// @return True, if any potential specializations were found
-   bool findSpecializations(Function *F, unsigned SpecCost,
+   bool findSpecializations(Function *F, unsigned FuncSize,
                             SmallVectorImpl<Spec> &AllSpecs, SpecMap &SM);

+   /// Compute the inlining bonus for replacing argument \p A with constant \p C.
+   unsigned getInliningBonus(Argument *A, Constant *C);
+
    bool isCandidateFunction(Function *F);

    /// @brief Create a specialization of \p F and prime the SCCPSolver
    /// @param F Function to specialize
    /// @param S Which specialization to create
    /// @return The new, cloned function
    Function *createSpecialization(Function *F, const SpecSig &S);

[17 lines not shown]

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

[83 lines not shown] static cl::opt<unsigned> MaxIncomingPhiValues(
      "considered during the specialization bonus estimation"));

  static cl::opt<unsigned> MaxBlockPredecessors(
      "funcspec-max-block-predecessors", cl::init(2), cl::Hidden, cl::desc(
      "The maximum number of predecessors a basic block can have to be "
      "considered during the estimation of dead code"));

  static cl::opt<unsigned> MinFunctionSize(
-     "funcspec-min-function-size", cl::init(100), cl::Hidden, cl::desc(
+     "funcspec-min-function-size", cl::init(300), cl::Hidden, cl::desc(
      "Don't specialize functions that have less than this number of "
      "instructions"));

+ static cl::opt<unsigned> MinCodeSizeSavings(
+     "funcspec-min-codesize-savings", cl::init(20), cl::Hidden, cl::desc(
+     "Reject specializations whose codesize savings are less than this"
+     "much percent of the original function size"));
+
+ static cl::opt<unsigned> MinLatencySavings(
+     "funcspec-min-latency-savings", cl::init(70), cl::Hidden, cl::desc(
+     "Reject specializations whose latency savings are less than this"
+     "much percent of the original function size"));
+
+ static cl::opt<unsigned> MinInliningBonus(
+     "funcspec-min-inlining-bonus", cl::init(300), cl::Hidden, cl::desc(
+     "Reject specializations whose inlining bonus is less than this"

    Inline comment (ChuanqiXu, Done): What is an `inlining saving`? I am not sure if this is a widely known definition.
    Inline comment (labrinea, Done): I can rename it to MinInliningBonus

+     "much percent of the original function size"));
+
  static cl::opt<bool> SpecializeOnAddress(
      "funcspec-on-address", cl::init(false), cl::Hidden, cl::desc(
      "Enable function specialization on the address of global values"));

  // Disabled by default as it can significantly increase compilation times.
  //
  // https://llvm-compile-time-tracker.com
  // https://github.com/nikic/llvm-compile-time-tracker

[71 lines not shown] while (!PendingPHIs.empty()) {
      Instruction *Phi = PendingPHIs.pop_back_val();
      // The pending PHIs could have been proven dead by now.
      if (isBlockExecutable(Phi->getParent()))
        B += getUserBonus(Phi);
    }
    return B;
  }

+ /// Compute a bonus for replacing argument \p A with constant \p C.
+ Bonus InstCostVisitor::getSpecializationBonus(Argument *A, Constant *C) {
+   LLVM_DEBUG(dbgs() << "FnSpecialization: Analysing bonus for constant: "
+                     << C->getNameOrAsOperand() << "\n");
+   Bonus B;
+   for (auto *U : A->users())
+     if (auto *UI = dyn_cast<Instruction>(U))
+       if (isBlockExecutable(UI->getParent()))
+         B += getUserBonus(UI, A, C);
+
+   LLVM_DEBUG(dbgs() << "FnSpecialization: Accumulated bonus {CodeSize = "
+                     << B.CodeSize << ", Latency = " << B.Latency
+                     << "} for argument " << *A << "\n");
+   return B;
+ }
+
  Bonus InstCostVisitor::getUserBonus(Instruction *User, Value *Use, Constant *C) {
    // We have already propagated a constant for this user.
    if (KnownConstants.contains(User))
      return {0, 0};

    // Cache the iterator before visiting.
    LastVisited = Use ? KnownConstants.insert({Use, C}).first
                      : KnownConstants.end();

[393 lines not shown] for (Function &F : M) {
      // times. This should change if specialization on literal constants gets
      // enabled.
      if (!Inserted && !Metrics.isRecursive && !SpecializeLiteralConstant)
        continue;

      int64_t Sz = *Metrics.NumInsts.getValue();
      assert(Sz > 0 && "CodeSize should be positive");
      // It is safe to down cast from int64_t, NumInsts is always positive.
-     unsigned SpecCost = static_cast<unsigned>(Sz);
+     unsigned FuncSize = static_cast<unsigned>(Sz);

      LLVM_DEBUG(dbgs() << "FnSpecialization: Specialization cost for "
-                       << F.getName() << " is " << SpecCost << "\n");
+                       << F.getName() << " is " << FuncSize << "\n");

      if (Inserted && Metrics.isRecursive)
        promoteConstantStackValues(&F);

-     if (!findSpecializations(&F, SpecCost, AllSpecs, SM)) {
+     if (!findSpecializations(&F, FuncSize, AllSpecs, SM)) {
        LLVM_DEBUG(
            dbgs() << "FnSpecialization: No possible specializations found for "
                   << F.getName() << "\n");
        continue;
      }

      ++NumCandidates;
    }

[118 lines not shown]
  /// the SCCPSolver in the cloned version.
  static Function *cloneCandidateFunction(Function *F) {
    ValueToValueMapTy Mappings;
    Function *Clone = CloneFunction(F, Mappings);
    removeSSACopy(*Clone);
    return Clone;
  }

- bool FunctionSpecializer::findSpecializations(Function *F, unsigned SpecCost,
+ bool FunctionSpecializer::findSpecializations(Function *F, unsigned FuncSize,
                                                SmallVectorImpl<Spec> &AllSpecs,
                                                SpecMap &SM) {
    // A mapping from a specialisation signature to the index of the respective
    // entry in the all specialisation array. Used to ensure uniqueness of
    // specialisations.
    DenseMap<SpecSig, unsigned> UniqueSpecs;

    // Get a list of interesting arguments.

[50 lines not shown] if (auto It = UniqueSpecs.find(S); It != UniqueSpecs.end()) {
        // the best specialisation once all specialisations are known.
        if (CS.getFunction() == F)
          continue;
        const unsigned Index = It->second;
        AllSpecs[Index].CallSites.push_back(&CS);
      } else {
        // Calculate the specialisation gain.
        Bonus B;
+       unsigned Score = 0;

    Inline comment (ChuanqiXu, Done): May this be a better name?
          Bonus B;
        - unsigned Score = 0;
        + unsigned InliningBonusScore = 0;
          InstCostVisitor Visitor = getInstCostVisitorFor(F);
    Inline comment (labrinea, Done): Ok
    Inline comment (labrinea, Done): Sorry I misunderstood. I am naming it score because at the end it contains InliningBonus + std::max(B.CodeSize, B.Latency).

        InstCostVisitor Visitor = getInstCostVisitorFor(F);
-       for (ArgInfo &A : S.Args)
-         B += getSpecializationBonus(A.Formal, A.Actual, Visitor);
+       for (ArgInfo &A : S.Args) {
+         B += Visitor.getSpecializationBonus(A.Formal, A.Actual);
+         Score += getInliningBonus(A.Formal, A.Actual);
+       }
        B += Visitor.getBonusFromPendingPHIs();

-       LLVM_DEBUG(dbgs() << "FnSpecialization: Specialization score {CodeSize = "
+       LLVM_DEBUG(dbgs() << "FnSpecialization: Specialization bonus {CodeSize = "
                          << B.CodeSize << ", Latency = " << B.Latency
-                         << "}\n");
+                         << ", Inlining = " << Score << "}\n");
+
+       auto IsProfitable = [&FuncSize](Bonus &B, unsigned Score) -> bool {
+         // No check required.
+         if (ForceSpecialization)
+           return true;
+         // Minimum inlining bonus.
+         if (Score > MinInliningBonus * FuncSize / 100)

    Inline comment (ChuanqiXu, Done): What's the meaning of the magic number `100`?
    Inline comment (labrinea, Done): Percentage. We can't do floating point decision so we multiply by MinSavings and then divide by 100.
    Inline comment (labrinea, Done): *division

+           return true;
+         // Minimum codesize savings.
+         if (B.CodeSize < MinCodeSizeSavings * FuncSize / 100)
+           return false;
+         // Minimum latency savings.
+         if (B.Latency < MinLatencySavings * FuncSize / 100)

    Inline comment (ChuanqiXu, Done): nit: may this be slightly clearer?
          // Minimum inlining savings.
        - bool HasInliningBonus = Score > MinInliningSavings * FuncSize / 100;
        + if (Score > MinInliningSavings * FuncSize / 100)
        +   return true;
          // Minimum codesize savings.
          if (B.CodeSize < MinCodeSizeSavings * FuncSize / 100)
        -   return HasInliningBonus;
        +   return false;
          // Minimum latency savings.
          if (B.Latency < MinLatencySavings * FuncSize / 100)
        -   return HasInliningBonus;
        +   return false;
          return true;
    Inline comment (labrinea, Done): Okay

+           return false;
+         return true;
+       };

        // Discard unprofitable specialisations.
-       if (!ForceSpecialization && B.Latency <= SpecCost - B.CodeSize)
+       if (!IsProfitable(B, Score))
          continue;

    Inline comment (ChuanqiXu, Done): I am wondering if the condition `||` may be too strict. My rough feeling is that it is good enough to perform the specialization if we can see one of the benefits.
    Inline comment (labrinea, Done): Without the latency condition CTMark triggers too much. The two conditions seem to work quite well in conjunction: CTMark gets fewer specializations than before, which is good for compile times, but we still specialize the interesting cases (mcf, exchange).
    Inline comment (ChuanqiXu, Done): The result from benchmark is overwhelming : )

        // Create a new specialisation entry.
-       auto &Spec = AllSpecs.emplace_back(F, S, B.Latency);
+       Score += std::max(B.CodeSize, B.Latency);
+       auto &Spec = AllSpecs.emplace_back(F, S, Score);
        if (CS.getFunction() != F)
          Spec.CallSites.push_back(&CS);
        const unsigned Index = AllSpecs.size() - 1;
        UniqueSpecs[S] = Index;
        if (auto [It, Inserted] = SM.try_emplace(F, Index, Index + 1); !Inserted)
          It->second.second = Index + 1;
      }

[48 lines not shown] Function *FunctionSpecializer::createSpecialization(Function *F,
    Solver.addTrackedFunction(Clone);
    // Mark all the specialized functions
    Specializations.insert(Clone);
    ++NumSpecsCreated;
    return Clone;
  }

- /// Compute a bonus for replacing argument \p A with constant \p C.
- Bonus FunctionSpecializer::getSpecializationBonus(Argument *A, Constant *C,
-                                                   InstCostVisitor &Visitor) {
-   LLVM_DEBUG(dbgs() << "FnSpecialization: Analysing bonus for constant: "
-                     << C->getNameOrAsOperand() << "\n");
-   Bonus B;
-   for (auto *U : A->users())
-     if (auto *UI = dyn_cast<Instruction>(U))
-       if (Visitor.isBlockExecutable(UI->getParent()))
-         B += Visitor.getUserBonus(UI, A, C);
-   LLVM_DEBUG(dbgs() << "FnSpecialization: Accumulated bonus {CodeSize = "
-                     << B.CodeSize << ", Latency = " << B.Latency
-                     << "} for argument " << *A << "\n");
-   // The below heuristic is only concerned with exposing inlining
-   // opportunities via indirect call promotion. If the argument is not a
-   // (potentially casted) function pointer, give up.
-   // TODO: Perhaps we should consider checking such inlining opportunities
-   // while traversing the users of the specialization arguments ?
+ /// Compute the inlining bonus for replacing argument \p A with constant \p C.
+ /// The below heuristic is only concerned with exposing inlining
+ /// opportunities via indirect call promotion. If the argument is not a
+ /// (potentially casted) function pointer, give up.
+ unsigned FunctionSpecializer::getInliningBonus(Argument *A, Constant *C) {

    Inline comment (ChuanqiXu, Done): Let's move the comment to the definition of the function. It will be much clearer.
    Inline comment (labrinea, Done): I couldn't think of an example where this would be useful. Besides, calling getInlineCost is expensive according to this comment: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Analysis/InlineCost.h#L274
    Inline comment (ChuanqiXu, Not Done): Maybe I just feel that reusing the existing mature cost model may be a better choice. But given that it is expensive and we're willing to build and fine-tune our own cost model, it may be good not to depend on the expensive getInlineCost.

    Function *CalledFunction = dyn_cast<Function>(C->stripPointerCasts());
    if (!CalledFunction)
-     return B;
+     return 0;

    // Get TTI for the called function (used for the inline cost).
    auto &CalleeTTI = (GetTTI)(*CalledFunction);

    // Look at all the call sites whose called value is the argument.
    // Specializing the function on the argument would allow these indirect
    // calls to be promoted to direct calls. If the indirect call promotion
    // would likely enable the called function to be inlined, specializing is a

[28 lines not shown] if (IC.isAlways())
        InliningBonus += Params.DefaultThreshold;
      else if (IC.isVariable() && IC.getCostDelta() > 0)
        InliningBonus += IC.getCostDelta();

      LLVM_DEBUG(dbgs() << "FnSpecialization: Inlining bonus " << InliningBonus
                        << " for user " << *U << "\n");
    }

-   return B += {0, InliningBonus};
+   return InliningBonus > 0 ? static_cast<unsigned>(InliningBonus) : 0;
  }

  /// Determine if it is possible to specialise the function for constant values
  /// of the formal parameter \p A.
  bool FunctionSpecializer::isArgumentInteresting(Argument *A) {
    // No point in specialization if the argument is unused.
    if (A->user_empty())
      return false;

[102 lines not shown]

llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp

Show First 20 Lines • Show All 162 Lines • ▼ Show 20 Lines	TEST_F(FunctionSpecializationTest, SwitchInst) {
Instruction &Sdiv = *++Case2.begin();		Instruction &Sdiv = *++Case2.begin();
Instruction &BrBB2 = Case2.back();		Instruction &BrBB2 = Case2.back();
Instruction &Add = BB1.front();		Instruction &Add = BB1.front();
Instruction &Or = BB2.front();		Instruction &Or = BB2.front();
Instruction &BrLoop = BB2.back();		Instruction &BrLoop = BB2.back();

// mul		// mul
Bonus Ref = getInstCost(Mul);		Bonus Ref = getInstCost(Mul);
Bonus Test = Specializer.getSpecializationBonus(F->getArg(0), One, Visitor);		Bonus Test = Visitor.getSpecializationBonus(F->getArg(0), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// and + or + add		// and + or + add
Ref = getInstCost(And) + getInstCost(Or) + getInstCost(Add);		Ref = getInstCost(And) + getInstCost(Or) + getInstCost(Add);
Test = Specializer.getSpecializationBonus(F->getArg(1), One, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(1), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// switch + sdiv + br + br		// switch + sdiv + br + br
Ref = getInstCost(Switch) +		Ref = getInstCost(Switch) +
getInstCost(Sdiv, /SizeOnly =/ true) +		getInstCost(Sdiv, /SizeOnly =/ true) +
getInstCost(BrBB2, /SizeOnly =/ true) +		getInstCost(BrBB2, /SizeOnly =/ true) +
getInstCost(BrLoop, /SizeOnly =/ true);		getInstCost(BrLoop, /SizeOnly =/ true);
Test = Specializer.getSpecializationBonus(F->getArg(2), One, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(2), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);
}		}

TEST_F(FunctionSpecializationTest, BranchInst) {		TEST_F(FunctionSpecializationTest, BranchInst) {
const char *ModuleString = R"(		const char *ModuleString = R"(
define void @foo(i32 %a, i32 %b, i1 %cond) {		define void @foo(i32 %a, i32 %b, i1 %cond) {
entry:		entry:
Show All 35 Lines	TEST_F(FunctionSpecializationTest, BranchInst) {
Instruction &BrBB1BB2 = BB0.back();		Instruction &BrBB1BB2 = BB0.back();
Instruction &Add = BB1.front();		Instruction &Add = BB1.front();
Instruction &Sdiv = *++BB1.begin();		Instruction &Sdiv = *++BB1.begin();
Instruction &BrBB2 = BB1.back();		Instruction &BrBB2 = BB1.back();
Instruction &BrLoop = BB2.front();		Instruction &BrLoop = BB2.front();

// mul		// mul
Bonus Ref = getInstCost(Mul);		Bonus Ref = getInstCost(Mul);
Bonus Test = Specializer.getSpecializationBonus(F->getArg(0), One, Visitor);		Bonus Test = Visitor.getSpecializationBonus(F->getArg(0), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// add		// add
Ref = getInstCost(Add);		Ref = getInstCost(Add);
Test = Specializer.getSpecializationBonus(F->getArg(1), One, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(1), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// branch + sub + br + sdiv + br		// branch + sub + br + sdiv + br
Ref = getInstCost(Branch) +		Ref = getInstCost(Branch) +
getInstCost(Sub, /*SizeOnly =*/ true) +		getInstCost(Sub, /*SizeOnly =*/ true) +
getInstCost(BrBB1BB2) +		getInstCost(BrBB1BB2) +
getInstCost(Sdiv, /*SizeOnly =*/ true) +		getInstCost(Sdiv, /*SizeOnly =*/ true) +
getInstCost(BrBB2, /*SizeOnly =*/ true) +		getInstCost(BrBB2, /*SizeOnly =*/ true) +
getInstCost(BrLoop, /*SizeOnly =*/ true);		getInstCost(BrLoop, /*SizeOnly =*/ true);
Test = Specializer.getSpecializationBonus(F->getArg(2), False, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(2), False);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);
}		}

TEST_F(FunctionSpecializationTest, Misc) {		TEST_F(FunctionSpecializationTest, Misc) {
const char *ModuleString = R"(		const char *ModuleString = R"(
%struct_t = type { [8 x i16], [8 x i16], i32, i32, i32, ptr, [8 x i8] }		%struct_t = type { [8 x i16], [8 x i16], i32, i32, i32, ptr, [8 x i8] }
@g = constant %struct_t zeroinitializer, align 16		@g = constant %struct_t zeroinitializer, align 16
[... 32 unchanged lines not shown ...]
Instruction &Select = *BlockIter++;		Instruction &Select = *BlockIter++;
Instruction &Gep = *BlockIter++;		Instruction &Gep = *BlockIter++;
Instruction &Load = *BlockIter++;		Instruction &Load = *BlockIter++;
Instruction &Freeze = *BlockIter++;		Instruction &Freeze = *BlockIter++;
Instruction &Smax = *BlockIter++;		Instruction &Smax = *BlockIter++;

// icmp + zext		// icmp + zext
Bonus Ref = getInstCost(Icmp) + getInstCost(Zext);		Bonus Ref = getInstCost(Icmp) + getInstCost(Zext);
Bonus Test = Specializer.getSpecializationBonus(F->getArg(0), One, Visitor);		Bonus Test = Visitor.getSpecializationBonus(F->getArg(0), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// select		// select
Ref = getInstCost(Select);		Ref = getInstCost(Select);
Test = Specializer.getSpecializationBonus(F->getArg(1), True, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(1), True);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// gep + load + freeze + smax		// gep + load + freeze + smax
Ref = getInstCost(Gep) + getInstCost(Load) + getInstCost(Freeze) +		Ref = getInstCost(Gep) + getInstCost(Load) + getInstCost(Freeze) +
getInstCost(Smax);		getInstCost(Smax);
Test = Specializer.getSpecializationBonus(F->getArg(2), GV, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(2), GV);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

Test = Specializer.getSpecializationBonus(F->getArg(3), Undef, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(3), Undef);
EXPECT_TRUE(Test.CodeSize == 0 && Test.Latency == 0);		EXPECT_TRUE(Test.CodeSize == 0 && Test.Latency == 0);
}		}

TEST_F(FunctionSpecializationTest, PhiNode) {		TEST_F(FunctionSpecializationTest, PhiNode) {
const char *ModuleString = R"(		const char *ModuleString = R"(
define void @foo(i32 %a, i32 %b, i32 %i) {		define void @foo(i32 %a, i32 %b, i32 %i) {
entry:		entry:
br label %loop		br label %loop
[... 34 unchanged lines not shown ...]
Instruction &Switch = Loop.back();		Instruction &Switch = Loop.back();
Instruction &Add = Case1.front();		Instruction &Add = Case1.front();
Instruction &PhiCase2 = Case2.front();		Instruction &PhiCase2 = Case2.front();
Instruction &BrBB = Case2.back();		Instruction &BrBB = Case2.back();
Instruction &PhiBB = BB.front();		Instruction &PhiBB = BB.front();
Instruction &Icmp = *++BB.begin();		Instruction &Icmp = *++BB.begin();
Instruction &Branch = BB.back();		Instruction &Branch = BB.back();

Bonus Test = Specializer.getSpecializationBonus(F->getArg(0), One, Visitor);		Bonus Test = Visitor.getSpecializationBonus(F->getArg(0), One);
EXPECT_TRUE(Test.CodeSize == 0 && Test.Latency == 0);		EXPECT_TRUE(Test.CodeSize == 0 && Test.Latency == 0);

Test = Specializer.getSpecializationBonus(F->getArg(1), One, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(1), One);
EXPECT_TRUE(Test.CodeSize == 0 && Test.Latency == 0);		EXPECT_TRUE(Test.CodeSize == 0 && Test.Latency == 0);

// switch + phi + br		// switch + phi + br
Bonus Ref = getInstCost(Switch) +		Bonus Ref = getInstCost(Switch) +
getInstCost(PhiCase2, /*SizeOnly =*/ true) +		getInstCost(PhiCase2, /*SizeOnly =*/ true) +
getInstCost(BrBB, /*SizeOnly =*/ true);		getInstCost(BrBB, /*SizeOnly =*/ true);
Test = Specializer.getSpecializationBonus(F->getArg(2), One, Visitor);		Test = Visitor.getSpecializationBonus(F->getArg(2), One);
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);

// phi + phi + add + icmp + branch		// phi + phi + add + icmp + branch
Ref = getInstCost(PhiBB) + getInstCost(PhiLoop) + getInstCost(Add) +		Ref = getInstCost(PhiBB) + getInstCost(PhiLoop) + getInstCost(Add) +
getInstCost(Icmp) + getInstCost(Branch);		getInstCost(Icmp) + getInstCost(Branch);
Test = Visitor.getBonusFromPendingPHIs();		Test = Visitor.getBonusFromPendingPHIs();
EXPECT_EQ(Test, Ref);		EXPECT_EQ(Test, Ref);
EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);		EXPECT_TRUE(Test.CodeSize > 0 && Test.Latency > 0);
}		}
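
The tests above all follow one pattern: build a reference Bonus by summing per-instruction costs, then compare it with the value returned by getSpecializationBonus. The sketch below is a simplified, standalone model of that arithmetic, not the LLVM implementation. The Bonus struct, instCost and exampleSwitchRef are hypothetical stand-ins, the numeric costs are made up, and it assumes that SizeOnly = true means an instruction contributes to CodeSize but not to Latency, which is how the reference sums appear to use it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Bonus type used in the tests: a pair of
// savings that can be added together and compared for equality.
struct Bonus {
  uint64_t CodeSize = 0;
  uint64_t Latency = 0;

  Bonus() = default;
  Bonus(uint64_t CS, uint64_t L) : CodeSize(CS), Latency(L) {}

  Bonus operator+(const Bonus &RHS) const {
    return Bonus(CodeSize + RHS.CodeSize, Latency + RHS.Latency);
  }
  bool operator==(const Bonus &RHS) const {
    return CodeSize == RHS.CodeSize && Latency == RHS.Latency;
  }
};

// Toy replacement for the test helper getInstCost. The costs are passed in
// directly here; this sketch does not query any cost model.
// Assumption: with SizeOnly set, only the code size is counted.
Bonus instCost(uint64_t Size, uint64_t Latency, bool SizeOnly = false) {
  return Bonus(Size, SizeOnly ? 0 : Latency);
}

// Mirrors the shape of the "switch + sdiv + br + br" reference sum, using
// made-up costs: the switch counts fully, the rest count for size only.
Bonus exampleSwitchRef() {
  return instCost(1, 4) +                        // switch
         instCost(1, 3, /*SizeOnly =*/ true) +   // sdiv
         instCost(1, 1, /*SizeOnly =*/ true) +   // br
         instCost(1, 1, /*SizeOnly =*/ true);    // br
}
```

With these made-up costs the reference sum has a code size of 4 (one unit per instruction) and a latency of 4 (from the switch alone), so both components are positive, as the EXPECT_TRUE checks in the tests require.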

Revision Contents

Diff 548531

llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp
