This is an archive of the discontinued LLVM Phabricator instance.

Differential D150649

[FuncSpec] Enable specialization of literal constants.
ClosedPublic

Authored by labrinea on May 16 2023, 2:19 AM.

Details

Reviewers

chill
ChuanqiXu
SjoerdMeijer

Commits

rG0524534d5220: [FuncSpec] Enable specialization of literal constants.

Summary

To do so we have to tweak the cost model such that specialization does not trigger excessively.

Compile times -O3

benchmark	nspecs before	nspecs after	instrCnt delta %
ClamAV	5	5	+0.003
7zip		1	+0.006
tramp3d-v4			-0.03
kimwitu++			-0.015
sqlite3			+0.034
mafft			+0.022
lencod		1	+0.171
SPASS		1	+0.364
consumer-typeset	1		-0.007
Bullet	1	1	+0.015
geomean			+0.056

Compile times LTO

benchmark	nspecs before	nspecs after	instrCnt delta %
ClamAV		2	+0.535
7zip			+0.024
tramp3d-v4			+0.022
kimwitu++			-0.008
sqlite3			+0.091
mafft			-0.001
lencod		6	+0.205
SPASS	3	1	+0.032
consumer-typeset	1	1	-0.361
Bullet			-0.01
geomean			+0.053

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

labrinea created this revision. May 16 2023, 2:19 AM

Herald added a project: Restricted Project. May 16 2023, 2:19 AM

Herald added subscribers: hoy, snehasish, ormris, hiraditya.

labrinea requested review of this revision. May 16 2023, 2:19 AM

Herald added a project: Restricted Project. May 16 2023, 2:19 AM

labrinea added a parent revision: D150464: [FuncSpec] Improve the accuracy of the cost model. May 16 2023, 2:20 AM

Harbormaster completed remote builds in B232238: Diff 522503. May 16 2023, 2:20 AM

labrinea mentioned this in D150375: [FuncSpec] Replace LoopInfo with BlockFrequencyInfo. May 16 2023, 2:22 AM

labrinea added inline comments. May 16 2023, 2:47 AM

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
83–86	This is an empirically derived number, based on measurements, chosen mainly to keep the LLVM test suite compile times low. Perhaps we could improve this in the future.

The code change looks trivial. So it looks good if the measured data (including compilation time and performance) don't have a regression.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
706–718	Can't we move this out of the loop simply?
739	What's the reason that we use `/` instead of `-` here?

labrinea mentioned this in D150464: [FuncSpec] Improve the accuracy of the cost model. May 17 2023, 9:39 AM

Rebased on parent revision.

Harbormaster completed remote builds in B232637: Diff 523085. May 17 2023, 9:42 AM

labrinea marked 2 inline comments as done. May 17 2023, 9:48 AM

labrinea added inline comments.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
706–718	It's explained in the comments. If we hoist this code we are eagerly asking for the BlockFrequencyAnalysis to run even if no specializations are found. I've checked and moving it regresses compilation times for benchmarks with no specializations.
739	Because we want the Bonus to be at least `MinScore` times higher than SpecCost. The delta was too aggressive a heuristic. A ratio seems more sensible.
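
The trade-off described for lines 706–718 can be illustrated outside LLVM. The following is a minimal stand-alone sketch, not the patch itself: `CallSite`, `countCandidates`, and the `GetEntryFreq` callback are hypothetical names, with the callback standing in for the block frequency query. The expensive query runs at most once, and only after a first candidate has been seen.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a call site: true if it passes at least one
// constant argument worth specializing on.
struct CallSite {
  bool HasConstantArg;
};

// Returns the number of candidate call sites, or 0 if the function is
// rejected as cold. GetEntryFreq models an expensive, lazily computed
// analysis; it is invoked at most once, and only after the first candidate.
unsigned countCandidates(const std::vector<CallSite> &Calls,
                         const std::function<uint64_t()> &GetEntryFreq,
                         uint64_t MinEntryFreq) {
  assert(GetEntryFreq && "sketch assumes a callable frequency query");
  bool HasCheckedEntryFreq = false;
  unsigned NumCandidates = 0;
  for (const CallSite &CS : Calls) {
    if (!CS.HasConstantArg)
      continue; // No candidate here; the analysis is still untouched.

    if (!HasCheckedEntryFreq) {
      // First candidate: only now pay for the frequency query.
      if (GetEntryFreq() < MinEntryFreq)
        return 0; // Cold function (for this sketch's definition of cold).
      HasCheckedEntryFreq = true;
    }
    ++NumCandidates;
  }
  return NumCandidates;
}
```

When no call site has a constant argument, the callback is never invoked. That is the case the author reports as regressing when the check is hoisted out of the loop.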

ChuanqiXu accepted this revision. May 18 2023, 12:30 AM

ChuanqiXu added inline comments.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
22–23	nit: It is better to have a paragraph for the cost model. This is not required now. We can add one after we feel it is relatively stable.
706–718	Got it. Thanks : )
739	I still don't fully understand it, but maybe that is just how a heuristic model is.
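
A small worked example may help with the question about line 739. The sketch below models the cost as a plain `int64_t` with truncating division, which is an assumption (the patch uses LLVM's own `Cost` type), and the function names are hypothetical. It contrasts the old delta check with the new ratio check, using the default `funcspec-min-score` of 2.

```cpp
#include <cassert>
#include <cstdint>

// Old check (left-hand side of the diff): Score = Bonus - SpecCost, and the
// candidate is kept whenever Score > 0, i.e. for any positive margin.
bool isProfitableDelta(int64_t Bonus, int64_t SpecCost) {
  return Bonus - SpecCost > 0;
}

// New check (right-hand side): Score = Bonus / SpecCost, and the candidate
// is kept only if Score >= MinScore.
bool isProfitableRatio(int64_t Bonus, int64_t SpecCost, int64_t MinScore = 2) {
  assert(SpecCost > 0 && "sketch assumes a positive specialization cost");
  return Bonus / SpecCost >= MinScore;
}
```

With a bonus of 101 and a cost of 100, the delta check accepts the candidate (101 - 100 > 0) while the ratio check rejects it (101 / 100 truncates to 1, which is below 2). A bonus of 200 against the same cost passes both checks.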

This revision is now accepted and ready to land. May 18 2023, 12:30 AM

rebase

Harbormaster completed remote builds in B233987: Diff 524887. May 23 2023, 3:43 PM

labrinea removed a parent revision: D150464: [FuncSpec] Improve the accuracy of the cost model. May 24 2023, 3:57 AM

labrinea edited the summary of this revision. May 24 2023, 9:17 AM

This revision was landed with ongoing or failed builds. May 25 2023, 2:05 AM

Closed by commit rG0524534d5220: [FuncSpec] Enable specialization of literal constants. (authored by labrinea).

This revision was automatically updated to reflect the committed changes.

labrinea added a commit: rG0524534d5220: [FuncSpec] Enable specialization of literal constants.

What kind of run-time improvements does this give? The patch description mentions the compile-time regressions this causes, but not what improvements we get from it.

In D150649#4371387, @nikic wrote:

What kind of run-time improvements does this give? The patch description mentions the compile-time regressions this causes, but not what improvements we get from it.

We want this to enable vectorization of mc_chroma from 525.x264_r in SPEC INTrate 2017, but there's more work needed for that. This patch will let us create a specialized version where the loop boundaries are constant.
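
As a rough source-level picture of what specializing on a literal constant means, consider the hand-written sketch below. It is not taken from x264, and both function names are hypothetical; the pass works on LLVM IR and creates the clone itself. The only point is that the clone's loop bound becomes a compile-time constant.

```cpp
#include <cassert>
#include <cstddef>

// Generic version: the loop trip count N is only known at run time.
int sumPrefix(const int *Data, std::size_t N) {
  assert(Data != nullptr);
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += Data[I];
  return Sum;
}

// Hand-written analogue of a specialization for call sites that pass the
// literal 8 as N: the argument is gone and the bound is a constant.
int sumPrefix8(const int *Data) {
  assert(Data != nullptr);
  int Sum = 0;
  for (std::size_t I = 0; I < 8; ++I)
    Sum += Data[I];
  return Sum;
}
```

A call such as `sumPrefix(Data, 8)` would then be redirected to the clone. Whether later passes can exploit the constant bound is a separate question, as the reply above notes.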

nikic added a reverting change: rG96a14f388b1a: Revert "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo". May 30 2023, 5:49 AM

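Revision Contents

Path	Size
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h	2 lines
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp	53 lines
llvm/test/Transforms/FunctionSpecialization/compiler-crash-58759.ll	2 lines
llvm/test/Transforms/FunctionSpecialization/function-specialization-constant-expression.ll	50 lines
llvm/test/Transforms/FunctionSpecialization/function-specialization-minsize3.ll	2 lines
llvm/test/Transforms/FunctionSpecialization/function-specialization.ll	4 lines
llvm/test/Transforms/FunctionSpecialization/function-specialization2.ll	
llvm/test/Transforms/FunctionSpecialization/get-possible-constants.ll	2 lines
llvm/test/Transforms/FunctionSpecialization/global-rank.ll	3 lines
llvm/test/Transforms/FunctionSpecialization/identical-specializations.ll	12 lines
llvm/test/Transforms/FunctionSpecialization/literal-const.ll	3 lines
llvm/test/Transforms/FunctionSpecialization/max-iters.ll	110 lines
llvm/test/Transforms/FunctionSpecialization/noinline.ll	2 lines
llvm/test/Transforms/FunctionSpecialization/remove-dead-recursive-function.ll	2 lines
llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll	26 lines
llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp	5 lines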

Diff 525488

llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

[182 lines not shown]	FunctionSpecializer(
GetTTI(GetTTI), GetAC(GetAC) {}		GetTTI(GetTTI), GetAC(GetAC) {}

~FunctionSpecializer();		~FunctionSpecializer();

bool isClonedFunction(Function *F) { return Specializations.count(F); }		bool isClonedFunction(Function *F) { return Specializations.count(F); }

bool run();		bool run();

		static unsigned getBlockFreqMultiplier();

InstCostVisitor getInstCostVisitorFor(Function *F) {		InstCostVisitor getInstCostVisitorFor(Function *F) {
auto &BFI = (GetBFI)(*F);		auto &BFI = (GetBFI)(*F);
auto &TTI = (GetTTI)(*F);		auto &TTI = (GetTTI)(*F);
return InstCostVisitor(M.getDataLayout(), BFI, TTI, Solver);		return InstCostVisitor(M.getDataLayout(), BFI, TTI, Solver);
}		}

/// Compute a bonus for replacing argument \p A with constant \p C.		/// Compute a bonus for replacing argument \p A with constant \p C.
Cost getSpecializationBonus(Argument *A, Constant *C,		Cost getSpecializationBonus(Argument *A, Constant *C,
[62 lines not shown]

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

[13 lines not shown]
// why function specialisation is run before the inliner in the optimisation		// why function specialisation is run before the inliner in the optimisation
// pipeline; that is by design. Otherwise, we would only benefit from constant		// pipeline; that is by design. Otherwise, we would only benefit from constant
// passing, which is a valid use-case too, but hasn't been explored much in		// passing, which is a valid use-case too, but hasn't been explored much in
// terms of performance uplifts, cost-model and compile-time impact.		// terms of performance uplifts, cost-model and compile-time impact.
//		//
// Current limitations:		// Current limitations:
// - It does not yet handle integer ranges. We do support "literal constants",		// - It does not yet handle integer ranges. We do support "literal constants",
// but that's off by default under an option.		// but that's off by default under an option.
// - The cost-model could be further looked into (it mainly focuses on inlining		// - The cost-model could be further looked into (it mainly focuses on inlining
// benefits),		// benefits),
//		//
// Ideas:		// Ideas:
// - With a function specialization attribute for arguments, we could have		// - With a function specialization attribute for arguments, we could have
// a direct way to steer function specialization, avoiding the cost-model,		// a direct way to steer function specialization, avoiding the cost-model,
// and thus control compile-times / code-size.		// and thus control compile-times / code-size.
//		//
// Todos:		// Todos:
// - Specializing recursive functions relies on running the transformation a		// - Specializing recursive functions relies on running the transformation a
[37 lines not shown]

STATISTIC(NumSpecsCreated, "Number of specializations created");		STATISTIC(NumSpecsCreated, "Number of specializations created");

static cl::opt<bool> ForceSpecialization(		static cl::opt<bool> ForceSpecialization(
"force-specialization", cl::init(false), cl::Hidden, cl::desc(		"force-specialization", cl::init(false), cl::Hidden, cl::desc(
"Force function specialization for every call site with a constant "		"Force function specialization for every call site with a constant "
"argument"));		"argument"));

		// Set to 2^3 to model three levels of if-else nest.
		static cl::opt<unsigned> BlockFreqMultiplier(
		"funcspec-block-freq-multiplier", cl::init(8), cl::Hidden, cl::desc(
		"Multiplier to scale block frequency of user instructions during "
		"specialization bonus estimation"));

		static cl::opt<unsigned> MinEntryFreq(
		"funcspec-min-entry-freq", cl::init(450), cl::Hidden, cl::desc(
		"Do not specialize functions with entry block frequency lower than "
		"this value"));

		static cl::opt<unsigned> MinScore(
		"funcspec-min-score", cl::init(2), cl::Hidden, cl::desc(
		"Do not specialize functions with score lower than this value "
		"(the ratio of specialization bonus over specialization cost)"));

static cl::opt<unsigned> MaxClones(		static cl::opt<unsigned> MaxClones(
"funcspec-max-clones", cl::init(3), cl::Hidden, cl::desc(		"funcspec-max-clones", cl::init(3), cl::Hidden, cl::desc(
"The maximum number of clones allowed for a single function "		"The maximum number of clones allowed for a single function "
"specialization"));		"specialization"));

static cl::opt<unsigned> MinFunctionSize(		static cl::opt<unsigned> MinFunctionSize(
"funcspec-min-function-size", cl::init(100), cl::Hidden, cl::desc(		"funcspec-min-function-size", cl::init(100), cl::Hidden, cl::desc(
"Don't specialize functions that have less than this number of "		"Don't specialize functions that have less than this number of "
"instructions"));		"instructions"));

static cl::opt<bool> SpecializeOnAddress(		static cl::opt<bool> SpecializeOnAddress(
"funcspec-on-address", cl::init(false), cl::Hidden, cl::desc(		"funcspec-on-address", cl::init(false), cl::Hidden, cl::desc(
"Enable function specialization on the address of global values"));		"Enable function specialization on the address of global values"));

// Disabled by default as it can significantly increase compilation times.
//
// https://llvm-compile-time-tracker.com
// https://github.com/nikic/llvm-compile-time-tracker
static cl::opt<bool> SpecializeLiteralConstant(		static cl::opt<bool> SpecializeLiteralConstant(
"funcspec-for-literal-constant", cl::init(false), cl::Hidden, cl::desc(		"funcspec-for-literal-constant", cl::init(true), cl::Hidden, cl::desc(
"Enable specialization of functions that take a literal constant as an "		"Enable specialization of functions that take a literal constant as an "
"argument"));		"argument"));

		unsigned FunctionSpecializer::getBlockFreqMultiplier() {
		return BlockFreqMultiplier;
		}

// Estimates the instruction cost of all the basic blocks in \p WorkList.		// Estimates the instruction cost of all the basic blocks in \p WorkList.
// The successors of such blocks are added to the list as long as they are		// The successors of such blocks are added to the list as long as they are
// executable and they have a unique predecessor. \p WorkList represents		// executable and they have a unique predecessor. \p WorkList represents
// the basic blocks of a specialization which become dead once we replace		// the basic blocks of a specialization which become dead once we replace
// instructions that are known to be constants. The aim here is to estimate		// instructions that are known to be constants. The aim here is to estimate
// the combination of size and latency savings in comparison to the non		// the combination of size and latency savings in comparison to the non
// specialized version of the function.		// specialized version of the function.
static Cost estimateBasicBlocks(SmallVectorImpl<BasicBlock *> &WorkList,		static Cost estimateBasicBlocks(SmallVectorImpl<BasicBlock *> &WorkList,
ConstMap &KnownConstants, SCCPSolver &Solver,		ConstMap &KnownConstants, SCCPSolver &Solver,
BlockFrequencyInfo &BFI,		BlockFrequencyInfo &BFI,
TargetTransformInfo &TTI) {		TargetTransformInfo &TTI) {
Cost Bonus = 0;		Cost Bonus = 0;

// Accumulate the instruction cost of each basic block weighted by frequency.		// Accumulate the instruction cost of each basic block weighted by frequency.
while (!WorkList.empty()) {		while (!WorkList.empty()) {
BasicBlock *BB = WorkList.pop_back_val();		BasicBlock *BB = WorkList.pop_back_val();

uint64_t Weight = BFI.getBlockFreq(BB).getFrequency() /		uint64_t Weight = BlockFreqMultiplier *
		BFI.getBlockFreq(BB).getFrequency() /
BFI.getEntryFreq();		BFI.getEntryFreq();
if (!Weight)		if (!Weight)
continue;		continue;

for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
// Disregard SSA copies.		// Disregard SSA copies.
if (auto *II = dyn_cast<IntrinsicInst>(&I))		if (auto *II = dyn_cast<IntrinsicInst>(&I))
if (II->getIntrinsicID() == Intrinsic::ssa_copy)		if (II->getIntrinsicID() == Intrinsic::ssa_copy)
[36 lines not shown]	if (auto *I = dyn_cast<BranchInst>(User))
return estimateBranchInst(*I);		return estimateBranchInst(*I);

C = visit(*User);		C = visit(*User);
if (!C)		if (!C)
return 0;		return 0;

KnownConstants.insert({User, C});		KnownConstants.insert({User, C});

uint64_t Weight = BFI.getBlockFreq(User->getParent()).getFrequency() /		uint64_t Weight = BlockFreqMultiplier *
		BFI.getBlockFreq(User->getParent()).getFrequency() /
BFI.getEntryFreq();		BFI.getEntryFreq();
if (!Weight)		if (!Weight)
return 0;		return 0;

Cost Bonus = Weight *		Cost Bonus = Weight *
TTI.getInstructionCost(User, TargetTransformInfo::TCK_SizeAndLatency);		TTI.getInstructionCost(User, TargetTransformInfo::TCK_SizeAndLatency);

LLVM_DEBUG(dbgs() << "FnSpecialization: Bonus " << Bonus		LLVM_DEBUG(dbgs() << "FnSpecialization: Bonus " << Bonus
[465 lines not shown]	bool FunctionSpecializer::findSpecializations(Function *F, Cost SpecCost,
SmallVector<Argument *> Args;		SmallVector<Argument *> Args;
for (Argument &Arg : F->args())		for (Argument &Arg : F->args())
if (isArgumentInteresting(&Arg))		if (isArgumentInteresting(&Arg))
Args.push_back(&Arg);		Args.push_back(&Arg);

if (Args.empty())		if (Args.empty())
return false;		return false;

		bool HasCheckedEntryFreq = false;
for (User *U : F->users()) {		for (User *U : F->users()) {
if (!isa<CallInst>(U) && !isa<InvokeInst>(U))		if (!isa<CallInst>(U) && !isa<InvokeInst>(U))
continue;		continue;
auto &CS = *cast<CallBase>(U);		auto &CS = *cast<CallBase>(U);

// The user instruction does not call our function.		// The user instruction does not call our function.
if (CS.getCalledFunction() != F)		if (CS.getCalledFunction() != F)
continue;		continue;
[19 lines not shown]	for (Argument *A : Args) {
<< A->getName() << " : " << C->getNameOrAsOperand()		<< A->getName() << " : " << C->getNameOrAsOperand()
<< "\n");		<< "\n");
S.Args.push_back({A, C});		S.Args.push_back({A, C});
}		}

if (S.Args.empty())		if (S.Args.empty())
continue;		continue;

		// Check the function entry frequency only once. We sink this code here to
		// postpone running the Block Frequency Analysis until we know for sure
		// there are Specialization candidates, otherwise we are adding unnecessary
		// overhead.
		if (!HasCheckedEntryFreq) {
		// Reject cold functions (for some definition of 'cold').
		uint64_t EntryFreq = (GetBFI)(*F).getEntryFreq();
		if (!ForceSpecialization && EntryFreq < MinEntryFreq)
		return false;

		HasCheckedEntryFreq = true;
		LLVM_DEBUG(dbgs() << "FnSpecialization: Entry block frequency for "
		<< F->getName() << " = " << EntryFreq << "\n");
		}

// Check if we have encountered the same specialisation already.		// Check if we have encountered the same specialisation already.
if (auto It = UniqueSpecs.find(S); It != UniqueSpecs.end()) {		if (auto It = UniqueSpecs.find(S); It != UniqueSpecs.end()) {
// Existing specialisation. Add the call to the list to rewrite, unless		// Existing specialisation. Add the call to the list to rewrite, unless
// it's a recursive call. A specialisation, generated because of a		// it's a recursive call. A specialisation, generated because of a
// recursive call may end up as not the best specialisation for all		// recursive call may end up as not the best specialisation for all
// the cloned instances of this call, which result from specialising		// the cloned instances of this call, which result from specialising
// functions. Hence we don't rewrite the call directly, but match it with		// functions. Hence we don't rewrite the call directly, but match it with
// the best specialisation once all specialisations are known.		// the best specialisation once all specialisations are known.
if (CS.getFunction() == F)		if (CS.getFunction() == F)
continue;		continue;
const unsigned Index = It->second;		const unsigned Index = It->second;
AllSpecs[Index].CallSites.push_back(&CS);		AllSpecs[Index].CallSites.push_back(&CS);
} else {		} else {
// Calculate the specialisation gain.		// Calculate the specialisation gain.
Cost Score = 0 - SpecCost;		Cost Score = 0;
InstCostVisitor Visitor = getInstCostVisitorFor(F);		InstCostVisitor Visitor = getInstCostVisitorFor(F);
for (ArgInfo &A : S.Args)		for (ArgInfo &A : S.Args)
Score += getSpecializationBonus(A.Formal, A.Actual, Visitor);		Score += getSpecializationBonus(A.Formal, A.Actual, Visitor);
		Score /= SpecCost;

// Discard unprofitable specialisations.		// Discard unprofitable specialisations.
if (!ForceSpecialization && Score <= 0)		if (!ForceSpecialization && Score < MinScore)
continue;		continue;

// Create a new specialisation entry.		// Create a new specialisation entry.
auto &Spec = AllSpecs.emplace_back(F, S, Score);		auto &Spec = AllSpecs.emplace_back(F, S, Score);
if (CS.getFunction() != F)		if (CS.getFunction() != F)
Spec.CallSites.push_back(&CS);		Spec.CallSites.push_back(&CS);
const unsigned Index = AllSpecs.size() - 1;		const unsigned Index = AllSpecs.size() - 1;
UniqueSpecs[S] = Index;		UniqueSpecs[S] = Index;
[254 lines not shown]
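
The effect of `funcspec-block-freq-multiplier` on the weight computation can be checked with plain integers. In this sketch `blockWeight` is a hypothetical helper that mirrors the arithmetic on the right-hand side of the diff. The entry frequency of 1024 used in the note below, and the assumption that each if-else level halves the block frequency, are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Multiplier * BlockFreq / EntryFreq, evaluated left to right with
// truncating integer division, as in the patched weight computation.
uint64_t blockWeight(uint64_t BlockFreq, uint64_t EntryFreq,
                     uint64_t Multiplier) {
  assert(EntryFreq != 0 && "sketch assumes a non-zero entry frequency");
  return Multiplier * BlockFreq / EntryFreq;
}
```

With an entry frequency of 1024 and a multiplier of 1 (the old behaviour), a block one if-else level deep (frequency 512) truncates to weight 0 and is skipped. With the default multiplier of 8, the same block gets weight 4, a block three levels deep (frequency 128) still gets weight 1, and a block four levels deep (frequency 64) drops to 0. This matches the "three levels of if-else nest" comment on the option.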

llvm/test/Transforms/FunctionSpecialization/compiler-crash-58759.ll

	; RUN: opt -S --passes="default<O3>" < %s | FileCheck %s			; RUN: opt -S --passes="default<O3>" -force-specialization < %s | FileCheck %s

	define dso_local i32 @g0(i32 noundef %x) local_unnamed_addr {			define dso_local i32 @g0(i32 noundef %x) local_unnamed_addr {
	entry:			entry:
	%call = tail call fastcc i32 @f(i32 noundef %x, ptr noundef nonnull @p0)			%call = tail call fastcc i32 @f(i32 noundef %x, ptr noundef nonnull @p0)
	ret i32 %call			ret i32 %call
	}			}

	define internal fastcc i32 @f(i32 noundef %x, ptr nocapture noundef readonly %p) noinline {			define internal fastcc i32 @f(i32 noundef %x, ptr nocapture noundef readonly %p) noinline {
	[18 lines not shown]

llvm/test/Transforms/FunctionSpecialization/function-specialization-constant-expression.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s

	; Test function specialization wouldn't crash due to constant expression.			; Test function specialization wouldn't crash due to constant expression.
	; Note that this test case shows that function specialization pass would			; Note that this test case shows that function specialization pass would
	; transform the function even if no specialization happened.			; transform the function even if no specialization happened.

	; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s

	%struct = type { i8, i16, i32, i64, i64}			%struct = type { i8, i16, i32, i64, i64}
	@Global = internal constant %struct {i8 0, i16 1, i32 2, i64 3, i64 4}			@Global = internal constant %struct {i8 0, i16 1, i32 2, i64 3, i64 4}

	define internal i64 @func2(ptr %x) {			define internal i64 @func2(ptr %x) {
	entry:			entry:
	%val = ptrtoint ptr %x to i64			%val = ptrtoint ptr %x to i64
	ret i64 %val			ret i64 %val
	}			}

	define internal i64 @func(ptr %x, ptr %binop) {			define internal i64 @func(ptr %x, ptr %binop) {
	; CHECK-LABEL: @func(			; CHECK-LABEL: @func(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	%tmp0 = call i64 %binop(ptr %x)			%tmp0 = call i64 %binop(ptr %x)
	ret i64 %tmp0			ret i64 %tmp0
	}			}

	define internal i64 @zoo(i1 %flag) {			define internal i64 @zoo(i1 %flag) {
	; CHECK-LABEL: @zoo(
	; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[FLAG:%.*]], label [[PLUS:%.*]], label [[MINUS:%.*]]
	; CHECK: plus:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @func2.2(ptr getelementptr inbounds ([[STRUCT:%.*]], ptr @Global, i32 0, i32 3))
	; CHECK-NEXT: br label [[MERGE:%.*]]
	; CHECK: minus:
	; CHECK-NEXT: [[TMP1:%.*]] = call i64 @func2.1(ptr getelementptr inbounds ([[STRUCT]], ptr @Global, i32 0, i32 4))
	; CHECK-NEXT: br label [[MERGE]]
	; CHECK: merge:
	; CHECK-NEXT: [[TMP2:%.*]] = phi i64 [ ptrtoint (ptr getelementptr inbounds ([[STRUCT:%.*]], ptr @Global, i32 0, i32 3) to i64), [[PLUS]] ], [ ptrtoint (ptr getelementptr inbounds ([[STRUCT:%.*]], ptr @Global, i32 0, i32 4) to i64), [[MINUS]] ]
	; CHECK-NEXT: ret i64 [[TMP2]]
	;
	entry:			entry:
	br i1 %flag, label %plus, label %minus			br i1 %flag, label %plus, label %minus

	plus:			plus:
	%arg = getelementptr %struct, ptr @Global, i32 0, i32 3			%arg = getelementptr %struct, ptr @Global, i32 0, i32 3
	%tmp0 = call i64 @func2(ptr %arg)			%tmp0 = call i64 @func2(ptr %arg)
	br label %merge			br label %merge

	minus:			minus:
	%arg2 = getelementptr %struct, ptr @Global, i32 0, i32 4			%arg2 = getelementptr %struct, ptr @Global, i32 0, i32 4
	%tmp1 = call i64 @func2(ptr %arg2)			%tmp1 = call i64 @func2(ptr %arg2)
	br label %merge			br label %merge

	merge:			merge:
	%tmp2 = phi i64 [ %tmp0, %plus ], [ %tmp1, %minus]			%tmp2 = phi i64 [ %tmp0, %plus ], [ %tmp1, %minus]
	ret i64 %tmp2			ret i64 %tmp2
	}			}


	define i64 @main() {			define i64 @main() {
	; CHECK-LABEL: @main(			; CHECK-LABEL: @main(
	; CHECK-NEXT: [[TMP1:%.*]] = call i64 @zoo(i1 false)			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @zoo.4(i1 false)
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @zoo(i1 true)			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @zoo.3(i1 true)
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP1]], [[TMP2]]			; CHECK-NEXT: ret i64 add (i64 ptrtoint (ptr getelementptr inbounds ([[STRUCT:%.*]], ptr @Global, i32 0, i32 4) to i64), i64 ptrtoint (ptr getelementptr inbounds ([[STRUCT]], ptr @Global, i32 0, i32 3) to i64))
	; CHECK-NEXT: ret i64 [[TMP3]]
	;			;
	%1 = call i64 @zoo(i1 0)			%1 = call i64 @zoo(i1 0)
	%2 = call i64 @zoo(i1 1)			%2 = call i64 @zoo(i1 1)
	%3 = add i64 %1, %2			%3 = add i64 %1, %2
	ret i64 %3			ret i64 %3
	}			}

				; CHECK-LABEL: @func2.1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: ret i64 undef

				; CHECK-LABEL: @func2.2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: ret i64 undef

				; CHECK-LABEL: @zoo.3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[PLUS:%.*]]
				; CHECK: plus:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @func2.2(ptr getelementptr inbounds ([[STRUCT:%.*]], ptr @Global, i32 0, i32 3))
				; CHECK-NEXT: br label [[MERGE:%.*]]
				; CHECK: merge:
				; CHECK-NEXT: ret i64 undef

				; CHECK-LABEL: @zoo.4(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: br label [[MINUS:%.*]]
				; CHECK: minus:
				; CHECK-NEXT: [[TMP1:%.*]] = call i64 @func2.1(ptr getelementptr inbounds ([[STRUCT:%.*]], ptr @Global, i32 0, i32 4))
				; CHECK-NEXT: br label [[MERGE:%.*]]
				; CHECK: merge:
				; CHECK-NEXT: ret i64 undef

llvm/test/Transforms/FunctionSpecialization/function-specialization-minsize3.ll

	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-min-function-size=3 -S < %s | FileCheck %s			; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s

	; Checks for callsites that have been annotated with MinSize. We only expect			; Checks for callsites that have been annotated with MinSize. We only expect
	; specialisation for the call that does not have the attribute:			; specialisation for the call that does not have the attribute:
	;			;
	; CHECK: plus:			; CHECK: plus:
	; CHECK: %tmp0 = call i64 @compute.1(i64 %x, ptr @plus)			; CHECK: %tmp0 = call i64 @compute.1(i64 %x, ptr @plus)
	; CHECK: br label %merge			; CHECK: br label %merge
	; CHECK: minus:			; CHECK: minus:
	[39 lines not shown]

llvm/test/Transforms/FunctionSpecialization/function-specialization.ll

	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-min-function-size=3 -S < %s | FileCheck %s			; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s
	; RUN: opt -passes="ipsccp<no-func-spec>" -funcspec-min-function-size=3 -S < %s | FileCheck %s --check-prefix=NOFSPEC			; RUN: opt -passes="ipsccp<no-func-spec>" -force-specialization -S < %s | FileCheck %s --check-prefix=NOFSPEC

	define i64 @main(i64 %x, i1 %flag) {			define i64 @main(i64 %x, i1 %flag) {
	;			;
	; CHECK-LABEL: @main(i64 %x, i1 %flag) {			; CHECK-LABEL: @main(i64 %x, i1 %flag) {
	; CHECK: entry:			; CHECK: entry:
	; CHECK-NEXT: br i1 %flag, label %plus, label %minus			; CHECK-NEXT: br i1 %flag, label %plus, label %minus
	; CHECK: plus:			; CHECK: plus:
	; CHECK-NEXT: [[TMP0:%.+]] = call i64 @compute.1(i64 %x, ptr @plus)			; CHECK-NEXT: [[TMP0:%.+]] = call i64 @compute.1(i64 %x, ptr @plus)
	[63 lines not shown]

llvm/test/Transforms/FunctionSpecialization/function-specialization2.ll

This file was deleted.

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -force-specialization -S < %s | FileCheck %s
	; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -funcspec-max-iters=1 -force-specialization -S < %s | FileCheck %s
	; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -funcspec-max-iters=0 -force-specialization -S < %s | FileCheck %s --check-prefix=DISABLED

	; DISABLED-NOT: @func.1(
	; DISABLED-NOT: @func.2(

	define internal i32 @func(ptr %0, i32 %1, ptr nocapture %2) {
	%4 = alloca i32, align 4
	store i32 %1, ptr %4, align 4
	%5 = load i32, ptr %4, align 4
	%6 = icmp slt i32 %5, 1
	br i1 %6, label %14, label %7

	7: ; preds = %3
	%8 = load i32, ptr %4, align 4
	%9 = sext i32 %8 to i64
	%10 = getelementptr inbounds i32, ptr %0, i64 %9
	call void %2(ptr %10)
	%11 = load i32, ptr %4, align 4
	%12 = add nsw i32 %11, -1
	%13 = call i32 @func(ptr %0, i32 %12, ptr %2)
	br label %14

	14: ; preds = %3, %7
	ret i32 0
	}

	define internal void @increment(ptr nocapture %0) {
	%2 = load i32, ptr %0, align 4
	%3 = add nsw i32 %2, 1
	store i32 %3, ptr %0, align 4
	ret void
	}

	define internal void @decrement(ptr nocapture %0) {
	%2 = load i32, ptr %0, align 4
	%3 = add nsw i32 %2, -1
	store i32 %3, ptr %0, align 4
	ret void
	}

	define i32 @main(ptr %0, i32 %1) {
	; CHECK: call void @func.2(ptr [[TMP0:%.*]], i32 [[TMP1:%.*]])
	%3 = call i32 @func(ptr %0, i32 %1, ptr nonnull @increment)
	; CHECK: call void @func.1(ptr [[TMP0]], i32 0)
	%4 = call i32 @func(ptr %0, i32 %3, ptr nonnull @decrement)
	; CHECK: ret i32 0
	ret i32 %4
	}

	; CHECK: @func.1(
	; CHECK: [[TMP3:%.*]] = alloca i32, align 4
	; CHECK: store i32 [[TMP1:%.*]], ptr [[TMP3]], align 4
	; CHECK: [[TMP4:%.*]] = load i32, ptr [[TMP3]], align 4
	; CHECK: [[TMP5:%.*]] = icmp slt i32 [[TMP4]], 1
	; CHECK: br i1 [[TMP5]], label [[TMP13:%.*]], label [[TMP6:%.*]]
	; CHECK: 6:
	; CHECK: [[TMP7:%.*]] = load i32, ptr [[TMP3]], align 4
	; CHECK: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
	; CHECK: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP0:%.*]], i64 [[TMP8]]
	; CHECK: call void @decrement(ptr [[TMP9]])
	; CHECK: [[TMP10:%.*]] = load i32, ptr [[TMP3]], align 4
	; CHECK: [[TMP11:%.*]] = add nsw i32 [[TMP10]], -1
	; CHECK: call void @func.1(ptr [[TMP0]], i32 [[TMP11]])
	; CHECK: br label [[TMP12:%.*]]
	; CHECK: 12:
	; CHECK: ret void
	;
	;
	; CHECK: @func.2(
	; CHECK: [[TMP3:%.*]] = alloca i32, align 4
	; CHECK: store i32 [[TMP1:%.*]], ptr [[TMP3]], align 4
	; CHECK: [[TMP4:%.*]] = load i32, ptr [[TMP3]], align 4
	; CHECK: [[TMP5:%.*]] = icmp slt i32 [[TMP4]], 1
	; CHECK: br i1 [[TMP5]], label [[TMP13:%.*]], label [[TMP6:%.*]]
	; CHECK: 6:
	; CHECK: [[TMP7:%.*]] = load i32, ptr [[TMP3]], align 4
	; CHECK: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
	; CHECK: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP0:%.*]], i64 [[TMP8]]
	; CHECK: call void @increment(ptr [[TMP9]])
	; CHECK: [[TMP10:%.*]] = load i32, ptr [[TMP3]], align 4
	; CHECK: [[TMP11:%.*]] = add nsw i32 [[TMP10]], -1
	; CHECK: call void @func.2(ptr [[TMP0]], i32 [[TMP11]])
	; CHECK: br label [[TMP12:%.*]]
	; CHECK: 12:
	; CHECK: ret void

llvm/test/Transforms/FunctionSpecialization/get-possible-constants.ll

	; RUN: opt -S --passes="ipsccp<func-spec>" < %s | FileCheck %s			; RUN: opt -S --passes="ipsccp<func-spec>" -force-specialization < %s | FileCheck %s
	define dso_local i32 @p0(i32 noundef %x) {			define dso_local i32 @p0(i32 noundef %x) {
	entry:			entry:
	%add = add nsw i32 %x, 1			%add = add nsw i32 %x, 1
	ret i32 %add			ret i32 %add
	}			}

	define dso_local i32 @p1(i32 noundef %x) {			define dso_local i32 @p1(i32 noundef %x) {
	entry:			entry:
	[73 lines not shown]

llvm/test/Transforms/FunctionSpecialization/global-rank.ll

	; RUN: opt -S --passes="ipsccp<func-spec>" -funcspec-max-clones=1 < %s | FileCheck %s			; RUN: opt -S --passes="ipsccp<func-spec>" -funcspec-max-clones=1 -force-specialization < %s | FileCheck %s

	define internal i32 @f(i32 noundef %x, ptr nocapture noundef readonly %p, ptr nocapture noundef readonly %q) noinline {			define internal i32 @f(i32 noundef %x, ptr nocapture noundef readonly %p, ptr nocapture noundef readonly %q) noinline {
	entry:			entry:
	%call = tail call i32 %p(i32 noundef %x)			%call = tail call i32 %p(i32 noundef %x)
	%call1 = tail call i32 %q(i32 noundef %x)			%call1 = tail call i32 %q(i32 noundef %x)
	%add = add nsw i32 %call1, %call			%add = add nsw i32 %call1, %call
	ret i32 %add			ret i32 %add
	}			}

	[42 lines not shown]

llvm/test/Transforms/FunctionSpecialization/identical-specializations.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s			; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s

	define i64 @main(i64 %x, i64 %y, i1 %flag) {			define i64 @main(i64 %x, i64 %y, i1 %flag) {
	; CHECK-LABEL: @main(			; CHECK-LABEL: @main(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[FLAG:%.*]], label [[PLUS:%.*]], label [[MINUS:%.*]]			; CHECK-NEXT: br i1 [[FLAG:%.*]], label [[PLUS:%.*]], label [[MINUS:%.*]]
	; CHECK: plus:			; CHECK: plus:
	; CHECK-NEXT: [[CMP0:%.*]] = call i64 @compute.2(i64 [[X:%.*]], i64 [[Y:%.*]], ptr @plus, ptr @minus)			; CHECK-NEXT: [[CMP0:%.*]] = call i64 @compute.2(i64 [[X:%.*]], i64 42, ptr @plus, ptr @minus)
	; CHECK-NEXT: br label [[MERGE:%.*]]			; CHECK-NEXT: br label [[MERGE:%.*]]
	; CHECK: minus:			; CHECK: minus:
	; CHECK-NEXT: [[CMP1:%.*]] = call i64 @compute.3(i64 [[X]], i64 [[Y]], ptr @minus, ptr @plus)			; CHECK-NEXT: [[CMP1:%.*]] = call i64 @compute.3(i64 [[X]], i64 [[Y:%.*]], ptr @minus, ptr @plus)
	; CHECK-NEXT: br label [[MERGE]]			; CHECK-NEXT: br label [[MERGE]]
	; CHECK: merge:			; CHECK: merge:
	; CHECK-NEXT: [[PH:%.*]] = phi i64 [ [[CMP0]], [[PLUS]] ], [ [[CMP1]], [[MINUS]] ]			; CHECK-NEXT: [[PH:%.*]] = phi i64 [ [[CMP0]], [[PLUS]] ], [ [[CMP1]], [[MINUS]] ]
	; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.2(i64 [[PH]], i64 42, ptr @plus, ptr @minus)			; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.2(i64 [[PH]], i64 42, ptr @plus, ptr @minus)
	; CHECK-NEXT: ret i64 [[CMP2]]			; CHECK-NEXT: ret i64 [[CMP2]]
	;			;
	entry:			entry:
	br i1 %flag, label %plus, label %minus			br i1 %flag, label %plus, label %minus

	plus:			plus:
	%cmp0 = call i64 @compute(i64 %x, i64 %y, ptr @plus, ptr @minus)			%cmp0 = call i64 @compute(i64 %x, i64 42, ptr @plus, ptr @minus)
	br label %merge			br label %merge

	minus:			minus:
	%cmp1 = call i64 @compute(i64 %x, i64 %y, ptr @minus, ptr @plus)			%cmp1 = call i64 @compute(i64 %x, i64 %y, ptr @minus, ptr @plus)
	br label %merge			br label %merge

	merge:			merge:
	%ph = phi i64 [ %cmp0, %plus ], [ %cmp1, %minus]			%ph = phi i64 [ %cmp0, %plus ], [ %cmp1, %minus]
	[31 lines not shown]
	; CHECK-LABEL: @compute.1			; CHECK-LABEL: @compute.1
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP0:%.*]] = call i64 %binop1(i64 [[X:%.*]], i64 [[Y:%.*]])			; CHECK-NEXT: [[CMP0:%.*]] = call i64 %binop1(i64 [[X:%.*]], i64 [[Y:%.*]])
	; CHECK-NEXT: [[CMP1:%.*]] = call i64 @plus(i64 [[X]], i64 [[Y]])			; CHECK-NEXT: [[CMP1:%.*]] = call i64 @plus(i64 [[X]], i64 [[Y]])
	; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.1(i64 [[X]], i64 [[Y]], ptr %binop1, ptr @plus)			; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.1(i64 [[X]], i64 [[Y]], ptr %binop1, ptr @plus)

	; CHECK-LABEL: @compute.2			; CHECK-LABEL: @compute.2
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP0:%.*]] = call i64 @plus(i64 [[X:%.*]], i64 [[Y:%.*]])			; CHECK-NEXT: [[CMP0:%.*]] = call i64 @plus(i64 [[X:%.*]], i64 42)
	; CHECK-NEXT: [[CMP1:%.*]] = call i64 @minus(i64 [[X]], i64 [[Y]])			; CHECK-NEXT: [[CMP1:%.*]] = call i64 @minus(i64 [[X]], i64 42)
	; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.1(i64 [[X]], i64 [[Y]], ptr @plus, ptr @plus)			; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.1(i64 [[X]], i64 42, ptr @plus, ptr @plus)

	; CHECK-LABEL: @compute.3			; CHECK-LABEL: @compute.3
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP0:%.*]] = call i64 @minus(i64 [[X:%.*]], i64 [[Y:%.*]])			; CHECK-NEXT: [[CMP0:%.*]] = call i64 @minus(i64 [[X:%.*]], i64 [[Y:%.*]])
	; CHECK-NEXT: [[CMP1:%.*]] = call i64 @plus(i64 [[X]], i64 [[Y]])			; CHECK-NEXT: [[CMP1:%.*]] = call i64 @plus(i64 [[X]], i64 [[Y]])
	; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.3(i64 [[X]], i64 [[Y]], ptr @minus, ptr @plus)			; CHECK-NEXT: [[CMP2:%.*]] = call i64 @compute.3(i64 [[X]], i64 [[Y]], ptr @minus, ptr @plus)

llvm/test/Transforms/FunctionSpecialization/literal-const.ll

	; RUN: opt -S --passes="ipsccp<func-spec>" \			; RUN: opt -S --passes="ipsccp<func-spec>" \
				; RUN: -funcspec-for-literal-constant=0 \
	; RUN: -force-specialization < %s | FileCheck %s -check-prefix CHECK-NOLIT			; RUN: -force-specialization < %s | FileCheck %s -check-prefix CHECK-NOLIT
	; RUN: opt -S --passes="ipsccp<func-spec>" \			; RUN: opt -S --passes="ipsccp<func-spec>" \
	; RUN: -funcspec-for-literal-constant \			; RUN: -funcspec-for-literal-constant=1 \
	; RUN: -force-specialization < %s | FileCheck %s -check-prefix CHECK-LIT			; RUN: -force-specialization < %s | FileCheck %s -check-prefix CHECK-LIT

	define i32 @f0(i32 noundef %x) {			define i32 @f0(i32 noundef %x) {
	entry:			entry:
	%call = tail call i32 @neg(i32 noundef %x, i1 noundef zeroext false)			%call = tail call i32 @neg(i32 noundef %x, i1 noundef zeroext false)
	ret i32 %call			ret i32 %call
	}			}

	[80 lines not shown]

llvm/test/Transforms/FunctionSpecialization/max-iters.ll

This file was added.

				; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -force-specialization -S < %s | FileCheck %s --check-prefixes=COMMON,ITERS1
				; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -funcspec-max-iters=1 -force-specialization -S < %s | FileCheck %s --check-prefixes=COMMON,ITERS1
				; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -funcspec-max-iters=2 -force-specialization -S < %s | FileCheck %s --check-prefixes=COMMON,ITERS2
				; RUN: opt -passes="ipsccp<func-spec>,deadargelim" -funcspec-max-iters=0 -force-specialization -S < %s | FileCheck %s --check-prefix=DISABLED

				; DISABLED-NOT: @func.1(
				; DISABLED-NOT: @func.2(
				; DISABLED-NOT: @func.3(

				define internal i32 @func(ptr %0, i32 %1, ptr nocapture %2) {
				%4 = alloca i32, align 4
				store i32 %1, ptr %4, align 4
				%5 = load i32, ptr %4, align 4
				%6 = icmp slt i32 %5, 1
				br i1 %6, label %14, label %7

				7: ; preds = %3
				%8 = load i32, ptr %4, align 4
				%9 = sext i32 %8 to i64
				%10 = getelementptr inbounds i32, ptr %0, i64 %9
				call void %2(ptr %10)
				%11 = load i32, ptr %4, align 4
				%12 = add nsw i32 %11, -1
				%13 = call i32 @func(ptr %0, i32 %12, ptr %2)
				br label %14

				14: ; preds = %3, %7
				ret i32 0
				}

				define internal void @increment(ptr nocapture %0) {
				%2 = load i32, ptr %0, align 4
				%3 = add nsw i32 %2, 1
				store i32 %3, ptr %0, align 4
				ret void
				}

				define internal void @decrement(ptr nocapture %0) {
				%2 = load i32, ptr %0, align 4
				%3 = add nsw i32 %2, -1
				store i32 %3, ptr %0, align 4
				ret void
				}

				define i32 @main(ptr %0, i32 %1) {
				; COMMON: define i32 @main(
				; COMMON-NEXT: call void @func.2(ptr [[TMP0:%.*]], i32 [[TMP1:%.*]])
				; COMMON-NEXT: call void @func.1(ptr [[TMP0]])
				; COMMON-NEXT: ret i32 0
				;
				%3 = call i32 @func(ptr %0, i32 %1, ptr nonnull @increment)
				%4 = call i32 @func(ptr %0, i32 %3, ptr nonnull @decrement)
				ret i32 %4
				}

				; COMMON: define internal void @func.1(
				; COMMON-NEXT: [[TMP2:%.*]] = alloca i32, align 4
				; COMMON-NEXT: store i32 0, ptr [[TMP2]], align 4
				; COMMON-NEXT: [[TMP3:%.*]] = load i32, ptr [[TMP2]], align 4
				; COMMON-NEXT: [[TMP4:%.*]] = icmp slt i32 [[TMP3]], 1
				; COMMON-NEXT: br i1 [[TMP4]], label [[TMP11:%.*]], label [[TMP5:%.*]]
				; COMMON: 5:
				; COMMON-NEXT: [[TMP6:%.*]] = load i32, ptr [[TMP2]], align 4
				; COMMON-NEXT: [[TMP7:%.*]] = sext i32 [[TMP6]] to i64
				; COMMON-NEXT: [[TMP8:%.*]] = getelementptr inbounds i32, ptr [[TMP0:%.*]], i64 [[TMP7]]
				; COMMON-NEXT: call void @decrement(ptr [[TMP8]])
				; COMMON-NEXT: [[TMP9:%.*]] = load i32, ptr [[TMP2]], align 4
				; COMMON-NEXT: [[TMP10:%.*]] = add nsw i32 [[TMP9]], -1
				; ITERS1-NEXT: call void @func(ptr [[TMP0]], i32 [[TMP10]], ptr @decrement)
				; ITERS2-NEXT: call void @func.3(ptr [[TMP0]], i32 [[TMP10]])
				; COMMON-NEXT: br label [[TMP11:%.*]]
				; COMMON: 11:
				; COMMON-NEXT: ret void
				;
				; COMMON: define internal void @func.2(
				; COMMON-NEXT: [[TMP3:%.*]] = alloca i32, align 4
				; COMMON-NEXT: store i32 [[TMP1:%.*]], ptr [[TMP3]], align 4
				; COMMON-NEXT: [[TMP4:%.*]] = load i32, ptr [[TMP3]], align 4
				; COMMON-NEXT: [[TMP5:%.*]] = icmp slt i32 [[TMP4]], 1
				; COMMON-NEXT: br i1 [[TMP5]], label [[TMP13:%.*]], label [[TMP6:%.*]]
				; COMMON: 6:
				; COMMON-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP3]], align 4
				; COMMON-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
				; COMMON-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP0:%.*]], i64 [[TMP8]]
				; COMMON-NEXT: call void @increment(ptr [[TMP9]])
				; COMMON-NEXT: [[TMP10:%.*]] = load i32, ptr [[TMP3]], align 4
				; COMMON-NEXT: [[TMP11:%.*]] = add nsw i32 [[TMP10]], -1
				; COMMON-NEXT: call void @func.2(ptr [[TMP0]], i32 [[TMP11]])
				; COMMON-NEXT: br label [[TMP12:%.*]]
				; COMMON: 12:
				; COMMON-NEXT: ret void
				;
				; ITERS2: define internal void @func.3(
				; ITERS2-NEXT: [[TMP3:%.*]] = alloca i32, align 4
				; ITERS2-NEXT: store i32 [[TMP1:%.*]], ptr [[TMP3]], align 4
				; ITERS2-NEXT: [[TMP4:%.*]] = load i32, ptr [[TMP3]], align 4
				; ITERS2-NEXT: [[TMP5:%.*]] = icmp slt i32 [[TMP4]], 1
				; ITERS2-NEXT: br i1 [[TMP5]], label [[TMP13:%.*]], label [[TMP6:%.*]]
				; ITERS2: 6:
				; ITERS2-NEXT: [[TMP7:%.*]] = load i32, ptr [[TMP3]], align 4
				; ITERS2-NEXT: [[TMP8:%.*]] = sext i32 [[TMP7]] to i64
				; ITERS2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i32, ptr [[TMP0:%.*]], i64 [[TMP8]]
				; ITERS2-NEXT: call void @decrement(ptr [[TMP9]])
				; ITERS2-NEXT: [[TMP10:%.*]] = load i32, ptr [[TMP3]], align 4
				; ITERS2-NEXT: [[TMP11:%.*]] = add nsw i32 [[TMP10]], -1
				; ITERS2-NEXT: call void @func.3(ptr [[TMP0]], i32 [[TMP11]])
				; ITERS2-NEXT: br label [[TMP12:%.*]]
				; ITERS2: 12:
				; ITERS2-NEXT: ret void

llvm/test/Transforms/FunctionSpecialization/noinline.ll

	; RUN: opt -S --passes="ipsccp<func-spec>" < %s | FileCheck %s			; RUN: opt -S --passes="ipsccp<func-spec>" -funcspec-min-entry-freq=1 < %s | FileCheck %s
	define dso_local i32 @p0(i32 noundef %x) {			define dso_local i32 @p0(i32 noundef %x) {
	entry:			entry:
	%add = add nsw i32 %x, 1			%add = add nsw i32 %x, 1
	ret i32 %add			ret i32 %add
	}			}

	define dso_local i32 @p1(i32 noundef %x) {			define dso_local i32 @p1(i32 noundef %x) {
	entry:			entry:
	[26 lines not shown]

llvm/test/Transforms/FunctionSpecialization/remove-dead-recursive-function.ll

	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-min-function-size=3 -S < %s | FileCheck %s			; RUN: opt -passes="ipsccp<func-spec>" -force-specialization -S < %s | FileCheck %s

	define i64 @main(i64 %x, i1 %flag) {			define i64 @main(i64 %x, i1 %flag) {
	entry:			entry:
	br i1 %flag, label %plus, label %minus			br i1 %flag, label %plus, label %minus

	plus:			plus:
	%tmp0 = call i64 @compute(i64 %x, ptr @plus)			%tmp0 = call i64 @compute(i64 %x, ptr @plus)
	br label %merge			br label %merge
	[50 lines not shown]

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=0 -funcspec-min-function-size=14 -S < %s | FileCheck %s --check-prefix=NONE			; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=0 -force-specialization -S < %s | FileCheck %s --check-prefix=NONE
	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=1 -funcspec-min-function-size=14 -S < %s | FileCheck %s --check-prefix=ONE			; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=1 -force-specialization -S < %s | FileCheck %s --check-prefix=ONE
	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=2 -funcspec-min-function-size=14 -S < %s | FileCheck %s --check-prefix=TWO			; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=2 -force-specialization -S < %s | FileCheck %s --check-prefix=TWO
	; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=3 -funcspec-min-function-size=14 -S < %s | FileCheck %s --check-prefix=THREE			; RUN: opt -passes="ipsccp<func-spec>" -funcspec-max-clones=3 -force-specialization -S < %s | FileCheck %s --check-prefix=THREE

	; Make sure that we iterate correctly after sorting the specializations:			; Make sure that we iterate correctly after sorting the specializations:
	; FnSpecialization: Specializations for function compute			;
	; FnSpecialization: Gain = 608			; Score(@plus, @minus) > Score(42, @minus, @power) > Score(@power, @mul)
	; FnSpecialization: FormalArg = binop1, ActualArg = power
	; FnSpecialization: FormalArg = binop2, ActualArg = mul
	; FnSpecialization: Gain = 982
	; FnSpecialization: FormalArg = binop1, ActualArg = plus
	; FnSpecialization: FormalArg = binop2, ActualArg = minus
	; FnSpecialization: Gain = 795
	; FnSpecialization: FormalArg = binop1, ActualArg = minus
	; FnSpecialization: FormalArg = binop2, ActualArg = power

	define i64 @main(i64 %x, i64 %y, i1 %flag) {			define i64 @main(i64 %x, i64 %y, i1 %flag) {
	; NONE-LABEL: @main(			; NONE-LABEL: @main(
	; NONE-NEXT: entry:			; NONE-NEXT: entry:
	; NONE-NEXT: br i1 [[FLAG:%.*]], label [[PLUS:%.*]], label [[MINUS:%.*]]			; NONE-NEXT: br i1 [[FLAG:%.*]], label [[PLUS:%.*]], label [[MINUS:%.*]]
	; NONE: plus:			; NONE: plus:
	; NONE-NEXT: [[TMP0:%.*]] = call i64 @compute(i64 [[X:%.*]], i64 [[Y:%.*]], ptr @power, ptr @mul)			; NONE-NEXT: [[TMP0:%.*]] = call i64 @compute(i64 [[X:%.*]], i64 [[Y:%.*]], ptr @power, ptr @mul)
	; NONE-NEXT: br label [[MERGE:%.*]]			; NONE-NEXT: br label [[MERGE:%.*]]
	[85 lines not shown]
	; THREE-NEXT: [[TMP3:%.+]] = sdiv i64 [[TMP2]], %x			; THREE-NEXT: [[TMP3:%.+]] = sdiv i64 [[TMP2]], %x
	; THREE-NEXT: [[TMP4:%.+]] = sub i64 [[TMP3]], %y			; THREE-NEXT: [[TMP4:%.+]] = sub i64 [[TMP3]], %y
	; THREE-NEXT: [[TMP5:%.+]] = mul i64 [[TMP4]], 2			; THREE-NEXT: [[TMP5:%.+]] = mul i64 [[TMP4]], 2
	; THREE-NEXT: ret i64 [[TMP5]]			; THREE-NEXT: ret i64 [[TMP5]]
	; THREE-NEXT: }			; THREE-NEXT: }
	;			;
	; THREE-LABEL: define internal i64 @compute.3(i64 %x, i64 %y, ptr %binop1, ptr %binop2) {			; THREE-LABEL: define internal i64 @compute.3(i64 %x, i64 %y, ptr %binop1, ptr %binop2) {
	; THREE-NEXT: entry:			; THREE-NEXT: entry:
	; THREE-NEXT: [[TMP0:%.+]] = call i64 @minus(i64 %x, i64 %y)			; THREE-NEXT: [[TMP0:%.+]] = call i64 @minus(i64 %x, i64 42)
	; THREE-NEXT: [[TMP1:%.+]] = call i64 @power(i64 %x, i64 %y)			; THREE-NEXT: [[TMP1:%.+]] = call i64 @power(i64 %x, i64 42)
	; THREE-NEXT: [[TMP2:%.+]] = add i64 [[TMP0]], [[TMP1]]			; THREE-NEXT: [[TMP2:%.+]] = add i64 [[TMP0]], [[TMP1]]
	; THREE-NEXT: [[TMP3:%.+]] = sdiv i64 [[TMP2]], %x			; THREE-NEXT: [[TMP3:%.+]] = sdiv i64 [[TMP2]], %x
	; THREE-NEXT: [[TMP4:%.+]] = sub i64 [[TMP3]], %y			; THREE-NEXT: [[TMP4:%.+]] = sub i64 [[TMP3]], 42
	; THREE-NEXT: [[TMP5:%.+]] = mul i64 [[TMP4]], 2			; THREE-NEXT: [[TMP5:%.+]] = mul i64 [[TMP4]], 2
	; THREE-NEXT: ret i64 [[TMP5]]			; THREE-NEXT: ret i64 [[TMP5]]
	; THREE-NEXT: }			; THREE-NEXT: }
	;			;
	define internal i64 @compute(i64 %x, i64 %y, ptr %binop1, ptr %binop2) {			define internal i64 @compute(i64 %x, i64 %y, ptr %binop1, ptr %binop2) {
	entry:			entry:
	%tmp0 = call i64 %binop1(i64 %x, i64 %y)			%tmp0 = call i64 %binop1(i64 %x, i64 %y)
	%tmp1 = call i64 %binop2(i64 %x, i64 %y)			%tmp1 = call i64 %binop2(i64 %x, i64 %y)
	[54 lines not shown]

llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp

[79 lines not shown]
FunctionSpecializer getSpecializerFor(Function *F) {		FunctionSpecializer getSpecializerFor(Function *F) {
return FunctionSpecializer(Solver, M, &FAM, GetBFI, GetTLI, GetTTI,		return FunctionSpecializer(Solver, M, &FAM, GetBFI, GetTLI, GetTTI,
GetAC);		GetAC);
}		}

Cost getInstCost(Instruction &I) {		Cost getInstCost(Instruction &I) {
auto &TTI = FAM.getResult<TargetIRAnalysis>(*I.getFunction());		auto &TTI = FAM.getResult<TargetIRAnalysis>(*I.getFunction());
auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*I.getFunction());		auto &BFI = FAM.getResult<BlockFrequencyAnalysis>(*I.getFunction());

return BFI.getBlockFreq(I.getParent()).getFrequency() / BFI.getEntryFreq() *		uint64_t Weight = FunctionSpecializer::getBlockFreqMultiplier() *
		BFI.getBlockFreq(I.getParent()).getFrequency() /
		BFI.getEntryFreq();
		return Weight *
TTI.getInstructionCost(&I, TargetTransformInfo::TCK_SizeAndLatency);		TTI.getInstructionCost(&I, TargetTransformInfo::TCK_SizeAndLatency);
}		}
};		};

} // namespace llvm		} // namespace llvm

using namespace llvm;		using namespace llvm;
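
The `getInstCost` helper above now multiplies the block frequency by `FunctionSpecializer::getBlockFreqMultiplier()` before dividing by the entry frequency. One effect of that ordering with unsigned integer arithmetic can be sketched in isolation. The function names below and any concrete multiplier value are illustrative assumptions, not LLVM's API or its actual constant, and overflow is ignored.

```cpp
#include <cstdint>

// Old shape of the expression: divide first. Any block colder than the
// entry block (BlockFreq < EntryFreq) truncates to a weight of zero.
constexpr uint64_t weightDivideFirst(uint64_t BlockFreq, uint64_t EntryFreq) {
  return BlockFreq / EntryFreq;
}

// New shape of the expression: scale by a multiplier, then divide.
// Multiplier is a hypothetical stand-in for getBlockFreqMultiplier().
constexpr uint64_t weightScaled(uint64_t Multiplier, uint64_t BlockFreq,
                                uint64_t EntryFreq) {
  return Multiplier * BlockFreq / EntryFreq;
}
```

With an assumed multiplier of 10, a block executed half as often as the entry block gets weight 5 from `weightScaled(10, 4, 8)`, whereas `weightDivideFirst(4, 8)` yields 0.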

[162 lines not shown]

Diff 525488
