Fix for the problem shown in D135459.
Repository: rG LLVM Github Monorepo

Event Timeline
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

526: Avoid C-style casts. It should work as `hash_value(ArrayRef<ArgInfo>(Info.Args));`

528: This is not correct: if the hash values clash, two different specialisations will be regarded as the same and one will be discarded.

529: Discarding duplicates should happen before we decide which of the possible specialisations to perform. For example, in `calculateGains` the `MaxClonedThreshold` limit is applied to a list which potentially contains duplicates.

534: This looks like a big redundancy: we traverse all the call sites of F for each specialisation, while all we need is to traverse all the call sites of F once and redirect each call site to the correct specialisation.

539: So the idea here is, when specialising on the second parameter only, to turn

    void g(int x, int y) { ... g(x, y); ... }
    ...
    g(1, 2);

into

    void g.1(int x, /* unused */ int y) { ... g.1(x, 2); ... }
    ...
    g.1(1, 2);

But the same ought to happen for:

    void g(int x, int y) { ... g(x, 2); ... }
    ...
    g(1, 2);

and it looks to me the test will miss it, because it will compare `2` (from the call argument) to `y` in the cloned function. (Similar issue in the original code as well.)
llvm/include/llvm/Transforms/Utils/SCCPSolver.h

54: I don't really see the point in creating the temporary pair. `return hash_combine(Info.Formal, Info.Actual)`
This adds just a 0.03% geomean overhead in terms of instruction count for CTMark (llvm-test-suite, O3 + NewPM) over the parent revision D126455 (with specialization of functions enabled).
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

82 ↗ (On Diff #475864): This has to be a DenseSet. std::set has logarithmic complexity of operations.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

408–409: Here evaluation of the gain occurs even if the specialization is a duplicate.

726–727: (nit) This function returns `Clone` twice, once as a return value and once as an out argument. A cleaner design would be to just return the value, as the function does not depend on the `Clone` member variable and need not know anything about it.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

82 ↗ (On Diff #475864): DenseSet is more expensive to iterate over. I have another patch which sorts the specializations globally (across the module, not per function), where I keep the specializations sorted in a std::multiset and use std::merge to unite the two sets. That is relatively cheap, as it does not copy objects but shuffles pointers around. Also, that multiset is iterated a lot more than the set I am adding here. The complexity is logarithmic in the number of call sites, still not a huge number in my opinion.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp

408–409: This is already happening, but good point, there's room for improvement.

726–727: I cached the Clone pointer in the SpecInfo for the parent revision D126455, so that updateCallSites() does not rely on two vectors (clones, specializations) being in sync. So you are suggesting not to set the member variable here, but after returning from this function?
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h

82 ↗ (On Diff #475864): This should be a comparison function which induces strict weak ordering on the elements.
llvm/lib/Transforms/IPO/SCCP.cpp

46 ↗ (On Diff #481216): oops