This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.
Closed, Public

Authored by hoy on Sep 30 2021, 9:25 AM.

Details

Reviewers

wenlei
wlei
wmi

Commits

rG098a0d8fbc4e: [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.

Summary

This patch continues unblocking optimizations that are blocked by pseudo probe instrumentation.

Although similar to DbgIntrinsics, the PseudoProbe intrinsic carries additional attributes (such as mayread, maywrite, and mayhaveSideEffect) that can block optimizations. The issues fixed are:

Flipped the default parameter of the getFirstNonPHIOrDbg API to skip pseudo probes
Unblocked CSE by preventing pseudo probes from clobbering memory in MemorySSA
Unblocked induction variable simplification
Allowed empty loop deletion by treating the probe intrinsic as droppable (isDroppable)
Some refactoring.
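
To make the first item concrete, here is a minimal, self-contained sketch of what flipping the default means for callers. It is a toy model, not LLVM code: the `Kind` enum and the `firstNonPhiOrDbg` helper are hypothetical stand-ins for LLVM's instruction classes and for `BasicBlock::getFirstNonPHIOrDbg`, and the block is represented as a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for instruction categories; not the real LLVM classes.
enum class Kind { Phi, Debug, PseudoProbe, Other };

// Toy model of a "first non-PHI, non-debug instruction" query.
// Returns the index of the first instruction that is neither a PHI nor a
// debug intrinsic. When SkipPseudoOp is true (the new default in this patch),
// pseudo probes are skipped as well. Returns Block.size() if nothing matches.
std::size_t firstNonPhiOrDbg(const std::vector<Kind> &Block,
                             bool SkipPseudoOp = true) {
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I] == Kind::Phi || Block[I] == Kind::Debug)
      continue;
    if (SkipPseudoOp && Block[I] == Kind::PseudoProbe)
      continue;
    return I;
  }
  return Block.size();
}
```

With a block laid out as PHI, debug, probe, other, the default call now lands on the last instruction, while a caller that still wants to see the probe has to pass `false` explicitly, which is what the SimplifyCFG change in this diff does for `instructionsWithoutDebug`.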

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. Sep 30 2021, 9:25 AM

Herald added subscribers: ormris, dexonsmith, modimo and 5 others. Sep 30 2021, 9:25 AM

hoy requested review of this revision. Sep 30 2021, 9:25 AM

Herald added a project: Restricted Project. Sep 30 2021, 9:25 AM

Herald added a subscriber: llvm-commits.

Thanks for the changes to reduce probe overhead. As we chatted off patch, some information and verification of this change's impact on profile quality would be helpful.

In D110847#3035672, @wenlei wrote:

Thanks for the changes to reduce probe overhead. As we chatted off patch, some information and verification of this change's impact on profile quality would be helpful.

The original patch contained several general changes, e.g., flipping the default values of those skipping APIs, which were observed to affect profile quality. After narrowing down the offending changes to those in SimplifyCFG, reverting them restored the profile quality while still reducing the probe overhead by a decent amount.

Updating D110847: [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3.

Harbormaster completed remote builds in B128130: Diff 378697. Oct 11 2021, 10:31 AM

In D110847#3055368, @hoy wrote:

The original patch contained several general changes, e.g., flipping the default values of those skipping APIs, which were observed to affect profile quality. After narrowing down the offending changes to those in SimplifyCFG, reverting them restored the profile quality while still reducing the probe overhead by a decent amount.

What are the specific optimizations in SimplifyCFG that were critical for profile quality? On the other hand, it looks to me that some of the other unblocked optimizations, like InstCombine, could also have some impact on profile quality.
The change looks good, but I think it'd also be nice to have some insight and reasoning as to why certain optimizations have to be blocked for profile quality, beyond experiment results, which could vary from one workload to another.

In D110847#3058762, @wenlei wrote:

What are the specific optimizations in SimplifyCFG that were critical for profile quality? On the other hand, it looks to me that some of the other unblocked optimizations, like InstCombine, could also have some impact on profile quality.
The change looks good, but I think it'd also be nice to have some insight and reasoning as to why certain optimizations have to be blocked for profile quality, beyond experiment results, which could vary from one workload to another.

Good question. In general, SimplifyCFG is more disruptive to the flow graph because it folds, merges, or removes blocks. Such flow graph changes affect the execution frequency of blocks, and samples collected on the new blocks are hard to correlate back to the original blocks. A particular case is:

/// If this basic block is simple enough, and if a predecessor branches to us
/// and one of our successors, fold the block into the predecessor and use
/// logical operations to pick the right destination.
bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,

Compared to SimplifyCFG, InstCombine optimizes at the instruction level. Moving an instruction around or folding instructions is likely to have a smaller impact on the shape of the flow graph and on block frequencies.
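
The frequency argument can be illustrated with a small, self-contained sketch. It is a toy simulation, not LLVM code: the `BlockCounts` struct and the `runOriginal` and `runFolded` helpers are hypothetical names, and the folded version only approximates the idea of merging a block into its predecessor and combining the two conditions with a logical operation. How LLVM treats the probe of the folded block is not modeled here.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Execution counts of three blocks in a toy flow graph.
struct BlockCounts {
  int Entry = 0;   // block that tests condition A
  int Middle = 0;  // block that tests condition B (reached only when A holds)
  int Then = 0;    // block reached when both A and B hold
};

// Original shape: entry -> middle (if A) -> then (if B).
BlockCounts runOriginal(const std::vector<std::pair<bool, bool>> &Inputs) {
  BlockCounts C;
  for (const auto &In : Inputs) {
    ++C.Entry;
    if (In.first) {
      ++C.Middle;
      if (In.second)
        ++C.Then;
    }
  }
  return C;
}

// Folded shape: the middle block is merged into entry, which now branches on
// (A && B). No block executes with the middle block's original frequency.
BlockCounts runFolded(const std::vector<std::pair<bool, bool>> &Inputs) {
  BlockCounts C;
  for (const auto &In : Inputs) {
    ++C.Entry;
    if (In.first && In.second)
      ++C.Then;
  }
  return C;
}
```

For the four input combinations of (A, B), the original graph runs the middle block twice, whereas the folded graph has no block with that count left, so samples taken after folding cannot be mapped back to it directly.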

In D110847#3058794, @hoy wrote:
Good question. In general, SimplifyCFG is more disruptive to the flow graph because it folds, merges, or removes blocks. Such flow graph changes affect the execution frequency of blocks, and samples collected on the new blocks are hard to correlate back to the original blocks. A particular case is:

/// If this basic block is simple enough, and if a predecessor branches to us
/// and one of our successors, fold the block into the predecessor and use
/// logical operations to pick the right destination.
bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,

Compared to SimplifyCFG, InstCombine optimizes at the instruction level. Moving an instruction around or folding instructions is likely to have a smaller impact on the shape of the flow graph and on block frequencies.

OK, while some are destructive, perhaps there are still recoverable cases (e.g. SimplifyCondBranchToCondBranch). But we can deal with them later.

This revision is now accepted and ready to land. Oct 12 2021, 9:17 AM

Closed by commit rG098a0d8fbc4e: [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3. (authored by hoy). Oct 12 2021, 9:44 AM

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG098a0d8fbc4e: [CSSPGO] Unblock optimizations with pseudo probe instrumentation part 3..

Revision Contents

Path                                                                Size
llvm/include/llvm/IR/BasicBlock.h                                   12 lines
llvm/lib/Analysis/InlineCost.cpp                                    7 lines
llvm/lib/Analysis/MemorySSA.cpp                                     2 lines
llvm/lib/CodeGen/Analysis.cpp                                       4 lines
llvm/lib/IR/User.cpp                                                2 lines
llvm/lib/Transforms/IPO/GlobalDCE.cpp                               2 lines
llvm/lib/Transforms/IPO/GlobalOpt.cpp                               2 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp                2 lines
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp      4 lines
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp            2 lines
llvm/lib/Transforms/Scalar/EarlyCSE.cpp                             6 lines
llvm/lib/Transforms/Scalar/IndVarSimplify.cpp                       4 lines
llvm/lib/Transforms/Utils/CloneFunction.cpp                         4 lines
llvm/lib/Transforms/Utils/SimplifyCFG.cpp                           13 lines
llvm/lib/Transforms/Vectorize/VectorCombine.cpp                     2 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-cse.ll              28 lines
llvm/test/Transforms/SampleProfile/pseudo-probe-loop-deletion.ll    35 lines

Diff 379086

llvm/include/llvm/IR/BasicBlock.h

public:
Instruction* getFirstNonPHI() {		Instruction* getFirstNonPHI() {
return const_cast<Instruction *>(		return const_cast<Instruction *>(
static_cast<const BasicBlock *>(this)->getFirstNonPHI());		static_cast<const BasicBlock *>(this)->getFirstNonPHI());
}		}

/// Returns a pointer to the first instruction in this block that is not a		/// Returns a pointer to the first instruction in this block that is not a
/// PHINode or a debug intrinsic, or any pseudo operation if \c SkipPseudoOp		/// PHINode or a debug intrinsic, or any pseudo operation if \c SkipPseudoOp
/// is true.		/// is true.
const Instruction *getFirstNonPHIOrDbg(bool SkipPseudoOp = false) const;		const Instruction *getFirstNonPHIOrDbg(bool SkipPseudoOp = true) const;
Instruction *getFirstNonPHIOrDbg(bool SkipPseudoOp = false) {		Instruction *getFirstNonPHIOrDbg(bool SkipPseudoOp = true) {
return const_cast<Instruction *>(		return const_cast<Instruction *>(
static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbg(		static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbg(
SkipPseudoOp));		SkipPseudoOp));
}		}

/// Returns a pointer to the first instruction in this block that is not a		/// Returns a pointer to the first instruction in this block that is not a
/// PHINode, a debug intrinsic, or a lifetime intrinsic, or any pseudo		/// PHINode, a debug intrinsic, or a lifetime intrinsic, or any pseudo
/// operation if \c SkipPseudoOp is true.		/// operation if \c SkipPseudoOp is true.
const Instruction *		const Instruction *
getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp = false) const;		getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp = true) const;
Instruction *getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp = false) {		Instruction *getFirstNonPHIOrDbgOrLifetime(bool SkipPseudoOp = true) {
return const_cast<Instruction *>(		return const_cast<Instruction *>(
static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbgOrLifetime(		static_cast<const BasicBlock *>(this)->getFirstNonPHIOrDbgOrLifetime(
SkipPseudoOp));		SkipPseudoOp));
}		}

/// Returns an iterator to the first instruction in this block that is		/// Returns an iterator to the first instruction in this block that is
/// suitable for inserting a non-PHI instruction.		/// suitable for inserting a non-PHI instruction.
///		///
/// In particular, it skips all PHIs and LandingPad instructions.		/// In particular, it skips all PHIs and LandingPad instructions.
const_iterator getFirstInsertionPt() const;		const_iterator getFirstInsertionPt() const;
iterator getFirstInsertionPt() {		iterator getFirstInsertionPt() {
return static_cast<const BasicBlock *>(this)		return static_cast<const BasicBlock *>(this)
->getFirstInsertionPt().getNonConst();		->getFirstInsertionPt().getNonConst();
}		}

/// Return a const iterator range over the instructions in the block, skipping		/// Return a const iterator range over the instructions in the block, skipping
/// any debug instructions. Skip any pseudo operations as well if \c		/// any debug instructions. Skip any pseudo operations as well if \c
/// SkipPseudoOp is true.		/// SkipPseudoOp is true.
iterator_range<filter_iterator<BasicBlock::const_iterator,		iterator_range<filter_iterator<BasicBlock::const_iterator,
std::function<bool(const Instruction &)>>>		std::function<bool(const Instruction &)>>>
instructionsWithoutDebug(bool SkipPseudoOp = false) const;		instructionsWithoutDebug(bool SkipPseudoOp = true) const;

/// Return an iterator range over the instructions in the block, skipping any		/// Return an iterator range over the instructions in the block, skipping any
/// debug instructions. Skip and any pseudo operations as well if \c		/// debug instructions. Skip and any pseudo operations as well if \c
/// SkipPseudoOp is true.		/// SkipPseudoOp is true.
iterator_range<		iterator_range<
filter_iterator<BasicBlock::iterator, std::function<bool(Instruction &)>>>		filter_iterator<BasicBlock::iterator, std::function<bool(Instruction &)>>>
instructionsWithoutDebug(bool SkipPseudoOp = false);		instructionsWithoutDebug(bool SkipPseudoOp = true);

/// Return the size of the basic block ignoring debug instructions		/// Return the size of the basic block ignoring debug instructions
filter_iterator<BasicBlock::const_iterator,		filter_iterator<BasicBlock::const_iterator,
std::function<bool(const Instruction &)>>::difference_type		std::function<bool(const Instruction &)>>::difference_type
sizeWithoutDebug() const;		sizeWithoutDebug() const;

/// Unlink 'this' from the containing function, but do not delete it.		/// Unlink 'this' from the containing function, but do not delete it.
void removeFromParent();		void removeFromParent();

llvm/lib/Analysis/InlineCost.cpp

CallAnalyzer::analyzeBlock(BasicBlock *BB,
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
// FIXME: Currently, the number of instructions in a function regardless of		// FIXME: Currently, the number of instructions in a function regardless of
// our ability to simplify them during inline to constants or dead code,		// our ability to simplify them during inline to constants or dead code,
// are actually used by the vector bonus heuristic. As long as that's true,		// are actually used by the vector bonus heuristic. As long as that's true,
// we have to special case debug intrinsics here to prevent differences in		// we have to special case debug intrinsics here to prevent differences in
// inlining due to debug symbols. Eventually, the number of unsimplified		// inlining due to debug symbols. Eventually, the number of unsimplified
// instructions shouldn't factor into the cost computation, but until then,		// instructions shouldn't factor into the cost computation, but until then,
// hack around it here.		// hack around it here.
if (isa<DbgInfoIntrinsic>(I))		// Similarly, skip pseudo-probes.
continue;		if (I.isDebugOrPseudoInst())

// Skip pseudo-probes.
if (isa<PseudoProbeInst>(I))
continue;		continue;

// Skip ephemeral values.		// Skip ephemeral values.
if (EphValues.count(&I))		if (EphValues.count(&I))
continue;		continue;

++NumInstructions;		++NumInstructions;
if (isa<ExtractElementInst>(I) || I.getType()->isVectorTy())		if (isa<ExtractElementInst>(I) || I.getType()->isVectorTy())

llvm/lib/Analysis/MemorySSA.cpp

if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(DefInst)) {
// (including creating MemoryAccesses for them): we just end up inventing		// (including creating MemoryAccesses for them): we just end up inventing
// clobbers where they don't really exist at all. Please see D43269 for		// clobbers where they don't really exist at all. Please see D43269 for
// context.		// context.
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
case Intrinsic::invariant_start:		case Intrinsic::invariant_start:
case Intrinsic::invariant_end:		case Intrinsic::invariant_end:
case Intrinsic::assume:		case Intrinsic::assume:
case Intrinsic::experimental_noalias_scope_decl:		case Intrinsic::experimental_noalias_scope_decl:
		case Intrinsic::pseudoprobe:
return {false, AliasResult(AliasResult::NoAlias)};		return {false, AliasResult(AliasResult::NoAlias)};
case Intrinsic::dbg_addr:		case Intrinsic::dbg_addr:
case Intrinsic::dbg_declare:		case Intrinsic::dbg_declare:
case Intrinsic::dbg_label:		case Intrinsic::dbg_label:
case Intrinsic::dbg_value:		case Intrinsic::dbg_value:
llvm_unreachable("debuginfo shouldn't have associated defs!");		llvm_unreachable("debuginfo shouldn't have associated defs!");
default:		default:
break;		break;

MemoryUseOrDef *MemorySSA::createNewAccess(Instruction *I,
// FIXME: Replace this special casing with a more accurate modelling of		// FIXME: Replace this special casing with a more accurate modelling of
// assume's control dependency.		// assume's control dependency.
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default:		default:
break;		break;
case Intrinsic::assume:		case Intrinsic::assume:
case Intrinsic::experimental_noalias_scope_decl:		case Intrinsic::experimental_noalias_scope_decl:
		case Intrinsic::pseudoprobe:
return nullptr;		return nullptr;
}		}
}		}

// Using a nonstandard AA pipelines might leave us with unexpected modref		// Using a nonstandard AA pipelines might leave us with unexpected modref
// results for I, so add a check to not model instructions that may not read		// results for I, so add a check to not model instructions that may not read
// from or write to memory. This is necessary for correctness.		// from or write to memory. This is necessary for correctness.
if (!I->mayReadFromMemory() && !I->mayWriteToMemory())		if (!I->mayReadFromMemory() && !I->mayWriteToMemory())

llvm/lib/CodeGen/Analysis.cpp

bool llvm::isInTailCallPosition(const CallBase &Call, const TargetMachine &TM) {

// If I will have a chain, make sure no other instruction that will have a		// If I will have a chain, make sure no other instruction that will have a
// chain interposes between I and the return.		// chain interposes between I and the return.
// Check for all calls including speculatable functions.		// Check for all calls including speculatable functions.
for (BasicBlock::const_iterator BBI = std::prev(ExitBB->end(), 2);; --BBI) {		for (BasicBlock::const_iterator BBI = std::prev(ExitBB->end(), 2);; --BBI) {
if (&*BBI == &Call)		if (&*BBI == &Call)
break;		break;
// Debug info intrinsics do not get in the way of tail call optimization.		// Debug info intrinsics do not get in the way of tail call optimization.
if (isa<DbgInfoIntrinsic>(BBI))
continue;
// Pseudo probe intrinsics do not block tail call optimization either.		// Pseudo probe intrinsics do not block tail call optimization either.
if (isa<PseudoProbeInst>(BBI))		if (BBI->isDebugOrPseudoInst())
continue;		continue;
// A lifetime end, assume or noalias.decl intrinsic should not stop tail		// A lifetime end, assume or noalias.decl intrinsic should not stop tail
// call optimization.		// call optimization.
if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(BBI))		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(BBI))
if (II->getIntrinsicID() == Intrinsic::lifetime_end ||		if (II->getIntrinsicID() == Intrinsic::lifetime_end ||
II->getIntrinsicID() == Intrinsic::assume ||		II->getIntrinsicID() == Intrinsic::assume ||
II->getIntrinsicID() == Intrinsic::experimental_noalias_scope_decl)		II->getIntrinsicID() == Intrinsic::experimental_noalias_scope_decl)
continue;		continue;

llvm/lib/IR/User.cpp

MutableArrayRef<uint8_t> User::getDescriptor() {
auto *DI = reinterpret_cast<DescriptorInfo *>(getIntrusiveOperands()) - 1;		auto *DI = reinterpret_cast<DescriptorInfo *>(getIntrusiveOperands()) - 1;
assert(DI->SizeInBytes != 0 && "Should not have had a descriptor otherwise!");		assert(DI->SizeInBytes != 0 && "Should not have had a descriptor otherwise!");

return MutableArrayRef<uint8_t>(		return MutableArrayRef<uint8_t>(
reinterpret_cast<uint8_t *>(DI) - DI->SizeInBytes, DI->SizeInBytes);		reinterpret_cast<uint8_t *>(DI) - DI->SizeInBytes, DI->SizeInBytes);
}		}

bool User::isDroppable() const {		bool User::isDroppable() const {
return isa<AssumeInst>(this);		return isa<AssumeInst>(this) || isa<PseudoProbeInst>(this);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// User operator new Implementations		// User operator new Implementations
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void *User::allocateFixedOperandUser(size_t Size, unsigned Us,		void *User::allocateFixedOperandUser(size_t Size, unsigned Us,
unsigned DescBytes) {		unsigned DescBytes) {

llvm/lib/Transforms/IPO/GlobalDCE.cpp

	ModulePass *llvm::createGlobalDCEPass() {			ModulePass *llvm::createGlobalDCEPass() {
	return new GlobalDCELegacyPass();			return new GlobalDCELegacyPass();
	}			}

	/// Returns true if F is effectively empty.			/// Returns true if F is effectively empty.
	static bool isEmptyFunction(Function *F) {			static bool isEmptyFunction(Function *F) {
	BasicBlock &Entry = F->getEntryBlock();			BasicBlock &Entry = F->getEntryBlock();
	for (auto &I : Entry) {			for (auto &I : Entry) {
	if (isa<DbgInfoIntrinsic>(I))			if (I.isDebugOrPseudoInst())
	continue;			continue;
	if (auto *RI = dyn_cast<ReturnInst>(&I))			if (auto *RI = dyn_cast<ReturnInst>(&I))
	return !RI->getReturnValue();			return !RI->getReturnValue();
	break;			break;
	}			}
	return false;			return false;
	}			}


llvm/lib/Transforms/IPO/GlobalOpt.cpp

	/// the code so we simply check for 'ret'.			/// the code so we simply check for 'ret'.
	static bool cxxDtorIsEmpty(const Function &Fn) {			static bool cxxDtorIsEmpty(const Function &Fn) {
	// FIXME: We could eliminate C++ destructors if they're readonly/readnone and			// FIXME: We could eliminate C++ destructors if they're readonly/readnone and
	// nounwind, but that doesn't seem worth doing.			// nounwind, but that doesn't seem worth doing.
	if (Fn.isDeclaration())			if (Fn.isDeclaration())
	return false;			return false;

	for (auto &I : Fn.getEntryBlock()) {			for (auto &I : Fn.getEntryBlock()) {
	if (isa<DbgInfoIntrinsic>(I))			if (I.isDebugOrPseudoInst())
	continue;			continue;
	if (isa<ReturnInst>(I))			if (isa<ReturnInst>(I))
	return true;			return true;
	break;			break;
	}			}
	return false;			return false;
	}			}


llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

	removeTriviallyEmptyRange(IntrinsicInst &EndI, InstCombinerImpl &IC,			removeTriviallyEmptyRange(IntrinsicInst &EndI, InstCombinerImpl &IC,
	std::function<bool(const IntrinsicInst &)> IsStart) {			std::function<bool(const IntrinsicInst &)> IsStart) {
	// We start from the end intrinsic and scan backwards, so that InstCombine			// We start from the end intrinsic and scan backwards, so that InstCombine
	// has already processed (and potentially removed) all the instructions			// has already processed (and potentially removed) all the instructions
	// before the end intrinsic.			// before the end intrinsic.
	BasicBlock::reverse_iterator BI(EndI), BE(EndI.getParent()->rend());			BasicBlock::reverse_iterator BI(EndI), BE(EndI.getParent()->rend());
	for (; BI != BE; ++BI) {			for (; BI != BE; ++BI) {
	if (auto *I = dyn_cast<IntrinsicInst>(&*BI)) {			if (auto *I = dyn_cast<IntrinsicInst>(&*BI)) {
	if (isa<DbgInfoIntrinsic>(I) ||			if (I->isDebugOrPseudoInst() ||
	I->getIntrinsicID() == EndI.getIntrinsicID())			I->getIntrinsicID() == EndI.getIntrinsicID())
	continue;			continue;
	if (IsStart(*I)) {			if (IsStart(*I)) {
	if (haveSameOperands(EndI, *I, EndI.arg_size())) {			if (haveSameOperands(EndI, *I, EndI.arg_size())) {
	IC.eraseInstFromFunction(*I);			IC.eraseInstFromFunction(*I);
	IC.eraseInstFromFunction(EndI);			IC.eraseInstFromFunction(EndI);
	return true;			return true;
	}			}

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

bool InstCombinerImpl::mergeStoreIntoSuccessor(StoreInst &SI) {
if (!OtherBr || BBI == OtherBB->begin())		if (!OtherBr || BBI == OtherBB->begin())
return false;		return false;

// If the other block ends in an unconditional branch, check for the 'if then		// If the other block ends in an unconditional branch, check for the 'if then
// else' case. There is an instruction before the branch.		// else' case. There is an instruction before the branch.
StoreInst *OtherStore = nullptr;		StoreInst *OtherStore = nullptr;
if (OtherBr->isUnconditional()) {		if (OtherBr->isUnconditional()) {
--BBI;		--BBI;
// Skip over debugging info.		// Skip over debugging info and pseudo probes.
while (isa<DbgInfoIntrinsic>(BBI) ||		while (BBI->isDebugOrPseudoInst() ||
(isa<BitCastInst>(BBI) && BBI->getType()->isPointerTy())) {		(isa<BitCastInst>(BBI) && BBI->getType()->isPointerTy())) {
if (BBI==OtherBB->begin())		if (BBI==OtherBB->begin())
return false;		return false;
--BBI;		--BBI;
}		}
// If this isn't a store, isn't a store to the same location, or is not the		// If this isn't a store, isn't a store to the same location, or is not the
// right kind of store, bail out.		// right kind of store, bail out.
OtherStore = dyn_cast<StoreInst>(BBI);		OtherStore = dyn_cast<StoreInst>(BBI);

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

Instruction *InstCombinerImpl::visitUnconditionalBranchInst(BranchInst &BI) {
assert(BI.isUnconditional() && "Only for unconditional branches.");		assert(BI.isUnconditional() && "Only for unconditional branches.");

// If this store is the second-to-last instruction in the basic block		// If this store is the second-to-last instruction in the basic block
// (excluding debug info and bitcasts of pointers) and if the block ends with		// (excluding debug info and bitcasts of pointers) and if the block ends with
// an unconditional branch, try to move the store to the successor block.		// an unconditional branch, try to move the store to the successor block.

auto GetLastSinkableStore = [](BasicBlock::iterator BBI) {		auto GetLastSinkableStore = [](BasicBlock::iterator BBI) {
auto IsNoopInstrForStoreMerging = [](BasicBlock::iterator BBI) {		auto IsNoopInstrForStoreMerging = [](BasicBlock::iterator BBI) {
return isa<DbgInfoIntrinsic>(BBI) ||		return BBI->isDebugOrPseudoInst() ||
(isa<BitCastInst>(BBI) && BBI->getType()->isPointerTy());		(isa<BitCastInst>(BBI) && BBI->getType()->isPointerTy());
};		};

BasicBlock::iterator FirstInstr = BBI->getParent()->begin();		BasicBlock::iterator FirstInstr = BBI->getParent()->begin();
do {		do {
if (BBI != FirstInstr)		if (BBI != FirstInstr)
--BBI;		--BBI;
} while (BBI != FirstInstr && IsNoopInstrForStoreMerging(BBI));		} while (BBI != FirstInstr && IsNoopInstrForStoreMerging(BBI));

llvm/lib/Transforms/Scalar/EarlyCSE.cpp

for (Instruction &Inst : make_early_inc_range(BB->getInstList())) {
}		}

// Skip sideeffect intrinsics, for the same reason as assume intrinsics.		// Skip sideeffect intrinsics, for the same reason as assume intrinsics.
if (match(&Inst, m_Intrinsic<Intrinsic::sideeffect>())) {		if (match(&Inst, m_Intrinsic<Intrinsic::sideeffect>())) {
LLVM_DEBUG(dbgs() << "EarlyCSE skipping sideeffect: " << Inst << '\n');		LLVM_DEBUG(dbgs() << "EarlyCSE skipping sideeffect: " << Inst << '\n');
continue;		continue;
}		}

		// Skip pseudoprobe intrinsics, for the same reason as assume intrinsics.
		if (match(&Inst, m_Intrinsic<Intrinsic::pseudoprobe>())) {
		LLVM_DEBUG(dbgs() << "EarlyCSE skipping pseudoprobe: " << Inst << '\n');
		continue;
		}

// We can skip all invariant.start intrinsics since they only read memory,		// We can skip all invariant.start intrinsics since they only read memory,
// and we can forward values across it. For invariant starts without		// and we can forward values across it. For invariant starts without
// invariant ends, we can use the fact that the invariantness never ends to		// invariant ends, we can use the fact that the invariantness never ends to
// start a scope in the current generaton which is true for all future		// start a scope in the current generaton which is true for all future
// generations. Also, we dont need to consume the last store since the		// generations. Also, we dont need to consume the last store since the
// semantics of invariant.start allow us to perform DSE of the last		// semantics of invariant.start allow us to perform DSE of the last
// store, if there was a store following invariant.start. Consider:		// store, if there was a store following invariant.start. Consider:
//		//
▲ Show 20 Lines • Show All 470 Lines • Show Last 20 Lines

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

while (I != Preheader->begin()) {
// Otherwise, sink it to the exit block.		// Otherwise, sink it to the exit block.
Instruction *ToMove = &*I;		Instruction *ToMove = &*I;
bool Done = false;		bool Done = false;

if (I != Preheader->begin()) {		if (I != Preheader->begin()) {
// Skip debug info intrinsics.		// Skip debug info intrinsics.
do {		do {
--I;		--I;
} while (isa<DbgInfoIntrinsic>(I) && I != Preheader->begin());		} while (I->isDebugOrPseudoInst() && I != Preheader->begin());

if (isa<DbgInfoIntrinsic>(I) && I == Preheader->begin())		if (I->isDebugOrPseudoInst() && I == Preheader->begin())
Done = true;		Done = true;
} else {		} else {
Done = true;		Done = true;
}		}

MadeAnyChanges = true;		MadeAnyChanges = true;
ToMove->moveBefore(*ExitBlock, InsertPt);		ToMove->moveBefore(*ExitBlock, InsertPt);
if (Done) break;		if (Done) break;

llvm/lib/Transforms/Utils/CloneFunction.cpp

if (DIFinder && TheModule)
DIFinder->processInstruction(*TheModule, I);		DIFinder->processInstruction(*TheModule, I);

Instruction *NewInst = I.clone();		Instruction *NewInst = I.clone();
if (I.hasName())		if (I.hasName())
NewInst->setName(I.getName() + NameSuffix);		NewInst->setName(I.getName() + NameSuffix);
NewBB->getInstList().push_back(NewInst);		NewBB->getInstList().push_back(NewInst);
VMap[&I] = NewInst; // Add instruction map to value.		VMap[&I] = NewInst; // Add instruction map to value.

hasCalls |= (isa<CallInst>(I) && !isa<DbgInfoIntrinsic>(I));		hasCalls |= (isa<CallInst>(I) && !I.isDebugOrPseudoInst());
if (const AllocaInst *AI = dyn_cast<AllocaInst>(&I)) {		if (const AllocaInst *AI = dyn_cast<AllocaInst>(&I)) {
if (!AI->isStaticAlloca()) {		if (!AI->isStaticAlloca()) {
hasDynamicAllocas = true;		hasDynamicAllocas = true;
}		}
}		}
}		}

if (CodeInfo) {		if (CodeInfo) {

if (!isa<PHINode>(NewInst)) {
}		}
}		}
}		}

if (II->hasName())		if (II->hasName())
NewInst->setName(II->getName() + NameSuffix);		NewInst->setName(II->getName() + NameSuffix);
VMap[&*II] = NewInst; // Add instruction map to value.		VMap[&*II] = NewInst; // Add instruction map to value.
NewBB->getInstList().push_back(NewInst);		NewBB->getInstList().push_back(NewInst);
hasCalls |= (isa<CallInst>(II) && !isa<DbgInfoIntrinsic>(II));		hasCalls |= (isa<CallInst>(II) && !II->isDebugOrPseudoInst());

if (CodeInfo) {		if (CodeInfo) {
CodeInfo->OrigVMap[&*II] = NewInst;		CodeInfo->OrigVMap[&*II] = NewInst;
if (auto *CB = dyn_cast<CallBase>(&*II))		if (auto *CB = dyn_cast<CallBase>(&*II))
if (CB->hasOperandBundles())		if (CB->hasOperandBundles())
CodeInfo->OperandBundleCallSites.push_back(NewInst);		CodeInfo->OperandBundleCallSites.push_back(NewInst);
}		}


llvm/lib/Transforms/Utils/SimplifyCFG.cpp

Show First 20 Lines • Show All 2,581 Lines • ▼ Show 20 Lines	if (isa<AssumeInst>(V))
return true;		return true;
return isSafeToSpeculativelyExecute(V) &&		return isSafeToSpeculativelyExecute(V) &&
all_of(V->users(),		all_of(V->users(),
[&](const User *U) { return EphValues.count(U); });		[&](const User *U) { return EphValues.count(U); });
};		};

// Walk the loop in reverse so that we can identify ephemeral values properly		// Walk the loop in reverse so that we can identify ephemeral values properly
// (values only feeding assumes).		// (values only feeding assumes).
for (Instruction &I : reverse(BB->instructionsWithoutDebug())) {		for (Instruction &I : reverse(BB->instructionsWithoutDebug(false))) {
// Can't fold blocks that contain noduplicate or convergent calls.		// Can't fold blocks that contain noduplicate or convergent calls.
if (CallInst *CI = dyn_cast<CallInst>(&I))		if (CallInst *CI = dyn_cast<CallInst>(&I))
if (CI->cannotDuplicate() || CI->isConvergent())		if (CI->cannotDuplicate() || CI->isConvergent())
return false;		return false;

// Ignore ephemeral values which are deleted during codegen.		// Ignore ephemeral values which are deleted during codegen.
if (IsEphemeral(&I))		if (IsEphemeral(&I))
EphValues.insert(&I);		EphValues.insert(&I);
▲ Show 20 Lines • Show All 287 Lines • ▼ Show 20 Lines	if (PN->getType()->isIntegerTy(1) &&
return Changed;		return Changed;

// If all PHI nodes are promotable, check to make sure that all instructions		// If all PHI nodes are promotable, check to make sure that all instructions
// in the predecessor blocks can be promoted as well. If not, we won't be able		// in the predecessor blocks can be promoted as well. If not, we won't be able
// to get rid of the control flow, so it's not worth promoting to select		// to get rid of the control flow, so it's not worth promoting to select
// instructions.		// instructions.
for (BasicBlock *IfBlock : IfBlocks)		for (BasicBlock *IfBlock : IfBlocks)
for (BasicBlock::iterator I = IfBlock->begin(); !I->isTerminator(); ++I)		for (BasicBlock::iterator I = IfBlock->begin(); !I->isTerminator(); ++I)
if (!AggressiveInsts.count(&*I) && !isa<DbgInfoIntrinsic>(I) &&		if (!AggressiveInsts.count(&*I) && !I->isDebugOrPseudoInst()) {
!isa<PseudoProbeInst>(I)) {
// This is not an aggressive instruction that we can promote.		// This is not an aggressive instruction that we can promote.
// Because of this, we won't be able to get rid of the control flow, so		// Because of this, we won't be able to get rid of the control flow, so
// the xform is not worth it.		// the xform is not worth it.
return Changed;		return Changed;
}		}

// If either of the blocks has it's address taken, we can't do this fold.		// If either of the blocks has it's address taken, we can't do this fold.
if (any_of(IfBlocks,		if (any_of(IfBlocks,
▲ Show 20 Lines • Show All 507 Lines • ▼ Show 20 Lines	auto IsWorthwhile = [&](BasicBlock *BB, ArrayRef<StoreInst *> FreeStores) {
if (!BB)		if (!BB)
return true;		return true;
// Heuristic: if the block can be if-converted/phi-folded and the		// Heuristic: if the block can be if-converted/phi-folded and the
// instructions inside are all cheap (arithmetic/GEPs), it's worthwhile to		// instructions inside are all cheap (arithmetic/GEPs), it's worthwhile to
// thread this store.		// thread this store.
InstructionCost Cost = 0;		InstructionCost Cost = 0;
InstructionCost Budget =		InstructionCost Budget =
PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic;		PHINodeFoldingThreshold * TargetTransformInfo::TCC_Basic;
for (auto &I : BB->instructionsWithoutDebug()) {		for (auto &I : BB->instructionsWithoutDebug(false)) {
// Consider terminator instruction to be free.		// Consider terminator instruction to be free.
if (I.isTerminator())		if (I.isTerminator())
continue;		continue;
// If this is one the stores that we want to speculate out of this BB,		// If this is one the stores that we want to speculate out of this BB,
// then don't count it's cost, consider it to be free.		// then don't count it's cost, consider it to be free.
if (auto *S = dyn_cast<StoreInst>(&I))		if (auto *S = dyn_cast<StoreInst>(&I))
if (llvm::find(FreeStores, S))		if (llvm::find(FreeStores, S))
continue;		continue;
▲ Show 20 Lines • Show All 306 Lines • ▼ Show 20 Lines	static bool SimplifyCondBranchToCondBranch(BranchInst *PBI, BranchInst *BI,
if (MergeCondStores && mergeConditionalStores(PBI, BI, DTU, DL, TTI))		if (MergeCondStores && mergeConditionalStores(PBI, BI, DTU, DL, TTI))
return true;		return true;

// If this is a conditional branch in an empty block, and if any		// If this is a conditional branch in an empty block, and if any
// predecessors are a conditional branch to one of our destinations,		// predecessors are a conditional branch to one of our destinations,
// fold the conditions into logical ops and one cond br.		// fold the conditions into logical ops and one cond br.

// Ignore dbg intrinsics.		// Ignore dbg intrinsics.
if (&*BB->instructionsWithoutDebug().begin() != BI)		if (&*BB->instructionsWithoutDebug(false).begin() != BI)
return false;		return false;

int PBIOp, BIOp;		int PBIOp, BIOp;
if (PBI->getSuccessor(0) == BI->getSuccessor(0)) {		if (PBI->getSuccessor(0) == BI->getSuccessor(0)) {
PBIOp = 0;		PBIOp = 0;
BIOp = 0;		BIOp = 0;
} else if (PBI->getSuccessor(0) == BI->getSuccessor(1)) {		} else if (PBI->getSuccessor(0) == BI->getSuccessor(1)) {
PBIOp = 0;		PBIOp = 0;
▲ Show 20 Lines • Show All 1,426 Lines • ▼ Show 20 Lines	GetCaseResults(SwitchInst *SI, ConstantInt *CaseVal, BasicBlock *CaseDest,
const DataLayout &DL, const TargetTransformInfo &TTI) {		const DataLayout &DL, const TargetTransformInfo &TTI) {
// The block from which we enter the common destination.		// The block from which we enter the common destination.
BasicBlock *Pred = SI->getParent();		BasicBlock *Pred = SI->getParent();

// If CaseDest is empty except for some side-effect free instructions through		// If CaseDest is empty except for some side-effect free instructions through
// which we can constant-propagate the CaseVal, continue to its successor.		// which we can constant-propagate the CaseVal, continue to its successor.
SmallDenseMap<Value *, Constant *> ConstantPool;		SmallDenseMap<Value *, Constant *> ConstantPool;
ConstantPool.insert(std::make_pair(SI->getCondition(), CaseVal));		ConstantPool.insert(std::make_pair(SI->getCondition(), CaseVal));
for (Instruction &I :CaseDest->instructionsWithoutDebug()) {		for (Instruction &I : CaseDest->instructionsWithoutDebug(false)) {
if (I.isTerminator()) {		if (I.isTerminator()) {
// If the terminator is a simple branch, continue to the next block.		// If the terminator is a simple branch, continue to the next block.
if (I.getNumSuccessors() != 1 || I.isExceptionalTerminator())		if (I.getNumSuccessors() != 1 || I.isExceptionalTerminator())
return false;		return false;
Pred = CaseDest;		Pred = CaseDest;
CaseDest = I.getSuccessor(0);		CaseDest = I.getSuccessor(0);
} else if (Constant *C = ConstantFold(&I, DL, ConstantPool)) {		} else if (Constant *C = ConstantFold(&I, DL, ConstantPool)) {
// Instruction is side-effect free and constant.		// Instruction is side-effect free and constant.
▲ Show 20 Lines • Show All 998 Lines • ▼ Show 20 Lines	if (isValueEqualityComparison(SI)) {

Value *Cond = SI->getCondition();		Value *Cond = SI->getCondition();
if (SelectInst *Select = dyn_cast<SelectInst>(Cond))		if (SelectInst *Select = dyn_cast<SelectInst>(Cond))
if (SimplifySwitchOnSelect(SI, Select))		if (SimplifySwitchOnSelect(SI, Select))
return requestResimplify();		return requestResimplify();

// If the block only contains the switch, see if we can fold the block		// If the block only contains the switch, see if we can fold the block
// away into any preds.		// away into any preds.
if (SI == &*BB->instructionsWithoutDebug().begin())		if (SI == &*BB->instructionsWithoutDebug(false).begin())
if (FoldValueComparisonIntoPredecessors(SI, Builder))		if (FoldValueComparisonIntoPredecessors(SI, Builder))
return requestResimplify();		return requestResimplify();
}		}

// Try to transform the switch into an icmp and a branch.		// Try to transform the switch into an icmp and a branch.
if (TurnSwitchRangeIntoICmp(SI, Builder))		if (TurnSwitchRangeIntoICmp(SI, Builder))
return requestResimplify();		return requestResimplify();


llvm/lib/Transforms/Vectorize/VectorCombine.cpp

Show First 20 Lines • Show All 1,079 Lines • ▼ Show 20 Lines	auto FoldInst = [this, &MadeChange](Instruction &I) {
MadeChange |= foldSingleElementStore(I);		MadeChange |= foldSingleElementStore(I);
};		};
for (BasicBlock &BB : F) {		for (BasicBlock &BB : F) {
// Ignore unreachable basic blocks.		// Ignore unreachable basic blocks.
if (!DT.isReachableFromEntry(&BB))		if (!DT.isReachableFromEntry(&BB))
continue;		continue;
// Use early increment range so that we can erase instructions in loop.		// Use early increment range so that we can erase instructions in loop.
for (Instruction &I : make_early_inc_range(BB)) {		for (Instruction &I : make_early_inc_range(BB)) {
if (isa<DbgInfoIntrinsic>(I))		if (I.isDebugOrPseudoInst())
continue;		continue;
FoldInst(I);		FoldInst(I);
}		}
}		}

while (!Worklist.isEmpty()) {		while (!Worklist.isEmpty()) {
Instruction *I = Worklist.removeOne();		Instruction *I = Worklist.removeOne();
if (!I)		if (!I)

llvm/test/Transforms/SampleProfile/pseudo-probe-cse.ll

This file was added.

				; RUN: opt < %s -S -early-cse-memssa | FileCheck %s

				define i16 @f1() readonly {
				ret i16 0
				}

				declare void @f2()

				; Check that EarlyCSE correctly handles pseudo probes that don't have
				; a MemoryAccess.

				define void @f3() {
				; CHECK-LABEL: @f3(
				; CHECK-NEXT: [[CALL1:%.*]] = call i16 @f1()
				; CHECK-NEXT: call void @llvm.pseudoprobe
				; CHECK-NEXT: ret void
				;
				%call1 = call i16 @f1()
				call void @llvm.pseudoprobe(i64 6878943695821059507, i64 9, i32 0, i64 -1)
				%call2 = call i16 @f1()
				ret void
				}


				; Function Attrs: inaccessiblememonly nounwind willreturn
				declare void @llvm.pseudoprobe(i64, i64, i32, i64) #0

				attributes #0 = { inaccessiblememonly nounwind willreturn }
				No newline at end of file

llvm/test/Transforms/SampleProfile/pseudo-probe-loop-deletion.ll

This file was added.

				; RUN: opt %s -passes=loop-deletion -S | FileCheck %s --check-prefixes=CHECK

				%class.Loc.95 = type { %class.Domain.96 }
				%class.Domain.96 = type { %class.DomainBase.97 }
				%class.DomainBase.97 = type { [3 x %struct.WrapNoInit] }
				%struct.WrapNoInit = type { %class.Loc }
				%class.Loc = type { %class.Domain.67 }
				%class.Domain.67 = type { %class.DomainBase.68 }
				%class.DomainBase.68 = type { i32 }

				define dso_local void @foo(%class.Loc.95* %0) {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: br label [[foo:%.*]]
				; CHECK: foo.exit:
				; CHECK-NEXT: ret void
				;
				br label %2

				2: ; preds = %4, %1
				%.0.i.i = phi %class.Loc.95* [ undef, %1 ], [ %5, %4 ]
				%3 = icmp ne %class.Loc.95* %.0.i.i, %0
				br i1 %3, label %4, label %foo.exit

				4: ; preds = %2
				call void @llvm.pseudoprobe(i64 6878943695821059507, i64 9, i32 0, i64 -1)
				%5 = getelementptr inbounds %class.Loc.95, %class.Loc.95* %.0.i.i, i32 1
				br label %2

				foo.exit: ; preds = %2
				ret void
				}

				declare void @llvm.pseudoprobe(i64, i64, i32, i64) #1

				attributes #1 = { willreturn readnone norecurse nofree }

Revision Contents

Diff 379086

llvm/include/llvm/IR/BasicBlock.h

llvm/lib/Analysis/InlineCost.cpp

llvm/lib/Analysis/MemorySSA.cpp

llvm/lib/CodeGen/Analysis.cpp

llvm/lib/IR/User.cpp

llvm/lib/Transforms/IPO/GlobalDCE.cpp

llvm/lib/Transforms/IPO/GlobalOpt.cpp

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

llvm/lib/Transforms/Scalar/EarlyCSE.cpp

llvm/lib/Transforms/Scalar/IndVarSimplify.cpp

llvm/lib/Transforms/Utils/CloneFunction.cpp

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

llvm/test/Transforms/SampleProfile/pseudo-probe-cse.ll

llvm/test/Transforms/SampleProfile/pseudo-probe-loop-deletion.ll
