This is an archive of the discontinued LLVM Phabricator instance.

Differential D15289

Avoid inlining CallSites leading to unreachable
AbandonedPublic

Authored by junbuml on Dec 7 2015, 6:40 AM.

Details

Reviewers

majnemer
manmanren
hfinkel
rsmith
mcrosier

Summary

It might be reasonable to avoid inlining CallSites leading to
an unreachable-terminated block unless the inline cost is obviously low.
Until analysis passes such as BPI and BFI are available in the inliner, this change
supports conservative inlining for CallSites that are
unlikely to be executed in normal execution flow.

This change might reduce code size blow-up in exceptional blocks that are rarely
taken (e.g., exception handling regions) as well as indirectly increase inline opportunities
for unwinding functions containing exception handling code.

Diff Detail

Event Timeline

junbuml updated this revision to Diff 42067.Dec 7 2015, 6:40 AM

junbuml retitled this revision from to Avoid inlining CallSites leading to unreachable.

junbuml updated this object.

junbuml added reviewers: hfinkel, manmanren, rsmith, mcrosier.Dec 7 2015, 6:44 AM

junbuml added a reviewer: majnemer.

junbuml added subscribers: mssimpso, gberry.

mcrosier added a subscriber: Gerolf.Dec 7 2015, 6:47 AM

Do you have benchmark numbers to substantiate this change? Also, it's not totally uncommon to use exception handling for control flow in languages like C#.

I ran performance tests only for SPEC2006 on Cortex-A57, where I found some gains in xalancbmk (3%), povray (4%), and gcc (2%).

The basic idea of this patch rests on the assumption that unreachable-terminated blocks (e.g., exception handling) are executed less frequently than the normal execution flow. If this basic idea is reasonable, do you think making the unreachable-threshold less aggressive could be a reasonable compromise? As of now, we have a very aggressive unreachable-threshold, which is 0 in this change.
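
To make the effect of the threshold concrete, here is a minimal sketch of the clamping the patch applies in getInlineThreshold(). The struct and function names are hypothetical stand-ins, the base threshold of 225 used in the usage note is only illustrative, and the rule "inline when cost is strictly below the threshold" is an assumption about how the inliner compares the two values.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical summary of one call site; not an LLVM type.
struct CallSiteInfo {
  int Cost;                // estimated cost of inlining the callee here
  bool LeadsToUnreachable; // the call site's block leads to `unreachable`
};

// Lower the threshold for call sites on a path to unreachable, mirroring
// "if (Threshold > UnreachableThreshold) Threshold = UnreachableThreshold".
int effectiveThreshold(const CallSiteInfo &CS, int BaseThreshold,
                       int UnreachableThreshold) {
  if (CS.LeadsToUnreachable)
    return std::min(BaseThreshold, UnreachableThreshold);
  return BaseThreshold;
}

// Assumption: a call site is inlined when its cost is below the threshold.
bool wouldInline(const CallSiteInfo &CS, int BaseThreshold,
                 int UnreachableThreshold) {
  return CS.Cost < effectiveThreshold(CS, BaseThreshold, UnreachableThreshold);
}
```

Under these assumptions, a callee with cost 100 is inlined at an ordinary call site with a base threshold of 225, but not at a call site leading to unreachable when the unreachable-threshold is 0. Only callees with a negative cost would still be inlined there, which is why 0 is described as very aggressive.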

Hi Jun

can you share some insight into where the gains are coming from? My reading is that the patch suppresses inlining of unreachable code. That static code size increase should not decrease performance. Does it push the thresholds and suppress inlining of “good” functions? Or does it suppress optimizations from happening?

Thanks
Gerolf

@Gerolf, I don't think this patch wants to make code which contains unreachable less favorable to inlining. I think the idea is that if you have code which is post-dominated by unreachable, then it might be on an error handling path (e.g. exit, abort, throw) which might mean that it is colder.

@junbuml I think it would be good to gather data on a few different thresholds here.

This approach looks okay to me in general. Is there any compile-time impact with this change?
Are there any stats behind choosing 0 as the threshold?

Cheers,
Manman

lib/Transforms/IPO/Inliner.cpp
282	Can you add a comment here on the relationship of this function and calcUnreachableHeuristics in BranchProbabilityInfo.cpp? i.e the difference and what to do when Inliner can include BPI and BFI.
292	Please elaborate why in the comment.
304	Can this if statement be removed or combined with the next if statement?

@Gerolf As David mentioned above, this change tries to avoid inlining call sites that are post-dominated by unreachable in their normal execution flow. For example, in the code below, Exception() and getErrorMsg() would be post-dominated by unreachable, so we don't want to inline them in elementAt(). By not inlining them in such cold regions, elementAt() itself could get more opportunities to be inlined.

int elementAt(int idx) {
  if (idx > limit)
    throw Exception(idx, getErrorMsg());
  return Data[idx];
}

@majnemer
Regarding the threshold, let me start additional performance runs with different thresholds and update the comments.

@manmanren
I didn't have a chance to measure compile time with and without this change, but I believe it will not have any significant impact on compile time, because collectBlocksLeadingToUnreachable() is executed only once per function and iterates over the basic blocks in post order only once. Please let me know if you need more clarification about compile time. I will address your inline comments soon.
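
The single post-order walk described above can be sketched on a toy control-flow graph. This is not the patch's code: ToyCFG and the function names are hypothetical, blocks are plain indices, and the special handling of invoke unwind edges is left out. It only illustrates why one pass suffices: in post order, a block is visited after its successors (back edges aside), so "all successors lead to unreachable" can be decided on the spot.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy CFG (hypothetical, not LLVM's): block i has successors Succs[i];
// EndsInUnreachable[i] marks blocks whose terminator is `unreachable`.
struct ToyCFG {
  std::vector<std::vector<std::size_t>> Succs;
  std::vector<bool> EndsInUnreachable;
};

static void visit(const ToyCFG &G, std::size_t BB, std::vector<bool> &Seen,
                  std::set<std::size_t> &Leading) {
  Seen[BB] = true;
  for (std::size_t S : G.Succs[BB])
    if (!Seen[S])
      visit(G, S, Seen, Leading);

  // Post-order position: successors not reached via a back edge are done.
  if (G.Succs[BB].empty()) {
    if (G.EndsInUnreachable[BB])
      Leading.insert(BB);
    return;
  }
  for (std::size_t S : G.Succs[BB])
    if (!Leading.count(S))
      return; // some successor does not lead to unreachable
  Leading.insert(BB);
}

// Each block is visited exactly once, starting from the entry block.
std::set<std::size_t> blocksLeadingToUnreachable(const ToyCFG &G,
                                                 std::size_t Entry = 0) {
  std::set<std::size_t> Leading;
  if (Entry >= G.Succs.size())
    return Leading;
  std::vector<bool> Seen(G.Succs.size(), false);
  visit(G, Entry, Seen, Leading);
  return Leading;
}
```

For a graph shaped like elementAt() (entry block 0 branching to a throwing block 1 that ends in unreachable and a returning block 2), only block 1 ends up in the set, so only call sites in the throwing block would get the lowered threshold.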

Thanks for the reviews.

Gerolf added inline comments.Dec 7 2015, 5:25 PM

include/llvm/Transforms/IPO/InlinerPass.h
62	// Find blocks post-dominated by an unreachable-terminated block would be more accurate.
lib/Transforms/IPO/Inliner.cpp
285	Would it make sense to have a hasUnreachableBlock function attribute that marks a routine that has no (or at least one) unreachable block? For such function the analysis here is unnecessary.
527–535	You could compute a hasUnreachableBlock attribute here, too.

Do you also plan to investigate/apply similar inlining heuristics to call sites dominated by catch clauses etc?

Thanks
Gerolf

Do you also plan to investigate/apply similar inlining heuristics to call sites dominated by catch clauses etc?

Once this patch has landed, I could investigate the catch block case and other cases as well, based on BranchProbabilityInfo::calculate().

lib/Transforms/IPO/Inliner.cpp
527–535	If the purpose of the new function attribute is to minimize the analysis (collectBlocksLeadingToUnreachable()), the easiest way could be to add a new variable here, like HasUnreachable, and set HasUnreachable=true when we detect an unreachable in the function. Then we can call collectBlocksLeadingToUnreachable() only when both HasCallSites and HasUnreachable are true.
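
A minimal sketch of that gating, assuming a hypothetical per-block summary instead of real LLVM IR (BlockSummary and needsUnreachableAnalysis are illustrative names only):

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block summary; the real patch inspects LLVM IR directly.
struct BlockSummary {
  bool EndsInUnreachable;     // terminator is `unreachable`
  unsigned NumInlinableCalls; // call sites the inliner would consider
};

// Returns true only when the function has both an inlinable call site and
// an unreachable-terminated block, i.e. when the extra walk could matter.
bool needsUnreachableAnalysis(const std::vector<BlockSummary> &Blocks) {
  bool HasCallSites = false;
  bool HasUnreachable = false;
  for (const BlockSummary &BB : Blocks) {
    if (BB.EndsInUnreachable)
      HasUnreachable = true;
    if (BB.NumInlinableCalls > 0)
      HasCallSites = true;
  }
  return HasCallSites && HasUnreachable;
}
```

Both flags come from the scan over the function's blocks that the inliner already performs to collect call sites, so the check adds no extra pass over the function.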

Addressed comments from Manman and Gerolf. Still running performance tests with different thresholds.

You might wish to make an exception for main() here... there is plenty of code like this:

int main() {
  ...
  exit(0);
}

Should you add llvm-commits as a subscriber?

Cheers,
Manman

junbuml edited edge metadata.Dec 9 2015, 9:52 AM

junbuml added a subscriber: llvm-commits.

Addressed Hal's comment. So, I made an exception for main() and added a test.
Thanks Hal for the review.

hfinkel added inline comments.Dec 9 2015, 10:39 AM

lib/Transforms/IPO/Inliner.cpp
287	To ask a higher-level question: Why do you want to take this intermediate step? Why not just jump straight to the end and use BPI? I'm a bit afraid of introducing yet more heuristics that we intend to rip out because of the resulting performance churn. It is true, doing this outside the pass manager adds compile-time expense (this will be better with the new pass manager, because we'll be able to properly cache and update these things), but I suspect the compile-time benefit from not inlining will compensate to a large extent.

  DominatorTree DT;
  DT.recalculate(F);
  LoopInfo LI(DT);
  BranchProbabilityInfo BPI(F, LI);

and now you have a BPI object, and can do the "right" thing. Plus, we can then concentrate on adding branch probability heuristics in BPI, where we really want them, and not implicitly in the inliner.

I took this intermediate step simply because I thought that BPI should be hooked with new pass manager.
I believe using the BPI object directly to find call sites leading to some uncommon regions must be a better approach and will make everything smoother with the new pass manager later.

LGTM assuming you address the check for 'main'.

lib/Transforms/IPO/Inliner.cpp
558	That is a bit too hacky. :-) There could be exception paths even in main where you want your optimization to fire. There needs to be a way to distinguish between "hot" and "cold" unreachable blocks, e.g. a block is cold if it is part of EH handling or if it is an exit returning an error code. That heuristic should eventually also be applicable to, e.g., BranchProbabilityInfo::calcUnreachableHeuristics. I suggest a FIXME for now instead of checking for a specific function name.

Do you want to land this change first, and then bring the BPI object here as a separate patch?

lib/Transforms/IPO/Inliner.cpp
558	For me, checking function names for main() doesn't seem uncommon. Please let me know if there is a better way. I agree that even in main we can distinguish real cold blocks, but I didn't want to make this patch bigger. I think a separate patch only for that could make the review process much easier. So, as you suggested, I want to add a FIXME for now if reviewers are okay with it.

Do you want to land this change first, and then bring the BPI object here as a separate patch?

We can do that; the test cases will be good to have regardless.

lib/Transforms/IPO/Inliner.cpp
558	A FIXME is fine for now (so long as we address it in the near future).

Add FIXME about detecting real cold unreachable blocks in main.

We can do that; the test cases will be good to have regardless.

Can you give me more detail about the test cases I missed?

In D15289#307264, @junbuml wrote:

We can do that; the test cases will be good to have regardless.

Can you give me more detail about the test cases I missed?

I meant that the test cases you have will be useful as regression tests along with a BPI-based implementation as well. Thus, when we have a BPI-based implementation, we can discard this one, but keep the tests.

Thank you for clarifying that.
Still running performance tests.

AndyAyers added a subscriber: AndyAyers.Dec 10 2015, 12:13 PM

Note on windows "main" may be "wmain" "WinMain" "WINMAIN" "MAIN" "AfxWinMain" "ENTGQQ" etc.

I've tried doing things like this in the past and it is tricky to get right outside of benchmarks. For instance we had performance-sensitive loops postdominated by calls to exit(). (And if you think interprocedurally, everything is generally postdominated by a very infrequently taken exit point).
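
The concern can be illustrated with a small sketch; mix and runThenExit are hypothetical names, not code from any benchmark. In the post-dominance sense described above, every block of runThenExit, including the loop body with its hot call to mix, is post-dominated by the call to std::exit, which never returns. Whether a particular implementation would actually flag the loop body depends on how it treats back edges, so this illustrates the concern rather than the behavior of the diff under review.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Small helper that is a natural inlining candidate.
static inline std::uint32_t mix(std::uint32_t X) {
  return X * 2654435761u + 1u;
}

// The loop is the hot part of the program, yet control always ends in
// std::exit, so a heuristic keyed purely on "post-dominated by a noreturn
// exit" would treat the call to mix as cold.
[[noreturn]] void runThenExit(std::uint32_t Iterations) {
  std::uint32_t Acc = 0;
  for (std::uint32_t I = 0; I < Iterations; ++I)
    Acc = mix(Acc ^ I); // hot call site
  std::exit(Acc == 0 ? EXIT_FAILURE : EXIT_SUCCESS);
}
```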

Using BPI or similar is better in that at least all the parts of the framework can agree on the criteria for what is relatively hot and what is relatively cold. As I told our devs, if you think you know better, please work to enhance the static heuristics or the profile count maintenance so every part of the code can benefit.

Also there are lots of cases where inlining in cold paths is helpful to improving CQ on hot paths. But that's a discussion for another day, perhaps.

Note on windows "main" may be "wmain" "WinMain" "WINMAIN" "MAIN" "AfxWinMain" "ENTGQQ" etc.

I'm not sure if it makes sense to add all these names here. Is there any existing function to find an entry point for Windows? Please let me know if you have any suggestions.

I've tried doing things like this in the past and it is tricky to get right outside of benchmarks. For instance we had performance-sensitive loops postdominated by calls to exit(). (And if you think interprocedurally, everything is generally postdominated by a very infrequently taken exit point).

I agree this could happen. So, I think we cannot blindly assume that all unreachable-terminated blocks are rarely executed. This is also the case in BranchProbabilityInfo::calcUnreachableHeuristics(). However, it might still be reasonable to assume that EH paths are less frequently taken in general, so that we can inline a little more conservatively when call sites are post-dominated by EH code. So instead of detecting all unreachable blocks, could we limit the search to blocks post-dominated by EH code (e.g., throw ())?

Using BPI or similar is better in that at least all the parts of the framework can agree on the criteria for what is relatively hot and what is relatively cold. As I told our devs, if you think you know better, please work to enhance the static heuristics or the profile count maintenance so every part of the code can benefit.

I agree that using BPI with some general heuristic here and enhancing BPI itself must be the right approach in general. But I prefer to make incremental changes as long as the basic idea and direction are reasonable.

Also there are lots of cases where inlining in cold paths is helpful to improving CQ on hot paths. But that's a discussion for another day, perhaps.

Can you give me more detail about this?

In D15289#307628, @junbuml wrote:

Note on windows "main" may be "wmain" "WinMain" "WINMAIN" "MAIN" "AfxWinMain" "ENTGQQ" etc.

...

I agree that using BPI with some general heuristic here and enhancing BPI itself must be the right approach in general. But I prefer to make incremental changes as long as the basic idea and direction are reasonable.

This is why I'm not super happy about this patch: Aside from the test cases, it is not clear this is really any kind of incremental step toward the solution we want (based on BPI). I'm not unhappy enough to say it shouldn't go in if others are okay with it going in, but I agree that the BPI-based solution is really where we should be devoting our efforts.

...

I don't recall seeing any data collection with regard to the threshold.

I'm saying you probably can't really reliably use "main" (or any of windows's myriad variations of the name) as something to base heuristics on. Even if you could, it would be annoying to get different optimizations for code depending on the name of the function that contained the code.

davidxl added subscribers: davidxl, eraman.Dec 10 2015, 10:49 PM

This code basically duplicates unreachable heuristics in BPI analysis, which is not desirable as mentioned by Hal.

The good news is that Easwaran is working on a solution that can remove the inliner's limitation so that profile data will be made available to inliner. I expect this to be landed very soon.

@hfinkel @AndyAyers @davidxl
As you mentioned, using BPI directly in this patch must be the right thing to do. As a first step, I will limit the use of BPI to identifying blocks leading to really cold unreachable blocks, and the inliner will only use BPI, without any function name check. I should enhance calcUnreachableHeuristics() to consider unwinding edges of InvokeInst and exit(0).

@majnemer
Regarding the threshold, based on my SPEC2006 runs, I'm trying to find the maximum possible value so that we avoid inlining only when the call site leading to a cold region is large enough. Hopefully, I will be able to finish my tests today.

@davidxl
To me it appears that Easwaran's change performs aggressive inlining for the hot sites based on profile data? I think my change is somewhat different because it statically tries to identify cold blocks and performs conservative inlining.

As Easwaran is working on bringing BPI/BFI into the inliner and expects it to land soon, I may want to revisit this patch after BPI/BFI is hooked up. Please let me know if the BPI/BFI hook with the inliner is not expected to be committed soon.

I submitted a separate patch (D15466) to enhance BranchProbabilityInfo::calcUnreachableHeuristics for InvokeInst which is basically the same idea used in collectBlocksLeadingToUnreachable in this patch.

I will revisit this patch after BPI/BFI is hooked in inliner.

junbuml mentioned this in D13304: Avoid inlining in throw statement.Jan 12 2016, 8:03 AM

junbuml mentioned this in D16616: Avoid inlining call sites in unreachable-terminated block.Jan 26 2016, 5:43 PM

junbuml mentioned this in rL259403: Avoid inlining call sites in unreachable-terminated block.Feb 1 2016, 12:59 PM

Revision Contents

Path                                           Size
include/llvm/Transforms/IPO/InlinerPass.h      6 lines
lib/Transforms/IPO/Inliner.cpp                 68 lines
test/Transforms/Inline/inline_unreachable.ll   135 lines

Diff 42207

include/llvm/Transforms/IPO/InlinerPass.h

[53 lines not shown; context: struct Inliner : public CallGraphSCCPass {]

   /// Calculate the inline threshold for given Caller. This threshold is lower
   /// if the caller is marked with OptimizeForSize and -inline-threshold is not
   /// given on the comand line. It is higher if the callee is marked with the
   /// inlinehint attribute.
   ///
   unsigned getInlineThreshold(CallSite CS) const;

+  // Find blocks post-dominated by an unreachable-terminated block in their
+  // normal execution paths.
+  void collectBlocksLeadingToUnreachable(Function *F);

   /// getInlineCost - This method must be implemented by the subclass to
   /// determine the cost of inlining the specified call site. If the cost
   /// returned is greater than the current inline threshold, the call site is
   /// not inlined.
   ///
   virtual InlineCost getInlineCost(CallSite CS) = 0;

   /// removeDeadFunctions - Remove dead functions.
   ///
   /// This also includes a hack in the form of the 'AlwaysInlineOnly' flag
   /// which restricts it to deleting functions with an 'AlwaysInline'
   /// attribute. This is useful for the InlineAlways pass that only wants to
   /// deal with that subset of the functions.
   bool removeDeadFunctions(CallGraph &CG, bool AlwaysInlineOnly = false);

 private:
   // InlineThreshold - Cache the value here for easy access.
   unsigned InlineThreshold;

   // InsertLifetime - Insert @llvm.lifetime intrinsics.
   bool InsertLifetime;

+  SmallPtrSet<BasicBlock *, 16> BlocksLeadingToUnreachable;

   /// shouldInline - Return true if the inliner should attempt to
   /// inline at the given CallSite.
   bool shouldInline(CallSite CS);
 };

 } // End llvm namespace

 #endif

lib/Transforms/IPO/Inliner.cpp

[10 lines not shown]
 // missing any calls and updating the call graph. The decisions of which calls
 // are profitable to inline are implemented elsewhere.
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/Transforms/IPO/InlinerPass.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/CallGraph.h"
 #include "llvm/Analysis/InlineCost.h"
 #include "llvm/Analysis/TargetLibraryInfo.h"
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/DataLayout.h"
[30 lines not shown]

 // We instroduce this threshold to help performance of instrumentation based
 // PGO before we actually hook up inliner with analysis passes such as BPI and
 // BFI.
 static cl::opt<int>
 ColdThreshold("inlinecold-threshold", cl::Hidden, cl::init(225),
               cl::desc("Threshold for inlining functions with cold attribute"));

+static cl::opt<int> UnreachableThreshold(
+    "inlineunreachable-threshold", cl::Hidden, cl::init(0),
+    cl::desc("Threshold for inlining call sites leading to unreachable."));

 // Threshold to use when optsize is specified (and there is no -inline-limit).
 const int OptSizeThreshold = 75;

 Inliner::Inliner(char &ID)
   : CallGraphSCCPass(ID), InlineThreshold(InlineLimit), InsertLifetime(true) {}

 Inliner::Inliner(char &ID, int Threshold, bool InsertLifetime)
   : CallGraphSCCPass(ID), InlineThreshold(InlineLimit.getNumOccurrences() > 0 ?
[196 lines not shown; context: for (unsigned AllocaNo = 0, e = IFI.StaticAllocas.size();]
     // operation.
     AllocasForType.push_back(AI);
     UsedAllocas.insert(AI);
   }

   return true;
 }

+// Before analysis passes such as BPI and BFI are included in inliner, this
+// function performs identifying blocks leading to an unreachable-terminated
+// block. Unlike BranchProbabilityInfo::calcUnreachableHeuristics(), this
+// function considers blocks post-dominated by unreachable only in their normal
+// execution path (i.e., unwind edges of InvokeInsts are not considered).
+// FIXME: When inliner is hooked with BPI, we may need to check edge weights to
+// all successors to identify blocks post-dominated by blocks very unlikely to
+// be taken.
+void Inliner::collectBlocksLeadingToUnreachable(Function *F) {
+  // Walk the basic blocks in post-order so that we can build up state about
+  // the successors of a block iteratively.
+  for (auto BB : post_order(&F->getEntryBlock())) {
+    TerminatorInst *TI = BB->getTerminator();
+    if (TI->getNumSuccessors() == 0 && isa<UnreachableInst>(TI)) {
+      BlocksLeadingToUnreachable.insert(BB);
+      continue;
+    }
+
+    // Don't follow unwind edges of invokes as it is also extremely unlikely
+    // taken like the edges leading to an unreachable.
+    if (auto *II = dyn_cast<InvokeInst>(BB->getTerminator())) {
+      if (BlocksLeadingToUnreachable.count(II->getNormalDest()))
+        BlocksLeadingToUnreachable.insert(BB);
+      continue;
+    }
+
+    unsigned NumUnreachableEdges = 0;
+    for (succ_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I)
+      if (BlocksLeadingToUnreachable.count(*I))
+        ++NumUnreachableEdges;
+
+    // If all successors are leading to unreachable, this block is too.
+    if (NumUnreachableEdges && NumUnreachableEdges == TI->getNumSuccessors())
+      BlocksLeadingToUnreachable.insert(BB);
+  }
+}

 unsigned Inliner::getInlineThreshold(CallSite CS) const {
   int Threshold = InlineThreshold; // -inline-threshold or else selected by
                                    // overall opt level

+  // If the normal execution path leads to an unreachable-terminated block,
+  // there is little point in inlining this unless the inline cost is obviously
+  // low.
+  if (BlocksLeadingToUnreachable.count(CS.getInstruction()->getParent()) &&
+      Threshold > UnreachableThreshold)
+    Threshold = UnreachableThreshold;

   // If -inline-threshold is not given, listen to the optsize attribute when it
   // would decrease the threshold.
   Function *Caller = CS.getCaller();
   bool OptSize = Caller && !Caller->isDeclaration() &&
                  // FIXME: Use Function::optForSize().
                  Caller->hasFnAttribute(Attribute::OptimizeForSize);
   if (!(InlineLimit.getNumOccurrences() > 0) && OptSize &&
       OptSizeThreshold < Threshold)
[55 lines not shown; context: DEBUG(dbgs() << " NOT Inlining: cost=" << IC.getCost()]
           << ", thres=" << (IC.getCostDelta() + IC.getCost())
           << ", Call: " << *CS.getInstruction() << "\n");
     emitAnalysis(CS, Twine(CS.getCalledFunction()->getName() +
                            " too costly to inline (cost=") +
                          Twine(IC.getCost()) + ", threshold=" +
                          Twine(IC.getCostDelta() + IC.getCost()) + ")");
     return false;
   }

   // Try to detect the case where the current inlining candidate caller (call
   // it B) is a static or linkonce-ODR function and is an inlining candidate
   // elsewhere, and the current candidate callee (call it C) is large enough
   // that inlining it into B would make B too big to inline later. In these
   // circumstances it may be best not to inline C into B, but to inline B into
   // its callers.
   //
   // This only applies to static and linkonce-ODR functions because those are
▲ Show 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	bool Inliner::runOnSCC(CallGraphSCC &SCC) {
SmallVector<std::pair<CallSite, int>, 16> CallSites;		SmallVector<std::pair<CallSite, int>, 16> CallSites;

// When inlining a callee produces new call sites, we want to keep track of		// When inlining a callee produces new call sites, we want to keep track of
// the fact that they were inlined from the callee. This allows us to avoid		// the fact that they were inlined from the callee. This allows us to avoid
// infinite inlining in some obscure cases. To represent this, we use an		// infinite inlining in some obscure cases. To represent this, we use an
// index into the InlineHistory vector.		// index into the InlineHistory vector.
SmallVector<std::pair<Function*, int>, 8> InlineHistory;		SmallVector<std::pair<Function*, int>, 8> InlineHistory;

		BlocksLeadingToUnreachable.clear();

for (CallGraphNode *Node : SCC) {		for (CallGraphNode *Node : SCC) {
Function *F = Node->getFunction();		Function *F = Node->getFunction();
if (!F) continue;		if (!F) continue;

for (BasicBlock &BB : *F)		bool HasCallSites = false;
		bool HasUnreachableTerminatedBlock = false;
		for (BasicBlock &BB : *F) {
		if (!HasUnreachableTerminatedBlock) {
		TerminatorInst *TI = BB.getTerminator();
		if (TI->getNumSuccessors() == 0 && isa<UnreachableInst>(TI))
		HasUnreachableTerminatedBlock = true;
		}
		GerolfUnsubmitted Not Done Reply Inline Actions You could compute a hasUnreachableBlock attribute here, too. Gerolf: You could compute a hasUnreachableBlock attribute here, too.
		junbumlAuthorUnsubmitted Not Done Reply Inline Actions If the purpose of new function attribute is to minimize the analysis (collectBlocksLeadingToUnreachable()), one easiest way could be adding a new variable here like HasUnreachable and mark HasUnreachable=true when we detect an unreachable in the function. And then we can call collectBlocksLeadingToUnreachable() only when both HasCallSites and HasUnreachable true. junbuml: If the purpose of new function attribute is to minimize the analysis…
      for (Instruction &I : BB) {
        CallSite CS(cast<Value>(&I));
        // If this isn't a call, or it is a call to an intrinsic, it can
        // never be inlined.
        if (!CS || isa<IntrinsicInst>(I))
          continue;

        // If this is a direct call to an external function, we can never inline
        // it. If it is an indirect call, inlining may resolve it to be a
        // direct call, so we keep it.
        if (Function *Callee = CS.getCalledFunction())
          if (Callee->isDeclaration())
            continue;

        CallSites.push_back(std::make_pair(CS, -1));
        HasCallSites = true;
      }
    }
    if (HasCallSites && HasUnreachableTerminatedBlock)
      collectBlocksLeadingToUnreachable(F);
  }

  DEBUG(dbgs() << ": " << CallSites.size() << " call sites.\n");
Gerolf: That is a bit too hacky. :-) There could be exception paths even in main where you want your optimization to fire. There needs to be a way to distinguish between "hot" and "cold" unreachable blocks, e.g. a block is cold if it is part of EH handling or if it is an exit returning an error code. That heuristic should eventually also be applicable to BranchProbabilityInfo::calcUnreachableHeuristics. I suggest a FIXME for now instead of checking for a specific function name.
junbuml (author): To me, checking the function name for main() doesn't seem uncommon. Please let me know if there is a better way. I agree that even in main we can distinguish truly cold blocks, but I didn't want to make this patch bigger. I think a separate patch only for that would make the review process much easier. So, as you suggested, I want to add a FIXME for now if the reviewers are okay with it.
hfinkel: A FIXME is fine for now (so long as we address it in the near future).

  // If there are no calls in this function, exit early.
  if (CallSites.empty())
    return false;

  // Now that we have all of the call sites, move the ones to functions in the
  // current SCC to the end of the list.
  unsigned FirstCallInSCC = CallSites.size();
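The body of collectBlocksLeadingToUnreachable() is not visible in this part of the diff, so the following is only a rough sketch of the kind of analysis the name suggests, not the patch's implementation. It runs on a toy control-flow graph rather than LLVM's own classes. `ToyBlock` and `blocksLeadingToUnreachable` are hypothetical names, and the rule used here (a block qualifies only when all of its successors qualify) is an assumption; the real patch may treat edges such as the unwind destination of an invoke differently.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a basic block (hypothetical; not LLVM's BasicBlock).
struct ToyBlock {
  std::vector<std::size_t> Succs;  // indices of successor blocks
  bool EndsInUnreachable = false;  // terminator is `unreachable`
};

// Returns a vector where entry I is true when every path starting at block I
// ends in an unreachable-terminated block (assumed rule, see lead-in).
std::vector<bool>
blocksLeadingToUnreachable(const std::vector<ToyBlock> &CFG) {
  std::vector<bool> Leads(CFG.size(), false);
  // Seed with the blocks that are themselves terminated by `unreachable`.
  for (std::size_t I = 0; I < CFG.size(); ++I)
    Leads[I] = CFG[I].EndsInUnreachable;

  // Propagate backwards until nothing changes: a block with successors
  // qualifies once all of its successors qualify. Blocks on a cycle that
  // never reaches a seed stay unmarked, which is the conservative choice.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < CFG.size(); ++I) {
      if (Leads[I] || CFG[I].Succs.empty())
        continue;
      bool AllSuccsLead = true;
      for (std::size_t S : CFG[I].Succs)
        if (!Leads[S]) {
          AllSuccsLead = false;
          break;
        }
      if (AllSuccsLead) {
        Leads[I] = true;
        Changed = true;
      }
    }
  }
  return Leads;
}
```

With a graph shaped like callSimpleFunction in the test file (entry branches to if.then and if.end, if.then falls through to invoke.cont, invoke.cont ends in unreachable), this sketch marks if.then and invoke.cont but not entry or if.end.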

test/Transforms/Inline/inline_unreachable.ll

This file was added.

				; RUN: opt < %s -inline -S | FileCheck %s

				@a = global i32 4
				@_ZTIi = external global i8*

				; CHECK-LABEL: callSimpleFunction
				; CHECK: call i32 @simpleFunction
				define i32 @callSimpleFunction(i32 %idx, i32 %limit) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				call i32 @simpleFunction(i32 %idx)
				br label %invoke.cont

				invoke.cont:
				unreachable

				if.end:
				ret i32 %idx
				}

				; CHECK-LABEL: callSmallFunction
				; CHECK-NOT: call i32 @smallFunction
				define i32 @callSmallFunction(i32 %idx, i32 %limit) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				call i32 @smallFunction(i32 %idx)
				br label %invoke.cont

				invoke.cont:
				unreachable

				if.end:
				ret i32 %idx
				}

				; CHECK-LABEL: throwSimpleException
				; CHECK: invoke i32 @simpleFunction
				define i32 @throwSimpleException(i32 %idx, i32 %limit) #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%exception = call i8* @__cxa_allocate_exception(i64 1) #0
				invoke i32 @simpleFunction(i32 %idx)
				to label %invoke.cont unwind label %lpad

				invoke.cont: ; preds = %if.then
				call void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null) #1
				unreachable

				lpad: ; preds = %if.then
				%ll = landingpad { i8*, i32 }
				cleanup
				ret i32 %idx

				if.end: ; preds = %entry
				ret i32 %idx
				}

				; CHECK-LABEL: throwSmallException
				; CHECK-NOT: invoke i32 @smallFunction
				define i32 @throwSmallException(i32 %idx, i32 %limit) #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%exception = call i8* @__cxa_allocate_exception(i64 1) #0
				invoke i32 @smallFunction(i32 %idx)
				to label %invoke.cont unwind label %lpad

				invoke.cont: ; preds = %if.then
				call void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null) #1
				unreachable

				lpad: ; preds = %if.then
				%ll = landingpad { i8*, i32 }
				cleanup
				ret i32 %idx

				if.end: ; preds = %entry
				ret i32 %idx
				}

				define i32 @simpleFunction(i32 %a) #0 {
				entry:
				%a1 = load volatile i32, i32* @a
				%x1 = add i32 %a1, %a1
				%a2 = load volatile i32, i32* @a
				%x2 = add i32 %x1, %a2
				%a3 = load volatile i32, i32* @a
				%x3 = add i32 %x2, %a3
				%a4 = load volatile i32, i32* @a
				%x4 = add i32 %x3, %a4
				%a5 = load volatile i32, i32* @a
				%x5 = add i32 %x4, %a5
				%a6 = load volatile i32, i32* @a
				%x6 = add i32 %x5, %a6
				%a7 = load volatile i32, i32* @a
				%x7 = add i32 %x6, %a6
				%a8 = load volatile i32, i32* @a
				%x8 = add i32 %x7, %a8
				%a9 = load volatile i32, i32* @a
				%x9 = add i32 %x8, %a9
				%a10 = load volatile i32, i32* @a
				%x10 = add i32 %x9, %a10
				%a11 = load volatile i32, i32* @a
				%x11 = add i32 %x10, %a11
				%a12 = load volatile i32, i32* @a
				%x12 = add i32 %x11, %a12
				%add = add i32 %x12, %a
				ret i32 %add
				}

				define i32 @smallFunction(i32 %a) {
				entry:
				%r = load volatile i32, i32* @a
				ret i32 %r
				}

				attributes #0 = { nounwind }
				attributes #1 = { noreturn }

				declare i8* @__cxa_allocate_exception(i64)
				declare i32 @__gxx_personality_v0(...)
				declare void @__cxa_throw(i8*, i8*, i8*)
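The CHECK lines above expect the larger @simpleFunction to stay as a call or invoke while the trivial @smallFunction is inlined. A minimal sketch of the decision this implies, assuming a call site that leads to unreachable is compared against the unreachable threshold of 0 mentioned in the discussion, could look like the following. `CallSiteInfo`, `shouldInline`, the default threshold value, and the cost numbers in the usage note are all hypothetical and are not taken from the patch.

```cpp
// Hypothetical summary of one call site; not an LLVM type.
struct CallSiteInfo {
  int InlineCost;           // estimated cost of inlining the callee
  bool LeadsToUnreachable;  // call site sits on a path ending in `unreachable`
};

// Assumed value for the regular threshold, for illustration only.
constexpr int DefaultThreshold = 225;
// The discussion states that the unreachable threshold is 0 in this change.
constexpr int UnreachableThreshold = 0;

// Inline only when the cost is below the threshold that applies to the
// call site; call sites leading to unreachable get the stricter threshold.
bool shouldInline(const CallSiteInfo &CS) {
  const int Threshold =
      CS.LeadsToUnreachable ? UnreachableThreshold : DefaultThreshold;
  return CS.InlineCost < Threshold;
}
```

Under these assumptions, a callee whose cost is negative (for example -15 after call-overhead savings) is still inlined on a path to unreachable, a callee with a cost of 100 is not, and the same callee with a cost of 100 is inlined on a normal path.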

Revision Contents

Diff 42207
