This is an archive of the discontinued LLVM Phabricator instance.

Differential D104028

[llvm][Inliner] Add an optional PriorityInlineOrder
ClosedPublic

Authored by taolq on Jun 10 2021, 6:34 AM.


Details

Reviewers

mtrofin
kazu
teemperor

Commits

rG671a87104b81: [llvm][Inliner] Add an optional PriorityInlineOrder
rGa740b707d193: [llvm][Inliner] Add an optional PriorityInlineOrder

Summary

This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite whose size is smaller has a higher priority.
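
As a rough illustration of the heap-based ordering described in the summary, the standalone sketch below keeps (call site, size) pairs in a vector managed with `std::push_heap` and `std::pop_heap`. The names `Candidate`, `SizeHeap`, `largerSize` and `drainOrder`, and the integer ids standing in for call sites, are hypothetical and not taken from the patch. With a greater-than comparator the standard heap functions behave as a min-heap, so the smallest size comes out first.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call site: (id, callee size).
using Candidate = std::pair<int, int>;

// Greater-than on size makes the std heap functions act as a min-heap.
inline bool largerSize(const Candidate &A, const Candidate &B) {
  return A.second > B.second;
}

class SizeHeap {
public:
  void push(const Candidate &C) {
    Heap.push_back(C);
    std::push_heap(Heap.begin(), Heap.end(), largerSize);
  }

  // Removes and returns the candidate with the smallest size.
  Candidate pop() {
    assert(!Heap.empty());
    std::pop_heap(Heap.begin(), Heap.end(), largerSize);
    Candidate Result = Heap.back();
    Heap.pop_back();
    return Result;
  }

  bool empty() const { return Heap.empty(); }

private:
  std::vector<Candidate> Heap;
};

// Drains the heap and returns the ids in processing order.
inline std::vector<int> drainOrder(SizeHeap &H) {
  std::vector<int> Order;
  while (!H.empty())
    Order.push_back(H.pop().first);
  return Order;
}
```

Pushing candidates with sizes 30, 5 and 12 and then draining the heap yields them in ascending size order.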

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

taolq created this revision. Jun 10 2021, 6:34 AM

Herald added subscribers: ormris, hiraditya, eraman. Jun 10 2021, 6:34 AM

taolq requested review of this revision. Jun 10 2021, 6:34 AM

Herald added a project: Restricted Project. Jun 10 2021, 6:34 AM

Herald added a subscriber: llvm-commits.

taolq retitled this revision from [WIP] use standard priority queue to order inlining to [WIP] Use standard priority queue to order inlining. Jun 10 2021, 6:46 AM

taolq edited the summary of this revision.

taolq added reviewers: mtrofin, kazu, teemperor.

Herald added a subscriber: Prazek. Jun 10 2021, 6:46 AM

Harbormaster completed remote builds in B108613: Diff 351157. Jun 10 2021, 7:50 AM

mtrofin added inline comments. Jun 10 2021, 9:01 AM

llvm/lib/Transforms/IPO/Inliner.cpp
712	you can also delay pop-ing. Or, better, have pop return T, and front return reference and be a const function (i.e. a peek - you need it I think in the for loop, line 899).
823–828	add a flag to this file to control which InlineOrder you use.
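
The API shape suggested in the first inline comment (a `pop` that returns the element by value and a `front` that only peeks) might look roughly like the sketch below. `FifoWorklist` is a hypothetical, simplified stand-in and not the class from the patch; it uses `std::vector` rather than LLVM containers so that it compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical FIFO worklist illustrating "pop returns T, front is a peek".
template <typename T> class FifoWorklist {
public:
  void push(const T &Elt) { Items.push_back(Elt); }

  // Peek at the next element without modifying the worklist.
  const T &front() const {
    assert(size() > 0);
    return Items[First];
  }

  // Remove the next element and return it by value, so the caller never
  // holds a reference into storage that has already been "popped".
  T pop() {
    assert(size() > 0);
    return Items[First++];
  }

  std::size_t size() const { return Items.size() - First; }
  bool empty() const { return size() == 0; }

private:
  std::vector<T> Items;
  std::size_t First = 0; // index of the next element to hand out
};
```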

ormris removed a subscriber: ormris. Jun 10 2021, 10:25 AM

Add inline order option
modify pop() & front() API

Harbormaster completed remote builds in B109115: Diff 351890. Jun 14 2021, 9:18 AM

With the introduction of the flag, there shouldn't be any more failing tests, right?

llvm/lib/Transforms/IPO/Inliner.cpp
103	nit: drop the second `inline`, simpler: InlineEnablePriorityOrder inline-enable-priority-order
823–828	nit: when the flag is enabled, you end up allocating an object just to drop it right after. You can just allocate the appropriate one on each branch of the if statement. Then, if you want, just assert after the if that CallsPtr is not null - this will help maintaining this later, when there are more than 2 ordering implementations, and it makes the design crystal-clear.
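
A minimal sketch of the allocation pattern suggested in the comment above: construct only the implementation selected by the flag, then assert that something was constructed. The types `Order`, `FifoOrder`, `PriorityOrder` and the function `makeOrder` are hypothetical placeholders, not names from the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical minimal interface with two implementations.
struct Order {
  virtual ~Order() = default;
  virtual std::string name() const = 0;
};

struct FifoOrder final : Order {
  std::string name() const override { return "fifo"; }
};

struct PriorityOrder final : Order {
  std::string name() const override { return "priority"; }
};

// Allocates exactly one object: the one selected by the flag.
inline std::unique_ptr<Order> makeOrder(bool EnablePriority) {
  std::unique_ptr<Order> Calls;
  if (EnablePriority)
    Calls = std::make_unique<PriorityOrder>();
  else
    Calls = std::make_unique<FifoOrder>();
  // Guards against a future branch that forgets to initialize Calls.
  assert(Calls != nullptr && "Expected an initialized order");
  return Calls;
}
```

The assert costs nothing in release builds and documents the invariant if more than two orderings are added later.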

kazu added inline comments. Jun 14 2021, 10:21 AM

llvm/lib/Transforms/IPO/Inliner.cpp
823–828	nit: I would just leave the name as is -- "Calls". "CallsPtr" is a bit of a mouthful, although I understand what you mean here.

In D104028#2817188, @mtrofin wrote:

With the introduction of the flag, there shouldn't be any more failing tests, right?

Yes, all tests pass.

In D104028#2818310, @taolq wrote:

In D104028#2817188, @mtrofin wrote:

With the introduction of the flag, there shouldn't be any more failing tests, right?

Yes, all tests pass.

Worth changing the summary (and also the git commit - the latter gets copied to the former only the first time you `arc diff`).

renaming option and Calls
allocate Calls once on the branch statement

ChuanqiXu added a subscriber: ChuanqiXu. Jun 15 2021, 12:30 AM

Harbormaster completed remote builds in B109240: Diff 352054. Jun 15 2021, 12:56 AM

I am tuning performance by reordering inlining downstream. My first try was to use std::priority_queue, but I tried to use the inline cost heuristic to order the callsites. This patch looks like it sorts callsites by HistoryID. What's the intention?
BTW, on the performance side, SPEC2017 shows some regressions along with some improvements if we use std::priority_queue ordered by InlineCost. It may be irrelevant to this patch.

PS2: Did you consider moving the InlineOrder class(es) out of Inliner.cpp into the header and making it a member of InlinerPass, just like the Advisor? I guess that may make them easier to use.

llvm/lib/Transforms/IPO/Inliner.cpp
768–779	I prefer to use `std::vector<T>` as the data member instead of PriorityQueue. After that, the implementation may be simpler. For example: swap all the required elements to the end of the vector, then `std::make_heap(...)`. The implementation of pop and push wouldn't be much harder either.
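
One way to read the vector-plus-`std::make_heap` suggestion above is sketched below: after removing elements from the middle of the vector, rebuild the heap in one step. This is only an illustration under that assumption; the function name `eraseIfAndReheap` and the use of plain `int` elements are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Removes every element matching Pred, then restores the heap property.
// std::greater<int> keeps the smallest element at the front (min-heap).
inline void eraseIfAndReheap(std::vector<int> &Heap,
                             const std::function<bool(int)> &Pred) {
  // remove_if moves the surviving elements to the front of the vector;
  // erase then drops the tail.
  Heap.erase(std::remove_if(Heap.begin(), Heap.end(), Pred), Heap.end());
  // The survivors are no longer guaranteed to form a heap, so rebuild it.
  std::make_heap(Heap.begin(), Heap.end(), std::greater<int>());
}
```

Rebuilding with `std::make_heap` is linear in the number of remaining elements, which is usually acceptable when removals are batched.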

In D104028#2818736, @ChuanqiXu wrote:

I am tuning performance by reordering inlining downstream. My first try was to use std::priority_queue, but I tried to use the inline cost heuristic to order the callsites. This patch looks like it sorts callsites by HistoryID. What's the intention?

Thanks for your comments. In this patch, callsites are sorted by callee size (see PriorityInlineOrder::evaluate()).

BTW, on the performance side, SPEC2017 shows some regressions along with some improvements if we use std::priority_queue ordered by InlineCost. It may be irrelevant to this patch.

That sounds great! I would like to use more elaborate priority functions in the future, e.g. ones that consider inline cost, callee size, and other profile data.

PS2: Did you consider moving the InlineOrder class(es) out of Inliner.cpp into the header and making it a member of InlinerPass, just like the Advisor? I guess that may make them easier to use.

It is defined in Inliner.cpp, because it is an abstraction meant for the inliner.

In D104028#2818832, @taolq wrote:

Thanks for your comments. In this patch, callsites are sorted by callee size (see PriorityInlineOrder::evaluate()).

Sorry, I missed that.

That sounds great! I would like to use more elaborate priority functions in the future, e.g. ones that consider inline cost, callee size, and other profile data.

The main point is the regression. Calculating the inline cost for every callsite is costly. In other words, it increases compile time without significant improvements. (We could discuss this further in other threads; it may be irrelevant here.)

It is defined in Inliner.cpp, because it is an abstraction meant for the inliner.

I am fine with keeping it in Inliner.cpp. But the reason seems a little odd to me, since many components of inlining are not in Inliner.cpp.

ChuanqiXu added inline comments. Jun 15 2021, 2:01 AM

llvm/lib/Transforms/IPO/Inliner.cpp
904–906	It may be better to write `CallBase *CB = Calls->front().first;` and `const int InlineHistoryID = Calls->front().second;`. Since we call pop() immediately afterwards, the reference P would become invalid. Programmers might refer to P after `Calls->pop()`, which would be a disaster. Another option is to replace `auto &P =` with `auto P =`.
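
The hazard described in the comment above can be avoided by copying the front entry before removing it. The sketch below shows that pattern on a plain `std::vector`; `Entry` and `takeFront` are hypothetical names, and the pair of ints merely stands in for the (call, history id) pair used in the patch.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for std::pair<CallBase *, int>.
using Entry = std::pair<int, int>;

// Copies the front entry *before* removing it, so the returned value stays
// valid no matter what happens to the container afterwards.
inline Entry takeFront(std::vector<Entry> &Worklist) {
  assert(!Worklist.empty());
  Entry P = Worklist.front(); // a copy; `Entry &P` would dangle after erase
  Worklist.erase(Worklist.begin());
  return P;
}
```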

yurai007 added a subscriber: yurai007. Jun 15 2021, 8:15 AM

yurai007 added inline comments.

llvm/lib/Transforms/IPO/Inliner.cpp
727	nit: use 'final' to unlock devirtualization opportunity for overridden members.
827	nit: std::make_unique?

replace implementation of PriorityInlineOrder from PriorityQueue to SmallVector

taolq edited the summary of this revision. Jun 15 2021, 8:25 AM

Harbormaster completed remote builds in B109306: Diff 352151. Jun 15 2021, 9:07 AM

ChuanqiXu added inline comments. Jun 15 2021, 7:08 PM

llvm/lib/Transforms/IPO/Inliner.cpp
19	Now it is not necessary.
798–822	Since we add a new ordering strategy, we need to edit the comment too.

taolq retitled this revision from [WIP] Use standard priority queue to order inlining to Add an optional PriorityInlineOrder.. Jun 16 2021, 8:24 AM

taolq edited the summary of this revision.

taolq retitled this revision from Add an optional PriorityInlineOrder. to Add an optional PriorityInlineOrder.

remove useless header file
add tests for PriorityInlineOrder

In D104028#2818736, @ChuanqiXu wrote:

I am tuning performance by reordering inlining downstream. My first try was to use std::priority_queue, but I tried to use the inline cost heuristic to order the callsites. This patch looks like it sorts callsites by HistoryID. What's the intention?

The intention here is to make it easy to try out different priority functions -- callee size, dynamic call count, impacts on callers, etc.
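
To make the idea of swappable priority functions concrete, here is a small sketch in which the scoring function is passed in at construction time. Everything in it (`CallSiteInfo`, `PluggableOrder`, the field names) is hypothetical and only approximates the direction described above; lower scores are processed first.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical summary of a call site.
struct CallSiteInfo {
  int Id;
  int CalleeSize;
  int CallCount;
};

class PluggableOrder {
public:
  using PriorityFn = std::function<int(const CallSiteInfo &)>;

  explicit PluggableOrder(PriorityFn Fn) : Evaluate(std::move(Fn)) {}

  void push(const CallSiteInfo &CS) {
    Heap.push_back({Evaluate(CS), CS.Id});
    std::push_heap(Heap.begin(), Heap.end(), Cmp());
  }

  // Returns the id of the call site with the lowest score.
  int pop() {
    assert(!Heap.empty());
    std::pop_heap(Heap.begin(), Heap.end(), Cmp());
    int Id = Heap.back().second;
    Heap.pop_back();
    return Id;
  }

private:
  using Scored = std::pair<int, int>; // (score, id)
  using Cmp = std::greater<Scored>;   // min-heap on score, then id

  PriorityFn Evaluate;
  std::vector<Scored> Heap;
};
```

Switching from "smallest callee first" to "hottest call site first" then only requires passing a different lambda, for example one that returns the negated call count.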

In D104028#2818867, @ChuanqiXu wrote:

The main point is the regression. Calculating the inline cost for every callsite is costly. In other words, it increases compile time without significant improvements. (We could discuss this further in other threads; it may be irrelevant here.)

The compilation time is not a main concern while we are gathering insights.

mtrofin added inline comments. Jun 16 2021, 1:27 PM

llvm/test/Transforms/Inline/inline_call.ll
3	in this and the other test files, is there something different in the output from line 2 and line 3, so that you can add a check that enable-priority-order actually does something different?

Harbormaster completed remote builds in B109523: Diff 352448. Jun 16 2021, 4:38 PM

In D104028#2822410, @kazu wrote:

In D104028#2818736, @ChuanqiXu wrote:

I am tuning performance by reordering inlining downstream. My first try was to use std::priority_queue, but I tried to use the inline cost heuristic to order the callsites. This patch looks like it sorts callsites by HistoryID. What's the intention?

The intention here is to make it easy to try out different priority functions -- callee size, dynamic call count, impacts on callers, etc.

In D104028#2818867, @ChuanqiXu wrote:

The main point is the regression. Calculating the inline cost for every callsite is costly. In other words, it increases compile time without significant improvements. (We could discuss this further in other threads; it may be irrelevant here.)

The compilation time is not a main concern while we are gathering insights.

Yeah, I understand. I commented only because I found that someone is doing something similar to what I am doing, so I wanted to share my experience.

BTW: It'd be better to add a tag before the title, like '[Inline]' or '[Inliner]'.

taolq retitled this revision from Add an optional PriorityInlineOrder to [llvm][Inliner] Add an optional PriorityInlineOrder. Jun 16 2021, 6:59 PM

Add a test

kazu added inline comments. Jun 17 2021, 10:07 AM

llvm/lib/Transforms/IPO/Inliner.cpp
800–801	`smaller`

fix typo

mtrofin added inline comments. Jun 17 2021, 5:27 PM

llvm/test/Transforms/Inline/monster_scc.ll
73 ↗	(On Diff #352880)	These look like the old cases, though - is there a difference that the priority-based ordering introduces? (maybe I'm missing it)
132 ↗	(On Diff #352880)	same comment - what's the difference?

taolq added inline comments. Jun 17 2021, 5:56 PM

llvm/test/Transforms/Inline/monster_scc.ll
73 ↗	(On Diff #352880)	Actually, they are different. Each one (old inline pass, new inline pass, PriorityInlineOrder) produces a different function body after inlining. The inlined function names are hard to distinguish, so you could also focus on the number of called functions: e.g. with PriorityInlineOrder, this function has 7 calls after inlining, more than in the previous two cases.
132 ↗	(On Diff #352880)	After inlining, the called functions are different. These check the sequence of call statements.

lgtm

This revision is now accepted and ready to land. Jun 17 2021, 6:02 PM

This revision was landed with ongoing or failed builds. Jun 18 2021, 2:00 AM

Closed by commit rGa740b707d193: [llvm][Inliner] Add an optional PriorityInlineOrder (authored by taolq).

This revision was automatically updated to reflect the committed changes.

taolq added a commit: rGa740b707d193: [llvm][Inliner] Add an optional PriorityInlineOrder.

It looks like this change may cause the following failure on GreenDragon. It would be great if you could take a look.

Command Output (stderr):
--
+ : 'RUN: at line 42'
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -inline -inline-threshold=150 -enable-new-pm=0
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,OLD
+ : 'RUN: at line 43'
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -passes=inline -inline-threshold=150
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,NEW
+ : 'RUN: at line 44'
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -passes=inline -inline-threshold=150 -inline-enable-priority-order=true
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,PO
/Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll:76:7: error: PO: expected string not found in input
; PO: call void @_Z1fILb1ELi2EEvPbS0_(
      ^
<stdin>:19:34: note: scanning from here
 call void @_Z1fILb1ELi1EEvPbS0_(i8* %add.ptr2, i8* %E)
                                 ^
<stdin>:23:2: note: possible intended match here
 call void @_Z1fILb0ELi1EEvPbS0_(i8* %add.ptr2, i8* %E)
 ^
/Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll:137:7: error: PO: expected string not found in input
; PO: call void @_Z1fILb0ELi1EEvPbS0_(
      ^
<stdin>:43:34: note: scanning from here
 call void @_Z1fILb1ELi1EEvPbS0_(i8* %add.ptr2, i8* %E)
                                 ^
<stdin>:72:2: note: possible intended match here
 call void @_Z1fILb0ELi3EEvPbS0_(i8* %add.ptr2.i9, i8* %E)
 ^

https://green.lab.llvm.org/green/job/clang-stage1-RA/21735/consoleFull#2293633728254eaf0-7326-4999-85b0-388101f2d404

Looks like this breaks tests on mac: http://45.33.8.238/macm1/11921/step_11.txt

Please take a look and revert for now if it takes a while to fix.

taolq added a reverting change: rG93183a41b962: Revert D104028 "[llvm][Inliner] Add an optional PriorityInlineOrder". Jun 18 2021, 3:52 AM

Harbormaster completed remote builds in B109834: Diff 352880. Jun 18 2021, 9:22 AM

taolq reopened this revision. Jun 18 2021, 6:47 PM

This revision is now accepted and ready to land. Jun 18 2021, 6:47 PM

remove test monster_scc.ll which has a different output in macOS

This revision was landed with ongoing or failed builds. Jun 18 2021, 7:20 PM

Closed by commit rG671a87104b81: [llvm][Inliner] Add an optional PriorityInlineOrder (authored by taolq).

This revision was automatically updated to reflect the committed changes.

taolq added a commit: rG671a87104b81: [llvm][Inliner] Add an optional PriorityInlineOrder.

In D104028#2828545, @taolq wrote:

remove test monster_scc.ll which has a different output in macOS

Could you elaborate *why* it has a different output? Not sure if removing the test because it fails on macOS is the best way forward. It would be good to understand *why* there's a difference first.

Harbormaster completed remote builds in B110034: Diff 353149. Jun 19 2021, 2:08 PM

In D104028#2829050, @fhahn wrote:

In D104028#2828545, @taolq wrote:

remove test monster_scc.ll which has a different output in macOS

Could you elaborate *why* it has a different output? Not sure if removing the test because it fails on macOS is the best way forward. It would be good to understand *why* there's a difference first.

I don't know the reason yet. I will figure it out.

wenlei added a subscriber: wenlei. Aug 30 2021, 2:01 PM

MTC added a subscriber: MTC. Aug 30 2021, 7:11 PM

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/Inliner.cpp	96 lines
llvm/test/Transforms/Inline/inline_call.ll	1 line
llvm/test/Transforms/Inline/inline_invoke.ll	1 line
llvm/test/Transforms/Inline/last-callsite.ll	1 line

Diff 352448

llvm/lib/Transforms/IPO/Inliner.cpp

(10 lines not shown)
// are profitable to inline are implemented elsewhere.		// are profitable to inline are implemented elsewhere.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/IPO/Inliner.h"		#include "llvm/Transforms/IPO/Inliner.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
		ChuanqiXu (Done): Now it is not necessary.
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Analysis/AssumptionCache.h"		#include "llvm/Analysis/AssumptionCache.h"
#include "llvm/Analysis/BasicAliasAnalysis.h"		#include "llvm/Analysis/BasicAliasAnalysis.h"
(66 lines not shown)

static cl::opt<std::string> CGSCCInlineReplayFile(		static cl::opt<std::string> CGSCCInlineReplayFile(
"cgscc-inline-replay", cl::init(""), cl::value_desc("filename"),		"cgscc-inline-replay", cl::init(""), cl::value_desc("filename"),
cl::desc(		cl::desc(
"Optimization remarks file containing inline remarks to be replayed "		"Optimization remarks file containing inline remarks to be replayed "
"by inlining from cgscc inline remarks."),		"by inlining from cgscc inline remarks."),
cl::Hidden);		cl::Hidden);

		static cl::opt<bool> InlineEnablePriorityOrder(
		"inline-enable-priority-order", cl::Hidden, cl::init(false),
		mtrofin (Done): nit: drop the second `inline`, simpler: InlineEnablePriorityOrder inline-enable-priority-order
		cl::desc("Enable the priority inline order for the inliner"));

LegacyInlinerBase::LegacyInlinerBase(char &ID) : CallGraphSCCPass(ID) {}		LegacyInlinerBase::LegacyInlinerBase(char &ID) : CallGraphSCCPass(ID) {}

LegacyInlinerBase::LegacyInlinerBase(char &ID, bool InsertLifetime)		LegacyInlinerBase::LegacyInlinerBase(char &ID, bool InsertLifetime)
: CallGraphSCCPass(ID), InsertLifetime(InsertLifetime) {}		: CallGraphSCCPass(ID), InsertLifetime(InsertLifetime) {}

/// For this class, we declare that we require and preserve the call graph.		/// For this class, we declare that we require and preserve the call graph.
/// If the derived class implements this method, it should		/// If the derived class implements this method, it should
/// always explicitly call the implementation here.		/// always explicitly call the implementation here.
(558 lines not shown)
assert(IAA->getAdvisor() &&		assert(IAA->getAdvisor() &&
"Expected a present InlineAdvisorAnalysis also have an "		"Expected a present InlineAdvisorAnalysis also have an "
"InlineAdvisor initialized");		"InlineAdvisor initialized");
return *IAA->getAdvisor();		return *IAA->getAdvisor();
}		}

template <typename T> class InlineOrder {		template <typename T> class InlineOrder {
public:		public:
using reference = T &;		using reference = T &;
		using const_reference = const T &;

virtual ~InlineOrder() {}		virtual ~InlineOrder() {}

virtual size_t size() = 0;		virtual size_t size() = 0;

virtual void push(const T &Elt) = 0;		virtual void push(const T &Elt) = 0;

virtual void pop() = 0;		virtual T pop() = 0;

virtual reference front() = 0;		virtual const_reference front() = 0;

virtual void erase_if(function_ref<bool(T)> Pred) = 0;		virtual void erase_if(function_ref<bool(T)> Pred) = 0;

bool empty() { return !size(); }		bool empty() { return !size(); }
};		};

template <typename T, typename Container = SmallVector<T, 16>>		template <typename T, typename Container = SmallVector<T, 16>>
class DefaultInlineOrder : public InlineOrder<T> {		class DefaultInlineOrder : public InlineOrder<T> {
using reference = T &;		using reference = T &;
		using const_reference = const T &;

public:		public:
size_t size() override { return Calls.size() - FirstIndex; }		size_t size() override { return Calls.size() - FirstIndex; }

void push(const T &Elt) override { Calls.push_back(Elt); }		void push(const T &Elt) override { Calls.push_back(Elt); }

void pop() override {		T pop() override {
assert(size() > 0);		assert(size() > 0);
FirstIndex++;		return Calls[FirstIndex++];
}		}

reference front() override {		const_reference front() override {
		mtrofin (Not Done): you can also delay pop-ing. Or, better, have pop return T, and front return reference and be a const function (i.e. a peek - you need it I think in the for loop, line 899).
assert(size() > 0);		assert(size() > 0);
return Calls[FirstIndex];		return Calls[FirstIndex];
}		}

void erase_if(function_ref<bool(T)> Pred) override {		void erase_if(function_ref<bool(T)> Pred) override {
Calls.erase(std::remove_if(Calls.begin() + FirstIndex, Calls.end(), Pred),		Calls.erase(std::remove_if(Calls.begin() + FirstIndex, Calls.end(), Pred),
Calls.end());		Calls.end());
}		}

private:		private:
Container Calls;		Container Calls;
size_t FirstIndex = 0;		size_t FirstIndex = 0;
};		};

		class PriorityInlineOrder : public InlineOrder<std::pair<CallBase *, int>> {
		yurai007 (Not Done): nit: use 'final' to unlock devirtualization opportunity for overridden members.
		using T = std::pair<CallBase *, int>;
		using reference = T &;
		using const_reference = const T &;

		static bool cmp(const T &P1, const T &P2) { return P1.second > P2.second; }

		int evaluate(CallBase *CB) {
		Function *Callee = CB->getCalledFunction();
		return (int)Callee->getInstructionCount();
		}

		public:
		size_t size() override { return Heap.size(); }

		void push(const T &Elt) override {
		CallBase *CB = Elt.first;
		const int InlineHistoryID = Elt.second;
		const int Goodness = evaluate(CB);

		Heap.push_back({CB, Goodness});
		std::push_heap(Heap.begin(), Heap.end(), cmp);
		InlineHistoryMap[CB] = InlineHistoryID;
		}

		T pop() override {
		assert(size() > 0);
		CallBase *CB = Heap.front().first;
		T Result = std::make_pair(CB, InlineHistoryMap[CB]);
		InlineHistoryMap.erase(CB);
		std::pop_heap(Heap.begin(), Heap.end(), cmp);
		Heap.pop_back();
		return Result;
		}

		const_reference front() override {
		assert(size() > 0);
		CallBase *CB = Heap.front().first;
		return *InlineHistoryMap.find(CB);
		}

		void erase_if(function_ref<bool(T)> Pred) override {
		Heap.erase(std::remove_if(Heap.begin(), Heap.end(), Pred), Heap.end());
		std::make_heap(Heap.begin(), Heap.end(), cmp);
		}

		private:
		SmallVector<T, 16> Heap;
		DenseMap<CallBase *, int> InlineHistoryMap;
		};

PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,		PreservedAnalyses InlinerPass::run(LazyCallGraph::SCC &InitialC,
CGSCCAnalysisManager &AM, LazyCallGraph &CG,		CGSCCAnalysisManager &AM, LazyCallGraph &CG,
		ChuanqiXu (Done): I prefer to use `std::vector<T>` as the data member instead of PriorityQueue. After that, the implementation may be simpler. For example: swap all the required elements to the end of the vector, then `std::make_heap(...)`. The implementation of pop and push wouldn't be much harder either.
CGSCCUpdateResult &UR) {		CGSCCUpdateResult &UR) {
const auto &MAMProxy =		const auto &MAMProxy =
AM.getResult<ModuleAnalysisManagerCGSCCProxy>(InitialC, CG);		AM.getResult<ModuleAnalysisManagerCGSCCProxy>(InitialC, CG);
bool Changed = false;		bool Changed = false;

assert(InitialC.size() > 0 && "Cannot handle an empty SCC!");		assert(InitialC.size() > 0 && "Cannot handle an empty SCC!");
Module &M = *InitialC.begin()->getFunction().getParent();		Module &M = *InitialC.begin()->getFunction().getParent();
ProfileSummaryInfo *PSI = MAMProxy.getCachedResult<ProfileSummaryAnalysis>(M);		ProfileSummaryInfo *PSI = MAMProxy.getCachedResult<ProfileSummaryAnalysis>(M);

FunctionAnalysisManager &FAM =		FunctionAnalysisManager &FAM =
AM.getResult<FunctionAnalysisManagerCGSCCProxy>(InitialC, CG)		AM.getResult<FunctionAnalysisManagerCGSCCProxy>(InitialC, CG)
.getManager();		.getManager();

InlineAdvisor &Advisor = getAdvisor(MAMProxy, FAM, M);		InlineAdvisor &Advisor = getAdvisor(MAMProxy, FAM, M);
Advisor.onPassEntry();		Advisor.onPassEntry();

auto AdvisorOnExit = make_scope_exit([&] { Advisor.onPassExit(); });		auto AdvisorOnExit = make_scope_exit([&] { Advisor.onPassExit(); });

// We use a single common worklist for calls across the entire SCC. We		// We use a single common worklist for calls across the entire SCC. We
// process these in-order and append new calls introduced during inlining to		// process these in-order and append new calls introduced during inlining to
// the end.		// the end. The PriorityInlineOrder is optional here, in which the samller
		// callsite would have a higher priority to inline.
		kazu (Done): `smaller`
//		//
// Note that this particular order of processing is actually critical to		// Note that this particular order of processing is actually critical to
// avoid very bad behaviors. Consider highly connected call graphs where		// avoid very bad behaviors. Consider highly connected call graphs where
// each function contains a small amount of code and a couple of calls to		// each function contains a small amount of code and a couple of calls to
// other functions. Because the LLVM inliner is fundamentally a bottom-up		// other functions. Because the LLVM inliner is fundamentally a bottom-up
// inliner, it can handle gracefully the fact that these all appear to be		// inliner, it can handle gracefully the fact that these all appear to be
// reasonable inlining candidates as it will flatten things until they become		// reasonable inlining candidates as it will flatten things until they become
// too big to inline, and then move on and flatten another batch.		// too big to inline, and then move on and flatten another batch.
//		//
// However, when processing call edges within an SCC we cannot rely on this		// However, when processing call edges within an SCC we cannot rely on this
// bottom-up behavior. As a consequence, with heavily connected SCCs of		// bottom-up behavior. As a consequence, with heavily connected SCCs of
// functions we can end up incrementally inlining N calls into each of		// functions we can end up incrementally inlining N calls into each of
// N functions because each incremental inlining decision looks good and we		// N functions because each incremental inlining decision looks good and we
// don't have a topological ordering to prevent explosions.		// don't have a topological ordering to prevent explosions.
//		//
// To compensate for this, we don't process transitive edges made immediate		// To compensate for this, we don't process transitive edges made immediate
// by inlining until we've done one pass of inlining across the entire SCC.		// by inlining until we've done one pass of inlining across the entire SCC.
// Large, highly connected SCCs still lead to some amount of code bloat in		// Large, highly connected SCCs still lead to some amount of code bloat in
// this model, but it is uniformly spread across all the functions in the SCC		// this model, but it is uniformly spread across all the functions in the SCC
// and eventually they all become too large to inline, rather than		// and eventually they all become too large to inline, rather than
// incrementally maknig a single function grow in a super linear fashion.		// incrementally maknig a single function grow in a super linear fashion.
		ChuanqiXu (Done): Since we add a new ordering strategy, we need to edit the comment too.
DefaultInlineOrder<std::pair<CallBase *, int>> Calls;		std::unique_ptr<InlineOrder<std::pair<CallBase *, int>>> Calls;
		if (InlineEnablePriorityOrder)
		Calls = std::make_unique<PriorityInlineOrder>();
		else
		Calls = std::make_unique<DefaultInlineOrder<std::pair<CallBase *, int>>>();
		yurai007 (Done): nit: std::make_unique?
		assert(Calls != nullptr && "Expected an initialized InlineOrder");
		mtrofin (Done): add a flag to this file to control which InlineOrder you use.
		mtrofin (Done): nit: when the flag is enabled, you end up allocating an object just to drop it right after. You can just allocate the appropriate one on each branch of the if statement. Then, if you want, just assert after the if that CallsPtr is not null - this will help maintaining this later, when there are more than 2 ordering implementations, and it makes the design crystal-clear.
		kazu (Done): nit: I would just leave the name as is -- "Calls". "CallsPtr" is a bit of a mouthful, although I understand what you mean here.

// Populate the initial list of calls in this SCC.		// Populate the initial list of calls in this SCC.
for (auto &N : InitialC) {		for (auto &N : InitialC) {
auto &ORE =		auto &ORE =
FAM.getResult<OptimizationRemarkEmitterAnalysis>(N.getFunction());		FAM.getResult<OptimizationRemarkEmitterAnalysis>(N.getFunction());
// We want to generally process call sites top-down in order for		// We want to generally process call sites top-down in order for
// simplifications stemming from replacing the call with the returned value		// simplifications stemming from replacing the call with the returned value
// after inlining to be visible to subsequent inlining decisions.		// after inlining to be visible to subsequent inlining decisions.
// FIXME: Using instructions sequence is a really bad way to do this.		// FIXME: Using instructions sequence is a really bad way to do this.
// Instead we should do an actual RPO walk of the function body.		// Instead we should do an actual RPO walk of the function body.
for (Instruction &I : instructions(N.getFunction()))		for (Instruction &I : instructions(N.getFunction()))
if (auto *CB = dyn_cast<CallBase>(&I))		if (auto *CB = dyn_cast<CallBase>(&I))
if (Function *Callee = CB->getCalledFunction()) {		if (Function *Callee = CB->getCalledFunction()) {
if (!Callee->isDeclaration())		if (!Callee->isDeclaration())
Calls.push({CB, -1});		Calls->push({CB, -1});
else if (!isa<IntrinsicInst>(I)) {		else if (!isa<IntrinsicInst>(I)) {
using namespace ore;		using namespace ore;
setInlineRemark(*CB, "unavailable definition");		setInlineRemark(*CB, "unavailable definition");
ORE.emit([&]() {		ORE.emit([&]() {
return OptimizationRemarkMissed(DEBUG_TYPE, "NoDefinition", &I)		return OptimizationRemarkMissed(DEBUG_TYPE, "NoDefinition", &I)
<< NV("Callee", Callee) << " will not be inlined into "		<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CB->getCaller())		<< NV("Caller", CB->getCaller())
<< " because its definition is unavailable"		<< " because its definition is unavailable"
<< setIsVerbose();		<< setIsVerbose();
});		});
}		}
}		}
}		}
if (Calls.empty())		if (Calls->empty())
return PreservedAnalyses::all();		return PreservedAnalyses::all();

// Capture updatable variable for the current SCC.		// Capture updatable variable for the current SCC.
auto *C = &InitialC;		auto *C = &InitialC;

// When inlining a callee produces new call sites, we want to keep track of		// When inlining a callee produces new call sites, we want to keep track of
// the fact that they were inlined from the callee. This allows us to avoid		// the fact that they were inlined from the callee. This allows us to avoid
// infinite inlining in some obscure cases. To represent this, we use an		// infinite inlining in some obscure cases. To represent this, we use an
// index into the InlineHistory vector.		// index into the InlineHistory vector.
SmallVector<std::pair<Function *, int>, 16> InlineHistory;		SmallVector<std::pair<Function *, int>, 16> InlineHistory;

// Track a set vector of inlined callees so that we can augment the caller		// Track a set vector of inlined callees so that we can augment the caller
// with all of their edges in the call graph before pruning out the ones that		// with all of their edges in the call graph before pruning out the ones that
// got simplified away.		// got simplified away.
SmallSetVector<Function *, 4> InlinedCallees;		SmallSetVector<Function *, 4> InlinedCallees;

// Track the dead functions to delete once finished with inlining calls. We		// Track the dead functions to delete once finished with inlining calls. We
// defer deleting these to make it easier to handle the call graph updates.		// defer deleting these to make it easier to handle the call graph updates.
SmallVector<Function *, 4> DeadFunctions;		SmallVector<Function *, 4> DeadFunctions;

// Loop forward over all of the calls.		// Loop forward over all of the calls.
while (!Calls.empty()) {		while (!Calls->empty()) {
// We expect the calls to typically be batched with sequences of calls that		// We expect the calls to typically be batched with sequences of calls that
// have the same caller, so we first set up some shared infrastructure for		// have the same caller, so we first set up some shared infrastructure for
// this caller. We also do any pruning we can at this layer on the caller		// this caller. We also do any pruning we can at this layer on the caller
// alone.		// alone.
Function &F = *Calls.front().first->getCaller();		Function &F = *Calls->front().first->getCaller();
LazyCallGraph::Node &N = *CG.lookup(F);		LazyCallGraph::Node &N = *CG.lookup(F);
if (CG.lookupSCC(N) != C) {		if (CG.lookupSCC(N) != C) {
Calls.pop();		Calls->pop();
continue;		continue;
}		}

LLVM_DEBUG(dbgs() << "Inlining calls in: " << F.getName() << "\n"		LLVM_DEBUG(dbgs() << "Inlining calls in: " << F.getName() << "\n"
<< " Function size: " << F.getInstructionCount()		<< " Function size: " << F.getInstructionCount()
<< "\n");		<< "\n");

auto GetAssumptionCache = [&](Function &F) -> AssumptionCache & {		auto GetAssumptionCache = [&](Function &F) -> AssumptionCache & {
return FAM.getResult<AssumptionAnalysis>(F);		return FAM.getResult<AssumptionAnalysis>(F);
};		};

// Now process as many calls as we have within this caller in the sequence.		// Now process as many calls as we have within this caller in the sequence.
// We bail out as soon as the caller has to change so we can update the		// We bail out as soon as the caller has to change so we can update the
// call graph and prepare the context of that new caller.		// call graph and prepare the context of that new caller.
bool DidInline = false;		bool DidInline = false;
while (!Calls.empty() && Calls.front().first->getCaller() == &F) {		while (!Calls->empty() && Calls->front().first->getCaller() == &F) {
auto &P = Calls.front();		auto P = Calls->pop();
Calls.pop();
CallBase *CB = P.first;		CallBase *CB = P.first;
const int InlineHistoryID = P.second;		const int InlineHistoryID = P.second;
		ChuanqiXu (inline comment, Done): It may be better to write `CallBase *CB = Calls->front().first; const int InlineHistoryID = Calls->front().second;`. Since we would call pop() immediately, the reference P would be invalid. A programmer might refer to P after `Calls->pop()`, which would be a disaster. Another option is to replace `auto &P =` with `auto P =`.
Function &Callee = *CB->getCalledFunction();		Function &Callee = *CB->getCalledFunction();

if (InlineHistoryID != -1 &&		if (InlineHistoryID != -1 &&
inlineHistoryIncludes(&Callee, InlineHistoryID, InlineHistory)) {		inlineHistoryIncludes(&Callee, InlineHistoryID, InlineHistory)) {
setInlineRemark(*CB, "recursive");		setInlineRemark(*CB, "recursive");
continue;		continue;
}		}

[... 51 lines not shown; still inside: while (!Calls->empty() && Calls->front().first->getCaller() == &F) { ...]
// the post-inline cleanup and the next DevirtSCCRepeatedPass		// the post-inline cleanup and the next DevirtSCCRepeatedPass
// iteration because the next iteration may not happen and we may		// iteration because the next iteration may not happen and we may
// miss inlining it.		// miss inlining it.
if (tryPromoteCall(*ICB))		if (tryPromoteCall(*ICB))
NewCallee = ICB->getCalledFunction();		NewCallee = ICB->getCalledFunction();
}		}
if (NewCallee)		if (NewCallee)
if (!NewCallee->isDeclaration())		if (!NewCallee->isDeclaration())
Calls.push({ICB, NewHistoryID});		Calls->push({ICB, NewHistoryID});
}		}
}		}

// Merge the attributes based on the inlining.		// Merge the attributes based on the inlining.
AttributeFuncs::mergeAttributesForInlining(F, Callee);		AttributeFuncs::mergeAttributesForInlining(F, Callee);

// For local functions, check whether this makes the callee trivially		// For local functions, check whether this makes the callee trivially
// dead. In that case, we can drop the body of the function eagerly		// dead. In that case, we can drop the body of the function eagerly
// which may reduce the number of callers of other functions to one,		// which may reduce the number of callers of other functions to one,
// changing inline cost thresholds.		// changing inline cost thresholds.
bool CalleeWasDeleted = false;		bool CalleeWasDeleted = false;
if (Callee.hasLocalLinkage()) {		if (Callee.hasLocalLinkage()) {
// To check this we also need to nuke any dead constant uses (perhaps		// To check this we also need to nuke any dead constant uses (perhaps
// made dead by this operation on other functions).		// made dead by this operation on other functions).
Callee.removeDeadConstantUsers();		Callee.removeDeadConstantUsers();
if (Callee.use_empty() && !CG.isLibFunction(Callee)) {		if (Callee.use_empty() && !CG.isLibFunction(Callee)) {
Calls.erase_if([&](const std::pair<CallBase *, int> &Call) {		Calls->erase_if([&](const std::pair<CallBase *, int> &Call) {
return Call.first->getCaller() == &Callee;		return Call.first->getCaller() == &Callee;
});		});
// Clear the body and queue the function itself for deletion when we		// Clear the body and queue the function itself for deletion when we
// finish inlining and call graph updates.		// finish inlining and call graph updates.
// Note that after this point, it is an error to do anything other		// Note that after this point, it is an error to do anything other
// than use the callee's address or delete it.		// than use the callee's address or delete it.
Callee.dropAllReferences();		Callee.dropAllReferences();
assert(!is_contained(DeadFunctions, &Callee) &&		assert(!is_contained(DeadFunctions, &Callee) &&
[... 150 lines not shown ...]
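
The review points above (have `pop()` return the element by value, keep `front()` as a const peek, and support `erase_if`) can be illustrated with a small standalone sketch. This is not the code from the patch: `CallSiteSketch` and `PriorityOrderSketch` are hypothetical names, the call site is reduced to a caller name plus a size, and "smaller size pops first" is assumed from the revision summary.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a queued call site. The real queue stores
// std::pair<CallBase *, int>; a caller name and a size are enough here.
struct CallSiteSketch {
  std::string Caller;
  int Size;            // assumed priority key: smaller is popped first
  int InlineHistoryID; // -1 means "not produced by an earlier inlining"
};

// Hypothetical heap-backed order, not the class added by the patch.
class PriorityOrderSketch {
  std::vector<CallSiteSketch> Heap;

  // The std heap algorithms keep the "largest" element in front, so the
  // comparison is inverted to bring the smallest Size to the top.
  static bool cmp(const CallSiteSketch &A, const CallSiteSketch &B) {
    return A.Size > B.Size;
  }

public:
  bool empty() const { return Heap.empty(); }

  void push(CallSiteSketch CS) {
    Heap.push_back(std::move(CS));
    std::push_heap(Heap.begin(), Heap.end(), cmp);
  }

  // Peek only: the reference is valid until the next push/pop/erase_if.
  const CallSiteSketch &front() const {
    assert(!Heap.empty() && "front() on empty queue");
    return Heap.front();
  }

  // Returns the element by value, so the caller never holds a reference
  // into storage that has already been popped.
  CallSiteSketch pop() {
    assert(!Heap.empty() && "pop() on empty queue");
    std::pop_heap(Heap.begin(), Heap.end(), cmp);
    CallSiteSketch Result = std::move(Heap.back());
    Heap.pop_back();
    return Result;
  }

  // Removing arbitrary elements breaks the heap property, so the heap is
  // rebuilt afterwards.
  template <typename Pred> void erase_if(Pred P) {
    Heap.erase(std::remove_if(Heap.begin(), Heap.end(), P), Heap.end());
    std::make_heap(Heap.begin(), Heap.end(), cmp);
  }
};
```

With this shape, the loop body can write `auto P = Calls->pop();` and use `P.first` and `P.second` freely afterwards, which is the pattern the right-hand side of the diff adopts. Rebuilding the heap in `erase_if` costs linear time, which seems acceptable here because it only runs when a callee becomes dead.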

llvm/test/Transforms/Inline/inline_call.ll

	; Check the optimizer doesn't crash at inlining the function top and all of its callees are inlined.			; Check the optimizer doesn't crash at inlining the function top and all of its callees are inlined.
	; RUN: opt < %s -O3 -S | FileCheck %s			; RUN: opt < %s -O3 -S | FileCheck %s
				; RUN: opt < %s -O3 -inline-enable-priority-order=true -S | FileCheck %s
				mtrofin (inline comment, Not Done): In this and the other test files, is there something different in the output from line 2 and line 3, so that you can add a check that enable-priority-order actually does something different?

	define dso_local void (...)* @second(i8** %p) {			define dso_local void (...)* @second(i8** %p) {
	entry:			entry:
	%p.addr = alloca i8**, align 8			%p.addr = alloca i8**, align 8
	store i8** %p, i8*** %p.addr, align 8			store i8** %p, i8*** %p.addr, align 8
	%tmp = load i8**, i8*** %p.addr, align 8			%tmp = load i8**, i8*** %p.addr, align 8
	%tmp1 = load i8*, i8** %tmp, align 8			%tmp1 = load i8*, i8** %tmp, align 8
	%tmp2 = bitcast i8* %tmp1 to void (...)*			%tmp2 = bitcast i8* %tmp1 to void (...)*
	[... 53 lines not shown ...]
	entry:			entry:
	%f = alloca void (...)*, align 8			%f = alloca void (...)*, align 8
	%call = call void (...)* @gen()			%call = call void (...)* @gen()
	store void (...)* %call, void (...)** %f, align 8			store void (...)* %call, void (...)** %f, align 8
	%tmp = load void (...)*, void (...)** %f, align 8			%tmp = load void (...)*, void (...)** %f, align 8
	call void @run(void (...)* %tmp)			call void @run(void (...)* %tmp)
	ret void			ret void
	}			}
	No newline at end of file			No newline at end of file

llvm/test/Transforms/Inline/inline_invoke.ll

	; RUN: opt < %s -inline -S | FileCheck %s			; RUN: opt < %s -inline -S | FileCheck %s
	; RUN: opt < %s -passes='cgscc(inline)' -S | FileCheck %s			; RUN: opt < %s -passes='cgscc(inline)' -S | FileCheck %s
				; RUN: opt < %s -passes='cgscc(inline)' -inline-enable-priority-order=true -S | FileCheck %s

	; Test that the inliner correctly handles inlining into invoke sites			; Test that the inliner correctly handles inlining into invoke sites
	; by appending selectors and forwarding _Unwind_Resume directly to the			; by appending selectors and forwarding _Unwind_Resume directly to the
	; enclosing landing pad.			; enclosing landing pad.

	;; Test 0 - basic functionality.			;; Test 0 - basic functionality.

	%struct.A = type { i8 }			%struct.A = type { i8 }
	[... 339 lines not shown ...]

llvm/test/Transforms/Inline/last-callsite.ll

	; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=0 -S | FileCheck %s			; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=0 -S | FileCheck %s
				; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=0 -inline-enable-priority-order=true -S | FileCheck %s

	; The 'test1_' prefixed functions test the basic 'last callsite' inline			; The 'test1_' prefixed functions test the basic 'last callsite' inline
	; threshold adjustment where we specifically inline the last call site of an			; threshold adjustment where we specifically inline the last call site of an
	; internal function regardless of cost.			; internal function regardless of cost.

	define internal void @test1_f() {			define internal void @test1_f() {
	entry:			entry:
	%p = alloca i32			%p = alloca i32
	[... 260 lines not shown ...]

Revision Contents

Diff 352448
