This is an archive of the discontinued LLVM Phabricator instance.

[llvm][Inliner] Add an optional PriorityInlineOrder
ClosedPublic

Authored by taolq on Jun 10 2021, 6:34 AM.

Details

Summary

This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.

Diff Detail

Event Timeline

taolq created this revision.Jun 10 2021, 6:34 AM
taolq requested review of this revision.Jun 10 2021, 6:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2021, 6:34 AM
taolq retitled this revision from [WIP] use standard priority queue to order inlining to [WIP] Use standard priority queue to order inlining.Jun 10 2021, 6:46 AM
taolq edited the summary of this revision. (Show Details)
taolq added reviewers: mtrofin, kazu, teemperor.
mtrofin added inline comments.Jun 10 2021, 9:01 AM
llvm/lib/Transforms/IPO/Inliner.cpp
712

you can also delay pop-ing. Or, better, have pop return T, and front return reference and be a const function (i.e. a peek - you need it I think in the for loop, line 899).

823–828

add a flag to this file to control which InlineOrder you use.

ormris removed a subscriber: ormris.Jun 10 2021, 10:25 AM
taolq updated this revision to Diff 351890.Jun 14 2021, 8:40 AM
  • Add inline order option
  • modify pop() & front() API

With the introduction of the flag, there shouldn't be any more failing tests, right?

llvm/lib/Transforms/IPO/Inliner.cpp
103

nit: drop the second inline, simpler:

InlineEnablePriorityOrder
inline-enable-priority-order

823–828

nit: when the flag is enabled, you end up allocating an object just to drop it right after. You can just allocate the appropriate one on each branch of the if statement. Then, if you want, just assert after the if that CallsPtr is not null - this will help maintaining this later, when there are more than 2 ordering implementations, and it makes the design crystal-clear.

kazu added inline comments.Jun 14 2021, 10:21 AM
llvm/lib/Transforms/IPO/Inliner.cpp
823–828

nit: I would just leave the name as is -- "Calls". "CallsPtr" is a bit mouthful although I understand what you mean here.

taolq added a comment.Jun 14 2021, 5:32 PM

With the introduction of the flag, there shouldn't be any more failing tests, right?

Yes, all tests pass.

With the introduction of the flag, there shouldn't be any more failing tests, right?

Yes, all tests pass.

Worth changing the summary (and also the git commit - the latter gets copied to the former only the first time you arc diff )

taolq updated this revision to Diff 352054.Jun 15 2021, 12:20 AM
taolq marked 3 inline comments as done.
  • renaming option and Calls
  • allocate Calls once on the branch statement

I am tuning the performance by reordering inlining in downstream. My first try was to use std::priority_queue. But I tried to use the inline cost heuristic to order them. In this patch it looks like sort callsites by HistoryID? What's the intention?
BTW, for the performance side, SPEC2017 shows some regression with some improvement if we use std::priority_queue by InlineCost. It may be irrelevant to this patch.

PS2: Did you consider to move InlineOrder(s) class out of Inlined.cpp into the header and make it as a member of InlinedPass just like the Advisor? I guess it may be more easier to use them.

llvm/lib/Transforms/IPO/Inliner.cpp
768–779

I prefer to use std::vector<T> as data member instead of PriorityQueue. After that, the implementation may be simpler. For example:

swap all the required element to the end of the vector
std::make_heap(...)

And the implementation for pop and push wouldn't be much harder.

taolq added a comment.Jun 15 2021, 1:35 AM

I am tuning the performance by reordering inlining in downstream. My first try was to use std::priority_queue. But I tried to use the inline cost heuristic to order them. In this patch it looks like sort callsites by HistoryID? What's the intention?

Thanks for your comments. In this patch, callsites are sorted by callee size (see PriorityInlindeOrder::evaluate())

BTW, for the performance side, SPEC2017 shows some regression with some improvement if we use std::priority_queue by InlineCost. It may be irrelevant to this patch.

That sounds great! I would like to use more elaborate priority functions in the future, e.g. consider both inline costs, callee size, and other profile.

PS2: Did you consider to move InlineOrder(s) class out of Inlined.cpp into the header and make it as a member of InlinedPass just like the Advisor? I guess it may be more easier to use them.

It is defined in Inliner.cpp, because it is an abstraction meant for the inliner.

Thanks for your comments. In this patch, callsites are sorted by callee size (see PriorityInlindeOrder::evaluate())

Sorry, I missed that.

That sounds great! I would like to use more elaborate priority functions in the future, e.g. consider both inline costs, callee size, and other profile.

The main point is the regression. Calculate the inline cost for every callsite is costful. In other words, it grows the compile-time without significant improvements. (We could discuss this in other threads further, it may be irrelevant)

It is defined in Inliner.cpp, because it is an abstraction meant for the inliner.

I am fine to remain it in Inliner.cpp. But the reason may be a little weird for me since there many components of inlining didn't be put in inliner.cpp.

ChuanqiXu added inline comments.Jun 15 2021, 2:01 AM
llvm/lib/Transforms/IPO/Inliner.cpp
904–906

It may be better to:

CallBase *CB = Calls->front().first;
const int InlineHistoryID = Calls->front().second;

Since we would call pop() immediately, the reference P would be invalid. It may be possible that programmers may refer P after Calls->pop(), which is a disaster. Another option is to replace auto &P = with auto P =.

yurai007 added inline comments.
llvm/lib/Transforms/IPO/Inliner.cpp
727

nit: use 'final' to unlock devirtualization opportunity for overridden members.

827

nit: std::make_unique?

taolq updated this revision to Diff 352151.Jun 15 2021, 8:18 AM
taolq marked 2 inline comments as done.
  • replace implementation of PriorityInlineOrder from PriorityQueue to SmallVector
taolq edited the summary of this revision. (Show Details)Jun 15 2021, 8:25 AM
ChuanqiXu added inline comments.Jun 15 2021, 7:08 PM
llvm/lib/Transforms/IPO/Inliner.cpp
19

Now it is not necessary.

798–822

Since we add new order strategy, we need edit the comment too.

taolq retitled this revision from [WIP] Use standard priority queue to order inlining to Add an optional PriorityInlineOrder..Jun 16 2021, 8:24 AM
taolq edited the summary of this revision. (Show Details)
taolq retitled this revision from Add an optional PriorityInlineOrder. to Add an optional PriorityInlineOrder.
taolq updated this revision to Diff 352448.Jun 16 2021, 8:27 AM
taolq marked 4 inline comments as done.
taolq edited the summary of this revision. (Show Details)
  • remove useless header file
  • add tests for PriorityInlineOrder
kazu added a comment.Jun 16 2021, 10:43 AM

I am tuning the performance by reordering inlining in downstream. My first try was to use std::priority_queue. But I tried to use the inline cost heuristic to order them. In this patch it looks like sort callsites by HistoryID? What's the intention?

The intention here is to make it easy to try out different priority functions -- callee size, dynamic call count, impacts on callers, etc.

The main point is the regression. Calculate the inline cost for every callsite is costful. In other words, it grows the compile-time without significant improvements. (We could discuss this in other threads further, it may be irrelevant)

The compilation time is not a main concern while we are gathering insights.

mtrofin added inline comments.Jun 16 2021, 1:27 PM
llvm/test/Transforms/Inline/inline_call.ll
3

in this and the other test files, is there something different in the output from line 2 and line 3, so that you can add a check that enable-priority-order actually does something different?

I am tuning the performance by reordering inlining in downstream. My first try was to use std::priority_queue. But I tried to use the inline cost heuristic to order them. In this patch it looks like sort callsites by HistoryID? What's the intention?

The intention here is to make it easy to try out different priority functions -- callee size, dynamic call count, impacts on callers, etc.

The main point is the regression. Calculate the inline cost for every callsite is costful. In other words, it grows the compile-time without significant improvements. (We could discuss this in other threads further, it may be irrelevant)

The compilation time is not a main concern while we are gathering insights.

Yeah, I understood it. I comment this just because I find that someone are doing something similar with me so that I want to share something.

BTW: It'd better to add a tag before the title, like '[Inline]' or '[Inliner]'.

taolq retitled this revision from Add an optional PriorityInlineOrder to [llvm][Inliner] Add an optional PriorityInlineOrder.Jun 16 2021, 6:59 PM
taolq updated this revision to Diff 352747.Jun 17 2021, 9:01 AM
  • Add a test
kazu added inline comments.Jun 17 2021, 10:07 AM
llvm/lib/Transforms/IPO/Inliner.cpp
800–801

smaller

taolq updated this revision to Diff 352880.Jun 17 2021, 5:21 PM
taolq marked an inline comment as done.
  • fix typo
mtrofin added inline comments.Jun 17 2021, 5:27 PM
llvm/test/Transforms/Inline/monster_scc.ll
73 ↗(On Diff #352880)

These look like the old cases, though - is there a difference that the priority-based ordering introduces? (maybe I'm missing it)

132 ↗(On Diff #352880)

same comment - what's the difference?

taolq added inline comments.Jun 17 2021, 5:56 PM
llvm/test/Transforms/Inline/monster_scc.ll
73 ↗(On Diff #352880)

Actually, they are different. Each one, (old inline pass, new inline pass, PriorityInlineOrder) makes a different function body after inlining.
The inlined function name is hard to distinguish. You could also focus on the number of called functions.
e.g. with PriorityInlineOrder, this function has 7 call after inlining, more than the previous two cases.

132 ↗(On Diff #352880)

After inlining, the called functions are different.
These check the sequence of call statements.

mtrofin accepted this revision.Jun 17 2021, 6:02 PM

lgtm

This revision is now accepted and ready to land.Jun 17 2021, 6:02 PM
This revision was landed with ongoing or failed builds.Jun 18 2021, 2:00 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a subscriber: fhahn.Jun 18 2021, 3:07 AM

It looks like this change may cause the following failure on GreenDragon. It would be great if you could take a look.

ommand Output (stderr):
--
+ : 'RUN: at line 42'
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -inline -inline-threshold=150 -enable-new-pm=0
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,OLD
+ : 'RUN: at line 43'
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -passes=inline -inline-threshold=150
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,NEW
+ : 'RUN: at line 44'
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -passes=inline -inline-threshold=150 -inline-enable-priority-order=true
+ /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,PO
/Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll:76:7: error: PO: expected string not found in input
; PO: call void @_Z1fILb1ELi2EEvPbS0_(
      ^
<stdin>:19:34: note: scanning from here
 call void @_Z1fILb1ELi1EEvPbS0_(i8* %add.ptr2, i8* %E)
                                 ^
<stdin>:23:2: note: possible intended match here
 call void @_Z1fILb0ELi1EEvPbS0_(i8* %add.ptr2, i8* %E)
 ^
/Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll:137:7: error: PO: expected string not found in input
; PO: call void @_Z1fILb0ELi1EEvPbS0_(
      ^
<stdin>:43:34: note: scanning from here
 call void @_Z1fILb1ELi1EEvPbS0_(i8* %add.ptr2, i8* %E)
                                 ^
<stdin>:72:2: note: possible intended match here
 call void @_Z1fILb0ELi3EEvPbS0_(i8* %add.ptr2.i9, i8* %E)
 ^

https://green.lab.llvm.org/green/job/clang-stage1-RA/21735/consoleFull#2293633728254eaf0-7326-4999-85b0-388101f2d404

thakis added a subscriber: thakis.Jun 18 2021, 3:45 AM

Looks like this breaks tests on mac: http://45.33.8.238/macm1/11921/step_11.txt

Please take a look and revert for now if it takes a while to fix.

taolq reopened this revision.Jun 18 2021, 6:47 PM
This revision is now accepted and ready to land.Jun 18 2021, 6:47 PM
taolq updated this revision to Diff 353149.EditedJun 18 2021, 6:48 PM
  • remove test monster_scc.ll which has a different output in macOS
This revision was landed with ongoing or failed builds.Jun 18 2021, 7:20 PM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Jun 19 2021, 1:44 PM
  • remove test monster_scc.ll which has a different output in macOS

Could you elaborate *why* it has a different output? Not sure if removing the test because it fails on macOS is the best way forward. It would be good to understand *why* there's a difference first.

  • remove test monster_scc.ll which has a different output in macOS

Could you elaborate *why* it has a different output? Not sure if removing the test because it fails on macOS is the best way forward. It would be good to understand *why* there's a difference first.

I don't know the reason yet. I will figure it out.

wenlei added a subscriber: wenlei.Aug 30 2021, 2:01 PM
MTC added a subscriber: MTC.Aug 30 2021, 7:11 PM