This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
With the introduction of the flag, there shouldn't be any more failing tests, right?
llvm/lib/Transforms/IPO/Inliner.cpp | ||
---|---|---|
104 | nit: drop the second inline, simpler: InlineEnablePriorityOrder | |
824 | nit: when the flag is enabled, you end up allocating an object just to drop it right after. You can just allocate the appropriate one on each branch of the if statement. Then, if you want, just assert after the if that CallsPtr is not null - this will help maintaining this later, when there are more than 2 ordering implementations, and it makes the design crystal-clear. |
llvm/lib/Transforms/IPO/Inliner.cpp | ||
---|---|---|
824 | nit: I would just leave the name as is -- "Calls". "CallsPtr" is a bit mouthful although I understand what you mean here. |
Worth changing the summary (and also the git commit - the latter gets copied to the former only the first time you arc diff )
I am tuning the performance by reordering inlining in downstream. My first try was to use std::priority_queue. But I tried to use the inline cost heuristic to order them. In this patch it looks like sort callsites by HistoryID? What's the intention?
BTW, for the performance side, SPEC2017 shows some regression with some improvement if we use std::priority_queue by InlineCost. It may be irrelevant to this patch.
PS2: Did you consider to move InlineOrder(s) class out of Inlined.cpp into the header and make it as a member of InlinedPass just like the Advisor? I guess it may be more easier to use them.
llvm/lib/Transforms/IPO/Inliner.cpp | ||
---|---|---|
763–774 | I prefer to use std::vector<T> as data member instead of PriorityQueue. After that, the implementation may be simpler. For example: swap all the required element to the end of the vector std::make_heap(...) And the implementation for pop and push wouldn't be much harder. |
Thanks for your comments. In this patch, callsites are sorted by callee size (see PriorityInlindeOrder::evaluate())
BTW, for the performance side, SPEC2017 shows some regression with some improvement if we use std::priority_queue by InlineCost. It may be irrelevant to this patch.
That sounds great! I would like to use more elaborate priority functions in the future, e.g. consider both inline costs, callee size, and other profile.
PS2: Did you consider to move InlineOrder(s) class out of Inlined.cpp into the header and make it as a member of InlinedPass just like the Advisor? I guess it may be more easier to use them.
It is defined in Inliner.cpp, because it is an abstraction meant for the inliner.
Sorry, I missed that.
That sounds great! I would like to use more elaborate priority functions in the future, e.g. consider both inline costs, callee size, and other profile.
The main point is the regression. Calculate the inline cost for every callsite is costful. In other words, it grows the compile-time without significant improvements. (We could discuss this in other threads further, it may be irrelevant)
It is defined in Inliner.cpp, because it is an abstraction meant for the inliner.
I am fine to remain it in Inliner.cpp. But the reason may be a little weird for me since there many components of inlining didn't be put in inliner.cpp.
llvm/lib/Transforms/IPO/Inliner.cpp | ||
---|---|---|
902–904 | It may be better to: CallBase *CB = Calls->front().first; const int InlineHistoryID = Calls->front().second; Since we would call pop() immediately, the reference P would be invalid. It may be possible that programmers may refer P after Calls->pop(), which is a disaster. Another option is to replace auto &P = with auto P =. |
The intention here is to make it easy to try out different priority functions -- callee size, dynamic call count, impacts on callers, etc.
The compilation time is not a main concern while we are gathering insights.
llvm/test/Transforms/Inline/inline_call.ll | ||
---|---|---|
3 ↗ | (On Diff #352448) | in this and the other test files, is there something different in the output from line 2 and line 3, so that you can add a check that enable-priority-order actually does something different? |
Yeah, I understood it. I comment this just because I find that someone are doing something similar with me so that I want to share something.
BTW: It'd better to add a tag before the title, like '[Inline]' or '[Inliner]'.
llvm/lib/Transforms/IPO/Inliner.cpp | ||
---|---|---|
802 | smaller |
llvm/test/Transforms/Inline/monster_scc.ll | ||
---|---|---|
73 ↗ | (On Diff #352880) | Actually, they are different. Each one, (old inline pass, new inline pass, PriorityInlineOrder) makes a different function body after inlining. |
132 ↗ | (On Diff #352880) | After inlining, the called functions are different. |
It looks like this change may cause the following failure on GreenDragon. It would be great if you could take a look.
ommand Output (stderr): -- + : 'RUN: at line 42' + /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -inline -inline-threshold=150 -enable-new-pm=0 + /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,OLD + : 'RUN: at line 43' + /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -passes=inline -inline-threshold=150 + /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,NEW + : 'RUN: at line 44' + /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/opt -S -passes=inline -inline-threshold=150 -inline-enable-priority-order=true + /Users/buildslave/jenkins/workspace/clang-stage1-RA/clang-build/bin/FileCheck /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll --check-prefixes=CHECK,PO /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll:76:7: error: PO: expected string not found in input ; PO: call void @_Z1fILb1ELi2EEvPbS0_( ^ <stdin>:19:34: note: scanning from here call void @_Z1fILb1ELi1EEvPbS0_(i8* %add.ptr2, i8* %E) ^ <stdin>:23:2: note: possible intended match here call void @_Z1fILb0ELi1EEvPbS0_(i8* %add.ptr2, i8* %E) ^ /Users/buildslave/jenkins/workspace/clang-stage1-RA/llvm-project/llvm/test/Transforms/Inline/monster_scc.ll:137:7: error: PO: expected string not found in input ; PO: call void @_Z1fILb0ELi1EEvPbS0_( ^ <stdin>:43:34: note: scanning from here call void @_Z1fILb1ELi1EEvPbS0_(i8* %add.ptr2, i8* %E) ^ <stdin>:72:2: note: possible intended match here call void @_Z1fILb0ELi3EEvPbS0_(i8* %add.ptr2.i9, i8* %E) ^
Looks like this breaks tests on mac: http://45.33.8.238/macm1/11921/step_11.txt
Please take a look and revert for now if it takes a while to fix.
Could you elaborate *why* it has a different output? Not sure if removing the test because it fails on macOS is the best way forward. It would be good to understand *why* there's a difference first.
Now it is not necessary.