- Separate the user bonus into Latency and CodeSize.
- Do not re-count the bonus of an already-visited user (reached from other arguments).
- Limit the recursion depth when accumulating the user bonus.
- Penalize the latency of loop nests the deeper we get in the use-def chain.
- Penalize specialization on the same argument.
- Formulate the specialization score as a Latency/CodeSize ratio (an illustrative sketch follows this list).
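Since the summary is terse, here is a minimal, self-contained sketch of what scoring by a Latency/CodeSize ratio could look like. It is not the actual patch: SpecializationGain and isBeneficial are invented names, and the exact semantics of MinScore is an assumption for illustration.

```cpp
#include <cstdint>

struct SpecializationGain {
  uint64_t Latency;  // estimated latency savings from the specialization
  uint64_t CodeSize; // estimated code size growth from cloning the function
};

// Score the candidate as a Latency/CodeSize ratio and compare it against a
// minimum threshold, so clones with little benefit or large size growth
// are rejected.
static bool isBeneficial(const SpecializationGain &G, uint64_t MinScore) {
  if (G.CodeSize == 0)
    return G.Latency > 0; // no size growth: any latency saving is a win
  return G.Latency / G.CodeSize >= MinScore;
}
```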
Details
- Reviewers: ChuanqiXu, SjoerdMeijer, chill

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
Old cost model (LTO)
Compilation
FnSpecialization: Specialized 1 functions in module test-suite/CTMark/SPASS/terminator.c
FnSpecialization: Specialized 5 functions in module test-suite/CTMark/ClamAV/libclamav_message.c
FnSpecialization: Specialized 1 functions in module test-suite/CTMark/SPASS/rules-inf.c
FnSpecialization: Specialized 1 functions in module test-suite/CTMark/sqlite3/sqlite3.c
Linking
FnSpecialization: Specialized 1 functions in module ld-temp.o (ClamAV)
FnSpecialization: Specialized 1 functions in module ld-temp.o (sqlite3)
FnSpecialization: Specialized 3 functions in module ld-temp.o (lencod)
New cost model (LTO)
Compilation
FnSpecialization: Created 1 specializations in module test-suite/CTMark/ClamAV/libclamav_message.c
Linking
FnSpecialization: Created 2 specializations in module ld-temp.o (ClamAV)
FnSpecialization: Created 2 specializations in module ld-temp.o (sqlite3)
FnSpecialization: Created 3 specializations in module ld-temp.o (lencod)
Instruction Count % delta

Benchmark | % delta
---|---
ClamAV | +0.381
7zip | -0.012
tramp3d-v4 | -0.007
kimwitu++ | +0.097
sqlite3 | +0.170
mafft | +0.026
lencod | -0.054
SPASS | -0.673
consumer-typeset | +0.054
Bullet | -0.012
geomean | -0.003
(The specializations happening at link time seem to be affecting the total compilation time a lot more than those happening pre-link time).
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
140–141 | Oops, forgot to remove it from here.
Changes from last revision:
- Removed redefinition of SpecMap type from the header file.
- Removed @param Cost from the description of findSpecializations() in the header file.
- Decreased the value of MinScore from 100 to 80, since at 100 mcf_r from SPEC2017 was no longer specialized.
- Adjusted the test output for recursive-penalty.ll: decreasing MinScore from 100 to 80 translates to 10 instead of 8 specializations being created for the test.
I am wondering if you're trying to do too much in this patch. But I only had a quick look at this, and need to look again.
But my first request is going to be about the cost/bonus calculation. Since this is crucial for this transformation, it would be useful to document and comment the idea/algorithm in one place, with all the formulas and definitions (costs, bonus, latency, code size, etc.). I think this would greatly improve the readability of these changes and of the code in general.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Thanks for adding these descriptions, that is very helpful. Just trying to understand the initial code size calculation better, specifically the last part, [number of inline candidates] x [small function size]: why is that not the [number of instructions] for each inline candidate?
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
88 | It's not clear to me what this range exactly is.
90 | Nit: perhaps consider a more informative name for this.
92 | Nit: abreviation -> abbreviation

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | Nit: UM is a bit cryptic.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Because this information is not available in CodeMetrics. We don't know who that inline candidate is at this point, and if we did we would have to compute its CodeMetrics. So instead we use the small function size as an approximation. Knowing the exact number of instructions does not matter that much. What matters is to increase the code size hit if there are inline candidates, meaning the entire function might end up even larger than it is now.
88 | This typedef is unrelated to this patch, I just grouped all the typedefs here. Just ignore it.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | UniqueMap. It was already named as such before this patch, so it is unrelated. Please ignore it for this review.
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Thanks, if you can add this explanation to the comments that would help.
88 | Okay, so it lived somewhere else before and you just moved it here, but can you explain what the range is while we are at it?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | Naming is difficult, but it would be a shame if we cannot come up with something better while we are at it.
543–544 | Bit of duplication here with the comment above.
567–568 | We are not calculating one bonus anymore, but two values: latency and code size.
588 | I don't quite get why we are subtracting UserSize here. I guess it is something related to double counting, but after a quick look it wasn't clear to me; perhaps a comment would be helpful here.

llvm/test/Transforms/FunctionSpecialization/recursive-penalty.ll
10 | Do we want to check the resulting IR too?
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
88 | Isn't that clear from the comment above? We have a vector of specializations. In this vector the specializations of any one function are adjacent to one another, IOW they "form a contiguous range". A somewhat longer description of the data structures is given in the commit message of https://reviews.llvm.org/rGe6b9fc4c8be00660c5e1f1605a6a5d92475bdba7 (a rough illustrative sketch also follows this table).
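For readers without that commit message at hand, the following is a rough, hypothetical sketch of the layout being described: a single vector in which each function's specializations occupy one contiguous range. The type and member names (Spec, SpecCollection, addSpecsFor) are invented for illustration and do not mirror the actual LLVM code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Spec {
  std::string OriginalFn; // the function this specialization was cloned from
  // ... constant arguments, score, etc. elided ...
};

struct SpecCollection {
  std::vector<Spec> AllSpecs; // all specializations, grouped by function
  // Per-function [Begin, End) index range into AllSpecs.
  std::map<std::string, std::pair<size_t, size_t>> Range;

  // Appending all specializations of one function in one go keeps them
  // adjacent, i.e. they "form a contiguous range" in AllSpecs.
  void addSpecsFor(const std::string &Fn, std::vector<Spec> Specs) {
    size_t Begin = AllSpecs.size();
    for (Spec &S : Specs)
      AllSpecs.push_back(std::move(S));
    Range[Fn] = {Begin, AllSpecs.size()};
  }
};
```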
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
88 | No, sorry, it wasn't clear to me. This, however, is crystal clear to me:

As it also answers my follow-up question, i.e. what the relevance/rationale of this is, I think exactly this should be part of the comment/explanation here.

So I missed that review, i.e. didn't have time to look into it, but am now dipping back into FuncSpec, which is why I am missing some of the rationale. But I think this is an excellent write-up, and it, or parts of it, should be part of the write-up here to make this self-contained (i.e. it is a bit of a missed opportunity that this explanation is only in the commit message).
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
543–544 | Good point, I forgot to remove these comments.
567–568 | Will update the wording.
588 | We start from Cost CodeSize = Metrics.NumInsts + Metrics.NumInlineCandidates * MinFunctionSize; (see the invocation of getSpecializationBonus). Then we start decreasing it by the footprint of each user (a hedged sketch of this follows this table).

llvm/test/Transforms/FunctionSpecialization/recursive-penalty.ll
10 | Not really. All we care about in this test is to show that the number of specializations created is not linear in funcspec-max-iters.
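To make the reply to line 588 a bit more concrete, here is a hedged sketch of the code-size side of the calculation as described above. SimpleMetrics, estimateCodeSize and UserSizes are hypothetical names; only NumInsts, NumInlineCandidates and MinFunctionSize correspond to identifiers mentioned in the review.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct SimpleMetrics {
  uint64_t NumInsts;            // instruction count of the function
  uint64_t NumInlineCandidates; // calls that may later be inlined into it
};

static uint64_t estimateCodeSize(const SimpleMetrics &Metrics,
                                 const std::vector<uint64_t> &UserSizes,
                                 uint64_t MinFunctionSize) {
  // Start from the function's size plus a fixed penalty per inline candidate.
  uint64_t CodeSize =
      Metrics.NumInsts + Metrics.NumInlineCandidates * MinFunctionSize;
  // Then decrease it by the footprint of each user that is expected to
  // simplify away once the argument becomes a known constant.
  for (uint64_t UserSize : UserSizes)
    CodeSize -= std::min(UserSize, CodeSize); // clamp so we never underflow
  return CodeSize;
}
```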
llvm/include/llvm/Transforms/IPO/FunctionSpecialization.h
51 | Will do.
88 | I disagree with repeating the write-up in a comment, as it may become irrelevant at some point. I think git blame should suffice. I find the existing comment already quite extensive/informative.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
396 | Agreed. I'll update it.
I am having second thoughts about this change. I will merge the trivial NFC changes (variable renaming, type aliasing) and will raise new tickets for the remaining parts. So abandoning the cost model improvements for now.