[FuncSpec] Support function specialization across multiple arguments.
ClosedPublic

Authored by labrinea on Feb 15 2022, 12:10 PM.

Details

Summary

The current implementation of Function Specialization does not allow specializing more than one argument per function call; this patch lifts that limitation.

My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones.

MapVector almost fits the bill, but not quite: erasing elements is expensive, and stable_sort invalidates the indices into the underlying vector. I am therefore using the underlying vector directly after calculating the Gain.
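
For illustration, a minimal sketch of that approach (the type and field names below are placeholders, not the patch's exact code): gains are accumulated in a MapVector keyed by the call site, and once computed the underlying vector is taken and sorted directly, so the stale key-to-index table is never consulted again.

#include <cstdint>
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"

// Hypothetical record for one candidate specialization: the constant actual
// arguments bound to one call site, plus the estimated Gain.
struct SpecInfo {
  llvm::SmallVector<llvm::Constant *, 8> ActualArgs;
  int64_t Gain = 0;
};

// Once the gains are known, work on the flat vector directly so the
// (now stale) key-to-index table of the MapVector is never used again.
void rankByGain(llvm::MapVector<llvm::CallBase *, SpecInfo> &Specializations) {
  auto Flat = Specializations.takeVector(); // clears the map, returns the vector
  llvm::stable_sort(Flat, [](const auto &L, const auto &R) {
    return L.second.Gain > R.second.Gain; // most profitable first
  });
  // ... clone the most profitable entries, drop the rest ...
}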

Diff Detail

Event Timeline

labrinea updated this revision to Diff 412828.Mar 3 2022, 1:53 PM
labrinea edited the summary of this revision. (Show Details)

Changes from last revision:

  • bring back the penalty in function cost estimation (will factor out in a separate review)
  • rebased
samtebbs added inline comments.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
421–423

The ActualArgs and FormalArg similarity is a little confusing IMO. It might be useful to call them Arguments/Args (what the values passed to a function are called) and Parameters respectively, as that would make the difference clear.

labrinea added inline comments.Mar 9 2022, 3:02 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
421–423

I think we've established the naming here: https://reviews.llvm.org/D119874 (see the definition of ArgInfo). Wikipedia seems to agree on this: https://en.wikipedia.org/wiki/Parameter_(computer_programming).

samtebbs added inline comments.Mar 10 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
421–423

Sure. Perhaps other people would also have had to look extra hard because of the double use of "Arguments", but if that is an accepted convention then that's OK.

Sorry for the delay. I need to look a bit more at this, but I added some first thoughts inline.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
87

A few comments about this:

  • First, thanks for restoring the penalty; I like doing things one step at a time. :)
  • Can you add comments explaining what this is, e.g. why it is 10000, and more generally what the rationale is and what it achieves?
  • Can you also document how this interacts with other options, like MaxClonesThreshold?
426

I probably need to think a bit more about this, but perhaps you can help. Now that we are considering multiple arguments, we calculate a bonus for each of them and accumulate it here. I am wondering whether summing the bonuses is the right model. Alternatives would be taking the average, taking the minimum/maximum, or something else. Do you have any thoughts on this?

434

Can we just return after this, so we don't need the else and can decrease the indentation?

Hey Sjoerd, thanks for picking this up again. I do have an idea, so please wait for my new revision. The idea is to be able to choose between:

  • a positive MinGainThreshold (the same as this revision works today, with the only difference that the value would be user defined; the default will be zero)
  • a zero MinGainThreshold (which will be the default), in which case the Specializations are sorted by Gain and "some" of them are rejected. I am still trying to come up with a formula for the number of specializations to reject.

I will add comments about the interaction of options as you suggest.
Regarding the bonus calculation, I believe that accumulating the bonus of each argument is the right thing to do, because each one of them contributes to the benefits of specialization (inlining etc). I also think that the more we complicate the model (measuring average, min, max), the slower the pass becomes.
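
As a rough sketch of that accumulation (getSpecializationBonus here is just a stand-in declaration and the types are simplified; this is not the patch's exact code):

#include <cstdint>
#include <utility>
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Constants.h"

// Stand-in for the pass's per-argument estimate (inlining, folded branches, ...).
int64_t getSpecializationBonus(llvm::Argument *Formal, llvm::Constant *Actual);

// The gain of one candidate is the sum of the bonuses contributed by each
// constant argument, minus the one-off cost of cloning the function.
int64_t computeGain(
    llvm::ArrayRef<std::pair<llvm::Argument *, llvm::Constant *>> Args,
    int64_t Cost) {
  int64_t Bonus = 0;
  for (const auto &[Formal, Actual] : Args)
    Bonus += getSpecializationBonus(Formal, Actual);
  return Bonus - Cost; // only a positive gain justifies a clone
}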

labrinea added inline comments.Mar 10 2022, 5:41 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
434

We would like to print the debug info that follows. It's useful even when the cost model is disregarded.

labrinea updated this revision to Diff 414382.Mar 10 2022, 8:22 AM

Changes from last revision:

  • Zero is now the default value for MinGainThreshold
  • The Specializations are sorted by Gain and the lower half gets rejected
SjoerdMeijer added inline comments.Mar 11 2022, 12:46 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
321

This isn't clear to me; I don't see how we sort things. First, we add things to a map here...

334

... and then we iterate over half the elements in the map here.

Does the sorting rely on the keys of the map? I'm not sure that's the cleanest way of doing it; I think it could be a one-liner using llvm::stable_sort.

labrinea added inline comments.Mar 11 2022, 1:24 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
334

The std::map guarantees ordering of elements based on the key (which is the Gain here). Stable sort requires a container with random-access iterators. SpecializationMap is SmallMapVector, which might have random access, but Map is SmallDenseMap (a hash table). On top of that we have nested structures (a map containing maps), so we can't sort everything all together with stable sort, right?
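
Roughly the shape of the map-based ordering being discussed, as a sketch with placeholder types (a multimap is used here only so that candidates with equal Gain don't collide; the patch's code may differ):

#include <cstddef>
#include <cstdint>
#include <map>

struct SpecBatch { /* placeholder for one function's SpecializationMap */ };

// Keying the candidates by Gain gives a sorted iteration order for free;
// walking from the largest key downwards and stopping halfway keeps the most
// profitable half of the candidates.
void keepTopHalf(const std::multimap<int64_t, SpecBatch> &ByGain) {
  size_t NumKept = (ByGain.size() + 1) / 2;
  auto It = ByGain.rbegin(); // largest Gain first
  for (size_t I = 0; I < NumKept && It != ByGain.rend(); ++I, ++It) {
    // ... specialize the candidates stored in It->second ...
  }
}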

labrinea added a comment.EditedMar 11 2022, 6:48 AM

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;. The idea is to keep compilation times down in case a source file results in a large number of candidate specializations (I haven't seen any, but you never know).

Here are some values of f(x) = log10(x^4) + 1, rounded down to an integer:
f(1)    = 1
f(2)    = 2
f(3)    = 2
f(4)    = 3
f(5)    = 3
f(6)    = 4
...
f(10)   = 5
...
f(100)  = 9
...
f(1000) = 13
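
For illustration, here is roughly how such a cap could be applied to an already sorted candidate list (the container and names are placeholders, not the patch's code):

#include <cmath>
#include <cstddef>
#include <vector>

// Keep only about log10(n^4) + 1 of the n sorted candidates, so the number of
// clones grows sub-linearly with the number of candidates.
template <typename SpecT>
void capSpecializations(std::vector<SpecT> &Sorted) {
  if (Sorted.empty())
    return;
  size_t NumSpecKept =
      (size_t)std::log10(std::pow((double)Sorted.size(), 4)) + 1;
  if (Sorted.size() > NumSpecKept)
    Sorted.resize(NumSpecKept); // assumes most profitable entries come first
}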

Posting some statistics from running the llvm test suite at -O3 on AArch64:

test name                                     | num specializations (one arg) | mult args (sorted: keep n/2) | mult args (sorted: keep log10(n^4)+1)
MultiSource/Applications/ClamAV/clamscan.test |               3               |              3               |                   4
MultiSource/Applications/d/make_dparser.test  |               1               |              0               |                   1
MultiSource/Applications/oggenc/oggenc.test   |               2               |              1               |                   2
MultiSource/Applications/sqlite3/sqlite3.test |               3               |              3               |                   4

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;.

Not sure if I'm missing something, but I don't see this in the code. Also, why this formula? What is the rationale?

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve; see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Regarding your previous question: in order to use stable_sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain the specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap corresponding to a single function, so this would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve; see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Oh okay, got it.

Regarding your previous question: in order to use stable_sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain the specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap corresponding to a single function, so this would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

Okay, cheers, I will leave it up to you then. I expect both approaches to be very close in terms of compile times; algorithmically it is the same, I think, so it's more about readability of the code.

I am now going to read the whole patch again.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
118–119

Nit: we might as well drop the 8 here. From the programmer's manual:

In the absence of a well-motivated choice for the number of inlined elements N, it is recommended to use SmallVector<T> (that is, omitting the N).

321

If you keep this version, can you at least add a comment that this is sorting the candidates using a map.

Sorry for the late reply.

From the implementation, the main change is the introduction of MinGainThreshold. If MinGainThreshold is zero, the candidates are sorted and the lower half is erased; do I understand that right? And if MinGainThreshold is not zero, all candidates whose benefit is below it are erased. The idea itself looks reasonable to me, but I would like to see an upper bound on the number of specialized functions; there isn't one in the current version, right? I am happy with the idea of auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1; or anything else, and I am OK with implementing it in follow-up patches, but we do need one.


Beyond this patch, I think the main concern is still the cost model. Everything we have discussed in this revision is at a high level: how to filter the candidates. But I think the key problem might be the cost model itself, i.e. why the benefit or cost comes out to a particular number for a given function and corresponding constant argument. The original model is relatively simple and I think there is a lot of room to improve it.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
103

We should add a link to Nikic's compile-time tracker; there might be readers who don't know about it.

301–302

I agree that the choice of data structure here is not the cleanest. From the implementation, I feel Map could be a sorted heap (or PriorityQueue). Here are my thoughts:

llvm::PriorityQueue<std::pair<Function *, SpecializationMap>> Map; // We need a self-defined compare class here.
                                                                   // Maybe we need a new name.

And we could remove Sorted.

Then we could insert each valid pair into Map, which keeps the candidates sorted, and then only pop half of them.

It is also OK to use a vector-like container and build the heap by hand with the heap APIs in std. The benefit would be random access into the container (so we could avoid popping elements one by one).
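
A rough sketch of the heap-by-hand variant of this suggestion (the Candidate type is a placeholder; the point is the comparator and the standard heap APIs):

#include <algorithm>
#include <cstdint>
#include <vector>

// Placeholder for one (function, specializations) candidate and its gain.
struct Candidate {
  int64_t Gain = 0;
  // Function *F; SpecializationMap Specs; ...
};

// Keep candidates in a plain vector and use the standard heap algorithms, so
// random access stays available and the least profitable half can be dropped
// without popping elements one at a time.
void keepMostProfitable(std::vector<Candidate> &Cands) {
  auto ByGain = [](const Candidate &L, const Candidate &R) {
    return L.Gain < R.Gain; // max-heap on Gain
  };
  std::make_heap(Cands.begin(), Cands.end(), ByGain);
  std::sort_heap(Cands.begin(), Cands.end(), ByGain); // ascending by Gain
  Cands.erase(Cands.begin(), Cands.begin() + Cands.size() / 2); // drop lower half
}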

314–315

We shouldn't insert F into Map directly, since it is possible that Specializations is not valid.

428–456

The two-stage construction of Specializations doesn't look great: it first fills Specializations and then removes the unwanted entries, which is unnecessarily wasteful. We should check before inserting.

labrinea added inline comments.Mar 15 2022, 3:31 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
103

Will do.

118–119

I tried to use realistic numbers in all the Small ADTs in this patch. I don't think we'll ever get to a specialization with more than 8 arguments.

301–302

Thanks for the input. I have a working revision which does what I suggested in my last comment to Sjoerd - a flattened SpecializationMap that contains entries for multiple functions, which I am then sorting with stable sort. I'll upload it soon.

314–315

Good catch. In an improved version of this revision I removed the entry inside the following if (Specializations.empty()) { block, but as I said, I will upload a different approach.

428–456

I am not sure that's possible, as we only know whether the Gain is above the threshold after we've accumulated the bonus from each argument.

ChuanqiXu added inline comments.Mar 15 2022, 3:57 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

It is worth giving this a new name; the current name is not very clear.

428–456

It should be possible to implement if we add some new local variables. There might be some extra copies: on the one hand we get better readability, on the other hand the extra copies might be significant, I think.

labrinea added inline comments.Mar 15 2022, 8:30 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

I can't think of a better name. Any ideas?

428–456

I just read the implementation of erase() in MapVector and it's linear in the number of entries :( However, it should be possible to use remove_if() instead, which is also linear but can erase multiple elements in a single pass.
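
For illustration, roughly what that single pass could look like with MapVector's remove_if (the record type and threshold are placeholders, not the patch's code):

#include <cstdint>
#include "llvm/ADT/MapVector.h"
#include "llvm/IR/InstrTypes.h"

// Hypothetical per-call-site record carrying the computed Gain.
struct SpecInfo {
  int64_t Gain = 0;
};

// Drop every unprofitable candidate in a single pass instead of paying the
// linear cost of erase() once per entry.
void pruneUnprofitable(llvm::MapVector<llvm::CallBase *, SpecInfo> &Specs,
                       int64_t MinGainThreshold) {
  Specs.remove_if([&](const auto &Entry) {
    return Entry.second.Gain <= MinGainThreshold;
  });
}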

labrinea updated this revision to Diff 415477.Mar 15 2022, 9:45 AM
labrinea edited the summary of this revision. (Show Details)

I decided to keep this patch as close as possible to the original implementation, leaving the improvements to the cost model for later patches. That said, MinGainThreshold has been withdrawn, as has the sorting of specializations across multiple functions.

labrinea added inline comments.Mar 16 2022, 3:47 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
469–470

This looks suboptimal. I will try to write an erase method for MapVector which takes an iterator range.

Just found another showstopper: we can't use stable_sort on MapVector because the underlying DenseMap, which holds the vector indices, would be left outdated :/

labrinea updated this revision to Diff 417350.Mar 22 2022, 11:21 AM
labrinea edited the summary of this revision. (Show Details)
labrinea edited the summary of this revision. (Show Details)Mar 22 2022, 12:59 PM
ChuanqiXu added inline comments.Mar 22 2022, 8:48 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

Oh, I am not good at naming either, but I am sure the current name is relatively poor. For example, the original ConstList was SmallVector<Constant*>, which makes sense, but now it is SmallVector<std::pair<CallBase *, Constant *>> and the name is still ConstList... SpecializationList and SpecializationMap have the same problem...

@SjoerdMeijer do you have any suggestion?

308

We could submit such changes standalone as an NFC patch.

329

I think this change is not necessary, and we should keep the original Changed variable. Since specializeFunctions is called multiple times, NbFunctionsSpecialized might remain greater than 0 the whole time, so the return value of specializeFunctions wouldn't be right.

333–335

Unnecessary change.

461–466

Could we sort directly on MapVector? I see MapVector implements both swap and operator[], so it looks possible to sort the MapVector directly.

Then could we remove WorkList?

labrinea added inline comments.Mar 23 2022, 1:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

Side note: it would be nice if I could at least define SpecializationList (or whatever we decide to call it) as SpecializationMap::VectorType, but unfortunately VectorType is private in MapVector.

329

Good point, I forgot specializeFunctions is being called inside a loop without resetting NbFunctionsSpecialized first. I'll fix this.

333–335

clang-format made this change.

461–466

We could sort directly on the MapVector, yes, but it's the same thing. I think we shouldn't get rid of the WorkList, because the MapVector's indices are outdated after this point; if we used the [] operator anywhere, the value it returned would be wrong. This is hard to observe and we could end up dealing with issues that go unnoticed.

Also, we wouldn't be able to erase the excess specializations at the end of the list. If https://reviews.llvm.org/D121817 gets accepted we might be able to, but even then doing it directly on a vector is cheaper.

ChuanqiXu added inline comments.Mar 23 2022, 1:38 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
333–335

We could format only the diff; that is what we generally do.

461–466

To get D121817 accepted, I think you need to comment on that thread; I don't think there are any blocking comments now. For the problem of sorting a MapVector, I think adding a member function sort() to MapVector would be an acceptable solution, and I imagine it wouldn't be hard to implement.
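
As a sketch of the idea (not an existing LLVM API), the effect of such a sort() can already be had through MapVector's public interface by re-inserting the sorted entries, which keeps the key-to-index table valid:

#include <utility>
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"

// Sort a MapVector by rebuilding it from its sorted underlying vector. This is
// only a sketch; no such helper exists in LLVM today.
template <typename KeyT, typename ValueT, typename Compare>
void sortMapVector(llvm::MapVector<KeyT, ValueT> &MV, Compare Cmp) {
  auto Entries = MV.takeVector();  // clears MV and returns the flat vector
  llvm::stable_sort(Entries, Cmp); // Cmp compares std::pair<KeyT, ValueT>
  for (auto &E : Entries)
    MV.insert(std::move(E));       // re-inserting rebuilds the index map
}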

But from the context of the current patch, I found we don't need to use MapVector at all: the first element of the map's pair isn't used anywhere. I guess this is because the patch got split. So I think we don't need to solve the problem for MapVector now; we can solve it once we need it.

labrinea added inline comments.Mar 23 2022, 2:12 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
333–335

Alright.

461–466

As I explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call, but that is only needed during the gain calculation phase. After that we only care about iterating over the specializations found, so I don't see the need to update the map indices while sorting the vector; it is expensive and, in my opinion, unnecessary.

I found we don't need to use MapVector at all: the first element of the map's pair isn't used anywhere.

I am not following. Could you please explain?

ChuanqiXu added inline comments.Mar 23 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
461–466

As I explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call, but that is only needed during the gain calculation phase. After that we only care about iterating over the specializations found, so I don't see the need to update the map indices while sorting the vector; it is expensive and, in my opinion, unnecessary.

I found we don't need to use MapVector at all: the first element of the map's pair isn't used anywhere.

I am not following. Could you please explain?

Oh, I guess that was an oversight on my part. I missed that you'd updated the code after you said you can't sort on MapVector. Now it looks good to me.

labrinea updated this revision to Diff 417867.Mar 24 2022, 3:23 AM

Changes in this revision:

  • separated unrelated clang-format changes to NFC patch
  • renamed the data types for the ADT
  • added an explanation comment in the new test file
  • rebased
ChuanqiXu accepted this revision.Mar 24 2022, 7:15 PM

LGTM basically. Please wait for a few days in case there are other comments.

llvm/lib/Transforms/Utils/SCCPSolver.cpp
542

Could we use Iter->Formal?

This revision is now accepted and ready to land.Mar 24 2022, 7:15 PM

It's looking good to me too, just a last round of nits (inlined) and one question just to double check: is the SPEC score the same? I.e., do we still specialise what we want to specialise?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

Sorry, bikeshedding names: if we group the caller (CallBase) and a constant actual argument, is ActualArgPair an accurate description? We don't group 2 actual args, which is what this name is suggesting to me.

429

Nit: can 0 - Cost just be -Cost?

438

Another nit: if the gain is 0, then arguably it is not a gain and we increase code-size?

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

Perhaps you want to turn this into a test, i.e. match the debug output too? If so, that would require asserts. Not sure, but this is probably best done as an additional, separate test; otherwise we might lose the testing of this for non-assert builds and bots.

labrinea added inline comments.Mar 25 2022, 12:08 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

I agree it's not a good type name. Any ideas? Maybe it's not really necessary to define one.

429

We can't, because the InstructionCost class does not implement the unary minus operator.

438

We are doing the same as before (see line 423 on the left): we reject specializations whose Gain is less than or equal to zero. How does this increase code size?

llvm/lib/Transforms/Utils/SCCPSolver.cpp
542

possibly

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

This is a comment; it does not contain check lines. The whole purpose is to show that specializations do get sorted and iterated correctly.

I wouldn't mind adding a test for the debug output in a follow-up patch. I couldn't find examples and am not sure how to write one; any pointers appreciated.

This revision was landed with ongoing or failed builds.Mar 28 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.