This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Support function specialization across multiple arguments.
ClosedPublic

Authored by labrinea on Feb 15 2022, 12:10 PM.

Details

Summary

The current implementation of Function Specialization does not allow specializing more than one argument per function call, which is a limitation I am lifting with this patch.

My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones.

MapVector almost fits the bill, but not quite: erasing elements is expensive, and using stable_sort messes up the indices into the underlying vector. I am therefore using the underlying vector directly after calculating the Gain.
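As a rough illustration of "using the underlying vector directly" (a minimal sketch, assuming a Spec value type that carries the computed Gain; the type and function names are illustrative, not the patch's exact code):

#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/InstrTypes.h"

struct Spec { unsigned Gain; /* bound actual arguments, etc. */ };

void pickMostProfitable(llvm::MapVector<llvm::CallBase *, Spec> &Specs) {
  // Once every Gain is known the associative lookup is no longer needed, so
  // move out the underlying vector and sort that instead of the MapVector.
  auto Sorted = Specs.takeVector();
  llvm::stable_sort(Sorted, [](const auto &L, const auto &R) {
    return L.second.Gain > R.second.Gain; // most profitable first
  });
  // ... keep the front of Sorted, drop the least profitable tail ...
}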

Diff Detail

Event Timeline

labrinea created this revision.Feb 15 2022, 12:10 PM
labrinea requested review of this revision.Feb 15 2022, 12:10 PM
Herald added a project: Restricted Project. Feb 15 2022, 12:10 PM

There is quite a lot to unpack here:

This patch makes a significant change in the cost model as it no longer seems sensible to calculate the specialization gain per function argument, but rather as a whole. I've also lifted two arbitrary limitations around the specialization selection: the penalty in cost estimation for newly discovered functions, and the truncation of clones for a given function.

My request would be to split this up and do one thing at a time if possible. There also seems to be a bit of refactoring and NFC changes in here, probably also best split off.

Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to the previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (I don't think it should be recorded/included as a code comment).

Clarifying my previous comment a bit more:

Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to the previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (I don't think it should be recorded/included as a code comment).

Compile times are important of course. But what I want to say is that we should aim to lift some of these arbitrary restrictions, like you mentioned, by providing new options/ways to control things, while staying as NFC or as close to the original behaviour as possible. That was tuned to specialise very infrequently and only in special cases, so any lifting of restrictions will increase compile times. Thus, the way I look at this is that you put the infrastructure in place, so that perhaps later we can change things, or decisions can be manually overridden.

fhahn added a comment.Feb 16 2022, 3:11 AM

I have measured compilation times with the pass enabled/disabled by default using instruction count as the metric following these:

Could you share the link to the actual comparison (there's a C link on the left side for each commit on the overview page)? From the numbers you posted it is not clear which configuration those numbers are for (e.g. O3 + NewPM, ReleaseLTO + g + NewPM).

! In D119880#3325744, @fhahn wrote:
Could you share the link to the actual comparison (there's a C link on the left side for each commit on the overview page)? From the numbers you posted it is not clear which configuration those numbers are for (e.g. O3 + NewPM, ReleaseLTO + g + NewPM).

Sorry I wasn't clear. I performed a local run on my x86 machine, configuring the build as:

cmake -GNinja /path/to/llvm-test-suite/ -DOPTFLAGS="" \
  -C /path/to/llvm-test-suite/cmake/caches/O3.cmake \
  -DCMAKE_C_COMPILER=/path/to/release-build-no-asserts/bin/clang \
  -DTEST_SUITE_USE_PERF=true -DTEST_SUITE_SUBDIRS=CTMark \
  -DTEST_SUITE_RUN_BENCHMARKS=false -DTEST_SUITE_COLLECT_CODE_SIZE=false

I am not sure which pass manager that configuration is using.

! In D119880#3325445, @SjoerdMeijer wrote:
My request would be to split this up and do one thing at a time if possible. There also seems to be a bit of refactoring and NFC changes in here, probably also best split off.
Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to the previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (I don't think it should be recorded/included as a code comment).

Could you clarify which bits to split up, as I don't see how I could further break down this patch? Regarding the number of functions specialized in comparison to the previous version, I believe the llvm-test-suite reports statistics so I might be able to provide that information. Cheers.

! In D119880#3325448, @SjoerdMeijer wrote:
Compile times are important of course. But what I want to say is that we should aim to lift some of these arbitrary restrictions, like you mentioned, by providing new options/ways to control things, while staying as NFC or as close to the original behaviour as possible. That was tuned to specialise very infrequently and only in special cases, so any lifting of restrictions will increase compile times. Thus, the way I look at this is that you put the infrastructure in place, so that perhaps later we can change things, or decisions can be manually overridden.

Indeed this pass is profitable for spec-int-mcf. The two interesting functions we get to specialize have a gain of about 4M. I experimented with the default value of MinGainThreshold among {1, 1K, 10K, 100K, 1M}. Using the llvm-test-suite for measuring compilation times, anything above 10K had more or less the same effect, so I chose that one.

Here are a few more statistics from comparing this patch to the current implementation of function specialization:

This is from compiling the llvm-test-suite at -O3 under perf with a release build (no asserts) targeting x86. The metric is instruction count (average of three runs).

test name           %delta
ClamAV              +0.009545546536911
7zip                -0.001629518928931
tramp3d-v4          -0.046465647871192
kimwitu++           +0.011940454030694
sqlite3             -0.158695422048798
mafft               -0.014463100189515
lencod              -0.020921880121996
SPASS               -0.047946880831827
Bullet              -0.003464312699035
consumer-typeset    -0.008383706273952

geomean = -0.0280598%

This is from compiling/running the llvm-test-suite at -O3 targeting AArch64 with statistics.

test name                                        num specializations before    num specializations after
MultiSource/Applications/ClamAV/clamscan.test    3                             0
MultiSource/Applications/d/make_dparser.test     1                             0
MultiSource/Applications/oggenc/oggenc.test      2                             2
MultiSource/Applications/sqlite3/sqlite3.test    3                             0

Sorry for the delay. First, about an NFC and some refactoring: could the reshuffle of ArgInfo and SpecialisationInfo and the changes in the Solver functions perhaps be an NFC change?

But more importantly, rereading the description, I disagree with these statements:

I've also lifted two arbitrary limitations around the specialization selection: the penalty in cost estimation for newly discovered functions, and the truncation of clones for a given function.

These are not arbitrary restrictions; they were by design, like I mentioned in an inline comment. For the former, this was fundamental to how the cost model used to work. For the latter, the candidates were sorted first on profitability, then the candidates with the least gain were disregarded. Again, both were by design, to manage the number of specialisations and compile-times. Also, this patch introduces an arbitrary heuristic: MinGainThreshold.

The real arbitrary restriction of the current implementation is this: we only specialise the first argument. This is the restriction that we should lift first, and it should of course be based on profitability. That's why I feel this patch is doing too much at the same time, from some restructuring, to changing how the cost-model works, to specialising multiple arguments. So my suggestion/question is whether it would be possible to split things up accordingly. Does that make sense?

Here are a few more statistics from comparing this patch to the current implementation of function specialization:

This is from compiling the llvm-test-suite at -O3 under perf with a release build (no asserts) targeting x86. The metric is instruction count (average of three runs).

test name           %delta
ClamAV              +0.009545546536911
7zip                -0.001629518928931
tramp3d-v4          -0.046465647871192
kimwitu++           +0.011940454030694
sqlite3             -0.158695422048798
mafft               -0.014463100189515
lencod              -0.020921880121996
SPASS               -0.047946880831827
Bullet              -0.003464312699035
consumer-typeset    -0.008383706273952

geomean = -0.0280598%

This is from compiling/running the llvm-test-suite at -O3 targeting AArch64 with statistics.

test name                                        num specializations before    num specializations after
MultiSource/Applications/ClamAV/clamscan.test    3                             0
MultiSource/Applications/d/make_dparser.test     1                             0
MultiSource/Applications/oggenc/oggenc.test      2                             2
MultiSource/Applications/sqlite3/sqlite3.test    3                             0

So why exactly do we specialise less?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
112

Like I mentioned before, not sure what the value is of recording compile time numbers here as things are still in flux; i.e. I don't think this information will age very well.

132

Nit: perhaps SpecializationInfo, to be a tiny bit more consistent (with ArgInfo) and descriptive.

321

Nit: Info could be more descriptive.

400

Nit: Info could be more descriptive.

550

This was fundamental to how the cost model used to work: the more functions get specialised, the harder it becomes due to the Penalty. So I disagree with this statement:

I've also lifted two arbitrary limitations around the specialization selection: the penalty in cost estimation ....

This was not arbitrary, but by design.

! In D119880#3337134, @SjoerdMeijer wrote:
These are not arbitrary restrictions; they were by design, like I mentioned in an inline comment. For the former, this was fundamental to how the cost model used to work. For the latter, the candidates were sorted first on profitability, then the candidates with the least gain were disregarded. Again, both were by design, to manage the number of specialisations and compile-times. Also, this patch introduces an arbitrary heuristic: MinGainThreshold.

I'll give an example to make the problem clearer. Let's say function foo has four candidate specializations with bonus 10, 15, 20, 25 and cost 5 (assume the penalty is zero at this point), and function bar has four specializations with bonus 20, 25, 30, 35 and cost 20 (assume it would have been 5 without the penalty). The corresponding gains are 5, 10, 15, 20 for foo and 0, 5, 10, 15 for bar. With MaxClonesThreshold = 2 we reject the first two candidates of each function. With the new cost model the gains would be 5, 10, 15, 20 for foo (same as before) and 15, 20, 25, 30 for bar. With MinGainThreshold = 20 we reject the first three candidates of foo and the first candidate of bar. As a result we have the same number of specializations as before, but we have kept the most profitable ones.

MinGainThreshold is a way to control the code size increase as well as the compilation times without having to sort all the specializations (of all functions, not per function). Its value was decided empirically as I explained in my previous comment. The existing cost model considers everything with gain above zero profitable; MinGainThreshold allows fine-tuning this value. If we preferred sorting all the specializations by gain we would then need to decide how many of them we are keeping based on some heuristic (a percentage maybe?), which is more or less the same problem as deciding a value for MinGainThreshold: both are somewhat "arbitrary".
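The same example laid out as a table (Gain = Bonus - Cost; all numbers are the ones quoted above):

                    foo (Cost 5)       bar (Cost 20 old / 5 new)
Bonus               10  15  20  25     20  25  30  35
Gain, old model      5  10  15  20      0   5  10  15
Gain, new model      5  10  15  20     15  20  25  30

Old model, MaxClonesThreshold = 2: keep the foo clones with gain 15, 20 and the bar clones with gain 10, 15.
New model, MinGainThreshold = 20: keep the foo clone with gain 20 and the bar clones with gain 20, 25, 30.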

So why exactly do we specialise less?

We specialize less because the default value of MinGainThreshold has been set high.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
112

That's why I am mentioning percentages and not absolute values, but okay, I'll remove this comment.

132

The name SpecializationInfo is being used below:

using SpecializationInfo = SmallMapVector<CallBase *, Specialization, 8>;

I will change that one to `SpecializationMap` if that's okay.

321

Will change it to Specializations.

400

Will change it to Specializations.

550

Maybe "arbitrary" was the wrong vocabulary here. Sorry. What I mean is that it is not fair to bias the calculation of the cost of a given function based on historical data as "how many specializations have happend so far". As I explained on the description, a potential specialization may never trigger even if it is more profitable from one that did, just because it was discovered first. I could separate this change to another patch if it makes the review easier.

SjoerdMeijer added inline comments.Mar 3 2022, 1:57 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
550

Ok, I see; I probably didn't get that idea, but it makes sense. I guess the general idea is to collect all candidates first, calculate a profitability, then select a few candidates. Maybe I am still expecting some sort of penalty the more that gets specialised. But definitely, if you can split more things off, that would certainly help. This needs a rebase anyway I guess, and I need to look at this again after that.

Herald added a project: Restricted Project. Mar 3 2022, 1:57 AM
labrinea updated this revision to Diff 412828.Mar 3 2022, 1:53 PM
labrinea edited the summary of this revision. (Show Details)

Changes from last revision:

  • bring back the penalty in function cost estimation (will factor out in a separate review)
  • rebased
ormris removed a subscriber: ormris.Mar 3 2022, 2:49 PM
samtebbs added inline comments.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
420–422

The ActualArgs and FormalArg similarity is a little confusing IMO. It might be useful to call them Arguments/Args (what the values passed to a function are called) and Parameters respectively, as that would make the difference clear.

labrinea added inline comments.Mar 9 2022, 3:02 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
420–422

I think we've established the naming here: https://reviews.llvm.org/D119874 (see the definition of ArgInfo). Wikipedia seems to agree on this: https://en.wikipedia.org/wiki/Parameter_(computer_programming).

samtebbs added inline comments.Mar 10 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
420–422

Sure, perhaps other people would have had to look extra hard because of the double use of "Arguments" but if that is also an accepted form then that's OK.

Sorry for the delay. I need to look a bit more at this, but I added some first thoughts inline.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
86

A few comments about this:

  • First, thanks for restoring the penalty, I like doing things one step at a time. :)
  • Can you add a comment on what this is, e.g. why it is 10000, and more generally what the rationale is and what it achieves.
  • Can you add how this interacts with other options, like MaxClonesThreshold.
426

I probably need to think a bit more about this, but perhaps you can help. That is, now we are considering multiple arguments, we calculate the bonus for each of them, and we accumulate the bonus here. I am wondering whether this way of composing the bonus is the right model. Options are: take the average, take the minimum/maximum, sum them like you do here, or something else? Do you have any thoughts on this?

438

Can we just do a return after this, so we don't need the else and can decrease the indentation?

Hey Sjoerd, thanks for picking this up again. I do have an idea so please wait for my new revision. The idea is to be able to choose between:

  • a positive MinGainThreshold (works the same as this revision, with the only difference being that the value would be user defined - the default will be zero)
  • a zero MinGainThreshold (which will be the default value) and then sort the Specializations based on Gain and reject "some" of them. I am now trying to come up with a formula for the number of specializations to reject.

I will add comments about the interaction of options as you suggest.
Regarding the bonus calculation, I believe that accumulating the bonus of each argument is the right thing, because each one of them contributes to the benefits of specialization (inlining etc). I also think that the more we complicate the model (measuring average, min, max), the slower the pass becomes.
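A minimal sketch of what accumulating the bonus means here; the ArgPair type and the bonusForArgument helper are hypothetical placeholders for the pass's own ArgInfo and bonus query, not the patch's code:

#include <cstdint>
#include <vector>

struct ArgPair { /* formal parameter + constant actual argument */ };
int64_t bonusForArgument(const ArgPair &Arg); // hypothetical per-argument helper

// Sum the per-argument bonuses; the clone's cost is paid only once, so the
// whole specialization is profitable when the accumulated bonus exceeds it.
int64_t specializationGain(const std::vector<ArgPair> &Args, int64_t Cost) {
  int64_t Bonus = 0;
  for (const ArgPair &Arg : Args)
    Bonus += bonusForArgument(Arg);
  return Bonus - Cost; // kept if > 0 (or above MinGainThreshold)
}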

labrinea added inline comments.Mar 10 2022, 5:41 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
438

We would like to print the debug info that follows. It's useful even when the cost model is disregarded.

labrinea updated this revision to Diff 414382.Mar 10 2022, 8:22 AM

Changes from last revision:

  • Zero is now the default value for MinGainThreshold
  • The Specializations are sorted by Gain and the lower half gets rejected
SjoerdMeijer added inline comments.Mar 11 2022, 12:46 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
327

This isn't clear to me, I don't see how we sort things. First, we add things to a map here....

340

... and then we iterate over half the elements in the map here.

Does the sorting rely on the keys of the map? I'm not sure that's the cleanest way of doing it; I think it could be a one-liner doing an llvm::stable_sort?

labrinea added inline comments.Mar 11 2022, 1:24 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
340

The std::map guarantees ordering of elements based on the key (which is the Gain here). Stable sort requires a container with random access iteration. SpecializationMap is SmallMapVector, which might have random access, but Map is SmallDenseMap (a hash table). On top of that we have nested structures (a map containing maps), so we can't sort everything all together with stable sort, right?
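A small sketch of the ordering property being relied on here; the key/value types are placeholders, and a std::multimap would be needed if two candidates could share the same Gain:

#include "llvm/IR/Function.h"
#include <iterator>
#include <map>

// std::map iterates in ascending key order, so keying on Gain effectively
// sorts the candidates without an explicit sort call.
void keepUpperHalf(std::map<unsigned, llvm::Function *> &SortedByGain) {
  // Skip the lower, least profitable half and keep the rest.
  auto It = std::next(SortedByGain.begin(), SortedByGain.size() / 2);
  for (; It != SortedByGain.end(); ++It) {
    // ... specialize It->second ...
  }
}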

labrinea added a comment.EditedMar 11 2022, 6:48 AM

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;. The idea is to keep the compilation times down in case of a source file which results in many candidate specializations (I haven't seen any, but you never know).

Here are some values of f(x) = log10(x^4)+1, truncated to an integer (as the (size_t) cast does):
f(1) = 1
f(2) = 2
f(3) = 2
f(4) = 3
f(5) = 3
f(6) = 4
...
f(10) = 5
...
f(100) = 9
...
f(1000) = 13
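For reference, the values above come from the expression already quoted, where the (size_t) cast truncates before the +1 (a sketch, not committed code):

#include <cmath>
#include <cstddef>

// e.g. n == 3: log10(3^4) = log10(81) ~= 1.91, truncated to 1, plus 1 gives 2.
size_t numSpecKept(size_t NumCandidates) {
  return (size_t)std::log10(std::pow(NumCandidates, 4)) + 1;
}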

Posting some statistics from running the llvm test suite at -O3 on AArch64:

                                                 num specializations
test name                                        one arg    mult args (sorted: keep n/2)    mult args (sorted: keep log10(n^4)+1)
MultiSource/Applications/ClamAV/clamscan.test    3          3                               4
MultiSource/Applications/d/make_dparser.test     1          0                               1
MultiSource/Applications/oggenc/oggenc.test      2          1                               2
MultiSource/Applications/sqlite3/sqlite3.test    3          3                               4

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;.

Not sure if I'm missing anything, but I don't see this in the code. Also, why this formula? What is the rationale?

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve - see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Regarding your previous question: in order to use stable sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap, which corresponds to a single function, hence it would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve - see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Oh okay, got it.

Regarding your previous question: in order to use stable sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap, which corresponds to a single function, hence it would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

Okay, cheers, will leave it up to you then. I expect both approaches to be very close in terms of compile times. Algorithmically it is the same I think. So it's more about readability of the code.

I am now going to read the whole patch again.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
131–134

Nit: we might as well drop the 8 here. From the programmer's manual:

In the absence of a well-motivated choice for the number of inlined elements N, it is recommended to use SmallVector<T> (that is, omitting the N).

327

If you keep this version, can you at least add a comment that this is sorting the candidates using a map?

Sorry for the late reply.

From the implementation, the main change is the introduction of MinGainThreshold. It sorts the candidates and erases the lower half of them if MinGainThreshold is zero, do I understand right? And if MinGainThreshold is not zero, all the candidates whose benefit is below it get erased. The idea itself looks reasonable to me. But I would like to see an upper bound on the number of specialized functions; there isn't one in the current version, right? I am fine with the idea of auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1; or anything else. I am OK with implementing this in following patches, but we do need one.


Beyond this patch, I think the main concern is still the cost model. Everything we've discussed in this revision is at a high level: we are talking about how to filter the candidates. But I think the key point, or the problem, might be the cost model itself: I mean, why the benefit or cost gets the number it does for a particular function and corresponding constant argument. The original model is relatively simple and I think it has a lot of room for improvement.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
108

We should add a link to Nikic's compile time tracker. I guess there might be readers who don't know it.

307–308

I agree that the choice of data structure here is not clean enough. From the implementation, I feel Map could be a sorted heap (or PriorityQueue). Here are my thoughts.

llvm::PriorityQueue<std::pair<Function *, SpecializationMap>> Map; // We need a self-defined compare class here.
                                                                   // Maybe we need a new name.

And we could remove Sorted.

Then we could insert each valid pair into Map, keep it sorted, and only pop the latter half of the candidates.

It is OK to use a vector-like container and the std heap APIs to build the heap by hand. The benefit would be that we could access the container randomly (so we avoid popping elements one by one).
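A rough sketch of that vector-plus-heap alternative; the element type, gainOf, applySpecialization and NumSpecKept are placeholders rather than code from the patch:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

void selectTopCandidates(
    std::vector<std::pair<Function *, SpecializationMap>> &Candidates,
    size_t NumSpecKept) {
  auto LessProfitable = [](const auto &L, const auto &R) {
    return gainOf(L.second) < gainOf(R.second); // gainOf is hypothetical
  };
  // Build a max-heap keyed on gain by hand; the container stays randomly
  // accessible and we stop popping once enough candidates have been kept.
  std::make_heap(Candidates.begin(), Candidates.end(), LessProfitable);
  for (size_t Kept = 0; Kept < NumSpecKept && !Candidates.empty(); ++Kept) {
    std::pop_heap(Candidates.begin(), Candidates.end(), LessProfitable);
    applySpecialization(Candidates.back()); // hypothetical; most profitable remaining
    Candidates.pop_back();
  }
}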

320–321

We shouldn't insert F into Map directly, since it is possible that this Specializations is not valid.

433–442

The two-stage construction of Specializations doesn't look good: it fills Specializations first and then removes the unwanted entries, which is unnecessarily wasteful. We should check before inserting.

labrinea added inline comments.Mar 15 2022, 3:31 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
108

Will do.

131–134

I tried to use realistic numbers in all the Small ADTs in this patch. I don't think we'll ever get to a specialization with more than 8 arguments.

307–308

Thanks for the input. I have a working revision which does what I suggested in my last comment to Sjoerd - a flattened SpecializationMap that contains entries for multiple functions, which I am then sorting with stable sort. I'll upload it soon.

320–321

Good catch. In an improved version of this revision I removed the entry inside the following if (Specializations.empty()) { block, but as I said I will upload a different approach.

433–442

I am not sure if that's possible as we only know if the Gain is above the threshold after we've accumulated the bonus from each argument.

ChuanqiXu added inline comments.Mar 15 2022, 3:57 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

It is worth giving this a new name; the current name is not very clear.

433–442

It should be possible to implement if we are willing to add some new local variables. There might be some extra copies, but on the one hand we get better readability; on the other hand, the extra copies might be significant, I think.

labrinea added inline comments.Mar 15 2022, 8:30 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

I can't think of a better name. Any ideas?

433–442

I just read the implementation of erase() in MapVector and it's linear in the number of entries :( However, it should be possible to use remove_if() instead, which is also linear but can erase multiple elements in a single pass.
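A minimal sketch of the remove_if() idea, assuming the mapped value exposes its Gain (the field name is an assumption, not the patch's code):

// One linear pass over the MapVector that drops every unprofitable entry,
// instead of paying for a linear erase() per element.
Specializations.remove_if([](const auto &Entry) {
  return Entry.second.Gain <= 0; // Entry is a (CallBase *, Specialization) pair
});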

labrinea updated this revision to Diff 415477.Mar 15 2022, 9:45 AM
labrinea edited the summary of this revision. (Show Details)

I decided to keep this patch as close as possible to the original implementation, leaving the improvements to the cost model for later patches. That said, MinGainThreshold is withdrawn, as is the sorting of specializations across multiple functions.

labrinea added inline comments.Mar 16 2022, 3:47 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
459–460

This looks suboptimal. Will try and write an erase method for MapVector which takes an iterator range.

Just found another showstopper. We can't use stable_sort on MapVector because the underlying DenseMap, which holds the vector indices, would be left outdated :/

labrinea updated this revision to Diff 417350.Mar 22 2022, 11:21 AM
labrinea edited the summary of this revision. (Show Details)
labrinea edited the summary of this revision. (Show Details)Mar 22 2022, 12:59 PM
ChuanqiXu added inline comments.Mar 22 2022, 8:48 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

Oh, I am not good at naming either. But I am sure the current name is relatively bad. For example, the original ConstList was SmallVector<Constant*>, which makes sense. But now it is SmallVector<std::pair<CallBase *, Constant *>> and the name is still ConstList... The same problem applies to SpecializationList and SpecializationMap...

@SjoerdMeijer do you have any suggestion?

314

We could submit such changes standalone as an NFC patch.

333

I think this change is not necessary, and we should keep the original Changed variable. Given that specializeFunctions is called multiple times, NbFunctionsSpecialized might be greater than 0 all the time, so the return value of specializeFunctions wouldn't be right.

338–339

Unnecessary change.

447–452

Could we sort directly on MapVector? I see MapVector implements both swap and operator[], so it looks possible to sort a MapVector directly.

Then could we remove WorkList?

labrinea added inline comments.Mar 23 2022, 1:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

Side note: It would be nice if I could at least define SpecializationList (or whatever we decide to call it) as SpecializationMap::VectorType, but unfortunately VectorType is private in MapVector.

333

Good point, I forgot specializeFunctions is being called inside a loop without resetting NbFunctionsSpecialized first. I'll fix this.

338–339

Clang-format made this change.

447–452

We could directly sort on MapVector, yes, but it's the same thing. I think we shouldn't get rid of the Worklist because if we use the MapVector after this point the indices are outdated. Therefore if we try to use the [] operator at any point, the value it returns will be wrong. This is hard to observe and we could end up dealing with issues that go unnoticed.

Also we won't be able to erase the excess specializations at the end of the list. If https://reviews.llvm.org/D121817 gets accepted we might be able to do so, but still doing it directly on a vector is cheaper.

ChuanqiXu added inline comments.Mar 23 2022, 1:38 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
338–339

We should only format the diffs; this is what we generally do.

447–452

To get D121817 accepted, I think you need to comment on that thread; I think there are no blocking comments now. For the problem of sorting MapVector, I think adding a member function sort() to MapVector would be an acceptable solution. I imagine it wouldn't be hard to implement.

But from the context of the current patch, I found we don't need to use MapVector at all: the first part of the map's pair doesn't get used. I guess this is due to the patch being split. So I think we don't have to solve the problem for MapVector now; we can solve it once we need it.

labrinea added inline comments.Mar 23 2022, 2:12 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
338–339

Alright.

447–452

As I have explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call. That is only needed during the gain calculation phase. After that we only care about iterating over the specializations found. Therefore, I don't see the need to update the map indices while sorting the vector. It is expensive and in my opinion unnecessary.

I found we don't need to use MapVector at all: the first part of the map's pair doesn't get used.

I am not following. Could you please explain?

ChuanqiXu added inline comments.Mar 23 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
447–452

As I have explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call. That is only needed during the gain calculation phase. After that we only care about iterating over the specializations found. Therefore, I don't see the need to update the map indices while sorting the vector. It is expensive and in my opinion unnecessary.

I found we don't need to use MapVector at all: the first part of the map's pair doesn't get used.

I am not following. Could you please explain?

Oh, I guess that was an oversight on my part. I missed that you'd updated the code after you said you can't sort on MapVector. Now it looks good to me.

labrinea updated this revision to Diff 417867.Mar 24 2022, 3:23 AM

Changes in this revision:

  • separated unrelated clang-format changes to NFC patch
  • renamed the data types for the ADT
  • added an explanation comment in the new test file
  • rebased
ChuanqiXu accepted this revision.Mar 24 2022, 7:15 PM

LGTM basically. Please wait for a few days in case there are other comments.

llvm/lib/Transforms/Utils/SCCPSolver.cpp
543–551

Could we use Iter->Formal?

This revision is now accepted and ready to land.Mar 24 2022, 7:15 PM

It's looking good to me too, just a last round of nits (inlined) and one question just to double check: is the SPEC score the same? I.e., do we still specialise what we want to specialise?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

Sorry, bikeshedding names: if we group the caller (CallBase) and a constant actual argument, is ActualArgPair an accurate description? We don't group 2 actual args, which is what this name is suggesting to me.

434

Nit: can 0 - Cost just be -Cost?

439

Another nit: if the gain is 0, then arguably it is not a gain and we increase code-size?

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

Perhaps you want to turn this into a test, i.e. match the debug output too? If so, that would require asserts. Not sure, but it's probably best done as an additional, separate test, otherwise we might lose the testing of this for the non-assert builds and bots.

labrinea added inline comments.Mar 25 2022, 12:08 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

I agree it's not a good type name. Any ideas? Maybe it's not really necessary to define one.

434

We can't because the InstructionCost class does not implement the - operator.

439

We are doing the same as before (see line 423 on the left): we reject specializations whose Gain is less than or equal to zero. How does this increase code size?

llvm/lib/Transforms/Utils/SCCPSolver.cpp
543–551

possibly

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

This here is a comment; it does not contain check lines. The whole purpose is to show that specializations do get sorted and iterated correctly.

I wouldn't mind adding a test for the debug output in a follow-up patch. I couldn't find examples and I am not sure how to write one; any pointers appreciated.

This revision was landed with ongoing or failed builds.Mar 28 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.