This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Support function specialization across multiple arguments.
ClosedPublic

Authored by labrinea on Feb 15 2022, 12:10 PM.

Details

Summary

The current implementation of Function Specialization does not allow specializing more than one argument per function call, which is a limitation I am lifting with this patch.

My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones.

MapVector almost fits the bill, but not quite: erasing elements is expensive, and using stable_sort messes up the indices into the underlying vector. I am therefore using the underlying vector directly after calculating the Gain.
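As a rough illustration of "using the underlying vector directly" (a minimal sketch, assuming a Spec value type that carries the computed Gain; the type and function names are illustrative, not the patch's exact code):

#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/InstrTypes.h"

struct Spec { unsigned Gain; /* bound actual arguments, etc. */ };

void pickMostProfitable(llvm::MapVector<llvm::CallBase *, Spec> &Specs) {
  // Once every Gain is known the associative lookup is no longer needed, so
  // move out the underlying vector and sort that instead of the MapVector.
  auto Sorted = Specs.takeVector();
  llvm::stable_sort(Sorted, [](const auto &L, const auto &R) {
    return L.second.Gain > R.second.Gain; // most profitable first
  });
  // ... keep the front of Sorted, drop the least profitable tail ...
}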

Diff Detail

Event Timeline

labrinea created this revision.Feb 15 2022, 12:10 PM
labrinea requested review of this revision.Feb 15 2022, 12:10 PM
Herald added a project: Restricted Project. Feb 15 2022, 12:10 PM

There is quite a lot to unpack here:

This patch makes a significant change in the cost model as it no longer seems sensible to calculate the specialization gain per function argument, but rather as a whole. I've also lifted two arbitrary limitations around the specialization selection: the penalty in cost estimation for newly discovered functions, and the truncation of clones for a given function.

My request would be to split this up and do one thing at a time if possible. There also seems to be a bit of refactoring and NFC changes in here, probably also best split off.

Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to the previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (I don't think it should be recorded/included as a code comment).

Clarifying my previous comment a bit more:

Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to the previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (I don't think it should be recorded/included as a code comment).

Compile times are important of course. But what I want to say is that we should aim to lift some of these arbitrary restrictions, like you mentioned, by providing new options/ways to control things, while staying as NFC or as close to the original behaviour as possible. That was tuned to specialise very infrequently and only in special cases, so any lifting of restrictions will increase compile times. Thus, the way I look at this is that you put the infrastructure in place, so that perhaps later we can change things, or decisions can be manually overridden.

fhahn added a comment.Feb 16 2022, 3:11 AM

I have measured compilation times with the pass enabled/disabled by default using instruction count as the metric following these:

Could you share the link to the actual comparison (there's a C link on the left side for each commit on the overview page)? From the numbers you posted it is not clear which configuration those numbers are for (e.g. O3 + NewPM, ReleaseLTO + g + NewPM).

! In D119880#3325744, @fhahn wrote:
Could you share the link to the actual comparison (there's a C link on the left side for each commit on the overview page)? From the numbers you posted it is not clear which configuration those numbers are for (e.g. O3 + NewPM, ReleaseLTO + g + NewPM).

Sorry I wasn't clear. I performed a local run on my x86 machine, configuring the build as:

cmake -GNinja /path/to/llvm-test-suite/ -DOPTFLAGS="" \
  -C /path/to/llvm-test-suite/cmake/caches/O3.cmake \
  -DCMAKE_C_COMPILER=/path/to/release-build-no-asserts/bin/clang \
  -DTEST_SUITE_USE_PERF=true -DTEST_SUITE_SUBDIRS=CTMark \
  -DTEST_SUITE_RUN_BENCHMARKS=false -DTEST_SUITE_COLLECT_CODE_SIZE=false

I am not sure which pass manager that configuration is using.

! In D119880#3325445, @SjoerdMeijer wrote:
My request would be to split this up and do one thing at a time if possible. There also seems to be a bit of refactoring and NFC changes in here, probably also best split off.
Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to the previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (I don't think it should be recorded/included as a code comment).

Could you clarify which bits to split up, as I don't see how I could further break down this patch? Regarding the number of functions specialized in comparison to the previous version, I believe the llvm-test-suite reports statistics so I might be able to provide that information. Cheers.

! In D119880#3325448, @SjoerdMeijer wrote:
Compile times are important of course. But what I want to say is that we should aim to lift some of these arbitrary restrictions, like you mentioned, by providing new options/ways to control things, while staying as NFC or as close to the original behaviour as possible. That was tuned to specialise very infrequently and only in special cases, so any lifting of restrictions will increase compile times. Thus, the way I look at this is that you put the infrastructure in place, so that perhaps later we can change things, or decisions can be manually overridden.

Indeed this pass is profitable for spec-int-mcf. The two interesting functions we get to specialize have a gain of about 4M. I experimented with the default value of MinGainThreshold among {1, 1K, 10K, 100K, 1M}. Using the llvm-test-suite for measuring compilation times, anything above 10K had more or less the same effect, so I chose that one.

Here are a few more statistics from comparing this patch to the current implementation of function specialization:

This is from compiling the llvm-test-suite at -O3 under perf with a release build (no asserts) targeting x86. The metric is instruction count (average of three runs).

test name           %delta
ClamAV              +0.009545546536911
7zip                -0.001629518928931
tramp3d-v4          -0.046465647871192
kimwitu++           +0.011940454030694
sqlite3             -0.158695422048798
mafft               -0.014463100189515
lencod              -0.020921880121996
SPASS               -0.047946880831827
Bullet              -0.003464312699035
consumer-typeset    -0.008383706273952

geomean = -0.0280598%

This is from compiling/running the llvm-test-suite at -O3 targeting AArch64 with statistics.

test name                                        num specializations before    num specializations after
MultiSource/Applications/ClamAV/clamscan.test    3                             0
MultiSource/Applications/d/make_dparser.test     1                             0
MultiSource/Applications/oggenc/oggenc.test      2                             2
MultiSource/Applications/sqlite3/sqlite3.test    3                             0

Sorry for the delay. First, about an NFC and some refactoring: could the reshuffle of ArgInfo and SpecialisationInfo and the changes in the Solver functions perhaps be an NFC change?

But more importantly, rereading the description, I disagree with these statements:

I've also lifted two arbitrary limitations around the specialization selection: the penalty in cost estimation for newly discovered functions, and the truncation of clones for a given function.

These are not arbitrary restrictions; they were by design, like I mentioned in an inline comment. For the former, this was fundamental to how the cost model used to work. For the latter, the candidates were sorted first on profitability, then the candidates with the least gain were disregarded. Again, both were by design, to manage the number of specialisations and compile-times. Also, this patch introduces an arbitrary heuristic: MinGainThreshold.

The real arbitrary restriction of the current implementation is this: we only specialise the first argument. This is the restriction that we should lift first, and it should of course be based on profitability. That's why I feel this patch is doing too much at the same time, from some restructuring, to changing how the cost-model works, to specialising multiple arguments. So my suggestion/question is whether it would be possible to split things up accordingly. Does that make sense?

Here are a few more statistics from comparing this patch to the current implementation of function specialization:

This is from compiling the llvm-test-suite at -O3 under perf with a release build (no asserts) targeting x86. The metric is instruction count (average of three runs).

test name           %delta
ClamAV              +0.009545546536911
7zip                -0.001629518928931
tramp3d-v4          -0.046465647871192
kimwitu++           +0.011940454030694
sqlite3             -0.158695422048798
mafft               -0.014463100189515
lencod              -0.020921880121996
SPASS               -0.047946880831827
Bullet              -0.003464312699035
consumer-typeset    -0.008383706273952

geomean = -0.0280598%

This is from compiling/running the llvm-test-suite at -O3 targeting AArch64 with statistics.

test name                                        num specializations before    num specializations after
MultiSource/Applications/ClamAV/clamscan.test    3                             0
MultiSource/Applications/d/make_dparser.test     1                             0
MultiSource/Applications/oggenc/oggenc.test      2                             2
MultiSource/Applications/sqlite3/sqlite3.test    3                             0

So why exactly do we specialise less?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
112

Like I mentioned before, not sure what the value is of recording compile time numbers here as things are still in flux; i.e. I don't think this information will age very well.

132

Nit: perhaps SpecializationInfo, to be a tiny bit more consistent (with ArgInfo) and descriptive.

321

Nit: Info could be more descriptive.

400

Nit: Info could be more descriptive.

550

This was fundamental to how the cost model used to work: the more functions get specialised, the harder it becomes due to the Penalty. So I disagree with this statement:

I've also lifted two arbitrary limitations around the specialization selection: the penalty in cost estimation ....

This was not arbitrary, but by design.

! In D119880#3337134, @SjoerdMeijer wrote:
These are not arbitrary restrictions; they were by design, like I mentioned in an inline comment. For the former, this was fundamental to how the cost model used to work. For the latter, the candidates were sorted first on profitability, then the candidates with the least gain were disregarded. Again, both were by design, to manage the number of specialisations and compile-times. Also, this patch introduces an arbitrary heuristic: MinGainThreshold.

I'll give an example to make the problem clearer. Let's say function foo has four candidate specializations with bonus 10, 15, 20, 25 and cost 5 (assume the penalty is zero at this point), and function bar has four specializations with bonus 20, 25, 30, 35 and cost 20 (assume it would have been 5 without the penalty). The corresponding gains are 5, 10, 15, 20 for foo and 0, 5, 10, 15 for bar. With MaxClonesThreshold = 2 we reject the first two candidates of each function. With the new cost model the gains would be 5, 10, 15, 20 for foo (same as before) and 15, 20, 25, 30 for bar. With MinGainThreshold = 20 we reject the first three candidates of foo and the first candidate of bar. As a result we have the same number of specializations as before, but we have kept the most profitable ones.

MinGainThreshold is a way to control the code size increase as well as the compilation times without having to sort all the specializations (of all functions, not per function). Its value was decided empirically as I explained in my previous comment. The existing cost model considers everything with gain above zero profitable; MinGainThreshold allows fine-tuning this value. If we preferred sorting all the specializations by gain we would then need to decide how many of them we are keeping based on some heuristic (a percentage maybe?), which is more or less the same problem as deciding a value for MinGainThreshold: both are somewhat "arbitrary".
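The same example laid out as a table (Gain = Bonus - Cost; all numbers are the ones quoted above):

                    foo (Cost 5)       bar (Cost 20 old / 5 new)
Bonus               10  15  20  25     20  25  30  35
Gain, old model      5  10  15  20      0   5  10  15
Gain, new model      5  10  15  20     15  20  25  30

Old model, MaxClonesThreshold = 2: keep the foo clones with gain 15, 20 and the bar clones with gain 10, 15.
New model, MinGainThreshold = 20: keep the foo clone with gain 20 and the bar clones with gain 20, 25, 30.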

So why exactly do we specialise less?

We specialize less because the default value of MinGainThreshold has been set high.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
112

That's why I am mentioning percentages and not absolute values, but okay, I'll remove this comment.

132

The name SpecializationInfo is being used below:

using SpecializationInfo = SmallMapVector<CallBase *, Specialization, 8>;

I will change that one to `SpecializationMap` if that's okay.

321

Will change it to Specializations.

400

Will change it to Specializations.

550

Maybe "arbitrary" was the wrong vocabulary here. Sorry. What I mean is that it is not fair to bias the calculation of the cost of a given function based on historical data as "how many specializations have happend so far". As I explained on the description, a potential specialization may never trigger even if it is more profitable from one that did, just because it was discovered first. I could separate this change to another patch if it makes the review easier.

SjoerdMeijer added inline comments.Mar 3 2022, 1:57 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
550

Ok, I see; I probably didn't get that idea, but it makes sense. I guess the general idea is to collect all candidates first, calculate a profitability, then select a few candidates. Maybe I am still expecting some sort of penalty the more that gets specialised. But definitely, if you can split more things off, that would certainly help. This needs a rebase anyway I guess, and I need to look at this again after that.

Herald added a project: Restricted Project. Mar 3 2022, 1:57 AM
labrinea updated this revision to Diff 412828.Mar 3 2022, 1:53 PM
labrinea edited the summary of this revision. (Show Details)

Changes from last revision:

  • bring back the penalty in function cost estimation (will factor out in a separate review)
  • rebased
ormris removed a subscriber: ormris.Mar 3 2022, 2:49 PM
samtebbs added inline comments.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
420–422

The ActualArgs and FormalArg similarity is a little confusing IMO. It might be useful to call them Arguments/Args (what the values passed to a function are called) and Parameters respectively, as that would make the difference clear.

labrinea added inline comments.Mar 9 2022, 3:02 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
420–422

I think we've established the naming here: https://reviews.llvm.org/D119874 (see the definition of ArgInfo). Wikipedia seems to agree on this: https://en.wikipedia.org/wiki/Parameter_(computer_programming).

samtebbs added inline comments.Mar 10 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
420–422

Sure, perhaps other people would have had to look extra hard because of the double use of "Arguments" but if that is also an accepted form then that's OK.

Sorry for the delay. I need to look a bit more at this, but I added some first thoughts inline.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
86

A few comments about this:

  • First, thanks for restoring the penalty, I like doing things one step at a time. :)
  • Can you add a comment on what this is, e.g. why it is 10000, and more generally what the rationale is and what it achieves.
  • Can you add how this interacts with other options, like MaxClonesThreshold.
426

I probably need to think a bit more about this, but perhaps you can help. That is, now we are considering multiple arguments, we calculate the bonus for each of them, and we accumulate the bonus here. I am wondering whether this way of composing the bonus is the right model. Options are: take the average, take the minimum/maximum, sum them like you do here, or something else? Do you have any thoughts on this?

438

Can we just do a return after this, so we don't need the else and can decrease the indentation?

Hey Sjoerd, thanks for picking this up again. I do have an idea so please wait for my new revision. The idea is to be able to choose between:

  • a positive MinGainThreshold (works the same as this revision, with the only difference being that the value would be user defined - the default will be zero)
  • a zero MinGainThreshold (which will be the default value) and then sort the Specializations based on Gain and reject "some" of them. I am now trying to come up with a formula for the number of specializations to reject.

I will add comments about the interaction of options as you suggest.
Regarding the bonus calculation, I believe that accumulating the bonus of each argument is the right thing, because each one of them contributes to the benefits of specialization (inlining etc). I also think that the more we complicate the model (measuring average, min, max), the slower the pass becomes.
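A minimal sketch of what accumulating the bonus means here; the ArgPair type and the bonusForArgument helper are hypothetical placeholders for the pass's own ArgInfo and bonus query, not the patch's code:

#include <cstdint>
#include <vector>

struct ArgPair { /* formal parameter + constant actual argument */ };
int64_t bonusForArgument(const ArgPair &Arg); // hypothetical per-argument helper

// Sum the per-argument bonuses; the clone's cost is paid only once, so the
// whole specialization is profitable when the accumulated bonus exceeds it.
int64_t specializationGain(const std::vector<ArgPair> &Args, int64_t Cost) {
  int64_t Bonus = 0;
  for (const ArgPair &Arg : Args)
    Bonus += bonusForArgument(Arg);
  return Bonus - Cost; // kept if > 0 (or above MinGainThreshold)
}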

labrinea added inline comments.Mar 10 2022, 5:41 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
438

We would like to print the debug info that follows. It's useful even when the cost model is disregarded.

labrinea updated this revision to Diff 414382.Mar 10 2022, 8:22 AM

Changes from last revision:

  • Zero is now the default value for MinGainThreshold
  • The Specializations are sorted by Gain and the lower half gets rejected
SjoerdMeijer added inline comments.Mar 11 2022, 12:46 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
327

This isn't clear to me, I don't see how we sort things. First, we add things to a map here....

340

... and then we iterate over half the elements in the map here.

Does the sorting rely on the keys of the map? I'm not sure that's the cleanest way of doing it; I think it could be a one-liner doing an llvm::stable_sort?

labrinea added inline comments.Mar 11 2022, 1:24 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
340

The std::map guarantees ordering of elements based on the key (which is the Gain here). Stable sort requires a container with random access iteration. SpecializationMap is SmallMapVector, which might have random access, but Map is SmallDenseMap (a hash table). On top of that we have nested structures (a map containing maps), so we can't sort everything all together with stable sort, right?
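A small sketch of the ordering property being relied on here; the key/value types are placeholders, and a std::multimap would be needed if two candidates could share the same Gain:

#include "llvm/IR/Function.h"
#include <iterator>
#include <map>

// std::map iterates in ascending key order, so keying on Gain effectively
// sorts the candidates without an explicit sort call.
void keepUpperHalf(std::map<unsigned, llvm::Function *> &SortedByGain) {
  // Skip the lower, least profitable half and keep the rest.
  auto It = std::next(SortedByGain.begin(), SortedByGain.size() / 2);
  for (; It != SortedByGain.end(); ++It) {
    // ... specialize It->second ...
  }
}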

labrinea added a comment.EditedMar 11 2022, 6:48 AM

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;. The idea is to keep the compilation times down in case of a source file which results in many candidate specializations (I haven't seen any, but you never know).

Here are some values of f(x) = log10(x^4)+1, truncated to an integer (as the (size_t) cast does):
f(1) = 1
f(2) = 2
f(3) = 2
f(4) = 3
f(5) = 3
f(6) = 4
...
f(10) = 5
...
f(100) = 9
...
f(1000) = 13
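For reference, the values above come from the expression already quoted, where the (size_t) cast truncates before the +1 (a sketch, not committed code):

#include <cmath>
#include <cstddef>

// e.g. n == 3: log10(3^4) = log10(81) ~= 1.91, truncated to 1, plus 1 gives 2.
size_t numSpecKept(size_t NumCandidates) {
  return (size_t)std::log10(std::pow(NumCandidates, 4)) + 1;
}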

Posting some statistics from running the llvm test suite at -O3 on AArch64:

                                                 num specializations
test name                                        one arg    mult args (sorted: keep n/2)    mult args (sorted: keep log10(n^4)+1)
MultiSource/Applications/ClamAV/clamscan.test    3          3                               4
MultiSource/Applications/d/make_dparser.test     1          0                               1
MultiSource/Applications/oggenc/oggenc.test      2          1                               2
MultiSource/Applications/sqlite3/sqlite3.test    3          3                               4

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;.

Not sure if I'm missing anything, but I don't see this in the code. Also, why this formula? What is the rationale?

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve - see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Regarding your previous question: in order to use stable sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap, which corresponds to a single function, hence it would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve - see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Oh okay, got it.

Regarding your previous question: in order to use stable sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap, which corresponds to a single function, hence it would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

Okay, cheers, will leave it up to you then. I expect both approaches to be very close in terms of compile times. Algorithmically it is the same I think. So it's more about readability of the code.

I am now going to read the whole patch again.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
131–134

Nit: we might as well drop the 8 here. From the programmer's manual:

In the absence of a well-motivated choice for the number of inlined elements N, it is recommended to use SmallVector<T> (that is, omitting the N).

327

If you keep this version, can you at least add a comment that this is sorting the candidates using a map?

Sorry for the late reply.

From the implementation, the main change is the introduction of MinGainThreshold. It sorts the candidates and erases the lower half of them if MinGainThreshold is zero, do I understand right? And if MinGainThreshold is not zero, all the candidates whose benefit is below it get erased. The idea itself looks reasonable to me. But I would like to see an upper bound on the number of specialized functions; there isn't one in the current version, right? I am fine with the idea of auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1; or anything else. I am OK with implementing this in following patches, but we do need one.


Beyond this patch, I think the main concern is still the cost model. Everything we've discussed in this revision is at a high level: we are talking about how to filter the candidates. But I think the key point, or the problem, might be the cost model itself: I mean, why the benefit or cost gets the number it does for a particular function and corresponding constant argument. The original model is relatively simple and I think it has a lot of room for improvement.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
108

We should add a link to Nikic's compile time tracker. I guess there might be readers who don't know it.

307–308

I agree that the choice of data structure here is not clean enough. From the implementation, I feel Map could be a sorted heap (or PriorityQueue). Here are my thoughts.

llvm::PriorityQueue<std::pair<Function *, SpecializationMap>> Map; // We need a self-defined compare class here.
                                                                   // Maybe we need a new name.

And we could remove Sorted.

Then we could insert each valid pair into Map, keep it sorted, and only pop the latter half of the candidates.

It is OK to use a vector-like container and the std heap APIs to build the heap by hand. The benefit would be that we could access the container randomly (so we avoid popping elements one by one).
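A rough sketch of that vector-plus-heap alternative; the element type, gainOf, applySpecialization and NumSpecKept are placeholders rather than code from the patch:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

void selectTopCandidates(
    std::vector<std::pair<Function *, SpecializationMap>> &Candidates,
    size_t NumSpecKept) {
  auto LessProfitable = [](const auto &L, const auto &R) {
    return gainOf(L.second) < gainOf(R.second); // gainOf is hypothetical
  };
  // Build a max-heap keyed on gain by hand; the container stays randomly
  // accessible and we stop popping once enough candidates have been kept.
  std::make_heap(Candidates.begin(), Candidates.end(), LessProfitable);
  for (size_t Kept = 0; Kept < NumSpecKept && !Candidates.empty(); ++Kept) {
    std::pop_heap(Candidates.begin(), Candidates.end(), LessProfitable);
    applySpecialization(Candidates.back()); // hypothetical; most profitable remaining
    Candidates.pop_back();
  }
}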

320–321

We shouldn't insert F into Map directly, since it is possible that this Specializations is not valid.

433–442

The two-stage construction of Specializations doesn't look good: it fills Specializations first and then removes the unwanted entries, which is unnecessarily wasteful. We should check before inserting.

labrinea added inline comments.Mar 15 2022, 3:31 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
108

Will do.

131–134

I tried to use realistic numbers in all the Small ADTs in this patch. I don't think we'll ever get to a specialization with more than 8 arguments.

307–308

Thanks for the input. I have a working revision which does what I suggested in my last comment to Sjoerd - a flattened SpecializationMap that contains entries for multiple functions, which I am then sorting with stable sort. I'll upload it soon.

320–321

Good catch. In an improved version of this revision I removed the entry inside the following if (Specializations.empty()) { block, but as I said I will upload a different approach.

433–442

I am not sure if that's possible as we only know if the Gain is above the threshold after we've accumulated the bonus from each argument.

ChuanqiXu added inline comments.Mar 15 2022, 3:57 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

It is worth giving this a new name; the current name is not very clear.

433–442

It should be possible to implement if we are willing to add some new local variables. There might be some extra copies, but on the one hand we get better readability; on the other hand, the extra copies might be significant, I think.

labrinea added inline comments.Mar 15 2022, 8:30 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

I can't think of a better name. Any ideas?

433–442

I just read the implementation of erase() in MapVector and it's linear in the number of entries :( However, it should be possible to use remove_if() instead, which is also linear but can erase multiple elements in a single pass.
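A minimal sketch of the remove_if() idea, assuming the mapped value exposes its Gain (the field name is an assumption, not the patch's code):

// One linear pass over the MapVector that drops every unprofitable entry,
// instead of paying for a linear erase() per element.
Specializations.remove_if([](const auto &Entry) {
  return Entry.second.Gain <= 0; // Entry is a (CallBase *, Specialization) pair
});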

labrinea updated this revision to Diff 415477.Mar 15 2022, 9:45 AM
labrinea edited the summary of this revision. (Show Details)

I decided to keep this patch as close as possible to the original implementation, leaving the improvements to the cost model for later patches. That said, MinGainThreshold is withdrawn, as is the sorting of specializations across multiple functions.

labrinea added inline comments.Mar 16 2022, 3:47 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
459–460

This looks suboptimal. Will try and write an erase method for MapVector which takes an iterator range.

Just found another showstopper. We can't use stable_sort on MapVector because the underlying DenseMap, which holds the vector indices, would be left outdated :/

labrinea updated this revision to Diff 417350.Mar 22 2022, 11:21 AM
labrinea edited the summary of this revision. (Show Details)
labrinea edited the summary of this revision. (Show Details)Mar 22 2022, 12:59 PM
ChuanqiXu added inline comments.Mar 22 2022, 8:48 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

Oh, I am not good at naming either. But I am sure the current name is relatively bad. For example, the original ConstList was SmallVector<Constant*>, which makes sense. But now it is SmallVector<std::pair<CallBase *, Constant *>> and the name is still ConstList... The same problem applies to SpecializationList and SpecializationMap...

@SjoerdMeijer do you have any suggestion?

314

We could submit such changes standalone as an NFC patch.

333

I think this change is not necessary, and we should keep the original Changed variable. Given that specializeFunctions is called multiple times, NbFunctionsSpecialized might be greater than 0 all the time, so the return value of specializeFunctions wouldn't be right.

338–339

Unnecessary change.

447–452

Could we sort directly on MapVector? I see MapVector implements both swap and operator[], so it looks possible to sort a MapVector directly.

Then could we remove WorkList?

labrinea added inline comments.Mar 23 2022, 1:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

Side note: It would be nice if I could at least define SpecializationList (or whatever we decide to call it) as SpecializationMap::VectorType, but unfortunately VectorType is private in MapVector.

333

Good point, I forgot specializeFunctions is being called inside a loop without resetting NbFunctionsSpecialized first. I'll fix this.

338–339

Clang-format made this change.

447–452

We could directly sort on MapVector, yes, but it's the same thing. I think we shouldn't get rid of the Worklist because if we use the MapVector after this point the indices are outdated. Therefore if we try to use the [] operator at any point, the value it returns will be wrong. This is hard to observe and we could end up dealing with issues that go unnoticed.

Also we won't be able to erase the excess specializations at the end of the list. If https://reviews.llvm.org/D121817 gets accepted we might be able to do so, but still doing it directly on a vector is cheaper.

ChuanqiXu added inline comments.Mar 23 2022, 1:38 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
338–339

We should only format the diffs; this is what we generally do.

447–452

To get D121817 accepted, I think you need to comment on that thread; I think there are no blocking comments now. For the problem of sorting MapVector, I think adding a member function sort() to MapVector would be an acceptable solution. I imagine it wouldn't be hard to implement.

But from the context of the current patch, I found we don't need to use MapVector at all: the first part of the map's pair doesn't get used. I guess this is due to the patch being split. So I think we don't have to solve the problem for MapVector now; we can solve it once we need it.

labrinea added inline comments.Mar 23 2022, 2:12 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
338–339

Alright.

447–452

As I have explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call. That is only needed during the gain calculation phase. After that we only care about iterating over the specializations found. Therefore, I don't see the need to update the map indices while sorting the vector. It is expensive and in my opinion unnecessary.

I found we don't need to use MapVector at all: the first part of the map's pair doesn't get used.

I am not following. Could you please explain?

ChuanqiXu added inline comments.Mar 23 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
447–452

As I have explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call. That is only needed during the gain calculation phase. After that we only care about iterating over the specializations found. Therefore, I don't see the need to update the map indices while sorting the vector. It is expensive and in my opinion unnecessary.

I found we don't need to use MapVector at all: the first part of the map's pair doesn't get used.

I am not following. Could you please explain?

Oh, I guess that was an oversight on my part. I missed that you'd updated the code after you said you can't sort on MapVector. Now it looks good to me.

labrinea updated this revision to Diff 417867.Mar 24 2022, 3:23 AM

Changes in this revision:

  • separated unrelated clang-format changes to NFC patch
  • renamed the data types for the ADT
  • added an explanation comment in the new test file
  • rebased
ChuanqiXu accepted this revision.Mar 24 2022, 7:15 PM

LGTM basically. Please wait for a few days in case there are other comments.

llvm/lib/Transforms/Utils/SCCPSolver.cpp
543–551

Could we use Iter->Formal?

This revision is now accepted and ready to land.Mar 24 2022, 7:15 PM

It's looking good to me too, just a last round of nits (inlined) and one question just to double check: is the SPEC score the same? I.e., do we still specialise what we want to specialise?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

Sorry, bikeshedding names: if we group the caller (CallBase) and a constant actual argument, is ActualArgPair an accurate description? We don't group 2 actual args, which is what this name is suggesting to me.

434

Nit: can 0 - Cost just be -Cost?

439

Another nit: if the gain is 0, then arguably it is not a gain and we increase code-size?

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

Perhaps you want to turn this into a test, i.e. match the debug output too? If so, that would require asserts. Not sure, but it's probably best done as an additional, separate test, otherwise we might lose the testing of this for the non-assert builds and bots.

labrinea added inline comments.Mar 25 2022, 12:08 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
139–140

I agree it's not a good type name. Any ideas? Maybe it's not really necessary to define one.

434

We can't because the InstructionCost class does not implement the - operator.

439

We are doing the same as before (see line 423 on the left): we reject specializations whose Gain is less than or equal to zero. How does this increase code size?

llvm/lib/Transforms/Utils/SCCPSolver.cpp
543–551

possibly

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

This here is a comment; it does not contain check lines. The whole purpose is to show that specializations do get sorted and iterated correctly.

I wouldn't mind adding a test for the debug output in a follow-up patch. I couldn't find examples and I am not sure how to write one; any pointers appreciated.

This revision was landed with ongoing or failed builds.Mar 28 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.