[FuncSpec] Support function specialization across multiple arguments.
ClosedPublic

Authored by labrinea on Feb 15 2022, 12:10 PM.

Details

Summary

The current implementation of Function Specialization does not allow specializing more than one argument per function call; this patch lifts that limitation.

My main challenge was to choose the most suitable ADT for storing the specializations. We need an associative container for binding all the actual arguments of a specialization to the function call. We also need a consistent iteration order across executions. Lastly we want to be able to sort the entries by Gain and reject the least profitable ones.

MapVector almost fits the bill, but not quite: erasing elements is expensive, and stable_sort invalidates the indices into the underlying vector. I am therefore using the underlying vector directly after calculating the Gain.
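
For illustration, a minimal sketch of that approach (the type and field names below are placeholders, not the patch's exact code): gains are accumulated in a MapVector keyed by the call site, and once computed the underlying vector is taken and sorted directly, so the stale key-to-index table is never consulted again.

#include <cstdint>
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"

// Hypothetical record for one candidate specialization: the constant actual
// arguments bound to one call site, plus the estimated Gain.
struct SpecInfo {
  llvm::SmallVector<llvm::Constant *, 8> ActualArgs;
  int64_t Gain = 0;
};

// Once the gains are known, work on the flat vector directly so the
// (now stale) key-to-index table of the MapVector is never used again.
void rankByGain(llvm::MapVector<llvm::CallBase *, SpecInfo> &Specializations) {
  auto Flat = Specializations.takeVector(); // clears the map, returns the vector
  llvm::stable_sort(Flat, [](const auto &L, const auto &R) {
    return L.second.Gain > R.second.Gain; // most profitable first
  });
  // ... clone the most profitable entries, drop the rest ...
}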

Diff Detail

Event Timeline

labrinea updated this revision to Diff 412828.Mar 3 2022, 1:53 PM
labrinea edited the summary of this revision. (Show Details)

Changes from last revision:

  • bring back the penalty in function cost estimation (will factor out in a separate review)
  • rebased
samtebbs added inline comments.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
421–423

The ActualArgs and FormalArg similarity is a little confusing IMO. It might be useful to call them Arguments/Args (what the values passed to a function are called) and Parameters respectively, as that would make the difference clear.

labrinea added inline comments.Mar 9 2022, 3:02 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
421–423

I think we've established the naming here: https://reviews.llvm.org/D119874 (see the definition of ArgInfo). Wikipedia seems to agree on this: https://en.wikipedia.org/wiki/Parameter_(computer_programming).

samtebbs added inline comments.Mar 10 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
421–423

Sure. Perhaps other people would also have had to look extra hard because of the double use of "Arguments", but if that is an accepted convention then that's OK.

Sorry for the delay. I need to look a bit more at this, but I added some first thoughts inline.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
87

A few comments about this:

  • First, thanks for restoring the penalty; I like doing things one step at a time. :)
  • Can you add comments explaining what this is, e.g. why it is 10000, and more generally what the rationale is and what it achieves?
  • Can you also document how this interacts with other options, like MaxClonesThreshold?
426

I probably need to think a bit more about this, but perhaps you can help. Now that we are considering multiple arguments, we calculate a bonus for each of them and accumulate it here. I am wondering whether summing the bonuses is the right model. Alternatives would be taking the average, taking the minimum/maximum, or something else. Do you have any thoughts on this?

434

Can we just return after this, so we don't need the else and can decrease the indentation?

Hey Sjoerd, thanks for picking this up again. I do have an idea, so please wait for my new revision. The idea is to be able to choose between:

  • a positive MinGainThreshold (the same as this revision works today, with the only difference that the value would be user defined; the default will be zero)
  • a zero MinGainThreshold (which will be the default), in which case the Specializations are sorted by Gain and "some" of them are rejected. I am still trying to come up with a formula for the number of specializations to reject.

I will add comments about the interaction of options as you suggest.
Regarding the bonus calculation, I believe that accumulating the bonus of each argument is the right thing to do, because each one of them contributes to the benefits of specialization (inlining etc). I also think that the more we complicate the model (measuring average, min, max), the slower the pass becomes.
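
As a rough sketch of that accumulation (getSpecializationBonus here is just a stand-in declaration and the types are simplified; this is not the patch's exact code):

#include <cstdint>
#include <utility>
#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Constants.h"

// Stand-in for the pass's per-argument estimate (inlining, folded branches, ...).
int64_t getSpecializationBonus(llvm::Argument *Formal, llvm::Constant *Actual);

// The gain of one candidate is the sum of the bonuses contributed by each
// constant argument, minus the one-off cost of cloning the function.
int64_t computeGain(
    llvm::ArrayRef<std::pair<llvm::Argument *, llvm::Constant *>> Args,
    int64_t Cost) {
  int64_t Bonus = 0;
  for (const auto &[Formal, Actual] : Args)
    Bonus += getSpecializationBonus(Formal, Actual);
  return Bonus - Cost; // only a positive gain justifies a clone
}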

labrinea added inline comments.Mar 10 2022, 5:41 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
434

We would like to print the debug info that follows. It's useful even when the cost model is disregarded.

labrinea updated this revision to Diff 414382.Mar 10 2022, 8:22 AM

Changes from last revision:

  • Zero is now the default value for MinGainThreshold
  • The Specializations are sorted by Gain and the lower half gets rejected
SjoerdMeijer added inline comments.Mar 11 2022, 12:46 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
321

This isn't clear to me; I don't see how we sort things. First, we add things to a map here...

334

... and then we iterate over half the elements in the map here.

Does the sorting rely on the keys of the map? I'm not sure that's the cleanest way of doing it; I think it could be a one-liner using llvm::stable_sort.

labrinea added inline comments.Mar 11 2022, 1:24 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
334

The std::map guarantees ordering of elements based on the key (which is the Gain here). Stable sort requires a container with random-access iterators. SpecializationMap is SmallMapVector, which might have random access, but Map is SmallDenseMap (a hash table). On top of that we have nested structures (a map containing maps), so we can't sort everything all together with stable sort, right?
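
Roughly the shape of the map-based ordering being discussed, as a sketch with placeholder types (a multimap is used here only so that candidates with equal Gain don't collide; the patch's code may differ):

#include <cstddef>
#include <cstdint>
#include <map>

struct SpecBatch { /* placeholder for one function's SpecializationMap */ };

// Keying the candidates by Gain gives a sorted iteration order for free;
// walking from the largest key downwards and stopping halfway keeps the most
// profitable half of the candidates.
void keepTopHalf(const std::multimap<int64_t, SpecBatch> &ByGain) {
  size_t NumKept = (ByGain.size() + 1) / 2;
  auto It = ByGain.rbegin(); // largest Gain first
  for (size_t I = 0; I < NumKept && It != ByGain.rend(); ++I, ++It) {
    // ... specialize the candidates stored in It->second ...
  }
}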

labrinea added a comment.EditedMar 11 2022, 6:48 AM

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;. The idea is to keep compilation times down in case a source file results in a large number of candidate specializations (I haven't seen any, but you never know).

Here are some values of f(x) = log10(x^4) + 1, rounded down to an integer:
f(1)    = 1
f(2)    = 2
f(3)    = 2
f(4)    = 3
f(5)    = 3
f(6)    = 4
...
f(10)   = 5
...
f(100)  = 9
...
f(1000) = 13
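
For illustration, here is roughly how such a cap could be applied to an already sorted candidate list (the container and names are placeholders, not the patch's code):

#include <cmath>
#include <cstddef>
#include <vector>

// Keep only about log10(n^4) + 1 of the n sorted candidates, so the number of
// clones grows sub-linearly with the number of candidates.
template <typename SpecT>
void capSpecializations(std::vector<SpecT> &Sorted) {
  if (Sorted.empty())
    return;
  size_t NumSpecKept =
      (size_t)std::log10(std::pow((double)Sorted.size(), 4)) + 1;
  if (Sorted.size() > NumSpecKept)
    Sorted.resize(NumSpecKept); // assumes most profitable entries come first
}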

Posting some statistics from running the llvm test suite at -O3 on AArch64:

test name                                     | num specializations (one arg) | mult args (sorted: keep n/2) | mult args (sorted: keep log10(n^4)+1)
MultiSource/Applications/ClamAV/clamscan.test |               3               |              3               |                   4
MultiSource/Applications/d/make_dparser.test  |               1               |              0               |                   1
MultiSource/Applications/oggenc/oggenc.test   |               2               |              1               |                   2
MultiSource/Applications/sqlite3/sqlite3.test |               3               |              3               |                   4

I've tried another formula to determine the number of specializations we keep (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;.

Not sure if I'm missing something, but I don't see this in the code. Also, why this formula? What is the rationale?

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve; see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Regarding your previous question: in order to use stable_sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain the specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap corresponding to a single function, so this would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real-life code). So imagine we had 1000 candidates: n/2 would be 500, whereas log10(n^4)+1 would be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 (this function has a steeper curve; see https://www.google.com/search?q=plot+log10(x%5E5)%2B1) and it had a significant impact on ClamAV (1% more instructions over baseline, compared to a 0.57% increase with log10(x^4)+1).

Oh okay, got it.

Regarding your previous question: in order to use stable_sort we would need to flatten the nested structure of SmallDenseMap<Function *, SpecializationMap> into a wider SpecializationMap, which would contain the specializations of several functions in one data structure. The problem is that calculateGains currently expects an empty SpecializationMap corresponding to a single function, so this would require heavy adaptation. I can experiment and see if it's worth pursuing this idea (maybe in follow-up patches?).

Okay, cheers, I will leave it up to you then. I expect both approaches to be very close in terms of compile times; algorithmically it is the same, I think, so it's more about readability of the code.

I am now going to read the whole patch again.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
118–119

Nit: we might as well drop the 8 here. From the programmer's manual:

In the absence of a well-motivated choice for the number of inlined elements N, it is recommended to use SmallVector<T> (that is, omitting the N).

321

If you keep this version, can you at least add a comment that this is sorting the candidates using a map.

Sorry for the late reply.

From the implementation, the main change is the introduction of MinGainThreshold. If MinGainThreshold is zero, the candidates are sorted and the lower half is erased; do I understand that right? And if MinGainThreshold is not zero, all candidates whose benefit is below it are erased. The idea itself looks reasonable to me, but I would like to see an upper bound on the number of specialized functions; there isn't one in the current version, right? I am happy with the idea of auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1; or anything else, and I am OK with implementing it in follow-up patches, but we do need one.


Beyond this patch, I think the main concern is still the cost model. Everything we have discussed in this revision is at a high level: how to filter the candidates. But I think the key problem might be the cost model itself, i.e. why the benefit or cost comes out to a particular number for a given function and corresponding constant argument. The original model is relatively simple and I think there is a lot of room to improve it.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
103

We should add a link to Nikic's compile-time tracker; there might be readers who don't know about it.

301–302

I agree that the choice of data structure here is not the cleanest. From the implementation, I feel Map could be a sorted heap (or PriorityQueue). Here are my thoughts:

llvm::PriorityQueue<std::pair<Function *, SpecializationMap>> Map; // We need a self-defined compare class here.
                                                                   // Maybe we need a new name.

And we could remove Sorted.

Then we could insert each valid pair into Map, which keeps the candidates sorted, and then only pop half of them.

It is also OK to use a vector-like container and build the heap by hand with the heap APIs in std. The benefit would be random access into the container (so we could avoid popping elements one by one).
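
A rough sketch of the heap-by-hand variant of this suggestion (the Candidate type is a placeholder; the point is the comparator and the standard heap APIs):

#include <algorithm>
#include <cstdint>
#include <vector>

// Placeholder for one (function, specializations) candidate and its gain.
struct Candidate {
  int64_t Gain = 0;
  // Function *F; SpecializationMap Specs; ...
};

// Keep candidates in a plain vector and use the standard heap algorithms, so
// random access stays available and the least profitable half can be dropped
// without popping elements one at a time.
void keepMostProfitable(std::vector<Candidate> &Cands) {
  auto ByGain = [](const Candidate &L, const Candidate &R) {
    return L.Gain < R.Gain; // max-heap on Gain
  };
  std::make_heap(Cands.begin(), Cands.end(), ByGain);
  std::sort_heap(Cands.begin(), Cands.end(), ByGain); // ascending by Gain
  Cands.erase(Cands.begin(), Cands.begin() + Cands.size() / 2); // drop lower half
}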

314–315

We shouldn't insert F into Map directly, since it is possible that Specializations is not valid.

428–456

The two-stage construction of Specializations doesn't look great: it first fills Specializations and then removes the unwanted entries, which is unnecessarily wasteful. We should check before inserting.

labrinea added inline comments.Mar 15 2022, 3:31 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
103

Will do.

118–119

I tried to use realistic numbers in all the Small ADTs in this patch. I don't think we'll ever get to a specialization with more than 8 arguments.

301–302

Thanks for the input. I have a working revision which does what I suggested in my last comment to Sjoerd - a flattened SpecializationMap that contains entries for multiple functions, which I am then sorting with stable sort. I'll upload it soon.

314–315

Good catch. In an improved version of this revision I removed the entry inside the following if (Specializations.empty()) { block, but as I said, I will upload a different approach.

428–456

I am not sure that's possible, as we only know whether the Gain is above the threshold after we've accumulated the bonus from each argument.

ChuanqiXu added inline comments.Mar 15 2022, 3:57 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

It is worth giving this a new name; the current name is not very clear.

428–456

It should be possible to implement if we add some new local variables. There might be some extra copies: on the one hand we get better readability, on the other hand the extra copies might be significant, I think.

labrinea added inline comments.Mar 15 2022, 8:30 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

I can't think of a better name. Any ideas?

428–456

I just read the implementation of erase() in MapVector and it's linear in the number of entries :( However, it should be possible to use remove_if() instead, which is also linear but can erase multiple elements in a single pass.
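
For illustration, roughly what that single pass could look like with MapVector's remove_if (the record type and threshold are placeholders, not the patch's code):

#include <cstdint>
#include "llvm/ADT/MapVector.h"
#include "llvm/IR/InstrTypes.h"

// Hypothetical per-call-site record carrying the computed Gain.
struct SpecInfo {
  int64_t Gain = 0;
};

// Drop every unprofitable candidate in a single pass instead of paying the
// linear cost of erase() once per entry.
void pruneUnprofitable(llvm::MapVector<llvm::CallBase *, SpecInfo> &Specs,
                       int64_t MinGainThreshold) {
  Specs.remove_if([&](const auto &Entry) {
    return Entry.second.Gain <= MinGainThreshold;
  });
}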

labrinea updated this revision to Diff 415477.Mar 15 2022, 9:45 AM
labrinea edited the summary of this revision. (Show Details)

I decided to keep this patch as close as possible to the original implementation, leaving the improvements to the cost model for later patches. That said, MinGainThreshold has been withdrawn, as has the sorting of specializations across multiple functions.

labrinea added inline comments.Mar 16 2022, 3:47 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
469–470

This looks suboptimal. I will try to write an erase method for MapVector which takes an iterator range.

Just found another showstopper: we can't use stable_sort on MapVector because the underlying DenseMap, which holds the vector indices, would be left outdated :/

labrinea updated this revision to Diff 417350.Mar 22 2022, 11:21 AM
labrinea edited the summary of this revision. (Show Details)
labrinea edited the summary of this revision. (Show Details)Mar 22 2022, 12:59 PM
ChuanqiXu added inline comments.Mar 22 2022, 8:48 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

Oh, I am not good at naming either, but I am sure the current name is relatively poor. For example, the original ConstList was SmallVector<Constant*>, which makes sense, but now it is SmallVector<std::pair<CallBase *, Constant *>> and the name is still ConstList... SpecializationList and SpecializationMap have the same problem...

@SjoerdMeijer do you have any suggestion?

308

We could submit such changes standalone as an NFC patch.

329

I think this change is not necessary, and we should keep the original Changed variable. Since specializeFunctions is called multiple times, NbFunctionsSpecialized might remain greater than 0 the whole time, so the return value of specializeFunctions wouldn't be right.

333–335

Unnecessary change.

461–466

Could we sort directly on MapVector? I see MapVector implements both swap and operator[], so it looks possible to sort the MapVector directly.

Then could we remove WorkList?

labrinea added inline comments.Mar 23 2022, 1:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

Side note: it would be nice if I could at least define SpecializationList (or whatever we decide to call it) as SpecializationMap::VectorType, but unfortunately VectorType is private in MapVector.

329

Good point, I forgot specializeFunctions is being called inside a loop without resetting NbFunctionsSpecialized first. I'll fix this.

333–335

clang-format made this change.

461–466

We could sort directly on the MapVector, yes, but it's the same thing. I think we shouldn't get rid of the WorkList, because the MapVector's indices are outdated after this point; if we used the [] operator anywhere, the value it returned would be wrong. This is hard to observe and we could end up dealing with issues that go unnoticed.

Also, we wouldn't be able to erase the excess specializations at the end of the list. If https://reviews.llvm.org/D121817 gets accepted we might be able to, but even then doing it directly on a vector is cheaper.

ChuanqiXu added inline comments.Mar 23 2022, 1:38 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
333–335

We could format only the diff; that is what we generally do.

461–466

To get D121817 accepted, I think you need to comment on that thread; I don't think there are any blocking comments now. For the problem of sorting a MapVector, I think adding a member function sort() to MapVector would be an acceptable solution, and I imagine it wouldn't be hard to implement.
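
As a sketch of the idea (not an existing LLVM API), the effect of such a sort() can already be had through MapVector's public interface by re-inserting the sorted entries, which keeps the key-to-index table valid:

#include <utility>
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"

// Sort a MapVector by rebuilding it from its sorted underlying vector. This is
// only a sketch; no such helper exists in LLVM today.
template <typename KeyT, typename ValueT, typename Compare>
void sortMapVector(llvm::MapVector<KeyT, ValueT> &MV, Compare Cmp) {
  auto Entries = MV.takeVector();  // clears MV and returns the flat vector
  llvm::stable_sort(Entries, Cmp); // Cmp compares std::pair<KeyT, ValueT>
  for (auto &E : Entries)
    MV.insert(std::move(E));       // re-inserting rebuilds the index map
}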

But from the context of the current patch, I found we don't need to use MapVector at all: the first element of the map's pair isn't used anywhere. I guess this is because the patch got split. So I think we don't need to solve the problem for MapVector now; we can solve it once we need it.

labrinea added inline comments.Mar 23 2022, 2:12 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
333–335

Alright.

461–466

As I explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call, but that is only needed during the gain calculation phase. After that we only care about iterating over the specializations found, so I don't see the need to update the map indices while sorting the vector; it is expensive and, in my opinion, unnecessary.

I found we don't need to use MapVector at all: the first element of the map's pair isn't used anywhere.

I am not following. Could you please explain?

ChuanqiXu added inline comments.Mar 23 2022, 2:21 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
461–466

As I explained in the description, we do need an associative container to bind all the actual arguments of a specialization to the same function call, but that is only needed during the gain calculation phase. After that we only care about iterating over the specializations found, so I don't see the need to update the map indices while sorting the vector; it is expensive and, in my opinion, unnecessary.

I found we don't need to use MapVector at all: the first element of the map's pair isn't used anywhere.

I am not following. Could you please explain?

Oh, I guess that was an oversight on my part. I missed that you'd updated the code after you said you can't sort on MapVector. Now it looks good to me.

labrinea updated this revision to Diff 417867.Mar 24 2022, 3:23 AM

Changes in this revision:

  • separated unrelated clang-format changes to NFC patch
  • renamed the data types for the ADT
  • added an explanation comment in the new test file
  • rebased
ChuanqiXu accepted this revision.Mar 24 2022, 7:15 PM

LGTM basically. Please wait for a few days in case there are other comments.

llvm/lib/Transforms/Utils/SCCPSolver.cpp
542

Could we use Iter->Formal?

This revision is now accepted and ready to land.Mar 24 2022, 7:15 PM

It's looking good to me too, just a last round of nits (inlined) and one question just to double check: is the SPEC score the same? I.e., do we still specialise what we want to specialise?

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

Sorry, bikeshedding names: if we group the caller (CallBase) and a constant actual argument, is ActualArgPair an accurate description? We don't group 2 actual args, which is what this name is suggesting to me.

429

Nit: can 0 - Cost just be -Cost?

438

Another nit: if the gain is 0, then arguably it is not a gain and we increase code-size?

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

Perhaps you want to turn this into a test, i.e. match the debug output too? If so, that would require asserts. Not sure, but this is probably best done as an additional, separate test; otherwise we might lose the testing of this for non-assert builds and bots.

labrinea added inline comments.Mar 25 2022, 12:08 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
124–128

I agree it's not a good type name. Any ideas? Maybe it's not really necessary to define one.

429

We can't, because the InstructionCost class does not implement the unary minus operator.

438

We are doing the same as before (see line 423 on the left): we reject specializations whose Gain is less than or equal to zero. How does this increase code size?

llvm/lib/Transforms/Utils/SCCPSolver.cpp
542

possibly

llvm/test/Transforms/FunctionSpecialization/specialize-multiple-arguments.ll
9

This is a comment; it does not contain check lines. The whole purpose is to show that specializations do get sorted and iterated correctly.

I wouldn't mind adding a test for the debug output in a follow-up patch. I couldn't find examples and am not sure how to write one; any pointers appreciated.

This revision was landed with ongoing or failed builds.Mar 28 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.