User Details
- User Since
- Jun 10 2015, 2:25 AM (369 w, 1 d)
Wed, Jun 29
rebased
rebased + fixed tests
May 26 2022
This is a proof of concept for RFC: Should we enable Function Specialization?, not ready for review.
This is a proof of concept for RFC: Should we enable Function Specialization?, not ready for review.
Apr 28 2022
Okay then, looks good.
Apr 27 2022
I think it's good. The tests suggest that when the -mattr string is not empty then +v8a has to be explicitely specified. Is this good enough for https://github.com/llvm/llvm-project/issues/53956 or do we want to check whether FS contains +v8r?
Apr 26 2022
Changes to prior revision:
- added a validity check before accessing the contents of iterator
- reduced the testcase
Apr 21 2022
Thank you both for the feedback. I'll investigate further as I think there's more to it. I've added some additional info in https://github.com/llvm/llvm-project/issues/55000#issuecomment-1105169299. Cheers.
Apr 20 2022
Apr 1 2022
Mar 31 2022
Mar 30 2022
Mar 29 2022
I didn't end up using this patch in https://reviews.llvm.org/D119880, but if you still find it useful I can address the review comments, otherwise I'll abandon it.
Mar 28 2022
Mar 25 2022
Mar 24 2022
Changes in this revision:
- separated unrelated clang-format changes to NFC patch
- renamed the data types for the ADT
- added an explanation comment in the new test file
- rebased
Mar 23 2022
Mar 22 2022
Mar 16 2022
Just found another show stopper. We can't use stable_sort on MapVector because the underlying DenseMap which holds the vector indices will stay outdated :/
Mar 15 2022
I decided to keep this patch as close as possible to the original implementation, leaving the improvements to the cost model for later patches. That said, MinGainThreshold is withdrawn, so as the sorting of specializations across multiple functions.
Mar 14 2022
That's true; the new formula is not in the current revision. The idea is to keep a sublinear number of specializations when the number of candidates grows enormously (not expected to happen in real life code). So imagine we had 1000 candidates n/2 would be 500 whereas log10(n^4)+1 will be 13. I measured the instruction count of clang when compiling the llvm test suite with log10(n^5)+1 ( this function has a steeper curve - see https://www.google.com/search?q=plot+log10(x%5E5)%2B1 ) and it had a significant impact on ClamAV (1% more instructions over baseline compared to 0,57% increase with log10(x^4)+1).
Mar 11 2022
I've tried another formula to determine the amount of speciazations we are keeping (instead of Sorted.size()/2). It is defined as auto NumSpecKept = (size_t)std::log10(std::pow(Sorted.size(), 4))+1;. The idea is to keep the compilation times down in case of a source file which results in many candidate specializations (I haven't seen any but you never know).
Mar 10 2022
Changes from last revision:
- Zero is now the default value for MinGainThreshold
- The Specializations are sorted by Gain and the lower half gets rejected
Hey Sjoerd, thanks for picking this up again. I do have an idea so please wait for my new revision. The idea is to be able to choose between:
- a positive MinGainThreshold (same as this revision works with the only difference being that the value should be user defined - default will be zero)
- a zero MinGainThreshold (that will be set as the default value) and then sort the Specializations based on Gain and reject "some" of them. I am now trying to come up with a formula for the amount of specializations to reject.
Mar 9 2022
Mar 7 2022
Do we care about vector alignment? I am just looking at the x86 code, which does. Also, does this change apply to subvector extract too?
Mar 3 2022
My understanding is that function specialization of such constant arguments is controlled by the option -function-specialization-for-literal-constant. That is disabled by default as it increases compilation times. I measured how it affects instruction count using perf when compiling the llvm test suite. It was generally neutral except for the benchmark SPASS which was regressed by 4%.
Changes from last revision:
- bring back the penalty in function cost estimation (will factor out in a separate review)
- rebased
Mar 2 2022
Changes from last revision:
- marked ArgInfo as const reference when passed to markArgInFuncSpecialization() and rewriteCallSites()
- passed SpecializationList by reference to calculateGains() instead of returning by value
- removed the IsPartial flag from isArgumentInteresting() and getPossibleConstants() as it's no longer used anywhere in the code
- clang-formatted the code
Mar 1 2022
! In D119880#3337134, @SjoerdMeijer wrote:
These are not arbitrary restrictions, but was by design, like I mentioned in a comment inline. For the former, this was fundamental how the cost model used to work. For the latter, the candidates were sorted first on profitability, then the candidates with the least gain disregarded. Again, this was both by design, to manage the number of specialisations and compile-times. Also, this patch introduces an arbitrary heuristic: MinGainThreshold.
Feb 16 2022
Here are a few more statistics from comparing this patch to the current implementation of function specialization:
- Fixed a segfault which would happen if you tried to erase a function twice (replaced SmallVector with SmallPtrSet)
- Moved the deletion of dead instructions and functions in a destructor to make sure it's the last thing being called (though I am wondering if the Solver could still contain references to dead functions/instructions)
- Changed the dyn_cast_or_null to dyn_cast as suggested
This patch makes clang crash when compiling ClamAV from the llvm-test-suite:
! In D119880#3325445, @SjoerdMeijer wrote:
My request would be to split this up and do one thing at a time if possible. There also seems to a bit of refactoring and NFC changes in here, probably also best split off.
Regarding the compile-times, thanks for measuring that. I think you also need to report how many functions were specialised, also compared to previous version. But I think the compile-times discussion is for another time, when we start the discussion of possibly enabling this by default (don't think it is should be recorded/included as a code comment).
Could you clarify which bits to split up, as I don't see how I could further break down this patch? Regarding the number of functions specialized in comparison to the previous version, I believe the llvm-test-suite reports statistics so I might be able to provide that information. Cheers.
! In D119880#3325448, @SjoerdMeijer wrote:
Compile times are important of course. But what I want to say is that we should aim to lift some of these arbitrary restrictions, like you mentioned, by providing new options/ways to control things but try to be as NFC or close to the original behaviour as possible. That was tuned to specialise very infrequently and a special case, so everything lifting of restrictions will increase compile times. Thus, the way I look at this that you put the infrastructure in place, so that perhaps later we can change things, or decisions can be manually overridden.
Indeed this pass is profitable for spec-int-mcf. The two interesting functions we get to specialize have a gain about 4M. I experimented with the default value of MinGainThreshold among {1, 1K, 10K, 100K, 1M}. Using the llvm-test-suite for measuring compilation times anything above 10K had more or less the same effect, so I chose that one.
! In D119880#3325744, @fhahn wrote:
Could you share the link the the actual comparison (there's a C link on the left side for each commit on the overview page)? From the numbers you posted it is not clear for which configuration those numbers are (e.g. O3 + NewPM, ReleaseLTO + g + NewPM).
Feb 15 2022
Feb 2 2022
Ah, right, I didn't notice there was another builder page. I'll generate some check lines and commit. Thanks both :)
I have heavily modified the original reproducer, but the problem is that we don't have buildbots running the llvm regression tests with asan. I can only see two builders for libc with asan here https://lab.llvm.org/buildbot/#/builders. Do you think there's still value in adding this testcase?
Feb 1 2022
Added some debug output:
- when replacing a value
- when removing dead instructions
Jan 31 2022
As suggested by Florian, instead of using a WeakVH I am lazily removing the replaced instructions after the Solver has run. None of the existing tests actually covers this code path I am afraid. Examining the debug output of the reproducer I found two PhiNode instructions being replaced with null.
Jan 28 2022
This has landed as https://reviews.llvm.org/D109827.
Jan 26 2022
Jan 25 2022
Jan 20 2022
Indeed the reference manual suggests that the <n> sysreg names are in the range of 1-15:
- G1.3.18 PRBAR<n>_EL1, Protection Region Base Address Register n (EL1), n = 1 - 15
- G1.3.19 PRBAR<n>_EL2, Protection Region Base Address Register n (EL2), n = 1 - 15
- G1.3.24 PRLAR<n>_EL1, Protection Region Limit Address Register n (EL1), n = 1 - 15
- G1.3.25 PRLAR<n>_EL2, Protection Region Limit Address Register n (EL2), n = 1 - 15