- Changes the default value of FuncSpecMaxIters from 1 to 10. This allows specialization of recursive functions.
- Adds an option to control the maximum codesize growth per function.
- Measured ~45% performance uplift for SPEC2017:548.exchange2_r on AWS Graviton3.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Can we achieve this with lower FuncSpecMaxIters than 10 as 10 could be quite constly in terms of compile times?
Exchange requires 8 iterations, but we should be making the specializer as generic as possible. Note that FuncSpecMaxIters is the maximum number of times the specializer will run as long as each previous iteration has found specializations. As described in the summary we only consider callsites which reside in the bodies of previous specializations after the propagation of constant arguments. For now, that specialization on literal constants is disabled, this only triggers for recursive functions through promotion of constant stack values to globals which is quite rare. Therefore the impact in compilation times is zero for the general case. Only in code like exchange I would expect significant regression. That said I will post some numbers from the compile time tracker on the LLVM test suite.
Another important observation is that the compile time hit is attributed to the number of clones (the more, the higher) which suggests that the time is not spent on the specializer but on other optimization passes.
Metric: Instruction Count
Comparison with base (parent of D145374 - top of the chain)
- Non LTO
testname | delta % |
ClamAV | -0.101 |
7zip | +0.030 |
tramp3d-v4 | -0.027 |
kimwitu++ | -0.068 |
sqlite3 | -0.213 |
mafft | -0.030 |
lencod | +0.020 |
SPASS | -0.324 |
consumer-typeset | +0.016 |
Bullet | +0.029 |
geomean | -0.067 |
- LTO
testname | delta % |
ClamAV | +0.221 |
7zip | -0.034 |
tramp3d-v4 | -0.003 |
kimwitu++ | -0.128 |
sqlite3 | +0.081 |
mafft | -0.039 |
lencod | -0.016 |
SPASS | -0.795 |
consumer-typeset | -0.001 |
Bullet | -0.044 |
geomean | -0.076 |
As I said the compile time hit is attributed to the number of clones so these changes are coming from D145379 (added one extra specialization for sqlite3 and one for ClamAV when linking) and from D145394 (removed all the specializations for SPASS).
@labrinea Is this patch the top-level patch for recursive function-specialization that benefits exchange2? Is there any setting required or would this perform specialization by default? Also, assuming this is with llvm-project/flang.
You can see the patch dependencies in the chain. Some of them are merged, others are pending review. I will rebase it once the chain is ready. The patch benefits exchange2_r when compiled with flang. No other changes are necessary.
High level question first, it looks like that geomean compile-times improve, and that is very surprising.
My guess is that one of the functional changes in findSpecialization is a good improvement on itself. Or maybe it is the caching of the CodeMetrics? Anyway, that's my question, just curious if you know what it is?
As I said the compile time hit is attributed to the number of clones so these changes are coming from D145379 (added one extra specialization for sqlite3 and one for ClamAV when linking) and from D145394 (removed all the specializations for SPASS).
(the compile time data include other patches of this chain, not just this one)
Discarded most changes from previous revisions since the refactoring didn't seem necessary for keeping the compile times low.
@labrinea : Have you seen that there are problems with https://reviews.llvm.org/D150375 ?
- rebased on D152799
- added an option to control max clones per function across multiple iterations
I think this revision makes more sense than the previous attempts. Perhaps it also makes the option MaxClones redundant, and so we could save some compile time by avoiding the heap-sort of the specialization candidates. But that's an improvement to consider for the future.
The patch itself looks pretty simple and the numbers look good too. So I think there may not be reason to block this.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
869–871 | It looks a little better to remove the variable. |
It looks a little better to remove the variable.