This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Increase the maximum number of times the specializer can run.
ClosedPublic

Authored by labrinea on Mar 10 2023, 11:38 AM.

Details

Summary
  • Changes the default value of FuncSpecMaxIters from 1 to 10. This allows specialization of recursive functions.
  • Adds an option to control the maximum codesize growth per function.
  • Measured ~45% performance uplift for SPEC2017:548.exchange2_r on AWS Graviton3.

Diff Detail

Event Timeline

labrinea created this revision.Mar 10 2023, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2023, 11:38 AM
labrinea requested review of this revision.Mar 10 2023, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2023, 11:38 AM

Bumps the score of 548.exchange2_r from SPEC2017 by about 60% on Neoverse N1.

Bumps the score of 548.exchange2_r from SPEC2017 by about 60% on Neoverse N1.

Can we achieve this with lower FuncSpecMaxIters than 10 as 10 could be quite constly in terms of compile times?

labrinea added a comment.EditedMar 12 2023, 8:43 AM

Bumps the score of 548.exchange2_r from SPEC2017 by about 60% on Neoverse N1.

Can we achieve this with lower FuncSpecMaxIters than 10 as 10 could be quite constly in terms of compile times?

Exchange requires 8 iterations, but we should be making the specializer as generic as possible. Note that FuncSpecMaxIters is the maximum number of times the specializer will run as long as each previous iteration has found specializations. As described in the summary we only consider callsites which reside in the bodies of previous specializations after the propagation of constant arguments. For now, that specialization on literal constants is disabled, this only triggers for recursive functions through promotion of constant stack values to globals which is quite rare. Therefore the impact in compilation times is zero for the general case. Only in code like exchange I would expect significant regression. That said I will post some numbers from the compile time tracker on the LLVM test suite.

Another important observation is that the compile time hit is attributed to the number of clones (the more, the higher) which suggests that the time is not spent on the specializer but on other optimization passes.

labrinea added a comment.EditedMar 12 2023, 9:27 AM

Metric: Instruction Count
Comparison with base (parent of D145374 - top of the chain)

  • Non LTO
testnamedelta %
ClamAV-0.101
7zip+0.030
tramp3d-v4-0.027
kimwitu++-0.068
sqlite3-0.213
mafft-0.030
lencod+0.020
SPASS-0.324
consumer-typeset+0.016
Bullet+0.029
geomean-0.067
  • LTO
testnamedelta %
ClamAV+0.221
7zip-0.034
tramp3d-v4-0.003
kimwitu++-0.128
sqlite3+0.081
mafft-0.039
lencod-0.016
SPASS-0.795
consumer-typeset-0.001
Bullet-0.044
geomean-0.076

As I said the compile time hit is attributed to the number of clones so these changes are coming from D145379 (added one extra specialization for sqlite3 and one for ClamAV when linking) and from D145394 (removed all the specializations for SPASS).

@labrinea Is this patch the top-level patch for recursive function-specialization that benefits exchange2? Is there any setting required or would this perform specialization by default? Also, assuming this is with llvm-project/flang.

@labrinea Is this patch the top-level patch for recursive function-specialization that benefits exchange2? Is there any setting required or would this perform specialization by default? Also, assuming this is with llvm-project/flang.

You can see the patch dependencies in the chain. Some of them are merged, others are pending review. I will rebase it once the chain is ready. The patch benefits exchange2_r when compiled with flang. No other changes are necessary.

High level question first, it looks like that geomean compile-times improve, and that is very surprising.
My guess is that one of the functional changes in findSpecialization is a good improvement on itself. Or maybe it is the caching of the CodeMetrics? Anyway, that's my question, just curious if you know what it is?

labrinea added a comment.EditedMar 16 2023, 3:02 AM

@SjoerdMeijer

As I said the compile time hit is attributed to the number of clones so these changes are coming from D145379 (added one extra specialization for sqlite3 and one for ClamAV when linking) and from D145394 (removed all the specializations for SPASS).

(the compile time data include other patches of this chain, not just this one)

labrinea planned changes to this revision.Mar 21 2023, 7:49 AM
labrinea updated this revision to Diff 526016.May 26 2023, 3:56 AM
labrinea edited the summary of this revision. (Show Details)

Discarded most changes from previous revisions since the refactoring didn't seem necessary for keeping the compile times low.

bjope added a subscriber: bjope.May 26 2023, 4:22 AM

Ping!

@labrinea : Have you seen that there are problems with https://reviews.llvm.org/D150375 ?

This comment was removed by labrinea.
labrinea updated this revision to Diff 531379.Jun 14 2023, 9:28 AM
labrinea edited the summary of this revision. (Show Details)
  • rebased on D152799
  • added an option to control max clones per function across multiple iterations
labrinea updated this revision to Diff 551999.Aug 21 2023, 6:17 AM
labrinea edited the summary of this revision. (Show Details)

I think this revision makes more sense than the previous attempts. Perhaps it also makes the option MaxClones redundant, and so we could save some compile time by avoiding the heap-sort of the specialization candidates. But that's an improvement to consider for the future.

ChuanqiXu accepted this revision.Aug 21 2023, 11:20 PM

The patch itself looks pretty simple and the numbers look good too. So I think there may not be reason to block this.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
875–877

It looks a little better to remove the variable.

This revision is now accepted and ready to land.Aug 21 2023, 11:20 PM
This revision was landed with ongoing or failed builds.Aug 22 2023, 1:41 AM
This revision was automatically updated to reflect the committed changes.