This patch allows constant folding of PHIs when estimating the user bonus. Phi
nodes are a special case since some of their inputs may remain unresolved until
all the specialization arguments have been processed by the InstCostVisitor.
Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes
once the user bonus has been computed for all the specialization arguments.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The motivation of the patch looks a little bit odd to me.
Given:
a: br label %merge b: br label %merge merge: %p = phi i32 [%arg0, %a], [%non_constant, %b]
Given %arg0 may be constant , I don't think the information implies (any) benefits with function specialization. Because after the transformation, %p won't be constant still.
I know there is an optimization (may be aggressive JumpThreading?) to optimize the case:
a: br label %merge b: br label %merge merge: %p = phi i32 [%constant, %a], [%non_constant, %b] use of %p goto successors
to
a: br label %merge.a b: br label %merge merge: %p = phi i32 [%non_constant, %b] use of %p goto successors merge.a: %p.a = %constant use of %constant goto copy of successors
But I am not sure if this is implemented in LLVM. Also it smells not good to make FS to dependent on such optimizations.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
127–128 | What's the semantics of DeadBlocks here? Why would we add BB to DeadBlocks? It looks odd to me since it implies that estimating basic blocks will mark these blocks as dead. | |
275–277 | Maybe it is slightly better to alert the findConstantFor . |
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
184–186 | This smells not good. It looks dangerous that the LastVisited may be invalid. | |
278–280 | What's the semantics for PendingPHIs here? It looks like a set of PHIs which would be constant but we don't know its values. But if it is this case, why can we insert the non-const and visited PHIs to PendingPHIs? It looks odd. | |
284 | Also I am curious why can we treat the PHI as constant with one of its incoming value? That doesn't make sense to me. |
Apologies for the confusion, perhaps I need to add some comments explaining the intention. If not all the incoming values of a PHI node evaluate to the same constant then the PHI node cannot be folded.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
127–128 | This function is called when we constant fold switch instructions or conditional branches. It estimates the cost of instructions residing in dead blocks (unless they've already been estimated while constant folding). | |
184–186 | The function getBonusFromPendingPHIs just above contains the only callsite to getUserBonus where nullptr is passed for the last two arguments. The key to a DenseMap cannot be nullptr. This isn't problematic as visitPHINode doesn't use LastVisited. | |
275–277 | Ack | |
278–280 | Pending PHIs are phi nodes we have visited once without successfully folding them to a constant value, so we want to repeat this once more after all the specialization arguments have been estimated by the InstCostVisitor. It is possible that a later argument makes a conditional branch unconditional, and consequently a basic block turns dead (if it has a unique predecessor). As a result one of the incoming values of the PHI can be disregarded for the evaluation of the PHI. I'll add a comment explaining this. | |
284 | Const is initialized to nullptr. The first incoming value that is identified to be constant sets Const to that value. If the remaing incomign values have that same constant value then we can replace the PHI, otherwise we can't. The only exception is if the incoming value is coming from a basic block we have proved is dead, or if the incoming values is this PHI itself, in which cases we ingore (skip) that incoming value. Perhaps I need to add a comment with this explanation. |
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
273–274 | I am wondering whether we should be checking DeadBlocks.contains(I.getIncomingBlock(Idx)). Imagine this scenario: x = 2 / \ / aliveBB deadBB \ y = 3 \ / phi ( x , y ) The basic block where x is defined is alive, therefore we can't fold the PHI because x != y. Is it not a pessimization? |
- Fixed the bug with dead incoming basic blocks.
- Added a few comments.
- Rebased on D155177 (minor refactoring)
If not all the incoming values of a PHI node evaluate to the same constant then the PHI node cannot be folded.
Got it. Then it makes sense. Sorry for misreading.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
110 | Let's mention the semantics of DeadBlocks here. | |
127–128 | I got it. The semantics is "these blocks are dead if we convert the visiting values to constant. (But we haven't decided we're going to transform them, right?)". It is still slightly odd for me. But I don't have a suggestion here. | |
184–186 | I know it. I just feel the pattern is dangerous. For example, in all other visit* methods, we always assume the LastVisited is valid, but it won't be true after this patch. (Although it is not formally stated). The concern I had was that we broke the assumption here. And it is almost bad to broke some assumptions. I think the easy solution here may insert: if (LastVisited == KnownConstants.end()) return nullptr; to many other visit* methods. And set LastVisited to KnownConstants.end() if the use and C is nullptr. | |
264–265 | What's this insight for limitation? I don't feel the new algorithm is super time consuming. | |
273–274 | I don't understand this. Given DeadBlocks contains deadBB and I.getIncomingBlock(Idx) for x returns deadBB, we won't consider the value from x, right? Then it doesn't matter if x != y. Do I understand correctly? | |
278–280 | Got it. It makes sense now. | |
284 | Oh, I got it. It makes sense. | |
790–796 | Given getSpecializationBonus would do additional computations exposing inlining opportunities exposing inlining via indirect call promotion., (e.g., converting the call to a function ptr to a direct function call), it looks like we didn't compute these for pending PHIs. While the current patch is self-contained, maybe we can add a TODO/FIXME here for future improvements. |
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
110 | I'll improve the wording. | |
127–128 | The DeadBlocks is a member of the InstCostVisitor, which by definition only estimates, does not transform itself. I'll make it clear on the description that here we are estimating the benefit to later decide whether we are going to make a transformation. | |
184–186 | There are other visit methods which are already not using the LastVisited iterator like visitCallBase and visitGetElementPtrInst. I think the extra checks are expensive, perhaps we could do: LastVisited = Use ? KnownConstants.insert({Use, C}).first : KnownConstants.end(); then inside the visit methods assert(LastVisited != KnownConstants.end()) | |
264–265 | There can be phi nodes with 64 incoming values or more, in which case it can be time consuming. We do need to early exit. I think 4 is a reasonable number. | |
273–274 | Exactly. The previous revision was checking if (DeadBlocks.contains(Inst->getParent()), which was wrong. | |
790–796 | As is, getSpecializationBonus only checks the specialization arguments (the root of the use-def chain) for inlining opportunities, not every single user of the use-def chain which might become constant. In that regard the pending PHIs are no different from the rest of the users that the InstCostVisitor is visiting through getUserBonus. Is this something we would like to change? |
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
184–186 | Yeah, that sounds much better. | |
264–265 | While I agree with you, it is better to make this an option. Then we reduce the magic number and have better semantics here even if we can image that there won't be so many people tuning on that. | |
790–796 | I think at least it is a potential optimization opportunities. Then if we have a consensus on this, it looks not bad to leave a TODO/FIXME.
A difference I thought is that the PHIs are more possible to be a function pointers. (But now I don't think it is better to say "more possible" while both GEP and load instructions may get a constant too ) |
- added an option to control the maximum number of incoming phi values
- added an assertion for the LastVisited iterator inside visit methods
- added a comment to explain the semantics of DeadBlocks
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp | ||
---|---|---|
790–796 | In my opinion, I think it's better to add the FIXME under a separate commit, since it's not directly related to this patch as I've explained above. |
I am seeing a crash in libLLVM-17.so when compiling socat on aarch64 and have narrowed it down to this change. here is testcase
https://uclibc.org/~kraj/nestlex.zip
compiler cmdline is
aarch64-yoe-linux-clang -target aarch64-yoe-linux nestlex.i -c -O
results in
segmentation fault (core dumped) ../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-clang
works ok on reverting this patch from 17.x release branch
here is stack trace
Stack trace of thread 1747742: #0 0x00007f84b9e8eaa7 _ZN4llvm12hash_combineIJhhtNS_9hash_codeES1_PNS_4TypeEEEES1_DpRKT_ (libLLVM-17.so + 0xc8eaa7) #1 0x00007f84b9e9a3ae _ZN4llvm17ConstantUniqueMapINS_12ConstantExprEE11getOrCreateEPNS_4TypeENS_19ConstantExprKeyTypeE (libLLVM-17.so + 0xc9a3ae) #2 0x00007f84b9ea14e4 _ZN4llvm12ConstantExpr11getIntToPtrEPNS_8ConstantEPNS_4TypeEb (libLLVM-17.so + 0xca14e4) #3 0x00007f84bb4f3dcc _ZN12_GLOBAL__N_123SymbolicallyEvaluateGEPEPKN4llvm11GEPOperatorENS0_8ArrayRefIPNS0_8ConstantEEERKNS0_10DataLayoutEPKNS0_17TargetLibraryInfoE (libLLVM- 17.so + 0x22f3dcc) #4 0x00007f84bb4f44d3 _ZN12_GLOBAL__N_128ConstantFoldInstOperandsImplEPKN4llvm5ValueEjNS0_8ArrayRefIPNS0_8ConstantEEERKNS0_10DataLayoutEPKNS0_17TargetLibraryInfoE (libLLVM-1 7.so + 0x22f44d3) #5 0x00007f84bb1b6e37 _ZN4llvm15InstCostVisitor22visitGetElementPtrInstERNS_17GetElementPtrInstE (libLLVM-17.so + 0x1fb6e37) #6 0x00007f84bb1bb36b _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb36b) #7 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #8 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #9 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #10 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #11 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #12 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #13 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #14 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711) #15 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
@labrinea the test you added seems to be failing on several bots. Can you take a look and revert if you need time to investigate?
https://lab.llvm.org/buildbot/#/builders/216/builds/24650
https://lab.llvm.org/buildbot/#/builders/247/builds/7037
https://lab.llvm.org/buildbot/#/builders/139/builds/46259
Hi @labrinea
the same problem with the following builders:
- https://lab.llvm.org/buildbot/#/builders/235/builds/975
- https://lab.llvm.org/buildbot/#/builders/58/builds/42553
- https://lab.llvm.org/buildbot/#/builders/58/builds/42553
and few others ...
******************** TEST 'LLVM-Unit :: Transforms/IPO/./IPOTests/11/12' FAILED ******************** Script(shard): -- GTEST_OUTPUT=json:/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/unittests/Transforms/IPO/./IPOTests-LLVM-Unit-1080736-11-12.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=12 GTEST_SHARD_INDEX=11 /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/unittests/Transforms/IPO/./IPOTests -- Script: -- /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/unittests/Transforms/IPO/./IPOTests --gtest_filter=FunctionSpecializationTest.PhiNode -- /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:358: Failure Expected equality of these values: Bonus Which is: 0 Ref Which is: 2014 /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:359: Failure Value of: Bonus > 0 Actual: false Expected: true /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:358 Expected equality of these values: Bonus Which is: 0 Ref Which is: 2014 /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:359 Value of: Bonus > 0 Actual: false Expected: true ********************
We also see the same test failure on Windows:
Script: -- C:\b\s\w\ir\x\w\llvm_build\unittests\Transforms\IPO\.\IPOTests.exe --gtest_filter=FunctionSpecializationTest.PhiNode -- C:\b\s\w\ir\x\w\llvm-llvm-project\llvm\unittests\Transforms\IPO\FunctionSpecializationTest.cpp:358 Expected equality of these values: Bonus Which is: 0 Ref Which is: 2014 C:\b\s\w\ir\x\w\llvm-llvm-project\llvm\unittests\Transforms\IPO\FunctionSpecializationTest.cpp:359 Value of: Bonus > 0 Actual: false Expected: true
The previous revision was sensitive to the order getSpecializationBonus was evaluated in the unittest.
Specifically I was under the assumption that the expression
Specializer.getSpecializationBonus(F->getArg(0), One, Visitor) + Specializer.getSpecializationBonus(F->getArg(1), One, Visitor) + Specializer.getSpecializationBonus(F->getArg(2), One, Visitor)
would evaluate in order, but it depends on the target, so it has an effect on the resulting value.
Let's mention the semantics of DeadBlocks here.