This is an archive of the discontinued LLVM Phabricator instance.

[FuncSpec] Add Phi nodes to the InstCostVisitor.
ClosedPublic

Authored by labrinea on Jul 10 2023, 8:11 AM.

Details

Summary

This patch allows constant folding of PHIs when estimating the user bonus. Phi
nodes are a special case since some of their inputs may remain unresolved until
all the specialization arguments have been processed by the InstCostVisitor.
Therefore, we keep a list of dead basic blocks and then lazily visit the Phi nodes
once the user bonus has been computed for all the specialization arguments.

Diff Detail

Event Timeline

labrinea created this revision.Jul 10 2023, 8:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 8:11 AM
labrinea requested review of this revision.Jul 10 2023, 8:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 8:11 AM

The motivation of the patch looks a little bit odd to me.

Given:

a:
  br label %merge

b:
  br label %merge

merge:
  %p = phi i32 [%arg0, %a], [%non_constant, %b]

Given %arg0 may be constant , I don't think the information implies (any) benefits with function specialization. Because after the transformation, %p won't be constant still.

I know there is an optimization (may be aggressive JumpThreading?) to optimize the case:

a:
  br label %merge

b:
  br label %merge

merge:
  %p = phi i32 [%constant, %a], [%non_constant, %b]
  use of %p
  goto successors

to

a:
  br label %merge.a

b:
  br label %merge

merge:
  %p = phi i32  [%non_constant, %b]
  use of %p
  goto successors

merge.a:
   %p.a = %constant
   use of %constant
   goto copy of successors

But I am not sure if this is implemented in LLVM. Also it smells not good to make FS to dependent on such optimizations.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
127–128

What's the semantics of DeadBlocks here? Why would we add BB to DeadBlocks? It looks odd to me since it implies that estimating basic blocks will mark these blocks as dead.

275–277

Maybe it is slightly better to alert the findConstantFor .

ChuanqiXu added inline comments.Jul 10 2023, 7:22 PM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
184–186

This smells not good. It looks dangerous that the LastVisited may be invalid.

278–280

What's the semantics for PendingPHIs here? It looks like a set of PHIs which would be constant but we don't know its values. But if it is this case, why can we insert the non-const and visited PHIs to PendingPHIs? It looks odd.

284

Also I am curious why can we treat the PHI as constant with one of its incoming value? That doesn't make sense to me.

The motivation of the patch looks a little bit odd to me.

Given:

a:
  br label %merge

b:
  br label %merge

merge:
  %p = phi i32 [%arg0, %a], [%non_constant, %b]

Given %arg0 may be constant , I don't think the information implies (any) benefits with function specialization. Because after the transformation, %p won't be constant still.

I know there is an optimization (may be aggressive JumpThreading?) to optimize the case:

a:
  br label %merge

b:
  br label %merge

merge:
  %p = phi i32 [%constant, %a], [%non_constant, %b]
  use of %p
  goto successors

to

a:
  br label %merge.a

b:
  br label %merge

merge:
  %p = phi i32  [%non_constant, %b]
  use of %p
  goto successors

merge.a:
   %p.a = %constant
   use of %constant
   goto copy of successors

But I am not sure if this is implemented in LLVM. Also it smells not good to make FS to dependent on such optimizations.

Apologies for the confusion, perhaps I need to add some comments explaining the intention. If not all the incoming values of a PHI node evaluate to the same constant then the PHI node cannot be folded.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
127–128

This function is called when we constant fold switch instructions or conditional branches. It estimates the cost of instructions residing in dead blocks (unless they've already been estimated while constant folding).

184–186

The function getBonusFromPendingPHIs just above contains the only callsite to getUserBonus where nullptr is passed for the last two arguments. The key to a DenseMap cannot be nullptr. This isn't problematic as visitPHINode doesn't use LastVisited.

275–277

Ack

278–280

Pending PHIs are phi nodes we have visited once without successfully folding them to a constant value, so we want to repeat this once more after all the specialization arguments have been estimated by the InstCostVisitor. It is possible that a later argument makes a conditional branch unconditional, and consequently a basic block turns dead (if it has a unique predecessor). As a result one of the incoming values of the PHI can be disregarded for the evaluation of the PHI. I'll add a comment explaining this.

284

Const is initialized to nullptr. The first incoming value that is identified to be constant sets Const to that value. If the remaing incomign values have that same constant value then we can replace the PHI, otherwise we can't. The only exception is if the incoming value is coming from a basic block we have proved is dead, or if the incoming values is this PHI itself, in which cases we ingore (skip) that incoming value. Perhaps I need to add a comment with this explanation.

labrinea added inline comments.Jul 11 2023, 6:24 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
273–274

I am wondering whether we should be checking DeadBlocks.contains(I.getIncomingBlock(Idx)). Imagine this scenario:

      x = 2
     /     \
    /    aliveBB
deadBB
    \     y = 3
     \     /
phi ( x , y )

The basic block where x is defined is alive, therefore we can't fold the PHI because x != y. Is it not a pessimization?

labrinea updated this revision to Diff 539949.EditedJul 13 2023, 4:23 AM
  • Fixed the bug with dead incoming basic blocks.
  • Added a few comments.
  • Rebased on D155177 (minor refactoring)
labrinea marked 5 inline comments as done.Jul 13 2023, 4:25 AM

If not all the incoming values of a PHI node evaluate to the same constant then the PHI node cannot be folded.

Got it. Then it makes sense. Sorry for misreading.

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
110

Let's mention the semantics of DeadBlocks here.

127–128

I got it. The semantics is "these blocks are dead if we convert the visiting values to constant. (But we haven't decided we're going to transform them, right?)". It is still slightly odd for me. But I don't have a suggestion here.

184–186

I know it. I just feel the pattern is dangerous. For example, in all other visit* methods, we always assume the LastVisited is valid, but it won't be true after this patch. (Although it is not formally stated). The concern I had was that we broke the assumption here. And it is almost bad to broke some assumptions. I think the easy solution here may insert:

if (LastVisited == KnownConstants.end())
    return nullptr;

to many other visit* methods. And set LastVisited to KnownConstants.end() if the use and C is nullptr.

264–265

What's this insight for limitation? I don't feel the new algorithm is super time consuming.

273–274

I don't understand this. Given DeadBlocks contains deadBB and I.getIncomingBlock(Idx) for x returns deadBB, we won't consider the value from x, right? Then it doesn't matter if x != y. Do I understand correctly?

278–280

Got it. It makes sense now.

284

Oh, I got it. It makes sense.

790–796

Given getSpecializationBonus would do additional computations exposing inlining opportunities exposing inlining via indirect call promotion., (e.g., converting the call to a function ptr to a direct function call), it looks like we didn't compute these for pending PHIs.

While the current patch is self-contained, maybe we can add a TODO/FIXME here for future improvements.

labrinea marked 8 inline comments as done.Jul 21 2023, 1:46 AM
labrinea added inline comments.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
110

I'll improve the wording.

127–128

The DeadBlocks is a member of the InstCostVisitor, which by definition only estimates, does not transform itself. I'll make it clear on the description that here we are estimating the benefit to later decide whether we are going to make a transformation.

184–186

There are other visit methods which are already not using the LastVisited iterator like visitCallBase and visitGetElementPtrInst. I think the extra checks are expensive, perhaps we could do:

LastVisited = Use ? KnownConstants.insert({Use, C}).first : KnownConstants.end();

then inside the visit methods assert(LastVisited != KnownConstants.end())

264–265

There can be phi nodes with 64 incoming values or more, in which case it can be time consuming. We do need to early exit. I think 4 is a reasonable number.

273–274

Exactly. The previous revision was checking if (DeadBlocks.contains(Inst->getParent()), which was wrong.

790–796

As is, getSpecializationBonus only checks the specialization arguments (the root of the use-def chain) for inlining opportunities, not every single user of the use-def chain which might become constant. In that regard the pending PHIs are no different from the rest of the users that the InstCostVisitor is visiting through getUserBonus. Is this something we would like to change?

ChuanqiXu added inline comments.Jul 21 2023, 2:09 AM
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
184–186

Yeah, that sounds much better.

264–265

While I agree with you, it is better to make this an option. Then we reduce the magic number and have better semantics here even if we can image that there won't be so many people tuning on that.

790–796

I think at least it is a potential optimization opportunities. Then if we have a consensus on this, it looks not bad to leave a TODO/FIXME.

In that regard the pending PHIs are no different from the rest of the users that the InstCostVisitor is visiting through getUserBonus.

A difference I thought is that the PHIs are more possible to be a function pointers. (But now I don't think it is better to say "more possible" while both GEP and load instructions may get a constant too )

labrinea updated this revision to Diff 542834.Jul 21 2023, 2:51 AM
labrinea marked 7 inline comments as done.
  • added an option to control the maximum number of incoming phi values
  • added an assertion for the LastVisited iterator inside visit methods
  • added a comment to explain the semantics of DeadBlocks
labrinea marked 2 inline comments as done.Jul 21 2023, 2:52 AM
labrinea added inline comments.
llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
790–796

In my opinion, I think it's better to add the FIXME under a separate commit, since it's not directly related to this patch as I've explained above.

ChuanqiXu accepted this revision.Jul 24 2023, 6:53 PM

Thanks!

llvm/lib/Transforms/IPO/FunctionSpecialization.cpp
790–796

That's fine.

This revision is now accepted and ready to land.Jul 24 2023, 6:53 PM
This revision was landed with ongoing or failed builds.Jul 25 2023, 3:01 AM
This revision was automatically updated to reflect the committed changes.
raj.khem reopened this revision.Jul 26 2023, 2:07 AM
raj.khem added a subscriber: raj.khem.

I am seeing a crash in libLLVM-17.so when compiling socat on aarch64 and have narrowed it down to this change. here is testcase
https://uclibc.org/~kraj/nestlex.zip

compiler cmdline is

aarch64-yoe-linux-clang -target aarch64-yoe-linux nestlex.i -c -O

results in
segmentation fault (core dumped) ../recipe-sysroot-native/usr/bin/aarch64-yoe-linux/aarch64-yoe-linux-clang

works ok on reverting this patch from 17.x release branch

here is stack trace

                Stack trace of thread 1747742:
                #0  0x00007f84b9e8eaa7 _ZN4llvm12hash_combineIJhhtNS_9hash_codeES1_PNS_4TypeEEEES1_DpRKT_ (libLLVM-17.so + 0xc8eaa7)
                #1  0x00007f84b9e9a3ae _ZN4llvm17ConstantUniqueMapINS_12ConstantExprEE11getOrCreateEPNS_4TypeENS_19ConstantExprKeyTypeE (libLLVM-17.so + 0xc9a3ae)
                #2  0x00007f84b9ea14e4 _ZN4llvm12ConstantExpr11getIntToPtrEPNS_8ConstantEPNS_4TypeEb (libLLVM-17.so + 0xca14e4)
                #3  0x00007f84bb4f3dcc _ZN12_GLOBAL__N_123SymbolicallyEvaluateGEPEPKN4llvm11GEPOperatorENS0_8ArrayRefIPNS0_8ConstantEEERKNS0_10DataLayoutEPKNS0_17TargetLibraryInfoE (libLLVM-
17.so + 0x22f3dcc)
                #4  0x00007f84bb4f44d3 _ZN12_GLOBAL__N_128ConstantFoldInstOperandsImplEPKN4llvm5ValueEjNS0_8ArrayRefIPNS0_8ConstantEEERKNS0_10DataLayoutEPKNS0_17TargetLibraryInfoE (libLLVM-1
7.so + 0x22f44d3)
                #5  0x00007f84bb1b6e37 _ZN4llvm15InstCostVisitor22visitGetElementPtrInstERNS_17GetElementPtrInstE (libLLVM-17.so + 0x1fb6e37)
                #6  0x00007f84bb1bb36b _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb36b)
                #7  0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #8  0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #9  0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #10 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #11 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #12 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #13 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #14 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
                #15 0x00007f84bb1bb711 _ZN4llvm15InstCostVisitor12getUserBonusEPNS_11InstructionEPNS_5ValueEPNS_8ConstantE (libLLVM-17.so + 0x1fbb711)
This revision is now accepted and ready to land.Jul 26 2023, 2:07 AM

Thanks. Reverted. Looks like an infinite loop.

Thanks. Reverted. Looks like an infinite loop.

thank you. Can you also revert it on release/17.x branch as well please ?

Thanks. Reverted. Looks like an infinite loop.

thank you. Can you also revert it on release/17.x branch as well please ?

Will do.

labrinea updated this revision to Diff 544505.Jul 26 2023, 2:02 PM

Fixes the infinite loop in getUserBonus.

labrinea requested review of this revision.Jul 26 2023, 2:02 PM
labrinea planned changes to this revision.Jul 26 2023, 2:09 PM

The assertion "Found circle" fails.

labrinea updated this revision to Diff 544629.Jul 27 2023, 12:19 AM

Turned the "found circle" assertion in getUserBonus into early exit.

ChuanqiXu accepted this revision.Jul 27 2023, 12:29 AM

Maybe it is a good idea to add a test case for the circle issue?

This revision is now accepted and ready to land.Jul 27 2023, 12:29 AM
This revision was landed with ongoing or failed builds.Jul 27 2023, 11:27 AM
This revision was automatically updated to reflect the committed changes.

Hi @labrinea

the same problem with the following builders:

and few others ...

******************** TEST 'LLVM-Unit :: Transforms/IPO/./IPOTests/11/12' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/unittests/Transforms/IPO/./IPOTests-LLVM-Unit-1080736-11-12.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=12 GTEST_SHARD_INDEX=11 /home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/unittests/Transforms/IPO/./IPOTests
--
Script:
--
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/build/unittests/Transforms/IPO/./IPOTests --gtest_filter=FunctionSpecializationTest.PhiNode
--
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:358: Failure
Expected equality of these values:
  Bonus
    Which is: 0
  Ref
    Which is: 2014
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:359: Failure
Value of: Bonus > 0
  Actual: false
Expected: true
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:358
Expected equality of these values:
  Bonus
    Which is: 0
  Ref
    Which is: 2014
/home/buildbot/as-builder-4/lld-x86_64-ubuntu-fast/llvm-project/llvm/unittests/Transforms/IPO/FunctionSpecializationTest.cpp:359
Value of: Bonus > 0
  Actual: false
Expected: true
********************
phosek added a subscriber: phosek.Jul 27 2023, 2:58 PM

We also see the same test failure on Windows:

Script:
--
C:\b\s\w\ir\x\w\llvm_build\unittests\Transforms\IPO\.\IPOTests.exe --gtest_filter=FunctionSpecializationTest.PhiNode
--
C:\b\s\w\ir\x\w\llvm-llvm-project\llvm\unittests\Transforms\IPO\FunctionSpecializationTest.cpp:358
Expected equality of these values:
  Bonus
    Which is: 0
  Ref
    Which is: 2014
C:\b\s\w\ir\x\w\llvm-llvm-project\llvm\unittests\Transforms\IPO\FunctionSpecializationTest.cpp:359
Value of: Bonus > 0
  Actual: false
Expected: true
labrinea updated this revision to Diff 545175.EditedJul 28 2023, 8:33 AM

The previous revision was sensitive to the order getSpecializationBonus was evaluated in the unittest.
Specifically I was under the assumption that the expression

Specializer.getSpecializationBonus(F->getArg(0), One, Visitor) +
Specializer.getSpecializationBonus(F->getArg(1), One, Visitor) +
Specializer.getSpecializationBonus(F->getArg(2), One, Visitor)

would evaluate in order, but it depends on the target, so it has an effect on the resulting value.

labrinea reopened this revision.Jul 28 2023, 8:33 AM
This revision is now accepted and ready to land.Jul 28 2023, 8:33 AM
labrinea requested review of this revision.Jul 28 2023, 8:33 AM
ChuanqiXu accepted this revision.Jul 30 2023, 6:55 PM
This revision is now accepted and ready to land.Jul 30 2023, 6:55 PM
This revision was landed with ongoing or failed builds.Jul 31 2023, 12:29 AM
This revision was automatically updated to reflect the committed changes.