This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold PHIs with equal incoming pointers
ClosedPublic

Authored by DaniilSuchkov on Sep 27 2019, 4:53 AM.

Details

Summary

In case when all incoming values of a PHI are equal pointers, this transformation inserts a definition of such a pointer right after definition of the base pointer and replaces with this value both PHI and all it's incoming pointers. Primary goal of this transformation is canonicalization of this pattern in order to enable optimizations that can't handle PHIs. Non-inbounds pointers aren't currently supported.

Diff Detail

Event Timeline

DaniilSuchkov created this revision.Sep 27 2019, 4:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2019, 4:53 AM

Please split baseline tests into a parent review and autogenerate them via utils/update_test_checks.py
It is not obvious what this actually does based just on those few sparse check-lines.
Should this go elsewhere, SimplifyCFG?

(style) auto should only be used when the type is obvious, this typically means the type is already present in the expression (cast/dyn_cast etc.)

Comments addressed: checks in tests are autogenerated, positive tests moved to a separate patch D68263, got rid of excessive use of auto keyword.

Should this go elsewhere, SimplifyCFG?

I don't see why it should, but if you think it does and others don't object, I won't mind moving it.

I see.
So we want to hoist identical gep+bitcast from 2 BB's into their closest dominating BB, more or less.
Do we have standalone hoisting pass?

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1157–1168

I'm not sure if this is correct.
I'd expect that we actually need to check to where we can move these geps.

I see.
So we want to hoist identical gep+bitcast from 2 BB's into their closest dominating BB, more or less.
Do we have standalone hoisting pass?

As stated in the "Summary", the main point is not to just hoist expressions, but to canonicalize them in order to open opportunities for further optimizations, so hoisting is more of a side-effect here (I chose it because sinking doesn't work). InstCombine sounds like the right place for canonicalization of specific simple patterns of instructions. GVN performs transformations of this kind but it takes number of uses into account (while for canonicalization it shouldn't matter) and also won't handle cases when the sequences of instructions producing equal pointers are different.

InstCombine already handles such patterns (is sinks instructions one by one) but with some serious limitations: only identical sequences of instructions are allowed and only for cases where PHI is the only user. If the latter restriction is lifted, the resulting code is then "optimized" back by GVN.

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1157–1168

Why do you think it's incorrect? Given the offset is constant, this code just looks for an insertion point right after base pointer definition and bails out in case when such point doesn't exist.

What restrictions did I miss?

lebedev.ri marked an inline comment as done.Oct 1 2019, 3:58 AM
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1157–1168

Actually, i guess this might be ok,
we don't do anything with the users,
and gep inbounds at worst produces poison.

Test checks updated.

lebedev.ri added inline comments.Oct 1 2019, 11:07 AM
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1144–1155

This still looks like a hoisting problem to me, honestly.

In particular, why do we want *all* the incoming values to be identical?
I.e. why can't we track *which* values are identical,
and if there are two-or-more of of particular value hoist+common them?

lebedev.ri added inline comments.Oct 1 2019, 3:46 PM
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1144–1155

To be more specific

struct ZZZ {
  Value *Base;
  int64_t Offset;
  bool IsInBounds;
}

DenseMap<ZZZ, SmallVector<Value *, 4>> QQQ;

for (Use &Incoming : PN.incoming_values()) {
  QQQ[GetPointerBaseWithConstantOffset(Incoming)].append(Incoming);
}

for(??? I : QQQ) {
  if(I.second.size() < 2)
    continue;
  <hoist>
  replaceInstUsesWith()
}
DaniilSuchkov added inline comments.Oct 1 2019, 11:11 PM
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1144–1155

This still looks like a hoisting problem to me, honestly.

In particular, why do we want *all* the incoming values to be identical?
I.e. why can't we track *which* values are identical,
and if there are two-or-more of of particular value hoist+common them?

Because the main idea here is to get rid of a redundant PHI, since it makes the code analyzable by optimizations that can't (and in general don't need to) walk through PHIs. Yes, it also makes a number of identical pointers trivially equal, but it is more of a side-effect.
What you propose is more complex, has broader scope (thus is more likely to have unwanted side-effects) but it doesn't improve the solution of the initial problem.

I didn't step through the entire patch/tests, but the first few examples are flattened/simplified by -simplifycfg. Would it be better to look there to add the missing functionality? Is there a test in this patch that shows a case that simplifycfg can't already flatten?

lebedev.ri marked an inline comment as done.Oct 16 2019, 11:34 AM
lebedev.ri added inline comments.
llvm/include/llvm/IR/Value.h
575–576

This could better out what happens when more than one GEP is stripped, and they had different inbound-ness.

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1144–1155

Okay.

I didn't step through the entire patch/tests, but the first few examples are flattened/simplified by -simplifycfg. Would it be better to look there to add the missing functionality? Is there a test in this patch that shows a case that simplifycfg can't already flatten?

SimplifyCFG happens to flatten most of these tests by a transformation that performs hoisting of the common code from the branches to their only immediate predecessor and does only very primitive matching, and, according to comments around it, this transformation is intentionally cheap and simple. So, such an extension of simplifycfg would perform worse on all cases other than primitive tests and it's not exactly in line with the intentions behind the transformation.

The tests from D68263 are bad at showing when this transformation is more powerful than what SimplifyCFG can do because it wasn't the intention. Out of all the tests, following are out of scope of the SimplifyCFG transform and thus not simplified: test_extra_uses_multiple_geps, test_gep_and_bitcast_same_bb_and_extra_use, test_gep_and_bitcast_same_bb.

DaniilSuchkov added inline comments.Oct 17 2019, 12:31 AM
llvm/include/llvm/IR/Value.h
575–576

You're right, that is a tricky question, I will provide details on the behavior in this case.

Added a comment explaining how IsInBounds is set by stripAndAccumulateConstantOffsets.

DaniilSuchkov marked an inline comment as done.Oct 17 2019, 4:17 AM

Can we handle the case with inbounds offsets only first and extend with non-inbounds support in a follow up change? This way in the first change you'll not need to change IR interfaces at all.

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1163–1164

It's ok to insert at the end of BB.

Use IRBuilder::SetInsertPoint(BasicBlock *TheBB, BasicBlock::iterator IP) for that. After this change there will be no situation when InsertPt is available, so the code can structured slightly differently:

if (auto *BaseInst = dyn_cast<Instruction>(Base)) {
  if (isa<PHINode>(BaseInst))
    Builder.SetInsertPoint(BaseInst->getParent(), BaseInst->getParent()->getFirstInsertionPt())
  else
    Builder.SetInsertPoint(BaseInst->getNextNode());
} else
  Builder.SetInsertPoint(...);
DaniilSuchkov marked an inline comment as done.Nov 7 2019, 9:59 PM
DaniilSuchkov added inline comments.
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1163–1164

It's ok to insert at the end of BB.

It can result in broken IR by insertion of a regular instruction after terminator.

Take a look at "test_neg_gep_and_bitcast_phi_no_insert_pt", in that case there is no valid insertion point in bb at all:

catch:
  ; There is no insertion point in this basic block!
  %obj = phi i8* [ %obj1, %entry ], [ %obj2, %cont ]
  %cs = catchswitch within none [label %doit] unwind to caller
DaniilSuchkov edited the summary of this revision. (Show Details)

Removed support of non-inbounds pointers.

apilipenko accepted this revision.Nov 8 2019, 6:58 PM
This revision is now accepted and ready to land.Nov 8 2019, 6:58 PM

No further comments from me.
This seems correct, but still not sure where this should actually be performed.

llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1171

Do you want to restore previous insertion point afterwards, i.e. use InsertPointGuard?

DaniilSuchkov marked an inline comment as done.Nov 10 2019, 10:24 PM
DaniilSuchkov added inline comments.
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1171

I don't think so. In general it doesn't look like it is a common practice to restore insertion point in InstCombine (5 uses of InsertPointGuard vs 27 uses of SetInsertPoint). Moreover, in InstCombiner::run IP is being reset at every iteration, so it makes sense to reset IP only within helper functions that are to be called from other transformations, but this function isn't intended to be used this way.

This revision was automatically updated to reflect the committed changes.