This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
ClosedPublic

Authored by aeubanks on Aug 27 2021, 12:37 PM.

Details

Summary

When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.

This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.

Diff Detail

Event Timeline

aeubanks created this revision.Aug 27 2021, 12:37 PM
aeubanks requested review of this revision.Aug 27 2021, 12:37 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 27 2021, 12:37 PM

I'll precommit the test/update newly failing tests if this looks good

Please do.
I've been meaning to do something along those lines.

This probably should probably be TTI-cost driven completely, not simply instruction count, but the cost model here is dubious even ignoring that.

xbolva00 added inline comments.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237

Skip also AssumeInst?

vector-reductions-logical.ll looks bad, will need investigation

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237

should be handled with the new code
and actually, same with DbgInfoIntrinsic, which I've updated

lebedev.ri added inline comments.Aug 27 2021, 2:16 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237

Surely assumes aren't speculatable?

aeubanks added inline comments.Aug 27 2021, 2:22 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237

oh yes you're right
anyway, can be handled in a separate patch

aeubanks added inline comments.Aug 27 2021, 2:44 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

Now this branch is getting folded into the next basic block. Then at the end of -O2 when every fpext is eliminated, the final simplifycfg will fold every branch (since each block only consists of at most one extra instruction besides the cmp and branch), except for this block which is now slightly bigger.
Any ideas on how to fix this?

lebedev.ri added inline comments.Aug 27 2021, 2:57 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

I do not understand why this test is being affected at all, there are no zero-cost instructions here?

aeubanks added inline comments.Aug 27 2021, 3:13 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

seems like we consider %vecext17 = extractelement <4 x float> %t, i32 0 to be free

https://github.com/llvm/llvm-project/blob/063af63b9664151b3a9206feefa9a6a36a471e80/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L3433

I tried looking at the history, this special case seems very old

lebedev.ri added inline comments.Aug 27 2021, 3:22 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

Ah, right, that makes sense.
Didn't look, but not sure there is a nice fix here.

aeubanks added inline comments.
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

@spatel any thoughts on this?

This looks resonable. BTW are {launder,strip}.inariant.group intrinsics considered to be free now? I remember fixing this in InlineCost, but this seems to be other cost heuristic. They totally should be considered free.

spatel added inline comments.Aug 29 2021, 7:10 AM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

It's unfortunately a regression, but as the related tests show, we're not getting ideal results (2 vector compares) on most examples.

The cost model is telling the truth from its limited perspective - the extract from elt 0 is free, but the rest are not (they require shuffles).

We need to be able to view these as sequences rather than as individual instructions or basic blocks either here or in SLP to improve things.

A quick hack solution might be to adjust the bonus budget in the presence of vector ops. Ie, if code has vectors, we try harder to speculate instructions because we assume that the cost of branching is likely greater than it appears, and we recognize that creating larger basic blocks has positive impact on SLP.

lebedev.ri added inline comments.Aug 29 2021, 8:10 AM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

Now that simplifycfg does not speculate known not-taken branches,
my next step was to introduce a multiplier to be appled to speculation budgets
for known-taken branches.

So i think the vector hack makes sense.

aeubanks added inline comments.Aug 29 2021, 4:01 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

what's the right way to check if an instruction is a vector op?

lebedev.ri added inline comments.Aug 29 2021, 4:10 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

I guess you want to add Instruction::isVectorOp().
Perhaps we just want to check that any of: produced type, argument types
is a vector type?

aeubanks updated this revision to Diff 369506.Aug 30 2021, 11:36 AM

add bonus when we see a vector op

This looks resonable. BTW are {launder,strip}.inariant.group intrinsics considered to be free now? I remember fixing this in InlineCost, but this seems to be other cost heuristic. They totally should be considered free.

https://github.com/llvm/llvm-project/blob/83df94067d367d91dcc37e269a3d7317ebe97bb4/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L588

This looks resonable. BTW are {launder,strip}.inariant.group intrinsics considered to be free now? I remember fixing this in InlineCost, but this seems to be other cost heuristic. They totally should be considered free.

https://github.com/llvm/llvm-project/blob/83df94067d367d91dcc37e269a3d7317ebe97bb4/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L588

Looks like you added this in https://reviews.llvm.org/D51814 :)

lebedev.ri requested changes to this revision.Aug 30 2021, 11:49 AM

Could you please split this into

  1. introducing instruction::isvectorop
  2. adding bonus when vector ops are present? i'm not sure it should be an increment, and i don't think this is the right function
  3. rest of the patch?
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3140

I think this should be in Instruction.
Is there any instruction that takes scalars only but produces vector?

3230

I think this should be a cl::opt, and applied as a multiplier to BonusInstThreshold.

This revision now requires changes to proceed.Aug 30 2021, 11:49 AM

Could you please split this into

  1. introducing instruction::isvectorop
  2. adding bonus when vector ops are present? i'm not sure it should be an increment, and i don't think this is the right function
  3. rest of the patch?

D108935

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3140

Instruction doesn't have methods that inspect the operands (except some equality methods). Currently this is specific to SimplifyCFG so I think at least for now it should be here.

Not sure if any instruction takes scalars and produces vectors, but those should count?

lebedev.ri added inline comments.Aug 30 2021, 12:20 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3140

Err, right, they are, missed the I.getType()->isVectorTy() || , sorry.

some tests are crashing in CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses(), reduced to running simplifycfg on

define i32 @test(i32 %len) local_unnamed_addr {
entry:
  br i1 undef, label %for.cond.preheader, label %if.end

for.cond.preheader:                               ; preds = %entry
  %0 = bitcast [1 x i64]* undef to i8*
  %cmp15 = icmp slt i32 1, %len
  br i1 %cmp15, label %for.body.lr.ph, label %if.end.loopexit

for.body.lr.ph:                                   ; preds = %for.cond.preheader
  br label %for.body

for.body:                                         ; preds = %for.body, %for.body.lr.ph
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0)
  br label %for.body

if.end.loopexit:                                  ; preds = %for.cond.preheader
  br label %if.end

if.end:                                           ; preds = %if.end.loopexit, %entry
  ret i32 0
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #0

declare void @foo() local_unnamed_addr

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

taking a look

Seems to be in the code added recently in https://reviews.llvm.org/rG909cba969981032c5740774ca84a34b7f76b909b

with

diff --cc llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index ac386761e12b,ac386761e12b..32bba11659e7
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@@ -1107,6 -1107,6 +1107,9 @@@ static void CloneInstructionsIntoPredec
        auto *UI = cast<Instruction>(U.getUser());
        auto *PN = dyn_cast<PHINode>(UI);
        if (!PN) {
++        BonusInst.dump();
++        UI->dump();
++        errs() << (UI->getParent() == BB) << " " << BonusInst.comesBefore(UI) << "\n";
          assert(UI->getParent() == BB && BonusInst.comesBefore(UI) &&
                 "If the user is not a PHI node, then it should be in the same "
                 "block as, and come after, the original bonus instruction.");

I'm seeing

  %.old = bitcast [1 x i64]* undef to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %.old)
0 opt: ../../llvm/lib/IR/Instruction.cpp:114: bool llvm::Instruction::comesBefore(const llvm::Instruction *) const: Assertion `Parent == Other->Parent && "cross-BB instruction order comparison"' failed.
aeubanks updated this revision to Diff 373062.Sep 16 2021, 1:58 PM

don't skip over recently added IsBCSSAUse check for free instructions
should be good to review now

spatel accepted this revision.Sep 22 2021, 6:03 AM

We handled any controversy in the earlier patches, so this is now the minimal patch as described. LGTM

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3255–3256

Nit: I'd put the comment before the 'if' line that it explains.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135

Checking the types seems like it would do.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 22 2021, 9:53 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.