This is an archive of the discontinued LLVM Phabricator instance.

Differential D108837

[SimplifyCFG] Ignore free instructions when computing cost for folding branch to common dest
Closed · Public

Authored by aeubanks on Aug 27 2021, 12:37 PM.

Details

Reviewers

lebedev.ri
Prazek
spatel

Commits

rGe7249e4acf3c: [SimplifyCFG] Ignore free instructions when computing cost for folding branch…

Summary

When determining whether to fold branches to a common destination by
merging two blocks, SimplifyCFG will count the number of instructions to
be moved into the first basic block. However, there's no reason to count
free instructions like bitcasts and other similar instructions.

This resolves missed branch foldings with -fstrict-vtable-pointers in
llvm-test-suite's lambda benchmark.
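The budget check described above can be modeled outside LLVM. The following is a minimal, self-contained sketch, not the SimplifyCFG code: `ToyInst`, its `cost` field (a stand-in for a TTI cost query, where 0 means free), and `fitsBonusBudget` are hypothetical names. Each counted bonus instruction is charged once per predecessor because it would be cloned into every predecessor.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction, reduced to what the budget
// check looks at. `cost == 0` plays the role of TargetTransformInfo::TCC_Free.
struct ToyInst {
  std::string name;
  unsigned cost;
  bool isCondition;   // the compare that feeds the conditional branch
  bool isTerminator;  // the branch itself
};

// Returns true if cloning the block's bonus instructions into each of
// `predCount` predecessors stays within `threshold`. With `ignoreFree` false,
// every bonus instruction is counted (the behavior before this patch).
bool fitsBonusBudget(const std::vector<ToyInst> &block, unsigned predCount,
                     unsigned threshold, bool ignoreFree) {
  assert(predCount > 0 && "a foldable block has at least one predecessor");
  unsigned numBonusInsts = 0;
  for (const ToyInst &I : block) {
    if (I.isCondition || I.isTerminator)
      continue;  // neither counts as a bonus instruction
    if (ignoreFree && I.cost == 0)
      continue;  // free instructions consume no budget
    numBonusInsts += predCount;  // one clone per predecessor
    if (numBonusInsts > threshold)
      return false;  // early exit once over the limit
  }
  return true;
}
```

For a block made of a free `bitcast`, an `fpext`, the `fcmp` condition, and the `br`, with one predecessor and a threshold of 1, the old counting rejects the fold (2 > 1) while the free-aware counting accepts it (1 ≤ 1).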

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aeubanks created this revision. Aug 27 2021, 12:37 PM

Herald added a subscriber: hiraditya. Aug 27 2021, 12:37 PM

aeubanks requested review of this revision. Aug 27 2021, 12:37 PM

Herald added a project: Restricted Project. Aug 27 2021, 12:37 PM

Herald added a subscriber: llvm-commits.

I'll precommit the test and update newly failing tests if this looks good.

Please do.
I've been meaning to do something along those lines.

This should probably be TTI-cost driven completely, not simply instruction count, but the cost model here is dubious even ignoring that.
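A purely TTI-cost-driven budget, as suggested above, might look roughly like the sketch below. It is a hypothetical illustration only: `fitsCostBudget` and its parameters are invented names, the per-instruction costs stand in for TTI cost queries, and the condition and terminator are assumed to be excluded from `costs` already.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cost-driven variant of the bonus-instruction budget: instead
// of counting instructions, sum a per-instruction cost for every clone that
// would be created. `costs` holds the costs of the bonus instructions only.
bool fitsCostBudget(const std::vector<unsigned> &costs, unsigned predCount,
                    unsigned budget) {
  assert(predCount > 0 && "expected at least one predecessor");
  unsigned total = 0;
  for (unsigned c : costs) {
    total += c * predCount;  // each instruction is cloned into every predecessor
    if (total > budget)
      return false;  // stop as soon as the budget is exceeded
  }
  return true;
}
```

Under this scheme a free instruction (cost 0) consumes nothing without a special case, while a single expensive instruction can exhaust the budget on its own.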

xbolva00 added a subscriber: xbolva00. Aug 27 2021, 1:17 PM

xbolva00 added inline comments.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237	Skip also AssumeInst?

Harbormaster completed remote builds in B121537: Diff 369162. Aug 27 2021, 1:21 PM

aeubanks mentioned this in rG97ae9193dfe1: [test] Precommit test for D108837. Aug 27 2021, 1:40 PM

update

vector-reductions-logical.ll looks bad, will need investigation

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237	Should be handled by the new code. Actually, the same goes for DbgInfoIntrinsic, which I've updated.

lebedev.ri added inline comments. Aug 27 2021, 2:16 PM

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237	Surely assumes aren't speculatable?

aeubanks added inline comments. Aug 27 2021, 2:22 PM

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3236–3237	Oh yes, you're right. Anyway, that can be handled in a separate patch.

aeubanks added inline comments. Aug 27 2021, 2:44 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	Now this branch is getting folded into the next basic block. Then at the end of -O2 when every `fpext` is eliminated, the final simplifycfg will fold every branch (since each block only consists of at most one extra instruction besides the cmp and branch), except for this block which is now slightly bigger. Any ideas on how to fix this?

lebedev.ri added inline comments. Aug 27 2021, 2:57 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	I do not understand why this test is being affected at all, there are no zero-cost instructions here?

aeubanks added inline comments. Aug 27 2021, 3:13 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	Seems like we consider `%vecext17 = extractelement <4 x float> %t, i32 0` to be free (see https://github.com/llvm/llvm-project/blob/063af63b9664151b3a9206feefa9a6a36a471e80/llvm/lib/Target/X86/X86TargetTransformInfo.cpp#L3433). I tried looking at the history; this special case seems very old.

lebedev.ri added inline comments. Aug 27 2021, 3:22 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	Ah, right, that makes sense. Didn't look, but not sure there is a nice fix here.

aeubanks added a reviewer: spatel. Aug 27 2021, 3:34 PM

aeubanks added inline comments.

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	@spatel any thoughts on this?

Harbormaster completed remote builds in B121549: Diff 369180. Aug 27 2021, 3:47 PM

This looks reasonable. BTW, are {launder,strip}.invariant.group intrinsics considered to be free now? I remember fixing this in InlineCost, but this seems to be a different cost heuristic. They totally should be considered free.

spatel added inline comments. Aug 29 2021, 7:10 AM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	It's unfortunately a regression, but as the related tests show, we're not getting ideal results (2 vector compares) on most examples. The cost model is telling the truth from its limited perspective: the extract from elt 0 is free, but the rest are not (they require shuffles). We need to be able to view these as sequences rather than as individual instructions or basic blocks, either here or in SLP, to improve things. A quick hack solution might be to adjust the bonus budget in the presence of vector ops. I.e., if code has vectors, we try harder to speculate instructions because we assume that the cost of branching is likely greater than it appears, and we recognize that creating larger basic blocks has a positive impact on SLP.

lebedev.ri added inline comments. Aug 29 2021, 8:10 AM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	Now that simplifycfg does not speculate known not-taken branches, my next step was to introduce a multiplier to be applied to speculation budgets for known-taken branches. So I think the vector hack makes sense.

aeubanks added inline comments. Aug 29 2021, 4:01 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	what's the right way to check if an instruction is a vector op?

lebedev.ri added inline comments. Aug 29 2021, 4:10 PM

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	I guess you want to add `Instruction::isVectorOp()`. Perhaps we just want to check that any of: produced type, argument types is a vector type?

add bonus when we see a vector op

In D108837#2970575, @Prazek wrote:

This looks reasonable. BTW, are {launder,strip}.invariant.group intrinsics considered to be free now? I remember fixing this in InlineCost, but this seems to be a different cost heuristic. They totally should be considered free.

https://github.com/llvm/llvm-project/blob/83df94067d367d91dcc37e269a3d7317ebe97bb4/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L588

Harbormaster completed remote builds in B121786: Diff 369506. Aug 30 2021, 11:37 AM

In D108837#2972812, @aeubanks wrote:

In D108837#2970575, @Prazek wrote:

This looks reasonable. BTW, are {launder,strip}.invariant.group intrinsics considered to be free now? I remember fixing this in InlineCost, but this seems to be a different cost heuristic. They totally should be considered free.

https://github.com/llvm/llvm-project/blob/83df94067d367d91dcc37e269a3d7317ebe97bb4/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h#L588

Looks like you added this in https://reviews.llvm.org/D51814 :)

Could you please split this into:

- introducing Instruction::isVectorOp;
- adding a bonus when vector ops are present? I'm not sure it should be an increment, and I don't think this is the right function;
- the rest of the patch?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3140	I think this should be in `Instruction`. Is there any instruction that takes scalars only but produces vector?
3230	I think this should be a `cl::opt`, and applied as a multiplier to `BonusInstThreshold`.
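The difference between the fixed increment used in this diff (`VectorBonusThreshold = 2`) and the suggested multiplier on `BonusInstThreshold` comes down to a little arithmetic. In the sketch below, `VectorBonusMultiplier` is a hypothetical constant standing in for the suggested `cl::opt`; its name and value are assumptions, as are the two helper function names.

```cpp
#include <cassert>

// Hypothetical stand-in for the suggested cl::opt; name and value are
// assumptions for illustration.
constexpr unsigned VectorBonusMultiplier = 2;

// Increment scheme (as in the diff under review): a fixed extra allowance.
unsigned thresholdWithIncrement(unsigned base, bool sawVectorOp) {
  const unsigned VectorBonusThreshold = sawVectorOp ? 2 : 0;
  return base + VectorBonusThreshold;
}

// Multiplier scheme (as suggested in review): scale the existing threshold,
// so the extra allowance grows with whatever base threshold is in use.
unsigned thresholdWithMultiplier(unsigned base, bool sawVectorOp) {
  static_assert(VectorBonusMultiplier >= 1,
                "the multiplier must not shrink the budget");
  return sawVectorOp ? base * VectorBonusMultiplier : base;
}
```

With a base threshold of 1, the increment allows 3 bonus instructions and a multiplier of 2 allows 2; with a base of 4, they allow 6 and 8 respectively.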

This revision now requires changes to proceed. Aug 30 2021, 11:49 AM

aeubanks mentioned this in D108935: [SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest. Aug 30 2021, 12:17 PM

In D108837#2972861, @lebedev.ri wrote:

Could you please split this into:

- introducing Instruction::isVectorOp;
- adding a bonus when vector ops are present? I'm not sure it should be an increment, and I don't think this is the right function;
- the rest of the patch?

D108935

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3140	`Instruction` doesn't have methods that inspect the operands (except some equality methods). Currently this is specific to SimplifyCFG so I think at least for now it should be here. Not sure if any instruction takes scalars and produces vectors, but those should count?

lebedev.ri added inline comments. Aug 30 2021, 12:20 PM

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3140	Err, right, they are, missed the `I.getType()->isVectorTy() ||`, sorry.
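The check settled on here (the result type or any operand type is a vector) can be illustrated on a toy type model. `ToyType`, `ToyInstr`, and this `isVectorOp` are hypothetical stand-ins that mirror the `isVectorOp` helper in the diff (which uses `Type::isVectorTy()` on `I.getType()` and on `I.operands()`); they are not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, minimal type model: only whether a type is a vector.
struct ToyType {
  bool isVector;
};

// Hypothetical instruction model: a result type and the operand types.
struct ToyInstr {
  ToyType result;
  std::vector<ToyType> operands;
};

// Mirrors the helper in the diff: an instruction is a "vector op" if its
// result type or any operand type is a vector. Because the result type is
// checked, an instruction with only scalar operands that produces a vector
// (for example `bitcast i64 %x to <2 x i32>`) also counts.
bool isVectorOp(const ToyInstr &I) {
  return I.result.isVector ||
         std::any_of(I.operands.begin(), I.operands.end(),
                     [](const ToyType &T) { return T.isVector; });
}
```

An `extractelement` (vector operand, scalar result) and a scalar-to-vector `bitcast` are both treated as vector ops, while a scalar `fcmp` is not.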

aeubanks mentioned this in rGd49cb5b3035b: [SimplifyCFG] Add bonus when seeing vector ops to branch fold to common dest. Sep 16 2021, 10:55 AM

rebase

Some tests are crashing in CloneInstructionsIntoPredecessorBlockAndUpdateSSAUses(). Reduced to running simplifycfg on:

define i32 @test(i32 %len) local_unnamed_addr {
entry:
  br i1 undef, label %for.cond.preheader, label %if.end

for.cond.preheader:                               ; preds = %entry
  %0 = bitcast [1 x i64]* undef to i8*
  %cmp15 = icmp slt i32 1, %len
  br i1 %cmp15, label %for.body.lr.ph, label %if.end.loopexit

for.body.lr.ph:                                   ; preds = %for.cond.preheader
  br label %for.body

for.body:                                         ; preds = %for.body, %for.body.lr.ph
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %0)
  br label %for.body

if.end.loopexit:                                  ; preds = %for.cond.preheader
  br label %if.end

if.end:                                           ; preds = %if.end.loopexit, %entry
  ret i32 0
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #0

declare void @foo() local_unnamed_addr

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

taking a look

Seems to be in the code added recently in https://reviews.llvm.org/rG909cba969981032c5740774ca84a34b7f76b909b

with

diff --cc llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index ac386761e12b,ac386761e12b..32bba11659e7
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@@ -1107,6 -1107,6 +1107,9 @@@ static void CloneInstructionsIntoPredec
        auto *UI = cast<Instruction>(U.getUser());
        auto *PN = dyn_cast<PHINode>(UI);
        if (!PN) {
++        BonusInst.dump();
++        UI->dump();
++        errs() << (UI->getParent() == BB) << " " << BonusInst.comesBefore(UI) << "\n";
          assert(UI->getParent() == BB && BonusInst.comesBefore(UI) &&
                 "If the user is not a PHI node, then it should be in the same "
                 "block as, and come after, the original bonus instruction.");

I'm seeing

  %.old = bitcast [1 x i64]* undef to i8*
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %.old)
0 opt: ../../llvm/lib/IR/Instruction.cpp:114: bool llvm::Instruction::comesBefore(const llvm::Instruction *) const: Assertion `Parent == Other->Parent && "cross-BB instruction order comparison"' failed.

Harbormaster completed remote builds in B124240: Diff 373013. Sep 16 2021, 11:49 AM

don't skip over recently added IsBCSSAUse check for free instructions
should be good to review now
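The ordering problem behind the crash can be sketched as follows. In the reduced test, the free `bitcast` in `%for.cond.preheader` is used by `llvm.lifetime.start` in `%for.body`, a non-PHI user in another block. If free instructions are skipped before the use check, the fold goes ahead and the later clone-and-rewrite step trips the assertion quoted above. This C++17 sketch models the use check after that assertion (a non-PHI user must be in the same block and come after the instruction); treating PHI users as always acceptable is a simplifying assumption, and all names (`ToyUse`, `ToyBonusInst`, `bonusCost`) are hypothetical, not the actual `IsBCSSAUse` code.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of one use of a bonus instruction.
struct ToyUse {
  int userBlock;        // block containing the user
  bool userIsPHI;       // PHI users are assumed acceptable in this sketch
  bool userComesAfter;  // meaningful only when the user is in the same block
};

// Hypothetical model of a bonus instruction: free or not, plus its uses.
struct ToyBonusInst {
  bool isFree;
  std::vector<ToyUse> uses;
};

// A non-PHI user must live in block `bb` and come after the instruction,
// matching the invariant asserted when cloning into predecessors.
bool usesAreRewritable(const ToyBonusInst &I, int bb) {
  for (const ToyUse &U : I.uses)
    if (!U.userIsPHI && (U.userBlock != bb || !U.userComesAfter))
      return false;
  return true;
}

// Returns the budget consumed, or std::nullopt if the fold must be rejected.
// The use check runs before the free-instruction shortcut, so a free
// instruction with an out-of-block user still blocks the fold.
std::optional<unsigned> bonusCost(const std::vector<ToyBonusInst> &insts,
                                  int bb, unsigned predCount) {
  unsigned cost = 0;
  for (const ToyBonusInst &I : insts) {
    if (!usesAreRewritable(I, bb))
      return std::nullopt;
    if (I.isFree)
      continue;
    cost += predCount;
  }
  return cost;
}
```

Swapping the two checks inside the loop would let a free cast with an out-of-block user through, which is the situation the reduced test exercises.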

Harbormaster completed remote builds in B124277: Diff 373062. Sep 16 2021, 2:19 PM

We handled any controversy in the earlier patches, so this is now the minimal patch as described. LGTM

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
3255–3256	Nit: I'd put the comment before the 'if' line that it explains.
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
135	Checking the types seems like it would do.

This revision was not accepted when it landed; it landed in state Needs Review. Sep 22 2021, 9:53 AM

This revision was landed with ongoing or failed builds.

Closed by commit rGe7249e4acf3c: [SimplifyCFG] Ignore free instructions when computing cost for folding branch… (authored by aeubanks).

This revision was automatically updated to reflect the committed changes.

aeubanks added a commit: rGe7249e4acf3c: [SimplifyCFG] Ignore free instructions when computing cost for folding branch….

nikic mentioned this in D110290: [JumpThreading] Ignore free instructions. Sep 22 2021, 1:52 PM

nikic mentioned this in rG1e3c6fc7cb9d: [JumpThreading] Ignore free instructions. Sep 23 2021, 9:28 AM

hans mentioned this in rG4604695d7c20: Revert "[JumpThreading] Ignore free instructions". Sep 24 2021, 7:25 AM

hans mentioned this in rG1e9afab87569: Re-apply "[JumpThreading] Ignore free instructions". Sep 24 2021, 9:52 AM

Revision Contents (path: size)

llvm/lib/Transforms/Utils/SimplifyCFG.cpp: 20 lines
llvm/test/CodeGen/AArch64/csr-split.ll: 34 lines
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll: 43 lines
llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-free-cost.ll: 5 lines

Diff 369506

llvm/lib/Transforms/Utils/SimplifyCFG.cpp

@@ ... @@ if (isa<DbgInfoIntrinsic>(I)) {
       NewI->insertBefore(PBI);
     }
   }
 
   ++NumFoldBranchToCommonDest;
   return true;
 }
 
+static bool isVectorOp(Instruction &I) {
+  return I.getType()->isVectorTy() || any_of(I.operands(), [](Use &U) {
+           return U->getType()->isVectorTy();
+         });
+}
+
 /// If this basic block is simple enough, and if a predecessor branches to us
 /// and one of our successors, fold the block into the predecessor and use
 /// logical operations to pick the right destination.
 bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,
                                   MemorySSAUpdater *MSSAU,
                                   const TargetTransformInfo *TTI,
                                   unsigned BonusInstThreshold) {
   // If this block ends with an unconditional branch,
@@ ... @@ bool llvm::FoldBranchToCommonDest(BranchInst *BI, DomTreeUpdater *DTU,
 
   // Only allow this transformation if computing the condition doesn't involve
   // too many instructions and these involved instructions can be executed
   // unconditionally. We denote all involved instructions except the condition
   // as "bonus instructions", and only allow this transformation when the
   // number of the bonus instructions we'll need to create when cloning into
   // each predecessor does not exceed a certain threshold.
   unsigned NumBonusInsts = 0;
+  unsigned VectorBonusThreshold = 0;
   const unsigned PredCount = Preds.size();
   for (Instruction &I : *BB) {
     // Don't check the branch condition comparison itself.
     if (&I == Cond)
       continue;
-    // Ignore dbg intrinsics, and the terminator.
-    if (isa<DbgInfoIntrinsic>(I) || isa<BranchInst>(I))
+    // Ignore the terminator.
+    if (isa<BranchInst>(I))
       continue;
     // I must be safe to execute unconditionally.
     if (!isSafeToSpeculativelyExecute(&I))
       return false;
+    // Ignore free instructions.
+    if (TTI && TTI->getUserCost(&I, CostKind) == TargetTransformInfo::TCC_Free)
+      continue;
+
+    // Add a bonus if we see vector instructions.
+    if (isVectorOp(I))
+      VectorBonusThreshold = 2;
 
     // Account for the cost of duplicating this instruction into each
     // predecessor.
     NumBonusInsts += PredCount;
     // Early exits once we reach the limit.
-    if (NumBonusInsts > BonusInstThreshold)
+    if (NumBonusInsts > BonusInstThreshold + VectorBonusThreshold)
       return false;
   }
 
   // Ok, we have the budget. Perform the transformation.
   for (BasicBlock *PredBlock : Preds) {
     auto *PBI = cast<BranchInst>(PredBlock->getTerminator());
     return performBranchToCommonDestFolding(BI, PBI, DTU, MSSAU, TTI);
   }
   return false;
 }

llvm/test/CodeGen/AArch64/csr-split.ll

@@ ... @@
 define dso_local signext i32 @test2(i32* %p1) local_unnamed_addr {
 ; CHECK-LABEL: test2:
 ; CHECK: // %bb.0: // %entry
 ; CHECK-NEXT: stp x30, x19, [sp, #-16]! // 16-byte Folded Spill
 ; CHECK-NEXT: .cfi_def_cfa_offset 16
 ; CHECK-NEXT: .cfi_offset w19, -8
 ; CHECK-NEXT: .cfi_offset w30, -16
-; CHECK-NEXT: cbz x0, .LBB1_2
-; CHECK-NEXT: // %bb.1: // %if.end
+; CHECK-NEXT: cbz x0, .LBB1_3
+; CHECK-NEXT: // %bb.1: // %entry
 ; CHECK-NEXT: adrp x8, a
 ; CHECK-NEXT: ldrsw x8, [x8, :lo12:a]
 ; CHECK-NEXT: mov x19, x0
 ; CHECK-NEXT: cmp x8, x0
-; CHECK-NEXT: b.eq .LBB1_3
-; CHECK-NEXT: .LBB1_2: // %return
-; CHECK-NEXT: mov w0, wzr
-; CHECK-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
-; CHECK-NEXT: ret
-; CHECK-NEXT: .LBB1_3: // %if.then2
+; CHECK-NEXT: b.ne .LBB1_3
+; CHECK-NEXT: // %bb.2: // %if.then2
 ; CHECK-NEXT: bl callVoid
 ; CHECK-NEXT: mov x0, x19
 ; CHECK-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
 ; CHECK-NEXT: b callNonVoid
+; CHECK-NEXT: .LBB1_3: // %return
+; CHECK-NEXT: mov w0, wzr
+; CHECK-NEXT: ldp x30, x19, [sp], #16 // 16-byte Folded Reload
+; CHECK-NEXT: ret
 ;
 ; CHECK-APPLE-LABEL: test2:
 ; CHECK-APPLE: ; %bb.0: ; %entry
 ; CHECK-APPLE-NEXT: stp x20, x19, [sp, #-32]! ; 16-byte Folded Spill
 ; CHECK-APPLE-NEXT: stp x29, x30, [sp, #16] ; 16-byte Folded Spill
 ; CHECK-APPLE-NEXT: .cfi_def_cfa_offset 32
 ; CHECK-APPLE-NEXT: .cfi_offset w30, -8
 ; CHECK-APPLE-NEXT: .cfi_offset w29, -16
 ; CHECK-APPLE-NEXT: .cfi_offset w19, -24
 ; CHECK-APPLE-NEXT: .cfi_offset w20, -32
-; CHECK-APPLE-NEXT: cbz x0, LBB1_2
-; CHECK-APPLE-NEXT: ; %bb.1: ; %if.end
+; CHECK-APPLE-NEXT: cbz x0, LBB1_3
+; CHECK-APPLE-NEXT: ; %bb.1: ; %entry
 ; CHECK-APPLE-NEXT: Lloh2:
 ; CHECK-APPLE-NEXT: adrp x8, _a@PAGE
 ; CHECK-APPLE-NEXT: Lloh3:
 ; CHECK-APPLE-NEXT: ldrsw x8, [x8, _a@PAGEOFF]
 ; CHECK-APPLE-NEXT: mov x19, x0
 ; CHECK-APPLE-NEXT: cmp x8, x0
-; CHECK-APPLE-NEXT: b.eq LBB1_3
-; CHECK-APPLE-NEXT: LBB1_2: ; %return
-; CHECK-APPLE-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
-; CHECK-APPLE-NEXT: mov w0, wzr
-; CHECK-APPLE-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
-; CHECK-APPLE-NEXT: ret
-; CHECK-APPLE-NEXT: LBB1_3: ; %if.then2
+; CHECK-APPLE-NEXT: b.ne LBB1_3
+; CHECK-APPLE-NEXT: ; %bb.2: ; %if.then2
 ; CHECK-APPLE-NEXT: bl _callVoid
 ; CHECK-APPLE-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
 ; CHECK-APPLE-NEXT: mov x0, x19
 ; CHECK-APPLE-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
 ; CHECK-APPLE-NEXT: b _callNonVoid
+; CHECK-APPLE-NEXT: LBB1_3: ; %return
+; CHECK-APPLE-NEXT: ldp x29, x30, [sp, #16] ; 16-byte Folded Reload
+; CHECK-APPLE-NEXT: mov w0, wzr
+; CHECK-APPLE-NEXT: ldp x20, x19, [sp], #32 ; 16-byte Folded Reload
+; CHECK-APPLE-NEXT: ret
 ; CHECK-APPLE-NEXT: .loh AdrpLdr Lloh2, Lloh3
 entry:
   %tobool = icmp eq i32* %p1, null
   br i1 %tobool, label %return, label %if.end
 
 if.end: ; preds = %entry
   %0 = load i32, i32* @a, align 4, !tbaa !2
   %conv = sext i32 %0 to i64

llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll

@@ ... @@ lor.lhs.false6:
   %conv8 = fpext float %vecext7 to double
   %cmp9 = fcmp olt double %conv8, 0.000000e+00
   br i1 %cmp9, label %if.then, label %lor.lhs.false11
 
 lor.lhs.false11:
   %vecext12 = extractelement <4 x float> %t, i32 3
   %conv13 = fpext float %vecext12 to double
   %cmp14 = fcmp olt double %conv13, 0.000000e+00
   br i1 %cmp14, label %if.then, label %lor.lhs.false16
 
 lor.lhs.false16:
   %vecext17 = extractelement <4 x float> %t, i32 0
   %conv18 = fpext float %vecext17 to double
   %cmp19 = fcmp ogt double %conv18, 1.000000e+00
   br i1 %cmp19, label %if.then, label %lor.lhs.false21
 
 lor.lhs.false21:
▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines
return:		return:
%retval.0 = phi float [ 0.000000e+00, %if.then ], [ 0.000000e+00, %if.then35 ], [ %add, %if.end36 ]		%retval.0 = phi float [ 0.000000e+00, %if.then ], [ 0.000000e+00, %if.then35 ], [ %add, %if.end36 ]
ret float %retval.0		ret float %retval.0
}		}

define float @test_separate_anyof_v4sf(<4 x float> %t) {		define float @test_separate_anyof_v4sf(<4 x float> %t) {
; CHECK-LABEL: @test_separate_anyof_v4sf(		; CHECK-LABEL: @test_separate_anyof_v4sf(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.]] = extractelement <4 x float> [[T:%.]], i32 3		; CHECK-NEXT: [[T_FR:%.]] = freeze <4 x float> [[T:%.]]
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[T]], i32 2		; CHECK-NEXT: [[TMP0:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <4 x float> [[T]], i32 1		; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[T]], i32 0		; CHECK-NEXT: [[DOTNOT:%.*]] = icmp eq i4 [[TMP1]], 0
; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x float> [[T]]		; CHECK-NEXT: br i1 [[DOTNOT]], label [[IF_END:%.]], label [[RETURN:%.]]
; CHECK-NEXT: [[TMP4:%.*]] = fcmp olt <4 x float> [[T_FR]], zeroinitializer		; CHECK: if.end:
; CHECK-NEXT: [[TMP5:%.*]] = bitcast <4 x i1> [[TMP4]] to i4		; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <4 x float> [[T_FR]], <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne i4 [[TMP5]], 0		; CHECK-NEXT: [[TMP3:%.*]] = bitcast <4 x i1> [[TMP2]] to i4
; CHECK-NEXT: [[CMP18:%.*]] = fcmp ogt float [[TMP3]], 1.000000e+00		; CHECK-NEXT: [[DOTNOT7:%.*]] = icmp eq i4 [[TMP3]], 0
; CHECK-NEXT: [[OR_COND3:%.*]] = select i1 [[TMP6]], i1 true, i1 [[CMP18]]		; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x float> [[T_FR]], <4 x float> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[CMP23:%.*]] = fcmp ogt float [[TMP2]], 1.000000e+00		; CHECK-NEXT: [[TMP4:%.*]] = fadd <4 x float> [[SHIFT]], [[T_FR]]
; CHECK-NEXT: [[OR_COND4:%.*]] = select i1 [[OR_COND3]], i1 true, i1 [[CMP23]]		; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x float> [[TMP4]], i32 0
; CHECK-NEXT: [[CMP28:%.*]] = fcmp ogt float [[TMP1]], 1.000000e+00		; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[DOTNOT7]], float [[ADD]], float 0.000000e+00
; CHECK-NEXT: [[OR_COND5:%.*]] = select i1 [[OR_COND4]], i1 true, i1 [[CMP28]]		; CHECK-NEXT: br label [[RETURN]]
; CHECK-NEXT: [[CMP33:%.*]] = fcmp ogt float [[TMP0]], 1.000000e+00		; CHECK: return:
; CHECK-NEXT: [[OR_COND6:%.*]] = select i1 [[OR_COND5]], i1 true, i1 [[CMP33]]		; CHECK-NEXT: [[RETVAL_0:%.]] = phi float [ 0.000000e+00, [[ENTRY:%.]] ], [ [[SPEC_SELECT]], [[IF_END]] ]
; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP2]]
; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND6]], float 0.000000e+00, float [[ADD]]
; CHECK-NEXT: ret float [[RETVAL_0]]		; CHECK-NEXT: ret float [[RETVAL_0]]
;		;
entry:		entry:
%vecext = extractelement <4 x float> %t, i32 0		%vecext = extractelement <4 x float> %t, i32 0
%conv = fpext float %vecext to double		%conv = fpext float %vecext to double
%cmp = fcmp olt double %conv, 0.000000e+00		%cmp = fcmp olt double %conv, 0.000000e+00
br i1 %cmp, label %if.then, label %lor.lhs.false		br i1 %cmp, label %if.then, label %lor.lhs.false

define float @test_merge_allof_v4si(<4 x i32> %t) {		define float @test_merge_allof_v4si(<4 x i32> %t) {
; CHECK-LABEL: @test_merge_allof_v4si(		; CHECK-LABEL: @test_merge_allof_v4si(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]		; CHECK-NEXT: [[T_FR:%.*]] = freeze <4 x i32> [[T:%.*]]
; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt <4 x i32> [[T_FR]], zeroinitializer		; CHECK-NEXT: [[TMP0:%.*]] = icmp sgt <4 x i32> [[T_FR]], zeroinitializer
; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4		; CHECK-NEXT: [[TMP1:%.*]] = bitcast <4 x i1> [[TMP0]] to i4
; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i4 [[TMP1]], 0		; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i4 [[TMP1]], 0
; CHECK-NEXT: br i1 [[TMP2]], label [[RETURN:%.*]], label [[LOR_LHS_FALSE:%.*]]
; CHECK: lor.lhs.false:
; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 256, i32 256, i32 256, i32 256>		; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[T_FR]], <i32 256, i32 256, i32 256, i32 256>
; CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x i1> [[TMP3]] to i4		; CHECK-NEXT: [[TMP4:%.*]] = bitcast <4 x i1> [[TMP3]] to i4
; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i4 [[TMP4]], 0		; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i4 [[TMP4]], 0
; CHECK-NEXT: br i1 [[TMP5]], label [[RETURN]], label [[IF_END:%.*]]		; CHECK-NEXT: [[OR_COND:%.*]] = or i1 [[TMP2]], [[TMP5]]
; CHECK: if.end:
; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T_FR]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>		; CHECK-NEXT: [[SHIFT:%.*]] = shufflevector <4 x i32> [[T_FR]], <4 x i32> poison, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[T_FR]], [[SHIFT]]		; CHECK-NEXT: [[TMP6:%.*]] = add nsw <4 x i32> [[T_FR]], [[SHIFT]]
; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP6]], i32 0		; CHECK-NEXT: [[ADD:%.*]] = extractelement <4 x i32> [[TMP6]], i32 0
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[ADD]] to float
; CHECK-NEXT: br label [[RETURN]]		; CHECK-NEXT: [[RETVAL_0:%.*]] = select i1 [[OR_COND]], float 0.000000e+00, float [[CONV]]
; CHECK: return:
; CHECK-NEXT: [[RETVAL_0:%.*]] = phi float [ [[CONV]], [[IF_END]] ], [ 0.000000e+00, [[LOR_LHS_FALSE]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
; CHECK-NEXT: ret float [[RETVAL_0]]		; CHECK-NEXT: ret float [[RETVAL_0]]
;		;
entry:		entry:
%vecext = extractelement <4 x i32> %t, i32 0		%vecext = extractelement <4 x i32> %t, i32 0
%cmp = icmp slt i32 %vecext, 1		%cmp = icmp slt i32 %vecext, 1
br i1 %cmp, label %land.lhs.true, label %lor.lhs.false		br i1 %cmp, label %land.lhs.true, label %lor.lhs.false

land.lhs.true:		land.lhs.true:

llvm/test/Transforms/SimplifyCFG/fold-branch-to-common-dest-free-cost.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -S -passes=simplifycfg -bonus-inst-threshold=1 | FileCheck %s			; RUN: opt < %s -S -passes=simplifycfg -bonus-inst-threshold=1 | FileCheck %s

	declare i8* @llvm.strip.invariant.group.p0i8(i8*)			declare i8* @llvm.strip.invariant.group.p0i8(i8*)

	declare void @g1()			declare void @g1()
	declare void @g2()			declare void @g2()

	define void @f(i8* %a, i8* %b, i1 %c, i1 %d, i1 %e) {			define void @f(i8* %a, i8* %b, i1 %c, i1 %d, i1 %e) {
	; CHECK-LABEL: @f(			; CHECK-LABEL: @f(
	; CHECK-NEXT: br i1 [[C:%.*]], label [[L1:%.*]], label [[L3:%.*]]
	; CHECK: l1:
	; CHECK-NEXT: [[A1:%.*]] = call i8* @llvm.strip.invariant.group.p0i8(i8* [[A:%.*]])			; CHECK-NEXT: [[A1:%.*]] = call i8* @llvm.strip.invariant.group.p0i8(i8* [[A:%.*]])
	; CHECK-NEXT: [[B1:%.*]] = call i8* @llvm.strip.invariant.group.p0i8(i8* [[B:%.*]])			; CHECK-NEXT: [[B1:%.*]] = call i8* @llvm.strip.invariant.group.p0i8(i8* [[B:%.*]])
	; CHECK-NEXT: [[I:%.*]] = icmp eq i8* [[A1]], [[B1]]			; CHECK-NEXT: [[I:%.*]] = icmp eq i8* [[A1]], [[B1]]
	; CHECK-NEXT: br i1 [[I]], label [[L2:%.*]], label [[L3]]			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[C:%.*]], i1 [[I]], i1 false
				; CHECK-NEXT: br i1 [[OR_COND]], label [[L2:%.*]], label [[L3:%.*]]
	; CHECK: l2:			; CHECK: l2:
	; CHECK-NEXT: call void @g1()			; CHECK-NEXT: call void @g1()
	; CHECK-NEXT: br label [[RET:%.*]]			; CHECK-NEXT: br label [[RET:%.*]]
	; CHECK: l3:			; CHECK: l3:
	; CHECK-NEXT: call void @g2()			; CHECK-NEXT: call void @g2()
	; CHECK-NEXT: br label [[RET]]			; CHECK-NEXT: br label [[RET]]
	; CHECK: ret:			; CHECK: ret:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void