This is an archive of the discontinued LLVM Phabricator instance.

define i32 @test(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %Tail
Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r
}

test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll, line 169 (on Diff #125142)

We need more test cases, because this change will now trigger even on very simple cases like:

define i32 @test(i32* %a, i32 %v, i32 %p) {
Header:
  br i1 undef, label %Tail, label %End

TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}
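To make the reviewer's point concrete, here is a minimal standalone sketch of the two gating checks, modelled on a toy CFG rather than on LLVM's own classes. `Block`, `addEdge`, `singlePredecessor`, `oldGate` and `newGate` are hypothetical names introduced for illustration only; the logic of `isOrHeader` follows the helper removed by this revision, and `newGate` mirrors the `Preds[0] == Preds[1]` check that replaces it. The real pass additionally requires that at least one condition is recorded along a predecessor, which this sketch does not model.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a basic block: only the CFG edges needed for the checks.
struct Block {
  std::string name;
  std::vector<Block *> preds; // one entry per incoming edge
  std::vector<Block *> succs; // one entry per outgoing edge
  explicit Block(std::string n) : name(std::move(n)) {}
};

// Adds a single CFG edge from -> to.
void addEdge(Block &from, Block &to) {
  from.succs.push_back(&to);
  to.preds.push_back(&from);
}

// Returns the predecessor if there is exactly one incoming edge, else nullptr.
const Block *singlePredecessor(const Block &b) {
  return b.preds.size() == 1 ? b.preds.front() : nullptr;
}

// Models the removed helper: orBB's only predecessor is header, and header
// ends in a two-way branch.
bool isOrHeader(const Block &header, const Block &orBB) {
  return singlePredecessor(orBB) == &header && header.succs.size() == 2;
}

// Gate before this revision (assumption: only the structural part is modelled).
bool oldGate(const Block &tail) {
  if (tail.preds.size() != 2)
    return false;
  const Block &p0 = *tail.preds[0];
  const Block &p1 = *tail.preds[1];
  return isOrHeader(p0, p1) || isOrHeader(p1, p0);
}

// Gate after this revision: any two distinct predecessors are accepted.
bool newGate(const Block &tail) {
  if (tail.preds.size() != 2)
    return false;
  return tail.preds[0] != tail.preds[1];
}
```

Building the Header/TBB/Tail/End example above with `addEdge` gives a Tail block that `oldGate` rejects (TBB has no single predecessor equal to Header) but `newGate` accepts, while a block whose two incoming edges both come from the same predecessor is rejected by both.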

Added test cases as suggested. I've also moved the tests that do not rely on the OR structure to a separate file, and the test case that should not be split to another file, to avoid having test files that are too big. I think it would also make sense to move the other tests we do not expect to be split to that file.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64)

@junbuml I'd appreciate another look, please let me know if you are happy with the test changes.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64)

Thanks, Florian, for the update. Can you please also share whether you see any gains or regressions in performance? I will run performance tests on my side as well. It also seems that the comment at the top of CallSiteSplitting.cpp should be updated accordingly.

test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll, line 136 (on Diff #127494)

It would be good to add a test case where Tail forms a loop, like:

define i32 @test_loop(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  br i1 undef, label %Tail, label %End

End:
  ret i32 %v
}

I didn't see any significant performance changes in my SPEC2000/2006/2017 runs on AArch64.

In D40729#962145, @junbuml wrote:

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64 .

I don't see any significant changes either. It's more a step towards potentially making the pass more general, as @davidxl suggested a while ago in a comment.

For Cortex-A57, I get -0.23% on the geomean of SPEC2006 execution times; for SPEC2000 + test suite it's -0.38%.

LGTM, assuming the update in the top comment.

Closed by commit rL321413: [CallSiteSplitting] Remove isOrHeader restriction. (authored by fhahn). Dec 23 2017, 12:03 PM

This revision was automatically updated to reflect the committed changes.

In D40729#962168, @junbuml wrote:

LGTM, assuming the update in the top comment .

I've committed this after updating the top comment. I also dropped the Or from tryToSplitOnOrPredicatedArgument and updated the comment there.

I hope the comments are clear.
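For readers who have not seen the pass before, the source-level effect described in the top comment (`if (!ptr || c) callee(ptr);`) can be sketched in plain C++ as follows. This is only an illustration of the idea, not code from the pass: `callee`, `beforeSplit` and `afterSplit` are hypothetical names, and the body of `callee` is made up so that the example can be executed.

```cpp
#include <cassert>

// Hypothetical callee; the body is an assumption chosen so that the null and
// non-null paths produce distinguishable results.
int callee(const int *ptr) { return ptr ? *ptr + 1 : -1; }

// Before splitting: a single call site guarded by an OR condition.
int beforeSplit(const int *ptr, bool c) {
  if (!ptr || c)
    return callee(ptr);
  return 0;
}

// After splitting: one call site per predecessor, each with a more
// constrained argument.
int afterSplit(const int *ptr, bool c) {
  if (!ptr)
    return callee(nullptr); // ptr is known to be null on this path
  else if (c)
    return callee(ptr);     // ptr is known to be non-null on this path
  return 0;
}
```

Both functions return the same value for every input; the split form simply exposes the constant `nullptr` and the non-null fact at each call site, which is what later passes such as the inliner can exploit.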

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Scalar/CallSiteSplitting.cpp	46 lines
llvm/trunk/test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll	139 lines
llvm/trunk/test/Transforms/CallSiteSplitting/callsite-no-splitting.ll	18 lines

Diff 128089

llvm/trunk/lib/Transforms/Scalar/CallSiteSplitting.cpp

 //===- CallSiteSplitting.cpp ----------------------------------------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // This file implements a transformation that tries to split a call-site to pass
 // more constrained arguments if its argument is predicated in the control flow
 // so that we can expose better context to the later passes (e.g, inliner, jump
 // threading, or IPA-CP based function cloning, etc.).
 // As of now we support two cases :
 //
-// 1) If a call site is dominated by an OR condition and if any of its arguments
-// are predicated on this OR condition, try to split the condition with more
-// constrained arguments. For example, in the code below, we try to split the
-// call site since we can predicate the argument(ptr) based on the OR condition.
+// 1) Try to a split call-site with constrained arguments, if any constraints
+// on any argument can be found by following the single predecessors of the
+// all site's predecessors. Currently this pass only handles call-sites with 2
+// predecessors. For example, in the code below, we try to split the call-site
+// since we can predicate the argument(ptr) based on the OR condition.
 //
 // Split from :
 // if (!ptr || c)
 // callee(ptr);
 // to :
 // if (!ptr)
 // callee(null) // set the known constant value
 // else if (c)
@@ 167 unchanged lines hidden @@ static bool canSplitCallSite(CallSite CS) {
   if (Preds.size() != 2 || isa<IndirectBrInst>(Preds[0]->getTerminator()) ||
       isa<IndirectBrInst>(Preds[1]->getTerminator()))
     return false;

   return CallSiteBB->canSplitPredecessors();
 }

 /// Return true if the CS is split into its new predecessors which are directly
-/// hooked to each of its orignial predecessors pointed by PredBB1 and PredBB2.
-/// In OR predicated case, PredBB1 will point the header, and PredBB2 will point
-/// to the second compare block. CallInst1 and CallInst2 will be the new
-/// call-sites placed in the new predecessors split for PredBB1 and PredBB2,
-/// repectively. Therefore, CallInst1 will be the call-site placed
-/// between Header and Tail, and CallInst2 will be the call-site between TBB and
-/// Tail. For example, in the IR below with an OR condition, the call-site can
-/// be split
+/// hooked to each of its original predecessors pointed by PredBB1 and PredBB2.
+/// CallInst1 and CallInst2 will be the new call-sites placed in the new
+/// predecessors split for PredBB1 and PredBB2, respectively.
+/// For example, in the IR below with an OR condition, the call-site can
+/// be split. Assuming PredBB1=Header and PredBB2=TBB, CallInst1 will be the
+/// call-site placed between Header and Tail, and CallInst2 will be the
+/// call-site between TBB and Tail.
 ///
-/// from :
+/// From :
 ///
 /// Header:
 /// %c = icmp eq i32* %a, null
 /// br i1 %c %Tail, %TBB
 /// TBB:
 /// %c2 = icmp eq i32* %b, null
 /// br i1 %c %Tail, %End
 /// Tail:
@@ 11 unchanged lines hidden @@
 /// %ca1 = call @callee (i32* null, i32* %b) // CallInst1
 /// br %Tail
 /// Tail-split2:
 /// %ca2 = call @callee (i32* nonnull %a, i32* null) // CallInst2
 /// br %Tail
 /// Tail:
 /// %p = phi i1 [%ca1, %Tail-split1],[%ca2, %Tail-split2]
 ///
-/// Note that for an OR predicated case, CallInst1 and CallInst2 should be
-/// created with more constrained arguments in
-/// createCallSitesOnOrPredicatedArgument().
+/// Note that in case any arguments at the call-site are constrained by its
+/// predecessors, new call-sites with more constrained arguments will be
+/// created in createCallSitesOnPredicatedArgument().
 static void splitCallSite(CallSite CS, BasicBlock *PredBB1, BasicBlock *PredBB2,
                           Instruction *CallInst1, Instruction *CallInst2) {
   Instruction *Instr = CS.getInstruction();
   BasicBlock *TailBB = Instr->getParent();
   assert(Instr == (TailBB->getFirstNonPHIOrDbg()) && "Unexpected call-site");

   BasicBlock *SplitBlock1 =
       SplitBlockPredecessors(TailBB, PredBB1, ".predBB1.split");
@@ 76 unchanged lines hidden @@
 static bool tryToSplitOnPHIPredicatedArgument(CallSite CS) {
   if (!isPredicatedOnPHI(CS))
     return false;

   auto Preds = getTwoPredecessors(CS.getInstruction()->getParent());
   splitCallSite(CS, Preds[0], Preds[1], nullptr, nullptr);
   return true;
 }
-// Check if one of the predecessors is a single predecessors of the other.
-// This is a requirement for control flow modeling an OR. HeaderBB points to
-// the single predecessor and OrBB points to other node. HeaderBB potentially
-// contains the first compare of the OR and OrBB the second.
-static bool isOrHeader(BasicBlock *HeaderBB, BasicBlock *OrBB) {
-  return OrBB->getSinglePredecessor() == HeaderBB &&
-         HeaderBB->getTerminator()->getNumSuccessors() == 2;
-}

-static bool tryToSplitOnOrPredicatedArgument(CallSite CS) {
+static bool tryToSplitOnPredicatedArgument(CallSite CS) {
   auto Preds = getTwoPredecessors(CS.getInstruction()->getParent());
-  if (!isOrHeader(Preds[0], Preds[1]) && !isOrHeader(Preds[1], Preds[0]))
+  if (Preds[0] == Preds[1])
     return false;

   SmallVector<std::pair<ICmpInst *, unsigned>, 2> C1, C2;
   recordConditions(CS, Preds[0], C1);
   recordConditions(CS, Preds[1], C2);

   Instruction *CallInst1 = addConditions(CS, C1);
   Instruction *CallInst2 = addConditions(CS, C2);
   if (!CallInst1 && !CallInst2)
     return false;

   splitCallSite(CS, Preds[1], Preds[0], CallInst2, CallInst1);
   return true;
 }

 static bool tryToSplitCallSite(CallSite CS) {
   if (!CS.arg_size() || !canSplitCallSite(CS))
     return false;
-  return tryToSplitOnOrPredicatedArgument(CS) ||
+  return tryToSplitOnPredicatedArgument(CS) ||
          tryToSplitOnPHIPredicatedArgument(CS);
 }

 static bool doCallSiteSplitting(Function &F, TargetLibraryInfo &TLI) {
   bool Changed = false;
   for (Function::iterator BI = F.begin(), BE = F.end(); BI != BE;) {
     BasicBlock &BB = *BI++;
     for (BasicBlock::iterator II = BB.begin(), IE = BB.end(); II != IE;) {
@@ 55 unchanged lines hidden @@

llvm/trunk/test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll

; RUN: opt < %s -callsite-splitting -S | FileCheck %s
; RUN: opt < %s -passes='function(callsite-splitting)' -S | FileCheck %s

; CHECK-LABEL: @test_simple
; CHECK-LABEL: Header:
; CHECK-NEXT: br i1 undef, label %Tail.predBB1.split
; CHECK-LABEL: TBB:
; CHECK: br i1 %cmp, label %Tail.predBB2.split
; CHECK-LABEL: Tail.predBB1.split:
; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* %a, i32 %v, i32 %p)
; CHECK-LABEL: Tail.predBB2.split:
; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 %v, i32 %p)
; CHECK-LABEL: Tail
; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
; CHECK: ret i32 %[[MERGED]]
define i32 @test_simple(i32* %a, i32 %v, i32 %p) {
Header:
  br i1 undef, label %Tail, label %End

TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}

; CHECK-LABEL: @test_eq_eq_eq_untaken
; CHECK-LABEL: Header:
; CHECK: br i1 %tobool1, label %TBB1, label %Tail.predBB1.split
; CHECK-LABEL: TBB2:
; CHECK: br i1 %cmp2, label %Tail.predBB2.split, label %End
; CHECK-LABEL: Tail.predBB1.split:
; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 %p)
; CHECK-LABEL: Tail.predBB2.split:
; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 1, i32 99)
; CHECK-LABEL: Tail
; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
; CHECK: ret i32 %[[MERGED]]
define i32 @test_eq_eq_eq_untaken2(i32* %a, i32 %v, i32 %p) {
Header:
  %tobool1 = icmp eq i32* %a, null
  br i1 %tobool1, label %TBB1, label %Tail

TBB1:
  %cmp1 = icmp eq i32 %v, 1
  br i1 %cmp1, label %TBB2, label %End

TBB2:
  %cmp2 = icmp eq i32 %p, 99
  br i1 %cmp2, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}

; CHECK-LABEL: @test_eq_ne_eq_untaken
; CHECK-LABEL: Header:
; CHECK: br i1 %tobool1, label %TBB1, label %Tail.predBB1.split
; CHECK-LABEL: TBB2:
; CHECK: br i1 %cmp2, label %Tail.predBB2.split, label %End
; CHECK-LABEL: Tail.predBB1.split:
; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 %p)
; CHECK-LABEL: Tail.predBB2.split:
; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 %v, i32 99)
; CHECK-LABEL: Tail
; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
; CHECK: ret i32 %[[MERGED]]
define i32 @test_eq_ne_eq_untaken(i32* %a, i32 %v, i32 %p) {
Header:
  %tobool1 = icmp eq i32* %a, null
  br i1 %tobool1, label %TBB1, label %Tail

TBB1:
  %cmp1 = icmp ne i32 %v, 1
  br i1 %cmp1, label %TBB2, label %End

TBB2:
  %cmp2 = icmp eq i32 %p, 99
  br i1 %cmp2, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}

; CHECK-LABEL: @test_header_header2_tbb
; CHECK: Header2:
; CHECK: br i1 %tobool2, label %Tail.predBB1.split, label %TBB1
; CHECK-LABEL: TBB2:
; CHECK: br i1 %cmp2, label %Tail.predBB2.split, label %End
; CHECK-LABEL: Tail.predBB1.split:
; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 10)
; CHECK-LABEL: Tail.predBB2.split:
; NOTE: CallSiteSplitting cannot infer that %a is null here, as it currently
; only supports recording conditions along a single predecessor path.
; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* %a, i32 1, i32 99)
; CHECK-LABEL: Tail
; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
; CHECK: ret i32 %[[MERGED]]
define i32 @test_header_header2_tbb(i32* %a, i32 %v, i32 %p) {
Header:
  %tobool1 = icmp eq i32* %a, null
  br i1 %tobool1, label %TBB1, label %Header2

Header2:
  %tobool2 = icmp eq i32 %p, 10
  br i1 %tobool2, label %Tail, label %TBB1

TBB1:
  %cmp1 = icmp eq i32 %v, 1
  br i1 %cmp1, label %TBB2, label %End

TBB2:
  %cmp2 = icmp eq i32 %p, 99
  br i1 %cmp2, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}

define i32 @callee(i32* %a, i32 %v, i32 %p) {
  ret i32 10
}

llvm/trunk/test/Transforms/CallSiteSplitting/callsite-no-splitting.ll

; RUN: opt < %s -callsite-splitting -S | FileCheck %s
; RUN: opt < %s -passes='function(callsite-splitting)' -S | FileCheck %s

define i32 @callee(i32*, i32, i32) {
  ret i32 10
}

; CHECK-LABEL: @test_preds_equal
; CHECK-NOT: split
; CHECK: br i1 %cmp, label %Tail, label %Tail
define i32 @test_preds_equal(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %Tail
Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r
}

[CallSiteSplitting] Remove isOrHeader restriction. (Closed, Public)