This is an archive of the discontinued LLVM Phabricator instance.

Differential D40729

[CallSiteSplitting] Remove isOrHeader restriction.
Closed, Public

Authored by fhahn on Dec 1 2017, 7:36 AM.

Details

Reviewers

davidxl
junbuml
davide

Commits

rG7e9328906baa: [CallSiteSplitting] Remove isOrHeader restriction.
rL321413: [CallSiteSplitting] Remove isOrHeader restriction.

Summary

By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.
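The idea in the summary can be illustrated outside of LLVM. The following is a minimal, self-contained sketch and not the pass's actual code: `Block`, `condBr` and `recordPathConditions` are hypothetical stand-ins (for `BasicBlock`, branch creation and the pass's `recordConditions`), and conditions are plain strings. Starting from one predecessor of the call block, it walks up the chain of single predecessors and records, for each conditional branch on the way, whether the path took the true edge.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::BasicBlock (illustration only).
struct Block {
  std::string Name;
  std::vector<Block *> Preds; // one entry per incoming CFG edge
  std::string Cond;           // branch condition, empty if none
  Block *TrueSucc = nullptr;
  Block *FalseSucc = nullptr;

  explicit Block(std::string N) : Name(std::move(N)) {}

  // Same idea as BasicBlock::getSinglePredecessor(): the unique
  // predecessor, or nullptr if there is none or more than one edge.
  Block *singlePred() const { return Preds.size() == 1 ? Preds[0] : nullptr; }
};

// Hypothetical helper: terminate B with "br Cond, T, F".
void condBr(Block &B, const std::string &Cond, Block &T, Block &F) {
  B.Cond = Cond;
  B.TrueSucc = &T;
  B.FalseSucc = &F;
  T.Preds.push_back(&B);
  F.Preds.push_back(&B);
}

// Collect (condition, holds-as-true) pairs that are known on the path
// that enters CallBB through Pred, following single predecessors only.
std::vector<std::pair<std::string, bool>>
recordPathConditions(Block *Pred, Block *CallBB) {
  assert(Pred && CallBB && "expects valid blocks");
  std::vector<std::pair<std::string, bool>> Conds;
  std::set<const Block *> Seen; // guards against cyclic chains
  Block *To = CallBB;
  Block *From = Pred;
  while (From && Seen.insert(From).second) {
    // A branch whose two targets are identical tells us nothing.
    if (!From->Cond.empty() && From->TrueSucc != From->FalseSucc)
      Conds.emplace_back(From->Cond, From->TrueSucc == To);
    To = From;
    From = From->singlePred();
  }
  return Conds;
}
```

Because the walk only ever follows a unique predecessor, every recorded condition is guaranteed to hold on that path, whatever shape the rest of the control flow has. The walk stops at the first block with zero or several predecessors, so conditions above a merge point are not recovered.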

Diff Detail

Event Timeline

fhahn created this revision. Dec 1 2017, 7:36 AM

fhahn added a parent revision: D40728: [CallSiteSplitting] Refactor creating callsites..

LGTM

This revision is now accepted and ready to land. Dec 1 2017, 10:17 AM

I believe we need to check the code size increase from this change, because it will apply much more widely and will clone call sites that could then be inlined.

lib/Transforms/Scalar/CallSiteSplitting.cpp

346

We should bail out when Preds[0] == Preds[1]. For the IR below, we should not split CS.

define i32 @test(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %Tail
Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r
}
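To make the degenerate case concrete, here is a small standalone sketch in plain C++. It is not LLVM code, and `Block` and `hasTwoDistinctPreds` are hypothetical names chosen for illustration. When both incoming edges of the call block come from the same predecessor, neither edge carries path-specific information, so there is nothing to gain from splitting.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a basic block; Preds holds one entry
// per incoming edge, so a duplicated edge appears twice.
struct Block {
  std::vector<const Block *> Preds;
};

// True only if the block has exactly two incoming edges and they
// originate from two different blocks.
bool hasTwoDistinctPreds(const Block &CallBB) {
  return CallBB.Preds.size() == 2 && CallBB.Preds[0] != CallBB.Preds[1];
}
```

For the IR above, Tail has two edges that both come from TBB, so a check of this kind would reject it.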

test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll

169 ↗

(On Diff #125142)

We need more test cases, because this change will now fire even on very simple cases like:

define i32 @test(i32* %a, i32 %v, i32 %p) {
Header:
  br i1 undef, label %Tail, label %End

TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}

Added testcases as suggested. I've also moved the or structure related tests to a separate file and the test case that should not be split to another file to avoid having test files that are too big. I think it would also make sense to move tests we do not expect to be split to that file.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64)

@junbuml I'd appreciate another look, please let me know if you are happy with the test changes.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64)

Thanks, Florian, for the update. Could you also share whether you see any performance gains or regressions? I will run performance tests on my side as well. The comment at the top of CallSiteSplitting.cpp should also be updated accordingly.

test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll

136

It would be good to add a test case where Tail forms a loop, like:

define i32 @test_loop(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  br i1 undef, label %Tail, label %End

End:
  ret i32 %v
}

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64.

In D40729#962145, @junbuml wrote:

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64.

I don't see any significant changes either. It's more of a step towards potentially making the pass more general, as @davidxl suggested a while ago in a comment.

For Cortex-A57, I get -0.23% on the geomean of SPEC2006 exec times, for SPEC2000 + test suite it's -0.38%

LGTM, assuming the update in the top comment.

Closed by commit rL321413: [CallSiteSplitting] Remove isOrHeader restriction. (authored by fhahn). Dec 23 2017, 12:03 PM

This revision was automatically updated to reflect the committed changes.

In D40729#962168, @junbuml wrote:

LGTM, assuming the update in the top comment.

I've committed this after updating the top comment. I also dropped the Or from tryToSplitOnOrPredicatedArgument and updated the comment there.

I hope the comments are clear.

Revision Contents

Path                                                             Size
lib/Transforms/Scalar/CallSiteSplitting.cpp                      10 lines
test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll    139 lines
test/Transforms/CallSiteSplitting/callsite-no-splitting.ll       18 lines

Diff 127494

lib/Transforms/Scalar/CallSiteSplitting.cpp

[...]
 static bool tryToSplitOnPHIPredicatedArgument(CallSite CS) {
   if (!isPredicatedOnPHI(CS))
     return false;

   auto Preds = getTwoPredecessors(CS.getInstruction()->getParent());
   splitCallSite(CS, Preds[0], Preds[1], nullptr, nullptr);
   return true;
 }
-// Check if one of the predecessors is a single predecessors of the other.
-// This is a requirement for control flow modeling an OR. HeaderBB points to
-// the single predecessor and OrBB points to other node. HeaderBB potentially
-// contains the first compare of the OR and OrBB the second.
-static bool isOrHeader(BasicBlock *HeaderBB, BasicBlock *OrBB) {
-  return OrBB->getSinglePredecessor() == HeaderBB &&
-         HeaderBB->getTerminator()->getNumSuccessors() == 2;
-}

 static bool tryToSplitOnOrPredicatedArgument(CallSite CS) {
   auto Preds = getTwoPredecessors(CS.getInstruction()->getParent());
-  if (!isOrHeader(Preds[0], Preds[1]) && !isOrHeader(Preds[1], Preds[0]))
+  if (Preds[0] == Preds[1])
     return false;

   SmallVector<std::pair<ICmpInst *, unsigned>, 2> C1, C2;
   recordConditions(CS, Preds[0], C1);
   recordConditions(CS, Preds[1], C2);

   Instruction *CallInst1 = addConditions(CS, C1);
   Instruction *CallInst2 = addConditions(CS, C2);
[...]

test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll

This file was added.

				; RUN: opt < %s -callsite-splitting -S | FileCheck %s
				; RUN: opt < %s -passes='function(callsite-splitting)' -S | FileCheck %s

				; CHECK-LABEL: @test_simple
				; CHECK-LABEL: Header:
				; CHECK-NEXT: br i1 undef, label %Tail.predBB1.split
				; CHECK-LABEL: TBB:
				; CHECK: br i1 %cmp, label %Tail.predBB2.split
				; CHECK-LABEL: Tail.predBB1.split:
				; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* %a, i32 %v, i32 %p)
				; CHECK-LABEL: Tail.predBB2.split:
				; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 %v, i32 %p)
				; CHECK-LABEL: Tail
				; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
				; CHECK: ret i32 %[[MERGED]]
				define i32 @test_simple(i32* %a, i32 %v, i32 %p) {
				Header:
				br i1 undef, label %Tail, label %End

				TBB:
				%cmp = icmp eq i32* %a, null
				br i1 %cmp, label %Tail, label %End

				Tail:
				%r = call i32 @callee(i32* %a, i32 %v, i32 %p)
				ret i32 %r

				End:
				ret i32 %v
				}

				; CHECK-LABEL: @test_eq_eq_eq_untaken
				; CHECK-LABEL: Header:
				; CHECK: br i1 %tobool1, label %TBB1, label %Tail.predBB1.split
				; CHECK-LABEL: TBB2:
				; CHECK: br i1 %cmp2, label %Tail.predBB2.split, label %End
				; CHECK-LABEL: Tail.predBB1.split:
				; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 %p)
				; CHECK-LABEL: Tail.predBB2.split:
				; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 1, i32 99)
				; CHECK-LABEL: Tail
				; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
				; CHECK: ret i32 %[[MERGED]]
				define i32 @test_eq_eq_eq_untaken2(i32* %a, i32 %v, i32 %p) {
				Header:
				%tobool1 = icmp eq i32* %a, null
				br i1 %tobool1, label %TBB1, label %Tail

				TBB1:
				%cmp1 = icmp eq i32 %v, 1
				br i1 %cmp1, label %TBB2, label %End

				TBB2:
				%cmp2 = icmp eq i32 %p, 99
				br i1 %cmp2, label %Tail, label %End

				Tail:
				%r = call i32 @callee(i32* %a, i32 %v, i32 %p)
				ret i32 %r

				End:
				ret i32 %v
				}

				; CHECK-LABEL: @test_eq_ne_eq_untaken
				; CHECK-LABEL: Header:
				; CHECK: br i1 %tobool1, label %TBB1, label %Tail.predBB1.split
				; CHECK-LABEL: TBB2:
				; CHECK: br i1 %cmp2, label %Tail.predBB2.split, label %End
				; CHECK-LABEL: Tail.predBB1.split:
				; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 %p)
				; CHECK-LABEL: Tail.predBB2.split:
				; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 %v, i32 99)
				; CHECK-LABEL: Tail
				; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
				; CHECK: ret i32 %[[MERGED]]
				define i32 @test_eq_ne_eq_untaken(i32* %a, i32 %v, i32 %p) {
				Header:
				%tobool1 = icmp eq i32* %a, null
				br i1 %tobool1, label %TBB1, label %Tail

				TBB1:
				%cmp1 = icmp ne i32 %v, 1
				br i1 %cmp1, label %TBB2, label %End

				TBB2:
				%cmp2 = icmp eq i32 %p, 99
				br i1 %cmp2, label %Tail, label %End

				Tail:
				%r = call i32 @callee(i32* %a, i32 %v, i32 %p)
				ret i32 %r

				End:
				ret i32 %v
				}

				; CHECK-LABEL: @test_header_header2_tbb
				; CHECK: Header2:
				; CHECK:br i1 %tobool2, label %Tail.predBB1.split, label %TBB1
				; CHECK-LABEL: TBB2:
				; CHECK: br i1 %cmp2, label %Tail.predBB2.split, label %End
				; CHECK-LABEL: Tail.predBB1.split:
				; CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 10)
				; CHECK-LABEL: Tail.predBB2.split:
				; NOTE: CallSiteSplitting cannot infer that %a is null here, as it currently
				; only supports recording conditions along a single predecessor path.
				; CHECK: %[[CALL2:.*]] = call i32 @callee(i32* %a, i32 1, i32 99)
				; CHECK-LABEL: Tail
				; CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
				; CHECK: ret i32 %[[MERGED]]
				define i32 @test_header_header2_tbb(i32* %a, i32 %v, i32 %p) {
				Header:
				%tobool1 = icmp eq i32* %a, null
				br i1 %tobool1, label %TBB1, label %Header2

				Header2:
				%tobool2 = icmp eq i32 %p, 10
				br i1 %tobool2, label %Tail, label %TBB1

				TBB1:
				%cmp1 = icmp eq i32 %v, 1
				br i1 %cmp1, label %TBB2, label %End

				TBB2:
				%cmp2 = icmp eq i32 %p, 99
				br i1 %cmp2, label %Tail, label %End

				Tail:
				%r = call i32 @callee(i32* %a, i32 %v, i32 %p)
				ret i32 %r

				End:
				ret i32 %v
				}

				define i32 @callee(i32* %a, i32 %v, i32 %p) {
				ret i32 10
				}

test/Transforms/CallSiteSplitting/callsite-no-splitting.ll

This file was added.

				; RUN: opt < %s -callsite-splitting -S | FileCheck %s
				; RUN: opt < %s -passes='function(callsite-splitting)' -S | FileCheck %s

				define i32 @callee(i32*, i32, i32) {
				ret i32 10
				}

				; CHECK-LABEL: @test_preds_equal
				; CHECK-NOT: split
				; CHECK: br i1 %cmp, label %Tail, label %Tail
				define i32 @test_preds_equal(i32* %a, i32 %v, i32 %p) {
				TBB:
				%cmp = icmp eq i32* %a, null
				br i1 %cmp, label %Tail, label %Tail
				Tail:
				%r = call i32 @callee(i32* %a, i32 %v, i32 %p)
				ret i32 %r
				}
