This is an archive of the discontinued LLVM Phabricator instance.

Differential D40729

[CallSiteSplitting] Remove isOrHeader restriction.
Closed, Public

Authored by fhahn on Dec 1 2017, 7:36 AM.

Details

Reviewers

davidxl
junbuml
davide

Commits

rG7e9328906baa: [CallSiteSplitting] Remove isOrHeader restriction.
rL321413: [CallSiteSplitting] Remove isOrHeader restriction.

Summary

By following the single predecessors of the predecessors of the call
site, we do not need to restrict the control flow.
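The idea in the summary can be sketched outside of LLVM. The block below is a minimal, self-contained model and not the code from this revision: `Block`, `setBranch`, `singlePredecessor` and `collectConditions` are hypothetical names, and conditions are plain strings rather than `ICmpInst` values. Starting at a direct predecessor of the call-site block, it keeps stepping to the single predecessor and records, for each branch on the way, whether its condition must have been true or false.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-in for a basic block. This is NOT the LLVM BasicBlock API.
struct Block {
  std::vector<Block *> Preds;  // one entry per incoming CFG edge
  std::string Cond;            // condition of the final branch, "" if none
  Block *TrueSucc = nullptr;   // taken when Cond holds
  Block *FalseSucc = nullptr;  // taken when Cond does not hold
};

// Hypothetical helper: give B a conditional branch and register both edges.
void setBranch(Block &B, const std::string &Cond, Block &T, Block &F) {
  B.Cond = Cond;
  B.TrueSucc = &T;
  B.FalseSucc = &F;
  T.Preds.push_back(&B);
  F.Preds.push_back(&B);
}

// Returns the only predecessor of B, or nullptr if B has none or several.
Block *singlePredecessor(const Block &B) {
  return B.Preds.size() == 1 ? B.Preds[0] : nullptr;
}

using Condition = std::pair<std::string, bool>;

// Walk Pred -> single predecessor -> ... and record each branch outcome
// that is implied by reaching Tail along this chain.
std::vector<Condition> collectConditions(Block &Pred, Block &Tail) {
  std::vector<Condition> Conds;
  std::unordered_set<const Block *> Seen;  // guards against cycles
  Block *To = &Tail;
  for (Block *From = &Pred; From && Seen.insert(From).second;
       To = From, From = singlePredecessor(*From)) {
    // A branch whose two targets coincide says nothing about Cond.
    if (!From->Cond.empty() && From->TrueSucc != From->FalseSucc)
      Conds.emplace_back(From->Cond, From->TrueSucc == To);
  }
  return Conds;
}
```

Applied to a CFG shaped like the `test_eq_eq_eq_untaken` test in this revision (Header branches on `a == null` to TBB1 or Tail, TBB1 on `v == 1` to TBB2 or End, TBB2 on `p == 99` to Tail or End), the walk from Header yields only `a == null` being false, while the walk from TBB2 yields all three conditions being true. No OR-shaped control flow is required for either walk.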

Diff Detail

Event Timeline

fhahn created this revision. Dec 1 2017, 7:36 AM

fhahn added a parent revision: D40728: [CallSiteSplitting] Refactor creating callsites.

LGTM

This revision is now accepted and ready to land. Dec 1 2017, 10:17 AM

I believe we need to check the code-size increase from this change, because it will apply widely and will clone call sites that could be inlined.

lib/Transforms/Scalar/CallSiteSplitting.cpp, line 335

We should bail out when Preds[0] == Preds[1]. For the IR below, we should not split CS:

define i32 @test(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %Tail
Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r
}
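One possible shape for the guard requested above, as a self-contained sketch rather than the fix that actually landed: `Block` and `hasTwoDistinctPredecessors` are hypothetical names, and the toy `Preds` vector holds one entry per incoming edge, so a block that branches to Tail on both outcomes appears twice.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a basic block. This is NOT the LLVM BasicBlock API.
struct Block {
  std::vector<const Block *> Preds;  // one entry per incoming CFG edge
};

// True only when Tail has exactly two incoming edges that come from two
// different blocks. If both edges come from the same block, the branch
// condition carries no information and splitting should be skipped.
bool hasTwoDistinctPredecessors(const Block &Tail) {
  return Tail.Preds.size() == 2 && Tail.Preds[0] != Tail.Preds[1];
}
```

For the IR above, both edges into Tail come from TBB, so the check returns false and the call site would be left alone.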

test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll, line 169

We need more test cases, because this change will hit even very simple cases like:

define i32 @test(i32* %a, i32 %v, i32 %p) {
Header:
  br i1 undef, label %Tail, label %End

TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  ret i32 %r

End:
  ret i32 %v
}

Added test cases as suggested. I've also moved the OR-structure tests to a separate file, and the test case that should not be split to another file, to avoid test files that are too big. I think it would also make sense to move tests we do not expect to be split to that file.

I've run the LNT test suite, SPEC2006 and SPEC2000. There were no noticeable changes in code size (all changes < 0.5% on AArch64).

@junbuml I'd appreciate another look, please let me know if you are happy with the test changes.

Thanks Florian for the update. Can you please also share whether you see any performance gains or regressions? I will run performance tests on my side as well. It seems that the comment at the top of CallSiteSplitting.cpp should be updated accordingly.

test/Transforms/CallSiteSplitting/callsite-no-or-structure.ll, line 136 (on Diff #127494)

It would be good to add a test case where Tail forms a loop, like:

define i32 @test_loop(i32* %a, i32 %v, i32 %p) {
TBB:
  %cmp = icmp eq i32* %a, null
  br i1 %cmp, label %Tail, label %End

Tail:
  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
  br i1 undef, label %Tail, label %End

End:
  ret i32 %v
}

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64.

In D40729#962145, @junbuml wrote:

I didn't see any significant performance changes in my spec2000/2006/2017 runs on AArch64.

I don't see any significant changes either. It's more a step towards potentially making the pass more general, as @davidxl suggested a while ago in a comment.

For Cortex-A57, I get -0.23% on the geomean of SPEC2006 exec times; for SPEC2000 + test suite it's -0.38%.

LGTM, assuming the update in the top comment.

Closed by commit rL321413: [CallSiteSplitting] Remove isOrHeader restriction. (authored by fhahn). Dec 23 2017, 12:03 PM

This revision was automatically updated to reflect the committed changes.

In D40729#962168, @junbuml wrote:

LGTM, assuming the update in the top comment.

I've committed this after updating the top comment. I also dropped the Or from tryToSplitOnOrPredicatedArgument and updated the comment there.

I hope the comments are clear.

Revision Contents

Path                                                         Size
lib/Transforms/Scalar/CallSiteSplitting.cpp                  11 lines
test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll   97 lines

Diff 125142

lib/Transforms/Scalar/CallSiteSplitting.cpp

[315 lines not shown]
 static bool tryToSplitOnPHIPredicatedArgument(CallSite CS) {
   if (!isPredicatedOnPHI(CS))
     return false;

   auto Preds = getTwoPredecessors(CS.getInstruction()->getParent());
   splitCallSite(CS, Preds[0], Preds[1], nullptr, nullptr);
   return true;
 }
-// Check if one of the predecessors is a single predecessors of the other.
-// This is a requirement for control flow modeling an OR. HeaderBB points to
-// the single predecessor and OrBB points to other node. HeaderBB potentially
-// contains the first compare of the OR and OrBB the second.
-static bool isOrHeader(BasicBlock *HeaderBB, BasicBlock *OrBB) {
-  return OrBB->getSinglePredecessor() == HeaderBB &&
-         HeaderBB->getTerminator()->getNumSuccessors() == 2;
-}

 static bool tryToSplitOnOrPredicatedArgument(CallSite CS) {
   auto Preds = getTwoPredecessors(CS.getInstruction()->getParent());
-  if (!isOrHeader(Preds[0], Preds[1]) && !isOrHeader(Preds[1], Preds[0]))
-    return false;

   SmallVector<std::pair<ICmpInst*, unsigned>, 2> C1, C2;
   recordConditions(CS, Preds[0], C1);
   recordConditions(CS, Preds[1], C2);

   Instruction *CallInst1 = addConditions(CS, C1);
   Instruction *CallInst2 = addConditions(CS, C2);
   if (!CallInst1 && !CallInst2)
     return false;
[72 lines not shown]

test/Transforms/CallSiteSplitting/callsite-split-or-phi.ll

[160 lines not shown]
 Tail:
   %p = phi i32[1,%Header], [2, %TBB]
   %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
   ret i32 %r

 End:
   ret i32 %v
 }

+;CHECK-LABEL: @test_eq_eq_eq_untaken
+;CHECK-LABEL: Tail.predBB1.split:
+;CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 %p)
+;CHECK-LABEL: Tail.predBB2.split:
+;CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 1, i32 99)
+;CHECK-LABEL: Tail
+;CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
+;CHECK: ret i32 %[[MERGED]]
+define i32 @test_eq_eq_eq_untaken(i32* %a, i32 %v, i32 %p) {
+Header:
+  %tobool1 = icmp eq i32* %a, null
+  br i1 %tobool1, label %TBB1, label %Tail
+
+TBB1:
+  %cmp1 = icmp eq i32 %v, 1
+  br i1 %cmp1, label %TBB2, label %End
+
+TBB2:
+  %cmp2 = icmp eq i32 %p, 99
+  br i1 %cmp2, label %Tail, label %End
+
+Tail:
+  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
+  ret i32 %r
+
+End:
+  ret i32 %v
+}
+
+;CHECK-LABEL: @test_eq_ne_eq_untaken
+;CHECK-LABEL: Tail.predBB1.split:
+;CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 %p)
+;CHECK-LABEL: Tail.predBB2.split:
+;CHECK: %[[CALL2:.*]] = call i32 @callee(i32* null, i32 %v, i32 99)
+;CHECK-LABEL: Tail
+;CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
+;CHECK: ret i32 %[[MERGED]]
+define i32 @test_eq_ne_eq_untaken(i32* %a, i32 %v, i32 %p) {
+Header:
+  %tobool1 = icmp eq i32* %a, null
+  br i1 %tobool1, label %TBB1, label %Tail
+
+TBB1:
+  %cmp1 = icmp ne i32 %v, 1
+  br i1 %cmp1, label %TBB2, label %End
+
+TBB2:
+  %cmp2 = icmp eq i32 %p, 99
+  br i1 %cmp2, label %Tail, label %End
+
+Tail:
+  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
+  ret i32 %r
+
+End:
+  ret i32 %v
+}
+
+;CHECK-LABEL: @test_eq_eq_eq_eq_untaken
+;CHECK-LABEL: Tail.predBB1.split:
+;CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 10)
+;CHECK-LABEL: Tail.predBB2.split:
+; NOTE: CallSiteSplitting cannot infer that %a is null here, as it currently
+; only supports recording conditions along a single predecessor path.
+;CHECK: %[[CALL2:.*]] = call i32 @callee(i32* %a, i32 1, i32 99)
+;CHECK-LABEL: Tail
+;CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
+;CHECK: ret i32 %[[MERGED]]
+define i32 @test_eq_eq_eq_eq_untaken(i32* %a, i32 %v, i32 %p) {
+Header:
+  %tobool1 = icmp eq i32* %a, null
+  br i1 %tobool1, label %TBB1, label %Header2
+
+Header2:
+  %tobool2 = icmp eq i32 %p, 10
+  br i1 %tobool2, label %Tail, label %TBB1
+
+TBB1:
+  %cmp1 = icmp eq i32 %v, 1
+  br i1 %cmp1, label %TBB2, label %End
+
+TBB2:
+  %cmp2 = icmp eq i32 %p, 99
+  br i1 %cmp2, label %Tail, label %End
+
+Tail:
+  %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
+  ret i32 %r
+
+End:
+  ret i32 %v
+}
+
 ;CHECK-LABEL: @test_nonconst_const_phi
 ;CHECK-LABEL: Tail.predBB1.split:
 ;CHECK: %[[CALL1:.*]] = call i32 @callee(i32* %a, i32 %v, i32 1)
 ;CHECK-LABEL: Tail.predBB2.split:
 ;CHECK: %[[CALL2:.*]] = call i32 @callee(i32* %a, i32 1, i32 2)
 ;CHECK-LABEL: Tail
 ;CHECK: %p = phi i32 [ 1, %Tail.predBB1.split ], [ 2, %Tail.predBB2.split ]
 ;CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
[161 lines not shown]
 Tail:
   %r = call i32 @callee(i32* %a, i32 %v, i32 %p)
   ret i32 %r

 End:
   ret i32 %v
 }


-;CHECK-LABEL: @test_eq_eq_eq_untaken
+;CHECK-LABEL: @test_header_header2_tbb
 ;CHECK-LABEL: Tail.predBB1.split:
 ;CHECK: %[[CALL1:.*]] = call i32 @callee(i32* nonnull %a, i32 %v, i32 10)
 ;CHECK-LABEL: Tail.predBB2.split:
 ;CHECK: %[[CALL2:.*]] = call i32 @callee(i32* nonnull %a, i32 1, i32 %p)
 ;CHECK-LABEL: Tail
 ;CHECK: %[[MERGED:.*]] = phi i32 [ %[[CALL1]], %Tail.predBB1.split ], [ %[[CALL2]], %Tail.predBB2.split ]
 ;CHECK: ret i32 %[[MERGED]]
-define i32 @test_eq_eq_eq_untaken(i32* %a, i32 %v, i32 %p) {
+define i32 @test_header_header2_tbb(i32* %a, i32 %v, i32 %p) {
 Header:
   %tobool1 = icmp eq i32* %a, null
   br i1 %tobool1, label %End, label %Header2

 Header2:
   %tobool2 = icmp eq i32 %p, 10
   br i1 %tobool2, label %Tail, label %TBB

[31 lines not shown]
