This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall
ClosedPublic

Authored by Carrot on Mar 21 2020, 1:17 AM.

Details

Reviewers

arsenm
xbolva00

Commits

rG6d20937c29a1: [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall

Summary

The attached test case is simplified from tcmalloc. Both function calls should be optimized as tail calls, but LLVM can only optimize the first one. The second call can't be optimized because dupRetToEnableTailCallOpts fails to duplicate the ret into block case2.

Two problems block the duplication:

1 The intrinsic call to llvm.assume is not handled by dupRetToEnableTailCallOpts.
2 The control flow is more complex than expected. dupRetToEnableTailCallOpts can only duplicate a ret into its direct predecessor, but here there is an intermediate block between the call and the ret.

The solutions:

1 Since CodeGenPrepare already runs at the end of the LLVM IR phase, we can simply delete the intrinsic call to llvm.assume.
2 A general solution for complex control flow is hard, but in this case, after exit2 is duplicated into case1, exit2 is the only successor of exit1 and exit1 is the only predecessor of exit2, so the two blocks can be combined by eliminateFallThrough. However, that function is currently called too late, when no further dupRetToEnableTailCallOpts runs follow it. Adding an earlier call to eliminateFallThrough solves this.
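The two steps above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `Block`, `CFG`, `eraseAssumes`, `mergeFallThroughs`, and `isReturnOnly` are hypothetical names for a toy control-flow graph, not LLVM's classes or the code in this patch, and phi nodes and the compare feeding the assume are left out. It starts from the state after the ret has already been duplicated into case1.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy CFG model for illustration only; these are not LLVM's data structures,
// and every name below is hypothetical. Phi nodes are not modeled.
struct Block {
  std::string name;
  std::vector<std::string> insts; // non-terminator instructions
  std::vector<int> succs;         // successor indices; empty means "ends in ret"
  bool erased;
};
using CFG = std::vector<Block>;

enum { CASE1, CASE2, CASE3, EXIT1, EXIT2 };

// Tail of the test case after the ret was duplicated into case1, so case1
// no longer branches to exit2.
CFG buildAfterFirstDup() {
  return {
      {"case1", {"%ret1 = tail call @qux()"}, {}, false},
      {"case2", {"%ret2 = tail call @bar()"}, {EXIT1}, false},
      {"case3", {"%ret3 = load @ptr"}, {EXIT1}, false},
      {"exit1", {"call @llvm.assume(%cmp3)"}, {EXIT2}, false},
      {"exit2", {}, {}, false},
  };
}

// Step 1: drop every llvm.assume call. Returns true if anything was removed.
bool eraseAssumes(CFG &g) {
  bool changed = false;
  for (Block &b : g) {
    auto it = std::remove_if(b.insts.begin(), b.insts.end(),
                             [](const std::string &s) {
                               return s.find("@llvm.assume") != std::string::npos;
                             });
    changed |= (it != b.insts.end());
    b.insts.erase(it, b.insts.end());
  }
  return changed;
}

static int countPreds(const CFG &g, int target) {
  int n = 0;
  for (const Block &b : g)
    if (!b.erased)
      n += static_cast<int>(std::count(b.succs.begin(), b.succs.end(), target));
  return n;
}

// Step 2: merge a block with its unique successor when the block is also
// that successor's unique predecessor. Returns true if anything changed.
bool mergeFallThroughs(CFG &g) {
  bool changed = false;
  for (std::size_t i = 0; i < g.size(); ++i) {
    Block &b = g[i];
    while (!b.erased && b.succs.size() == 1) {
      int s = b.succs[0];
      if (s == static_cast<int>(i) || countPreds(g, s) != 1)
        break;
      Block &succ = g[s];
      b.insts.insert(b.insts.end(), succ.insts.begin(), succ.insts.end());
      b.succs = succ.succs;
      succ.succs.clear();
      succ.erased = true;
      changed = true;
    }
  }
  return changed;
}

// In this toy model, a block that holds nothing but a ret is one whose ret
// could be duplicated into a direct predecessor that ends in a call.
bool isReturnOnly(const Block &b) {
  return !b.erased && b.insts.empty() && b.succs.empty();
}
```

In this model, exit1 is not return-only at first. After `eraseAssumes` and `mergeFallThroughs`, exit2 has been folded into exit1, and case2 branches directly to a block that only returns, which is the shape the summary says dupRetToEnableTailCallOpts can handle.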

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Carrot created this revision. Mar 21 2020, 1:17 AM

Herald added a project: Restricted Project. Mar 21 2020, 1:17 AM

Herald added subscribers: llvm-commits, hiraditya, wdng.

Harbormaster completed remote builds in B49988: Diff 251827. Mar 21 2020, 2:06 AM

arsenm added inline comments. Mar 22 2020, 7:57 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	I would expect that, like object size/is_constant, this would have been lowered already? I do disagree with using unreachable to error on this case, though.

Carrot marked an inline comment as done. Mar 23 2020, 9:43 AM

Carrot added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	This is not in my patch. Do you want me to delete these 2 cases in the same patch?

arsenm added inline comments. Mar 23 2020, 10:17 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	I know, but I would expect this case to be handled the same way. These should also not be deleted, just upgraded to an error that won't be deleted in a release build.

Carrot marked an inline comment as done. Mar 23 2020, 5:13 PM

Carrot added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	objectsize and is_constant are lowered in LowerConstantIntrinsics. Intrinsic::assume does not generate instructions, so it was simply ignored in SelectionDAG. Do you mean it should be deleted in LowerConstantIntrinsics? Also, does "just upgraded to an error that won't be deleted in a release build" mean Ctx.emitError()? Thanks!

arsenm added inline comments. Mar 24 2020, 9:14 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	That or report_fatal_error
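The distinction discussed in this thread is between a check that may be compiled out of release builds and one that is always active. The sketch below only illustrates that idea and is not the code from this patch: `fatalError` and `handleIntrinsic` are hypothetical names, the `Intrinsic` enum is a local stand-in, and throwing an exception stands in for aborting so the behavior can be observed.

```cpp
#include <stdexcept>
#include <string>

// Local stand-in for the intrinsic IDs mentioned in the thread.
enum class Intrinsic { Assume, ObjectSize, IsConstant, Other };

// Hypothetical always-on error path: it is an ordinary function call, so it
// stays in the program regardless of NDEBUG. A real tool would report the
// message and abort; throwing is used here only to keep the sketch testable.
[[noreturn]] void fatalError(const std::string &msg) {
  throw std::runtime_error(msg);
}

// Returns true when the call was handled by erasing it, false when nothing
// was done. Intrinsics that should never reach this point raise an error
// that survives release builds.
bool handleIntrinsic(Intrinsic id) {
  switch (id) {
  case Intrinsic::Assume:
    return true; // erased; it generates no instructions
  case Intrinsic::ObjectSize:
    fatalError("llvm.objectsize.* should have been lowered already");
  case Intrinsic::IsConstant:
    fatalError("llvm.is.constant.* should have been lowered already");
  case Intrinsic::Other:
    break;
  }
  return false;
}
```

The design point is that the error is raised by a normal call rather than by an assertion-style macro, so it cannot silently disappear when assertions are disabled.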

arsenm added inline comments. Mar 24 2020, 9:19 AM

llvm/test/CodeGen/X86/tailcall-assume-xbb.ll
1 ↗	(On Diff #251827)	Move to test/CodeGenPrepare/X86?
2 ↗	(On Diff #251827)	Add a comment explaining what this tests?

Carrot updated this revision to Diff 252362. Mar 24 2020, 10:13 AM

Carrot marked 4 inline comments as done.

Carrot added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	Also delete Intrinsic::assume in LowerConstantIntrinsics?
llvm/test/CodeGen/X86/tailcall-assume-xbb.ll
1 ↗	(On Diff #251827)	I assume you mean test/Transforms/CodeGenPrepare/X86.

arsenm added inline comments. Mar 24 2020, 11:14 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	It's not handled there as you said, so I'm not sure what you're asking?

Carrot marked an inline comment as done. Mar 24 2020, 12:28 PM

Carrot added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1989–1990	Sorry for my bad English :( Is this version OK now?

xbolva00 added inline comments. Mar 29 2020, 9:46 AM

llvm/lib/CodeGen/CodeGenPrepare.cpp
1971	Do we need to run RecursivelyDeleteTriviallyDeadInstructions on arg?
1992	Split these changes to new patch?

Carrot updated this revision to Diff 253699. Mar 30 2020, 2:09 PM

Carrot marked 2 inline comments as done.

Carrot added inline comments.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1971	This function is called in a "while (MadeChange)" loop in runOnFunction, so the instruction that generates the assume condition will be deleted in the next iteration. It is fine either way, whether or not RecursivelyDeleteTriviallyDeadInstructions is called.
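The argument in this reply can be pictured with a toy fixed-point loop. This is a hedged sketch, not CodeGenPrepare's implementation: `Inst`, `runOnce`, and `runToFixedPoint` are hypothetical names, and the only assumption taken from the thread is that the pass keeps iterating while a change was made, so a value left dead by erasing the assume is cleaned up on a later iteration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction for illustration; not LLVM's Instruction class.
struct Inst {
  std::string name;                  // result name, empty for void results
  std::string opcode;                // "call", "icmp", "assume", "ret", ...
  std::vector<std::string> operands; // names of the values used
};

static bool hasUses(const std::vector<Inst> &insts, const std::string &name) {
  for (const Inst &i : insts)
    if (std::find(i.operands.begin(), i.operands.end(), name) != i.operands.end())
      return true;
  return false;
}

// Only side-effect-free opcodes may be removed when unused.
static bool isPure(const Inst &i) { return i.opcode == "icmp" || i.opcode == "add"; }

// One iteration: first sweep instructions that are already dead, then erase
// assumes. The compare feeding an assume therefore only becomes removable in
// the iteration after the assume itself was erased.
bool runOnce(std::vector<Inst> &insts) {
  bool changed = false;
  for (std::size_t i = 0; i < insts.size();) {
    if (isPure(insts[i]) && !hasUses(insts, insts[i].name)) {
      insts.erase(insts.begin() + static_cast<std::ptrdiff_t>(i));
      changed = true;
    } else {
      ++i;
    }
  }
  for (std::size_t i = 0; i < insts.size();) {
    if (insts[i].opcode == "assume") {
      insts.erase(insts.begin() + static_cast<std::ptrdiff_t>(i));
      changed = true;
    } else {
      ++i;
    }
  }
  return changed;
}

// Mirrors the "while (MadeChange)" structure; returns the iteration count.
int runToFixedPoint(std::vector<Inst> &insts) {
  int iterations = 0;
  bool madeChange = true;
  while (madeChange) {
    madeChange = runOnce(insts);
    ++iterations;
  }
  return iterations;
}

// A call, a compare of its result, an assume on the compare, and a ret.
std::vector<Inst> buildExample() {
  return {
      {"%ret2", "call", {"@bar"}},
      {"%cmp3", "icmp", {"%ret2", "null"}},
      {"", "assume", {"%cmp3"}},
      {"", "ret", {"%ret2"}},
  };
}
```

In this model the assume disappears in the first iteration and the now-unused compare in the second, without any recursive cleanup at the point where the assume is erased.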

Ok, thanks.

This revision is now accepted and ready to land. Mar 30 2020, 2:32 PM

Closed by commit rG6d20937c29a1: [CodeGenPrepare] Delete intrinsic call to llvm.assume to enable more tailcall (authored by Carrot). Mar 31 2020, 12:00 PM

This revision was automatically updated to reflect the committed changes.

This likely broke many tests on Windows: http://45.33.8.238/win/11741/step_7.txt / http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/15029

Can you take a look, and revert while you investigate if it takes a while?

This patch doesn't make any register-level changes, so it is unlikely to be the root cause of the failure. Maybe some unrelated bug was uncovered by this patch.

Do you have a Linux reproduction? This test passed on my Linux desktop.

I took a look at the first failed test case, amdgpu-hip-implicit-kernarg.cu. It contains an empty kernel function, so it should not trigger any optimization code.

Did https://reviews.llvm.org/rG7e4e9f4a2fcb096778fb81fc96da6bb8aa966661 fix this failure?

junparser mentioned this in D95424: [CodeGenPrepare] Also skip lifetime.end intrinsic when check return block in dupRetToEnableTailCallOpts. Jan 28 2021, 6:44 PM

Revision Contents

Path	Size
llvm/lib/CodeGen/CodeGenPrepare.cpp	8 lines
llvm/test/Transforms/CodeGenPrepare/X86/extend-sink-hoist.ll	2 lines
llvm/test/Transforms/CodeGenPrepare/X86/optimizeSelect-DT.ll	5 lines
llvm/test/Transforms/CodeGenPrepare/X86/tailcall-assume-xbb.ll	48 lines

Diff 253952

llvm/lib/CodeGen/CodeGenPrepare.cpp

      for (Function::iterator I = F.begin(); I != F.end(); ) {
  ...
        if (ModifiedDTOnIteration)
          break;
      }
      if (EnableTypePromotionMerge && !ValToSExtendedUses.empty())
        MadeChange |= mergeSExts(F);
      if (!LargeOffsetGEPMap.empty())
        MadeChange |= splitLargeGEPOffsets();

+     if (MadeChange)
+       eliminateFallThrough(F);
+
      // Really free removed instructions during promotion.
      for (Instruction *I : RemovedInsts)
        I->deleteValue();

      EverMadeChange |= MadeChange;
      SeenChainsForSExt.clear();
      ValToSExtendedUses.clear();
      RemovedInsts.clear();
  ...
      for (auto &Arg : CI->arg_operands()) {
  ...
        unsigned AS = Arg->getType()->getPointerAddressSpace();
        return optimizeMemoryInst(CI, Arg, Arg->getType(), AS);
      }

    IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI);
    if (II) {
      switch (II->getIntrinsicID()) {
      default: break;
+     case Intrinsic::assume: {
+       II->eraseFromParent();
+       return true;
+     }
+
      case Intrinsic::experimental_widenable_condition: {
        // Give up on future widening oppurtunties so that we can fold away dead
        // paths and merge blocks before going into block-local instruction
        // selection.
        if (II->use_empty()) {
          II->eraseFromParent();
          return true;
        }
        Constant *RetVal = ConstantInt::getTrue(II->getContext());
        resetIteratorIfInvalidatedWhileCalling(BB, [&]() {
          replaceAndRecursivelySimplify(CI, RetVal, TLInfo, nullptr);
        });
        return true;
      }
      case Intrinsic::objectsize:
        llvm_unreachable("llvm.objectsize.* should have been lowered already");
      case Intrinsic::is_constant:
        llvm_unreachable("llvm.is.constant.* should have been lowered already");
      case Intrinsic::aarch64_stlxr:
      case Intrinsic::aarch64_stxr: {
        ZExtInst *ExtVal = dyn_cast<ZExtInst>(CI->getArgOperand(0));
        if (!ExtVal || !ExtVal->hasOneUse() ||
            ExtVal->getParent() == CI->getParent())
          return false;
        // Sink a zext feeding stlxr/stxr before it, so it can be folded into it.
        ExtVal->moveBefore(CI);
  ...

llvm/test/Transforms/CodeGenPrepare/X86/extend-sink-hoist.ll

  ; RUN: opt -codegenprepare -disable-cgp-branch-opts -S < %s | FileCheck %s
  target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
  target triple = "x86_64-unknown-linux-gnu"

  ; The first cast should be sunk into block2, in order that the
  ; instruction selector can form an efficient
  ; i64 * i64 -> i128 multiplication.
  define i128 @sink(i64* %mem1, i64* %mem2) {
  ; CHECK-LABEL: block1:
  ; CHECK-NEXT: load
  block1:
    %l1 = load i64, i64* %mem1
    %s1 = sext i64 %l1 to i128
    br label %block2

- ; CHECK-LABEL: block2:
  ; CHECK-NEXT: sext
  ; CHECK-NEXT: load
  ; CHECK-NEXT: sext
  block2:
    %l2 = load i64, i64* %mem2
    %s2 = sext i64 %l2 to i128
    %res = mul i128 %s1, %s2
    ret i128 %res
  }

  ; The first cast should be hoisted into block1, in order that the
  ; instruction selector can form an extend-load.
  define i64 @hoist(i32* %mem1, i32* %mem2) {
  ; CHECK-LABEL: block1:
  ; CHECK-NEXT: load
  ; CHECK-NEXT: sext
  block1:
    %l1 = load i32, i32* %mem1
    br label %block2

- ; CHECK-LABEL: block2:
  ; CHECK-NEXT: load
  ; CHECK-NEXT: sext
  block2:
    %s1 = sext i32 %l1 to i64
    %l2 = load i32, i32* %mem2
    %s2 = sext i32 %l2 to i64
    %res = mul i64 %s1, %s2
    ret i64 %res
  ...

llvm/test/Transforms/CodeGenPrepare/X86/optimizeSelect-DT.ll

  ...
  ; CHECK-NEXT: [[MUL_FR:%.*]] = freeze i32 [[Y:%.*]]
  ; CHECK-NEXT: [[T0:%.*]] = icmp eq i32 [[MUL_FR]], 1
  ; CHECK-NEXT: br i1 [[T0]], label [[SELECT_TRUE_SINK:%.*]], label [[SELECT_END:%.*]]
  ; CHECK: select.true.sink:
  ; CHECK-NEXT: [[REM:%.*]] = srem i32 [[X:%.*]], 2
  ; CHECK-NEXT: br label [[SELECT_END]]
  ; CHECK: select.end:
  ; CHECK-NEXT: [[MUL:%.*]] = phi i32 [ [[REM]], [[SELECT_TRUE_SINK]] ], [ 0, [[ENTRY:%.*]] ]
- ; CHECK-NEXT: [[NEG:%.*]] = add i32 [[T1:%.*]], -1
+ ; CHECK-NEXT: [[USUB:%.*]] = call { i32, i1 } @llvm.usub.with.overflow.i32(i32 [[T1:%.*]], i32 1)
+ ; CHECK-NEXT: [[NEG:%.*]] = extractvalue { i32, i1 } [[USUB]], 0
+ ; CHECK-NEXT: [[TOBOOL:%.*]] = extractvalue { i32, i1 } [[USUB]], 1
  ; CHECK-NEXT: [[ADD:%.*]] = add i32 [[NEG]], [[MUL]]
- ; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[T1]], 0
  ; CHECK-NEXT: ret i1 [[TOBOOL]]
  ;
  entry:
    %rem = srem i32 %x, 2
    %t0 = icmp eq i32 %y, 1
    %mul = select i1 %t0, i32 %rem, i32 0
    %neg = add i32 %t1, -1
    %add = add i32 %neg, %mul
    br label %if

  if:
    %tobool = icmp eq i32 %t1, 0
    ret i1 %tobool
  }

llvm/test/Transforms/CodeGenPrepare/X86/tailcall-assume-xbb.ll

This file was added.

+ ; RUN: opt -codegenprepare -S -mtriple=x86_64-linux < %s | FileCheck %s
+
+ ; The ret instruction can be duplicated into BB case2 even though there is an
+ ; intermediate BB exit1 and call to llvm.assume.
+
+ @ptr = external global i8*, align 8
+
+ ; CHECK: %ret1 = tail call i8* @qux()
+ ; CHECK-NEXT: ret i8* %ret1
+
+ ; CHECK: %ret2 = tail call i8* @bar()
+ ; CHECK-NEXT: ret i8* %ret2
+
+ define i8* @foo(i64 %size, i64 %v1, i64 %v2) {
+ entry:
+   %cmp1 = icmp ult i64 %size, 1025
+   br i1 %cmp1, label %if.end, label %case1
+
+ case1:
+   %ret1 = tail call i8* @qux()
+   br label %exit2
+
+ if.end:
+   %cmp2 = icmp ult i64 %v1, %v2
+   br i1 %cmp2, label %case3, label %case2
+
+ case2:
+   %ret2 = tail call i8* @bar()
+   br label %exit1
+
+ case3:
+   %ret3 = load i8*, i8** @ptr, align 8
+   br label %exit1
+
+ exit1:
+   %retval1 = phi i8* [ %ret2, %case2 ], [ %ret3, %case3 ]
+   %cmp3 = icmp ne i8* %retval1, null
+   tail call void @llvm.assume(i1 %cmp3)
+   br label %exit2
+
+ exit2:
+   %retval2 = phi i8* [ %ret1, %case1 ], [ %retval1, %exit1 ]
+   ret i8* %retval2
+ }
+
+ declare void @llvm.assume(i1)
+ declare i8* @qux()
+ declare i8* @bar()
