This is an archive of the discontinued LLVM Phabricator instance.

[TailCall] Disable tail call if the callee function contain __builtin_frame_address or __builtin_return_address
AbandonedPublic

Authored by shiva0217 on May 7 2019, 8:03 PM.


Details

Reviewers: None

Summary

Enabling tail calls may remove the restoration of the frame pointer and return address in the caller, which makes the two builtins above return incorrect values when the depth parameter is greater than 0.

E.g.

void __attribute__((noinline)) *callee (char *p) {
    return __builtin_frame_address (1);
}
void *caller (void) {
    char * save = (char*) alloca (4);
    return callee (save);
}

Diff Detail

Repository: rL LLVM

Event Timeline

shiva0217 created this revision. May 7 2019, 8:03 PM

Herald added a project: Restricted Project. May 7 2019, 8:03 PM

Herald added subscribers: jocewei, PkmX, the_o and 13 others.

shiva0217 mentioned this in D61626: [RISCV] Disable tail call if the callee function contain __builtin_frame_address or __builtin_return_address. May 7 2019, 8:12 PM

The description of the llvm.frameaddress and llvm.returnaddress intrinsics seems to indicate that these are "best effort" and LLVM doesn't really guarantee a correct result for a depth > 1 (see https://llvm.org/docs/LangRef.html#llvm-returnaddress-intrinsic).

Is there a particular use case that is improved by improving the quality of frameaddress/returnaddress results?

In D61665#1494592, @asb wrote:

The description of the llvm.frameaddress and llvm.returnaddress intrinsics seems to indicate that these are "best effort" and LLVM doesn't really guarantee a correct result for a depth > 1 (see https://llvm.org/docs/LangRef.html#llvm-returnaddress-intrinsic).

Is there a particular use case that is improved by improving the quality of frameaddress/returnaddress results?

Hi Alex,
There is a test case, gcc/testsuite/gcc.c-torture/execute/20010122-1.c, in the GCC testsuite. GCC does not generate a tail call there, so it gets the correct builtin results for depth > 0 in this case.

I have a few concerns here:

Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.
You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.
Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.
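The first two concerns can be illustrated with a toy model. This is a sketch only, not LLVM code: `ToyInst`, `ToyFunction`, `calleeReadsOuterFrame`, and `scanAllCalls` are hypothetical names made up for illustration. It counts how many instructions a per-call-site scan inspects, and shows that a call with no visible callee body cannot be checked at all.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-ins for IR; these are illustrative, not LLVM classes.
struct ToyFunction;
struct ToyInst {
  int FrameDepth = -1;                 // >= 0 for a frame/return address query
  const ToyFunction *Callee = nullptr; // direct callee, or nullptr if unknown
  bool IsCall = false;                 // true for any call, direct or indirect
};
struct ToyFunction {
  std::vector<ToyInst> Body;
};

// Naive check in the spirit of the patch: scan the whole callee body.
// Visited accumulates the number of instructions inspected.
bool calleeReadsOuterFrame(const ToyFunction *Callee, std::size_t &Visited) {
  if (!Callee)
    return false; // function pointer or other translation unit: body not visible
  for (const ToyInst &I : Callee->Body) {
    ++Visited;
    if (I.FrameDepth > 0)
      return true;
  }
  return false;
}

// Run the naive check for every call in Caller and return the total visits.
std::size_t scanAllCalls(const ToyFunction &Caller) {
  std::size_t Visited = 0;
  for (const ToyInst &I : Caller.Body)
    if (I.IsCall)
      (void)calleeReadsOuterFrame(I.Callee, Visited);
  return Visited;
}
```

In this model, a caller containing 50 calls to a 100-instruction callee inspects 5000 instructions, which is the calls-times-body-size growth the comment describes.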

In D61665#1495701, @efriedma wrote:

I have a few concerns here:

Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.

You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.

Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.

If we do wish to make our "best effort" contain more effort, I think that we'd want to do this during function-attribute inference - there we can iterate over the call graph and add some inhibiting attributes. That having been said, if the only use case we have for this is matching some portion of GCC's heuristic for the purpose of making their test case pass, I'm not sure that this is worthwhile.
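A minimal sketch of the "infer once, query at each call site" idea follows. It is a toy model, not LLVM's function-attribute inference: `ToyFunction`, `inferReadsOuterFrame`, `mayTailCall`, and the `ReadsOuterFrame` flag are hypothetical names, and no such attribute is claimed to exist in LLVM. It also does not propagate anything across the call graph.

```cpp
#include <vector>

// Toy model; names are illustrative and do not correspond to LLVM APIs.
struct ToyFunction {
  std::vector<int> FrameDepths; // depth argument of each frame/return address query
  bool ReadsOuterFrame = false; // hypothetical inferred "inhibiting" flag
};

// One pass over the module: each function body is scanned exactly once.
void inferReadsOuterFrame(std::vector<ToyFunction> &Module) {
  for (ToyFunction &F : Module) {
    F.ReadsOuterFrame = false;
    for (int Depth : F.FrameDepths)
      if (Depth > 0) {
        F.ReadsOuterFrame = true;
        break;
      }
  }
}

// Call-site query: constant time. An unknown callee (nullptr) is treated as
// eligible here, which is an assumption of this sketch.
bool mayTailCall(const ToyFunction *Callee) {
  return Callee == nullptr || !Callee->ReadsOuterFrame;
}
```

The design point is that the per-function scan happens once, so checking every call site no longer rescans callee bodies.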

In D61665#1495723, @hfinkel wrote:

In D61665#1495701, @efriedma wrote:

I have a few concerns here:

Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.

It may be too expensive; using a function attribute, as Hal suggests, would be a better approach.

You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.

Yes, we can't inspect a function in a different translation unit or one reached through a function pointer.

Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

Yes, disabling the tail call can only preserve the stack at depth 1; other optimizations may still change the behavior for depths greater than one.

I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.

If we do wish to make our "best effort" contain more effort, I think that we'd want to do this during function-attribute inference - there we can iterate over the call graph and add some inhibiting attributes. That having been said, if the only use case we have for this is matching some portion of GCC's heuristic for the purpose of making their test case pass, I'm not sure that this is worthwhile.

I think the consensus is to leave the current behavior. Thanks to @eli.friedman for pointing out the cases the patch can't cover, and to @hfinkel for suggesting a better approach if we do want to pursue this.

Yes, I think leaving the current behaviour probably makes sense. I encountered that test failure too and masked that test on the basis that those builtins aren't documented as being guaranteed to work https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html (that page even notes that crashing is allowable behaviour).
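For contrast, the level-0 queries are the ones generally treated as dependable. The sketch below assumes a GCC- or Clang-compatible compiler; `currentReturnAddress` and `currentFrameAddress` are hypothetical helper names, and nonzero levels are deliberately avoided.

```cpp
// Assumes GCC or Clang, which provide these builtins.
// noinline keeps each helper as a real call with its own frame.
__attribute__((noinline)) void *currentReturnAddress() {
  // Level 0: the address this helper will return to.
  return __builtin_return_address(0);
}

__attribute__((noinline)) void *currentFrameAddress() {
  // Level 0: the frame of this helper itself.
  return __builtin_frame_address(0);
}
```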

In D61665#1496085, @asb wrote:

Yes, I think leaving the current behaviour probably makes sense. I encountered that test failure too and masked that test on the basis that those builtins aren't documented as being guaranteed to work https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html (that page even notes that crashing is allowable behaviour).

Hi, Alex
Thanks for the information.

Revision Contents

Path                                            Size
lib/CodeGen/Analysis.cpp                        44 lines
test/CodeGen/RISCV/builtin-frame-address.ll     70 lines
test/CodeGen/RISCV/builtin-return-address.ll    64 lines

Diff 198579

lib/CodeGen/Analysis.cpp

[... 495 lines hidden ...]  if (!advanceToNextLeafType(SubTypes, Path))
       return false;

     assert(!Path.empty() && "found a leaf but didn't set the path?");
   } while (SubTypes.back()->getTypeAtIndex(Path.back())->isAggregateType());

   return true;
 }

+// Return true if the instruction is __builtin_frame_address or
+// __builtin_return_address function call.
+static bool isFrameAddressOrReturnAddressCall(const Instruction *I) {
+  if (const CallInst *CI = dyn_cast<CallInst>(I))
+    if (Function *IntrinsicF = CI->getCalledFunction()) {
+      Intrinsic::ID ID = IntrinsicF->getIntrinsicID();
+      if ((ID == Intrinsic::frameaddress) ||
+          (ID == Intrinsic::returnaddress))
+        return true;
+    }
+  return false;
+}
+
+// Return frame depth argument of __builtin_frame_address or
+// __builtin_return_address.
+// Return -1 if the instruction is not a __builtin_frame_address or
+// __builtin_return_address function call.
+static int getFrameDepthArg(const Instruction *I) {
+  if (!isFrameAddressOrReturnAddressCall(I))
+    return -1;
+  const CallInst *CI = cast<CallInst>(I);
+  Constant *C = cast<Constant>(*CI->arg_begin());
+  return C->getUniqueInteger().getZExtValue();
+}
+
 /// Test if the given instruction is in a position to be optimized
 /// with a tail-call. This roughly means that it's in a block with
 /// a return and there's nothing that needs to be scheduled
 /// between it and the return.
 ///
 /// This function only tests target-independent requirements.
 bool llvm::isInTailCallPosition(ImmutableCallSite CS, const TargetMachine &TM) {
   const Instruction *I = CS.getInstruction();
   const BasicBlock *ExitBB = I->getParent();
   const Instruction *Term = ExitBB->getTerminator();
   const ReturnInst *Ret = dyn_cast<ReturnInst>(Term);
+  const Function *CalleeFn = CS.getCalledFunction();

   // The block must end in a return statement or unreachable.
   //
   // FIXME: Decline tailcall if it's not guaranteed and if the block ends in
   // an unreachable, for now. The way tailcall optimization is currently
   // implemented means it will add an epilogue followed by a jump. That is
   // not profitable. Also, if the callee is a special function (e.g.
   // longjmp on x86), it can end up causing miscompilation that has not
[... 16 lines hidden ...]  for (BasicBlock::const_iterator BBI = std::prev(ExitBB->end(), 2);; --BBI) {
       if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(BBI))
         if (II->getIntrinsicID() == Intrinsic::lifetime_end)
           continue;
       if (BBI->mayHaveSideEffects() || BBI->mayReadFromMemory() ||
           !isSafeToSpeculativelyExecute(&*BBI))
         return false;
     }

+  // Do not do tail call if the callee function contain __builtin_frame_address
+  // or __builtin_return_address.
+  //
+  // Enabling tail call may remove the frame pointer and return address
+  // restoration in caller which will make the above two builtin functions get
+  // incorrect value if the depth parameter > 0.
+  // E.g.
+  //   void __attribute__((noinline))
+  //   *callee (char *p) { return __builtin_frame_address (1); }
+  //   void *
+  //   caller (void) { char *save = (char*) alloca (4);
+  //                   return callee (save); }
+  if (CalleeFn != nullptr) {
+    for (const BasicBlock &BB : *CalleeFn)
+      for (const Instruction &I : BB)
+        if (getFrameDepthArg(&I) > 0)
+          return false;
+  }
+
   const Function *F = ExitBB->getParent();
   return returnTypeIsEligibleForTailCall(
       F, I, Ret, TM.getSubtargetImpl(F)->getTargetLowering());
 }

 bool llvm::attributesPermitTailCall(const Function *F, const Instruction *I,
                                     const ReturnInst *Ret,
                                     const TargetLoweringBase &TLI,
[... 229 lines hidden ...]

test/CodeGen/RISCV/builtin-frame-address.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
				; RUN:   | FileCheck %s -check-prefix=RV32I

				; The test case checks that tail call optimization is suppressed for
				; @llvm.frameaddress with depth > 0.
				; Otherwise, @llvm.frameaddress(i32 1) will get the wrong frame address.

				; Tail call will be suppressed in caller1 because callee1 contains
				; @llvm.frameaddress(i32 1).
				define i8* @caller1() {
				; RV32I-LABEL: caller1:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: addi sp, sp, -16
				; RV32I-NEXT: sw ra, 12(sp)
				; RV32I-NEXT: call callee1
				; RV32I-NEXT: lw ra, 12(sp)
				; RV32I-NEXT: addi sp, sp, 16
				; RV32I-NEXT: ret
				entry:
				%call = tail call i8* @callee1(i8* undef)
				ret i8* %call
				}

				define i8* @callee1(i8* nocapture readnone %p) {
				; RV32I-LABEL: callee1:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: addi sp, sp, -16
				; RV32I-NEXT: sw ra, 12(sp)
				; RV32I-NEXT: sw s0, 8(sp)
				; RV32I-NEXT: addi s0, sp, 16
				; RV32I-NEXT: lw a0, -8(s0)
				; RV32I-NEXT: lw s0, 8(sp)
				; RV32I-NEXT: lw ra, 12(sp)
				; RV32I-NEXT: addi sp, sp, 16
				; RV32I-NEXT: ret
				entry:
				%0 = tail call i8* @llvm.frameaddress(i32 1)
				ret i8* %0
				}

				; Tail call won't be suppressed in caller0 because callee0 contains
				; @llvm.frameaddress(i32 0) which will not backtrace to caller0's stack.
				define i8* @caller0() {
				; RV32I-LABEL: caller0:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: tail callee0
				entry:
				%call = tail call i8* @callee0(i8* undef)
				ret i8* %call
				}

				define i8* @callee0(i8* nocapture readnone %p) {
				; RV32I-LABEL: callee0:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: addi sp, sp, -16
				; RV32I-NEXT: sw ra, 12(sp)
				; RV32I-NEXT: sw s0, 8(sp)
				; RV32I-NEXT: addi s0, sp, 16
				; RV32I-NEXT: mv a0, s0
				; RV32I-NEXT: lw s0, 8(sp)
				; RV32I-NEXT: lw ra, 12(sp)
				; RV32I-NEXT: addi sp, sp, 16
				; RV32I-NEXT: ret
				entry:
				%0 = tail call i8* @llvm.frameaddress(i32 0)
				ret i8* %0
				}

				declare i8* @llvm.frameaddress(i32)

test/CodeGen/RISCV/builtin-return-address.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
				; RUN:   | FileCheck %s -check-prefix=RV32I

				; The test case checks that tail call optimization is suppressed for
				; @llvm.returnaddress with depth > 0.
				; Otherwise, @llvm.returnaddress(i32 1) will get the wrong return address.

				; Tail call will be suppressed in caller1 because callee1 contains
				; @llvm.returnaddress(i32 1).
				define i8* @caller1() {
				; RV32I-LABEL: caller1:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: addi sp, sp, -16
				; RV32I-NEXT: sw ra, 12(sp)
				; RV32I-NEXT: call callee1
				; RV32I-NEXT: lw ra, 12(sp)
				; RV32I-NEXT: addi sp, sp, 16
				; RV32I-NEXT: ret
				entry:
				%call = tail call i8* @callee1(i8* undef)
				ret i8* %call
				}

				define i8* @callee1(i8* nocapture readnone %p) {
				; RV32I-LABEL: callee1:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: addi sp, sp, -16
				; RV32I-NEXT: sw ra, 12(sp)
				; RV32I-NEXT: sw s0, 8(sp)
				; RV32I-NEXT: addi s0, sp, 16
				; RV32I-NEXT: lw a0, -8(s0)
				; RV32I-NEXT: lw a0, -4(a0)
				; RV32I-NEXT: lw s0, 8(sp)
				; RV32I-NEXT: lw ra, 12(sp)
				; RV32I-NEXT: addi sp, sp, 16
				; RV32I-NEXT: ret
				entry:
				%0 = tail call i8* @llvm.returnaddress(i32 1)
				ret i8* %0
				}

				; Tail call won't be suppressed in caller0 because callee0 contains
				; @llvm.returnaddress(i32 0) which will not backtrace to caller0's stack.
				define i8* @caller0() {
				; RV32I-LABEL: caller0:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: tail callee0
				entry:
				%call = tail call i8* @callee0(i8* undef)
				ret i8* %call
				}

				define i8* @callee0(i8* nocapture readnone %p) {
				; RV32I-LABEL: callee0:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: mv a0, ra
				; RV32I-NEXT: ret
				entry:
				%0 = tail call i8* @llvm.returnaddress(i32 0)
				ret i8* %0
				}

				declare i8* @llvm.returnaddress(i32)