This is an archive of the discontinued LLVM Phabricator instance.


Differential D22900

Revert r244207 - Mark calls in thunk functions as tail-call optimization
Needs Review, Public

Authored by Gerolf on Jul 27 2016, 7:15 PM.


Details

Reviewers

eli.friedman

Summary

This is just closing the loop for
https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg28837.html with a
test case and fixes PR28748 which had been introduced by r244207.

Diff Detail

Event Timeline

Gerolf updated this revision to Diff 65858. Jul 27 2016, 7:15 PM

Gerolf retitled this revision to Revert r244207 - Mark calls in thunk functions as tail-call optimization.

Gerolf updated this object.

Gerolf added reviewers: eli.friedman, mkuper.

Gerolf added a subscriber: cfe-commits.

The test seems a little large. The following shows that we emit a tail call with a byval argument on trunk.

struct LARGE {
  union {
    int i;
  };
};
struct I {
  virtual void m_fn1(LARGE);
};
struct CBase {
  virtual ~CBase();
};
struct C : CBase, I {
  void Seek(LARGE);
};
void C::Seek(LARGE) {}

Please add the options you used to compile? I can certainly shrink the test case a bit before I commit.

In D22900#498781, @Gerolf wrote:

Please add the options you used to compile? I can certainly shrink the test case a bit before I commit.

clang -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.ii -o -

Nope, I don't see the tail call. Anyway, I'll simplify my test case. Don't worry about it.

clang++ -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.cpp

cat t.ll:

; ModuleID = 't.cpp'
source_filename = "t.cpp"
target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-darwin9"

%struct.C = type { %struct.CBase, %struct.I }
%struct.CBase = type { i32 (...) }
%struct.I = type { i32 (...) }
%struct.LARGE = type { %union.anon }
%union.anon = type { i32 }

; Function Attrs: nounwind
define void @_ZN1C4SeekE5LARGE(%struct.C* %this, %struct.LARGE* byval align 4) #0 align 2 {
entry:
  %this.addr = alloca %struct.C*, align 4
  store %struct.C* %this, %struct.C** %this.addr, align 4
  %this1 = load %struct.C*, %struct.C** %this.addr, align 4
  ret void
}

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-features"="+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

In D22900#498793, @Gerolf wrote:
Nope, I don't see the tail call. Anyway, I'll simplify my test case. Don't worry about it.


D'oh, that m_fn1 should have been Seek :/

$ cat t.ii

struct LARGE {
  union {
    int i;
  };
};
struct I {
  virtual void m_fn1(LARGE);
};
struct CBase {
  virtual ~CBase();
};
struct C : CBase, I {
  void m_fn1(LARGE);
};
void C::m_fn1(LARGE) {}

$ ~/llvm/Debug+Asserts/bin/clang -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.ii && grep tail t.ll

tail call void @_ZN1C5m_fn1E5LARGE(%struct.C* %3, %struct.LARGE* byval align 4 %0)

Reduced test case.

I really don't understand anything about this. :-)

aaboud added subscribers: rnk, aaboud. Jul 31 2016, 12:07 AM

Reverting rL244207 would be fine if we ensure that PR24235 is fixed in another way.
I added Reid, who helped review the original patch D11476.

In D22900#501597, @aaboud wrote:

Reverting rL244207 would be fine if we ensure that PR24235 is fixed in another way.
I added Reid, who helped review the original patch D11476.

ISTM that the DWARF spec intended such thunks to be encoded as DW_AT_trampoline. That seems more appropriate than relying on codegen emitting a tailcall. This way the debugger can make the policy decision of whether or not thunks should show up in the backtrace.

In any case, correctness must always trump all else. Reverting to green should take precedence over a QoI bug like PR24235.

ISTM that the DWARF spec intended such thunks to be encoded as DW_AT_trampoline. That seems more appropriate than relying on codegen emitting a tailcall. This way the debugger can make the policy decision of whether or not thunks should show up in the backtrace.

In any case, correctness must always trump all else. Reverting to green should take precedence over a QoI bug like PR24235.

I agree with the revert, though I am not sure about the new test. It looks too complicated, especially the command line.
I will let David decide whether to accept that test or ask for improvements.

Regards,
Amjad

I think the correctness problem is in DSE. It should be safe for clang to put 'tail' here. The tail call does not capture the address of any local variables, but it does load from them via 'byval'.

I think Nick was responsible for adding this DSE optimization *years* ago.

So, if clang were to use a temporary alloca for the byval parameter, then yes, I agree marking it as a tail call would be incorrect. However, clang doesn't use an alloca, it forwards the byval pointer parameter directly to the callee:

define i32 @_ZThn4_N1C4SeekE6_LARGE(%class.C* nocapture readnone %this, %union._LARGE* byval nocapture readonly align 4 %L) unnamed_addr #0 align 2 {
entry:
  %call = tail call i32 @_ZN1C4SeekE6_LARGE(%class.C* undef, %union._LARGE* byval nonnull align 4 %L)
  ret i32 %call
}

Maybe the test case is over-reduced, or the problematic IR was produced by an older version of clang?
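For readers less familiar with why a thunk such as _ZThn4_N1C4SeekE6_LARGE exists at all, here is a minimal C++ sketch of the same class shape. It is not part of the patch. The names Large, Iface, Base, Derived, make_large, seek_via_iface and secondary_base_is_offset are all hypothetical, and the claim that the secondary base sits at a nonzero offset is an assumption that holds on the common Itanium and Microsoft ABIs, not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the 64-bit union passed by value in the test.
union Large {
  struct {
    uint32_t Low;
    int32_t High;
  } u;
  int64_t Quad;
};

// Hypothetical interface; it ends up as a secondary (non-primary) base below.
struct Iface {
  virtual uint32_t Seek(Large L) = 0;
  virtual ~Iface() = default;
};

// Hypothetical polymorphic primary base, so Iface cannot sit at offset 0.
struct Base {
  virtual ~Base() = default;
  int pad = 0;
};

struct Derived : Base, Iface {
  const void *seen_this = nullptr;
  uint32_t Seek(Large L) override {
    seen_this = this; // record the pointer the callee actually received
    return L.u.Low + static_cast<uint32_t>(L.u.High);
  }
};

// Helper: build the by-value argument.
Large make_large(uint32_t low, int32_t high) {
  Large L;
  L.u.Low = low;
  L.u.High = high;
  return L;
}

// Virtual call through the secondary base. On common ABIs this dispatches to
// a this-adjusting thunk that forwards L to Derived::Seek.
uint32_t seek_via_iface(Derived &d, Large L) {
  Iface *i = &d;
  return i->Seek(L);
}

// True when the Iface subobject does not start at the same address as Derived.
bool secondary_base_is_offset(Derived &d) {
  Iface *i = &d;
  return static_cast<const void *>(i) != static_cast<const void *>(&d);
}
```

The call in seek_via_iface starts from an Iface pointer, yet Derived::Seek observes a pointer to the full object. That adjustment is the thunk's only job, which is why the thunk body is just a forwarding call with the by-value argument passed straight through.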

For the IR level, I think this has got to be valid:

declare void @bar(i32* byval %p)
define void @foo(i32* byval %p) {
  tail call void @bar(i32* byval %p)
  ret void
}

The 'tail' marker is an aliasing property which indicates that the callee doesn't touch any of the allocas in the caller, a rough proxy for "stack" since there is no other stack in LLVM IR. That doesn't mean that we're going to actually lower it to a tail call in the final assembly, but if we don't put 'tail' on a call, we won't even consider it for becoming a tail call. I think the backend can sort out whether the byval is going to lead to a stack allocation in @foo and emit a non-tail call if so.
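The two-stage decision described above can be sketched as a toy model in C++. This is only an illustration of the argument in this comment, not LLVM code: Lowering, CallSiteModel and choose_lowering are hypothetical names, and the single byval_needs_caller_copy flag is a deliberate simplification of whatever checks a real backend performs.

```cpp
#include <cassert>

// Hypothetical outcome of lowering a call in the final assembly.
enum class Lowering { TailCall, NormalCall };

// Hypothetical summary of one call site.
struct CallSiteModel {
  bool marked_tail;             // the IR call carries the 'tail' marker
  bool byval_needs_caller_copy; // lowering the byval argument would need
                                // a fresh stack slot in the caller's frame
};

// The marker only makes the call a candidate; the backend still decides.
Lowering choose_lowering(const CallSiteModel &cs) {
  if (!cs.marked_tail)
    return Lowering::NormalCall; // never considered for tail-call lowering
  if (cs.byval_needs_caller_copy)
    return Lowering::NormalCall; // candidate, but the backend falls back
  return Lowering::TailCall;     // candidate and safe to lower as a tail call
}
```

In this model the frontend's choice to add the marker never forces a tail call by itself. It only widens the set of calls the backend is allowed to consider, which is the distinction the comment draws.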

Revision Contents

Path                                          Size
lib/CodeGen/CGVTables.cpp                     2 lines
test/CodeGenCXX/microsoft-abi-structors.cpp   3 lines
test/CodeGenCXX/tail-byval.cpp                49 lines

Diff 66075

lib/CodeGen/CGVTables.cpp

 #endif

   // Now emit our call.
   llvm::Instruction *CallOrInvoke;
   RValue RV = EmitCall(*CurFnInfo, Callee, Slot, CallArgs, MD, &CallOrInvoke);

   // Consider return adjustment if we have ThunkInfo.
   if (Thunk && !Thunk->Return.isEmpty())
     RV = PerformReturnAdjustment(*this, ResultType, RV, *Thunk);
-  else if (llvm::CallInst* Call = dyn_cast<llvm::CallInst>(CallOrInvoke))
-    Call->setTailCallKind(llvm::CallInst::TCK_Tail);

   // Emit return.
   if (!ResultType->isVoidType() && Slot.isNull())
     CGM.getCXXABI().EmitReturnFromThunk(*this, RV, ResultType);

   // Disable the final ARC autorelease.
   AutoreleaseResult = false;

test/CodeGenCXX/microsoft-abi-structors.cpp

 void foo() {
   C c;
 }
 // DTORS2-LABEL: define linkonce_odr x86_thiscallcc i8* @"\01??_EC@dtor_in_second_nvbase@@W3AEPAXI@Z"
 // DTORS2: (%"struct.dtor_in_second_nvbase::C"* %this, i32 %should_call_delete)
 // Do an adjustment from B* to C*.
 // DTORS2: getelementptr i8, i8* %{{.*}}, i32 -4
 // DTORS2: bitcast i8* %{{.*}} to %"struct.dtor_in_second_nvbase::C"*
-// DTORS2: %[[CALL:.*]] = tail call x86_thiscallcc i8* @"\01??_GC@dtor_in_second_nvbase@@UAEPAXI@Z"
+// DTORS2: %[[CALL:.*]] = call x86_thiscallcc i8* @"\01??_GC@dtor_in_second_nvbase@@UAEPAXI@Z"
 // DTORS2: ret i8* %[[CALL]]

 }

 namespace test2 {
 // Just like dtor_in_second_nvbase, except put that in a vbase of a diamond.

 // C's dtor is in the non-primary base.
 struct A { virtual void f(); };
 struct B { virtual ~B(); };

test/CodeGenCXX/tail-byval.cpp

This file was added.

// RUN: %clang_cc1 %s -I%S -isystem %S/Inputs -emit-llvm -triple i386-apple-darwin9 -Wno-incompatible-ms-struct -o - -Os | opt - -dse -S -o - | FileCheck %s
#pragma ms_struct on

#include <stdint.h>

extern "C" int rand();

typedef union _LARGE {
  struct {
    uint32_t Low;
    int32_t High;
  } u;
  int64_t Quad;
} LARGE;

class CRepro {
protected:
  CRepro();
  virtual ~CRepro() {}
};

class I {
public:
  virtual uint32_t Seek(LARGE L) = 0;
};

class CBase : public CRepro {
protected:
  CBase();
  virtual ~CBase() {};
};

class C : public CBase, public I {
public:
  __attribute__((noinline))
  uint32_t Seek(LARGE L);

private:
  C();
  ~C() {};
};

uint32_t C::Seek(LARGE L) {
  return L.u.Low + L.u.High;
}
// CHECK: define i32 @_ZThn4_N1C4SeekE6_LARGE
// CHECK: store i64
// CHECK-NOT: tail call
// CHECK: ret