Page MenuHomePhabricator

Revert r244207 - Mark calls in thunk functions as tail-call optimization
Needs ReviewPublic

Authored by Gerolf on Jul 27 2016, 7:15 PM.

Details

Reviewers
eli.friedman
Summary

This is just closing the loop for
https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg28837.html with a
test case and fixes PR28748 which had been introduced by r244207.

Diff Detail

Event Timeline

Gerolf updated this revision to Diff 65858.Jul 27 2016, 7:15 PM
Gerolf retitled this revision from to Revert r244207 - Mark calls in thunk functions as tail-call optimization.
Gerolf updated this object.
Gerolf added reviewers: eli.friedman, mkuper.
Gerolf added a subscriber: cfe-commits.

The test seems a little large, the following shows that we emit a tail call with a byval argument on trunk.

struct LARGE {
  union {
    int i;
  };
};
struct I {
  virtual void m_fn1(LARGE);
};
struct CBase {
  virtual ~CBase();
};
struct C : CBase, I {
  void Seek(LARGE);
};
void C::Seek(LARGE) {}

Please add the options you used to compile? I can certainly shrink the test case a bit before I commit.

Please add the options you used to compile? I can certainly shrink the test case a bit before I commit.

clang -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.ii -o -

Nope, I don't see the tail call. Anyway, I'll simplify my test case. Don't worry about it.

clang++ -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.cpp

cat t.ll:

; ModuleID = 't.cpp'
source_filename = "t.cpp"
target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-darwin9"

%struct.C = type { %struct.CBase, %struct.I }
%struct.CBase = type { i32 (...) }
%struct.I = type { i32 (...)
}
%struct.LARGE = type { %union.anon }
%union.anon = type { i32 }

; Function Attrs: nounwind
define void @_ZN1C4SeekE5LARGE(%struct.C* %this, %struct.LARGE* byval align 4) #0 align 2 {
entry:

%this.addr = alloca %struct.C*, align 4
store %struct.C* %this, %struct.C** %this.addr, align 4
%this1 = load %struct.C*, %struct.C** %this.addr, align 4
ret void

}

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-features"="+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

Nope, I don't see the tail call. Anyway, I'll simplify my test case. Don't worry about it.

clang++ -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.cpp

cat t.ll:

; ModuleID = 't.cpp'
source_filename = "t.cpp"
target datalayout = "e-m:o-p:32:32-f64:32:64-f80:128-n8:16:32-S128"
target triple = "i386-apple-darwin9"

%struct.C = type { %struct.CBase, %struct.I }
%struct.CBase = type { i32 (...) }
%struct.I = type { i32 (...)
}
%struct.LARGE = type { %union.anon }
%union.anon = type { i32 }

; Function Attrs: nounwind
define void @_ZN1C4SeekE5LARGE(%struct.C* %this, %struct.LARGE* byval align 4) #0 align 2 {
entry:

%this.addr = alloca %struct.C*, align 4
store %struct.C* %this, %struct.C** %this.addr, align 4
%this1 = load %struct.C*, %struct.C** %this.addr, align 4
ret void

}

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-features"="+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

D'oh, that m_fn1 should have been Seek :/

$ cat t.ii

struct LARGE {
  union {
    int i;
  };
};
struct I {
  virtual void m_fn1(LARGE);
};
struct CBase {
  virtual ~CBase();
};
struct C : CBase, I {
  void m_fn1(LARGE);
};
void C::m_fn1(LARGE) {}

$ ~/llvm/Debug+Asserts/bin/clang -cc1 -x c++ -emit-llvm -triple i386-apple-darwin9 t.ii && grep tail t.ll

tail call void @_ZN1C5m_fn1E5LARGE(%struct.C* %3, %struct.LARGE* byval align 4 %0)
Gerolf updated this revision to Diff 66075.Jul 28 2016, 8:52 PM

Reduced test case.

mkuper resigned from this revision.Jul 29 2016, 10:10 AM
mkuper removed a reviewer: mkuper.

I really don't understand anything about this. :-)

Reverting rL244207, would be fine if we assure that PR24235, is fixed in another way.
I added Reid who helped reviewing the original patch D11476.

Reverting rL244207, would be fine if we assure that PR24235, is fixed in another way.
I added Reid who helped reviewing the original patch D11476.

ISTM that the DWARF spec intended such thunks to be encoded as DW_AT_trampoline. That seems more appropriate than relying on codegen emitting a tailcall. This way the debugger can make the policy decision of whether or not thunks should show up in the backtrace.

In any case, correctness must always trump all else. Reverting to green should take precedence over a QoI bug like PR24235.

ISTM that the DWARF spec intended such thunks to be encoded as DW_AT_trampoline. That seems more appropriate than relying on codegen emitting a tailcall. This way the debugger can make the policy decision of whether or not thunks should show up in the backtrace.

In any case, correctness must always trump all else. Reverting to green should take precedence over a QoI bug like PR24235.

I agree to the revert, though I am not sure about the new test, it looks too complected, especially the command line.
I will let David decide on accepting that test or ask for improvement.

Regards,
Amjad

rnk added a comment.Aug 1 2016, 10:53 AM

I think the correctness problem is in DSE. It should be safe for clang to put 'tail' here. The tail call does not capture the address of any local variables, but it does load from them via 'byval'.

rnk added a subscriber: nlewycky.Aug 1 2016, 10:53 AM

I think Nick was responsible for adding this DSE optimization *years* ago.

rnk added a comment.Aug 1 2016, 12:55 PM

So, if clang were to use a temporary alloca for the byval parameter, then yes, I agree marking it as a tail call would be incorrect. However, clang doesn't use an alloca, it forwards the byval pointer parameter directly to the callee:

define i32 @_ZThn4_N1C4SeekE6_LARGE(%class.C* nocapture readnone %this, %union._LARGE* byval nocapture readonly align 4 %L) unnamed_addr #0 align 2 {
entry:
  %call = tail call i32 @_ZN1C4SeekE6_LARGE(%class.C* undef, %union._LARGE* byval nonnull align 4 %L)
  ret i32 %call
}

Maybe the test case is over-reduced, or the problematic IR was produced by an older version of clang?

For the IR level, I think this has got to be valid:

declare void @bar(i32* byval %p)
define void @foo(i32* byval %p) {
  tail call void @bar(i32* byval %p.tmp)
  ret void
}

The tail is an aliasing property which indicates that the callee doesn't touch any of the alloca's in the caller, a rough proxy for "stack" since there is no other stack in LLVM IR. That doesn't mean that we're going to actually lower it to a tail call in the final assembly, but if we don't put tail on a call, we won't even consider it for becoming a tail call. I think the backend can sort out whether the byval is going to lead to a stack allocation in @foo and emit a non-tail call if so.