This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
- llvm/test/Transforms/TailCallElim/basic.ll
- llvm/test/Transforms/TailCallElim/tre-byval-parameter-2.ll
- llvm/test/Transforms/TailCallElim/tre-byval-parameter.ll
- llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll
- llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll

Differential D85614

[TRE] Reland: allow TRE for non-capturing calls.
Closed · Public

Authored by avl on Aug 9 2020, 9:57 AM.


Details

Reviewers

efriedma
jdoerfert
laytonio

Commits

rG10c2e261598a: [TRE] Reland: allow TRE for non-capturing calls.

Summary

[TRE] Reland: allow TRE for non-capturing calls.

D82085 ("allow TRE for non-capturing calls") caused a failure during bootstrap.
This patch does the same as D82085 and also fixes the bootstrap error.

The problem with D82085 is that it does not create copies of byval
operands when it replaces the function call with a branch.

Consider the following example:

int zoo ( S p1 );

int foo ( int count, S p1 ) {
  if ( count > 10 )
    return zoo(p1);

  // the temporary variable created for passing the byval parameter
  // p1 could be used when zoo(p1) is called (after TRE is done).
  // lifetime.start p1.byvalue.temp
  return foo(count+1, p1);
  // lifetime.end p1.byvalue.temp
}

After the recursive call to foo is replaced with a jump to the
start of the function, its parameters can flow into the zoo
call, i.e. the temporary variable created for the byval
parameter "p1" can be passed to zoo. As a result, zoo receives
a broken operand:

int foo ( int count, S p1 ) {
tailrecurse:
  p1_tr = phi p1, p1.byvalue.temp
  if ( count > 10 )
    return zoo(p1_tr);

  // the temporary variable created for passing the byval parameter
  // p1 could be used when zoo(p1) is called (after TRE is done).
  lifetime.start p1.byvalue.temp
  memcpy (p1.byvalue.temp, p1_tr)
  count = count + 1
  lifetime.end p1.byvalue.temp
  br tailrecurse
}

To prevent p1.byvalue.temp from being used after its scope has
been closed by the lifetime.end marker, this patch copies the value
from p1.byvalue.temp into another temporary variable and uses that
variable on the next iteration.
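A minimal C++ model of the problem and of the fix, for illustration only. It assumes a hypothetical aggregate `S`, a hypothetical `zoo()` that reads one field, and a hypothetical `endLifetime()` helper that overwrites a slot with junk to emulate a stack slot being reused after `lifetime.end`. `fooBuggy` and `fooFixed` are illustrative names that mimic the shape of the IR above; they are not code from the pass.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical aggregate standing in for `S` from the summary.
struct S { int v[4]; };

// Hypothetical zoo(): reads one field of its by-value argument.
int zoo(const S &p) { return p.v[0]; }

// Emulates lifetime.end: afterwards the slot may be reused, so this model
// overwrites it with junk. (In real IR, a later read is simply undefined.)
static void endLifetime(S &slot) { std::memset(&slot, 0xAB, sizeof slot); }

// D82085-style result: the PHI forwards a pointer to the per-iteration temp.
int fooBuggy(int count, S p1) {
  S byvalTemp{};                 // p1.byvalue.temp
  const S *p1_tr = &p1;          // p1_tr = phi p1, p1.byvalue.temp
  while (true) {                 // tailrecurse:
    if (count > 10)
      return zoo(*p1_tr);        // may read a temp whose lifetime ended
    byvalTemp = *p1_tr;          // lifetime.start + memcpy
    count = count + 1;
    p1_tr = &byvalTemp;
    endLifetime(byvalTemp);      // lifetime.end
  }
}

// Shape of this patch: copy the temp into a second temporary that stays
// live across iterations, and feed that one into the next iteration.
int fooFixed(int count, S p1) {
  S byvalTemp{};                 // p1.byvalue.temp
  S carried = p1;                // the extra temporary
  while (true) {                 // tailrecurse:
    if (count > 10)
      return zoo(carried);
    byvalTemp = carried;         // lifetime.start + memcpy
    count = count + 1;
    carried = byvalTemp;         // copy out before lifetime.end
    endLifetime(byvalTemp);      // lifetime.end
  }
}
```

Starting from `count == 0`, `fooFixed` returns the field that was passed in, while `fooBuggy` returns whatever the reused slot happens to contain.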

This patch passes bootstrap build and bootstrap build with AddressSanitizer.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

avl created this revision.Aug 9 2020, 9:57 AM

Herald added a project: Restricted Project. Aug 9 2020, 9:57 AM

Herald added subscribers: llvm-commits, hiraditya.

avl requested review of this revision.Aug 9 2020, 9:57 AM

Harbormaster completed remote builds in B67637: Diff 284217. Aug 9 2020, 10:26 AM

ping.

@efriedma would you mind taking a look at this patch once more, please?

ping.

Among all the attributes that can be attached to an argument of a tail call, byval is unusual. It significantly changes the semantics of the call: when the call executes, instead of passing the pointer itself, it makes a copy of the data behind the pointer.

Consider, for example, the following testcase on x86-64:

typedef struct A { long long x[100]; } A;
A global;
int printf(const char*, ...);
void dostuff(A arg, int i) { if (i==100) return; arg.x[5]++; printf("%lld\n", arg.x[i]); dostuff(global, i+1); }
__attribute((optnone)) int main() { dostuff(global, 0); }

dostuff has a byval argument. At the point of the call, the compiled code is supposed to copy that argument onto the stack, so in the callee the stack contains the values of all the array elements. If this is working correctly, the line printed for i == 5 (the sixth line of output) will be "1": each call copies the value from the global, so we only ever increment the array element once. (With optimizations enabled, clang currently gets this wrong.)
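To make the semantics concrete, here is a hedged C++ model of the testcase that records `arg.x[i]` into an array instead of calling printf. `dostuffNoCopy` is a hypothetical rendering of what a TRE that drops the implicit byval copy would amount to: the loop forwards a pointer to `global` and mutates it in place. Neither function is code from LLVM or clang.

```cpp
#include <cassert>

// Same aggregate as the testcase; `global` is zero-initialized.
struct A { long long x[100]; };
A global;

// Correct byval semantics: every call works on its own copy of `global`.
void dostuff(A arg, int i, long long *out) {
  if (i == 100) return;
  arg.x[5]++;
  out[i] = arg.x[i];             // instead of printf("%lld\n", arg.x[i])
  dostuff(global, i + 1, out);
}

// Hypothetical broken TRE: the recursive call became a branch, but the byval
// copy was dropped, so later iterations modify `global` itself.
void dostuffNoCopy(A *arg, int i, long long *out) {
  while (i != 100) {
    arg->x[5]++;
    out[i] = arg->x[i];
    arg = &global;               // pointer forwarded, no copy made
    ++i;
  }
}
```

In this model `dostuff(global, 0, out)` records 1 for i == 5 and leaves `global` untouched, whereas `dostuffNoCopy`, started on a private copy (as the optnone `main` would pass), records 5 because `global.x[5]` keeps accumulating from i == 1 onward.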

If we're doing tail-call elimination, in general, we need to make this implicit copy explicit. One way to implement this is to generate a pair of copies for each byval argument. First, create a temporary alloca corresponding to each byval argument. Then, copy each byval argument to the call to its temporary alloca. Finally, copy each byval argument to the call from the temporaries to the memory originally allocated for the function's arguments.

The copies have to happen in the order I described in case a byval argument to the call refers to a different byval argument in the caller. For example, consider something along the lines of void dostuff(A a, A b) { dostuff(b, a); }
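The three steps described above (one temporary per byval argument, copy the call operands into the temporaries, then copy the temporaries into the function's own byval argument memory) can be sketched in C++ under a simplifying assumption: every call operand is one of the caller's byval argument slots, identified by index in a hypothetical `sources` vector. `Agg` and `rebindByvalArgs` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical aggregate passed byval.
struct Agg { int x[4]; };

// `args` models the memory originally allocated for the function's byval
// arguments; `sources[k]` is the index of the caller slot that the recursive
// call passes as argument k (e.g. {1, 0} for dostuff(b, a)).
void rebindByvalArgs(std::vector<Agg> &args,
                     const std::vector<std::size_t> &sources) {
  assert(args.size() == sources.size());
  // Step 1: one temporary per byval argument.
  std::vector<Agg> temps(args.size());
  // Step 2: copy every call operand into its temporary. No argument slot has
  // been written yet, so every source still holds the caller's value.
  for (std::size_t k = 0; k < args.size(); ++k)
    temps[k] = args[sources[k]];
  // Step 3: copy the temporaries into the argument slots.
  for (std::size_t k = 0; k < args.size(); ++k)
    args[k] = temps[k];
}
```

For the swap case, `rebindByvalArgs(args, {1, 0})` exchanges the two slots; copying straight into `args` without the temporaries would instead leave both slots holding `b`.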

You might be able to take shortcuts in some cases. For example, if every byval argument to the call is just forwarding on the corresponding byval argument to the caller, you can just reuse the memory without making any copies. Or if you don't want to implement the general case, it would be fine to make TRE just bail if there's a byval argument.

Hopefully that explains why messing with the lifetime markers isn't the correct solution.

> Among all the attributes that can be attached to an argument of a tail call, byval is unusual. It significantly changes the semantics of the call: when the call executes, instead of passing the pointer itself, it makes a copy of the data behind the pointer.
>
> Consider, for example, the following testcase on x86-64:
>
> typedef struct A { long long x[100]; } A;
> A global;
> int printf(const char*, ...);
> void dostuff(A arg, int i) { if (i==100) return; arg.x[5]++; printf("%lld\n", arg.x[i]); dostuff(global, i+1); }
> __attribute((optnone)) int main() { dostuff(global, 0); }
>
> dostuff has a byval argument. At the point of the call, the compiled code is supposed to copy that argument onto the stack, so in the callee the stack contains the values of all the array elements. If this is working correctly, the line printed for i == 5 (the sixth line of output) will be "1": each call copies the value from the global, so we only ever increment the array element once. (With optimizations enabled, clang currently gets this wrong.)

It looks like the current patch works correctly with this test case.

> If we're doing tail-call elimination, in general, we need to make this implicit copy explicit. One way to implement this is to generate a pair of copies for each byval argument. First, create a temporary alloca corresponding to each byval argument. Then, copy each byval argument to the call to its temporary alloca. Finally, copy each byval argument to the call from the temporaries to the memory originally allocated for the function's arguments.
>
> The copies have to happen in the order I described in case a byval argument to the call refers to a different byval argument in the caller. For example, consider something along the lines of void dostuff(A a, A b) { dostuff(b, a); }

Yep, this patch handles the "void dostuff(A a, A b) { dostuff(b, a); }" case incorrectly. Thank you for pointing that out!
It requires an additional temporary to store the byval argument. I will update the patch.

Addressed the comment: the implicit copy is replaced with an explicit one.

avl edited the summary of this revision. Oct 7 2020, 10:29 AM

avl edited the summary of this revision.

efriedma added inline comments.Oct 7 2020, 10:40 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	This is the right direction, but I'm not sure it does the right thing in general. In particular, it's possible that `CI->getArgOperand(OpndIdx)` points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the `void dostuff(A a, A b) { dostuff(b, a); }` case. This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, `CI->getArgOperand(OpndIdx)` never points to one of the allocas allocated by createTempForByValOperand.

Harbormaster completed remote builds in B74310: Diff 296732. Oct 7 2020, 11:03 AM

avl added inline comments.Oct 7 2020, 11:57 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

600

> This is the right direction, but I'm not sure it does the right thing in general.

It looks like the "void dostuff(A a, A b) { dostuff(b, a); }" test case works correctly:

$ cat test.cpp
#include <stdio.h>
typedef struct A { long long x[10] = {0}; } A;
A global;
void dostuff(A a, A b, int i) { if (i==10) return; a.x[5]++; printf("%lld %lld\n", a.x[5], b.x[5]); dostuff(b, a, i+1); }
__attribute((optnone)) int main() { dostuff(global, global, 0); }
$ clang++ test.cpp -O3
$ ./a.out
1 0
1 1
2 1
2 2
3 2
3 3
4 3
4 4
5 4
5 5

> In particular, it's possible that CI->getArgOperand(OpndIdx) points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the void dostuff(A a, A b) { dostuff(b, a); } case.

> This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, CI->getArgOperand(OpndIdx) never points to one of the allocas allocated by createTempForByValOperand.

If I am not mistaken, CI->getArgOperand(OpndIdx) cannot point to one of the allocas created by createTempForByValOperand(): there is no setArgOperand() call that sets an operand to a temp created by createTempForByValOperand(). Thus CI->getArgOperand(OpndIdx) always points to the alloca created for the recursive call to dostuff().

For the "void dostuff(A a, A b) { dostuff(b, a); }" test case:

before tre:

%2 = bitcast %struct.A* %agg.tmp to i8*
%3 = bitcast %struct.A* %b to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
^^^^^^^^^^^^^^^^ copy to %agg.tmp, which is the first byval argument
%4 = bitcast %struct.A* %agg.tmp5 to i8*
%5 = bitcast %struct.A* %a to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
^^^^^^^^^^^^^^^^ copy to %agg.tmp5, which is the second byval argument
%add = add nsw i32 %i, 1
call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp5, i32 %add)

after tre:

tailrecurse:                                      ; preds = %if.end, %entry
  %a.tr = phi %struct.A* [ %a, %entry ], [ %agg.tmp7, %if.end ]
  %b.tr = phi %struct.A* [ %b, %entry ], [ %agg.tmp58, %if.end ]

if.end:
...
  %2 = bitcast %struct.A* %agg.tmp to i8*
  %3 = bitcast %struct.A* %b.tr to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp, which is the first byval argument
  %4 = bitcast %struct.A* %agg.tmp5 to i8*
  %5 = bitcast %struct.A* %a.tr to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp5, which is the second byval argument
  %add = add nsw i32 %i.tr, 1
  %6 = bitcast %struct.A* %agg.tmp7 to i8*
  %7 = bitcast %struct.A* %agg.tmp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %6, i8* align 8 %7, i64 80, i1 false)
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp7, which is used as the input for the next iteration
  %8 = bitcast %struct.A* %agg.tmp58 to i8*
  %9 = bitcast %struct.A* %agg.tmp5 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %8, i8* align 8 %9, i64 80, i1 false)
  ^^^^^^^^^^^^^^^^ copy to %agg.tmp58, which is used as the input for the next iteration
  br label %tailrecurse

Probably the only difference from your description is that this patch creates additional temporary variables (agg.tmp7/agg.tmp58), while it could use the locations of the a/b parameters instead of newly created temps.

efriedma added inline comments.Oct 7 2020, 12:26 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	I guess if you're looking at code emitted directly by clang, you'll end up with "extra" aggregates due to the way clang does call lowering. That said, the temporary aggregates aren't actually required; optimizations like memcpyopt can remove them. For example, on master, "clang -O2 -emit-llvm" produces a call like `tail call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 4 %b, %struct.A* nonnull byval(%struct.A) align 4 %a, i32 %add)`.

avl added inline comments.Oct 7 2020, 2:10 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	Right. But for the case "void dostuff(A a, A b) { dostuff(b, a); }" these temps and memcpys are not useless, so they would not be removed by memcpyopt. Anyway, minimizing the number of temps is also useful. I will update the patch to use the locations of the function arguments instead of the created temps.

efriedma added inline comments.Oct 7 2020, 2:22 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	Take a look at the output of `clang -O2 -emit-llvm` for the following: `typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { return foo(b, a); }`. No memcpy.

avl added inline comments.Oct 7 2020, 2:41 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	For the above test case that is probably correct (since foo is the last call). But consider this test case:

typedef struct { int x[100]; } A;
void foo(A, A);
void bar(A a, A b) { foo(b, a); foo(b, a); }

define void @_Z3bar1AS_(%struct.A* byval nocapture readonly align 8, %struct.A* byval nocapture readonly align 8) local_unnamed_addr #0 {
  call void @_Z3foo1AS_(%struct.A* byval nonnull align 8 %1, %struct.A* byval nonnull align 8 %0)
  call void @_Z3foo1AS_(%struct.A* byval nonnull align 8 %1, %struct.A* byval nonnull align 8 %0)
  ret void
}

What if the first foo writes to the byval location? In that case the second call to foo() would receive incorrect arguments. Is that the correct behavior?

efriedma added inline comments.Oct 7 2020, 2:49 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	byval implicitly makes a copy of the underlying memory. If foo modifies the memory, it's modifying its own copy, not the version in bar.
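A small C++ analogue of that point, using C++ pass-by-value as a stand-in for `byval` (the IR in this discussion shows clang lowering these by-value struct parameters to `byval` pointers). The hypothetical `foo` scribbles on its parameters, yet the second call in `bar` still observes the original values; the function bodies are illustrative only.

```cpp
#include <cassert>

struct A { int x[100]; };

// Hypothetical callee: records what it received, then overwrites its copies.
int foo(A a, A b) {
  int seen = a.x[0] * 1000 + b.x[0];
  a.x[0] = -1;                   // modifies foo's own copy only
  b.x[0] = -1;
  return seen;
}

// Mirrors `void bar(A a, A b) { foo(b, a); foo(b, a); }`.
void bar(A a, A b, int *first, int *second) {
  *first = foo(b, a);
  *second = foo(b, a);
  assert(*first == *second);     // bar's a and b were never modified
}
```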

avl added inline comments.Oct 7 2020, 3:12 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

600

Looks like I lost track a bit. That "byval implicitly makes a copy of the underlying memory" is understood. So for the test case "typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { return foo(b, a); }", -mllvm -print-after-all shows:

%agg.tmp = alloca %struct.A, align 8
%agg.tmp1 = alloca %struct.A, align 8
%0 = bitcast %struct.A* %agg.tmp to i8*
%1 = bitcast %struct.A* %b to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %0, i8* align 8 %1, i64 400, i1 false), !tbaa.struct !2
%2 = bitcast %struct.A* %agg.tmp1 to i8*
%3 = bitcast %struct.A* %a to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %2, i8* align 8 %3, i64 400, i1 false), !tbaa.struct !2
call void @_Z3foo1AS_(%struct.A* byval(%struct.A) align 8 %agg.tmp, %struct.A* byval(%struct.A) align 8 %agg.tmp

Why is it important that the output of "clang -O2 -emit-llvm" for the same test case does not contain calls to memcpy?

efriedma added inline comments.Oct 7 2020, 3:18 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	I'm not completely sure what you're asking. In general, we expect that the TRE pass should be able to handle arbitrary LLVM IR, even if clang with the standard pass pipeline can't generate IR with that pattern. So TRE needs to be able to handle the form where the byval argument is passed directly to a tail call.

Use the incoming function arguments' area to store the byval operands of the recursive call.

avl added inline comments.Oct 8 2020, 2:07 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

600

> In general, we expect that the TRE pass should be able to handle arbitrary LLVM IR, even if clang with the standard pass pipeline can't generate IR with that pattern. So TRE needs to be able to handle the form where the byval argument is passed directly to a tail call.

The IR for the testcase "typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { return foo(b, a); }" currently looks like this before TRE:

%0 = bitcast %struct.A* %agg.tmp to i8*
%1 = bitcast %struct.A* %b to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(400) %0, i8* nonnull align 8 dereferenceable(400) %1, i64 400, i1 false), !tbaa.struct !2
%2 = bitcast %struct.A* %agg.tmp1 to i8*
%3 = bitcast %struct.A* %a to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(400) %2, i8* nonnull align 8 dereferenceable(400) %3, i64 400, i1 false), !tbaa.struct !2
tail call void @_Z3foo1AS_(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp1)

Did I understand correctly that TRE should also work correctly if the above memcpy calls are not generated?

Harbormaster completed remote builds in B74485: Diff 297049. Oct 8 2020, 2:29 PM

efriedma added inline comments.Oct 8 2020, 3:07 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
600	Right, it should still work if the memcpy calls are not generated, or optimized out. For example, we shouldn't miscompile if someone writes `clang foo.c -o - -S -emit-llvm -O2 -mllvm -disable-llvm-optzns | opt -S -sroa -memcpyopt -instcombine -tailcallelim`. Usually, memcpyopt doesn't run before tailcallelim, but allowing users to specify arbitrary optimizations in any order is part of what makes LLVM IR transforms flexible.

Implemented that algorithm: First, create a temporary alloca
corresponding to each byval argument. Then, copy each byval
argument to the call to its temporary alloca. Finally, copy
each byval argument to the call from the temporaries to the
memory originally allocated for the function's arguments.

Harbormaster completed remote builds in B74786: Diff 297566. Oct 12 2020, 6:56 AM

efriedma added inline comments.Oct 20 2020, 12:18 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
610	I think you need to be more careful with alignment here: it's UB if you specify alignment higher than the actual alignment of the value. For byval arguments with the alignment set, the alignment is exactly that. For byval arguments where the alignment attribute is missing, probably fine to just refuse to do TRE. It's an edge case which shouldn't really come up in practice.

avl added inline comments.Oct 21 2020, 7:57 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
610	> I think you need to be more careful with alignment here: it's UB if you specify alignment higher than the actual alignment of the value. For byval arguments with the alignment set, the alignment is exactly that.

So, instead of this:

Align Alignment(DL.getPrefTypeAlignment(AggTy));
Alignment = max(Alignment, MaybeAlign(CI->getParamAlign(OpndIdx)));

I need to do just this:

Align Alignment(DL.getPrefTypeAlignment(AggTy));

Right?

> For byval arguments where the alignment attribute is missing, probably fine to just refuse to do TRE. It's an edge case which shouldn't really come up in practice.

If the alignment attribute is missing, would it be OK to use Align(1)?

avl added inline comments.Oct 21 2020, 9:41 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
610	I.e., both of the above conditions would look like this:

Align StackSlotAlignment(max(1, MaybeAlign(CI->getParamAlign(OpndIdx))));
Align FinalAlignment(min(DL.getPrefTypeAlignment(AggTy), StackSlotAlignment));

Would that be correct?

efriedma added inline comments.Oct 21 2020, 2:04 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
610	I don't think you need to mix in getPrefTypeAlignment. I mean, it's not wrong to use a lower alignment than StackSlotAlignment, but there isn't any reason you'd want to. Instead of `max(1, MaybeAlign(CI->getParamAlign(OpndIdx)))`, you'd write `CI->getParamAlign(OpndIdx).valueOrOne()`, but sure, that's correct. The code quality might be a bit iffy for byval without alignment, but nothing should be generating code like that anyway, so I guess it's fine.
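The rule settled on here fits in a tiny sketch. It models `llvm::MaybeAlign` with `std::optional<std::uint64_t>` purely for illustration (the real code would use `CI->getParamAlign(OpndIdx).valueOrOne()`, as suggested above); `MaybeAlignModel` and `byvalCopyAlignment` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for llvm::MaybeAlign: an optional power-of-two byte alignment.
using MaybeAlignModel = std::optional<std::uint64_t>;

// Alignment used for a byval operand's temporary and its copies: exactly the
// byval alignment when the attribute is present, otherwise 1. It is not
// raised to the preferred type alignment, because claiming more alignment
// than the memory actually has is UB.
std::uint64_t byvalCopyAlignment(MaybeAlignModel paramAlign) {
  std::uint64_t align = paramAlign.value_or(1);   // like valueOrOne()
  assert(align != 0 && (align & (align - 1)) == 0 && "not a power of two");
  return align;
}
```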

Corrected the alignment calculation for byval operands.

Harbormaster completed remote builds in B76052: Diff 300008. Oct 22 2020, 9:52 AM

Sorry I didn't spot this sooner, but the current version of the memcpy-to-temp still isn't quite right. The operations aren't in the right order: you have to do all the copies to temporaries before any of the copies to arguments.
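To see why the order matters, here is a hedged C++ sketch contrasting the two orders on a three-way rotation, `dostuff(b, c, a)`, again modelling each call operand as the index of one of the caller's byval argument slots. `Agg`, `copyInterleaved` and `copyBatched` are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Agg { long long x[4]; };

// Wrong order: for each argument, copy source -> temp, then temp -> slot.
// Slot k is written before source k+1 is read, so a later source may
// already have been overwritten.
void copyInterleaved(std::vector<Agg> &args,
                     const std::vector<std::size_t> &src) {
  for (std::size_t k = 0; k < args.size(); ++k) {
    Agg temp = args[src[k]];
    args[k] = temp;
  }
}

// Required order: all copies into temporaries first, then all copies from
// the temporaries into the argument slots.
void copyBatched(std::vector<Agg> &args,
                 const std::vector<std::size_t> &src) {
  std::vector<Agg> temps;
  temps.reserve(args.size());
  for (std::size_t k = 0; k < args.size(); ++k)
    temps.push_back(args[src[k]]);
  for (std::size_t k = 0; k < args.size(); ++k)
    args[k] = temps[k];
}
```

Starting from slots holding 1, 2, 3 and `src = {1, 2, 0}`, `copyBatched` produces 2, 3, 1 as expected, while `copyInterleaved` produces 2, 3, 2 because slot 0 was overwritten before argument 2 read it.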

Changed the patch so that all temporary variables are created and filled first,
and only then are these variables copied into the incoming parameters.

Harbormaster completed remote builds in B76189: Diff 300278. Oct 23 2020, 7:27 AM

ping.

ping

ping.

rebased. retested.

Harbormaster completed remote builds in B96115: Diff 333871. Mar 29 2021, 8:45 AM

@efriedma Would you mind taking a look at this review, please? It looks like all comments are finally addressed.

LGTM. Sorry about the delay. Thanks for working through all the issues here.

This revision is now accepted and ready to land. May 4 2021, 11:27 AM

@efriedma Thank you for the review.

rebased to have a buildbot report.

Harbormaster completed remote builds in B105944: Diff 347442. May 24 2021, 10:53 AM

This revision was landed with ongoing or failed builds. May 25 2021, 1:37 AM

Closed by commit rG10c2e261598a: [TRE] Reland: allow TRE for non-capturing calls. (authored by avl).

This revision was automatically updated to reflect the committed changes.

avl added a commit: rG10c2e261598a: [TRE] Reland: allow TRE for non-capturing calls.

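Revision Contents

- llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp (136 lines)
- llvm/test/Transforms/TailCallElim/basic.ll (7 lines)
- llvm/test/Transforms/TailCallElim/tre-byval-parameter-2.ll (155 lines)
- llvm/test/Transforms/TailCallElim/tre-byval-parameter.ll (123 lines)
- llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll (125 lines)
- llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll (74 lines)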

Diff 296732

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

[... 57 lines not shown ...]
#include "llvm/Analysis/DomTreeUpdater.h"		#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/Analysis/GlobalsModRef.h"		#include "llvm/Analysis/GlobalsModRef.h"
#include "llvm/Analysis/InlineCost.h"		#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/InstructionSimplify.h"		#include "llvm/Analysis/InstructionSimplify.h"
#include "llvm/Analysis/Loads.h"		#include "llvm/Analysis/Loads.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"		#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
		#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "tailcallelim"		#define DEBUG_TYPE "tailcallelim"

STATISTIC(NumEliminated, "Number of tail calls removed");		STATISTIC(NumEliminated, "Number of tail calls removed");
STATISTIC(NumRetDuped, "Number of return duplicated");		STATISTIC(NumRetDuped, "Number of return duplicated");
STATISTIC(NumAccumAdded, "Number of accumulators introduced");		STATISTIC(NumAccumAdded, "Number of accumulators introduced");

/// Scan the specified function for alloca instructions.		/// Scan the specified function for alloca instructions.
/// If it contains any dynamic allocas, returns false.		/// If it contains any dynamic allocas, returns false.
static bool canTRE(Function &F) {		static bool canTRE(Function &F) {
// FIXME: The code generator produces really bad code when an 'escaping		// TODO: We don't do TRE if dynamic allocas are used.
// alloca' is changed from being a static alloca to being a dynamic alloca.		// Dynamic allocas allocate stack space which should be
// Until this is resolved, disable this transformation if that would ever		// deallocated before new iteration started. That is
// happen. This bug is PR962.		// currently not implemented.
return llvm::all_of(instructions(F), [](Instruction &I) {		return llvm::all_of(instructions(F), [](Instruction &I) {
auto *AI = dyn_cast<AllocaInst>(&I);		auto *AI = dyn_cast<AllocaInst>(&I);
return !AI || AI->isStaticAlloca();		return !AI || AI->isStaticAlloca();
});		});
}		}

namespace {		namespace {
struct AllocaDerivedValueTracker {		struct AllocaDerivedValueTracker {
[... 76 lines not shown ...]	if (!CB.onlyReadsMemory())
EscapePoints.insert(&CB);		EscapePoints.insert(&CB);
}		}

SmallPtrSet<Instruction *, 32> AllocaUsers;		SmallPtrSet<Instruction *, 32> AllocaUsers;
SmallPtrSet<Instruction *, 32> EscapePoints;		SmallPtrSet<Instruction *, 32> EscapePoints;
};		};
}		}

static bool markTails(Function &F, bool &AllCallsAreTailCalls,		static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) {
OptimizationRemarkEmitter *ORE) {
if (F.callsFunctionThatReturnsTwice())		if (F.callsFunctionThatReturnsTwice())
return false;		return false;
AllCallsAreTailCalls = true;

// The local stack holds all alloca instructions and all byval arguments.		// The local stack holds all alloca instructions and all byval arguments.
AllocaDerivedValueTracker Tracker;		AllocaDerivedValueTracker Tracker;
for (Argument &Arg : F.args()) {		for (Argument &Arg : F.args()) {
if (Arg.hasByValAttr())		if (Arg.hasByValAttr())
Tracker.walk(&Arg);		Tracker.walk(&Arg);
}		}
for (auto &BB : F) {		for (auto &BB : F) {
[... 66 lines not shown ...]	for (auto &I : *BB) {
<< "marked as tail call candidate (readnone)";		<< "marked as tail call candidate (readnone)";
});		});
CI->setTailCall();		CI->setTailCall();
Modified = true;		Modified = true;
continue;		continue;
}		}
}		}

if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI)) {		if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI))
DeferredTails.push_back(CI);		DeferredTails.push_back(CI);
} else {
AllCallsAreTailCalls = false;
}
}		}

for (auto *SuccBB : make_range(succ_begin(BB), succ_end(BB))) {		for (auto *SuccBB : make_range(succ_begin(BB), succ_end(BB))) {
auto &State = Visited[SuccBB];		auto &State = Visited[SuccBB];
if (State < Escaped) {		if (State < Escaped) {
State = Escaped;		State = Escaped;
if (State == ESCAPED)		if (State == ESCAPED)
WorklistEscaped.push_back(SuccBB);		WorklistEscaped.push_back(SuccBB);
[... 20 lines not shown ...]	static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) {

for (CallInst *CI : DeferredTails) {		for (CallInst *CI : DeferredTails) {
if (Visited[CI->getParent()] != ESCAPED) {		if (Visited[CI->getParent()] != ESCAPED) {
// If the escape point was part way through the block, calls after the		// If the escape point was part way through the block, calls after the
// escape point wouldn't have been put into DeferredTails.		// escape point wouldn't have been put into DeferredTails.
LLVM_DEBUG(dbgs() << "Marked as tail call candidate: " << *CI << "\n");		LLVM_DEBUG(dbgs() << "Marked as tail call candidate: " << *CI << "\n");
CI->setTailCall();		CI->setTailCall();
Modified = true;		Modified = true;
} else {
AllCallsAreTailCalls = false;
}		}
}		}

return Modified;		return Modified;
}		}

/// Return true if it is safe to move the specified		/// Return true if it is safe to move the specified
/// instruction from after the call to before the call, assuming that all		/// instruction from after the call to before the call, assuming that all
/// instructions between the call and this instruction are movable.		/// instructions between the call and this instruction are movable.
///		///
static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA) {		static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA) {
		if (isa<DbgInfoIntrinsic>(I))
		return true;

		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I))
		if (II->getIntrinsicID() == Intrinsic::lifetime_end &&
		llvm::findAllocaForValue(II->getArgOperand(1)))
		return true;

// FIXME: We can move load/store/call/free instructions above the call if the		// FIXME: We can move load/store/call/free instructions above the call if the
// call does not mod/ref the memory location being processed.		// call does not mod/ref the memory location being processed.
if (I->mayHaveSideEffects()) // This also handles volatile loads.		if (I->mayHaveSideEffects()) // This also handles volatile loads.
return false;		return false;

if (LoadInst *L = dyn_cast<LoadInst>(I)) {		if (LoadInst *L = dyn_cast<LoadInst>(I)) {
// Loads may always be moved above calls without side effects.		// Loads may always be moved above calls without side effects.
if (CI->mayHaveSideEffects()) {		if (CI->mayHaveSideEffects()) {
[... 50 lines not shown ...]	class TailRecursionEliminator {
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;
DomTreeUpdater &DTU;		DomTreeUpdater &DTU;

// The below are shared state we want to have available when eliminating any		// The below are shared state we want to have available when eliminating any
// calls in the function. There values should be populated by		// calls in the function. There values should be populated by
// createTailRecurseLoopHeader the first time we find a call we can eliminate.		// createTailRecurseLoopHeader the first time we find a call we can eliminate.
BasicBlock *HeaderBB = nullptr;		BasicBlock *HeaderBB = nullptr;
SmallVector<PHINode *, 8> ArgumentPHIs;		SmallVector<PHINode *, 8> ArgumentPHIs;
bool RemovableCallsMustBeMarkedTail = false;

// PHI node to store our return value.		// PHI node to store our return value.
PHINode *RetPN = nullptr;		PHINode *RetPN = nullptr;

// i1 PHI node to track if we have a valid return value stored in RetPN.		// i1 PHI node to track if we have a valid return value stored in RetPN.
PHINode *RetKnownPN = nullptr;		PHINode *RetKnownPN = nullptr;

// Vector of select instructions we insereted. These selects use RetKnownPN		// Vector of select instructions we insereted. These selects use RetKnownPN
[... 10 lines not shown ...]	class TailRecursionEliminator {
// The instruction doing the accumulating.		// The instruction doing the accumulating.
Instruction *AccumulatorRecursionInstr = nullptr;		Instruction *AccumulatorRecursionInstr = nullptr;

TailRecursionEliminator(Function &F, const TargetTransformInfo *TTI,		TailRecursionEliminator(Function &F, const TargetTransformInfo *TTI,
AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,		AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,
DomTreeUpdater &DTU)		DomTreeUpdater &DTU)
: F(F), TTI(TTI), AA(AA), ORE(ORE), DTU(DTU) {}		: F(F), TTI(TTI), AA(AA), ORE(ORE), DTU(DTU) {}

CallInst *findTRECandidate(BasicBlock *BB,		CallInst *findTRECandidate(BasicBlock *BB);
bool CannotTailCallElimCallsMarkedTail);

void createTailRecurseLoopHeader(CallInst *CI);		void createTailRecurseLoopHeader(CallInst *CI);

void insertAccumulator(Instruction *AccRecInstr);		void insertAccumulator(Instruction *AccRecInstr);

bool eliminateCall(CallInst *CI);		bool eliminateCall(CallInst *CI);

void cleanupAndFinalize();		void cleanupAndFinalize();

bool processBlock(BasicBlock &BB, bool CannotTailCallElimCallsMarkedTail);		bool processBlock(BasicBlock &BB);

		Value *createTempForByValOperand(CallInst *CI, int OpndIdx);

public:		public:
static bool eliminate(Function &F, const TargetTransformInfo *TTI,		static bool eliminate(Function &F, const TargetTransformInfo *TTI,
AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,		AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,
DomTreeUpdater &DTU);		DomTreeUpdater &DTU);
};		};
} // namespace		} // namespace

CallInst *TailRecursionEliminator::findTRECandidate(		CallInst *TailRecursionEliminator::findTRECandidate(BasicBlock *BB) {
BasicBlock *BB, bool CannotTailCallElimCallsMarkedTail) {
Instruction *TI = BB->getTerminator();		Instruction *TI = BB->getTerminator();

if (&BB->front() == TI) // Make sure there is something before the terminator.		if (&BB->front() == TI) // Make sure there is something before the terminator.
return nullptr;		return nullptr;

// Scan backwards from the return, checking to see if there is a tail call in		// Scan backwards from the return, checking to see if there is a tail call in
// this block. If so, set CI to it.		// this block. If so, set CI to it.
CallInst *CI = nullptr;		CallInst *CI = nullptr;
BasicBlock::iterator BBI(TI);		BasicBlock::iterator BBI(TI);
while (true) {		while (true) {
CI = dyn_cast<CallInst>(BBI);		CI = dyn_cast<CallInst>(BBI);
if (CI && CI->getCalledFunction() == &F)		if (CI && CI->getCalledFunction() == &F)
break;		break;

if (BBI == BB->begin())		if (BBI == BB->begin())
return nullptr; // Didn't find a potential tail call.		return nullptr; // Didn't find a potential tail call.
--BBI;		--BBI;
}		}

// If this call is marked as a tail call, and if there are dynamic allocas in		assert((!CI->isTailCall() || !CI->isNoTailCall()) &&
// the function, we cannot perform this optimization.		"Incompatible call site attributes(Tail,NoTail)");
if (CI->isTailCall() && CannotTailCallElimCallsMarkedTail)		if (!CI->isTailCall())
return nullptr;		return nullptr;

// As a special case, detect code like this:		// As a special case, detect code like this:
// double fabs(double f) { return __builtin_fabs(f); } // a 'fabs' call		// double fabs(double f) { return __builtin_fabs(f); } // a 'fabs' call
// and disable this xform in this case, because the code generator will		// and disable this xform in this case, because the code generator will
// lower the call to fabs into inline code.		// lower the call to fabs into inline code.
if (BB == &F.getEntryBlock() &&		if (BB == &F.getEntryBlock() &&
firstNonDbg(BB->front().getIterator()) == CI &&		firstNonDbg(BB->front().getIterator()) == CI &&
... 15 unchanged lines not shown ...
void TailRecursionEliminator::createTailRecurseLoopHeader(CallInst *CI) {		void TailRecursionEliminator::createTailRecurseLoopHeader(CallInst *CI) {
HeaderBB = &F.getEntryBlock();		HeaderBB = &F.getEntryBlock();
BasicBlock *NewEntry = BasicBlock::Create(F.getContext(), "", &F, HeaderBB);		BasicBlock *NewEntry = BasicBlock::Create(F.getContext(), "", &F, HeaderBB);
NewEntry->takeName(HeaderBB);		NewEntry->takeName(HeaderBB);
HeaderBB->setName("tailrecurse");		HeaderBB->setName("tailrecurse");
BranchInst *BI = BranchInst::Create(HeaderBB, NewEntry);		BranchInst *BI = BranchInst::Create(HeaderBB, NewEntry);
BI->setDebugLoc(CI->getDebugLoc());		BI->setDebugLoc(CI->getDebugLoc());

// If this function has self recursive calls in the tail position where some
// are marked tail and some are not, only transform one flavor or another.
// We have to choose whether we move allocas in the entry block to the new
// entry block or not, so we can't make a good choice for both. We make this
// decision here based on whether the first call we found to remove is
// marked tail.
// NOTE: We could do slightly better here in the case that the function has
// no entry block allocas.
RemovableCallsMustBeMarkedTail = CI->isTailCall();

// If this tail call is marked 'tail' and if there are any allocas in the
// entry block, move them up to the new entry block.
if (RemovableCallsMustBeMarkedTail)
// Move all fixed sized allocas from HeaderBB to NewEntry.		// Move all fixed sized allocas from HeaderBB to NewEntry.
for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(),		for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(),
NEBI = NewEntry->begin();		NEBI = NewEntry->begin();
OEBI != E;)		OEBI != E;)
if (AllocaInst *AI = dyn_cast<AllocaInst>(OEBI++))		if (AllocaInst *AI = dyn_cast<AllocaInst>(OEBI++))
if (isa<ConstantInt>(AI->getArraySize()))		if (isa<ConstantInt>(AI->getArraySize()))
AI->moveBefore(&*NEBI);		AI->moveBefore(&*NEBI);

// Now that we have created a new block, which jumps to the entry		// Now that we have created a new block, which jumps to the entry
// block, insert a PHI node for each argument of the function.		// block, insert a PHI node for each argument of the function.
// For now, we initialize each PHI to only have the real arguments		// For now, we initialize each PHI to only have the real arguments
// which are passed in.		// which are passed in.
Instruction *InsertPos = &HeaderBB->front();		Instruction *InsertPos = &HeaderBB->front();
for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) {		for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) {
PHINode *PN =		PHINode *PN =
... 48 unchanged lines not shown ...	for (pred_iterator PI = PB; PI != PE; ++PI) {
} else {		} else {
AccPN->addIncoming(AccPN, P);		AccPN->addIncoming(AccPN, P);
}		}
}		}

++NumAccumAdded;		++NumAccumAdded;
}		}

		Value *TailRecursionEliminator::createTempForByValOperand(CallInst *CI,
		int OpndIdx) {
		PointerType *ArgTy = cast<PointerType>(CI->getArgOperand(OpndIdx)->getType());
		Type *AggTy = ArgTy->getElementType();
		const DataLayout &DL = F.getParent()->getDataLayout();

		// Calculate alignment of byVal operand.
		Align Alignment(DL.getPrefTypeAlignment(AggTy));

		// If the byval had an alignment specified, we must use at least that
		// alignment, as it is required by the byval argument (and uses of the
		// pointer inside the callee).
		Alignment = max(Alignment, MaybeAlign(CI->getParamAlign(OpndIdx)));

		// Create alloca for temporarily byval operands.
		// Put alloca into the entry block.
		Value *NewAlloca = new AllocaInst(
		AggTy, DL.getAllocaAddrSpace(), nullptr, Alignment,
		CI->getArgOperand(OpndIdx)->getName(), &*F.getEntryBlock().begin());

		IRBuilder<> Builder(CI);
		Value *Size = Builder.getInt64(DL.getTypeAllocSize(AggTy));

		// Copy data from byvalue operand into the temporarily variable.
		Builder.CreateMemCpy(NewAlloca, /*DstAlign*/ Alignment,
		CI->getArgOperand(OpndIdx),
		/*SrcAlign*/ Alignment, Size);
		efriedma: This is the right direction, but I'm not sure it does the right thing in general. In particular, it's possible that `CI->getArgOperand(OpndIdx)` points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the `void dostuff(A arg, Arg b) { dostuff(b, a); }` case.
		This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, `CI->getArgOperand(OpndIdx)` never points to one of the allocas allocated by createTempForByValOperand.
		avl: > This is the right direction, but I'm not sure it does the right thing in general.

		It looks like the "void dostuff(A arg, Arg b) { dostuff(b, a); }" test case is working correctly:
		  $ cat test.cpp
		  #include <stdio.h>
		  typedef struct A { long long x[10] = {0}; } A;
		  A global;
		  void dostuff(A a, A b, int i) {
		    if (i==10) return;
		    a.x[5]++;
		    printf("%lld %lld\n", a.x[5], b.x[5]);
		    dostuff(b, a, i+1);
		  }
		  __attribute((optnone)) int main() { dostuff(global, global, 0); }
		  $ clang++ test.cpp -O3
		  $ ./a.out
		  1 0
		  1 1
		  2 1
		  2 2
		  3 2
		  3 3
		  4 3
		  4 4
		  5 4
		  5 5

		> In particular, it's possible that CI->getArgOperand(OpndIdx) points to one of the allocas created by createTempForByValOperand(); if it does, then this copy might be clobbering data you need for subsequent copies. Again, consider the void dostuff(A arg, Arg b) { dostuff(b, a); } case.
		> This is why the general procedure I outlined has an extra step: we copy the temporary allocas to the original byval arguments, and use the byval arguments as the operands to the PHI. That way, CI->getArgOperand(OpndIdx) never points to one of the allocas allocated by createTempForByValOperand.

		If I am not mistaken, CI->getArgOperand(OpndIdx) cannot point to one of the allocas created by createTempForByValOperand(): there is no setArgOperand() call that sets an operand to a temp created by createTempForByValOperand(). Thus CI->getArgOperand(OpndIdx) always points to the alloca created for the recursive call to dostuff().

		For the "void dostuff(A arg, Arg b) { dostuff(b, a); }" test case:

		before TRE:
		  %2 = bitcast %struct.A* %agg.tmp to i8*
		  %3 = bitcast %struct.A* %b to i8*
		  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
		  ^^^^^^^^^^^^^^^^ copy to %agg.tmp, which is the first byval argument
		  %4 = bitcast %struct.A* %agg.tmp5 to i8*
		  %5 = bitcast %struct.A* %a to i8*
		  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
		  ^^^^^^^^^^^^^^^^ copy to %agg.tmp5, which is the second byval argument
		  %add = add nsw i32 %i, 1
		  call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp5, i32 %add)

		after TRE:
		  tailrecurse: ; preds = %if.end, %entry
		    %a.tr = phi %struct.A* [ %a, %entry ], [ %agg.tmp7, %if.end ]
		    %b.tr = phi %struct.A* [ %b, %entry ], [ %agg.tmp58, %if.end ]
		  if.end:
		    ...
		    %2 = bitcast %struct.A* %agg.tmp to i8*
		    %3 = bitcast %struct.A* %b.tr to i8*
		    call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false), !tbaa.struct !6
		    ^^^^^^^^^^^^^^^^ copy to %agg.tmp, which is the first byval argument
		    %4 = bitcast %struct.A* %agg.tmp5 to i8*
		    %5 = bitcast %struct.A* %a.tr to i8*
		    call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false), !tbaa.struct !6
		    ^^^^^^^^^^^^^^^^ copy to %agg.tmp5, which is the second byval argument
		    %add = add nsw i32 %i.tr, 1
		    %6 = bitcast %struct.A* %agg.tmp7 to i8*
		    %7 = bitcast %struct.A* %agg.tmp to i8*
		    call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %6, i8* align 8 %7, i64 80, i1 false)
		    ^^^^^^^^^^^^^^^^ copy to %agg.tmp7, which is used as input for the next iteration
		    %8 = bitcast %struct.A* %agg.tmp58 to i8*
		    %9 = bitcast %struct.A* %agg.tmp5 to i8*
		    call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %8, i8* align 8 %9, i64 80, i1 false)
		    ^^^^^^^^^^^^^^^^ copy to %agg.tmp58, which is used as input for the next iteration
		    br label %tailrecurse

		Probably the only difference from your description is that this patch creates additional temporary variables (agg.tmp7/agg.tmp58), while it could use the locations of the a/b parameters instead of newly created temps.
		efriedma: I guess if you're looking at code emitted directly by clang, you'll end up with "extra" aggregates due to the way clang does call lowering. That said, the temporary aggregates aren't actually required; optimizations like memcpyopt can remove them. For example, on master, "clang -O2 -emit-llvm" produces a call like `tail call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 4 %b, %struct.A* nonnull byval(%struct.A) align 4 %a, i32 %add)`.
		avl: Right. But for the case "void dostuff(A arg, Arg b) { dostuff(b, a); }" these temps and memcpy calls are not useless, so they would not be removed by memcpyopt. Anyway, minimizing the number of temps is also useful. I will update the patch to use the function arguments' locations instead of created temps.
		efriedma: Take a look at the output of `clang -O2 -emit-llvm` for the following: `typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { return foo(b, a); }`. No memcpy.
		avl: For the above test case that is probably correct (since foo is the last call). But for this test case:
		  typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { foo(b, a); foo(b,a); }

		  define void @_Z3bar1AS_(%struct.A* byval nocapture readonly align 8, %struct.A* byval nocapture readonly align 8) local_unnamed_addr #0 {
		    call void @_Z3foo1AS_(%struct.A* byval nonnull align 8 %1, %struct.A* byval nonnull align 8 %0)
		    call void @_Z3foo1AS_(%struct.A* byval nonnull align 8 %1, %struct.A* byval nonnull align 8 %0)
		    ret void
		  }

		What if the first foo writes to the byval location? In that case the second call to foo() would receive incorrect arguments. Is that correct behavior?
		efriedma: byval implicitly makes a copy of the underlying memory. If foo modifies the memory, it's modifying its own copy, not the version in bar.
		avl: Looks like I lost track a bit. That "byval implicitly makes a copy of the underlying memory" is understood. So for the test case "typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { return foo(b, a); }", -mllvm -print-after-all shows:
		  %agg.tmp = alloca %struct.A, align 8
		  %agg.tmp1 = alloca %struct.A, align 8
		  %0 = bitcast %struct.A* %agg.tmp to i8*
		  %1 = bitcast %struct.A* %b to i8*
		  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %0, i8* align 8 %1, i64 400, i1 false), !tbaa.struct !2
		  %2 = bitcast %struct.A* %agg.tmp1 to i8*
		  %3 = bitcast %struct.A* %a to i8*
		  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %2, i8* align 8 %3, i64 400, i1 false), !tbaa.struct !2
		  call void @_Z3foo1AS_(%struct.A* byval(%struct.A) align 8 %agg.tmp, %struct.A* byval(%struct.A) align 8 %agg.tmp
		Why is it important that the output of "clang -O2 -emit-llvm" for the same test case does not contain calls to memcpy?
		efriedma: I'm not completely sure what you're asking. In general, we expect that the TRE pass should be able to handle arbitrary LLVM IR, even if clang with the standard pass pipeline can't generate IR with that pattern. So TRE needs to be able to handle the form where the byval argument is passed directly to a tail call.
		avl: > In general, we expect that the TRE pass should be able to handle arbitrary LLVM IR, even if clang with the standard pass pipeline can't generate IR with that pattern. So TRE needs to be able to handle the form where the byval argument is passed directly to a tail call.

		The IR for the test case "typedef struct { int x[100]; } A; void foo(A, A); void bar(A a, A b) { return foo(b, a); }" before TRE currently looks like this:
		  %0 = bitcast %struct.A* %agg.tmp to i8*
		  %1 = bitcast %struct.A* %b to i8*
		  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(400) %0, i8* nonnull align 8 dereferenceable(400) %1, i64 400, i1 false), !tbaa.struct !2
		  %2 = bitcast %struct.A* %agg.tmp1 to i8*
		  %3 = bitcast %struct.A* %a to i8*
		  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(400) %2, i8* nonnull align 8 dereferenceable(400) %3, i64 400, i1 false), !tbaa.struct !2
		  tail call void @_Z3foo1AS_(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp1)
		Did I understand correctly that TRE should also work correctly if the above memcpy calls are not generated?
		efriedma: Right, it should still work if the memcpy calls are not generated, or optimized out. For example, we shouldn't miscompile if someone writes `clang foo.c -o - -S -emit-llvm -O2 -mllvm -disable-llvm-optzns | opt -S -sroa -memcpyopt -instcombine -tailcallelim`. Usually, memcpyopt doesn't run before tailcallelim, but allowing users to specify arbitrary optimizations in any order is part of what makes LLVM IR transforms flexible.
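		The copy-ordering concern in this thread can be made concrete with a small C++ model of the dostuff test case. This is a hypothetical sketch, not LLVM code: `dostuffRec`, `dostuffLoop`, and `naiveInPlace` are illustrative names, plain struct assignment stands in for `llvm.memcpy`, and lifetime markers are not modeled. `dostuffLoop` follows the scheme avl describes: each iteration copies the swapped arguments into per-call holders (`aggTmp`, `aggTmp5`, standing in for `%agg.tmp`/`%agg.tmp5`) and then into separate next-iteration holders (`aggTmp7`, `aggTmp58`), which the pointers `aTr`/`bTr` (standing in for the `%a.tr`/`%b.tr` PHIs) refer to. `naiveInPlace` writes the swapped values straight back into the incoming argument storage, so its second copy reads data the first copy already overwrote, which is the hazard efriedma raises.

```cpp
#include <utility>
#include <vector>

// Hypothetical C++ model of the dostuff() test case; not LLVM code.
struct A {
  long long x[10] = {0};
};

using Trace = std::vector<std::pair<long long, long long>>;

// Original tail-recursive form: byval parameters are fresh copies per call.
void dostuffRec(A a, A b, int i, Trace &out) {
  if (i == 10)
    return;
  a.x[5]++;
  out.push_back({a.x[5], b.x[5]});
  dostuffRec(b, a, i + 1, out);
}

// Loop form with the two-stage copies: call holders, then next-iteration holders.
void dostuffLoop(A a, A b, int i, Trace &out) {
  A aggTmp, aggTmp5;      // byval holders for the former recursive call
  A aggTmp7, aggTmp58;    // holders handed to the next iteration
  A *aTr = &a, *bTr = &b; // model the %a.tr / %b.tr PHIs
  while (i != 10) {
    aTr->x[5]++;
    out.push_back({aTr->x[5], bTr->x[5]});
    aggTmp = *bTr;  // first byval argument of dostuff(b, a, ...)
    aggTmp5 = *aTr; // second byval argument
    i = i + 1;
    aggTmp7 = aggTmp; // copies that feed the next iteration
    aggTmp58 = aggTmp5;
    aTr = &aggTmp7;
    bTr = &aggTmp58;
  }
}

// Naive variant: copy the swapped arguments directly into the incoming
// argument storage, with no temporaries. The second copy reads clobbered data.
void naiveInPlace(A a, A b, int i, Trace &out) {
  while (i != 10) {
    a.x[5]++;
    out.push_back({a.x[5], b.x[5]});
    a = b; // a now holds b ...
    b = a; // ... so b receives b again, not the old a
    i = i + 1;
  }
}
```

		Starting from a zero-initialized `A`, `dostuffRec` and `dostuffLoop` should produce the same ten pairs that avl's run prints, while `naiveInPlace` already differs at the second pair.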

		return NewAlloca;
		}

bool TailRecursionEliminator::eliminateCall(CallInst *CI) {		bool TailRecursionEliminator::eliminateCall(CallInst *CI) {
ReturnInst *Ret = cast<ReturnInst>(CI->getParent()->getTerminator());		ReturnInst *Ret = cast<ReturnInst>(CI->getParent()->getTerminator());

// Ok, we found a potential tail call. We can currently only transform the		// Ok, we found a potential tail call. We can currently only transform the
// tail call if all of the instructions between the call and the return are		// tail call if all of the instructions between the call and the return are
// movable to above the call itself, leaving the call next to the return.		// movable to above the call itself, leaving the call next to the return.
		efriedma: I think you need to be more careful with alignment here: it's UB if you specify alignment higher than the actual alignment of the value. For byval arguments with the alignment set, the alignment is exactly that. For byval arguments where the alignment attribute is missing, probably fine to just refuse to do TRE. It's an edge case which shouldn't really come up in practice.
		avl: > I think you need to be more careful with alignment here: it's UB if you specify alignment higher than the actual alignment of the value. For byval arguments with the alignment set, the alignment is exactly that.

		So, instead of this:
		  Align Alignment(DL.getPrefTypeAlignment(AggTy));
		  Alignment = max(Alignment, MaybeAlign(CI->getParamAlign(OpndIdx)));
		I need to do just this:
		  Align Alignment(DL.getPrefTypeAlignment(AggTy));
		right?

		> For byval arguments where the alignment attribute is missing, probably fine to just refuse to do TRE. It's an edge case which shouldn't really come up in practice.

		If the alignment attribute is missing, would it be OK to use Align(1)?
		avl: i.e. both of the above conditions would look like this:
		  Align StackSlotAlignment(max(1, MaybeAlign(CI->getParamAlign(OpndIdx))));
		  Align FinalAlignment(min(DL.getPrefTypeAlignment(AggTy), StackSlotAlignment));
		Would that be correct?
		efriedma: I don't think you need to mix in getPrefTypeAlignment. I mean, it's not wrong to use a lower alignment than StackSlotAlignment, but there isn't any reason you'd want to. Instead of `max(1, MaybeAlign(CI->getParamAlign(OpndIdx))`, you'd write `CI->getParamAlign(OpndIdx).valueOrOne()`, but sure, that's correct. The code quality might be a bit iffy for byval without alignment, but nothing should be generating code like that anyway, so I guess it's fine.
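		The alignment rule settled on here can be sketched without LLVM. In this hypothetical model, `MaybeAlign` is a `std::optional<std::uint64_t>` stand-in for `llvm::MaybeAlign`, `safeCopyAlign` mirrors the suggested `CI->getParamAlign(OpndIdx).valueOrOne()`, `draftCopyAlign` reproduces the first draft's `max(preferred type alignment, param alignment)`, and `claimHolds` checks whether an address really satisfies a claimed alignment. The idea is that the alignment given for the memcpy source is a promise about the byval pointer, so it should not exceed what the `align` attribute guarantees.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for llvm::MaybeAlign: an optional byte alignment.
using MaybeAlign = std::optional<std::uint64_t>;

// Suggested rule: trust only the alignment the byval attribute guarantees,
// falling back to 1 when the attribute is missing (like valueOrOne()).
std::uint64_t safeCopyAlign(MaybeAlign ParamAlign) {
  return ParamAlign.value_or(1);
}

// First-draft rule: may claim more alignment than the pointer really has.
std::uint64_t draftCopyAlign(std::uint64_t PrefTypeAlign, MaybeAlign ParamAlign) {
  return std::max(PrefTypeAlign, ParamAlign.value_or(1));
}

// A claimed alignment is only valid if the address is a multiple of it.
bool claimHolds(std::uint64_t Addr, std::uint64_t Alignment) {
  return Addr % Alignment == 0;
}
```

		For example, a byval pointer at address 0x1004 with `align 4` gets `safeCopyAlign == 4`, which that address satisfies, whereas the draft rule with a 16-byte preferred alignment would claim 16, which 0x1004 does not satisfy.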
// Check that this is the case now.		// Check that this is the case now.
Instruction *AccRecInstr = nullptr;		Instruction *AccRecInstr = nullptr;
BasicBlock::iterator BBI(CI);		BasicBlock::iterator BBI(CI);
for (++BBI; &*BBI != Ret; ++BBI) {		for (++BBI; &*BBI != Ret; ++BBI) {
if (canMoveAboveCall(&*BBI, CI, AA))		if (canMoveAboveCall(&*BBI, CI, AA))
continue;		continue;

// If we can't move the instruction above the call, it might be because it		// If we can't move the instruction above the call, it might be because it
... 16 unchanged lines not shown ...	return OptimizationRemark(DEBUG_TYPE, "tailcall-recursion", CI)
<< "transforming tail recursion into loop";		<< "transforming tail recursion into loop";
});		});

// OK! We can transform this tail call. If this is the first one found,		// OK! We can transform this tail call. If this is the first one found,
// create the new entry block, allowing us to branch back to the old entry.		// create the new entry block, allowing us to branch back to the old entry.
if (!HeaderBB)		if (!HeaderBB)
createTailRecurseLoopHeader(CI);		createTailRecurseLoopHeader(CI);

if (RemovableCallsMustBeMarkedTail && !CI->isTailCall())
return false;

// Ok, now that we know we have a pseudo-entry block WITH all of the		// Ok, now that we know we have a pseudo-entry block WITH all of the
// required PHI nodes, add entries into the PHI node for the actual		// required PHI nodes, add entries into the PHI node for the actual
// parameters passed into the tail-recursive call.		// parameters passed into the tail-recursive call.
for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i)		for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
		Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]; clang-tidy: warning: invalid case style for variable 'e' [readability-identifier-naming]
		if (CI->isByValArgument(i))
		ArgumentPHIs[i]->addIncoming(createTempForByValOperand(CI, i), BB);
		else
ArgumentPHIs[i]->addIncoming(CI->getArgOperand(i), BB);		ArgumentPHIs[i]->addIncoming(CI->getArgOperand(i), BB);
		}

if (AccRecInstr) {		if (AccRecInstr) {
insertAccumulator(AccRecInstr);		insertAccumulator(AccRecInstr);

// Rewrite the accumulator recursion instruction so that it does not use		// Rewrite the accumulator recursion instruction so that it does not use
// the result of the call anymore, instead, use the PHI node we just		// the result of the call anymore, instead, use the PHI node we just
// inserted.		// inserted.
AccRecInstr->setOperand(AccRecInstr->getOperand(0) != CI, AccPN);		AccRecInstr->setOperand(AccRecInstr->getOperand(0) != CI, AccPN);
... 100 unchanged lines not shown ...	if (RetSelects.empty()) {
AccRecInstrNew->insertBefore(SI);		AccRecInstrNew->insertBefore(SI);
SI->setFalseValue(AccRecInstrNew);		SI->setFalseValue(AccRecInstrNew);
}		}
}		}
}		}
}		}
}		}

bool TailRecursionEliminator::processBlock(		bool TailRecursionEliminator::processBlock(BasicBlock &BB) {
BasicBlock &BB, bool CannotTailCallElimCallsMarkedTail) {
Instruction *TI = BB.getTerminator();		Instruction *TI = BB.getTerminator();

if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {		if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
if (BI->isConditional())		if (BI->isConditional())
return false;		return false;

BasicBlock *Succ = BI->getSuccessor(0);		BasicBlock *Succ = BI->getSuccessor(0);
ReturnInst *Ret = dyn_cast<ReturnInst>(Succ->getFirstNonPHIOrDbg());		ReturnInst *Ret = dyn_cast<ReturnInst>(Succ->getFirstNonPHIOrDbg());

if (!Ret)		if (!Ret)
return false;		return false;

CallInst *CI = findTRECandidate(&BB, CannotTailCallElimCallsMarkedTail);		CallInst *CI = findTRECandidate(&BB);

if (!CI)		if (!CI)
return false;		return false;

LLVM_DEBUG(dbgs() << "FOLDING: " << *Succ		LLVM_DEBUG(dbgs() << "FOLDING: " << *Succ
<< "INTO UNCOND BRANCH PRED: " << BB);		<< "INTO UNCOND BRANCH PRED: " << BB);
FoldReturnIntoUncondBranch(Ret, Succ, &BB, &DTU);		FoldReturnIntoUncondBranch(Ret, Succ, &BB, &DTU);
++NumRetDuped;		++NumRetDuped;

// If all predecessors of Succ have been eliminated by		// If all predecessors of Succ have been eliminated by
// FoldReturnIntoUncondBranch, delete it. It is important to empty it,		// FoldReturnIntoUncondBranch, delete it. It is important to empty it,
// because the ret instruction in there is still using a value which		// because the ret instruction in there is still using a value which
// eliminateCall will attempt to remove. This block can only contain		// eliminateCall will attempt to remove. This block can only contain
// instructions that can't have uses, therefore it is safe to remove.		// instructions that can't have uses, therefore it is safe to remove.
if (pred_empty(Succ))		if (pred_empty(Succ))
DTU.deleteBB(Succ);		DTU.deleteBB(Succ);

eliminateCall(CI);		eliminateCall(CI);
return true;		return true;
} else if (isa<ReturnInst>(TI)) {		} else if (isa<ReturnInst>(TI)) {
CallInst *CI = findTRECandidate(&BB, CannotTailCallElimCallsMarkedTail);		CallInst *CI = findTRECandidate(&BB);

if (CI)		if (CI)
return eliminateCall(CI);		return eliminateCall(CI);
}		}

return false;		return false;
}		}

bool TailRecursionEliminator::eliminate(Function &F,		bool TailRecursionEliminator::eliminate(Function &F,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
AliasAnalysis *AA,		AliasAnalysis *AA,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
DomTreeUpdater &DTU) {		DomTreeUpdater &DTU) {
if (F.getFnAttribute("disable-tail-calls").getValueAsString() == "true")		if (F.getFnAttribute("disable-tail-calls").getValueAsString() == "true")
return false;		return false;

bool MadeChange = false;		bool MadeChange = false;
bool AllCallsAreTailCalls = false;		MadeChange |= markTails(F, ORE);
MadeChange |= markTails(F, AllCallsAreTailCalls, ORE);
if (!AllCallsAreTailCalls)
return MadeChange;

// If this function is a varargs function, we won't be able to PHI the args		// If this function is a varargs function, we won't be able to PHI the args
// right, so don't even try to convert it...		// right, so don't even try to convert it...
if (F.getFunctionType()->isVarArg())		if (F.getFunctionType()->isVarArg())
return MadeChange;		return MadeChange;

// If false, we cannot perform TRE on tail calls marked with the 'tail'		if (!canTRE(F))
// attribute, because doing so would cause the stack size to increase (real		return MadeChange;
// TRE would deallocate variable sized allocas, TRE doesn't).
bool CanTRETailMarkedCall = canTRE(F);

// Change any tail recursive calls to loops.		// Change any tail recursive calls to loops.
TailRecursionEliminator TRE(F, TTI, AA, ORE, DTU);		TailRecursionEliminator TRE(F, TTI, AA, ORE, DTU);

for (BasicBlock &BB : F)		for (BasicBlock &BB : F)
MadeChange |= TRE.processBlock(BB, !CanTRETailMarkedCall);		MadeChange |= TRE.processBlock(BB);

TRE.cleanupAndFinalize();		TRE.cleanupAndFinalize();

return MadeChange;		return MadeChange;
}		}

namespace {		namespace {
struct TailCallElim : public FunctionPass {		struct TailCallElim : public FunctionPass {
... 70 unchanged lines not shown ...

llvm/test/Transforms/TailCallElim/basic.ll

	; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s			; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s

	declare void @noarg()			declare void @noarg()
	declare void @use(i32*)			declare void @use(i32*)
	declare void @use_nocapture(i32* nocapture)			declare void @use_nocapture(i32* nocapture)
	declare void @use2_nocapture(i32* nocapture, i32* nocapture)			declare void @use2_nocapture(i32* nocapture, i32* nocapture)

	; Trivial case. Mark @noarg with tail call.			; Trivial case. Mark @noarg with tail call.
	define void @test0() {			define void @test0() {
	; CHECK: tail call void @noarg()			; CHECK: tail call void @noarg()
	call void @noarg()			call void @noarg()
	ret void			ret void
	}			}

	; PR615. Make sure that we do not move the alloca so that it interferes with the tail call.			; Make sure that we do not do TRE if pointer to local stack
				; escapes through function call.
	define i32 @test1() {			define i32 @test1() {
	; CHECK: i32 @test1()			; CHECK: i32 @test1()
	; CHECK-NEXT: alloca			; CHECK-NEXT: alloca
	%A = alloca i32 ; <i32*> [#uses=2]			%A = alloca i32 ; <i32*> [#uses=2]
	store i32 5, i32* %A			store i32 5, i32* %A
	call void @use(i32* %A)			call void @use(i32* %A)
	; CHECK: tail call i32 @test1			; CHECK: call i32 @test1
	%X = tail call i32 @test1() ; <i32> [#uses=1]			%X = call i32 @test1() ; <i32> [#uses=1]
	ret i32 %X			ret i32 %X
	}			}

	; This function contains intervening instructions which should be moved out of the way			; This function contains intervening instructions which should be moved out of the way
	define i32 @test2(i32 %X) {			define i32 @test2(i32 %X) {
	; CHECK: i32 @test2			; CHECK: i32 @test2
	; CHECK-NOT: call			; CHECK-NOT: call
	; CHECK: ret i32			; CHECK: ret i32
	... 218 unchanged lines not shown ...
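The reworded test1 comment in basic.ll concerns a local whose address escapes through a capturing call such as `@use`. The hypothetical C++ model below (the names `recursive` and `afterNaiveTRE` are illustrative, and the vector stands in for whatever `@use` might do with the pointer) shows why such a call should not be turned into a loop: after TRE every iteration reuses one stack slot, so a pointer captured in an earlier iteration observes a later iteration's value.

```cpp
#include <vector>

// Each activation stores into its own local and lets the address escape
// (standing in for `call void @use(i32* %A)`), then recurses.
int recursive(int depth, std::vector<int *> &escaped) {
  int A = 5 + depth;
  escaped.push_back(&A);
  if (depth == 0)
    return *escaped.front(); // the outermost frame is still live here
  return recursive(depth - 1, escaped);
}

// What an unconditional TRE would produce: one slot reused by every iteration.
int afterNaiveTRE(int depth, std::vector<int *> &escaped) {
  int A;
  for (;; --depth) {
    A = 5 + depth;
    escaped.push_back(&A);
    if (depth == 0)
      return *escaped.front(); // aliases the current iteration's A
  }
}
```

With `depth == 1`, `recursive` returns 6 because the outer frame's local is still live and untouched, while `afterNaiveTRE` returns 5 because both iterations share the same `A`.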

llvm/test/Transforms/TailCallElim/tre-byval-parameter-2.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s

				; the test was generated from the following C++ source:
				;
				; #include <stdio.h>
				; typedef struct A { long long x[10] = {0}; } A;
				; A global;
				; void dostuff(A a, A b, int i) {
				; if (i==10) return;
				; a.x[5]++;
				; printf("%lld %lld\n", a.x[5], b.x[5]); dostuff(b, a, i+1);
				; }
				; __attribute((optnone)) int main() { dostuff(global, global, 0); }
				;
				; This test checks that the values of byval operands are copied
				; into temporary variables before the function call (as per the
				; definition of byval operands). Additionally, values from
				; these temporary variables (byval value holders) are copied into
				; other temporary variables, which are passed to the next iteration.
				; That is necessary since the original byval value holders have a
				; reduced lifetime scope and cannot be used later. Specifically:
				; the value of B_TR is copied into AGG_TMP, and A_TR is copied into
				; AGG_TMP5. AGG_TMP and AGG_TMP5 are marked with lifetime markers.
				; Later, values from these byval holders are copied into the
				; temporary variables used on the next iteration of the loop:
				; AGG_TMP is copied into AGG_TMP1, and AGG_TMP5 is copied into
				; AGG_TMP52. These are then used on the next iteration.
				;
				; [[A_TR:%.*]] = phi %struct.A* [ [[A:%.*]], [[ENTRY:%.*]] ],
				; [ [[AGG_TMP1]], [[IF_END:%.*]] ]
				; [[B_TR:%.*]] = phi %struct.A* [ [[B:%.*]], [[ENTRY]] ],
				; [ [[AGG_TMP52]], [[IF_END]] ]

				%struct.A = type { [10 x i64] }

				@global = dso_local local_unnamed_addr global %struct.A zeroinitializer, align 8
				@.str = private unnamed_addr constant [11 x i8] c"%lld %lld\0A\00", align 1

				; Function Attrs: noinline nounwind uwtable
				define dso_local void @_Z7dostuff1AS_i(%struct.A* nocapture byval(%struct.A) align 8 %a, %struct.A* nocapture readonly byval(%struct.A) align 8 %b, i32 %i) local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z7dostuff1AS_i(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[AGG_TMP52:%.*]] = alloca [[STRUCT_A:%.*]], align 8
				; CHECK-NEXT: [[AGG_TMP1:%.*]] = alloca [[STRUCT_A]], align 8
				; CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[STRUCT_A]], align 8
				; CHECK-NEXT: [[AGG_TMP5:%.*]] = alloca [[STRUCT_A]], align 8
				; CHECK-NEXT: br label [[TAILRECURSE:%.*]]
				; CHECK: tailrecurse:
				; CHECK-NEXT: [[A_TR:%.*]] = phi %struct.A* [ [[A:%.*]], [[ENTRY:%.*]] ], [ [[AGG_TMP1]], [[IF_END:%.*]] ]
				; CHECK-NEXT: [[B_TR:%.*]] = phi %struct.A* [ [[B:%.*]], [[ENTRY]] ], [ [[AGG_TMP52]], [[IF_END]] ]
				; CHECK-NEXT: [[I_TR:%.*]] = phi i32 [ [[I:%.*]], [[ENTRY]] ], [ [[ADD:%.*]], [[IF_END]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[I_TR]], 10
				; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [[STRUCT_A]], %struct.A* [[A_TR]], i64 0, i32 0, i64 5
				; CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* [[ARRAYIDX]], align 8
				; CHECK-NEXT: [[INC:%.*]] = add nsw i64 [[TMP0]], 1
				; CHECK-NEXT: store i64 [[INC]], i64* [[ARRAYIDX]], align 8
				; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [[STRUCT_A]], %struct.A* [[B_TR]], i64 0, i32 0, i64 5
				; CHECK-NEXT: [[TMP1:%.*]] = load i64, i64* [[ARRAYIDX4]], align 8
				; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i64 [[INC]], i64 [[TMP1]])
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast %struct.A* [[AGG_TMP]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 80, i8* nonnull [[TMP2]])
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast %struct.A* [[B_TR]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) [[TMP2]], i8* nonnull align 8 dereferenceable(80) [[TMP3]], i64 80, i1 false)
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast %struct.A* [[AGG_TMP5]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 80, i8* nonnull [[TMP4]])
				; CHECK-NEXT: [[TMP5:%.*]] = bitcast %struct.A* [[A_TR]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) [[TMP4]], i8* nonnull align 8 dereferenceable(80) [[TMP5]], i64 80, i1 false)
				; CHECK-NEXT: [[ADD]] = add nsw i32 [[I_TR]], 1
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast %struct.A* [[AGG_TMP1]] to i8*
				; CHECK-NEXT: [[TMP7:%.*]] = bitcast %struct.A* [[AGG_TMP]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP6]], i8* align 8 [[TMP7]], i64 80, i1 false)
				; CHECK-NEXT: [[TMP8:%.*]] = bitcast %struct.A* [[AGG_TMP52]] to i8*
				; CHECK-NEXT: [[TMP9:%.*]] = bitcast %struct.A* [[AGG_TMP5]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP8]], i8* align 8 [[TMP9]], i64 80, i1 false)
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 80, i8* nonnull [[TMP2]])
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 80, i8* nonnull [[TMP4]])
				; CHECK-NEXT: br label [[TAILRECURSE]]
				; CHECK: return:
				; CHECK-NEXT: ret void
				;
				entry:
				%agg.tmp = alloca %struct.A, align 8
				%agg.tmp5 = alloca %struct.A, align 8
				%cmp = icmp eq i32 %i, 10
				br i1 %cmp, label %return, label %if.end

				if.end: ; preds = %entry
				%arrayidx = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i64 5
				%0 = load i64, i64* %arrayidx, align 8
				%inc = add nsw i64 %0, 1
				store i64 %inc, i64* %arrayidx, align 8
				%arrayidx4 = getelementptr inbounds %struct.A, %struct.A* %b, i64 0, i32 0, i64 5
				%1 = load i64, i64* %arrayidx4, align 8
				%call = call i32 (i8*, ...) @printf(i8* nonnull dereferenceable(1) getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i64 0, i64 0), i64 %inc, i64 %1)
				%2 = bitcast %struct.A* %agg.tmp to i8*
				call void @llvm.lifetime.start.p0i8(i64 80, i8* nonnull %2)
				%3 = bitcast %struct.A* %b to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %2, i8* nonnull align 8 dereferenceable(80) %3, i64 80, i1 false)
				%4 = bitcast %struct.A* %agg.tmp5 to i8*
				call void @llvm.lifetime.start.p0i8(i64 80, i8* nonnull %4)
				%5 = bitcast %struct.A* %a to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(80) %4, i8* nonnull align 8 dereferenceable(80) %5, i64 80, i1 false)
				%add = add nsw i32 %i, 1
				call void @_Z7dostuff1AS_i(%struct.A* nonnull byval(%struct.A) align 8 %agg.tmp, %struct.A* nonnull byval(%struct.A) align 8 %agg.tmp5, i32 %add)
				call void @llvm.lifetime.end.p0i8(i64 80, i8* nonnull %2)
				call void @llvm.lifetime.end.p0i8(i64 80, i8* nonnull %4)
				br label %return

				return: ; preds = %entry, %if.end
				ret void
				}

				; Function Attrs: nofree nounwind
				declare dso_local noundef i32 @printf(i8* nocapture noundef readonly, ...) local_unnamed_addr #1

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: noinline norecurse nounwind optnone uwtable
				define dso_local i32 @main() local_unnamed_addr #3 {
				; CHECK-LABEL: @main(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[STRUCT_A:%.*]], align 8
				; CHECK-NEXT: [[AGG_TMP1:%.*]] = alloca [[STRUCT_A]], align 8
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast %struct.A* [[AGG_TMP]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP0]], i8* align 8 bitcast (%struct.A* @global to i8*), i64 80, i1 false)
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast %struct.A* [[AGG_TMP1]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP1]], i8* align 8 bitcast (%struct.A* @global to i8*), i64 80, i1 false)
				; CHECK-NEXT: tail call void @_Z7dostuff1AS_i(%struct.A* byval(%struct.A) align 8 [[AGG_TMP]], %struct.A* byval(%struct.A) align 8 [[AGG_TMP1]], i32 0)
				; CHECK-NEXT: ret i32 0
				;
				entry:
				%agg.tmp = alloca %struct.A, align 8
				%agg.tmp1 = alloca %struct.A, align 8
				%0 = bitcast %struct.A* %agg.tmp to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %0, i8* align 8 bitcast (%struct.A* @global to i8*), i64 80, i1 false)
				%1 = bitcast %struct.A* %agg.tmp1 to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %1, i8* align 8 bitcast (%struct.A* @global to i8*), i64 80, i1 false)
				call void @_Z7dostuff1AS_i(%struct.A* byval(%struct.A) align 8 %agg.tmp, %struct.A* byval(%struct.A) align 8 %agg.tmp1, i32 0)
				ret i32 0
				}

				attributes #0 = { uwtable }
				attributes #1 = { uwtable }
				attributes #2 = { argmemonly nounwind willreturn }
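The comment at the top of tre-byval-parameter-2.ll describes the copies TRE is expected to insert for byval operands. As an illustration only, the C++ sketch below restates that expected IR at the source level. `dostuff` is the recursive function from the test's source, with `printf` replaced by a hypothetical `record` helper so the outputs can be compared. `dostuff_tre` is a hand-written model of the loop the CHECK lines describe: pointers stand in for the `A_TR`/`B_TR` phis, block-scoped locals stand in for the lifetime-limited `AGG_TMP`/`AGG_TMP5` holders, and function-scoped locals stand in for `AGG_TMP1`/`AGG_TMP52`. It is a sketch of the expected result, not the pass's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// struct A as in the test's C++ source.
struct A { long long x[10] = {0}; };

// Hypothetical stand-in for printf: records each "a.x[5] b.x[5]" pair.
static std::vector<std::string> trace;
static void record(long long a5, long long b5) {
  trace.push_back(std::to_string(a5) + " " + std::to_string(b5));
}

// Recursive form from the test's source (printf replaced by record).
void dostuff(A a, A b, int i) {
  if (i == 10) return;
  a.x[5]++;
  record(a.x[5], b.x[5]);
  dostuff(b, a, i + 1);  // a and b are passed by value (byval in the IR)
}

// Source-level model of the loop that TRE is expected to produce.
void dostuff_tre(A a, A b, int i) {
  A* a_tr = &a;           // models phi [ %a, entry ], [ AGG_TMP1, if.end ]
  A* b_tr = &b;           // models phi [ %b, entry ], [ AGG_TMP52, if.end ]
  A agg_tmp1, agg_tmp52;  // loop-carried holders, live for the whole function
  for (;;) {
    if (i == 10) return;
    a_tr->x[5]++;
    record(a_tr->x[5], b_tr->x[5]);
    {
      // Byval holders for the eliminated call; their lifetime ends with
      // this block, like the lifetime.start/lifetime.end pair in the IR.
      A agg_tmp = *b_tr;
      A agg_tmp5 = *a_tr;
      // Copy them out before their lifetime ends, so the next iteration
      // never reads a dead temporary.
      agg_tmp1 = agg_tmp;
      agg_tmp52 = agg_tmp5;
    }
    a_tr = &agg_tmp1;  // the next iteration's "a" is this call's first argument
    b_tr = &agg_tmp52; // the next iteration's "b" is this call's second argument
    i = i + 1;
  }
}
```

Calling both functions with `A{}` arguments and comparing the recorded traces should give the same ten pairs. If the loop instead pointed `a_tr`/`b_tr` directly at `agg_tmp`/`agg_tmp5`, the next iteration would read objects whose lifetime had already ended, which is the kind of dangling use the extra copies in this reland guard against.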

llvm/test/Transforms/TailCallElim/tre-byval-parameter.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s

				; The test was generated from the following C++ source:
				;
				; int zoo ( S p1 );
				;
				; int foo ( int count, S p1 ) {
				;   if ( count > 10 )
				;     return zoo(p1);
				;
				;   // After TRE, the temporary variable created for passing the
				;   // byval parameter p1 could be used when zoo(p1) is called.
				;   return foo(count+1, p1);
				; }

				; This test checks that the value of the temporary variable AGG_TMP_I
				; (a byval value holder) is copied into another temporary variable
				; (AGG_TMP_I1). This is necessary to copy the data out of a variable
				; with a reduced lifetime (lifetime.start/lifetime.end). Specifically,
				; when "call i32 @_Z3fooi1S" is replaced with "br label tailrecurse",
				; the value that "@llvm.memcpy.p0i8.p0i8.i64" copied into AGG_TMP_I
				; must then be copied into AGG_TMP_I1, because AGG_TMP_I is marked
				; with lifetime.start/lifetime.end and cannot be used later by:
				;
				; "[[P1_TR:%.*]] = phi %struct.S* [ [[P1:%.*]], [[ENTRY]] ],
				;   [ [[AGG_TMP_I1]], [[IF_END]] ]".

				%struct.S = type { i32, i32, float, %struct.B }
				%struct.B = type { i32, float }

				; Function Attrs: uwtable
				define dso_local i32 @_Z3fooi1S(i32 %count, %struct.S* nocapture readonly byval(%struct.S) align 8 %p1) local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z3fooi1S(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[AGG_TMP_I1:%.*]] = alloca [[STRUCT_S:%.*]], align 8
				; CHECK-NEXT: [[AGG_TMP_I:%.*]] = alloca [[STRUCT_S]], align 8
				; CHECK-NEXT: [[AGG_TMP14:%.*]] = alloca [[STRUCT_S]], align 8
				; CHECK-NEXT: [[AGG_TMP:%.*]] = alloca [[STRUCT_S]], align 8
				; CHECK-NEXT: [[AGG_TMP1:%.*]] = alloca [[STRUCT_S]], align 8
				; CHECK-NEXT: br label [[TAILRECURSE:%.*]]
				; CHECK: tailrecurse:
				; CHECK-NEXT: [[COUNT_TR:%.*]] = phi i32 [ [[COUNT:%.*]], [[ENTRY:%.*]] ], [ [[ADD:%.*]], [[IF_END:%.*]] ]
				; CHECK-NEXT: [[P1_TR:%.*]] = phi %struct.S* [ [[P1:%.*]], [[ENTRY]] ], [ [[AGG_TMP_I1]], [[IF_END]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i32 [[COUNT_TR]], 10
				; CHECK-NEXT: br i1 [[CMP]], label [[IF_THEN:%.*]], label [[IF_END]]
				; CHECK: if.then:
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast %struct.S* [[AGG_TMP]] to i8*
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast %struct.S* [[P1_TR]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) [[TMP0]], i8* nonnull align 8 dereferenceable(20) [[TMP1]], i64 20, i1 false)
				; CHECK-NEXT: [[CALL:%.*]] = tail call i32 @_Z3zoo1S(%struct.S* nonnull byval(%struct.S) align 8 [[AGG_TMP]])
				; CHECK-NEXT: br label [[RETURN:%.*]]
				; CHECK: if.end:
				; CHECK-NEXT: [[ADD]] = add nsw i32 [[COUNT_TR]], 1
				; CHECK-NEXT: [[TMP2:%.*]] = bitcast %struct.S* [[AGG_TMP1]] to i8*
				; CHECK-NEXT: [[TMP3:%.*]] = bitcast %struct.S* [[P1_TR]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) [[TMP2]], i8* nonnull align 8 dereferenceable(20) [[TMP3]], i64 20, i1 false)
				; CHECK-NEXT: [[AGG_TMP14_0__SROA_CAST:%.*]] = bitcast %struct.S* [[AGG_TMP14]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 20, i8* nonnull [[AGG_TMP14_0__SROA_CAST]])
				; CHECK-NEXT: [[TMP4:%.*]] = bitcast %struct.S* [[AGG_TMP_I]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 20, i8* nonnull [[TMP4]])
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) [[AGG_TMP14_0__SROA_CAST]], i8* nonnull align 8 dereferenceable(20) [[TMP2]], i64 20, i1 false)
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) [[TMP4]], i8* nonnull align 8 dereferenceable(20) [[AGG_TMP14_0__SROA_CAST]], i64 20, i1 false)
				; CHECK-NEXT: [[TMP5:%.*]] = bitcast %struct.S* [[AGG_TMP_I1]] to i8*
				; CHECK-NEXT: [[TMP6:%.*]] = bitcast %struct.S* [[AGG_TMP_I]] to i8*
				; CHECK-NEXT: call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 [[TMP5]], i8* align 8 [[TMP6]], i64 20, i1 false)
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 20, i8* nonnull [[AGG_TMP14_0__SROA_CAST]])
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 20, i8* nonnull [[TMP4]])
				; CHECK-NEXT: br label [[TAILRECURSE]]
				; CHECK: return:
				; CHECK-NEXT: ret i32 [[CALL]]
				;
				entry:
				%agg.tmp.i = alloca %struct.S, align 8
				%agg.tmp14 = alloca %struct.S, align 8
				%agg.tmp = alloca %struct.S, align 8
				%agg.tmp1 = alloca %struct.S, align 8
				%cmp = icmp sgt i32 %count, 10
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%0 = bitcast %struct.S* %agg.tmp to i8*
				%1 = bitcast %struct.S* %p1 to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) %0, i8* nonnull align 8 dereferenceable(20) %1, i64 20, i1 false)
				%call = call i32 @_Z3zoo1S(%struct.S* nonnull byval(%struct.S) align 8 %agg.tmp)
				br label %return

				if.end: ; preds = %entry
				%add = add nsw i32 %count, 1
				%2 = bitcast %struct.S* %agg.tmp1 to i8*
				%3 = bitcast %struct.S* %p1 to i8*
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) %2, i8* nonnull align 8 dereferenceable(20) %3, i64 20, i1 false)
				%agg.tmp14.0..sroa_cast = bitcast %struct.S* %agg.tmp14 to i8*
				call void @llvm.lifetime.start.p0i8(i64 20, i8* nonnull %agg.tmp14.0..sroa_cast)
				%4 = bitcast %struct.S* %agg.tmp.i to i8*
				call void @llvm.lifetime.start.p0i8(i64 20, i8* nonnull %4)
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) %agg.tmp14.0..sroa_cast, i8* nonnull align 8 dereferenceable(20) %2, i64 20, i1 false)
				call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(20) %4, i8* nonnull align 8 dereferenceable(20) %agg.tmp14.0..sroa_cast, i64 20, i1 false)
				%call.i = call i32 @_Z3fooi1S(i32 %add, %struct.S* nonnull byval(%struct.S) align 8 %agg.tmp.i)
				call void @llvm.lifetime.end.p0i8(i64 20, i8* nonnull %agg.tmp14.0..sroa_cast)
				call void @llvm.lifetime.end.p0i8(i64 20, i8* nonnull %4)
				br label %return

				return: ; preds = %if.end, %if.then
				%retval.0 = phi i32 [ %call, %if.then ], [ %call.i, %if.end ]
				ret i32 %retval.0
				}

				declare dso_local i32 @_Z3zoo1S(%struct.S* byval(%struct.S) align 8) local_unnamed_addr #1

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #2

				attributes #0 = { uwtable }
				attributes #1 = { uwtable }
				attributes #2 = { argmemonly nounwind willreturn }

llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s

				; This test checks that TRE is performed for only one recursive call.
				; The test_multiple_exits function has three recursive calls.
				; The first recursive call cannot be eliminated because a pointer
				; to a local variable escapes. The second recursive call can be
				; eliminated. The third recursive call cannot be eliminated since
				; it is not the last call. Thus, the test checks that TRE is done
				; only for the second recursive call.

				; IR for that test was generated from the following C++ source:
				;
				; void capture_arg (int*);
				; void func();
				; void test_multiple_exits (int param) {
				;   if (param >= 0 && param < 10) {
				;     int temp;
				;     capture_arg(&temp);
				;     // TRE cannot be done because the pointer to the local
				;     // variable "temp" escapes.
				;     test_multiple_exits(param + 1);
				;   } else if (param >= 10 && param < 20) {
				;     // TRE should be done.
				;     test_multiple_exits(param + 1);
				;   } else if (param >= 20 && param < 22) {
				;     // TRE cannot be done since the recursive
				;     // call is not the last call.
				;     test_multiple_exits(param + 1);
				;     func();
				;   }
				;
				;   return;
				; }

				; Function Attrs: noinline optnone uwtable
				declare void @_Z11capture_argPi(i32* %param) #0

				; Function Attrs: noinline optnone uwtable
				declare void @_Z4funcv() #0

				; Function Attrs: noinline nounwind uwtable
				define dso_local void @_Z19test_multiple_exitsi(i32 %param) local_unnamed_addr #2 {
				; CHECK-LABEL: @_Z19test_multiple_exitsi(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TEMP:%.*]] = alloca i32, align 4
				; CHECK-NEXT: br label [[TAILRECURSE:%.*]]
				; CHECK: tailrecurse:
				; CHECK-NEXT: [[PARAM_TR:%.*]] = phi i32 [ [[PARAM:%.*]], [[ENTRY:%.*]] ], [ [[ADD6:%.*]], [[IF_THEN5:%.*]] ]
				; CHECK-NEXT: [[TMP0:%.*]] = icmp ult i32 [[PARAM_TR]], 10
				; CHECK-NEXT: br i1 [[TMP0]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
				; CHECK: if.then:
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[TEMP]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP1]]) #1
				; CHECK-NEXT: call void @_Z11capture_argPi(i32* nonnull [[TEMP]])
				; CHECK-NEXT: [[ADD:%.*]] = add nuw nsw i32 [[PARAM_TR]], 1
				; CHECK-NEXT: call void @_Z19test_multiple_exitsi(i32 [[ADD]])
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP1]]) #1
				; CHECK-NEXT: br label [[IF_END14:%.*]]
				; CHECK: if.else:
				; CHECK-NEXT: [[PARAM_OFF:%.*]] = add i32 [[PARAM_TR]], -10
				; CHECK-NEXT: [[TMP2:%.*]] = icmp ult i32 [[PARAM_OFF]], 10
				; CHECK-NEXT: br i1 [[TMP2]], label [[IF_THEN5]], label [[IF_ELSE7:%.*]]
				; CHECK: if.then5:
				; CHECK-NEXT: [[ADD6]] = add nuw nsw i32 [[PARAM_TR]], 1
				; CHECK-NEXT: br label [[TAILRECURSE]]
				; CHECK: if.else7:
				; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[PARAM_TR]], -2
				; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[TMP3]], 20
				; CHECK-NEXT: br i1 [[TMP4]], label [[IF_THEN11:%.*]], label [[IF_END14]]
				; CHECK: if.then11:
				; CHECK-NEXT: [[ADD12:%.*]] = add nsw i32 [[PARAM_TR]], 1
				; CHECK-NEXT: tail call void @_Z19test_multiple_exitsi(i32 [[ADD12]])
				; CHECK-NEXT: tail call void @_Z4funcv()
				; CHECK-NEXT: ret void
				; CHECK: if.end14:
				; CHECK-NEXT: ret void
				;
				entry:
				%temp = alloca i32, align 4
				%0 = icmp ult i32 %param, 10
				br i1 %0, label %if.then, label %if.else

				if.then: ; preds = %entry
				%1 = bitcast i32* %temp to i8*
				call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %1) #2
				call void @_Z11capture_argPi(i32* nonnull %temp)
				%add = add nuw nsw i32 %param, 1
				call void @_Z19test_multiple_exitsi(i32 %add)
				call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %1) #2
				br label %if.end14

				if.else: ; preds = %entry
				%param.off = add i32 %param, -10
				%2 = icmp ult i32 %param.off, 10
				br i1 %2, label %if.then5, label %if.else7

				if.then5: ; preds = %if.else
				%add6 = add nuw nsw i32 %param, 1
				call void @_Z19test_multiple_exitsi(i32 %add6)
				br label %if.end14

				if.else7: ; preds = %if.else
				%3 = and i32 %param, -2
				%4 = icmp eq i32 %3, 20
				br i1 %4, label %if.then11, label %if.end14

				if.then11: ; preds = %if.else7
				%add12 = add nsw i32 %param, 1
				call void @_Z19test_multiple_exitsi(i32 %add12)
				call void @_Z4funcv()
				br label %if.end14

				if.end14: ; preds = %if.then5, %if.then11, %if.else7, %if.then
				ret void
				}

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

				attributes #0 = { nofree noinline norecurse nounwind uwtable }
				attributes #1 = { nounwind uwtable }
				attributes #2 = { argmemonly nounwind willreturn }
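The three decisions checked by tre-multiple-exits.ll can be illustrated with a source-level sketch in C++. `capture_arg` and `func` below are hypothetical stand-ins that only log events (the test merely declares them), and `test_multiple_exits_tre` is a hand-written model of the CHECK output, not code produced by the pass: only the call in the `param >= 10 && param < 20` branch becomes a back edge, while the call after the escaping `&temp` and the call followed by `func()` remain real calls.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the declared-only functions: they just log.
static std::vector<int> events;
static int* captured = nullptr;
void capture_arg(int* p) { captured = p; events.push_back(-1); }  // p escapes
void func() { events.push_back(-2); }

// Recursive form from the test's C++ source.
void test_multiple_exits(int param) {
  if (param >= 0 && param < 10) {
    int temp;
    capture_arg(&temp);
    test_multiple_exits(param + 1);
  } else if (param >= 10 && param < 20) {
    test_multiple_exits(param + 1);
  } else if (param >= 20 && param < 22) {
    test_multiple_exits(param + 1);
    func();
  }
}

// Source-level model of the expected result of TRE.
void test_multiple_exits_tre(int param) {
  for (;;) {  // models the "tailrecurse" block
    if (param >= 0 && param < 10) {
      int temp;
      capture_arg(&temp);
      // Kept as a call: &temp may be reachable through capture_arg,
      // so temp must stay live in this frame during the call.
      test_multiple_exits_tre(param + 1);
      return;
    }
    if (param >= 10 && param < 20) {
      // Eliminated: the tail call becomes a jump back to the loop header.
      param = param + 1;
      continue;
    }
    if (param >= 20 && param < 22) {
      // Kept as a call: func() runs afterwards, so this is not a tail call.
      test_multiple_exits_tre(param + 1);
      func();
    }
    return;
  }
}
```

Starting from `param == 0`, both versions should log ten `capture_arg` events followed by two `func` events; only the middle range of parameters is handled by the loop instead of new stack frames.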

llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s

				; IR for that test was generated from the following C++ source:
				;
				;int count;
				;__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }
				;
				;void test(int recurseCount)
				;{
				; if (recurseCount == 0) return;
				; int temp = 10;
				; globalIncrement(&temp);
				; test(recurseCount - 1);
				;}
				;

				@count = dso_local local_unnamed_addr global i32 0, align 4

				; Function Attrs: nofree noinline norecurse nounwind uwtable
				declare void @_Z15globalIncrementPKi(i32* nocapture readonly %param) #0

				; Test that TRE can be done for a tail-recursive function that calls
				; a function receiving a pointer to a local stack variable, as long
				; as that pointer is not captured.

				; Function Attrs: nounwind uwtable
				define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 {
				; CHECK-LABEL: @_Z4testi(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TEMP:%.*]] = alloca i32, align 4
				; CHECK-NEXT: br label [[TAILRECURSE:%.*]]
				; CHECK: tailrecurse:
				; CHECK-NEXT: [[RECURSECOUNT_TR:%.*]] = phi i32 [ [[RECURSECOUNT:%.*]], [[ENTRY:%.*]] ], [ [[SUB:%.*]], [[IF_END:%.*]] ]
				; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[RECURSECOUNT_TR]], 0
				; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END]]
				; CHECK: if.end:
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[TEMP]] to i8*
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP0]])
				; CHECK-NEXT: store i32 10, i32* [[TEMP]], align 4
				; CHECK-NEXT: call void @_Z15globalIncrementPKi(i32* nonnull [[TEMP]])
				; CHECK-NEXT: [[SUB]] = add nsw i32 [[RECURSECOUNT_TR]], -1
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP0]])
				; CHECK-NEXT: br label [[TAILRECURSE]]
				; CHECK: return:
				; CHECK-NEXT: ret void
				;
				entry:
				%temp = alloca i32, align 4
				%cmp = icmp eq i32 %recurseCount, 0
				br i1 %cmp, label %return, label %if.end

				if.end: ; preds = %entry
				%0 = bitcast i32* %temp to i8*
				call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6
				store i32 10, i32* %temp, align 4
				call void @_Z15globalIncrementPKi(i32* nonnull %temp)
				%sub = add nsw i32 %recurseCount, -1
				call void @_Z4testi(i32 %sub)
				call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6
				br label %return

				return: ; preds = %entry, %if.end
				ret void
				}

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

				attributes #0 = { nofree noinline norecurse nounwind uwtable }
				attributes #1 = { nounwind uwtable }
				attributes #2 = { argmemonly nounwind willreturn }
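For tre-noncapturing-alloca-calls.ll, a source-level C++ sketch of what the CHECK lines expect. `test` and `globalIncrement` are taken from the test's C++ source, and `test_tre` is a hand-written model of the loop TRE is expected to produce, not the pass's code. Reusing one `temp` slot on every iteration is only safe because `globalIncrement` does not keep the pointer (its parameter is `nocapture` in the IR). `__attribute__((noinline))` is the GCC/Clang spelling used in the original source.

```cpp
#include <cassert>

int count = 0;  // global from the test's C++ source

__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }

// Recursive form from the test's C++ source.
void test(int recurseCount) {
  if (recurseCount == 0) return;
  int temp = 10;
  globalIncrement(&temp);
  test(recurseCount - 1);
}

// Source-level model of the loop that TRE is expected to produce.
void test_tre(int recurseCount) {
  for (;;) {  // models the "tailrecurse" block
    if (recurseCount == 0) return;
    // One stack slot reused per iteration, bracketed by lifetime markers in
    // the IR; safe because globalIncrement does not capture &temp.
    int temp = 10;
    globalIncrement(&temp);
    recurseCount = recurseCount - 1;  // the recursive call became this update
  }
}
```

Both `test(5)` and `test_tre(5)` should add 10 to `count` five times, leaving it at 50 when it starts from zero.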

Diff 296732

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

llvm/test/Transforms/TailCallElim/basic.ll

llvm/test/Transforms/TailCallElim/tre-byval-parameter-2.ll

llvm/test/Transforms/TailCallElim/tre-byval-parameter.ll

llvm/test/Transforms/TailCallElim/tre-multiple-exits.ll

llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll
