This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
- llvm/test/Transforms/TailCallElim/basic.ll
- llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll

Differential D82085

[TRE] allow TRE for non-capturing calls.
Closed (Public)

Authored by avl on Jun 18 2020, 5:27 AM.

Details

Reviewers

efriedma
jdoerfert
fhahn

Commits

rGf7907e9d223d: [TRE] allow TRE for non-capturing calls.

Summary

The current implementation of Tail Recursion Elimination has a very restrictive
prerequisite: AllCallsAreTailCalls, i.e. it requires that no function
call receives a pointer to the local stack. In general, function calls that
receive a pointer to the local stack but do not capture it should not
break TRE. This fix allows TRE to be done if it is proved that no pointer
to the local stack escapes. To determine whether a pointer escapes,
it examines the called function.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

avl created this revision. Jun 18 2020, 5:27 AM

Herald added a project: Restricted Project. Jun 18 2020, 5:27 AM

Herald added subscribers: cfe-commits, kerbowa, dmgreen and 3 others.

Harbormaster failed remote builds in B60813: Diff 271678! Jun 18 2020, 5:58 AM

The markTails function sets the IsTailcall bit on calls which are not
last calls:

It's OK to set "tail" on any call that satisfies these requirements (from https://llvm.org/docs/LangRef.html#call-instruction): "Both markers [tail and musttail] imply that the callee does not access allocas from the caller. The tail marker additionally implies that the callee does not access varargs from the caller."

"tail" does not mean that the call *must* be generated as a tail call. It just means that it's safe to generate it as a tail call if it turns out to be possible (e.g. if the compiler can prove that @noarg doesn't return, or if it can prove that all the code after the call to @noarg has no effect, or so on).

So I don't think there is a bug here that needs to be fixed.

Yes, that is understood: "tail" does not mean that the call *must* be generated as a tail call.
I intended to make the picture consistent: when markTails sees the function body, it is evident that the first call is not a tail call. I agree that a compiler "can prove that @noarg doesn't return, or if it can prove that all the code after the call to @noarg has no effect", and then this first call could become a tail call. Probably the suitable strategy in that case would be to recalculate the marking when it is broken (instead of creating false-positive marks). There are other cases (e.g. the compiler adding function calls) that could break the existing marking, and then it would be necessary to recalculate it.

Having many false-positive "tail" marks could create a confusing picture.
Let me describe the full problem I am trying to solve
(I assumed this patch would be the first one for that problem):

cat test.cpp

int count;
__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }

void test(int recurseCount)
{
    if (recurseCount == 0) return;
    {
      int temp = 10;
      globalIncrement(&temp);
    }
    test(recurseCount - 1);
}

TRE is not done for that test case. There are two reasons for that:
First, AllocaDerivedValueTracker does not use the PointerMayBeCaptured interface, so it does not see that &temp does not escape.
Second, AllCallsAreTailCalls is used as a prerequisite for TRE,
so it requires that all calls be marked "tail". This looks too restrictive.
Instead, it should check that "&temp" does not escape in globalIncrement() and that "test"
is a tail-recursive call that does not use allocas. I think the confusion happened exactly because "tail" marking was done for all calls (not just the real tail calls).
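For reference, here is a hand-written C++ sketch of what TRE would effectively turn test() into, assuming globalIncrement() does not capture its argument. It is not output of the pass; test_recursive and test_loop are hypothetical names used only to compare the two shapes. In the loop form each iteration gets a fresh temp, which is only legal because no pointer to a previous iteration's temp can survive the call.

```cpp
#include <cassert>

int count = 0;

// Same callee as in test.cpp: it reads through the pointer but does not keep it.
__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }

// Original shape: the self call is the last thing the function does.
void test_recursive(int recurseCount) {
  if (recurseCount == 0) return;
  {
    int temp = 10;
    globalIncrement(&temp);
  }
  test_recursive(recurseCount - 1);
}

// Hypothetical post-TRE shape: the self call becomes a branch back to the
// loop header, and temp is a fresh object in every iteration.
void test_loop(int recurseCount) {
  while (recurseCount != 0) {
    int temp = 10;
    globalIncrement(&temp);
    recurseCount = recurseCount - 1;
  }
}
```

Both versions add 10 to count once per level of recursion; only the loop form runs in constant stack space.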

Thus I planned to make the following fixes:

1. Clean up "tail" marking (this patch).
2. Do not use "AllCallsAreTailCalls" as a prerequisite for TRE (this patch).
3. Use PointerMayBeCaptured inside AllocaDerivedValueTracker.

What do you think about all of this?

It seems to me that it would be good to have consistent "tail" marking.
But if that does not look like a good idea, I could continue with points 2 and 3.

Relevant llvm-dev thread: Noncapture use of locals disabling TailRecursionElimination

All "tail" needs to mean is "this call does not reference the current function's stack." That's all, no more or less.

The relevant documentation and APIs are a bit confusing. A "tail" marking is a prerequisite for tailcall optimization, but it's not really related to any of the other prerequisites for tailcall optimization. Dropping the "tail" marking just because the call isn't in a tail position would involve a bunch of work adding and removing "tail" markings, for no benefit.

It makes sense to teach tail recursion elimination not to depend so heavily on tail markings. But I don't think that implies we want to mess with the markings themselves.

I think the confusion happened exactly because "tail" marking was done for all calls (not just the real tail calls).

I don't think there's any confusion here. It's just using the tail markings as a conservative estimate of capturing because it has to compute them anyway. And TRE in general is simplistic code that nobody has spent much time looking at in a long time. There are very few TRE opportunities in C/C++ code anyway.

Ok. Then I need to drop the part of this patch related to stricter tail marking and continue with the part that changes the prerequisite for TRE.

deleted the code doing stricter tail-call marking.
kept the removal of "AllCallsAreTailCalls".
added a check for non-capturing calls while tracking allocas.
retitled the patch.

avl retitled this revision from "[TRE] markTails marks call sites as tailcalls though some of them are not." to "[TRE] allow TRE for non-capturing calls." Jun 22 2020, 3:58 AM

avl edited the summary of this revision.

avl added a subscriber: lxfind. Jun 22 2020, 4:01 AM

Harbormaster failed remote builds in B61205: Diff 272378! Jun 22 2020, 5:20 AM

efriedma added inline comments. Jun 22 2020, 1:00 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
145	Please don't add code to examine the callee; if we're not deducing nocapture appropriately, we should fix that elsewhere.
332	What is the new handling for lifetime.end/assume doing?
830	Do you have to redo the AllocaDerivedValueTracker analysis? Is it not enough that the call you're trying to TRE is marked "tail"?
857	I thought we had some tests where we TRE in the presence of recursive calls, like a simple recursive fibonacci. Am I misunderstanding this?

avl marked 4 inline comments as done. Jun 22 2020, 2:23 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
145	Ok.
332	They are just skipped. In the following test case:

  call void @_Z5test5i(i32 %sub)
  call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %1) #5
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5
  br label %return

they are generated between the call and the ret. It is safe to ignore them while checking whether the transformation is possible.
830	Do you have to redo the AllocaDerivedValueTracker analysis?

The AllocaDerivedValueTracker analysis (done in markTails) could be reused here. But the marking done in markTails() looks like a separate task, i.e. it is better to make TRE independent of markTails(). There is a review for this: https://reviews.llvm.org/D60031. Thus such a separation looks useful (computing the result in place instead of reusing the result of markTails).

Is it not enough that the call you're trying to TRE is marked "tail"?

It is not enough that the call which is subject to TRE is marked "tail". It should also be checked that other calls do not capture a pointer to the local stack:

  // do not do TRE if any pointer to local stack has escaped.
  if (!Tracker.EscapePoints.empty())
    return false;
857	Right, there is a testcase for fibonacci: llvm/test/Transforms/TailCallElim/accum_recursion.ll:@test3_fib. The areAllLastFuncCallsRecursive() check works well for the fibonacci testcase:

  return fib(x-1)+fib(x-2);

since the chain of last function calls is fib()->fib()->ret. That check should prevent cases such as:

  return fib(x-1)+another_call()+fib(x-2);
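As an illustration of the accumulator case, here is a hand-written C++ analogue (not the pass's output) of eliminating one of the two recursive calls in a fibonacci like @test3_fib: the remaining call stays recursive, and its result is folded into an accumulator. The function names are hypothetical, and which of the two calls the pass actually eliminates depends on the IR; this sketch assumes it is fib(x-2).

```cpp
#include <cassert>

// Plain doubly recursive fibonacci: fib(x-1) + fib(x-2).
int fib(int x) {
  if (x < 2) return x;
  return fib(x - 1) + fib(x - 2);
}

// Hand-written analogue of accumulator recursion elimination: the
// recursive call fib(x-2) becomes the next loop iteration, and the pending
// "+ fib(x-1)" is carried in an accumulator. The other call stays recursive.
int fib_accumulated(int x) {
  int acc = 0;  // identity value for +
  while (x >= 2) {
    acc += fib_accumulated(x - 1);
    x -= 2;
  }
  return acc + x;  // base-case value combined with the accumulator
}
```

The transformation only needs the combining operation (+ here) to be associative and commutative, so the intervening addition does not block it.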

efriedma added inline comments. Jun 22 2020, 2:48 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	It is not enough that the call which is subject to TRE is marked "tail". It should also be checked that other calls do not capture a pointer to the local stack:

If there's an escaped pointer to the local stack, we wouldn't infer "tail" in the first place, would we?
857	That check should prevent cases such as: return fib(x-1)+another_call()+fib(x-2);

Why do we need to prevent this?

laytonio added inline comments. Jun 22 2020, 3:05 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
828	There is no need to pass the function here since it's a member variable.

hiraditya added inline comments. Jun 22 2020, 3:52 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
806	can we use isa<> here?
821	`CI->getCalledFunction() != &F` seems cheaper than `canMoveAboveCall`
845	Do we need to visit all the instructions twice?

avl marked 3 inline comments as done. Jun 22 2020, 4:15 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
828	Ok.
830	If a function receives a pointer to an alloca, then it would not be marked with "Tail". Then we have no way to tell whether this function receives a pointer to an alloca but does not capture it:

  void test(int recurseCount) {
    if (recurseCount == 0) return;
    int temp = 10;
    globalIncrement(&temp);
    test(recurseCount - 1);
  }

The call to test() is marked with Tail; the call to globalIncrement() is not. But TRE could be done, since globalIncrement() does not capture the pointer. If it did capture the pointer, we could not do TRE. So we need to check !Tracker.EscapePoints.empty().
857	We do not. I misunderstood canTransformAccumulatorRecursion(). That check could be removed.

efriedma added inline comments. Jun 22 2020, 6:14 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	test - marked with Tail.

For the given code, TRE won't mark the recursive call "tail". That transform isn't legal: the recursive call could access the caller's version of "temp".

efriedma mentioned this in D82269: [TRE][NFC] Refactor Basic Block Processing. Jun 22 2020, 7:27 PM

avl marked an inline comment as done. Jun 23 2020, 12:54 AM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	For the given code, TRE won't mark the recursive call "tail". That transform isn't legal: the recursive call could access the caller's version of "temp".

It looks like the recursive call could NOT access the caller's version of "temp":

  test(recurseCount - 1);

The caller's version of temp is accessed by the non-recursive call:

  globalIncrement(&temp);

If globalIncrement does not capture "&temp", then TRE looks legal for that case. globalIncrement() would not be marked with "Tail"; test() would be marked with Tail. Thus the prerequisite for TRE would be: the tail-recursive call must not receive a pointer to the local stack (Tail), and non-recursive calls must not capture the pointer to the local stack.

efriedma added inline comments. Jun 23 2020, 11:30 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	Can you give a complete IR example where we infer "tail", but TRE is illegal?
Can you give a complete IR example where we don't infer "tail", but we still do the TRE transform here?

avl marked an inline comment as done. Jun 23 2020, 12:09 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

830

Can you give a complete IR example where we infer "tail", but TRE is illegal?

There is no such example. Currently, all cases where we infer "tail" would be valid for TRE.

Can you give a complete IR example where we don't infer "tail", but we still do the TRE transform here?

For the following example, the current code base would not infer "tail" for _Z15globalIncrementPKi and, as a result, would not do TRE for _Z4testi.
This patch changes this behavior: if _Z15globalIncrementPKi is not marked with "tail" but does not capture its pointer argument, TRE is allowed for _Z4testi.

@count = dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: nofree noinline norecurse nounwind uwtable
define dso_local void @_Z15globalIncrementPKi(i32* nocapture readonly %param) local_unnamed_addr #0 {
entry:
  %0 = load i32, i32* %param, align 4
  %1 = load i32, i32* @count, align 4
  %add = add nsw i32 %1, %0
  store i32 %add, i32* @count, align 4
  ret void
}

; Function Attrs: nounwind uwtable
define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 {
entry:
  %temp = alloca i32, align 4
  %cmp = icmp eq i32 %recurseCount, 0
  br i1 %cmp, label %return, label %if.end

if.end:                                           ; preds = %entry
  %0 = bitcast i32* %temp to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0)
  store i32 10, i32* %temp, align 4
  call void @_Z15globalIncrementPKi(i32* nonnull %temp)
  %sub = add nsw i32 %recurseCount, -1
  call void @_Z4testi(i32 %sub)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0)
  br label %return

return:                                           ; preds = %entry, %if.end
  ret void
}

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

; Function Attrs: argmemonly nounwind willreturn
declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

attributes #0 = { nofree noinline norecurse nounwind uwtable }
attributes #1 = { nounwind uwtable }
attributes #2 = { argmemonly nounwind willreturn }

efriedma added inline comments. Jun 23 2020, 1:01 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	In your example, we don't infer "tail" for globalIncrement... but we do infer it for the recursive call to test(). I'm suggesting you could just check for "tail" on test(), instead of using AllocaDerivedValueTracker.

avl marked an inline comment as done. Jun 23 2020, 1:48 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	Checking only test() is not enough; globalIncrement() must also be checked. It is correct that when checking test(), the "Tail" flag can be used, which does not require AllocaDerivedValueTracker. But when checking globalIncrement(), we need to check whether some alloca value escaped, and the "Tail" flag cannot be used for that. AllocaDerivedValueTracker allows such a check: Tracker.EscapePoints.empty(). If we did not check globalIncrement, doing TRE would not be valid. Thus it seems we need to check globalIncrement for an escaping pointer, and we need AllocaDerivedValueTracker for that.

addressed comments:

removed PointerMayBeCaptured() used for CalledFunction.
rewrote CanTRE() to visiting instructions only once.
replaced areAllLastFuncCallsRecursive() with isInTREPosition().

I have not addressed the request to stop using AllocaDerivedValueTracker yet,
since there is an open question about it. I will address it as soon as the question is resolved.

efriedma added inline comments. Jun 23 2020, 2:41 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	Checking only test() is not enough; globalIncrement() must also be checked.

Can you give a complete IR example where we infer "tail", but TRE is illegal?
There is no such example. Currently, all cases where we infer "tail" would be valid for TRE.

I'm not sure how to reconcile this. Are you saying we could infer "tail" in some case where TRE is illegal, but don't right now? Or are you saying that you plan to extend TRE to handle cases where we can't infer "tail" on the recursive call?

avl marked an inline comment as done. Jun 23 2020, 3:16 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	Or are you saying that you plan to extend TRE to handle cases where we can't infer "tail" on the recursive call?

Not exactly. A more precise statement would probably be: "I am saying that I plan to extend TRE to handle cases where we can't infer "tail" on the NON-recursive NON-last call". globalIncrement() is the non-recursive non-last call in the above example. But we need to check whether it captures its argument pointer to decide whether it is OK to do TRE. To make things clear, instead of the current prerequisite for TRE, "All call sites are marked with Tail", I am suggesting the following: "Recursive last calls are marked with "Tail", and non-recursive non-last calls are proved not to capture allocas". For the above example this means: the requirement for test() stays the same (it should be marked with Tail), and the requirement for globalIncrement() becomes "does not capture the alloca".
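A toy model of the prerequisite proposed here may make the two conditions easier to compare. CallSiteInfo and canEliminate are hypothetical names, not LLVM APIs; each field stands in for an analysis result (tail position, "tail" marker, capture information) that the real pass would have to compute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one call site; not LLVM's CallInst API.
struct CallSiteInfo {
  bool isRecursive;       // calls the function being optimized
  bool inTailPosition;    // only ret (or ignorable instructions) follows it
  bool markedTail;        // carries the "tail" marker
  bool receivesAlloca;    // an argument points into the caller's allocas
  bool mayCaptureAlloca;  // the callee might keep that pointer
};

// Toy version of the proposed rule:
//  * the recursive call being eliminated is in tail position and is "tail";
//  * no other call lets a pointer to the local stack escape.
bool canEliminate(const std::vector<CallSiteInfo>& calls, std::size_t candidate) {
  const CallSiteInfo& c = calls[candidate];
  if (!c.isRecursive || !c.inTailPosition || !c.markedTail) return false;
  for (std::size_t i = 0; i < calls.size(); ++i) {
    if (i == candidate) continue;
    if (calls[i].receivesAlloca && calls[i].mayCaptureAlloca) return false;
  }
  return true;
}
```

For the test() example, globalIncrement() would be an entry with receivesAlloca set and mayCaptureAlloca clear, so the recursive call stays eligible.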

efriedma added inline comments. Jun 23 2020, 3:30 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
830	"Recursive last calls are marked with 'tail'" implies "non-recursive non-last calls are proved to not capture alloca".
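The reasoning behind this implication can be sketched with a deliberately simplified, single-block model of how markTails() infers "tail": once an alloca-derived pointer may have been captured, no later call is marked, so an inferred "tail" on the final recursive call rules out an earlier capture. ToyCall and inferTail are hypothetical names; the real pass propagates the escaped state across the CFG and leaves calls that already carry "tail" untouched, which is the situation in basic.ll:@test1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical straight-line model of one block's calls; not LLVM APIs.
struct ToyCall {
  bool usesAlloca;      // passes an alloca-derived pointer
  bool capturesAlloca;  // that pointer may be captured (an escape point)
  bool tail;            // output: whether "tail" gets inferred
};

void inferTail(std::vector<ToyCall>& calls) {
  bool escaped = false;
  for (ToyCall& c : calls) {
    if (c.usesAlloca && c.capturesAlloca) escaped = true;
    // "tail" is inferred only if the call does not use an alloca-derived
    // value and no alloca pointer can have escaped before it.
    c.tail = !c.usesAlloca && !escaped;
  }
}
```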

Harbormaster completed remote builds in B61457: Diff 272829. Jun 23 2020, 3:39 PM

avl marked an inline comment as done. Jun 24 2020, 7:45 AM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

830

I see, thank you for the explanations.
There is a test which makes me think that the above rule is not always correct:

Transforms/TailCallElim/basic.ll:@test1

; PR615. Make sure that we do not move the alloca so that it interferes with the tail call.
define i32 @test1() {
; CHECK: i32 @test1()
; CHECK-NEXT: alloca
        %A = alloca i32         ; <i32*> [#uses=2]
        store i32 5, i32* %A
        call void @use(i32* %A)
; CHECK: tail call i32 @test1
        %X = tail call i32 @test1()             ; <i32> [#uses=1]
        ret i32 %X
}

I removed the uses of AllocaDerivedValueTracker and corrected test1 in Transforms/TailCallElim/basic.ll.

removed usages of AllocaDerivedValueTracker from canTRE().

laytonio added inline comments. Jun 24 2020, 9:06 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
831	You can use `for (Instruction &I : instructions(F))` here.
845	Is this correct? I think we want to check these per TRE candidate in findTRECandidate, not just disable TRE in general if one is found.

Harbormaster completed remote builds in B61550: Diff 273029. Jun 24 2020, 9:43 AM

avl marked an inline comment as done. Jun 24 2020, 2:29 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
845	I tried to minimize changes and keep old logic here - but yes, it is better to move that check into findTRECandidate(). Will do.

moved the check for valid TRE candidates into findTRECandidate().

Harbormaster completed remote builds in B61634: Diff 273174. Jun 24 2020, 5:24 PM

laytonio added inline comments. Jun 25 2020, 5:34 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
843	Is there any reason to find and validate candidates now only to have to redo it when we actually perform the eliminations? If so, is there any reason this needs to follow a different code path than findTRECandidate? findTRECandidate is doing the same checks, except for canMoveAboveCall and canTransformAccumulatorRecursion, which should probably be refactored into findTRECandidate from eliminateCall anyway. If not then all of this code goes away and we're left with the same canTRE as in trunk.

avl marked an inline comment as done. Jun 25 2020, 8:32 AM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
843	We are enumerating all instructions here, so we could detect that there are no TRE candidates and stop earlier. That is the reason for doing it here. I agree that findTRECandidate should be refactored to have the same checks as here. What do you think is better: leave the early check for TRE candidates in canTRE, remove it and refactor findTRECandidate, or leave it as is?

laytonio added inline comments. Jun 25 2020, 9:00 AM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
843	Yes we are iterating all the instructions here but, unless I am missing something, we would literally just be doing the checks twice for no reason. Look at it this way: best case scenario, we have to check all possible candidates once, find none and we're done. Worst case, we check all possible candidates once, find one and have to check all possible candidates a second time. Whereas if we remove the early checks, we only ever have to check the candidates once. So we wouldn't really be stopping any earlier. As for refactoring findTRECandidate, I do think that should be done, and we should strive to move all the failure conditions out of eliminateCall in order to avoid having to fold a return only to find out we didn't need to. But I think that is out of the scope of this change, and if we do decide to keep the early checks here, then we should say that findTRECandidate does a good enough job to consider this function as having valid candidates.

avl marked an inline comment as done. Jun 25 2020, 10:22 AM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
843	Yes we are iterating all the instructions here but, unless I am missing something, we would literally just be doing the checks twice for no reason. Look at it this way: best case scenario, we have to check all possible candidates once, find none and we're done. Worst case, we check all possible candidates once, find one and have to check all possible candidates a second time. Whereas if we remove the early checks, we only ever have to check the candidates once. So we wouldn't really be stopping any earlier.

Yes, we would do the check twice if there are TRE candidates. My idea was that cases where TRE is applicable are less common than cases where it is not, so we would walk the instructions twice more often than we would check candidates twice. But I did not measure the real impact. So let's return to the old logic here, as you suggested.

removed early check for TRE candidates from canTRE().

Harbormaster failed remote builds in B61788: Diff 273457! Jun 25 2020, 12:29 PM

ping.

rebased.

Harbormaster failed remote builds in B62690: Diff 275123! Jul 2 2020, 9:42 AM

@efriedma What do you think about the current state of this patch? Is it OK?

efriedma added inline comments. Jul 7 2020, 3:09 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
95	If we're not going to try to do TRE at all on calls not marked "tail", we can probably drop this check.
332	It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.) I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? Since this is sort of tricky, I'd prefer to split this off into a followup.
811–812	Can you move this FIXME into a more appropriate spot?

avl marked 3 inline comments as done. Jul 8 2020, 8:59 AM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
95	It looks to me that the original idea (PR962) was to avoid the inefficient code that is generated for dynamic allocas. Currently, inefficient code would still be generated: doing TRE with a dynamic alloca requires correct stack adjustment to avoid extra stack usage, i.e. the dynamic stack reservation done for the alloca should be restored at the end of the current iteration. The current TRE implementation does not do this.

Please consider the test case:

  #include <alloca.h>
  int count;
  __attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }
  void test(int recurseCount) {
    if (recurseCount == 0) return;
    {
      int *temp = (int*)alloca(100);
      globalIncrement(temp);
    }
    test(recurseCount - 1);
  }

Following is the x86 asm generated for the above test case, assuming that dynamic allocas are allowed:

  .LBB1_2:
    movq %rsp, %rdi
    addq $-112, %rdi   <<<<<<<<<<<<<< dynamic stack reservation, needs to be restored before "jne .LBB1_2"
    movq %rdi, %rsp
    callq _Z15globalIncrementPKi
    addl $-1, %ebx
    jne .LBB1_2

So it looks like we would still generate inefficient code here, and that was the reason for avoiding TRE.
332	It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.)

OK, I will add a check that the pointer argument of lifetime.end points to an alloca.

I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? Since this is sort of tricky, I'd prefer to split this off into a followup.

OK, I will split Intrinsic::assume into another review.
811–812	OK.
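The agreed handling of instructions between the recursive call and the ret can be sketched with a toy model: lifetime.end is skipped only when its pointer argument is an alloca, and llvm.assume is not skipped at all, since that case is being split off into a follow-up. Kind, ToyInst and onlyIgnorableAfterCall are hypothetical names, not LLVM's instruction classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical kinds of instruction that can sit between the recursive
// call and the ret; not LLVM's opcode or intrinsic enums.
enum class Kind { LifetimeEnd, Assume, Other };

struct ToyInst {
  Kind kind;
  bool operandIsAlloca;  // for LifetimeEnd: the pointer argument is an alloca
};

// Returns true if everything after the candidate call can be ignored when
// deciding whether the call is in a TRE position.
bool onlyIgnorableAfterCall(const std::vector<ToyInst>& after) {
  for (const ToyInst& i : after) {
    // lifetime.end on an alloca cannot be something the call refers to.
    if (i.kind == Kind::LifetimeEnd && i.operandIsAlloca) continue;
    // Anything else (including assume, for now) blocks the transform.
    return false;
  }
  return true;
}
```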

efriedma added inline comments. Jul 8 2020, 12:08 PM

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
95	I guess we can leave this for a later patch. This isn't really any worse than the stack usage before TRE, assuming we can't emit a sibling call in the backend. And we could avoid this by making TRE insert stacksave/stackrestore intrinsics. But better to do one thing at a time.

addressed comments.

I think I'd like to see a testcase where there are multiple recursive calls, but only one is a tail call that can be eliminated.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
474	The hasOperandBundles() check looks completely new; is there some test for it? The `isNoTailCall()` check is currently redundant; it isn't legal to write "tail notail". I guess it makes sense to guard against that, though.
llvm/test/Transforms/TailCallElim/basic.ll
23	I'm not sure this is testing what it was originally supposed to. I guess that's okay, but please fix the comment at least.
llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll
20	For the purpose of this testcase, we don't need the definition of _Z15globalIncrementPKi.
37	I think I'd prefer to just generate this with update_test_checks.py

Harbormaster completed remote builds in B63513: Diff 276591. Jul 8 2020, 4:38 PM

avl marked an inline comment as done. Jul 8 2020, 4:43 PM

avl added inline comments.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
474	The hasOperandBundles() check looks completely new; is there some test for it?

It is not new; it is copied from line 245. Now that the patch has changed from its original state, all of the above conditions could be reduced to just: if (!CI->isTailCall()). The test is Transforms/TailCallElim/deopt-bundle.ll.

The isNoTailCall() check is currently redundant; it isn't legal to write "tail notail". I guess it makes sense to guard against that, though.

I will add a check for that.

addressed comments:

added test for multiple recursive calls,
removed duplicated check for operand bundles,
simplified and commented tests.

Harbormaster failed remote builds in B63622: Diff 276799! Jul 9 2020, 12:21 PM

LGTM

This revision is now accepted and ready to land. Jul 9 2020, 4:15 PM

Thank you for the review.

Closed by commit rGf7907e9d223d: [TRE] allow TRE for non-capturing calls. (authored by avl). Jul 11 2020, 4:03 AM

This revision was automatically updated to reflect the committed changes.

Hello. I have an auto-bisecting multi-stage bot that is failing on two after this change. Can we please revert this or commit a quick fix?

FAIL: Clang :: CXX/class/class.compare/class.spaceship/p1.cpp (6232 of 64222)
******************** TEST 'Clang :: CXX/class/class.compare/class.spaceship/p1.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions
--
Exit Code: 134

Command Output (stderr):
--
clang: /home/dave/s/lp/clang/lib/Basic/SourceManager.cpp:917: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions
1.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:127:38: current parser token ','
2.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:39:1: parsing namespace 'Deletedness'
3.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:123:12: parsing function body 'Deletedness::g'
4.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp:123:12: in compound statement ('{}')
 #0 0x000000000359273f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tmp/_update_lc/t/bin/clang+0x359273f)
 #1 0x0000000003590912 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/bin/clang+0x3590912)
 #2 0x0000000003592bb5 SignalHandler(int) (/tmp/_update_lc/t/bin/clang+0x3592bb5)
 #3 0x00007ffff7fa6a90 __restore_rt (/lib64/libpthread.so.0+0x14a90)
 #4 0x00007ffff7b3da25 raise (/lib64/libc.so.6+0x3ca25)
 #5 0x00007ffff7b26895 abort (/lib64/libc.so.6+0x25895)
 #6 0x00007ffff7b26769 _nl_load_domain.cold (/lib64/libc.so.6+0x25769)
 #7 0x00007ffff7b35e86 (/lib64/libc.so.6+0x34e86)
 #8 0x000000000375636c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/bin/clang+0x375636c)
 #9 0x0000000003ee0bbb clang::VerifyDiagnosticConsumer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/tmp/_update_lc/t/bin/clang+0x3ee0bbb)
#10 0x00000000037501ab clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/tmp/_update_lc/t/bin/clang+0x37501ab)
#11 0x0000000003749fca clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/tmp/_update_lc/t/bin/clang+0x3749fca)
#12 0x0000000004df0c60 clang::Sema::EmitCurrentDiagnostic(unsigned int) (/tmp/_update_lc/t/bin/clang+0x4df0c60)
#13 0x0000000005092783 (anonymous namespace)::DefaultedComparisonAnalyzer::visitBinaryOperator(clang::OverloadedOperatorKind, llvm::ArrayRef<clang::Expr*>, (anonymous namespace)::DefaultedComparisonSubobject, clang::OverloadCandidateSet*) (/tmp/_update_lc/t/bin/clang+0x5092783)
#14 0x0000000005091dba (anonymous namespace)::DefaultedComparisonAnalyzer::visitExpandedSubobject(clang::QualType, (anonymous namespace)::DefaultedComparisonSubobject) (/tmp/_update_lc/t/bin/clang+0x5091dba)
#15 0x0000000005091b86 (anonymous namespace)::DefaultedComparisonVisitor<(anonymous namespace)::DefaultedComparisonAnalyzer, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonSubobject>::visitSubobjects((anonymous namespace)::DefaultedComparisonInfo&, clang::CXXRecordDecl*, clang::Qualifiers) (/tmp/_update_lc/t/bin/clang+0x5091b86)
#16 0x0000000005058c8c (anonymous namespace)::DefaultedComparisonAnalyzer::visit() (/tmp/_update_lc/t/bin/clang+0x5058c8c)
#17 0x000000000505ab22 clang::Sema::DiagnoseDeletedDefaultedFunction(clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x505ab22)
#18 0x00000000053e60ed clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x53e60ed)
#19 0x000000000514270a BuildOverloadedBinOp(clang::Sema&, clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x514270a)
#20 0x00000000050fbf49 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x50fbf49)
#21 0x0000000004d52ccc clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult<clang::Expr*, true>, clang::prec::Level) (/tmp/_update_lc/t/bin/clang+0x4d52ccc)
#22 0x0000000004d51be9 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51be9)
#23 0x0000000004d60dba clang::Parser::ParseExpressionList(llvm::SmallVectorImpl<clang::Expr*>&, llvm::SmallVectorImpl<clang::SourceLocation>&, llvm::function_ref<void ()>) (/tmp/_update_lc/t/bin/clang+0x4d60dba)
#24 0x0000000004d542d9 clang::Parser::ParsePostfixExpressionSuffix(clang::ActionResult<clang::Expr*, true>) (/tmp/_update_lc/t/bin/clang+0x4d542d9)
#25 0x0000000004d55b95 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) (/tmp/_update_lc/t/bin/clang+0x4d55b95)
#26 0x0000000004d51b89 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51b89)
#27 0x0000000004d51ac9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51ac9)
#28 0x0000000004d78368 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/tmp/_update_lc/t/bin/clang+0x4d78368)
#29 0x0000000004d76ba0 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) (/tmp/_update_lc/t/bin/clang+0x4d76ba0)
#30 0x0000000004d76614 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d76614)
#31 0x0000000004d7ecd2 clang::Parser::ParseCompoundStatementBody(bool) (/tmp/_update_lc/t/bin/clang+0x4d7ecd2)
#32 0x0000000004d7fcd0 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/tmp/_update_lc/t/bin/clang+0x4d7fcd0)
#33 0x0000000004cfacc0 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/tmp/_update_lc/t/bin/clang+0x4cfacc0)
#34 0x0000000004d28f2d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/tmp/_update_lc/t/bin/clang+0x4d28f2d)
#35 0x0000000004cf9f32 clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9f32)
#36 0x0000000004cf9938 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9938)
#37 0x0000000004cf86fc clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf86fc)
#38 0x0000000004d02c15 clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) (/tmp/_update_lc/t/bin/clang+0x4d02c15)
#39 0x0000000004d0251a clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) (/tmp/_update_lc/t/bin/clang+0x4d0251a)
#40 0x0000000004d22f0a clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d22f0a)
#41 0x0000000004cf7e39 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf7e39)
#42 0x0000000004cf6858 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) (/tmp/_update_lc/t/bin/clang+0x4cf6858)
#43 0x0000000004cf16ed clang::ParseAST(clang::Sema&, bool, bool) (/tmp/_update_lc/t/bin/clang+0x4cf16ed)
#44 0x0000000003e3eb21 clang::FrontendAction::Execute() (/tmp/_update_lc/t/bin/clang+0x3e3eb21)
#45 0x0000000003dba0e3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tmp/_update_lc/t/bin/clang+0x3dba0e3)
#46 0x0000000003ee796b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tmp/_update_lc/t/bin/clang+0x3ee796b)
#47 0x0000000002244636 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/tmp/_update_lc/t/bin/clang+0x2244636)
#48 0x000000000224297d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/tmp/_update_lc/t/bin/clang+0x224297d)
#49 0x0000000002242619 main (/tmp/_update_lc/t/bin/clang+0x2242619)
#50 0x00007ffff7b28042 __libc_start_main (/lib64/libc.so.6+0x27042)
#51 0x000000000223f8ce _start (/tmp/_update_lc/t/bin/clang+0x223f8ce)
/tmp/_update_lc/t/tools/clang/test/CXX/class/class.compare/class.spaceship/Output/p1.cpp.script: line 1: 4146089 Aborted                 /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.spaceship/p1.cpp -fcxx-exceptions

--

********************
Testing:  0..
FAIL: Clang :: CXX/class/class.compare/class.eq/p2.cpp (6242 of 64222)
******************** TEST 'Clang :: CXX/class/class.compare/class.eq/p2.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp
--
Exit Code: 134

Command Output (stderr):
--
clang: /home/dave/s/lp/clang/lib/Basic/SourceManager.cpp:917: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp
1.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:47:30: current parser token ')'
2.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:30:13: parsing function body 'test'
3.  /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp:30:13: in compound statement ('{}')
 #0 0x000000000359273f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/tmp/_update_lc/t/bin/clang+0x359273f)
 #1 0x0000000003590912 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/bin/clang+0x3590912)
 #2 0x0000000003592bb5 SignalHandler(int) (/tmp/_update_lc/t/bin/clang+0x3592bb5)
 #3 0x00007ffff7fa6a90 __restore_rt (/lib64/libpthread.so.0+0x14a90)
 #4 0x00007ffff7b3da25 raise (/lib64/libc.so.6+0x3ca25)
 #5 0x00007ffff7b26895 abort (/lib64/libc.so.6+0x25895)
 #6 0x00007ffff7b26769 _nl_load_domain.cold (/lib64/libc.so.6+0x25769)
 #7 0x00007ffff7b35e86 (/lib64/libc.so.6+0x34e86)
 #8 0x000000000375636c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/bin/clang+0x375636c)
 #9 0x0000000003ee0bbb clang::VerifyDiagnosticConsumer::HandleDiagnostic(clang::DiagnosticsEngine::Level, clang::Diagnostic const&) (/tmp/_update_lc/t/bin/clang+0x3ee0bbb)
#10 0x00000000037501ab clang::DiagnosticIDs::ProcessDiag(clang::DiagnosticsEngine&) const (/tmp/_update_lc/t/bin/clang+0x37501ab)
#11 0x0000000003749fca clang::DiagnosticsEngine::EmitCurrentDiagnostic(bool) (/tmp/_update_lc/t/bin/clang+0x3749fca)
#12 0x0000000004df0c60 clang::Sema::EmitCurrentDiagnostic(unsigned int) (/tmp/_update_lc/t/bin/clang+0x4df0c60)
#13 0x00000000050928b7 (anonymous namespace)::DefaultedComparisonAnalyzer::visitBinaryOperator(clang::OverloadedOperatorKind, llvm::ArrayRef<clang::Expr*>, (anonymous namespace)::DefaultedComparisonSubobject, clang::OverloadCandidateSet*) (/tmp/_update_lc/t/bin/clang+0x50928b7)
#14 0x0000000005091dba (anonymous namespace)::DefaultedComparisonAnalyzer::visitExpandedSubobject(clang::QualType, (anonymous namespace)::DefaultedComparisonSubobject) (/tmp/_update_lc/t/bin/clang+0x5091dba)
#15 0x0000000005091b86 (anonymous namespace)::DefaultedComparisonVisitor<(anonymous namespace)::DefaultedComparisonAnalyzer, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonInfo, (anonymous namespace)::DefaultedComparisonSubobject>::visitSubobjects((anonymous namespace)::DefaultedComparisonInfo&, clang::CXXRecordDecl*, clang::Qualifiers) (/tmp/_update_lc/t/bin/clang+0x5091b86)
#16 0x0000000005058c8c (anonymous namespace)::DefaultedComparisonAnalyzer::visit() (/tmp/_update_lc/t/bin/clang+0x5058c8c)
#17 0x000000000505ab22 clang::Sema::DiagnoseDeletedDefaultedFunction(clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x505ab22)
#18 0x00000000053e60ed clang::Sema::CreateOverloadedBinOp(clang::SourceLocation, clang::BinaryOperatorKind, clang::UnresolvedSetImpl const&, clang::Expr*, clang::Expr*, bool, bool, clang::FunctionDecl*) (/tmp/_update_lc/t/bin/clang+0x53e60ed)
#19 0x000000000514270a BuildOverloadedBinOp(clang::Sema&, clang::Scope*, clang::SourceLocation, clang::BinaryOperatorKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x514270a)
#20 0x00000000050fbf49 clang::Sema::ActOnBinOp(clang::Scope*, clang::SourceLocation, clang::tok::TokenKind, clang::Expr*, clang::Expr*) (/tmp/_update_lc/t/bin/clang+0x50fbf49)
#21 0x0000000004d52ccc clang::Parser::ParseRHSOfBinaryExpression(clang::ActionResult<clang::Expr*, true>, clang::prec::Level) (/tmp/_update_lc/t/bin/clang+0x4d52ccc)
#22 0x0000000004d51be9 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51be9)
#23 0x0000000004d60dba clang::Parser::ParseExpressionList(llvm::SmallVectorImpl<clang::Expr*>&, llvm::SmallVectorImpl<clang::SourceLocation>&, llvm::function_ref<void ()>) (/tmp/_update_lc/t/bin/clang+0x4d60dba)
#24 0x0000000004d4b29c clang::Parser::ParseCXXTypeConstructExpression(clang::DeclSpec const&) (/tmp/_update_lc/t/bin/clang+0x4d4b29c)
#25 0x0000000004d57617 clang::Parser::ParseCastExpression(clang::Parser::CastParseKind, bool, bool&, clang::Parser::TypeCastState, bool, bool*) (/tmp/_update_lc/t/bin/clang+0x4d57617)
#26 0x0000000004d51b89 clang::Parser::ParseAssignmentExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51b89)
#27 0x0000000004d51ac9 clang::Parser::ParseExpression(clang::Parser::TypeCastState) (/tmp/_update_lc/t/bin/clang+0x4d51ac9)
#28 0x0000000004d78368 clang::Parser::ParseExprStatement(clang::Parser::ParsedStmtContext) (/tmp/_update_lc/t/bin/clang+0x4d78368)
#29 0x0000000004d76ba0 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) (/tmp/_update_lc/t/bin/clang+0x4d76ba0)
#30 0x0000000004d76614 clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) (/tmp/_update_lc/t/bin/clang+0x4d76614)
#31 0x0000000004d7ecd2 clang::Parser::ParseCompoundStatementBody(bool) (/tmp/_update_lc/t/bin/clang+0x4d7ecd2)
#32 0x0000000004d7fcd0 clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) (/tmp/_update_lc/t/bin/clang+0x4d7fcd0)
#33 0x0000000004cfacc0 clang::Parser::ParseFunctionDefinition(clang::ParsingDeclarator&, clang::Parser::ParsedTemplateInfo const&, clang::Parser::LateParsedAttrList*) (/tmp/_update_lc/t/bin/clang+0x4cfacc0)
#34 0x0000000004d28f2d clang::Parser::ParseDeclGroup(clang::ParsingDeclSpec&, clang::DeclaratorContext, clang::SourceLocation*, clang::Parser::ForRangeInit*) (/tmp/_update_lc/t/bin/clang+0x4d28f2d)
#35 0x0000000004cf9f32 clang::Parser::ParseDeclOrFunctionDefInternal(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec&, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9f32)
#36 0x0000000004cf9938 clang::Parser::ParseDeclarationOrFunctionDefinition(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*, clang::AccessSpecifier) (/tmp/_update_lc/t/bin/clang+0x4cf9938)
#37 0x0000000004cf86fc clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (/tmp/_update_lc/t/bin/clang+0x4cf86fc)
#38 0x0000000004cf6858 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) (/tmp/_update_lc/t/bin/clang+0x4cf6858)
#39 0x0000000004cf16ed clang::ParseAST(clang::Sema&, bool, bool) (/tmp/_update_lc/t/bin/clang+0x4cf16ed)
#40 0x0000000003e3eb21 clang::FrontendAction::Execute() (/tmp/_update_lc/t/bin/clang+0x3e3eb21)
#41 0x0000000003dba0e3 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/tmp/_update_lc/t/bin/clang+0x3dba0e3)
#42 0x0000000003ee796b clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/tmp/_update_lc/t/bin/clang+0x3ee796b)
#43 0x0000000002244636 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/tmp/_update_lc/t/bin/clang+0x2244636)
#44 0x000000000224297d ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/tmp/_update_lc/t/bin/clang+0x224297d)
#45 0x0000000002242619 main (/tmp/_update_lc/t/bin/clang+0x2242619)
#46 0x00007ffff7b28042 __libc_start_main (/lib64/libc.so.6+0x27042)
#47 0x000000000223f8ce _start (/tmp/_update_lc/t/bin/clang+0x223f8ce)
/tmp/_update_lc/t/tools/clang/test/CXX/class/class.compare/class.eq/Output/p2.cpp.script: line 1: 4146047 Aborted                 /tmp/_update_lc/t/bin/clang -cc1 -internal-isystem /tmp/_update_lc/t/lib/clang/11.0.0/include -nostdsysteminc -std=c++2a -verify /home/dave/s/lp/clang/test/CXX/class/class.compare/class.eq/p2.cpp

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..
********************
Failed Tests (2):
  Clang :: CXX/class/class.compare/class.eq/p2.cpp
  Clang :: CXX/class/class.compare/class.spaceship/p1.cpp


Testing Time: 117.51s
  Unsupported      : 12906
  Passed           : 51214
  Expectedly Failed:   100
  Failed           :     2
FAILED: CMakeFiles/check-all
cd /tmp/_update_lc/t && /usr/bin/python3.8 /tmp/_update_lc/t/./bin/llvm-lit -sv --param USE_Z3_SOLVER=0 /tmp/_update_lc/t/tools/clang/test /tmp/_update_lc/t/tools/lld/test /tmp/_update_lc/t/tools/lldb/test /tmp/_update_lc/t/utils/lit /tmp/_update_lc/t/test
ninja: build stopped: subcommand failed.
+ do_error 'FAILURE -- STAGE TWO BUILD of LLVM' 12
+ echo FAILURE -- STAGE TWO BUILD of LLVM
FAILURE -- STAGE TWO BUILD of LLVM
+ exit 12

avl mentioned this in D85614: [TRE] Reland: allow TRE for non-capturing calls. Aug 9 2020, 9:57 AM

avl mentioned this in rG10c2e261598a: [TRE] Reland: allow TRE for non-capturing calls. May 25 2021, 1:37 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp	114 lines
llvm/test/Transforms/TailCallElim/basic.ll	4 lines
llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll	69 lines

Diff 276591

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
		#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "tailcallelim"		#define DEBUG_TYPE "tailcallelim"

STATISTIC(NumEliminated, "Number of tail calls removed");		STATISTIC(NumEliminated, "Number of tail calls removed");
STATISTIC(NumRetDuped, "Number of return duplicated");		STATISTIC(NumRetDuped, "Number of return duplicated");
STATISTIC(NumAccumAdded, "Number of accumulators introduced");		STATISTIC(NumAccumAdded, "Number of accumulators introduced");

/// Scan the specified function for alloca instructions.		/// Scan the specified function for alloca instructions.
/// If it contains any dynamic allocas, returns false.		/// If it contains any dynamic allocas, returns false.
static bool canTRE(Function &F) {		static bool canTRE(Function &F) {
		efriedma: If we're not going to try to do TRE at all on calls not marked "tail", we can probably drop this check.
		avl: It looks to me that the original idea (PR962) was to avoid the inefficient code generated for dynamic allocas. That inefficient code would still be generated today: doing TRE with dynamic allocas requires correct stack adjustment to avoid extra stack usage, i.e. the dynamic stack reservation made for the alloca should be released at the end of the current iteration. The current TRE implementation does not do this. Please consider the test case:

    #include <alloca.h>
    int count;
    __attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }
    void test(int recurseCount) {
      if (recurseCount == 0) return;
      {
        int *temp = (int*)alloca(100);
        globalIncrement(temp);
      }
      test(recurseCount - 1);
    }

The following is the x86 asm generated for the above test case, assuming dynamic allocas were allowed for TRE:

    .LBB1_2:
        movq %rsp, %rdi
        addq $-112, %rdi   <<<<<<<<<<<<<< dynamic stack reservation, needs to be restored before "jne .LBB1_2"
        movq %rdi, %rsp
        callq _Z15globalIncrementPKi
        addl $-1, %ebx
        jne .LBB1_2

So it looks like we still get inefficient code here, and that was the reason for avoiding TRE.
		efriedma: I guess we can leave this for a later patch. This isn't really any worse than the stack usage before TRE, assuming we can't emit a sibling call in the backend. And we could avoid this by making TRE insert stacksave/stackrestore intrinsics. But better to do one thing at a time.
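A toy model can make the stack-growth concern concrete. This is not LLVM code: `ToyStack` and `peakAfterTRE` are hypothetical names, and the 112-byte figure only mirrors the `addq $-112` in the asm above. After TRE, every loop iteration reserves the alloca's bytes again; unless the stack pointer is saved at the loop header and restored before the back-edge (which is roughly what the stacksave/stackrestore suggestion amounts to), the reservations accumulate.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a stack: sp counts the bytes currently reserved.
struct ToyStack {
  std::size_t sp = 0;    // bytes currently reserved
  std::size_t peak = 0;  // high-water mark
  void reserve(std::size_t n) {  // models a dynamic alloca
    sp += n;
    if (sp > peak) peak = sp;
  }
  std::size_t save() const { return sp; }          // models stacksave
  void restore(std::size_t saved) { sp = saved; }  // models stackrestore
};

// The loop TRE would produce for a function doing one dynamic alloca per
// recursion level. Returns the peak stack usage in bytes.
std::size_t peakAfterTRE(int recurseCount, std::size_t allocaBytes,
                         bool useSaveRestore) {
  ToyStack stack;
  while (recurseCount != 0) {       // loop header created by TRE
    std::size_t saved = stack.save();
    stack.reserve(allocaBytes);     // the alloca inside the loop body
    // ... globalIncrement(temp) would run here ...
    if (useSaveRestore)
      stack.restore(saved);         // release the reservation before the back-edge
    --recurseCount;
  }
  return stack.peak;
}
```

In this model, `peakAfterTRE(10, 112, false)` peaks at 1120 bytes, roughly the alloca portion of what ten levels of un-transformed recursion would use anyway, while the save/restore variant stays at 112.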
// Because of PR962, we don't TRE dynamic allocas.		// TODO: We don't do TRE if dynamic allocas are used.
		// Dynamic allocas allocate stack space which should be
		// deallocated before new iteration started. That is
		// currently not implemented.
return llvm::all_of(instructions(F), [](Instruction &I) {		return llvm::all_of(instructions(F), [](Instruction &I) {
auto *AI = dyn_cast<AllocaInst>(&I);		auto *AI = dyn_cast<AllocaInst>(&I);
return !AI || AI->isStaticAlloca();		return !AI || AI->isStaticAlloca();
});		});
}		}

namespace {		namespace {
struct AllocaDerivedValueTracker {		struct AllocaDerivedValueTracker {
... 29 lines not shown ...	while (!Worklist.empty()) {
if (CB.isArgOperand(U) && CB.isByValArgument(CB.getArgOperandNo(U)))		if (CB.isArgOperand(U) && CB.isByValArgument(CB.getArgOperandNo(U)))
continue;		continue;
bool IsNocapture =		bool IsNocapture =
CB.isDataOperand(U) && CB.doesNotCapture(CB.getDataOperandNo(U));		CB.isDataOperand(U) && CB.doesNotCapture(CB.getDataOperandNo(U));
callUsesLocalStack(CB, IsNocapture);		callUsesLocalStack(CB, IsNocapture);
if (IsNocapture) {		if (IsNocapture) {
// If the alloca-derived argument is passed in as nocapture, then it		// If the alloca-derived argument is passed in as nocapture, then it
// can't propagate to the call's return. That would be capturing.		// can't propagate to the call's return. That would be capturing.
continue;		continue;
		efriedma: Please don't add code to examine the callee; if we're not deducing nocapture appropriately, we should fix that elsewhere.
		avl: Ok.
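The hunk above carries the rule that makes the relaxation possible: a local-stack pointer passed to a `nocapture` parameter is recorded as used by that call but is not treated as an escape. Below is a simplified toy version of that worklist walk, assuming a made-up classification of uses (`UseKind`, `Use`, `ToyFunction`, `walk` and `noLocalStackEscapes` are hypothetical, not LLVM APIs). Unlike the real `AllocaDerivedValueTracker`, it does not keep tracking through the result of a capturing call and ignores the `onlyReadsMemory()` exemption; it simply takes the nocapture fact as an input, in line with the request to rely on the attribute rather than inspect the callee.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy classification of how an alloca-derived value is used.
enum class UseKind {
  Derive,        // e.g. GEP/bitcast: the result is still alloca-derived
  Load,          // the loaded value is not alloca-derived
  NocaptureArg,  // passed to a call parameter marked nocapture
  CapturingArg,  // passed to a call that may capture it
  StoreAsValue   // the pointer itself is stored to memory
};

struct Use {
  UseKind kind;
  int user;  // id of the using value/instruction
};

struct ToyFunction {
  std::vector<std::vector<Use>> uses;  // uses[v] = uses of value v
};

struct TrackerResult {
  std::set<int> allocaUsers;   // calls that receive local-stack pointers
  std::set<int> escapePoints;  // where a local-stack pointer may escape
};

// Worklist walk from a root alloca, in the spirit of AllocaDerivedValueTracker.
TrackerResult walk(const ToyFunction &f, int root) {
  TrackerResult r;
  std::vector<int> worklist{root};
  std::set<int> visited{root};
  while (!worklist.empty()) {
    int v = worklist.back();
    worklist.pop_back();
    for (const Use &u : f.uses[v]) {
      switch (u.kind) {
      case UseKind::Derive:  // keep following derived pointers
        if (visited.insert(u.user).second)
          worklist.push_back(u.user);
        break;
      case UseKind::Load:
        break;
      case UseKind::NocaptureArg:  // uses the stack, but cannot leak it
        r.allocaUsers.insert(u.user);
        break;
      case UseKind::CapturingArg:  // uses the stack and may leak it
        r.allocaUsers.insert(u.user);
        r.escapePoints.insert(u.user);
        break;
      case UseKind::StoreAsValue:
        r.escapePoints.insert(u.user);
        break;
      }
    }
  }
  return r;
}

// Toy form of the relaxed precondition: nocapture users are fine,
// only real escape points block the transformation.
bool noLocalStackEscapes(const TrackerResult &r) { return r.escapePoints.empty(); }
```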
}		}
break;		break;
}		}
case Instruction::Load: {		case Instruction::Load: {
// The result of a load is not alloca-derived (unless an alloca has		// The result of a load is not alloca-derived (unless an alloca has
// otherwise escaped, but this is a local analysis).		// otherwise escaped, but this is a local analysis).
continue;		continue;
}		}
... 30 lines not shown ...	if (!CB.onlyReadsMemory())
EscapePoints.insert(&CB);		EscapePoints.insert(&CB);
}		}

SmallPtrSet<Instruction *, 32> AllocaUsers;		SmallPtrSet<Instruction *, 32> AllocaUsers;
SmallPtrSet<Instruction *, 32> EscapePoints;		SmallPtrSet<Instruction *, 32> EscapePoints;
};		};
}		}

static bool markTails(Function &F, bool &AllCallsAreTailCalls,		static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) {
OptimizationRemarkEmitter *ORE) {
if (F.callsFunctionThatReturnsTwice())		if (F.callsFunctionThatReturnsTwice())
return false;		return false;
AllCallsAreTailCalls = true;

// The local stack holds all alloca instructions and all byval arguments.		// The local stack holds all alloca instructions and all byval arguments.
AllocaDerivedValueTracker Tracker;		AllocaDerivedValueTracker Tracker;
for (Argument &Arg : F.args()) {		for (Argument &Arg : F.args()) {
if (Arg.hasByValAttr())		if (Arg.hasByValAttr())
Tracker.walk(&Arg);		Tracker.walk(&Arg);
}		}
for (auto &BB : F) {		for (auto &BB : F) {
... 66 lines not shown ...	for (auto &I : *BB) {
<< "marked as tail call candidate (readnone)";		<< "marked as tail call candidate (readnone)";
});		});
CI->setTailCall();		CI->setTailCall();
Modified = true;		Modified = true;
continue;		continue;
}		}
}		}

if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI)) {		if (!IsNoTail && Escaped == UNESCAPED && !Tracker.AllocaUsers.count(CI))
DeferredTails.push_back(CI);		DeferredTails.push_back(CI);
} else {
AllCallsAreTailCalls = false;
}
}		}

for (auto *SuccBB : make_range(succ_begin(BB), succ_end(BB))) {		for (auto *SuccBB : make_range(succ_begin(BB), succ_end(BB))) {
auto &State = Visited[SuccBB];		auto &State = Visited[SuccBB];
if (State < Escaped) {		if (State < Escaped) {
State = Escaped;		State = Escaped;
if (State == ESCAPED)		if (State == ESCAPED)
WorklistEscaped.push_back(SuccBB);		WorklistEscaped.push_back(SuccBB);
... 20 lines not shown ...	static bool markTails(Function &F, OptimizationRemarkEmitter *ORE) {

for (CallInst *CI : DeferredTails) {		for (CallInst *CI : DeferredTails) {
if (Visited[CI->getParent()] != ESCAPED) {		if (Visited[CI->getParent()] != ESCAPED) {
// If the escape point was part way through the block, calls after the		// If the escape point was part way through the block, calls after the
// escape point wouldn't have been put into DeferredTails.		// escape point wouldn't have been put into DeferredTails.
LLVM_DEBUG(dbgs() << "Marked as tail call candidate: " << *CI << "\n");		LLVM_DEBUG(dbgs() << "Marked as tail call candidate: " << *CI << "\n");
CI->setTailCall();		CI->setTailCall();
Modified = true;		Modified = true;
} else {
AllCallsAreTailCalls = false;
}		}
}		}

return Modified;		return Modified;
}		}
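The rest of `markTails` propagates an escape state along the CFG and defers the final decision for each call until the whole function has been visited, because a block first reached along an unescaped path may later be reached from an escaped one. A hypothetical toy re-creation of that scheme follows (`ToyInst`, `ToyBlock` and `markTailsToy` are made-up names). As a simplification, it uses a single worklist, whereas the real code keeps separate escaped/unescaped worklists and visits escaped blocks first.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Ordered so that a larger state overrides a smaller one, as in markTails.
enum VisitType { UNVISITED = 0, UNESCAPED = 1, ESCAPED = 2 };

// Hypothetical toy instruction carrying only the facts markTails needs.
struct ToyInst {
  bool isEscapePoint = false;  // a local-stack pointer may escape here
  bool isCall = false;
  bool usesAlloca = false;     // the call receives an alloca-derived pointer
};

struct ToyBlock {
  std::vector<ToyInst> insts;
  std::vector<int> succs;  // successor block indices; block 0 is the entry
};

// Returns block index -> indices of the calls that end up marked "tail".
std::map<int, std::vector<int>> markTailsToy(const std::vector<ToyBlock> &cfg) {
  std::vector<VisitType> visited(cfg.size(), UNVISITED);
  std::vector<std::pair<int, int>> deferred;  // (block, instruction) candidates
  std::vector<int> worklist{0};
  visited[0] = UNESCAPED;

  while (!worklist.empty()) {
    int bb = worklist.back();
    worklist.pop_back();
    VisitType escaped = visited[bb];  // state on entry to this block
    for (int i = 0; i < static_cast<int>(cfg[bb].insts.size()); ++i) {
      const ToyInst &inst = cfg[bb].insts[i];
      if (inst.isEscapePoint)
        escaped = ESCAPED;  // calls after this point are never deferred
      if (inst.isCall && escaped == UNESCAPED && !inst.usesAlloca)
        deferred.push_back({bb, i});
    }
    // Push the (possibly upgraded) state into the successors.
    for (int succ : cfg[bb].succs) {
      if (visited[succ] < escaped) {
        visited[succ] = escaped;
        worklist.push_back(succ);
      }
    }
  }

  // A block first reached unescaped may have become ESCAPED later on.
  std::map<int, std::vector<int>> marked;
  for (const auto &c : deferred)
    if (visited[c.first] != ESCAPED)
      marked[c.first].push_back(c.second);
  return marked;
}
```

For a diamond where only one arm contains an escape point, a call in the other arm is marked, but a call in the join block is not, even though it was deferred when the join block was first reached from the clean arm.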

/// Return true if it is safe to move the specified		/// Return true if it is safe to move the specified
/// instruction from after the call to before the call, assuming that all		/// instruction from after the call to before the call, assuming that all
/// instructions between the call and this instruction are movable.		/// instructions between the call and this instruction are movable.
///		///
static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA) {		static bool canMoveAboveCall(Instruction *I, CallInst *CI, AliasAnalysis *AA,
		DenseMap<Value *, AllocaInst *> &AllocaForValue) {
		if (isa<DbgInfoIntrinsic>(I))
		return true;

		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I))
		if (II->getIntrinsicID() == Intrinsic::lifetime_end &&
		llvm::findAllocaForValue(II->getArgOperand(1), AllocaForValue))
		efriedma: What is the new handling for lifetime.end/assume doing?
		avl: They are just skipped. In the following test case:

    call void @_Z5test5i(i32 %sub)
    call void @llvm.lifetime.end.p0i8(i64 24, i8* nonnull %1) #5
    call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #5
    br label %return

they are generated between the call and the ret. It is safe to ignore them while checking whether the transformation is possible.
		efriedma: It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.) I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? Since this is sort of tricky, I'd prefer to split this off into a followup.
		avl:
> It makes sense we can ignore lifetime.end on an alloca: we know the call doesn't refer to the alloca. (Maybe we should check that the pointer argument is pointing at an alloca? That should usually be true anyway, but better to be on the safe side, I guess.)

OK, I will add a check that the pointer argument of lifetime.end points to an alloca.

> I don't think it's safe to hoist assume without additional checks; I think we'd need to check that the call is marked "willreturn"? Since this is sort of tricky, I'd prefer to split this off into a followup.

OK, I will split Intrinsic::assume into another review.
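The outcome of this thread amounts to a per-instruction rule for what may sit between the recursive call and the return. A toy classifier sketching it follows; the `Kind` enum, `PostCallInst`, `canMoveAboveCallToy` and `allMovable` are all hypothetical. Debug intrinsics are skipped. `lifetime.end` is skipped only when its pointer comes from an alloca, and `assume` is rejected because it was split off into a follow-up. Anything with side effects is rejected, and a side-effect-free instruction is accepted only if it does not use the call's result. This simplifies the real `canMoveAboveCall`, which has further cases, e.g. for loads.

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction kinds that may follow the recursive call.
enum class Kind { DbgIntrinsic, LifetimeEnd, Assume, Store, Arith };

struct PostCallInst {
  Kind kind;
  bool operandIsAlloca = false;  // LifetimeEnd: pointer is known to be an alloca
  bool usesCallResult = false;   // Arith: reads the recursive call's result
};

// May this instruction be treated as movable above the recursive call?
bool canMoveAboveCallToy(const PostCallInst &i) {
  switch (i.kind) {
  case Kind::DbgIntrinsic:
    return true;                // no semantic effect
  case Kind::LifetimeEnd:
    return i.operandIsAlloca;   // the callee cannot refer to our alloca
  case Kind::Arith:
    return !i.usesCallResult;   // side-effect free and independent of the call
  case Kind::Assume:            // deferred to a follow-up review
  case Kind::Store:             // has side effects
    return false;
  }
  return false;
}

bool allMovable(const std::vector<PostCallInst> &between) {
  for (const PostCallInst &i : between)
    if (!canMoveAboveCallToy(i))
      return false;
  return true;
}
```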
		return true;

// FIXME: We can move load/store/call/free instructions above the call if the		// FIXME: We can move load/store/call/free instructions above the call if the
// call does not mod/ref the memory location being processed.		// call does not mod/ref the memory location being processed.
if (I->mayHaveSideEffects()) // This also handles volatile loads.		if (I->mayHaveSideEffects()) // This also handles volatile loads.
return false;		return false;

if (LoadInst *L = dyn_cast<LoadInst>(I)) {		if (LoadInst *L = dyn_cast<LoadInst>(I)) {
// Loads may always be moved above calls without side effects.		// Loads may always be moved above calls without side effects.
if (CI->mayHaveSideEffects()) {		if (CI->mayHaveSideEffects()) {
... 50 lines not shown ...	class TailRecursionEliminator {
OptimizationRemarkEmitter *ORE;		OptimizationRemarkEmitter *ORE;
DomTreeUpdater &DTU;		DomTreeUpdater &DTU;

// The below are shared state we want to have available when eliminating any		// The below are shared state we want to have available when eliminating any
// calls in the function. There values should be populated by		// calls in the function. There values should be populated by
// createTailRecurseLoopHeader the first time we find a call we can eliminate.		// createTailRecurseLoopHeader the first time we find a call we can eliminate.
BasicBlock *HeaderBB = nullptr;		BasicBlock *HeaderBB = nullptr;
SmallVector<PHINode *, 8> ArgumentPHIs;		SmallVector<PHINode *, 8> ArgumentPHIs;
bool RemovableCallsMustBeMarkedTail = false;

// PHI node to store our return value.		// PHI node to store our return value.
PHINode *RetPN = nullptr;		PHINode *RetPN = nullptr;

// i1 PHI node to track if we have a valid return value stored in RetPN.		// i1 PHI node to track if we have a valid return value stored in RetPN.
PHINode *RetKnownPN = nullptr;		PHINode *RetKnownPN = nullptr;

// Vector of select instructions we insereted. These selects use RetKnownPN		// Vector of select instructions we insereted. These selects use RetKnownPN
// to either propagate RetPN or select a new return value.		// to either propagate RetPN or select a new return value.
SmallVector<SelectInst *, 8> RetSelects;		SmallVector<SelectInst *, 8> RetSelects;

// The below are shared state needed when performing accumulator recursion.		// The below are shared state needed when performing accumulator recursion.
// There values should be populated by insertAccumulator the first time we		// There values should be populated by insertAccumulator the first time we
// find an elimination that requires an accumulator.		// find an elimination that requires an accumulator.

// PHI node to store our current accumulated value.		// PHI node to store our current accumulated value.
PHINode *AccPN = nullptr;		PHINode *AccPN = nullptr;

// The instruction doing the accumulating.		// The instruction doing the accumulating.
Instruction *AccumulatorRecursionInstr = nullptr;		Instruction *AccumulatorRecursionInstr = nullptr;

		// The cache for <value, alloca instruction> pairs.
		DenseMap<Value *, AllocaInst *> AllocaForValue;

TailRecursionEliminator(Function &F, const TargetTransformInfo *TTI,		TailRecursionEliminator(Function &F, const TargetTransformInfo *TTI,
AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,		AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,
DomTreeUpdater &DTU)		DomTreeUpdater &DTU)
: F(F), TTI(TTI), AA(AA), ORE(ORE), DTU(DTU) {}		: F(F), TTI(TTI), AA(AA), ORE(ORE), DTU(DTU) {}

CallInst *findTRECandidate(Instruction *TI,		CallInst *findTRECandidate(Instruction *TI);
bool CannotTailCallElimCallsMarkedTail);

void createTailRecurseLoopHeader(CallInst *CI);		void createTailRecurseLoopHeader(CallInst *CI);

void insertAccumulator(Instruction *AccRecInstr);		void insertAccumulator(Instruction *AccRecInstr);

bool eliminateCall(CallInst *CI);		bool eliminateCall(CallInst *CI);

bool foldReturnAndProcessPred(ReturnInst *Ret,		bool foldReturnAndProcessPred(ReturnInst *Ret);
bool CannotTailCallElimCallsMarkedTail);

bool processReturningBlock(ReturnInst *Ret,		bool processReturningBlock(ReturnInst *Ret);
bool CannotTailCallElimCallsMarkedTail);

void cleanupAndFinalize();		void cleanupAndFinalize();

public:		public:
static bool eliminate(Function &F, const TargetTransformInfo *TTI,		static bool eliminate(Function &F, const TargetTransformInfo *TTI,
AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,		AliasAnalysis *AA, OptimizationRemarkEmitter *ORE,
DomTreeUpdater &DTU);		DomTreeUpdater &DTU);
};		};
} // namespace		} // namespace

CallInst *TailRecursionEliminator::findTRECandidate(		CallInst *TailRecursionEliminator::findTRECandidate(Instruction *TI) {
Instruction *TI, bool CannotTailCallElimCallsMarkedTail) {
BasicBlock *BB = TI->getParent();		BasicBlock *BB = TI->getParent();

if (&BB->front() == TI) // Make sure there is something before the terminator.		if (&BB->front() == TI) // Make sure there is something before the terminator.
return nullptr;		return nullptr;

// Scan backwards from the return, checking to see if there is a tail call in		// Scan backwards from the return, checking to see if there is a tail call in
// this block. If so, set CI to it.		// this block. If so, set CI to it.
CallInst *CI = nullptr;		CallInst *CI = nullptr;
BasicBlock::iterator BBI(TI);		BasicBlock::iterator BBI(TI);
while (true) {		while (true) {
CI = dyn_cast<CallInst>(BBI);		CI = dyn_cast<CallInst>(BBI);
if (CI && CI->getCalledFunction() == &F)		if (CI && CI->getCalledFunction() == &F)
break;		break;

if (BBI == BB->begin())		if (BBI == BB->begin())
return nullptr; // Didn't find a potential tail call.		return nullptr; // Didn't find a potential tail call.
--BBI;		--BBI;
}		}

// If this call is marked as a tail call, and if there are dynamic allocas in		// Do not consider CI to be valid TRE candidate
// the function, we cannot perform this optimization.		// if CI explicitly marked as NoTailcall or has
if (CI->isTailCall() && CannotTailCallElimCallsMarkedTail)		// Operand Bundles or not marked as TailCall.
		if (CI->isNoTailCall() || CI->hasOperandBundles() || !CI->isTailCall())
		efriedma: The hasOperandBundles() check looks completely new; is there some test for it? The `isNoTailCall()` check is currently redundant; it isn't legal to write "tail notail". I guess it makes sense to guard against that, though.
		avl (author): > The hasOperandBundles() check looks completely new; is there some test for it? It is not new; it is copied from line 245. Now that the patch has changed from its original state, all of the conditions above could be reduced to just `if (!CI->isTailCall())`. The test is Transforms/TailCallElim/deopt-bundle.ll. > The isNoTailCall() check is currently redundant; it isn't legal to write "tail notail". I guess it makes sense to guard against that, though. I will add a check for that.
return nullptr;		return nullptr;

// As a special case, detect code like this:		// As a special case, detect code like this:
// double fabs(double f) { return __builtin_fabs(f); } // a 'fabs' call		// double fabs(double f) { return __builtin_fabs(f); } // a 'fabs' call
// and disable this xform in this case, because the code generator will		// and disable this xform in this case, because the code generator will
// lower the call to fabs into inline code.		// lower the call to fabs into inline code.
if (BB == &F.getEntryBlock() &&		if (BB == &F.getEntryBlock() &&
firstNonDbg(BB->front().getIterator()) == CI &&		firstNonDbg(BB->front().getIterator()) == CI &&
[15 unchanged lines not shown]
void TailRecursionEliminator::createTailRecurseLoopHeader(CallInst *CI) {		void TailRecursionEliminator::createTailRecurseLoopHeader(CallInst *CI) {
HeaderBB = &F.getEntryBlock();		HeaderBB = &F.getEntryBlock();
BasicBlock *NewEntry = BasicBlock::Create(F.getContext(), "", &F, HeaderBB);		BasicBlock *NewEntry = BasicBlock::Create(F.getContext(), "", &F, HeaderBB);
NewEntry->takeName(HeaderBB);		NewEntry->takeName(HeaderBB);
HeaderBB->setName("tailrecurse");		HeaderBB->setName("tailrecurse");
BranchInst *BI = BranchInst::Create(HeaderBB, NewEntry);		BranchInst *BI = BranchInst::Create(HeaderBB, NewEntry);
BI->setDebugLoc(CI->getDebugLoc());		BI->setDebugLoc(CI->getDebugLoc());

// If this function has self recursive calls in the tail position where some
// are marked tail and some are not, only transform one flavor or another.
// We have to choose whether we move allocas in the entry block to the new
// entry block or not, so we can't make a good choice for both. We make this
// decision here based on whether the first call we found to remove is
// marked tail.
// NOTE: We could do slightly better here in the case that the function has
// no entry block allocas.
RemovableCallsMustBeMarkedTail = CI->isTailCall();

// If this tail call is marked 'tail' and if there are any allocas in the
// entry block, move them up to the new entry block.
if (RemovableCallsMustBeMarkedTail)
// Move all fixed sized allocas from HeaderBB to NewEntry.		// Move all fixed sized allocas from HeaderBB to NewEntry.
for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(),		for (BasicBlock::iterator OEBI = HeaderBB->begin(), E = HeaderBB->end(),
NEBI = NewEntry->begin();		NEBI = NewEntry->begin();
OEBI != E;)		OEBI != E;)
if (AllocaInst *AI = dyn_cast<AllocaInst>(OEBI++))		if (AllocaInst *AI = dyn_cast<AllocaInst>(OEBI++))
if (isa<ConstantInt>(AI->getArraySize()))		if (isa<ConstantInt>(AI->getArraySize()))
AI->moveBefore(&*NEBI);		AI->moveBefore(&*NEBI);

// Now that we have created a new block, which jumps to the entry		// Now that we have created a new block, which jumps to the entry
// block, insert a PHI node for each argument of the function.		// block, insert a PHI node for each argument of the function.
// For now, we initialize each PHI to only have the real arguments		// For now, we initialize each PHI to only have the real arguments
// which are passed in.		// which are passed in.
Instruction *InsertPos = &HeaderBB->front();		Instruction *InsertPos = &HeaderBB->front();
for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) {		for (Function::arg_iterator I = F.arg_begin(), E = F.arg_end(); I != E; ++I) {
PHINode *PN =		PHINode *PN =
[58 unchanged lines not shown]	bool TailRecursionEliminator::eliminateCall(CallInst *CI) {

// Ok, we found a potential tail call. We can currently only transform the		// Ok, we found a potential tail call. We can currently only transform the
// tail call if all of the instructions between the call and the return are		// tail call if all of the instructions between the call and the return are
// movable to above the call itself, leaving the call next to the return.		// movable to above the call itself, leaving the call next to the return.
// Check that this is the case now.		// Check that this is the case now.
Instruction *AccRecInstr = nullptr;		Instruction *AccRecInstr = nullptr;
BasicBlock::iterator BBI(CI);		BasicBlock::iterator BBI(CI);
for (++BBI; &*BBI != Ret; ++BBI) {		for (++BBI; &*BBI != Ret; ++BBI) {
if (canMoveAboveCall(&*BBI, CI, AA))		if (canMoveAboveCall(&*BBI, CI, AA, AllocaForValue))
continue;		continue;

// If we can't move the instruction above the call, it might be because it		// If we can't move the instruction above the call, it might be because it
// is an associative and commutative operation that could be transformed		// is an associative and commutative operation that could be transformed
// using accumulator recursion elimination. Check to see if this is the		// using accumulator recursion elimination. Check to see if this is the
// case, and if so, remember which instruction accumulates for later.		// case, and if so, remember which instruction accumulates for later.
if (AccPN || !canTransformAccumulatorRecursion(&*BBI, CI))		if (AccPN || !canTransformAccumulatorRecursion(&*BBI, CI))
return false; // We cannot eliminate the tail recursion!		return false; // We cannot eliminate the tail recursion!
[11 unchanged lines not shown]	return OptimizationRemark(DEBUG_TYPE, "tailcall-recursion", CI)
<< "transforming tail recursion into loop";		<< "transforming tail recursion into loop";
});		});

// OK! We can transform this tail call. If this is the first one found,		// OK! We can transform this tail call. If this is the first one found,
// create the new entry block, allowing us to branch back to the old entry.		// create the new entry block, allowing us to branch back to the old entry.
if (!HeaderBB)		if (!HeaderBB)
createTailRecurseLoopHeader(CI);		createTailRecurseLoopHeader(CI);

if (RemovableCallsMustBeMarkedTail && !CI->isTailCall())
return false;

// Ok, now that we know we have a pseudo-entry block WITH all of the		// Ok, now that we know we have a pseudo-entry block WITH all of the
// required PHI nodes, add entries into the PHI node for the actual		// required PHI nodes, add entries into the PHI node for the actual
// parameters passed into the tail-recursive call.		// parameters passed into the tail-recursive call.
for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i)		for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i)
ArgumentPHIs[i]->addIncoming(CI->getArgOperand(i), BB);		ArgumentPHIs[i]->addIncoming(CI->getArgOperand(i), BB);

if (AccRecInstr) {		if (AccRecInstr) {
insertAccumulator(AccRecInstr);		insertAccumulator(AccRecInstr);
[33 unchanged lines not shown]	bool TailRecursionEliminator::eliminateCall(CallInst *CI) {

BB->getInstList().erase(Ret); // Remove return.		BB->getInstList().erase(Ret); // Remove return.
BB->getInstList().erase(CI); // Remove call.		BB->getInstList().erase(CI); // Remove call.
DTU.applyUpdates({{DominatorTree::Insert, BB, HeaderBB}});		DTU.applyUpdates({{DominatorTree::Insert, BB, HeaderBB}});
++NumEliminated;		++NumEliminated;
return true;		return true;
}		}

bool TailRecursionEliminator::foldReturnAndProcessPred(		bool TailRecursionEliminator::foldReturnAndProcessPred(ReturnInst *Ret) {
ReturnInst *Ret, bool CannotTailCallElimCallsMarkedTail) {
BasicBlock *BB = Ret->getParent();		BasicBlock *BB = Ret->getParent();

bool Change = false;		bool Change = false;

// Make sure this block is a trivial return block.		// Make sure this block is a trivial return block.
assert(BB->getFirstNonPHIOrDbg() == Ret &&		assert(BB->getFirstNonPHIOrDbg() == Ret &&
"Trying to fold non-trivial return block");		"Trying to fold non-trivial return block");

// If the return block contains nothing but the return and PHI's,		// If the return block contains nothing but the return and PHI's,
// there might be an opportunity to duplicate the return in its		// there might be an opportunity to duplicate the return in its
// predecessors and perform TRE there. Look for predecessors that end		// predecessors and perform TRE there. Look for predecessors that end
// in unconditional branch and recursive call(s).		// in unconditional branch and recursive call(s).
SmallVector<BranchInst*, 8> UncondBranchPreds;		SmallVector<BranchInst*, 8> UncondBranchPreds;
for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {		for (pred_iterator PI = pred_begin(BB), E = pred_end(BB); PI != E; ++PI) {
BasicBlock *Pred = *PI;		BasicBlock *Pred = *PI;
Instruction *PTI = Pred->getTerminator();		Instruction *PTI = Pred->getTerminator();
if (BranchInst *BI = dyn_cast<BranchInst>(PTI))		if (BranchInst *BI = dyn_cast<BranchInst>(PTI))
if (BI->isUnconditional())		if (BI->isUnconditional())
UncondBranchPreds.push_back(BI);		UncondBranchPreds.push_back(BI);
}		}

while (!UncondBranchPreds.empty()) {		while (!UncondBranchPreds.empty()) {
BranchInst *BI = UncondBranchPreds.pop_back_val();		BranchInst *BI = UncondBranchPreds.pop_back_val();
BasicBlock *Pred = BI->getParent();		BasicBlock *Pred = BI->getParent();
if (CallInst *CI =		if (CallInst *CI = findTRECandidate(BI)) {
findTRECandidate(BI, CannotTailCallElimCallsMarkedTail)) {
LLVM_DEBUG(dbgs() << "FOLDING: " << *BB		LLVM_DEBUG(dbgs() << "FOLDING: " << *BB
<< "INTO UNCOND BRANCH PRED: " << *Pred);		<< "INTO UNCOND BRANCH PRED: " << *Pred);
FoldReturnIntoUncondBranch(Ret, BB, Pred, &DTU);		FoldReturnIntoUncondBranch(Ret, BB, Pred, &DTU);

// Cleanup: if all predecessors of BB have been eliminated by		// Cleanup: if all predecessors of BB have been eliminated by
// FoldReturnIntoUncondBranch, delete it. It is important to empty it,		// FoldReturnIntoUncondBranch, delete it. It is important to empty it,
// because the ret instruction in there is still using a value which		// because the ret instruction in there is still using a value which
// eliminateRecursiveTailCall will attempt to remove.		// eliminateRecursiveTailCall will attempt to remove.
if (!BB->hasAddressTaken() && pred_begin(BB) == pred_end(BB))		if (!BB->hasAddressTaken() && pred_begin(BB) == pred_end(BB))
DTU.deleteBB(BB);		DTU.deleteBB(BB);

eliminateCall(CI);		eliminateCall(CI);
++NumRetDuped;		++NumRetDuped;
Change = true;		Change = true;
}		}
}		}

return Change;		return Change;
}		}

bool TailRecursionEliminator::processReturningBlock(		bool TailRecursionEliminator::processReturningBlock(ReturnInst *Ret) {
ReturnInst *Ret, bool CannotTailCallElimCallsMarkedTail) {		CallInst *CI = findTRECandidate(Ret);
CallInst *CI = findTRECandidate(Ret, CannotTailCallElimCallsMarkedTail);
if (!CI)		if (!CI)
return false;		return false;

return eliminateCall(CI);		return eliminateCall(CI);
}		}

void TailRecursionEliminator::cleanupAndFinalize() {		void TailRecursionEliminator::cleanupAndFinalize() {
// If we eliminated any tail recursions, it's possible that we inserted some		// If we eliminated any tail recursions, it's possible that we inserted some
[71 unchanged lines not shown]	bool TailRecursionEliminator::eliminate(Function &F,
const TargetTransformInfo *TTI,		const TargetTransformInfo *TTI,
AliasAnalysis *AA,		AliasAnalysis *AA,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
DomTreeUpdater &DTU) {		DomTreeUpdater &DTU) {
if (F.getFnAttribute("disable-tail-calls").getValueAsString() == "true")		if (F.getFnAttribute("disable-tail-calls").getValueAsString() == "true")
return false;		return false;

bool MadeChange = false;		bool MadeChange = false;
bool AllCallsAreTailCalls = false;		MadeChange |= markTails(F, ORE);
MadeChange |= markTails(F, AllCallsAreTailCalls, ORE);
if (!AllCallsAreTailCalls)
return MadeChange;

// If this function is a varargs function, we won't be able to PHI the args		// If this function is a varargs function, we won't be able to PHI the args
// right, so don't even try to convert it...		// right, so don't even try to convert it...
if (F.getFunctionType()->isVarArg())		if (F.getFunctionType()->isVarArg())
return MadeChange;		return MadeChange;

// If false, we cannot perform TRE on tail calls marked with the 'tail'		if (!canTRE(F))
		hiraditya: Can we use isa<> here?
// attribute, because doing so would cause the stack size to increase (real		return MadeChange;
// TRE would deallocate variable sized allocas, TRE doesn't).
bool CanTRETailMarkedCall = canTRE(F);

TailRecursionEliminator TRE(F, TTI, AA, ORE, DTU);		TailRecursionEliminator TRE(F, TTI, AA, ORE, DTU);

// Change any tail recursive calls to loops.		// Change any tail recursive calls to loops.
//
// FIXME: The code generator produces really bad code when an 'escaping
// alloca' is changed from being a static alloca to being a dynamic alloca.
// Until this is resolved, disable this transformation if that would ever
// happen. This bug is PR962.
for (Function::iterator BBI = F.begin(), E = F.end(); BBI != E; /*in loop*/) {		for (Function::iterator BBI = F.begin(), E = F.end(); BBI != E; /*in loop*/) {
		efriedma: Can you move this FIXME into a more appropriate spot?
		avl (author): OK.
BasicBlock *BB = &*BBI++; // foldReturnAndProcessPred may delete BB.		BasicBlock *BB = &*BBI++; // foldReturnAndProcessPred may delete BB.
if (ReturnInst *Ret = dyn_cast<ReturnInst>(BB->getTerminator())) {		if (ReturnInst *Ret = dyn_cast<ReturnInst>(BB->getTerminator())) {
bool Change = TRE.processReturningBlock(Ret, !CanTRETailMarkedCall);		bool Change = TRE.processReturningBlock(Ret);
if (!Change && BB->getFirstNonPHIOrDbg() == Ret)		if (!Change && BB->getFirstNonPHIOrDbg() == Ret)
Change = TRE.foldReturnAndProcessPred(Ret, !CanTRETailMarkedCall);		Change = TRE.foldReturnAndProcessPred(Ret);
MadeChange |= Change;		MadeChange |= Change;
}		}
}		}

		hiraditya: `CI->getCalledFunction() != &F` seems cheaper than `canMoveAboveCall`.
TRE.cleanupAndFinalize();		TRE.cleanupAndFinalize();

return MadeChange;		return MadeChange;
}		}

namespace {		namespace {
struct TailCallElim : public FunctionPass {		struct TailCallElim : public FunctionPass {
		laytonio: There is no need to pass the function here since it's a member variable.
		avl (author): OK.
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
TailCallElim() : FunctionPass(ID) {		TailCallElim() : FunctionPass(ID) {
		efriedma: Do you have to redo the AllocaDerivedValueTracker analysis? Is it not enough that the call you're trying to TRE is marked "tail"?
		avl (author): > Do you have to redo the AllocaDerivedValueTracker analysis? The AllocaDerivedValueTracker analysis (done in markTails) could be reused here. But the marking done in markTails() looks like a separate task, i.e. it is better to make TRE independent of markTails(). There is a review for this: https://reviews.llvm.org/D60031. Thus such a separation looks useful (not reusing the result of markTails, but computing it in place). > Is it not enough that the call you're trying to TRE is marked "tail"? It is not enough that call which is subject to TRE is marked "Tail". It also should be checked that other calls does not capture pointer to local stack: `// do not do TRE if any pointer to local stack has escaped. if (!Tracker.EscapePoints.empty()) return false;`
		efriedma: > It is not enough that call which is subject to TRE is marked "Tail". It also should be checked that other calls does not capture pointer to local stack: If there's an escaped pointer to the local stack, we wouldn't infer "tail" in the first place, would we?
		avl (author): If a function receives a pointer to an alloca, then it would not be marked with "Tail". Then we have no way to tell whether this function receives a pointer to an alloca but does not capture it: `void test(int recurseCount) { if (recurseCount == 0) return; int temp = 10; globalIncrement(&temp); test(recurseCount - 1); }` test - marked with Tail. globalIncrement - not marked with Tail. But TRE could be done, since globalIncrement does not capture the pointer. If it captured the pointer, we could not do TRE. So we need to check !Tracker.EscapePoints.empty().
		efriedma: > test - marked with Tail. For the given code, TRE won't mark the recursive call "tail". That transform isn't legal: the recursive call could access the caller's version of "temp".
		avl (author): > For the given code, TRE won't mark the recursive call "tail". That transform isn't legal: the recursive call could access the caller's version of "temp". It looks like the recursive call could NOT access the caller's version of "temp": `test(recurseCount - 1);`. The caller's version of temp is accessed by the non-recursive call: `globalIncrement(&temp);`. If globalIncrement does not capture "&temp", then TRE looks legal for that case. globalIncrement() would not be marked with "Tail"; test() would be marked with Tail. Thus the prerequisite for TRE would be: the tail-recursive call must not receive a pointer to the local stack (Tail), and non-recursive calls must not capture the pointer to the local stack.
		efriedma: Can you give a complete IR example where we infer "tail", but TRE is illegal? Can you give a complete IR example where we don't infer "tail", but we still do the TRE transform here?
		avl (author): > Can you give a complete IR example where we infer "tail", but TRE is illegal?
		there is no such example. Currently all cases where we infer "tail" would be valid for TRE.
		> Can you give a complete IR example where we don't infer "tail", but we still do the TRE transform here?
		For the following example, the current code base would not infer "tail" for _Z15globalIncrementPKi and, as a result, would not do TRE for _Z4testi. This patch changes that behavior: if _Z15globalIncrementPKi is not marked with "tail" and does not capture its pointer argument, TRE is allowed for _Z4testi.

		@count = dso_local local_unnamed_addr global i32 0, align 4

		; Function Attrs: nofree noinline norecurse nounwind uwtable
		define dso_local void @_Z15globalIncrementPKi(i32* nocapture readonly %param) local_unnamed_addr #0 {
		entry:
		  %0 = load i32, i32* %param, align 4
		  %1 = load i32, i32* @count, align 4
		  %add = add nsw i32 %1, %0
		  store i32 %add, i32* @count, align 4
		  ret void
		}

		; Function Attrs: nounwind uwtable
		define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 {
		entry:
		  %temp = alloca i32, align 4
		  %cmp = icmp eq i32 %recurseCount, 0
		  br i1 %cmp, label %return, label %if.end

		if.end:                                           ; preds = %entry
		  %0 = bitcast i32* %temp to i8*
		  call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6
		  store i32 10, i32* %temp, align 4
		  call void @_Z15globalIncrementPKi(i32* nonnull %temp)
		  %sub = add nsw i32 %recurseCount, -1
		  call void @_Z4testi(i32 %sub)
		  call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6
		  br label %return

		return:                                           ; preds = %entry, %if.end
		  ret void
		}

		; Function Attrs: argmemonly nounwind willreturn
		declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

		; Function Attrs: argmemonly nounwind willreturn
		declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

		attributes #0 = { nofree noinline norecurse nounwind uwtable }
		attributes #1 = { nounwind uwtable }
		attributes #2 = { argmemonly nounwind willreturn }
		efriedma: In your example, we don't infer "tail" for globalIncrement... but we do infer it for the recursive call to test(). I'm suggesting you could just check for "tail" on test(), instead of using AllocaDerivedValueTracker.
		avl (author): Checking only test() is not enough. There additionally should be checked globalIncrement(). It is correct that while checking test(), the "Tail" flag could be checked, which does not require AllocaDerivedValueTracker. But while checking globalIncrement(), we must check whether some alloca value escaped or not; the "Tail" flag cannot be used for that. AllocaDerivedValueTracker allows such a check: `Tracker.EscapePoints.empty()`. If we did not check globalIncrement, it would not be valid to do TRE. Thus it seems we need to check globalIncrement for an escaping pointer, and we need AllocaDerivedValueTracker for that.
		efriedma: > Checking only test() is not enough. There additionally should be checked globalIncrement(). > Can you give a complete IR example where we infer "tail", but TRE is illegal? > there is no such example. Currently all cases where we infer "tail" would be valid for TRE. I'm not sure how to reconcile this. Are you saying we could infer "tail" in some case where TRE is illegal, but don't right now? Or are you saying that you plan to extend TRE to handle cases where we can't infer "tail" on the recursive call?
		avl (author): > Or are you saying that you plan to extend TRE to handle cases where we can't infer "tail" on the recursive call? Not exactly. More precisely: "I am saying that I plan to extend TRE to handle cases where we can't infer "tail" on the NON-recursive NON-last call". globalIncrement() is the non-recursive non-last call in the above example, but we need to check whether it captures its argument pointer to decide whether it is OK to do TRE. To make things clear, instead of the current prerequisite for TRE, "All call sites are marked with Tail", I am suggesting the following: "Recursive last calls are marked with "Tail", non-recursive non-last calls are proved to not capture alloca". For the above example, this means the requirement for test() stays the same (it should be marked with Tail), and the requirement for globalIncrement() becomes "does not capture alloca".
		efriedma: "Recursive last calls are marked with 'tail'" implies "non-recursive non-last calls are proved to not capture alloca".
		avl (author): I see, thank you for the explanations. There is a test which makes me think that the above rule is not always correct: Transforms/TailCallElim/basic.ll:@test1

		; PR615. Make sure that we do not move the alloca so that it interferes with the tail call.
		define i32 @test1() {
		; CHECK: i32 @test1()
		; CHECK-NEXT: alloca
		  %A = alloca i32  ; <i32*> [#uses=2]
		  store i32 5, i32* %A
		  call void @use(i32* %A)
		; CHECK: tail call i32 @test1
		  %X = tail call i32 @test1()  ; <i32> [#uses=1]
		  ret i32 %X
		}

		I removed the uses of AllocaDerivedValueTracker and corrected test1 in Transforms/TailCallElim/basic.ll.
initializeTailCallElimPass(*PassRegistry::getPassRegistry());		initializeTailCallElimPass(*PassRegistry::getPassRegistry());
		laytonio: You can use `for (Instruction &I : instructions(F))` here.
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addRequired<OptimizationRemarkEmitterWrapperPass>();		AU.addRequired<OptimizationRemarkEmitterWrapperPass>();
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
AU.addPreserved<PostDominatorTreeWrapperPass>();		AU.addPreserved<PostDominatorTreeWrapperPass>();
}		}

bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
		laytonio: Is there any reason to find and validate candidates now only to have to redo it when we actually perform the eliminations? If so, is there any reason this needs to follow a different code path than findTRECandidate? findTRECandidate is doing the same checks, except for canMoveAboveCall and canTransformAccumulatorRecursion, which should probably be refactored into findTRECandidate from eliminateCall anyway. If not, then all of this code goes away and we're left with the same canTRE as in trunk.
		avl (author): We are enumerating all instructions here, so we could tell that there are no TRE candidates and stop earlier. That is the reason for doing it here. I agree that findTRECandidate should be refactored to have the same checks as here. What do you think is better: leave the early check for TRE candidates in canTRE, remove it and refactor findTRECandidate, or leave it as is?
		laytonio: Yes we are iterating all the instructions here but, unless I am missing something, we would literally just be doing the checks twice for no reason. Look at it this way, best case scenario we have to check all possible candidates once, find none and we're done. Worst case, we check all possible candidates once, find one and have to check all possible candidates a second time. Whereas if we remove the early checks, we only ever have to check the candidates once. So we wouldn't really be stopping any earlier. As for refactoring findTRECandidate, I do think that should be done, and we should strive to move all the failure conditions out of eliminateCall in order to avoid having to fold a return only to find out we didn't need to. But I think that is out of the scope of this change, and if we do decide to keep the early checks here, then we should say that findTRECandidate does a good enough job to consider this function as having valid candidates.
		avl (author): > Yes we are iterating all the instructions here but, unless I am missing something, we would literally just be doing the checks twice for no reason. … So we wouldn't really be stopping any earlier. Yes, we would do the check twice if there are TRE candidates. My idea was that cases where TRE is applicable are less common than cases where it is not, so a double instruction walk would happen more often than a double candidate check. But I did not measure the real impact. So let's return to the old logic here, as you suggested.
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;
		hiraditya: Do we need to visit all the instructions twice?
		laytonio: Is this correct? I think we want to check these per TRE candidate in findTRECandidate, not just disable TRE in general if one is found.
		avl (author): I tried to minimize changes and keep the old logic here, but yes, it is better to move that check into findTRECandidate(). Will do.

auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();		auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();
auto *DT = DTWP ? &DTWP->getDomTree() : nullptr;		auto *DT = DTWP ? &DTWP->getDomTree() : nullptr;
auto *PDTWP = getAnalysisIfAvailable<PostDominatorTreeWrapperPass>();		auto *PDTWP = getAnalysisIfAvailable<PostDominatorTreeWrapperPass>();
auto *PDT = PDTWP ? &PDTWP->getPostDomTree() : nullptr;		auto *PDT = PDTWP ? &PDTWP->getPostDomTree() : nullptr;
// There is no noticeable performance difference here between Lazy and Eager		// There is no noticeable performance difference here between Lazy and Eager
// UpdateStrategy based on some test results. It is feasible to switch the		// UpdateStrategy based on some test results. It is feasible to switch the
// UpdateStrategy to Lazy if we find it profitable later.		// UpdateStrategy to Lazy if we find it profitable later.
DomTreeUpdater DTU(DT, PDT, DomTreeUpdater::UpdateStrategy::Eager);		DomTreeUpdater DTU(DT, PDT, DomTreeUpdater::UpdateStrategy::Eager);

return TailRecursionEliminator::eliminate(		return TailRecursionEliminator::eliminate(
F, &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F),		F, &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F),
		efriedma: I thought we had some tests where we TRE in the presence of recursive calls, like a simple recursive Fibonacci. Am I misunderstanding this?
		avl (Author): Right, there is a testcase for Fibonacci: llvm/test/Transforms/TailCallElim/accum_recursion.ll:@test3_fib. The areAllLastFuncCallsRecursive() check works well for the Fibonacci testcase, return fib(x-1)+fib(x-2);, since the chain of last function calls is fib()->fib()->ret. That check should prevent cases such as: return fib(x-1)+another_call()+fib(x-2);
		efriedma: > That check should prevent cases such as: return fib(x-1)+another_call()+fib(x-2); Why do we need to prevent this?
		avl (Author): We do not. I misunderstood canTransformAccumulatorRecursion(). That check can be removed.
&getAnalysis<AAResultsWrapperPass>().getAAResults(),		&getAnalysis<AAResultsWrapperPass>().getAAResults(),
&getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE(), DTU);		&getAnalysis<OptimizationRemarkEmitterWrapperPass>().getORE(), DTU);
}		}
};		};
}		}

char TailCallElim::ID = 0;		char TailCallElim::ID = 0;
INITIALIZE_PASS_BEGIN(TailCallElim, "tailcallelim", "Tail Call Elimination",		INITIALIZE_PASS_BEGIN(TailCallElim, "tailcallelim", "Tail Call Elimination",
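To illustrate why the extra call in the Fibonacci exchange above does not need to block the transformation, here is a minimal hand-written C++ sketch of accumulator recursion elimination at the source level. It assumes the fib-like shape from the discussion; `another_call()` is a hypothetical side-effect-free stand-in, and `fib2`/`fib2_tre` are illustrative names. This is not the IR that TailRecursionElimination.cpp emits: it only shows that the final recursive call can become a loop back-edge while everything added in front of it, including the earlier calls, is folded into an accumulator via the associative, commutative `+`.

```cpp
#include <cassert>

// Hypothetical stand-in for "another_call()" from the review discussion.
// It has no side effects, so operand evaluation order does not matter.
static int another_call() { return 1; }

// Original shape: the final recursive call fib2(x - 2) is combined with the
// values computed before it by integer '+'.
int fib2(int x) {
  if (x < 2) return x;
  return fib2(x - 1) + another_call() + fib2(x - 2);
}

// Hand-applied accumulator recursion elimination (illustrative sketch):
// the last recursive call turns into "x -= 2; loop again", and the terms
// added in front of it are summed into 'acc'. The call fib2_tre(x - 1) and
// another_call() remain ordinary calls and do not prevent the rewrite.
int fib2_tre(int x) {
  int acc = 0;
  while (x >= 2) {
    acc += fib2_tre(x - 1) + another_call();
    x -= 2;
  }
  // The base case returns x, so add it to the accumulated sum.
  return acc + x;
}
```

For example, both `fib2(4)` and `fib2_tre(4)` evaluate to 7 with this `another_call()`.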

llvm/test/Transforms/TailCallElim/basic.ll


	; PR615. Make sure that we do not move the alloca so that it interferes with the tail call.			; PR615. Make sure that we do not move the alloca so that it interferes with the tail call.
	define i32 @test1() {			define i32 @test1() {
	; CHECK: i32 @test1()			; CHECK: i32 @test1()
	; CHECK-NEXT: alloca			; CHECK-NEXT: alloca
	%A = alloca i32 ; <i32*> [#uses=2]			%A = alloca i32 ; <i32*> [#uses=2]
	store i32 5, i32* %A			store i32 5, i32* %A
	call void @use(i32* %A)			call void @use(i32* %A)
	; CHECK: tail call i32 @test1			; CHECK: call i32 @test1
	%X = tail call i32 @test1() ; <i32> [#uses=1]			%X = call i32 @test1() ; <i32> [#uses=1]
				efriedma: I'm not sure this is testing what it was originally supposed to. I guess that's okay, but please fix the comment at least.
	ret i32 %X			ret i32 %X
	}			}

	; This function contains intervening instructions which should be moved out of the way			; This function contains intervening instructions which should be moved out of the way
	define i32 @test2(i32 %X) {			define i32 @test2(i32 %X) {
	; CHECK: i32 @test2			; CHECK: i32 @test2
	; CHECK-NOT: call			; CHECK-NOT: call
	; CHECK: ret i32			; CHECK: ret i32

llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll

This file was added.

				; RUN: opt < %s -tailcallelim -verify-dom-info -S | FileCheck %s

				; IR for this test was generated from the following C++ source:
				;
				;int count;
				;__attribute__((noinline)) void globalIncrement(const int* param) { count += *param; }
				;
				;void test(int recurseCount)
				;{
				; if (recurseCount == 0) return;
				; int temp = 10;
				; globalIncrement(&temp);
				; test(recurseCount - 1);
				;}
				;

				@count = dso_local local_unnamed_addr global i32 0, align 4

				; Function Attrs: nofree noinline norecurse nounwind uwtable
				define dso_local void @_Z15globalIncrementPKi(i32* nocapture readonly %param) local_unnamed_addr #0 {
				efriedma: For the purpose of this testcase, we don't need the definition of _Z15globalIncrementPKi.
				entry:
				%0 = load i32, i32* %param, align 4
				%1 = load i32, i32* @count, align 4
				%add = add nsw i32 %1, %0
				store i32 %add, i32* @count, align 4
				ret void
				}

				; Test that TRE can be performed on a tail-recursive routine that contains
				; a call to a function receiving a pointer to the local stack.

				; CHECK: void @_Z4testi
				; CHECK: br label %tailrecurse
				; CHECK: tailrecurse:
				; CHECK-NOT: call void @_Z4testi
				; CHECK: br label %tailrecurse
				; CHECK-NOT: call void @_Z4testi
				efriedma: I think I'd prefer to just generate this with update_test_checks.py
				; CHECK: ret

				; Function Attrs: nounwind uwtable
				define dso_local void @_Z4testi(i32 %recurseCount) local_unnamed_addr #1 {
				entry:
				%temp = alloca i32, align 4
				%cmp = icmp eq i32 %recurseCount, 0
				br i1 %cmp, label %return, label %if.end

				if.end: ; preds = %entry
				%0 = bitcast i32* %temp to i8*
				call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #6
				store i32 10, i32* %temp, align 4
				call void @_Z15globalIncrementPKi(i32* nonnull %temp)
				%sub = add nsw i32 %recurseCount, -1
				call void @_Z4testi(i32 %sub)
				call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #6
				br label %return

				return: ; preds = %entry, %if.end
				ret void
				}

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #2

				; Function Attrs: argmemonly nounwind willreturn
				declare void @llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #2

				attributes #0 = { nofree noinline norecurse nounwind uwtable }
				attributes #1 = { nounwind uwtable }
				attributes #2 = { argmemonly nounwind willreturn }
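As a source-level illustration of what this patch enables, here is a hand-written loop equivalent of `test()` from the C++ comment at the top of this test file. It is a sketch under stated assumptions, not the pass's literal output: the loop stands in for the `br label %tailrecurse` back-edge that the CHECK lines look for. `g_count` is a renamed stand-in for the test's global `count`, and `test_loop` is a hypothetical name. The rewrite is only valid because `globalIncrement()` reads through its argument without retaining it, which is what the `nocapture readonly` attributes on `%param` in the IR express.

```cpp
#include <cassert>

// Stand-in for the test's global `count` (renamed to avoid clashing with
// std::count in translation units that use `using namespace std`).
int g_count = 0;

// Mirrors the test's globalIncrement(): it only reads through `param` and
// never stores the pointer, matching `nocapture readonly` in the IR.
// noinline is a GCC/Clang extension, kept as in the original source.
__attribute__((noinline)) void globalIncrement(const int *param) {
  g_count += *param;
}

// Original recursive form from the test file's comment.
void test(int recurseCount) {
  if (recurseCount == 0) return;
  int temp = 10;
  globalIncrement(&temp);
  test(recurseCount - 1);
}

// Hypothetical hand-written equivalent of the loop TRE is expected to form:
// the self-recursive call becomes a jump back to the loop header. Reusing one
// stack slot for `temp` across iterations is safe only because
// globalIncrement() cannot retain &temp beyond the call.
void test_loop(int recurseCount) {
  for (;;) {
    if (recurseCount == 0) return;
    int temp = 10;
    globalIncrement(&temp);
    recurseCount = recurseCount - 1;
  }
}
```

Starting from a zeroed `g_count`, calling either `test(5)` or `test_loop(5)` leaves `g_count` at 50.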


Revision Contents

Diff 276591

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp

llvm/test/Transforms/TailCallElim/basic.ll

llvm/test/Transforms/TailCallElim/tre-noncapturing-alloca-calls.ll
