This is an archive of the discontinued LLVM Phabricator instance.

[TailCallElim] Enable marking of calls with byval as tails
ClosedPublic

Authored by rob.lougher on Oct 4 2018, 10:54 AM.

Details

Summary

In D50679 the alias analysis rules were changed with regards to tail calls and byval arguments. Previously, tail calls were assumed not to alias allocas from the current frame. This has been updated, to not assume this for arguments with the byval attribute.

With this change, TailCallElim can now be more aggressive and mark more calls as tails, e.g.:

define void @test() {
  %f = alloca %struct.foo
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test2(%struct.foo* byval %f) {
  call void @bar(%struct.foo* byval %f)
  ret void
}

define void @test3(%struct.foo* byval %f) {
  %agg.tmp = alloca %struct.foo
  %0 = bitcast %struct.foo* %agg.tmp to i8*
  %1 = bitcast %struct.foo* %f to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 40, i1 false)
  call void @bar(%struct.foo* byval %agg.tmp)
  ret void
}

The problematic case where a byval parameter is captured by a call is still handled correctly, and will not be marked as a tail (see PR7272).

After this change the following code:

-------------- test.cpp --------------
struct foo {
  int i[10];
};

extern void bar(struct foo f);

void foo(struct foo f) {
  bar(f);
}
--------------------------------------

Now produces on Linux x86-64:

_Z3foo3foo:                             # @_Z3foo3foo
	jmp	_Z3bar3foo              # TAILCALL

Previously it gave:

_Z3foo3foo:                             # @_Z3foo3foo
	subq		$40, %rsp
	movq	80(%rsp), %rax
	movq	%rax, 32(%rsp)
	movaps	48(%rsp), %xmm0
	movaps	64(%rsp), %xmm1
	movups	%xmm1, 16(%rsp)
	movups	%xmm0, (%rsp)
	callq	_Z3bar3foo
	addq	$40, %rsp
	retq

The new output matches gcc.

Fixes PR38862.

Diff Detail

Event Timeline

rob.lougher created this revision.Oct 4 2018, 10:54 AM
rob.lougher edited the summary of this revision. (Show Details)Oct 4 2018, 10:57 AM

Yes. I'll update the Details...

rob.lougher edited the summary of this revision. (Show Details)Oct 4 2018, 11:24 AM

I don't understand the logical model behind byval, but I'll review the change to TRE under the assumption that the goal of the patch is correct.

lib/Transforms/Scalar/TailRecursionElimination.cpp
138

I think you may also want to change the logic in callUsesLocalStack; this tracks users of an alloca when the callee could leak/escape this alloca pointer value to a place where a future call could obtain it, even if it isn't passed in as an argument (say, written to a global here and read back in a future call). If the pointer received by the callee is not truly the pointer to the alloca, then this couldn't happen. This case really matters for cycles in the CFG, see the comment on DeferredTails.

test/Transforms/Inline/byval-tail-call.ll
44

This "if X then Y" makes it sound like we don't need to check any other properties in this case, like

if (call passes byval) {
  markTail(call);
}

I think you meant something more like "A byval parameter does not block a call from being marked tail." or "A call with a byval parameter may still be marked as a tail."

rnk added a comment.Oct 4 2018, 1:33 PM

Nice!

I don't understand the logical model behind byval, but I'll review the change to TRE under the assumption that the goal of the patch is correct.

You have to hallucinate an implicit memcpy into the call sequence. Its destination is some stack memory allocated during the call sequence not visible from LLVM IR.

test/Transforms/Inline/byval-tail-call.ll
63

I'd throw in a test case where %x is a local alloca, since that should also work.

In D52895#1255738, @rnk wrote:

Nice!

I don't understand the logical model behind byval, but I'll review the change to TRE under the assumption that the goal of the patch is correct.

You have to hallucinate an implicit memcpy into the call sequence. Its destination is some stack memory allocated during the call sequence not visible from LLVM IR.

How long does this pointer live? In a sequence with two byval calls on the same pointer, can they reuse the same memory, or it is always reallocated even if it happens to have the same address and contents?

If it is always reallocated, any reason it shouldn't be marked nocapture? And if it were nocapture, we wouldn't need this change, right?

Hi Nicholas and Reid,

Thanks for the reviews. I'll make the suggested changes tomorrow (later on today as it's now past midnight in the UK).

Thanks,
Rob.

lib/Transforms/Scalar/TailRecursionElimination.cpp
138

If I understand correctly, you're worried about the byval "pointer" escaping, as calls after the escape point cannot be marked tail? This isn't an issue, as the alloca is copied into some target specific area, and isn't passed directly (this is why it is simply skipped - an alloca with byval is a copy, so it isn't leaking or using the alloca). If a callee did try to save it for a future call it would be equally broken irrespective of whether the future call was marked tail or not.

test/Transforms/Inline/byval-tail-call.ll
44

Yes, on hindsight it does sound confusing. First there's your point, and second the important point is that the byval is both in and out. How about:

"A byval parameter passed into a function which is passed out as byval does not block the call from being marked as tail".

63

OK, that makes sense.

rnk added a comment.Oct 4 2018, 4:29 PM

How long does this pointer live?

I'd say it belongs to the callee. It's argument memory (maybe rinsed through registers on some targets) in the new call frame. Some targets allocate it in the caller's frame, but it should only live as long as the call is active, and TCO would fail in codegen for such a case.

In a sequence with two byval calls on the same pointer, can they reuse the same memory, or it is always reallocated even if it happens to have the same address and contents?

I'd say each byval arg is a new memory allocation. It could have the same address as the old allocation, but that's not relevant.

If it is always reallocated, any reason it shouldn't be marked nocapture? And if it were nocapture, we wouldn't need this change, right?

We could probably mark it nocapture, and that'd probably fix it.

In D52895#1255738, @rnk wrote:

Nice!

I don't understand the logical model behind byval, but I'll review the change to TRE under the assumption that the goal of the patch is correct.

You have to hallucinate an implicit memcpy into the call sequence. Its destination is some stack memory allocated during the call sequence not visible from LLVM IR.

How long does this pointer live? In a sequence with two byval calls on the same pointer, can they reuse the same memory, or it is always reallocated even if it happens to have the same address and contents?

I don't think you can think of it as a pointer as some targets may pass it in unused registers. Simply think of it as a copy which can outlive the destruction of the current frame.

If it is always reallocated, any reason it shouldn't be marked nocapture? And if it were nocapture, we wouldn't need this change, right?

Unfortunately we would still need this change. A nocapture argument is not an escape point but it is still a user of the alloca, which prevents the call from being marked tail.

Unfortunately we would still need this change. A nocapture argument is not an escape point but it is still a user of the alloca, which prevents the call from being marked tail.

In case I wasn't clear. A byval argument is neither an escape point or a user of the alloca. A nocapture argument is simply not an escape point. They are not the same, and this check is necessary to correctly handle byval.

rob.lougher edited the summary of this revision. (Show Details)Oct 4 2018, 6:06 PM

I've updated the patch with the following changes:

  1. Added a new inline test with local alloca.
  2. Re-worded inline test comment.
  3. Modified the first TailCallElim test to check that a byval parameter is neither an escape point or a use of the alloca.
rob.lougher marked 6 inline comments as done.Oct 5 2018, 7:57 AM

This looks good to me, but please wait for rnk.

I misspoke, I meant to say please wait for another reviewer. I've been away from LLVM long enough that I don't have enough context to OK patches to land, but I didn't mean that rnk must be the other reviewer. Sorry!

rnk accepted this revision.Oct 6 2018, 9:31 AM

lgtm, thanks!

This revision is now accepted and ready to land.Oct 6 2018, 9:31 AM
rob.lougher closed this revision.Oct 8 2018, 11:32 AM

Committed in rL343986 (apologies, I forgot the "Differential Revision:" tag)