This is an archive of the discontinued LLVM Phabricator instance.


Differential D151644

[InstCombine] Propagating `nocapture` flag to callsites
Abandoned · Public

Authored by goldstein.w.n on May 29 2023, 12:14 AM.


Details

Reviewers

nikic
StephenFan
jdoerfert

Summary

We can sometimes propagate nocapture attributes from caller
arguments to callsite arguments.

This is done when the callsite meets all of the following requirements:

1) The callsite is in a basic block that ends with a return statement.
2) Between the callsite and the end of its basic block there are no may-write instructions.
3) The return value of the callsite is not used (directly or indirectly) as the address of a may-read instruction.
4) There are no allocas or leaked (not freed or returned) mallocs reachable from the callsite.
5) The callsite/caller are nothrow OR there is no landing pad in the caller.

These requirements are intentionally overly conservative. We are only trying
to catch relatively trivial cases.

Requirements 1 & 2 are there to ensure that after the callsite has
returned, the state of any captured-in-memory pointers cannot change. This
implies that if the caller has any nocapture-in-memory guarantees, that
state has been reached by the end of the callsite.

Requirements 3 & 4 are to cover cases where pointers could escape the
callsite (but not the caller) through non-dead code. Any return value that's
loaded from (or used to create a pointer that is loaded from) could have
been derived from an argument. Finally, allocas/leaked mallocs in general are
difficult (so we avoid them entirely). Callsites can arbitrarily store
pointers in allocas for use later without violating a nocapture guarantee by
the caller, as the allocas are torn down at caller return. Likewise a
leaked malloc would not be accessible outside of the caller, but could
still be accessible after the callsite. There are a variety of complex
cases involving allocas/leaked mallocs. For simplicity, if we see either we
simply fail.

Requirement 5 covers the last way an escape can occur. If the
callsite/caller is nothrow it's a non-issue. If the callsite may throw, then
a method of capture is through an exception. If the caller has no landing
pad to catch this exception, then the exception state will be visible
outside of the caller, so any guarantees about nocapture made by the caller
will apply to the callsite's throw. If the caller has a landing pad, it's
possible for the callsite to capture a pointer in a throw that is later
cleared by the caller.
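
The exception requirement described above reduces to a small predicate. The sketch below is illustrative only: `exceptionPathIsSafe` and its three boolean inputs are hypothetical names standing in for facts the pass would query from the IR, not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper: the exception-related requirement as a predicate.
// All three inputs are assumed to have been computed elsewhere.
bool exceptionPathIsSafe(bool callSiteNoThrow, bool callerNoThrow,
                         bool callerHasLandingPad) {
  // A throw is only possible if neither the callsite nor the caller is nothrow.
  bool mayThrow = !(callSiteNoThrow || callerNoThrow);
  // If nothing can throw there is no exception path to worry about.
  // Otherwise a landing pad in the caller could catch the exception and
  // clear a capture made by the throw, so we must give up.
  return !mayThrow || !callerHasLandingPad;
}
```

For example, a possibly-throwing callsite in a caller without a landing pad passes the check, while the same callsite in a caller that has a landing pad does not.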

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

goldstein.w.n created this revision. May 29 2023, 12:14 AM

Herald added a project: Restricted Project. May 29 2023, 12:14 AM

Herald added a subscriber: hiraditya.

goldstein.w.n requested review of this revision. May 29 2023, 12:14 AM

Herald added a subscriber: llvm-commits.

goldstein.w.n added a parent revision: D151643: [InstCombine] Add tests for propegating `nocapture` flag to callsites; NFC. May 29 2023, 12:14 AM

I don't think this is correct. Consider something like this:

define void @test(ptr nocapture %p) {
  call void @store_ptr_in_global(ptr %p)
  call void @do_something_with_store_pointer()
  call void @store_ptr_in_global(ptr null)
  ret void
}

Here store_ptr_in_global escapes the pointer, but the overall function does not. It is okay to have such interior escapes as long as they are not observable by the caller.
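
A rough C++ analogue of the IR above may make the interior escape easier to see. This is only a sketch: C++ has no nocapture attribute, and `g_slot`, `g_seen_inside` and the function bodies are hypothetical, since the review only declares the callees.

```cpp
#include <cassert>

// Hypothetical global that the callee stores into.
static int *g_slot = nullptr;
// Records whether the pointer was visible in the global mid-function.
static bool g_seen_inside = false;

// Captures its argument: the pointer outlives this call via g_slot.
void store_ptr_in_global(int *p) { g_slot = p; }

// Runs while the pointer is still stored in the global.
void do_something_with_stored_pointer() {
  g_seen_inside = (g_slot != nullptr);
}

// Seen from outside, test() does not capture p: the global is cleared
// before it returns. The first inner call still captures p.
void test(int *p) {
  store_ptr_in_global(p);
  do_something_with_stored_pointer();
  store_ptr_in_global(nullptr);
}
```

After `test(&x)` returns, `g_slot` is null again, yet `g_seen_inside` shows that the pointer was observable between the two stores, which is why the first callsite cannot be marked nocapture.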

This revision now requires changes to proceed. May 29 2023, 12:21 AM

In D151644#4378832, @nikic wrote:
I don't think this is correct. Consider something like this:
define void @test(ptr nocapture %p) {
  call void @store_ptr_in_global(ptr %p)
  call void @do_something_with_store_pointer()
  call void @store_ptr_in_global(ptr null)
  ret void
}
Here store_ptr_in_global escapes the pointer, but the overall function does not. It is okay to have such interior escapes as long as they are not observable by the caller.

Ah, good catch.

It should still work, then, in the case that the function doesn't interact with global state after
the call?
So we could iterate in reverse, and while the sequence remains pure, it should be applicable?

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

Harbormaster completed remote builds in B235125: Diff 526400. May 29 2023, 1:03 AM

In D151644#4378838, @goldstein.w.n wrote:

It should still work then in the case that the function doesn't interact with the global state after
the call?
So we could reverse iterate and while the sequence remains pure, it should be appliable?

I don't think it's quite as simple as that. Here's another example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

The call-site is not no-capture, even though there are no memory accesses after it.
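
In C++ terms the situation looks roughly like the sketch below. The names are hypothetical, and the comparison at the end is added only so the effect of the call is observable; the IR example has nothing after the call.

```cpp
#include <cassert>

// Hypothetical callee: writes p into caller-provided storage. As far as
// this call is concerned, p has been captured.
void store_into_slot(int **slot, int *p) { *slot = p; }

// Analogue of @test: the slot is a local (like the alloca), so the stored
// pointer dies with the frame and the caller as a whole does not capture p.
// Returns whether the local slot held p after the call.
bool test(int *p) {
  int *a = nullptr;        // plays the role of `%a = alloca ptr`
  store_into_slot(&a, p);  // this call writes p into the caller's frame
  return a == p;
}
```

The call stores `p` somewhere that outlives the callee, so the callsite itself captures even though nothing escapes the caller.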

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

So yeah, the case where we're literally calling another function with the exact same arguments is clearly safe -- it's just not entirely obvious to me what the right way to generalize that would be.

In D151644#4379353, @nikic wrote:
In D151644#4378838, @goldstein.w.n wrote:

It should still work then in the case that the function doesn't interact with the global state after
the call?
So we could reverse iterate and while the sequence remains pure, it should be appliable?

I don't think it's quite as simple as that. Here's another example:
define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

Is this really not safe? The alloca is dead after the store_into_alloca call, so does %p ever really escape? It's as if you had a function return %p but the return value was unused: would it be unsafe to put nocapture at that return site?

The call-site is not no-capture, even though there are no memory accesses after it.

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

So yeah, the case where we're literally calling another function with the exact same arguments is clearly safe -- it's just not entirely obvious to me what the right way to generalize that would be.

So any output value used in creating the return value can't be a medium for capture. Working with your example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

We can't prove this doesn't capture because it may go through the dead alloca.
But if it was slightly different:

define ptr @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  %x = load ptr, ptr %a
  ret ptr %x
}

Now we know for certain %p can't have been captured in %a.

So for any argument, if it's either non-ptr or its contents are used in creating the return value, we can be sure %p is not captured through it.
For a return value it's basically the same: if the return value of the callsite is either unused or used in the function's return value, then we can be sure %p is not captured through it.

If both of the above hold, then we can propagate nocapture.
This obviously doesn't fit in InstCombine. I was thinking a new pass in Transforms/Scalar?


Another easy case: a tail call cannot capture %p.


Only propagate when nocapture state cannot be changed after the
callsite (so the caller's nocapture must be preserved by that point).

goldstein.w.n edited the summary of this revision. May 30 2023, 11:49 PM

@nikic I think this should do it. I did assume that non-visible captures are okay (i.e. an unused return value or, in your example, a provably dead store).
The basic rule AFAICT is:
If the callsite is in a returning basic block, and between the callsite and the return there are only readonly instructions, then the capture state CANNOT change between the callsite and the return. Since at the return we KNOW the argument hasn't been captured, we can deduce that after the callsite (since it can't change after that) the pointer hasn't been captured.
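
The rule as stated in this comment can be sketched on a toy block model. This is an assumption-laden illustration, not the patch: `ToyInst` and `captureStateFrozenAfterCall` are hypothetical, and the model only knows whether an instruction may write memory and whether it is a return.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction: only the properties the rule cares about.
struct ToyInst {
  bool mayWrite;  // could modify memory
  bool isReturn;  // terminates the block with a return
};

// Returns true if the block ends in a return and every instruction strictly
// between the call at index `call` and that return is non-writing.
bool captureStateFrozenAfterCall(const std::vector<ToyInst> &block,
                                 std::size_t call) {
  // The block must end in a return and the call must come before it.
  if (block.empty() || !block.back().isReturn || call + 1 >= block.size())
    return false;
  // Scan from just after the call up to (not including) the return.
  for (std::size_t i = call + 1; i + 1 < block.size(); ++i)
    if (block[i].mayWrite)
      return false;
  return true;
}
```

A block of the form call, load, ret passes; call, store, ret does not, because the store could be clearing a capture made by the call.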

nikic added a reviewer: jdoerfert. May 31 2023, 12:53 AM

Harbormaster completed remote builds in B235499: Diff 526926. May 31 2023, 1:14 AM

propegate -> propagate everywhere

You're really playing with fire here when it comes to "it's captured, but it doesn't really matter" reasoning. Consider something like this:

define i32 @test(ptr noalias nocapture %p) {
  %a = alloca ptr
  store i32 0, ptr %p 
  call void @store_into_alloca(ptr %a, ptr %p)
  %p2 = load ptr, ptr %a
  %v = load i32, ptr %p2
  ret i32 %v
}

declare void @store_into_alloca(ptr, ptr)

This call is clearly capturing, in a way that will affect alias analysis results (load i32, ptr %p2 cannot ref %p with that nocapture attribute in place, which will be materialized with invalid alias scope metadata during inlining).
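
The aliasing at the heart of this example can be mimicked in plain C++. The sketch is hypothetical (C++ has neither noalias nor nocapture, and `store_into_slot` is an assumed implementation of the declared callee); it only demonstrates that the final load reads the very memory `p` points to.

```cpp
#include <cassert>

// Assumed implementation of the declared callee: stores p into the slot.
void store_into_slot(int **slot, int *p) { *slot = p; }

// Analogue of @test: writes through p, lets the callee stash p in a local
// slot, then reads back through the pointer reloaded from that slot.
int test(int *p) {
  int *a = nullptr;        // `%a = alloca ptr`
  *p = 0;                  // `store i32 0, ptr %p`
  store_into_slot(&a, p);  // the capturing call
  int *p2 = a;             // `%p2 = load ptr, ptr %a`
  return *p2;              // `%v = load i32, ptr %p2`, same memory as *p
}
```

Calling `test` on a variable that held 7 returns 0, so the load through `p2` observes the store made through `p`. Marking the callsite nocapture would let alias analysis assume the two accesses are unrelated.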

This revision now requires changes to proceed. May 31 2023, 6:52 AM

In D151644#4384102, @nikic wrote:
propegate -> propagate everywhere

You're really playing with fire here when it comes to "it's captured, but it doesn't really matter" reasoning. Consider something like this:
define i32 @test(ptr noalias nocapture %p) {
  %a = alloca ptr
  store i32 0, ptr %p 
  call void @store_into_alloca(ptr %a, ptr %p)
  %p2 = load ptr, ptr %a
  %v = load i32, ptr %p2
  ret i32 %v
}

declare void @store_into_alloca(ptr, ptr)
This call is clearly capturing, in a way that will affect alias analysis results (load i32, ptr %p2 cannot ref %p with that nocapture attribute in place, which will be materialized with invalid alias scope metadata during inlining).

Alright, fair enough. I think this can only occur for allocas, as any other type of store with no more may-write instructions would actually capture.
I guess we also need to handle returns in a similar way. You could do:

> define i32 @test(ptr noalias nocapture %p) {
>   %p2 = call ptr @return_ptr(ptr %p)
>   %v = load i32, ptr %p2
>   ret i32 %v
> }

where p2 captures and it would also be an alias analysis issue.
So the rule needs to be updated.
I don't think we need to go so far as "any load after makes it unsafe",
although maybe that does need to be the case. How about the following case:

define i32 @test(ptr noalias nocapture %p, ptr noalias nocapture %q) {
  %a = alloca ptr
  store ptr %a, ptr %q 
  call void @store_p_into_alloca_in_q(ptr %q, ptr %p)
  %p3 = load ptr, ptr %q
  %p2 = load ptr, ptr %p3
  %v = load i32, ptr %p2
  ret i32 %v
}

so %p is stored into %a through %q. That is obviously a capture, but does it
violate the nocapture in @test, or is it safe because the temporary pointer that
captures %p is no longer accessible after @test?

I was also thinking something like:

> define void @test(ptr noalias nocapture %p, i64 %n) {
>   %p_i64 = call i64 @return_ptr_as_i64(ptr %p)
>   %p2 = ptrtoint ptr %p to i64
>   %d = sub i64 %p_i64, %p2
>   %unused = udiv i64 %n, %d
>   ret void
> }

This might be an issue: I was thinking that if @return_ptr_as_i64 can't capture, then the return
value must be != %p, so the udiv is speculatable. But I think it's safe to return a random value
from a function that has a nocapture argument, so it's possible to get the exact bit pattern
of the pointer; it just can't be derived from the pointer itself.

In D151644#4384923, @goldstein.w.n wrote:

Alright fair enough. I think this can only occur for allocas as any other type of store with no more may-write instructions would actually capture.

I'm not entirely confident on this, but I believe it can also happen with noalias returns, aka allocators. If you write the pointer into a malloc that gets leaked, I believe that would be non-capturing, as nobody has provenance to access the captured pointer anymore.

I guess we also need to handle returns in a similiar way. You could do:
> define i32 @test(ptr noalias nocapture %p) {
>   %p2 = call ptr @return_ptr(ptr %p)
>   %v = load i32, ptr %p2
>   ret i32 %v
> }
where p2 captures and it would also be an alias analysis issue.
So the rule needs to update.
I don't think we need so go so far as "any load after makes it unsafe"
although maybe that does need to be case. How about the following case:
define i32 @test(ptr noalias nocapture %p, ptr noalias nocapture %q) {
  %a = alloca ptr
  store ptr %a, ptr %q 
  call void @store_p_into_alloca_in_q(ptr %q, ptr %p)
  %p3 = load ptr, ptr %q
  %p2 = load ptr, ptr %p3
  %v = load i32, ptr %p2
  ret i32 %v
}
so %p is stored into %a through %q that is obviously a capture, but does that
violate the nocapture in @test or because the temporary pointer that capture
%p is no longer accesible after @test is it safe?

I'd say this doesn't violate nocapture on @test, for the reason you state.

I was also thinking something like:
> define void @test(ptr noalias nocapture %p) {
>   %p_i64 = call ptr @return_ptr_as_i64(ptr %p)
>   %p2 = ptrtoint ptr %p to i64
>   %d = sub i64 %p_64, %p2
>   %unused = udiv i64 %n, %d
>   ret void
> }
Might be an issue as was thinking if @return_ptr_as_i64 can't capture then the return
value must != %p so the udiv is speculatable. But I think its safe to return random
from a function that has a nocapture argument so its possible to get the exact bit pattern
of the pointer, it just cant be derived from the pointer itself.

Right, I don't think we can do the speculation optimization. In fact, it might be that the address has already been captured prior to the call and return_ptr_as_i64 returns it (neither nocapture nor noalias on the argument prevent this).

In D151644#4385020, @nikic wrote:

In D151644#4384923, @goldstein.w.n wrote:

Alright fair enough. I think this can only occur for allocas as any other type of store with no more may-write instructions would actually capture.

I'm not entirely confident on this, but I believe it can also happen with noalias returns, aka allocators. If you write the pointer into a malloc that gets leaked, I believe that would be non-capturing, as nobody has provenance to access the captured pointer anymore.

Ah that's an interesting case. I'll cover that as well (just disable if any allocator function).


Bail on any alloca, leaked malloc, or load based on the return value

@nikic, okay I re-implemented to be as conservative as possible.
Fail on:

Any alloca/leaked malloc that may reach that callsite
Any may-read on a value derived from the return value.

I think later on we can improve the alloca/malloc cases if after the callsite there are no loads or the callsite is readonly, but for now I think this should cover all the bases.
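
The "may reach that callsite" check is a backwards walk over predecessor blocks. The sketch below shows that worklist on a toy CFG. It is coarser than the patch and entirely hypothetical in its names: `ToyBlock` carries a precomputed flag per block, whereas the real code inspects individual instructions and stops at the callsite inside its own block.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy basic block for the sketch.
struct ToyBlock {
  std::vector<std::size_t> preds;  // indices of predecessor blocks
  bool hasAllocaOrLeakedMalloc;    // assumed to be precomputed per block
};

// Walks backwards from `start` over all transitive predecessors and reports
// whether any visited block contains an alloca or a leaked malloc.
bool allocationMayReach(const std::vector<ToyBlock> &cfg, std::size_t start) {
  std::set<std::size_t> seen;
  seen.insert(start);
  std::vector<std::size_t> work;
  work.push_back(start);
  while (!work.empty()) {
    std::size_t cur = work.back();
    work.pop_back();
    if (cfg[cur].hasAllocaOrLeakedMalloc)
      return true;
    // Queue each predecessor once; the `seen` set keeps loops from cycling.
    for (std::size_t p : cfg[cur].preds)
      if (seen.insert(p).second)
        work.push_back(p);
  }
  return false;
}
```

Blocks that cannot reach the callsite's block are never visited, so an alloca there does not block the propagation in this model.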

The only thing is maybe we need nothrow as well? Or at least need a nothrow if the caller has a catch (if the caller doesn't have a catch then the nocapture on caller should apply). What do you think?

goldstein.w.n edited the summary of this revision. May 31 2023, 2:29 PM

In D151644#4384102, @nikic wrote:

propegate -> propagate everywhere

will fix this next revision.

Harbormaster completed remote builds in B235674: Diff 527202. May 31 2023, 3:20 PM

Fix misspells, handle exception capture path

goldstein.w.n retitled this revision from [InstCombine] Propegating `nocapture` flag to callsites to [InstCombine] Propagating `nocapture` flag to callsites. Jun 1 2023, 11:23 AM

goldstein.w.n edited the summary of this revision.

Harbormaster completed remote builds in B235930: Diff 527534. Jun 1 2023, 12:47 PM

goldstein.w.n added a child revision: D151943: [InstCombine] Propagate some func/arg/ret attributes from caller to callsite. Jun 1 2023, 2:42 PM

Rebase

Harbormaster completed remote builds in B235999: Diff 527631. Jun 1 2023, 4:16 PM

Rebase

Harbormaster completed remote builds in B236077: Diff 527731. Jun 1 2023, 10:47 PM

goldstein.w.n mentioned this in D152226: [FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite (WIP). Jun 5 2023, 9:55 PM

(Removing from queue as superseded by D152226.)

nikic requested changes to this revision. Jun 6 2023, 8:11 AM

This revision now requires changes to proceed. Jun 6 2023, 8:11 AM

goldstein.w.n abandoned this revision. Jun 6 2023, 10:14 AM

goldstein.w.n mentioned this in rG4fa971ff62c3: [FunctionAttrs] Propagate some func/arg/ret attributes from caller to callsite…. Jun 12 2023, 10:51 PM

Revision Contents

Path                                                      Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp      251 lines
llvm/test/Transforms/InstCombine/nocapture-attribute.ll   14 lines

Diff 527731

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

if (llvm::isPowerOf2_64(AlignmentVal)) {
Attribute::getWithAlignment(Call.getContext(), NewAlign));
Changed = true;
}
}
}
return Changed;
}

		// See which (if any) arguments of the callsite can inherit nocapture from
		// caller arguments. This is useful if the caller function is inlined as
		// inlining may lose the nocapture information.
		static void getNoCapturePropagations(const CallBase *CB,
		SmallVector<unsigned, 4> *NoCaptureArgs) {
		// TODO: Handle CallBr and Invoke. At the moment we only handle cases where
		// the callsite dominates a return instruction. We could add more
		// sophisticated logic to handle multiple target BBs as well, however.
		if (!isa<CallInst>(CB))
		return;

		SmallPtrSet<Value *, 4> NoCaptureParentArguments;
		// If this callsite is to a readonly function that doesn't throw then the only
		// way for the pointer to be captured is through the return value. If the
		// return type is void or the return value of this callsite is unused, then
		// all the pointer parameters at this callsite must be nocapture. NB: This is
		// a slight strengthening of the case done in the FunctionAttrs pass which has
		// the same logic but only for void functions. At specific callsites we can
		// handle non-void functions if the return value is unused.
		bool IsAlwaysNoCapture = CB->onlyReadsMemory() && CB->doesNotThrow() &&
		(CB->getType()->isVoidTy() || CB->use_empty());
		if (IsAlwaysNoCapture) {
		unsigned N = 0;
		for (Value *V : CB->args()) {
		if (V->getType()->isPointerTy() &&
		!CB->paramHasAttr(N, Attribute::NoCapture))
		NoCaptureArgs->push_back(N);
		++N;
		}
		return;
		}

		// If this is not trivially nocapture, then we propagate a nocapture argument
		// if the callsite meets the following requirements:
		//
		// 1) The callsite is in a basic block that ends with a return statement.
		// 2) Between the callsite and the end of its basic block there are no
		// may-write instructions.
		// 3) The return value of the callsite is not used (directly or indirectly)
		// as the address of a may-read instruction.
		// 4) There are no allocas or leaked (not freed or returned) mallocs
		// reachable from the callsite.
		// 5) The callsite/caller are nothrow OR there is no landing pad in the
		// caller.
		//
		// These requirements are intentionally overly conservative. We are only
		// trying to catch relatively trivial cases.
		//
		// Requirements 1 & 2 are there to ensure that after the callsite has
		// returned, the state of any captured-in-memory pointers cannot change. This
		// implies that if the caller has any nocapture-in-memory guarantees, that
		// state has been reached by the end of the callsite.
		//
		// Requirements 3 & 4 are to cover cases where pointers could escape the
		// callsite (but not the caller) through non-dead code. Any return value that's
		// loaded from (or used to create a pointer that is loaded from) could have
		// been derived from an argument. Finally, allocas/leaked mallocs in general
		// are difficult (so we avoid them entirely). Callsites can arbitrarily store
		// pointers in allocas for use later without violating a nocapture guarantee by
		// the caller, as the allocas are torn down at caller return. Likewise a
		// leaked malloc would not be accessible outside of the caller, but could
		// still be accessible after the callsite. There are a variety of complex
		// cases involving allocas/leaked mallocs. For simplicity, if we see either we
		// simply fail.
		//
		// Requirement 5 covers the last way an escape can occur. If the
		// callsite/caller is nothrow it's a non-issue. If the callsite may throw, then
		// a method of capture is through an exception. If the caller has no landing
		// pad to catch this exception, then the exception state will be visible
		// outside of the caller, so any guarantees about nocapture made by the caller
		// will apply to the callsite's throw. If the caller has a landing pad, it's
		// possible for the callsite to capture a pointer in a throw that is later
		// cleared by the caller.
		const BasicBlock *BB = CB->getParent();
		if (!BB)
		return;

		// Make sure this BB ends in a return (Requirement 1).
		const Instruction *ITerm = BB->getTerminator();
		if (!ITerm || !isa<ReturnInst>(ITerm))
		return;

		// Get caller.
		const Function *PF = BB->getParent();
		if (!PF)
		return;

		// See if caller has any nocapture arguments we may be able to propagate
		// attributes from.
		for (unsigned I = 0; I < PF->arg_size(); ++I)
		if (PF->getArg(I)->hasNoCaptureAttr())
		NoCaptureParentArguments.insert(PF->getArg(I));

		unsigned N = 0;
		for (Value *V : CB->args()) {
		// See if this callsite argument is missing nocapture and is propagatable
		// (nocapture in the caller).
		if (!CB->paramHasAttr(N, Attribute::NoCapture) &&
		NoCaptureParentArguments.contains(V))
		NoCaptureArgs->push_back(N);
		++N;
		}

		if (NoCaptureArgs->empty())
		return;

		// Limit the maximum number of instructions we will check. The primary benefit
		// of this combine is for smaller functions that will be inlined
		// (potentially losing nocapture information), so a relatively small
		// threshold should be sufficient.
		const unsigned kMaxChecks = 40;
		unsigned Cnt = 0;

		// Check requirements 2 & 3. We do so by scanning the basic block with the
		// callsite from the callsite to the return. We keep track of all values
		// derived from the return value of the callsite. If we run into a may-read
		// instruction that uses one of those derived values, or any may-write
		// instruction, we fail.
		SmallPtrSet<const Value *, 8> DerivedFromReturn;
		for (const Value *U : CB->uses())
		DerivedFromReturn.insert(U);

		const Instruction *Ins = CB;
		for (Cnt = 0; Cnt < kMaxChecks; ++Cnt) {
		Ins = Ins->getNextNode();
		if (Ins == nullptr || Ins == ITerm)
		break;

		// We have a write operation after callsite so fail as it may be clearing
		// captured memory.
		if (Ins->mayWriteToMemory() || isa<LandingPadInst>(Ins)) {
		NoCaptureArgs->clear();
		return;
		}

		for (const Value *U : Ins->operands()) {
		if (DerivedFromReturn.contains(U)) {
		DerivedFromReturn.insert(Ins);
		break;
		}
		}

		// We have a load from the returned pointer (or a pointer derived with the
		// return value) so fail as this may have been from a captured return.
		if (Ins->mayReadFromMemory() && DerivedFromReturn.contains(Ins)) {
		NoCaptureArgs->clear();
		return;
		}
		}

		if (Cnt == kMaxChecks) {
		NoCaptureArgs->clear();
		return;
		}

		// Check all predecessors (basic blocks from which an alloca or leaked malloc
		// may be able to reach this callsite). We are being incredibly conservative
		// here. We could likely skip the alloca/leaked malloc search in a few cases.
		// 1) If the callsite is the last instruction before the return or if there
		// are no may-read instructions between the callsite and the return. 2) If
		// there are possible stores to the alloca/leaked malloc that may reach the
		// callsite it's probably also safe. And/Or 3) If the callsite is readonly it
		// could never capture in memory so these concerns do not apply. For now
		// stay conservative, but over time these optimizations can be added.
		SmallPtrSet<const BasicBlock *, 16> AllPreds{BB};
		SmallVector<const BasicBlock *, 8> Preds = {BB};
		bool MayThrow = !(CB->doesNotThrow() || PF->doesNotThrow());
		while (!Preds.empty()) {
		const BasicBlock *CurBB = Preds.back();
		Preds.pop_back();
		// Check all instructions in current BB for an alloca/leaked malloc.
		for (const Value &V : *CurBB) {
		if (&V == CB)
		break;

		bool Fail = false;

		// If we reach max checks and can't rule out alloca/leaked malloc case
		// fail.
		if (Cnt++ >= kMaxChecks)
		Fail = true;
		// If we find an alloca fail.
		if (isa<AllocaInst>(&V))
		Fail = true;
		// If we find leaked malloc fail.
		else if (auto *MCB = dyn_cast<CallBase>(&V))
		Fail = MCB->returnDoesNotAlias() &&
		MCB != cast<ReturnInst>(ITerm)->getReturnValue();
		// If we find a landing pad instruction fail.
		else if (MayThrow && isa<LandingPadInst>(&V))
		Fail = true;

		if (Fail) {
		NoCaptureArgs->clear();
		return;
		}
		}

		for (const BasicBlock *Pred : predecessors(CurBB))
		if (AllPreds.insert(Pred).second)
		Preds.push_back(Pred);
		}

		// Finally, if the callsite may throw and there is a landing pad in the
		// caller then fail as we only analyzed the path from callsite -> dominated
		// return. If the callsite may throw but there is no landing pad in the
		// caller, then the throw will exit the caller's scope so any nocapture
		// guarantees in place by the caller will apply to the callsite as well.
		if (MayThrow) {
		for (const BasicBlock &CurBB : *PF) {
		// Skip BBs we have already checked.
		if (AllPreds.contains(&CurBB))
		continue;
		for (const Instruction &CurIns : CurBB) {
		if (isa<LandingPadInst>(&CurIns) || Cnt++ >= kMaxChecks) {
		NoCaptureArgs->clear();
		return;
		}
		}
		}
		}

		return;
		}

/// Improvements for call, callbr and invoke instructions.		/// Improvements for call, callbr and invoke instructions.
Instruction *InstCombinerImpl::visitCallBase(CallBase &Call) {		Instruction *InstCombinerImpl::visitCallBase(CallBase &Call) {
bool Changed = annotateAnyAllocSite(Call, &TLI);		bool Changed = annotateAnyAllocSite(Call, &TLI);

// Mark any parameters that are known to be non-null with the nonnull		// Mark any parameters that are known to be non-null with the nonnull
// attribute. This is helpful for inlining calls to functions with null		// attribute. This is helpful for inlining calls to functions with null
// checks on their arguments.		// checks on their arguments.
SmallVector<unsigned, 4> ArgNos;		// Likewise try to mark parameters that are known not captured from parent
		// attributes as nocapture.
		SmallVector<unsigned, 4> ArgNosNonNull, ArgNosNoCapture;
unsigned ArgNo = 0;		unsigned ArgNo = 0;

for (Value *V : Call.args()) {		for (Value *V : Call.args()) {
if (V->getType()->isPointerTy() &&		if (V->getType()->isPointerTy()) {
!Call.paramHasAttr(ArgNo, Attribute::NonNull) &&		if (!Call.paramHasAttr(ArgNo, Attribute::NonNull) &&
isKnownNonZero(V, DL, 0, &AC, &Call, &DT))		isKnownNonZero(V, DL, 0, &AC, &Call, &DT))
ArgNos.push_back(ArgNo);		ArgNosNonNull.push_back(ArgNo);
		}

ArgNo++;		ArgNo++;
}		}

		getNoCapturePropagations(&Call, &ArgNosNoCapture);

assert(ArgNo == Call.arg_size() && "Call arguments not processed correctly.");		assert(ArgNo == Call.arg_size() && "Call arguments not processed correctly.");

if (!ArgNos.empty()) {		if (!ArgNosNonNull.empty() \|\| !ArgNosNoCapture.empty()) {
AttributeList AS = Call.getAttributes();		AttributeList AS = Call.getAttributes();
LLVMContext &Ctx = Call.getContext();		LLVMContext &Ctx = Call.getContext();
AS = AS.addParamAttribute(Ctx, ArgNos,		if (!ArgNosNonNull.empty())
		AS = AS.addParamAttribute(Ctx, ArgNosNonNull,
Attribute::get(Ctx, Attribute::NonNull));		Attribute::get(Ctx, Attribute::NonNull));
		if (!ArgNosNoCapture.empty())
		AS = AS.addParamAttribute(Ctx, ArgNosNoCapture,
		Attribute::get(Ctx, Attribute::NoCapture));
Call.setAttributes(AS);		Call.setAttributes(AS);
Changed = true;		Changed = true;
}		}

// If the callee is a pointer to a function, attempt to move any casts to the		// If the callee is a pointer to a function, attempt to move any casts to the
// arguments of the call/callbr/invoke.		// arguments of the call/callbr/invoke.
Value *Callee = Call.getCalledOperand();		Value *Callee = Call.getCalledOperand();
Function *CalleeF = dyn_cast<Function>(Callee);		Function *CalleeF = dyn_cast<Function>(Callee);

llvm/test/Transforms/InstCombine/nocapture-attribute.ll

;		;
tail call void @ptrs_maybe_capture(ptr %a0, ptr %a1, ptr %a2)		tail call void @ptrs_maybe_capture(ptr %a0, ptr %a1, ptr %a2)
ret void		ret void
}		}

define void @simple_propegated_a0_a1_nocapture_a2_maybe_capture(ptr nocapture %a0, ptr nocapture %a1, ptr %a2) local_unnamed_addr {		define void @simple_propegated_a0_a1_nocapture_a2_maybe_capture(ptr nocapture %a0, ptr nocapture %a1, ptr %a2) local_unnamed_addr {
; CHECK-LABEL: define void @simple_propegated_a0_a1_nocapture_a2_maybe_capture		; CHECK-LABEL: define void @simple_propegated_a0_a1_nocapture_a2_maybe_capture
; CHECK-SAME: (ptr nocapture [[A0:%.*]], ptr nocapture [[A1:%.*]], ptr [[A2:%.*]]) local_unnamed_addr {		; CHECK-SAME: (ptr nocapture [[A0:%.*]], ptr nocapture [[A1:%.*]], ptr [[A2:%.*]]) local_unnamed_addr {
; CHECK-NEXT: tail call void @ptrs_maybe_capture(ptr [[A0]], ptr [[A1]], ptr [[A2]])		; CHECK-NEXT: tail call void @ptrs_maybe_capture(ptr nocapture [[A0]], ptr nocapture [[A1]], ptr [[A2]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
tail call void @ptrs_maybe_capture(ptr %a0, ptr %a1, ptr %a2)		tail call void @ptrs_maybe_capture(ptr %a0, ptr %a1, ptr %a2)
ret void		ret void
}		}

define void @simple_propegate_a2_nocapture2x_a1_maybe_capture(ptr nocapture %a0, ptr %a1, ptr nocapture %a2) {		define void @simple_propegate_a2_nocapture2x_a1_maybe_capture(ptr nocapture %a0, ptr %a1, ptr nocapture %a2) {
; CHECK-LABEL: define void @simple_propegate_a2_nocapture2x_a1_maybe_capture		; CHECK-LABEL: define void @simple_propegate_a2_nocapture2x_a1_maybe_capture
; CHECK-SAME: (ptr nocapture [[A0:%.*]], ptr [[A1:%.*]], ptr nocapture [[A2:%.*]]) {		; CHECK-SAME: (ptr nocapture [[A0:%.*]], ptr [[A1:%.*]], ptr nocapture [[A2:%.*]]) {
; CHECK-NEXT: tail call void @ptrs_maybe_capture(ptr [[A2]], ptr [[A1]], ptr [[A2]])		; CHECK-NEXT: tail call void @ptrs_maybe_capture(ptr nocapture [[A2]], ptr [[A1]], ptr nocapture [[A2]])
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
tail call void @ptrs_maybe_capture(ptr %a2, ptr %a1, ptr %a2)		tail call void @ptrs_maybe_capture(ptr %a2, ptr %a1, ptr %a2)
ret void		ret void
}		}

define i64 @propegate_past_trivially_read_only(ptr nocapture %a0, i64 %r) local_unnamed_addr {		define i64 @propegate_past_trivially_read_only(ptr nocapture %a0, i64 %r) local_unnamed_addr {
; CHECK-LABEL: define i64 @propegate_past_trivially_read_only		; CHECK-LABEL: define i64 @propegate_past_trivially_read_only
; CHECK-SAME: (ptr nocapture [[A0:%.*]], i64 [[R:%.*]]) local_unnamed_addr {		; CHECK-SAME: (ptr nocapture [[A0:%.*]], i64 [[R:%.*]]) local_unnamed_addr {
; CHECK-NEXT: call void @ptrs_maybe_capture(ptr [[A0]], ptr [[A0]], ptr [[A0]])		; CHECK-NEXT: call void @ptrs_maybe_capture(ptr nocapture [[A0]], ptr nocapture [[A0]], ptr nocapture [[A0]])
; CHECK-NEXT: [[R0:%.*]] = mul i64 [[R]], [[R]]		; CHECK-NEXT: [[R0:%.*]] = mul i64 [[R]], [[R]]
; CHECK-NEXT: [[R1:%.*]] = mul i64 [[R0]], [[R0]]		; CHECK-NEXT: [[R1:%.*]] = mul i64 [[R0]], [[R0]]
; CHECK-NEXT: [[R2:%.*]] = shl i64 [[R1]], 1		; CHECK-NEXT: [[R2:%.*]] = shl i64 [[R1]], 1
; CHECK-NEXT: [[R3:%.*]] = mul i64 [[R2]], [[R2]]		; CHECK-NEXT: [[R3:%.*]] = mul i64 [[R2]], [[R2]]
; CHECK-NEXT: [[R4:%.*]] = load i64, ptr [[A0]], align 4		; CHECK-NEXT: [[R4:%.*]] = load i64, ptr [[A0]], align 4
; CHECK-NEXT: [[R5:%.*]] = add i64 [[R4]], [[R3]]		; CHECK-NEXT: [[R5:%.*]] = add i64 [[R4]], [[R3]]
; CHECK-NEXT: [[R6:%.*]] = call i64 @barrier(i64 [[R5]]) #[[ATTR0:[0-9]+]]		; CHECK-NEXT: [[R6:%.*]] = call i64 @barrier(i64 [[R5]]) #[[ATTR0:[0-9]+]]
; CHECK-NEXT: ret i64 [[R6]]		; CHECK-NEXT: ret i64 [[R6]]
define i64 @propegate_non_reaching_alloca(ptr nocapture %a0, i64 %r, i1 %c, i1 %c2) local_unnamed_addr {		define i64 @propegate_non_reaching_alloca(ptr nocapture %a0, i64 %r, i1 %c, i1 %c2) local_unnamed_addr {
; CHECK-LABEL: define i64 @propegate_non_reaching_alloca		; CHECK-LABEL: define i64 @propegate_non_reaching_alloca
; CHECK-SAME: (ptr nocapture [[A0:%.*]], i64 [[R:%.*]], i1 [[C:%.*]], i1 [[C2:%.*]]) local_unnamed_addr {		; CHECK-SAME: (ptr nocapture [[A0:%.*]], i64 [[R:%.*]], i1 [[C:%.*]], i1 [[C2:%.*]]) local_unnamed_addr {
; CHECK-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]		; CHECK-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
; CHECK: T:		; CHECK: T:
; CHECK-NEXT: [[UNUSED:%.*]] = call i64 @barrier(i64 0)		; CHECK-NEXT: [[UNUSED:%.*]] = call i64 @barrier(i64 0)
; CHECK-NEXT: br i1 [[C2]], label [[TT:%.*]], label [[F]]		; CHECK-NEXT: br i1 [[C2]], label [[TT:%.*]], label [[F]]
; CHECK: TT:		; CHECK: TT:
; CHECK-NEXT: call void @ptrs_maybe_capture(ptr [[A0]], ptr [[A0]], ptr [[A0]])		; CHECK-NEXT: call void @ptrs_maybe_capture(ptr nocapture [[A0]], ptr nocapture [[A0]], ptr nocapture [[A0]])
; CHECK-NEXT: ret i64 0		; CHECK-NEXT: ret i64 0
; CHECK: F:		; CHECK: F:
; CHECK-NEXT: [[PN:%.*]] = alloca i64, align 8		; CHECK-NEXT: [[PN:%.*]] = alloca i64, align 8
; CHECK-NEXT: [[TMP1:%.*]] = call i64 @ptr_maybe_capture(ptr nonnull [[PN]]) #[[ATTR0]]		; CHECK-NEXT: [[TMP1:%.*]] = call i64 @ptr_maybe_capture(ptr nonnull [[PN]]) #[[ATTR0]]
; CHECK-NEXT: [[R7:%.*]] = call i64 @barrier(i64 9) #[[ATTR1]]		; CHECK-NEXT: [[R7:%.*]] = call i64 @barrier(i64 9) #[[ATTR1]]
; CHECK-NEXT: ret i64 [[R7]]		; CHECK-NEXT: ret i64 [[R7]]
;		;
br i1 %c, label %T, label %F		br i1 %c, label %T, label %F
}		}

define ptr @propegate_non_leaked_malloc(ptr nocapture %a0, i64 %r, i1 %c) local_unnamed_addr {		define ptr @propegate_non_leaked_malloc(ptr nocapture %a0, i64 %r, i1 %c) local_unnamed_addr {
; CHECK-LABEL: define ptr @propegate_non_leaked_malloc		; CHECK-LABEL: define ptr @propegate_non_leaked_malloc
; CHECK-SAME: (ptr nocapture [[A0:%.*]], i64 [[R:%.*]], i1 [[C:%.*]]) local_unnamed_addr {		; CHECK-SAME: (ptr nocapture [[A0:%.*]], i64 [[R:%.*]], i1 [[C:%.*]]) local_unnamed_addr {
; CHECK-NEXT: [[PN:%.*]] = call noalias ptr @malloc_like(i64 [[R]])		; CHECK-NEXT: [[PN:%.*]] = call noalias ptr @malloc_like(i64 [[R]])
; CHECK-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]		; CHECK-NEXT: br i1 [[C]], label [[T:%.*]], label [[F:%.*]]
; CHECK: T:		; CHECK: T:
; CHECK-NEXT: call void @ptrs_maybe_capture(ptr [[A0]], ptr [[A0]], ptr [[A0]])		; CHECK-NEXT: call void @ptrs_maybe_capture(ptr nocapture [[A0]], ptr nocapture [[A0]], ptr nocapture [[A0]])
; CHECK-NEXT: ret ptr [[PN]]		; CHECK-NEXT: ret ptr [[PN]]
; CHECK: F:		; CHECK: F:
; CHECK-NEXT: [[TMP1:%.*]] = call i64 @ptr_maybe_capture(ptr [[PN]]) #[[ATTR0]]		; CHECK-NEXT: [[TMP1:%.*]] = call i64 @ptr_maybe_capture(ptr [[PN]]) #[[ATTR0]]
; CHECK-NEXT: [[R7:%.*]] = call i64 @barrier(i64 9) #[[ATTR1]]		; CHECK-NEXT: [[R7:%.*]] = call i64 @barrier(i64 9) #[[ATTR1]]
; CHECK-NEXT: ret ptr null		; CHECK-NEXT: ret ptr null
;		;
%pN = call noalias ptr @malloc_like(i64 %r)		%pN = call noalias ptr @malloc_like(i64 %r)
br i1 %c, label %T, label %F		br i1 %c, label %T, label %F
; CHECK-NEXT: [[R0:%.*]] = mul i64 [[R]], [[R]]		; CHECK-NEXT: [[R0:%.*]] = mul i64 [[R]], [[R]]
; CHECK-NEXT: [[R1:%.*]] = mul i64 [[R0]], [[R0]]		; CHECK-NEXT: [[R1:%.*]] = mul i64 [[R0]], [[R0]]
; CHECK-NEXT: [[R2:%.*]] = shl i64 [[R1]], 1		; CHECK-NEXT: [[R2:%.*]] = shl i64 [[R1]], 1
; CHECK-NEXT: [[R3:%.*]] = mul i64 [[R2]], [[R2]]		; CHECK-NEXT: [[R3:%.*]] = mul i64 [[R2]], [[R2]]
; CHECK-NEXT: [[R4:%.*]] = load i64, ptr [[A0]], align 4		; CHECK-NEXT: [[R4:%.*]] = load i64, ptr [[A0]], align 4
; CHECK-NEXT: [[R5:%.*]] = add i64 [[R4]], [[R3]]		; CHECK-NEXT: [[R5:%.*]] = add i64 [[R4]], [[R3]]
; CHECK-NEXT: [[R6:%.*]] = call i64 @barrier(i64 [[R5]]) #[[ATTR0]]		; CHECK-NEXT: [[R6:%.*]] = call i64 @barrier(i64 [[R5]]) #[[ATTR0]]
; CHECK-NEXT: store i64 [[R6]], ptr [[A0]], align 4		; CHECK-NEXT: store i64 [[R6]], ptr [[A0]], align 4
; CHECK-NEXT: [[TMP1:%.*]] = call i64 @ptr_maybe_capture(ptr nonnull [[A0]]) #[[ATTR0]]		; CHECK-NEXT: [[TMP1:%.*]] = call i64 @ptr_maybe_capture(ptr nocapture nonnull [[A0]]) #[[ATTR0]]
; CHECK-NEXT: ret i64 [[R6]]		; CHECK-NEXT: ret i64 [[R6]]
;		;
call void @ptrs_maybe_capture(ptr %a0, ptr %a0, ptr %a0)		call void @ptrs_maybe_capture(ptr %a0, ptr %a0, ptr %a0)
%r0 = mul i64 %r, %r		%r0 = mul i64 %r, %r
%r1 = mul i64 %r0, %r0		%r1 = mul i64 %r0, %r0
%r2 = add i64 %r1, %r1		%r2 = add i64 %r1, %r1
%r3 = mul i64 %r2, %r2		%r3 = mul i64 %r2, %r2
%r4 = load i64, ptr %a0		%r4 = load i64, ptr %a0
true:		true:
ret i64 %r		ret i64 %r
false:		false:
ret i64 0		ret i64 0
}		}

define i64 @propegate_return(ptr nocapture %a0) {		define i64 @propegate_return(ptr nocapture %a0) {
; CHECK-LABEL: define i64 @propegate_return		; CHECK-LABEL: define i64 @propegate_return
; CHECK-SAME: (ptr nocapture [[A0:%.*]]) {		; CHECK-SAME: (ptr nocapture [[A0:%.*]]) {
; CHECK-NEXT: [[R:%.*]] = call i64 @ptr_maybe_capture.i64(ptr [[A0]])		; CHECK-NEXT: [[R:%.*]] = call i64 @ptr_maybe_capture.i64(ptr nocapture [[A0]])
; CHECK-NEXT: [[RR:%.*]] = mul i64 [[R]], [[R]]		; CHECK-NEXT: [[RR:%.*]] = mul i64 [[R]], [[R]]
; CHECK-NEXT: ret i64 [[RR]]		; CHECK-NEXT: ret i64 [[RR]]
;		;
%r = call i64 @ptr_maybe_capture.i64(ptr %a0)		%r = call i64 @ptr_maybe_capture.i64(ptr %a0)
%rr = mul i64 %r, %r		%rr = mul i64 %r, %r
ret i64 %rr		ret i64 %rr
}		}
