This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Propagating `nocapture` flag to callsites
AbandonedPublic

Authored by goldstein.w.n on May 29 2023, 12:14 AM.

Details

Summary

We can sometimes propagate nocapture attributes from caller
arguments to callsite arguments.

This can occur in in the following cases:

  1. The callsite is in a basic block that ends with a return statement.
  2. Between the callsite the end of its basic block there are no may-write instructions.
  3. The return value of the callsite is not used (directly or indirectly) as the address of a may-read instruction.
  4. There are allocas or leaked (not freed or returned) mallocs reachable from the callsite.
  5. The callsite/caller are nothrow OR there is no landing padd in the caller.

These requirements are intentionally over conservative. We are only trying
to catch relatively trivial cases.

Requirements 1 & 2 are there to ensure that after the callsite has
returned, the state of any captured in memory pointers cannot change. This
implies that if the caller has any nocapture in memory gurantees, that
state has been reached by the end of the callsite.

Requirements 3 & 4 are to cover cases where pointers could escape the
callsite (but not the caller) through non-dead code. Any return value thats
loaded from (or used to create a pointer that is loaded from) could have
derived from an argument. Finally, allocas/leaked mallocs in general are
difficult (so we avoid them entirely). Callsites can arbitrarily store
pointers in allocas for use later without violating a nocapture gurantee by
the caller, as the allocas are torn down at caller return. Likewise a
leaked malloc would not be accessible outside of the caller, but could
still be accessible after the callsite. There are a variety of complex
cases involving allocas/leaked mallocs. For simplicity, if we see either we
simply fail.

Requirement 5 is to cover the last way to escape to occur. If the
callsite/caller is nothrow its a non-issue. If the callsite may throw, then
a method of capture is through an exception. If the caller has no landing
padd to catch this exception, then the exception state will be visible
outside of the caller so any gurantees about nocapture made by the caller
will apply to the callsites throw. If the caller has a landing padd, its
possible for the callsite to capture a pointer in a throw that is later
cleared by the caller.

Diff Detail

Event Timeline

goldstein.w.n created this revision.May 29 2023, 12:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2023, 12:14 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
goldstein.w.n requested review of this revision.May 29 2023, 12:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2023, 12:14 AM
nikic requested changes to this revision.May 29 2023, 12:21 AM

I don't think this is correct. Consider something like this:

define void @test(ptr nocapture %p) {
  call void @store_ptr_in_global(ptr %p)
  call void @do_something_with_store_pointer()
  call void @store_ptr_in_global(ptr null)
  ret void
}

Here store_ptr_in_global escapes the pointer, but the overall function does not. It is okay to have such interior escapes as long as they are not observable by the caller.

This revision now requires changes to proceed.May 29 2023, 12:21 AM
goldstein.w.n added a comment.EditedMay 29 2023, 12:29 AM

I don't think this is correct. Consider something like this:

define void @test(ptr nocapture %p) {
  call void @store_ptr_in_global(ptr %p)
  call void @do_something_with_store_pointer()
  call void @store_ptr_in_global(ptr null)
  ret void
}

Here store_ptr_in_global escapes the pointer, but the overall function does not. It is okay to have such interior escapes as long as they are not observable by the caller.

Ah, good catch.

It should still work then in the case that the function doesn't interact with the global state after
the call?
So we could reverse iterate and while the sequence remains pure, it should be appliable?

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

It should still work then in the case that the function doesn't interact with the global state after
the call?
So we could reverse iterate and while the sequence remains pure, it should be appliable?

I don't think it's quite as simple as that. Here's another example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

The call-site is not no-capture, even though there are no memory accesses after it.

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

So yeah, the case where we're literally calling another function with the exact same arguments is clearly safe -- it's just not entirely obvious to me what the right way to generalize that would be.

It should still work then in the case that the function doesn't interact with the global state after
the call?
So we could reverse iterate and while the sequence remains pure, it should be appliable?

I don't think it's quite as simple as that. Here's another example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

Is this really not safe? The alloca is dead after the store_into_alloca call so does %p ever really escape? Its like if you had a function return %p but the return value was unused would it be unsafe to put nocapture at that return site?

The call-site is not no-capture, even though there are no memory accesses after it.

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

So yeah, the case where we're literally calling another function with the exact same arguments is clearly safe -- it's just not entirely obvious to me what the right way to generalize that would be.

So any output value used in creating the return value can't be a medium for capture. Working with your example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

We can't prove this doesn't capture b.c it may go through the dead alloca.
But if it was slightly different:

define ptr @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  %x = load ptr, ptr %a
  ret ptr %x
}

Now we know for certain %p can't have been captured in %a.

So for any argument if its either non-ptr or its contents are used in creating the return value we can be sure %p is not captured through it.
For a return value its basically the same, if the return value of the callsite is either unused or used in the functions return value then we can be sure %p is not captured through it.

If both of the above, then we can propegate no capture.
This obviously doesn't fit in InstCombine. I was thinking a new pass in Transform/Scalar?

It should still work then in the case that the function doesn't interact with the global state after
the call?
So we could reverse iterate and while the sequence remains pure, it should be appliable?

I don't think it's quite as simple as that. Here's another example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

Is this really not safe? The alloca is dead after the store_into_alloca call so does %p ever really escape? Its like if you had a function return %p but the return value was unused would it be unsafe to put nocapture at that return site?

The call-site is not no-capture, even though there are no memory accesses after it.

My motivation is something like: https://godbolt.org/z/b1Pjf4bjP

where AFAICT there is no way to write a syscall wrapper like that and actually preserve nocapture. Maybe a better approach?

So yeah, the case where we're literally calling another function with the exact same arguments is clearly safe -- it's just not entirely obvious to me what the right way to generalize that would be.

Another easy case is a taillcall cannot capture %p.

So any output value used in creating the return value can't be a medium for capture. Working with your example:

define void @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  ret void
}

We can't prove this doesn't capture b.c it may go through the dead alloca.
But if it was slightly different:

define ptr @test(ptr nocapture %p) {
  %a = alloca ptr
  call void @store_into_alloca(ptr %a, ptr %p)
  %x = load ptr, ptr %a
  ret ptr %x
}

Now we know for certain %p can't have been captured in %a.

So for any argument if its either non-ptr or its contents are used in creating the return value we can be sure %p is not captured through it.
For a return value its basically the same, if the return value of the callsite is either unused or used in the functions return value then we can be sure %p is not captured through it.

If both of the above, then we can propegate no capture.
This obviously doesn't fit in InstCombine. I was thinking a new pass in Transform/Scalar?

Only propegate when nocapture state cannot be changed after the
callsite (so the callers nocapture must be preserved by that point).

goldstein.w.n edited the summary of this revision. (Show Details)May 30 2023, 11:49 PM

@nikic think this should do it. I did assume that non-visible captures are okay. (I.e unused return value, or in your example provably dead-store).
The basic rule AFAICT is:
If the the callsite is in a returning basic-block, and between the callsite and the return there are only readonly instructions, then the capture state CANNOT change between the callsite and the return. Since at the return we KNOW the argument hasn't been captured, we can deduce that after the callsite (since it can't change after that) the pointer hasn't been captured.

nikic requested changes to this revision.May 31 2023, 6:52 AM

propegate -> propagate everywhere

You're really playing with fire here when it comes to "it's captured, but it doesn't really matter" reasoning. Consider something like this:

define i32 @test(ptr noalias nocapture %p) {
  %a = alloca ptr
  store i32 0, ptr %p 
  call void @store_into_alloca(ptr %a, ptr %p)
  %p2 = load ptr, ptr %a
  %v = load i32, ptr %p2
  ret i32 %v
}

declare void @store_into_alloca(ptr, ptr)

This call is clearly capturing, in a way that will affect alias analysis results (load i32, ptr %p2 cannot ref %p with that nocapture attribute in place, which will be materialized with invalid alias scope metadata during inlining).

This revision now requires changes to proceed.May 31 2023, 6:52 AM

propegate -> propagate everywhere

You're really playing with fire here when it comes to "it's captured, but it doesn't really matter" reasoning. Consider something like this:

define i32 @test(ptr noalias nocapture %p) {
  %a = alloca ptr
  store i32 0, ptr %p 
  call void @store_into_alloca(ptr %a, ptr %p)
  %p2 = load ptr, ptr %a
  %v = load i32, ptr %p2
  ret i32 %v
}

declare void @store_into_alloca(ptr, ptr)

This call is clearly capturing, in a way that will affect alias analysis results (load i32, ptr %p2 cannot ref %p with that nocapture attribute in place, which will be materialized with invalid alias scope metadata during inlining).

Alright fair enough. I think this can only occur for allocas as any other type of store with no more may-write instructions would actually capture.
I guess we also need to handle returns in a similiar way. You could do:

> define i32 @test(ptr noalias nocapture %p) {
>   %p2 = call ptr @return_ptr(ptr %p)
>   %v = load i32, ptr %p2
>   ret i32 %v
> }

where p2 captures and it would also be an alias analysis issue.
So the rule needs to update.
I don't think we need so go so far as "any load after makes it unsafe"
although maybe that does need to be case. How about the following case:

define i32 @test(ptr noalias nocapture %p, ptr noalias nocapture %q) {
  %a = alloca ptr
  store ptr %a, ptr %q 
  call void @store_p_into_alloca_in_q(ptr %q, ptr %p)
  %p3 = load ptr, ptr %q
  %p2 = load ptr, ptr %p3
  %v = load i32, ptr %p2
  ret i32 %v
}

so %p is stored into %a through %q that is obviously a capture, but does that
violate the nocapture in @test or because the temporary pointer that capture
%p is no longer accesible after @test is it safe?

I was also thinking something like:

> define void @test(ptr noalias nocapture %p) {
>   %p_i64 = call ptr @return_ptr_as_i64(ptr %p)
>   %p2 = ptrtoint ptr %p to i64
>   %d = sub i64 %p_64, %p2
>   %unused = udiv i64 %n, %d
>   ret void
> }

Might be an issue as was thinking if @return_ptr_as_i64 can't capture then the return
value must != %p so the udiv is speculatable. But I think its safe to return random
from a function that has a nocapture argument so its possible to get the exact bit pattern
of the pointer, it just cant be derived from the pointer itself.

Alright fair enough. I think this can only occur for allocas as any other type of store with no more may-write instructions would actually capture.

I'm not entirely confident on this, but I believe it can also happen with noalias returns, aka allocators. If you write the pointer into a malloc that gets leaked, I believe that would be non-capturing, as nobody has provenance to access the captured pointer anymore.

I guess we also need to handle returns in a similiar way. You could do:

> define i32 @test(ptr noalias nocapture %p) {
>   %p2 = call ptr @return_ptr(ptr %p)
>   %v = load i32, ptr %p2
>   ret i32 %v
> }

where p2 captures and it would also be an alias analysis issue.
So the rule needs to update.
I don't think we need so go so far as "any load after makes it unsafe"
although maybe that does need to be case. How about the following case:

define i32 @test(ptr noalias nocapture %p, ptr noalias nocapture %q) {
  %a = alloca ptr
  store ptr %a, ptr %q 
  call void @store_p_into_alloca_in_q(ptr %q, ptr %p)
  %p3 = load ptr, ptr %q
  %p2 = load ptr, ptr %p3
  %v = load i32, ptr %p2
  ret i32 %v
}

so %p is stored into %a through %q that is obviously a capture, but does that
violate the nocapture in @test or because the temporary pointer that capture
%p is no longer accesible after @test is it safe?

I'd say this doesn't violate nocapture on @test, for the reason you state.

I was also thinking something like:

> define void @test(ptr noalias nocapture %p) {
>   %p_i64 = call ptr @return_ptr_as_i64(ptr %p)
>   %p2 = ptrtoint ptr %p to i64
>   %d = sub i64 %p_64, %p2
>   %unused = udiv i64 %n, %d
>   ret void
> }

Might be an issue as was thinking if @return_ptr_as_i64 can't capture then the return
value must != %p so the udiv is speculatable. But I think its safe to return random
from a function that has a nocapture argument so its possible to get the exact bit pattern
of the pointer, it just cant be derived from the pointer itself.

Right, I don't think we can do the speculation optimization. In fact, it might be that the address has already been captured prior to the call and return_ptr_as_i64 returns it (neither nocapture nor noalias on the argument prevent this).

Alright fair enough. I think this can only occur for allocas as any other type of store with no more may-write instructions would actually capture.

I'm not entirely confident on this, but I believe it can also happen with noalias returns, aka allocators. If you write the pointer into a malloc that gets leaked, I believe that would be non-capturing, as nobody has provenance to access the captured pointer anymore.

Ah that's an interesting case. I'll cover that as well (just disable if any allocator function).

I guess we also need to handle returns in a similiar way. You could do:

> define i32 @test(ptr noalias nocapture %p) {
>   %p2 = call ptr @return_ptr(ptr %p)
>   %v = load i32, ptr %p2
>   ret i32 %v
> }

where p2 captures and it would also be an alias analysis issue.
So the rule needs to update.
I don't think we need so go so far as "any load after makes it unsafe"
although maybe that does need to be case. How about the following case:

define i32 @test(ptr noalias nocapture %p, ptr noalias nocapture %q) {
  %a = alloca ptr
  store ptr %a, ptr %q 
  call void @store_p_into_alloca_in_q(ptr %q, ptr %p)
  %p3 = load ptr, ptr %q
  %p2 = load ptr, ptr %p3
  %v = load i32, ptr %p2
  ret i32 %v
}

so %p is stored into %a through %q that is obviously a capture, but does that
violate the nocapture in @test or because the temporary pointer that capture
%p is no longer accesible after @test is it safe?

I'd say this doesn't violate nocapture on @test, for the reason you state.

I was also thinking something like:

> define void @test(ptr noalias nocapture %p) {
>   %p_i64 = call ptr @return_ptr_as_i64(ptr %p)
>   %p2 = ptrtoint ptr %p to i64
>   %d = sub i64 %p_64, %p2
>   %unused = udiv i64 %n, %d
>   ret void
> }

Might be an issue as was thinking if @return_ptr_as_i64 can't capture then the return
value must != %p so the udiv is speculatable. But I think its safe to return random
from a function that has a nocapture argument so its possible to get the exact bit pattern
of the pointer, it just cant be derived from the pointer itself.

Right, I don't think we can do the speculation optimization. In fact, it might be that the address has already been captured prior to the call and return_ptr_as_i64 returns it (neither nocapture nor noalias on the argument prevent this).

goldstein.w.n edited the summary of this revision. (Show Details)

Bail on any alloc, leaked malloc, or load based on the return value

@nikic, okay I re-implemented to be as conservative as possible.
Fail on:

  1. Any alloca/leaked malloc that may reach that callsite
  2. Any may-read on a value derived from the return value.

I think later on we can improve the alloca/malloc cases if after the callsite there are no loads or the callsite is readonly, but for now I think this should cover all the bases.

The only thing is maybe we need nothrow as well? Or at least need a nothrow if the caller has a catch (if the caller doesn't have a catch then the nocapture on caller should apply). What do you think?

goldstein.w.n edited the summary of this revision. (Show Details)May 31 2023, 2:29 PM

propegate -> propagate everywhere

will fix this next revision.

goldstein.w.n edited the summary of this revision. (Show Details)

Fix misspells, handle exception capture path

goldstein.w.n retitled this revision from [InstCombine] Propegating `nocapture` flag to callsites to [InstCombine] Propagating `nocapture` flag to callsites.Jun 1 2023, 11:23 AM
goldstein.w.n edited the summary of this revision. (Show Details)
goldstein.w.n edited the summary of this revision. (Show Details)

Rebase

nikic added a comment.Jun 6 2023, 8:10 AM

(Removing from queue as superseded by D152226.)

nikic requested changes to this revision.Jun 6 2023, 8:11 AM
This revision now requires changes to proceed.Jun 6 2023, 8:11 AM
goldstein.w.n abandoned this revision.Jun 6 2023, 10:14 AM