This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] clarify the semantics of nocapture
ClosedPublic

Authored by aqjune on Mar 4 2021, 1:53 AM.

Details

Summary

This patch clarifies the semantics of nocapture attribute.

A 'Pointer Capture' subsection is added to describe the semantics of pointer capture first.

For the nocapture example with two same pointer arguments, it is consistent with the semantics that Alive2 used to run lit tests.

Diff Detail

Event Timeline

aqjune requested review of this revision.Mar 4 2021, 1:53 AM
aqjune created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2021, 1:53 AM
aqjune updated this revision to Diff 328156.Mar 4 2021, 6:28 AM

Minor updates

I would not interchange "capture" and "escape" in the text, it doesn't help. Stick with capture.

The volatile stuff should go in the pointer capture section.

llvm/docs/LangRef.rst
2642

I don't really like this text too much. Storing is only one way and I'm not sure why we need to talk about non-integral pointers at all.

@reames and me recently started a write up to define "capture", maybe we should share that.

2647

The first thing is not true, e.g., it ignores ptr2int.

2653

I honestly don't understand this. For me, the beginning reads like: "If the pointer is nocapture it is captured when ...." which is at least odd. I assume you want to say that nocapture applies to the use and not the value, if so, let's do that more explicitly.

reames added a comment.Mar 4 2021, 4:17 PM

Johannes, Artur, and I have recently spent quite a bit of time discussing the definition of capture offline. I haven't yet read your proposed text - I will shortly - but I wanted to share our progress.

The most recent version of my draft proposal can be found at https://github.com/preames/public-notes/blob/master/defining-capture.rst. This writeup is based on discussion w/Johannes and Artur, but I haven't yet had a chance to get their input on the draft. As such, all mistakes are mine. :)

This is our second major attempt. The previous version, which can be found at https://github.com/preames/public-notes/blob/master/defining-escape.rst, ended up not working out and we had to go back to the drawing board.

reames added a comment.EditedMar 4 2021, 4:30 PM

Now having read the text, and responding to the review.

The definition you're using appears to be in terms of the set of locations which can possibly refer to the object. The problem is that if any element of that set is itself captured, then your object is captured. This is the root confusion which made me realize we need to have a global definition of capture which isn't purely set based.

Reading through your definition, there's a subtlety to the nocapture argument attribute which is tricky. The attribute doesn't simply state that the callee doesn't contain a capturing operation (using the candidate definitions from previously linked document), it also says that pointer isn't propagated into any non-capturing, but unknown to the caller additional location. That's an important detail, and not one we want to loose.

I think we can combine the global definition of capture (as per the previously linked document) and your set based formalism. Let me take a shot.

If we have an object which has not yet been captured passed to a nocapture argument of a function, we know that the object remains uncaptured after the call. Additionally, we also know that the callee hasn't increased the number of locations in which references to the object can be observed after the call. Thus, if the caller can precisely enumerate said set before the call, that said remains precise after the call completes.

What do you think?

(Anyone reading along is probably better off look at https://github.com/preames/public-notes/blob/master/defining-capture.rst#nocapture-argument-attribute. That incorporates a couple of fixes I noticed after first posting this.)

If we have an object which has not yet been captured passed to a nocapture argument of a function, we know that the object remains uncaptured after the call. Additionally, we also know that the callee hasn't increased the number of locations in which references to the object can be observed after the call. Thus, if the caller can precisely enumerate said set before the call, that said remains precise after the call completes.

You are hinting at a more dynamic semantics, while we're proposing a more syntactic one.
Consider this example:

f(nocapture p) {
  p2 = load glb
  ret p2
}

According to your definition, this functions triggers UB if p2 == p, because it increases the number of observations of p. It's counter-intuitive to me that a function that doesn't touch its nocapture argument captures that argument.

Another case is this:

f(nocapture p, q) {
  glb = g
}

Similarly, this function would trigger UB if p == q. This contrasts with how other attributes work. For example, if a pointer is readonly we can't write through it, but we can still write to the object provided we have another non-readonly pointer to it.

We've been thinking along these lines:

  • a function cannot return a pointer tagged as nocapture
  • on return, the memory cannot have any nocapture-tagged pointer stored in captured objects, global objects, or objects passed as arguments

It's a more syntactic view than what you are proposing, but I think it's more in line with the other attributes.

I think we have to stick to a pointer-based definition not an object one. However, we should describe it holistically, which this patch does in parts.

The key idea is, IMO, that if I have a pointer p and I pass it as an argument to a function f, will any bits of p become "public knowledge" through the call.
That is, if I pick random bits for p (thereby eliminating aliases), would they be available after the call (to the executing thread or another thread).

This definition is born out of the use cases I know of for nocapture which basically ask if I can freely change the bits of the pointer without disturbing anything, e.g., move heap to stack.
One note: Most of the time, nocapture is really only interesting if you have a "noalias" (or function identified) pointer, that means, it is often useful at the call site, not inside the callee.

What I want to avoid (not to say this patch doesn't!) is to define temporary storage necessarily captures. For me "capturing" is making the bits available after the call, not the fact of storing it.
So if you have some memory not accessible by others, put your pointer in there, and then you take it out before you return, all is good. Some examples: https://clang.godbolt.org/z/P1d7rG

nlopes added a comment.Mar 8 2021, 6:16 AM
// G is only used in this function and always written first, no leakage!
static int *G;
void noescape1(int *p) {
    int q;
    G = cond() ? p : &q;
    *G = 3;
}

I agree with all your examples except this one. The remaining examples are easy to justify because they store pointers to local memory that is not escaped, so those pointers don't escape either.
This example I think is better handled in two separate phases: one that removes global variables (your example's global variable can be removed easily). Then we have a cheap escape analysis that shouldn't worry with those "fake" escapes because they've been removed already.
We just need a definition of capture that is precise enough for the kind of escape analysis we want to implement. The case above seems a bit artificial.

reames added a comment.Mar 8 2021, 8:47 AM

If we have an object which has not yet been captured passed to a nocapture argument of a function, we know that the object remains uncaptured after the call. Additionally, we also know that the callee hasn't increased the number of locations in which references to the object can be observed after the call. Thus, if the caller can precisely enumerate said set before the call, that said remains precise after the call completes.

You are hinting at a more dynamic semantics, while we're proposing a more syntactic one.

Yes I am, and I think that's important. Its worth noting that every syntactic definition has a corresponding dynamic version, but having the dynamic one be simple and easy to understand is important.

Key reasons:

  1. I personally have a hard time understanding syntactic definitions. Normally, I'd hesitate to insert my personal preferences so strongly, but as one of the people who has historical fixed bugs in this area, this actually an important point.
  2. My experience is that the dynamic definitions are easier to explain.
  3. Dynamic definitions are easier to test. In particular, the map themselves well to a collection of test cases and even sanitizers. I think the last point (sanitizers) is actually quite important because as we become more aggressive about optimizing based on capture semantics I expect we'll find existing frontend bugs in usage.

Consider this example:

f(nocapture p) {
  p2 = load glb
  ret p2
}

According to your definition, this functions triggers UB if p2 == p, because it increases the number of observations of p. It's counter-intuitive to me that a function that doesn't touch its nocapture argument captures that argument.

I agree, this is slightly counter intuitive. I think it's also fundamentally necessary.

One of the key optimization properties we want is that a nocapture *and trackable* pointer remains nocapture *and trackable* after being passed to a nocapture argument. Given we don't have a way to describe in current langref the newly created copy, we must consider this a violation of the existing nocapture parameter property.

Note the distinction in my wording between nocapture and "trackable". The existing nocapture attribute mixes two related, but distinct, properties.

Another case is this:

f(nocapture p, q) {
  glb = g
}

Similarly, this function would trigger UB if p == q. This contrasts with how other attributes work. For example, if a pointer is readonly we can't write through it, but we can still write to the object provided we have another non-readonly pointer to it.

I think for this example, you're looking at the stale wording from the review rather than the updated version in the linked to document. I agree we want pointer properties, not object ones, and attempted to address that in the wording. Does the new wording match what you'd expect?

The "trackability" part my current definition for the argument attribute does seem more object based, and I agree that needs some word smithing. I'm not arguing in favor of an object semantics here, it simply happens to be the way my brain processes this stuff and it tends to slide into my wording if I'm not careful.

We've been thinking along these lines:

  • a function cannot return a pointer tagged as nocapture
  • on return, the memory cannot have any nocapture-tagged pointer stored in captured objects, global objects, or objects passed as arguments

It's a more syntactic view than what you are proposing, but I think it's more in line with the other attributes.

I'll just point out that with this definition, we still need to define what a captured object is. That's the root point I'm trying to get defined, everything else builds on top.

reames added a comment.Mar 8 2021, 8:58 AM

JFYI, updated my candidate text for nocapture parameter attribute to try to clarify points raised in last round of response. (https://github.com/preames/public-notes/blob/master/defining-capture.rst#nocapture-argument-attribute)

We just need a definition of capture that is precise enough for the kind of escape analysis we want to implement. The case above seems a bit artificial.

Artificial, sure. Though, OpenMP GPU runtime as very similar situations, e.g., internal globals we see all uses.
That said, I agree with your statement we want to "optimize" this in two steps, nevertheless, conceptually I'd hope we would not define this a capture even if the global is not demoted to stack.

Some thoughts on other topics discussed here:

dynamic vs static semantic

I doubt we can explain the examples I presented reasonably in a purely static manner. It's a dynamic property for me.

nocapture & trackable

 f(nocapture p) {
 p2 = load glb
 ret p2
}

I do not think the example should cause UB if p == p2. The argument that we want to keep "track of" p and nocapture is a way to ensure we can is in principle valid, however, if p2 can be p, we either lost track of it already or we should have tracked the uses of glb. Either way, nocapture helps us to keep track of the argument without the UB requirement here while it is up to the caller to keep track of the value being passed before the call. Said differently, nocapture uses do not require you to track additional value/location but nocapture is not an object property and other uses/locations have to be tracked separately.

nlopes added a comment.Mar 9 2021, 3:08 AM

Consider this example:

f(nocapture p) {
  p2 = load glb
  ret p2
}

According to your definition, this functions triggers UB if p2 == p, because it increases the number of observations of p. It's counter-intuitive to me that a function that doesn't touch its nocapture argument captures that argument.

I agree, this is slightly counter intuitive. I think it's also fundamentally necessary.

Can you expand why you think this is fundamental?
This semantics makes it very hard for an analysis to tag a parameter as nocapture. if a function stores a pointer to memory or returns a pointer it may be capturing any of the parameters, who knows.

More fundamentally, for optimization I think we only care about 2 cases:

  • An object hasn't escaped
  • An object escaped

The number of references to an escaped object doesn't matter. What we care about is when it flips from non-escaped to escaped.

What are the uses of nocapture information? It's used when traversing a function and analyze call sites. We want to know if a given locally-allocated object can be modified by a callee. If not, we can do all sorts of store forwarding, know that the object is still alive, etc.
So if a pointer is already escaped, we gain no information about the existence of further escapes. What we want is the flip. Hence I don't see how the semantics above are useful for this purpose.

If we have a function with 2 arguments;

 f(p, q) {
    g = p
}

We want to be able to say that q doesn't escape regardless of whether p=q or not. Otherwise I feel we will never be able to infer this noescape attribute.

@aqjune ping: there are a few sentences that need tweaking.

llvm/docs/LangRef.rst
2642

agreed. this non-integral pointer text isn't very readable to me either.

aqjune updated this revision to Diff 337047.Apr 12 2021, 10:52 PM

Address comments, add a few examples

aqjune marked an inline comment as done.EditedApr 12 2021, 10:57 PM

I updated this patch to address comments & added a few examples that I concern.
The examples and added sentences are mainly to represent this, because I also agree with this interpretation:

The key idea is, IMO, that if I have a pointer p and I pass it as an argument to a function f, will any bits of p become "public knowledge" through the call.
That is, if I pick random bits for p (thereby eliminating aliases), would they be available after the call (to the executing thread or another thread).

I hope this helps check whether my understanding about pointer capture is correct.

llvm/docs/LangRef.rst
1203

I moved the volatile thing to the new section.

2647

Thank you. For correctness & clarity, I removed the paragraph.

2653

Yes, your understanding is right. I added a relevant example with nocapture at the nocapture paragraph.

aqjune marked an inline comment as done.Apr 13 2021, 12:30 AM
nlopes added inline comments.Apr 13 2021, 7:42 AM
llvm/docs/LangRef.rst
2641

let's make it explicit the pointer only escapes if that information is readable by the caller after the function exits. To account for the fact that information can be overwritten.
Maybe complement the example below to have something like:

store i32 %j, i32* @glb ; escapes %a
store i32 0, i32* @glb ; %a is not escaped anymore

Minor comments, this looks like it's evolved into a reasonable step forward. As things stand, I think this should go in with some minor stuff fixed, and we can iterate on details in further patches. This is already better than what we have now. :)

llvm/docs/LangRef.rst
1203

I suggest tweaking this wording slightly. Specifically:

"This attribute applies only to the particular copy of the pointer passed in this argument. A caller could pass two copies of the same pointer with one being annotated nocapture and the other not, and the callee could validly capture through the non annotated parameter.

(Great example btw.)

2641

JFYI, this example is very subtle and possibly misleading. It's only correct since both stores are non-atomic.

If there was another threading reading from @glb, then it could capture the value of %a between the two stores.

However, that would require a race which would be UB (since the stores aren't atomic) and thus we can ignore it.

Importantly, if there was any ordering operation between the two stores it might capture, even with non-atomic stores.

2663

You should add an item about the inspection by another thread case.

(This case gets subtle and complicated fast. I'd suggest using deliberately hand wavy wording to start with and refining that in a follow up patch.)

2670

Looks like this should be 3).

aqjune updated this revision to Diff 337315.Apr 13 2021, 7:41 PM

Address comments, updates

aqjune marked 8 inline comments as done.Apr 13 2021, 7:42 PM

Here are updates:

  1. I found from this document that pointer escape is different from pointer capture; pointer capture subsumes pointer escape. I chose pointer capture because we're talking about general cases regardless of dereferenceability. I also adopted this expression from the link: A pointer value is captured if the function makes a copy of any part of the pointer that outlives the call This naturally addressed Nuno's concern.
  2. An example that uses an atomic operation is added.
  3. Updated wordings, added numbering to the volatile case
aqjune updated this revision to Diff 337316.Apr 13 2021, 7:51 PM

replace remaining escape with capture

aqjune added inline comments.Apr 13 2021, 7:54 PM
llvm/docs/LangRef.rst
2659

To clarify the case when @f calls another function, I added this.
If @g does not capture pointer (e.g. readnone & willreturn), calling @g is fine.
Does this make sense & is consistent with people's understanding?

aqjune edited the summary of this revision. (Show Details)Apr 13 2021, 7:55 PM
nlopes accepted this revision.Apr 14 2021, 12:11 PM

It looks great to me!

@reames any further comments/suggestions?

This revision is now accepted and ready to land.Apr 14 2021, 12:11 PM

Looks good to me as well.

aqjune edited the summary of this revision. (Show Details)Apr 15 2021, 5:46 PM
This revision was automatically updated to reflect the committed changes.

Thank you all for reviewing this!

jdoerfert added inline comments.Apr 15 2021, 8:17 PM
llvm/docs/LangRef.rst
2643

It is not the caller that needs to be able to read the bits, but any thread. This is a difference.

Also, I'm still not happy about "storing". We should at least clarify that there doesn't need to be a store instruction of any kind. The important point is that no bit is communicated from inside the function to the outside or another thread.

2666

This is arguably a capture but not a store.

2694

exit is probably not perfect, maybe just set a flag.

aqjune added inline comments.Apr 16 2021, 12:09 AM
llvm/docs/LangRef.rst
2643

Hi,

It is not the caller that needs to be able to read the bits, but any thread. This is a difference.

I think the second item in the text addresses this issue. :)

Also, I'm still not happy about "storing". We should at least clarify that there doesn't need to be a store instruction of any kind.

I was thinking the information can be stored to either a register or a memory location which are encompassed in the word 'place'.
Do you have any suggestion for other word?

2694

I'll update this in the post-review commit.