Runtime functions, as well as regular functions, might require a pointer to be passed in memory even though the memory is simply a means to pass (multiple) arguments. That is, the indirection through memory is only used on the call edge and not otherwise relevant. However, such pointers are currently assumed to escape as soon as they are stored in memory even if the callee only reloads them and use them in a "non-escaping" way. Generally, storing a pointer might not cause it to escape if all "uses of the memory" it is stored to all have the "nocapture" property. To allow optimizations in the presence of pointers stored to memory we introduce two new IR extensions. `!nocapture` metadata on stores and "nocapture_use" operand bundles for call instructions. The former ensures that the store can be ignored for the purpose of escape analysis. The latter indicates that a call is using a pointer value but not capturing it. This is important as the call might still read or write the pointer and since the passing of the pointer through memory is not considered "capturing" with the "nocapture" metadata, we need to otherwise indicate the potential read/write. As an example use case where we can deduce `!nocapture` metadata, consider the following code: ``` struct Payload { int *a; double *b; }; int pthread_create(pthread_t *thread, const pthread_attr_t *attr, ┊ ┊ ┊ ┊ ┊ ┊ ┊ ┊ ┊ void *(*start_routine) (void *), void *arg); int use(double); void fn(void *v) { Payload *p = (Payload*)(v); // Load the pointers from the payload and then dereference them, // this will not capture the pointers. int *a = p->a; double *b = p->b; *a = use(*b); } void foo(int *a, double *b) { Payload p = {a, b}; pthread_create(..., &fn, &p); } ``` Given the usage of the payload struct in `fn` we can conclude neither `a` nor `b` in are captured in `foo`, however we could not express this fact "locally" before. That is, we can deduce and annotate it for the arguments `a` and `b` but only since there is no other use (later on). Similarly, if the callee would not be known, we were not able to describe the "nocapture" behavior of the API. A follow up patch will introduce `!nocapture` metadata to stores generated during OpenMP lowering. This will, among other things, fix PR48475.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Analysis/CaptureTracking.cpp | ||
---|---|---|
321 | Interestingly, we seem to not consider operand bundle uses to be capturing. This is (IMHO) wrong in general. "nocapure_use" does not capture but others should be assumed to do. Will propose a separate patch at some point. |
llvm/lib/Analysis/CaptureTracking.cpp | ||
---|---|---|
321 | Data operands = Arguments + operand bundles, so this should also consider operand bundle uses. |
llvm/lib/Analysis/CaptureTracking.cpp | ||
---|---|---|
321 | Then I don't know why my local tests "work" as I add operand bundles but nocapture is deduced for things used in there. I'll have to look into this. |
I have a question - is it possible to have the same optimization power with !nocapture only? In other words, is it purely for making analysis flow-insensitive?
We can define that it is undefined behavior for any function calls after this store to load the pointer and capture it.
llvm/docs/LangRef.rst | ||
---|---|---|
2325 | nit: typos (stoed, explicitl) | |
llvm/test/Transforms/InstCombine/nocapture_use.ll | ||
21 ↗ | (On Diff #315268) | tmp1, tmp3, tmp4 are not needed (similarly below) |
42 ↗ | (On Diff #315268) | (Orthogonal to this patch) would it be great to explicitly state in LangRef that it is illegal for a function call to return a nocapture pointer? I can make a super short patch for this. @nlopes was also interested in this when implementing nocapture in Alive2. |
Hm. The two could be used in isolation but make most sense together. !nocapture only makes sense alone if you don't actually use the memory later, or the use is already "implied" otherwise. I don't have a use case for "nocapture_use"(%ptr) alone right now. It basically indicates a non-capturing use but I don't know why we would add one on it's own.
Together they allow to weaken/demote an existing use. I'd be interested in ideas for standalone uses.
We can define that it is undefined behavior for any function calls after this store to load the pointer and capture it.
That part is missing, right.
llvm/test/Transforms/InstCombine/nocapture_use.ll | ||
---|---|---|
42 ↗ | (On Diff #315268) | I'm not sure I understand. FWIW, Attributor uses an internal attribute "nocapture_maybe_returned" for call arguments to improve nocapture analysis. |
llvm/test/Transforms/InstCombine/nocapture_use.ll | ||
---|---|---|
42 ↗ | (On Diff #315268) | I meant whether this is UB or not: define i8* @f(i8* nocapture %p) { ret i8* %p } I think it should be UB, but this doesn't seem to be explicit in LangRef. nocapture_maybe_returned seems great..! I believe many library calls like memcpy can have this attached to their arguments. :) |
llvm/test/Transforms/InstCombine/nocapture_use.ll | ||
---|---|---|
42 ↗ | (On Diff #315268) |
Yep, and almost every function that has the returned attribute on a pointer argument. Right now we assume such calls capture which is bad. Whoever introduces the enum version should replace the string version in the Attributor.
Right. UB seems proper as there is no "value we could poison". I added an UB sentence to the new lang ref additions but not to the original nocapture attribute definition. Feel free to do so. |
This isn't a comment on the proposed design so much as how the proposed design is described. (I intend to respond to your llvm-dev thread... at some point.)
I think you need to revise the specification wording into something more operational. As currently described, you define metadata in terms of existing analyzes which aren't themselves well defined. (e.g. What *precisely* does it mean for something to be captured?) What operations are allowed and/or disallowed by the existence of the metadata/bundle? For the ones which are disallowed, is that full UB? Producing poison? Something else?
Entirely seriously, I'm not sure I could implement the current specification. In particular, I don't believe I could take this specification and implement it in another compiler without *very* close examination of LLVM's current implementation details. As one of the people who will end up maintaining this, having to back reference the implementation of a N-year old compiler to understand current semantics will rapidly become very problematic.
Here's an attempt to take what you've written for the !nocapture metadata and translate it into something a bit more operation. Your welcome to take this as either a starting point, or as simply an illustration of what I meant by operational.
The existence of the !nocapture metadata on the instruction indicates that the location stored to is not captured. That is, there exists a small set of frame local copies of points which can be used to access that location, and that if the optimizer can identify all of them, it can use that information to contribute to tracking all copies of the stored pointer.
If the storage location has been captured, execution of a store w/the !nocapture metadata is immediate UB. Storing a previously captured pointer into an uncaptured location is well defined. That is, the metadata says nothing of the capture status for the stored value.
On it's own, a !nocapture store says nothing about whether the contents of storage are later loaded, and the pointer stored later propagated. However, !nocapture stores and the nocapture_use operand bundle can be combined to provide this stronger fact.
Defect List -- These are things I know are wrong with my attempt.
"frame local" needs defined, and might not be quite what we want.
"captured" is loosely defined, and needs revision.
After my own attempt, one further suggestion on structuring the langref changes. I think we need to introduce a definitional section defining capture, and spelling out implications of that, then having the inline description of the metadata be purely structural and link back. Doing it all inline in the store section seems really forced.
(Mildly relevant - I have an old review which takes the approach I described towards specification on a vaguely related topic. Might be useful as a reference. https://reviews.llvm.org/D52192)
Thanks for the input. The details of the design are a bit in flux given the llvm-dev thread but I responded to the more conceptual parts below.
I don't know what you mean. Implement what? TBH, I'm also not sure what the description below does better, from a conceptual standpoint. Maybe it would help me to understand which parts of the proposed wording is problematic and why?
Here's an attempt to take what you've written for the !nocapture metadata and translate it into something a bit more operation. Your welcome to take this as either a starting point, or as simply an illustration of what I meant by operational.
The existence of the !nocapture metadata on the instruction indicates that the location stored to is not captured. That is, there exists a small set of frame local copies of points which can be used to access that location, and that if the optimizer can identify all of them, it can use that information to contribute to tracking all copies of the stored pointer.
(I saw later you have frame local on the defect list, this was my response initially) Generally, I don't think we have a concept of frame local or we should go there. It is also not true for the use case described here, the users might be in other "frames". I also think the wording swapped who is the "recipient" of the nocapture property.
Now with regards to content: Storing into a location doesn't affect the "capture property" of that location. Storing a pointer into a location affects the pointers "capture property".
So, to use your sentence as a base it should read: The existence of the !nocapture metadata on the instruction indicates that the stored pointer is not captured.
I explicitly don't want to require anyone to identify the access locations because they might not be exposed. Let's assume bulk_memset defined below is in a library and we know what it does.
We can now mark the stores of the pointers into the ptrs argument as !nocapture if we also add a "nocapture_use" to the call site of bulk_memset.
/// lib_bulk.so ///{ void bulk_memset(int nargs, char** ptrs, size_t* sizes, char c) { // For example we could do it in parallel: #pragma omp parallel for for (int i = 0; i < nargs; ++i) { memset(ptrs[i], c, sizes[i]); } } ///} // User code, original: char foo(char *a, char *b) { *a = 1; *b = 2; char *ptrs[2]; ptrs[0] = a; // a is captured now ptrs[1] = b; // b is captured now int sizes[2] = {1, 1}; bulk_memset(2, ptrs, sizes, 0); return *a + *b; // needs to be 0! } // User code, optimized: void foo(char /* nocapture */ *a, char * /* nocapture */ b) { *a = 1; *b = 2; char *ptrs[2]; ptrs[0] /* !nocapture */ = a; // a is *not* captured now ptrs[1] /* !nocapture */ = b; // b is *not* captured now int sizes[2] = {1, 1}; bulk_memset(2, ptrs, sizes, 0) ["nocapture_use"(a, b)]; return *a + *b; // needs to be 0! }
This is pretty much the use case we have as part of https://bugs.llvm.org/show_bug.cgi?id=48475 (among other places).
Basically, the API requires us to pass something in memory but we know all we do at the other end is to load it from
that memory and use it without "capturing" it. The pthread_create example I have in the commit message shows how this
can also be used if we see both sides of the API and can deduce the property.
If the storage location has been captured, execution of a store w/the !nocapture metadata is immediate UB. Storing a previously captured pointer into an uncaptured location is well defined. That is, the metadata says nothing of the capture status for the stored value.
But that is what it should do, right? I'm confused why we would need to say something about the captured status of the location we store into at all.
On it's own, a !nocapture store says nothing about whether the contents of storage are later loaded, and the pointer stored later propagated. However, !nocapture stores and the nocapture_use operand bundle can be combined to provide this stronger fact.
Defect List -- These are things I know are wrong with my attempt.
"frame local" needs defined, and might not be quite what we want.
"captured" is loosely defined, and needs revision.
After my own attempt, one further suggestion on structuring the langref changes. I think we need to introduce a definitional section defining capture, and spelling out implications of that, then having the inline description of the metadata be purely structural and link back. Doing it all inline in the store section seems really forced.
A section about "capture" seems reasonable to me. We refer to that concept also in https://llvm.org/docs/LangRef.html#deoptimization-operand-bundles.
(Mildly relevant - I have an old review which takes the approach I described towards specification on a vaguely related topic. Might be useful as a reference. https://reviews.llvm.org/D52192)
This is really interesting. thanks for the link. I'll probably circle back to that one sometime soon.
nit: typos (stoed, explicitl)