Page MenuHomePhabricator

[nofree] Refine concurrency requirements
Needs RevisionPublic

Authored by nhaehnle on May 1 2021, 2:47 PM.

Details

Summary

(Triggered by discussion on https://reviews.llvm.org/D100141, this is
a counter-proposal to https://reviews.llvm.org/D100676)

Declare that a nofree function cannot arrange for memory to be freed
that was dereferenceable before the call -- whether by (transitive) call
or by communication with another thread.

This is a simplification from the perspective of using the information
provided by the function attribute when optimizing the caller, at the
cost of complicating the inference of nofree.

This change arguably increases the expressive power of the IR, since we
can now have functions that are (usefully) nofree without being nosync.
Before this change, nofree on a not-nosync function does not do
anything.

The attributor is updated so that a function containing a volatile
memory operation or a release (or stronger) atomic operations is not
nofree. The reasoning behind only checking for release ordering is
explained in a code comment.

@jdoerfert: I have zero knowledge of the attributor infrastructure, so if
this direction is taken I'd appreciate a review specifically of those parts.

Diff Detail

Event Timeline

nhaehnle created this revision.May 1 2021, 2:47 PM
nhaehnle requested review of this revision.May 1 2021, 2:47 PM
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: bbn. · View Herald Transcript
reames requested changes to this revision.May 1 2021, 6:41 PM

At a minimum, you need to update FunctionAttrs.cpp.

This revision now requires changes to proceed.May 1 2021, 6:41 PM

LGTM for the LangRef changes.
I agree the proposed semantics is more useful this way. Plus I agree with the argument that the IR is more expressive this way, as it then supports nofree functions that do atomic stuff.

nikic added a subscriber: nikic.May 3 2021, 11:19 AM

+1 for the semantics change from me as well.

reames added a comment.May 3 2021, 4:40 PM

I am neutral on the proposed change, but I would very much like to hear from @sstefan1, the original author of the nofree attribute.

I am neutral on the proposed change, but I would very much like to hear from @sstefan1, the original author of the nofree attribute.

I'm not actually the one who introduced it, but I think this sounds good. Attributor changes look good as well.

llvm/docs/LangRef.rst
1619–1620

If I'm not wrong, this part hasn't been implemented yet? We won't infer nofree for a function that frees memory it allocated.

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
1408

Maybe make this static, like AANoSyncImpl does?

nhaehnle updated this revision to Diff 343532.May 6 2021, 4:34 PM
nhaehnle marked an inline comment as done.

Address review feedback: Adding the required change to FunctionAttrs that @reames pointed out; in the new version, I also made sure to really go through all test cases.

Curiously, there is a test case in Analysis/ValueTracking/memory-dereferenceable.ll which already seemed to assume the semantics introduced with this change. At least, the comment seems to follow the logic that readonly implies nofree, which implies (*without* nosync) that dereferenceable on function arguments survives throughout the body. (Or perhaps the comment there could be interpreted as the belief that readonly should imply nosync? But that doesn't seem right to me.)

llvm/docs/LangRef.rst
1619–1620

Well, we do infer nofree for functions that contain alloca, which is implicitly freed at the end, though you're probably right about malloc and friends. In any case, that doesn't contradict what's written here.

llvm/lib/Transforms/IPO/AttributorAttributes.cpp
1408

I'm moving this helper around in the second version, so the point hopefully becomes moot :)

nhaehnle retitled this revision from [RFC][nofree] Refine concurrency requirements to [nofree] Refine concurrency requirements.May 6 2021, 4:37 PM
sstefan1 added inline comments.May 7 2021, 7:56 AM
llvm/docs/LangRef.rst
1619–1620

Yeah, that was just an observation. Maybe we have a TODO in AANoFree for that case?

reames requested changes to this revision.May 7 2021, 8:27 AM
reames added inline comments.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
1303

This is wrong. A thread can coordinate using only acquire ordering within 'f'.

Example:
g = o; // the object being potentially freed
if (g_acquire)

return; // then *caller* does release store

use o;

The other thread does:
while (!g) {

g_acquire = true;
while (!g_release) {}
free(g);

}

Please exactly match the existing nosync logic in this file by using isOrderedAtomic helper in this file. We can be more aggressive later if desired.

This revision now requires changes to proceed.May 7 2021, 8:27 AM

I'm late for the party, apologies.

Let me put down some thoughts:

  1. What is the motivation here? I mean, which functions, except very few intrinsic, would be "strong" nofree but not nosync?
  2. Doesn't this just mean we would check for nosync in order to derive nofree? What would be different?
  3. Have you considered the pointer nofree and if it does not suffices for your use case:
%ptr = not_captured i8*
call @completely_unknown(i8* nofree nocapture)  ; <-- doesn't free %ptr even though it could free other stuff.
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
1452

You would ask AANoSync for the status on all those instructions, and that is the crucial point why I don't even see the benefit here. See main comment.

Let me put down some thoughts:

  1. What is the motivation here? I mean, which functions, except very few intrinsic, would be "strong" nofree but not nosync?

My main motivation here is to make the nofree attribute less surprising and easier to consume. I find having an attribute in the IR that is called "no free" but means "can free, unless some other attribute is present" pretty surprising.

It comes down to the ability to think about callees as abstracted black boxes: I care about the side effects of calling the black box; I don't care (and don't want to have to care) about how those side effects come to be.

  1. Doesn't this just mean we would check for nosync in order to derive nofree? What would be different?

That is one option, though I believe it is slightly more conservative than it needs to be. See the inline discussion with @reames. (There may still be non-correctness arguments for going down the conservative path of course.)

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
1303

This is an interesting example, but I don't think it shows what you want it to show? Without the release store in the caller, the other thread cannot proceed to free(g). In other words, the point at which g/o stops being dereferenceable is the release store in the caller, not the acquire load in the callee.

The acquire load is part of some coordination with the other thread, but it's redundant as far as the free concerned. At least it looks that way to me.

To attack this from a different angle, consider a slight variation of the example where the caller does a g_release store before calling f. In that case, putting dereferenceable on o is incorrect, because the other thread could free o before f is even entered.

From yet another different angle, which part of the reasoning in the comment in the AANoFreeImpl is wrong?

reames added inline comments.May 9 2021, 9:21 AM
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
1303

I'm not going to debate concurrency in this review. My (blocking, mandatory) request is that you exactly match existing behavior in this review.

If you wish to open a follow up review to refine the concurrency logic, we can do so. Warning: Concurrency is *hard* and we will need to loop in some experts who are not on this thread, and should not have to care about this review.

Let me put down some thoughts:

  1. What is the motivation here? I mean, which functions, except very few intrinsic, would be "strong" nofree but not nosync?

My main motivation here is to make the nofree attribute less surprising and easier to consume. I find having an attribute in the IR that is called "no free" but means "can free, unless some other attribute is present" pretty surprising.

But all our attributes work that way so changing nofree would make it an outlier.
Some examples:

willreturn + nounwind -> comes back and executes the next instruction after the call
argmemonly/inaccessiblemem_only + nosync -> there is no synchronization that revealed new values for globals

Depending on where we did fall wrt. synchronizing intrinsics you really cannot even use readnone/readonly
without nosync (maybe restricted to convergent functions).

The reason we want this is that attributes should compose. Almost always they also help on their own.
nofree without nosync can in fact be used for optimization. However, if we force it to go hand
in hand we loose that. Let me show you by example:

a = malloc
b = malloc
unknown(a);
nofree_nocapture_only(a, b);

Here, I can do heap2stack for b but not for a because nofree_nocapture_only could synchronize with someone to free a which escaped before.

It comes down to the ability to think about callees as abstracted black boxes: I care about the side effects of calling the black box; I don't care (and don't want to have to care) about how those side effects come to be.

I do not understand how this would be affected. The attributes limit the effects of the black box, regardless if there is one or two.
Could you elaborate why the black box view would be impaired if we do not apply this change?

  1. Doesn't this just mean we would check for nosync in order to derive nofree? What would be different?

That is one option, though I believe it is slightly more conservative than it needs to be. See the inline discussion with @reames. (There may still be non-correctness arguments for going down the conservative path of course.)

The only more conservative part I can see is arguably out of range for the static analyses we have right now.
That is, we could add "strong nofree" if we show that all synchronizing parties do not free. Other than that
I fail to see how this is not just nofree + nosync packaged as nofree, and that does not make sense to
me at all. Composability is useful, as shown above.

nlopes added a comment.May 9 2021, 3:17 PM

@jdoerfert a function that syncs with another thread may not free anything itself and the other thread may not free anything either.
If you require an optimization to have nofree + nosync in a call to preserve dereferenceability then you can't optimize this case.

That said, I had a thought last night and it isn't clear to me what to do with escaped pointers. On one hand it's nice for the optimizers if nofree implies escaped pointers aren't freed. This, however, makes inferring nofree harder. Functions that sync with other threads can't really be marked nofree without IPA. Essentially it degenerates in your concern, that nofree then implies nosync (in practice). However, it allows improved precision in theory (in case someone cares and implements an IPA).

Another possibility is to restrict nofree to unescaped pointers. But that degenerates in tagging function arguments with nofree, as unescaped can't be freed anyway.

The current proposal leaves room for improvement in the future if someone cares about multi-threaded programs and implements an IPA. Since it doesn't seem to impose any major pain in the implementation, it seems to be the best option.

@jdoerfert a function that syncs with another thread may not free anything itself and the other thread may not free anything either.
If you require an optimization to have nofree + nosync in a call to preserve dereferenceability then you can't optimize this case.

I disagree, and I think this is the conceptual differences we have here.

If one does only look at a function in isolation then I agree, nofree alone is not sufficient to make statements about all call sites and arguments.
However, if you look at a call site of a nofree function the situation is different because you have context information about the arguments.

Let's go back to my last example:

a = malloc
b = malloc
unknown(a);                  // no attributes 0-> escapes
nofree_nocapture_only(a, b); // `nofree`, `nocapture` on args

I know that at the end of this sequence b has not been freed, even though there is no nosync in sight.
I can also derive nofree for nofree_nocapture_only without worrying about atomics, for example.
(Here is the sequence in action with heap2stack kicking in for b: https://godbolt.org/z/hjYrhaEvW)

So, with nofree and nosync being completely separated, as it was in the very beginning, you can:

  1. Optimize even if only one is present, given additional call site information.
  2. Derive nofree even if nosync is not present.

All the "strong nofree" effectively gives us is that it combines the to attributes.
Let's introduce a helper:
bool cannotFreeAtAll(Function &F) { return F.hasAttr(NoFree) && F.hasAttr(NoSync); }
you get your guarantee that F will not, in any way` free something. That said,
I find the call site specific reasoning is the way to go anyway but if some generic
"for all call sites and arguments" handling is required we can go with the helper instead.

@jdoerfert a function that syncs with another thread may not free anything itself and the other thread may not free anything either.
If you require an optimization to have nofree + nosync in a call to preserve dereferenceability then you can't optimize this case.

I disagree, and I think this is the conceptual differences we have here.

If one does only look at a function in isolation then I agree, nofree alone is not sufficient to make statements about all call sites and arguments.
However, if you look at a call site of a nofree function the situation is different because you have context information about the arguments.

Let's go back to my last example:

a = malloc
b = malloc
unknown(a);                  // no attributes 0-> escapes
nofree_nocapture_only(a, b); // `nofree`, `nocapture` on args

I know that at the end of this sequence b has not been freed, even though there is no nosync in sight.
I can also derive nofree for nofree_nocapture_only without worrying about atomics, for example.
(Here is the sequence in action with heap2stack kicking in for b: https://godbolt.org/z/hjYrhaEvW)

So, with nofree and nosync being completely separated, as it was in the very beginning, you can:

  1. Optimize even if only one is present, given additional call site information.
  2. Derive nofree even if nosync is not present.

Your examples are around nofree arguments, not nofree functions, which is what this patch is about.
nofree in functions is useful for pointers that are *not* passed as argument, like this:

f(i8* %p) {
  load %p  ; implies it's dereferenceable
  call @g() nofree
  ; is %p still dereferenceable or not?
}

This patch states that yes, %p is still dereferenceable after the call to @g as it's tagged with nofree. That's it.

@jdoerfert a function that syncs with another thread may not free anything itself and the other thread may not free anything either.
If you require an optimization to have nofree + nosync in a call to preserve dereferenceability then you can't optimize this case.

I disagree, and I think this is the conceptual differences we have here.

If one does only look at a function in isolation then I agree, nofree alone is not sufficient to make statements about all call sites and arguments.
However, if you look at a call site of a nofree function the situation is different because you have context information about the arguments.

Let's go back to my last example:

a = malloc
b = malloc
unknown(a);                  // no attributes 0-> escapes
nofree_nocapture_only(a, b); // `nofree`, `nocapture` on args

I know that at the end of this sequence b has not been freed, even though there is no nosync in sight.
I can also derive nofree for nofree_nocapture_only without worrying about atomics, for example.
(Here is the sequence in action with heap2stack kicking in for b: https://godbolt.org/z/hjYrhaEvW)

So, with nofree and nosync being completely separated, as it was in the very beginning, you can:

  1. Optimize even if only one is present, given additional call site information.
  2. Derive nofree even if nosync is not present.

Your examples are around nofree arguments, not nofree functions, which is what this patch is about.

nofree functions imply nofree arguments so it very much makes a difference if we make it harder to infer
and annotate the former. As we already moved to a more semantic-based definition of escaped we can actually
look at an example where this makes a difference. Take pointers passed via memory to see how you cannot just
move the nofree to account for the change made by this patch:

struct { int *a } s;
s.a = malloc()               // this is a store.
nofree_nocapture_only(&s)    // we can potentially show `s.a` is not captured in the function but we cannot
                             // annotate that fact as a `nofree` argument without argument promotion, which
                             // we cannot always do.
// s.a is still known to be dereferenceable even without `nosync`. The function `nofree` is needed, `nofree`
// argument attributes are not involved.

nofree in functions is useful for pointers that are *not* passed as argument, like this:

f(i8* %p) {
  load %p  ; implies it's dereferenceable
  call @g() nofree
  ; is %p still dereferenceable or not?
}

This patch states that yes, %p is still dereferenceable after the call to @g as it's tagged with nofree. That's it.

Let's not ignore the arguments that are somehow passed though. Anyway, this patch does not actually add anything you
need for you example. If g is nofree & nosync you get exactly the same effect already. I hope we agree on that.

If we allow user feedback you can also make your @f nosync and get the effect you want even if @g is not nosync,
This is especially useful if @g is an intrinsic users cannot actually annotate.

In addition to the examples I brought up that are impacted, let me reiterate what I said in my first post:
Other attributes work in the same composable way:

void foo(int * __restrict__ flag, char* p) {
  argmemonly(flag);                 // Says argmemonly and p is not passed nor does p alias flag, still p is potentially written.
                                    // This is like a call to a `nofree` function which might still free pointers via synchronization.
                                    // `nosync` is a orthogonal component to these attributes as we can otherwise not derive them.
                                    // However, as with `nofree`, we can use the attributes even without `nosync` in some situations.
}

Hi Johannes,

before I give a more complete response, I'd like to understand your position more fully. To that end:

nofree functions imply nofree arguments so it very much makes a difference if we make it harder to infer
and annotate the former. As we already moved to a more semantic-based definition of escaped we can actually
look at an example where this makes a difference. Take pointers passed via memory to see how you cannot just
move the nofree to account for the change made by this patch:

struct { int *a } s;
s.a = malloc()               // this is a store.
nofree_nocapture_only(&s)    // we can potentially show `s.a` is not captured in the function but we cannot
                             // annotate that fact as a `nofree` argument without argument promotion, which
                             // we cannot always do.
// s.a is still known to be dereferenceable even without `nosync`. The function `nofree` is needed, `nofree`
// argument attributes are not involved.

Is that actually true? Regardless of the nocapture flag, s.a may be passed to another thread which then frees it. (nocapture only applies to &s in the first place, and even then, it does not say anything about temporary copies of the pointer that may be passed to other threads.)

In addition to the examples I brought up that are impacted, let me reiterate what I said in my first post:
Other attributes work in the same composable way:

void foo(int * __restrict__ flag, char* p) {
  argmemonly(flag);                 // Says argmemonly and p is not passed nor does p alias flag, still p is potentially written.
                                    // This is like a call to a `nofree` function which might still free pointers via synchronization.
                                    // `nosync` is a orthogonal component to these attributes as we can otherwise not derive them.
                                    // However, as with `nofree`, we can use the attributes even without `nosync` in some situations.
}

I don't think this is true. LangRef certainly doesn't support that claim as far as I can tell (the definition of argmemonly talks about there being no side effects at all, without any exception for side effects triggered via other threads), and it turns out that due to confusion about this in the attributor, I can actually provoke a miscompilation fairly easily. So there may be a bigger issue at play there. If you do think that there is consensus and/or documentation about argmemonly working the way you think it should work, can you please point to it?

nofree functions imply nofree arguments so it very much makes a difference if we make it harder to infer
and annotate the former. As we already moved to a more semantic-based definition of escaped we can actually
look at an example where this makes a difference. Take pointers passed via memory to see how you cannot just
move the nofree to account for the change made by this patch:

struct { int *a } s;
s.a = malloc()               // this is a store.
nofree_nocapture_only(&s)    // we can potentially show `s.a` is not captured in the function but we cannot
                             // annotate that fact as a `nofree` argument without argument promotion, which
                             // we cannot always do.
// s.a is still known to be dereferenceable even without `nosync`. The function `nofree` is needed, `nofree`
// argument attributes are not involved.

Is that actually true? Regardless of the nocapture flag, s.a may be passed to another thread which then frees it. (nocapture only applies to &s in the first place, and even then, it does not say anything about temporary copies of the pointer that may be passed to other threads.)

I realize now that I was partially wrong about this: the definition of the nocapture attribute does say that the function cannot send the pointer to another thread, but the pointer in question is &s. So it still seems to me that in the example, the callee is allowed to send s.a to another thread which then frees it (given today's definition of nofree).

If this example does fall like I think it does, is there still an optimization benefit to having the function attribute nofree as defined today vs. as defined in this patch? As I see it, there were three arguments:

  1. Inference doesn't have to worry about synchronization.
  2. Other attributes work the same way; however, this is false at least for argmemonly and arguably also for inaccessiblememonly, except that one presumably never gets inferred?
  3. Today's nofree enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the nofree argument attribute instead, and the second is discussed above in this comment).

That leaves as a single argument the complexity of inference, but how strong of an argument is that, really?

P.S.: The example I have in mind where confusion about argmemonly can be used to trigger a miscompile is at https://godbolt.org/z/dEG8n6W3E. Current main merges the two loads from %q, which is incorrect since @setup could arrange for a second thread to be created which changes its value in between.

nhaehnle updated this revision to Diff 344603.May 11 2021, 5:18 PM

Split up the patch as per @reames' request

jdoerfert added a comment.EditedMay 11 2021, 5:37 PM

[EDIT: Added response to newest comment in the end.]

Hi Johannes,

before I give a more complete response, I'd like to understand your position more fully. To that end:

nofree functions imply nofree arguments so it very much makes a difference if we make it harder to infer
and annotate the former. As we already moved to a more semantic-based definition of escaped we can actually
look at an example where this makes a difference. Take pointers passed via memory to see how you cannot just
move the nofree to account for the change made by this patch:

struct { int *a } s;
s.a = malloc()               // this is a store.
nofree_nocapture_only(&s)    // we can potentially show `s.a` is not captured in the function but we cannot
                             // annotate that fact as a `nofree` argument without argument promotion, which
                             // we cannot always do.
// s.a is still known to be dereferenceable even without `nosync`. The function `nofree` is needed, `nofree`
// argument attributes are not involved.

Is that actually true? Regardless of the nocapture flag, s.a may be passed to another thread which then frees it. (nocapture only applies to &s in the first place, and even then, it does not say anything about temporary copies of the pointer that may be passed to other threads.)

I am not aware of a way to pass to another thread that does not capture a pointer. Please feel free to show me one.

In addition to the examples I brought up that are impacted, let me reiterate what I said in my first post:
Other attributes work in the same composable way:

void foo(int * __restrict__ flag, char* p) {
  argmemonly(flag);                 // Says argmemonly and p is not passed nor does p alias flag, still p is potentially written.
                                    // This is like a call to a `nofree` function which might still free pointers via synchronization.
                                    // `nosync` is a orthogonal component to these attributes as we can otherwise not derive them.
                                    // However, as with `nofree`, we can use the attributes even without `nosync` in some situations.
}

I don't think this is true. LangRef certainly doesn't support that claim as far as I can tell (the definition of argmemonly talks about there being no side effects at all, without any exception for side effects triggered via other threads), and it turns out that due to confusion about this in the attributor, I can actually provoke a miscompilation fairly easily. So there may be a bigger issue at play there. If you do think that there is consensus and/or documentation about argmemonly working the way you think it should work, can you please point to it?

So you are arguing armemonly is also implying nosync? Then we get into the same problems we have here. You don't actually gain expressiveness, after all you can check if both attribute X and nosync are present, but you loose the ability to derive and utilize them separately. One conceptual thing to consider is that almost all of this predates nosync in the first place. That said, I don't see how you would interpret the text as there are no side effects by other threads:

argmemonly

    This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, with arbitrary offsets. Or in other words, all memory operations in the function can refer to memory only using pointers based on its function arguments.

    Note that argmemonly can be used together with readonly attribute in order to specify that function reads only from its arguments.

    If an argmemonly function reads or writes memory other than the pointer arguments, or has other side-effects, the behavior is undefined.

What in the above definition prevents me from doing this:

void foo(atomic flag) { atomic_write(flag, 1); while(atomic_read(flag) == 1); atomic_write(flag, 2); }

Inside the function all memory that is accesses are loads/stores from objects pointed by pointer-type arguments (with 0 offset). Do you disagree?

FWIW, this basically breaks down with stuff like malloc which we assume is inaccessiblememonly but it is in fact *not*, and cannot be, nosync. So if we require attributes to imply nosync, malloc cannot be inaccessiblememonly anymore. (see https://reviews.llvm.org/D98605#2625243)


nofree functions imply nofree arguments so it very much makes a difference if we make it harder to infer
and annotate the former. As we already moved to a more semantic-based definition of escaped we can actually
look at an example where this makes a difference. Take pointers passed via memory to see how you cannot just
move the nofree to account for the change made by this patch:

struct { int *a } s;
s.a = malloc()               // this is a store.
nofree_nocapture_only(&s)    // we can potentially show `s.a` is not captured in the function but we cannot
                             // annotate that fact as a `nofree` argument without argument promotion, which
                             // we cannot always do.
// s.a is still known to be dereferenceable even without `nosync`. The function `nofree` is needed, `nofree`
// argument attributes are not involved.

Is that actually true? Regardless of the nocapture flag, s.a may be passed to another thread which then frees it. (nocapture only applies to &s in the first place, and even then, it does not say anything about temporary copies of the pointer that may be passed to other threads.)

I realize now that I was partially wrong about this: the definition of the nocapture attribute does say that the function cannot send the pointer to another thread, but the pointer in question is &s. So it still seems to me that in the example, the callee is allowed to send s.a to another thread which then frees it (given today's definition of nofree).

As stated in my text on the right, we can potentially derive s.a is not captured and we can derive, or read from attributes, &s is not captured. If we cannot derive the either, we can certainly expect someone to free it s.a. However, if we can derive both, we cannot actually put the nofree on the s.a pointer to manifest the information. So there is no place to put it and with a stronger nofree on the function we have less chance to put that one. [While right now there is no place for the "not captured" of the &s.a pointer, we are already working on such ways, e.g., https://reviews.llvm.org/D93189 (and there is an RFC thread).]

If this example does fall like I think it does, is there still an optimization benefit to having the function attribute nofree as defined today vs. as defined in this patch? As I see it, there were three arguments:

  1. Inference doesn't have to worry about synchronization.

Agreed. Split the problem. Though, to be fair, it is trivial to require nosync for nofree to be deducible in the first place.

  1. Other attributes work the same way; however, this is false at least for argmemonly and arguably also for inaccessiblememonly, except that one presumably never gets inferred?

I don't think it is.

  1. Today's nofree enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the nofree argument attribute instead, and the second is discussed above in this comment).

I think once I manage to explain the examples you will reconsider this point. To give you another one:

__attribute__((nosync)) // or __attribute__((assume("nosync")) or #pragma omp assume ext_no_synchronization
void user_code() {
  a = malloc();
  b = malloc();
  unknown_function_with_a_now_nosync_call_site(a, b);
  ...
  free(a);
}

That leaves as a single argument the complexity of inference, but how strong of an argument is that, really?

I think there are a lot of reasons to keep it separated, the examples above still stand. That said, why would
we combine it? It saves one attribute lookup?

P.S.: The example I have in mind where confusion about argmemonly can be used to trigger a miscompile is at https://godbolt.org/z/dEG8n6W3E. Current main merges the two loads from %q, which is incorrect since @setup could arrange for a second thread to be created which changes its value in between.

Agreed this is a bug. Though, it's in instcombine (or probably MemorySSA) not the Attributor. If you disagree you have to tell me which part of the argmemonly definition is violated by release: "This attribute indicates that the only memory accesses inside function are loads and stores from objects pointed to by its pointer-typed arguments, with arbitrary offsets. Or in other words, all memory operations in the function can refer to memory only using pointers based on its function arguments."

nlopes accepted this revision.May 12 2021, 10:47 AM
  1. Today's nofree enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the nofree argument attribute instead, and the second is discussed above in this comment).

I think once I manage to explain the examples you will reconsider this point. To give you another one:

__attribute__((nosync)) // or __attribute__((assume("nosync")) or #pragma omp assume ext_no_synchronization
void user_code() {
  a = malloc();
  b = malloc();
  unknown_function_with_a_now_nosync_call_site(a, b);
  ...
  free(a);
}

This is a good example, that shows that if the user annotates a function with nosync then inferring nofree becomes easier. So the proposed semantics doesn't require nosync, but inference can be aided by its presence (not in this patch yet).

I think all the reviewers' concerns have been addresses at this point. Please let us know if there's some remaining concern regarding the proposed semantics.
The semantics proposed in this patch makes a lot of sense to me. And to make things clear, this proposal *does not* make nofree imply nosync (nor the other way around).

jdoerfert requested changes to this revision.May 12 2021, 11:05 AM
  1. Today's nofree enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the nofree argument attribute instead, and the second is discussed above in this comment).

I think once I manage to explain the examples you will reconsider this point. To give you another one:

__attribute__((nosync)) // or __attribute__((assume("nosync")) or #pragma omp assume ext_no_synchronization
void user_code() {
  a = malloc();
  b = malloc();
  unknown_function_with_a_now_nosync_call_site(a, b);
  ...
  free(a);
}

This is a good example, that shows that if the user annotates a function with nosync then inferring nofree becomes easier. So the proposed semantics doesn't require nosync, but inference can be aided by its presence (not in this patch yet).

This is not at all what the example shows. The example shows that we cannot infer nofree for unknown_function_with_a_now_nosync_call_site with the new semantics but with the old semantics we could.
Only with the existing semantics you can optimize this example, not with the new one. I would have assumed this is implied given that I provided the example and I am arguing the proposed semantics are useless and prevent optimizations.

I think all the reviewers' concerns have been addresses at this point. Please let us know if there's some remaining concern regarding the proposed semantics.

My concerns have not changed at all. Unsure why you would assume they have been addressed.

The semantics proposed in this patch makes a lot of sense to me. And to make things clear, this proposal *does not* make nofree imply nosync (nor the other way around).

For deduction purposes it effectively does imply/require nosync. If you disagree, please provide an example where we can derive nofree without nosync under the proposed semantics.

This revision now requires changes to proceed.May 12 2021, 11:05 AM
  1. Today's nofree enables annotations and optimizations that the proposed change would no longer allow. I can't say for certain whether this is false, but at least the examples given so far seem to fail (the first one because you can just use the nofree argument attribute instead, and the second is discussed above in this comment).

I think once I manage to explain the examples you will reconsider this point. To give you another one:

__attribute__((nosync)) // or __attribute__((assume("nosync")) or #pragma omp assume ext_no_synchronization
void user_code() {
  a = malloc();
  b = malloc();
  unknown_function_with_a_now_nosync_call_site(a, b);
  ...
  free(a);
}

This is a good example, that shows that if the user annotates a function with nosync then inferring nofree becomes easier. So the proposed semantics doesn't require nosync, but inference can be aided by its presence (not in this patch yet).

This is not at all what the example shows. The example shows that we cannot infer nofree for unknown_function_with_a_now_nosync_call_site with the new semantics but with the old semantics we could.
Only with the existing semantics you can optimize this example, not with the new one. I would have assumed this is implied given that I provided the example and I am arguing the proposed semantics are useless and prevent optimizations.

For the example above, I don't see how you can deduce nofree for user_code or for unknown_function_with_a_now_nosync_call_site without further information that is not given in this example. Without knowing anything else, unknown_function_with_a_now_nosync_call_site may free some pointer stored in a global var. Who knows?
If my reading is wrong, please make the example more explicit and explain things more slowly, otherwise I'm unable to understand it. These things are hard and nothing is implied or implicit.

The semantics proposed in this patch makes a lot of sense to me. And to make things clear, this proposal *does not* make nofree imply nosync (nor the other way around).

For deduction purposes it effectively does imply/require nosync. If you disagree, please provide an example where we can derive nofree without nosync under the proposed semantics.

I already gave an example that shows that the semantics proposed in this patch is more expressive that the old one. Please don't mix semantics and some inference algorithm you have in mind (that hasn't been shared with us).
A simple example: consider a program with 2 threads, a main one and a helper thread. The helper thread doesn't free anything. The main thread may sync with the helper thread, so some function are not nosync, but you can still tag them as nofree if they don't free up anything.
So the proposed semantics is strictly more expressive than the old one. Plus it doesn't make it any harder to infer than before.

Even if the current patch cannot infer nofree for the example I've provided, there's nothing preventing someone writing an algorithm that could infer that in future. Plus, using your own argument, users may provide that information.