This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Clarify meaning of "dereferencable" attribute/metadata.
Needs RevisionPublic

Authored by efriedma on Jun 15 2018, 2:41 PM.

Details

Summary

This is consistent with what is currently implemented in LLVM, as far as I know.

There's an alternative model where "dereferenceable" only applies at the point of the call/load. That might be more generally useful? But it doesn't match the implementation of Value::getPointerDereferenceableBytes.

Diff Detail

Repository
rL LLVM

Event Timeline

efriedma created this revision.Jun 15 2018, 2:41 PM
sanjoy added inline comments.Jun 16 2018, 12:25 PM
docs/LangRef.rst
1133

DeadArgElim transforms

define void @f(i8* dereferenceable(16) %ptr) {
  ret void
}

define void @caller(i8* %v) {
  call void @f(i8* %v)
  ret void
}

to

define void @f(i8* dereferenceable(16) %ptr) {
  ret void
}

define void @caller(i8* %v) {
  call void @f(i8* undef)
  ret void
}

Given this semantic this transform is wrong, right?

nlopes added inline comments.Jun 17 2018, 11:27 AM
docs/LangRef.rst
1133

Nice example :)
I guess DeadArgElim could be fixed to drop the dereferenceable tag?

I think being UB when the ptr is not dereferenceable sounds reasonable to me.

sanjoy added inline comments.Jun 17 2018, 11:42 AM
docs/LangRef.rst
1133

I guess DeadArgElim could be fixed to drop the dereferenceable tag?

Sounds good to me. I think we can think about dereferenceable(N) as "there is an N byte load from the pointer at the entry and exit of the function".

There's an alternative model where "dereferenceable" only applies at the point of the call/load. That might be more generally useful? But it doesn't match the implementation of Value::getPointerDereferenceableBytes.

Would it be feasible (and still useful to the optimizer) to add an annotation providing the alternative semantics? Most of the places where Clang emits this annotation mean only "dereferenceable at the point of call", not "dereferenceable for the duration of the function".

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis, though. For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

IMO, we really need to add the other attribute to the IR.

If we codify these semantics, we need to stop Clang from using this attribute. But we shouldn't just rip all that code out only to re-add it once the new attribute is in place. I would seem cleaner to add the new attribute with the desired semantics (even if it isn't (yet) wired up to the optimizer) so that we can just switch Clang from one attribute to the other.

chandlerc requested changes to this revision.Jul 6 2018, 6:28 PM

(marking as requested changes so this doesn't show up on my dashboard)

This revision now requires changes to proceed.Jul 6 2018, 6:28 PM

IMO, we really need to add the other attribute to the IR.

If we codify these semantics, we need to stop Clang from using this attribute. But we shouldn't just rip all that code out only to re-add it once the new attribute is in place. I would seem cleaner to add the new attribute with the desired semantics (even if it isn't (yet) wired up to the optimizer) so that we can just switch Clang from one attribute to the other.

I agree.

Also, fortunately, hooking up the new attribute should be easy. llvm::isDereferenceableAndAlignedPointer already takes a context instruction, and if I'm thinking about this correctly, the new attribute behaves like the old attribute so long as the pointer a) hasn't been captured (assuming that freeing a pointer captures it) or b) has been captured but no functions have been called or atomics used, and I'd not bother checking (b) for now, so we just need to call PointerMayBeCapturedBefore (we can later add another attribute to indicate that a function hasn't freed its argument, and then perform bottom-up inference in the usual way, which will make this more precise).

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis, though. For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

You're correct about getPointerDereferenceableBytes, but all uses go though isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction (and we should just need to also check for capturing). Please let me know if you agree.

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis, though. For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

You're correct about getPointerDereferenceableBytes, but all uses go though isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction (and we should just need to also check for capturing). Please let me know if you agree.

If this is the case, then I wonder whether we should really just change the semantics and add a separate attribute to model the concept of dereferenceable lasting the entire function...

You're correct about getPointerDereferenceableBytes, but all uses go though isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction

Well, not all the users pass in the context parameter, since it's optional, but otherwise, that's all we need in theory. That said, it would be too expensive without some sort of cache. Dereferenceable doesn't imply noalias, so any call could free the pointer. So we have to iterate over every instruction between the use and the function's entry point to find calls which might alias. Given we need a cache, we need an analysis pass to store the cache, and that cache has to be preserved by every loop pass so we can use it from LICM.

You're correct about getPointerDereferenceableBytes, but all uses go though isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction

Well, not all the users pass in the context parameter, since it's optional, but otherwise, that's all we need in theory.

Agreed. Although LICM does, and (I suspect) that's the most important caller.

That said, it would be too expensive without some sort of cache. Dereferenceable doesn't imply noalias, so any call could free the pointer. So we have to iterate over every instruction between the use and the function's entry point to find calls which might alias. Given we need a cache, we need an analysis pass to store the cache, and that cache has to be preserved by every loop pass so we can use it from LICM.

I'm not sure. One part of this story is that isDereferenceableAndAlignedPointer is itself rarely used directly. LICM calls it directly, as does InstCombine, but essentially all other users call isSafeToLoadUnconditionally which calls isDereferenceableAndAlignedPointer. isSafeToLoadUnconditionally also does scanning for other memory accesses and I've always assumed was not cheap (although I've never profiled it specifically). The scanning is helpful for LICM, so we're right to avoid it there, but my point is that nearly every other place we're querying for this information we're actually doing something relatively expensive.

None of this is to say that we shouldn't be threading a cache of some kind into this interface. We should. It's not clear to me that we need more than a cache of OrderedBasicBlocks (PointerMayBeCapturedBefore can already use one of these), and this is what we currently use with AA.callCapturesBefore. Specifically, I'm thinking of an OrderedInstructions cache (which GVN uses).

I'd be happy to say that we add an OrderedInstructions *OI = nullptr to the interface of isDereferenceableAndAlignedPointer, and only use PointerMayBeCapturedBefore if we can provide the associated OBB, otherwise we fallback to calling PointerMayBeCaptured. Adding support in LICM should be relatively straightforward. What do you think?

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

So instead, we have to scan every instruction in the function between the beginning of the function and the insertion point, which is going to be very expensive. (Or if we set a tight limit on the number of instructions it will walk, it probably give up before it reaches the function's entry point.)

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

So instead, we have to scan every instruction in the function between the beginning of the function and the insertion point, which is going to be very expensive. (Or if we set a tight limit on the number of instructions it will walk, it probably give up before it reaches the function's entry point.)

hfinkel accepted this revision.Jul 12 2018, 4:15 PM

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

Based on that RFC (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html), we'll add new attributes to capture Clang's reference use case and leave these as-is. As a result, I think that we can move forward with this clarification for the existing attributes/metadata. LGTM.

So instead, we have to scan every instruction in the function between the beginning of the function and the insertion point, which is going to be very expensive. (Or if we set a tight limit on the number of instructions it will walk, it probably give up before it reaches the function's entry point.)

Herald added a project: Restricted Project. · View Herald TranscriptMay 7 2019, 9:48 AM

While we are going forward with the "no-free" and "no-sync" attributes, I need to also fix the ambiguity here asap.

I'll propose a patch with the alternative today. So I'll make dereferenceable mean dereferenceable at this point and for globally dereferenceable I'll add a new attribute.
This should way the patch of least resistance forward (I'll put detailed argumentation in the commit message). If it turns out we do want the opposite after all, I'm willing to
use this as a base and add a new dereferenceable at this point attribute.

docs/LangRef.rst
7961

I'd nix 7961 and 7962 and put the following after "loaded is known to be dereferenceable":

, thus the conditions described in <link_to_dereferenceable> hold.

This way we have a single definition.

7972

Same as above.

sanjoy resigned from this revision.Jan 29 2022, 5:42 PM