This is an archive of the discontinued LLVM Phabricator instance.

Differential D48239

[LangRef] Clarify meaning of "dereferencable" attribute/metadata.
Needs Revision · Public

Authored by efriedma on Jun 15 2018, 2:41 PM.

Details

Reviewers

rsmith
chandlerc
hfinkel
nlopes
apilipenko
sanjoy

Summary

This is consistent with what is currently implemented in LLVM, as far as I know.

There's an alternative model where "dereferenceable" only applies at the point of the call/load. That might be more generally useful? But it doesn't match the implementation of Value::getPointerDereferenceableBytes.

Diff Detail

Repository: rL LLVM

Event Timeline

efriedma created this revision. Jun 15 2018, 2:41 PM

sanjoy added inline comments. Jun 16 2018, 12:25 PM

docs/LangRef.rst

1133

DeadArgElim transforms

define void @f(i8* dereferenceable(16) %ptr) {
  ret void
}

define void @caller(i8* %v) {
  call void @f(i8* %v)
  ret void
}

to

define void @f(i8* dereferenceable(16) %ptr) {
  ret void
}

define void @caller(i8* %v) {
  call void @f(i8* undef)
  ret void
}

Given these semantics, this transform is wrong, right?

nlopes added inline comments. Jun 17 2018, 11:27 AM

docs/LangRef.rst
1133	Nice example :) I guess DeadArgElim could be fixed to drop the dereferenceable tag? I think being UB when the ptr is not dereferenceable sounds reasonable to me.

sanjoy added inline comments. Jun 17 2018, 11:42 AM

docs/LangRef.rst
1133	nlopes wrote: I guess DeadArgElim could be fixed to drop the dereferenceable tag?

Sounds good to me. I think we can think about `dereferenceable(N)` as "there is an `N` byte load from the pointer at the entry and exit of the function".
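To make the suggested fix concrete, here is a minimal toy model in C++. It is a sketch, not LLVM code: `ToyParam`, `ToyCall`, `eliminateDeadArgs`, `passesUndefToDeref` and `transformIntroducesUB` are hypothetical names, and it assumes "drop the dereferenceable tag" means clearing `dereferenceable(N)` on a parameter whenever the pass rewrites the matching call arguments to undef.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these types are hypothetical, not LLVM classes.
struct ToyParam {
  bool isUsed;         // does the callee body read this parameter?
  unsigned derefBytes; // N from dereferenceable(N); 0 means no attribute
};

struct ToyCall {
  std::vector<bool> argIsUndef; // one flag per argument
};

// Rewrites arguments that feed unused parameters to undef. With dropAttr set,
// it also clears dereferenceable(N) on those parameters.
void eliminateDeadArgs(std::vector<ToyParam> &params,
                       std::vector<ToyCall> &calls, bool dropAttr) {
  for (std::size_t i = 0; i < params.size(); ++i) {
    if (params[i].isUsed)
      continue;
    for (ToyCall &call : calls) {
      assert(call.argIsUndef.size() == params.size());
      call.argIsUndef[i] = true;
    }
    if (dropAttr)
      params[i].derefBytes = 0;
  }
}

// Under the wording proposed in this revision, undef passed to a
// dereferenceable(N) parameter is undefined behavior.
bool passesUndefToDeref(const std::vector<ToyParam> &params,
                        const std::vector<ToyCall> &calls) {
  for (const ToyCall &call : calls)
    for (std::size_t i = 0; i < params.size(); ++i)
      if (params[i].derefBytes > 0 && call.argIsUndef[i])
        return true;
  return false;
}

// Runs the toy pass on @f(i8* dereferenceable(16) %ptr) with one caller and
// reports whether the result contains the problematic pattern.
bool transformIntroducesUB(bool dropAttr) {
  std::vector<ToyParam> params;
  params.push_back(ToyParam{false, 16u});
  std::vector<ToyCall> calls(1);
  calls[0].argIsUndef.push_back(false);
  eliminateDeadArgs(params, calls, dropAttr);
  return passesUndefToDeref(params, calls);
}
```

In this model, `transformIntroducesUB(false)` reproduces the problem from the example above, while `transformIntroducesUB(true)` does not, at the cost of losing the attribute on the callee.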

a.elovikov added a subscriber: a.elovikov. Jun 18 2018, 12:19 AM

There's an alternative model where "dereferenceable" only applies at the point of the call/load. That might be more generally useful? But it doesn't match the implementation of Value::getPointerDereferenceableBytes.

Would it be feasible (and still useful to the optimizer) to add an annotation providing the alternative semantics? Most of the places where Clang emits this annotation mean only "dereferenceable at the point of call", not "dereferenceable for the duration of the function".

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis. (For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

IMO, we really need to add the other attribute to the IR.

If we codify these semantics, we need to stop Clang from using this attribute. But we shouldn't just rip all that code out only to re-add it once the new attribute is in place. It would seem cleaner to add the new attribute with the desired semantics (even if it isn't (yet) wired up to the optimizer) so that we can just switch Clang from one attribute to the other.

(marking as requested changes so this doesn't show up on my dashboard)

This revision now requires changes to proceed. Jul 6 2018, 6:28 PM

In D48239#1154943, @chandlerc wrote:

IMO, we really need to add the other attribute to the IR.

If we codify these semantics, we need to stop Clang from using this attribute. But we shouldn't just rip all that code out only to re-add it once the new attribute is in place. It would seem cleaner to add the new attribute with the desired semantics (even if it isn't (yet) wired up to the optimizer) so that we can just switch Clang from one attribute to the other.

I agree.

Also, fortunately, hooking up the new attribute should be easy. llvm::isDereferenceableAndAlignedPointer already takes a context instruction, and if I'm thinking about this correctly, the new attribute behaves like the old attribute so long as the pointer a) hasn't been captured (assuming that freeing a pointer captures it) or b) has been captured but no functions have been called or atomics used, and I'd not bother checking (b) for now, so we just need to call PointerMayBeCapturedBefore (we can later add another attribute to indicate that a function hasn't freed its argument, and then perform bottom-up inference in the usual way, which will make this more precise).

In D48239#1135631, @efriedma wrote:

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis. (For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

You're correct about getPointerDereferenceableBytes, but all uses go through isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction (and we should just need to also check for capturing). Please let me know if you agree.

In D48239#1155025, @hfinkel wrote:

In D48239#1135631, @efriedma wrote:

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis. (For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

You're correct about getPointerDereferenceableBytes, but all uses go through isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction (and we should just need to also check for capturing). Please let me know if you agree.

If this is the case, then I wonder whether we should really just change the semantics and add a separate attribute to model the concept of dereferenceable lasting the entire function...

You're correct about getPointerDereferenceableBytes, but all uses go through isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction

Well, not all the users pass in the context parameter, since it's optional, but otherwise, that's all we need in theory. That said, it would be too expensive without some sort of cache. Dereferenceable doesn't imply noalias, so any call could free the pointer. So we have to iterate over every instruction between the use and the function's entry point to find calls which might alias. Given we need a cache, we need an analysis pass to store the cache, and that cache has to be preserved by every loop pass so we can use it from LICM.
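The scan described above can be sketched with a toy model in C++. This is only an illustration under stated assumptions: `ToyKind` and `stillDereferenceableAt` are hypothetical names rather than LLVM APIs, the function body is modelled as straight-line code with no control flow, and every call that may free memory is conservatively assumed to free the pointee because the parameter is not noalias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these names are hypothetical, not LLVM APIs.
enum class ToyKind { Load, Store, CallNoFree, CallMayFree };

// Returns true if a parameter that was dereferenceable on entry may still be
// assumed dereferenceable just before body[contextIdx]. The parameter is not
// noalias, so any call that may free memory could free the pointee.
bool stillDereferenceableAt(const std::vector<ToyKind> &body,
                            std::size_t contextIdx) {
  assert(contextIdx <= body.size() && "context must lie within the body");
  // Cost is linear in the distance from the function entry.
  for (std::size_t i = 0; i < contextIdx; ++i)
    if (body[i] == ToyKind::CallMayFree)
      return false;
  return true;
}
```

Each query walks every instruction between the entry and the context instruction, which is the cost that motivates the request for a cache.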

In D48239#1156604, @efriedma wrote:

You're correct about getPointerDereferenceableBytes, but all uses go through isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction

Well, not all the users pass in the context parameter, since it's optional, but otherwise, that's all we need in theory.

Agreed. Although LICM does, and (I suspect) that's the most important caller.

That said, it would be too expensive without some sort of cache. Dereferenceable doesn't imply noalias, so any call could free the pointer. So we have to iterate over every instruction between the use and the function's entry point to find calls which might alias. Given we need a cache, we need an analysis pass to store the cache, and that cache has to be preserved by every loop pass so we can use it from LICM.

I'm not sure. One part of this story is that isDereferenceableAndAlignedPointer is itself rarely used directly. LICM calls it directly, as does InstCombine, but essentially all other users call isSafeToLoadUnconditionally which calls isDereferenceableAndAlignedPointer. isSafeToLoadUnconditionally also does scanning for other memory accesses and I've always assumed was not cheap (although I've never profiled it specifically). The scanning is helpful for LICM, so we're right to avoid it there, but my point is that nearly every other place we're querying for this information we're actually doing something relatively expensive.

None of this is to say that we shouldn't be threading a cache of some kind into this interface. We should. It's not clear to me that we need more than a cache of OrderedBasicBlocks (PointerMayBeCapturedBefore can already use one of these), and this is what we currently use with AA.callCapturesBefore. Specifically, I'm thinking of an OrderedInstructions cache (which GVN uses).

I'd be happy to say that we add an OrderedInstructions *OI = nullptr to the interface of isDereferenceableAndAlignedPointer, and only use PointerMayBeCapturedBefore if we can provide the associated OBB; otherwise we fall back to calling PointerMayBeCaptured. Adding support in LICM should be relatively straightforward. What do you think?

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

So instead, we have to scan every instruction in the function between the beginning of the function and the insertion point, which is going to be very expensive. (Or if we set a tight limit on the number of instructions it will walk, it will probably give up before it reaches the function's entry point.)
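Both points can be illustrated with a small toy model in C++. It is a sketch under assumptions, not LLVM code: `ToyInst`, `Answer`, `walkUsesOnly`, `scanWithBudget`, `aliasFreeBody` and `noFreeBody` are hypothetical names, the body is straight-line code, and `walkUsesOnly` only imitates the idea of a use-list walk rather than the behavior of PointerMayBeCaptured itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: these names are hypothetical, not LLVM APIs.
struct ToyInst {
  bool mayFree; // e.g. a call that could end up calling free()
  bool usesArg; // does the instruction use the argument directly?
};

enum class Answer { Dereferenceable, Unknown };

// Looks only at instructions that use the argument, in the spirit of a
// use-list walk. A free through an aliasing pointer is invisible to it.
Answer walkUsesOnly(const std::vector<ToyInst> &body, std::size_t contextIdx) {
  assert(contextIdx <= body.size());
  for (std::size_t i = 0; i < contextIdx; ++i)
    if (body[i].usesArg && body[i].mayFree)
      return Answer::Unknown;
  return Answer::Dereferenceable;
}

// Scans backwards from the context toward the entry and gives up after
// visiting `budget` instructions.
Answer scanWithBudget(const std::vector<ToyInst> &body, std::size_t contextIdx,
                      std::size_t budget) {
  assert(contextIdx <= body.size());
  std::size_t visited = 0;
  for (std::size_t i = contextIdx; i > 0; --i) {
    if (visited == budget)
      return Answer::Unknown; // ran out of budget before reaching the entry
    ++visited;
    if (body[i - 1].mayFree)
      return Answer::Unknown;
  }
  return Answer::Dereferenceable; // reached the entry without seeing a free
}

// Load via the argument, then a call that frees an aliasing pointer, then
// another load via the argument.
std::vector<ToyInst> aliasFreeBody() {
  std::vector<ToyInst> body;
  body.push_back(ToyInst{false, true});
  body.push_back(ToyInst{true, false});
  body.push_back(ToyInst{false, true});
  return body;
}

// A body of n loads via the argument with no freeing calls.
std::vector<ToyInst> noFreeBody(std::size_t n) {
  return std::vector<ToyInst>(n, ToyInst{false, true});
}
```

On `aliasFreeBody()` with the context at index 2, the use-only walk answers `Dereferenceable` even though the full scan sees the freeing call, and on `noFreeBody(5)` a budget of 3 yields `Unknown` simply because the scan stops before the entry.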

In D48239#1156781, @efriedma wrote:

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

So instead, we have to scan every instruction in the function between the beginning of the function and the insertion point, which is going to be very expensive. (Or if we set a tight limit on the number of instructions it will walk, it will probably give up before it reaches the function's entry point.)

efriedma mentioned this in D49165: Add, and infer, a nofree function attribute. Jul 10 2018, 6:59 PM

dberris added a subscriber: dberris. Jul 10 2018, 9:03 PM

In D48239#1156857, @hfinkel wrote:

In D48239#1156781, @efriedma wrote:

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

Based on that RFC (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124555.html), we'll add new attributes to capture Clang's reference use case and leave these as-is. As a result, I think that we can move forward with this clarification for the existing attributes/metadata. LGTM.

So instead, we have to scan every instruction in the function between the beginning of the function and the insertion point, which is going to be very expensive. (Or if we set a tight limit on the number of instructions it will walk, it will probably give up before it reaches the function's entry point.)

jdoerfert added a subscriber: jdoerfert. May 7 2019, 9:48 AM

Herald added a project: Restricted Project. May 7 2019, 9:48 AM

While we are going forward with the "no-free" and "no-sync" attributes, I need to also fix the ambiguity here asap.

I'll propose a patch with the alternative today. So I'll make dereferenceable mean "dereferenceable at this point", and for "globally dereferenceable" I'll add a new attribute. This should be the path of least resistance forward (I'll put detailed argumentation in the commit message). If it turns out we do want the opposite after all, I'm willing to use this as a base and add a new "dereferenceable at this point" attribute.

docs/LangRef.rst
7961	I'd nix 7961 and 7962 and put the following after "loaded is known to be dereferenceable": ", thus the conditions described in <link_to_dereferenceable> hold." This way we have a single definition.
7972	Same as above.

jdoerfert mentioned this in D61652: [Attr] Introduce dereferenceable_globally. May 7 2019, 1:28 PM

sanjoy resigned from this revision. Jan 29 2022, 5:42 PM

Revision Contents

Path: docs/LangRef.rst

Size: 12 lines

Diff 151561

docs/LangRef.rst

[1,121 unchanged lines not shown]

 ``nonnull``
 This indicates that the parameter or return pointer is not null. This
 attribute may only be applied to pointer typed parameters. This is not
 checked or enforced by LLVM, the caller must ensure that the pointer
 passed in is non-null, or the callee must ensure that the returned pointer
 is non-null.
 
 ``dereferenceable(<n>)``
-This indicates that the parameter or return pointer is dereferenceable. This
+On a parameter, this indicates the parameter is dereferenceable for
+the duration of the function. On a return value, this indicates the
+returned value is dereferenceable for the duration of the program.
+If the pointer cannot be dereferenced, the behavior is undefined. This
 attribute may only be applied to pointer typed parameters. A pointer that
 is dereferenceable can be loaded from speculatively without a risk of
 trapping. The number of bytes known to be dereferenceable must be provided
 in parentheses. It is legal for the number of bytes to be less than the
 size of the pointee type. The ``nonnull`` attribute does not imply
 dereferenceability (consider a pointer to one element past the end of an
 array), however ``dereferenceable(<n>)`` does imply ``nonnull`` in
 ``addrspace(0)`` (which is the default address space).

[6,810 unchanged lines not shown]

 instruction tells the optimizer that the value loaded is known to
 never be null. This is analogous to the ``nonnull`` attribute
 on parameters and return values. This metadata can only be applied
 to loads of a pointer type.
 
 The optional ``!dereferenceable`` metadata must reference a single metadata
 name ``<deref_bytes_node>`` corresponding to a metadata node with one ``i64``
 entry. The existence of the ``!dereferenceable`` metadata on the instruction
-tells the optimizer that the value loaded is known to be dereferenceable.
+tells the optimizer that the value loaded is known to be dereferenceable
+at any later point in the program. If the pointer cannot be dereferenced,
+the behavior is undefined.
 The number of bytes known to be dereferenceable is specified by the integer
 value in the metadata node. This is analogous to the ''dereferenceable''
 attribute on parameters and return values. This metadata can only be applied
 to loads of a pointer type.
 
 The optional ``!dereferenceable_or_null`` metadata must reference a single
 metadata name ``<deref_bytes_node>`` corresponding to a metadata node with one
 ``i64`` entry. The existence of the ``!dereferenceable_or_null`` metadata on the
 instruction tells the optimizer that the value loaded is known to be either
-dereferenceable or null.
+dereferenceable or null at any later point in the program. If the pointer
+cannot be dereferenced and is not null, the behavior is undefined.
 The number of bytes known to be dereferenceable is specified by the integer
 value in the metadata node. This is analogous to the ''dereferenceable_or_null''
 attribute on parameters and return values. This metadata can only be applied
 to loads of a pointer type.
 
 The optional ``!align`` metadata must reference a single metadata name
 ``<align_node>`` corresponding to a metadata node with one ``i64`` entry.
 The existence of the ``!align`` metadata on the instruction tells the

[7,087 unchanged lines not shown]
