Download Raw Diff

Details

Reviewers

reames
theraven
efriedma
jrtc27
nlopes
nikic
sanjoy

Summary

If agreed, some work will be needed to comply with these rules.

Diff Detail

Event Timeline

arsenm created this revision.Jun 18 2019, 4:04 PM

Herald added a subscriber: wdng. · View Herald TranscriptJun 18 2019, 4:04 PM

arsenm added a reviewer: theraven.Jun 18 2019, 4:05 PM

Thanks for working on this, it's a seriously underspecified part of LLVM.

docs/LangRef.rst
9694 ↗	(On Diff #205461)	What does 'is legal' actually mean? For example, if I have a 32-bit VA space and a 64-bit VA space in the same process with a subset of the program understanding only one and a shim layer understanding both, then the cast from 32-bit to 64-bit will always succeed but the converse cast is always a statically valid operation but may dynamically fail (and the failure case may be to return null).
9695 ↗	(On Diff #205461)	Does this not preclude using address space casts to make pointers visible to GC (or, conversely, to notify the GC that a pointer has escaped)? I think we may need to define what 'capture' means in this context. I believe the goal of this is to state that the compiler is free to elide address space casts with no uses?
9704 ↗	(On Diff #205461)	I believe that this holds only if both address spaces are permutations. Again, the simple 32/64-bit case, a round trip starting in the 32-bit world is always possible, but the converse does not apply.

arsenm marked 3 inline comments as done.Jun 19 2019, 7:25 AM

arsenm added inline comments.

docs/LangRef.rst
9694 ↗	(On Diff #205461)	I was thinking legal means a dereferencable result. If you were to access the result when the cast failed, it would be undefined
9695 ↗	(On Diff #205461)	I'm trying to reiterate that this is not an operation that touches memory in any way. We do allow addrspacecasts in constant expressions, and they can't trap. Elision should be OK. The intent is addrspacecast has stronger aliasing properties than a ptrtoint/inttoptr pair, which was part of the original intent of adding a new instruction.
9704 ↗	(On Diff #205461)	This probably needs an additional constraint that the result was also dereferencable / legal as stated before. The goal is it should be legal to insert new addrspacecasts back to the source addrspace in certain contexts. The interesting ones are cases where you are dereferencing the pointer anyway, so it would have been undefined if the cast failed

Say dereferencable instead of legal

Apparently I forgot to hit submit on the comments I made yesterday, sorry!

docs/LangRef.rst
9694 ↗	(On Diff #205461)	I'm happy with that clarification. We probably want to explicitly talk about dynamic failure for addrspacecast somewhere in this documentation.
9695 ↗	(On Diff #205461)	Makes sense, we probably want to clarify what capturing means from the perspective of allowed transforms here. It may be that an addrsoacecast has static side effects (if that's a valid term) causing some extra metadata to be emitted about roots on the stack or in registers, for example, but the compiler is free to move address space casts across other operations and to elide them. In the CHERI case, we do have one operation that it is not safe to move address space casts across (writes to the default data capability change the meaning of the current address space), but it's also not safe to reorder loads / stores across that and so we probably need to add an explicit address space change barrier intrinsic for things like that.
9704 ↗	(On Diff #205461)	I think we need to be very careful about eliding user-expected trapping behaviour here. CHERI has similar behaviour, but consider a simple 32/64-bit example where: 32-bit pointers are AS0 64-bit pointers are AS1 Cast from 32-bit to 64-bit is a zero extension (always valid) Cast from 64-bit to 32-bit is a truncation, equality test, and conditional move of 0 on failure, so an addrspacecast from AS1 to AS0 gives either a valid result or null. If I have a sequence of: %0 is some AS1 pointer %1 casts to AS0 %2 loads from %1 This would be expected to trap, but with the text as written it would be valid to elide %1 and have %2 be a load from %0. This is probably not what the programmer expected. Would you need to mark one of these as non-integral to achieve those guarantees? Or have some ordering of permissiveness? In this scenario, it's fine to do the elision the other way around: %0 is some AS0 pointer %1 casts to AS1 %2 loads from %1 In this version, eliding %1 and making %2 load from %0 is totally fine. I think we really need to model three kinds of cast: Ones that are always valid (statically known) Ones that are always invalid (statically known) Ones that are sometimes valid (dynamically known) I'd suggest making the last case the default and adding metadata for the first two. Front ends generally know which of these are the case and can add the metadata and some later analyses may be able to once some more is known about the values. Optimisers are then free to elide any always-valid addrspacecast instructions. If the metadata is accidentally dropped then that's also fine. In the CHERI case, an AS0 to AS200 cast is always valid (and clang could be taught to add the metadata). An AS200 to AS0 cast is always valid if you can statically prove that it was the result of a cast in the opposite direction, or if it came from a specific allocator. Most AS0 to AS200 casts are followed by a call to an intrinsic that sets bounds / permissions, and this is the only use of the addrspacecast instruction, which prohibits these optimisations (correctly).

jeroen.dobbelaere added a subscriber: jeroen.dobbelaere.Jun 20 2019, 3:36 AM

arsenm marked an inline comment as done.Jun 20 2019, 9:06 AM

arsenm added inline comments.

docs/LangRef.rst
9704 ↗	(On Diff #205461)	Loading from the failed cast would be undefined behavior, so transforming the load and avoiding the trap should be legal. I wouldn't expect this to be any different from the compiler eliminating the a trap from a store to NULL. I do think this is an argument for not allowing changing the address space of volatile operations (see D63401).

Add some notes about changing the address space of accesses

ping

jdoerfert added a subscriber: jdoerfert.Jun 26 2019, 4:09 PM

jdoerfert added inline comments.

docs/LangRef.rst
2713 ↗	(On Diff #205986)	I'd change the last sentence, maybe as shown below, but the meaning is fine. "In general, "null-pointers" have to be assumed dereferenceable in non-zero address spaces." dereferencable -> dereferenceable
2720 ↗	(On Diff #205986)	For a non-volatile access, if an object can be proven accessible through a pointer with a different address space, the access may be modified to use that address space. I'm not happy with this wording,... but I can't quite put my finger on it.
9706 ↗	(On Diff #205986)	same memory location. Is it necessarily the same memory location or the same "logical" memory location? This is an honest question.

theraven added inline comments.Jun 27 2019, 5:25 AM

docs/LangRef.rst
2713 ↗	(On Diff #205986)	I am also unhappy with this as a claim. null pointers should always be non-dereferenceable. The semantics of inttoptr of 0 may not yield null for a given address space and the semantics of ptrtoint of null may not yield the integer 0, but the definition of a null pointer is a guaranteed non-dereferenceable pointer.

arsenm marked an inline comment as done.Jun 27 2019, 9:19 AM

arsenm added inline comments.

docs/LangRef.rst
2713 ↗	(On Diff #205986)	I'm unhappy with the claim, but I'm also unhappy with the state of null in llvm. "i8 addrspace(N)* null" is always interpreted as bit pattern 0. I would like to fix this by defining the invalid pointer value in the datalayout, and remove all special treatment of addrspace(0). As-is I don't think there is a way to describe a constant, known-invalid/null pointer in the IR.

Fix spelling. Rephrase and move volatile note to volatile section

ping

Ping

ping

Sorry for the delay.

I have a few more notes/questions.

docs/LangRef.rst
2230 ↗	(On Diff #206883)	Why, "In general,"? Do we have exceptions?
2726 ↗	(On Diff #206883)	Again, why "possible" and "may". Don't we "just" want to say "Exceptions apply for `volatile` operations"?
9726 ↗	(On Diff #206883)	I'm confused by: both source and destination address spaces are :ref:`integral pointers <nointptrtype>` why are address spaces pointer types (or am I reading this wrong)? Do we want "is assumed" and "should"? As above, that sounds like the are alternative options.

vchuravy added a subscriber: vchuravy.Oct 23 2019, 10:19 AM

anna added a subscriber: anna.Mar 24 2020, 9:21 AM

Just chiming in here. We are planning to use addrspacecast for pointer width conversion (32 to 64 bit and vice-versa). Does not change the memory in anyway, i.e. if we lowered the addrspacecast into an IR call, that call could be marked argmemonly readnone. However, it is *not* a simple sign-extend/truncate either.
I've added some comments inline. Also, I think we have the scope to improve optimizations for addrspacecast at IR level, which is currently limited because different targets have varying lowering for addrspacecast.

Specifically, things like "looking through" an addrspacecast when the source operand of AS is nonnull and noting that the result is nonnull as well (i.e. improvements to isKnownNonZero and computeKnownBits). I was initially leaning towards adding a metadata for special addrspacecast semantics. However, @arsenm mentioned his ideas on adding information in the datalayout, i.e. 0 is a null pointer value for specific addrspaces. The latter may flow nicely with existing optimizations.

docs/LangRef.rst
9716 ↗	(On Diff #206883)	I believe this by itself is a very good statement/property to add to the langref and Verifier.
9722 ↗	(On Diff #206883)	This rule allows us to optimize away redundant addrspacecasts. We should be able to eliminate back-to-back addrspacecast when the source and result are dereferenceable, something like: %v.32 = addrspacecast i8 addrspace(1)* %v.64 to i8 addrspace(2)* %v.64.copy = addrspacecast i8 addrspace(2)* to i8 addrspace(1)*

arsenm marked an inline comment as done.Apr 2 2020, 6:18 PM

arsenm added inline comments.

docs/LangRef.rst
2230 ↗	(On Diff #206883)	The exceptions would be with target specific knowledge. General IR not considering target semantics must follow this rule

Apply phrasing suggestions

Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2020, 1:46 PM

I think this looks reasonable :)

sanjoy resigned from this revision.Jan 29 2022, 5:24 PM

arsenm added reviewers: efriedma, jrtc27, nlopes, nikic.Oct 26 2022, 7:40 AM

Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 7:40 AM

volatile restriction seems fine. New addrspace casting rules seem fine.

llvm/docs/LangRef.rst
3644	Given we don't define what it means for a pointer to be "null", or promise that such a pointer exists in every address-space, it isn't really meaningful to say that dereferencing null is undefined. We can just drop that whole sentence, I think.
11330	"The conversion must have no side effects, and must not capture the value of the pointer."

nlopes added inline comments.Oct 26 2022, 2:29 PM

llvm/docs/LangRef.rst
3644	agreed
3646	unless the function is marked with the `null_pointer_is_valid` attribute
3648	uhm, do we want this? This precludes using two address spaces with different memory protections. I understand why this assumption may be useful for optimization, but not sure we want it.

Rebase and address comments

llvm/docs/LangRef.rst
3644	I can drop the null part here, but I think the important part to note is dereferencing a non-dereferenceable pointer is still undefined behavior in any address space. I would eventually like to move towards defining the null concept, rather than special casing address space 0.
3648	If we know the pointer is to a valid object, does different memory protection really matter? For objects living on the stack, I guess it would make a difference in failure behavior for the stack overflow case. I don't believe address spaces should be a mechanism for you-must-get-this-memory access type of guarantee, and that's what the point of volatile is (which is why I put this exception here for volatile)

arsenm added a child revision: D63401: SROA: Simplify addrspacecasted allocas with volatile accesses.Oct 29 2022, 11:22 AM

Harbormaster completed remote builds in B195100: Diff 471762.Oct 29 2022, 11:51 AM

I like this latest writing. But please wait for another reviewer to make sure I didn't miss smth.

This revision is now accepted and ready to land.Oct 30 2022, 5:20 AM

LGTM

da137e114eb735dc9e79cb30407bcd03118e8863

nikic added inline comments.Nov 2 2022, 12:58 AM

llvm/docs/LangRef.rst
3651	Backticks should probably end after the attribute name?

JOE1994 added a subscriber: JOE1994.Fri, Dec 1, 2:25 PM

Herald added a subscriber: StephenFan. · View Herald TranscriptFri, Dec 1, 2:25 PM

Diff 471762

llvm/docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 3,111 Lines • ▼ Show 20 Lines
	address has defined behavior.			address has defined behavior.

	The allowed side-effects for volatile accesses are limited. If a			The allowed side-effects for volatile accesses are limited. If a
	non-volatile store to a given address would be legal, a volatile			non-volatile store to a given address would be legal, a volatile
	operation may modify the memory at that address. A volatile operation			operation may modify the memory at that address. A volatile operation
	may not modify any other memory accessible by the module being compiled.			may not modify any other memory accessible by the module being compiled.
	A volatile operation may not call any code in the current module.			A volatile operation may not call any code in the current module.

				In general (without target specific context), the address space of a
				volatile operation may not be changed. Different address spaces may
				have different trapping behavior when dereferencing an invalid
				pointer.

	The compiler may assume execution will continue after a volatile operation,			The compiler may assume execution will continue after a volatile operation,
	so operations which modify memory or may have undefined behavior can be			so operations which modify memory or may have undefined behavior can be
	hoisted past a volatile operation.			hoisted past a volatile operation.

	As an exception to the preceding rule, the compiler may not assume execution			As an exception to the preceding rule, the compiler may not assume execution
	will continue after a volatile store operation. This restriction is necessary			will continue after a volatile store operation. This restriction is necessary
	to support the somewhat common pattern in C of intentionally storing to an			to support the somewhat common pattern in C of intentionally storing to an
	invalid pointer to crash the program. In the future, it might make sense to			invalid pointer to crash the program. In the future, it might make sense to
	▲ Show 20 Lines • Show All 500 Lines • ▼ Show 20 Lines
	Pointer Type			Pointer Type
	""""""""""""			""""""""""""

	:Overview:			:Overview:

	The pointer type ``ptr`` is used to specify memory locations. Pointers are			The pointer type ``ptr`` is used to specify memory locations. Pointers are
	commonly used to reference objects in memory.			commonly used to reference objects in memory.

	Pointer types may have an optional address space attribute defining the			Pointer types may have an optional address space attribute defining
	numbered address space where the pointed-to object resides. The default			the numbered address space where the pointed-to object resides. For
	address space is number zero. The semantics of non-zero address spaces			example, ``ptr addrspace(5)`` is a pointer to address space 5.
	are target-specific. For example, ``ptr addrspace(5)`` is a pointer
				efriedmaUnsubmitted Not Done Reply Inline Actions Given we don't define what it means for a pointer to be "null", or promise that such a pointer exists in every address-space, it isn't really meaningful to say that dereferencing null is undefined. We can just drop that whole sentence, I think. efriedma: Given we don't define what it means for a pointer to be "null", or promise that such a pointer…
				nlopesUnsubmitted Not Done Reply Inline Actions agreed nlopes: agreed
				arsenmAuthorUnsubmitted Done Reply Inline Actions I can drop the null part here, but I think the important part to note is dereferencing a non-dereferenceable pointer is still undefined behavior in any address space. I would eventually like to move towards defining the null concept, rather than special casing address space 0. arsenm: I can drop the null part here, but I think the important part to note is dereferencing a non…
	to address space 5.			The default address space is number zero.

				nlopesUnsubmitted Done Reply Inline Actions unless the function is marked with the `null_pointer_is_valid` attribute nlopes: unless the function is marked with the `null_pointer_is_valid` attribute
				The semantics of non-zero address spaces are target-specific. Memory
				access through a non-dereferenceable pointer is undefined behavior in
				nlopesUnsubmitted Not Done Reply Inline Actions uhm, do we want this? This precludes using two address spaces with different memory protections. I understand why this assumption may be useful for optimization, but not sure we want it. nlopes: uhm, do we want this? This precludes using two address spaces with different memory protections.
				arsenmAuthorUnsubmitted Done Reply Inline Actions If we know the pointer is to a valid object, does different memory protection really matter? For objects living on the stack, I guess it would make a difference in failure behavior for the stack overflow case. I don't believe address spaces should be a mechanism for you-must-get-this-memory access type of guarantee, and that's what the point of volatile is (which is why I put this exception here for volatile) arsenm: If we know the pointer is to a valid object, does different memory protection really matter?
				any address space. Pointers with the bit-value 0 are only assumed to
				be non-dereferenceable in address space 0, unless the function is
				marked with the ``null_pointer_is_valid attribute``.
				nikicUnsubmitted Not Done Reply Inline Actions Backticks should probably end after the attribute name? nikic: Backticks should probably end after the attribute name?

				If an object can be proven accessible through a pointer with a
				different address space, the access may be modified to use that
				address space. Exceptions apply if the operation is ``volatile``.

	Prior to LLVM 15, pointer types also specified a pointee type, such as			Prior to LLVM 15, pointer types also specified a pointee type, such as
	``i8``, ``[4 x i32]`` or ``i32 (i32)``. In LLVM 15, such "typed			``i8``, ``[4 x i32]`` or ``i32 (i32)``. In LLVM 15, such "typed
	pointers" are still supported under non-default options. See the			pointers" are still supported under non-default options. See the
	`opaque pointers document <OpaquePointers.html>`__ for more information.			`opaque pointers document <OpaquePointers.html>`__ for more information.

	.. _t_vector:			.. _t_vector:

	▲ Show 20 Lines • Show All 7,655 Lines • ▼ Show 20 Lines

	Semantics:			Semantics:
	""""""""""			""""""""""

	The '``addrspacecast``' instruction converts the pointer value			The '``addrspacecast``' instruction converts the pointer value
	``ptrval`` to type ``pty2``. It can be a no-op cast or a complex			``ptrval`` to type ``pty2``. It can be a no-op cast or a complex
	value modification, depending on the target and the address space			value modification, depending on the target and the address space
	pair. Pointer conversions within the same address space must be			pair. Pointer conversions within the same address space must be
	performed with the ``bitcast`` instruction. Note that if the address space			performed with the ``bitcast`` instruction. Note that if the address
	conversion is legal then both result and operand refer to the same memory			space conversion produces a dereferenceable result then both result
	location.			and operand refer to the same memory location. The conversion must
				have no side effects, and must not capture the value of the pointer.
				efriedmaUnsubmitted Done Reply Inline Actions "The conversion must have no side effects, and must not capture the value of the pointer." efriedma: "The conversion must have no side effects, and must not capture the value of the pointer."

				If the source is :ref:`poison <poisonvalues>`, the result is
				:ref:`poison <poisonvalues>`.

				If the source is not :ref:`poison <poisonvalues>`, and both source and
				destination are :ref:`integral pointers <nointptrtype>`, and the
				result pointer is dereferenceable, the cast is assumed to be
				reversible (i.e. casting the result back to the original address space
				should yield the original bit pattern).

	Example:			Example:
	""""""""			""""""""

	.. code-block:: llvm			.. code-block:: llvm

	%X = addrspacecast ptr %x to ptr addrspace(1)			%X = addrspacecast ptr %x to ptr addrspace(1)
	%Y = addrspacecast ptr addrspace(1) %y to ptr addrspace(2)			%Y = addrspacecast ptr addrspace(1) %y to ptr addrspace(2)
	▲ Show 20 Lines • Show All 14,786 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

LangRef: Attempt to formulate some rules for addrspacecast
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 471762

llvm/docs/LangRef.rst

This is an archive of the discontinued LLVM Phabricator instance.

LangRef: Attempt to formulate some rules for addrspacecastClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 471762

llvm/docs/LangRef.rst

LangRef: Attempt to formulate some rules for addrspacecast
ClosedPublic