This is an archive of the discontinued LLVM Phabricator instance.

Differential D94002

[LangRef] Make lifetime intrinsic's semantics consistent with StackColoring's comment
Closed, Public

Authored by aqjune on Jan 3 2021, 5:07 PM.

Details

Reviewers

jdoerfert
lebedev.ri
nikic
efriedma
nhaehnle
arsenm
nadav
craig.topper
nlopes

Commits

rGc821ef451373: [LangRef] Make lifetime intrinsic's semantics consistent with StackColoring's…

Summary

This patch updates LangRef to describe the behavior of the lifetime intrinsics,
following the description of MIR's LIFETIME_START/LIFETIME_END markers
in StackColoring.cpp (https://github.com/llvm/llvm-project/blob/eb44682d671d66e422b02595a636050582a4d84a/llvm/lib/CodeGen/StackColoring.cpp#L163) and the discussion on llvm-dev.

To define the meaning of an object's lifetime explicitly, I added an 'Object Lifetime' subsection.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune requested review of this revision. Jan 3 2021, 5:07 PM

aqjune created this revision.

Herald added a project: Restricted Project. Jan 3 2021, 5:07 PM

Herald added a subscriber: llvm-commits.

aqjune added inline comments. Jan 3 2021, 5:11 PM

llvm/docs/LangRef.rst
17885	I added this part because I found this: https://bugs.llvm.org/show_bug.cgi?id=27903 https://reviews.llvm.org/D20739 This is an old bug, so I'm not sure whether this is still valid, but the patch still exists in StackColoring.cpp, so I added this.

Minor update

aqjune mentioned this in D93376: [LangRef] Clarify the semantics of lifetime intrinsics. Jan 3 2021, 5:15 PM

Harbormaster completed remote builds in B83864: Diff 314307. Jan 3 2021, 5:46 PM

Harbormaster completed remote builds in B83865: Diff 314308. Jan 3 2021, 5:51 PM

I left some comments.

I think I will reply to the email thread because I have more thoughts on this by now.

llvm/docs/LangRef.rst
2580	Is "preserved" the right word here? Maybe "reserved"?
---
- "allocation instruction"
+ "allocation value" or something else, because globals are not instructions.
- "returns"
+ "return"
- "free-like commands"
+ instructions that deallocate the object or impact its lifetime

Lifetime markers, as of now, still talk about memory regions, not objects. I think that can be changed but should be kept in mind.

Why the "representable in integers" part, and "integral address"?

aqjune added inline comments. Jan 4 2021, 5:53 PM

llvm/docs/LangRef.rst
2580	Thanks.
> Why the "representable in integers" part, and "integral address"?
Because it is important (IMO) and related to the lifetime. Specifically for this patch, it answers whether two stack allocas with different lifetimes may have overlapping addresses.
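The overlap question can be made concrete with a small sketch. The following is a toy model only, not StackColoring's actual algorithm: it assumes each alloca has a single half-open lifetime interval `[start, end)` measured in some hypothetical program-point numbering, and it greedily lets objects with disjoint intervals share a stack slot. The function name `assign_slots` and the interval representation are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def assign_slots(lifetimes: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Greedily assign stack slots so that only objects whose
    [start, end) lifetime intervals are disjoint share a slot."""
    slot_busy_until: List[int] = []  # end point of the last interval in each slot
    result: Dict[str, int] = {}
    # Visit objects in order of the point where their lifetime starts.
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for slot, busy_until in enumerate(slot_busy_until):
            if busy_until <= start:
                # The previous occupant is already dead, so the slot is reusable.
                slot_busy_until[slot] = end
                result[name] = slot
                break
        else:
            # Every existing slot overlaps with this object's lifetime.
            slot_busy_until.append(end)
            result[name] = len(slot_busy_until) - 1
    return result
```

In this model, two allocas end up with the same slot (and hence overlapping addresses) only when their lifetimes never overlap, which is the property the comment above is concerned with.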

jeroen.dobbelaere added a subscriber: jeroen.dobbelaere. Jan 5 2021, 1:53 AM

jdoerfert added inline comments. Jan 5 2021, 8:00 AM

llvm/docs/LangRef.rst
2580	I can compare two pointers without converting them to integers, can't I? If so, I don't understand in which situations this special case would be applied. Said differently: when are pointers neither convertible to integers nor comparable, and why shouldn't lifetime apply then?

aqjune added inline comments. Jan 5 2021, 6:11 PM

llvm/docs/LangRef.rst
2580	While answering your mail, I think I understood your point. If the memory block is never observed, it doesn't matter whether it is disjoint or not. I'll update the text.

aqjune added inline comments. Jan 5 2021, 6:15 PM

llvm/docs/LangRef.rst
2580	Oh, it was a different story. :/ well, I'll adequately update this part...

Address comments

Harbormaster completed remote builds in B84147: Diff 314768. Jan 5 2021, 7:13 PM

It is memset(undef) in other cases

Thinking about @jdoerfert's suggestion again, it still implies that objects may have overlapping addresses regardless of lifetime. Am I understanding correctly?

What about the model I just updated? Would it support the optimizations of interest for lifetimes with non-allocas?

Harbormaster completed remote builds in B84418: Diff 315295. Jan 7 2021, 10:05 PM

RalfJung added inline comments. Jan 11 2021, 4:12 AM

llvm/docs/LangRef.rst
17892	So this means if LLVM becomes smarter, and "what does this point to" changes from "unknown" to "this set of allocas", that could actually introduce UB because "lifetime.start" semantics changes from "memset(undef)" to "the alloca is initially dead and only becomes live at this marker". Did I understand this correctly? If yes, that seems quite problematic from a user perspective: how can I make sure that my code does not have UB, if that depends on how "smart" LLVM's analysis is?

aqjune added inline comments. Feb 9 2021, 10:28 PM

llvm/docs/LangRef.rst
17892	(I just found your comment, sorry) If the argument of lifetime.start/end is to be syntactically restricted (just an assumption, it's still under discussion), the 'right' syntactic pattern should be already defined somehow. It must be a closure of pointer arithmetic operations or no-ops.

I updated the text to reflect the consensus made in llvm-dev discussion so far.

If lifetime.start/end isn't used with a stack pointer with zero offset, it is equivalent to memset(poison).
Also, the paragraph that describes the disjointness of addresses is removed.

Calling lifetime.start twice on an alive alloca is also updated to memset(poison).
It isn't defined as UB because I couldn't find any LLVM code or comments saying that lifetime may raise UB.
Another choice is to define it as no-op, but https://godbolt.org/z/TqoeqG requires it.

Memory accesses on a dead object should be UB because the comment at StackColoring.cpp specifies that.

I did not mention anything about the size argument of lifetime.start/end because it wasn't clear to me how the argument was used.
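The rules summarized in this update can be read as a small state machine. The sketch below is a toy Python model under stated assumptions, not LLVM code: the class `StackObject`, the `UNINIT`/`POISON` placeholders, and the `UndefinedBehavior` exception are all hypothetical names, and the model assumes the alloca is one that is used with lifetime markers (so it starts out dead). How a non-zero-offset marker interacts with an object that is currently dead is not spelled out in the discussion; the model simply overwrites the bytes and leaves the liveness flag unchanged.

```python
UNINIT = "uninit"   # placeholder for an uninitialized byte
POISON = "poison"   # placeholder for a poison byte


class UndefinedBehavior(Exception):
    """Raised where this toy model treats an operation as UB."""


class StackObject:
    """Toy model of one alloca that is used with lifetime markers."""

    def __init__(self, size):
        self.alive = False                 # assumed to start out dead
        self.data = [UNINIT] * size

    def lifetime_start(self, offset=0):
        if offset != 0 or self.alive:
            # Non-zero offset, or already alive: acts like memset(poison).
            self.data = [POISON] * len(self.data)
        else:
            self.alive = True
            self.data = [UNINIT] * len(self.data)

    def lifetime_end(self, offset=0):
        if offset != 0:
            self.data = [POISON] * len(self.data)
        else:
            self.alive = False             # no-op if already dead

    def store(self, index, value):
        if not self.alive:
            raise UndefinedBehavior("store to a dead object")
        self.data[index] = value

    def load(self, index):
        if not self.alive:
            raise UndefinedBehavior("load from a dead object")
        return self.data[index]
```

A second `lifetime_start()` on an alive object leaves it alive but replaces its contents with poison, and any `load` or `store` outside the start/end window raises `UndefinedBehavior`, matching the two choices described above.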

aqjune edited the summary of this revision. Feb 23 2021, 8:44 AM

Adding reviewers who work on backends.

Herald added a subscriber: wdng. Feb 23 2021, 8:47 AM

Harbormaster completed remote builds in B90412: Diff 325805. Feb 23 2021, 9:45 AM

Gentle ping: this text is ready to be reviewed. :)

We've had a discussion on this topic lasting several months. I think we've reached quorum to move forward.
The patch looks great, thanks for your work. Please go ahead and commit it!

These semantics document the current LLVM behavior well and don't introduce any regression in applications, which was one of the potential concerns with previous versions. Moreover, they don't prevent free movement of instructions, which is generally a nice property to have. This patch also gives reasonable semantics for heap objects, which enables future optimizations like preventing loop-carried dependencies and hoisting allocations out of loops.

This revision is now accepted and ready to land. Mar 2 2021, 10:15 AM

Closed by commit rGc821ef451373: [LangRef] Make lifetime intrinsic's semantics consistent with StackColoring's… (authored by aqjune). Mar 3 2021, 4:58 PM

This revision was automatically updated to reflect the committed changes.

aqjune added a commit: rGc821ef451373: [LangRef] Make lifetime intrinsic's semantics consistent with StackColoring's….

RalfJung added inline comments. Mar 5 2021, 2:57 AM

llvm/docs/LangRef.rst
17871	This sentence here is doing a lot of work. In particular, it makes the behavior of `alloca` depend on the future of the current execution, i.e. behavior depends on whether at some point in the future, a lifetime.start intrinsic is called on the resulting pointer. I think such acausal definitions are a mistake; they make it impossible to write an interpreter that just accurately executes LLVM IR and detects UB.

At the very least, the documentation of `alloca` should be adjusted to explicitly talk about this. Right now, the documentation of lifetime.start alters the behavior of another instruction, which is really surprising and will easily be missed.

Does LLVM correctly handle code like the following, which should not be UB even though there is an access to an alloca before the lifetime.start? (Imagine this to be LLVM IR code, and without "mustprogress")

    x = alloca ...;
    *x = 5;
    while (true) {}
    lifetime.start(x)
17885	So when `ptr` is a pointer to a stack-allocated object at offset 4, then all bytes of the object, including the ones at offset 0-3, will become `poison`?
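The acausality concern in the 17871 comment can be illustrated with a toy trace checker. This is a sketch under stated assumptions, not a description of what LLVM does: operations are hypothetical `(opcode, variable)` pairs, and the only thing that varies is which list of operations is consulted to decide whether an alloca "is used with lifetime.start" and therefore starts out dead. The names `initially_dead` and `first_store_is_ub` are made up for this example.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (opcode, variable); a hypothetical encoding


def initially_dead(var: str, ops: List[Op]) -> bool:
    """An alloca starts dead if lifetime.start on it appears in `ops`."""
    return ("lifetime.start", var) in ops


def first_store_is_ub(var: str, executed: List[Op], visible: List[Op]) -> bool:
    """Replay `executed` and report whether the first store to `var`
    hits a dead object. `visible` is the list consulted to decide the
    initial liveness: the whole function body, or only what actually ran."""
    alive = not initially_dead(var, visible)
    for opcode, operand in executed:
        if operand != var:
            continue
        if opcode == "lifetime.start":
            alive = True
        elif opcode == "lifetime.end":
            alive = False
        elif opcode == "store":
            return not alive
    return False


# The example from the comment: the loop never exits, so the final
# lifetime.start is in the function body but is never executed.
program = [("alloca", "x"), ("store", "x"), ("loop_forever", "-"),
           ("lifetime.start", "x")]
executed = program[:3]
```

With `visible=program` the store is flagged as an access to a dead object, while with `visible=executed` it is not. A forward-executing interpreter only ever has the second view at the time of the `alloca`, which is the difficulty the comment describes.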

nlopes added inline comments. Mar 5 2021, 4:16 AM

llvm/docs/LangRef.rst
17871	Ralf, we already agreed that the current design is suboptimal and that it makes it impossible to write a precise interpreter for LLVM IR. The exact semantics of lifetime.start depends on the pattern matching patterns in the stack coloring algorithm. So this intrinsic cannot be abused. It must be used for the uses it was created for only.

We agree a better design would be to tag allocas with `dead` so they are initially dead, or just to allow allocas in any BB rather than having all of them pushed to the first BB (if we assume a basic stack allocation algorithm would run even in fastisel).

So while we can write a billion corner-case examples by hand, they are not useful for the discussion of this particular patch. We can't go back in time and fix the current design. We can't change the current design without breaking all the frontends out there either. The way forward, if someone is interested in doing more aggressive optimizations, is to design a new mechanism that would have to live side-by-side with the current one for a couple of years to allow frontends to migrate, or to create an auto-upgrade algorithm that converts the old idiom to the new one.

I agree that adding a note to alloca that cross-references this section makes sense.

The exact semantics of lifetime.start depends on the pattern matching patterns in the stack coloring algorithm. So this intrinsic cannot be abused. It must be used for the uses it was created for only.

That's fair, but then shouldn't the docs say that? Usually one would expect the docs to say everything there is to be said; so in this case a dedicated warning might be in order to document the caveats you just mentioned.

In D94002#2606276, @RalfJung wrote:

The exact semantics of lifetime.start depends on the pattern matching patterns in the stack coloring algorithm. So this intrinsic cannot be abused. It must be used for the uses it was created for only.

That's fair, but then shouldn't the docs say that? Usually one would expect the docs to say everything there is to be said; so in this case a dedicated warning might be in order to document the caveats you just mentioned.

Ok, agreed! Let's make that explicit in the document.

In D94002#2606307, @nlopes wrote:

Ok, agreed! Let's make that explicit in the document.

I'll make a super small patch that adds this sentence. :)

llvm/docs/LangRef.rst
17885	Yes, it was chosen to make all bytes poison; the first argument wasn't used because the description is saying it is the size of the object.
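A minimal sketch of the answer above, as a toy Python function rather than LLVM code: when a lifetime marker is applied at a non-zero offset, every byte of the underlying object becomes poison, and the size argument plays no role. The function name, the list-of-bytes representation, and the `"poison"` placeholder are assumptions made for illustration.

```python
POISON = "poison"  # placeholder for a poison byte


def marker_at_nonzero_offset(object_bytes, offset, size_arg):
    """Model a lifetime marker whose pointer is not at offset zero.

    All bytes of the object become poison, including those before
    `offset`; `size_arg` is deliberately ignored in this model."""
    if offset == 0:
        raise ValueError("offset zero changes liveness; not modelled here")
    return [POISON] * len(object_bytes)
```

For an 8-byte object and a pointer at offset 4, the result is eight poison bytes, so bytes 0-3 are affected as well.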

aqjune mentioned this in D98112: [LangRef] mention that the lifetime intrinsics' description in LangRef isn't everything. Mar 6 2021, 1:54 AM

aqjune mentioned this in rG3d6183661d3a: [LangRef] mention that the lifetime intrinsics' description in LangRef isn't…. Mar 8 2021, 6:34 PM

Revision Contents

llvm/docs/LangRef.rst (72 lines)

Diff 327967

llvm/docs/LangRef.rst

(2,538 lines not shown)

::

ARCHITECTURE-VENDOR-OPERATING_SYSTEM		ARCHITECTURE-VENDOR-OPERATING_SYSTEM
ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT		ARCHITECTURE-VENDOR-OPERATING_SYSTEM-ENVIRONMENT

This information is passed along to the backend so that it generates		This information is passed along to the backend so that it generates
code for the proper architecture. It's possible to override this on the		code for the proper architecture. It's possible to override this on the
command line with the ``-mtriple`` command line option.		command line with the ``-mtriple`` command line option.

		.. _objectlifetime:

		Object Lifetime
		----------------------

		A memory object, or simply object, is a region of a memory space that is
		reserved by a memory allocation such as :ref:`alloca <i_alloca>`, heap
		allocation calls, and global variable definitions.
		Once it is allocated, the bytes stored in the region can only be read or written
		through a pointer that is :ref:`based on <_pointeraliasing>` the allocation
		value.
		If a pointer that is not based on the object tries to read or write to the
		object, it is undefined behavior.

		A lifetime of a memory object is a property that decides its accessibility.
		Unless stated otherwise, a memory object is alive since its allocation, and
		dead after its deallocation.
		It is undefined behavior to access a memory object that isn't alive, but
		operations that don't dereference it such as
		:ref:`getelementptr <i_getelementptr>`, :ref:`ptrtoint <i_ptrtoint>` and
		:ref:`icmp <i_icmp>` return a valid result.
		This explains code motion of these instructions across operations that
		impact the object's lifetime.
		A stack object's lifetime can be explicitly specified using
		:ref:`llvm.lifetime.start <_int_lifestart>` and
		:ref:`llvm.lifetime.end <_int_lifeend>` intrinsic function calls.

.. _pointeraliasing:		.. _pointeraliasing:

Pointer Aliasing Rules		Pointer Aliasing Rules
----------------------		----------------------

Any memory access must be done through a pointer value associated with		Any memory access must be done through a pointer value associated with
an address range of the memory access, otherwise the behavior is		an address range of the memory access, otherwise the behavior is
undefined. Pointer values are associated with address ranges according		undefined. Pointer values are associated with address ranges according
to the following rules:		to the following rules:

- A pointer value is associated with the addresses associated with any		- A pointer value is associated with the addresses associated with any
value it is based on.		value it is based on.
- An address of a global variable is associated with the address range		- An address of a global variable is associated with the address range
of the variable's storage.		of the variable's storage.
- The result value of an allocation instruction is associated with the		- The result value of an allocation instruction is associated with the
(15,241 lines not shown)


Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.		Other targets may support this intrinsic differently, for example, by lowering it into a sequence of branches that guard scalar store operations.


Memory Use Markers		Memory Use Markers
------------------		------------------

This class of intrinsics provides information about the lifetime of		This class of intrinsics provides information about the
memory objects and ranges where variables are immutable.		:ref:`lifetime of memory objects <_objectlifetime>` and ranges where variables
		are immutable.

.. _int_lifestart:		.. _int_lifestart:

'``llvm.lifetime.start``' Intrinsic		'``llvm.lifetime.start``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>)		declare void @llvm.lifetime.start(i64 <size>, i8* nocapture <ptr>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.lifetime.start``' intrinsic specifies the start of a memory		The '``llvm.lifetime.start``' intrinsic specifies the start of
object's lifetime.		:ref:`a memory object's lifetime <_objectlifetime>`.

Arguments:		Arguments:
""""""""""		""""""""""

The first argument is a constant integer representing the size of the		The first argument is a constant integer representing the size of the
object, or -1 if it is variable sized. The second argument is a pointer		object, or -1 if it is variable sized. The second argument is a pointer
to the object.		to the object.

Semantics:		Semantics:
""""""""""		""""""""""

This intrinsic indicates that before this point in the code, the value		If ``ptr`` is a stack-allocated object and its offset is zero, the object is
of the memory pointed to by ``ptr`` is dead. This means that it is known		initially marked as dead.
to never be used and has an undefined value. A load from the pointer		After '``llvm.lifetime.start``', the stack object that ``ptr`` points is marked
that precedes this intrinsic can be replaced with ``'undef'``.		as alive and has an uninitialized value.
		The stack object is marked as dead when either
		:ref:`llvm.lifetime.end <int_lifeend>` to the alloca is executed or the
		function returns.

		After :ref:`llvm.lifetime.end <int_lifeend>` is called,
		'``llvm.lifetime.start``' on the stack object can be called again.
		The second '``llvm.lifetime.start``' call marks the object as alive, but it
		does not change the address of the object.

		If ``ptr`` is a non-stack-allocated object, its offset is non-zero or it is
		a stack object that is already alive, it simply fills all bytes of the object
		with ``poison``.


.. _int_lifeend:		.. _int_lifeend:

'``llvm.lifetime.end``' Intrinsic		'``llvm.lifetime.end``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>)		declare void @llvm.lifetime.end(i64 <size>, i8* nocapture <ptr>)

Overview:		Overview:
"""""""""		"""""""""

The '``llvm.lifetime.end``' intrinsic specifies the end of a memory		The '``llvm.lifetime.end``' intrinsic specifies the end of
object's lifetime.		:ref:`a memory object's lifetime <_objectlifetime>`.

Arguments:		Arguments:
""""""""""		""""""""""

The first argument is a constant integer representing the size of the		The first argument is a constant integer representing the size of the
object, or -1 if it is variable sized. The second argument is a pointer		object, or -1 if it is variable sized. The second argument is a pointer
to the object.		to the object.

Semantics:		Semantics:
""""""""""		""""""""""

This intrinsic indicates that after this point in the code, the value of		If ``ptr`` is a stack-allocated object and its offset is zero, the object is
the memory pointed to by ``ptr`` is dead. This means that it is known to		dead.
never be used and has an undefined value. Any stores into the memory		Calling ``llvm.lifetime.end`` on an already dead alloca is no-op.
object following this intrinsic may be removed as dead.
		If ``ptr`` is a non-stack-allocated object or its offset is non-zero,
		it is equivalent to simply filling all bytes of the object with ``poison``.


'``llvm.invariant.start``' Intrinsic		'``llvm.invariant.start``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""
This is an overloaded intrinsic. The memory object can belong to any address space.		This is an overloaded intrinsic. The memory object can belong to any address space.

(3,531 lines not shown)