
Details

Reviewers

efriedma
hfinkel
nlopes
reames

Commits

rGad81d427caaf: [LangRef] Clarify poison semantics
rL363320: [LangRef] Clarify poison semantics

Summary

I find the current documentation of poison rather confusing, mainly because its use of "undefined behavior" doesn't seem to align with our usual interpretation (of immediate UB). Especially the sentence "any instruction that has a dependence on a poison value has undefined behavior" is very confusing.

Clarify poison semantics by:

- Replacing the introductory paragraph with the standard rationale for having poison values.
- Spelling out that an instruction depending on poison returns poison.
- Spelling out how we go from a poison value to immediate undefined behavior, and giving the two examples we currently use in ValueTracking.
- Spelling out that side effects depending on poison are UB.

(Context: Discussion in D62939 on when exactly poison turns into UB.)
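
The distinction the summary draws between "returns poison" and "immediate UB" can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyValue`, `ImmediateUB`, `subNUW`, `bitAnd`, `udiv` and `poisonDemo` are hypothetical names invented for illustration, poison is modeled as an empty `std::optional`, and nothing here is LLVM's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Toy model, not LLVM's API: a 32-bit value is either a concrete integer
// or poison. std::nullopt stands for poison.
using ToyValue = std::optional<std::uint32_t>;

// Thrown at the point where the modeled program has immediate UB.
struct ImmediateUB : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Models `sub nuw`: unsigned wrap produces poison rather than UB.
ToyValue subNUW(ToyValue A, ToyValue B) {
  if (!A || !B) return std::nullopt; // Depends on poison: result is poison.
  if (*A < *B) return std::nullopt;  // Unsigned wrap: result is poison.
  return *A - *B;
}

// Models `and`: poison propagates even when the other operand is 0.
ToyValue bitAnd(ToyValue A, ToyValue B) {
  if (!A || !B) return std::nullopt;
  return *A & *B;
}

// Models `udiv`: a poison divisor is immediate UB, like a zero divisor.
ToyValue udiv(ToyValue A, ToyValue B) {
  if (!B || *B == 0) throw ImmediateUB("udiv: poison or zero divisor");
  if (!A) return std::nullopt;       // A poison dividend only propagates.
  return *A / *B;
}

// Usage: mirrors `%poison = sub nuw i32 0, 1` from the LangRef example.
void poisonDemo() {
  ToyValue Poison = subNUW(0u, 1u);
  assert(!Poison.has_value());             // The result is poison.
  assert(!bitAnd(Poison, 0u).has_value()); // "0, but also poison."
  bool SawUB = false;
  try { udiv(1u, Poison); } catch (const ImmediateUB &) { SawUB = true; }
  assert(SawUB);                           // Poison divisor: immediate UB.
}
```

The model keeps the two outcomes separate on purpose: functions that only propagate poison never throw, which is what makes them safe to execute speculatively, while the throw marks the operand positions where poison becomes immediate UB.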

Diff Detail

Repository: rL LLVM

Event Timeline

nikic created this revision. · Jun 8 2019, 1:43 AM

Herald added a project: Restricted Project. · Jun 8 2019, 1:43 AM

Herald added subscribers: llvm-commits, hiraditya.

jdoerfert added a subscriber: jdoerfert. · Jun 8 2019, 11:32 AM

Sounds great! Just 1 comment inline.

llvm/docs/LangRef.rst
3275 ↗	(On Diff #203676)	This implies that doing a volatile store of a poison to memory is UB. Is this intended? I don't know of a use case that requires such behavior.

nlopes added a subscriber: aqjune. · Jun 9 2019, 2:54 AM

nikic marked an inline comment as done. · Jun 9 2019, 4:36 AM

nikic added inline comments.

llvm/docs/LangRef.rst
3275 ↗	(On Diff #203676)	One of the examples below explicitly lists a volatile store of a poison value as UB, which is why I mentioned it here: `store volatile i32 %poison, i32* @g ; External observation; undefined behavior.` The only code I'm aware of that reasons in terms of side-effects with (control) dependence on poison is https://github.com/llvm-mirror/llvm/blob/1cbbb3f527a5aa13eda990e3dfa31f2ca4f64e07/lib/Analysis/ScalarEvolution.cpp#L6010-L6030, which does not care about the classification of any particular operation. Personally I don't think it makes sense to classify a volatile store of poison as UB, for two reasons: First, a volatile operation can have side-effects, but does not need to. If it does have side-effects then it is UB, but for the purposes of generic IR reasoning, we have to assume here that it might just be a store to normal memory that happens to be marked volatile. Second, making an operation `volatile` generally allows strictly fewer transformations to be performed. This would be a case where marking something `volatile` would allow more aggressive optimizations using poison-based reasoning, which does not seem right. I'd be happy to drop the provision about volatile operations here, as well as the example below.

jdoerfert added inline comments. · Jun 9 2019, 8:34 AM

llvm/docs/LangRef.rst
3270 ↗	(On Diff #203676)	Why do you make `null` special here? Generally, there are a lot of non-dereferenceable addresses and `null` is just a special one. I'd argue, loading poison, etc., is bad even if `null` is a valid address.

nlopes added inline comments. · Jun 9 2019, 11:25 AM

llvm/docs/LangRef.rst
3270 ↗	(On Diff #203676)	I agree. Dereferencing poison should always be UB, since that pointer is not based on a valid object.
3275 ↗	(On Diff #203676)	Thank you, sounds good. I don't see why external observability actually matters for poison. I agree with removing this part.

Remove provision that a volatile store of a poison value is UB.

Dereferencing poison is always UB, independent of whether dereferencing null is UB.

nikic marked an inline comment as done. · Jun 9 2019, 12:27 PM

nikic added inline comments.

llvm/docs/LangRef.rst
3270 ↗	(On Diff #203676)	I mentioned null here because I thought it is the only way in which UB for dereferencing follows directly from existing semantics (poison -> undef -> null -> UB). Apparently that's not right. Looking around the code, https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/Local.cpp#L2102-L2109 does indeed assume that storing to undef is always UB, independent of whether null is dereferenceable. I assume that this is a consequence of https://llvm.org/docs/LangRef.html#pointer-aliasing-rules. I've added an explicit bullet for it there, as it wasn't obvious to me that this is the case.
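
The conclusion reached in this thread (dereferencing undef or poison is always UB, whereas null depends on whether null is a valid address) can be summarized as a small decision function. This is a hedged sketch: `ToyPointer`, `derefIsImmediateUB` and the `NullIsValid` flag are hypothetical names for illustration, and the `BasedOnObject` case assumes an in-bounds access to a live object.

```cpp
#include <cassert>

// Hypothetical classification of a pointer operand, for illustration only.
enum class ToyPointer { Null, Undef, Poison, BasedOnObject };

// Returns true if a load or store through the pointer is immediate UB in
// this toy model. NullIsValid models configurations in which the null
// address may legitimately be dereferenced.
bool derefIsImmediateUB(ToyPointer P, bool NullIsValid) {
  switch (P) {
  case ToyPointer::Undef:
  case ToyPointer::Poison:
    return true;          // Associated with no address, in any address space.
  case ToyPointer::Null:
    return !NullIsValid;  // Only UB when null is not a valid address.
  case ToyPointer::BasedOnObject:
    return false;         // Assumes an in-bounds access to a live object.
  }
  return false;           // Not reached; keeps compilers from warning.
}

// Usage: the answer for undef/poison does not change with NullIsValid.
void derefDemo() {
  assert(derefIsImmediateUB(ToyPointer::Poison, /*NullIsValid=*/true));
  assert(derefIsImmediateUB(ToyPointer::Undef, /*NullIsValid=*/true));
  assert(!derefIsImmediateUB(ToyPointer::Null, /*NullIsValid=*/true));
}
```

Separating the undef/poison cases from the null case is the point of the sketch: their result never consults `NullIsValid`, matching the bullet added to the pointer aliasing rules.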

Update example: Store to poison was not marked as UB.

jdoerfert added inline comments. · Jun 9 2019, 1:10 PM

llvm/docs/LangRef.rst
2160 ↗	(On Diff #203752)	IMHO this is great, regardless of the poison discussion. Thanks for adding it.
3276 ↗	(On Diff #203752)	Now, side effects, what do we count and why do we have it explicitly. Also, should we explicitly mention control flow or is it sufficiently covered by the "depends" definition above?

aqjune added inline comments. · Jun 9 2019, 1:53 PM

llvm/docs/LangRef.rst
3274 ↗	(On Diff #203752)	How about mentioning right-shift operations on poison (e.g. `ashr x, poison`) as well? Its semantics are equivalent to `x / 2^y` when x and y are non-poison (and non-undef), but LLVM seems to hoist `ashr x, poison` during optimization, implying that it cannot be UB: https://godbolt.org/z/2LG0xX
3276 ↗	(On Diff #203752)	I also think that control flow on poison is an important issue. https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/LoopUnswitch.cpp#L585 also mentions that there is a discrepancy between the semantics of branching on undef and on poison. My suggestion is to mention the discrepancy explicitly here, so that people are aware of the issue while reading LangRef.

Explicitly mention control dependent side-effects.

nikic marked an inline comment as done. · Jun 10 2019, 9:30 AM

nikic added inline comments.

llvm/docs/LangRef.rst
3274 ↗	(On Diff #203752)	`ashr` does not have any special semantics in this context: It propagates poison, i.e. `ashr x, poison` is poison again. (Oversize shifts are defined as poison, not as UB.)
3276 ↗	(On Diff #203752)	"Now, side effects, what do we count and why do we have it explicitly." I don't think there is anything in the LLVM IR where we can say that it definitely has a side-effect, just operations that might have one (volatile, calls, etc). We need this to be UB to allow reasoning like https://github.com/llvm-mirror/llvm/blob/1cbbb3f527a5aa13eda990e3dfa31f2ca4f64e07/lib/Analysis/ScalarEvolution.cpp#L6010-L6030, but don't really care about what exactly a side-effect is -- just that if one exists, a dependence on poison is UB. "Also, should we explicitly mention control flow or is it sufficiently covered by the 'depends' definition above?" Control flow is covered by the dependence definition, but as this is an important case, I've added an explicit note now.
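
The reply about `ashr` in this comment (it only propagates poison, and oversize shifts are poison rather than UB) can be sketched with a toy model. The names `ToyInt`, `ToyAmount`, `ashr` and `ashrDemo` below are hypothetical and not LLVM's API; poison is modeled as an empty `std::optional`, and the value is fixed at 32 bits for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model, not LLVM's API: std::nullopt stands for poison.
using ToyInt = std::optional<std::int32_t>;
using ToyAmount = std::optional<std::uint32_t>;

// Models `ashr` on i32. No input is treated as immediate UB: poison
// operands and oversize shift amounts both yield poison.
ToyInt ashr(ToyInt X, ToyAmount Amount) {
  if (!X || !Amount) return std::nullopt; // Propagate poison.
  if (*Amount >= 32) return std::nullopt; // Oversize shift: poison, not UB.
  // Portable arithmetic shift: shift the complement of negative inputs.
  if (*X < 0) return ~(~*X >> *Amount);
  return *X >> *Amount;
}

// Usage: every call returns normally, whatever the operands are.
void ashrDemo() {
  assert(ashr(-8, 1u).value() == -4);
  assert(!ashr(1, 32u).has_value());          // Oversize shift amount.
  assert(!ashr(1, std::nullopt).has_value()); // Poison shift amount.
}
```

Because the modeled operation has no input that triggers UB, executing it earlier than the original program would (as in the hoisting observed on godbolt) cannot introduce UB by itself; at worst it computes a poison value that is never used.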

LGTM, thank you!

Also LGTM. Nice cleanup.

I will separately raise a conversation of whether storing poison should be immediate UB, but I'm explicitly OK w/it being removed in this patch given nothing actually appears to implement that semantic at the moment.

LGTM as well. Thanks!

This revision was not accepted when it landed; it landed in state Needs Review. · Jun 13 2019, 12:43 PM

Closed by commit rL363320: [LangRef] Clarify poison semantics (authored by nikic).

This revision was automatically updated to reflect the committed changes.

Diff 204610

llvm/trunk/docs/LangRef.rst


[2,147 lines not shown]
 - A pointer value is associated with the addresses associated with any
   value it is based on.
 - An address of a global variable is associated with the address range
   of the variable's storage.
 - The result value of an allocation instruction is associated with the
   address range of the allocated storage.
 - A null pointer in the default address-space is associated with no
   address.
+- An :ref:`undef value <undefvalues>` in any address-space is
+  associated with no address.
 - An integer constant other than zero or a pointer value returned from
   a function not defined within LLVM may be associated with address
   ranges allocated through mechanisms other than those provided by
   LLVM. Such ranges shall not overlap with any ranges of addresses
   allocated by mechanisms provided by LLVM.

 A pointer value is based on another pointer value according to the
 following rules:
[1,036 lines not shown]
 location could clobber arbitrary memory, therefore, it has undefined
 behavior.

 .. _poisonvalues:

 Poison Values
 -------------

-Poison values are similar to :ref:`undef values <undefvalues>`, however
-they also represent the fact that an instruction or constant expression
-that cannot evoke side effects has nevertheless detected a condition
-that results in undefined behavior.
+In order to facilitate speculative execution, many instructions do not
+invoke immediate undefined behavior when provided with illegal operands,
+and return a poison value instead.

 There is currently no way of representing a poison value in the IR; they
 only exist when produced by operations such as :ref:`add <i_add>` with
 the ``nsw`` flag.

 Poison value behavior is defined in terms of value dependence:

 - Values other than :ref:`phi <i_phi>` nodes depend on their operands.
[20 lines not shown; context line: "- An instruction control-depends on a :ref:`terminator"]
   control transfers to one of the successors, and may not be executed
   when control is transferred to another.
 - Additionally, an instruction also control-depends on a terminator
   instruction if the set of instructions it otherwise depends on would
   be different if the terminator had transferred control to a different
   successor.
 - Dependence is transitive.

-Poison values have the same behavior as :ref:`undef values <undefvalues>`,
-with the additional effect that any instruction that has a dependence
-on a poison value has undefined behavior.
+An instruction that depends on a poison value, produces a poison value
+itself. A poison value may be relaxed into an
+:ref:`undef value <undefvalues>`, which takes an arbitrary bit-pattern.
+
+This means that immediate undefined behavior occurs if a poison value is
+used as an instruction operand that has any values that trigger undefined
+behavior. Notably this includes (but is not limited to):
+
+- The pointer operand of a :ref:`load <i_load>`, :ref:`store <i_store>` or
+  any other pointer dereferencing instruction (independent of address
+  space).
+- The divisor operand of a ``udiv``, ``sdiv``, ``urem`` or ``srem``
+  instruction.
+
+Additionally, undefined behavior occurs if a side effect depends on poison.
+This includes side effects that are control dependent on a poisoned branch.

 Here are some examples:

 .. code-block:: llvm

     entry:
       %poison = sub nuw i32 0, 1 ; Results in a poison value.
       %still_poison = and i32 %poison, 0 ; 0, but also poison.
       %poison_yet_again = getelementptr i32, i32* @h, i32 %still_poison
-      store i32 0, i32* %poison_yet_again ; memory at @h[0] is poisoned
+      store i32 0, i32* %poison_yet_again ; Undefined behavior due to
+                                          ; store to poison.

       store i32 %poison, i32* @g ; Poison value stored to memory.
       %poison2 = load i32, i32* @g ; Poison value loaded back from memory.

-      store volatile i32 %poison, i32* @g ; External observation; undefined behavior.
-
       %narrowaddr = bitcast i32* @g to i16*
       %wideaddr = bitcast i32* @g to i64*
       %poison3 = load i16, i16* %narrowaddr ; Returns a poison value.
       %poison4 = load i64, i64* %wideaddr ; Returns a poison value.

       %cmp = icmp slt i32 %poison, 0 ; Returns a poison value.
       br i1 %cmp, label %true, label %end ; Branch to either destination.

[13,940 lines not shown]

llvm/trunk/lib/Analysis/ValueTracking.cpp

[4,330 lines not shown; context line: "bool llvm::isGuaranteedToExecuteForEveryIteration(const Instruction *I,"]
   for (const Instruction &LI : *L->getHeader()) {
     if (&LI == I) return true;
     if (!isGuaranteedToTransferExecutionToSuccessor(&LI)) return false;
   }
   llvm_unreachable("Instruction not contained in its own parent basic block.");
 }

 bool llvm::propagatesFullPoison(const Instruction *I) {
+  // TODO: This should include all instructions apart from phis, selects and
+  // call-like instructions.
   switch (I->getOpcode()) {
   case Instruction::Add:
   case Instruction::Sub:
   case Instruction::Xor:
   case Instruction::Trunc:
   case Instruction::BitCast:
   case Instruction::AddrSpaceCast:
   case Instruction::Mul:
[1,382 lines not shown]
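
The TODO added to `propagatesFullPoison` describes a rule that is the inverse of the current opcode whitelist: everything propagates poison except phis, selects and call-like instructions. A minimal sketch of that rule follows. It is not the committed implementation, which keeps the whitelist; `ToyOpcode`, `propagatesPoisonPerTodo` and `propagationDemo` are hypothetical names, and the enum lists only a handful of opcodes for illustration.

```cpp
#include <cassert>

// Hypothetical opcode enum for illustration; LLVM's real opcodes are
// defined on llvm::Instruction and are far more numerous.
enum class ToyOpcode {
  Add, Sub, Xor, Trunc, BitCast, Mul, ICmp, GetElementPtr,
  PHI, Select, Call, Invoke
};

// Sketch of the rule stated in the TODO: list the exceptions and treat
// every other opcode as propagating poison from any operand.
bool propagatesPoisonPerTodo(ToyOpcode Op) {
  switch (Op) {
  case ToyOpcode::PHI:    // Depends only on the incoming value taken.
  case ToyOpcode::Select: // Excluded by the TODO.
  case ToyOpcode::Call:   // Call-like: excluded by the TODO.
  case ToyOpcode::Invoke:
    return false;
  default:
    return true;
  }
}

// Usage: arithmetic propagates poison, the listed exceptions do not.
void propagationDemo() {
  assert(propagatesPoisonPerTodo(ToyOpcode::Add));
  assert(!propagatesPoisonPerTodo(ToyOpcode::Select));
}
```

Listing the exceptions instead of the propagating opcodes means a newly added opcode is assumed to propagate poison by default, which is a design trade-off: it matches the LangRef wording in this patch but would need care if a future instruction did not follow it.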

This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Clarify poison semantics
Closed · Public
