This is an archive of the discontinued LLVM Phabricator instance.

[LangRef] Clarify poison semantics
ClosedPublic

Authored by nikic on Jun 8 2019, 1:43 AM.

Details

Summary

I find the current documentation of poison rather confusing, mainly because its use of "undefined behavior" doesn't seem to align with our usual interpretation (of immediate UB). Especially the sentence "any instruction that has a dependence on a poison value has undefined behavior" is very confusing.

Clarify poison semantics by:

  • Replacing the introductory paragraph with the standard rationale for having poison value.
  • Spelling out that an instruction depending on poison returns poison.
  • Spelling out how we go from a poison value to immediate undefined behavior and give the two examples we currently use in ValueTracking.
  • Spelling out that side effects depending on poison are UB.

(Context: Discussion in D62939 on when exactly poison turns into UB.)

Diff Detail

Repository
rL LLVM

Event Timeline

nikic created this revision.Jun 8 2019, 1:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 8 2019, 1:43 AM
nlopes added a comment.Jun 9 2019, 2:53 AM

Sounds great! Just 1 comment inline.

llvm/docs/LangRef.rst
3275 ↗(On Diff #203676)

This implies that doing a volatile store of a poison to memory is UB. Is this intended? I don't know of a use case that requires such behavior.

nlopes added a subscriber: aqjune.Jun 9 2019, 2:54 AM
nikic marked an inline comment as done.Jun 9 2019, 4:36 AM
nikic added inline comments.
llvm/docs/LangRef.rst
3275 ↗(On Diff #203676)

One of the examples below explicitly lists volatile store of poison value as UB, which is why I mentioned it here:

store volatile i32 %poison, i32* @g  ; External observation; undefined behavior.

The only code I'm aware of that does does reasoning in terms of side-effects with (control) dependence on poison is https://github.com/llvm-mirror/llvm/blob/1cbbb3f527a5aa13eda990e3dfa31f2ca4f64e07/lib/Analysis/ScalarEvolution.cpp#L6010-L6030, which does not care about the classification of any particular operation.

Personally I don't think it makes sense to classify a volatile store of poison as UB for two reasons: First, a volatile operation can have side-effects, but does not need to. If it does have side-effects then it is UB, but for the purposes of generic IR reasoning, we have to assume here that it might just be a store to normal memory that happens to be marked volatile. Second, making an operation volatile generally allows strictly less transformations to be performed. This would be a case where marking something volatile would allow more aggressive optimizations using poison-based reasoning, which does not seem right.

I'd be happy to drop the provision about volatile operations here, as well as the example below.

jdoerfert added inline comments.Jun 9 2019, 8:34 AM
llvm/docs/LangRef.rst
3270 ↗(On Diff #203676)

Why do you make null special here? Generally, there are a lot of non-dereferenceable addresses and null is just a special one. I'd argue, loading poison, etc., is bad even if null is a valid address.

nlopes added inline comments.Jun 9 2019, 11:25 AM
llvm/docs/LangRef.rst
3270 ↗(On Diff #203676)

I agree. Dereferencing poison should always be UB, since that pointer is not based on a valid object.

3275 ↗(On Diff #203676)

Thank you, sounds good. I don't see why external observability actually matters for poison.
I agree with removing this part.

nikic updated this revision to Diff 203748.Jun 9 2019, 11:43 AM
nikic marked 2 inline comments as done.

Remove provision that a volatile store of a poison value is UB.

nikic updated this revision to Diff 203751.Jun 9 2019, 12:21 PM
nikic marked 2 inline comments as done.
nikic edited the summary of this revision. (Show Details)

Dereferencing poison is always UB, independent of whether dereferencing null is UB.

nikic marked an inline comment as done.Jun 9 2019, 12:27 PM
nikic added inline comments.
llvm/docs/LangRef.rst
3270 ↗(On Diff #203676)

I mentioned null here because I thought it is the only way in which UB for dereferencing follows directly from existing semantics (poison -> undef -> null -> UB).

Apparently that's not right. Looking around the code, https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Utils/Local.cpp#L2102-L2109 does indeed assume that storing to undef is always UB, independent of whether null is dereferencable.

I assume that this is a consequence of https://llvm.org/docs/LangRef.html#pointer-aliasing-rules. I've added an explicit bullet for it there, as it wasn't obvious to me that this is the case.

nikic updated this revision to Diff 203752.Jun 9 2019, 12:49 PM

Update example: Store to poison was not marked as UB.

jdoerfert added inline comments.Jun 9 2019, 1:10 PM
llvm/docs/LangRef.rst
2160 ↗(On Diff #203752)

IMHO this is great, regardless of the poison discussion. Thanks for adding it.

3276 ↗(On Diff #203752)

Now, side effects, what do we count and why do we have it explicitly.

Also, should we explicitly mention control flow or is it sufficiently covered by the "*depends*" definition above?

aqjune added inline comments.Jun 9 2019, 1:53 PM
llvm/docs/LangRef.rst
3274 ↗(On Diff #203752)

How about mentioning right-shift operations on poison (e.g. ashr x, poison) as well?
Its semantics is equivalent to x / 2^y when x and y are non-poison(and non-undef), but LLVM seems to hoist ashr x, poison during optimization, implying that it cannot be UB : https://godbolt.org/z/2LG0xX .

3276 ↗(On Diff #203752)

I also think that control flow on poison is an important issue.

https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/LoopUnswitch.cpp#L585 also mentions that there's discrepancy between semantics of branch on undef/poison.
My suggestion is to mention the discrepancy explicitly here so people can acknowledge the issue while reading LangRef.

nikic updated this revision to Diff 203846.Jun 10 2019, 9:28 AM
nikic marked an inline comment as done.

Explicitly mention control dependent side-effects.

nikic marked an inline comment as done.Jun 10 2019, 9:30 AM
nikic added inline comments.
llvm/docs/LangRef.rst
3274 ↗(On Diff #203752)

ashr does not have any special semantics in this context: It propagates poison, i.e. ashr x, poison is poison again. (Oversize shifts are defined as poison, not as UB.)

3276 ↗(On Diff #203752)

Now, side effects, what do we count and why do we have it explicitly.

I don't think there is anything in the LLVM IR where we can say that it definitely has a side-effect, just operations that might have one (volatile, calls, etc). We need this to be UB to allow reasoning like https://github.com/llvm-mirror/llvm/blob/1cbbb3f527a5aa13eda990e3dfa31f2ca4f64e07/lib/Analysis/ScalarEvolution.cpp#L6010-L6030, but don't really care about what exactly a side-effect is -- just that if one exists, a dependence on poison is UB.

Also, should we explicitly mention control flow or is it sufficiently covered by the "*depends*" definition above?

Control flow is covered by the dependence definition, but as this is an important case, I've added an explicit note now.

LGTM, thank you!

Also LGTM. Nice cleanup.

I will separately raise a conversation of whether storing poison should be immediate UB, but I'm explicitly OK w/it being removed in this patch given nothing actually appears to implement that semantic at the moment.

LGTM as well. Thanks!

This revision was not accepted when it landed; it landed in state Needs Review.Jun 13 2019, 12:43 PM
This revision was automatically updated to reflect the committed changes.