Page MenuHomePhabricator

[LangRef] Clarify "undefined" for various instructions.

Authored by efriedma on Jun 4 2018, 4:11 PM.



Some places in LangRef say something like "the result is undefined"; instead, state what happens more explicitly. Some places don't say at all what happens when an invariant is violated; readers should assume the behavior is undefined, but clarify that in a few places related to memory accesses, where it might be confusing since some loads return undef or poison.

Not sure I've chosen the right resolution to all these cases; we might want to change some of these to return poison instead.

Diff Detail


Event Timeline

efriedma created this revision.Jun 4 2018, 4:11 PM
nlopes added a subscriber: aqjune.Jun 5 2018, 7:35 AM
nlopes added a comment.Jun 5 2018, 7:46 AM

Eli, thanks a lot for kicking off the discussion. I think this patch is a bit too big since there are a few things that are not trivial.
For example, I would rather not introduce more functions returning undef, but rather return poison instead. If there's no good motivation for undef, poison should be used by default from now on, since it's much easier to handle than undef.
This patch also introduces a lot of UB with metadata tags, which is a departure from how we handle things like nsw/nuw which make the instructions yield poison instead of UB. Why is it more important to preserve nsw when hoisting an add than preserving !nonnull when hoisting a load? I really don't know; hence I'm asking.
I think it would help to split this patch a bit.

nlopes added inline comments.Jun 5 2018, 8:02 AM

I'm fine with all these UBs in function attributes, since it seems it's how they work today.
However, we need to make sure that when hoisting function calls, which I don't think it's done today. This may not be feasible if the attribute is on the function decl rather than on the call instruction.


I'm not sure this sentence is clear.
Does it mean that for NaNs the result has to be the same before and after optimization? Or does it mean that the result after the optimization can be whatever, but fixed (no poison or undef allowed)?


Can we have poison here instead? (and for the following ones as well)


UB vs poison here:

  • UB: after a load with 'range' we know the memory has a value within the range.
  • Poison: after branching on the loaded value we know the memory as a value within the range.

UB has more imediate effects than poison, of course. For GVN and friends, UB is a bit easier, but hoisting such an instruction we need to drop the range metadata. Is it done today?
For range analysis, I guess both semantics are fine, though the UB semantics may potentially allow a better analysis since after the load the analysis can assume the range right away instead of waiting for a branch. The UB semantics helps shrinking bitwidth of arithmetic as well, which otherwise isn't easy to do (since you would need to check the users of the expression tree).

Bottom line: this really depends on what kind of transformations LLVM does today (and in the future) that care about this range metadata.


can it be poison instead?


Is undef required by the C/C++ standards or can it be poison?


GEP is difficult. I suggest we leave GEP inbounds/inrange discussion for a separate patch.

efriedma added inline comments.Jun 5 2018, 1:24 PM

Should be "whatever, but no poison/undef" (there's very little benefit and a lot of risk involved in opening up the possibility of undefined behavior from fast-math). But yes, should be called out explicitly.


Making fptoui produce poison is probably fine.

For uitofp, we should probably consider defining the overflow cases to match the IEEE754 convertFromInt (produce +-Inf). But that's not what LLVM currently does, so it would involve some code changes.


Probably, yes; it's not expected, in the same way as an out-of-range shift.


Actually, thinking about this a bit more, we might need a defined, non-null value.

C and C++ both prohibit zero-size types (arrays can't be zero length, and structs have at least one member in C, and implicit padding in C++). But there's a GNU extension to allow zero-length arrays. That extension is usually only used as a replacement for flexible array members in pre-C99 code, but we allow such arrays anywhere. And given zero-length array variables are allowed, the natural way to lower it is a zero-byte alloca. This is what clang currently emits.

Given such variables are allowed, we probably need to ensure &x == &x, and &x != 0, which implies a value that is not poison or undef.

hfinkel added inline comments.

Chatting offline with @chandlerc and @rsmith , we currently have a problem with the way we use dereferenceable in Clang. We add this attribute for references, but it's possible to free the underlying storage during the execution of the function. I'd prefer to fix this by saying that dereferenceable only guarantees its properties at function entry, as this lets us preserve some of the optimization benefit (where the existing semantics can be obtained, in large part, by combining dereferenceable with noalias or noalias metadata).

We should discuss what we'd like to do here.

efriedma added inline comments.Jun 6 2018, 1:14 PM

"Duration of the function" reflects what's actually implemented at the moment. And it's generally useful even if we can't use it for C++ references. C structs which are passed/returned indirectly work this way; Rust references also work like this.

If "dereferencable on entry to the function" would be useful for C++, I think it should be a separate attribute.

Since this is apparently controversial, I'll split "dereferencable" it into it's own patch, so we can continue discussing it.

See,,, for patches split off from this. Still need to split off patches for GEP, alloca, fast-math flags, the dereferenceable function attribute, and the other function attributes.

fhahn added inline comments.Jun 7 2018, 1:58 AM

AFAIK GVN and friends use the combineMetadata helper to set the metadata of a instruction K replacing an instruction J. Currently this function only sets metadata like !range and !nonnull if both instructions have them (for range it selects the most general range). With using UB here (and for nonnull) we should be able to preserve such metadata kinds in K if it dominates J, as in D47339 and D47475

efriedma abandoned this revision.Jul 6 2018, 1:58 PM

function attributes:

That should be everything (except the getelementptr change, which I don't really want to dive into), so closing.