Page MenuHomePhabricator

Allow nonnull/align attribute to accept poison
ClosedPublic

Authored by aqjune on Oct 31 2020, 6:56 AM.

Details

Summary

Currently LLVM is relying on ValueTracking's isKnownNonZero to attach nonnull, which can return true when the value is poison.
To make the semantics of nonnull consistent with the behavior of isKnownNonZero, this makes the semantics of nonnull to accept poison, and return poison if the input pointer isn't null.
This makes many transformations like below legal:

%p = gep inbounds %x, 1 ; % p is non-null pointer or poison
call void @f(%p)        ; instcombine converts this to call void @f(nonnull %p)

Instead, this semantics makes propagation of nonnull to caller illegal.
The reason is that, passing poison to nonnull does not immediately raise UB anymore, so such program is still well defined, if the callee does not use the argument.
Having noundef attribute there re-allows this.

define void @f(i8* %p) {       ; functionattr cannot mark %p nonnull here anymore
  call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
  ret void
}

Another attribute that needs to be updated is align. This patch updates the semantics of align to accept poison as well.

Diff Detail

Event Timeline

aqjune created this revision.Oct 31 2020, 6:56 AM
aqjune requested review of this revision.Oct 31 2020, 6:56 AM

I tried to update Attributor as well, but the infrastructure was quite big; I'll update Attributor as well if someone points where to fix.

aqjune updated this revision to Diff 302096.Oct 31 2020, 8:32 AM

make LangRef explicitly describe non-ub

To share my understanding about how nonnull is used in optimizations:

  1. It is mainly used to fold away null pointer checks.
define void @f(i8* nonnull %p) {
  if (icmp eq %p, null) ret void         ; this check can be folded to true
  ...
}

Note that the new nonnull-or-poison semantics is fine in the above case because if %p is poison we can fold icmp eq %p, null into false.

A similar but slightly different case:

...
call void @f(i8* nonnull %p)
if (icmp eq %p, null) ret void

In this case, the nonnull-or-poison semantics cannot make the comparison after call folded, because f may not use the poison argument at all, making the program well-defined.
But, if f has noundef attribute as well, this is fine because passing null to f is now UB:

...
call void @f(i8* noundef nonnull %p)
if (icmp eq %p, null) ret void            ; %p is always non-null pointer!
  1. It is combined with dereferenceable_or_null to infer dereferenceable.

This is still fine with the new nonnull-or-poison because passing poison to dereferenceable_or_null is still UB, guaranteeing that the pointer is never poison.

jdoerfert requested changes to this revision.Oct 31 2020, 3:35 PM

I think we have two choices here:

  1. Don't raise UB when "value attributes" are passed a "wrong value", e.g., null for a nonnull attribute, but make the value poison. Use nonull + noundef to make it UB.
  2. Make all "value attributes" accept poison without raising UB.

I was in the past pushing for 1), I don't have the link handy but we can probably find it. I think the last time I brought this up was the noundef discussion actually.
One of my examples was the gep one shown in the commit message,
I vaguely remember one where the user "broke the contract" but in a way they would assume to be harmless, e.g., they did not cause any side-effect, maybe something like:

void foo(bool valid, X& x) {
  if (!valid) return;
  ...
}

obj_ptr = null;
foo(obj_ptr != null, *obj_ptr);

I remember @efriedma was not a fan of 1) at the time, unsure if that is still the case with noundef in place.

If we don't do 1), we should talk about 2) before we make nonnull special. I fail to see the reason it is different from any other (or at least most) "value attributes".

This revision now requires changes to proceed.Oct 31 2020, 3:35 PM
aqjune added a comment.Nov 1 2020, 2:37 AM

I think we have two choices here:

  1. Don't raise UB when "value attributes" are passed a "wrong value", e.g., null for a nonnull attribute, but make the value poison. Use nonull + noundef to make it UB.
  2. Make all "value attributes" accept poison without raising UB.

If we don't do 1), we should talk about 2) before we make nonnull special.

It seems these two choices are entangled.
If f(nonnull poison) is okay (in other words, not UB), then f(nonnull null) shouldn't be UB as well.
The reason is that poison can be folded into null in any time.
For example, when inbounds is stripped from v = gep inbounds ..., v can be transformed from poison to null pointer.

I fail to see the reason it is different from any other (or at least most) "value attributes".

I had a thought about this, and here's what I think:
Each value attribute has a primary transformation to target.
For example, nonnull is for null comparison folding, and dereferenceable is for reordering of load instructions by guaranteeing that dereferencing the pointer never raises UB (unless the pointer is freed).

I think nonnull is different from dereferenceable in that allowing it to be poison doesn't break the intended optimization.
This folding is still okay even if the pointer is poison.
By allowing poison, further optimizations can be done as well because gep inbounds ptr, 1 can be marked as nonnull.
However, allowing dereferenceable to be poison will block such optimizations, which doesn't seem quite desirable IMO.

This causes the properties of attributes slightly heterogeneous, but I think it is fine if the goal is to support more optimizations.

uenoku added a subscriber: uenoku.Nov 1 2020, 10:12 AM

I think we have two choices here:

  1. Don't raise UB when "value attributes" are passed a "wrong value", e.g., null for a nonnull attribute, but make the value poison. Use nonull + noundef to make it UB.
  2. Make all "value attributes" accept poison without raising UB.

If we don't do 1), we should talk about 2) before we make nonnull special.

It seems these two choices are entangled.
If f(nonnull poison) is okay (in other words, not UB), then f(nonnull null) shouldn't be UB as well.
The reason is that poison can be folded into null in any time.
For example, when inbounds is stripped from v = gep inbounds ..., v can be transformed from poison to null pointer.

Agreed. So let's go with 1) and we change all the "value attribute" semantics to produce poison on violation.
We have noundef for the UB case, that was the reason I wanted it ;)

I fail to see the reason it is different from any other (or at least most) "value attributes".

I had a thought about this, and here's what I think:
Each value attribute has a primary transformation to target.
For example, nonnull is for null comparison folding, and dereferenceable is for reordering of load instructions by guaranteeing that dereferencing the pointer never raises UB (unless the pointer is freed).

I think nonnull is different from dereferenceable in that allowing it to be poison doesn't break the intended optimization.
This folding is still okay even if the pointer is poison.
By allowing poison, further optimizations can be done as well because gep inbounds ptr, 1 can be marked as nonnull.
However, allowing dereferenceable to be poison will block such optimizations, which doesn't seem quite desirable IMO.

This causes the properties of attributes slightly heterogeneous, but I think it is fine if the goal is to support more optimizations.

I don't think we should argue this way, at least it should be the last resort. An attribute describes a property, doesn't matter if it is
nonnull, align, or dereferenceable, it's a property of the value. Now, due to the way things are it is possible to violate
that property and then we need to define what is happening. If we do this on a case-by-case basis and with specific transformations
in mind, we will make the IR more complex, the attributes confusing, and in the end non-composeable.

What is the reason not to go with 1) above, so to remove the UB behavior and produce poison. We write an RFC and do the change.
FWIW, that is what AAUndefinedBehavior in the Attributor already does, only with noundef it will deduce UB for nonnull null.

Agreed. So let's go with 1) and we change all the "value attribute" semantics to produce poison on violation.
We have noundef for the UB case, that was the reason I wanted it ;)

Thank you, glad to hear that moving towards the change!

I don't think we should argue this way, at least it should be the last resort. An attribute describes a property, doesn't matter if it is
nonnull, align, or dereferenceable, it's a property of the value. Now, due to the way things are it is possible to violate
that property and then we need to define what is happening. If we do this on a case-by-case basis and with specific transformations
in mind, we will make the IR more complex, the attributes confusing, and in the end non-composeable.

I think the discussion finally falls into which attribute is a value attribute and which is not.
Value attributes should have the same semantics as you said.
Sometimes value attribute and non-value attribute can interact with each other (e.g., byval implies nonnull unless null is a valid pointer), but such complexity should be only at there.

FWIW, that is what AAUndefinedBehavior in the Attributor already does, only with noundef it will deduce UB for nonnull null.

Aha, thanks for the info.

Agreed. So let's go with 1) and we change all the "value attribute" semantics to produce poison on violation.
We have noundef for the UB case, that was the reason I wanted it ;)

Thank you, glad to hear that moving towards the change!

Wasn't that my position for more than a year now ;)

I don't think we should argue this way, at least it should be the last resort. An attribute describes a property, doesn't matter if it is
nonnull, align, or dereferenceable, it's a property of the value. Now, due to the way things are it is possible to violate
that property and then we need to define what is happening. If we do this on a case-by-case basis and with specific transformations
in mind, we will make the IR more complex, the attributes confusing, and in the end non-composeable.

I think the discussion finally falls into which attribute is a value attribute and which is not.
Value attributes should have the same semantics as you said.
Sometimes value attribute and non-value attribute can interact with each other (e.g., byval implies nonnull unless null is a valid pointer), but such complexity should be only at there.

Fair. Though, I think we want to produce poison for one set of attributes for which the name "value attribute" was not well chosen.
So far, the things I think should produce poison not UB are:

(pure) value attributes:

  • nonnull
  • align
  • [used_bits] (not existing yet)

(context) value attributes:

  • dereferenceable
  • dereferenceable_or_null
  • [object_size] (as proposed on the list)

WDYT?

FWIW, that is what AAUndefinedBehavior in the Attributor already does, only with noundef it will deduce UB for nonnull null.

Aha, thanks for the info.

aqjune added a comment.Nov 2 2020, 1:02 PM

Fair. Though, I think we want to produce poison for one set of attributes for which the name "value attribute" was not well chosen.
So far, the things I think should produce poison not UB are:

(pure) value attributes:

  • nonnull
  • align
  • [used_bits] (not existing yet)

(context) value attributes:

  • dereferenceable
  • dereferenceable_or_null
  • [object_size] (as proposed on the list)

WDYT?

In case of dereferenceable, my opinion is still slightly different: it represents the property of the memory at the point.
If a memory block is freed, a same pointer value won't be dereferenceable after the deallocation.

Other than the conceptual reason, my practical concern about dereferenceable is that dereferenceable is hard to use unless it is with noundef.
Since loading poison is UB, the attribute can't still give a guarantee that loading the pointer is well defined.
Furthermore, noundef is hard to infer from the context. For example,

store i32 0, i32* %p
call void @f(i32* %p) ; can we infer %p's noundef & dereferenceable(4) from store?

store i32 0, i32* %p does not guarantee that %p is noundef because it can be partially undef but still dereferenceable (D87994 is a relevant LangRef patch).

The example above was propagating the attribute from caller to callee, but the inverse direction (callee -> caller) is the same: If @f contains a store, it guarantees that the pointer is dereferenceable but not noundef.

In order to regain optimization power, we need a way to attach noundef more aggressively.
D81678 will attach noundef to arguments, and additionally I think it is valid for clang to attach !noundef when lowering non-char typed lvalues to load.

Fair. Though, I think we want to produce poison for one set of attributes for which the name "value attribute" was not well chosen.
So far, the things I think should produce poison not UB are:

(pure) value attributes:

  • nonnull
  • align
  • [used_bits] (not existing yet)

(context) value attributes:

  • dereferenceable
  • dereferenceable_or_null
  • [object_size] (as proposed on the list)

WDYT?

In case of dereferenceable, my opinion is still slightly different: it represents the property of the memory at the point.
If a memory block is freed, a same pointer value won't be dereferenceable after the deallocation.

That is why it is not listed in the "pure" category ;). It is not only a property of the value, but it still should be split wrt UB behavior. See below for my reasoning based on your example.

Partially related:
It is, nowadays, unclear to me if your interpretation of dereferenceable is what we want though. It is/was my interpretation as well FWIW.
@reames suggested that dereferenceability is not about it pointing to an allocated object but to a memory location that can be loaded without causing observable behavior, i.a., a trap.
Unsure if we want a new attribute for that but certainly there are reasons that you want that information which is a "value property" and not tied to the "allocation status".

Other than the conceptual reason, my practical concern about dereferenceable is that dereferenceable is hard to use unless it is with noundef.

Regardless of my argument below, nobody said you cannot use dereferenceable with noundef "all the time".
I mean, if you argue a current use of dereferenceable would actually require the "UB if violated" behavior, then simply also add noundef as well.

Since loading poison is UB, the attribute can't still give a guarantee that loading the pointer is well defined.

Can you clarify "loading poison is UB". I'm unsure I follow/agree.

FWIW, loading a dereferenceable pointer might result in poison but it's even not clear to me that out-of-bounds accesses are UB as of now (in IR).
Maybe I just forgot where it says I cannot do load i32, i32* (bitcast i8* [alloca i8] to i32*). I certainly would get 3 poison bytes if not UB.

Furthermore, noundef is hard to infer from the context. For example,

store i32 0, i32* %p
call void @f(i32* %p) ; can we infer %p's noundef & dereferenceable(4) from store?

store i32 0, i32* %p does not guarantee that %p is noundef because it can be partially undef but still dereferenceable (D87994 is a relevant LangRef patch).

The example above was propagating the attribute from caller to callee, but the inverse direction (callee -> caller) is the same: If @f contains a store, it guarantees that the pointer is dereferenceable but not noundef.

OK, I'm unsure I get the point though. We can derive dereferenceable in either case, agreed? What is currently better?

(@hideto @stefan1 @okura, I think the Attributor might derive noundef here but should not, wdyt?)

In order to regain optimization power, we need a way to attach noundef more aggressively.

This is correct but orthogonal. Having noundef "backed-in" to other attributes is not helping us to derive noundef it just prevents us to derive other attributes.

D81678 will attach noundef to arguments, and additionally I think it is valid for clang to attach !noundef when lowering non-char typed lvalues to load.

Great.

aqjune added a comment.Nov 3 2020, 1:19 AM

In order to regain optimization power, we need a way to attach noundef more aggressively.

This is correct but orthogonal. Having noundef "backed-in" to other attributes is not helping us to derive noundef it just prevents us to derive other attributes.

I think dereferenceable(current UB semantics) != dereferenceable(new poison semantics) + noundef. In the past I thought they were equal, but here is the reason...

The reason is that the current dereferenceable accepts partially undefined pointers, IIUC. At least, LangRef does not prohibit it.
noundef allows accepting well-defined pointers only, so there is a slight difference.

Note that this also happens to nonnull: nonnull (current UB sem.) != nonnull(new poison sem.) + noundef.
However, the situation is slightly different, because the change allows more analysis.
gep inbounds p, 1 is now nonnull, which is certainly a benefit.

In order to regain equivalence, nopoison should be introduced. nopoison allows the value to be (partially) undef but not poison.
dereferenceable(current UB semantics) == dereferenceable(new poison semantics) + nopoison holds.

This is really due to the complexity of undef.... :(

Can you clarify "loading poison is UB". I'm unsure I follow/agree.

FWIW, loading a dereferenceable pointer might result in poison but it's even not clear to me that out-of-bounds accesses are UB as of now (in IR).
Maybe I just forgot where it says I cannot do load i32, i32* (bitcast i8* [alloca i8] to i32*). I certainly would get 3 poison bytes if not UB.

I meant 'loading a poison pointer', sorry. load i8* poison raises UB.

aqjune added a comment.Nov 3 2020, 1:21 AM

OK, I'm unsure I get the point though. We can derive dereferenceable in either case, agreed? What is currently better?

The current dereferenceable definition gives a guarantee that the pointer is not poison. :)
It is good because it guarantees that loads and stores to the pointer don't trap.

aqjune added a comment.Nov 3 2020, 7:03 PM

BTW, there are !nonnull metadata and llvm.assume with nonnull operand bundle as well - the semantics of those should be updated too.
Maybe we can just follow the semantics of nonnull attribute here.

(reviving an old discussion) to summarize the issue:
The issue was whether it is okay to make dereferenceable %p non-UB if %p was not dereferenceable.

In order to make the attribute useful, noundef %p should be accompanied.
The problem is that we cannot infer noundef %p from store or load in general; noundef can be inferred in limited cases only.

aqjune added a comment.Dec 2 2020, 4:28 AM

@jdoerfert Kindly ping, what do you think?

I think making align and nonnull to follow poison semantics is still a great thing.
The update of nonnull makes its semantics and existing optimizations about nonnull consistent.
I did not deeply investigate align yet, but I have a good feeling that that moving towards poison semantics will explain more optimizations about align that involve gep inbounds with aligned offsets.

jdoerfert accepted this revision.Dec 2 2020, 8:41 AM

Some nits. I think the direction is right. Update the commit message though. And please wait for an OK by @efriedma .

llvm/docs/LangRef.rst
1231

Mention the combination with noundef please.

llvm/lib/Analysis/ValueTracking.cpp
2091–2092

/* AllowUndefOrPoison */ false

llvm/lib/Transforms/IPO/FunctionAttrs.cpp
645

see above.

llvm/test/Analysis/ValueTracking/known-nonnull-at.ll
5

maybe rename to make it clear at the call site

This revision is now accepted and ready to land.Dec 2 2020, 8:41 AM
aqjune updated this revision to Diff 309157.Dec 2 2020, 10:40 PM

Address comments

aqjune marked 4 inline comments as done.Dec 2 2020, 10:42 PM
aqjune edited the summary of this revision. (Show Details)Dec 2 2020, 10:46 PM
aqjune edited the summary of this revision. (Show Details)
aqjune updated this revision to Diff 309159.Dec 2 2020, 10:50 PM

minor fix to a test

aqjune added a comment.Dec 8 2020, 2:43 AM

@efriedma gentle ping

Slightly related to this patch: should llvm.assume's nonnull/align operand bundles be updated to follow this semantics as well?
I'll follow any semantics that would be convenient for Attributor's implementation or anything related to this.
I can make a LangRef update patch as well.

Should we have clarification about the semantics of align/nonnull bundle for the assume intrinsic?
When a function call is inlined, the attributes of the parameters may remain as assume's bundles:

call f(nonnull %p)
->
llvm.assume [ "nonnull"(%p) ]

Here are two possible semantics:

  1. When "nonnull"(%p) is at the operand bundle, there also should be "noundef"(%p), otherwise it is a no-op.
; A correct transformation
call f(nonnull noundef %p)
->
llvm.assume [ "nonnull"(%p), "noundef"(%p) ] ; since noundef and nonnull are both in the bundle, it is UB if %p is null
  1. "nonnull"(%p) diverges from nonnull, and immediately raises UB if %p is null. This means that the lowering should be done only when noundef existed.
; A correct transformation
call f(nonnull noundef %p)
->
llvm.assume [ "nonnull"(%p) ] ; this is UB if %p is null

The former makes attribute -> op bundle conversion simpler, whereas the latter makes reasoning from op bundle simpler (well, maybe both are simple enough). :)
I think either option is fine.

Should we have clarification about the semantics of align/nonnull bundle for the assume intrinsic?

Probably true, good catch. Why not make it the same as the attribute? So,

; A correct transformation
call f(nonnull %p, nonnull noundef %q)
->
llvm.assume [ "nonnull"(%p), "nonnull"(%q), "noundef"(%q)]
; where a `null` value for %p can be assumed to be a poison value and
; a `null` value for %q can be assumed to cause UB because `noundef` of
; poison is UB.

Should we have clarification about the semantics of align/nonnull bundle for the assume intrinsic?

Probably true, good catch. Why not make it the same as the attribute? So,

; A correct transformation
call f(nonnull %p, nonnull noundef %q)
->
llvm.assume [ "nonnull"(%p), "nonnull"(%q), "noundef"(%q)]
; where a `null` value for %p can be assumed to be a poison value and
; a `null` value for %q can be assumed to cause UB because `noundef` of
; poison is UB.

+1, I'll make a separate patch for this.

I like where this is going. Most of LLVM's alias analysis produce information that only holds if the value is not poison. Since these attributes are derived from said analysis, then it makes sense then they have the same "X is poison or foo(X) holds" semantics.
I agree that certain attributes are different, like dereferenceable. It is useless if the value might be poison as well. Though we may go with the same semantics and then require the noundef attribute to make it useful. Seems like a good way to go as well.

For the partial undef memory access example that Juneyoung gave.. Well, maybe we need to make it UB to dereference a non-deterministic value. Doesn't seem like it's a very useful thing to do, and this non-determinism comes from some previous undefined behavior, so it seems fine to just make dereference of partial undef UB. Simplifies things.

I'll make this patch include the updates in the align attribute first because it helps making fewer patches.
The next patch (operand bundle) will include changes in nonnull/align operand bundles as well.

For the partial undef memory access example that Juneyoung gave.. Well, maybe we need to make it UB to dereference a non-deterministic value. Doesn't seem like it's a very useful thing to do, and this non-determinism comes from some previous undefined behavior, so it seems fine to just make dereference of partial undef UB. Simplifies things.

There was a discussion for this: https://groups.google.com/g/llvm-dev/c/2Qk4fOHUoAE/m/OxZa3bIhAgAJ
This partially undef thing is a bit painful.. :/

For the partial undef memory access example that Juneyoung gave.. Well, maybe we need to make it UB to dereference a non-deterministic value. Doesn't seem like it's a very useful thing to do, and this non-determinism comes from some previous undefined behavior, so it seems fine to just make dereference of partial undef UB. Simplifies things.

There was a discussion for this: https://groups.google.com/g/llvm-dev/c/2Qk4fOHUoAE/m/OxZa3bIhAgAJ
This partially undef thing is a bit painful.. :/

If we allow partial undef pointers to be dereferenced, then it's nearly impossible to derive noundef for pointers. So we will have to rely on frontends to annotate. So the question is whether there's any benefit of allowing such accesses and whether that breaks any frontend if not. Can't think of any reason to allow them and that thread doesn't really provide any either AFAICT.

aqjune updated this revision to Diff 311540.Dec 14 2020, 3:39 AM

update langref align

I tried to fix Attributor to propagate nonnull/align only when there is noundef, but couldn't: https://godbolt.org/z/Yzsc14
Where should I start?

If we allow partial undef pointers to be dereferenced, then it's nearly impossible to derive noundef for pointers. So we will have to rely on frontends to annotate. So the question is whether there's any benefit of allowing such accesses and whether that breaks any frontend if not. Can't think of any reason to allow them and that thread doesn't really provide any either AFAICT.

I agree that any frontend won't emit undef when compiling a valid program, but I think the point is that the semantics requires a transformation that hoists a memory instruction to prove that the pointer isn't partially undef, which isn't done by LLVM now.

void f(i64 %offset) {
  p = alloca [16 x i8]
  assume(0 <= offset < 16)
  init(p)
  for (..) {
    v = load p[offset] // if offset is partially undef, this load cannot be hoisted out of the loop
    use(v)
  }
}
aqjune retitled this revision from Allow nonnull attribute to accept poison to Allow nonnull/align attribute to accept poison.Dec 14 2020, 4:03 AM
aqjune edited the summary of this revision. (Show Details)

ping @efriedma

@jdoerfert Would it be okay if I land this after a week? D91480 also uses the semantics described in this patch as well.

jdoerfert accepted this revision.Tue, Jan 12, 7:52 AM

For the attributor, add the tests that don't work with FIXMEs into the attributor test folder please, potentially just into the respective test files, e.g. nonnull.ll

I proposed some wording changes, but no semantic change.

llvm/docs/LangRef.rst
1163–1169

poison is not aligned, and can well be null. Let's state the intent and not "include" poison as valid input. It is not and that is why it triggers the output to be poison.

1228–1233
aqjune updated this revision to Diff 316308.Tue, Jan 12, 7:13 PM

Address comments, leave tests with FIXME

aqjune marked 2 inline comments as done.Tue, Jan 12, 7:13 PM
This revision was landed with ongoing or failed builds.Tue, Jan 19, 6:54 PM
This revision was automatically updated to reflect the committed changes.