This is an archive of the discontinued LLVM Phabricator instance.

[CaptureTracking] Don't let comparisons against null escape inbounds pointers
ClosedPublic

Authored by aykevl on Mar 31 2019, 7:10 AM.

Details

Summary

Pointers that are in-bounds (either through dereferenceable_or_null or thorough a getelementptr inbounds) cannot be captured with a comparison against null. There is no way to construct a pointer that is still in bounds but also NULL.

Diff Detail

Repository
rL LLVM

Event Timeline

aykevl created this revision.Mar 31 2019, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2019, 7:10 AM
aykevl added a comment.EditedMar 31 2019, 7:20 AM

Some more background:

I'm developing a compiler in which I use lots of compiler-inserted null checks. It also does escape analysis, which relies on the nocapture flag for cheap interprocedural analysis.
The main target of this compiler is microcontrollers which don't have a MMU (or even MPU) and therefore don't trap on a read from address 0. This means that it is necessary to explicitly compare against null (fault maps cannot be used). It also uses escape analysis to avoid heap allocation, which is critical for performance. However, I found that since inserting these null checks, escape analysis suffers badly because all icmp instructions are seen as captures.
I could implement interprocedural escape analysis or do a manual pass that inserts nocapture (basically copying CaptureTracking), but I would like to rely on the functionattrs pass for this if at all possible.

I wrote a partial fix here, but it is not complete as it does not work across functions. This patch makes it possible for it to work across functions:
https://github.com/tinygo-org/tinygo/pull/250

sanjoy requested changes to this revision.Mar 31 2019, 10:28 AM

Things that are UB in C++ are not necessarily UB in LLVM. For instance you could capture a pointer by doing something like this:

int global;

void f() {
  int* ptr = new int;

  // Probably UB in C++?  But not UB in LLVM if the GEP is not
  // an inbounds GEP.
  if ((ptr+4096) == &global) {
    intptr_t addr = ((intptr_t)&global) - 4096;
    escape((int*)addr)
  }
}

However, I found that since inserting these null checks, escape analysis suffers badly because all icmp instructions are seen as captures.motivation:

A more extreme version of the example above could be written to just use a comparison against null:

void f() {
  int* ptr = new int;
  if ((ptr-4096) == nullptr) {
    escape((int*)4096)
  }
}
This revision now requires changes to proceed.Mar 31 2019, 10:28 AM
aykevl added a comment.Apr 3 2019, 4:28 PM

Oh, I see. Thank you for explaining the actual behavior of getelementptr.

So I'm thinking about how to implement this feature. Would it be possible to infer the nocapture flag for the ptr argument in the following function, if it is also marked with dereferenceable_or_null(4)?

int deref(int *ptr) {
    if (ptr == NULL) {
        abort(); // does not return
    }
    return *ptr;
}

My understanding is that dereferenceable_or_null(N) implies that it is either a valid pointer (no out-of-bounds GEP allowed) or null, meaning that icmp cannot be used to let the pointer escape with a comparison against null.
In other words, it seems to me that if a pointer comes from a dereferenceable_or_null(N) argument (possibly through bitcast, inbounds GEP, addrspacecast, or other 'inbounds' operations) an icmp against null cannot let this pointer escape.

My goal is to make CaptureTracking (used in the functionattrs pass) powerful enough to be usable for escape analysis. The source language is Go, which has pointer semantics similar to Java.

aykevl updated this revision to Diff 196616.Apr 25 2019, 5:44 AM
aykevl retitled this revision from [CaptureTracking] Pointer comparisons cannot escape to [CaptureTracking] Don't let comparisons against nil escape inbounds pointers.
aykevl edited the summary of this revision. (Show Details)

Update CaptureTracking to be more conservative and add FunctionAttrs tests.

aykevl retitled this revision from [CaptureTracking] Don't let comparisons against nil escape inbounds pointers to [CaptureTracking] Don't let comparisons against null escape inbounds pointers.Apr 25 2019, 5:49 AM
aykevl set the repository for this revision to rL LLVM.
hfinkel added inline comments.May 2 2019, 9:51 AM
lib/Analysis/CaptureTracking.cpp
352

I think that you want to stripPointerCasts here (and an associated test case).

aykevl updated this revision to Diff 197971.May 3 2019, 5:53 AM
  • updated to allow bitcasts
  • include bitcasts in tests
  • other small improvements to the test
aykevl marked 2 inline comments as done.May 3 2019, 5:55 AM
aykevl added inline comments.
lib/Analysis/CaptureTracking.cpp
352

Good idea! I didn't think of that - this is my first patch to LLVM in the LLVM core. Updated the patch and improved tests.

hfinkel accepted this revision.May 3 2019, 9:51 AM

LGTM

aykevl added a comment.May 3 2019, 5:01 PM

Thank you, so I guess this is now ready to land?

Can someone commit this patch? I do not have commit access.

jdoerfert requested changes to this revision.May 5 2019, 12:09 PM

We need to address the comments before we commit this.

Sorry that I only look at this so late in the game!

lib/Analysis/CaptureTracking.cpp
342

Strip pointer casts is actually/unfortunately too strong because it strips address space casts:
See also: http://lists.llvm.org/pipermail/llvm-dev/2018-December/128423.html

345

I do not understand the comment above. A GEP inbounds can have any value, including 0, or am I missing a precondition here? Could you also describe why non-inbounds GEPs are not eligible?

353

Why is this specific to arguments? We have other pointers that can be dereferenceable_or_null, see Value::getPointerDereferenceableBytes

This revision now requires changes to proceed.May 5 2019, 12:09 PM

Thinking about this further, I guess what you need is:

  1. A scope (=function) in which null is not a valid pointer (see NullPointerIsDefined(Function *, unsigned AddrSpace))
  2. A dereferenceable_or_null pointer Ptr (see Value::getPointerDereferenceableBytes)

Now comparing Ptr agains null should be noescape.

My reasoning (which has to be checked!):

The idea is that you cannot arbitrarily manipulate the pointer while keeping the two properties.
If Ptr is null to begin with, we are fine (I think).
If it is not, it points to an allocation with extend [start:end) where end - start equals the dereferenceable bytes.
Due to property 1) we know null is not contained in [start:end).
That means we cannot manipulate a pointer such that 2) holds and a null comparison would leak information because it always has to be false.

Now the GEP inbounds case is in my opinion either a special case of the above or not a valid reason for noescape.
I walk you through my thought process (with names for each step so we can discuss):
A) Assuming we have a GEP inbounds named GI that is compared against null.
B) GI is either a "valid pointer" into an allocation, a pointer to the end of the allocation (the famous "one past the end"), or poison if it is out-of-bounds or the base wasn't an allocation at all.
C) In the poison case we could argue that the comparison can be replaced by a constant, e.g., true, which eliminates the use and the possibility to escape.
D) In the "valid pointer" case, we know GI is dereferenceable and the reasoning for such pointer should be applicable.
E) In the "one past the end" case, we can either not justify noescape or, what I think, we can argue that the pointer cannot be null as the addition "with infinitely precise signed arithmetic" would not overflow. If it is not null we can use the final step of the dereferenceable reasoning again.

Thoughts?

sanjoy added a comment.May 5 2019, 3:58 PM
  1. A scope (=function) in which null is not a valid pointer (see NullPointerIsDefined(Function *, unsigned AddrSpace))
  2. A dereferenceable_or_null pointer Ptr (see Value::getPointerDereferenceableBytes)

Now comparing Ptr agains null should be noescape.

My reasoning (which has to be checked!):

The idea is that you cannot arbitrarily manipulate the pointer while keeping the two properties.
If Ptr is null to begin with, we are fine (I think).
If it is not, it points to an allocation with extend [start:end) where end - start equals the dereferenceable bytes.
Due to property 1) we know null is not contained in [start:end).
That means we cannot manipulate a pointer such that 2) holds and a null comparison would leak information because it always has to be false.

One pedantic case is when end is null:

void f(int32* /*deref_or_null(4)*/ ptr) {
  if (ptr == null) {
    int32* ptr_leaked = (int32*)-4;
    *ptr_leaked = 42;
  }
}

int main() {
  int* p = new int;
  // p happens to numerically be -4
  f(p);
  print(*p); // Should print 42
}

However, this example is kind of iffy because in general the deref_or_null attribute is not valid since p in main could have been something other than -4. But *if* the execution has p == -4 then the attribute is well defined.

lib/Analysis/CaptureTracking.cpp
334

auto * is probably appropriate here. Same for GEP and A below.

jdoerfert added a comment.EditedMay 5 2019, 6:33 PM

Should we interpret your comment as "the reasoning seems sound", or did you only want to point out the special case which I discuss below?

@sanjoy wrote:
However, this example is kind of iffy because in general the deref_or_null attribute is not valid since p in main could have been something other than -4. But *if* the execution has p == -4 then the attribute is well defined.

I thought about this case, though I didn't spell it out explicitly in my earlier comment (the part: as the addition "with infinitely precise signed arithmetic" would not overflow).

Let me start by stating that I interpret pointers as unsigned values which, under certain conditions, are allowed to overflow. (Maybe that is in itself a problem in my thinking.)
So for me p is not -4 but always some positive value, probably 2^N - 1 - 4 where N is the pointer bit-width. If that is acceptable, then the inbounds wording in the lang ref
prevents the problem you describe. Any addition with a positive value, e.g., the one you choose (2^N - 1 - 4) + 4, cannot be null without an overflow and inbounds prohibits this overflow.

jdoerfert added inline comments.May 5 2019, 11:01 PM
lib/Analysis/CaptureTracking.cpp
345

I answered my questions in a later comment. The condition that 0 is not a valid address is missing here.

Should we interpret your comment as "the reasoning seems sound", or did you only want to point out the special case which I discuss below?

You should interpret it as a drive by comment. :) I'm fine if you've thought about the case I suggest and are convinced that this change is sound nevertheless.

@sanjoy wrote:
However, this example is kind of iffy because in general the deref_or_null attribute is not valid since p in main could have been something other than -4. But *if* the execution has p == -4 then the attribute is well defined.

I thought about this case, though I didn't spell it out explicitly in my earlier comment (the part: as the addition "with infinitely precise signed arithmetic" would not overflow).

Let me start by stating that I interpret pointers as unsigned values which, under certain conditions, are allowed to overflow. (Maybe that is in itself a problem in my thinking.)
So for me p is not -4 but always some positive value, probably 2^N - 1 - 4 where N is the pointer bit-width. If that is acceptable, then the inbounds wording in the lang ref
prevents the problem you describe. Any addition with a positive value, e.g., the one you choose (2^N - 1 - 4) + 4, cannot be null without an overflow and inbounds prohibits this overflow.

I'm not sure if inbounds play into this, and I also noticed my example has typos and also is needlessly confusing; corrected example below:

int** global_ptr;

void unknown_func();

void foo(int32* /*deref_or_null(4)*/ ptr) {
  if (ptr == null) {
    int32* ptr_leaked = (int32*)(intptr_t)-4;
    *global_ptr = ptr_leaked;
  }
}

void bar() {
  int32* p = new int32;
  // p happens to numerically be -4 == 2^64-4
  int32* p_with_offset = p + 1; // non-inbounds GEP, evaluates to null
  foo(p_with_offset);

  // p has been escaped and we can't DCE the first store:
  *p = 10;
  unknown_func();
  *p = 20;
}

Now lets say we're only looking at foo. I think the current code will conclude that foo does not capture ptr (we're comparing a deref_or_null pointer against null) and we will (lets assume) mark ptr as nocapture which I believe is incorrect, modulo the iffy bit I mentioned before.

Should we interpret your comment as "the reasoning seems sound", or did you only want to point out the special case which I discuss below?

You should interpret it as a drive by comment. :) I'm fine if you've thought about the case I suggest and are convinced that this change is sound nevertheless.

These things are always tricky. It's better if multiple ppl think about the corner cases and implications :)

@sanjoy wrote:
However, this example is kind of iffy because in general the deref_or_null attribute is not valid since p in main could have been something other than -4. But *if* the execution has p == -4 then the attribute is well defined.

I thought about this case, though I didn't spell it out explicitly in my earlier comment (the part: as the addition "with infinitely precise signed arithmetic" would not overflow).

Let me start by stating that I interpret pointers as unsigned values which, under certain conditions, are allowed to overflow. (Maybe that is in itself a problem in my thinking.)
So for me p is not -4 but always some positive value, probably 2^N - 1 - 4 where N is the pointer bit-width. If that is acceptable, then the inbounds wording in the lang ref
prevents the problem you describe. Any addition with a positive value, e.g., the one you choose (2^N - 1 - 4) + 4, cannot be null without an overflow and inbounds prohibits this overflow.

I'm not sure if inbounds play into this, and I also noticed my example has typos and also is needlessly confusing; corrected example below:

int** global_ptr;

void unknown_func();

void foo(int32* /*deref_or_null(4)*/ ptr) {
  if (ptr == null) {
    int32* ptr_leaked = (int32*)(intptr_t)-4;
    *global_ptr = ptr_leaked;
  }
}

void bar() {
  int32* p = new int32;
  // p happens to numerically be -4 == 2^64-4
  int32* p_with_offset = p + 1; // non-inbounds GEP, evaluates to null
  foo(p_with_offset);

  // p has been escaped and we can't DCE the first store:
  *p = 10;
  unknown_func();
  *p = 20;
}

Now lets say we're only looking at foo. I think the current code will conclude that foo does not capture ptr (we're comparing a deref_or_null pointer against null) and we will (lets assume) mark ptr as nocapture which I believe is incorrect, modulo the iffy bit I mentioned before.

I see your point. My initial argumentation is problematic here: "If Ptr is null to begin with, we are fine (I think)."
Due to the information leakage when a modified pointer is compared to null, which can actually be null, capturing is possible. Though, one basically has to guess the pointer. Idk. if it is useful/problematic to treat that as no-capture. If it is problematic, then we need to require dereferenceable (without the or_null part).

Btw. GEP inbounds (with a non-null offset) transition from dereferenceable_or_null to dereferenceable, though idk if there is already logic for that upstream.

I again think dereferenceable_or_null is fine because you basically have a single shot to guess the pointer by making it null:

Any guess (with an offset) that keeps it in the dereferenceable part is useless as it cannot be null.
Any guess that brings it outside the dereferenceable part which is not null will trigger undefined behavior (IMHO).
Left is only the single guess that is just right which I would not count as escaping because the program could just as well
take the guessed value and pretend it is the pointer value without the check.

Does this make sense?

If D61607 goes in, you can use Value::stripPointerCasts(/* KeepBitPattern */ true) without modifying your tests, etc. it might be worth a negative tests with an address space cast though.

Left is only the single guess that is just right which I would not count as escaping because the program could just as well take the guessed value and pretend it is the pointer value without the check.

This argument makes sense. But I'd like to see better definitions of nocapture and dereferenceable_or_null in LangRef which make this a little more obvious.

Left is only the single guess that is just right which I would not count as escaping because the program could just as well take the guessed value and pretend it is the pointer value without the check.

This argument makes sense. But I'd like to see better definitions of nocapture and dereferenceable_or_null in LangRef which make this a little more obvious.

Agreed. I was actually talking to @chandlerc about the current nocapture semantic at EuroLLVM because I want/introduced nocapture-maybe-returned for the Attributor in D59922.
It is also not very well specified how "undefined" it is to have something neither null nor dereferenceable where an "dereferenceable_or_null(X > 0)" was expected. In my above reasoning, I simply made it "very undefined".

Long story short, I'll come up with some LangRef patches soonish to clarify things I think are underspecified. (There were multiple situations already, especially while working on the Attributor, where I made assumptions based on what I think the rest of LLVM does. Now with GSoC students working on the Attributor, I will also get input from "outside" people which is probably good as well.)

sanjoy added a comment.May 6 2019, 9:25 PM

I again think dereferenceable_or_null is fine because you basically have a single shot to guess the pointer by making it null:

Any guess (with an offset) that keeps it in the dereferenceable part is useless as it cannot be null.

Agreed.

Any guess that brings it outside the dereferenceable part which is not null will trigger undefined behavior (IMHO).

Agreed.

Left is only the single guess that is just right which I would not count as escaping because the program could just as well
take the guessed value and pretend it is the pointer value without the check.

I don't think it could have done the same without the check. One way to think about this is foo and bar are "colluding", foo knows that it will get a pointer that is at offset 4 from a valid pointer, so *if* it gets null then it knows that -4 is a valid pointer. This is just a more stylized way of doing:

int* ptr = ...;
int* ptr2 = (int*) 0x42000;
if (ptr == ptr2) {  // Escapes ptr
  use ptr2;
}

Left is only the single guess that is just right which I would not count as escaping because the program could just as well
take the guessed value and pretend it is the pointer value without the check.

I don't think it could have done the same without the check. One way to think about this is foo and bar are "colluding", foo knows that it will get a pointer that is at offset 4 from a valid pointer, so *if* it gets null then it knows that -4 is a valid pointer. This is just a more stylized way of doing:

int* ptr = ...;
int* ptr2 = (int*) 0x42000;
if (ptr == ptr2) {  // Escapes ptr
  use ptr2;
}

I don't see it and your example misses the very important dereferenceable_or_null part.

Even if they are colluding, the behavior is undefined if the "guessed" (= modified) pointer is outside the dereferenceable range and not null.
So, assuming the pointer is initially not known and not null. Furthermore, it is B bytes dereferenceable and O bytes "away" from null.
All checks with a positive offset <= B are, as mentioned before, useless.
Any check with an offset > B that is not equivalent to the O offset will cause UB.
Only "guessing" the O offset will not cause UB. Now if you could do that, you could also not do the comparison at all and simply go with ptr + O.

So I don't see how there can be any guessing, colluding, or similar thing without triggering UB almost inevitable or knowing the pointer value beforehand.

I don't see it and your example misses the very important dereferenceable_or_null part.

Even if they are colluding, the behavior is undefined if the "guessed" (= modified) pointer is outside the dereferenceable range and not null.
So, assuming the pointer is initially not known and not null. Furthermore, it is B bytes dereferenceable and O bytes "away" from null.
All checks with a positive offset <= B are, as mentioned before, useless.
Any check with an offset > B that is not equivalent to the O offset will cause UB.
Only "guessing" the O offset will not cause UB. Now if you could do that, you could also not do the comparison at all and simply go with ptr + O.

I don't think this is true. bar could have allocated a buffer of size 1000, which happened to end up at address 42000 (numerically) and then (legally, without UB) passed in 42004 to foo (42004 is dereferenceable(4)). In such a case the null check would not pass and there would not be a valid object at -4.

The "collusion" is that whatever pointer foo gets, that pointer minus 4 is a valid pointer. If it got null (as is allowed by deref_or_null) then -4 is a valid pointer, if it got 100 then 96 is a valid pointer etc. This sort of invariant is not that weird; even in the LLVM codebase llvm::User pointers point into the "middle" of an object by construction.

So I don't see how there can be any guessing, colluding, or similar thing without triggering UB almost inevitable or knowing the pointer value beforehand.

I don't see it and your example misses the very important dereferenceable_or_null part.

Even if they are colluding, the behavior is undefined if the "guessed" (= modified) pointer is outside the dereferenceable range and not null.
So, assuming the pointer is initially not known and not null. Furthermore, it is B bytes dereferenceable and O bytes "away" from null.
All checks with a positive offset <= B are, as mentioned before, useless.
Any check with an offset > B that is not equivalent to the O offset will cause UB.
Only "guessing" the O offset will not cause UB. Now if you could do that, you could also not do the comparison at all and simply go with ptr + O.

I don't think this is true. bar could have allocated a buffer of size 1000, which happened to end up at address 42000 (numerically) and then (legally, without UB) passed in 42004 to foo (42004 is dereferenceable(4)). In such a case the null check would not pass and there would not be a valid object at -4.

The "collusion" is that whatever pointer foo gets, that pointer minus 4 is a valid pointer. If it got null (as is allowed by deref_or_null) then -4 is a valid pointer, if it got 100 then 96 is a valid pointer etc. This sort of invariant is not that weird; even in the LLVM codebase llvm::User pointers point into the "middle" of an object by construction.

So I don't see how there can be any guessing, colluding, or similar thing without triggering UB almost inevitable or knowing the pointer value beforehand.

Could you explain this again with example code? I don't think I understand what you are saying, sorry.

sanjoy added a comment.May 7 2019, 3:05 PM

Could you explain this again with example code? I don't think I understand what you are saying, sorry.

void foo(int32* /*deref_or_null(4)*/ ptr) {
  // Here "I" is the inner dialogue of function itself. :P
  //
  // I know that ptr-4 is a valid pointer so I can do this:

  if (ptr == null) {
    // It is not necessary that ptr == null.  E.g. it isn't null when bar_1 calls
    // me.  So the check above is necessary.
    int32* ptr_leaked = (int32*)(intptr_t)-4;
    *global_ptr = ptr_leaked;
  }

  // I could have similarly done this:
  //
  // if (ptr == 0x424204) {
  //    int32* ptr_leaked = (int32*)(intptr_t)0x424200;
  //    *global_ptr = ptr_leaked;
  // }
  //
  // in the same spirit.
}

void bar_0() {
  int32* p = new int32;
  // p happens to numerically be -4 == 2^64-4
  int32* p_with_offset = p + 1; // non-inbounds GEP, evaluates to null
  foo(p_with_offset);

  // p has escaped
}

void bar_1() {
  int32* p = new int32[1000];
  // p is 0xff00
  int32* p_with_offset = p + 1; // p_with_offset is 0xff04
  foo(p_with_offset);

  // p has not escaped
}
jdoerfert added a comment.EditedMay 7 2019, 4:08 PM

To get the escape case you have to have guessed the correct offset from the initial pointer value to null.

foo(p_with_offset); in void bar_0() would be UB if you would have chosen any offset not in {0,4} (both bytes). The first because the dereferenceability property is fulfilled, the second because it happens to result in a null pointer.
You basically picked offset = 0 - ptr and then you can make it escape through the check. But if you can pick offset that way, you do not need the check since you have to know ptr.
I think that the "one past the end" pointer is null in this example is confusing and coincidental, e.g., shift the allocation 4 bytes down to make the valid offsets {0, 8} with everything else the same. The invariant in foo (first comment) only holds for the particular value of ptr in bar_0 and can therefore not be exploited.

The bar_1 example just shows that picking an offset which will fulfill the dereferenceable property will not leak information (it cannot be null). Note that p_with_offset in bar_0 does not fulfill the dereferenceability property!

sanjoy added a comment.May 7 2019, 4:45 PM

To get the escape case you have to have guessed the correct offset from the initial pointer value to null.

foo(p_with_offset); in void bar_0() would be UB if you would have chosen any offset not in {0,4} (both bytes). The first because the dereferenceability property is fulfilled, the second because it happens to result in a null pointer.

Agreed.

You basically picked offset = 0 - ptr and then you can make it escape through the check. But if you can pick offset that way, you do not need the check since you have to know ptr.

Maybe we're disagreeing on what it means to "guess" a value. I don't think transforming:

void foo(int x) {
  print(x);
}

to

void foo(int x) {
  if (x == 42) {
    print(42);
  } else {
    print(x);
  }
}

counts as "guessing" x. You could call it value profiling plus speculative optimization but that's a standard optimization technique.

I think that the "one past the end" pointer is null in this example is confusing and coincidental, e.g., shift the allocation 4 bytes down to make the valid offsets {0, 8} with everything else the same. The invariant in foo (first comment) only holds for the particular value of ptr in bar_0 and can therefore not be exploited.

Maybe another way of thinking about this can be more productive. If we focus on bar_0 only:

int** global_ptr;

void foo(int32* /*deref_or_null(4)*/ ptr) {
  if (ptr == null) {
    int32* ptr_leaked = (int32*)(intptr_t)-4;
    *global_ptr = ptr_leaked;
  }
}

void bar_0() {
  int32* p = new int32;
  // p happens to numerically be -4 == 2^64-4
  int32* p_with_offset = p + 1; // non-inbounds GEP, evaluates to null
  foo(p_with_offset);

  // XXX

  print(p);
}

then under the assumption that at runtime the program prints 2^64-4.

  1. Is the program well defined?
  2. Has p escaped by the time execution reaches point XXX?

If (1) and (2) are true, given this change, will we correctly infer that these facts?
If at least one of (1) and (2) are false, then which one is false and why?
Or is the question ill posed in some way?

One mode of reasoning that could invalidate this example is that p *could have been* something other than 2^64-4 which would invoke UB. This UB would both let us pretend that foo does not escape p, and also print 2^64-4 even though p was not 2^64-4 in the execution we chose. This seems a bit subtle though, and I'm not entirely sure this is sound reasoning.

The bar_1 example just shows that picking an offset which will fulfill the dereferenceable property will not leak information (it cannot be null). Note that p_with_offset in bar_0 does not fulfill the dereferenceability property!

Yes, but it does fulfill the "null" property.

To get the escape case you have to have guessed the correct offset from the initial pointer value to null.

foo(p_with_offset); in void bar_0() would be UB if you would have chosen any offset not in {0,4} (both bytes). The first because the dereferenceability property is fulfilled, the second because it happens to result in a null pointer.

Agreed.

You basically picked offset = 0 - ptr and then you can make it escape through the check. But if you can pick offset that way, you do not need the check since you have to know ptr.

Maybe we're disagreeing on what it means to "guess" a value. I don't think transforming:

void foo(int x) {
  print(x);
}

to

void foo(int x) {
  if (x == 42) {
    print(42);
  } else {
    print(x);
  }
}

counts as "guessing" x. You could call it value profiling plus speculative optimization but that's a standard optimization technique.

This is for sure OK, no questions asked. You can even compare it against any other value (constant or not) and specialize afterwards.
For our discussion, only the comparison against null is interesting though.

I think that the "one past the end" pointer is null in this example is confusing and coincidental, e.g., shift the allocation 4 bytes down to make the valid offsets {0, 8} with everything else the same. The invariant in foo (first comment) only holds for the particular value of ptr in bar_0 and can therefore not be exploited.

Maybe another way of thinking about this can be more productive. If we focus on bar_0 only:

int** global_ptr;

void foo(int32* /*deref_or_null(4)*/ ptr) {
  if (ptr == null) {
    int32* ptr_leaked = (int32*)(intptr_t)-4;
    *global_ptr = ptr_leaked;
  }
}

void bar_0() {
  int32* p = new int32;
  // p happens to numerically be -4 == 2^64-4
  int32* p_with_offset = p + 1; // non-inbounds GEP, evaluates to null
  foo(p_with_offset);

  // XXX

  print(p);
}

then under the assumption that at runtime the program prints 2^64-4.

  1. Is the program well defined?
  2. Has p escaped by the time execution reaches point XXX?

Under the assumption, I'd say yes and yes.

If (1) and (2) are true, given this change, will we correctly infer that these facts?
If at least one of (1) and (2) are false, then which one is false and why?
Or is the question ill posed in some way?

The problem is the assumption. It encodes the offset from null (here 4) that makes the program well defined. (The same offset as hard coded in the program.)
If we start without the assumption, the answers change to:

  1. No, except if p happens to be 2^64-4.
  2. No, because we could simply use 2^64-4 as a "copy" of p at XXX with exactly the same chance of having a conforming program (run). Thus running it and letting the "escape" happen through the null comparison is no different from "guessing" the correct offset from p to null. (I think I do a really bad job at explaining what I mean...)

One mode of reasoning that could invalidate this example is that p *could have been* something other than 2^64-4 which would invoke UB. This UB would both let us pretend that foo does not escape p, and also print 2^64-4 even though p was not 2^64-4 in the execution we chose. This seems a bit subtle though, and I'm not entirely sure this is sound reasoning.

I don't think so, p has a value depending on which UB is caused or not.

The bar_1 example just shows that picking an offset which will fulfill the dereferenceable property will not leak information (it cannot be null). Note that p_with_offset in bar_0 does not fulfill the dereferenceability property!

Yes, but it does fulfill the "null" property.

Agreed, but again, only because the hardcoded offset (4bytes) matches the runtime offset from 0.

aqjune added a subscriber: aqjune.May 8 2019, 10:25 AM

Hello,

One mode of reasoning that could invalidate this example is that p *could have been* something other than 2^64-4 which would invoke UB. This UB would both let us pretend that foo does not escape p, and also print 2^64-4 even though p was not 2^64-4 in the execution we chose. This seems a bit subtle though, and I'm not entirely sure this is sound reasoning.

(sorry for my sudden interruption :) I think this reasoning makes sense, but to say why it makes sense a few minor details should be explained.
We can say that there are two kinds of nondeterminisms -
First one is from an external input. After scanf("%d", &x), we can say that the value of x is 'nondeterministically' chosen.
However, compiler cannot conveniently choose what x is; for example, it is invalid to assume that x == 1 after scanf and optimize printf("%d", x) to print("1").
Second one is the nondeterminism coming from the semantics of language. For example, given a C statement f(e1, e2) where e1 and e2 are expressions, the order of evaluation of e1 and e2 is nondeterministic. In this case, it is allowed for compiler to choose which one to execute.
This is why the statement 'a racy program has undefined behavior' makes sense. In languages like C, the order of interleaving of instructions is not given by an external identity (which would be something like a stream of scanf getting which thread to execute next from a process scheduler), but it is nondeterministic by its semantics. If there exists a racy execution among them, compiler can just choose the execution and say that 'the source program has undefined behavior, so I'm going to compile it into anything'.
I think the story is similar here. We can define that the address of allocation is nondeterministically chosen (like the case of a racy program), so LLVM can pick the most problematic one, and say that the source program already has undefined behavior. :)

Hello,

One mode of reasoning that could invalidate this example is that p *could have been* something other than 2^64-4 which would invoke UB. This UB would both let us pretend that foo does not escape p, and also print 2^64-4 even though p was not 2^64-4 in the execution we chose. This seems a bit subtle though, and I'm not entirely sure this is sound reasoning.

(sorry for my sudden interruption :)

That is what I did to re-start the discussion on this thread as well...

I think this reasoning makes sense, but to say why it makes sense a few minor details should be explained.
[...]
I think the story is similar here. We can define that the address of allocation is nondeterministically chosen (like the case of a racy program), so LLVM can pick the most problematic one, and say that the source program already has undefined behavior. :)

Cool. @sanjoy, are we in agreement?

To recap: What we need to verify a pointer comparison agains null is noescape would be:

  1. A dereferenceable pointer (see Value::getPointerDereferenceableBytes), or
  2. A dereferenceable_or_null pointer Ptr (see Value::getPointerDereferenceableBytes) in a scope (=function) in which null is not a valid pointer (see NullPointerIsDefined(Function *, unsigned AddrSpace))

I don't think I'd look at it in terms of a nondeterministic choice. I'm not sure that really solves the issue, anyway, because the program flow could potentially constrain the non-determinism.

Really, I think this discussion is going in circles because we don't have a precise definition of what it means to capture a pointer. "Does not make any copies of the pointer" is unclear.

I spent a few minutes trying to formulate a more precise rule, but the result I came up with is a mess, so I won't post it. That said, I think the key here is that we have to make sure whatever definition we use allows GVN-like transforms to work: replacing a use of an integer value with a use of an equivalent integer value is legal, and replacing a use of a pointer value with a use of another pointer value with the same raw bits and basis is legal. So an icmp doesn't count as a "copy" if we can compute the result some other way.

I don't think I'd look at it in terms of a nondeterministic choice. I'm not sure that really solves the issue, anyway, because the program flow could potentially constrain the non-determinism.

Really, I think this discussion is going in circles because we don't have a precise definition of what it means to capture a pointer. "Does not make any copies of the pointer" is unclear.

I spent a few minutes trying to formulate a more precise rule, but the result I came up with is a mess, so I won't post it. That said, I think the key here is that we have to make sure whatever definition we use allows GVN-like transforms to work: replacing a use of an integer value with a use of an equivalent integer value is legal, and replacing a use of a pointer value with a use of another pointer value with the same raw bits and basis is legal. So an icmp doesn't count as a "copy" if we can compute the result some other way.

For all purposes so far we interpreted "Does not make any copies of the pointer" as "you cannot reproduce the bit pattern" and we use it to argue that the memory cannot be altered by outside side effects. Allowing arbitrary icmp sequences obviously allow to copy the bits (if have to one by one). Thus, icmp captures in general. icmp against null with some conditions on the pointer (see https://reviews.llvm.org/D60047#1495488) cannot really copy a single bit without causing UB. That is what we discuss here (the conditions) and what we need to make generalize the proposed functionality.

@aykevl If you address the inline comments you can probably commit and close this review if you want. The special cases you encoded seem to be sound. If you want you can also generalize the patch and we review that version again. Please let me know what you intend to do.

sanjoy added a comment.May 9 2019, 9:41 PM

Hello,

One mode of reasoning that could invalidate this example is that p *could have been* something other than 2^64-4 which would invoke UB. This UB would both let us pretend that foo does not escape p, and also print 2^64-4 even though p was not 2^64-4 in the execution we chose. This seems a bit subtle though, and I'm not entirely sure this is sound reasoning.

(sorry for my sudden interruption :)

That is what I did to re-start the discussion on this thread as well...

I think this reasoning makes sense, but to say why it makes sense a few minor details should be explained.
[...]
I think the story is similar here. We can define that the address of allocation is nondeterministically chosen (like the case of a racy program), so LLVM can pick the most problematic one, and say that the source program already has undefined behavior. :)

Cool. @sanjoy, are we in agreement?

Yes, this sounds fine.

aykevl updated this revision to Diff 200152.May 18 2019, 8:57 AM
aykevl marked an inline comment as done.
  • replace some types with auto
  • update comment for getelementptr inbounds

I've updated the code, but there are a few things I'm not so sure about:

  • It seems to me that comparisons against null might escape a pointer if the function is marked "null-pointer-is-valid"? But at the same time, that seems to be equivalent to comparing against a raw pointer (for example, ptr == (uint32*)0x1000) in a function that is not marked "null-pointer-is-valid" so I don't think it's a special case: if the comparison returns true you already had the pointer (in the form of a constant) so no pointer really escapes here.
  • Can pointers in non-0 namespaces be considered noescape with this change? It might be better to be conservative and just don't apply this optimization to non-0 AS pointers. On the other hand, the same reasoning as in the previous point might apply here.

Also, there is the stripPointerCasts issue that has not yet been resolved, waiting for D61607 to be merged.

lib/Analysis/CaptureTracking.cpp
353

Does Value::getPointerDereferenceableBytes have the same semantics (ignoring the CanBeNull) as dereferenceable_or_null? I see there are quite a few things that make a pointer dereferenceable, like byval. It seems to me that the same reasoning that holds true for dereferenceable_or_null also holds true for any dereferenceable (or null) pointer, but just want to make sure.

Also, where do I get a DataLayout from?

I've updated the code, but there are a few things I'm not so sure about:

Thx!

  • It seems to me that comparisons against null might escape a pointer if the function is marked "null-pointer-is-valid"?

"Correct", see below.

But at the same time, that seems to be equivalent to comparing against a raw pointer (for example, ptr == (uint32*)0x1000) in a function that is not marked "null-pointer-is-valid" so I don't think it's a special case: if the comparison returns true you already had the pointer (in the form of a constant) so no pointer really escapes here.

The discussion we had earlier concluded in the fact that the combination of deref_or_null + null is not valid will give you no-escape. If you rempove any part, it doesn't work. So without "null-pointer-is-valid", null might be valid. And any other address might always be valid. In neither case we can conclude no-escape.

  • Can pointers in non-0 namespaces be considered noescape with this change?

It depends on the namespace, see below.

It might be better to be conservative and just don't apply this optimization to non-0 AS pointers. On the other hand, the same reasoning as in the previous point might apply here.

There is the method to determine if null is valid in a namespace, use that one and it should be fine (I think part of Function).

Also, there is the stripPointerCasts issue that has not yet been resolved, waiting for D61607 to be merged.

Correct, though I hope we will come to a conclusion on that one soon.

lib/Analysis/CaptureTracking.cpp
353

Value::getPointerDereferenceableBytes is what you want here. If it is not (in practice) then it is broken and we need to fix it. byval is a "stronger" version of dereferenceable.

Also, where do I get a DataLayout from?

From the Module: Module::getDataLayout()

aykevl updated this revision to Diff 201464.May 26 2019, 12:48 PM
  • use Value::getPointerDereferenceableBytes instead of checking for the specific attribute dereferenceable_or_null.
  • Only apply this optimization when null pointers are not dereferenceable ("null-pointer-is-valid") and add an associated test.

I think I have addressed all review comments (except for stripPointerCasts which is waiting on D61607 and should hopefully land soon).

@aykevl If you address the inline comments you can probably commit and close this review if you want. The special cases you encoded seem to be sound. If you want you can also generalize the patch and we review that version again. Please let me know what you intend to do.

This special case was a bad regression in my compiler after implementing some safety features required by the Go spec (in a way that does not rely on a MMU) so from my perspective it's done and can be committed (after stripPointerCasts). An old version of this patch caught all the missed optimizations and I'm pretty sure the current patch will do so as well. But if you have ideas how I can generalize this feature in a way that is non-controversial I can certainly do that.

jdoerfert accepted this revision.May 27 2019, 12:41 PM

LGTM.

I think @sanjoy was also happy with the argumentation so this is only blocked by D61607 now.

lib/Analysis/CaptureTracking.cpp
342

Make this ->stripPointerCastsSameRepresentation() after D61607 landed.

I'm sorry I delayed this patch forever. The stripPointerCastsSameRepresentation is now available and you had my LGTM already :)

aykevl updated this revision to Diff 203189.Jun 5 2019, 11:25 AM
  • update to stripPointerCastsSameRepresentation
  • move a single line of code (NFC)
This revision was not accepted when it landed; it landed in state Needs Review.Jun 9 2019, 3:18 AM
This revision was automatically updated to reflect the committed changes.
aykevl added a comment.Jun 9 2019, 3:35 AM

Phabricator says:

This revision was not accepted when it landed

but I believe that's in error because it was not marked as accepted (while it was in practice).

Phabricator says:

This revision was not accepted when it landed

but I believe that's in error because it was not marked as accepted (while it was in practice).

Don't worry about this. It is because there was a blocking review and that reviewer never accepted as an action. Though @sanjoy did accept so all is good.