This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Analysis/
-
Analysis/
-
LazyValueInfo.cpp
-
test/Transforms/JumpThreading/
-
Transforms/
-
JumpThreading/
-
nonnull-gep-out-of-bounds.ll

Differential D99642

For non-null pointer checks, do not descend through out-of-bounds GEPs
ClosedPublic

Authored by chill on Mar 31 2021, 1:43 AM.

Download Raw Diff

Details

Reviewers

nikic
lebedev.ri
fhahn

Commits

rGacf3279a037f: For non-null pointer checks, do not descend through out-of-bounds GEPs

Summary

In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect, even if
the base pointer was non-null, the value of a GEP, that lacks the
inbounds` attribute, may be null.

This issue appeared as miscompilation of the following test case:

int puts(const char *);
    
typedef struct iter {
  int *val;
} iter_t;
    
static long distance(iter_t first, iter_t last) {
  long r = 0;
  for (; first.val != last.val; first.val++)
    ++r;
  return r;
}
    
int main() {
  int arr[2] = {0};
  iter_t i, j;
  i.val = arr;
  j.val = arr + 1;
  if (distance(i, j) >= 2)
    puts("failed");
  else
    puts("passed");
}

This fixes PR49662.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

chill created this revision.Mar 31 2021, 1:43 AM

Herald added a subscriber: hiraditya. · View Herald TranscriptMar 31 2021, 1:43 AM

chill requested review of this revision.Mar 31 2021, 1:43 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2021, 1:43 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

chill added reviewers: nikic, lebedev.ri.Mar 31 2021, 1:45 AM

alive2 again doesn't agree that non-inbounds GEP is allowed to produce null pointer: https://alive2.llvm.org/ce/z/9wfL5x

In D99642#2660824, @lebedev.ri wrote:

alive2 again doesn't agree that non-inbounds GEP is allowed to produce null pointer: https://alive2.llvm.org/ce/z/9wfL5x

Interesting, but also surprising, especially because the LangRef explicitly calls out GEPs without inbounds to wrap silently?

llvm/lib/Analysis/ValueTracking.cpp
4275 ↗	(On Diff #334378)	According to https://llvm.org/docs/LangRef.html#pointer-aliasing-rules, `inbounds` should not impact the underlying object property IIUC. But probably can only rely on that, if the pointer gets actually dereferenced?
llvm/test/Analysis/ValueTracking/unknown-nonnull-gep-out-of-bounds.ll
16 ↗	(On Diff #334378)	do we need the sub & `ptrtoint` bits here? We also shouldn't the the transform if the index is an arbitrary value, e.g. an argument, right?

In D99642#2660849, @fhahn wrote:

In D99642#2660824, @lebedev.ri wrote:

alive2 again doesn't agree that non-inbounds GEP is allowed to produce null pointer: https://alive2.llvm.org/ce/z/9wfL5x

Interesting, but also surprising, especially because the LangRef explicitly calls out GEPs without inbounds to wrap silently?

Yes.
I think it's another example of alive2 trying to invent a memory model that isn't fit for the real world.

Harbormaster completed remote builds in B96485: Diff 334378.Mar 31 2021, 2:32 AM

I think this should be using Val->stripInBoundsOffsets() rather than modify the getUnderlyingObject() API.

In D99642#2660864, @lebedev.ri wrote:

In D99642#2660849, @fhahn wrote:

In D99642#2660824, @lebedev.ri wrote:

alive2 again doesn't agree that non-inbounds GEP is allowed to produce null pointer: https://alive2.llvm.org/ce/z/9wfL5x

Interesting, but also surprising, especially because the LangRef explicitly calls out GEPs without inbounds to wrap silently?

Yes.
I think it's another example of alive2 trying to invent a memory model that isn't fit for the real world.

Let me try to explain our reasoning:

GEP doesn't change the object of a pointer, just the offset within that object. Having this rule enables a lot of optimizations (e.g. you know "for free" that gep %p, %x and gep %q, %y can't alias if %p/%q are the result of e.g. distinct alloca/malloc, even if you know nothing about %x/%y).
Pointer comparisons need to account for the previous rule plus how we lower pointer comparisons (an integer comparison in assembly). For pointers whose offsets are inbounds, integer comparison == object/offset comparison. If one of the pointers is OOB, then things get complicated because in assembly an OOB pointer may be equal to an inbounds pointer, while in IR we would rather not have that situation to enable optimizations. This lead us to pursue a semantics where the result of a comparison with an OOB ptr is non-deterministic, such that the IR can assume it's always false and assembly can sometimes produce true. The implementation in Alive2 is not complete FYI.

This model has some disadvantages, namely inhibiting some optimizations that LLVM does around null (and only that AFAICT). @aqjune is actively working on this, and testing different models to try to find out a good solution, but I can't say when it'll be ready. Hopefully within a couple of weeks. We are still collecting pointer comparison optimizations done by LLVM ATM (instcombine, SCEV, AA, etc).
This is our list of things that need to be taken into account: https://docs.google.com/document/d/1n2Hqw0bkWK_mUi1xEcK4ruLKE-xxTdQ4426KnWY1nHI/edit?usp=sharing

llvm/lib/Analysis/ValueTracking.cpp
4275 ↗	(On Diff #334378)	Agreed. GEP doesn't change the object even without inbounds.

In D99642#2660952, @nlopes wrote:

In D99642#2660864, @lebedev.ri wrote:

In D99642#2660849, @fhahn wrote:

In D99642#2660824, @lebedev.ri wrote:

alive2 again doesn't agree that non-inbounds GEP is allowed to produce null pointer: https://alive2.llvm.org/ce/z/9wfL5x

Interesting, but also surprising, especially because the LangRef explicitly calls out GEPs without inbounds to wrap silently?

Yes.
I think it's another example of alive2 trying to invent a memory model that isn't fit for the real world.

Let me try to explain our reasoning:

GEP doesn't change the object of a pointer, just the offset within that object. Having this rule enables a lot of optimizations (e.g. you know "for free" that gep %p, %x and gep %q, %y can't alias if %p/%q are the result of e.g. distinct alloca/malloc, even if you know nothing about %x/%y).

I think that's fine for AA, which generally looks at pointers that are dereferenced. It seems a bit unfortunate though that we determine that the pointer is based on object but still may not be de-referenceable if the object is though.

Pointer comparisons need to account for the previous rule plus how we lower pointer comparisons (an integer comparison in assembly). For pointers whose offsets are inbounds, integer comparison == object/offset comparison. If one of the pointers is OOB, then things get complicated because in assembly an OOB pointer may be equal to an inbounds pointer, while in IR we would rather not have that situation to enable optimizations. This lead us to pursue a semantics where the result of a comparison with an OOB ptr is non-deterministic, such that the IR can assume it's always false and assembly can sometimes produce true. The implementation in Alive2 is not complete FYI.

I think we need the capability to express arbitrary pointer arithmetic without the assumption that it cannot point to potentially a different object, e.g. for runtime checks that a pointer does not wrap or go out-of-bounds of a certain object. An answer could of course be that you have to use ptrtoint/inttoptr for that case, but I think in practice this is not what is used today for such cases. (and not what the LangRef suggests, at least as long as the pointer is not dereferenced).

In D99642#2660961, @fhahn wrote:

In D99642#2660952, @nlopes wrote:

In D99642#2660864, @lebedev.ri wrote:

In D99642#2660849, @fhahn wrote:

In D99642#2660824, @lebedev.ri wrote:

alive2 again doesn't agree that non-inbounds GEP is allowed to produce null pointer: https://alive2.llvm.org/ce/z/9wfL5x

Interesting, but also surprising, especially because the LangRef explicitly calls out GEPs without inbounds to wrap silently?

Yes.
I think it's another example of alive2 trying to invent a memory model that isn't fit for the real world.

Let me try to explain our reasoning:

GEP doesn't change the object of a pointer, just the offset within that object. Having this rule enables a lot of optimizations (e.g. you know "for free" that gep %p, %x and gep %q, %y can't alias if %p/%q are the result of e.g. distinct alloca/malloc, even if you know nothing about %x/%y).

I think that's fine for AA, which generally looks at pointers that are dereferenced. It seems a bit unfortunate though that we determine that the pointer is based on object but still may not be de-referenceable if the object is though.

That case only happens if some form of UB happens (e.g., gep inbounds that goes OOB, passing a pointer as argument with some incorrect attribute, etc).
The warning in the alias FAQ is because of pointer replacement AFAIR. Just because 2 pointers compare equal, you can't replace one with the other blindly, because one might be dereferenceable and the other not (because one overflowed inbounds, for example).
Pointer comparisons are just tricky.

Pointer comparisons need to account for the previous rule plus how we lower pointer comparisons (an integer comparison in assembly). For pointers whose offsets are inbounds, integer comparison == object/offset comparison. If one of the pointers is OOB, then things get complicated because in assembly an OOB pointer may be equal to an inbounds pointer, while in IR we would rather not have that situation to enable optimizations. This lead us to pursue a semantics where the result of a comparison with an OOB ptr is non-deterministic, such that the IR can assume it's always false and assembly can sometimes produce true. The implementation in Alive2 is not complete FYI.

I think we need the capability to express arbitrary pointer arithmetic without the assumption that it cannot point to potentially a different object, e.g. for runtime checks that a pointer does not wrap or go out-of-bounds of a certain object. An answer could of course be that you have to use ptrtoint/inttoptr for that case, but I think in practice this is not what is used today for such cases. (and not what the LangRef suggests, at least as long as the pointer is not dereferenced).

That's true. I'm aware of the loop vectorizer that inserts runtime checks to only dispatch to the vectorized loop if some pointers don't alias at runtime. We probably don't want to use ptr2int for this.

In D99642#2660970, @nlopes wrote:

In D99642#2660961, @fhahn wrote:

In D99642#2660952, @nlopes wrote:

[snip]

Let me try to explain our reasoning:

GEP doesn't change the object of a pointer, just the offset within that object. Having this rule enables a lot of optimizations (e.g. you know "for free" that gep %p, %x and gep %q, %y can't alias if %p/%q are the result of e.g. distinct alloca/malloc, even if you know nothing about %x/%y).

I think that's fine for AA, which generally looks at pointers that are dereferenced. It seems a bit unfortunate though that we determine that the pointer is based on object but still may not be de-referenceable if the object is though.

That case only happens if some form of UB happens (e.g., gep inbounds that goes OOB, passing a pointer as argument with some incorrect attribute, etc).
The warning in the alias FAQ is because of pointer replacement AFAIR. Just because 2 pointers compare equal, you can't replace one with the other blindly, because one might be dereferenceable and the other not (because one overflowed inbounds, for example).
Pointer comparisons are just tricky.

Agreed! To circle back to the specific case this patch tries to address: Given what we currently have in the LangRef, I think the transformation LLVM does at the moment (eliminate non-null check) is incorrect and should be fixed. WDYT? But having Alive2 disagree here certainly seems unfortunate.

Pointer comparisons need to account for the previous rule plus how we lower pointer comparisons (an integer comparison in assembly). For pointers whose offsets are inbounds, integer comparison == object/offset comparison. If one of the pointers is OOB, then things get complicated because in assembly an OOB pointer may be equal to an inbounds pointer, while in IR we would rather not have that situation to enable optimizations. This lead us to pursue a semantics where the result of a comparison with an OOB ptr is non-deterministic, such that the IR can assume it's always false and assembly can sometimes produce true. The implementation in Alive2 is not complete FYI.

I think we need the capability to express arbitrary pointer arithmetic without the assumption that it cannot point to potentially a different object, e.g. for runtime checks that a pointer does not wrap or go out-of-bounds of a certain object. An answer could of course be that you have to use ptrtoint/inttoptr for that case, but I think in practice this is not what is used today for such cases. (and not what the LangRef suggests, at least as long as the pointer is not dereferenced).

That's true. I'm aware of the loop vectorizer that inserts runtime checks to only dispatch to the vectorized loop if some pointers don't alias at runtime. We probably don't want to use ptr2int for this.

Yep, there probably are other places as well as IR coming from different frontends.

chill updated this revision to Diff 335826.Apr 7 2021, 8:45 AM

In D99642#2660895, @nikic wrote:

I think this should be using Val->stripInBoundsOffsets() rather than modify the getUnderlyingObject() API.

Thanks, indeed!

llvm/lib/Analysis/ValueTracking.cpp
4275 ↗	(On Diff #334378)	Yeah, it was just my clumsy attempt to reuse the implementation (`InBounds`param was defaulted to `false`).

LGTM, just some suggestions for the test.

llvm/test/Analysis/ValueTracking/unknown-nonnull-gep-out-of-bounds.ll
1 ↗	(On Diff #335826)	This test should be in llvm/test/Transform/JumpThreading.
11 ↗	(On Diff #335826)	I'd pass in an `i64* %a` here -- the alloca case is less straightforward to me (because you'd have to guess the alloca offset to reach null).
21 ↗	(On Diff #335826)	The puts calls don't seem relevant, I'd just replace these with `ret i32 0` and `ret i32 1` or so.

This revision is now accepted and ready to land.Apr 7 2021, 9:34 AM

Harbormaster completed remote builds in B97523: Diff 335826.Apr 7 2021, 9:37 AM

LGTM, thanks!

@nlopes do you have any more thoughts on the difference between LLVM & Alive2 on this topic?

llvm/test/Analysis/ValueTracking/unknown-nonnull-gep-out-of-bounds.ll
4 ↗	(On Diff #335826)	This should also not be needed, right? Otherwise we would need to make sure this only runs when the X86 backend is built.

In D99642#2674663, @fhahn wrote:

LGTM, thanks!

@nlopes do you have any more thoughts on the difference between LLVM & Alive2 on this topic?

Hi, currently the bottleneck is on me due to a few busy things, sorry. I am discussing with Nuno and the conclusion will be shared soon.

In D99642#2674663, @fhahn wrote:

LGTM, thanks!

@nlopes do you have any more thoughts on the difference between LLVM & Alive2 on this topic?

Not yet, sorry. But it's unfair to block this patch as we don't have an ETA yet.
At most, this patch is making things more conservative, not introducing a miscompilation, and so it's ok.

The icmp in the test case can be true only if we assume that pointer comparison is equal to integer comparison. Which is something that LLVM assumes in a few places. Though it's not a consistent design choice. We are still trying to understand the pros and cons of the different solutions.

chill updated this revision to Diff 336071.Apr 8 2021, 5:31 AM

chill marked 4 inline comments as done.

Harbormaster completed remote builds in B97701: Diff 336071.Apr 8 2021, 6:29 AM

I did minor test updates, following reviewers' suggestions, don't think another review is warranted, going to commit.

Closed by commit rGacf3279a037f: For non-null pointer checks, do not descend through out-of-bounds GEPs (authored by chill). · Explain WhyApr 9 2021, 6:09 AM

This revision was automatically updated to reflect the committed changes.

chill added a commit: rGacf3279a037f: For non-null pointer checks, do not descend through out-of-bounds GEPs.

Revision Contents

Path

Size

llvm/

lib/

Analysis/

LazyValueInfo.cpp

2 lines

test/

Transforms/

JumpThreading/

nonnull-gep-out-of-bounds.ll

18 lines

Diff 336426

llvm/lib/Analysis/LazyValueInfo.cpp

Show First 20 Lines • Show All 652 Lines • ▼ Show 20 Lines	static void AddNonNullPointersByInstruction(
}		}
}		}

bool LazyValueInfoImpl::isNonNullAtEndOfBlock(Value Val, BasicBlock BB) {		bool LazyValueInfoImpl::isNonNullAtEndOfBlock(Value Val, BasicBlock BB) {
if (NullPointerIsDefined(BB->getParent(),		if (NullPointerIsDefined(BB->getParent(),
Val->getType()->getPointerAddressSpace()))		Val->getType()->getPointerAddressSpace()))
return false;		return false;

Val = getUnderlyingObject(Val);		Val = Val->stripInBoundsOffsets();
return TheCache.isNonNullAtEndOfBlock(Val, BB, [](BasicBlock *BB) {		return TheCache.isNonNullAtEndOfBlock(Val, BB, [](BasicBlock *BB) {
NonNullPointerSet NonNullPointers;		NonNullPointerSet NonNullPointers;
for (Instruction &I : *BB)		for (Instruction &I : *BB)
AddNonNullPointersByInstruction(&I, NonNullPointers);		AddNonNullPointersByInstruction(&I, NonNullPointers);
return NonNullPointers;		return NonNullPointers;
});		});
}		}

▲ Show 20 Lines • Show All 1,252 Lines • Show Last 20 Lines

llvm/test/Transforms/JumpThreading/nonnull-gep-out-of-bounds.ll

This file was added.

				; RUN: opt -jump-threading -S %s -o - \| FileCheck %s

				define i32 @f(i64* %a, i64 %i) {
				entry:
				store i64 0, i64* %a, align 8
				%p = getelementptr i64, i64* %a, i64 %i
				%c = icmp eq i64* %p, null
				; `%a` is non-null at the end of the block, because we store through it.
				; However, `%p` is derived from `%a` via a GEP that is not `inbounds`, therefore we cannot judge `%p` is non-null as well
				; and must retain the `icmp` instruction.
				; CHECK: %c = icmp eq i64* %p, null
				br i1 %c, label %if.else, label %if.then
				if.then:
				ret i32 0

				if.else:
				ret i32 1
				}