Page MenuHomePhabricator

[InstSimplify] Don't fold gep p, -p to null
ClosedPublic

Authored by nikic on Sat, Dec 26, 3:36 AM.

Details

Summary

This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=44403. Folding gep p, q-p to q is only legal if p and q have the same provenance. This fold should probably be guarded by something like getUnderlyingObject(p) == getUnderlyingObject(q).

This patch is a partial fix that removes the special handling for gep p, 0-p, which will fold to a null pointer, which would certainly not pass an underlying object check (unless p is also null, in which case this would fold trivially anyway). Folding to a null pointer is particularly problematic due to the special handling it receives in many places, making end-to-end miscompiles more likely.

Diff Detail

Event Timeline

nikic created this revision.Sat, Dec 26, 3:36 AM
nikic requested review of this revision.Sat, Dec 26, 3:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptSat, Dec 26, 3:36 AM

I only looked at the tests and they were correct before, see here: https://alive2.llvm.org/ce/z/UzW3pv
The tests are weird because they have 'gep inbounds'. The reason they are correct (and weird) is that the only way p - (int)p/sizeof(*p) is inbounds is p being null. Anything else will overflow.

The code is buggy because it doesn't require inbounds (not caught because there's no test without inbounds): https://gcc.godbolt.org/z/GYxc78

So in the end the transformation doesn't seem very useful in practice?

nikic added a comment.Sat, Dec 26, 5:56 AM

I only looked at the tests and they were correct before, see here: https://alive2.llvm.org/ce/z/UzW3pv
The tests are weird because they have 'gep inbounds'. The reason they are correct (and weird) is that the only way p - (int)p/sizeof(*p) is inbounds is p being null. Anything else will overflow.

This doesn't look right to me, at least not given current LangRef wording. Lets say we have gep inbounds p, -p, where p = ptr(base_addr = 1, offset = -1). This means that the address value of p is 0, but it has provenance of the object at base_addr = 1. As such, the inbounds is not violated (both p and the gep results are inbounds of the zero address), but we still change provenance.

For the transform to be correct, the GEP inbounds specification would have to require not only that the base address is in bounds of an allocated object, but that it is in bounds of the allocated object corresponding to the provenance of the base value. Is that supposed to be part of the GEP inbounds semantics?

The code is buggy because it doesn't require inbounds (not caught because there's no test without inbounds): https://gcc.godbolt.org/z/GYxc78

I did consider making this transform require inbounds instead. The requirement that at least the address values belong to the same object would make practical miscompiles less likely. However, as argued above, I don't think the inbounds requirements would actually make the transform correct. And alive does agree that requiring inbounds is not sufficient for the more general case (https://alive2.llvm.org/ce/z/Fykn-3).

So in the end the transformation doesn't seem very useful in practice?

At least, the null pointer case I'm touching here doesn't fire on test-suite -O3 at all...

I only looked at the tests and they were correct before, see here: https://alive2.llvm.org/ce/z/UzW3pv
The tests are weird because they have 'gep inbounds'. The reason they are correct (and weird) is that the only way p - (int)p/sizeof(*p) is inbounds is p being null. Anything else will overflow.

This doesn't look right to me, at least not given current LangRef wording. Lets say we have gep inbounds p, -p, where p = ptr(base_addr = 1, offset = -1). This means that the address value of p is 0, but it has provenance of the object at base_addr = 1. As such, the inbounds is not violated (both p and the gep results are inbounds of the zero address), but we still change provenance.

There's an extra catch: gep inbounds requires both the input and output pointers to be in bounds. This part is explicit in LangRef, at least.
Some examples:

p = malloc()
q = gep inbounds p, -1  // poison
r = gep p, -1           // ok
s = gep inbounds r, 1   // poison: r is not inbounds
t = gep r, 1            // ok, offset = 0
u = gep inbounds t, 1   // ok, offset = 1 (assuming malloc size > 0)

For the transform to be correct, the GEP inbounds specification would have to require not only that the base address is in bounds of an allocated object, but that it is in bounds of the allocated object corresponding to the provenance of the base value. Is that supposed to be part of the GEP inbounds semantics?

The code is buggy because it doesn't require inbounds (not caught because there's no test without inbounds): https://gcc.godbolt.org/z/GYxc78

I did consider making this transform require inbounds instead. The requirement that at least the address values belong to the same object would make practical miscompiles less likely. However, as argued above, I don't think the inbounds requirements would actually make the transform correct. And alive does agree that requiring inbounds is not sufficient for the more general case (https://alive2.llvm.org/ce/z/Fykn-3).

Right, only the null case is correct because it only works when the input pointer is null; all other cases were poison. So transforming poison into null is fine.
The general case changes provenance. It's very wrong. And LangRef is explicit about this case as well. Well, and BasicAA will break programs if you do this general transformation as it uses provenance to prove no alias.

So in the end the transformation doesn't seem very useful in practice?

At least, the null pointer case I'm touching here doesn't fire on test-suite -O3 at all...

The transformation does look weird. When is the code trying to compute nullptr in some convoluted way? I have some inclination to remove it altogether.

nikic added a comment.Sat, Dec 26, 9:55 AM

I only looked at the tests and they were correct before, see here: https://alive2.llvm.org/ce/z/UzW3pv
The tests are weird because they have 'gep inbounds'. The reason they are correct (and weird) is that the only way p - (int)p/sizeof(*p) is inbounds is p being null. Anything else will overflow.

This doesn't look right to me, at least not given current LangRef wording. Lets say we have gep inbounds p, -p, where p = ptr(base_addr = 1, offset = -1). This means that the address value of p is 0, but it has provenance of the object at base_addr = 1. As such, the inbounds is not violated (both p and the gep results are inbounds of the zero address), but we still change provenance.

There's an extra catch: gep inbounds requires both the input and output pointers to be in bounds. This part is explicit in LangRef, at least.
Some examples:

p = malloc()
q = gep inbounds p, -1  // poison
r = gep p, -1           // ok
s = gep inbounds r, 1   // poison: r is not inbounds
t = gep r, 1            // ok, offset = 0
u = gep inbounds t, 1   // ok, offset = 1 (assuming malloc size > 0)

Right, but inbounds and provenance are, as far as I can tell, orthogonal concepts. Alive claims that this code has UB due to use of gep inbounds: https://alive2.llvm.org/ce/z/zTctIR At the same time, the gep inbounds itself is not poison: https://alive2.llvm.org/ce/z/wxGGyu That makes it looks like Alive also constrains provenance based on gep inbounds, not just the value of the pointer.

I only looked at the tests and they were correct before, see here: https://alive2.llvm.org/ce/z/UzW3pv
The tests are weird because they have 'gep inbounds'. The reason they are correct (and weird) is that the only way p - (int)p/sizeof(*p) is inbounds is p being null. Anything else will overflow.

This doesn't look right to me, at least not given current LangRef wording. Lets say we have gep inbounds p, -p, where p = ptr(base_addr = 1, offset = -1). This means that the address value of p is 0, but it has provenance of the object at base_addr = 1. As such, the inbounds is not violated (both p and the gep results are inbounds of the zero address), but we still change provenance.

There's an extra catch: gep inbounds requires both the input and output pointers to be in bounds. This part is explicit in LangRef, at least.
Some examples:

p = malloc()
q = gep inbounds p, -1  // poison
r = gep p, -1           // ok
s = gep inbounds r, 1   // poison: r is not inbounds
t = gep r, 1            // ok, offset = 0
u = gep inbounds t, 1   // ok, offset = 1 (assuming malloc size > 0)

Right, but inbounds and provenance are, as far as I can tell, orthogonal concepts. Alive claims that this code has UB due to use of gep inbounds: https://alive2.llvm.org/ce/z/zTctIR At the same time, the gep inbounds itself is not poison: https://alive2.llvm.org/ce/z/wxGGyu That makes it looks like Alive also constrains provenance based on gep inbounds, not just the value of the pointer.

Your example just needs reasoning about inbounds. Let's see:

%p.int = ptrtoint i8* %p to i64
%p.neg = sub i64 0, %p.int
%p.null = getelementptr i8, i8* %p, i64 %p.neg
%p.null2 = getelementptr inbounds i8, i8* %p.null, i64 %p.null.neg

gep inbounds requires %p.null to be inbounds, otherwise the result is poison. For %p.null to be inbounds, we need the following (assuming %p has a positive offset to start with):

  • %p.neg but be >= 0
  • therefore %p.int <= 0
  • therefore %p=null

He have established that %p.null2 is either null if %p=null, or poison otherwise. As both can be replaced with null, your example transformation is correct.

If %p was OOB, then its offset might have been negative, for example. But the reasoning is similar to what I wrote above.

nagisa added a subscriber: nagisa.Thu, Dec 31, 7:59 AM

Why is discussion about gepi coming up at all?

This differential, from its description is all about a regular gep, which is currently mis-optimized according to Alive itself. Sure there has been collateral damage from fixing handling of gep, but I think we all agree having a working gep is more important than a very efficient gepi?

(If the fold turns out to be actually valid for gepi, it can be added back just for gepi…)

The only thing missing in this diff from what I can tell is just a regression test…

nikic updated this revision to Diff 314222.Fri, Jan 1, 8:04 AM

Rebase over additional tests without inbounds.

nikic added a subscriber: RalfJung.Sun, Jan 10, 7:02 AM

Based on what @RalfJung mentioned on zulip, the question of whether the transform is legal for inbounds comes down to the particular choice of inbounds semantics. I was using the semantics specified in LangRef, which make the optimization illegal, while @nlopes used the semantics from https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf (or something similar), which makes it legal. The relevant difference to the LangRef semantics (if we stick to the gep-inbounds-logical case) would be:

- The base pointer has an in bounds address of an allocated object [...]
+ The base pointer has an in bounds address of the allocated object it is based on [...]

In any case, regardless of whether this is legal for the inbounds case, I think everyone agrees it's not legal for the non-inbounds case (and not legal for the non-null case regardless of inbounds). Is that enough to move forward here, or do you want me to thread inbounds information through SimplifyGEPInst and retain this optimization for the inbounds case?

Based on what @RalfJung mentioned on zulip, the question of whether the transform is legal for inbounds comes down to the particular choice of inbounds semantics. I was using the semantics specified in LangRef, which make the optimization illegal, while @nlopes used the semantics from https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf (or something similar), which makes it legal. The relevant difference to the LangRef semantics (if we stick to the gep-inbounds-logical case) would be:

- The base pointer has an in bounds address of an allocated object [...]
+ The base pointer has an in bounds address of the allocated object it is based on [...]

In any case, regardless of whether this is legal for the inbounds case, I think everyone agrees it's not legal for the non-inbounds case (and not legal for the non-null case regardless of inbounds). Is that enough to move forward here, or do you want me to thread inbounds information through SimplifyGEPInst and retain this optimization for the inbounds case?

LGTM, it's a step in the right direction (removing wrong optimizations)!

+ The base pointer has an in bounds address of the allocated object it is based on [...]

It's slightly more tricky than this. A wrong GEPi doesn't always immediately produce poison; instead it records (in the provenance) which offsets all have to be inbounds and then when a load/store happens, it causes UB if any of these offsets is not inbounds. So there are "doomed" pointers (that cause UB when dereferenced) that are not poison. They cannot be optimized to poison since they can still be legally used e.g. in "icmp".

All this is necessary since LLVM considers both inttoptr and GEPi pure operations that it will reorder across anything. To determine "the allocated object it (the pointer) is based on" we'd have to look up in memory which allocated object corresponds to this integer address, but that would make the operation impure and make the reorderings wrong. So instead determining which allocated object this ptr is based on is delayed until the next memory access.

I think everyone agrees it's not legal for the non-inbounds case (and not legal for the non-null case regardless of inbounds)

For non-inbounds, I agree that "folding gep p, q-p to q is only legal if p and q have the same provenance" (so, barely ever).

I didn't entirely follow the "inbounds" arguments here, but I am happy to leave that to y'all. ;)

This revision was not accepted when it landed; it landed in state Needs Review.Tue, Jan 12, 11:25 AM
This revision was automatically updated to reflect the committed changes.