Page MenuHomePhabricator

InstCombine: Fold comparisons between unguessable allocas and other pointers
ClosedPublic

Authored by hans on Oct 1 2015, 11:19 AM.

Details

Summary

This will allow us to optimize code such as:

int f(int *p) {
  int x;
  return p == &x;
}

as well as:

int *allocate(void);
int f() {
  int x;
  int *p = allocate();
  return p == &x;
}

The folding can only be done under certain circumstances. Even though p and &x cannot alias, the comparison must still return true if the pointer representations are equal. If a user successfully generates a p that's a correct guess for &x, comparison should return true even though p is an invalid pointer.

This patch argues that if the address of the alloca isn't observable outside the function, the function can act as-if the address is impossible to guess from the outside. The tricky part is keeping the act consistent: if we fold p == &x to false in one place, we must make sure to fold any other comparisons based on those pointers similarly. To ensure that, we only fold when &x is involved exactly once in comparison instructions.

Let me know what you think.

Diff Detail

Repository
rL LLVM

Event Timeline

hans updated this revision to Diff 36273.Oct 1 2015, 11:19 AM
hans retitled this revision from to InstSimplify: Fold comparisons between allocas and arguments under certain circumstances.
hans updated this object.
hans added subscribers: llvm-commits, hansw.
sanjoy added a subscriber: sanjoy.Oct 1 2015, 1:04 PM
sanjoy added inline comments.
lib/Analysis/InstructionSimplify.cpp
2130 ↗(On Diff #36273)

I have a question about this second bit -- do we have to ensure that we only *fold* such comparisons consistently, or do we have to ensure that they also evaluate consistently? IOW, say I have

void f(i8* %arg) {
 i8* %val = alloca
 %t0 = %val == %arg
 ...
 %t1 = complex_xform(%val) == %arg

such that complex_xform(x) == x always, but the compiler cannot prove that. Consequently we fold %t0 to false, but don't fold complex_xform(x) == x at all, and it ends up evaluating to true at runtime.

Are you relying on the fact that a complex_xform cannot be formed out of the operators you look through in CanTrackAlluses? I'd be more comfortable if you folded *all* comparisons based off of %alloca in one go.

2137 ↗(On Diff #36273)

Can this be rewritten to use PointerMayBeCaptured?

2138 ↗(On Diff #36273)

Minor: I'd call this MaxIter.

2163 ↗(On Diff #36273)

Why not just use APInt's constructor?

2168 ↗(On Diff #36273)

patchpoint, stackmap and gc.statepoint all can escape values -- they're basically a call to an unknown function (I don't know if they'll show up as an IntrinsicInst though).

hans marked 2 inline comments as done.Oct 1 2015, 2:56 PM
hans added inline comments.
lib/Analysis/InstructionSimplify.cpp
2130 ↗(On Diff #36273)

Yes, exactly: the comparisons have to evaluate consistently, so we have to fold them all, and I'm relying on CanTrackAllUses (maybe not a great name actually) to ensure that we can do that.

I don't think I could fold all comparisons based on an %alloca in one go from InstSimplify as it's actually not changing the instruction, just returning a new value for it. Maybe there's a better place to do this?

On the other hand, I don't think multiple comparisons is very common. Maybe we should just bail out if we see another comparison? I'll do that for now.

2137 ↗(On Diff #36273)

My check is much more conservative. For example, it won't follow the use through a phi node, and certainly won't allow it in a function call - even if the function is nocapture, it could make comparisons with the pointer.

David pointed out that I should probably break this out into a separate function though, so I'll do that.

2168 ↗(On Diff #36273)

Thanks. David also pointed out he had some intrinsic that would escape a pointer. I'll build a whitelist here.

hans updated this revision to Diff 36306.Oct 1 2015, 2:56 PM

Addressing Sanjoy's comments.

hans added a comment.Oct 1 2015, 5:55 PM

Actually, requiring that the alloca is only used in a single comparison makes everything much simpler. I also verified that it didn't make the optimization fire any less over Chromium (this patch reduced binary size by about 10k).

Also, this means we don't need to trace the Argument value. It's fine if it escapes, because we know it can't be compared against any values based on the alloca since only allow one cmp.

And now that we don't need to be able to trace the non-Argument side of the comparison, it means we don't have to restrains that to just Arguments, in fact under these conditions we can act as if the alloca doesn't alias pointers based on any other value, including e.g. pointers from malloc. (This is similar to the "an alloca which is only used in a single cmp can be trivially folded" argument in the patch Philip sent to the list.)

I think this makes the optimization much more powerful. (Unfortunately it only knocked 100 more bytes of the Chromium build.) I'll get an updated patch out tomorrow.

hans updated this revision to Diff 36384.Oct 2 2015, 11:14 AM
hans retitled this revision from InstSimplify: Fold comparisons between allocas and arguments under certain circumstances to InstSimplify: Fold comparisons between unguessable allocas and other pointers.
hans updated this object.
hans updated this revision to Diff 36417.Oct 2 2015, 5:21 PM
hans retitled this revision from InstSimplify: Fold comparisons between unguessable allocas and other pointers to InstCombine: Fold comparisons between unguessable allocas and other pointers.
hans updated this object.

Moving this to instcombine.

The previous patch ran into a terrifying bug where instsimplify would get run on IR that wasn't fully constructed yet, which meant it couldn't see all uses of an alloca and folded a bit too aggressively.

This new patch successfully bootstraps Clang. The optimization fires 67 times during bootstrap and reduces Clang binary size with 8519 bytes.

majnemer added inline comments.Oct 3 2015, 5:41 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
780 ↗(On Diff #36417)

Is this sufficient? What if getValueOperand is not U->get() but, say, a bitcast of U->get() ?

792–793 ↗(On Diff #36417)

Can't memcpy escape the value just like a StoreInst would?

hans added inline comments.Oct 3 2015, 5:53 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
780 ↗(On Diff #36417)

I think so. Such a bitcast would show up as a use of our value, so we'd track it and see if it ends up getting stored.

792–793 ↗(On Diff #36417)

It's not possible to memcpy the value directly. The value would have to be stored in an object, and then that could be memcpy'd. But we'd notice the value escaping through the store.

sanjoy added a comment.Oct 3 2015, 6:49 PM

Right now GetUnderlyingObject does not look through PHI nodes, but if it starts to at some point, this change will fold %cmp in

ph:
 %k = alloca i32
 br label %loop

loop:
 %t = phi i32* [ %k, %ph ], [ gep(%k, 1), %loop ]
 %cmp = icmp eq %k, %arg
 br i1 %cmp, label %loop, label %leave

I don't know if that ^ is something you want, but I wanted to point it out.

lib/Transforms/InstCombine/InstCombineCompares.cpp
786 ↗(On Diff #36417)

How about assert(V == &ICI) here?

792 ↗(On Diff #36417)

Might be helpful to note that memset is safe only because you don't allow ptrtoint.

hans added a comment.Oct 5 2015, 9:39 AM

Right now GetUnderlyingObject does not look through PHI nodes, but if it starts to at some point, this change will fold %cmp [...]

That would be correct, though. Or are you saying there's a problem there?

lib/Transforms/InstCombine/InstCombineCompares.cpp
786 ↗(On Diff #36417)

It might find a different cmp first though, and then bail out later when it finds ICI.

792 ↗(On Diff #36417)

Thanks, I'll put a comment here.

hans updated this revision to Diff 36524.Oct 5 2015, 9:40 AM

Add comment about the memory intrinsics.

sanjoy added a comment.Oct 5 2015, 1:17 PM
In D13358#259819, @hans wrote:

Right now GetUnderlyingObject does not look through PHI nodes, but if it starts to at some point, this change will fold %cmp [...]

That would be correct, though. Or are you saying there's a problem there?

There's a potential problem here in justifying the transform -- in
the loop, the programmer is no longer doing a "one-off guess" anymore,
but systematically searching for the offset of a passed in argument
(which points to some slot in the caller's stack, say) from the
alloca. For instance, such a loop can allow you to implement
pointer subtraction without ptrtoint -- while (&ptra[diff++] != ptrb).

In other words, I understand the justification for the current change
to be that: an alloca can be allocated to any stack slot in the
current frame, so the compiler is allowed to fold any equality
comparisons to false. This seems fine. But a loop with an
incrementing induction variable can use an equality check to
"implement" an inequality check like ult or slt, and by folding
the != to true in while (&ptra[diff++] != ptrb), you're really
semantically folding ptra < ptrb to false. I personally think
this is not a problem, but I hope I was able to communicate what the
difference is.

lib/Transforms/InstCombine/InstCombineCompares.cpp
786 ↗(On Diff #36524)

Ah, ok. (This is completely optional) I'd be tempted to get rid of NumCmps and write the check as

if (U != AllocaUse)
  return nullptr;

where AllocaUse is the Use * of the alloca in ICI which you either pass in or compute.

hans added a comment.Oct 5 2015, 2:23 PM

There's a potential problem here in justifying the transform -- in
the loop, the programmer is no longer doing a "one-off guess" anymore,
but systematically searching for the offset of a passed in argument
(which points to some slot in the caller's stack, say) from the
alloca.

Thanks for clarifying. It's an interesting example :-)

I think we're still in the clear here. The way I see it, alloca has a lot of freedom in how it allocates memory, and we can act as if it allocated the memory in a place that the programmer isn't finding with their guesses.

hans added a comment.Oct 6 2015, 8:28 AM

Sanjoy, David: thanks for your comments so far. OK to commit?

hans updated this revision to Diff 36683.Oct 6 2015, 4:48 PM

Rebase. No change.

hans updated this revision to Diff 36685.Oct 6 2015, 4:56 PM
hans updated this object.

Bail early if we hit an instruction with many uses. In particular, don't cause WorkList to get heap allocated by adding elements to it that we're not going to process anyway.

I can has lgtm?

sanjoy added a comment.Oct 6 2015, 4:59 PM

FWIW, this LGTM, but given this is in territory I'm not wholly familiar with, I'll wait for @majnemer to take a look.

majnemer accepted this revision.Oct 6 2015, 5:07 PM
majnemer edited edge metadata.

LGTM with nits addressed.

lib/Transforms/InstCombine/InstCombineCompares.cpp
798 ↗(On Diff #36685)

Can you make this >=, would make it a little easier to reason about.

This revision is now accepted and ready to land.Oct 6 2015, 5:07 PM
hans marked an inline comment as done.Oct 6 2015, 5:21 PM
This revision was automatically updated to reflect the committed changes.