
Details

Reviewers

aaron.ballman
reames
majnemer
sunfish
hfinkel

Commits

rGf1f36517b7c6: InstCombine: Fold comparisons between unguessable allocas and other pointers
rL249490: InstCombine: Fold comparisons between unguessable allocas and other pointers

Summary

This will allow us to optimize code such as:

int f(int *p) {
  int x;
  return p == &x;
}

as well as:

int *allocate(void);
int f() {
  int x;
  int *p = allocate();
  return p == &x;
}

The folding can only be done under certain circumstances. Even though p and &x cannot alias, the comparison must still return true if the pointer representations are equal. If a user successfully generates a p that's a correct guess for &x, comparison should return true even though p is an invalid pointer.

This patch argues that if the address of the alloca isn't observable outside the function, the function can act as-if the address is impossible to guess from the outside. The tricky part is keeping the act consistent: if we fold p == &x to false in one place, we must make sure to fold any other comparisons based on those pointers similarly. To ensure that, we only fold when &x is involved exactly once in comparison instructions.
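To make the two conditions concrete, here is a small sketch of which comparisons the rule is meant to cover and which it is not. The function names and the `g_leak` global are hypothetical, made up for this illustration; they are not part of the patch.

```cpp
int *g_leak = nullptr; // hypothetical global, used only to leak an address

// Candidate for folding: &x never leaves the function and takes part in
// exactly one comparison, so the compiler may act as if p != &x.
int single_compare(int *p) {
  int x;
  return p == &x;
}

// Not a candidate: &x is stored to a global, which the patch treats as an
// escape because the address becomes observable outside the function.
int escapes(int *p) {
  int x;
  g_leak = &x;
  int result = p == &x;
  g_leak = nullptr; // do not leave a dangling pointer behind
  return result;
}

// Not a candidate: &x is involved in two comparisons, and folding only one
// of them could give inconsistent answers.
int two_compares(int *p, int *q) {
  int x;
  return (p == &x) + (q == &x);
}
```

Passing a null pointer to any of these returns 0 whether or not the comparison is folded, since the address of a live local is never null. The interesting inputs are the "lucky guess" pointers that the summary argues the function may treat as wrong guesses.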

Let me know what you think.

Diff Detail

Repository: rL LLVM

Event Timeline

hans updated this revision to Diff 36273. Oct 1 2015, 11:19 AM

hans retitled this revision from to InstSimplify: Fold comparisons between allocas and arguments under certain circumstances.

hans updated this object.

hans added reviewers: sunfish, reames, hfinkel, majnemer, aaron.ballman.

hans added subscribers: llvm-commits, hansw.

sanjoy added a subscriber: sanjoy. Oct 1 2015, 1:04 PM

sanjoy added inline comments.

lib/Analysis/InstructionSimplify.cpp
2130 ↗	(On Diff #36273)	I have a question about this second bit -- do we have to ensure that we only fold such comparisons consistently, or do we have to ensure that they also evaluate consistently? IOW, say I have

  void f(i8* %arg) {
    i8* %val = alloca
    %t0 = %val == %arg
    ...
    %t1 = complex_xform(%val) == %arg

such that `complex_xform(x) == x` always, but the compiler cannot prove that. Consequently we fold `%t0` to `false`, but don't fold `complex_xform(x) == x` at all, and it ends up evaluating to `true` at runtime. Are you relying on the fact that a `complex_xform` cannot be formed out of the operators you look through in `CanTrackAllUses`? I'd be more comfortable if you folded all comparisons based off of `%alloca` in one go.
2137 ↗	(On Diff #36273)	Can this be rewritten to use `PointerMayBeCaptured`?
2138 ↗	(On Diff #36273)	Minor: I'd call this `MaxIter`.
2163 ↗	(On Diff #36273)	Why not just use `APInt`'s constructor?
2168 ↗	(On Diff #36273)	`patchpoint`, `stackmap` and `gc.statepoint` all can escape values -- they're basically a call to an unknown function (I don't know if they'll show up as an `IntrinsicInst` though).

hans marked 2 inline comments as done. Oct 1 2015, 2:56 PM

hans added inline comments.

lib/Analysis/InstructionSimplify.cpp
2130 ↗	(On Diff #36273)	Yes, exactly: the comparisons have to evaluate consistently, so we have to fold them all, and I'm relying on `CanTrackAllUses` (maybe not a great name actually) to ensure that we can do that. I don't think I could fold all comparisons based on an `%alloca` in one go from InstSimplify, as it's not actually changing the instruction, just returning a new value for it. Maybe there's a better place to do this? On the other hand, I don't think multiple comparisons are very common. Maybe we should just bail out if we see another comparison? I'll do that for now.
2137 ↗	(On Diff #36273)	My check is much more conservative. For example, it won't follow the use through a phi node, and certainly won't allow it in a function call - even if the function is nocapture, it could make comparisons with the pointer. David pointed out that I should probably break this out into a separate function though, so I'll do that.
2168 ↗	(On Diff #36273)	Thanks. David also pointed out he had some intrinsic that would escape a pointer. I'll build a whitelist here.

Addressing Sanjoy's comments.

Actually, requiring that the alloca is only used in a single comparison makes everything much simpler. I also verified that it didn't make the optimization fire any less over Chromium (this patch reduced binary size by about 10k).

Also, this means we don't need to trace the Argument value. It's fine if it escapes, because we know it can't be compared against any values based on the alloca, since we only allow one cmp.

And now that we don't need to be able to trace the non-Argument side of the comparison, we don't have to restrict that side to just Arguments. In fact, under these conditions we can act as if the alloca doesn't alias pointers based on any other value, including e.g. pointers from malloc. (This is similar to the "an alloca which is only used in a single cmp can be trivially folded" argument in the patch Philip sent to the list.)

I think this makes the optimization much more powerful. (Unfortunately it only knocked 100 more bytes off the Chromium build.) I'll get an updated patch out tomorrow.
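The single-comparison rule described above can be sketched on a toy use graph. This is a simplified model under stated assumptions, not the code from the patch: `Kind`, `Node` and `foldableSingleCmp` are hypothetical names, and a store is split into two kinds (`StoreTo` for writing through the pointer, `StoreOf` for storing the pointer itself), whereas the real check inspects which operand of the store the pointer is.

```cpp
#include <vector>

// Toy stand-in for LLVM values; every name here is illustrative.
enum class Kind { Alloca, Cast, Gep, Load, StoreTo, StoreOf, Cmp, Call };

struct Node {
  Kind kind;
  std::vector<Node *> users; // instructions that use this value
};

// Returns true when the pointer rooted at `root` never escapes and reaches
// exactly one comparison, i.e. when that comparison may be folded.
bool foldableSingleCmp(Node &root, unsigned budget = 32) {
  std::vector<Node *> worklist(root.users.begin(), root.users.end());
  unsigned numCmps = 0;
  while (!worklist.empty()) {
    if (budget-- == 0)
      return false; // give up on large or cyclic use graphs
    Node *user = worklist.back();
    worklist.pop_back();
    switch (user->kind) {
    case Kind::Cast:
    case Kind::Gep:
      // Derived pointers: keep following their users.
      worklist.insert(worklist.end(), user->users.begin(), user->users.end());
      break;
    case Kind::Load:
    case Kind::StoreTo:
      break; // reading or writing through the pointer does not leak it
    case Kind::Cmp:
      if (++numCmps > 1)
        return false; // a second comparison could observe an inconsistency
      break;
    default:
      return false; // StoreOf, Call, ...: the address may escape
    }
  }
  return numCmps == 1;
}
```

Note that nothing in this walk looks at the other operand of the comparison, which matches the observation above that the other side no longer needs to be an Argument.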

hans updated this revision to Diff 36384. Oct 2 2015, 11:14 AM

hans retitled this revision from InstSimplify: Fold comparisons between allocas and arguments under certain circumstances to InstSimplify: Fold comparisons between unguessable allocas and other pointers.

hans updated this object.

Moving this to instcombine.

The previous patch ran into a terrifying bug where instsimplify would get run on IR that wasn't fully constructed yet, which meant it couldn't see all uses of an alloca and folded a bit too aggressively.

This new patch successfully bootstraps Clang. The optimization fires 67 times during bootstrap and reduces Clang binary size by 8519 bytes.

majnemer added inline comments. Oct 3 2015, 5:41 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
780 ↗	(On Diff #36417)	Is this sufficient? What if `getValueOperand` is not `U->get()` but, say, a bitcast of `U->get()` ?
792–793 ↗	(On Diff #36417)	Can't `memcpy` escape the value just like a `StoreInst` would?

hans added inline comments. Oct 3 2015, 5:53 PM

lib/Transforms/InstCombine/InstCombineCompares.cpp
780 ↗	(On Diff #36417)	I think so. Such a bitcast would show up as a use of our value, so we'd track it and see if it ends up getting stored.
792–793 ↗	(On Diff #36417)	It's not possible to memcpy the value directly. The value would have to be stored in an object, and then that could be memcpy'd. But we'd notice the value escaping through the store.

Right now GetUnderlyingObject does not look through PHI nodes, but if it starts to at some point, this change will fold %cmp in

ph:
 %k = alloca i32
 br label %loop

loop:
 %t = phi i32* [ %k, %ph ], [ gep(%k, 1), %loop ]
 %cmp = icmp eq %k, %arg
 br i1 %cmp, label %loop, label %leave

I don't know if that ^ is something you want, but I wanted to point it out.

lib/Transforms/InstCombine/InstCombineCompares.cpp
786 ↗	(On Diff #36417)	How about `assert(V == &ICI)` here?
792 ↗	(On Diff #36417)	Might be helpful to note that `memset` is safe only because you don't allow `ptrtoint`.

Right now GetUnderlyingObject does not look through PHI nodes, but if it starts to at some point, this change will fold %cmp [...]

That would be correct, though. Or are you saying there's a problem there?

lib/Transforms/InstCombine/InstCombineCompares.cpp
786 ↗	(On Diff #36417)	It might find a different cmp first though, and then bail out later when it finds ICI.
792 ↗	(On Diff #36417)	Thanks, I'll put a comment here.

Add comment about the memory intrinsics.

In D13358#259819, @hans wrote:

Right now GetUnderlyingObject does not look through PHI nodes, but if it starts to at some point, this change will fold %cmp [...]

That would be correct, though. Or are you saying there's a problem there?

There's a potential problem here in justifying the transform -- in
the loop, the programmer is no longer doing a "one-off guess" anymore,
but systematically searching for the offset of a passed in argument
(which points to some slot in the caller's stack, say) from the
alloca. For instance, such a loop can allow you to implement
pointer subtraction without ptrtoint -- while (&ptra[diff++] != ptrb).

In other words, I understand the justification for the current change
to be that: an alloca can be allocated to any stack slot in the
current frame, so the compiler is allowed to fold any equality
comparisons to false. This seems fine. But a loop with an
incrementing induction variable can use an equality check to
"implement" an inequality check like ult or slt, and by folding
the != to true in while (&ptra[diff++] != ptrb), you're really
semantically folding ptra < ptrb to false. I personally think
this is not a problem, but I hope I was able to communicate what the
difference is.
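As a rough illustration of the loop idiom mentioned above, the following sketch recovers a pointer distance using nothing but equality comparisons. The function name `offsetByEquality` and the `limit` parameter are assumptions added for this example. It also restricts itself to two pointers into the same array so that every comparison is well-defined; the scenario in the comment concerns pointers into different objects, which is exactly where the fold changes what the loop would observe.

```cpp
#include <cstddef>

// Recover the element distance from base to target using only equality
// comparisons. Assumes target lies in [base, base + limit), i.e. both
// pointers refer to the same array.
std::ptrdiff_t offsetByEquality(const int *base, const int *target,
                                std::ptrdiff_t limit) {
  for (std::ptrdiff_t diff = 0; diff < limit; ++diff)
    if (&base[diff] == target)
      return diff;
  return -1; // not found within the limit
}
```

For an `int arr[8]`, calling `offsetByEquality(arr, &arr[3], 8)` walks four elements and returns 3, the same value `&arr[3] - arr` would give.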

lib/Transforms/InstCombine/InstCombineCompares.cpp
786 ↗	(On Diff #36524)	Ah, ok. (This is completely optional) I'd be tempted to get rid of `NumCmps` and write the check as `if (U != AllocaUse) return nullptr;` where `AllocaUse` is the `Use *` of the `alloca` in `ICI` which you either pass in or compute.

There's a potential problem here in justifying the transform -- in
the loop, the programmer is no longer doing a "one-off guess" anymore,
but systematically searching for the offset of a passed in argument
(which points to some slot in the caller's stack, say) from the
alloca.

Thanks for clarifying. It's an interesting example :-)

I think we're still in the clear here. The way I see it, alloca has a lot of freedom in how it allocates memory, and we can act as if it allocated the memory in a place that the programmer isn't finding with their guesses.

Sanjoy, David: thanks for your comments so far. OK to commit?

Rebase. No change.

Bail early if we hit an instruction with many uses. In particular, don't cause WorkList to get heap allocated by adding elements to it that we're not going to process anyway.
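A minimal sketch of that bail-early budget, on a plain integer graph rather than LLVM uses. `walkWithinBudget` is a hypothetical helper, and `std::vector` with `reserve` is only an analogy for a `SmallVector` with inline storage: the point is that the pending list is never allowed to grow past the remaining budget, so it never needs more room than was set aside up front.

```cpp
#include <vector>

// Walks a graph given as adjacency lists, starting from `start`, and reports
// whether the walk finished within the budget. Nodes are not marked as
// visited, so the budget alone is what breaks cycles.
bool walkWithinBudget(const std::vector<std::vector<int>> &succs, int start,
                      unsigned maxIter = 32) {
  std::vector<int> worklist;
  worklist.reserve(maxIter); // the pending list never grows past the budget
  auto pushAll = [&](int node) {
    for (int next : succs[node]) {
      if (worklist.size() >= maxIter)
        return false; // would exceed the remaining budget: bail early
      worklist.push_back(next);
    }
    return true;
  };
  if (!pushAll(start))
    return false;
  while (!worklist.empty()) {
    int node = worklist.back();
    worklist.pop_back();
    --maxIter; // each processed item consumes one unit of budget
    if (!pushAll(node))
      return false;
  }
  return true;
}
```

Because the size check happens before each push, the number of processed items plus the number of pending items stays within the initial budget, and a node with more successors than the budget is rejected before anything beyond the budget is queued.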

I can has lgtm?

FWIW, this LGTM, but given this is in territory I'm not wholly familiar with, I'll wait for @majnemer to take a look.

LGTM with nits addressed.

lib/Transforms/InstCombine/InstCombineCompares.cpp
798 ↗	(On Diff #36685)	Can you make this `>=`? It would make it a little easier to reason about.

This revision is now accepted and ready to land. Oct 6 2015, 5:07 PM

Closed by commit rL249490: InstCombine: Fold comparisons between unguessable allocas and other pointers (authored by hans). Oct 6 2015, 5:22 PM

This revision was automatically updated to reflect the committed changes.

Diff 36687

llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp

    if (GEPsInBounds && (isa<ConstantExpr>(GEPLHS) || GEPLHS->hasOneUse()) &&
      [...]
      Value *L = EmitGEPOffset(GEPLHS);
      Value *R = EmitGEPOffset(GEPRHS);
      return new ICmpInst(ICmpInst::getSignedPredicate(Cond), L, R);
    }
  }
  return nullptr;
}

Instruction *InstCombiner::FoldAllocaCmp(ICmpInst &ICI, AllocaInst *Alloca,
                                         Value *Other) {
  assert(ICI.isEquality() && "Cannot fold non-equality comparison.");

  // It would be tempting to fold away comparisons between allocas and any
  // pointer not based on that alloca (e.g. an argument). However, even
  // though such pointers cannot alias, they can still compare equal.
  //
  // But LLVM doesn't specify where allocas get their memory, so if the alloca
  // doesn't escape we can argue that it's impossible to guess its value, and we
  // can therefore act as if any such guesses are wrong.
  //
  // The code below checks that the alloca doesn't escape, and that it's only
  // used in a comparison once (the current instruction). The
  // single-comparison-use condition ensures that we're trivially folding all
  // comparisons against the alloca consistently, and avoids the risk of
  // erroneously folding a comparison of the pointer with itself.

  unsigned MaxIter = 32; // Break cycles and bound to constant-time.

  SmallVector<Use *, 32> Worklist;
  for (Use &U : Alloca->uses()) {
    if (Worklist.size() >= MaxIter)
      return nullptr;
    Worklist.push_back(&U);
  }

  unsigned NumCmps = 0;
  while (!Worklist.empty()) {
    assert(Worklist.size() <= MaxIter);
    Use *U = Worklist.pop_back_val();
    Value *V = U->getUser();
    --MaxIter;

    if (isa<BitCastInst>(V) || isa<GetElementPtrInst>(V) || isa<PHINode>(V) ||
        isa<SelectInst>(V)) {
      // Track the uses.
    } else if (isa<LoadInst>(V)) {
      // Loading from the pointer doesn't escape it.
      continue;
    } else if (auto *SI = dyn_cast<StoreInst>(V)) {
      // Storing to the pointer is fine, but storing the pointer escapes it.
      if (SI->getValueOperand() == U->get())
        return nullptr;
      continue;
    } else if (isa<ICmpInst>(V)) {
      if (NumCmps++)
        return nullptr; // Found more than one cmp.
      continue;
    } else if (auto *Intrin = dyn_cast<IntrinsicInst>(V)) {
      switch (Intrin->getIntrinsicID()) {
        // These intrinsics don't escape or compare the pointer. Memset is safe
        // because we don't allow ptrtoint. Memcpy and memmove are safe because
        // we don't allow stores, so src cannot point to V.
        case Intrinsic::lifetime_start: case Intrinsic::lifetime_end:
        case Intrinsic::dbg_declare: case Intrinsic::dbg_value:
        case Intrinsic::memcpy: case Intrinsic::memmove: case Intrinsic::memset:
          continue;
        default:
          return nullptr;
      }
    } else {
      return nullptr;
    }
    for (Use &U : V->uses()) {
      if (Worklist.size() >= MaxIter)
        return nullptr;
      Worklist.push_back(&U);
    }
  }

  Type *CmpTy = CmpInst::makeCmpResultType(Other->getType());
  return ReplaceInstUsesWith(
      ICI,
      ConstantInt::get(CmpTy, !CmpInst::isTrueWhenEqual(ICI.getPredicate())));
}

/// FoldICmpAddOpCst - Fold "icmp pred (X+CI), X".
Instruction *InstCombiner::FoldICmpAddOpCst(Instruction &ICI,
                                            Value *X, ConstantInt *CI,
                                            ICmpInst::Predicate Pred) {
  // From this point on, we know that (X+C <= X) --> (X+C < X) because C != 0,
  // so the values can never be equal. Similarly for all other "or equals"
  // operators.

Instruction *InstCombiner::visitICmpInst(ICmpInst &I) {
  [...]
  if (GEPOperator *GEP = dyn_cast<GEPOperator>(Op0))
    if (Instruction *NI = FoldGEPICmp(GEP, Op1, I.getPredicate(), I))
      return NI;
  if (GEPOperator *GEP = dyn_cast<GEPOperator>(Op1))
    if (Instruction *NI = FoldGEPICmp(GEP, Op0,
                           ICmpInst::getSwappedPredicate(I.getPredicate()), I))
      return NI;

  // Try to optimize equality comparisons against alloca-based pointers.
  if (Op0->getType()->isPointerTy() && I.isEquality()) {
    assert(Op1->getType()->isPointerTy() && "Comparing pointer with non-pointer?");
    if (auto *Alloca = dyn_cast<AllocaInst>(GetUnderlyingObject(Op0, DL)))
      if (Instruction *New = FoldAllocaCmp(I, Alloca, Op1))
        return New;
    if (auto *Alloca = dyn_cast<AllocaInst>(GetUnderlyingObject(Op1, DL)))
      if (Instruction *New = FoldAllocaCmp(I, Alloca, Op0))
        return New;
  }

  // Test to see if the operands of the icmp are casted versions of other
  // values. If the ptr->ptr cast can be stripped off both arguments, we do so
  // now.
  if (BitCastInst *CI = dyn_cast<BitCastInst>(Op0)) {
    if (Op0->getType()->isPointerTy() &&
        (isa<Constant>(Op1) || isa<BitCastInst>(Op1))) {
      // We keep moving the cast from the left operand over to the right
      // operand, where it can often be eliminated completely.

llvm/trunk/lib/Transforms/InstCombine/InstCombineInternal.h

public:
  [...]
  Instruction *FoldICmpCstShrCst(ICmpInst &I, Value *Op, Value *A,
                                 ConstantInt *CI1, ConstantInt *CI2);
  Instruction *FoldICmpCstShlCst(ICmpInst &I, Value *Op, Value *A,
                                 ConstantInt *CI1, ConstantInt *CI2);
  Instruction *FoldICmpAddOpCst(Instruction &ICI, Value *X, ConstantInt *CI,
                                ICmpInst::Predicate Pred);
  Instruction *FoldGEPICmp(GEPOperator *GEPLHS, Value *RHS,
                           ICmpInst::Predicate Cond, Instruction &I);
  Instruction *FoldAllocaCmp(ICmpInst &ICI, AllocaInst *Alloca, Value *Other);
  Instruction *FoldShiftByConstant(Value *Op0, Constant *Op1,
                                   BinaryOperator &I);
  Instruction *commonCastTransforms(CastInst &CI);
  Instruction *commonPointerCastTransforms(CastInst &CI);
  Instruction *visitTrunc(TruncInst &CI);
  Instruction *visitZExt(ZExtInst &CI);
  Instruction *visitSExt(SExtInst &CI);
  Instruction *visitFPTrunc(FPTruncInst &CI);

llvm/trunk/test/Transforms/InstCombine/compare-alloca.ll

; RUN: opt -instcombine -S %s | FileCheck %s
target datalayout = "p:32:32"


define i1 @alloca_argument_compare(i64* %arg) {
  %alloc = alloca i64
  %cmp = icmp eq i64* %arg, %alloc
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare
  ; CHECK: ret i1 false
}

define i1 @alloca_argument_compare_swapped(i64* %arg) {
  %alloc = alloca i64
  %cmp = icmp eq i64* %alloc, %arg
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare_swapped
  ; CHECK: ret i1 false
}

define i1 @alloca_argument_compare_ne(i64* %arg) {
  %alloc = alloca i64
  %cmp = icmp ne i64* %arg, %alloc
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare_ne
  ; CHECK: ret i1 true
}

define i1 @alloca_argument_compare_derived_ptrs(i64* %arg, i64 %x) {
  %alloc = alloca i64, i64 8
  %p = getelementptr i64, i64* %arg, i64 %x
  %q = getelementptr i64, i64* %alloc, i64 3
  %cmp = icmp eq i64* %p, %q
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare_derived_ptrs
  ; CHECK: ret i1 false
}

declare void @escape(i64*)
define i1 @alloca_argument_compare_escaped_alloca(i64* %arg) {
  %alloc = alloca i64
  call void @escape(i64* %alloc)
  %cmp = icmp eq i64* %alloc, %arg
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare_escaped_alloca
  ; CHECK: %cmp = icmp eq i64* %alloc, %arg
  ; CHECK: ret i1 %cmp
}

declare void @check_compares(i1, i1)
define void @alloca_argument_compare_two_compares(i64* %p) {
  %q = alloca i64, i64 8
  %r = getelementptr i64, i64* %p, i64 1
  %s = getelementptr i64, i64* %q, i64 2
  %cmp1 = icmp eq i64* %p, %q
  %cmp2 = icmp eq i64* %r, %s
  call void @check_compares(i1 %cmp1, i1 %cmp2)
  ret void
  ; We will only fold if there is a single cmp.
  ; CHECK-LABEL: alloca_argument_compare_two_compares
  ; CHECK: call void @check_compares(i1 %cmp1, i1 %cmp2)
}

define i1 @alloca_argument_compare_escaped_through_store(i64* %arg, i64** %ptr) {
  %alloc = alloca i64
  %cmp = icmp eq i64* %alloc, %arg
  %p = getelementptr i64, i64* %alloc, i64 1
  store i64* %p, i64** %ptr
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare_escaped_through_store
  ; CHECK: %cmp = icmp eq i64* %alloc, %arg
  ; CHECK: ret i1 %cmp
}

declare void @llvm.lifetime.start(i64, i8* nocapture)
declare void @llvm.lifetime.end(i64, i8* nocapture)
define i1 @alloca_argument_compare_benign_instrs(i8* %arg) {
  %alloc = alloca i8
  call void @llvm.lifetime.start(i64 1, i8* %alloc)
  %cmp = icmp eq i8* %arg, %alloc
  %x = load i8, i8* %arg
  store i8 %x, i8* %alloc
  call void @llvm.lifetime.end(i64 1, i8* %alloc)
  ret i1 %cmp
  ; CHECK-LABEL: alloca_argument_compare_benign_instrs
  ; CHECK: ret i1 false
}

declare i64* @allocator()
define i1 @alloca_call_compare() {
  %p = alloca i64
  %q = call i64* @allocator()
  %cmp = icmp eq i64* %p, %q
  ret i1 %cmp
  ; CHECK-LABEL: alloca_call_compare
  ; CHECK: ret i1 false
}

This is an archive of the discontinued LLVM Phabricator instance.

InstCombine: Fold comparisons between unguessable allocas and other pointers
ClosedPublic
