This is an archive of the discontinued LLVM Phabricator instance.

Differential D54170

[InstCombine][SelectionDAG][AArch64] fold gep into select to enable speculation of load
Abandoned · Public

Authored by labrinea on Nov 6 2018, 11:40 AM.

Details

Reviewers

spatel
efriedma
llvm-commits
john.brawn

Summary

The motivation is to reduce the stall cycles spent while a load waits for its address to become available. I am using InstCombine to fold a GEP into a select when that seems profitable. This can enable speculation of loads, which InstCombine already performs. At the moment the DAGCombiner tries to revert the folding of a load into a select.

I am planning to break this patch down into smaller pieces, but first I wanted to show what I am trying to achieve and hopefully get some feedback. Another place this optimization might fit is CodeGenPrepare. I think implementing it in the backend is not an option, as by then it is too late to know whether we can safely speculate the loads.

The codegen test below shows the motivating example: with this change the load no longer has to wait for the select.
Before:

add x8, x20, #8
add x9, x20, #4
cmp w0, w19
csel x8, x8, x9, gt
ldr w0, [x8]

After:

ldp w8, w9, [x20, #4]
cmp w0, w19
csel w0, w9, w8, gt

This change improves an internal benchmark by approximately 4%.
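
To make the before/after sequences concrete, here is a minimal C++ sketch of the two code shapes. It is an illustration only and not part of the patch: the struct layout mirrors %struct.A from the tests, and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical layout mirroring %struct.A = type { [2 x i32], i32, i32 }.
struct A {
  std::int32_t arr[2];
  std::int32_t x;
  std::int32_t y;
};

// "Before" shape: select an address, then load through it.
// The load cannot start until the select has produced the address.
std::int32_t loadAfterSelect(const A *a, bool cond) {
  const std::int32_t *p = cond ? &a->x : &a->arr[1];  // byte offsets 8 and 4
  return *p;
}

// "After" shape: load both candidates unconditionally, then select the value.
// This is only valid when both locations are known to be readable.
std::int32_t selectAfterLoad(const A *a, bool cond) {
  std::int32_t t = a->x;       // byte offset 8
  std::int32_t f = a->arr[1];  // byte offset 4
  return cond ? t : f;
}
```

Both functions return the same value whenever both members are readable; the second shape simply lets the two loads issue before the comparison resolves.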

Event Timeline

labrinea created this revision. Nov 6 2018, 11:40 AM

Herald added subscribers: kristof.beyls, javed.absar. Nov 6 2018, 11:40 AM

efriedma added a comment.

This looks like a bunch of separate changes which should be split into multiple patches, especially the changes to DAGCombine and InstCombiner::visitZExt.

test/Transforms/InstCombine/gep-select.ll
29	I'm pretty sure this isn't safe, in general. The "inbounds" marker only guarantees that the arithmetic is in bounds; it doesn't make any guarantees about the type of the pointer.

labrinea added a comment.

In D54170#1289220, @efriedma wrote:

This looks like a bunch of separate changes which should be split into multiple patches, especially the changes to DAGCombine and InstCombiner::visitZExt.

Sure, I've explained in the description why I put everything together in this revision.

test/Transforms/InstCombine/gep-select.ll
29	My comment was not referring to the inbounds marker. I am actually checking whether the indices derived from the select fall within the bounds of a static type. In this example the last index of each GEP (`i32 1` and `i32 2` respectively) is a valid member index of the struct (the last two i32 members of struct.A).

efriedma added inline comments. Nov 6 2018, 2:35 PM

test/Transforms/InstCombine/gep-select.ll
29	The static type of a pointer or GEP can't be used to prove anything about the memory it points to. In C, accessing a member does imply the base pointer points to a valid struct, but we don't record that anywhere in IR. (IIRC there was some discussion of trying to add metadata to model this, but it never went anywhere.)
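
The objection above can be illustrated with a toy memory model. This is a sketch under stated assumptions, not LLVM code: `Allocation`, `loadOfSelect` and `selectOfLoads` are hypothetical names, and the byte offsets 8 and 12 are those of the last two i32 members of { [2 x i32], i32, i32 }. If the pointer actually refers to an object smaller than the static type suggests, the select-then-load form touches only valid memory, while the speculated form also touches memory past the end.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy allocation: a load succeeds only if it lies entirely inside the buffer.
struct Allocation {
  std::vector<unsigned char> bytes;

  bool load32(std::size_t offset, std::int32_t &out) const {
    if (offset > bytes.size() || bytes.size() - offset < sizeof(std::int32_t))
      return false;  // models an out-of-bounds access
    std::memcpy(&out, bytes.data() + offset, sizeof out);
    return true;
  }
};

// Byte offsets of struct members 1 and 2 in { [2 x i32], i32, i32 }.
constexpr std::size_t kOffsetField1 = 8;
constexpr std::size_t kOffsetField2 = 12;

// load(select c, gep1, gep2): touches only the selected location.
bool loadOfSelect(const Allocation &m, std::size_t base, bool c,
                  std::int32_t &out) {
  return m.load32(base + (c ? kOffsetField1 : kOffsetField2), out);
}

// select(c, load gep1, load gep2): touches both locations unconditionally.
bool selectOfLoads(const Allocation &m, std::size_t base, bool c,
                   std::int32_t &out) {
  std::int32_t t = 0, f = 0;
  if (!m.load32(base + kOffsetField1, t) || !m.load32(base + kOffsetField2, f))
    return false;  // one of the speculated loads was out of bounds
  out = c ? t : f;
  return true;
}
```

With a 12-byte allocation and the condition true, `loadOfSelect` succeeds but `selectOfLoads` fails, even though both GEP indices are valid for the static struct type.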

john.brawn resigned from this revision. May 12 2020, 6:46 AM

Herald added a subscriber: danielkiss. May 12 2020, 6:46 AM

labrinea abandoned this revision. Jan 28 2022, 3:12 AM

Herald added subscribers: ecnelises, steven.zhang. Jan 28 2022, 3:12 AM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	6 lines
lib/Transforms/InstCombine/InstCombineCasts.cpp	13 lines
lib/Transforms/InstCombine/InstCombineInternal.h	2 lines
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp	50 lines
lib/Transforms/InstCombine/InstCombineSelect.cpp	7 lines
lib/Transforms/InstCombine/InstructionCombining.cpp	36 lines
test/CodeGen/AArch64/select-load.ll	6 lines
test/Transforms/InstCombine/gep-select.ll	101 lines

Diff 172790

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

bool DAGCombiner::SimplifySelectOps(SDNode *TheSelect, SDValue LHS,
// If this is a load and the token chain is identical, replace the select		// If this is a load and the token chain is identical, replace the select
// of two loads with a load through a select of the address to load from.		// of two loads with a load through a select of the address to load from.
// This triggers in things like "select bool X, 10.0, 123.0" after the FP		// This triggers in things like "select bool X, 10.0, 123.0" after the FP
// constants have been dropped into the constant pool.		// constants have been dropped into the constant pool.
if (LHS.getOpcode() == ISD::LOAD) {		if (LHS.getOpcode() == ISD::LOAD) {
LoadSDNode *LLD = cast<LoadSDNode>(LHS);		LoadSDNode *LLD = cast<LoadSDNode>(LHS);
LoadSDNode *RLD = cast<LoadSDNode>(RHS);		LoadSDNode *RLD = cast<LoadSDNode>(RHS);

		unsigned RequiredAlignment;
		EVT LDVT = LLD->getMemoryVT();
		if (DAG.areNonVolatileConsecutiveLoads(LLD, RLD, LDVT.getStoreSize(), 1) &&
		TLI.hasPairedLoad(LDVT, RequiredAlignment))
		return false;

// Token chains must be identical.		// Token chains must be identical.
if (LHS.getOperand(0) != RHS.getOperand(0) ||		if (LHS.getOperand(0) != RHS.getOperand(0) ||
// Do not let this transformation reduce the number of volatile loads.		// Do not let this transformation reduce the number of volatile loads.
LLD->isVolatile() || RLD->isVolatile() ||		LLD->isVolatile() || RLD->isVolatile() ||
// FIXME: If either is a pre/post inc/dec load,		// FIXME: If either is a pre/post inc/dec load,
// we'd need to split out the address adjustment.		// we'd need to split out the address adjustment.
LLD->isIndexed() || RLD->isIndexed() ||		LLD->isIndexed() || RLD->isIndexed() ||
// If this is an EXTLOAD, the VT's must match.		// If this is an EXTLOAD, the VT's must match.

lib/Transforms/InstCombine/InstCombineCasts.cpp

	}			}

	Instruction *InstCombiner::visitZExt(ZExtInst &CI) {			Instruction *InstCombiner::visitZExt(ZExtInst &CI) {
	// If this zero extend is only used by a truncate, let the truncate be			// If this zero extend is only used by a truncate, let the truncate be
	// eliminated before we try to optimize this zext.			// eliminated before we try to optimize this zext.
	if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))			if (CI.hasOneUse() && isa<TruncInst>(CI.user_back()))
	return nullptr;			return nullptr;

				// zext(cmp) ~> select cmp, 1, 0
				// Canonicalize into select to give the opportunity of folding a gep
				// into the select. This will eventually turn back to zext(cmp) if the
				// gep doesn't fold.
				if (CI.hasOneUse() && isa<GetElementPtrInst>(CI.user_back()) &&
				cast<GEPOperator>(CI.user_back())->countNonConstantIndices() == 1) {
				if (auto *Cmp = dyn_cast<CmpInst>(CI.getOperand(0))) {
				auto *TV = ConstantInt::get(cast<IntegerType>(CI.getType()), 1);
				auto *FV = ConstantInt::get(cast<IntegerType>(CI.getType()), 0);
				return SelectInst::Create(Cmp, TV, FV);
				}
				}

	// If one of the common conversion will work, do it.			// If one of the common conversion will work, do it.
	if (Instruction *Result = commonCastTransforms(CI))			if (Instruction *Result = commonCastTransforms(CI))
	return Result;			return Result;

	Value *Src = CI.getOperand(0);			Value *Src = CI.getOperand(0);
	Type *SrcTy = Src->getType(), *DestTy = CI.getType();			Type *SrcTy = Src->getType(), *DestTy = CI.getType();

	// Attempt to extend the entire input expression tree to the destination			// Attempt to extend the entire input expression tree to the destination

lib/Transforms/InstCombine/InstCombineInternal.h

Value *simplifyAMDGCNMemoryIntrinsicDemanded(IntrinsicInst *II,
int DmaskIdx = -1);		int DmaskIdx = -1);

Value *SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,		Value *SimplifyDemandedVectorElts(Value *V, APInt DemandedElts,
APInt &UndefElts, unsigned Depth = 0);		APInt &UndefElts, unsigned Depth = 0);

/// Canonicalize the position of binops relative to shufflevector.		/// Canonicalize the position of binops relative to shufflevector.
Instruction *foldVectorBinop(BinaryOperator &Inst);		Instruction *foldVectorBinop(BinaryOperator &Inst);

		Instruction *FoldGEPIntoSelect(GetElementPtrInst &GEP, SelectInst *SI);

/// Given a binary operator, cast instruction, or select which has a PHI node		/// Given a binary operator, cast instruction, or select which has a PHI node
/// as operand #0, see if we can fold the instruction into the PHI (which is		/// as operand #0, see if we can fold the instruction into the PHI (which is
/// only possible if all operands to the PHI are constants).		/// only possible if all operands to the PHI are constants).
Instruction *foldOpIntoPhi(Instruction &I, PHINode *PN);		Instruction *foldOpIntoPhi(Instruction &I, PHINode *PN);

/// Given an instruction with a select as one operand and a constant as the		/// Given an instruction with a select as one operand and a constant as the
/// other operand, try to fold the binary operator into the select arguments.		/// other operand, try to fold the binary operator into the select arguments.
/// This also works for Cast instructions, which obviously do not have a		/// This also works for Cast instructions, which obviously do not have a

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

static bool canSimplifyNullLoadOrGEP(LoadInst &LI, Value *Op) {
}		}
if (isa<UndefValue>(Op) ||		if (isa<UndefValue>(Op) ||
(isa<ConstantPointerNull>(Op) &&		(isa<ConstantPointerNull>(Op) &&
!NullPointerIsDefined(LI.getFunction(), LI.getPointerAddressSpace())))		!NullPointerIsDefined(LI.getFunction(), LI.getPointerAddressSpace())))
return true;		return true;
return false;		return false;
}		}

		// If we are loading from a select of GEPs, unless either of them was going to
		// trap anyway, we can safely speculate the load as long as we can statically
		// infer that the memory locations from both GEPs are in bounds.
		static bool isSafeToSpeculateLoad(Value *TValue, Value *FValue, LoadInst &LI,
		const DataLayout &DL) {
		assert(LI.isUnordered() && "Illegal speculation of ordered load");

		if (isa<GetElementPtrInst>(TValue) && isa<GetElementPtrInst>(FValue)) {
		auto &TGEP = *cast<GetElementPtrInst>(TValue);
		auto &FGEP = *cast<GetElementPtrInst>(FValue);

		// Returns the index of the different operand between two
		// Users as long as the rest of the operands are identical.
		// Returns -1 otherwise.
		auto HaveIdenticalOperandsExcept = [] (User &U1, User &U2) {
		if (U1.getNumOperands() != U2.getNumOperands())
		return -1;
		for (unsigned Idx = 0; Idx < U1.getNumOperands(); ++Idx)
		if (U1.getOperand(Idx) != U2.getOperand(Idx))
		if (std::equal(U1.op_begin()+Idx+1, U1.op_end(), U2.op_begin()+Idx+1))
		return (int)Idx;
		return -1;
		};

		int Idx = HaveIdenticalOperandsExcept(TGEP, FGEP);
		// Disregard the pointer operand.
		if (Idx > 0) {
		auto *C1 = dyn_cast<ConstantInt>(*(TGEP.op_begin()+Idx));
		auto *C2 = dyn_cast<ConstantInt>(*(FGEP.op_begin()+Idx));
		if (C1 && C2) {
		SmallVector<Value *, 8> Indices(TGEP.op_begin()+1, TGEP.op_begin()+Idx);
		uint64_t Max = std::max(C1->getLimitedValue(), C2->getLimitedValue());
		Type *GepRetTy = GetElementPtrInst::getGEPReturnType(
		TGEP.getPointerOperand(), Indices);
		Type *ElemTy = GepRetTy->getPointerElementType();
		if ((ElemTy->isArrayTy() && ElemTy->getArrayNumElements() > Max) ||
		(ElemTy->isStructTy() && ElemTy->getStructNumElements() > Max))
		return true;
		}
		}
		}

		// Fall-through to the conservative speculation logic.
		return isSafeToLoadUnconditionally(TValue, LI.getAlignment(), DL, &LI) &&
		isSafeToLoadUnconditionally(FValue, LI.getAlignment(), DL, &LI);
		}

Instruction *InstCombiner::visitLoadInst(LoadInst &LI) {		Instruction *InstCombiner::visitLoadInst(LoadInst &LI) {
Value *Op = LI.getOperand(0);		Value *Op = LI.getOperand(0);

// Try to canonicalize the loaded type.		// Try to canonicalize the loaded type.
if (Instruction *Res = combineLoadToOperationType(*this, LI))		if (Instruction *Res = combineLoadToOperationType(*this, LI))
return Res;		return Res;

// Attempt to improve the alignment.		// Attempt to improve the alignment.
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	if (Op->hasOneUse()) {
// introduced loads cannot trap! Something like this is valid as long as		// introduced loads cannot trap! Something like this is valid as long as
// the condition is always false: load (select bool %C, int* null, int* %G),		// the condition is always false: load (select bool %C, int* null, int* %G),
// but it would not be valid if we transformed it to load from null		// but it would not be valid if we transformed it to load from null
// unconditionally.		// unconditionally.
//		//
if (SelectInst *SI = dyn_cast<SelectInst>(Op)) {		if (SelectInst *SI = dyn_cast<SelectInst>(Op)) {
// load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2).		// load (select (Cond, &V1, &V2)) --> select(Cond, load &V1, load &V2).
unsigned Align = LI.getAlignment();		unsigned Align = LI.getAlignment();
if (isSafeToLoadUnconditionally(SI->getOperand(1), Align, DL, SI) &&		if (isSafeToSpeculateLoad(SI->getOperand(1), SI->getOperand(2), LI, DL)) {
isSafeToLoadUnconditionally(SI->getOperand(2), Align, DL, SI)) {
LoadInst *V1 = Builder.CreateLoad(SI->getOperand(1),		LoadInst *V1 = Builder.CreateLoad(SI->getOperand(1),
SI->getOperand(1)->getName()+".val");		SI->getOperand(1)->getName()+".val");
LoadInst *V2 = Builder.CreateLoad(SI->getOperand(2),		LoadInst *V2 = Builder.CreateLoad(SI->getOperand(2),
SI->getOperand(2)->getName()+".val");		SI->getOperand(2)->getName()+".val");
assert(LI.isUnordered() && "implied by above");		assert(LI.isUnordered() && "implied by above");
V1->setAlignment(Align);		V1->setAlignment(Align);
V1->setAtomic(LI.getOrdering(), LI.getSyncScopeID());		V1->setAtomic(LI.getOrdering(), LI.getSyncScopeID());
V2->setAlignment(Align);		V2->setAlignment(Align);
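
The structural check inside isSafeToSpeculateLoad in this hunk (two GEPs whose operand lists match except at one constant index, with both constants below the aggregate's element count) can be sketched outside LLVM as follows. This is a simplified model, not the patch's code: operand lists are plain integer vectors and both function names are hypothetical. Unlike the lambda in the patch, which keeps scanning after a mismatch whose tail differs, this sketch rejects any input with more than one mismatch; that is an assumption about the intended behaviour. As the review comments point out, passing such a type-based check does not prove that the memory is dereferenceable.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the position of the single differing element, or -1 if the lists
// have different lengths, are identical, or differ in more than one position.
int singleDifferingIndex(const std::vector<long> &a,
                         const std::vector<long> &b) {
  if (a.size() != b.size())
    return -1;
  int found = -1;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] == b[i])
      continue;
    if (found != -1)
      return -1;  // second mismatch: the lists are not "almost identical"
    found = static_cast<int>(i);
  }
  return found;
}

// Both constant indices must be below the element count of the indexed
// array or struct type.
bool bothIndicesInBounds(std::uint64_t c1, std::uint64_t c2,
                         std::uint64_t numElements) {
  return std::max(c1, c2) < numElements;
}
```

For test1 in gep-select.ll the differing indices are 1 and 2 into a three-element struct, so the bound check passes; for test3 the indices are 2 and 1 into a two-element array, so it fails.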

lib/Transforms/InstCombine/InstCombineSelect.cpp

if (llvm::any_of(SI.users(), [&](User *U) {
if (CI && CI->isEquality())		if (CI && CI->isEquality())
return true;		return true;
return false;		return false;
})) {		})) {
return nullptr;		return nullptr;
}		}
}		}

		// Let the GEP fold into the Select if possible. We can still optimize
		// it later if the GEP doesn't fold.
		if (SI.hasOneUse() && isa<GetElementPtrInst>(SI.user_back()) &&
		cast<GEPOperator>(SI.user_back())->countNonConstantIndices() == 1 &&
		isa<ConstantInt>(TrueVal) && isa<ConstantInt>(FalseVal))
		return nullptr;

if (Value *V = SimplifySelectInst(CondVal, TrueVal, FalseVal,		if (Value *V = SimplifySelectInst(CondVal, TrueVal, FalseVal,
SQ.getWithInstruction(&SI)))		SQ.getWithInstruction(&SI)))
return replaceInstUsesWith(SI, V);		return replaceInstUsesWith(SI, V);

if (Instruction *I = canonicalizeSelectToShuffle(SI))		if (Instruction *I = canonicalizeSelectToShuffle(SI))
return I;		return I;

// Canonicalize a one-use integer compare with a non-canonical predicate by		// Canonicalize a one-use integer compare with a non-canonical predicate by

lib/Transforms/InstCombine/InstructionCombining.cpp

for (unsigned i = 0, e = CV->getNumOperands(); i != e; ++i) {
return nullptr;		return nullptr;
}		}
return ConstantExpr::getNeg(CV);		return ConstantExpr::getNeg(CV);
}		}

return nullptr;		return nullptr;
}		}

		Instruction *InstCombiner::FoldGEPIntoSelect(GetElementPtrInst &GEP,
		SelectInst *SI) {
		assert(GEP.hasOneUse() && isa<LoadInst>(GEP.user_back()) &&
		"GEP user is not a Load");
		assert(cast<GEPOperator>(&GEP)->countNonConstantIndices() == 1 &&
		"GEP has multiple non-constant indices");

		if (!SI->hasOneUse())
		return nullptr;

		Value *TV = SI->getTrueValue();
		Value *FV = SI->getFalseValue();

		if (!isa<ConstantInt>(TV) || !isa<ConstantInt>(FV))
		return nullptr;

		Instruction *NewTV = GEP.clone();
		Instruction *NewFV = GEP.clone();
		NewTV->replaceUsesOfWith(SI, TV);
		NewFV->replaceUsesOfWith(SI, FV);
		InsertNewInstBefore(NewTV, GEP);
		InsertNewInstBefore(NewFV, GEP);
		return SelectInst::Create(SI->getCondition(), NewTV, NewFV);
		}

static Value *foldOperationIntoSelectOperand(Instruction &I, Value *SO,		static Value *foldOperationIntoSelectOperand(Instruction &I, Value *SO,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
if (auto *Cast = dyn_cast<CastInst>(&I))		if (auto *Cast = dyn_cast<CastInst>(&I))
return Builder.CreateCast(Cast->getOpcode(), SO, I.getType());		return Builder.CreateCast(Cast->getOpcode(), SO, I.getType());

assert(I.isBinaryOp() && "Unexpected opcode for select folding");		assert(I.isBinaryOp() && "Unexpected opcode for select folding");

// Figure out if the constant is the left or the right argument.		// Figure out if the constant is the left or the right argument.
▲ Show 20 Lines • Show All 891 Lines • ▼ Show 20 Lines	if (DI == -1) {
GEP.getParent()->getFirstInsertionPt(), NewGEP);		GEP.getParent()->getFirstInsertionPt(), NewGEP);
NewGEP->setOperand(DI, NewPN);		NewGEP->setOperand(DI, NewPN);
}		}

GEP.setOperand(0, NewGEP);		GEP.setOperand(0, NewGEP);
PtrOp = NewGEP;		PtrOp = NewGEP;
}		}

		// Before combining indices, try to fold the GEP into a Select as
		// long as its only user is a load. This can allow further folding
		// of the load into the Select.
		if (GEP.hasOneUse() && isa<LoadInst>(GEP.user_back()) &&
		cast<GEPOperator>(&GEP)->countNonConstantIndices() == 1) {
		for (auto &Op : GEP.operands())
		if (auto *SI = dyn_cast<SelectInst>(Op))
		if (Instruction *NewSel = FoldGEPIntoSelect(GEP, SI))
		return NewSel;
		}

// Combine Indices - If the source pointer to this getelementptr instruction		// Combine Indices - If the source pointer to this getelementptr instruction
// is a getelementptr instruction, combine the indices of the two		// is a getelementptr instruction, combine the indices of the two
// getelementptr instructions into a single instruction.		// getelementptr instructions into a single instruction.
if (auto *Src = dyn_cast<GEPOperator>(PtrOp)) {		if (auto *Src = dyn_cast<GEPOperator>(PtrOp)) {
if (!shouldMergeGEPs(cast<GEPOperator>(&GEP), Src))		if (!shouldMergeGEPs(cast<GEPOperator>(&GEP), Src))
return nullptr;		return nullptr;

// Try to reassociate loop invariant GEP chains to enable LICM.		// Try to reassociate loop invariant GEP chains to enable LICM.
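
The rewrite performed by FoldGEPIntoSelect in this file relies on address computation distributing over a select of constant indices. A minimal sketch of that equivalence, modelling a one-index GEP over i32 elements as plain integer arithmetic (all helper names here are hypothetical and not LLVM APIs):

```cpp
#include <cstddef>
#include <cstdint>

// Address of element `index` in an array of i32 starting at `base`.
std::uintptr_t gepI32(std::uintptr_t base, std::uint64_t index) {
  return base + static_cast<std::uintptr_t>(index * sizeof(std::int32_t));
}

// zext(cmp) used as an index has the same value as select(cmp, 1, 0).
std::uint64_t zextBool(bool cmp) { return cmp ? 1u : 0u; }

// gep(base, select(c, t, f)): one address computation fed by a select.
std::uintptr_t gepOfSelect(std::uintptr_t base, bool c, std::uint64_t t,
                           std::uint64_t f) {
  return gepI32(base, c ? t : f);
}

// select(c, gep(base, t), gep(base, f)): two address computations, then a
// select. Both addresses are available before the condition is known.
std::uintptr_t selectOfGeps(std::uintptr_t base, bool c, std::uint64_t t,
                            std::uint64_t f) {
  return c ? gepI32(base, t) : gepI32(base, f);
}
```

The two forms always compute the same address. Whether the loads that follow may then be speculated is a separate question, which is what the review discussion is about.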

test/CodeGen/AArch64/select-load.ll

	; CHECK-NEXT: stp x19, x30, [sp, #16] // 16-byte Folded Spill			; CHECK-NEXT: stp x19, x30, [sp, #16] // 16-byte Folded Spill
	; CHECK-NEXT: .cfi_def_cfa_offset 32			; CHECK-NEXT: .cfi_def_cfa_offset 32
	; CHECK-NEXT: .cfi_offset w30, -8			; CHECK-NEXT: .cfi_offset w30, -8
	; CHECK-NEXT: .cfi_offset w19, -16			; CHECK-NEXT: .cfi_offset w19, -16
	; CHECK-NEXT: .cfi_offset w20, -32			; CHECK-NEXT: .cfi_offset w20, -32
	; CHECK-NEXT: mov w19, w1			; CHECK-NEXT: mov w19, w1
	; CHECK-NEXT: mov x20, x0			; CHECK-NEXT: mov x20, x0
	; CHECK-NEXT: bl foo			; CHECK-NEXT: bl foo
	; CHECK-NEXT: add x8, x20, #8 // =8			; CHECK-NEXT: ldp w8, w9, [x20, #4]
	; CHECK-NEXT: add x9, x20, #4 // =4
	; CHECK-NEXT: cmp w0, w19			; CHECK-NEXT: cmp w0, w19
	; CHECK-NEXT: csel x8, x8, x9, gt
	; CHECK-NEXT: ldr w0, [x8]
	; CHECK-NEXT: ldp x19, x30, [sp, #16] // 16-byte Folded Reload			; CHECK-NEXT: ldp x19, x30, [sp, #16] // 16-byte Folded Reload
				; CHECK-NEXT: csel w0, w9, w8, gt
	; CHECK-NEXT: ldr x20, [sp], #32 // 8-byte Folded Reload			; CHECK-NEXT: ldr x20, [sp], #32 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%call = tail call i32 @foo() #2			%call = tail call i32 @foo() #2
	%cmp = icmp sgt i32 %call, %x			%cmp = icmp sgt i32 %call, %x
	%0 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i64 2			%0 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i64 2
	%1 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i64 1			%1 = getelementptr inbounds %struct.A, %struct.A* %a, i64 0, i32 0, i64 1
	%.val = load i32, i32* %0, align 4			%.val = load i32, i32* %0, align 4
	%.val2 = load i32, i32* %1, align 4			%.val2 = load i32, i32* %1, align 4
	%2 = select i1 %cmp, i32 %.val, i32 %.val2			%2 = select i1 %cmp, i32 %.val, i32 %.val2
	ret i32 %2			ret i32 %2
	}			}

test/Transforms/InstCombine/gep-select.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -instcombine -S | FileCheck %s

				target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64-arm"

				%struct.B = type { [4 x %struct.A], i32 }
				%struct.A = type { [2 x i32], i32, i32 }

				declare i32 @val()

				; It's safe to speculatively execute load(select c, gep1, gep2) if the geps
				; are almost identical, having only one different index, which is statically
				; known to be in bounds.
				define i32 @test1(%struct.B* %b, i64 %idx1, i64 %idx2, i1 %cmp) {
				; CHECK-LABEL: @test1(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], %struct.B* [[B:%.*]], i64 [[IDX1:%.*]], i32 0, i64 [[IDX2:%.*]], i32 1
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds [[STRUCT_B]], %struct.B* [[B]], i64 [[IDX1]], i32 0, i64 [[IDX2]], i32 2
				; CHECK-NEXT: [[GEP1_VAL:%.*]] = load i32, i32* [[GEP1]], align 4
				; CHECK-NEXT: [[GEP2_VAL:%.*]] = load i32, i32* [[GEP2]], align 4
				; CHECK-NEXT: [[LD:%.*]] = select i1 [[CMP:%.*]], i32 [[GEP1_VAL]], i32 [[GEP2_VAL]]
				; CHECK-NEXT: ret i32 [[LD]]
				;
				entry:
				%gep1 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx1, i32 0, i64 %idx2, i32 1
				%gep2 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx1, i32 0, i64 %idx2, i32 2
				%sel = select i1 %cmp, i32* %gep1, i32* %gep2
				%ld = load i32, i32* %sel, align 4
				ret i32 %ld
				}

				; Can't trivially speculate load(select c , gep1, gep2) since
				; the geps look quite different.
				define i32 @test2(%struct.B* %b, i64 %idx1, i64 %idx2, i1 %cmp) {
				; CHECK-LABEL: @test2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], %struct.B* [[B:%.*]], i64 [[IDX1:%.*]], i32 0, i64 [[IDX2:%.*]], i32 1
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds [[STRUCT_B]], %struct.B* [[B]], i64 [[IDX1]], i32 1
				; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32* [[GEP1]], i32* [[GEP2]]
				; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[SEL]], align 4
				; CHECK-NEXT: ret i32 [[LD]]
				;
				entry:
				%gep1 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx1, i32 0, i64 %idx2, i32 1
				%gep2 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx1, i32 1
				%sel = select i1 %cmp, i32* %gep1, i32* %gep2
				%ld = load i32, i32* %sel, align 4
				ret i32 %ld
				}

				; Can't speculate load(select c , gep1, gep2) since we can
				; statically infer that gep1 goes out of bounds.
				define i32 @test3(%struct.B* %b, i64 %idx1, i64 %idx2, i1 %cmp) {
				; CHECK-LABEL: @test3(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], %struct.B* [[B:%.*]], i64 [[IDX1:%.*]], i32 0, i64 [[IDX2:%.*]], i32 0, i64 2
				; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds [[STRUCT_B]], %struct.B* [[B]], i64 [[IDX1]], i32 0, i64 [[IDX2]], i32 0, i64 1
				; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP:%.*]], i32* [[GEP1]], i32* [[GEP2]]
				; CHECK-NEXT: [[LD:%.*]] = load i32, i32* [[SEL]], align 4
				; CHECK-NEXT: ret i32 [[LD]]
				;
				entry:
				%gep1 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx1, i32 0, i64 %idx2, i32 0, i64 2
				%gep2 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx1, i32 0, i64 %idx2, i32 0, i64 1
				%sel = select i1 %cmp, i32* %gep1, i32* %gep2
				%ld = load i32, i32* %sel, align 4
				ret i32 %ld
				}

				; zext(cmp) ~> select cmp, 1, 0
				; gep(select cmp, 1, 0) ~> select cmp, (gep 1), (gep 0)
				; load(select cmp, (gep 1), (gep 0)) ~> select cmp, load(gep 1), load(gep 0)
				define i32 @test4(%struct.B* %b, i32 %x, i32 %y, i64 %idx) {
				; CHECK-LABEL: @test4(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CALL:%.*]] = call i32 @val()
				; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[CALL]], [[X:%.*]]
				; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i32 [[CALL]], [[Y:%.*]]
				; CHECK-NEXT: [[IDX1:%.*]] = zext i1 [[CMP1]] to i64
				; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds [[STRUCT_B:%.*]], %struct.B* [[B:%.*]], i64 [[IDX:%.*]], i32 0, i64 [[IDX1]], i32 0, i64 1
				; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds [[STRUCT_B]], %struct.B* [[B]], i64 [[IDX]], i32 0, i64 [[IDX1]], i32 0, i64 0
				; CHECK-NEXT: [[DOTVAL:%.*]] = load i32, i32* [[TMP0]], align 4
				; CHECK-NEXT: [[DOTVAL1:%.*]] = load i32, i32* [[TMP1]], align 4
				; CHECK-NEXT: [[LD:%.*]] = select i1 [[CMP2]], i32 [[DOTVAL]], i32 [[DOTVAL1]]
				; CHECK-NEXT: ret i32 [[LD]]
				;
				entry:
				%call = call i32 @val()
				%cmp1 = icmp sgt i32 %call, %x
				%cmp2 = icmp sgt i32 %call, %y
				%idx1 = zext i1 %cmp1 to i64
				%idx2 = zext i1 %cmp2 to i64
				%gep1 = getelementptr inbounds %struct.B, %struct.B* %b, i64 %idx
				%a = getelementptr inbounds %struct.B, %struct.B* %gep1, i32 0, i32 0
				%gep2 = getelementptr inbounds [4 x %struct.A], [4 x %struct.A]* %a, i64 0, i64 %idx1
				%c = getelementptr inbounds %struct.A, %struct.A* %gep2, i32 0, i32 0
				%gep3 = getelementptr inbounds [2 x i32], [2 x i32]* %c, i64 0, i64 %idx2
				%ld = load i32, i32* %gep3, align 4
				ret i32 %ld
				}