This is an archive of the discontinued LLVM Phabricator instance.

Fixed several correctness issues on mishandling s/zext in SeparateConstOffsetFromGEP
ClosedPublic

Authored by jingyue on Jun 1 2014, 7:31 PM.


Details

Reviewers

eliben
meheff

Summary

Fixes:

When rebuilding new indices, s/zext should be distributed to sub-expressions. For example, sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5, but not sext(a + b) + 5. This also affects the recursive search for a constant offset: the search context must track any enclosing s/zext.
Function find should return the constant offset at its own bit width (as an APInt) instead of always sign-extending it to i64.
Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP indices to pointer size before computing the address. Therefore, (gep base, zext(a + b)) != (gep base, a + b).
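
The first and third fixes can be checked on small integers. The following is only a sketch that models i8 values widened to i16 in plain C++; the helper names (addNswI8, wrapI8, sextI8, zextI8, originalIndex, rebuiltCorrectly, rebuiltIncorrectly) are hypothetical and do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Models `add nsw i8`: the caller promises there is no signed overflow.
static int8_t addNswI8(int8_t A, int8_t B) {
  const int Exact = A + B;
  assert(Exact >= -128 && Exact <= 127 && "nsw add must not overflow");
  return static_cast<int8_t>(Exact);
}

// Models a plain `add i8` that wraps around on overflow.
static int8_t wrapI8(int8_t A, int8_t B) {
  return static_cast<int8_t>(static_cast<uint8_t>(
      static_cast<uint8_t>(A) + static_cast<uint8_t>(B)));
}

// sext i8 to i16 and zext i8 to i16.
static int16_t sextI8(int8_t V) { return V; }
static int16_t zextI8(int8_t V) { return static_cast<uint8_t>(V); }

// The original index: sext(a +nsw (b +nsw 5)).
int16_t originalIndex(int8_t A, int8_t B) {
  return sextI8(addNswI8(A, addNswI8(B, 5)));
}

// Rebuilt with sext distributed to every leaf: sext(a) + sext(b) + 5.
int16_t rebuiltCorrectly(int8_t A, int8_t B) {
  return static_cast<int16_t>(sextI8(A) + sextI8(B) + 5);
}

// Rebuilt without distributing sext: sext(a + b) + 5. Here a + b is a new
// addition that the nsw flags of the original expression say nothing about.
int16_t rebuiltIncorrectly(int8_t A, int8_t B) {
  return static_cast<int16_t>(sextI8(wrapI8(A, B)) + 5);
}
```

With a = -128 and b = -1, neither nsw addition in the original index overflows, yet a + b alone wraps in i8, so only the fully distributed form reproduces the original value. The sextI8/zextI8 pair likewise shows why a zext'ed index cannot be treated like a bare one: the two extensions disagree as soon as the sign bit is set.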

Improvements:

Add an optimization for splitting sext(a + b): if a + b is proven non-negative (e.g., used as an index of an inbounds GEP) and one of a, b is non-negative, then sext(a + b) = sext(a) + sext(b).
Function Distributable checks whether both sext and zext can be distributed to the operands of a binary operator. This helps us split zext(sext(a + b)) into zext(sext(a)) + zext(sext(b)) when a + b overflows in neither the signed nor the unsigned sense.
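
The sext(a + b) claim can be checked exhaustively at a small bit width. This is a sketch under the assumption that i8 widened to int is representative; wrapAddI8, checkGuardedSplit and countUnguardedMismatches are hypothetical names, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping 8-bit addition, modelling an `add i8` that carries no nsw flag.
static int8_t wrapAddI8(int8_t A, int8_t B) {
  return static_cast<int8_t>(static_cast<uint8_t>(
      static_cast<uint8_t>(A) + static_cast<uint8_t>(B)));
}

// Asserts, for every i8 pair, that the guard "a + b >= 0 and (a >= 0 or
// b >= 0)" implies sext(a + b) == sext(a) + sext(b).
void checkGuardedSplit() {
  for (int A = -128; A <= 127; ++A) {
    for (int B = -128; B <= 127; ++B) {
      const int8_t Sum =
          wrapAddI8(static_cast<int8_t>(A), static_cast<int8_t>(B));
      const bool Guard = Sum >= 0 && (A >= 0 || B >= 0);
      // Widening Sum is sext(a + b); A + B in int is sext(a) + sext(b).
      if (Guard)
        assert(static_cast<int>(Sum) == A + B);
    }
  }
}

// Counts the i8 pairs for which the split fails when no guard is applied.
int countUnguardedMismatches() {
  int Mismatches = 0;
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B)
      if (static_cast<int>(wrapAddI8(static_cast<int8_t>(A),
                                     static_cast<int8_t>(B))) != A + B)
        ++Mismatches;
  return Mismatches;
}
```

The guard works because a non-negative operand rules out overflow in the negative direction, and a non-negative wrapped sum rules it out in the positive direction.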

Refactor:

Merge some common logic of handling add/sub/or in find.

Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes we made.

Diff Detail

Event Timeline

jingyue updated this revision to Diff 9999.Jun 1 2014, 7:31 PM

jingyue retitled this revision from to Fixed several correctness issues in SeparateConstOffsetFromGEP.

jingyue updated this object.

jingyue edited the test plan for this revision. (Show Details)

jingyue added a reviewer: eliben.

jingyue added a subscriber: Unknown Object (MLST).

Herald added a subscriber: jholewinski. · Jun 1 2014, 7:31 PM

The amount of change here is very large - seems like you combined the functional fixes with some refactoring. It would be helpful if the commit message went into more detail about the refactoring as well.

jingyue added a reviewer: meheff.Jun 2 2014, 9:57 AM

jingyue retitled this revision from Fixed several correctness issues in SeparateConstOffsetFromGEP to Fixed several correctness issues on mishandling s/zext in SeparateConstOffsetFromGEP.Jun 2 2014, 2:48 PM

jingyue updated this object.

jingyue edited edge metadata.

Looks pretty good, in general. It was a bit tough to reason about in places, but I think that had to do with the complexity of wrapping my head around how sext/zext work across operators as much as anything. I don't have great suggestions for making it significantly simpler.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
279	Are XOR and AND handled properly? OR is limited to cases where it is equivalent to add so no problem there, but it seems like hoisting a constant out of a XOR or AND expression would be problematic.
374	Can this OR logic be moved up into Distributable or maybe move Distributable logic here? In general, I found find() a bit tricky to follow and reason about. Some of it is due to the complexity about the sext/zext identities you have to use, but some it was because the logic for checking whether you can traverse beneath an operator is in more than one place.
391	Is UserChain updated properly in the case where the constant is negative? Seems like you could have some elements added to UserChain in the call to findInEitherOperand above then it hits this condition and returns 0 and searching begins along another branch in the expression with stale elements in UserChain. Also, if ConstOffset is nonnegative does that necessarily guarantee that one of the operands of the add is nonnegative? The constant could be down more than one level in the expression, right?
466	Why is the type of ExtInsts[0] used here rather than the last element of ExtInsts which would be closest to the constant?
485	Should be: ...is not the RHS of a sub. Also, I think it'd be a bit clearer if your logic was in the same form as your comment like so: if (I->isZero() && !(BO->getOpcode() == Instruction::Sub && OpNo == 0)) { Or alternatively change the comment logic.
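
The OR/XOR/AND concern raised for line 279 can be illustrated on plain unsigned integers. This is a minimal sketch; noCommonBits mirrors the intent of NoCommonBits in the patch but works on concrete values rather than known-bits analysis, and rewriteOrAsAdd and andActsAsAdd are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit is set in both operands, i.e. (LHS & RHS) == 0.
bool noCommonBits(uint32_t LHS, uint32_t RHS) { return (LHS & RHS) == 0; }

// Rewrites (A | C) as (A + C) so that C can be hoisted by reassociation.
// Only valid under the no-common-bits guard.
uint32_t rewriteOrAsAdd(uint32_t A, uint32_t C) {
  assert(noCommonBits(A, C) && "or is not equivalent to add here");
  return A + C;
}

// Whether (A & C) happens to equal (A + C) for these particular values.
bool andActsAsAdd(uint32_t A, uint32_t C) { return (A & C) == A + C; }
```

For A = 8 and C = 5 the guard holds and (8 | 5) equals 8 + 5. For A = 4 and C = 5 the operands share bit 2, so (4 | 5) is 5 while 4 + 5 is 9, and the constant cannot be pulled out as an additive offset.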

Mark, thanks for your reviews! I fixed all the issues you spotted. PTAL

jingyue added inline comments.Jun 5 2014, 1:36 PM

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
279	Good catch! Fixed by ignoring XOR and AND.
374	Merged the logic for OR, ADD and SUB into CanTraceInto.
391	Good catch * 2! The new code only traces into sext(a + b) (without nsw) if a + b >= 0 and one of a and b is non-negative. Therefore, we needn't worry about restoring UserChain any more.
466	We distribute s/zext all the way down UserChain, so the type should be the outermost type. e.g., after distributing s/zext, zext(sext(a + (b + 5))) (assuming no overflow) becomes zext(sext(a)) + (zext(sext(b)) + zext(sext(5))). Then, removing constant offset 5 leaves zext(sext(a)) + (zext(sext(b)) + zext(sext(0))). The type of zext(sext(0)) is the type of the outermost zext. To make the logic of rebuildWithoutConstOffset clearer, I split it into two steps: distributing s/zext and removing the constant offset. Although the code may run slower (two passes instead of one), the logic should be much simpler to follow. PTAL
485	You're right. Done
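
The condition discussed for line 485 (when a zeroed operand lets the whole sub-expression collapse to the other operand) can be sketched in isolation. The Opcode enum and both functions below are hypothetical stand-ins, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two opcodes that matter here.
enum class Opcode { Add, Sub };

// After the constant offset is removed, operand number OpNo of the binary
// operator is 0. Returns true when the operator can be replaced by its other
// operand: x + 0, 0 + x and x - 0 all equal x, but 0 - x equals -x.
bool canDropZeroOperand(Opcode Op, unsigned OpNo) {
  assert(OpNo < 2 && "a binary operator has exactly two operands");
  return !(Op == Opcode::Sub && OpNo == 0);
}

// Evaluates the operator on concrete values, for checking the rule above.
int64_t evaluate(Opcode Op, int64_t LHS, int64_t RHS) {
  return Op == Opcode::Add ? LHS + RHS : LHS - RHS;
}
```

The only case that cannot be simplified is a zero on the left-hand side of a sub, which matches the form of the condition suggested in the review.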

LGTM. I like the restructuring of the traversal logic. It's easier to follow now.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
302	True for zext as well? If so, please add to comment.
429–461	nit: maybe call function distributeExtsAndCloneChain.
503	Can the number of users ever be zero? That is, can't you assert BO->getNumUses() == 1?
504	If you change the name of distributeExt, you'll have to change it here as well.

Addressed all Mark's comments.

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
302	Yes, although not leveraged just yet.
429–461	done
503	UserChain[0], the future GEP index, is not linked in the program yet, and is unused.
504	Done.

meheff accepted this revision.Jun 5 2014, 3:17 PM

meheff edited edge metadata.

This revision is now accepted and ready to land.Jun 5 2014, 3:17 PM

r210291

Revision Contents

Path

Size

lib/

Transforms/

Scalar/

SeparateConstOffsetFromGEP.cpp

542 lines

test/

Transforms/

SeparateConstOffsetFromGEP/

NVPTX/

split-gep-and-gvn.ll

55 lines

split-gep.ll

193 lines

Diff 10157

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

/// 5); nor can we transform (3 * (a + 5)) to (3 * a + 5), however in this case,		/// 5); nor can we transform (3 * (a + 5)) to (3 * a + 5), however in this case,
/// -instcombine probably already optimized (3 * (a + 5)) to (3 * a + 15).		/// -instcombine probably already optimized (3 * (a + 5)) to (3 * a + 15).
class ConstantOffsetExtractor {		class ConstantOffsetExtractor {
public:		public:
/// Extracts a constant offset from the given GEP index. It outputs the		/// Extracts a constant offset from the given GEP index. It outputs the
/// numeric value of the extracted constant offset (0 if failed), and a		/// numeric value of the extracted constant offset (0 if failed), and a
/// new index representing the remainder (equal to the original index minus		/// new index representing the remainder (equal to the original index minus
/// the constant offset).		/// the constant offset).
/// \p Idx The given GEP index		/// \p Idx The given GEP index
/// \p NewIdx The new index to replace		/// \p NewIdx The new index to replace (output)
/// \p DL The datalayout of the module		/// \p DL The datalayout of the module
/// \p IP Calculating the new index requires new instructions. IP indicates		/// \p GEP The given GEP
/// where to insert them (typically right before the GEP).
static int64_t Extract(Value *Idx, Value *&NewIdx, const DataLayout *DL,		static int64_t Extract(Value *Idx, Value *&NewIdx, const DataLayout *DL,
Instruction *IP);		GetElementPtrInst *GEP);
/// Looks for a constant offset without extracting it. The meaning of the		/// Looks for a constant offset without extracting it. The meaning of the
/// arguments and the return value are the same as Extract.		/// arguments and the return value are the same as Extract.
static int64_t Find(Value *Idx, const DataLayout *DL);		static int64_t Find(Value *Idx, const DataLayout *DL, GetElementPtrInst *GEP);

private:		private:
ConstantOffsetExtractor(const DataLayout *Layout, Instruction *InsertionPt)		ConstantOffsetExtractor(const DataLayout *Layout, Instruction *InsertionPt)
: DL(Layout), IP(InsertionPt) {}		: DL(Layout), IP(InsertionPt) {}
/// Searches the expression that computes V for a constant offset. If the		/// Searches the expression that computes V for a non-zero constant C s.t.
/// searching is successful, update UserChain as a path from V to the constant		/// V can be reassociated into the form V' + C. If the searching is
/// offset.		/// successful, returns C and update UserChain as a def-use chain from C to V;
int64_t find(Value *V);		/// otherwise, UserChain is empty.
/// A helper function to look into both operands of a binary operator U.
/// \p IsSub Whether U is a sub operator. If so, we need to negate the
/// constant offset at some point.
int64_t findInEitherOperand(User *U, bool IsSub);
/// After finding the constant offset and how it is reached from the GEP
/// index, we build a new index which is a clone of the old one except the
/// constant offset is removed. For example, given (a + (b + 5)) and knowing
/// the constant offset is 5, this function returns (a + b).
///		///
/// We cannot simply change the constant to zero because the expression that		/// \p V The given expression
/// computes the index or its intermediate result may be used by others.		/// \p SignExtended Whether V will be sign-extended in the computation of the
Value *rebuildWithoutConstantOffset();		/// GEP index
// A helper function for rebuildWithoutConstantOffset that rebuilds the direct		/// \p ZeroExtended Whether V will be zero-extended in the computation of the
// user (U) of the constant offset (C).		/// GEP index
Value *rebuildLeafWithoutConstantOffset(User *U, Value *C);		/// \p NonNegative Whether V is guaranteed to be non-negative. For example,
/// Returns a clone of U except the first occurrence of From with To.		/// an index of an inbounds GEP is guaranteed to be
Value *cloneAndReplace(User *U, Value *From, Value *To);		/// non-negative. Leveraging this, we can better split
		/// inbounds GEPs.
		APInt find(Value *V, bool SignExtended, bool ZeroExtended, bool NonNegative);
		/// A helper function to look into both operands of a binary operator.
		APInt findInEitherOperand(BinaryOperator *BO, bool SignExtended,
		bool ZeroExtended);
		/// After finding the constant offset C from the GEP index I, we build a new
		/// index I' s.t. I' + C = I. This function builds and returns the new
		/// index I' according to UserChain produced by function "find".
		///
		/// The building conceptually takes two steps:
		/// 1) iteratively distribute s/zext towards the leaves of the expression tree
		/// that computes I
		/// 2) reassociate the expression tree to the form I' + C.
		///
		/// For example, to extract the 5 from sext(a + (b + 5)), we first distribute
		/// sext to a, b and 5 so that we have
		/// sext(a) + (sext(b) + 5).
		/// Then, we reassociate it to
		/// (sext(a) + sext(b)) + 5.
		/// Given this form, we know I' is sext(a) + sext(b).
		Value *rebuildWithoutConstOffset();
		/// After the first step of rebuilding the GEP index without the constant
		/// offset, distribute s/zext to the operands of all operators in UserChain.
		/// e.g., zext(sext(a + (b + 5)) (assuming no overflow) =>
		/// zext(sext(a)) + (zext(sext(b)) + zext(sext(5))).
		///
		/// The function also updates UserChain to point to new subexpressions after
		/// distributing s/zext. e.g., the old UserChain of the above example is
		/// 5 -> b + 5 -> a + (b + 5) -> sext(...) -> zext(sext(...)),
		/// and the new UserChain is
		/// zext(sext(5)) -> zext(sext(b)) + zext(sext(5)) ->
		/// zext(sext(a)) + (zext(sext(b)) + zext(sext(5))
		///
		/// \p ChainIndex The index to UserChain. ChainIndex is initially
		/// UserChain.size() - 1, and is decremented during
		/// the recursion.
		Value *distributeExtsAndCloneChain(unsigned ChainIndex);
		/// Reassociates the GEP index to the form I' + C and returns I'.
		Value *removeConstOffset(unsigned ChainIndex);
		/// A helper function to apply ExtInsts, a list of s/zext, to value V.
		/// e.g., if ExtInsts = [sext i32 to i64, zext i16 to i32], this function
		/// returns "sext i32 (zext i16 V to i32) to i64".
		Value *applyExts(Value *V);

/// Returns true if LHS and RHS have no bits in common, i.e., LHS & RHS == 0.		/// Returns true if LHS and RHS have no bits in common, i.e., LHS & RHS == 0.
bool NoCommonBits(Value *LHS, Value *RHS) const;		bool NoCommonBits(Value *LHS, Value *RHS) const;
/// Computes which bits are known to be one or zero.		/// Computes which bits are known to be one or zero.
/// \p KnownOne Mask of all bits that are known to be one.		/// \p KnownOne Mask of all bits that are known to be one.
/// \p KnownZero Mask of all bits that are known to be zero.		/// \p KnownZero Mask of all bits that are known to be zero.
void ComputeKnownBits(Value *V, APInt &KnownOne, APInt &KnownZero) const;		void ComputeKnownBits(Value *V, APInt &KnownOne, APInt &KnownZero) const;
/// Finds the first use of Used in U. Returns -1 if not found.		/// A helper function that returns whether we can trace into the operands
static unsigned FindFirstUse(User *U, Value *Used);		/// of binary operator BO for a constant offset.
/// Returns whether OPC (sext or zext) can be distributed to the operands of		///
/// BO. e.g., sext can be distributed to the operands of an "add nsw" because		/// \p SignExtended Whether BO is surrounded by sext
/// sext (add nsw a, b) == add nsw (sext a), (sext b).		/// \p ZeroExtended Whether BO is surrounded by zext
static bool Distributable(unsigned OPC, BinaryOperator *BO);		/// \p NonNegative Whether BO is known to be non-negative, e.g., an in-bound
		/// array index.
		bool CanTraceInto(bool SignExtended, bool ZeroExtended, BinaryOperator *BO,
		bool NonNegative);

/// The path from the constant offset to the old GEP index. e.g., if the GEP		/// The path from the constant offset to the old GEP index. e.g., if the GEP
/// index is "a * b + (c + 5)". After running function find, UserChain[0] will		/// index is "a * b + (c + 5)". After running function find, UserChain[0] will
/// be the constant 5, UserChain[1] will be the subexpression "c + 5", and		/// be the constant 5, UserChain[1] will be the subexpression "c + 5", and
/// UserChain[2] will be the entire expression "a * b + (c + 5)".		/// UserChain[2] will be the entire expression "a * b + (c + 5)".
///		///
/// This path helps rebuildWithoutConstantOffset rebuild the new GEP index.		/// This path helps to rebuild the new GEP index.
SmallVector<User *, 8> UserChain;		SmallVector<User *, 8> UserChain;
		/// A data structure used in rebuildWithoutConstOffset. Contains all
		/// sext/zext instructions along UserChain.
		SmallVector<CastInst *, 16> ExtInsts;
/// The data layout of the module. Used in ComputeKnownBits.		/// The data layout of the module. Used in ComputeKnownBits.
const DataLayout *DL;		const DataLayout *DL;
Instruction *IP; /// Insertion position of cloned instructions.		Instruction *IP; /// Insertion position of cloned instructions.
};		};

/// \brief A pass that tries to split every GEP in the function into a variadic		/// \brief A pass that tries to split every GEP in the function into a variadic
/// base and a constant offset. It is a FunctionPass because searching for the		/// base and a constant offset. It is a FunctionPass because searching for the
/// constant offset may inspect other basic blocks.		/// constant offset may inspect other basic blocks.
Show All 34 Lines	INITIALIZE_PASS_END(
SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",		SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",
"Split GEPs to a variadic base and a constant offset for better CSE", false,		"Split GEPs to a variadic base and a constant offset for better CSE", false,
false)		false)

FunctionPass *llvm::createSeparateConstOffsetFromGEPPass() {		FunctionPass *llvm::createSeparateConstOffsetFromGEPPass() {
return new SeparateConstOffsetFromGEP();		return new SeparateConstOffsetFromGEP();
}		}

bool ConstantOffsetExtractor::Distributable(unsigned OPC, BinaryOperator *BO) {		bool ConstantOffsetExtractor::CanTraceInto(bool SignExtended,
assert(OPC == Instruction::SExt || OPC == Instruction::ZExt);		bool ZeroExtended,
		BinaryOperator *BO,
		bool NonNegative) {
		// We only consider ADD, SUB and OR, because a non-zero constant found in
		// expressions composed of these operations can be easily hoisted as a
		// constant offset by reassociation.
		if (BO->getOpcode() != Instruction::Add &&
		BO->getOpcode() != Instruction::Sub &&
		BO->getOpcode() != Instruction::Or) {
		meheff: Are XOR and AND handled properly? OR is limited to cases where it is equivalent to add so no problem there, but it seems like hoisting a constant out of a XOR or AND expression would be problematic.
		jingyue: Good catch! Fixed by ignoring XOR and AND.
		return false;
		}

		Value *LHS = BO->getOperand(0), *RHS = BO->getOperand(1);
		// Do not trace into "or" unless it is equivalent to "add". If LHS and RHS
		// don't have common bits, (LHS | RHS) is equivalent to (LHS + RHS).
		if (BO->getOpcode() == Instruction::Or && !NoCommonBits(LHS, RHS))
		return false;

		// In addition, tracing into BO requires that its surrounding s/zext (if
		// any) is distributable to both operands.
		//
		// Suppose BO = A op B.
		//  SignExtended | ZeroExtended | Distributable?
		// --------------+--------------+----------------------------------
		//       0       |      0       | true because no s/zext exists
		//       0       |      1       | zext(BO) == zext(A) op zext(B)
		//       1       |      0       | sext(BO) == sext(A) op sext(B)
		//       1       |      1       | zext(sext(BO)) ==
		//               |              |     zext(sext(A)) op zext(sext(B))
		if (BO->getOpcode() == Instruction::Add && NonNegative) {
		// If a + b >= 0 and (a >= 0 or b >= 0), then
		// s/zext(a + b) = s/zext(a) + s/zext(b)
		meheff: True for zext as well? If so, please add to comment.
		jingyue: Yes, although not leveraged just yet.
		// even if the addition is not marked nsw.
		//
		// Leveraging this invariant, we can trace into an sext'ed inbounds GEP
		// index if the constant offset is non-negative.
		//
		// Verified in @sext_add in split-gep.ll.
		if (ConstantInt *ConstLHS = dyn_cast<ConstantInt>(LHS)) {
		if (!ConstLHS->isNegative())
		return true;
		}
		if (ConstantInt *ConstRHS = dyn_cast<ConstantInt>(RHS)) {
		if (!ConstRHS->isNegative())
		return true;
		}
		}

// sext (add/sub nsw A, B) == add/sub nsw (sext A), (sext B)		// sext (add/sub nsw A, B) == add/sub nsw (sext A), (sext B)
// zext (add/sub nuw A, B) == add/sub nuw (zext A), (zext B)		// zext (add/sub nuw A, B) == add/sub nuw (zext A), (zext B)
if (BO->getOpcode() == Instruction::Add ||		if (BO->getOpcode() == Instruction::Add ||
BO->getOpcode() == Instruction::Sub) {		BO->getOpcode() == Instruction::Sub) {
return (OPC == Instruction::SExt && BO->hasNoSignedWrap()) ||		if (SignExtended && !BO->hasNoSignedWrap())
(OPC == Instruction::ZExt && BO->hasNoUnsignedWrap());		return false;
		if (ZeroExtended && !BO->hasNoUnsignedWrap())
		return false;
}		}

// sext/zext (and/or/xor A, B) == and/or/xor (sext/zext A), (sext/zext B)		return true;
// -instcombine also leverages this invariant to do the reverse
// transformation to reduce integer casts.
return BO->getOpcode() == Instruction::And ||
BO->getOpcode() == Instruction::Or ||
BO->getOpcode() == Instruction::Xor;
}		}

int64_t ConstantOffsetExtractor::findInEitherOperand(User *U, bool IsSub) {		APInt ConstantOffsetExtractor::findInEitherOperand(BinaryOperator *BO,
assert(U->getNumOperands() == 2);		bool SignExtended,
int64_t ConstantOffset = find(U->getOperand(0));		bool ZeroExtended) {
		// BO being non-negative does not shed light on whether its operands are
		// non-negative. Clear the NonNegative flag here.
		APInt ConstantOffset = find(BO->getOperand(0), SignExtended, ZeroExtended,
		/* NonNegative */ false);
// If we found a constant offset in the left operand, stop and return that.		// If we found a constant offset in the left operand, stop and return that.
// This shortcut might cause us to miss opportunities of combining the		// This shortcut might cause us to miss opportunities of combining the
// constant offsets in both operands, e.g., (a + 4) + (b + 5) => (a + b) + 9.		// constant offsets in both operands, e.g., (a + 4) + (b + 5) => (a + b) + 9.
// However, such cases are probably already handled by -instcombine,		// However, such cases are probably already handled by -instcombine,
// given this pass runs after the standard optimizations.		// given this pass runs after the standard optimizations.
if (ConstantOffset != 0) return ConstantOffset;		if (ConstantOffset != 0) return ConstantOffset;
ConstantOffset = find(U->getOperand(1));		ConstantOffset = find(BO->getOperand(1), SignExtended, ZeroExtended,
		/* NonNegative */ false);
// If U is a sub operator, negate the constant offset found in the right		// If U is a sub operator, negate the constant offset found in the right
// operand.		// operand.
return IsSub ? -ConstantOffset : ConstantOffset;		if (BO->getOpcode() == Instruction::Sub)
		ConstantOffset = -ConstantOffset;
		return ConstantOffset;
}		}

int64_t ConstantOffsetExtractor::find(Value *V) {		APInt ConstantOffsetExtractor::find(Value *V, bool SignExtended,
// TODO(jingyue): We can even trace into integer/pointer casts, such as		bool ZeroExtended, bool NonNegative) {
		// TODO(jingyue): We could trace into integer/pointer casts, such as
// inttoptr, ptrtoint, bitcast, and addrspacecast. We choose to handle only		// inttoptr, ptrtoint, bitcast, and addrspacecast. We choose to handle only
// integers because it gives good enough results for our benchmarks.		// integers because it gives good enough results for our benchmarks.
assert(V->getType()->isIntegerTy());		unsigned BitWidth = cast<IntegerType>(V->getType())->getBitWidth();

		// We cannot do much with Values that are not a User, such as an Argument.
User *U = dyn_cast<User>(V);		User *U = dyn_cast<User>(V);
// We cannot do much with Values that are not a User, such as BasicBlock and		if (U == nullptr) return APInt(BitWidth, 0);
// MDNode.
if (U == nullptr) return 0;

int64_t ConstantOffset = 0;		APInt ConstantOffset(BitWidth, 0);
if (ConstantInt *CI = dyn_cast<ConstantInt>(U)) {		if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {
// Hooray, we found it!		// Hooray, we found it!
ConstantOffset = CI->getSExtValue();		ConstantOffset = CI->getValue();
} else if (Operator *O = dyn_cast<Operator>(U)) {		} else if (BinaryOperator *BO = dyn_cast<BinaryOperator>(V)) {
// The GEP index may be more complicated than a simple addition of a		// Trace into subexpressions for more hoisting opportunities.
// variable and a constant. Therefore, we trace into subexpressions for more		if (CanTraceInto(SignExtended, ZeroExtended, BO, NonNegative)) {
// hoisting opportunities.		ConstantOffset = findInEitherOperand(BO, SignExtended, ZeroExtended);
switch (O->getOpcode()) {		}
case Instruction::Add: {		} else if (isa<SExtInst>(V)) {
		meheff: Can this OR logic be moved up into Distributable or maybe move Distributable logic here? In general, I found find() a bit tricky to follow and reason about. Some of it is due to the complexity about the sext/zext identities you have to use, but some of it was because the logic for checking whether you can traverse beneath an operator is in more than one place.
		jingyue: Merged the logic for OR, ADD and SUB into CanTraceInto.
ConstantOffset = findInEitherOperand(U, false);		ConstantOffset = find(U->getOperand(0), /* SignExtended */ true,
break;		ZeroExtended, NonNegative).sext(BitWidth);
}		} else if (isa<ZExtInst>(V)) {
case Instruction::Sub: {		// As an optimization, we can clear the SignExtended flag because
ConstantOffset = findInEitherOperand(U, true);		// sext(zext(a)) = zext(a). Verified in @sext_zext in split-gep.ll.
break;		//
}		// Clear the NonNegative flag, because zext(a) >= 0 does not imply a >= 0.
case Instruction::Or: {		// TODO: if zext(a) < 2 ^ (bitwidth(a) - 1), we can prove a >= 0.
// If LHS and RHS don't have common bits, (LHS | RHS) is equivalent to		ConstantOffset =
// (LHS + RHS).		find(U->getOperand(0), /* SignExtended */ false,
if (NoCommonBits(U->getOperand(0), U->getOperand(1)))		/* ZeroExtended */ true, /* NonNegative */ false).zext(BitWidth);
ConstantOffset = findInEitherOperand(U, false);		}
break;
}		// If we found a non-zero constant offset, add it to the path for
case Instruction::SExt:		// rebuildWithoutConstOffset. Zero is a valid constant offset, but doesn't
case Instruction::ZExt: {		// help this optimization.
// We trace into sext/zext if the operator can be distributed to its
// operand. e.g., we can transform into "sext (add nsw a, 5)" and
// extract constant 5, because
// sext (add nsw a, 5) == add nsw (sext a), 5
if (BinaryOperator *BO = dyn_cast<BinaryOperator>(U->getOperand(0))) {
if (Distributable(O->getOpcode(), BO))
ConstantOffset = find(U->getOperand(0));
}
break;
}
}
}
// If we found a non-zero constant offset, adds it to the path for future
// transformation (rebuildWithoutConstantOffset). Zero is a valid constant
// offset, but doesn't help this optimization.
if (ConstantOffset != 0)		if (ConstantOffset != 0)
		meheff: Is UserChain updated properly in the case where the constant is negative? Seems like you could have some elements added to UserChain in the call to findInEitherOperand above then it hits this condition and returns 0 and searching begins along another branch in the expression with stale elements in UserChain. Also, if ConstOffset is nonnegative does that necessarily guarantee that one of the operands of the add is nonnegative? The constant could be down more than one level in the expression, right?
		jingyue: Good catch * 2! The new code only traces into sext(a + b) (without nsw) if a + b >= 0 and one of a and b is non-negative. Therefore, we needn't worry about restoring UserChain any more.
UserChain.push_back(U);		UserChain.push_back(U);
return ConstantOffset;		return ConstantOffset;
}		}

unsigned ConstantOffsetExtractor::FindFirstUse(User *U, Value *Used) {		Value *ConstantOffsetExtractor::applyExts(Value *V) {
for (unsigned I = 0, E = U->getNumOperands(); I < E; ++I) {		Value *Current = V;
if (U->getOperand(I) == Used)		// ExtInsts is built in the use-def order. Therefore, we apply them to V
return I;		// in the reversed order.
}		for (auto I = ExtInsts.rbegin(), E = ExtInsts.rend(); I != E; ++I) {
return -1;		if (Constant *C = dyn_cast<Constant>(Current)) {
}		// If Current is a constant, apply s/zext using ConstantExpr::getCast.
		// ConstantExpr::getCast emits a ConstantInt if C is a ConstantInt.
Value *ConstantOffsetExtractor::cloneAndReplace(User *U, Value *From,		Current = ConstantExpr::getCast((*I)->getOpcode(), C, (*I)->getType());
Value *To) {		} else {
// Finds in U the first use of From. It is safe to ignore future occurrences		Instruction *Ext = (*I)->clone();
// of From, because findInEitherOperand similarly stops searching the right		Ext->setOperand(0, Current);
// operand when the first operand has a non-zero constant offset.		Ext->insertBefore(IP);
unsigned OpNo = FindFirstUse(U, From);		Current = Ext;
assert(OpNo != (unsigned)-1 && "UserChain wasn't built correctly");		}
		}
// ConstantOffsetExtractor::find only follows Operators (i.e., Instructions		return Current;
// and ConstantExprs). Therefore, U is either an Instruction or a		}
// ConstantExpr.
if (Instruction *I = dyn_cast<Instruction>(U)) {		Value *ConstantOffsetExtractor::rebuildWithoutConstOffset() {
Instruction *Clone = I->clone();		distributeExtsAndCloneChain(UserChain.size() - 1);
Clone->setOperand(OpNo, To);		// Remove all nullptrs (used to be s/zext) from UserChain.
Clone->insertBefore(IP);		unsigned NewSize = 0;
return Clone;		for (auto I = UserChain.begin(), E = UserChain.end(); I != E; ++I) {
}		if (*I != nullptr) {
// cast<Constant>(To) is safe because a ConstantExpr only uses Constants.		UserChain[NewSize] = *I;
return cast<ConstantExpr>(U)		NewSize++;
->getWithOperandReplaced(OpNo, cast<Constant>(To));		}
}		}
		UserChain.resize(NewSize);
Value *ConstantOffsetExtractor::rebuildLeafWithoutConstantOffset(User *U,		return removeConstOffset(UserChain.size() - 1);
Value *C) {		}
assert(U->getNumOperands() <= 2 &&
"We didn't trace into any operator with more than 2 operands");		Value *
// If U has only one operand which is the constant offset, removing the		ConstantOffsetExtractor::distributeExtsAndCloneChain(unsigned ChainIndex) {
// constant offset leaves U as a null value.		User *U = UserChain[ChainIndex];
if (U->getNumOperands() == 1)		if (ChainIndex == 0) {
return Constant::getNullValue(U->getType());		assert(isa<ConstantInt>(U));
		// If U is a ConstantInt, applyExts will return a ConstantInt as well.
// U->getNumOperands() == 2		return UserChain[ChainIndex] = cast<ConstantInt>(applyExts(U));
unsigned OpNo = FindFirstUse(U, C); // U->getOperand(OpNo) == C		}
assert(OpNo < 2 && "UserChain wasn't built correctly");
Value *TheOther = U->getOperand(1 - OpNo); // The other operand of U		if (CastInst *Cast = dyn_cast<CastInst>(U)) {
// If U = C - X, removing C makes U = -X; otherwise U will simply be X.		assert((isa<SExtInst>(Cast) || isa<ZExtInst>(Cast)) &&
if (!isa<SubOperator>(U) || OpNo == 1)		"We only traced into two types of CastInst: sext and zext");
		ExtInsts.push_back(Cast);
		UserChain[ChainIndex] = nullptr;
		return distributeExtsAndCloneChain(ChainIndex - 1);
		}

		// Function find only traces into BinaryOperator and CastInst.
		BinaryOperator *BO = cast<BinaryOperator>(U);
		// OpNo = which operand of BO is UserChain[ChainIndex - 1]
		unsigned OpNo = (BO->getOperand(0) == UserChain[ChainIndex - 1] ? 0 : 1);
		Value *TheOther = applyExts(BO->getOperand(1 - OpNo));
		Value *NextInChain = distributeExtsAndCloneChain(ChainIndex - 1);

		BinaryOperator *NewBO = nullptr;
		if (OpNo == 0) {
		NewBO = BinaryOperator::Create(BO->getOpcode(), NextInChain, TheOther,
		BO->getName(), IP);
		} else {
		NewBO = BinaryOperator::Create(BO->getOpcode(), TheOther, NextInChain,
		BO->getName(), IP);
		}
		return UserChain[ChainIndex] = NewBO;
		meheff: nit: maybe call function distributeExtsAndCloneChain.
		jingyue: done
		}

		Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {
		if (ChainIndex == 0) {
		assert(isa<ConstantInt>(UserChain[ChainIndex]));
		meheff: Why is the type of ExtInsts[0] used here rather than the last element of ExtInsts which would be closest to the constant?
		jingyue: We distribute s/zext all the way down UserChain, so the type should be the outermost type. e.g., after distributing s/zext, zext(sext(a + (b + 5))) (assuming no overflow) becomes zext(sext(a)) + (zext(sext(b)) + zext(sext(5))). Then, removing constant offset 5 leaves zext(sext(a)) + (zext(sext(b)) + zext(sext(0))). The type of zext(sext(0)) is the type of the outermost zext. To make the logic of rebuildWithoutConstOffset clearer, I split it into two steps: distributing s/zext and removing the constant offset. Although the code may run slower (two passes instead of one), the logic should be much simpler to follow. PTAL
		return ConstantInt::getNullValue(UserChain[ChainIndex]->getType());
		}

		BinaryOperator *BO = cast<BinaryOperator>(UserChain[ChainIndex]);
		unsigned OpNo = (BO->getOperand(0) == UserChain[ChainIndex - 1] ? 0 : 1);
		assert(BO->getOperand(OpNo) == UserChain[ChainIndex - 1]);
		Value *NextInChain = removeConstOffset(ChainIndex - 1);
		Value *TheOther = BO->getOperand(1 - OpNo);

		// If NextInChain is 0 and not the LHS of a sub, we can simplify the
		// sub-expression to be just TheOther.
		if (ConstantInt *CI = dyn_cast<ConstantInt>(NextInChain)) {
		if (CI->isZero() && !(BO->getOpcode() == Instruction::Sub && OpNo == 0))
return TheOther;		return TheOther;
if (isa<ConstantExpr>(U))		}
return ConstantExpr::getNeg(cast<Constant>(TheOther));
return BinaryOperator::CreateNeg(TheOther, "", IP);		if (BO->getOpcode() == Instruction::Or) {
}		// Rebuild "or" as "add", because "or" may be invalid for the new
		// expression.
		meheff: Should be: ...is not the RHS of a sub. Also, I think it'd be a bit clearer if your logic was in the same form as your comment like so: if (I->isZero() && !(BO->getOpcode() == Instruction::Sub && OpNo == 0)) { Or alternatively change the comment logic.
		jingyue: You're right. Done
Value *ConstantOffsetExtractor::rebuildWithoutConstantOffset() {		//
assert(UserChain.size() > 0 && "you at least found a constant, right?");		// For instance, given
// Start with the constant and go up through UserChain, each time building a		// a | (b + 5) where a and b + 5 have no common bits,
// clone of the subexpression but with the constant removed.		// we can extract 5 as the constant offset.
// e.g., to build a clone of (a + (b + (c + 5)) but with the 5 removed, we		//
// first c, then (b + c), and finally (a + (b + c)).		// However, reusing the "or" in the new index would give us
//		// (a | b) + 5
// Fast path: if the GEP index is a constant, simply returns 0.		// which does not equal a | (b + 5).
if (UserChain.size() == 1)		//
return ConstantInt::get(UserChain[0]->getType(), 0);		// Replacing the "or" with "add" is fine, because
		// a | (b + 5) = a + (b + 5) = (a + b) + 5
Value *Remainder =		return BinaryOperator::CreateAdd(BO->getOperand(0), BO->getOperand(1),
rebuildLeafWithoutConstantOffset(UserChain[1], UserChain[0]);		BO->getName(), IP);
for (size_t I = 2; I < UserChain.size(); ++I)		}
Remainder = cloneAndReplace(UserChain[I], UserChain[I - 1], Remainder);
return Remainder;		// We can reuse BO in this case, because the new expression shares the same
		// instruction type and BO is used at most once.
		assert(BO->getNumUses() <= 1 &&
		meheff: Can number of users ever be zero? That is, can't you assert BO->getNumUses() == 1?
		jingyue: UserChain[0], the future GEP index, is not linked in the program yet, and is unused.
		"distributeExtsAndCloneChain clones each BinaryOperator in "
		meheff: If you change the name of distributeExt, you'll have to change it here as well.
		jingyue: Done.
		"UserChain, so no one should be used more than "
		"once");
		BO->setOperand(OpNo, NextInChain);
		BO->setHasNoSignedWrap(false);
		BO->setHasNoUnsignedWrap(false);
		// Make sure it appears after all instructions we've inserted so far.
		BO->moveBefore(IP);
		return BO;
}		}
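
The comment in removeConstOffset about rebuilding "or" as "add" can be checked with plain integers. The following is a minimal sketch, not code from the patch: the helper names (haveNoCommonBits, originalIndex, reuseOr, rebuildAsAdd, checkOrRebuild) are hypothetical, and the values a = 8, b = 11 are chosen only so that a and b + 5 share no bits while a and b do.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: when X and Y share no set bits, X | Y == X + Y.
constexpr bool haveNoCommonBits(uint64_t X, uint64_t Y) { return (X & Y) == 0; }

// The original index expression: a | (b + 5).
constexpr uint64_t originalIndex(uint64_t A, uint64_t B) { return A | (B + 5); }

// Incorrect rebuild: keep the "or" after removing 5, then add 5 back.
constexpr uint64_t reuseOr(uint64_t A, uint64_t B) { return (A | B) + 5; }

// Rebuild described in the comment: turn "or" into "add", then add 5 back.
constexpr uint64_t rebuildAsAdd(uint64_t A, uint64_t B) { return (A + B) + 5; }

void checkOrRebuild() {
  // a = 0b01000, b = 0b01011, b + 5 = 0b10000.
  const uint64_t A = 8, B = 11;
  assert(haveNoCommonBits(A, B + 5));  // extracting 5 is legal
  assert(!haveNoCommonBits(A, B));     // but a | b is not a + b
  assert(originalIndex(A, B) == 24);
  assert(rebuildAsAdd(A, B) == 24);    // matches the original
  assert(reuseOr(A, B) == 16);         // differs from the original
}
```

Calling checkOrRebuild() exercises one concrete case; it illustrates the argument in the comment rather than proving it for all inputs.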

int64_t ConstantOffsetExtractor::Extract(Value *Idx, Value *&NewIdx,		int64_t ConstantOffsetExtractor::Extract(Value *Idx, Value *&NewIdx,
const DataLayout *DL,		const DataLayout *DL,
Instruction *IP) {		GetElementPtrInst *GEP) {
ConstantOffsetExtractor Extractor(DL, IP);		ConstantOffsetExtractor Extractor(DL, GEP);
// Find a non-zero constant offset first.		// Find a non-zero constant offset first.
int64_t ConstantOffset = Extractor.find(Idx);		APInt ConstantOffset =
if (ConstantOffset == 0)		Extractor.find(Idx, /* SignExtended */ false, /* ZeroExtended */ false,
return 0;		GEP->isInBounds());
// Then rebuild a new index with the constant removed.		if (ConstantOffset != 0) {
NewIdx = Extractor.rebuildWithoutConstantOffset();		// Separates the constant offset from the GEP index.
return ConstantOffset;		NewIdx = Extractor.rebuildWithoutConstOffset();
		}
		return ConstantOffset.getSExtValue();
}		}

int64_t ConstantOffsetExtractor::Find(Value *Idx, const DataLayout *DL) {		int64_t ConstantOffsetExtractor::Find(Value *Idx, const DataLayout *DL,
return ConstantOffsetExtractor(DL, nullptr).find(Idx);		GetElementPtrInst *GEP) {
		// If Idx is an index of an inbound GEP, Idx is guaranteed to be non-negative.
		return ConstantOffsetExtractor(DL, GEP)
		.find(Idx, /* SignExtended */ false, /* ZeroExtended */ false,
		GEP->isInBounds())
		.getSExtValue();
}		}

void ConstantOffsetExtractor::ComputeKnownBits(Value *V, APInt &KnownOne,		void ConstantOffsetExtractor::ComputeKnownBits(Value *V, APInt &KnownOne,
APInt &KnownZero) const {		APInt &KnownZero) const {
IntegerType *IT = cast<IntegerType>(V->getType());		IntegerType *IT = cast<IntegerType>(V->getType());
KnownOne = APInt(IT->getBitWidth(), 0);		KnownOne = APInt(IT->getBitWidth(), 0);
KnownZero = APInt(IT->getBitWidth(), 0);		KnownZero = APInt(IT->getBitWidth(), 0);
llvm::computeKnownBits(V, KnownZero, KnownOne, DL, 0);		llvm::computeKnownBits(V, KnownZero, KnownOne, DL, 0);
		int64_t SeparateConstOffsetFromGEP::accumulateByteOffset(
GetElementPtrInst *GEP, const DataLayout *DL, bool &NeedsExtraction) {		GetElementPtrInst *GEP, const DataLayout *DL, bool &NeedsExtraction) {
NeedsExtraction = false;		NeedsExtraction = false;
int64_t AccumulativeByteOffset = 0;		int64_t AccumulativeByteOffset = 0;
gep_type_iterator GTI = gep_type_begin(*GEP);		gep_type_iterator GTI = gep_type_begin(*GEP);
for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
if (isa<SequentialType>(*GTI)) {		if (isa<SequentialType>(*GTI)) {
// Tries to extract a constant offset from this GEP index.		// Tries to extract a constant offset from this GEP index.
int64_t ConstantOffset =		int64_t ConstantOffset =
ConstantOffsetExtractor::Find(GEP->getOperand(I), DL);		ConstantOffsetExtractor::Find(GEP->getOperand(I), DL, GEP);
if (ConstantOffset != 0) {		if (ConstantOffset != 0) {
NeedsExtraction = true;		NeedsExtraction = true;
// A GEP may have multiple indices. We accumulate the extracted		// A GEP may have multiple indices. We accumulate the extracted
// constant offset to a byte offset, and later offset the remainder of		// constant offset to a byte offset, and later offset the remainder of
// the original GEP with this byte offset.		// the original GEP with this byte offset.
AccumulativeByteOffset +=		AccumulativeByteOffset +=
ConstantOffset * DL->getTypeAllocSize(GTI.getIndexedType());		ConstantOffset * DL->getTypeAllocSize(GTI.getIndexedType());
}		}
}		}
}		}
return AccumulativeByteOffset;		return AccumulativeByteOffset;
}		}
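
As a worked illustration of the accumulation in accumulateByteOffset, the sketch below replaces the LLVM types with a plain struct. IndexOffset, accumulateByteOffsetSketch and checkByteOffsets are hypothetical names, and the sizes assume a [32 x [32 x float]] array with 4-byte floats, as in the tests of this revision (a row is 128 bytes, an element is 4 bytes).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One sequential GEP index: the constant offset found in it and the
// allocation size of the type it indexes (both supplied by the caller here).
struct IndexOffset {
  int64_t ConstantOffset;
  uint64_t IndexedTypeAllocSize;
};

// Sums ConstantOffset * sizeof(indexed type) over all indices, mirroring the
// loop body of accumulateByteOffset.
int64_t accumulateByteOffsetSketch(const std::vector<IndexOffset> &Indices) {
  int64_t Acc = 0;
  for (const IndexOffset &I : Indices)
    if (I.ConstantOffset != 0)
      Acc += I.ConstantOffset * static_cast<int64_t>(I.IndexedTypeAllocSize);
  return Acc;
}

void checkByteOffsets() {
  // array[x + 1][y + 1]: 1 * 128 + 1 * 4 = 132 bytes, i.e. 33 floats.
  const std::vector<IndexOffset> SumOfArray = {{1, 128}, {1, 4}};
  assert(accumulateByteOffsetSketch(SumOfArray) == 132);
  assert(accumulateByteOffsetSketch(SumOfArray) / 4 == 33);

  // array[i - 5][5 - j]: -5 * 128 + 5 * 4 = -620 bytes, i.e. -155 floats.
  const std::vector<IndexOffset> Sub = {{-5, 128}, {5, 4}};
  assert(accumulateByteOffsetSketch(Sub) == -620);
  assert(accumulateByteOffsetSketch(Sub) / 4 == -155);
}
```

The two cases correspond to the "+132" and "i64 33" checks for @sum_of_array and the "i64 -155" check for @sub.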

bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {		bool SeparateConstOffsetFromGEP::splitGEP(GetElementPtrInst *GEP) {
// Skip vector GEPs.		// Skip vector GEPs.
if (GEP->getType()->isVectorTy())		if (GEP->getType()->isVectorTy())
return false;		return false;

// The backend can already nicely handle the case where all indices are		// The backend can already nicely handle the case where all indices are
// constant.		// constant.
if (GEP->hasAllConstantIndices())		if (GEP->hasAllConstantIndices())
return false;		return false;

bool Changed = false;		bool Changed = false;
		// Canonicalize array indices to pointer-size integers. This helps to simplify
// Shortcuts integer casts. Eliminating these explicit casts can make		// the logic of splitting a GEP. For example, if a + b is a pointer-size
// subsequent optimizations more obvious: ConstantOffsetExtractor needn't		// integer, we have
// trace into these casts.		// gep base, a + b = gep (gep base, a), b
if (GEP->isInBounds()) {		// However, this equality may not hold if the size of a + b is smaller than
// Doing this to inbounds GEPs is safe because their indices are guaranteed		// the pointer size, because LLVM conceptually sign-extends GEP indices to
// to be non-negative and in bounds.		// pointer size before computing the address
		// (http://llvm.org/docs/LangRef.html#id181).
		//
		// This canonicalization is very likely already done in clang and instcombine.
		// Therefore, the program will probably remain the same.
		//
		// Verified in @i32_add in split-gep.ll
		const DataLayout *DL = &getAnalysis<DataLayoutPass>().getDataLayout();
		Type *IntPtrTy = DL->getIntPtrType(GEP->getType());
gep_type_iterator GTI = gep_type_begin(*GEP);		gep_type_iterator GTI = gep_type_begin(*GEP);
for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {		for (User::op_iterator I = GEP->op_begin() + 1, E = GEP->op_end();
		I != E; ++I, ++GTI) {
if (isa<SequentialType>(*GTI)) {		if (isa<SequentialType>(*GTI)) {
if (Operator *O = dyn_cast<Operator>(GEP->getOperand(I))) {		if ((*I)->getType() != IntPtrTy) {
if (O->getOpcode() == Instruction::SExt \|\|		I = CastInst::CreateIntegerCast(I, IntPtrTy, true, "idxprom", GEP);
O->getOpcode() == Instruction::ZExt) {
GEP->setOperand(I, O->getOperand(0));
Changed = true;		Changed = true;
}		}
}		}
}		}
}
}
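
The canonicalization comment above (and the removed zext shortcut) rests on the fact that a narrow GEP index is sign-extended before the address is computed. The sketch below models addresses as 64-bit element offsets; the function names are hypothetical and the model ignores element sizes. It assumes the usual modular conversion from uint32_t to int32_t, which is guaranteed from C++20 and is what mainstream compilers do earlier.

```cpp
#include <cassert>
#include <cstdint>

// Wrapping i32 addition, modelling an LLVM "add i32" without nsw/nuw.
inline int32_t addI32(int32_t A, int32_t B) {
  return static_cast<int32_t>(static_cast<uint32_t>(A) +
                              static_cast<uint32_t>(B));
}

// gep base, (a + b) with an i32 index: the index is sign-extended.
inline int64_t gepNarrowIndex(int64_t Base, int32_t A, int32_t B) {
  return Base + static_cast<int64_t>(addI32(A, B));
}

// gep (gep base, a), b: each index is sign-extended separately.
inline int64_t gepSplit(int64_t Base, int32_t A, int32_t B) {
  return Base + static_cast<int64_t>(A) + static_cast<int64_t>(B);
}

// gep base, zext(a + b): the index is zero-extended explicitly.
inline int64_t gepZExtIndex(int64_t Base, int32_t A, int32_t B) {
  return Base + static_cast<int64_t>(static_cast<uint32_t>(addI32(A, B)));
}

void checkIndexExtension() {
  const int64_t Base = 0;
  // Small indices: all three forms agree.
  assert(gepNarrowIndex(Base, 3, 4) == 7);
  assert(gepSplit(Base, 3, 4) == 7);
  assert(gepZExtIndex(Base, 3, 4) == 7);
  // a = INT32_MAX, b = 1: the i32 sum wraps to INT32_MIN.
  assert(gepNarrowIndex(Base, INT32_MAX, 1) == -2147483648LL);
  assert(gepSplit(Base, INT32_MAX, 1) == 2147483648LL);
  assert(gepZExtIndex(Base, INT32_MAX, 1) == 2147483648LL);
}
```

In the wrapping case, dropping a zext from the index or splitting a narrow sum changes the address, which is why the patch extends indices to pointer size first.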

const DataLayout *DL = &getAnalysis<DataLayoutPass>().getDataLayout();
bool NeedsExtraction;		bool NeedsExtraction;
int64_t AccumulativeByteOffset =		int64_t AccumulativeByteOffset =
accumulateByteOffset(GEP, DL, NeedsExtraction);		accumulateByteOffset(GEP, DL, NeedsExtraction);

if (!NeedsExtraction)		if (!NeedsExtraction)
return Changed;		return Changed;
// Before really splitting the GEP, check whether the backend supports the		// Before really splitting the GEP, check whether the backend supports the
// addressing mode we are about to produce. If no, this splitting probably		// addressing mode we are about to produce. If no, this splitting probably
// won't be beneficial.		// won't be beneficial.
TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();		TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();
if (!TTI.isLegalAddressingMode(GEP->getType()->getElementType(),		if (!TTI.isLegalAddressingMode(GEP->getType()->getElementType(),
/*BaseGV=*/nullptr, AccumulativeByteOffset,		/*BaseGV=*/nullptr, AccumulativeByteOffset,
/*HasBaseReg=*/true, /*Scale=*/0)) {		/*HasBaseReg=*/true, /*Scale=*/0)) {
return Changed;		return Changed;
}		}

// Remove the constant offset in each GEP index. The resultant GEP computes		// Remove the constant offset in each GEP index. The resultant GEP computes
// the variadic base.		// the variadic base.
gep_type_iterator GTI = gep_type_begin(*GEP);		GTI = gep_type_begin(*GEP);
for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {		for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
if (isa<SequentialType>(*GTI)) {		if (isa<SequentialType>(*GTI)) {
Value *NewIdx = nullptr;		Value *NewIdx = nullptr;
// Tries to extract a constant offset from this GEP index.		// Tries to extract a constant offset from this GEP index.
int64_t ConstantOffset =		int64_t ConstantOffset =
ConstantOffsetExtractor::Extract(GEP->getOperand(I), NewIdx, DL, GEP);		ConstantOffsetExtractor::Extract(GEP->getOperand(I), NewIdx, DL, GEP);
if (ConstantOffset != 0) {		if (ConstantOffset != 0) {
assert(NewIdx != nullptr &&		assert(NewIdx != nullptr &&
"ConstantOffset != 0 implies NewIdx is set");		"ConstantOffset != 0 implies NewIdx is set");
GEP->setOperand(I, NewIdx);		GEP->setOperand(I, NewIdx);
		}
		}
		}
// Clear the inbounds attribute because the new index may be off-bound.		// Clear the inbounds attribute because the new index may be off-bound.
// e.g.,		// e.g.,
//		//
// b = add i64 a, 5		// b = add i64 a, 5
// addr = gep inbounds float* p, i64 b		// addr = gep inbounds float* p, i64 b
//		//
// is transformed to:		// is transformed to:
//		//
// addr2 = gep float* p, i64 a		// addr2 = gep float* p, i64 a
// addr = gep float* addr2, i64 5		// addr = gep float* addr2, i64 5
//		//
// If a is -4, although the old index b is in bounds, the new index a is		// If a is -4, although the old index b is in bounds, the new index a is
// off-bound. http://llvm.org/docs/LangRef.html#id181 says "if the		// off-bound. http://llvm.org/docs/LangRef.html#id181 says "if the
// inbounds keyword is not present, the offsets are added to the base		// inbounds keyword is not present, the offsets are added to the base
// address with silently-wrapping two's complement arithmetic".		// address with silently-wrapping two's complement arithmetic".
// Therefore, the final code will be semantically equivalent.		// Therefore, the final code will be semantically equivalent.
//		//
// TODO(jingyue): do some range analysis to keep as many inbounds as		// TODO(jingyue): do some range analysis to keep as many inbounds as
// possible. GEPs with inbounds are more friendly to alias analysis.		// possible. GEPs with inbounds are more friendly to alias analysis.
GEP->setIsInBounds(false);		GEP->setIsInBounds(false);
Changed = true;
}
}
}

// Offsets the base with the accumulative byte offset.		// Offsets the base with the accumulative byte offset.
//		//
// %gep ; the base		// %gep ; the base
// ... %gep ...		// ... %gep ...
//		//
// => add the offset		// => add the offset
//		//
// %gep2 ; clone of %gep		// %gep2 ; clone of %gep
// %0 = bitcast %gep2 to i8*		// %0 = bitcast %gep2 to i8*
// %uglygep = gep %0, <offset>		// %uglygep = gep %0, <offset>
// %new.gep = bitcast %uglygep to <type of %gep>		// %new.gep = bitcast %uglygep to <type of %gep>
// ... %new.gep ...		// ... %new.gep ...
Instruction *NewGEP = GEP->clone();		Instruction *NewGEP = GEP->clone();
NewGEP->insertBefore(GEP);		NewGEP->insertBefore(GEP);

Type *IntPtrTy = DL->getIntPtrType(GEP->getType());
uint64_t ElementTypeSizeOfGEP =		uint64_t ElementTypeSizeOfGEP =
DL->getTypeAllocSize(GEP->getType()->getElementType());		DL->getTypeAllocSize(GEP->getType()->getElementType());
if (AccumulativeByteOffset % ElementTypeSizeOfGEP == 0) {		if (AccumulativeByteOffset % ElementTypeSizeOfGEP == 0) {
// Very likely. As long as %gep is naturally aligned, the byte offset we		// Very likely. As long as %gep is naturally aligned, the byte offset we
// extracted should be a multiple of sizeof(*%gep).		// extracted should be a multiple of sizeof(*%gep).
// Per ANSI C standard, signed / unsigned = unsigned. Therefore, we		// Per ANSI C standard, signed / unsigned = unsigned. Therefore, we
// cast ElementTypeSizeOfGEP to signed.		// cast ElementTypeSizeOfGEP to signed.
int64_t Index =		int64_t Index =

test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep-and-gvn.ll

	; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s --check-prefix=PTX
	; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix=PTX			; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 | FileCheck %s --check-prefix=PTX
	; RUN: opt < %s -S -separate-const-offset-from-gep -gvn -dce | FileCheck %s --check-prefix=IR			; RUN: opt < %s -S -separate-const-offset-from-gep -gvn -dce | FileCheck %s --check-prefix=IR

	; Verifies the SeparateConstOffsetFromGEP pass.			; Verifies the SeparateConstOffsetFromGEP pass.
	; The following code computes			; The following code computes
	; *output = array[x][y] + array[x][y+1] + array[x+1][y] + array[x+1][y+1]			; *output = array[x][y] + array[x][y+1] + array[x+1][y] + array[x+1][y+1]
	;			;
	; We expect SeparateConstOffsetFromGEP to transform it to			; We expect SeparateConstOffsetFromGEP to transform it to
	;			;
	; float *base = &a[x][y];			; float *base = &a[x][y];
	; *output = base[0] + base[1] + base[32] + base[33];			; *output = base[0] + base[1] + base[32] + base[33];
	;			;
	; so the backend can emit PTX that uses fewer virtual registers.			; so the backend can emit PTX that uses fewer virtual registers.

	target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"			target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
	target triple = "nvptx64-unknown-unknown"			target triple = "nvptx64-unknown-unknown"

	@array = internal addrspace(3) constant [32 x [32 x float]] zeroinitializer, align 4			@array = internal addrspace(3) constant [32 x [32 x float]] zeroinitializer, align 4

	define void @sum_of_array(i32 %x, i32 %y, float* nocapture %output) {			define void @sum_of_array(i32 %x, i32 %y, float* nocapture %output) {
	.preheader:			.preheader:
	%0 = zext i32 %y to i64			%0 = sext i32 %y to i64
	%1 = zext i32 %x to i64			%1 = sext i32 %x to i64
	%2 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %1, i64 %0			%2 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %1, i64 %0
	%3 = addrspacecast float addrspace(3)* %2 to float*			%3 = addrspacecast float addrspace(3)* %2 to float*
	%4 = load float* %3, align 4			%4 = load float* %3, align 4
	%5 = fadd float %4, 0.000000e+00			%5 = fadd float %4, 0.000000e+00
	%6 = add i32 %y, 1			%6 = add i32 %y, 1
	%7 = zext i32 %6 to i64			%7 = sext i32 %6 to i64
	%8 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %1, i64 %7			%8 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %1, i64 %7
	%9 = addrspacecast float addrspace(3)* %8 to float*			%9 = addrspacecast float addrspace(3)* %8 to float*
	%10 = load float* %9, align 4			%10 = load float* %9, align 4
	%11 = fadd float %5, %10			%11 = fadd float %5, %10
	%12 = add i32 %x, 1			%12 = add i32 %x, 1
	%13 = zext i32 %12 to i64			%13 = sext i32 %12 to i64
	%14 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %13, i64 %0			%14 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %13, i64 %0
	%15 = addrspacecast float addrspace(3)* %14 to float*			%15 = addrspacecast float addrspace(3)* %14 to float*
	%16 = load float* %15, align 4			%16 = load float* %15, align 4
	%17 = fadd float %11, %16			%17 = fadd float %11, %16
	%18 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %13, i64 %7			%18 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %13, i64 %7
	%19 = addrspacecast float addrspace(3)* %18 to float*			%19 = addrspacecast float addrspace(3)* %18 to float*
	%20 = load float* %19, align 4			%20 = load float* %19, align 4
	%21 = fadd float %17, %20			%21 = fadd float %17, %20
	store float %21, float* %output, align 4			store float %21, float* %output, align 4
	ret void			ret void
	}			}

	; PTX-LABEL: sum_of_array(			; PTX-LABEL: sum_of_array(
	; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG:%(rl|r)[0-9]+]]{{\]}}			; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG:%(rl|r)[0-9]+]]{{\]}}
	; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+4{{\]}}			; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+4{{\]}}
	; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+128{{\]}}			; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+128{{\]}}
	; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+132{{\]}}			; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+132{{\]}}

	; IR-LABEL: @sum_of_array(			; IR-LABEL: @sum_of_array(
	; IR: [[BASE_PTR:%[0-9]+]] = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i32 %x, i32 %y			; IR: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
				; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 1
				; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 32
				; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 33

				; @sum_of_array2 is very similar to @sum_of_array. The only difference is in
				; the order of "sext" and "add" when computing the array indices. @sum_of_array
				; computes add before sext, e.g., array[sext(x + 1)][sext(y + 1)], while
				; @sum_of_array2 computes sext before add,
				; e.g., array[sext(x) + 1][sext(y) + 1]. SeparateConstOffsetFromGEP should be
				; able to extract constant offsets from both forms.
				define void @sum_of_array2(i32 %x, i32 %y, float* nocapture %output) {
				.preheader:
				%0 = sext i32 %y to i64
				%1 = sext i32 %x to i64
				%2 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %1, i64 %0
				%3 = addrspacecast float addrspace(3)* %2 to float*
				%4 = load float* %3, align 4
				%5 = fadd float %4, 0.000000e+00
				%6 = add i64 %0, 1
				%7 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %1, i64 %6
				%8 = addrspacecast float addrspace(3)* %7 to float*
				%9 = load float* %8, align 4
				%10 = fadd float %5, %9
				%11 = add i64 %1, 1
				%12 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %11, i64 %0
				%13 = addrspacecast float addrspace(3)* %12 to float*
				%14 = load float* %13, align 4
				%15 = fadd float %10, %14
				%16 = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %11, i64 %6
				%17 = addrspacecast float addrspace(3)* %16 to float*
				%18 = load float* %17, align 4
				%19 = fadd float %15, %18
				store float %19, float* %output, align 4
				ret void
				}
				; PTX-LABEL: sum_of_array2(
				; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG:%(rl|r)[0-9]+]]{{\]}}
				; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+4{{\]}}
				; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+128{{\]}}
				; PTX: ld.shared.f32 {{%f[0-9]+}}, {{\[}}[[BASE_REG]]+132{{\]}}

				; IR-LABEL: @sum_of_array2(
				; IR: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr inbounds [32 x [32 x float]] addrspace(3)* @array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
	; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 1			; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 1
	; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 32			; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 32
	; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 33			; IR: getelementptr float addrspace(3)* [[BASE_PTR]], i64 33

test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll

	; may have different types.			; may have different types.
	define double* @struct(i32 %i) {			define double* @struct(i32 %i) {
	entry:			entry:
	%add = add nsw i32 %i, 5			%add = add nsw i32 %i, 5
	%idxprom = sext i32 %add to i64			%idxprom = sext i32 %add to i64
	%p = getelementptr inbounds [1024 x %struct.S]* @struct_array, i64 0, i64 %idxprom, i32 1			%p = getelementptr inbounds [1024 x %struct.S]* @struct_array, i64 0, i64 %idxprom, i32 1
	ret double* %p			ret double* %p
	}			}
	; CHECK-LABEL: @struct			; CHECK-LABEL: @struct(
	; CHECK: getelementptr [1024 x %struct.S]* @struct_array, i64 0, i32 %i, i32 1			; CHECK: getelementptr [1024 x %struct.S]* @struct_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i32 1

	; We should be able to trace into sext/zext if it's directly used as a GEP			; We should be able to trace into s/zext(a + b) if a + b is non-negative
	; index.			; (e.g., used as an index of an inbounds GEP) and one of a and b is
	define float* @sext_zext(i32 %i, i32 %j) {			; non-negative.
	entry:			define float* @sext_add(i32 %i, i32 %j) {
	%i1 = add i32 %i, 1			entry:
	%j2 = add i32 %j, 2			%0 = add i32 %i, 1
	%i1.ext = sext i32 %i1 to i64			%1 = sext i32 %0 to i64 ; inbound sext(i + 1) = sext(i) + 1
	%j2.ext = zext i32 %j2 to i64			%2 = sub i32 %j, 2
	%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i1.ext, i64 %j2.ext			; However, inbound sext(j - 2) != sext(j) - 2, e.g., j = INT_MIN
	ret float* %p			%3 = sext i32 %2 to i64
	}			%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %1, i64 %3
	; CHECK-LABEL: @sext_zext			ret float* %p
	; CHECK: getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i32 %i, i32 %j			}
	; CHECK: getelementptr float* %{{[0-9]+}}, i64 34			; CHECK-LABEL: @sext_add(
				; CHECK-NOT: = add
				; CHECK: getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
				; CHECK: getelementptr float* %{{[a-zA-Z0-9]+}}, i64 32
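
The two cases in @sext_add can be reproduced with ordinary 32-bit integers. This is a hedged sketch rather than the pass's logic: wrapAdd32, wrapSub32, sext64, canSplitSExtAdd and checkSExtSplit are hypothetical names, canSplitSExtAdd evaluates the sum directly instead of proving it non-negative the way the pass must, and the uint32_t to int32_t conversion is assumed to be modular (guaranteed from C++20).

```cpp
#include <cassert>
#include <cstdint>

// Wrapping i32 add/sub, modelling LLVM "add"/"sub" without nsw/nuw.
inline int32_t wrapAdd32(int32_t A, int32_t B) {
  return static_cast<int32_t>(static_cast<uint32_t>(A) +
                              static_cast<uint32_t>(B));
}
inline int32_t wrapSub32(int32_t A, int32_t B) {
  return static_cast<int32_t>(static_cast<uint32_t>(A) -
                              static_cast<uint32_t>(B));
}
inline int64_t sext64(int32_t V) { return static_cast<int64_t>(V); }

// Condition from the test comment: a + b is non-negative and at least one of
// a, b is non-negative. Then sext(a + b) == sext(a) + sext(b).
inline bool canSplitSExtAdd(int32_t A, int32_t B) {
  return wrapAdd32(A, B) >= 0 && (A >= 0 || B >= 0);
}

void checkSExtSplit() {
  // sext(i + 1) == sext(i) + 1 whenever the condition holds.
  const int32_t Samples[] = {-1, 0, 1, 41, INT32_MAX - 1};
  for (int32_t I : Samples) {
    assert(canSplitSExtAdd(I, 1));
    assert(sext64(wrapAdd32(I, 1)) == sext64(I) + 1);
  }
  // i = INT32_MAX: i + 1 wraps to a negative value, so the condition fails.
  assert(!canSplitSExtAdd(INT32_MAX, 1));

  // j = INT32_MIN: j - 2 wraps to a non-negative value, yet
  // sext(j - 2) != sext(j) - 2, so the sub must not be split.
  const int32_t J = INT32_MIN;
  assert(wrapSub32(J, 2) == INT32_MAX - 1);
  assert(sext64(wrapSub32(J, 2)) != sext64(J) - 2);
}
```

This matches the expected output of the test: only the "+ 1" on the first index is extracted, giving 1 * 32 = 32 floats.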

	; We should be able to trace into sext/zext if it can be distributed to both			; We should be able to trace into sext/zext if it can be distributed to both
	; operands, e.g., sext (add nsw a, b) == add nsw (sext a), (sext b)			; operands, e.g., sext (add nsw a, b) == add nsw (sext a), (sext b)
				;
				; This test verifies we can transform
				; gep base, a + sext(b +nsw 1), c + zext(d +nuw 1)
				; to
				; gep base, a + sext(b), c + zext(d); gep ..., 1 * 32 + 1
	define float* @ext_add_no_overflow(i64 %a, i32 %b, i64 %c, i32 %d) {			define float* @ext_add_no_overflow(i64 %a, i32 %b, i64 %c, i32 %d) {
	%b1 = add nsw i32 %b, 1			%b1 = add nsw i32 %b, 1
	%b2 = sext i32 %b1 to i64			%b2 = sext i32 %b1 to i64
	%i = add i64 %a, %b2			%i = add i64 %a, %b2 ; i = a + sext(b +nsw 1)
	%d1 = add nuw i32 %d, 1			%d1 = add nuw i32 %d, 1
	%d2 = zext i32 %d1 to i64			%d2 = zext i32 %d1 to i64
	%j = add i64 %c, %d2			%j = add i64 %c, %d2 ; j = c + zext(d +nuw 1)
	%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 %j			%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 %j
	ret float* %p			ret float* %p
	}			}
	; CHECK-LABEL: @ext_add_no_overflow			; CHECK-LABEL: @ext_add_no_overflow(
	; CHECK: [[BASE_PTR:%[0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[0-9]+}}, i64 %{{[0-9]+}}			; CHECK: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
	; CHECK: getelementptr float* [[BASE_PTR]], i64 33			; CHECK: getelementptr float* [[BASE_PTR]], i64 33

	; Similar to @ext_add_no_overflow, we should be able to trace into sext/zext if			; Verifies we handle nested sext/zext correctly.
	; its operand is an "or" instruction.			define void @sext_zext(i32 %a, i32 %b, float %out1, float %out2) {
	define float* @ext_or(i64 %a, i32 %b) {			entry:
				%0 = add nsw nuw i32 %a, 1
				%1 = sext i32 %0 to i48
				%2 = zext i48 %1 to i64 ; zext(sext(a +nsw nuw 1)) = zext(sext(a)) + 1
				%3 = add nsw i32 %b, 2
				%4 = sext i32 %3 to i48
				%5 = zext i48 %4 to i64 ; zext(sext(a +nsw 2)) != zext(sext(a)) + 2
				%p1 = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %2, i64 %5
				store float* %p1, float** %out1
				%6 = add nuw i32 %a, 3
				%7 = zext i32 %6 to i48
				%8 = sext i48 %7 to i64 ; sext(zext(b +nuw 3)) = zext(b +nuw 3) = zext(b) + 3
				%9 = add nsw i32 %b, 4
				%10 = zext i32 %9 to i48
				%11 = sext i48 %10 to i64 ; sext(zext(b +nsw 4)) != zext(b) + 4
				%p2 = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %8, i64 %11
				store float* %p2, float** %out2
				ret void
				}
				; CHECK-LABEL: @sext_zext(
				; CHECK: [[BASE_PTR_1:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
				; CHECK: getelementptr float* [[BASE_PTR_1]], i64 32
				; CHECK: [[BASE_PTR_2:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
				; CHECK: getelementptr float* [[BASE_PTR_2]], i64 96

				; Similar to @ext_add_no_overflow, we should be able to trace into s/zext if
				; its operand is an OR and the two operands of the OR have no common bits.
				define float* @sext_or(i64 %a, i32 %b) {
	entry:			entry:
	%b1 = shl i32 %b, 2			%b1 = shl i32 %b, 2
	%b2 = or i32 %b1, 1			%b2 = or i32 %b1, 1 ; (b << 2) and 1 have no common bits
	%b3 = or i32 %b1, 2			%b3 = or i32 %b1, 4 ; (b << 2) and 4 may have common bits
	%b2.ext = sext i32 %b2 to i64			%b2.ext = zext i32 %b2 to i64
	%b3.ext = sext i32 %b3 to i64			%b3.ext = sext i32 %b3 to i64
	%i = add i64 %a, %b2.ext			%i = add i64 %a, %b2.ext
	%j = add i64 %a, %b3.ext			%j = add i64 %a, %b3.ext
	%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 %j			%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 %j
	ret float* %p			ret float* %p
	}			}
	; CHECK-LABEL: @ext_or			; CHECK-LABEL: @sext_or(
	; CHECK: [[BASE_PTR:%[0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[0-9]+}}, i64 %{{[0-9]+}}			; CHECK: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 %{{[a-zA-Z0-9]+}}
	; CHECK: getelementptr float* [[BASE_PTR]], i64 34			; CHECK: getelementptr float* [[BASE_PTR]], i64 32

	; We should treat "or" with no common bits (%k) as "add", and leave "or" with
	; potentially common bits (%l) as is.
	define float* @or(i64 %i) {
	entry:
	%j = shl i64 %i, 2
	%k = or i64 %j, 3 ; no common bits
	%l = or i64 %j, 4 ; potentially common bits
	%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %k, i64 %l
	ret float* %p
	}
	; CHECK-LABEL: @or
	; CHECK: [[BASE_PTR:%[0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %j, i64 %l
	; CHECK: getelementptr float* [[BASE_PTR]], i64 96

	; The subexpression (b + 5) is used in both "i = a + (b + 5)" and "*out = b +			; The subexpression (b + 5) is used in both "i = a + (b + 5)" and "*out = b +
	; 5". When extracting the constant offset 5, make sure "*out = b + 5" isn't			; 5". When extracting the constant offset 5, make sure "*out = b + 5" isn't
	; affected.			; affected.
	define float* @expr(i64 %a, i64 %b, i64* %out) {			define float* @expr(i64 %a, i64 %b, i64* %out) {
	entry:			entry:
	%b5 = add i64 %b, 5			%b5 = add i64 %b, 5
	%i = add i64 %b5, %a			%i = add i64 %b5, %a
	%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 0			%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 0
	store i64 %b5, i64* %out			store i64 %b5, i64* %out
	ret float* %p			ret float* %p
	}			}
	; CHECK-LABEL: @expr			; CHECK-LABEL: @expr(
	; CHECK: [[BASE_PTR:%[0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %0, i64 0			; CHECK: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %{{[a-zA-Z0-9]+}}, i64 0
	; CHECK: getelementptr float* [[BASE_PTR]], i64 160			; CHECK: getelementptr float* [[BASE_PTR]], i64 160
	; CHECK: store i64 %b5, i64* %out			; CHECK: store i64 %b5, i64* %out

				; d + sext(a +nsw (b +nsw (c +nsw 8))) => (d + sext(a) + sext(b) + sext(c)) + 8
				define float* @sext_expr(i32 %a, i32 %b, i32 %c, i64 %d) {
				entry:
				%0 = add nsw i32 %c, 8
				%1 = add nsw i32 %b, %0
				%2 = add nsw i32 %a, %1
				%3 = sext i32 %2 to i64
				%i = add i64 %d, %3
				%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %i
				ret float* %p
				}
				; CHECK-LABEL: @sext_expr(
				; CHECK: sext i32
				; CHECK: sext i32
				; CHECK: sext i32
				; CHECK: getelementptr float* %{{[a-zA-Z0-9]+}}, i64 8

	; Verifies we handle "sub" correctly.			; Verifies we handle "sub" correctly.
	define float* @sub(i64 %i, i64 %j) {			define float* @sub(i64 %i, i64 %j) {
	%i2 = sub i64 %i, 5 ; i - 5			%i2 = sub i64 %i, 5 ; i - 5
	%j2 = sub i64 5, %j ; 5 - j			%j2 = sub i64 5, %j ; 5 - j
	%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i2, i64 %j2			%p = getelementptr inbounds [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i2, i64 %j2
	ret float* %p			ret float* %p
	}			}
	; CHECK-LABEL: @sub			; CHECK-LABEL: @sub(
	; CHECK: %[[j2:[0-9]+]] = sub i64 0, %j			; CHECK: %[[j2:[a-zA-Z0-9]+]] = sub i64 0, %j
	; CHECK: [[BASE_PTR:%[0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 %[[j2]]			; CHECK: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 %i, i64 %[[j2]]
	; CHECK: getelementptr float* [[BASE_PTR]], i64 -155			; CHECK: getelementptr float* [[BASE_PTR]], i64 -155
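
The `-155` in the check above can be reproduced by hand. `sub i64 %i, 5` contributes a constant -5 to the row index, and `sub i64 5, %j` contributes +5 to the column index (its variable part becomes `0 - %j`). With 32 floats per row, the combined offset is -5 * 32 + 5. A small Python sketch, using a hypothetical helper name, also covers the `160` checked in `@expr`:

```python
ROWS, COLS = 32, 32  # shape of @float_2d_array

def const_offset_in_floats(row_const: int, col_const: int) -> int:
    """Flatten per-dimension constant parts into one offset, in floats."""
    return row_const * COLS + col_const

# %i2 = %i - 5 -> row constant -5; %j2 = 5 - %j -> column constant +5
sub_offset = const_offset_in_floats(-5, 5)

# @expr: %i = (%b + 5) + %a indexes rows, and the column index is 0
expr_offset = const_offset_in_floats(5, 0)
```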

	%struct.Packed = type <{ [3 x i32], [8 x i64] }> ; <> means packed			%struct.Packed = type <{ [3 x i32], [8 x i64] }> ; <> means packed

	; Verifies we can emit correct uglygep if the address is not naturally aligned.			; Verifies we can emit correct uglygep if the address is not naturally aligned.
	define i64* @packed_struct(i32 %i, i32 %j) {			define i64* @packed_struct(i32 %i, i32 %j) {
	entry:			entry:
	%s = alloca [1024 x %struct.Packed], align 16			%s = alloca [1024 x %struct.Packed], align 16
	%add = add nsw i32 %j, 3			%add = add nsw i32 %j, 3
	%idxprom = sext i32 %add to i64			%idxprom = sext i32 %add to i64
	%add1 = add nsw i32 %i, 1			%add1 = add nsw i32 %i, 1
	%idxprom2 = sext i32 %add1 to i64			%idxprom2 = sext i32 %add1 to i64
	%arrayidx3 = getelementptr inbounds [1024 x %struct.Packed]* %s, i64 0, i64 %idxprom2, i32 1, i64 %idxprom			%arrayidx3 = getelementptr inbounds [1024 x %struct.Packed]* %s, i64 0, i64 %idxprom2, i32 1, i64 %idxprom
	ret i64* %arrayidx3			ret i64* %arrayidx3
	}			}
	; CHECK-LABEL: @packed_struct			; CHECK-LABEL: @packed_struct(
	; CHECK: [[BASE_PTR:%[0-9]+]] = getelementptr [1024 x %struct.Packed]* %s, i64 0, i32 %i, i32 1, i32 %j			; CHECK: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr [1024 x %struct.Packed]* %s, i64 0, i64 %{{[a-zA-Z0-9]+}}, i32 1, i64 %{{[a-zA-Z0-9]+}}
	; CHECK: [[CASTED_PTR:%[0-9]+]] = bitcast i64* [[BASE_PTR]] to i8*			; CHECK: [[CASTED_PTR:%[a-zA-Z0-9]+]] = bitcast i64* [[BASE_PTR]] to i8*
	; CHECK: %uglygep = getelementptr i8* [[CASTED_PTR]], i64 100			; CHECK: %uglygep = getelementptr i8* [[CASTED_PTR]], i64 100
	; CHECK: bitcast i8* %uglygep to i64*			; CHECK: bitcast i8* %uglygep to i64*
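
The byte offset `100` in the uglygep check can be derived from the packed layout. This is a sketch under the assumption that i32 is 4 bytes and i64 is 8 bytes, using Python's `struct` module only as a size calculator:

```python
import struct

# <{ [3 x i32], [8 x i64] }> is packed, so there is no padding between fields.
# "<" selects standard sizes without alignment in Python's struct module.
PACKED_SIZE = struct.calcsize("<3i8q")   # 3 * 4 + 8 * 8 bytes
I64_SIZE = struct.calcsize("<q")

# %idxprom2 = sext(%i + 1): the constant 1 steps over one whole struct.
# %idxprom  = sext(%j + 3): the constant 3 steps over three i64 elements.
byte_offset = 1 * PACKED_SIZE + 3 * I64_SIZE

# The offset is not a multiple of sizeof(i64), so it cannot be expressed as a
# whole number of i64 elements; a byte-wise GEP on i8* is used instead.
needs_byte_gep = byte_offset % I64_SIZE != 0
```

The struct field index `i32 1` is not part of the extracted constant; it stays in the base GEP.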

				; We shouldn't be able to extract the 8 from "zext(a +nuw (b + 8))",
				; because "zext(b + 8) != zext(b) + 8"
				define float* @zext_expr(i32 %a, i32 %b) {
				entry:
				%0 = add i32 %b, 8
				%1 = add nuw i32 %a, %0
				%i = zext i32 %1 to i64
				%p = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %i
				ret float* %p
				}
				; CHECK-LABEL: @zext_expr(
				; CHECK: getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %i
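
A concrete counterexample for `zext(b + 8) != zext(b) + 8`: the inner `add i32 %b, 8` carries no `nuw` flag, so it may wrap. The Python sketch below models i32 arithmetic with a mask; the helper names are illustrative only.

```python
MASK32 = (1 << 32) - 1

def add_i32(x: int, y: int) -> int:
    """Model `add i32` with wrap-around."""
    return (x + y) & MASK32

def zext_i32_to_i64(x: int) -> int:
    """Model `zext i32 to i64`: the unsigned value is unchanged."""
    return x & MASK32

b = 0xFFFFFFF8                          # b + 8 wraps to 0 in i32
lhs = zext_i32_to_i64(add_i32(b, 8))    # zext(b + 8)
rhs = zext_i32_to_i64(b) + 8            # zext(b) + 8, computed at i64 width
```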

				; Per http://llvm.org/docs/LangRef.html#id181, the indices of an off-bound gep
				; should be considered sign-extended to the pointer size. Therefore,
				; gep base, (add i32 a, b) != gep (gep base, i32 a), i32 b
				; because
				; sext(a + b) != sext(a) + sext(b)
				;
				; This test verifies we do not illegitimately extract the 8 from
				; gep base, (i32 a + 8)
				define float* @i32_add(i32 %a) {
				entry:
				%i = add i32 %a, 8
				%p = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i32 %i
				ret float* %p
				}
				; CHECK-LABEL: @i32_add(
				; CHECK: getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %{{[a-zA-Z0-9]+}}
				; CHECK-NOT: getelementptr
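
The inequality `sext(a + b) != sext(a) + sext(b)` only shows up when the narrow add overflows, which is possible here because the `add i32 %a, 8` has no `nsw` flag. A minimal Python sketch with illustrative helper names:

```python
MASK32 = (1 << 32) - 1

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed i32 (the value sext yields)."""
    x &= MASK32
    return x - (1 << 32) if x & (1 << 31) else x

def sext_of_sum(a: int, b: int) -> int:
    """sext(add i32 a, b): add with wrap-around, then sign-extend."""
    return to_signed32(a + b)

def sum_of_sext(a: int, b: int) -> int:
    """sext(a) + sext(b), computed at the wider width."""
    return to_signed32(a) + to_signed32(b)

INT32_MAX = (1 << 31) - 1

# No signed overflow: both forms agree.
assert sext_of_sum(100, 8) == sum_of_sext(100, 8)

# Signed overflow: the two forms differ.
wrapped = sext_of_sum(INT32_MAX, 8)
widened = sum_of_sext(INT32_MAX, 8)
```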

				; Verifies that we compute the correct constant offset when the index is
				; sign-extended and then zero-extended. The old version of our code failed to
				; handle this case because it simply computed the constant offset as the
				; sign-extended value of the constant part of the GEP index.
				define float* @apint(i1 %a) {
				entry:
				%0 = add nsw nuw i1 %a, 1
				%1 = sext i1 %0 to i4
				%2 = zext i4 %1 to i64 ; zext (sext i1 1 to i4) to i64 = 15
				%p = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %2
				ret float* %p
				}
				; CHECK-LABEL: @apint(
				; CHECK: [[BASE_PTR:%[a-zA-Z0-9]+]] = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %{{[a-zA-Z0-9]+}}
				; CHECK: getelementptr float* [[BASE_PTR]], i64 15
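
The expected offset `15` comes from applying the extensions in order to the constant `i1 1`. The sketch below models arbitrary-width integers as unsigned bit patterns; the helpers are hypothetical, and `naive` mimics the old behaviour described in the comment above the test (sign-extending the constant directly to i64).

```python
def trunc(x: int, bits: int) -> int:
    """Keep the low `bits` bits (unsigned view of an iN value)."""
    return x & ((1 << bits) - 1)

def sext(x: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend an iN bit pattern to iM, returned as an unsigned pattern."""
    x = trunc(x, from_bits)
    if x >> (from_bits - 1):           # sign bit set
        x -= 1 << from_bits
    return trunc(x, to_bits)

def zext(x: int, from_bits: int, to_bits: int) -> int:
    """Zero-extend: the unsigned value is unchanged (to_bits kept for symmetry)."""
    return trunc(x, from_bits)

# Constant part of the index: i1 1, then sext to i4, then zext to i64.
correct = zext(sext(1, 1, 4), 4, 64)

# Sign-extending the i1 constant straight to i64 gives all ones, i.e. -1.
naive = sext(1, 1, 64)
```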

				; Do not trace into binary operators other than ADD, SUB, and OR.
				define float* @and(i64 %a) {
				entry:
				%0 = shl i64 %a, 2
				%1 = and i64 %0, 1
				%p = getelementptr [32 x [32 x float]]* @float_2d_array, i64 0, i64 0, i64 %1
				ret float* %p
				}
				; CHECK-LABEL: @and(
				; CHECK: getelementptr [32 x [32 x float]]* @float_2d_array
				; CHECK-NOT: getelementptr