This is an archive of the discontinued LLVM Phabricator instance.

Differential D5863

[SeparateConstOffsetFromGEP] Fix bugs and improve the current SeparateConstOffsetFromGEP
Abandoned (Public)

Authored by • HaoLiu on Oct 19 2014, 10:50 PM.


Details

Reviewers

t.p.northover

Summary

Hi Jingyue and other reviewers,

I investigated the SeparateConstOffsetFromGEP pass that Jingyue added in May this year and found that it also benefits address calculation in the AArch64 backend, so we want to enable it there. Before that, I want to fix some problems.

There are some run-time failures in the llvm test-suite benchmark sqlite3. They have two causes:
(1) "int64 modulo uint64" generates an incorrect result. See the test case "@sign_mod_unsign_bug" in my patch. This is fixed by a cast from uint64_t to int64_t.
(2) The OR instruction may generate an incorrect result. See the test case "@test_or_bug". This is fixed by the improvement described later.
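
The first failure follows from C++'s usual arithmetic conversions: when an int64_t is combined with a uint64_t, the signed operand is converted to unsigned before the modulo is evaluated. A minimal sketch of the effect, assuming hypothetical helper names (mixedMod, signedMod) that are not taken from the pass:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not code from the pass.

// Offset is implicitly converted to uint64_t before the modulo, so a
// negative Offset is treated as a very large positive number.
int64_t mixedMod(int64_t Offset, uint64_t Size) {
  assert(Size != 0 && "modulo by zero");
  return Offset % Size;
}

// Casting Size to int64_t first keeps the arithmetic signed.
int64_t signedMod(int64_t Offset, uint64_t Size) {
  assert(Size != 0 && Size <= static_cast<uint64_t>(INT64_MAX));
  return Offset % static_cast<int64_t>(Size);
}
```

With Offset = -1 and Size = 4, the mixed version computes (2^64 - 1) mod 4, while the signed version keeps the truncating signed remainder.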

I also found an opportunity to improve this pass. As your comments say, it can currently find only one constant per index. For an index with several constants, like "(a + 4) + (b + 1)", relying on instcombine may not always work, for the following reasons:
(1) Some passes generate such cases after instcombine has run. E.g., CFGSimplifyPass can even generate an index from two constants, like "a = 1 + 2", in which the current SeparateConstOffsetFromGEP pass finds only the constant 1. There are also many cases like "(a + 4) + 1" or even more complex ones.
(2) The ADD-OR situation, as in the test case "@test_or_bug", is not optimized by instcombine. We can only find the constant 1 in "((a<<2) + 4) | 1" and ignore the constant 4.
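
The ADD-OR case rests on the identity that "x | c" equals "x + c" whenever x and c share no set bits. A small sketch, using hypothetical helper names, that checks the identity for the index shape discussed above:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not code from the pass.

// True when "or" and "add" produce the same result for these operands.
bool orActsAsAdd(uint64_t X, uint64_t C) { return (X & C) == 0; }

// Index of the shape ((a << 2) + 4) | 1.
uint64_t indexWithOr(uint64_t A) { return ((A << 2) + 4) | 1; }

// The same index with both constants folded into one offset of 5.
uint64_t indexWithSplitOffset(uint64_t A) { return (A << 2) + 5; }

// (A << 2) + 4 always has its two low bits clear, so or-ing in 1 adds 1.
void checkSplit(uint64_t A) {
  assert(orActsAsAdd((A << 2) + 4, 1));
  assert(indexWithOr(A) == indexWithSplitOffset(A));
}
```

When the operands do share bits (for example 5 | 1), the identity fails, which is why an "or" cannot be treated as an "add" unconditionally.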

This patch improves the pass to handle such situations. I didn't modify much code: since the current logic works well for finding one constant in one index, I just added code that lets it find more constants in one index. Previously it found a constant in either operand of a binary instruction; now it can find constants in both operands.
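
The difference between looking in either operand and looking in both can be sketched on a toy expression tree. Everything here (Expr, findInEither, findInBoth) is a hypothetical stand-in for the IR and the pass's helpers, assuming only add and sub nodes:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy expression tree; a stand-in for LLVM IR, not the pass's data structures.
struct Expr {
  enum Kind { Var, Const, Add, Sub };
  Kind K;
  int64_t Value;                  // meaningful only for Const
  std::shared_ptr<Expr> LHS, RHS; // set only for Add and Sub
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr var() { return std::make_shared<Expr>(Expr{Expr::Var, 0, nullptr, nullptr}); }
ExprPtr constant(int64_t V) {
  return std::make_shared<Expr>(Expr{Expr::Const, V, nullptr, nullptr});
}
ExprPtr binary(Expr::Kind K, ExprPtr L, ExprPtr R) {
  assert(L && R && "binary nodes need two operands");
  return std::make_shared<Expr>(Expr{K, 0, std::move(L), std::move(R)});
}

// Old behaviour: stop at the first non-zero constant found.
int64_t findInEither(const Expr &E) {
  if (E.K == Expr::Const) return E.Value;
  if (E.K == Expr::Var) return 0;
  int64_t L = findInEither(*E.LHS);
  if (L != 0) return L;
  int64_t R = findInEither(*E.RHS);
  return E.K == Expr::Sub ? -R : R;
}

// Proposed behaviour: combine the constants found in both operands.
int64_t findInBoth(const Expr &E) {
  if (E.K == Expr::Const) return E.Value;
  if (E.K == Expr::Var) return 0;
  int64_t L = findInBoth(*E.LHS);
  int64_t R = findInBoth(*E.RHS);
  return E.K == Expr::Sub ? L - R : L + R;
}
```

For "(a + 4) + (b + 1)" the first function reports only 4, while the second reports the combined offset 5. For "(a - 4) + 4" the combined offset is 0 even though constants are present, which is why a zero sum alone cannot signal that nothing was found.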

I tested this patch on the llvm test-suite, and all benchmarks now pass. SPEC CPU2000 and SPEC CPU2006 were tested as well. I don't know which benchmarks you tested on the NVPTX backend, but I think this improvement can benefit NVPTX too. At the least, it causes no regression.

Review please.

Thanks,
-Hao

Diff Detail

Event Timeline

• HaoLiu updated this revision to Diff 15132. Oct 19 2014, 10:50 PM

• HaoLiu retitled this revision from to [SeparateConstOffsetFromGEP] Fix bugs and improve the current SeparateConstOffsetFromGEP.

• HaoLiu updated this object.

• HaoLiu edited the test plan for this revision.

• HaoLiu added reviewers: jingyue, t.p.northover.

• HaoLiu added a subscriber: Unknown Object (MLST).

Herald added a subscriber: aemerson. Oct 19 2014, 10:50 PM

Add another patch fixing some typo issues.

Thanks for working on this!

First round.

I'll fix the two correctness bugs separately. After that, please rebase your diff against the new version.

I am very curious what CFGSimplificationPass does to generate a = 1 + 2. Does a = 1 + 2 appear at the end of -O3? I don't think leaving that clear misoptimization at the end of -O3 is reasonable.

Jingyue

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
120	s/outputs/returns/
129	Not the same as Extract any more.
475	Why do you prefer "unsigned &ChainIndex" to "unsigned ChainIndex"? The former seems to use more space because unsigned& takes the size of a pointer while unsigned only takes 32 bits. Also, the original logic seems clearer.
712	// Splits this GEP index into a variadic part and a constant offset, and uses the variadic part as the new index.
739	Not quite true. The sum of offsets being 0 doesn't mean every offset is zero. As long as there's a non-zero offset, it's worth creating a new GEP.
774	Nice find! I like this bug :)
test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll
5	Remove the extra dash?
37	((a << 2) + 12)

Two correctness bugs are fixed in r220615 and r220618. Thanks for reporting them and providing tests and initial patches!

• HaoLiu updated this revision to Diff 15471. Oct 26 2014, 11:53 PM

• HaoLiu edited edge metadata.

Herald added a subscriber: jholewinski. Oct 26 2014, 11:53 PM

Hi Jingyue,

First round.

I'll fix the two correctness bugs separately. After that, please rebase your diff against the new version.

Thanks very much for the code review and the commits for bugs.

I am very curious what CFGSimplificationPass does to generate a = 1 + 2. Does a = 1 + 2 appear at the end of -O3? I don't think leaving that clear misoptimization at the end of -O3 is reasonable.

Attachment: simple.c (594 B)

I've attached a simple test case, "simple.c", that reproduces it as follows:

clang -O3 -S -emit-llvm simple.c
llc -march=aarch64 < simple.ll -print-after-all (or llc -march=nvptx < simple.ll -print-after-all)

For the AArch64 backend, we can find IR like the following after the CFGSimplification pass:

%inc.1.1 = add nsw i32 2, 1
...
%inc.2.1 = add nsw i32 %inc.1.1, 1
...
%inc.1.2 = add nsw i32 %inc.1.1, 2

Similar IR can be found in the NVPTX backend even though it doesn't run the CFGSimplification pass.

Besides that, the test case for OR also requires us to find constants in both operands. InstCombine will never optimize a test case like "((a<<2) + 12) | 1".

Why do you prefer "unsigned &ChainIndex" to "unsigned ChainIndex"? The former seems to use more space because unsigned& takes the size of a pointer while unsigned only takes 32 bits. Also, the original logic seems clearer.

This is because I need to know the index of the next node. Previously, the pass found one constant through a single path, so when it found the constant, the index was 0. To find constants in both operands of a binary instruction, however, it needs to traverse a tree (as in the example in my comments), which differs a little from the previous single path. When we find a constant in one operand of a binary instruction, the current index may not be 0 even though the current path has ended. We can then go back to the other operand and traverse another path to find another constant. See the code:

if (BO->getOperand(1) == UserChain[ChainIndex - 1])
if (ChainIndex != 0 && BO->getOperand(0) == UserChain[ChainIndex - 1])

Both operands are checked and visited. To make sure all nodes have been visited, an assertion for "ChainIndex == 0" is added.
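
The role of the by-reference index can be sketched on a toy tree. The names here (Node, collect, visit) are hypothetical and only mimic the shape of the logic: a post-order chain is built first, then walked backwards, and the shared index tells the caller where one operand's subtree ends so that it can check the other operand.

```cpp
#include <cassert>
#include <vector>

// Toy node: a constant leaf, a variable leaf, or a binary operator.
// A stand-in for IR values, not the pass's data structures.
struct Node {
  bool IsConst;
  const Node *Op0; // null for leaves
  const Node *Op1; // null for leaves
};

// Post-order DFS that records every node that is, or contains, a constant.
bool collect(const Node *N, std::vector<const Node *> &Chain) {
  bool Found = N->IsConst;
  if (N->Op0 && collect(N->Op0, Chain)) Found = true;
  if (N->Op1 && collect(N->Op1, Chain)) Found = true;
  if (Found) Chain.push_back(N);
  return Found;
}

// Walks the chain backwards starting at Chain[Index] and returns the number
// of nodes visited. Index is passed by reference so that, after operand 1's
// subtree has been consumed, the caller can test whether operand 0 is next.
unsigned visit(const std::vector<const Node *> &Chain, unsigned &Index) {
  const Node *N = Chain[Index];
  if (N->IsConst) return 1;
  assert(Index != 0 && "only a constant may come first in the chain");
  unsigned Visited = 1;
  if (N->Op1 == Chain[Index - 1]) {
    Visited += visit(Chain, --Index);
    if (Index != 0 && N->Op0 == Chain[Index - 1])
      Visited += visit(Chain, --Index);
  } else {
    assert(N->Op0 == Chain[Index - 1] && "one operand must be next in chain");
    Visited += visit(Chain, --Index);
  }
  return Visited;
}
```

For the tree f = c + e with c = b + 4, b = a + 3 and e = d + 8, the chain is 3, b, 4, c, 8, e, f, and a walk that starts at the last element should end with the index at 0 once every entry has been visited.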

A new patch has been uploaded, modified according to the comments and rebased. Please review.

Thanks,
-Hao

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
129	I just think we don't need to return an int64_t, which is only used to check whether the extraction is successful. As we always need to return a NewIdx, we can check whether the NewIdx is nullptr.
739	Sorry, I still don't quite understand. This is after the stage that sets the new operands, which means the old GEP's indices have already been replaced by the new ones. The following code only creates the second GEP, with one index whose value is 0. So I still think a GEP with index 0 is unnecessary.

• HaoLiu updated this revision to Diff 15473. Oct 27 2014, 12:10 AM

I think this 1+2 statement should be optimized away regardless of SeparateConstOffset, because leaving such misoptimization can hurt other passes down the road. This can be done by either running instsimplify after simplifycfg or fixing simplifycfg so that it simplifies the new instructions it creates. It's worth a separate bug report. Btw, I tried simple.c on the NVPTX backend, and found 1+2 is produced by CodeGenPrepare instead of simplifycfg. We probably need to fix the NVPTX backend similarly.

Handling "((a << 2) + 12) | 1" should probably be a feature request to instcombine, because doing so benefits other passes and the code quality in general.

These are all very nice finds, and I like them! I just think they should be fixed in a more general way.

Regarding the improvement on being able to extract multiple constants, I think your implementation is correct in general and I really appreciate that you figured it out. However, I am concerned about the complexity of the code. First, with the above issues in instcombine and simplifycfg fixed, are we still motivated to handle multiple constant offsets in this pass? Second, if we are still motivated, instead of finding and extracting all non-zero constant offsets in one DFS traversal, can we repeatedly call find+extract until we cannot improve any more? Your current implementation is surely faster, but I suspect (1) most GEPs have no constant offsets, and (2) for those that do, the latter approach would only run a very limited number (mostly two) of iterations. In general, I'd trade premature optimization for code simplicity.
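
The "repeat find+extract until nothing improves" alternative can be sketched on a toy expression. The names (Expr, extractOne, extractAll) are hypothetical, and the toy only zeroes constants in place rather than rebuilding any IR:

```cpp
#include <cassert>
#include <cstdint>

// Toy mutable expression node; a stand-in for IR, not the pass's types.
struct Expr {
  enum Kind { Var, Const, Add, Sub };
  Kind K;
  int64_t Value; // meaningful only for Const
  Expr *LHS;     // set only for Add and Sub
  Expr *RHS;     // set only for Add and Sub
};

// Finds one non-zero constant (left operand first), zeroes it in place, and
// returns its signed contribution to the whole expression.
int64_t extractOne(Expr &E) {
  if (E.K == Expr::Var) return 0;
  if (E.K == Expr::Const) {
    int64_t V = E.Value;
    E.Value = 0;
    return V;
  }
  assert(E.LHS && E.RHS && "binary nodes need two operands");
  int64_t L = extractOne(*E.LHS);
  if (L != 0) return L;
  int64_t R = extractOne(*E.RHS);
  return E.K == Expr::Sub ? -R : R;
}

// Repeats the single-constant extraction until nothing is left to extract,
// accumulating the total offset and counting the productive iterations.
int64_t extractAll(Expr &E, unsigned &Iterations) {
  int64_t Total = 0;
  Iterations = 0;
  for (int64_t C = extractOne(E); C != 0; C = extractOne(E)) {
    Total += C;
    ++Iterations;
  }
  return Total;
}
```

On "(a + 4) + (b + 1)" this takes two productive iterations, which is the kind of iteration count the comment above anticipates; the trade-off is a simpler extractor at the cost of re-walking the index.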

Let me know what you think. Thanks!

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
129	Yes, I agree and I like that. I was just saying the comments need to be updated :)
739	Sounds good. Nits: change the comment s/as/if, and add a blank line after GEP->setIsInBounds(false). Thanks!

Hi Jingyue,

I agree with you.
Test cases like "1 + 2" are corner cases; they won't affect performance much.
The "((a << 2) + 12) | 1" issue is more common than the above case. If instcombine can optimize such cases, I think it can bring some small performance improvement.

I agree not to make this improvement. The other minor changes that you agreed with have been added to the patch in D5864, which is the key thing I want to do. I hope you can have a look and review it.

Thanks,
-Hao

Does that mean we should abandon this patch and move on to D5864?

In D5863#15, @jingyue wrote:

Does that mean we should abandon this patch and move on to D5864?

Yes. Thanks for your comments.

Could you click "abandon changes" so that it doesn't show up in reviewers' TODO list :)? Thanks!

If you are not going to fix the issues you discovered with instcombine and simplifycfg right away, please track them in the buganizer.

jingyue resigned from this revision. Nov 2 2014, 9:14 AM

jingyue removed a reviewer: jingyue.

• HaoLiu abandoned this revision. Nov 3 2014, 5:49 PM

For the CFGSimplification case, a new ticket, 21473, has been created in Bugzilla.

Revision Contents

Path	Size
lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp	294 lines
test/Transforms/SeparateConstOffsetFromGEP/AArch64/lit.local.cfg	3 lines
test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll	51 lines
test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll	10 lines

Diff 15473

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

Context not available.
	/// -instcombine probably already optimized (3 * (a + 5)) to (3 * a + 15).	/// -instcombine probably already optimized (3 * (a + 5)) to (3 * a + 15).
	class ConstantOffsetExtractor {	class ConstantOffsetExtractor {
	public:	public:
	/// Extracts a constant offset from the given GEP index. It outputs the	/// Extracts a constant offset from the given GEP index. It returns the
		jingyue: s/outputs/returns/
	/// numeric value of the extracted constant offset (0 if failed), and a
	/// new index representing the remainder (equal to the original index minus	/// new index representing the remainder (equal to the original index minus
	/// the constant offset).	/// the constant offset).
	/// \p Idx The given GEP index	/// \p Idx The given GEP index
	/// \p NewIdx The new index to replace (output)
	/// \p DL The datalayout of the module	/// \p DL The datalayout of the module
	/// \p GEP The given GEP	/// \p GEP The given GEP
	static int64_t Extract(Value *Idx, Value *&NewIdx, const DataLayout *DL,	static Value *Extract(Value *Idx, const DataLayout *DL,
	GetElementPtrInst *GEP);	GetElementPtrInst *GEP);
	/// Looks for a constant offset without extracting it. The meaning of the	/// Looks for a constant offset without extracting it. The meaning of the
	/// arguments and the return value are the same as Extract.	/// arguments and the return value are the same as Extract.
		jingyue: Not the same as Extract any more.
		HaoLiu: I just think we don't need to return an int64_t, which is only used to check whether the extraction is successful. As we always need to return a NewIdx, we can check whether the NewIdx is nullptr.
		jingyue: Yes, I agree and I like that. I was just saying the comments need to be updated :)
	static int64_t Find(Value *Idx, const DataLayout *DL, GetElementPtrInst *GEP);	static int64_t Find(Value *Idx, const DataLayout *DL, GetElementPtrInst *GEP,
		bool &FoundConst);

	private:	private:
	ConstantOffsetExtractor(const DataLayout *Layout, Instruction *InsertionPt)	ConstantOffsetExtractor(const DataLayout *Layout, Instruction *InsertionPt)
Context not available.
	/// an index of an inbounds GEP is guaranteed to be	/// an index of an inbounds GEP is guaranteed to be
	/// non-negative. Levaraging this, we can better split	/// non-negative. Levaraging this, we can better split
	/// inbounds GEPs.	/// inbounds GEPs.
	APInt find(Value *V, bool SignExtended, bool ZeroExtended, bool NonNegative);	/// \p FoundConst Whether V constains constants.
		APInt find(Value *V, bool SignExtended, bool ZeroExtended, bool NonNegative,
		bool &FoundConst);
	/// A helper function to look into both operands of a binary operator.	/// A helper function to look into both operands of a binary operator.
	APInt findInEitherOperand(BinaryOperator *BO, bool SignExtended,	APInt findInOperands(BinaryOperator *BO, bool SignExtended, bool ZeroExtended,
	bool ZeroExtended);	bool &FoundConst);
	/// After finding the constant offset C from the GEP index I, we build a new	/// After finding the constant offset C from the GEP index I, we build a new
	/// index I' s.t. I' + C = I. This function builds and returns the new	/// index I' s.t. I' + C = I. This function builds and returns the new
	/// index I' according to UserChain produced by function "find".	/// index I' according to UserChain produced by function "find".
Context not available.
	///	///
	/// \p ChainIndex The index to UserChain. ChainIndex is initially	/// \p ChainIndex The index to UserChain. ChainIndex is initially
	/// UserChain.size() - 1, and is decremented during	/// UserChain.size() - 1, and is decremented during
	/// the recursion.	/// the recursion. The next index to visit is ChainIndex - 1.
	Value *distributeExtsAndCloneChain(unsigned ChainIndex);	Value *distributeExtsAndCloneChain(unsigned &ChainIndex);
	/// Reassociates the GEP index to the form I' + C and returns I'.	/// Reassociates the GEP index to the form I' + C and returns I'.
	Value *removeConstOffset(unsigned ChainIndex);	Value *removeConstOffset(unsigned &ChainIndex);
	/// A helper function to apply ExtInsts, a list of s/zext, to value V.	/// A helper function to apply ExtInsts, a list of s/zext, to value V.
	/// e.g., if ExtInsts = [sext i32 to i64, zext i16 to i32], this function	/// e.g., if ExtInsts = [sext i32 to i64, zext i16 to i32], this function
	/// returns "sext i32 (zext i16 V to i32) to i64".	/// returns "sext i32 (zext i16 V to i32) to i64".
Context not available.
	bool CanTraceInto(bool SignExtended, bool ZeroExtended, BinaryOperator *BO,	bool CanTraceInto(bool SignExtended, bool ZeroExtended, BinaryOperator *BO,
	bool NonNegative);	bool NonNegative);

	/// The path from the constant offset to the old GEP index. e.g., if the GEP	/// UserChain records the visit path from constant offsets to the old GEP
	/// index is "a * b + (c + 5)". After running function find, UserChain[0] will	/// index. e.g. if the GEP index is calculated as following:
	/// be the constant 5, UserChain[1] will be the subexpression "c + 5", and	/// b = a + 3
	/// UserChain[2] will be the entire expression "a * b + (c + 5)".	/// c = b + 4;
	///	/// e = d + 8;
	/// This path helps to rebuild the new GEP index.	/// f = c + e;
		/// And such index is actually a tree:
		///               f = c + e
		///              /         \
		///      c = b + 4       e = d + 8
		///       /      \              \
		/// b = a + 3     4              8
		///       /
		///      3
		/// The traversal is DFS(Depth-First Search) and stored in post-order, so such
		/// nodes are kept in UserChain as following:
		/// 3, b = a + 3, 4, c = b + 4, 8, e = d + 8, f = c + e;
		/// This sequence is important when we try to remove constants.
	SmallVector<User *, 8> UserChain;	SmallVector<User *, 8> UserChain;
	/// A data structure used in rebuildWithoutConstOffset. Contains all	/// A data structure used in rebuildWithoutConstOffset. Contains all
	/// sext/zext instructions along UserChain.	/// sext/zext instructions along UserChain.
Context not available.
	return true;	return true;
	}	}

	APInt ConstantOffsetExtractor::findInEitherOperand(BinaryOperator *BO,	APInt ConstantOffsetExtractor::findInOperands(BinaryOperator *BO,
	bool SignExtended,	bool SignExtended,
	bool ZeroExtended) {	bool ZeroExtended,
		bool &FoundConst) {
	// BO being non-negative does not shed light on whether its operands are	// BO being non-negative does not shed light on whether its operands are
	// non-negative. Clear the NonNegative flag here.	// non-negative. Clear the NonNegative flag here.
	APInt ConstantOffset = find(BO->getOperand(0), SignExtended, ZeroExtended,	APInt Constant0 = find(BO->getOperand(0), SignExtended, ZeroExtended,
	/* NonNegative */ false);	/* NonNegative */ false, FoundConst);
	// If we found a constant offset in the left operand, stop and return that.	APInt Constant1 = find(BO->getOperand(1), SignExtended, ZeroExtended,
	// This shortcut might cause us to miss opportunities of combining the	/* NonNegative */ false, FoundConst);
	// constant offsets in both operands, e.g., (a + 4) + (b + 5) => (a + b) + 9.	bool IsSub = (BO->getOpcode() == Instruction::Sub);
	// However, such cases are probably already handled by -instcombine,	return IsSub ? Constant0 - Constant1 : Constant0 + Constant1;
	// given this pass runs after the standard optimizations.
	if (ConstantOffset != 0) return ConstantOffset;
	ConstantOffset = find(BO->getOperand(1), SignExtended, ZeroExtended,
	/* NonNegative */ false);
	// If U is a sub operator, negate the constant offset found in the right
	// operand.
	if (BO->getOpcode() == Instruction::Sub)
	ConstantOffset = -ConstantOffset;
	return ConstantOffset;
	}	}

	APInt ConstantOffsetExtractor::find(Value *V, bool SignExtended,	APInt ConstantOffsetExtractor::find(Value *V, bool SignExtended,
	bool ZeroExtended, bool NonNegative) {	bool ZeroExtended, bool NonNegative,
		bool &FoundConst) {
	// TODO(jingyue): We could trace into integer/pointer casts, such as	// TODO(jingyue): We could trace into integer/pointer casts, such as
	// inttoptr, ptrtoint, bitcast, and addrspacecast. We choose to handle only	// inttoptr, ptrtoint, bitcast, and addrspacecast. We choose to handle only
	// integers because it gives good enough results for our benchmarks.	// integers because it gives good enough results for our benchmarks.
Context not available.
	if (U == nullptr) return APInt(BitWidth, 0);	if (U == nullptr) return APInt(BitWidth, 0);

	APInt ConstantOffset(BitWidth, 0);	APInt ConstantOffset(BitWidth, 0);
		bool FindInOperands = false;
	if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {	if (ConstantInt *CI = dyn_cast<ConstantInt>(V)) {
	// Hooray, we found it!	// Hooray, we found it!
	ConstantOffset = CI->getValue();	ConstantOffset = CI->getValue();
		// Zero is a valid constant offset, but doesn't help this optimization.
		FindInOperands = (ConstantOffset != 0);
	} else if (BinaryOperator *BO = dyn_cast<BinaryOperator>(V)) {	} else if (BinaryOperator *BO = dyn_cast<BinaryOperator>(V)) {
	// Trace into subexpressions for more hoisting opportunities.	// Trace into subexpressions for more hoisting opportunities.
	if (CanTraceInto(SignExtended, ZeroExtended, BO, NonNegative)) {	if (CanTraceInto(SignExtended, ZeroExtended, BO, NonNegative)) {
	ConstantOffset = findInEitherOperand(BO, SignExtended, ZeroExtended);	ConstantOffset =
		findInOperands(BO, SignExtended, ZeroExtended, FindInOperands);
	}	}
	} else if (isa<SExtInst>(V)) {	} else if (isa<SExtInst>(V)) {
	ConstantOffset = find(U->getOperand(0), /* SignExtended */ true,	ConstantOffset = find(U->getOperand(0), /* SignExtended */ true,
	ZeroExtended, NonNegative).sext(BitWidth);	ZeroExtended, NonNegative,
		FindInOperands).sext(BitWidth);
	} else if (isa<ZExtInst>(V)) {	} else if (isa<ZExtInst>(V)) {
	// As an optimization, we can clear the SignExtended flag because	// As an optimization, we can clear the SignExtended flag because
	// sext(zext(a)) = zext(a). Verified in @sext_zext in split-gep.ll.	// sext(zext(a)) = zext(a). Verified in @sext_zext in split-gep.ll.
	//	//
	// Clear the NonNegative flag, because zext(a) >= 0 does not imply a >= 0.	// Clear the NonNegative flag, because zext(a) >= 0 does not imply a >= 0.
	ConstantOffset =	ConstantOffset = find(U->getOperand(0), /* SignExtended */ false,
	find(U->getOperand(0), /* SignExtended */ false,	/* ZeroExtended */ true, /* NonNegative */ false,
	/* ZeroExtended */ true, /* NonNegative */ false).zext(BitWidth);	FindInOperands).zext(BitWidth);
	}	}

	// If we found a non-zero constant offset, add it to the path for	// If we found constant offset in current Value, add it to the path for
	// rebuildWithoutConstOffset. Zero is a valid constant offset, but doesn't	// rebuildWithoutConstOffset.
	// help this optimization.	if (FindInOperands) {
	if (ConstantOffset != 0)
	UserChain.push_back(U);	UserChain.push_back(U);
		FoundConst = true;
		}
	return ConstantOffset;	return ConstantOffset;
	}	}

Context not available.
	}	}

	Value *ConstantOffsetExtractor::rebuildWithoutConstOffset() {	Value *ConstantOffsetExtractor::rebuildWithoutConstOffset() {
	distributeExtsAndCloneChain(UserChain.size() - 1);	assert(UserChain.size() > 0 && "Empty Chain");

		unsigned ChainIndex = UserChain.size() - 1;
		distributeExtsAndCloneChain(ChainIndex);
		assert(ChainIndex == 0 && "Make sure that all Users have been visited");
	// Remove all nullptrs (used to be s/zext) from UserChain.	// Remove all nullptrs (used to be s/zext) from UserChain.
	unsigned NewSize = 0;	unsigned NewSize = 0;
	for (auto I = UserChain.begin(), E = UserChain.end(); I != E; ++I) {	for (auto I = UserChain.begin(), E = UserChain.end(); I != E; ++I) {
Context not available.
	}	}
	}	}
	UserChain.resize(NewSize);	UserChain.resize(NewSize);
	return removeConstOffset(UserChain.size() - 1);	ChainIndex = NewSize - 1;
		// Remove all the constant offsets and generate a new index.
		Value *NewIdx = removeConstOffset(ChainIndex);
		assert(ChainIndex == 0 && "Make sure that all Users have been visited");
		return NewIdx;
	}	}

	Value *	Value *
	ConstantOffsetExtractor::distributeExtsAndCloneChain(unsigned ChainIndex) {	ConstantOffsetExtractor::distributeExtsAndCloneChain(unsigned &ChainIndex) {
		jingyue: Why do you prefer "unsigned &ChainIndex" to "unsigned ChainIndex"? The former seems to use more space because unsigned& takes the size of a pointer while unsigned only takes 32 bits. Also, the original logic seems clearer.
	User *U = UserChain[ChainIndex];	User *U = UserChain[ChainIndex];
	if (ChainIndex == 0) {	// If U is a ConstantInt, applyExts will return a ConstantInt as well.
	assert(isa<ConstantInt>(U));	if (isa<ConstantInt>(U))
	// If U is a ConstantInt, applyExts will return a ConstantInt as well.
	return UserChain[ChainIndex] = cast<ConstantInt>(applyExts(U));	return UserChain[ChainIndex] = cast<ConstantInt>(applyExts(U));
	}

		assert(ChainIndex != 0 &&
		"Only Constant is allowed the last one in the Chain");
		// Store current index in case that ChainIndex may be decremented.
		unsigned ThisIndex = ChainIndex;
	if (CastInst *Cast = dyn_cast<CastInst>(U)) {	if (CastInst *Cast = dyn_cast<CastInst>(U)) {
	assert((isa<SExtInst>(Cast) || isa<ZExtInst>(Cast)) &&	assert((isa<SExtInst>(Cast) || isa<ZExtInst>(Cast)) &&
	"We only traced into two types of CastInst: sext and zext");	"We only traced into two types of CastInst: sext and zext");
	ExtInsts.push_back(Cast);	ExtInsts.push_back(Cast);
	UserChain[ChainIndex] = nullptr;	UserChain[ChainIndex] = nullptr;
	return distributeExtsAndCloneChain(ChainIndex - 1);	Value *NewV = distributeExtsAndCloneChain(--ChainIndex);
		// Pop back as this CastInst is only used when visiting it's operand.
		ExtInsts.pop_back();
		return NewV;
	}	}

	// Function find only trace into BinaryOperator and CastInst.	// Function find only trace into BinaryOperator and CastInst.
	BinaryOperator *BO = cast<BinaryOperator>(U);	BinaryOperator *BO = cast<BinaryOperator>(U);
	// OpNo = which operand of BO is UserChain[ChainIndex - 1]
	unsigned OpNo = (BO->getOperand(0) == UserChain[ChainIndex - 1] ? 0 : 1);	Value *NewOp0, *NewOp1;
	Value *TheOther = applyExts(BO->getOperand(1 - OpNo));	// At least one of the two operands are the next in Chain. Firstly check if
	Value *NextInChain = distributeExtsAndCloneChain(ChainIndex - 1);	// the next to be visited is operand 1.
		// If true, the next to be visited may be operand 0 (depend on whether
	BinaryOperator *NewBO = nullptr;	// operand 0 is equal to the next in chain).
	if (OpNo == 0) {	// If false, the next to be visited must be operand 0.
	NewBO = BinaryOperator::Create(BO->getOpcode(), NextInChain, TheOther,	if (BO->getOperand(1) == UserChain[ChainIndex - 1]) {
	BO->getName(), IP);	NewOp1 = distributeExtsAndCloneChain(--ChainIndex);

		if (ChainIndex != 0 && BO->getOperand(0) == UserChain[ChainIndex - 1])
		NewOp0 = distributeExtsAndCloneChain(--ChainIndex);
		else
		NewOp0 = applyExts(BO->getOperand(0));
	} else {	} else {
	NewBO = BinaryOperator::Create(BO->getOpcode(), TheOther, NextInChain,	assert(BO->getOperand(0) == UserChain[ChainIndex - 1] &&
	BO->getName(), IP);	"At least one operand is next in Chain");
		NewOp1 = applyExts(BO->getOperand(1));
		NewOp0 = distributeExtsAndCloneChain(--ChainIndex);
	}	}
	return UserChain[ChainIndex] = NewBO;
		BinaryOperator *NewBO = BinaryOperator::Create(BO->getOpcode(), NewOp0,
		NewOp1, BO->getName(), IP);
		return UserChain[ThisIndex] = NewBO;
	}	}

	Value *ConstantOffsetExtractor::removeConstOffset(unsigned ChainIndex) {	Value *ConstantOffsetExtractor::removeConstOffset(unsigned &ChainIndex) {
	if (ChainIndex == 0) {	if (isa<ConstantInt>(UserChain[ChainIndex]))
	assert(isa<ConstantInt>(UserChain[ChainIndex]));
	return ConstantInt::getNullValue(UserChain[ChainIndex]->getType());	return ConstantInt::getNullValue(UserChain[ChainIndex]->getType());
	}

		assert(ChainIndex != 0 &&
		"Only Constant is allowed to be the last one in the Chain");
	BinaryOperator *BO = cast<BinaryOperator>(UserChain[ChainIndex]);	BinaryOperator *BO = cast<BinaryOperator>(UserChain[ChainIndex]);
	unsigned OpNo = (BO->getOperand(0) == UserChain[ChainIndex - 1] ? 0 : 1);
	assert(BO->getOperand(OpNo) == UserChain[ChainIndex - 1]);
	Value *NextInChain = removeConstOffset(ChainIndex - 1);
	Value *TheOther = BO->getOperand(1 - OpNo);

	// If NextInChain is 0 and not the LHS of a sub, we can simplify the
	// sub-expression to be just TheOther.
	if (ConstantInt *CI = dyn_cast<ConstantInt>(NextInChain)) {
	if (CI->isZero() && !(BO->getOpcode() == Instruction::Sub && OpNo == 0))
	return TheOther;
	}

	if (BO->getOpcode() == Instruction::Or) {	if (BO->getOpcode() == Instruction::Or) {
	// Rebuild "or" as "add", because "or" may be invalid for the new	// Rebuild "or" as "add", because "or" may be invalid for the new
	// epxression.	// epxression.
Context not available.
	//	//
	// Replacing the "or" with "add" is fine, because	// Replacing the "or" with "add" is fine, because
	// a | (b + 5) = a + (b + 5) = (a + b) + 5	// a | (b + 5) = a + (b + 5) = (a + b) + 5
	if (OpNo == 0) {	BO = BinaryOperator::CreateAdd(BO->getOperand(0), BO->getOperand(1),
	return BinaryOperator::CreateAdd(NextInChain, TheOther, BO->getName(),	BO->getName(), BO);
	IP);	}
	} else {
	return BinaryOperator::CreateAdd(TheOther, NextInChain, BO->getName(),	// Similar to distributeExtsAndCloneChain. Firstly check if the next to be
	IP);	// visited is operand 1.
		// If true, the next to be visited may be operand 0 (depend on whether
		// operand 0 is equal to the next in chain).
		// If false, the next to be visited must be operand 0.
		Value *NewOp0 = BO->getOperand(0), *NewOp1 = BO->getOperand(1);
		if (BO->getOperand(1) == UserChain[ChainIndex - 1]) {
		NewOp1 = removeConstOffset(--ChainIndex);
		BO->setOperand(1, NewOp1);

		if (ChainIndex != 0 && BO->getOperand(0) == UserChain[ChainIndex - 1]) {
		NewOp0 = removeConstOffset(--ChainIndex);
		BO->setOperand(0, NewOp0);
		}
		} else {
		assert(BO->getOperand(0) == UserChain[ChainIndex - 1] &&
		"At least one operand is next in Chain");
		NewOp0 = removeConstOffset(--ChainIndex);
		BO->setOperand(0, NewOp0);
		}

		// The new operand can be removed if it is constant.
		bool RemoveOp0 = false, RemoveOp1 = false;
		if (ConstantInt *CI = dyn_cast<ConstantInt>(NewOp1)) {
		assert(CI->isZero() && "Make sure that all constants have been removed");
		RemoveOp1 = true;
		}
		if (ConstantInt *CI = dyn_cast<ConstantInt>(NewOp0)) {
		assert(CI->isZero() && "Make sure that all constants have been removed");
		// Can not remove constant 0 from (0 - b)
		if (BO->getOpcode() != Instruction::Sub || RemoveOp1) {
		RemoveOp0 = true;
	}	}
	}	}

	// We can reuse BO in this case, because the new expression shares the same	if (RemoveOp0 && RemoveOp1)
	// instruction type and BO is used at most once.	return ConstantInt::getNullValue(UserChain[ChainIndex]->getType());
	assert(BO->getNumUses() <= 1 &&	if (RemoveOp0 || RemoveOp1)
	"distributeExtsAndCloneChain clones each BinaryOperator in "	return RemoveOp0 ? NewOp1 : NewOp0;
	"UserChain, so no one should be used more than "
	"once");
	BO->setOperand(OpNo, NextInChain);
	BO->setHasNoSignedWrap(false);	BO->setHasNoSignedWrap(false);
	BO->setHasNoUnsignedWrap(false);	BO->setHasNoUnsignedWrap(false);
	// Make sure it appears after all instructions we've inserted so far.
	BO->moveBefore(IP);
	return BO;	return BO;
	}	}

	int64_t ConstantOffsetExtractor::Extract(Value *Idx, Value *&NewIdx,	Value *ConstantOffsetExtractor::Extract(Value *Idx, const DataLayout *DL,
	const DataLayout *DL,	GetElementPtrInst *GEP) {
	GetElementPtrInst *GEP) {
	ConstantOffsetExtractor Extractor(DL, GEP);	ConstantOffsetExtractor Extractor(DL, GEP);
	// Find a non-zero constant offset first.	// Try to find constant offsets.
	APInt ConstantOffset =	bool FoundConst = false;
	Extractor.find(Idx, /* SignExtended */ false, /* ZeroExtended */ false,	Extractor.find(Idx, /* SignExtended */ false, /* ZeroExtended */ false,
	GEP->isInBounds());	GEP->isInBounds(), FoundConst);
	if (ConstantOffset != 0) {	if (FoundConst)
	// Separates the constant offset from the GEP index.	// Separates the constant offset from the GEP index.
	NewIdx = Extractor.rebuildWithoutConstOffset();	return Extractor.rebuildWithoutConstOffset();
	}	return nullptr;
	return ConstantOffset.getSExtValue();
	}	}

	int64_t ConstantOffsetExtractor::Find(Value *Idx, const DataLayout *DL,	int64_t ConstantOffsetExtractor::Find(Value *Idx, const DataLayout *DL,
	GetElementPtrInst *GEP) {	GetElementPtrInst *GEP,
		bool &FoundConst) {
	// If Idx is an index of an inbound GEP, Idx is guaranteed to be non-negative.	// If Idx is an index of an inbound GEP, Idx is guaranteed to be non-negative.
	return ConstantOffsetExtractor(DL, GEP)	return ConstantOffsetExtractor(DL, GEP)
	.find(Idx, /* SignExtended */ false, /* ZeroExtended */ false,	.find(Idx, /* SignExtended */ false, /* ZeroExtended */ false,
	GEP->isInBounds())	GEP->isInBounds(), FoundConst)
	.getSExtValue();	.getSExtValue();
	}	}

Context not available.
	for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {	for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
	if (isa<SequentialType>(*GTI)) {	if (isa<SequentialType>(*GTI)) {
	// Tries to extract a constant offset from this GEP index.	// Tries to extract a constant offset from this GEP index.
	int64_t ConstantOffset =	bool FoundConst = false;
	ConstantOffsetExtractor::Find(GEP->getOperand(I), DL, GEP);	int64_t ConstantOffset = ConstantOffsetExtractor::Find(
	if (ConstantOffset != 0) {	GEP->getOperand(I), DL, GEP, FoundConst);
		// Use FoundConst to check whether we need extraction. We don't check
		// whether ConstantOffset is zero, as it can not cover some situations
		// like (a - 4) + 4.
		if (FoundConst) {
	NeedsExtraction = true;	NeedsExtraction = true;
	// A GEP may have multiple indices. We accumulate the extracted	// A GEP may have multiple indices. We accumulate the extracted
	// constant offset to a byte offset, and later offset the remainder of	// constant offset to a byte offset, and later offset the remainder of
Context not available.
	gep_type_iterator GTI = gep_type_begin(*GEP);	gep_type_iterator GTI = gep_type_begin(*GEP);
	for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {	for (unsigned I = 1, E = GEP->getNumOperands(); I != E; ++I, ++GTI) {
	if (isa<SequentialType>(*GTI)) {	if (isa<SequentialType>(*GTI)) {
	Value *NewIdx = nullptr;	// Splits this GEP index into a variadic part and a constant offset, and
		jingyue: // Splits this GEP index into a variadic part and a constant offset, and uses the variadic part as the new index.
	// Tries to extract a constant offset from this GEP index.	// uses the variadic part as the new index.
	int64_t ConstantOffset =	Value *NewIdx =
	ConstantOffsetExtractor::Extract(GEP->getOperand(I), NewIdx, DL, GEP);	ConstantOffsetExtractor::Extract(GEP->getOperand(I), DL, GEP);
	if (ConstantOffset != 0) {	if (NewIdx != nullptr)
	assert(NewIdx != nullptr &&
	"ConstantOffset != 0 implies NewIdx is set");
	GEP->setOperand(I, NewIdx);	GEP->setOperand(I, NewIdx);
	}
	}	}
	}	}
	// Clear the inbounds attribute because the new index may be off-bound.	// Clear the inbounds attribute because the new index may be off-bound.
Context not available.
	// TODO(jingyue): do some range analysis to keep as many inbounds as	// TODO(jingyue): do some range analysis to keep as many inbounds as
	// possible. GEPs with inbounds are more friendly to alias analysis.	// possible. GEPs with inbounds are more friendly to alias analysis.
	GEP->setIsInBounds(false);	GEP->setIsInBounds(false);
		jingyue: Not quite true. The sum of offsets being 0 doesn't mean every offset is zero. As long as there's a non-zero offset, it's worth creating a new GEP.
		HaoLiu: Sorry, I still don't quite understand. This code runs after the new operands have been set, which means the old GEP's indices have already been replaced by the new ones. The following code only creates the second GEP, which has a single index whose value is 0. So I still think a GEP with index 0 is unnecessary.
		jingyue: Sounds good. Nits: change the comment s/as/if, and add a blank line after GEP->setIsInBounds(false). Thanks!
		// No need to create another GEP as the accumulative byte offset is 0.
		if (AccumulativeByteOffset == 0)
		return true;

	// Offsets the base with the accumulative byte offset.	// Offsets the base with the accumulative byte offset.
	//	//
Context not available.
		jingyue: Nice find! I like this bug :)
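
The FoundConst flag in the diff above exists because a zero accumulated offset does not imply that no constant was found, as in (a - 4) + 4. A minimal sketch of that idea on a toy expression tree follows. It is not the pass's code: `Expr`, `makeVar`, `makeConst`, `makeAdd`, and `findConstOffset` are hypothetical names, and sign/zero extension is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical toy expression: a variable leaf, a constant leaf, or an add.
struct Expr {
  enum Kind { Var, Const, Add };
  Kind kind = Var;
  int64_t value = 0;               // only meaningful for Const
  std::shared_ptr<Expr> lhs, rhs;  // only meaningful for Add
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr makeVar() { return std::make_shared<Expr>(); }

ExprPtr makeConst(int64_t v) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Const;
  e->value = v;
  return e;
}

ExprPtr makeAdd(ExprPtr l, ExprPtr r) {
  auto e = std::make_shared<Expr>();
  e->kind = Expr::Add;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// Sums every constant leaf in both operands of each add and records in
// foundConst whether any constant was seen at all.
int64_t findConstOffset(const ExprPtr &e, bool &foundConst) {
  assert(e && "expression must not be null");
  switch (e->kind) {
  case Expr::Const:
    foundConst = true;
    return e->value;
  case Expr::Add: {
    int64_t left = findConstOffset(e->lhs, foundConst);
    int64_t right = findConstOffset(e->rhs, foundConst);
    return left + right;
  }
  case Expr::Var:
    break;
  }
  return 0;
}
```

For (a - 4) + 4 the returned sum is 0 while the flag is set, so a caller that only tests the sum against zero would skip an index that can still be simplified to plain `a`.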

test/Transforms/SeparateConstOffsetFromGEP/AArch64/lit.local.cfg

This file was added.

				if not 'AArch64' in config.root.targets:
				    config.unsupported = True

test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll

This file was added.

				; RUN: opt < %s -separate-const-offset-from-gep -dce -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
				target triple = "arm64-linux-gnu"

				jingyue: Remove the extra dash?
				; Check that (a - 4) + 4 can be optimized correctly.
				define i16* @find_0(i32 %a, i16* %ptr) {
				%sub = add nsw i32 %a, -4
				%ext = sext i32 %sub to i64
				%sum = add nsw i64 %ext, 4
				%incdec.ptr = getelementptr i16* %ptr, i64 %sum
				ret i16* %incdec.ptr
				}
				; CHECK-LABEL: @find_0(
				; CHECK-NOT: add
				; CHECK: [[EXT:%[a-zA-Z0-9]+]] = sext
				; CHECK: getelementptr i16* %ptr, i64 [[EXT]]
				; CHECK-NEXT: ret

				; Check that ((a - 4) + 4) + 1 can be optimized correctly.
				define i16* @find_1(i32 %a, i16* %ptr) {
				%sub = add nsw i32 %a, -4
				%ext = sext i32 %sub to i64
				%sum = add nsw i64 %ext, 4
				%add.sum = add i64 %sum, 1
				%incdec.ptr = getelementptr i16* %ptr, i64 %add.sum
				ret i16* %incdec.ptr
				}
				; CHECK-LABEL: @find_1(
				; CHECK-NOT: add
				; CHECK: [[EXT:%[a-zA-Z0-9]+]] = sext
				; CHECK: [[PTR:%[a-zA-Z0-9]+]] = getelementptr i16* %ptr, i64 [[EXT]]
				; CHECK: getelementptr i16* [[PTR]], i64 1

				; Check a more complex case: ((a + 4) - 8) + (b + 16)
				define i16* @find_other(i32 %a, i32 %b, i16* %ptr) {
				%add = add nsw i32 %a, 4
				jingyue: ((a << 2) + 12)
				%addext = sext i32 %add to i64
				%subadd = add nsw i64 %addext, -8
				%add2 = add nsw i32 %b, 16
				%add2ext = sext i32 %add2 to i64
				%addsum = add nsw i64 %subadd, %add2ext
				%incdec.ptr = getelementptr i16* %ptr, i64 %addsum
				ret i16* %incdec.ptr
				}
				; CHECK-LABEL: @find_other(
				; CHECK: sext i32
				; CHECK: sext i32
				; CHECK: [[SUM:%[a-zA-Z0-9]+]] = add
				; CHECK: [[PTR:%[a-zA-Z0-9]+]] = getelementptr i16* %ptr, i64 [[SUM]]
				; CHECK: getelementptr i16* [[PTR]], i64 12
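
The constant 12 expected by @find_other is the sum of the three constants in the index: 4 - 8 + 16. The sketch below replays that arithmetic in plain C++ to show that the original index equals the variadic part plus 12. It assumes the 32-bit adds do not overflow, which is what the nsw flags in the test promise; `originalIndex`, `variadicPart`, and `kConstantPart` are illustrative names, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Index as computed in @find_other: sext(a + 4) - 8 + sext(b + 16).
// The asserts mirror the nsw assumption: the 32-bit adds must not overflow.
int64_t originalIndex(int32_t a, int32_t b) {
  assert(a <= INT32_MAX - 4 && "a + 4 must not overflow");
  assert(b <= INT32_MAX - 16 && "b + 16 must not overflow");
  int64_t addExt = static_cast<int64_t>(a + 4);
  int64_t subAdd = addExt - 8;
  int64_t add2Ext = static_cast<int64_t>(b + 16);
  return subAdd + add2Ext;
}

// Variadic part left in the first GEP once the constants are removed.
int64_t variadicPart(int32_t a, int32_t b) {
  return static_cast<int64_t>(a) + static_cast<int64_t>(b);
}

// Constant part carried by the second GEP, counted in i16 elements.
constexpr int64_t kConstantPart = 4 - 8 + 16;
```

Since the pointer is i16*, an index of 12 elements corresponds to 24 bytes.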

test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll

Context not available.
	entry:	entry:
	%shl = shl i64 %a, 2	%shl = shl i64 %a, 2
	%add = add i64 %shl, 12	%add = add i64 %shl, 12
	%or = or i64 %add, 1	%or = or i64 %add, 1 ; ((a << 2) + 12) and 1 have no common bits.
	; CHECK: [[OR:%or[0-9]*]] = add i64 %shl, 1
	; ((a << 2) + 12) and 1 have no common bits. Therefore,
	; SeparateConstOffsetFromGEP is able to extract the 12.
	; TODO(jingyue): We could reassociate the expression to combine 12 and 1.
	%p = getelementptr float* %ptr, i64 %or	%p = getelementptr float* %ptr, i64 %or
	; CHECK: [[PTR:%[a-zA-Z0-9]+]] = getelementptr float* %ptr, i64 [[OR]]	; CHECK: [[PTR:%[a-zA-Z0-9]+]] = getelementptr float* %ptr, i64 %shl
	; CHECK: getelementptr float* [[PTR]], i64 12	; CHECK: getelementptr float* [[PTR]], i64 13
	ret float* %p	ret float* %p
	; CHECK-NEXT: ret	; CHECK-NEXT: ret
	}	}
Context not available.
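
The updated CHECK lines above expect a single constant 13 because ((a << 2) + 12) and 1 share no set bits, so the `or` behaves like an `add` and the 12 and the 1 can be combined. A small sketch of that reasoning in plain C++, using 64-bit unsigned arithmetic as a stand-in for i64; the function names are illustrative and do not come from the pass.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit is set in both operands, in which case x | y == x + y.
bool haveNoCommonBits(uint64_t x, uint64_t y) { return (x & y) == 0; }

// Rewrites an OR as an ADD; only valid when the operands share no bits.
uint64_t orAsAdd(uint64_t x, uint64_t y) {
  assert(haveNoCommonBits(x, y) && "or is not equivalent to add here");
  return x + y;
}

// Index as written in the test: ((a << 2) + 12) | 1.
uint64_t originalIndex(uint64_t a) { return ((a << 2) + 12) | 1; }

// Index after splitting: variadic part (a << 2) plus combined constant 13.
uint64_t splitIndex(uint64_t a) { return (a << 2) + 13; }
```

The two low bits of (a << 2) + 12 are always zero, so the equivalence holds for every `a`. When the operands can share bits, an `or` cannot be treated as an `add`, which is the situation the summary's "@test_or_bug" case is concerned with.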

Revision Contents

Diff 15473

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

test/Transforms/SeparateConstOffsetFromGEP/AArch64/lit.local.cfg

test/Transforms/SeparateConstOffsetFromGEP/AArch64/split-gep.ll

test/Transforms/SeparateConstOffsetFromGEP/NVPTX/split-gep.ll
