This is an archive of the discontinued LLVM Phabricator instance.

Swap loop invariant GEP with loop variant GEP to allow more LICM.
ClosedPublic

Authored by • hulx2000 on Jul 8 2015, 10:23 PM.

Download Raw Diff

Details

Reviewers

qcolombet
• zinob
atrick
• joker-eph-DISABLED
hfinkel
apazos

Summary

This patch changes the order of GEPs generated by Splitting GEPs
pass, specially when one of the GEPs has constant and the base is
loop invariant, then we will generate the GEP with constant first
when beneficial, to expose more cases for LICM.

If originally Splitting GEP generate the following:
  do.body.i:
    %idxprom.i = sext i32 %shr.i to i64
    %2 = bitcast %typeD* %s to i8*
    %3 = shl i64 %idxprom.i, 2
    %uglygep = getelementptr i8, i8* %2, i64 %3
    %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
  ...
Now it genereates:
  do.body.i:
    %idxprom.i = sext i32 %shr.i to i64
    %2 = bitcast %typeD* %s to i8*
    %3 = shl i64 %idxprom.i, 2
    %uglygep = getelementptr i8, i8* %2, i64 1032
    %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
  ...

For no-loop cases, the original way of generating GEPs seems to
expose more CSE cases, so we don't change the logic for no-loop
cases, and only limit our change to the specific case we are
interested in.

Diff Detail

Repository: rL LLVM

Event Timeline

• hulx2000 updated this revision to Diff 29301.Jul 8 2015, 10:23 PM

• hulx2000 retitled this revision from to Extend LICM to hoist loop invariant GEP out.

• hulx2000 updated this object.

• hulx2000 added reviewers: hfinkel, • joker-eph-DISABLED.

• hulx2000 set the repository for this revision to rL LLVM.

• hulx2000 added subscribers: apazos, mcrosier.

Herald added a subscriber: tberghammer. · View Herald TranscriptJul 8 2015, 10:23 PM

• hulx2000 added a subscriber: llvm-commits.Jul 8 2015, 10:45 PM

• hulx2000 added reviewers: mcrosier, • zinob, apazos.Jul 8 2015, 10:48 PM

mcrosier added reviewers: atrick, qcolombet.Jul 9 2015, 6:30 AM

Hi Lawrence,

I am not sure the transformation is legal, I’ll come back on that, but assuming it is, I think we could consider that moving the constant elements of a gep at the beginning of a gep-chain is the canonical representation.

If people agree on this, then this transformation would be better suited as an inst combine and we can get rid of all the is loop invariant checks.

Now, going back to the legality of the transformation, I am definitely not a language lawyer but it seems to me we may slightly modify the behavior of the program.
E.g., consider
C = gep A, B
D = gep C, 1023

Where gep C, 1023 produces an overflow.

After this transformation, we would get:
New = gep A, 1023
D = gep B, New

Now, the overflow may be on New = gep A, 1023, which potentially affects the behavior of the program.

Thoughts?

Cheers,
-Quentin

Hi, Quentin:

Thanks for your prompt response.

Just want to clarify, after the transformation, the new code would be:
C = gep A, 1023
D = gep C, B // Note that B is still the second operand.

Based on the testcase originated this, those GEPs were generated by Split GEP pass, that means originally there is only one GEP related to A, so overflow on the first or the second should not make too much difference, because they were all based on A originally , that's why I checked the type of A and C must be the same.

However whether the check is strict enough to make sure the transformation is legal, I am not 100% sure, definitely I'd like to hear more comments.

About doing it in inst combining, I am not an expert of inst combining, my question is: is it possible for inst combining to hoist instruction outside loop? The important aspect of this transformation is to enable LICM to hoist a loop invariant computation outside loop, which is impossible before since the first GEP is loop variant, and we have to check if the first GEP is loop variant (or don't know), because this transformation is not needed otherwise -- If LICM can hoist the first GEP out, then it will hoist the second GEP out too.

More comments are welcome.

Regards

Lawrence Hu

I believe that if you restrict to 'inbounds' GEPs, you can avoid the overflow question.

About doing it in inst combining, I am not an expert of inst combining, my question is: is it possible for inst combining to hoist instruction outside loop?

No, but Quentin is right. When this kind of interchange is legal, we should establish a canonical form and use instcombine to transform to it. This will also allow CSE/GVN, etc. to pick up more of these cases.

Please upload your patches with full context.

I don't think GEP reassociation should be done as a canonical InstCombine for two reasons
(1) the inbounds property would need to be dropped
(2) Unlike Reassociate, InstCombine has no knowledge of the CFG

I think this should be part of SeparateConstOffsetFromGEP, which is also optimizing for register pressure and addressing modes. As part of this pass it is target configurable and controlled by EnableGEPOpt.

In D11051#204369, @atrick wrote:

I don't think GEP reassociation should be done as a canonical InstCombine for two reasons
(1) the inbounds property would need to be dropped

Can you even do this if you don't have inbounds GEPs? Nuno, do you have an opinion about this?

(2) Unlike Reassociate, InstCombine has no knowledge of the CFG

I think this should be part of SeparateConstOffsetFromGEP, which is also optimizing for register pressure and addressing modes. As part of this pass it is target configurable and controlled by EnableGEPOpt.

Thanks Hal & Andrew for your valuable comments.

I uploaded the full content diff.

Plus:

This patches originally is developed as part of Splitting GEP, later I moved it into LICM because I can reuse loopinfo in LICM and it save an extra loop to collect information (won't increase compile time much, but should avoid if possible anyway), and avoid some mess check to avoid performance regression. It completely ok to move it back to Splitting GEP if needed, as long as very one agrees.

For the pattern originated this, inbound is set to false by Splitting GEP.

If we want to do it in InstCombine, then we must make sure there is one pass of Splitting GEP, InstCombine, LICM happen in sequence, I will check that.

I will not change the design of this patch for now, will do it after most of the people reach an agreement.

Regards

Lawrence Hu

In D11051#204371, @hfinkel wrote:

In D11051#204369, @atrick wrote:

I don't think GEP reassociation should be done as a canonical InstCombine for two reasons
(1) the inbounds property would need to be dropped

Can you even do this if you don't have inbounds GEPs? Nuno, do you have an opinion about this?

I think Andy is right.
If you don't have the inbounds tag, then you can safely perform this transformation, since LLVM's semantics guarantee that it will compute the overflowing value correctly. The implementation of the lowering and constant folding of GEPs in LLVM matches the semantics.
If you have the inbounds tag, then the transformation can still be performed, although inbounds has to dropped in most cases, since otherwise we would need to prove that the intermediate results don't overflow.

This problem is similar to reassociation of integer expressions. It's always safe when there are no nsw/nuw, but if there are, they usually have to be dropped.

Nuno

Hi, Guys:

Can we reach an agreement about where to do it? I posted my older patch which is part of Split GEP at http://reviews.llvm.org/D11443.

Regards

You need to address this:

If you have the inbounds tag, then the transformation can still be performed, although inbounds has to dropped in most cases, since otherwise we would need to prove that the intermediate results don't overflow.

By either restricting to GEPs that don't have inbounds, or dropping inbounds after performing the transformation. Specifically, I think what Nuno is saying is that, for example, if we have:

ptr = (p + o) + c

and both are inbounds, then we know that (p + o) does not overflow, and specifically is no more than one byte past the end of the object to which p belongs. We also know that (p + o) + c offers the same guarantees. We don't, however, know that (p + c) is necessarily in bounds. Imagine that (p + o) is one past the end of the object, and c is -1. Then p + c might point before the beginning of the object.

We should not give up too easily, however, because is p is some known object we might be able to easily determine whether (p + c) is still inbounds. One way to do this would be to call:

V->stripAndAccumulateInBoundsConstantOffsets

on the base pointer and then see if we determine the size of the object. You can call getObjectSize(...) to attempt to figure this out (include/llvm/Analysis/MemoryBuiltins.h).

lib/Transforms/Scalar/LICM.cpp
415 ↗	(On Diff #29689)	This seems a bit odd. We should not be looking only at the very next instruction. At the very least, check for users in the same block.

Thanks Hal.

For you comments about stripAndAccumulateInBoundsConstantOffsets and getObjectSize.., I didn't understand it 100%, please let me know if that is what you are thinking.

Regards

Herald added subscribers: srhines, danalbert. · View Herald TranscriptJul 25 2015, 4:26 PM

In D11051#212087, @hulx2000 wrote:

Thanks Hal.

For you comments about stripAndAccumulateInBoundsConstantOffsets and getObjectSize.., I didn't understand it 100%, please let me know if that is what you are thinking.

I mean that you can call stripAndAccumulateInBoundsConstantOffsets on the first GEP you form (the one with the constant index), which will give you a new base and an offset. You can then call getObjectSize on that new base, and that might give you the object size. If the Offset <= ObjectSize, then you can mark the first GEP as inbounds.

Regards

Some comments about the inbounds changes. They also need to be covered by the regression tests.

lib/Transforms/Scalar/LICM.cpp
1145 ↗	(On Diff #30640)	I think this can be uge, because inbounds pointers are allowed to point to one byte past the end of the object (and thus, equal the size). Actually, you want Offset.ule (the offset must be less than or equal to the size for it to be inbounds).
1147 ↗	(On Diff #30640)	But you also need to set inbounds to false is the offset is larger than the size (even if the original GEP was inbounds). You need to set the inbounds on the second GEP to false.

• hulx2000 added inline comments.Jul 28 2015, 10:11 AM

lib/Transforms/Scalar/LICM.cpp
1147 ↗	(On Diff #30640)	Thanks Hal. The second GEP should be safe to have whatever inbound attribute as before, since originally the first is p+o, the second is p+o+c, now the first is p+c, the second is p+c+o==p+o+c.

atrick added inline comments.Jul 28 2015, 10:26 AM

lib/Transforms/Scalar/LICM.cpp
1147 ↗	(On Diff #30834)	I don't follow this comment. My reading of langref is that both the GEP's base and result must be inbounds. So if the first GEP is not inbounds and is the base of the second GEP, then the second GEP cannot be inbounds. (This is why I generally don't like this transformation.)

hfinkel added inline comments.Jul 28 2015, 10:45 AM

lib/Transforms/Scalar/LICM.cpp
1147 ↗	(On Diff #30834)	Andy is correct. The LangRef is clear on this matter, saying: "If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses..." So you can't have inbounds on the second GEP either.

If you need to do this transformation is cases where the inbounds flag will be dropped it seems best to do it late in the pipeline (maybe even after LSR) as a target-controlled optimization. Arranging for the hoisting to happen seems like the easy part of this problem.

Out of curiosity, I wonder if the LSR fix here http://reviews.llvm.org/D11212 has any impact on this test case.

In D11051#213677, @atrick wrote:

If you need to do this transformation is cases where the inbounds flag will be dropped it seems best to do it late in the pipeline (maybe even after LSR) as a target-controlled optimization. Arranging for the hoisting to happen seems like the easy part of this problem.

I agree. Also, as it turns out, both NVPTX and PowerPC run LICM late in the pipeline, and AArch64 can as well when -aarch64-gep-opt is enabled (PowerPC also runs it under the control of -ppc-gep-opt, but unlike AArch64, it is currently on by default).

Out of curiosity, I wonder if the LSR fix here http://reviews.llvm.org/D11212 has any impact on this test case.

Oh yeah. I forgot about my first comment on this patch. I'll repeat it:

Hi, Andrew:

About your comments, I have asked to reach an agreement, and posted my earlier version of this patch (in SeparateConstOffsetFromGEP) to http://reviews.llvm.org/D11443 to see if you guys like it or not. 

Since you and Hal both agree it should be done somewhere else, I assume SeparateConstOffsetFromGEP is a good place to go, I will move the code back to SeparateConstOffsetFromGEP then, hopefully I can do better then http://reviews.llvm.org/D11443.

Regards

Lawrence Hu

Forgot to mention.

This transformation is useful, it helps one of our benchmark by 7%, and that is the main reason why llvm is behind gcc for this benchmarks.

• hulx2000 updated this revision to Diff 31193.Aug 1 2015, 2:34 PM

moved the code back to Splitting GEP pass.

mcrosier resigned from this revision.Aug 5 2015, 8:15 AM

mcrosier removed a reviewer: mcrosier.

I don't have any serious objections.

NumOfCommonBaseGEPs==1 seems very artificial. Can you generalize this to NumOfCommonBaseGEPs > 0?

Quentin may want to do an arm64 benchmark run before we ok this. I'll leave that up to him.

I am making a quick assembly diff of the test suite to have a sense of the impact.

Hi Lawrence,

There are no diffs on the test-suite, so you’re good on that side :).

I don’t have any problems with the patch and let you address Andy’s comment.

Thanks,
-Quentin

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
1084	hasOneUse

Thanks Andrew and Quentin.

Quentin, the whole Splitting GEP pass is turned off, could you please try -mllvm -aarch64-gep-opt=true or revert 89cc8dd3b84a636b2798028a994f4b71c78e0163, internally, I have to do that to see difference, of course, it depends on testcase.

I will address Andrew's comment in next few days, stuck on bugs now.

In D11051#218985, @hulx2000 wrote:

Quentin, the whole Splitting GEP pass is turned off, could you please try -mllvm -aarch64-gep-opt=true or revert 89cc8dd3b84a636b2798028a994f4b71c78e0163, internally, I have to do that to see difference, of course, it depends on testcase.

Lawrence,
Just a friendly reminder that the community prefers a svn revision number over a git hash any day.. :)

Chad

I didn't realize the pass was off. We don't need to benchmark the change in that case, although whoever is using the pass might want to ?!

Haha, I missed that too!

+1 for Andy’s comment.

Thanks, Chad:

commit 89cc8dd3b84a636b2798028a994f4b71c78e0163
Author: James Molloy <james.molloy@arm.com>
Date: Wed Apr 22 09:11:38 2015 +0000

[AArch64] Disable complex GEP optimization by default.

Enough concerns were raised that this optimization is pessimising some code patterns.

The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@235491 91177308-0d34-0410-b5e6-96231b3b80d8

Lawrence, Chad, does this pass help you independent from the proposed patch? Is it something you will work toward enabling by default. If so, great. Otherwise, you probably want to decide which part will be controlled by flag and which part you want to enable by default. Obviously, changes to the default should be benchmarked.

Hi, Andy

I have seen up and down with Splitting GEP pass.

Yes, I am considering fixing a few more issues, then enable it by default, of course, we will do benchmark measurement at that time, can I ask your help at then to measure your benchmark?

Regards.

FWIW, I do not see any differences with -Os -mllvm -aarch64-gep-opt=true either.

I don't regularly benchmark llvm test-suite any more, but I can help find someone who does.

In D11051#219017, @atrick wrote:

Lawrence, Chad, does this pass help you independent from the proposed patch?

I believe we saw a 3-4% regression on Spec200x/mcf with James disabled the pass. This was on our internal branch in the context of LTO, which departs greatly from the community LTO flow (we're hoping to fix that soon).

Is it something you will work toward enabling by default. If so, great.

@hulx2000: Let me know if you'd like to discuss this offline. Have you or Ana discussed your plans with James?

Thanks, Quentin.

Interesting though, James said he turned off that Pass because Apple seeing non-negligible slowdowns, he said ok to turn it back on with tweaks, but "you’ll have to provide evidence to convince Gerolf".

Hmm, may be with LTO then.

I haven’t tried that.

hfinkel added inline comments.Aug 10 2015, 12:05 AM

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
758	You shouldn't need to pre-compute this, you can examine the use list if necessary, something like this: auto TooManyUsesInLoop = [](Value V, Loop L) { int UsesInLoop = 0; for (User U : V->users()) { if (Instruction User = dyn_cast<Instruction>(U)) if (L->contains(User)) if (++UsesInLoop > 1) return true; return false; } };
803	Then -> then could move -> can move
1065	This seems unnecessary (see above).

• hulx2000 updated this revision to Diff 32097.Aug 13 2015, 2:50 PM

ping

LGTM.

This revision is now accepted and ready to land.Sep 23 2015, 10:31 AM

merged as git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248420 91177308-0d34-0410-b5e6-96231b3b80d8

Revision Contents

Path

Size

lib/

Transforms/

Scalar/

SeparateConstOffsetFromGEP.cpp

141 lines

test/

CodeGen/

AArch64/

aarch64-loop-gep-opt.ll

50 lines

Diff 32097

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

Show First 20 Lines • Show All 150 Lines • ▼ Show 20 Lines
// parts and variant parts, LICM can hoist/sink those invariant parts.		// parts and variant parts, LICM can hoist/sink those invariant parts.
// CGP (CodeGen Prepare) tries to sink address calculations that match the		// CGP (CodeGen Prepare) tries to sink address calculations that match the
// target's addressing modes. A GEP with multiple indices may not match and will		// target's addressing modes. A GEP with multiple indices may not match and will
// not be sunk. If we lower such GEP into smaller parts, CGP may sink some of		// not be sunk. If we lower such GEP into smaller parts, CGP may sink some of
// them. So we end up with a better addressing mode.		// them. So we end up with a better addressing mode.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "llvm/Analysis/LoopInfo.h"
		#include "llvm/Analysis/MemoryBuiltins.h"
		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
▲ Show 20 Lines • Show All 148 Lines • ▼ Show 20 Lines	SeparateConstOffsetFromGEP(const TargetMachine *TM = nullptr,
bool LowerGEP = false)		bool LowerGEP = false)
: FunctionPass(ID), DL(nullptr), DT(nullptr), TM(TM), LowerGEP(LowerGEP) {		: FunctionPass(ID), DL(nullptr), DT(nullptr), TM(TM), LowerGEP(LowerGEP) {
initializeSeparateConstOffsetFromGEPPass(*PassRegistry::getPassRegistry());		initializeSeparateConstOffsetFromGEPPass(*PassRegistry::getPassRegistry());
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
		AU.addRequired<LoopInfoWrapperPass>();
AU.setPreservesCFG();		AU.setPreservesCFG();
		AU.addRequired<TargetLibraryInfoWrapperPass>();
}		}

bool doInitialization(Module &M) override {		bool doInitialization(Module &M) override {
DL = &M.getDataLayout();		DL = &M.getDataLayout();
return false;		return false;
}		}
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;

Show All 39 Lines	private:
///		///
/// Returns true if the module changes.		/// Returns true if the module changes.
///		///
/// Verified in @i32_add in split-gep.ll		/// Verified in @i32_add in split-gep.ll
bool canonicalizeArrayIndicesToPointerSize(GetElementPtrInst *GEP);		bool canonicalizeArrayIndicesToPointerSize(GetElementPtrInst *GEP);
/// Verify F is free of dead code.		/// Verify F is free of dead code.
void verifyNoDeadCode(Function &F);		void verifyNoDeadCode(Function &F);

		bool hasMoreThanOneUseInLoop(Value v, Loop L);
		// Swap the index operand of two GEP.
		void swapGEPOperand(GetElementPtrInst First, GetElementPtrInst Second);
		// Check if it is safe to swap operand of two GEP.
		bool isLegalToSwapOperand(GetElementPtrInst First, GetElementPtrInst Second,
		Loop *CurLoop);

const DataLayout *DL;		const DataLayout *DL;
const DominatorTree *DT;		const DominatorTree *DT;
const TargetMachine *TM;		const TargetMachine *TM;

		LoopInfo *LI;
		TargetLibraryInfo *TLI;
/// Whether to lower a GEP with multiple indices into arithmetic operations or		/// Whether to lower a GEP with multiple indices into arithmetic operations or
/// multiple GEPs with a single index.		/// multiple GEPs with a single index.
bool LowerGEP;		bool LowerGEP;
};		};
} // anonymous namespace		} // anonymous namespace

char SeparateConstOffsetFromGEP::ID = 0;		char SeparateConstOffsetFromGEP::ID = 0;
INITIALIZE_PASS_BEGIN(		INITIALIZE_PASS_BEGIN(
SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",		SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",
"Split GEPs to a variadic base and a constant offset for better CSE", false,		"Split GEPs to a variadic base and a constant offset for better CSE", false,
false)		false)
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)		INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
		INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
INITIALIZE_PASS_END(		INITIALIZE_PASS_END(
SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",		SeparateConstOffsetFromGEP, "separate-const-offset-from-gep",
"Split GEPs to a variadic base and a constant offset for better CSE", false,		"Split GEPs to a variadic base and a constant offset for better CSE", false,
false)		false)

FunctionPass *		FunctionPass *
llvm::createSeparateConstOffsetFromGEPPass(const TargetMachine *TM,		llvm::createSeparateConstOffsetFromGEPPass(const TargetMachine *TM,
bool LowerGEP) {		bool LowerGEP) {
▲ Show 20 Lines • Show All 326 Lines • ▼ Show 20 Lines
void SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs(		void SeparateConstOffsetFromGEP::lowerToSingleIndexGEPs(
GetElementPtrInst *Variadic, int64_t AccumulativeByteOffset) {		GetElementPtrInst *Variadic, int64_t AccumulativeByteOffset) {
IRBuilder<> Builder(Variadic);		IRBuilder<> Builder(Variadic);
Type *IntPtrTy = DL->getIntPtrType(Variadic->getType());		Type *IntPtrTy = DL->getIntPtrType(Variadic->getType());

Type *I8PtrTy =		Type *I8PtrTy =
Builder.getInt8PtrTy(Variadic->getType()->getPointerAddressSpace());		Builder.getInt8PtrTy(Variadic->getType()->getPointerAddressSpace());
Value *ResultPtr = Variadic->getOperand(0);		Value *ResultPtr = Variadic->getOperand(0);
		Loop *L = LI->getLoopFor(Variadic->getParent());
		// Check if the base is not loop invariant or used more than once.
		bool isSwapCandidate =
		L && L->isLoopInvariant(ResultPtr) &&
		!hasMoreThanOneUseInLoop(ResultPtr, L);
		hfinkelUnsubmitted Done Reply Inline Actions You shouldn't need to pre-compute this, you can examine the use list if necessary, something like this: auto TooManyUsesInLoop = [](Value V, Loop L) { int UsesInLoop = 0; for (User U : V->users()) { if (Instruction User = dyn_cast<Instruction>(U)) if (L->contains(User)) if (++UsesInLoop > 1) return true; return false; } }; hfinkel: You shouldn't need to pre-compute this, you can examine the use list if necessary, something…
		Value *FirstResult = nullptr;

if (ResultPtr->getType() != I8PtrTy)		if (ResultPtr->getType() != I8PtrTy)
ResultPtr = Builder.CreateBitCast(ResultPtr, I8PtrTy);		ResultPtr = Builder.CreateBitCast(ResultPtr, I8PtrTy);

gep_type_iterator GTI = gep_type_begin(*Variadic);		gep_type_iterator GTI = gep_type_begin(*Variadic);
// Create an ugly GEP for each sequential index. We don't create GEPs for		// Create an ugly GEP for each sequential index. We don't create GEPs for
// structure indices, as they are accumulated in the constant offset index.		// structure indices, as they are accumulated in the constant offset index.
for (unsigned I = 1, E = Variadic->getNumOperands(); I != E; ++I, ++GTI) {		for (unsigned I = 1, E = Variadic->getNumOperands(); I != E; ++I, ++GTI) {
if (isa<SequentialType>(*GTI)) {		if (isa<SequentialType>(*GTI)) {
Show All 12 Lines	if (isa<SequentialType>(*GTI)) {
Idx, ConstantInt::get(IntPtrTy, ElementSize.logBase2()));		Idx, ConstantInt::get(IntPtrTy, ElementSize.logBase2()));
} else {		} else {
Idx = Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize));		Idx = Builder.CreateMul(Idx, ConstantInt::get(IntPtrTy, ElementSize));
}		}
}		}
// Create an ugly GEP with a single index for each index.		// Create an ugly GEP with a single index for each index.
ResultPtr =		ResultPtr =
Builder.CreateGEP(Builder.getInt8Ty(), ResultPtr, Idx, "uglygep");		Builder.CreateGEP(Builder.getInt8Ty(), ResultPtr, Idx, "uglygep");
		if (FirstResult == nullptr)
		FirstResult = ResultPtr;
}		}
}		}

// Create a GEP with the constant offset index.		// Create a GEP with the constant offset index.
if (AccumulativeByteOffset != 0) {		if (AccumulativeByteOffset != 0) {
Value *Offset = ConstantInt::get(IntPtrTy, AccumulativeByteOffset);		Value *Offset = ConstantInt::get(IntPtrTy, AccumulativeByteOffset);
ResultPtr =		ResultPtr =
Builder.CreateGEP(Builder.getInt8Ty(), ResultPtr, Offset, "uglygep");		Builder.CreateGEP(Builder.getInt8Ty(), ResultPtr, Offset, "uglygep");
}		} else
		isSwapCandidate = false;

		// If we created a GEP with constant index, and the base is loop invariant,
		// then we swap the first one with it, so LICM can move constant GEP out
		hfinkelUnsubmitted Done Reply Inline Actions Then -> then could move -> can move hfinkel: Then -> then could move -> can move
		// later.
		GetElementPtrInst *FirstGEP = dyn_cast<GetElementPtrInst>(FirstResult);
		GetElementPtrInst *SecondGEP = dyn_cast<GetElementPtrInst>(ResultPtr);
		if (isSwapCandidate && isLegalToSwapOperand(FirstGEP, SecondGEP, L))
		swapGEPOperand(FirstGEP, SecondGEP);

if (ResultPtr->getType() != Variadic->getType())		if (ResultPtr->getType() != Variadic->getType())
ResultPtr = Builder.CreateBitCast(ResultPtr, Variadic->getType());		ResultPtr = Builder.CreateBitCast(ResultPtr, Variadic->getType());

Variadic->replaceAllUsesWith(ResultPtr);		Variadic->replaceAllUsesWith(ResultPtr);
Variadic->eraseFromParent();		Variadic->eraseFromParent();
}		}

void		void
▲ Show 20 Lines • Show All 226 Lines • ▼ Show 20 Lines
bool SeparateConstOffsetFromGEP::runOnFunction(Function &F) {		bool SeparateConstOffsetFromGEP::runOnFunction(Function &F) {
if (skipOptnoneFunction(F))		if (skipOptnoneFunction(F))
return false;		return false;

if (DisableSeparateConstOffsetFromGEP)		if (DisableSeparateConstOffsetFromGEP)
return false;		return false;

DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
		LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
		TLI = &getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
bool Changed = false;		bool Changed = false;
for (Function::iterator B = F.begin(), BE = F.end(); B != BE; ++B) {		for (Function::iterator B = F.begin(), BE = F.end(); B != BE; ++B) {
for (BasicBlock::iterator I = B->begin(), IE = B->end(); I != IE; ) {		for (BasicBlock::iterator I = B->begin(), IE = B->end(); I != IE;)
if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++)) {		if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I++))
Changed \|= splitGEP(GEP);		Changed \|= splitGEP(GEP);
}
// No need to split GEP ConstantExprs because all its indices are constant		// No need to split GEP ConstantExprs because all its indices are constant
// already.		// already.
}		}
}

if (VerifyNoDeadCode)		if (VerifyNoDeadCode)
verifyNoDeadCode(F);		verifyNoDeadCode(F);

		hfinkelUnsubmitted Done Reply Inline Actions This seems unnecessary (see above). hfinkel: This seems unnecessary (see above).
return Changed;		return Changed;
}		}

void SeparateConstOffsetFromGEP::verifyNoDeadCode(Function &F) {		void SeparateConstOffsetFromGEP::verifyNoDeadCode(Function &F) {
for (auto &B : F) {		for (auto &B : F) {
for (auto &I : B) {		for (auto &I : B) {
if (isInstructionTriviallyDead(&I)) {		if (isInstructionTriviallyDead(&I)) {
std::string ErrMessage;		std::string ErrMessage;
raw_string_ostream RSO(ErrMessage);		raw_string_ostream RSO(ErrMessage);
RSO << "Dead instruction detected!\n" << I << "\n";		RSO << "Dead instruction detected!\n" << I << "\n";
llvm_unreachable(RSO.str().c_str());		llvm_unreachable(RSO.str().c_str());
}		}
}		}
}		}
}		}

		bool SeparateConstOffsetFromGEP::isLegalToSwapOperand(
		GetElementPtrInst FirstGEP, GetElementPtrInst SecondGEP, Loop *CurLoop) {
		if (!FirstGEP \|\| !FirstGEP->hasOneUse())
		qcolombetUnsubmitted Done Reply Inline Actions hasOneUse qcolombet: hasOneUse
		return false;

		if (!SecondGEP \|\| FirstGEP->getParent() != SecondGEP->getParent())
		return false;

		if (FirstGEP == SecondGEP)
		return false;

		unsigned FirstNum = FirstGEP->getNumOperands();
		unsigned SecondNum = SecondGEP->getNumOperands();
		// Give up if the number of operands are not 2.
		if (FirstNum != SecondNum \|\| FirstNum != 2)
		return false;

		Value *FirstBase = FirstGEP->getOperand(0);
		Value *SecondBase = SecondGEP->getOperand(0);
		Value *FirstOffset = FirstGEP->getOperand(1);
		// Give up if the index of the first GEP is loop invariant.
		if (CurLoop->isLoopInvariant(FirstOffset))
		return false;

		// Give up if base doesn't have same type.
		if (FirstBase->getType() != SecondBase->getType())
		return false;

		Instruction *FirstOffsetDef = dyn_cast<Instruction>(FirstOffset);

		// Check if the second operand of first GEP has constant coefficient.
		// For an example, for the following code, we won't gain anything by
		// hoisting the second GEP out because the second GEP can be folded away.
		// %scevgep.sum.ur159 = add i64 %idxprom48.ur, 256
		// %67 = shl i64 %scevgep.sum.ur159, 2
		// %uglygep160 = getelementptr i8* %65, i64 %67
		// %uglygep161 = getelementptr i8* %uglygep160, i64 -1024

		// Skip constant shift instruction which may be generated by Splitting GEPs.
		if (FirstOffsetDef && FirstOffsetDef->isShift() &&
		dyn_cast<ConstantInt>(FirstOffsetDef->getOperand(1)))
		FirstOffsetDef = dyn_cast<Instruction>(FirstOffsetDef->getOperand(0));

		// Give up if FirstOffsetDef is an Add or Sub with constant.
		// Because it may not profitable at all due to constant folding.
		if (FirstOffsetDef)
		if (BinaryOperator *BO = dyn_cast<BinaryOperator>(FirstOffsetDef)) {
		unsigned opc = BO->getOpcode();
		if ((opc == Instruction::Add \|\| opc == Instruction::Sub) &&
		(dyn_cast<ConstantInt>(BO->getOperand(0)) \|\|
		dyn_cast<ConstantInt>(BO->getOperand(1))))
		return false;
		}
		return true;
		}

		bool SeparateConstOffsetFromGEP::hasMoreThanOneUseInLoop(Value V, Loop L) {
		int UsesInLoop = 0;
		for (User *U : V->users()) {
		if (Instruction *User = dyn_cast<Instruction>(U))
		if (L->contains(User))
		if (++UsesInLoop > 1)
		return true;
		}
		return false;
		}

		void SeparateConstOffsetFromGEP::swapGEPOperand(GetElementPtrInst *First,
		GetElementPtrInst *Second) {
		Value *Offset1 = First->getOperand(1);
		Value *Offset2 = Second->getOperand(1);
		First->setOperand(1, Offset2);
		Second->setOperand(1, Offset1);

		// We changed p+o+c to p+c+o, p+c may not be inbound anymore.
		const DataLayout &DAL = First->getModule()->getDataLayout();
		APInt Offset(DAL.getPointerSizeInBits(
		cast<PointerType>(First->getType())->getAddressSpace()),
		0);
		Value *NewBase =
		First->stripAndAccumulateInBoundsConstantOffsets(DAL, Offset);
		uint64_t ObjectSize;
		if (!getObjectSize(NewBase, ObjectSize, DAL, TLI) \|\|
		Offset.ugt(ObjectSize)) {
		First->setIsInBounds(false);
		Second->setIsInBounds(false);
		} else
		First->setIsInBounds(true);
		}

test/CodeGen/AArch64/aarch64-loop-gep-opt.ll

This file was added.

				; RUN: llc -O3 -aarch64-gep-opt=true -print-after=codegenprepare -mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck <%t %s
				; REQUIRES: asserts
				target triple = "aarch64--linux-android"

				%typeD = type { i32, i32, [256 x i32], [257 x i32] }

				; Function Attrs: noreturn nounwind uwtable
				define i32 @test1(%typeD* nocapture %s) {
				entry:
				; CHECK-LABEL: entry:
				; CHECK: %uglygep = getelementptr i8, i8* %0, i64 1032
				; CHECK: br label %do.body.i


				%tPos = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 0
				%k0 = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 1
				%.pre = load i32, i32* %tPos, align 4
				br label %do.body.i

				do.body.i:
				; CHECK-LABEL: do.body.i:
				; CHECK: %uglygep2 = getelementptr i8, i8* %uglygep, i64 %3
				; CHECK-NEXT: %4 = bitcast i8* %uglygep2 to i32*
				; CHECK-NOT: %uglygep2 = getelementptr i8, i8* %uglygep, i64 1032


				%0 = phi i32 [ 256, %entry ], [ %.be, %do.body.i.backedge ]
				%1 = phi i32 [ 0, %entry ], [ %.be6, %do.body.i.backedge ]
				%add.i = add nsw i32 %1, %0
				%shr.i = ashr i32 %add.i, 1
				%idxprom.i = sext i32 %shr.i to i64
				%arrayidx.i = getelementptr inbounds %typeD, %typeD* %s, i64 0, i32 3, i64 %idxprom.i
				%2 = load i32, i32* %arrayidx.i, align 4
				%cmp.i = icmp sle i32 %2, %.pre
				%na.1.i = select i1 %cmp.i, i32 %0, i32 %shr.i
				%nb.1.i = select i1 %cmp.i, i32 %shr.i, i32 %1
				%sub.i = sub nsw i32 %na.1.i, %nb.1.i
				%cmp1.i = icmp eq i32 %sub.i, 1
				br i1 %cmp1.i, label %fooo.exit, label %do.body.i.backedge

				do.body.i.backedge:
				%.be = phi i32 [ %na.1.i, %do.body.i ], [ 256, %fooo.exit ]
				%.be6 = phi i32 [ %nb.1.i, %do.body.i ], [ 0, %fooo.exit ]
				br label %do.body.i

				fooo.exit: ; preds = %do.body.i
				store i32 %nb.1.i, i32* %k0, align 4
				br label %do.body.i.backedge
				}

This is an archive of the discontinued LLVM Phabricator instance.

Swap loop invariant GEP with loop variant GEP to allow more LICM.ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 32097

lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp

test/CodeGen/AArch64/aarch64-loop-gep-opt.ll

Swap loop invariant GEP with loop variant GEP to allow more LICM.
ClosedPublic